Conversation
📝 WalkthroughWalkthroughKaggle download now returns multiple files and the app lets users pick which file to load via a new helper; README text was slightly reworded; correlation heatmap rendering now masks the upper triangle. Changes
Sequence DiagramsequenceDiagram
actor User
participant App as App (Streamlit)
participant Kaggle as KaggleHandler
participant FS as File System
participant UI as Selection_UI
User->>App: Submit dataset ID / click "Load"
App->>Kaggle: download_files(dataset_id)
Kaggle->>FS: Download dataset files
FS-->>Kaggle: Return files (1..N)
Kaggle-->>App: list[Path] of files
App->>App: Store kaggle_files, kaggle_dataset_id in session state
alt Single file
App->>App: Auto-load file into df
App-->>User: Show success (file loaded)
else Multiple files
App->>UI: Render file selector (kaggle_files)
User->>UI: Select file and click "Load Selected File"
UI-->>App: Selected file_path
App->>App: _load_specific_kaggle_file(file_path, dataset_id)
App->>FS: Read selected file
FS-->>App: Return DataFrame
App->>App: Update session state (df, input_source)
App-->>User: Show success (file + dataset)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
app.py (1)
128-150: Clear stale Kaggle selection state on new attempts.If a download fails, the previous
kaggle_fileslist can linger and show unrelated options.🐛 Proposed fix
def load_kaggle_data(kaggle_input): """Handle loading of Kaggle datasets.""" + # Reset previous selection state to avoid stale UI + st.session_state.kaggle_files = None + st.session_state.kaggle_dataset_id = None try: with st.spinner("Connecting to Kaggle..."): # Extract ID if it's a URL if KaggleHandler.is_kaggle_url(kaggle_input): dataset_id = KaggleHandler.extract_dataset_id(kaggle_input) @@ - except Exception as e: + except Exception as e: + st.session_state.kaggle_files = None + st.session_state.kaggle_dataset_id = None st.error(f"Error processing Kaggle dataset: {e}")
🤖 Fix all issues with AI agents
In `@app.py`:
- Around line 192-204: The selectbox currently uses only file.name so files with
identical names get resolved to the first match; change the choice list to use a
unique identifier (e.g., full or relative path) for display and matching so the
selected item maps unambiguously to the correct object in
st.session_state.kaggle_files. Build display strings (for example
f"{file.path}/{file.name}" or any unique file attribute) into file_names and/or
keep a mapping dict from that display string to the file object, then when the
user clicks the "Load Selected File" button find the selected_path by looking up
the selected display string in that mapping (used by selected_filename and
_load_specific_kaggle_file) rather than by comparing only .name.
🧹 Nitpick comments (2)
src/dora/plots/multivariate.py (1)
52-54: Consider keeping diagonal visible in the masked heatmap.If you want self-correlation values to remain visible, use
k=1; the current mask hides the diagonal as well.♻️ Proposed tweak
- mask = np.triu(np.ones_like(corr, dtype=bool)) + mask = np.triu(np.ones_like(corr, dtype=bool), k=1)src/dora/kaggle.py (1)
68-73: Sort the file list for stable selection ordering.Filesystem iteration order is nondeterministic; sorting avoids confusing reordering between runs.
♻️ Proposed fix
- files = [ - file - for file in dataset_download_directory.glob("**/*") - if file.suffix.lower() in supported_extensions and file.is_file() - ] + files = sorted( + [ + file + for file in dataset_download_directory.glob("**/*") + if file.suffix.lower() in supported_extensions and file.is_file() + ], + key=lambda p: p.as_posix().lower(), + )
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@app.py`:
- Around line 201-208: The current mapping builds keys with unique_name =
f"{f.parent.name}/{f.name}" which can collide for files from different parent
paths; change the key to use the file's full path string (e.g., str(f)) when
populating file_mapping in the loop that iterates over files so each file
(variable f) is uniquely keyed and won't be overwritten; keep any existing
display logic separate if you still want a shorter label for UI purposes but use
str(f) as the dictionary key.
🧹 Nitpick comments (1)
app.py (1)
160-161: Consider uncommenting to clear selection state after load.Leaving
kaggle_filespopulated after a successful load means the multi-file selection UI will persist on the next rerun. Uncommenting this line (or clearing in the button handler) provides a cleaner UX where the selection UI disappears once a file is loaded.♻️ Suggested change
st.success(f"Successfully loaded '{file_path.name}' from '{dataset_id}'") - # Clear the file list selection state once loaded, if you prefer - # st.session_state.kaggle_files = None + # Clear the file list selection state once loaded + st.session_state.kaggle_files = None
Fixes the issues mentioned in #57
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.