
#57 Fix: Bugs in the Streamlit UI #59

Merged
Asifdotexe merged 5 commits into main from 57-fix-multiple-file-selection-on-streamlit-application
Jan 23, 2026

Conversation

@Asifdotexe Asifdotexe (Owner) commented Jan 23, 2026

Fixes the issues mentioned in #57

Summary by CodeRabbit

  • New Features

    • Multi-file Kaggle dataset support with an interactive file-selection flow so users can pick which file to load.
  • Bug Fixes

    • Cleaner correlation heatmaps by masking the redundant upper triangle of the matrix for improved readability.
  • Documentation

    • Minor wording update to the product description in the README.

@Asifdotexe Asifdotexe self-assigned this Jan 23, 2026
@Asifdotexe Asifdotexe added the enhancement New feature or request label Jan 23, 2026
@Asifdotexe Asifdotexe linked an issue Jan 23, 2026 that may be closed by this pull request

coderabbitai bot commented Jan 23, 2026

📝 Walkthrough

Walkthrough

Kaggle download now returns multiple files and the app lets users pick which file to load via a new helper; README text was slightly reworded; correlation heatmap rendering now masks the upper triangle.

Changes

Cohort / File(s) Summary
Kaggle multi-file flow
app.py, src/dora/kaggle.py
API changed to return multiple files: KaggleHandler.download_files() -> list[Path]. Added static download_dataset() wrapper. App stores kaggle_files and kaggle_dataset_id in session state, defers auto-load when multiple files exist, and added _load_specific_kaggle_file(file_path, dataset_id) to load a chosen file with error handling and user feedback.
Correlation heatmap rendering
src/dora/plots/multivariate.py
Added numpy triangular mask and applied it to the seaborn heatmap to hide the upper triangle while keeping annotations and formatting.
Documentation
README.md
Minor textual change in the intro tag: "interactive command-line tool" → "interactive power-tool".

Sequence Diagram

sequenceDiagram
    actor User
    participant App as App (Streamlit)
    participant Kaggle as KaggleHandler
    participant FS as File System
    participant UI as Selection_UI

    User->>App: Submit dataset ID / click "Load"
    App->>Kaggle: download_files(dataset_id)
    Kaggle->>FS: Download dataset files
    FS-->>Kaggle: Return files (1..N)
    Kaggle-->>App: list[Path] of files
    App->>App: Store kaggle_files, kaggle_dataset_id in session state

    alt Single file
        App->>App: Auto-load file into df
        App-->>User: Show success (file loaded)
    else Multiple files
        App->>UI: Render file selector (kaggle_files)
        User->>UI: Select file and click "Load Selected File"
        UI-->>App: Selected file_path
        App->>App: _load_specific_kaggle_file(file_path, dataset_id)
        App->>FS: Read selected file
        FS-->>App: Return DataFrame
        App->>App: Update session state (df, input_source)
        App-->>User: Show success (file + dataset)
    end
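A minimal sketch of the branching shown in the diagram, written in plain Python so it can be tested without Streamlit. `route_downloaded_files` is a hypothetical helper, not part of the PR, and treating an empty download as an error is an assumption.

```python
from pathlib import Path


def route_downloaded_files(files: list[Path]) -> tuple[str, list[Path]]:
    """Decide whether to auto-load a download or show a file selector.

    Hypothetical helper mirroring the diagram's alt/else branches:
    returns ("auto_load", files) for one file and ("select", files) for several.
    """
    if not files:
        # Assumption: an empty download is treated as an error.
        raise ValueError("No supported files found in the dataset.")
    if len(files) == 1:
        # "Single file" branch: load immediately.
        return "auto_load", files
    # "Multiple files" branch: keep the list and defer loading to the user's choice.
    return "select", files


# Example: two files means the selector is shown instead of auto-loading.
mode, candidates = route_downloaded_files([Path("data/train.csv"), Path("data/test.csv")])
```

In the app, the returned list would be what gets stored as kaggle_files in session state.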

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A Rabbit's Note on Files
Multiple files hop in a row,
Pick one to load, and off we go.
Triangles masked so correlations sing,
Our power-tool hums — insights take wing. 🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 inconclusive)
Title check: ❓ Inconclusive
Explanation: The title is vague and overly generic. 'Bugs in the Streamlit UI' does not convey the specific nature of the changes, which primarily implement multi-file selection support for Kaggle downloads.
Resolution: Revise the title to name the main change, such as 'Add multi-file selection support for Kaggle downloads' or 'Implement multi-file Kaggle dataset handling in Streamlit UI'.
✅ Passed checks (2 passed)
Description check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
Docstring coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required 80.00% threshold.

@Asifdotexe Asifdotexe marked this pull request as ready for review January 23, 2026 17:14

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
app.py (1)

128-150: Clear stale Kaggle selection state on new attempts.

If a download fails, the kaggle_files list from a previous dataset can linger, so the selector offers files that do not belong to the dataset just requested.

🐛 Proposed fix
 def load_kaggle_data(kaggle_input):
     """Handle loading of Kaggle datasets."""
+    # Reset previous selection state to avoid stale UI
+    st.session_state.kaggle_files = None
+    st.session_state.kaggle_dataset_id = None
     try:
         with st.spinner("Connecting to Kaggle..."):
             # Extract ID if it's a URL
             if KaggleHandler.is_kaggle_url(kaggle_input):
                 dataset_id = KaggleHandler.extract_dataset_id(kaggle_input)
@@
-    except Exception as e:
+    except Exception as e:
+        st.session_state.kaggle_files = None
+        st.session_state.kaggle_dataset_id = None
         st.error(f"Error processing Kaggle dataset: {e}")
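A minimal, Streamlit-free sketch of the reset-then-try pattern from the proposed fix. A plain dict stands in for st.session_state and the `download` callable stands in for the Kaggle download call; both are assumptions made for testability. In this sketch the state is only written after the download succeeds, so the reset at the top is sufficient; if more work ran inside the try after the state was set, the except branch would need its own reset, as the diff has.

```python
from pathlib import Path
from typing import Callable, Optional


def load_kaggle_data(
    state: dict,
    dataset_id: str,
    download: Callable[[str], list[Path]],
) -> Optional[str]:
    """Start a fresh Kaggle load attempt; return an error message on failure.

    `state` stands in for st.session_state and `download` for the app's
    Kaggle download call; both are simplifications.
    """
    # Reset first, so a failed attempt never shows the previous dataset's files.
    state["kaggle_files"] = None
    state["kaggle_dataset_id"] = None
    try:
        files = download(dataset_id)
    except Exception as e:
        # Nothing was written since the reset, so the state is already clean.
        return f"Error processing Kaggle dataset: {e}"
    # Only a successful download populates the selection state.
    state["kaggle_files"] = files
    state["kaggle_dataset_id"] = dataset_id
    return None
```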
🤖 Fix all issues with AI agents
In `@app.py`:
- Around lines 192-204: The selectbox currently lists only file.name, so files with identical names resolve to the first match. Change the choice list to use a unique identifier (e.g. str(file), the full path, or the path relative to the download directory) for display and matching, so the selected item maps unambiguously to the correct object in st.session_state.kaggle_files. Build those display strings into file_names and/or keep a mapping dict from display string to file object; then, when the user clicks the "Load Selected File" button, find selected_path by looking up the selected display string in that mapping (used by selected_filename and _load_specific_kaggle_file) rather than by comparing only .name.
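A minimal sketch of the suggested mapping, assuming all files live under a single download directory. The hypothetical `build_file_options` helper uses paths relative to that directory as labels, so two files named data.csv in different folders get distinct labels, and the selected label maps straight back to its Path.

```python
from pathlib import Path


def build_file_options(files: list[Path], root: Path) -> dict[str, Path]:
    """Map a unique, human-readable label to each downloaded file.

    Labels are paths relative to `root` (the download directory); they are
    unique because no two files share the same location under `root`.
    """
    return {file.relative_to(root).as_posix(): file for file in files}


root = Path("downloads/owner/dataset")  # hypothetical download directory
files = [root / "train" / "data.csv", root / "test" / "data.csv"]
options = build_file_options(files, root)
# The selectbox would show list(options); the chosen label maps back exactly.
chosen = options["test/data.csv"]
```

By contrast, a name-only lookup such as next(f for f in files if f.name == "data.csv") always returns the train file, whichever entry the user picked.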
🧹 Nitpick comments (2)
src/dora/plots/multivariate.py (1)

52-54: Consider keeping diagonal visible in the masked heatmap.

If you want self-correlation values to remain visible, use k=1; the current mask hides the diagonal as well.

♻️ Proposed tweak
-    mask = np.triu(np.ones_like(corr, dtype=bool))
+    mask = np.triu(np.ones_like(corr, dtype=bool), k=1)
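To make the k=0 versus k=1 difference concrete, here is a pure-Python mirror of np.triu(np.ones((n, n), dtype=bool), k=k). It is an illustration only; the NumPy call in the diff remains the idiomatic form. seaborn hides cells where the mask is True, and np.triu keeps cell (i, j) when j - i >= k.

```python
def triu_mask(n: int, k: int = 0) -> list[list[bool]]:
    """Pure-Python mirror of np.triu(np.ones((n, n), dtype=bool), k=k).

    Cell (i, j) is True (hidden by seaborn's `mask`) when j - i >= k,
    so k=0 hides the diagonal and k=1 leaves it visible.
    """
    return [[j - i >= k for j in range(n)] for i in range(n)]


# Render the k=1 mask for a 3x3 matrix: '#' marks hidden cells.
for row in triu_mask(3, k=1):
    print("".join("#" if hidden else "." for hidden in row))
# prints .##, ..#, ... on three lines
```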
src/dora/kaggle.py (1)

68-73: Sort the file list for stable selection ordering.

Path.glob() yields entries in a filesystem-dependent order that is not guaranteed to be stable; sorting avoids confusing reordering between runs.

♻️ Proposed fix
-        files = [
-            file
-            for file in dataset_download_directory.glob("**/*")
-            if file.suffix.lower() in supported_extensions and file.is_file()
-        ]
+        files = sorted(
+            [
+                file
+                for file in dataset_download_directory.glob("**/*")
+                if file.suffix.lower() in supported_extensions and file.is_file()
+            ],
+            key=lambda p: p.as_posix().lower(),
+        )
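A self-contained sketch of the sorted listing. The extension set is an assumption, since the actual supported_extensions value in kaggle.py is not shown here; the sort key matches the proposed fix.

```python
from pathlib import Path

# Assumed extension set; the real `supported_extensions` in kaggle.py may differ.
SUPPORTED_EXTENSIONS = {".csv", ".json", ".xlsx", ".parquet"}


def list_supported_files(directory: Path) -> list[Path]:
    """Return supported files under `directory`, sorted case-insensitively by path."""
    return sorted(
        (
            file
            for file in directory.glob("**/*")
            # Filter by extension (case-insensitive) and skip directories.
            if file.suffix.lower() in SUPPORTED_EXTENSIONS and file.is_file()
        ),
        # Same key as the proposed fix: forward-slash path, lowercased.
        key=lambda p: p.as_posix().lower(),
    )
```

Lowercasing the key keeps "A.csv" and "b.CSV" in alphabetical order regardless of case, so the selector shows the same sequence on every run.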


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@app.py`:
- Around lines 201-208: The current mapping builds keys with unique_name = f"{f.parent.name}/{f.name}", which still collides when two files share both a file name and an immediate parent directory name (e.g. train/data/x.csv and test/data/x.csv). Key file_mapping by the file's full path string (str(f)) in the loop over files, so each file is uniquely keyed and none is overwritten. If you still want a shorter label in the UI, keep that display logic separate and use str(f) only as the dictionary key.
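A minimal sketch of the collision and the suggested fix. `short_label` reproduces the reviewed key; `build_file_mapping` is hypothetical and keys by str(f), falling back to the full path as the display label only when the short label is ambiguous (an assumed policy). As an alternative worth checking, Streamlit's st.selectbox accepts a format_func argument, so the Path objects themselves could be passed as options and only the label formatted, which removes the lookup step entirely.

```python
from collections import Counter
from pathlib import Path


def short_label(f: Path) -> str:
    # The key used in the reviewed code: immediate parent directory + file name.
    return f"{f.parent.name}/{f.name}"


def build_file_mapping(files: list[Path]) -> tuple[dict[str, Path], dict[str, str]]:
    """Key files by full path and derive display labels separately.

    Returns (str(path) -> Path, str(path) -> label). Short labels are used
    only when unambiguous; otherwise the full path is shown (assumed policy).
    """
    counts = Counter(short_label(f) for f in files)
    file_mapping = {str(f): f for f in files}
    labels = {
        str(f): short_label(f) if counts[short_label(f)] == 1 else str(f)
        for f in files
    }
    return file_mapping, labels
```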
🧹 Nitpick comments (1)
app.py (1)

160-161: Consider uncommenting to clear selection state after load.

Leaving kaggle_files populated after a successful load means the multi-file selection UI will persist on the next rerun. Uncommenting this line (or clearing in the button handler) provides a cleaner UX where the selection UI disappears once a file is loaded.

♻️ Suggested change
             st.success(f"Successfully loaded '{file_path.name}' from '{dataset_id}'")
-            # Clear the file list selection state once loaded, if you prefer
-            # st.session_state.kaggle_files = None 
+            # Clear the file list selection state once loaded
+            st.session_state.kaggle_files = None
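A minimal sketch of a load handler that clears the selector on success, with a dict standing in for st.session_state and a `reader` callable standing in for the app's file loader (both assumptions). The input_source value is hypothetical.

```python
from pathlib import Path
from typing import Any, Callable


def load_specific_file(
    state: dict,
    file_path: Path,
    dataset_id: str,
    reader: Callable[[Path], Any],
) -> str:
    """Load one chosen file and hide the selector on success.

    `state` stands in for st.session_state and `reader` for the app's file
    loader; both are simplifications.
    """
    try:
        df = reader(file_path)
    except Exception as e:
        # Keep kaggle_files so the user can pick a different file.
        return f"Error loading '{file_path.name}': {e}"
    state["df"] = df
    state["input_source"] = "kaggle"  # hypothetical value
    # Clearing the list makes the selector disappear on the next rerun.
    state["kaggle_files"] = None
    return f"Successfully loaded '{file_path.name}' from '{dataset_id}'"
```

Keeping kaggle_files on a read failure is a choice made in this sketch: the user can pick another file without downloading the dataset again.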

@Asifdotexe Asifdotexe merged commit 6f5bc8c into main Jan 23, 2026
2 checks passed
@Asifdotexe Asifdotexe deleted the 57-fix-multiple-file-selection-on-streamlit-application branch January 23, 2026 17:40

Labels

enhancement New feature or request


Development

Successfully merging this pull request may close these issues.

fix: multiple file selection on streamlit application
