Skip to content

fixed sampling rate issue on compute_fbank_librispeech.py#2060

Open
Prithwin55 wants to merge 1 commit intok2-fsa:masterfrom
Prithwin55:fix/sampling
Open

fixed sampling rate issue on compute_fbank_librispeech.py#2060
Prithwin55 wants to merge 1 commit intok2-fsa:masterfrom
Prithwin55:fix/sampling

Conversation

@Prithwin55
Copy link
Copy Markdown

@Prithwin55 Prithwin55 commented Jan 20, 2026

Fixed sampling rate convertion to 16000 HZ on librispeech fbank computing to match fbank extractors expected sampling rate

Summary by CodeRabbit

  • Bug Fixes
    • Improved audio data preprocessing by ensuring proper sampling rate standardization (16000 Hz) during the initial data preparation stage, enhancing downstream feature extraction and model training reliability.

✏️ Tip: You can customize this high-level summary in your review settings.

… computing to match fbank extractors expected sampling rate
@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello @Prithwin55, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical sampling rate inconsistency in the Librispeech fbank computation script. By explicitly resampling audio to 16kHz, it guarantees that the feature extraction process receives data in the correct format, thereby ensuring the reliability and accuracy of the generated fbank features for downstream tasks.

Highlights

  • Sampling Rate Correction: Introduced an explicit audio resampling step to ensure the input audio for fbank computation is consistently at 16000 Hz.
  • Fbank Extractor Compatibility: The resampling ensures that the audio data matches the expected sampling rate of the fbank extractors, preventing potential mismatches or errors in feature generation.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Jan 20, 2026

📝 Walkthrough

Walkthrough

Introduces a preprocessing step that resamples audio cuts to 16000 Hz immediately after loading from manifests, before subsequent feature extraction and augmentation operations. This ensures the Fbank extractor receives audio at its expected sampling rate.

Changes

Cohort / File(s) Summary
Audio preprocessing normalization
egs/librispeech/ASR/local/compute_fbank_librispeech.py
Added resampling operation to standardize cut_set to 16000 Hz sample rate after manifest loading, ensuring downstream Fbank feature extraction receives correctly formatted audio

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~5 minutes

Poem

🐰 A hop, a resample, sixteen-thousand times,
The audio now matches what Fbank designs,
No more mismatched rates to cause a fuss,
Just clean sixteen kHz flowing through us! 🎵

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main change: fixing a sampling rate issue in compute_fbank_librispeech.py by resampling to 16000 Hz.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request fixes an issue with the sampling rate in compute_fbank_librispeech.py by explicitly resampling the audio to 16000 Hz to match the Fbank extractor's expectation. The change is correct and addresses the problem described. My feedback includes a suggestion to replace the hardcoded sampling rate with a constant to improve code quality and maintainability.

Comment on lines +134 to +135
logging.info(f"Resampling audio to 16000 Hz")
cut_set = cut_set.resample(16000)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The sampling rate 16000 is used as a magic number in both the log message and the resample call. It's a good practice to define such values as constants to improve code readability and maintainability. This makes it easier to find and change the value if needed in the future.

For example:

SAMPLING_RATE = 16000
logging.info(f"Resampling audio to {SAMPLING_RATE} Hz")
cut_set = cut_set.resample(SAMPLING_RATE)

Ideally, this constant would be defined at a broader scope (e.g., at the top of the function) and also used when creating the Fbank extractor on line 119 to ensure consistency.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@egs/librispeech/ASR/local/compute_fbank_librispeech.py`:
- Line 134: The logging call using an unnecessary f-string should be changed to
a normal string literal: update the logging.info call in
compute_fbank_librispeech.py (the line that currently reads
logging.info(f"Resampling audio to 16000 Hz")) to remove the leading "f" so it
becomes logging.info("Resampling audio to 16000 Hz"); this eliminates the unused
f-string prefix and resolves the F541 lint warning while keeping the same log
message.

)

# Resample audio to 16kHz to match Fbank extractor's expected sampling rate
logging.info(f"Resampling audio to 16000 Hz")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Drop the unused f-string prefix.

This log line has no placeholders, so the f prefix is redundant and triggers F541.

🔧 Proposed fix
-            logging.info(f"Resampling audio to 16000 Hz")
+            logging.info("Resampling audio to 16000 Hz")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
logging.info(f"Resampling audio to 16000 Hz")
logging.info("Resampling audio to 16000 Hz")
🧰 Tools
🪛 Ruff (0.14.13)

134-134: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
In `@egs/librispeech/ASR/local/compute_fbank_librispeech.py` at line 134, The
logging call using an unnecessary f-string should be changed to a normal string
literal: update the logging.info call in compute_fbank_librispeech.py (the line
that currently reads logging.info(f"Resampling audio to 16000 Hz")) to remove
the leading "f" so it becomes logging.info("Resampling audio to 16000 Hz"); this
eliminates the unused f-string prefix and resolves the F541 lint warning while
keeping the same log message.


# Resample audio to 16kHz to match Fbank extractor's expected sampling rate
logging.info(f"Resampling audio to 16000 Hz")
cut_set = cut_set.resample(16000)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain which error this is fixing?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AssertionError: Fbank was instantiated for sampling_rate 16000, but sampling_rate=8000 was passed to extract().

There was mismatch in the sampling rate when some audios are not recorded in 16000 Hz

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it your own data?

All data from librispeech has a sample rate 16000 Hz.

Copy link
Copy Markdown
Author

@Prithwin55 Prithwin55 Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well i was using a custom dataset!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants