fixed sampling rate issue on compute_fbank_librispeech.py#2060
fixed sampling rate issue on compute_fbank_librispeech.py#2060Prithwin55 wants to merge 1 commit intok2-fsa:masterfrom
Conversation
… computing to match fbank extractors expected sampling rate
Summary of ChangesHello @Prithwin55, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical sampling rate inconsistency in the Librispeech fbank computation script. By explicitly resampling audio to 16kHz, it guarantees that the feature extraction process receives data in the correct format, thereby ensuring the reliability and accuracy of the generated fbank features for downstream tasks. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
📝 WalkthroughWalkthroughIntroduces a preprocessing step that resamples audio cuts to 16000 Hz immediately after loading from manifests, before subsequent feature extraction and augmentation operations. This ensures the Fbank extractor receives audio at its expected sampling rate. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~5 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request fixes an issue with the sampling rate in compute_fbank_librispeech.py by explicitly resampling the audio to 16000 Hz to match the Fbank extractor's expectation. The change is correct and addresses the problem described. My feedback includes a suggestion to replace the hardcoded sampling rate with a constant to improve code quality and maintainability.
| logging.info(f"Resampling audio to 16000 Hz") | ||
| cut_set = cut_set.resample(16000) |
There was a problem hiding this comment.
The sampling rate 16000 is used as a magic number in both the log message and the resample call. It's a good practice to define such values as constants to improve code readability and maintainability. This makes it easier to find and change the value if needed in the future.
For example:
SAMPLING_RATE = 16000
logging.info(f"Resampling audio to {SAMPLING_RATE} Hz")
cut_set = cut_set.resample(SAMPLING_RATE)Ideally, this constant would be defined at a broader scope (e.g., at the top of the function) and also used when creating the Fbank extractor on line 119 to ensure consistency.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@egs/librispeech/ASR/local/compute_fbank_librispeech.py`:
- Line 134: The logging call using an unnecessary f-string should be changed to
a normal string literal: update the logging.info call in
compute_fbank_librispeech.py (the line that currently reads
logging.info(f"Resampling audio to 16000 Hz")) to remove the leading "f" so it
becomes logging.info("Resampling audio to 16000 Hz"); this eliminates the unused
f-string prefix and resolves the F541 lint warning while keeping the same log
message.
| ) | ||
|
|
||
| # Resample audio to 16kHz to match Fbank extractor's expected sampling rate | ||
| logging.info(f"Resampling audio to 16000 Hz") |
There was a problem hiding this comment.
Drop the unused f-string prefix.
This log line has no placeholders, so the f prefix is redundant and triggers F541.
🔧 Proposed fix
- logging.info(f"Resampling audio to 16000 Hz")
+ logging.info("Resampling audio to 16000 Hz")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| logging.info(f"Resampling audio to 16000 Hz") | |
| logging.info("Resampling audio to 16000 Hz") |
🧰 Tools
🪛 Ruff (0.14.13)
134-134: f-string without any placeholders
Remove extraneous f prefix
(F541)
🤖 Prompt for AI Agents
In `@egs/librispeech/ASR/local/compute_fbank_librispeech.py` at line 134, The
logging call using an unnecessary f-string should be changed to a normal string
literal: update the logging.info call in compute_fbank_librispeech.py (the line
that currently reads logging.info(f"Resampling audio to 16000 Hz")) to remove
the leading "f" so it becomes logging.info("Resampling audio to 16000 Hz"); this
eliminates the unused f-string prefix and resolves the F541 lint warning while
keeping the same log message.
|
|
||
| # Resample audio to 16kHz to match Fbank extractor's expected sampling rate | ||
| logging.info(f"Resampling audio to 16000 Hz") | ||
| cut_set = cut_set.resample(16000) |
There was a problem hiding this comment.
Can you explain which error this is fixing?
There was a problem hiding this comment.
AssertionError: Fbank was instantiated for sampling_rate 16000, but sampling_rate=8000 was passed to extract().
There was mismatch in the sampling rate when some audios are not recorded in 16000 Hz
There was a problem hiding this comment.
Is it your own data?
All data from librispeech has a sample rate 16000 Hz.
There was a problem hiding this comment.
well i was using a custom dataset!
Fixed sampling rate convertion to 16000 HZ on librispeech fbank computing to match fbank extractors expected sampling rate
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.