
[not_for_merge_yet] fixing github workflows with automatic tests#2061

Open
KarelVesely84 wants to merge 6 commits into k2-fsa:master from KarelVesely84:git_workflow_fix

Conversation

@KarelVesely84
Contributor

@KarelVesely84 KarelVesely84 commented Jan 20, 2026

  • the test egs/librispeech/ASR/zipformer/test_rknn_on_cpu_simulator_ctc_streaming.py is crashing due to this missing dependency

Summary by CodeRabbit

  • Chores

    • Added two additional package dependencies to build and test environments.
  • Bug Fixes / Reliability

    • Corrected beam-search and training logic to improve decoding and duration handling.
    • Relaxed model loading to be more tolerant during averaged-model workflows.
  • API Changes

    • Simplified a batch display/save function signature (call sites updated).
  • Portability

    • Converted several absolute paths to relative references and minor formatting cleanups.
  • Tests

    • Removed an RKNN CPU-simulator CTC test file.


- test `egs/librispeech/ASR/zipformer/test_rknn_on_cpu_simulator_ctc_streaming.py`
  is crashing for missing this dependency
@KarelVesely84
Contributor Author

KarelVesely84 commented Jan 20, 2026

Hi Fangyun @csukuangfj,
the GitHub workflow test is currently crashing in egs/librispeech/ASR/zipformer/test_rknn_on_cpu_simulator_ctc_streaming.py,
because the module kaldi_native_fbank is missing.

I found out that the corresponding workflow is .github/workflows/test.yml, and that it uses some Docker image.

But I am not sure which Dockerfile is used (perhaps this one: .github/scripts/docker/Dockerfile?)
And how does the Docker image get updated?
(I added the dependency kaldi_native_fbank to this file, but the "module missing" error is still there...)

I can add a pip install directly into .github/workflows/test.yml, but having it already in the Docker image would be a cleaner solution.

Best regards
Karel
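A quick way to confirm, from inside the CI container, whether a Python dependency is importable (a generic sketch; the helper name is hypothetical and not part of this repository):

```python
import importlib.util

def has_module(name: str) -> bool:
    # True if the module can be imported in the current environment,
    # without actually importing it.
    return importlib.util.find_spec(name) is not None

# In the failing CI container this would report whether the missing
# dependency is present, e.g. has_module("kaldi_native_fbank").
print(has_module("json"))  # stdlib module, prints True
```

Running this inside the Docker image before and after a rebuild shows whether an added pip dependency actually made it into the published image.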

@coderabbitai

coderabbitai bot commented Jan 20, 2026

📝 Walkthrough

Walkthrough

Added two pip packages to CI/Docker installs; adjusted import/path references and rewrote one prepare_transcripts file (no logic change); fixed encoder call and averaged-model loading; corrected duration check and pruning logic; removed one test file and minor whitespace edits across utilities.

Changes

Cohort / File(s) Summary
CI & Docker dependencies
/.github/workflows/test.yml, /.github/scripts/docker/Dockerfile
Added kaldi_native_fbank and rknn_toolkit2 to pip install steps in CI workflow and Dockerfile.
IWSLT transcripts path update
egs/iwslt22_ta/ASR/local/prepare_transcripts.py
Replaced absolute path with relative path (../../ST/local/prepare_transcripts.py).
ST prepare_transcripts rewrite
egs/iwslt22_ta/ST/local/prepare_transcripts.py
File rewritten in-place; semantics unchanged.
Zipformer import path
egs/iwslt22_ta/ST/zipformer/profile.py
Adjusted import resolution to ../../ASR/zipformer/profile.py.
Encoder / averaged-model load
egs/multi_conv_zh_es_ta/ST/hent_srt/export.py
Use self.model.forward_encoder(features, feature_lengths) (fix undefined var); call load_state_dict(..., strict=False) when loading averaged model.
Training logic fixes
egs/multi_conv_zh_es_ta/ST/zipformer_multijoiner_st/train.py
Fixed duration comparison to <= 0.1; simplified ST-CTC pruning condition; minor formatting changes.
Beam-search logits fix
egs/iwslt22_ta/ST/zipformer/beam_search.py
Corrected non-blank log-prob construction by aligning shift shape and combining shifted logits with blank log-prob; minor formatting and a leftover breakpoint.
Display API change
egs/iwslt22_ta/ST/zipformer/train.py
Removed sp parameter from display_and_save_batch and updated call sites to use only sp_tgt.
Test removal
egs/wenetspeech/ASR/zipformer/test_rknn_on_cpu_simulator_ctc.py
Deleted test file exercising RKNN CPU-simulator CTC workflow.
Whitespace / minor edits
icefall/utils.py, other small hunks
Whitespace and formatting-only changes; no logic changes.

Sequence Diagram(s)

(Skipped — changes do not introduce a new multi-component control flow that requires a sequence diagram.)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • HENT-SRT #2026 — touches many of the same egs/multi_conv_zh_es_ta/ST files and HENT‑SRT-related modules; likely code-level overlap.

Suggested reviewers

  • csukuangfj

Poem

🐰 I hopped through Docker, pip in my paw,

kaldi and rknn snug in the draw,
Paths aligned and encoder calls neat,
Small fixes made the flow complete,
I nibble a carrot and hum a soft waw.

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 62.50%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Title check ⚠️ Warning: the PR title is vague and misleading. It says '[not_for_merge_yet] fixing github workflows', but the changes also include file-path refactoring, logic fixes in training scripts, and removal of a test file beyond just workflow fixes. Resolution: update the title to reflect the main changes, e.g. 'Add kaldi_native_fbank and rknn_toolkit2 dependencies; fix path references and training logic'.
✅ Passed checks (1 passed)
  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@egs/iwslt22_ta/ST/local/prepare_transcripts.py`:
- Around line 1-3: Move the shebang line (#!/usr/bin/python) to the very top of
the file with no preceding bytes or comments so the shell can recognize it;
place the copyright/comment lines (e.g., "Copyright 2023 Johns Hopkins
University  (Amir Hussein)") immediately after the shebang and ensure there are
no blank lines or other characters before the shebang.
- Around line 51-63: The code builds langdirs from args.src_langdir and
optionally args.tgt_langdir but then always accesses langdirs[1], causing an
IndexError when args.tgt_langdir is None; update the block that opens transcript
files to handle the single-langdir case: if len(langdirs) == 1, open only
langdirs[0] / "transcript_words.txt" and write only src_txt for each cut (using
c.supervisions[0].text), otherwise open both langdirs[0] and langdirs[1] and
write src_txt and tgt_txt (c.supervisions[0].custom['translated_text']['eng'])
respectively. Ensure you reference the same variables (langdirs, cuts, src_txt,
tgt_txt, args.tgt_langdir) so the change integrates cleanly.
🧹 Nitpick comments (3)
egs/iwslt22_ta/ST/local/prepare_transcripts.py (2)

11-11: Remove unused pdb import.

The pdb module is imported but not used. This appears to be a leftover debug artifact.

Proposed fix
-import pdb

48-48: Use is not None for None comparison.

Per PEP 8, comparisons to None should use identity operators (is / is not) rather than equality operators.

Proposed fix
-    if args.tgt_langdir != None:
+    if args.tgt_langdir is not None:
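The identity check matters because `==`/`!=` can be overridden by a class, while `is`/`is not` cannot. A minimal illustration (hypothetical class, not from this PR):

```python
class AlwaysEqual:
    # Pathological but legal: __eq__ claims equality with everything,
    # so the derived __ne__ returns False even when compared to None.
    def __eq__(self, other):
        return True

obj = AlwaysEqual()
print(obj != None)      # False -- the equality operator is fooled
print(obj is not None)  # True  -- the identity check is not
```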
.github/workflows/test.yml (1)

105-105: Redundant install once Docker image is updated.

Since kaldi_native_fbank is now added to the Dockerfile (line 47), this pip install will be redundant once the Docker image is rebuilt and published. Consider removing this line after confirming the updated image is available to keep the workflow clean and avoid potential version drift.

If this is intentional as a temporary fix until the image is updated, that's acceptable—just ensure it's removed afterward.

Comment on lines +1 to +3
# Copyright 2023 Johns Hopkins University (Amir Hussein)

#!/usr/bin/python


⚠️ Potential issue | 🟡 Minor

Shebang should be at the beginning of the file.

The shebang (#!/usr/bin/python) on line 3 should come before the copyright comment on line 1 to be recognized by the shell.

Proposed fix
+#!/usr/bin/python
 # Copyright 2023 Johns Hopkins University  (Amir Hussein)
-
-#!/usr/bin/python
🧰 Tools
🪛 Ruff (0.14.13)

3-3: Shebang should be at the beginning of the file

(EXE005)

🤖 Prompt for AI Agents
In `@egs/iwslt22_ta/ST/local/prepare_transcripts.py` around lines 1 - 3, Move the
shebang line (#!/usr/bin/python) to the very top of the file with no preceding
bytes or comments so the shell can recognize it; place the copyright/comment
lines (e.g., "Copyright 2023 Johns Hopkins University  (Amir Hussein)")
immediately after the shebang and ensure there are no blank lines or other
characters before the shebang.

Comment on lines +51 to +63
    else:
        langdirs = [Path(args.src_langdir)]

    for langdir in langdirs:
        if not os.path.exists(langdir):
            os.makedirs(langdir)

    with open(langdirs[0] / "transcript_words.txt", 'w') as src, open(langdirs[1] / "transcript_words.txt", 'w') as tgt:
        for c in cuts:
            src_txt = c.supervisions[0].text
            tgt_txt = c.supervisions[0].custom['translated_text']['eng']
            src.write(src_txt + '\n')
            tgt.write(tgt_txt + '\n')


⚠️ Potential issue | 🔴 Critical

IndexError when tgt_langdir is None.

When args.tgt_langdir is None, langdirs contains only one element (line 52), but line 58 unconditionally accesses langdirs[1], causing an IndexError.

Proposed fix: handle single langdir case
-    with open(langdirs[0] / "transcript_words.txt", 'w') as src, open(langdirs[1] / "transcript_words.txt", 'w') as tgt:
-        for c in cuts:
-            src_txt = c.supervisions[0].text
-            tgt_txt = c.supervisions[0].custom['translated_text']['eng']
-            src.write(src_txt + '\n')
-            tgt.write(tgt_txt + '\n')
+    if args.tgt_langdir is not None:
+        with open(langdirs[0] / "transcript_words.txt", 'w') as src, \
+             open(langdirs[1] / "transcript_words.txt", 'w') as tgt:
+            for c in cuts:
+                src_txt = c.supervisions[0].text
+                tgt_txt = c.supervisions[0].custom['translated_text']['eng']
+                src.write(src_txt + '\n')
+                tgt.write(tgt_txt + '\n')
+    else:
+        with open(langdirs[0] / "transcript_words.txt", 'w') as src:
+            for c in cuts:
+                src_txt = c.supervisions[0].text
+                src.write(src_txt + '\n')
🤖 Prompt for AI Agents
In `@egs/iwslt22_ta/ST/local/prepare_transcripts.py` around lines 51 - 63, The
code builds langdirs from args.src_langdir and optionally args.tgt_langdir but
then always accesses langdirs[1], causing an IndexError when args.tgt_langdir is
None; update the block that opens transcript files to handle the single-langdir
case: if len(langdirs) == 1, open only langdirs[0] / "transcript_words.txt" and
write only src_txt for each cut (using c.supervisions[0].text), otherwise open
both langdirs[0] and langdirs[1] and write src_txt and tgt_txt
(c.supervisions[0].custom['translated_text']['eng']) respectively. Ensure you
reference the same variables (langdirs, cuts, src_txt, tgt_txt,
args.tgt_langdir) so the change integrates cleanly.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
egs/multi_conv_zh_es_ta/ST/zipformer_multijoiner_st/train.py (3)

1118-1205: --st-scale currently has no effect on loss computation.

alpha_st/alpha_asr are computed but never used, so the CLI option doesn’t influence training. Either remove the option or apply it to the loss terms.

🛠️ Proposed fix (apply scales to ASR/ST loss terms)
     if params.st_scale != 1:
         alpha_st = params.st_scale
-        alpha_asr = 1-params.st_scale
+        alpha_asr = 1 - params.st_scale
     else:
-        alpha_st, alpha_asr = 1, 1
+        alpha_st, alpha_asr = 1.0, 1.0
@@
-            loss += simple_loss_scale * simple_loss + pruned_loss_scale * pruned_loss
+            loss += alpha_asr * (
+                simple_loss_scale * simple_loss + pruned_loss_scale * pruned_loss
+            )
             if params.use_st_joiner:
-                loss += simple_loss_scale * st_simple_loss + pruned_loss_scale * st_pruned_loss
+                loss += alpha_st * (
+                    simple_loss_scale * st_simple_loss + pruned_loss_scale * st_pruned_loss
+                )
@@
-            loss += params.ctc_loss_scale * ctc_loss
+            loss += alpha_asr * params.ctc_loss_scale * ctc_loss
             if use_asr_cr_ctc:
-                loss += params.cr_loss_scale * cr_loss
+                loss += alpha_asr * params.cr_loss_scale * cr_loss
         if params.use_st_ctc:
-            loss += params.st_ctc_loss_scale * st_ctc_loss
+            loss += alpha_st * params.st_ctc_loss_scale * st_ctc_loss
             if use_st_cr_ctc:
-                loss += params.st_cr_loss_scale * st_cr_loss
+                loss += alpha_st * params.st_cr_loss_scale * st_cr_loss

1602-1612: Duration filter comment doesn’t match thresholds.

The comment says 1–20 seconds, but the code uses 0.1–30. Please align the comment or the thresholds.

✍️ Comment-only fix (if thresholds are intended)
-        # Keep only utterances with duration between 1 second and 20 seconds
+        # Keep only utterances with duration between 0.1 second and 30 seconds

1637-1649: ST pruning boundary is stricter than ASR branch.

ST uses T <= len(st_tokens) while ASR uses T < len(tokens). If the requirement is T >= S, this should likely match ASR.

✅ Suggested adjustment
-            if T <= len(st_tokens):
+            if T < len(st_tokens):
                 return False
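The off-by-one at the boundary can be seen in isolation (function names are hypothetical, mirroring the two conditions discussed above):

```python
def asr_keep(T: int, num_tokens: int) -> bool:
    # ASR branch: drop only when frames are strictly fewer than tokens.
    return not (T < num_tokens)

def st_keep(T: int, num_tokens: int) -> bool:
    # Current ST branch: also drops the boundary case T == num_tokens.
    return not (T <= num_tokens)

print(asr_keep(10, 10))  # True: utterance with T == S is kept
print(st_keep(10, 10))   # False: the same utterance is rejected by ST
```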
egs/multi_conv_zh_es_ta/ST/hent_srt/export.py (1)

594-601: Log incompatible keys when using strict=False.

When strict=False, mismatched/unexpected weights are silently ignored. Capture and log these incompatibilities to prevent export problems from going unnoticed.

Suggested fix
-            model.load_state_dict(
+            incompatible = model.load_state_dict(
                 average_checkpoints_with_averaged_model(
                     filename_start=filename_start,
                     filename_end=filename_end,
                     device=device,
-                ), strict=False
+                ),
+                strict=False,
             )
+            if incompatible.missing_keys or incompatible.unexpected_keys:
+                logging.warning(
+                    "load_state_dict strict=False: missing=%s unexpected=%s",
+                    incompatible.missing_keys,
+                    incompatible.unexpected_keys,
+                )

@KarelVesely84 KarelVesely84 changed the title [testing if still crashes] add kaldi_native_fbank package to Docker for test.yml [not_for_merge_yet] add kaldi_native_fbank package to Docker for test.yml Jan 20, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
egs/iwslt22_ta/ST/zipformer/train.py (3)

1074-1081: Undefined sp causes NameError and signature mismatch.

The pipeline failure confirms sp is undefined. Additionally, compute_validation_loss (lines 858-864) doesn't have an sp parameter—only sp_tgt. This will crash at runtime.

🐛 Proposed fix
             valid_info = compute_validation_loss(
                 params=params,
                 model=model,
-                sp=sp,
                 sp_tgt=sp_tgt,
                 valid_dl=valid_dl,
                 world_size=world_size,
             )

1131-1136: Undefined sp used for blank_id and vocab_size initialization.

sp is never defined in this function—only sp_tgt is loaded. This will cause a NameError at runtime.

🐛 Proposed fix
     sp_tgt = spm.SentencePieceProcessor()
     sp_tgt.load(params.bpe_tgt_model)
 
     # <blk> is defined in local/train_bpe_model.py
-    params.blank_id = sp.piece_to_id("<blk>")
-    params.vocab_size = sp.get_piece_size()
+    params.blank_id = sp_tgt.piece_to_id("<blk>")
+    params.vocab_size = sp_tgt.get_piece_size()

1314-1329: Docstring still references removed sp parameter.

The docstring at lines 1328-1329 still documents the sp parameter which has been commented out from the signature.

📝 Proposed fix
     """Display the batch statistics and save the batch into disk.
 
     Args:
       batch:
         A batch of data. See `lhotse.dataset.K2SpeechRecognitionDataset()`
         for the content in it.
       params:
         Parameters for training. See :func:`get_params`.
-      sp:
-        The BPE model.
+      sp_tgt:
+        The target BPE model.
     """
🤖 Fix all issues with AI agents
In `@egs/iwslt22_ta/ST/zipformer/beam_search.py`:
- Line 1212: Remove the stray debug call `breakpoint()` inside the beam search
implementation (the one found in `modified_beam_search_hat`), delete that line
so decoding won't drop into a debugger in production, and scan the file for any
other leftover `breakpoint()` or debugging statements to remove; then run the
unit/decoding tests or lint to confirm no runtime interruptions remain.
🧹 Nitpick comments (2)
egs/iwslt22_ta/ST/zipformer/train.py (2)

795-798: Commented-out code should be removed.

The commented lines #y = sp.encode(...) and #y = k2.RaggedTensor(y)... are dead code. If no longer needed, remove them entirely rather than leaving commented code.

♻️ Proposed cleanup
     texts = batch["supervisions"]["text"]
     tgt_texts = batch["supervisions"]["tgt_text"]['eng']
-    #y = sp.encode(texts, out_type=int)
     y_tgt = sp_tgt.encode(tgt_texts, out_type=int)
-    #y = k2.RaggedTensor(y).to(device)
     y_tgt = k2.RaggedTensor(y_tgt).to(device)

1317-1318: Remove commented-out code in display_and_save_batch.

Similar to the above, these commented lines should be removed if no longer needed.

♻️ Proposed cleanup
 def display_and_save_batch(
     batch: dict,
     params: AttributeDict,
-    #sp: spm.SentencePieceProcessor,
     sp_tgt: spm.SentencePieceProcessor,
 ) -> None:
-    #y = sp.encode(supervisions["text"], out_type=int)
     y_tgt = sp_tgt.encode(supervisions["tgt_text"], out_type=int)

Also applies to: 1342-1342

# Additionally, to ensure that the probs of blank and non-blank sum to 1, we
# need to add the following term to the log-probs of non-blank symbols. This
# is equivalent to log(1 - sigmoid(logits[..., 0])).
breakpoint()


⚠️ Potential issue | 🔴 Critical

Remove debug breakpoint() before merging.

This breakpoint() will halt execution and drop into the debugger in production, breaking any decoding that uses modified_beam_search_hat.

🐛 Proposed fix
-        breakpoint()
🧰 Tools
🪛 Ruff (0.14.13)

1212-1212: Trace found: breakpoint used

(T100)

🤖 Prompt for AI Agents
In `@egs/iwslt22_ta/ST/zipformer/beam_search.py` at line 1212, Remove the stray
debug call `breakpoint()` inside the beam search implementation (the one found
in `modified_beam_search_hat`), delete that line so decoding won't drop into a
debugger in production, and scan the file for any other leftover `breakpoint()`
or debugging statements to remove; then run the unit/decoding tests or lint to
confirm no runtime interruptions remain.
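Beyond deleting the line, Python offers a safety net against stray debug calls: breakpoint() dispatches through sys.breakpointhook (and honors the PYTHONBREAKPOINT environment variable), which can be neutralized. A minimal sketch:

```python
import sys

# Replace the default hook so any leftover breakpoint() becomes a no-op
# instead of dropping the process into pdb. Equivalently, set the
# environment variable PYTHONBREAKPOINT=0 before starting Python.
sys.breakpointhook = lambda *args, **kwargs: None

breakpoint()  # now does nothing
print("decoding continues")
```

This is only a mitigation for CI runs; the leftover call should still be removed from the source.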

@KarelVesely84 KarelVesely84 changed the title [not_for_merge_yet] add kaldi_native_fbank package to Docker for test.yml [not_for_merge_yet] fixing github workflows with automatic tests Jan 20, 2026
@csukuangfj
Collaborator


Hi Karel,

thanks for your contribution!

this uses some Docker image.

Yes, you are right.

The exact image is given below

image: ghcr.io/${{ github.repository_owner }}/icefall:cpu-py${{ matrix.python-version }}-torch${{ matrix.torch-version }}-v${{ matrix.version }}

where python-version, torch-version and version are generated by

python ./.github/scripts/docker/generate_build_matrix.py --python-version "3.10"

which uses the code below:

version = "20250630"
# torchaudio 2.5.0 does not support python 3.13
python_version = ["3.8", "3.9", "3.10", "3.11", "3.12", "3.13"]
torch_version = []
torch_version += ["1.13.0", "1.13.1"]
torch_version += ["2.0.0", "2.0.1"]
torch_version += ["2.1.0", "2.1.1", "2.1.2"]
torch_version += ["2.2.0", "2.2.1", "2.2.2"]
# Test only torch >= 2.3.0
torch_version += ["2.3.0", "2.3.1"]
torch_version += ["2.4.0"]
torch_version += ["2.4.1"]
torch_version += ["2.5.0"]
torch_version += ["2.5.1"]
torch_version += ["2.6.0", "2.7.0", "2.7.1"]
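Putting the pieces together, the image tag for a given matrix entry can be reconstructed like this (a sketch of the string expansion GitHub Actions performs; the function is illustrative, not repository code):

```python
def image_tag(owner: str, python_version: str, torch_version: str, version: str) -> str:
    # Mirrors the `image:` expression from test.yml quoted above.
    return (
        f"ghcr.io/{owner}/icefall:"
        f"cpu-py{python_version}-torch{torch_version}-v{version}"
    )

print(image_tag("k2-fsa", "3.10", "2.7.1", "20250630"))
# ghcr.io/k2-fsa/icefall:cpu-py3.10-torch2.7.1-v20250630
```

Note that the tag only changes when `version` is bumped; until the image behind an existing tag is rebuilt and republished, CI keeps pulling an image without any newly added dependency.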

See also
https://github.com/k2-fsa/icefall/pkgs/container/icefall
(screenshot: list of published icefall container images on ghcr.io)


But I am not sure, which Dockerfile is used

Yes, you are right. It is
https://github.com/k2-fsa/icefall/blob/master/.github/scripts/docker/Dockerfile

And how does the docker image get updated ?

It is updated by
https://github.com/k2-fsa/icefall/blob/master/.github/workflows/build-cpu-docker.yml

I usually trigger it manually due to

(screenshot: manual workflow_dispatch trigger of the build-cpu-docker workflow)

Perhaps you don't have permission to trigger it in the k2-fsa/icefall repo, but you should have permission to do that in your own fork. If you want to test it there, make sure you remove the following line

if: github.repository_owner == 'csukuangfj' || github.repository_owner == 'k2-fsa'

but having it already in the Docker image would be a cleaner solution

Yes, I agree with you.


I will try to fix all CI failures in the following days.
