@aaryansinhaa
Contributor

Rationale

data.js is too large and therefore too slow to load, which is not a scalable design. Splitting it into data_{lang}.js and data_{lang}_{slug}.js reduces memory use and load time, because only smaller chunks of data are loaded at a time.

Fixes #257

Changes

Major Change:

  • generate_datafile now uses two helper functions to generate two separate files:
    • a file that acts as the index for a given language (data_{lang}.js)
    • a file that contains detailed information about a video (data_{lang}_{slug}.js)
  • If no language is set, it defaults to English.
  • Language selection now relies on persisted storage, making the ?lang query parameter optional rather than required.
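
For readers who want a picture of the split described above, here is a minimal sketch of how a per-language index and per-video detail files could be written. It is not the code from this PR: the helper names (`write_js_variable`, `write_language_index`, `write_video_details`, `generate_datafile_sketch`), the `var json_data = ...;` file format, and the choice of fields kept in the index are all assumptions made for illustration. Only the file naming scheme (data_{lang}.js and data_{lang}_{slug}.js) comes from the description.

```python
import json
import tempfile
from pathlib import Path


def write_js_variable(path: Path, var_name: str, payload) -> None:
    """Write payload as a JavaScript variable assignment (assumed file format)."""
    body = json.dumps(payload, ensure_ascii=False)
    path.write_text(f"var {var_name} = {body};\n", encoding="utf-8")


def write_language_index(out_dir: Path, lang: str, videos: list) -> Path:
    """Hypothetical helper: write data_{lang}.js with only light listing fields."""
    index = [{"slug": v["slug"], "title": v["title"]} for v in videos]
    path = out_dir / f"data_{lang}.js"
    write_js_variable(path, "json_data", index)
    return path


def write_video_details(out_dir: Path, lang: str, video: dict) -> Path:
    """Hypothetical helper: write data_{lang}_{slug}.js with the full record."""
    path = out_dir / f"data_{lang}_{video['slug']}.js"
    write_js_variable(path, "json_data", video)
    return path


def generate_datafile_sketch(out_dir: Path, videos_by_lang: dict) -> list:
    """Write one index file per language and one detail file per video."""
    written = []
    for lang, videos in videos_by_lang.items():
        written.append(write_language_index(out_dir, lang, videos))
        for video in videos:
            written.append(write_video_details(out_dir, lang, video))
    return written


def demo() -> list:
    """Generate files for three languages in a temporary directory."""
    slug = "danit_peleg_forget_shopping_soon_you_ll_download_your_new_clothes"
    videos_by_lang = {
        lang: [{"slug": slug, "title": f"Talk ({lang})", "description": "..."}]
        for lang in ("en", "es", "fr")
    }
    with tempfile.TemporaryDirectory() as tmp:
        paths = generate_datafile_sketch(Path(tmp), videos_by_lang)
        return sorted(p.name for p in paths)
```

With this layout, a page listing talks in one language only needs data_{lang}.js, and the larger per-video payload is read only when a single talk is opened.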

Tested locally with the following offliner definition:

ted2zim --links https://www.ted.com/talks/danit_peleg_forget_shopping_soon_you_ll_download_your_new_clothes --languages en,es,fr --name testing --output ./out

See screenshots for reference:

[3 screenshots]

@kelson42 kelson42 requested a review from benoit74 December 20, 2025 15:01

codecov bot commented Dec 20, 2025

Codecov Report

❌ Patch coverage is 0% with 32 lines in your changes missing coverage. Please review.
✅ Project coverage is 4.77%. Comparing base (62593c0) to head (c958251).
⚠️ Report is 3 commits behind head on main.

Files with missing lines | Patch % | Lines
src/ted2zim/scraper.py | 0.00% | 32 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##            main    #258      +/-   ##
========================================
- Coverage   4.86%   4.77%   -0.10%     
========================================
  Files          8       8              
  Lines       1130    1153      +23     
  Branches     248     255       +7     
========================================
  Hits          55      55              
- Misses      1074    1097      +23     
  Partials       1       1              

Collaborator

@benoit74 benoit74 left a comment

Thank you, looks very promising.

I have not tested it, but from the code it seems to do what we expected. I will merge it and test on a real Zimfarm recipe.

@benoit74 benoit74 merged commit 0797a48 into openzim:main Dec 22, 2025
8 checks passed

Development

Successfully merging this pull request may close these issues.

Scraper UI architecture is not scalable at all

2 participants