Skip to content

Fix stale DB redirects appearing as new assessments in quality logs#1107

Open
arnchlmcodes wants to merge 5 commits intoopenzim:mainfrom
arnchlmcodes:fix-db-redirects
Open

Fix stale DB redirects appearing as new assessments in quality logs#1107
arnchlmcodes wants to merge 5 commits intoopenzim:mainfrom
arnchlmcodes:fix-db-redirects

Conversation

@arnchlmcodes
Copy link

page_touched was being parsed into timestamp_dt directly, overwriting the project's last run cutoff before any comparison could be made. Now it Stores the parsed value in row_timestamp_dt and return None if not newer than last run

Fixes #462

@kelson42 kelson42 requested review from audiodude and Copilot and removed request for Copilot March 7, 2026 06:23
@kelson42
Copy link
Collaborator

kelson42 commented Mar 7, 2026

I love it! 4 years old challenge fixed by a two lines patch!

@codecov
Copy link

codecov bot commented Mar 7, 2026

Codecov Report

❌ Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.73%. Comparing base (d4bedfc) to head (81f4b9b).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
wp1/logic/page.py 0.00% 3 Missing ⚠️

❌ Your patch check has failed because the patch coverage (0.00%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1107      +/-   ##
==========================================
- Coverage   92.78%   92.73%   -0.05%     
==========================================
  Files          74       74              
  Lines        4308     4310       +2     
==========================================
  Hits         3997     3997              
- Misses        311      313       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Member

@audiodude audiodude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you just add a comparison between page_touched and timestamp_dt to the SQL query itself? Why try to compare after the row has already been retrieved?

@arnchlmcodes
Copy link
Author

@audiodude Changed the approach to check the condition within the SQL query itself, Please take a look and lmk

args_dict = {
"title": wiki_db_title,
"namespace": namespace,
"timestamp": timestamp_dt.strftime("%Y%m%d%H%M%S")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should not use a string literal for the timestamp format.

As you may have noticed, WP1 has very strange, unorthodox, and confusing ways of storing timestamps in MariaDB. It comes down to the fact that:

  1. Timestamps are stored as strings, in a format such that string comparison is equivalent to comparison of corresponding TIMESTAMP types.
  2. The WP1 application database, often referenced with the variable wp10db uses a different such string fromat than the Wikipedia replica database (often referenced as wikidb).

The formatting constants are stored in constants.py: https://github.com/openzim/wp1/blob/main/wp1/constants.py#L16-L17

See an example here:
https://github.com/openzim/wp1/blob/main/wp1/models/wp10/log.py

Not that the "WP1" model of log stores one of its timestamp fields in the WP1 format, because it is native data generated by our application, and the other in the Wikipedia replica format, because it is data taken from the replica db.

Since you are querying the replica db here (wikidb), it seems you have in fact introduced a bug, because the formatting you chose looks more like the WP1 format. But please confirm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bot marking articles from 2016 as "new" in quality logs

3 participants