
fix: check daily if update is due based on UPDATE_INTERVAL #303

Open

aaronspruit wants to merge 2 commits into rtuszik:dev from aaronspruit:fix-updates

Conversation

@aaronspruit

Originally, an update would only be checked once the full duration of UPDATE_INTERVAL had elapsed since the container started, which means a restart resets the timer.

This change instead compares the timestamp of DATA_DIR/.photon-index-updated against UPDATE_INTERVAL on a daily basis.

I believe this was the originally intended functionality: with the old behavior, the default settings mean the container has to stay up for 30 uninterrupted days before an update is even attempted, instead of doing updates every UPDATE_INTERVAL.

I was very confused as to why my service hadn't updated when the last time it did so was Feb 17th and UPDATE_INTERVAL=30d (even after bouncing the container). This fix correctly identified that it is now 45 days out of date and ran the update.
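Roughly, the new logic is the following (a simplified sketch, not the actual diff; the names and env parsing here are placeholder stand-ins):

```python
import os
import time
from pathlib import Path

DATA_DIR = Path(os.environ.get("DATA_DIR", "/photon/data"))
MARKER = DATA_DIR / ".photon-index-updated"
UPDATE_INTERVAL_SECONDS = 30 * 86400  # e.g. parsed from UPDATE_INTERVAL=30d
POLL_SECONDS = 86400  # wake up once a day


def update_is_due() -> bool:
    """An update is due when the marker is missing or older than UPDATE_INTERVAL."""
    if not MARKER.exists():
        return True
    age = time.time() - MARKER.stat().st_mtime
    return age >= UPDATE_INTERVAL_SECONDS


def run_index_update() -> None:
    """Stand-in for the real download/swap routine."""
    print("updating index...")


def update_loop() -> None:
    while True:
        if update_is_due():
            run_index_update()
            MARKER.touch()  # marker only advances after a successful update
        time.sleep(POLL_SECONDS)
```

The key point is that the marker file's mtime survives restarts, so the schedule does too.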

$ task check
task: [deadcode] uv run vulture --min-confidence 100 --exclude ".venv" .
task: [format] uv run ruff format
task: [lint] uv run ruff check --fix
task: [typecheck] uv run ty check
All checks passed!
17 files left unchanged
All checks passed!

$ task test
task: [test] uv run pytest
================================================= test session starts =================================================
platform linux -- Python 3.13.12, pytest-9.0.2, pluggy-1.6.0
rootdir: /mnt/c/Users/rebel/repos/photon-docker
configfile: pyproject.toml
plugins: cov-7.0.0
collected 36 items                                                                                                    

tests/utils/test_regions.py ...................                                                                 [ 52%]
tests/utils/test_sanitize.py ......                                                                             [ 69%]
tests/utils/test_validate_config.py ...........                                                                 [100%]

================================================= 36 passed in 2.99s ==================================================

@aaronspruit changed the title from "fix: check if update is due daily based on UPDATE_INTERVAL" to "fix: check daily if update is due based on UPDATE_INTERVAL" on Apr 3, 2026
@rtuszik
Owner

rtuszik commented Apr 4, 2026

Thanks for the PR!

This is an approach I considered initially; having the schedule reset on container restart is really not ideal.

However, there are some issues with this:

  • With the 24-hour polling interval you have set in this PR, the actual time between updates is technically anywhere between 24 and 48 hours. That's the maximum precision this would allow us.

  • An update will be attempted on every poll once the time difference exceeds UPDATE_INTERVAL as set by the user, and the marker is only updated after a successful update. If I were to set a 24-hour update interval, it would attempt an update every 24 hours. If the polling interval were shortened to tackle the precision issue, the frequency of update attempts would increase further.

I'm happy to hear your thoughts on this as I haven't found the right approach here without totally over-engineering this.

@aaronspruit
Author

Ah, OK, I see what you're trying to avoid. It just wasn't clear from the env var what was going on, or that the container had to be up for the complete UPDATE_INTERVAL for an update to even trigger.

* With the 24-hour polling interval you have set in this PR, the actual time between updates is technically anywhere between 24 and 48 hours. That's the maximum precision this would allow us.

Agreed. If you have UPDATE_INTERVAL set to anything >1d, you lose precision over WHEN the update fires within the day window. I'm not sure that's a bad thing, though (is anyone updating more often than once a day? TBH, based on the guidance you give, I'd say there should be code enforcing a minimum UPDATE_INTERVAL of 7d or larger). To remediate, you could change the thread to sleep for 1 second instead of 86400. As the default is 30 days, and I imagine that's because you don't want people hammering the downloads, I figured checking once a day would be OK. However, making it more in line with your existing code would be a simple change. Granted, as you point out below, that'd be bad from a retry perspective :)

* An update will be attempted on every poll once the time difference exceeds `UPDATE_INTERVAL` as set by the user, and the marker is only updated after a successful update. If I were to set a 24-hour update interval, it would attempt an update every 24 hours. If the polling interval were shortened to tackle the precision issue, the frequency of update attempts would increase further.

Also agreed. I guess the question becomes: do you want people to artificially lower UPDATE_INTERVAL to something that makes sense given their restart cadence once they figure this out (in my case it'd be somewhere <= 7d, since I'm running in K8s, which ultimately means more load on the happy path), or something like this PR, where errors could cause daily retries? I'm not sure whether you have a way to capture how many download failures are actually forcing retries; anecdotally, I don't have that issue. One way to mitigate would be an exponential back-off, measured in days, that doubles with each failure. Assuming the default of 30 days, that gives you 4 additional tries (assuming the container doesn't get restarted, which you could mitigate by writing the backoff state to disk). Could be a decent compromise.
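A minimal sketch of that persisted backoff, with a hypothetical state-file path (the real location and wiring would be up to you):

```python
from pathlib import Path

# Hypothetical marker; the real project would pick its own path under DATA_DIR.
BACKOFF_FILE = Path("/photon/data/.photon-update-backoff")


def next_backoff_days() -> int:
    """Return the current backoff in days and double the persisted value.

    Persisting it means a container restart doesn't reset the retry schedule.
    """
    try:
        days = int(BACKOFF_FILE.read_text())
    except (FileNotFoundError, ValueError):
        days = 1  # first failure: retry after 1 day
    BACKOFF_FILE.write_text(str(days * 2))
    return days


def clear_backoff() -> None:
    """Called after a successful update."""
    BACKOFF_FILE.unlink(missing_ok=True)
```

With the 30-day default, failed attempts would retry after 1, 2, 4, and 8 days (landing on days 1, 3, 7, and 15), which is where the 4 additional tries above come from; the fifth would fall past the 30-day window.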

Another mitigation would be date-versioning the hosted tar and doing a lookup before downloading. I have no idea how often the dataset is actually updated on your mirror either, so I don't know whether, if I changed UPDATE_INTERVAL to something <30d, I'd just be re-downloading the same data each time.
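For example, if the mirror sends a Last-Modified header on the tarball (an assumption on my part; I haven't checked, and the URL below is made up), a cheap pre-check could look like:

```python
import urllib.request

# Hypothetical URL; the real mirror layout may differ.
INDEX_URL = "https://download.example.org/photon-db-latest.tar.bz2"


def remote_last_modified() -> str | None:
    """HEAD the tarball and return its Last-Modified header, if present."""
    req = urllib.request.Request(INDEX_URL, method="HEAD")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.headers.get("Last-Modified")
```

Comparing that against the value recorded at the last successful update would let the container skip the download entirely when nothing has changed, no matter how low UPDATE_INTERVAL is set.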

Just some ideas.
