Skip to content

Non-Test Changes for Rule Engine#1058

Open
TeachMeTW wants to merge 2 commits intoe-mission:masterfrom
TeachMeTW:pr_split_non_tests
Open

Non-Test Changes for Rule Engine#1058
TeachMeTW wants to merge 2 commits intoe-mission:masterfrom
TeachMeTW:pr_split_non_tests

Conversation

@TeachMeTW
Copy link
Contributor

@TeachMeTW TeachMeTW commented Apr 24, 2025

This PR implements two main changes:

Rule Engine Migration:

  • Migrates mode inference to use rule_engine for improved maintainability and performance
  • Enables more consistent handling of mode inference rules
  • Addresses issues with existing mode inference implementation

Transit Matching Caching:

  • Implements caching mechanism for Overpass API results in match_stops.py
  • Significantly reduces API calls to external services
  • Improves performance for repeated transit matching operations
  • Cache files are stored as JSON for easy inspection and debugging

Related Issues:

@TeachMeTW TeachMeTW changed the title Migrated to rule_engine for mode inference and added transit matching… Non-Test Changes for Rule Engine Apr 24, 2025
@TeachMeTW TeachMeTW force-pushed the pr_split_non_tests branch from b6edff9 to 529251c Compare April 25, 2025 01:52
@TeachMeTW TeachMeTW marked this pull request as ready for review April 25, 2025 02:04
@TeachMeTW TeachMeTW marked this pull request as draft April 25, 2025 20:37
@TeachMeTW TeachMeTW marked this pull request as ready for review April 29, 2025 06:23
@shankari shankari force-pushed the master branch 4 times, most recently from a2a9a44 to e50e9f3 Compare June 4, 2025 15:34
@shankari
Copy link
Contributor

@JGreenlee I am not sure we need to cache values here; the public server has been able to handle our load from production for a while. And even if we do cache, do we have to check in the cached files? Which trips do they represent? (I assume the ones from the automated tests).

And finally, we are using geofabrik on production, but it is not clear that other deployers are. If they use the public server, we will cache the results on their filesystem. Which is potentially OK, but we should then make sure that we don't run out of disk space.

Note also that in general, if the pipeline is run "serverless" it will not have access to persistent storage other than the database. And even if we do store in the database, caching values precludes getting refreshes when there are new transit stops (e.g. a new BRT line https://metrobrtproject.com/)

So basically, I think that caching the data is engineering overkill and should be implemented only if needed after the batching is in place

@JGreenlee
Copy link
Member

@shankari These changes are not supposed to have any effect on production. They are just to help us test the rule engine without putting additional burden on the Overpass API

Which trips do they represent? (I assume the ones from the automated tests)

Yes, they represent trips from real_examples that are used in the unit tests (whether run locally or in GH Actions)

Previously, we did not have any tests for the rule engine. Now that we do (awaiting merge in #1057), we will make a ton of repetitive Overpass queries every time we run tests.

@shankari
Copy link
Contributor

These changes are not supposed to have any effect on production.

But they are not checking for dev/prod, they are checking for whether we are connecting to geofabrik or to the default. And the connection is to the default (e.g. in the UNSW or the UToronto deployment), then the cached values will be both saved and loaded.

They are just to help us test the rule engine without putting additional burden on the Overpass API

So then why are they part of the "non-test changes" for the rule engine. They should be part of #1039
And even in #1039, it would be better if they would be handled through mocking instead of modifying the source code for the overpass retrieval.

e.g. something like https://medium.com/@everythingismindgame/mocking-the-network-call-with-unittest-python-51e9f42a28f9

@JGreenlee
Copy link
Member

So then why are they part of the "non-test changes" for the rule engine. They should be part of #1039

These changes were part of #1039 and you had requested it to be split into #1057 and #1058.

it would be better if they would be handled through mocking instead of modifying the source code for the overpass retrieval.

e.g. something like https://medium.com/@everythingismindgame/mocking-the-network-call-with-unittest-python-51e9f42a28f9

Yes, much better. This can easily be converted to a mock, since it basically just puts a wrapper around match_stops.make_request_and_catch

@shankari
Copy link
Contributor

shankari commented Jul 11, 2025

These changes were part of #1039 and you had requested it to be split into #1057 and #1058.

Yes, sorry, I meant that, with the split, they should be part of #1057 since they are really only required for testing and these are the non-test changes.

In fact, if we mock, I don't think that there will be any non-test changes.

@shankari shankari force-pushed the master branch 3 times, most recently from cd45974 to 30c4273 Compare September 5, 2025 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants