Skip to content

Improving Trip Segmentation by reducing DB calls#956

Closed
humbleOldSage wants to merge 2 commits intoe-mission:masterfrom
humbleOldSage:FasterTripSegmentation
Closed

Improving Trip Segmentation by reducing DB calls#956
humbleOldSage wants to merge 2 commits intoe-mission:masterfrom
humbleOldSage:FasterTripSegmentation

Conversation

@humbleOldSage
Copy link
Contributor

The changes below that led to these performance upgrades are investigated in e-mission/e-mission-docs#1041 . They are :

  1. db calls for transition and motion dataframes are moved upstream from is_tracking_restarted_in_range function and get_ongoing_motion_in_range in restart_checking.py to trip_segmentaiton.py. The old setting which had multiple db calls ( for each iteration ) now happen once in the improved setting.

  2. All the other changes in trip_segmentation.py and dwell_segmentation_dist_filter.py are just to support the change in point 1 ( above).

  3. in dwell_segmentation_time_filter.py,other than the changes to support point 1 ( above), there an additional improvement. The calculations for last10PointsDistances and last5MinsPoints are vectorised. For this, calDistance in common.py now supports numpy arrays.

The changes  below  that led to these performance upgrades are investigated in   e-mission/e-mission-docs#1041 . They are :

1.  db calls for transition and motion dataframes are moved upstream from  `is_tracking_restarted_in_range` function  and `get_ongoing_motion_in_range` in
 `restart_checking.py` to `trip_segmentaiton.py`.  The old setting which had multiple db calls ( for each iteration ) now happen once in the improved setting.

2. All the other changes in `trip_segmentation.py` and `dwell_segmentation_dist_filter.py` are just to support the change  in point 1 ( above).

3. in `dwell_segmentation_time_filter.py`,other than the changes to support point 1 ( above), there an additional improvement.  The calculations for `last10PointsDistances` and `last5MinsPoints` are vectorised.  For this,  `calDistance` in `common.py` now supports numpy arrays.
@shankari
Copy link
Contributor

shankari commented Feb 4, 2024

in dwell_segmentation_time_filter.py,other than the changes to support point 1 ( above), there an additional improvement. The calculations for last10PointsDistances and last5MinsPoints are vectorised. For this, calDistance in common.py now supports numpy arrays.

I don't see any indication of the performance improvement related to this change in the issue. I would anticipate the change to be minimal, since len(last10PointsDistances) == 10 and vectorization is unlikely to be helpful until we reach thousands of rows. I am not sure that the additional code complexity is worth it.

Also, it would be helpful if you would link to the specific comment in the issue that documented the improvement, and even duplicate the numbers here to make it easier to look up...

@humbleOldSage
Copy link
Contributor Author

humbleOldSage commented Feb 4, 2024

I don't see any indication of the performance improvement related to this change in the issue.

Its here e-mission/e-mission-docs#1041 (comment).

I tried some pandas improvements which might be overkill ( in which case we can roll this back) which reduced overall runtime from ~2.12 to ~1.5.

I have now added a separate comment for this.
e-mission/e-mission-docs#1041 (comment)

I would anticipate the change to be minimal, since len(last10PointsDistances) == 10 and vectorization is unlikely to be helpful until we reach thousands of rows. I am not sure that the additional code complexity is worth it.

Yes, it's just for 10 rows. But, since this part falls inside the loop, it's the overall loop time that is improved as mentioned.

... and even duplicate the numbers here to make it easier to look up...

Sure.

@humbleOldSage
Copy link
Contributor Author

humbleOldSage commented Feb 4, 2024

Overall, Current AndorraWrapper runtime is ~1.6s , iOS Runtime is ~0.4s and CombinedWrapper runtime is ~2.1 ( as mentioned here e-mission/e-mission-docs#1041 (comment) )

@shankari
Copy link
Contributor

shankari commented Feb 5, 2024

@humbleOldSage there is a comment in the issue that indicates that there were some failures, but I don't see any updates on the fix. e-mission/e-mission-docs#1041 (comment)

Also, what are you testing this on? Why do we have the magic number of 327? Please also separate the vectorization from the code that actually reduces DB calls so we can evaluate each of them independently

@humbleOldSage
Copy link
Contributor Author

Did , in a separate PR #958 . Closing this one since no updates expected here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants