✂️ Add script to purge inactive users from decommissioned deployments #1010

Draft
JGreenlee wants to merge 2 commits into e-mission:master from JGreenlee:purge-inactive-script

@JGreenlee
Member

The following conditions must be met before this script is run on a program:

  • The program’s collection period (per MOU) has passed, and we have confirmed with program admins that it is ok to shut down
  • Data has already been archived in TSDC

I tested this on my local dump of nrel-commute:

(emission) jgreenle@jgreenle-34794s e-mission-server % PROD_LIST=nrel-commute DB_HOST=mongodb://localhost:27017/openpath_prod_nrel_commute PYTHONPATH=. ./e-mission-py.bash bin/historical/migrations/purge_inactive_users.py
Config file not found, returning a copy of the environment variables instead...
Retrieved config: {'DB_HOST': 'mongodb://localhost:27017/openpath_prod_nrel_commute', 'DB_RESULT_LIMIT': None}
Connecting to database URL mongodb://localhost:27017/openpath_prod_nrel_commute
PROD_LIST: ['nrel-commute']
About to run purge_inactive_users on 1 deployments. Proceed? [y/n]
y
Running purge_inactive_users for nrel-commute on DB nrel_commute
Config file not found, returning a copy of the environment variables instead...
Retrieved config: {'DB_HOST': 'mongodb://localhost:27017/openpath_prod_nrel_commute', 'DB_RESULT_LIMIT': None}
Connecting to database URL mongodb://localhost:27017/openpath_prod_nrel_commute
Total users: 51
Finding inactive users...
Checking activity for user (uuid)
for user <uuid>, last call was 1734401813.8308148
(...continued)
Of 51 users, found 46 inactive users:
(list of UUIDs)
Purging inactive users...
Purging user (uuid)
(...continued)

I also ran it a second time to ensure nothing bad happened:

Of 5 users, found 0 inactive users:
[]
Purging inactive users...

I inspected the DB and confirmed that 5 users remain.

Then, I referenced the UUIDs table on the admin dashboard and verified that those 5 users have a recent last_call_ts, while the other 46 users do not.

The confirmation prompt gives us a chance to double-check the list of deployments before a script is run on all of them.
Otherwise, a typo in the PROD_LIST environment variable could be catastrophic.

In practice, the console looks something like this:
```
PROD_LIST: ['wyoming', 'mm-masscec', 'uue', 'ccebikes', 'unc-ebike', 'uw-prs', 'uprm-civic', 'sm-ebike', 'dfc-fermata', '4core-ebike', 'godcgo', 'e-bikes-for-essentials', 'doee-electricbike-proj', 'ride2own', 'smart-commute-ebike', 'ebikethere-garfield-county', 'usaid-laos-ev', 'caeb-co', 'denver-casr', 'r2ohillsboro', 'nrel-commute', 'stm-community', 'r2omilwaukie', 'open-access', 'ca-ebike', 'r2oparkrose', 'uprm-nicr', 'durham', 'nc-transit-equity-study', 'ebikegj', 'cortezebikes', 'dcebike', 'washingtoncommons', 'choose-your-ride', 'cosa-ebike-project', 'fortmorgan']
About to run purge_inactive_users on 36 deployments. Proceed? [y/n]
```
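A minimal sketch of such a confirmation gate, for illustration only. The helper names `parse_prod_list` and `confirm_run` are hypothetical, and the comma-separated format for `PROD_LIST` is an assumption; the actual `run_on_all_deployments` in this PR may work differently.

```python
def parse_prod_list(raw):
    """Split a PROD_LIST value into deployment names.

    Assumes a comma-separated string; empty entries are dropped.
    """
    return [name.strip() for name in raw.split(",") if name.strip()]


def confirm_run(script_name, prod_list, ask=input):
    """Show the deployments and return True only on an explicit 'y'.

    `ask` is injectable so the prompt can be exercised without a terminal.
    """
    print(f"PROD_LIST: {prod_list}")
    answer = ask(
        f"About to run {script_name} on {len(prod_list)} deployments. "
        "Proceed? [y/n]\n"
    )
    # Anything other than 'y' (including an empty answer) aborts.
    return answer.strip().lower() == "y"
```

Treating every answer other than `y` as a refusal means an accidental Enter keypress does not start a purge.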
This script identifies users in the DB who have neither location data nor API calls in the last 90 days, and purges all of their data.
This should ONLY be run on inactive deployments that we are decommissioning, and that have already been archived in the TSDC.
Comment on lines +13 to +18
'''
Users are inactive if:
- no API calls in the last 90 days
AND
- no locations in the last 90 days
'''
Contributor

@JGreenlee is this consistent with the definition of "active user" in the admin dashboard and the public dashboard? I think we should ensure that our definition of "active user" is consistent throughout the platform.

Contributor

In the public dashboard, "active users" is the number of users who were active on a given day:

https://github.com/e-mission/em-public-dashboard/blob/00b70c96495f2bcc07837c4d89ff073bc30fdcc1/viz_scripts/generic_timeseries.ipynb#L186

Contributor

In the admin dashboard, active users are users who have had activity in the last day:

https://github.com/e-mission/op-admin-dashboard/blob/33d41d16ce9cd175922aa5cc6cf4264aee247911/pages/home.py#L237

Contributor

The admin dashboard also uses only the timestamps of API calls, not location timestamps.

Contributor

@shankari Feb 6, 2025

I think we should use both API calls and locations. We can use this definition for now, but then encode this into https://github.com/JGreenlee/e-mission-common and change the others to use it.
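For reference, the combined definition (API calls AND locations) could be sketched as a single predicate like the one below. This is an illustration only: the function name `is_inactive`, its parameters, and the epoch-second inputs are assumptions, not the code in this PR or in e-mission-common.

```python
import time

# 90 days, matching the window stated in this PR's docstring.
INACTIVITY_WINDOW_SECS = 90 * 24 * 60 * 60


def is_inactive(last_call_ts, last_loc_ts, now=None):
    """Return True only if BOTH signals are missing or older than the window.

    Timestamps are epoch seconds; None means no record of that kind exists.
    """
    now = time.time() if now is None else now
    cutoff = now - INACTIVITY_WINDOW_SECS

    def is_stale(ts):
        return ts is None or ts < cutoff

    # A recent API call OR a recent location keeps the user active.
    return is_stale(last_call_ts) and is_stale(last_loc_ts)
```

With a shared helper of this shape, the dashboards could pass a shorter window (for example, one day) instead of redefining what counts as activity.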

Comment on lines +63 to +64
if __name__ == '__main__':
    run_on_all_deployments(purge_inactive_users)
Contributor

This is useful, but I think that a first step is just a script to identify the number of inactive users in each program so that we can also identify inactive programs.

Such inactive programs are candidates for:

  • experimenting with the new architecture, AND
  • proactive outreach to ask if we can shut them down, even if the MOU is still active

Since the new proposed script would also use find_inactive_uuids, you can have it be a variant of this (controlled via a command line argument that you get using argparse) or a separate script that imports find_inactive_uuids

Contributor

Would this essentially be the same script but without the part where it actually removes the users?

Contributor

Sure.

I would make the default be to print, and actually remove the users only when the appropriate command line arg is passed in.
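A rough sketch of that report-by-default behavior using `argparse`. The `--purge` flag name and the `run` helper are hypothetical, and `find_inactive_uuids_stub` is a stand-in that returns made-up IDs, since the real `find_inactive_uuids` signature is not shown in this thread.

```python
import argparse


def find_inactive_uuids_stub():
    """Hypothetical stand-in for find_inactive_uuids; returns fake IDs."""
    return ["uuid-1", "uuid-2"]


def build_parser():
    parser = argparse.ArgumentParser(
        description="Report inactive users; purge only on request."
    )
    # store_true makes the destructive path opt-in: the default is False.
    parser.add_argument(
        "--purge",
        action="store_true",
        help="actually delete the inactive users' data (default: report only)",
    )
    return parser


def run(argv, find_inactive=find_inactive_uuids_stub, purge_user=None):
    """Print inactive users; purge them only when --purge is passed.

    Returns the list of users that were purged (empty for a dry run).
    """
    args = build_parser().parse_args(argv)
    inactive = find_inactive()
    print(f"Found {len(inactive)} inactive users: {inactive}")
    if not args.purge:
        print("Dry run; pass --purge to delete these users.")
        return []
    for uuid in inactive:
        print(f"Purging user {uuid}")
        if purge_user is not None:
            purge_user(uuid)
    return inactive
```

Calling `run([])` only prints the inactive users, while `run(["--purge"], purge_user=...)` invokes the supplied purge callback for each one.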

@catarial catarial mentioned this pull request Feb 7, 2025
@shankari shankari force-pushed the master branch 4 times, most recently from a2a9a44 to e50e9f3 Compare June 4, 2025 15:34
@shankari shankari force-pushed the master branch 3 times, most recently from cd45974 to 30c4273 Compare September 5, 2025 21:19
3 participants