✂️ Add script to purge inactive users from decommissioned deployments #1010
base: master
@@ -0,0 +1,64 @@

```python
import arrow
import pymongo
import emission.core.get_database as edb
import emission.storage.timeseries.abstract_timeseries as esta
import bin.debug.common as common
from _common import run_on_all_deployments

SECONDS_90_DAYS = 60 * 60 * 24 * 90

NOW_SECONDS = arrow.now().timestamp()

def find_inactive_uuids(uuids_entries):
    '''
    Users are inactive if:
    - no API calls in the last 90 days
    AND
    - no locations in the last 90 days
    '''
    inactive_uuids = []
    for u in uuids_entries:
        print(f'Checking activity for user {u["uuid"]}')
        ts = esta.TimeSeries.get_time_series(u['uuid'])

        last_call_ts = ts.get_first_value_for_field(
            key='stats/server_api_time',
            field='data.ts',
            sort_order=pymongo.DESCENDING
        )
        print(f'for user {u["uuid"]}, last call was {last_call_ts}')
        if last_call_ts > NOW_SECONDS - SECONDS_90_DAYS:
            continue

        last_loc_ts = ts.get_first_value_for_field(
            key='background/location',
            field='data.ts',
            sort_order=pymongo.DESCENDING
        )
        print(f'for user {u["uuid"]}, last location was {last_loc_ts}')
        if last_loc_ts > NOW_SECONDS - SECONDS_90_DAYS:
            continue

        print(f'User {u["uuid"]} is inactive')
        inactive_uuids.append(u['uuid'])

    return inactive_uuids


def purge_inactive_users():
    total_users = edb.get_uuid_db().count_documents({})
    print(f'Total users: {total_users}')
    uuids_entries = edb.get_uuid_db().find()
    print('Finding inactive users...')
    inactive_uuids = find_inactive_uuids(uuids_entries)
    print(f'Of {total_users} users, found {len(inactive_uuids)} inactive users:')
    print(inactive_uuids)

    print("Purging inactive users...")
    for u in inactive_uuids:
        print(f'Purging user {u}')
        common.purge_entries_for_user(u, True)


if __name__ == '__main__':
    run_on_all_deployments(purge_inactive_users)
```

Comment on lines +63 to +64
Contributor
This is useful, but I think that a first step is just a script to identify the number of inactive users in each program so that we can also identify inactive programs. Such inactive programs are:
Since the new proposed script would also use
Contributor
Would this essentially be the same script but without the part where it actually removes the users?
Contributor
Sure. I would make the default be to print, and actually remove the users only when the appropriate command line arg is passed in.
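
The print-by-default behavior suggested above could be sketched roughly as follows. This is only an illustration, not the script's actual interface: the `--purge` flag name and the `build_parser` and `report_or_purge` helpers are hypothetical, and the purge step is passed in as a callable so the sketch does not depend on the e-mission modules.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Report inactive users; purge them only on request."
    )
    # Hypothetical flag name; the thread only asks for "the appropriate command line arg".
    parser.add_argument(
        "--purge",
        action="store_true",
        help="actually remove the inactive users (default: only print them)",
    )
    return parser


def report_or_purge(inactive_uuids, purge, purge_fn):
    """Print the inactive users; call purge_fn on each only when purge is True.

    Returns the list of users that were purged (empty for a dry run).
    """
    print(f"Found {len(inactive_uuids)} inactive users: {inactive_uuids}")
    if not purge:
        print("Dry run: no users were removed")
        return []
    purged = []
    for u in inactive_uuids:
        print(f"Purging user {u}")
        purge_fn(u)
        purged.append(u)
    return purged
```

In the script from this PR, `purge_fn` could plausibly be `lambda u: common.purge_entries_for_user(u, True)`, with `purge` taken from `build_parser().parse_args().purge`.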
@JGreenlee is this consistent with the definition of "active user" in the admin dashboard and the public dashboard? I think we should ensure that our definition of "active user" is consistent throughout the platform.
In the public dashboard, "active users" is the number of users who were active on a specific day:
https://github.com/e-mission/em-public-dashboard/blob/00b70c96495f2bcc07837c4d89ff073bc30fdcc1/viz_scripts/generic_timeseries.ipynb#L186
In the admin dashboard, active users are users who have had activity in the last day:
https://github.com/e-mission/op-admin-dashboard/blob/33d41d16ce9cd175922aa5cc6cf4264aee247911/pages/home.py#L237
The admin dashboard also uses only the timestamps of API calls, not location timestamps.
I think we should use both API calls and locations. We can use this definition for now, but then encode this into https://github.com/JGreenlee/e-mission-common and change the others to use it.
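
A shared definition along these lines might look like the sketch below. It is not taken from e-mission-common: the `is_active` name, its parameters, and the handling of missing timestamps (`None`) are assumptions made for illustration. The function treats a user as active if either the last API call or the last location falls inside the window, which mirrors the 90-day rule in this PR when `window_days=90`.

```python
SECONDS_PER_DAY = 60 * 60 * 24


def is_active(last_call_ts, last_loc_ts, now_ts, window_days=90):
    """Return True if the user had an API call OR a location within the window.

    All timestamps are epoch seconds. A value of None means "no data of this
    kind" and never counts as recent (an assumption of this sketch).
    """
    cutoff = now_ts - window_days * SECONDS_PER_DAY
    return any(
        ts is not None and ts > cutoff
        for ts in (last_call_ts, last_loc_ts)
    )
```

With the window as a parameter, the dashboards could in principle call the same function with `window_days=1`, so that only the window length differs between the purge script and the dashboards.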