4 changes: 4 additions & 0 deletions bin/historical/migrations/_common.py
@@ -33,6 +33,10 @@ def run_on_all_deployments(fn_to_run):
    The list of deployments (PROD_LIST) is retrieved from the
    nrel-openpath-deploy-configs repo upon initialization of this module.
    """
    print(f'About to run {fn_to_run.__name__} on {len(PROD_LIST)} deployments. Proceed? [y/n]')
    if input() != 'y':
        print("Aborting")
        return
    for prod in PROD_LIST:
        prod_db_name = prod.replace("-", "_")
        print(f"Running {fn_to_run.__name__} for {prod} on DB {prod_db_name}")
64 changes: 64 additions & 0 deletions bin/historical/migrations/purge_inactive_users.py
@@ -0,0 +1,64 @@
import arrow
import pymongo
import emission.core.get_database as edb
import emission.storage.timeseries.abstract_timeseries as esta
import bin.debug.common as common
from _common import run_on_all_deployments

SECONDS_90_DAYS = 60 * 60 * 24 * 90

NOW_SECONDS = arrow.now().timestamp()

def find_inactive_uuids(uuids_entries):
    '''
    Users are inactive if:
    - no API calls in the last 90 days
    AND
    - no locations in the last 90 days
    '''
Comment on lines +13 to +18
Contributor

@JGreenlee is this consistent with the definition of "active user" in the admin dashboard and the public dashboard? I think we should ensure that our definition of "active user" is consistent throughout the platform.

Contributor

In the public dashboard, active users are the number of users that were active on a specific day:

https://github.com/e-mission/em-public-dashboard/blob/00b70c96495f2bcc07837c4d89ff073bc30fdcc1/viz_scripts/generic_timeseries.ipynb#L186

Contributor

In the admin dashboard, active users are users that have had activity in the last day:

https://github.com/e-mission/op-admin-dashboard/blob/33d41d16ce9cd175922aa5cc6cf4264aee247911/pages/home.py#L237

Contributor

The admin dashboard also uses only the timestamps of API calls, not location timestamps.

Contributor

@shankari shankari Feb 6, 2025

I think we should use both API calls and locations. We can use this definition for now, but then encode this into https://github.com/JGreenlee/e-mission-common and change the others to use it.
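
The definition settled on here (a user is inactive only when both the last API call and the last location are older than the window) could be captured in a single shared predicate along these lines. This is a minimal sketch, not existing e-mission-common code: the name `is_inactive`, its parameters, and the use of `None` to mean "no entry of this kind" are assumptions made for illustration.

```python
from typing import Optional

SECONDS_90_DAYS = 60 * 60 * 24 * 90


def is_inactive(last_call_ts: Optional[float],
                last_loc_ts: Optional[float],
                now_ts: float,
                window_secs: int = SECONDS_90_DAYS) -> bool:
    """Hypothetical helper: True only if BOTH signals are older than the window."""
    cutoff = now_ts - window_secs

    def is_recent(ts: Optional[float]) -> bool:
        # None is this sketch's convention for "never recorded"
        return ts is not None and ts > cutoff

    # A single recent signal (API call OR location) is enough to count as active
    return not (is_recent(last_call_ts) or is_recent(last_loc_ts))
```

Keeping the predicate free of database access would let the dashboards and this script share it while each fetches the two timestamps in its own way.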

    inactive_uuids = []
    for u in uuids_entries:
        print(f'Checking activity for user {u["uuid"]}')
        ts = esta.TimeSeries.get_time_series(u['uuid'])

        last_call_ts = ts.get_first_value_for_field(
            key='stats/server_api_time',
            field='data.ts',
            sort_order=pymongo.DESCENDING
        )
        print(f'for user {u["uuid"]}, last call was {last_call_ts}')
        if last_call_ts > NOW_SECONDS - SECONDS_90_DAYS:
            continue

        last_loc_ts = ts.get_first_value_for_field(
            key='background/location',
            field='data.ts',
            sort_order=pymongo.DESCENDING
        )
        print(f'for user {u["uuid"]}, last location was {last_loc_ts}')
        if last_loc_ts > NOW_SECONDS - SECONDS_90_DAYS:
            continue

        print(f'User {u["uuid"]} is inactive')
        inactive_uuids.append(u['uuid'])

    return inactive_uuids


def purge_inactive_users():
    total_users = edb.get_uuid_db().count_documents({})
    print(f'Total users: {total_users}')
    uuids_entries = edb.get_uuid_db().find()
    print('Finding inactive users...')
    inactive_uuids = find_inactive_uuids(uuids_entries)
    print(f'Of {total_users} users, found {len(inactive_uuids)} inactive users:')
    print(inactive_uuids)

    print("Purging inactive users...")
    for u in inactive_uuids:
        print(f'Purging user {u}')
        common.purge_entries_for_user(u, True)


if __name__ == '__main__':
    run_on_all_deployments(purge_inactive_users)
Comment on lines +63 to +64
Contributor

This is useful, but I think that a first step is just a script to identify the number of inactive users in each program so that we can also identify inactive programs.

Such inactive programs are:

  • candidates for experimenting with the new architecture, AND
  • candidates for proactive outreach to ask if we can shut them down, even if the MOU is still active

Since the new proposed script would also use find_inactive_uuids, you can have it be a variant of this (controlled via a command line argument that you get using argparse) or a separate script that imports find_inactive_uuids.
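
As a rough illustration of the reporting step described above, the per-program counts could be reduced to a list of inactive programs as follows. This is only a sketch: `summarize_programs` is a hypothetical helper, the input mapping would have to be built by running `find_inactive_uuids` per deployment, and treating "every user is inactive" (including a program with zero users) as an inactive program is an assumption.

```python
from typing import Dict, List, Tuple


def summarize_programs(counts: Dict[str, Tuple[int, int]]) -> List[str]:
    """counts maps program name -> (total_users, inactive_users)."""
    inactive_programs = []
    for program, (total, inactive) in sorted(counts.items()):
        print(f'{program}: {inactive} of {total} users inactive')
        # Assumption: a program with no active users at all counts as inactive
        if total == inactive:
            inactive_programs.append(program)
    return inactive_programs
```

For example, with made-up program names, `summarize_programs({'prog-a': (10, 3), 'prog-b': (5, 5)})` prints one line per program and returns `['prog-b']`.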

Contributor

Would this essentially be the same script but without the part where it actually removes the users?

Contributor

Sure.

I would make the default be to print, and actually remove the users only when the appropriate command line arg is passed in.
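
A print-by-default variant might look roughly like the sketch below. The flag name `--purge`, the helper `handle_inactive_users`, and the `purge_fn` parameter (standing in for `common.purge_entries_for_user`) are all assumptions for illustration, not part of this PR.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description='Report inactive users; optionally purge them.')
    # Hypothetical flag: without it, the script only prints what it found
    parser.add_argument('--purge', action='store_true',
                        help='actually remove inactive users (default: only print them)')
    return parser.parse_args(argv)


def handle_inactive_users(inactive_uuids, purge, purge_fn):
    """Print each inactive user; call purge_fn on it only when purge is True."""
    purged = []
    for u in inactive_uuids:
        if purge:
            print(f'Purging user {u}')
            purge_fn(u)
            purged.append(u)
        else:
            print(f'Would purge user {u} (pass --purge to remove)')
    return purged
```

If `run_on_all_deployments` calls its argument with no parameters, the parsed flag would need to be bound beforehand, for instance with `functools.partial` or a small closure.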