Conversation
…t handles transactions for us + refactor tests)
| mock==3.0.5 | ||
| moto==1.3.13 | ||
| more-itertools==10.5.0 | ||
| moto==2.0.8 |
This gets us transact_write_items. There's probably value in going a bit further, since this is still quite old and some of these older versions have older limits (e.g. a newer version supports 100 items in a transaction, which we recently removed from Tron, though I'm sure there are other differences 🤷), but I was able to plug this version in without much pain, so I stopped there.
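Aside: the limit difference above matters if writes ever get grouped into transactions. A minimal sketch of transaction-sized chunking (the helper name and limit constant are illustrative, not Tron's actual code):

```python
# Newer DynamoDB raised the TransactWriteItems cap from 25 to 100 items;
# pin this to whatever the SDK/service version in use actually supports.
TRANSACT_ITEM_LIMIT = 100

def chunk_transact_items(items, limit=TRANSACT_ITEM_LIMIT):
    """Split a list of TransactWriteItems entries into transaction-sized chunks."""
    return [items[i : i + limit] for i in range(0, len(items), limit)]
```

Each chunk can then be sent as one `transact_write_items` call.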
…g items on restore
| def small_object(): | ||
| yield { | ||
| def job_state_object(): | ||
| return {"enabled": True, "run_nums": [0, 1]} |
it's nice to always yield in fixtures since that's one less change to make if we ever need some post-fixture cleanup - but this is also fine as-is :)
| @pytest.mark.usefixtures("store", "small_object", "large_object") | ||
| @pytest.mark.usefixtures("store", "job_state_object", "small_jobrun_state_object", "large_jobrun_state_object") |
just curious (in case you know): do we need pytest.mark.usefixtures here? seems like we actually refer to the fixtures in our tests and, outside of the store fixture, the other fixtures only return data
| @pytest.mark.parametrize( | ||
| "object_fixture_name", | ||
| ["small_jobrun_state_object", "large_jobrun_state_object"], | ||
| ) | ||
| def test_save_and_restore(self, store, object_fixture_name, request): | ||
| """Verify that objects of different sizes can be saved and restored correctly.""" | ||
| state_object = request.getfixturevalue(object_fixture_name) |
| @pytest.mark.parametrize( | |
| "object_fixture_name", | |
| ["small_jobrun_state_object", "large_jobrun_state_object"], | |
| ) | |
| def test_save_and_restore(self, store, object_fixture_name, request): | |
| """Verify that objects of different sizes can be saved and restored correctly.""" | |
| state_object = request.getfixturevalue(object_fixture_name) | |
| @pytest.mark.parametrize( | |
| "state_object", | |
| [small_jobrun_state_object, large_jobrun_state_object], | |
| ) | |
| def test_save_and_restore(self, store, state_object, request): | |
| """Verify that objects of different sizes can be saved and restored correctly.""" |
disclaimer: I haven't tested this - but shouldn't we be able to use the fixtures directly here rather than going through the getfixturevalue indirection?
| if object_fixture_name == "small_jobrun_state_object": | ||
| assert num_partitions == 1 | ||
| else: | ||
| # For large objects, we expect multiple partitions, the number of partitions can vary depending on our object size | ||
| # and the DynamoDB partition size limit. We assert that there is more than one partition. | ||
| assert num_partitions > 1 |
we should try to avoid conditionals in tests - that said, not sure how to handle this specific case without having one of the parameterized values to pass through be a lambda or something :p
| "UnprocessedKeys": { | ||
| store.name: { | ||
| "Keys": [{"key": {"S": store.build_key("job_state", 0)}, "index": {"N": "0"}}], | ||
| table_name: { |
it's probably more future-proof to keep using store.name (and would remove a fixture from this test) - but this isn't the end of the world either :p
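For context, the shape of the retry loop this mocked response exercises looks roughly like the following (function and error names are illustrative; `client` is anything exposing `batch_get_item(RequestItems=...)` with the boto3 response shape):

```python
import time

def batch_get_with_retries(client, table_name, keys, max_attempts=5, delay=0):
    """Fetch items, re-requesting any UnprocessedKeys DynamoDB returns."""
    items = []
    request_items = {table_name: {"Keys": keys}}
    for _ in range(max_attempts):
        response = client.batch_get_item(RequestItems=request_items)
        items.extend(response.get("Responses", {}).get(table_name, []))
        # DynamoDB may return a partial result; retry only the leftovers
        request_items = response.get("UnprocessedKeys") or {}
        if not request_items:
            return items
        time.sleep(delay)
    raise RuntimeError(f"keys still unprocessed after {max_attempts} attempts")
```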
| return f"{type} {iden}" | ||
| def restore(self, keys, read_json: bool = False) -> dict: | ||
| # KKASP: removed read_json so I think we can remove this parameter from the config |
| # KKASP: removed read_json so I think we can remove this parameter from the config |
| try: | ||
| json_items[key] += partition_item["json_val"]["S"] | ||
| except KeyError: | ||
| log.critical( |
does log.critical print the traceback if it exists? if not, might be nice to change this to log.exception() to ensure that folks have a line number to jump to from the traceback :)
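To answer the first half: `log.critical` does not include the traceback unless `exc_info=True` is passed; `log.exception` logs at ERROR level and always appends the active traceback. A small demonstration (logger and message names are illustrative):

```python
import io
import logging

log = logging.getLogger("restore_demo")
buf = io.StringIO()
log.addHandler(logging.StreamHandler(buf))
log.setLevel(logging.DEBUG)

partition_item = {}  # missing "json_val", so the lookup raises KeyError
try:
    value = partition_item["json_val"]["S"]
except KeyError:
    # log.exception attaches the traceback, so the log points at the exact
    # failing line; log.critical(..., exc_info=True) would do the same at
    # CRITICAL level
    log.exception("Failed to read json_val for partition")

output = buf.getvalue()
```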
| time.sleep(5) | ||
| continue | ||
| with self.save_lock: | ||
| # KKASP: what is this again? |
| # KKASP: what is this again? | |
| # TODO: what is this again? |
and isn't this the delete shenanigans we do in tron?
Problem
We've been saving both pickles and JSON as part of our "safe migration" off of pickle data. Before we can remove pickles (and ultimately delete Mesos + hella old things) we need to stop saving pickles. This change removes the writing pipeline for pickles.
Solution
Generally, just stop writing pickles. This includes:
Some additional "just 'cause" changes:
Note: I'll test restoring on some more tables, but so far so good. I think it's pretty reviewable as is. Ideally we do some more read_json cleanup, but for now this is a safe approach. One annoying thing is writing to json_val and num_json_partitions, but that's just the way the cookie crumbles.
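For readers unfamiliar with the json_val/num_json_partitions scheme, the partition count is just a ceiling division of the serialized blob over a per-item size cap. A sketch (the cap value here is a hypothetical placeholder, safely below DynamoDB's 400KB item limit; it is not Tron's actual constant):

```python
MAX_VALUE_SIZE = 200 * 1024  # hypothetical per-item cap on json_val bytes

def num_json_partitions(serialized_json: str) -> int:
    """How many json_val partitions a serialized state object needs."""
    # ceiling division via negation; every object needs at least one partition
    return max(1, -(-len(serialized_json) // MAX_VALUE_SIZE))
```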
Extra note: This will not be rollback-safe (we can technically roll back with restored data, but generally I think it's best to treat this release as one that can't be pleasantly undone). Full steam ahead 🚂.