Skip to content

Conversation

@subagonsouth
Copy link
Contributor

Change Summary

Overview

This PR add the allowance for the repoint argument to the dependency.get_upstream_dependency_inputs and dependency.get_files functions to be a list of repoint numbers. That change allows for jobs to have dependencies that span multiple pointings.

The other changes add special handling of the Hi goodtimes job. First, the potential hi goodtimes jobs get extended to past and future repoints so that a single file will trigger jobs for a range of repoints. Secondly, for each potential job Hi goodtimes job, the single repoint number gets extended to span past and future repoints so that when getting the dependencies, a range of repoints will be retrieved.

Updated Files

  • sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/batch_starter.py
    • Trigger multiple jobs across range of repoints when triggered downstream job is a hi goodtimes job.
    • description of change 2 in file 2
  • sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependency.py
    • Allow for list of repoint numbers to be input to get_upstream_dependency_inputs and get_files functions so that we can retrieve dependencies across a range of repoints.
    • Add special handling of potential Hi Goodtimes jobs to expand repoint to a range of repoints for retrieving upstream dependencies.
  • tests/lambda_endpoints/test_batch_starter.py
    • Add test covering special handling of Hi Goodtimes job
  • tests/lambda_endpoints/test_dependency_api.py
    • Add test coverage for getting range of repoint files
    • Add test coverage for getting range of repoint jobs

Closes: #1029

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements custom handling for Hi Goodtimes jobs to support multi-repoint dependencies. Hi Goodtimes ancillary processing requires data from multiple repoints (past and future), unlike typical jobs that only need data from a single repoint. The implementation allows the repoint parameter to accept either a single integer or a list of integers in dependency query functions, and adds special logic to expand repoint ranges for Hi Goodtimes jobs both when triggering jobs and when retrieving dependencies.

Key changes:

  • Extended get_upstream_dependency_inputs and get_files functions to accept a list of repoint numbers
  • Added Hi Goodtimes-specific logic to expand single repoints into ranges when querying dependencies
  • Implemented multi-repoint job triggering when Hi L1B DE files arrive

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.

File Description
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/dependency.py Added configuration constants for Hi Goodtimes repoint ranges; modified get_upstream_dependency_inputs, get_files, and get_jobs to support list of repoints; added special handling to expand repoint ranges for Hi Goodtimes jobs
sds_data_manager/lambda_code/SDSCode/pipeline_lambdas/batch_starter.py Refactored DependencyConfig import; added special handling in s3_processing_event to trigger multiple Hi Goodtimes jobs across a range of repoints when a single Hi L1B DE file arrives
tests/lambda_endpoints/test_dependency_api.py Added fixture for Hi L1B DE files across multiple repoints; added tests for querying files with single and multiple repoints; added test for Hi Goodtimes multi-repoint dependency retrieval
tests/lambda_endpoints/test_batch_starter.py Added test to verify Hi Goodtimes jobs are triggered for multiple repoints when a single Hi L1B DE file arrives

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 2303 to 2332
def test_hi_goodtimes_multi_repoint_trigger(
mock_submit_all_jobs,
mock_get_dependencies,
session,
s3_client,
monkeypatch,
):
"""Test Hi Goodtimes multi-repoint trigger logic in s3_processing_event.
When a Hi L1B DE file with repoint N arrives, it should trigger goodtimes
jobs for repoints [N-M, N+M] where M is determined by the configuration.
"""
# Monkeypatch configuration values for testing
monkeypatch.setattr(dependency, "HI_GOODTIMES_NUM_PAST_REPOINTS", 1)
monkeypatch.setattr(dependency, "HI_GOODTIMES_NUM_FUTURE_REPOINTS", 2)

mock_get_dependencies.side_effect = [
[
{
"data_source": "hi",
"data_type": "ancillary",
"descriptor": "45sensor-goodtimes",
"relationship": "UPSTREAM",
}
],
[],
]

# L1B DE and spice files are added to the DB table by the
# hi_l1b_de_repoint_files fixture
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test function is missing the hi_l1b_de_repoint_files fixture parameter. The comment on line 2332 states that "L1B DE and spice files are added to the DB table by the hi_l1b_de_repoint_files fixture", but the fixture is not included in the function parameters. This means the Hi L1B DE files will not be present in the database when the test runs, causing the test to not properly validate the multi-repoint trigger logic. Add hi_l1b_de_repoint_files to the function parameters after monkeypatch.

Copilot uses AI. Check for mistakes.
):
repoint_param = list(
range(
repoint - HI_GOODTIMES_NUM_PAST_REPOINTS,
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The range calculation can produce invalid negative or zero repoint numbers. When repoint - HI_GOODTIMES_NUM_PAST_REPOINTS is less than 1 (e.g., repoint=1 and NUM_PAST=1), the range will start at 0, which may be invalid if repoint numbers start at 1. Add validation to ensure the range starts at a valid repoint number, such as using max(1, repoint - HI_GOODTIMES_NUM_PAST_REPOINTS) to prevent querying for invalid repoints.

Suggested change
repoint - HI_GOODTIMES_NUM_PAST_REPOINTS,
max(1, repoint - HI_GOODTIMES_NUM_PAST_REPOINTS),

Copilot uses AI. Check for mistakes.
target_repoint = call_args.args[4]
repoints_submitted.add(target_repoint)

# Verify we submitted for repoints 1, 2, 3
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says "Verify we submitted for repoints 1, 2, 3" but should include repoint 4 to match the assertion on line 2425.

Suggested change
# Verify we submitted for repoints 1, 2, 3
# Verify we submitted for repoints 1, 2, 3, 4

Copilot uses AI. Check for mistakes.
lambda_handler(events, context)

# Process the event
# lambda_handler(events, {})
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the commented-out code. The lambda_handler is already called on line 2397, making this line redundant.

Suggested change
# lambda_handler(events, {})

Copilot uses AI. Check for mistakes.
return start_date, end_date


# ruff: noqa: PLR0912
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ruff directive noqa: PLR0912 (too many branches) is added to suppress complexity warnings. Consider adding a comment explaining why this complexity is necessary for the Hi Goodtimes special handling, or refactor the special case logic into a separate helper function to reduce the complexity of s3_processing_event.

Copilot uses AI. Check for mistakes.
# the correct way to calculate the repoint affected by the new file
# that triggered here.
for target_repoint in range(
repoint - dependency.HI_GOODTIMES_NUM_FUTURE_REPOINTS,
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The range calculation can produce invalid negative or zero repoint numbers. When repoint - HI_GOODTIMES_NUM_FUTURE_REPOINTS is less than 1 (e.g., repoint=1 and NUM_FUTURE=2), the range will include repoint 0 or negative numbers, which are likely invalid. Add validation to ensure target_repoint values are >= 1, or use max(1, repoint - HI_GOODTIMES_NUM_FUTURE_REPOINTS) as the range start to prevent generating jobs for invalid repoints.

Suggested change
repoint - dependency.HI_GOODTIMES_NUM_FUTURE_REPOINTS,
max(1, repoint - dependency.HI_GOODTIMES_NUM_FUTURE_REPOINTS),

Copilot uses AI. Check for mistakes.
if call_args[0][1]["descriptor"] == "45sensor-goodtimes"
]

# Should have 3 calls for repoints 1, 2, 3
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says "Should have 3 calls" but the assertion checks for 4 calls. Update the comment to match the assertion.

Suggested change
# Should have 3 calls for repoints 1, 2, 3
# Should have 4 calls for repoints 1, 2, 3, 4

Copilot uses AI. Check for mistakes.
Copy link
Contributor

@lacoak21 lacoak21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just a couple of questions related to missing repointing data.

@patch(
"sds_data_manager.lambda_code.SDSCode.pipeline_lambdas.batch_starter.submit_all_jobs"
)
def test_hi_goodtimes_multi_repoint_trigger(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth it to add a test where the job is not triggered due to one of the l1b de repoints missing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. I started working on this and quickly realized that there is no checking that the coverage by the retrieved dependencies is correct. I opened another PR to implement that checking.
#1041

}
],
)
def test_get_jobs_hi_goodtimes_multi_repoint(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would happen if you called it like

science_files = get_files( session, dependency=dep, start_date=datetime(2024, 1, 1), end_date=datetime(2024, 1, 2), repoint=[1, 2], )

And there was only data for repoint 1?

@subagonsouth subagonsouth marked this pull request as draft December 17, 2025 22:37
# Special handling for Hi Goodtimes - needs L1B DE from multiple repoints
# Pass a list of repoints instead of a single repoint
repoint_param = repoint
if (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a duplicate if-statement from batch starter, is it possible to combine the checks into one place?

submit_all_jobs(
session,
job,
trigger_start_time,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the start and end time need to be changed to match repoint?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not here. This is just submitting a potential job for each of the repoints that the trigger file would be used in. It is in the dependency code where the start/end time need to get updated when gathering upstream dependencies for a job.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Create custom handling for hi goodtimes jobs

3 participants