
Conversation

@adrien-laposta (Contributor)

The purpose of this PR is to change the implementation of data-dependent filters in the current pipeline. So far, only T-to-P leakage deprojection has required a specific treatment outside of the processing-pipeline structure.

This PR solves the issue for T-to-P leakage and for any future process that needs access to time-ordered data that cannot be stored in the preprocessing database.

The intended use is:

  • Before filtering a set of simulations, we first load the data AxisManager to get access to the time-ordered data:
data_aman = pu.multilayer_load_and_preprocess(
    obs_id,
    configs_init,
    configs_proc,
    meta=meta,
    logger=logger,
    stop_for_sims=True
)

The new argument stop_for_sims returns a dict of AxisManagers that saves the data before every step needed to filter simulations. As an example, for T-to-P leakage, data_aman will be

{(13, 'subtract_t2p'): AxisManager(timestamps[samps], ancil*[samps],..., bin_az_samps:IndexAxis(1351))}
  • This dictionary is then passed to the multilayer_load_and_preprocess_sim function, which will use data_aman for every process whose configuration sets the associated flag, use_data_aman.
  • As explained above, this solution introduces a new boolean flag in the configuration file structure, use_data_aman, to indicate which processes require access to the data. How the data is used is then specified in the individual process definitions, where I have already implemented the T2P leakage case.
  • Finally, I also changed the behavior of the skip_on_sims argument, which no longer defaults to False, since that default caused some confusion in the past.
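The flow described above can be sketched in plain Python (toy stand-ins, not the sotodlib API; `Step`, `run_until_stops`, and `filter_sim` are illustrative names):

```python
from dataclasses import dataclass

@dataclass
class Step:
    """Toy stand-in for one preprocessing process config."""
    name: str
    func: object                 # callable acting on a dict "AxisManager"
    use_data_aman: bool = False  # flag proposed in this PR

def run_until_stops(aman, pipeline):
    """Data pass with stop_for_sims=True: run the pipeline once,
    snapshotting the state just before each flagged step."""
    data_amans = {}
    loc = dict(aman)  # toy stand-in for aman.copy()
    for step, proc in enumerate(pipeline):
        if proc.use_data_aman:
            data_amans[(step, proc.name)] = dict(loc)
        proc.func(loc)
    return data_amans

def filter_sim(sim, pipeline, data_amans):
    """Sim pass: flagged steps receive the matching data snapshot."""
    for step, proc in enumerate(pipeline):
        if proc.use_data_aman:
            proc.func(sim, data_aman=data_amans[(step, proc.name)])
        else:
            proc.func(sim)
```

The dict keys mirror the `(step, process.name)` keys shown above; in the real pipeline the step functions are the sotodlib preprocessing processes.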

@adrien-laposta adrien-laposta marked this pull request as draft October 10, 2025 14:48
@kwolz (Contributor) commented Oct 27, 2025

Working example on a single-signal atomic with v3 preprocessing and fp_thin=8, as usual for sims. This compares the old solution (rerunning the T2P template until the end of preprocessing_init, before applying azss subtraction) with the new solution. Shown here are Q_old, Q_new, U_old, U_new, Q_diff, U_diff. The difference is expected to come from building T2P from data with azss subtracted (new, correct) vs. without. It is at the percent level.
[Images: Q_old, Q_new, U_old, U_new, Q_diff, U_diff maps]

@kwolz left a comment

Looks very good overall. Maybe @msilvafe and @mmccrackan have more comments.

out_amans = {}
loc_aman = aman.copy()
for (step, name), pipe in pipes.items():
    pipe.run(loc_aman, aman.preprocess, select=False)

Here you ensure that no pipeline step is run twice, avoiding computational overhead. Nice!
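A toy version of this loop (illustrative names, plain lists instead of AxisManagers) shows the property: each sub-pipe advances the same `loc_aman`, so every step executes exactly once even when several snapshots are taken:

```python
from collections import Counter

calls = Counter()

def make_step(name):
    """Return a step that records how often it ran."""
    def step(aman):
        calls[name] += 1
        aman.append(name)
    return step

# Sub-pipes between consecutive use_data_aman stops (hypothetical step names).
pipes = {
    (2, "azss"): [make_step("detrend"), make_step("calibrate")],
    (5, "subtract_t2p"): [make_step("azss"), make_step("demod")],
}

out_amans = {}
loc_aman = []
for (step, name), pipe in pipes.items():
    for fn in pipe:            # advance the SAME aman through this sub-pipe
        fn(loc_aman)
    out_amans[step, name] = list(loc_aman)  # snapshot before the stop's step
```

Each later snapshot reuses the earlier work instead of rerunning it from scratch.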

@msilvafe commented Nov 4, 2025

[Image: Q/U difference maps quoted from above]

What is the actual amplitude of the residuals here? The colorbar is 1%, but it seems like it is not rescaled (i.e. the residuals are a fraction of the full colorbar range). And where does the difference come from?

@kwolz commented Nov 5, 2025

The difference comes from the fact that before, we didn't apply azss subtraction to the AxisManager we then used for T-to-P template subtraction; now we do. This is needed because we apply azss and t2p in that order on simulations. The relative difference in map amplitude is ~1% for Q and ~0.4% for U in this particular case. @adrien-laposta has done transfer-function tests and found no difference.

@msilvafe commented Nov 5, 2025

> The difference comes from the fact that before we didn't apply azss subtraction on the AxisManager we then used for T-to-P template subtraction, now we do. This is needed because we apply azss and t2p in this order on simulations. The relative difference in map amplitude is 1% for Q and ~0.4% for U in this particular case. @adrien-laposta has done transfer function tests and found no difference.

Ok, but a fair comparison is probably an atomic map on exactly the same preprocessing with and without these commits. Can you make such a comparison? I'm finishing the rest of the review now.

@kwolz commented Nov 5, 2025

I could do that by inserting the azss step "by hand" on the data_aman after T-to-P. Probably the cleanest way would be to use a custom yaml_init_t2p that does that on the template only. Can do that.

@msilvafe left a comment

Some inline questions/comments.

has ingested flags and other information into ``proc_aman``.
data_amans: dict (Optional)
    A dictionary of AxisManagers with keys (step, process.name)
    filled with AxisManager processed up to step-1. This is used

Unclear what "step-1" means here.

@adrien-laposta (author) replied:

Sorry! Here, step is the index of the process in a given config file; step-1 is then the process preceding it (e.g. the step before getting the T2P template).

Comment on lines +425 to +429
stop_for_sims: bool
    Optional. If True, will stop before each step of the pipeline
    with the flag `use_data_aman` set to True. The intended use is
    to prepare all necessary data products that cannot be stored in
    the preprocessing database, to process simulations.

The data aman holds a full copy of the data for every preprocess step specified with use_data_aman. This could balloon really quickly if the config file is improperly configured; even two extra copies of the data get pretty big. Perhaps there should be a check: if more than 2 or 3 steps specify the data_aman, warn the user and force them to acknowledge that they are about to launch a job with ~2-3x the normal memory usage?
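One possible shape for such a guard (a sketch only; the threshold, function name, and `force` flag are assumptions, not part of this PR):

```python
import warnings

MAX_DATA_AMAN_STOPS = 3  # assumed threshold, not from the PR

def check_data_aman_count(process_configs, force=False):
    """Warn or refuse when many steps request a full data snapshot,
    since each use_data_aman stop copies the whole AxisManager."""
    n = sum(1 for cfg in process_configs if cfg.get("use_data_aman", False))
    if n > MAX_DATA_AMAN_STOPS:
        if not force:
            raise RuntimeError(
                f"{n} steps set use_data_aman; expect roughly {n}x the normal "
                "memory usage. Pass force=True to acknowledge and proceed."
            )
        warnings.warn(f"Proceeding with {n} full-data snapshots.")
    return n
```

The check would run once, right after the config files are parsed, before any data is loaded.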

@adrien-laposta (author) replied:

Yes, I only tested it for a couple of stops in the pipe. I think this is related to your other comments about keeping only what's necessary in the AxisManagers; I will propose something more memory-efficient.

Comment on lines 491 to 495
if stop_for_sims:
    batch_idx = [
        (step, process.name)
        for step, process in enumerate(pipe_proc)
        if process.use_data_aman

why does this only include pipe_proc and ignore pipe_init?

@adrien-laposta (author) replied:

This can be generalized to pipe_init. This adaptation was motivated by the way we were dealing with T2P (in proc)

for (step, name), pipe in pipes.items():
    pipe.run(loc_aman, aman.preprocess, select=False)
    out_amans[step, name] = loc_aman.copy()
return out_amans

Could this just be a dictionary of numpy arrays or restricted AxisManagers that only include the fields you need to reduce memory overhead?
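A sketch of that idea (the `REQUIRED_FIELDS` registry and field names are hypothetical; the real fields would have to be declared per process somewhere in code):

```python
from copy import deepcopy

# Hypothetical registry: which fields each data-dependent process
# actually needs from the snapshot.
REQUIRED_FIELDS = {
    "subtract_t2p": ("demodQ", "demodU", "dsT"),
}

def restrict_snapshot(aman_dict, process_name):
    """Keep only the arrays a given filter needs, copying them so
    the full AxisManager can be freed afterwards."""
    fields = REQUIRED_FIELDS.get(process_name)
    if fields is None:
        return deepcopy(aman_dict)  # unknown process: full-copy fallback
    return {k: deepcopy(aman_dict[k]) for k in fields}
```

The trade-off discussed in the thread applies here too: the field lists have to live somewhere, and the config files may be too verbose a place for them.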

@adrien-laposta (author) commented Nov 24, 2025

I was thinking of doing this, but the fields to keep will depend on the filter and will need to be defined somewhere. I find it a bit too verbose to define in the config files; I'm happy to take suggestions on this one. Also, this implementation follows the existing behaviour, i.e. loading a full data AxisManager in the filtering script and looping over all seeds to get filtered atomics.

Comment on lines -583 to -597

if t2ptemplate_aman is not None:
    # Replace Q,U with simulated timestreams
    t2ptemplate_aman.wrap("demodQ", aman.demodQ, [(0, 'dets'), (1, 'samps')], overwrite=True)
    t2ptemplate_aman.wrap("demodU", aman.demodU, [(0, 'dets'), (1, 'samps')], overwrite=True)

    t2p_aman = t2pleakage.get_t2p_coeffs(
        t2ptemplate_aman,
        merge_stats=False
    )
    t2pleakage.subtract_t2p(
        aman,
        t2p_aman,
        T_signal=t2ptemplate_aman.dsT
    )

Is this now called in your run script? Do you have examples of such scripts in another repo which I can look at?

@adrien-laposta (author) replied:

Before, I was forced to hardcode it there for T2P leakage. The more general implementation in this PR lets me move this piece of code into processes.py (see here), which means that each data-dependent process will need its own implementation in the corresponding process.
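The per-process hook might take a shape like this (toy classes, dict "AxisManagers", and a placeholder coefficient; not the actual processes.py implementation):

```python
class SubtractT2P:
    """Toy sketch of a data-dependent process: in the sim branch it
    pulls the T2P template input from the data snapshot rather than
    from the simulation itself."""
    name = "subtract_t2p"
    use_data_aman = True  # flag proposed in this PR

    def process(self, aman, data_aman=None):
        if data_aman is not None:
            # sim path: T signal comes from the data snapshot
            t_signal = data_aman["dsT"]
        else:
            # data path: everything comes from the AxisManager itself
            t_signal = aman["dsT"]
        coeff = 0.01  # placeholder for get_t2p_coeffs(...)
        aman["demodQ"] = [q - coeff * t for q, t in zip(aman["demodQ"], t_signal)]
```

The pipeline would call `process(sim_aman, data_aman=data_amans[step, name])` for flagged steps and `process(aman)` otherwise.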

@kwolz commented Nov 8, 2025

> I could do that by inserting the azss step "by hand" on the data_aman after T-to-P. Probably the cleanest way would be to use a custom yaml_init_t2p that does that on the template only. Can do that

Here is a map-level diff of the three methods mentioned: ISOv3 (ignoring AZsub completely in the T2P template), ALP (this PR), KW (ISOv3 with ad-hoc AZsub before T2P estimation). The diff ALP-KW is another ~2 orders of magnitude smaller than the original bias reported above; not sure where that comes from. See also this repo folder, where you can rerun all of this.
[Image: map-level diff of ISOv3, ALP, and KW]
