In some time series, it is not reasonable to treat missing values as missing at random. In my data, whether an observation was recorded appeared to depend on the state of the underlying time series, with dropouts clustered around extreme values or rapid changes.
PyMC supports missing data by treating missing observations as latent variables, which works well when missingness is unrelated to the latent process. In my case, assuming missing at random led to posteriors that smoothed through unstable regions and showed lower uncertainty exactly where the system seemed most volatile.
Conceptually, this suggests two coupled processes: a latent time series $y_t$ that exists at all time points, and a missingness indicator $m_t$ whose probability depends on $y_t$. In this setting, the absence of an observation itself carries information about the latent state.
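To make the generative story concrete, here is a minimal simulation sketch using only the Python standard library. It is not the proposed notebook. The AR(1) latent process, the logistic link, and the parameter names `phi`, `sigma`, `alpha`, and `beta` are all assumptions chosen for illustration.

```python
import math
import random


def simulate(T=2000, phi=0.9, sigma=1.0, alpha=-2.0, beta=1.5, seed=0):
    """Simulate a latent series y_t and a state-dependent missingness mask m_t.

    Assumed (hypothetical) model, for illustration only:
      y_t = phi * y_{t-1} + Normal(0, sigma)
      P(m_t = 1 | y_t) = sigmoid(alpha + beta * |y_t|)
    """
    rng = random.Random(seed)
    y, m = [], []
    prev = 0.0
    for _ in range(T):
        # The latent value exists at every time point, observed or not.
        prev = phi * prev + rng.gauss(0.0, sigma)
        # The probability of a dropout rises with the magnitude of y_t.
        p_missing = 1.0 / (1.0 + math.exp(-(alpha + beta * abs(prev))))
        y.append(prev)
        m.append(1 if rng.random() < p_missing else 0)
    return y, m


y, m = simulate()

# Compare the typical magnitude of the latent state where data were
# recorded (m_t = 0) against where they were dropped (m_t = 1).
abs_observed = [abs(v) for v, flag in zip(y, m) if flag == 0]
abs_missing = [abs(v) for v, flag in zip(y, m) if flag == 1]
mean_abs_observed = sum(abs_observed) / len(abs_observed)
mean_abs_missing = sum(abs_missing) / len(abs_missing)

print(f"mean |y_t| where observed: {mean_abs_observed:.2f}")
print(f"mean |y_t| where missing:  {mean_abs_missing:.2f}")
```

Under these assumptions the recorded values under-represent the extremes, which is why a model that ignores $m_t$ would tend to pull the latent series toward the calmer, observed regime.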
This has been discussed in the PyMC main repo (pymc-devs/pymc#8112), and it was suggested that this would fit well as a small example in pymc-examples. The goal would be a minimal, educational notebook focusing on the generative story, without relying on NaN auto-imputation or proposing new API surface.
If there are no objections, I'm happy to open a PR; the example is ready on my side.