Skip to content

Fix #556: Validate observed state name order in build_statespace_graph#650

Open
dhruvkachhela wants to merge 2 commits intopymc-devs:mainfrom
dhruvkachhela:fix-issue--#556
Open

Fix #556: Validate observed state name order in build_statespace_graph#650
dhruvkachhela wants to merge 2 commits intopymc-devs:mainfrom
dhruvkachhela:fix-issue--#556

Conversation

@dhruvkachhela
Copy link
Copy Markdown

@dhruvkachhela dhruvkachhela commented Feb 17, 2026

Description

This PR fixes Issue #556 by adding a validation check to ensure that the column order of an input pandas DataFrame matches the model's observed_state_names.

Problem

In State Space models, mapping is positional. If a user provides a DataFrame with swapped columns, the data is mapped to the wrong latent states without warning.

Changes

  • Updated PyMCStateSpace.build_statespace_graph to validate pd.DataFrame columns.
  • Added a regression test in tests/statespace/core/test-issue-#556.py.
  • The model now raises a ValueError with a descriptive message if the order is incorrect.

Verification

  • Verified that swapped columns trigger a ValueError.
  • Verified that correct ordering and NumPy arrays pass validation.

Fixes #556

…pace_graph

### Description
This PR adds a validation check to `build_statespace_graph` to ensure that the column order of an input pandas DataFrame matches the model's `observed_state_names`.

### Problem
In State Space models, the observation matrix $Z$ maps latent states to observed data based on their positional index. If a user provides a DataFrame with the correct column names but in the wrong order, the model would previously map the data to the wrong state variables without raising an error. This lead to silent, incorrect inference.

### Changes
- Updated `PyMCStateSpace.build_statespace_graph` to include a validation step for pandas DataFrames.
- The check compares `data.columns` against `self.observed_state_names`.
- Raises a `ValueError` if there is a mismatch in names or ordering, explicitly showing the user the expected order vs the received order.

### Verification
Verified the logic using a Mock subclass of `PyMCStateSpace`. Tests confirmed that:
1. Swapped columns correctly trigger a `ValueError`.
2. Correct column ordering allows the graph building to proceed.
3. Non-pandas inputs (e.g., NumPy arrays) are unaffected and continue to work as expected.

Fixes pymc-devs#556
This commit adds a dedicated test file test-issue-pymc-devs#556.py to verify the fix for Issue pymc-devs#556.

The test ensures that:

A ValueError is raised when DataFrame columns are in the wrong order.

A ValueError is raised when columns are missing.

Standard NumPy/array inputs continue to work as expected.

This provides automated verification that the validation logic in statespace.py works correctly.
@jessegrabowski
Copy link
Copy Markdown
Member

Hi @dhruvkachhela . llm generated contributions are welcome but it appears you made no effort to validate what it generated, or make it conform to the standards of the repository. This will not be merged as-is, and is simply a burden on the maintainers to review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Check that order of observed states coincide between build_statespace_graph and observed_state_names

2 participants