Fix #556: Validate observed state name order in build_statespace_graph#650
Open
dhruvkachhela wants to merge 2 commits intopymc-devs:mainfrom
Open
Fix #556: Validate observed state name order in build_statespace_graph#650dhruvkachhela wants to merge 2 commits intopymc-devs:mainfrom
dhruvkachhela wants to merge 2 commits intopymc-devs:mainfrom
Conversation
…pace_graph ### Description This PR adds a validation check to `build_statespace_graph` to ensure that the column order of an input pandas DataFrame matches the model's `observed_state_names`. ### Problem In State Space models, the observation matrix $Z$ maps latent states to observed data based on their positional index. If a user provides a DataFrame with the correct column names but in the wrong order, the model would previously map the data to the wrong state variables without raising an error. This lead to silent, incorrect inference. ### Changes - Updated `PyMCStateSpace.build_statespace_graph` to include a validation step for pandas DataFrames. - The check compares `data.columns` against `self.observed_state_names`. - Raises a `ValueError` if there is a mismatch in names or ordering, explicitly showing the user the expected order vs the received order. ### Verification Verified the logic using a Mock subclass of `PyMCStateSpace`. Tests confirmed that: 1. Swapped columns correctly trigger a `ValueError`. 2. Correct column ordering allows the graph building to proceed. 3. Non-pandas inputs (e.g., NumPy arrays) are unaffected and continue to work as expected. Fixes pymc-devs#556
This commit adds a dedicated test file test-issue-pymc-devs#556.py to verify the fix for Issue pymc-devs#556. The test ensures that: A ValueError is raised when DataFrame columns are in the wrong order. A ValueError is raised when columns are missing. Standard NumPy/array inputs continue to work as expected. This provides automated verification that the validation logic in statespace.py works correctly.
Member
|
Hi @dhruvkachhela . llm generated contributions are welcome but it appears you made no effort to validate what it generated, or make it conform to the standards of the repository. This will not be merged as-is, and is simply a burden on the maintainers to review. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
This PR fixes Issue #556 by adding a validation check to ensure that the column order of an input pandas DataFrame matches the model's
observed_state_names.Problem
In State Space models, mapping is positional. If a user provides a DataFrame with swapped columns, the data is mapped to the wrong latent states without warning.
Changes
PyMCStateSpace.build_statespace_graphto validatepd.DataFramecolumns.tests/statespace/core/test-issue-#556.py.ValueErrorwith a descriptive message if the order is incorrect.Verification
ValueError.Fixes #556