ENH: Migrate numpydoc validation to sphinx #64595

Open
tuhinsharma121 wants to merge 6 commits into pandas-dev:main from tuhinsharma121:migrate-numpydoc-validation-to-sphinx

@tuhinsharma121 (Contributor) commented Mar 14, 2026

Summary

  • Removes the standalone scripts/validate_docstrings.py (480 lines) and its test suite scripts/tests/test_validate_docstrings.py by integrating all pandas-specific docstring checks directly into the Sphinx build via a monkey-patch of numpydoc.validate in doc/source/conf.py.
  • Enables numpydoc_validation_checks = {"all"} with an explicit numpydoc_validation_exclude set of 171 fully-qualified object names (replacing the previous 25 regex wildcards), so every public API object is validated during the documentation build.
  • Removes the separate ci/code_checks.sh docstrings CI step and its corresponding GitHub Actions workflow entry, since validation is now handled entirely by Sphinx.

Motivation

(following up on #64565 (comment))

Previously, pandas maintained two independent validation paths: numpydoc's built-in Sphinx checks and a custom standalone script (scripts/validate_docstrings.py) that enforced four pandas-specific rules (GL04, PD01, SA05, EX04). This had several downsides:

  1. Duplicate infrastructure -- two separate validation tools, each with their own exclusion lists and test suites.
  2. Inconsistent results -- a docstring could pass the standalone script but fail during the Sphinx build, or vice versa.
  3. Maintenance burden -- nearly 1,000 lines of custom validation code and tests to maintain.

By monkey-patching numpydoc.validate.validate in conf.py, all checks (standard numpydoc + pandas-specific) now run in a single pass during the Sphinx build. Contributors only need one command to validate a docstring.
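The wrapping pattern described above can be sketched roughly as follows. This is not the code from this PR: a stand-in module (`stub`) replaces `numpydoc.validate` so the sketch runs without numpydoc installed, and the object names, docstrings, the `pandas_checks` helper, and the message wording are all illustrative assumptions. Only one of the four custom codes (PD01) is modeled.

```python
import functools
import types

# Stand-in for `numpydoc.validate` (assumption: only ERROR_MSGS and validate
# are modeled, and the result is a dict with "docstring" and "errors" keys).
stub = types.ModuleType("stub_validate")
stub.ERROR_MSGS = {"GL08": "The object does not have a docstring"}

# Made-up object names and docstrings, purely for demonstration.
DOCS = {
    "demo.good": "Return the mean.",
    "demo.bad": "values : array_like",
    "demo.empty": "",
}


def _upstream_validate(obj_name):
    """Pretend upstream validator: flags only a missing docstring."""
    doc = DOCS[obj_name]
    errors = [] if doc else [("GL08", stub.ERROR_MSGS["GL08"])]
    return {"docstring": doc, "errors": errors}


stub.validate = _upstream_validate

# Illustrative wording for one custom code.
CUSTOM_MSGS = {"PD01": "Use 'array-like' rather than 'array_like' in docstrings."}


def pandas_checks(result):
    """Hypothetical project-specific check returning (code, message) tuples."""
    if "array_like" in result["docstring"]:
        return [("PD01", CUSTOM_MSGS["PD01"])]
    return []


def install_patch(module):
    """Register the custom codes and wrap ``module.validate`` exactly once."""
    if getattr(module.validate, "_patched", False):
        return  # already wrapped; avoid stacking wrappers on repeated calls
    module.ERROR_MSGS.update(CUSTOM_MSGS)
    original = module.validate

    @functools.wraps(original)
    def wrapper(obj_name, *args, **kwargs):
        result = original(obj_name, *args, **kwargs)
        # Append the custom errors to whatever the upstream validator found.
        result["errors"] = list(result["errors"]) + pandas_checks(result)
        return result

    wrapper._patched = True
    module.validate = wrapper


def codes(obj_name):
    """Return just the error codes reported for an object name."""
    return [code for code, _ in stub.validate(obj_name)["errors"]]


install_patch(stub)
install_patch(stub)  # second call is a no-op thanks to the _patched guard
print(codes("demo.bad"), codes("demo.empty"), codes("demo.good"))
```

The guard flag matters because a Sphinx `conf.py` may be evaluated more than once in a process; without it, each evaluation would wrap the previous wrapper and report the custom errors repeatedly.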

Changes

doc/source/conf.py

  • Added monkey-patch that injects four custom error codes (GL04, PD01, SA05, EX04) into numpydoc.validate.ERROR_MSGS and wraps the validate function.
  • Added numpydoc_validation_checks = {"all"} to enable comprehensive validation.
  • Added numpydoc_validation_exclude with 171 explicit fully-qualified entries, replacing the previous regex wildcards. The entries cover objects that legitimately cannot have numpydoc-compliant docstrings (e.g., Jinja2 template attributes) or that are otherwise missed out (e.g., offset .base/.rollback/.rollforward methods, ExtensionDtype stubs).
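One point worth checking when moving from wildcards to explicit names: numpydoc's documentation describes numpydoc_validation_exclude entries as `re`-syntax patterns. Assuming the entries are joined into one pattern and searched within each object name, an unanchored name also matches longer names that contain it. The sketch below illustrates that behavior with two made-up entries; `compile_exclude` is a hypothetical helper, and whether the PR's 171 entries are anchored is not stated here.

```python
import re

# Illustrative entries only; the PR's actual list is not reproduced here.
exclude = {"pandas.Series.str", "pandas.Timestamp.min"}


def compile_exclude(patterns, exact=False):
    """Join patterns into one regex; optionally escape and anchor each entry."""
    if exact:
        patterns = [rf"^{re.escape(p)}$" for p in patterns]
    return re.compile("|".join(sorted(patterns)))


loose = compile_exclude(exclude)               # entries used as raw regexes
strict = compile_exclude(exclude, exact=True)  # entries matched literally

for name in ("pandas.Series.str", "pandas.Series.str.upper", "pandas.Timestamp.minute"):
    print(name, bool(loose.search(name)), bool(strict.search(name)))
```

With the loose form, `pandas.Series.str.upper` and `pandas.Timestamp.minute` would both be excluded from validation as a side effect; the anchored form excludes only the exact names listed.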

scripts/validate_docstrings.py (deleted)

scripts/tests/test_validate_docstrings.py (deleted)

ci/code_checks.sh

  • Removed the ### DOCSTRINGS ### section that invoked scripts/validate_docstrings.py.
  • Removed docstrings from the usage instructions and argument validation.
  • Removed the now-unused BASE_DIR variable.

.github/workflows/code-checks.yml

  • Removed the Run docstring validation step (ci/code_checks.sh docstrings).

doc/source/development/contributing_documentation.rst

  • Replaced references to scripts/validate_docstrings.py with the single recommended command for validating a docstring:
    python doc/make.py --warnings-are-errors --no-browser --single pandas.DataFrame.mean
  • Updated the description of docstring validation to explain that all checks now run during the Sphinx build.

doc/source/development/contributing_codebase.rst

  • Updated the description of ci/code_checks.sh to reflect that standard numpydoc validation is now enforced during the Sphinx build.

Test plan

  • Run python doc/make.py --warnings-are-errors --no-browser --single pandas.DataFrame.mean and verify it completes without errors.
  • Deliberately introduce an SA05 violation (e.g., add pandas.Series.rename to a See Also section) and verify the single-page build catches it.
  • Run ./ci/code_checks.sh and verify all remaining checks (code, doctests, single-docs, notebooks) pass without the removed docstrings step.
  • Run a full doc build (cd doc && python make.py --num-jobs 1 html) and verify no regressions in validation warnings.
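For reviewers who want to see what the SA05 step in the test plan is exercising, here is a simplified, standalone sketch of such a check. It is an assumption about the rule's intent (a See Also entry should not carry the `pandas.` prefix), not the implementation in this PR; the section parser is deliberately naive and the message wording is illustrative.

```python
def see_also_names(docstring):
    """Collect entry names from a numpydoc 'See Also' section (simplified)."""
    names, in_section = [], False
    lines = [line.strip() for line in docstring.splitlines()]
    for i, line in enumerate(lines):
        underline = lines[i + 1] if i + 1 < len(lines) else ""
        if line and underline and set(underline) == {"-"}:
            # Any text line followed by a dashed underline starts a section.
            in_section = line == "See Also"
        elif in_section and " : " in line:
            # Entries look like "name : description".
            names.append(line.split(" : ", 1)[0])
    return names


def sa05_errors(docstring):
    """Flag See Also entries that start with the redundant 'pandas.' prefix."""
    errors = []
    for name in see_also_names(docstring):
        if name.startswith("pandas."):
            short = name[len("pandas."):]
            errors.append(
                ("SA05", f"{name} in See Also does not need the pandas prefix, use {short} instead")
            )
    return errors


DOC = """
Rename the axis.

See Also
--------
pandas.Series.rename : Alter Series index labels or name.
DataFrame.rename : Alter DataFrame axis labels.

Examples
--------
>>> ser = pd.Series([1, 2])
"""

for code, message in sa05_errors(DOC):
    print(code, message)
```

Only the first entry is reported, since `DataFrame.rename` already omits the prefix.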

@tuhinsharma121 force-pushed the migrate-numpydoc-validation-to-sphinx branch from 98f13e6 to 5e7abea, March 14, 2026 17:55
@tuhinsharma121 marked this pull request as ready for review, March 14, 2026 20:05
@tuhinsharma121 marked this pull request as draft, March 15, 2026 06:18
@tuhinsharma121 force-pushed the migrate-numpydoc-validation-to-sphinx branch 2 times, most recently from a60e90b to 87bb56a, March 15, 2026 08:50
@tuhinsharma121 marked this pull request as ready for review, March 15, 2026 09:44
@tuhinsharma121 force-pushed the migrate-numpydoc-validation-to-sphinx branch from 87bb56a to 1af9932, March 16, 2026 05:48

@mroeschke (Member) left a comment

Thanks! Generally looking good

@mroeschke added the Docs label, Mar 16, 2026
tuhinsharma121 and others added 6 commits March 16, 2026 23:00
Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
@tuhinsharma121 force-pushed the migrate-numpydoc-validation-to-sphinx branch from 50c9384 to a60b01b, March 16, 2026 17:30