polars_readstat

Polars plugin for SAS (.sas7bdat), Stata (.dta), and SPSS (.sav/.zsav) files.

The Python package wraps the Rust core in polars_readstat_rs and exposes a Polars-first API. The project includes cross-library parity tests and roundtrip checks to reduce regressions.

The Rust engine is generally faster than the legacy engines, but performance varies by file shape and options. If you need the legacy C/C++ engine, install version 0.11.1 (polars-readstat==0.11.1).

Why use this?

In project benchmarks, the new Rust-backed engine is typically faster than pandas/pyreadstat on large SAS/Stata files, especially for subset/filter workloads.
It avoids the older C/C++ toolchain complexity and ships as standard Python wheels.
The API is Polars-first (scan_readstat, read_readstat, write_readstat, write_sas_csv_import).

Install

pip install polars-readstat

Core API

1) Lazy scan

import polars as pl
from polars_readstat import scan_readstat

lf = scan_readstat("/path/file.sas7bdat", preserve_order=True)
df = lf.select(["SERIALNO", "AGEP"]).filter(pl.col("AGEP") >= 18).collect()

2) Getting metadata

from polars_readstat import ScanReadstat

reader = ScanReadstat(path="/path/file.sav")
schema = reader.schema      # polars.Schema
metadata = reader.metadata  # dict with file info and per-column details
lf = reader.df              # LazyFrame — same as calling scan_readstat(path)

metadata is a dict with a columns list. Each column entry includes:

"name" — column name
"label" — variable label (description), if present
"value_labels" — dict mapping coded values to label strings, if present

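As a minimal sketch of working with that structure, the snippet below pulls variable labels and value labels out of a metadata dict. The sample dict is hypothetical and hand-built to match the description above (the SEX column and its codes are invented for illustration); a real one would come from ScanReadstat(path=...).metadata and may contain more keys. The key types inside "value_labels" and whether a missing label is an absent key or None are assumptions, so the helpers use .get() to tolerate both.

```python
from typing import Any

# Hypothetical metadata shaped like the description above; not real output.
metadata: dict[str, Any] = {
    "columns": [
        {"name": "AGEP", "label": "Age"},
        {"name": "SEX", "label": "Sex", "value_labels": {1: "Male", 2: "Female"}},
        {"name": "SERIALNO"},  # no label and no value labels
    ]
}


def variable_labels(meta: dict[str, Any]) -> dict[str, str]:
    """Map column name -> variable label, skipping columns without one."""
    return {
        col["name"]: col["label"]
        for col in meta.get("columns", [])
        if col.get("label")
    }


def value_labels(meta: dict[str, Any]) -> dict[str, dict]:
    """Map column name -> {coded value: label string} for labelled columns."""
    return {
        col["name"]: col["value_labels"]
        for col in meta.get("columns", [])
        if col.get("value_labels")
    }


labels = variable_labels(metadata)
codes = value_labels(metadata)
print(labels)
print(codes)
```

The resulting dicts can be used to rename columns or decode coded values after collecting the LazyFrame.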
3) Write (Experimental)

Writing support is experimental and compatibility varies across tools. Stata roundtrip tests are included; SPSS roundtrip coverage is limited. Please report issues.

from polars_readstat import write_readstat, write_sas_csv_import

write_readstat(df, "/path/out.dta")
write_readstat(df, "/path/out.sav")
write_sas_csv_import(df, "/path/out/sas_bundle", dataset_name="my_data")

write_readstat supports Stata (.dta) and SPSS (.sav).
Use write_sas_csv_import for SAS-ingestible output (.csv + .sas import script). Binary .sas7bdat writing is not currently supported.

Docs

View the docs at https://jrothbaum.github.io/polars_readstat/ for more information on the options you can pass to the scan and write functions.

Benchmark

Benchmarks compare four scenarios: 1) load the full file, 2) load a subset of columns (Subset: True), 3) filter to a subset of rows (Filter: True), and 4) load a subset of columns and filter to a subset of rows (Subset: True, Filter: True).

Benchmark context:

Machine: AMD Ryzen 7 8845HS (8 cores, 16 threads), 14 GiB RAM, Linux Mint 22
Storage: external SSD
Last run: polars-readstat Rust engine (v0.12.4) on February 24, 2026; comparison-library timings for SAS/Stata (v0.11.1) on August 31, 2025
Versions tested: polars-readstat 0.12.4 (new Rust engine) against polars-readstat 0.11.1 (prior C++ and C engines), pandas, and pyreadstat
Method: wall-clock timings via Python time.time()
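A wall-clock measurement of this kind can be sketched as follows. Only the use of time.time() comes from the context above; the helper name time_call, the repeats parameter, and taking the minimum over runs are assumptions made for illustration, and the project's own scripts may time a single run or aggregate differently. The sample workload is a stand-in for a file read.

```python
import time


def time_call(fn, *args, repeats=3, **kwargs):
    """Return the fastest wall-clock duration in seconds over `repeats` runs."""
    durations = []
    for _ in range(repeats):
        start = time.time()
        fn(*args, **kwargs)
        durations.append(time.time() - start)
    return min(durations)


def sample_workload(n):
    """Stand-in for a read call; sums the first n squares."""
    return sum(i * i for i in range(n))


elapsed = time_call(sample_workload, 100_000)
print(f"{elapsed:.4f} s")
```

For short intervals, time.perf_counter() is usually a better clock than time.time(); the sketch keeps time.time() only to match the stated method.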

Compared to pandas and pyreadstat (using read_file_multiprocessing for parallel processing in pyreadstat)

SAS

All times are in seconds (speedup relative to pandas in parentheses).

Library	Full File	Subset: True	Filter: True	Subset: True, Filter: True
polars_readstat New rust engine	0.72 (2.9×)	0.04 (51.5×)	1.04 (2.9×)	0.04 (52.5×)
polars_readstat engine="cpp" (fastest for 0.11.1)	1.31 (1.6×)	0.09 (22.9×)	1.56 (1.9×)	0.09 (23.2×)
pandas	2.07	2.06	3.03	2.09
pyreadstat	10.75 (0.2×)	0.46 (4.5×)	11.93 (0.3×)	0.50 (4.2×)
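The speedups in parentheses are consistent with dividing the pandas time by each library's time, so values above 1 mean faster than pandas. The sketch below recomputes the "Subset: True" column from the rounded timings in the table; the dictionary keys and the helper name speedup_vs are illustrative only. The published ratios were presumably computed from unrounded timings, so recomputing from the table can differ in the last digit for some cells.

```python
# Timings in seconds, copied from the "Subset: True" column of the SAS table.
subset_seconds = {
    "polars_readstat (rust)": 0.04,
    "polars_readstat (cpp)": 0.09,
    "pandas": 2.06,
    "pyreadstat": 0.46,
}


def speedup_vs(baseline, timings):
    """Return baseline_time / library_time for every non-baseline library."""
    base = timings[baseline]
    return {
        name: round(base / seconds, 1)
        for name, seconds in timings.items()
        if name != baseline
    }


speedups = speedup_vs("pandas", subset_seconds)
for name, ratio in speedups.items():
    print(f"{name}: {ratio}x")
```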

Stata

All times are in seconds (speedup relative to pandas in parentheses).

Library	Full File	Subset: True	Filter: True	Subset: True, Filter: True
polars_readstat New rust engine	0.17 (6.7×)	0.12 (9.8×)	0.24 (4.1×)	0.11 (8.7×)
polars_readstat engine="readstat" (the only option for 0.11.1)	1.80 (0.6×)	0.27 (4.4×)	1.31 (0.8×)	0.29 (3.3×)
pandas	1.14	1.18	0.99	0.96
pyreadstat	7.46 (0.2×)	2.18 (0.5×)	7.66 (0.1×)	2.24 (0.4×)

SPSS

All times are in seconds (speedup relative to pandas in parentheses).

Library	Full File	Subset: True	Filter: True	Subset: True, Filter: True
polars_readstat New rust engine	0.22 (6.6×)	0.15 (9.1×)	0.25 (6.0×)	0.26 (4.5×)
pandas	1.46	1.36	1.49	1.16

Detailed benchmark notes and dataset descriptions are in BENCHMARKS.md.

Tests run

Test coverage includes:

Cross-library comparisons on the pyreadstat and pandas test data, checking results against polars-readstat==0.11.1, pyreadstat, and pandas.
Stata/SPSS read/write roundtrip tests.
Large-file read/write benchmark runs on real-world data (results in the Benchmark section above).

If you want to run the same checks locally, helper scripts and tests are in scripts/ and tests/.
