Medicare Provider Analysis and Review (MedPAR) Outcomes

This repository provides a workflow that generates tagged outcomes from the MedPAR denominator file using user-defined lists of ICD (International Classification of Diseases) diagnosis codes. The workflow also generates outcome counts per year or per zipcode-year.

About MedPAR dataset

The MedPAR dataset contains detailed records of hospital inpatient and skilled nursing facility (SNF) stays for Medicare beneficiaries in the United States. It includes key information such as admission and discharge dates, billing details, and diagnoses and procedures coded using ICD codes.

Project Overview

The MedPAR Outcomes Processing repository aims to provide a streamlined approach for labeling hospital admissions in the MedPAR dataset using ICD codes. This process is essential for researchers and healthcare analysts who need to filter and analyze hospital data based on specific medical conditions.

Repository Structure

The repository is organized into the following directories and files:

data/: Directory containing subfolders for input and output files. For NSAPH internal processes, data/input/README.md includes the symlink commands for use on the shared secure cluster.
- input/: Contains MedPAR datasets that need to be processed.
- output/: The directory where the processed datasets are saved. These datasets include admission details along with the labels indicating the presence of conditions as defined by the ICD codes.
icd_codes/: Contains YAML files that list the ICD codes used to label and categorize hospital admissions. Each file contains multiple conditions for a given study.
src/: Python scripts for processing the MedPAR data and applying the ICD code lists.
notes/: Includes exploratory notebooks and project-specific details.
README.md: Provides an overview of the project and instructions for usage.
requirements.yml: Conda environment file listing the Python package dependencies and versions required to run the scripts.

Getting Started

Clone the repository and create a conda environment:

git clone https://github.com/<user>/<repo>
cd <repo>

conda env create -f requirements.yml
conda activate <env_name>

Usage

Step 1: Prepare Your Data

Add symlinks to your input and output folders inside the corresponding data/ subfolders.

For example:

export HOME_DIR=$(pwd)

cd $HOME_DIR/data/input/
ln -s <input_path> .

cd $HOME_DIR/data/output/
ln -s <output_path> .

The README.md files inside the /data subfolders contain path documentation for NSAPH internal purposes.
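
For users who prefer to set up the links from Python, the `ln -s` commands above can be approximated with `pathlib`. This is only a sketch: the helper name `link_into` and the directory names used in the demonstration are hypothetical, and the demonstration runs in a temporary directory rather than in the repository itself.

```python
import tempfile
from pathlib import Path


def link_into(data_subdir, target):
    """Create a symlink named after `target` inside `data_subdir`,
    roughly what `cd data_subdir && ln -s <target> .` does."""
    data_subdir = Path(data_subdir)
    target = Path(target)
    link = data_subdir / target.name
    # Skip creation when the link already exists, so the helper can be re-run.
    if not link.is_symlink():
        link.symlink_to(target, target_is_directory=True)
    return link


# Demonstration in a throwaway directory (all paths here are made up).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    source = root / "shared" / "medpar"  # stands in for <input_path>
    source.mkdir(parents=True)
    input_dir = root / "data" / "input"
    input_dir.mkdir(parents=True)

    link = link_into(input_dir, source)
    # The link should exist and resolve to the original folder.
    link_ok = link.is_symlink() and link.resolve() == source.resolve()
    print(link_ok)
```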

Step 2: Define ICD Code Lists

The ICD code lists are central to labeling the admissions. These lists should be defined in YAML format and placed in the icd_codes/ directory. Each file in this directory should represent a specific list of conditions and include the relevant ICD codes.

Step 3: Run the Processing Script

The processing script reads ICD codes from a YAML file, constructs SQL queries to match these codes against the diagnoses in the MedPAR data, and tags each hospitalization accordingly. The tagged data is then saved in a specified format (such as Parquet, Feather, or CSV) for further analysis. To process the data and label the admissions, run the following command:

python src/get_outcomes.py

In addition, .sbatch templates are provided for SLURM users. Be mindful that each HPC cluster has a different configuration, so the .sbatch files might need to be modified accordingly.
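
The tagging idea described in this step (build a SQL query from the ICD lists, then flag each admission) can be sketched as follows. This is not the repository's actual implementation: `sqlite3` is used here only as a stand-in SQL engine, and the table name `admissions`, the columns `adm_id` and `diag`, and the function `build_tag_query` are all hypothetical. The sketch assumes one row per diagnosis per admission.

```python
import sqlite3

# Hypothetical parsed ICD list, same shape as the YAML example in this README.
conditions = {"lewy": {"icd9": ["33182"], "icd10": ["G3183"]}}


def build_tag_query(conditions, table="admissions", code_column="diag"):
    """Build a SELECT that adds one 0/1 column per condition key."""
    select_parts = ["adm_id"]
    params = []
    for key, entry in conditions.items():
        codes = entry["icd9"] + entry["icd10"]
        # Codes are passed as bound parameters; the condition key is
        # interpolated as a column name, so it must come from a trusted file.
        placeholders = ", ".join("?" for _ in codes)
        select_parts.append(
            f"MAX(CASE WHEN {code_column} IN ({placeholders}) "
            f"THEN 1 ELSE 0 END) AS {key}"
        )
        params.extend(codes)
    sql = (
        f"SELECT {', '.join(select_parts)} FROM {table} "
        "GROUP BY adm_id ORDER BY adm_id"
    )
    return sql, params


# Toy data: one row per (admission, diagnosis code).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE admissions (adm_id INTEGER, diag TEXT)")
con.executemany(
    "INSERT INTO admissions VALUES (?, ?)",
    [(1, "G3183"), (1, "I10"), (2, "4019"), (3, "33182")],
)

sql, params = build_tag_query(conditions)
tagged = con.execute(sql, params).fetchall()
print(tagged)  # each tuple is (adm_id, lewy flag)
```

Binding the codes as query parameters keeps the code values out of the SQL string itself; only the column names are formatted in.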

Step 4: Review the Output

The output of the processing script will be saved in the output/ directory. The labeled dataset will include original admission details along with additional columns indicating the presence of conditions as defined by the ICD code lists.

ICD Code Lists

# Icd list created for project ....
# Example: Dementia with Lewy bodies
lewy:
  long_name: "Dementia with Lewy Bodies"
  icd9: ['33182']
  icd10: ['G3183']

YAML Format for ICD Code Lists

The ICD code lists are defined in YAML format to ensure they are easy to read and maintain. Each YAML file should have the following structure, illustrated by the example above:

#Icd list created for project: This comment should precede the listing of ICD codes to indicate the purpose of the file.
lewy: This is a unique key representing the condition (in this case, Dementia with Lewy Bodies).
long_name: A human-readable name for the condition.
icd9: A list of ICD-9 codes associated with this condition.
icd10: A list of ICD-10 codes associated with this condition.
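
A quick structural check can catch malformed entries before the processing script runs. The sketch below is an illustration only: the function `validate_icd_lists` is hypothetical, and it operates on a plain dictionary of the shape a YAML loader would be expected to return for the example above, so it needs no YAML library.

```python
# Fields described in this section for every condition entry.
REQUIRED_FIELDS = ("long_name", "icd9", "icd10")


def validate_icd_lists(conditions):
    """Return a list of human-readable problems found in a parsed ICD list."""
    problems = []
    for key, entry in conditions.items():
        for field in REQUIRED_FIELDS:
            if field not in entry:
                problems.append(f"{key}: missing field '{field}'")
        for version in ("icd9", "icd10"):
            codes = entry.get(version, [])
            if not isinstance(codes, list):
                problems.append(f"{key}: '{version}' must be a list")
                continue
            for code in codes:
                # Quoting codes in YAML keeps them as strings, which
                # preserves leading zeros.
                if not isinstance(code, str):
                    problems.append(
                        f"{key}: {version} code {code!r} is not a string"
                    )
    return problems


# Parsed form of the Lewy bodies example shown earlier in this section.
example = {
    "lewy": {
        "long_name": "Dementia with Lewy Bodies",
        "icd9": ["33182"],
        "icd10": ["G3183"],
    }
}

print(validate_icd_lists(example))
```

An empty list means no problems were found; an unquoted numeric code or a missing field would each produce one message.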

Adding New ICD Code Lists

To add a new ICD code list:

Create a New YAML File: In the icd_codes/ directory, create a new YAML file with a descriptive name (e.g., icd_code_.yml).
Add ICD Codes: Define the condition and associated ICD codes in the YAML format, as shown above.
Update the Script: If necessary, ensure that the processing script's argument parsing references the new ICD code list file.

Output

After processing, the output will be a labeled dataset saved in the output/ directory. The dataset includes all original hospital admission details along with labels indicating the presence of conditions or procedures based on the ICD code lists.
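
The outcome counts per year or per zipcode-year can be pictured as a grouped sum over the 0/1 label columns. The sketch below assumes hypothetical column names (`year`, `zipcode`, `lewy`) and made-up records; the actual output columns and the way the repository aggregates them may differ.

```python
from collections import Counter

# Made-up labeled admissions; column names are assumptions for illustration.
records = [
    {"year": 2010, "zipcode": "02138", "lewy": 1},
    {"year": 2010, "zipcode": "02138", "lewy": 0},
    {"year": 2010, "zipcode": "02139", "lewy": 1},
    {"year": 2011, "zipcode": "02138", "lewy": 1},
]


def count_outcomes(records, outcome, keys):
    """Sum a 0/1 outcome column within each combination of `keys`."""
    counts = Counter()
    for row in records:
        group = tuple(row[k] for k in keys)
        counts[group] += row[outcome]
    return dict(counts)


per_year = count_outcomes(records, "lewy", ("year",))
per_zip_year = count_outcomes(records, "lewy", ("zipcode", "year"))
print(per_year)
print(per_zip_year)
```

Zipcodes are kept as strings here so that leading zeros survive the grouping.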
