Commit 95007ed

chore: Improve repo review compliance (#237)
* chore: Add additional pre-commit hooks for better repo-review compliance
* Restore version 2 identifier to rtd config
1 parent 99eea88 commit 95007ed

27 files changed: +455 -366 lines changed

.github/dependabot.yml

Lines changed: 3 additions & 3 deletions
@@ -5,9 +5,9 @@ updates:
  schedule:
  interval: "weekly"
  groups:
- actions:
- patterns:
- - "*"
+ actions:
+ patterns:
+ - "*"
  - package-ecosystem: "pip"
  directory: "/"
  schedule:

.github/workflows/ci.yml

Lines changed: 3 additions & 4 deletions
@@ -2,22 +2,21 @@ name: CI

  on:
  push:
- branches: [ main ]
+ branches: [main]

  pull_request:
- branches: [ main ]
+ branches: [main]

  concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

  jobs:
-
  Test:
  runs-on: ubuntu-latest
  strategy:
  matrix:
- python-version: [ "3.10", "3.11", "3.12", "3.13" ]
+ python-version: ["3.10", "3.11", "3.12", "3.13"]
  steps:
  - uses: actions/checkout@v6
  - name: Set up Python ${{ matrix.python-version }}

.pre-commit-config.yaml

Lines changed: 21 additions & 1 deletion
@@ -1,3 +1,7 @@
+ ci:
+ autoupdate_schedule: monthly
+ autoupdate_commit_msg: "chore: Update pre-commit hooks"
+ autofix_commit_msg: "style: Pre-commit fixes"
  repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
  rev: v6.0.0
@@ -11,6 +15,22 @@ repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
  rev: v0.14.7
  hooks:
- - id: ruff
+ - id: ruff-check
  types_or: [python, pyi, jupyter]
  args: [--fix, --show-fixes, --exit-non-zero-on-fix]
+ - id: ruff-format
+
+ - repo: https://github.com/pre-commit/pygrep-hooks
+ rev: v1.10.0
+ hooks:
+ - id: python-no-log-warn
+ - id: rst-backticks
+ - id: rst-directive-colons
+ - id: rst-inline-touching-normal
+ - id: text-unicode-replacement-char
+
+ - repo: https://github.com/rbubley/mirrors-prettier
+ rev: v3.7.3
+ hooks:
+ - id: prettier
+ args: ["--cache-location=.prettier_cache/cache"]

.readthedocs.yaml

Lines changed: 2 additions & 0 deletions
@@ -2,6 +2,8 @@
  # Read the Docs configuration file
  # See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

+ version: 2
+
  build:
  os: ubuntu-24.04
  tools:

CHANGES.md

Lines changed: 130 additions & 94 deletions
Large diffs are not rendered by default.

CITATION.cff

Lines changed: 3 additions & 3 deletions
@@ -2,14 +2,14 @@ cff-version: 1.2.0
  type: software
  title: bioframe
  license: MIT
- repository-code: 'https://github.com/open2c/bioframe'
+ repository-code: "https://github.com/open2c/bioframe"
  message: >-
  If you use this software, please cite it using the
  metadata from this file.
  authors:
  - given-names: Nezar
  family-names: Abdennur
- orcid: 'https://orcid.org/0000-0001-5814-0864'
+ orcid: "https://orcid.org/0000-0001-5814-0864"
  - given-names: Geoffrey
  family-names: Fudenberg
  orcid: "https://orcid.org/0000-0001-5905-6517"
@@ -57,7 +57,7 @@ preferred-citation:
  - family-names: Open2C
  - given-names: Nezar
  family-names: Abdennur
- orcid: 'https://orcid.org/0000-0001-5814-0864'
+ orcid: "https://orcid.org/0000-0001-5814-0864"
  - given-names: Geoffrey
  family-names: Fudenberg
  orcid: "https://orcid.org/0000-0001-5905-6517"

CONTRIBUTING.md

Lines changed: 4 additions & 8 deletions
@@ -1,22 +1,19 @@
  # Contributing

-
  ## General guidelines

  If you haven't contributed to open-source before, we recommend you read [this excellent guide by GitHub on how to contribute to open source](https://opensource.guide/how-to-contribute). The guide is long, so you can gloss over things you're familiar with.

  If you're not already familiar with it, we follow the [fork and pull model](https://help.github.com/articles/about-collaborative-development-models) on GitHub. Also, check out this recommended [git workflow](https://www.asmeurer.com/git-workflow/).

-
  ## Contributing Code

  This project has a number of requirements for all code contributed.

- * We follow the [PEP-8 style](https://www.python.org/dev/peps/pep-0008/) convention.
- * We use [NumPy-style docstrings](https://numpydoc.readthedocs.io/en/latest/format.html).
- * It's ideal if user-facing API changes or new features have documentation added.
- * It is best if all new functionality and/or bug fixes have unit tests added with each use-case.
-
+ - We follow the [PEP-8 style](https://www.python.org/dev/peps/pep-0008/) convention.
+ - We use [NumPy-style docstrings](https://numpydoc.readthedocs.io/en/latest/format.html).
+ - It's ideal if user-facing API changes or new features have documentation added.
+ - It is best if all new functionality and/or bug fixes have unit tests added with each use-case.

  ## Setting up Your Development Environment

@@ -96,7 +93,6 @@ This will build the documentation and serve it on a local http server which list

  Documentation from the `main` branch and tagged releases is automatically built and hosted on [readthedocs](https://readthedocs.org/).

-
  ## Acknowledgments

  This document is based off of the [guidelines from the sparse project](https://github.com/pydata/sparse/blob/master/docs/contributing.rst).

README.md

Lines changed: 8 additions & 7 deletions
@@ -14,9 +14,9 @@ Bioframe enables flexible and scalable operations on genomic interval dataframes

  Bioframe is built directly on top of [Pandas](https://pandas.pydata.org/). Bioframe provides:

- * A variety of genomic interval operations that work directly on dataframes.
- * Operations for special classes of genomic intervals, including chromosome arms and fixed-size bins.
- * Conveniences for diverse tabular genomic data formats and loading genome assembly summary information.
+ - A variety of genomic interval operations that work directly on dataframes.
+ - Operations for special classes of genomic intervals, including chromosome arms and fixed-size bins.
+ - Conveniences for diverse tabular genomic data formats and loading genome assembly summary information.

  Read the [documentation](https://bioframe.readthedocs.io/en/latest/), including the [guide](https://bioframe.readthedocs.io/en/latest/guide-intervalops.html), as well as the [publication](https://doi.org/10.1093/bioinformatics/btae088) for more information.

@@ -34,10 +34,10 @@ pip install bioframe

  Interested in contributing to bioframe? That's great! To get started, check out the [contributing guide](https://github.com/open2c/bioframe/blob/main/CONTRIBUTING.md). Discussions about the project roadmap take place on the [Open2C Discord](https://discord.com/invite/qVfSbDYHNG) server and regular developer meetings scheduled there. Anyone can join and participate!

-
  ## Interval operations

  Key genomic interval operations in bioframe include:
+
  - `overlap`: Find pairs of overlapping genomic intervals between two dataframes.
  - `closest`: For every interval in a dataframe, find the closest intervals in a second dataframe.
  - `cluster`: Group overlapping intervals in a dataframe into clusters.
@@ -46,6 +46,7 @@ Key genomic interval operations in bioframe include:
  Bioframe additionally has functions that are frequently used for genomic interval operations and can be expressed as combinations of these core operations and dataframe operations, including: `coverage`, `expand`, `merge`, `select`, and `subtract`.

  To `overlap` two dataframes, call:
+
  ```python
  import bioframe as bf

@@ -62,8 +63,8 @@ For these two input dataframes, with intervals all on the same chromosome:
  <img src="https://github.com/open2c/bioframe/raw/main/docs/figs/overlap_inner_0.png" width=60%>
  <img src="https://github.com/open2c/bioframe/raw/main/docs/figs/overlap_inner_1.png" width=60%>

-
  To `merge` all overlapping intervals in a dataframe, call:
+
  ```python
  import bioframe as bf

@@ -90,12 +91,12 @@ ctcf_motif_calls = bioframe.read_table(jaspar_url, schema='jaspar', skiprows=1)
  ```

  ## Tutorials
- See this [jupyter notebook](https://github.com/open2c/bioframe/tree/master/docs/tutorials/tutorial_assign_motifs_to_peaks.ipynb) for an example of how to assign TF motifs to ChIP-seq peaks using bioframe.

+ See this [jupyter notebook](https://github.com/open2c/bioframe/tree/master/docs/tutorials/tutorial_assign_motifs_to_peaks.ipynb) for an example of how to assign TF motifs to ChIP-seq peaks using bioframe.

  ## Citing

- If you use ***bioframe*** in your work, please cite:
+ If you use **_bioframe_** in your work, please cite:

  ```bibtex
  @article{bioframe_2024,

docs/api-resources.rst

Lines changed: 7 additions & 8 deletions
@@ -8,7 +8,7 @@ Bioframe provides a collection of genome assembly metadata for commonly used
  genomes. These are accessible through a convenient dataclass interface via :func:`bioframe.assembly_info`.

  The assemblies are listed in a manifest YAML file, and each assembly
- has a mandatory companion file called `seqinfo` that contains the sequence
+ has a mandatory companion file called _seqinfo_ that contains the sequence
  names, lengths, and other information. The records in the manifest file contain
  the following fields:

@@ -22,7 +22,7 @@ the following fields:
  - ``default_units``: default assembly units to include from the seqinfo file
  - ``url``: URL to where the corresponding sequence files can be downloaded

- The `seqinfo` file is a TSV file with the following columns (with header):
+ The _seqinfo_ file is a TSV file with the following columns (with header):

  - ``name``: canonical sequence name
  - ``length``: sequence length
@@ -31,21 +31,20 @@ The `seqinfo` file is a TSV file with the following columns (with header):
  - ``unit``: assembly unit of the chromosome (e.g., "primary", "non-nuclear", "decoy")
  - ``aliases``: comma-separated list of aliases for the sequence name

- We currently do not include sequences with "alt" or "patch" roles in `seqinfo` files, but we
+ We currently do not include sequences with "alt" or "patch" roles in _seqinfo_ files, but we
  do support the inclusion of additional decoy sequences (as used by so-called NGS *analysis
  sets* for human genome assemblies) by marking them as members of a "decoy" assembly unit.

- The `cytoband` file is an optional TSV file with the following columns (with header):
-
+ The _cytoband_ file is an optional TSV file with the following columns (with header):
  - ``chrom``: chromosome name
  - ``start``: start position
  - ``end``: end position
  - ``band``: cytogenetic coordinate (name of the band)
  - ``stain``: Giesma stain result

- The order of the sequences in the `seqinfo` file is treated as canonical.
- The ordering of the chromosomes in the `cytobands` file should match the order
- of the chromosomes in the `seqinfo` file.
+ The order of the sequences in the _seqinfo_ file is treated as canonical.
+ The ordering of the chromosomes in the _cytobands_ file should match the order
+ of the chromosomes in the _seqinfo_ file.

  The manifest and companion files are stored in the ``bioframe/io/data`` directory.
  New assemblies can be requested by opening an issue on GitHub or by submitting a pull request.
New assemblies can be requested by opening an issue on GitHub or by submitting a pull request.

docs/guide-bedtools.md

Lines changed: 0 additions & 3 deletions
@@ -14,7 +14,6 @@ kernelspec:

  # Bioframe for bedtools users

-
  Bioframe is built around the analysis of genomic intervals as a pandas [DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) in memory, rather than working with tab-delimited text files saved on disk.

  Bioframe supports reading a number of standard genomics text file formats via [`read_table`](https://bioframe.readthedocs.io/en/latest/api-fileops.html#bioframe.io.fileops.read_table), including BED files (see [schemas](https://github.com/open2c/bioframe/blob/main/bioframe/io/schemas.py)), which will load them as pandas DataFrames, a complete list of helper functions is [available here](API_fileops).
@@ -25,7 +24,6 @@ For example, with gtf files, you do not need to turn them into bed files, you ca

  Finally, if needed, bioframe provides a convenience function to write dataframes to a standard BED file using [`to_bed`](https://bioframe.readthedocs.io/en/latest/api-fileops.html#bioframe.io.bed.to_bed).

-
  ## `bedtools intersect`

  ### Select unique entries from the first bed overlapping the second bed `-u`
@@ -107,7 +105,6 @@ out = bf.overlap(A, B, how='inner', suffixes=('_', ''))[B.columns]

  > **Note:** This gives one row per overlap and can contain duplicates. The output dataframe of the former method will use the same pandas index as the input dataframe `B`, while the latter result --- the join output --- will have an integer range index, like a pandas merge.

-
  ### Intersect multiple beds against A

  ```sh
