3 changes: 2 additions & 1 deletion .github/workflows/ci.yml
@@ -47,7 +47,7 @@ jobs:
run: python build_differ.py ${{ steps.rid.outputs.rid }}

- name: Install packages (editable)
run: pip install -e packages/core -e packages/ooxmlpowertools -e packages/docxodus pytest
run: pip install -e packages/core -e packages/ooxmlpowertools -e packages/clippit -e packages/docxodus pytest

- name: Run tests
run: python -m pytest tests/ -v
@@ -91,6 +91,7 @@ jobs:
for rid in ${{ matrix.rids }}; do
python build_differ.py "$rid"
python -m build --wheel --no-isolation packages/ooxmlpowertools --outdir dist
python -m build --wheel --no-isolation packages/clippit --outdir dist
python -m build --wheel --no-isolation packages/docxodus --outdir dist
done
- name: Check wheels
4 changes: 3 additions & 1 deletion .github/workflows/python-publish.yml
@@ -1,8 +1,9 @@
name: Upload Python Package

# Builds and publishes all three packages on a tagged release:
# Builds and publishes all packages on a tagged release:
# python-redlines (core, pure-Python sdist + wheel)
# python-redlines-ooxmlpowertools (per-platform engine wheels)
# python-redlines-clippit (per-platform engine wheels)
# python-redlines-docxodus (per-platform engine wheels)

on:
@@ -54,6 +55,7 @@ jobs:
for rid in ${{ matrix.rids }}; do
python build_differ.py "$rid"
python -m build --wheel --no-isolation packages/ooxmlpowertools --outdir dist
python -m build --wheel --no-isolation packages/clippit --outdir dist
python -m build --wheel --no-isolation packages/docxodus --outdir dist
done
- uses: actions/upload-artifact@v4
4 changes: 3 additions & 1 deletion .gitignore
@@ -6,6 +6,8 @@ __pycache__/
# C# Build Dirs
csproj/bin/*
csproj/obj/*
csproj-clippit/bin/*
csproj-clippit/obj/*
docxodus/**/bin/*
docxodus/**/obj/*

@@ -239,4 +241,4 @@ fabric.properties
.idea/httpRequests

# Android studio 3.1+ serialized cache file
.idea/caches/build_file_checksums.ser
.idea/caches/build_file_checksums.ser
34 changes: 19 additions & 15 deletions CLAUDE.md
@@ -6,22 +6,24 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

Python-Redlines generates `.docx` redline/tracked-changes documents by comparing two Word files. A pure-Python wrapper drives compiled C# (.NET 8) engine binaries; the Python layer handles platform detection, binary extraction, temp file management, and subprocess execution.

Two comparison engines are available:
Three comparison engines are available:
- **XmlPowerToolsEngine** — wraps Open-XML-PowerTools WmlComparer (original engine)
- **ClippitEngine** — wraps Clippit, an actively-maintained .NET 8 fork of Open-XML-PowerTools (same WmlComparer API)
- **DocxodusEngine** — wraps Docxodus, a modernized .NET 8.0 fork with better move detection

## Monorepo structure — three published packages
## Monorepo structure — four published packages

This repo publishes **three** PyPI packages, each with its own `pyproject.toml` under `packages/`:
This repo publishes **four** PyPI packages, each with its own `pyproject.toml` under `packages/`:

| Directory | PyPI name | Contents | Wheel |
|---|---|---|---|
| `packages/core` | `python-redlines` | Pure-Python wrapper (`engines.py`) | `py3-none-any` |
| `packages/ooxmlpowertools` | `python-redlines-ooxmlpowertools` | Open-XML-PowerTools binary | per-platform |
| `packages/clippit` | `python-redlines-clippit` | Clippit binary | per-platform |
| `packages/docxodus` | `python-redlines-docxodus` | Docxodus binary | per-platform |

Engine binaries are **optional dependencies**. Users install an engine via an extra:
`pip install python-redlines[docxodus]`, `[ooxmlpowertools]`, or `[all]`. The core
`pip install python-redlines[docxodus]`, `[ooxmlpowertools]`, `[clippit]`, or `[all]`. The core
package has no binaries; each binary package ships one platform's compiled binary as a
prebuilt wheel, so end users never compile anything.

@@ -39,8 +41,8 @@ git submodule update --init --recursive
python build_differ.py linux-x64
python build_differ.py --all

# Install all three packages editable for development
pip install -e packages/core -e packages/ooxmlpowertools -e packages/docxodus pytest
# Install all packages editable for development
pip install -e packages/core -e packages/ooxmlpowertools -e packages/clippit -e packages/docxodus pytest

# Run tests (from repo root)
python -m pytest tests/
@@ -57,15 +59,15 @@ python -m build --wheel packages/docxodus # needs an archive in _binaries/
- `BaseEngine` — locates the engine binary in its companion package via
`importlib.resources`, extracts the platform archive once into a writable
user cache dir (`platformdirs.user_cache_dir`), and runs it via subprocess.
- `XmlPowerToolsEngine` / `DocxodusEngine` — subclasses declaring `BINARY_PACKAGE`,
`BINARY_BASE_NAME`, and `EXTRA_NAME`.
- `XmlPowerToolsEngine` / `ClippitEngine` / `DocxodusEngine` — subclasses declaring
`BINARY_PACKAGE`, `BINARY_BASE_NAME`, and `EXTRA_NAME`.
- `EngineNotInstalledError` — raised on instantiation if the companion binary
package is missing, with the `pip install` command to fix it.

Both engines expose `run_redline(author_tag, original, modified, **kwargs)`.
All engines expose `run_redline(author_tag, original, modified, **kwargs)`.
`DocxodusEngine` overrides `_build_command()` to translate kwargs (e.g. `detect_moves`,
`detail_threshold`) into CLI flags. `XmlPowerToolsEngine` uses the legacy
4-positional-arg format and ignores kwargs.
`detail_threshold`) into CLI flags. `XmlPowerToolsEngine` and `ClippitEngine` use the
legacy 4-positional-arg format and ignore kwargs.

2. **Binary packages** ship one platform archive under
`src/<pkg>/_binaries/<rid>.tar.gz` (or `.zip` for Windows). The archive is
@@ -74,6 +76,7 @@ python -m build --wheel packages/docxodus # needs an archive in _binaries/

3. **C# sources**:
- `csproj/Program.cs` — Open-XML-PowerTools CLI tool
- `csproj-clippit/Program.cs` — Clippit CLI tool (Clippit pulled from NuGet, no submodule)
- `docxodus/tools/redline/Program.cs` — Docxodus CLI tool (git submodule)

`build_differ.py` compiles an engine for a given RID with `dotnet publish` and
@@ -86,22 +89,23 @@ python -m build --wheel packages/docxodus # needs an archive in _binaries/
the wheel, repeat.
- `.github/workflows/ci.yml` — tests on each OS (native RID) + builds all wheels.
- `.github/workflows/python-publish.yml` — on release, builds per-platform engine
wheels across 3 OS runners, the core sdist+wheel, and publishes all three packages.
wheels across 3 OS runners, the core sdist+wheel, and publishes all packages.
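
The extract-once behavior described for `BaseEngine` could look roughly like the sketch below. It is not the code in `engines.py`: the function name `extract_once`, the `.extracted` marker file, and the `<cache_root>/<rid>/` layout are assumptions made for illustration. The real implementation resolves the cache root with `platformdirs.user_cache_dir`; here the caller passes it in so the sketch needs only the standard library.

```python
import pathlib
import shutil


def extract_once(archive_path, cache_root):
    """Unpack a platform archive into cache_root/<rid>/ on first use only.

    Hypothetical helper: the marker file and directory layout are assumptions,
    not the behavior of the real BaseEngine.
    """
    archive_path = pathlib.Path(archive_path)
    # "linux-x64.tar.gz" -> "linux-x64", "win-x64.zip" -> "win-x64"
    rid = archive_path.name.split(".", 1)[0]
    target = pathlib.Path(cache_root) / rid
    marker = target / ".extracted"

    # A marker written after a successful unpack lets later calls skip the work.
    if marker.exists():
        return target

    target.mkdir(parents=True, exist_ok=True)
    # unpack_archive picks the format (.tar.gz or .zip) from the file extension.
    shutil.unpack_archive(str(archive_path), str(target))
    marker.touch()
    return target
```

Writing the marker only after unpacking means an interrupted extraction is retried on the next call instead of leaving a half-populated cache in use.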

## Version management

`packages/core/src/python_redlines/__about__.py` is the single source of truth.
The two binary packages read it via `[tool.hatch.version] path = "../core/..."`,
so all three always share one version. Bump only that file.
The binary packages read it via `[tool.hatch.version] path = "../core/..."`,
so all packages always share one version. Bump only that file.

## Testing Notes

Tests live in repo-root `tests/` and must be run from the repo root (fixtures use
relative paths like `tests/fixtures/original.docx`). They require all three packages
relative paths like `tests/fixtures/original.docx`). They require all packages
installed and the binaries built for the current platform. The XmlPowerToolsEngine
integration test validates exactly 9 revisions on the fixture documents.

## Stdout Format Differences

- **XmlPowerToolsEngine**: `"Revisions found: 9"`
- **ClippitEngine**: `"Revisions found: 9"` (same WmlComparer-based format)
- **DocxodusEngine**: `"Redline complete: 9 revision(s) found"`
30 changes: 23 additions & 7 deletions README.md
@@ -40,8 +40,8 @@ comparison settings, and how the packages are built and distributed.

## Comparison Engines

Python-Redlines provides **two comparison engines**. `DocxodusEngine` is the default and
recommended choice; `XmlPowerToolsEngine` remains available as a legacy option.
Python-Redlines provides **three comparison engines**. `DocxodusEngine` is the default and
recommended choice; `ClippitEngine` and `XmlPowerToolsEngine` are also available.

### `DocxodusEngine` — Default (Recommended)

@@ -61,6 +61,19 @@ engine = DocxodusEngine()
redline_bytes, stdout, stderr = engine.run_redline("AuthorName", original_bytes, modified_bytes)
```

### `ClippitEngine`

Wraps [Clippit](https://github.com/sergey-tihon/Clippit), an actively-maintained .NET 8 fork of
Open-XML-PowerTools. It uses the same `WmlComparer` API as `XmlPowerToolsEngine` (same options and
`Revisions found: N` stdout) but rides a maintained, modern dependency.

```python
from python_redlines import ClippitEngine

engine = ClippitEngine()
redline_bytes, stdout, stderr = engine.run_redline("AuthorName", original_bytes, modified_bytes)
```

### `XmlPowerToolsEngine` — Legacy

Wraps the original [Open-XML-PowerTools](https://github.com/OpenXmlDev/Open-Xml-PowerTools) `WmlComparer`. This
@@ -76,7 +89,7 @@ redline_bytes, stdout, stderr = engine.run_redline("AuthorName", original_bytes,
> **Note:** Open-XML-PowerTools was archived by Microsoft and is no longer maintained. It uses an older
> version of the Open XML SDK. While it works for many purposes, Docxodus is the recommended engine going forward.

Both engines share the same API — the only difference is the class you instantiate and the stdout format
All engines share the same API — the only difference is the class you instantiate and the stdout format
(see [Stdout Differences](#stdout-differences) below).

## Getting Started
@@ -92,8 +105,9 @@ as extras:

```commandline
pip install python-redlines[docxodus] # Docxodus engine
pip install python-redlines[clippit] # Clippit engine (maintained OOXML PowerTools fork)
pip install python-redlines[ooxmlpowertools] # Open-XML-PowerTools engine
pip install python-redlines[all] # both engines
pip install python-redlines[all] # all engines
```

Prebuilt wheels are available for Linux, macOS, and Windows (x64 and arm64); `pip`
@@ -146,12 +160,13 @@ redline_bytes, stdout, stderr = engine.run_redline(

Both engines follow the same pattern: a Python wrapper class invokes a self-contained C# binary via subprocess.

The repository is a **monorepo of three separately-published packages**:
The repository is a **monorepo of four separately-published packages**:

| Package | PyPI name | Contents |
|---|---|---|
| `packages/core` | `python-redlines` | Pure-Python wrapper; no binaries |
| `packages/ooxmlpowertools` | `python-redlines-ooxmlpowertools` | Open-XML-PowerTools engine binary |
| `packages/clippit` | `python-redlines-clippit` | Clippit engine binary |
| `packages/docxodus` | `python-redlines-docxodus` | Docxodus engine binary |

The core package's `[docxodus]` / `[ooxmlpowertools]` / `[all]` extras pull in the
@@ -189,6 +204,7 @@ The two engines produce slightly different stdout messages:
| Engine | Example stdout |
|---|---|
| `XmlPowerToolsEngine` | `Revisions found: 9` |
| `ClippitEngine` | `Revisions found: 9` |
| `DocxodusEngine` | `Redline complete: 9 revision(s) found` |
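
Callers that need the revision count regardless of engine could normalize the two stdout formats with a small helper along these lines. This is a sketch, not part of `python_redlines`: the name `parse_revision_count` is hypothetical, and the patterns are derived only from the example strings in the table above.

```python
import re

# Patterns derived from the example stdout strings in the table above.
_REVISION_PATTERNS = (
    # XmlPowerToolsEngine and ClippitEngine
    re.compile(r"Revisions found: (\d+)"),
    # DocxodusEngine
    re.compile(r"Redline complete: (\d+) revision\(s\) found"),
)


def parse_revision_count(stdout):
    """Return the revision count reported in engine stdout, or None if absent."""
    # Accept bytes as well as str, since the stdout type is an assumption here.
    if isinstance(stdout, bytes):
        stdout = stdout.decode("utf-8", errors="replace")
    for pattern in _REVISION_PATTERNS:
        match = pattern.search(stdout)
        if match:
            return int(match.group(1))
    return None
```

Returning `None` instead of raising leaves it to the caller to decide whether a missing count is an error.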

## Development
@@ -211,8 +227,8 @@ git submodule update --init --recursive
# Build the engine binaries for your platform (RIDs: linux-x64, win-x64, osx-arm64, ...)
python build_differ.py linux-x64

# Install all three packages editable
pip install -e packages/core -e packages/ooxmlpowertools -e packages/docxodus pytest
# Install all packages editable
pip install -e packages/core -e packages/ooxmlpowertools -e packages/clippit -e packages/docxodus pytest
```

### Commands
9 changes: 9 additions & 0 deletions build_differ.py
@@ -30,6 +30,15 @@
"python_redlines_ooxmlpowertools", "_binaries",
),
},
{
"name": "clippit",
"csproj": os.path.join("csproj-clippit"),
"csproj_file": os.path.join("csproj-clippit", "clippit-redline.csproj"),
"binaries_dir": os.path.join(
"packages", "clippit", "src",
"python_redlines_clippit", "_binaries",
),
},
{
"name": "docxodus",
"csproj": os.path.join("docxodus", "tools", "redline"),
56 changes: 56 additions & 0 deletions csproj-clippit/Program.cs
@@ -0,0 +1,56 @@
using System;
using System.IO;
using Clippit;
using Clippit.Word;
using DocumentFormat.OpenXml.Packaging;

class Program
{
    static void Main(string[] args)
    {
        if (args.Length != 4)
        {
            Console.WriteLine("Usage: clippit <author_tag> <original_path.docx> <modified_path.docx> <redline_path.docx>");
            return;
        }

        string authorTag = args[0];
        string originalFilePath = args[1];
        string modifiedFilePath = args[2];
        string outputFilePath = args[3];

        if (!File.Exists(originalFilePath) || !File.Exists(modifiedFilePath))
        {
            Console.WriteLine("Error: One or both files do not exist.");
            return;
        }

        try
        {
            var originalBytes = File.ReadAllBytes(originalFilePath);
            var modifiedBytes = File.ReadAllBytes(modifiedFilePath);
            var originalDocument = new WmlDocument(originalFilePath, originalBytes);
            var modifiedDocument = new WmlDocument(modifiedFilePath, modifiedBytes);

            var comparisonSettings = new WmlComparerSettings
            {
                AuthorForRevisions = authorTag,
                DetailThreshold = 0
            };

            var comparisonResults = WmlComparer.Compare(originalDocument, modifiedDocument, comparisonSettings);
            var revisions = WmlComparer.GetRevisions(comparisonResults, comparisonSettings);

            // Output results
            Console.WriteLine($"Revisions found: {revisions.Count}");

            File.WriteAllBytes(outputFilePath, comparisonResults.DocumentByteArray);
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Error: {ex.Message}");
            Console.WriteLine("Detailed Stack Trace:");
            Console.WriteLine(ex.StackTrace);
        }
    }
}
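
One consequence of the program above is worth noting for callers: `Main` returns normally on every error path, so the process exit code stays 0 even when the comparison fails, and the error text goes to stdout. A Python caller might guard against that as in the sketch below. The helper names and the `binary_path` parameter are hypothetical and do not describe the actual `ClippitEngine` wrapper.

```python
import subprocess


def build_command(binary_path, author_tag, original_path, modified_path, output_path):
    """Assemble the four positional arguments the CLI expects, in order."""
    return [
        str(binary_path),
        author_tag,
        str(original_path),
        str(modified_path),
        str(output_path),
    ]


def stdout_reports_failure(stdout):
    """Detect the 'Error: ...' and 'Usage: ...' lines the CLI prints on failure."""
    return any(
        line.startswith(("Error:", "Usage:")) for line in stdout.splitlines()
    )


def run_cli(binary_path, author_tag, original_path, modified_path, output_path):
    """Run the CLI and raise if either the exit code or stdout signals a failure."""
    result = subprocess.run(
        build_command(binary_path, author_tag, original_path, modified_path, output_path),
        capture_output=True,
        text=True,
        check=False,
    )
    # The exit code alone is not enough: the program exits 0 after printing errors.
    if result.returncode != 0 or stdout_reports_failure(result.stdout):
        raise RuntimeError(f"redline CLI failed: {result.stdout.strip()}")
    return result.stdout
```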
17 changes: 17 additions & 0 deletions csproj-clippit/clippit-redline.csproj
@@ -0,0 +1,17 @@
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
    <!-- Assembly/project id must not equal the referenced "Clippit" NuGet
         package id (case-insensitive), or NuGet restore reports NU1108. -->
    <AssemblyName>clippit-redline</AssemblyName>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Clippit" Version="3.0.1" />
  </ItemGroup>

</Project>
11 changes: 11 additions & 0 deletions packages/clippit/README.md
@@ -0,0 +1,11 @@
# python-redlines-clippit

Compiled Clippit redline engine binary for
[`python-redlines`](https://pypi.org/project/python-redlines/).

This package only contains the platform-specific engine binary. Install it via
the `python-redlines` extra rather than directly:

```bash
pip install python-redlines[clippit]
```
42 changes: 42 additions & 0 deletions packages/clippit/hatch_build.py
@@ -0,0 +1,42 @@
"""Wheel build hook: stamp the platform tag from the bundled binary archive.

Each binary package wheel must target exactly one platform. The archive placed
in src/<pkg>/_binaries/ by build_differ.py determines the wheel's platform tag.
"""
import pathlib

from hatchling.builders.hooks.plugin.interface import BuildHookInterface

# .NET runtime identifier -> wheel platform tag
PLATFORM_TAGS = {
    "linux-x64": "manylinux2014_x86_64",
    "linux-arm64": "manylinux2014_aarch64",
    "win-x64": "win_amd64",
    "win-arm64": "win_arm64",
    "osx-x64": "macosx_11_0_x86_64",
    "osx-arm64": "macosx_11_0_arm64",
}


class RedlinesBinaryBuildHook(BuildHookInterface):
    PLUGIN_NAME = "custom"

    def initialize(self, version, build_data):
        archives = sorted(
            p for p in (pathlib.Path(self.root) / "src").glob("*/_binaries/*")
            if p.name.endswith((".tar.gz", ".zip"))
        )
        if len(archives) != 1:
            raise ValueError(
                f"Expected exactly one binary archive under src/*/_binaries/, "
                f"found {len(archives)}: {[a.name for a in archives]}. "
                f"Run `python build_differ.py <rid>` to populate it before building."
            )

        rid = archives[0].name.split(".", 1)[0]
        if rid not in PLATFORM_TAGS:
            raise ValueError(f"Unknown runtime identifier '{rid}' from archive {archives[0].name}")

        build_data["pure_python"] = False
        build_data["infer_tag"] = False
        build_data["tag"] = f"py3-none-{PLATFORM_TAGS[rid]}"
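
The tag derivation in the hook can be exercised on its own, without hatchling, as in the sketch below. The function name `wheel_tag_for_archive` is hypothetical and the mapping is copied from `PLATFORM_TAGS` above; it only mirrors the archive-name-to-tag step, not the rest of the build hook.

```python
# Copy of the RID -> wheel platform tag mapping from the hook above.
PLATFORM_TAGS = {
    "linux-x64": "manylinux2014_x86_64",
    "linux-arm64": "manylinux2014_aarch64",
    "win-x64": "win_amd64",
    "win-arm64": "win_arm64",
    "osx-x64": "macosx_11_0_x86_64",
    "osx-arm64": "macosx_11_0_arm64",
}


def wheel_tag_for_archive(archive_name):
    """Map an archive file name such as 'linux-x64.tar.gz' to a wheel tag."""
    # Everything before the first dot is the .NET runtime identifier.
    rid = archive_name.split(".", 1)[0]
    if rid not in PLATFORM_TAGS:
        raise ValueError(f"Unknown runtime identifier '{rid}' from archive {archive_name}")
    return f"py3-none-{PLATFORM_TAGS[rid]}"
```

Splitting on the first dot works because none of the runtime identifiers in the mapping contain a dot, so both `.tar.gz` and `.zip` suffixes are stripped in one step.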