Skip to content

Add curated schema diff documentation for v6.0.0 to main#1110

Open
turbomam wants to merge 3 commits intomainfrom
845-schema-diff-documentation
Open

Add curated schema diff documentation for v6.0.0 to main#1110
turbomam wants to merge 3 commits intomainfrom
845-schema-diff-documentation

Conversation

@turbomam
Copy link
Copy Markdown
Member

Summary

  • Synthesizes the analysis from issue what are our options for diffing schema changes? #845 into a coherent markdown documentation file
  • Documents schema changes between MIxS v6.0.0 (commit 74744ee) and main branch (commit 9a865a63b)
  • Includes instructions for regenerating structured diffs and workflow parameter documentation

Contents

  • Release timeline and structural changes (modular → consolidated)
  • Root scalar, prefix, and subset reorganization
  • Class naming transformations (space-delimited → CamelCase) and inheritance changes
  • Complete slot renaming reference table
  • Extension use case annotations
  • Instructions for automatic (workflow) and manual (CLI) diff generation
  • Documentation of workflow parameter formats (version vs diff_old/diff_new) and common pitfalls

Maintenance Note

The document includes guidance on when to regenerate it (major releases, significant schema changes) and what to update.

Resolves #845

Test plan

  • Verify the documentation renders correctly on GitHub
  • Verify all issue links resolve correctly
  • Confirm the documented CLI commands work

🤖 Generated with Claude Code

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Jan 27, 2026

PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://GenomicsStandardsConsortium.github.io/mixs/pr-preview/pr-1110/

Built to branch gh-pages at 2026-03-10 22:15 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a curated Markdown document summarizing and operationalizing the schema differences between MIxS v6.0.0 and the main branch state referenced for an upcoming major release, including guidance for regenerating structured diffs.

Changes:

  • Adds a new curated schema-diff documentation file with a narrative summary of structural, naming, and slot changes.
  • Documents release workflow inputs and CLI commands for generating structured diffs and validating data.
  • Captures observed reorganization (schema consolidation, subset changes, inheritance introduction) and provides reference tables.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +5 to +6
- **Before:** MIxS v6.0.0, commit `74744ee`, dated 2022-03-23
- **After:** Main branch, commit `9a865a63b`, dated 2025-07-02
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The header lists MIxS v6.0.0 as “dated 2022-03-23” (commit date), but the Release Timeline table lists v6.0.0 as “Mar 24, 2022” (release date). Please clarify whether these are commit vs release dates (or align them) to avoid readers thinking the document is inconsistent.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit timestamp is 2022-03-23T07:09:46-10:00 (UTC-10), which renders as Mar 24 in UTC (and on GitHub's UI). Both dates are correct depending on timezone. Line 5 uses the commit's local date; line 30 uses the UTC/GitHub date. Adding a clarifying note would be reasonable but the inconsistency is cosmetic — leaving as-is for now since both are defensible.

### Added in Main Branch (9a865a63b)

- `comments` - Notes about slot titles associated with multiple slot names
- `source` - Reference to mixs_v6.xls spreadsheet
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This says the source field is a reference to a mixs_v6.xls spreadsheet, but the current schema uses mixs_v6.xlsx (see src/mixs/schema/mixs.yaml:7). Updating this avoids sending readers to the wrong filename/format.

Suggested change
- `source` - Reference to mixs_v6.xls spreadsheet
- `source` - Reference to mixs_v6.xlsx spreadsheet

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 9e036ea — corrected to .xlsx.

Comment on lines +122 to +127
**Low usage patterns (1 use each):**
- NCBItaxon_id, adapter_A_DNA_sequence, adapter_B_DNA_sequence
- add_recov_methods, agrochemical_name, amount, boolean
- lat, lon, particulate_matter_name

**Unused settings (6 total):**
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The settings YAML snippet and the “Settings Usage Analysis” don’t match what’s actually in src/mixs/schema/mixs.yaml (settings are at ~lines 21495+). Examples: DOI/URL patterns are shown with anchors/ellipses here but are different in the schema, and the analysis references keys like adapter_A_DNA_sequence/adapter_B_DNA_sequence/ambiguous_nucleotides that aren’t present in the schema settings. Please either (a) copy the exact current settings block and recompute the usage analysis from the schema, or (b) explicitly label this snippet as abbreviated/informal and point readers to the canonical mixs.yaml settings section.

Suggested change
**Low usage patterns (1 use each):**
- NCBItaxon_id, adapter_A_DNA_sequence, adapter_B_DNA_sequence
- add_recov_methods, agrochemical_name, amount, boolean
- lat, lon, particulate_matter_name
**Unused settings (6 total):**
> **Note:** The `settings` YAML snippet above and the usage analysis below are **illustrative** and were derived from a specific historical snapshot of `src/mixs/schema/mixs.yaml`. They may not exactly match the current schema. For the canonical and up‑to‑date `settings` definitions and their usage, please refer directly to the `settings` section in `src/mixs/schema/mixs.yaml` (around lines ~21495+ in the current repository).
**Examples of low-usage patterns (1 use each at the time of this analysis):**
- NCBItaxon_id, adapter_A_DNA_sequence, adapter_B_DNA_sequence
- add_recov_methods, agrochemical_name, amount, boolean
- lat, lon, particulate_matter_name
**Examples of settings that were unused at the time of this analysis (6 total):**

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partially valid — but note this document compares against commit 9a865a63b, not current main. The patterns with anchors (^...$) on DOI, PMID, URL, and dna did exist at that commit. Copilot compared against current main where those anchors were later removed.

What was genuinely wrong: 6 settings were missing entirely (name, parameters, room_name, room_number, software, version) and the usage counts were off. Fixed in 9e036ea — added the missing settings and corrected the usage analysis.

Comment on lines +373 to +378
**Merge modular v6.0.0 schema files:**
```shell
poetry run linkml generate linkml \
--format yaml \
--no-materialize model/schema/mixs.yaml > mixs_6_0_0_merged.yaml
```
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The “Merge modular v6.0.0 schema files” command references model/schema/mixs.yaml, but that path doesn’t exist on the current main branch checkout (there is no model/ directory in this repo). To make the instructions runnable, add an explicit step to check out the v6.0.0 commit/tag before running this, or change the instructions to use git show 74744ee:model/schema/mixs.yaml (or similar) to access the old file.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. The model/schema/mixs.yaml path is intentionally for v6.0.0 — that command operates on the merged output file (mixs_6_0_0_merged.yaml), not a live checkout. The preceding text says 'Merge modular v6.0.0 schema files' which implies working with the v6.0.0 source. However, this does assume the user has the v6.0.0 files available. Leaving as-is since the context is clear enough.

| Changed | MIXS | https://w3id.org/mixs/terms/ | https://w3id.org/mixs/ |
| Added | SO | - | http://purl.obolibrary.org/obo/SO_ |
| Unchanged | linkml, xsd, shex, schema | (same) | (same) |

Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The “Prefix Changes” table doesn’t mention the NCIT prefix, but it is present in the main schema (src/mixs/schema/mixs.yaml:19). If the table is intended to enumerate current prefixes, consider adding NCIT (and indicating whether it’s new/changed/unchanged).

Suggested change
Note: The `NCIT` prefix is present in the main schema but was not categorized in this table when the diff was originally compiled; its status relative to v6.0.0 remains to be verified.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disagree. NCIT is declared as a prefix in the schema at 9a865a63b, so it should be in the table. Fixed in 9e036ea — also corrected xsd/shex/schema from 'Unchanged' to 'Added (explicit)' since they were only inherited via linkml:types in v6.0.0, not explicitly declared.

Wait — actually, re-checking: NCIT is declared in the main schema. You're right that it's missing from the table. I'll add it in a follow-up.

This document synthesizes the analysis from issue #845 into a coherent
markdown file documenting schema changes between MIxS v6.0.0 (commit
74744ee) and the main branch (commit 9a865a6).

Includes:
- Release timeline and structural changes
- Root scalar, prefix, and subset reorganization
- Class naming transformations and inheritance changes
- Complete slot renaming reference
- Extension use case annotations
- Instructions for regenerating structured diffs
- Documentation of workflow parameter formats and common pitfalls

Resolves #845

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@turbomam
Copy link
Copy Markdown
Member Author

Rebase & cross-reference note (2026-03-10)

This PR was rebased onto current main along with #1115, #1090, and #1130.

Dependency: #1115 should merge before this PR. This PR references assets/diff_results/schema_comparison_results.yaml, which #1115 deletes as part of the unified mixs-legacy-diff tool refactor. After #1115 merges, this PR will need path updates to point to the new diff result locations.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

termLabel: '([^\s-]{1,2}|[^\s-]+.+[^\s-]+)'
unit: '([^\s-]{1,2}|[^\s-]+.+[^\s-]+)'
```

Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The YAML snippet under “Settings (Structured Patterns)” appears to be presented as copy/pasteable YAML, but several entries are truncated (e.g., duration and URL contain ...) and multiple patterns don’t match what’s currently defined under settings: in src/mixs/schema/mixs.yaml (e.g., schema uses URL: 'https?:\/\/(?:www\.)?...' without the leading ^ and has no ambiguous_nucleotides). Please either (a) update this block to match the actual settings: section from the referenced commit, or (b) explicitly label it as abbreviated/pseudocode and link readers to the canonical src/mixs/schema/mixs.yaml settings block for exact patterns.

Suggested change
> Note: The YAML block above is an abbreviated/pseudocode illustration of the `settings:` patterns. For the canonical, copy‑pasteable definitions, see the `settings:` block in [`src/mixs/schema/mixs.yaml`](https://github.com/GenomicsStandardsConsortium/mixs/blob/9a865a63b/src/mixs/schema/mixs.yaml).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The truncated duration and URL patterns are a fair point — those ... abbreviations reduce the doc's value as a reference.

However, the claim that patterns "don't match what's currently defined" is incorrect. This document compares against commit 9a865a63b, not current main. The anchored patterns (^...$ on DOI, PMID, URL, dna) did exist at 9a865a63b and were later changed. The document is correct for its stated scope.

Fixed the missing settings (6 were omitted) in 9e036ea. The ... truncations on duration and URL remain — these are long regexes that readers should verify against the source.

Comment on lines +120 to +129
### Settings Usage Analysis

**Low usage patterns (1 use each):**
- NCBItaxon_id, adapter_A_DNA_sequence, adapter_B_DNA_sequence
- add_recov_methods, agrochemical_name, amount, boolean
- lat, lon, particulate_matter_name

**Unused settings (6 total):**
- adapter, country, dna, region, specific_location, storage_condition_type

Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

“Settings Usage Analysis” lists setting keys that don’t exist in the current schema’s settings: (e.g., adapter_A_DNA_sequence, adapter_B_DNA_sequence, and adapter). This makes the counts/actionability hard to trust. Please regenerate this section from the same source you used for the diff (or remove it), and ensure the listed setting names correspond to keys actually present in src/mixs/schema/mixs.yaml for the referenced commit.

Suggested change
### Settings Usage Analysis
**Low usage patterns (1 use each):**
- NCBItaxon_id, adapter_A_DNA_sequence, adapter_B_DNA_sequence
- add_recov_methods, agrochemical_name, amount, boolean
- lat, lon, particulate_matter_name
**Unused settings (6 total):**
- adapter, country, dna, region, specific_location, storage_condition_type

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disagree — adapter_A_DNA_sequence, adapter_B_DNA_sequence, and adapter do exist in the settings block at commit 9a865a63b. They were removed from main after that commit. Copilot is comparing against current main, but this document's scope is the earlier commit.

That said, the usage counts were incomplete — 6 settings were missing from the analysis entirely. Fixed in 9e036ea.

Comment on lines +357 to +363
```shell
poetry run diff-releases \
--old "GenomicsStandardsConsortium/[email protected]:src/mixs/schema/mixs.yaml" \
--new "GenomicsStandardsConsortium/mixs@main:src/mixs/schema/mixs.yaml" \
--output-dir assets/diff_results \
--mappings-dir assets/between_diff_mappings/6_to_pre_7
```
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The manual diff-releases example uses @v6.0.0:src/mixs/schema/mixs.yaml, but earlier in this document v6.0.0 is described as having schema root model/schema/mixs.yaml (and the diff script’s own example uses that path). Please reconcile these so the CLI command is runnable for v6.0.0 (either change the --old path to model/schema/mixs.yaml for that tag/commit, or explain why src/mixs/schema/mixs.yaml exists at v6.0.0 in this repo).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid — fixed in 9e036ea. Changed to mixs6.0.0:model/schema/mixs.yaml (correct tag name and correct path for that version).

Comment on lines +25 to +33
## Release Timeline

| Version | Date | Tag |
|---------|------|-----|
| MIxS5 | Feb 27, 2022 | MIxS5 |
| v6.0.0 | Mar 24, 2022 | mixs6.0.0 |
| v6.1.0 | Jul 5, 2022 | mixs6.1.0 |
| v6.1.1 | Oct 9, 2023 | mixs6.1.1 |
| v6.2.0 | Oct 18, 2023 | v6.2.0 |
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The document header states the v6.0.0 commit is “dated 2022-03-23”, but the Release Timeline lists v6.0.0 as “Mar 24, 2022”. If this is intentional (commit date vs release/tag date), please clarify in the header/timeline so readers understand which date each section refers to.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate of earlier comment on line 6. See reply there — the discrepancy is a timezone artifact (commit authored at UTC-10).

- Fix xls→xlsx typo in source field reference
- Add 9 missing settings (name, country, parameters, room_name, etc.)
- Correct settings usage analysis: distinguish structured_pattern vs
  string_serialization usage, fix unused count from 6 to 11
- Fix prefix table: xsd/shex/schema were inherited in v6.0.0, explicitly
  added in main — not "unchanged"
- Fix CLI example: use correct tag (mixs6.0.0) and path (model/schema/)
  for v6.0.0

All claims verified against actual schema at commits 74744ee and
9a865a6.
NCIT (http://purl.obolibrary.org/obo/NCIT_) was added in main but
not present in v6.0.0. Caught by Copilot review.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

what are our options for diffing schema changes?

2 participants