Add curated schema diff documentation for v6.0.0 to main#1110
Pull request overview
Adds a curated Markdown document that summarizes the schema differences between MIxS v6.0.0 and the main-branch state referenced for an upcoming major release, and explains how to regenerate the structured diffs.
Changes:
- Adds a new curated schema-diff documentation file with a narrative summary of structural, naming, and slot changes.
- Documents release workflow inputs and CLI commands for generating structured diffs and validating data.
- Captures observed reorganization (schema consolidation, subset changes, inheritance introduction) and provides reference tables.
| - **Before:** MIxS v6.0.0, commit `74744ee`, dated 2022-03-23 | ||
| - **After:** Main branch, commit `9a865a63b`, dated 2025-07-02 |
The header lists MIxS v6.0.0 as “dated 2022-03-23” (commit date), but the Release Timeline table lists v6.0.0 as “Mar 24, 2022” (release date). Please clarify whether these are commit vs release dates (or align them) to avoid readers thinking the document is inconsistent.
The commit timestamp is 2022-03-23T07:09:46-10:00 (UTC-10), which renders as Mar 24 in UTC (and on GitHub's UI). Both dates are correct depending on timezone. Line 5 uses the commit's local date; line 30 uses the UTC/GitHub date. Adding a clarifying note would be reasonable but the inconsistency is cosmetic — leaving as-is for now since both are defensible.
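A timezone explanation like this is quick to check with the standard library. The sketch below converts the author timestamp quoted above to UTC and to one zone east of UTC (UTC+9 is an arbitrary illustration, not something the thread mentions). With the timestamp exactly as quoted, the UTC calendar date is still 2022-03-23, and the date only rolls over to Mar 24 in zones at roughly UTC+7 or later, so the Mar 24 entry may come from a viewer-local rendering or from the tag/release date rather than from UTC.

```python
from datetime import datetime, timedelta, timezone

# Author timestamp exactly as quoted in the reply above.
commit_time = datetime.fromisoformat("2022-03-23T07:09:46-10:00")

# Same instant expressed in UTC.
as_utc = commit_time.astimezone(timezone.utc)

# UTC+9 is an arbitrary example of a zone east of UTC (assumption for illustration).
as_utc_plus_9 = commit_time.astimezone(timezone(timedelta(hours=9)))

print("local (UTC-10):", commit_time.date())
print("UTC:           ", as_utc.date())
print("UTC+9:         ", as_utc_plus_9.date())
```

If the header and the timeline are meant to show different kinds of dates (commit vs tag/release), a check like this makes it easy to state which zone each one uses.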
| ### Added in Main Branch (9a865a63b) | ||
|
|
||
| - `comments` - Notes about slot titles associated with multiple slot names | ||
| - `source` - Reference to mixs_v6.xls spreadsheet |
This says the source field is a reference to a mixs_v6.xls spreadsheet, but the current schema uses mixs_v6.xlsx (see src/mixs/schema/mixs.yaml:7). Updating this avoids sending readers to the wrong filename/format.
| - `source` - Reference to mixs_v6.xls spreadsheet | |
| - `source` - Reference to mixs_v6.xlsx spreadsheet |
| **Low usage patterns (1 use each):** | ||
| - NCBItaxon_id, adapter_A_DNA_sequence, adapter_B_DNA_sequence | ||
| - add_recov_methods, agrochemical_name, amount, boolean | ||
| - lat, lon, particulate_matter_name | ||
|
|
||
| **Unused settings (6 total):** |
The settings YAML snippet and the “Settings Usage Analysis” don’t match what’s actually in src/mixs/schema/mixs.yaml (settings are at ~lines 21495+). Examples: DOI/URL patterns are shown with anchors/ellipses here but are different in the schema, and the analysis references keys like adapter_A_DNA_sequence/adapter_B_DNA_sequence/ambiguous_nucleotides that aren’t present in the schema settings. Please either (a) copy the exact current settings block and recompute the usage analysis from the schema, or (b) explicitly label this snippet as abbreviated/informal and point readers to the canonical mixs.yaml settings section.
| **Low usage patterns (1 use each):** | |
| - NCBItaxon_id, adapter_A_DNA_sequence, adapter_B_DNA_sequence | |
| - add_recov_methods, agrochemical_name, amount, boolean | |
| - lat, lon, particulate_matter_name | |
| **Unused settings (6 total):** | |
| > **Note:** The `settings` YAML snippet above and the usage analysis below are **illustrative** and were derived from a specific historical snapshot of `src/mixs/schema/mixs.yaml`. They may not exactly match the current schema. For the canonical and up‑to‑date `settings` definitions and their usage, please refer directly to the `settings` section in `src/mixs/schema/mixs.yaml` (around lines ~21495+ in the current repository). | |
| **Examples of low-usage patterns (1 use each at the time of this analysis):** | |
| - NCBItaxon_id, adapter_A_DNA_sequence, adapter_B_DNA_sequence | |
| - add_recov_methods, agrochemical_name, amount, boolean | |
| - lat, lon, particulate_matter_name | |
| **Examples of settings that were unused at the time of this analysis (6 total):** |
Partially valid — but note this document compares against commit 9a865a63b, not current main. The patterns with anchors (^...$) on DOI, PMID, URL, and dna did exist at that commit. Copilot compared against current main where those anchors were later removed.
What was genuinely wrong: 6 settings were missing entirely (name, parameters, room_name, room_number, software, version) and the usage counts were off. Fixed in 9e036ea — added the missing settings and corrected the usage analysis.
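One way to recompute a usage analysis like this from the schema itself is to count `{name}` placeholders per settings key. The sketch below is a minimal illustration on a toy schema dictionary: the slot names and patterns are hypothetical, and it assumes settings are written in the compact `key: 'pattern'` form and referenced as `{key}` inside `structured_pattern.syntax` or `string_serialization`. It is not the script used for the PR.

```python
import re
from collections import Counter

# Toy schema fragment; slot names and patterns are hypothetical.
schema = {
    "settings": {
        "float": r"[-+]?[0-9]*\.?[0-9]+",
        "unit": r"[^\s-]+",
        "text": r".+",
        "room_name": r".+",
    },
    "slots": {
        "slot_a": {"structured_pattern": {"syntax": "^{float} {unit}$"}},
        "slot_b": {"structured_pattern": {"syntax": "^{float}$"}},
        "slot_c": {"string_serialization": "{text};{float} {unit}"},
    },
}

# Matches {identifier} but not regex quantifiers such as {1,2}.
PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def count_setting_usage(schema):
    """Count placeholder references separately for each usage style."""
    usage = {"structured_pattern": Counter(), "string_serialization": Counter()}
    for slot in schema.get("slots", {}).values():
        syntax = (slot.get("structured_pattern") or {}).get("syntax", "")
        usage["structured_pattern"].update(PLACEHOLDER.findall(syntax))
        serialization = slot.get("string_serialization", "")
        usage["string_serialization"].update(PLACEHOLDER.findall(serialization))
    return usage


def unused_settings(schema, usage):
    """Settings keys that are referenced by neither usage style."""
    used = set(usage["structured_pattern"]) | set(usage["string_serialization"])
    return sorted(set(schema.get("settings", {})) - used)


usage = count_setting_usage(schema)
print(dict(usage["structured_pattern"]))
print(dict(usage["string_serialization"]))
print(unused_settings(schema, usage))
```

Keeping the two counters separate matters because placeholders in `string_serialization` are not necessarily settings keys, so a combined count can overstate how many settings are really in use.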
| **Merge modular v6.0.0 schema files:** | ||
| ```shell | ||
| poetry run linkml generate linkml \ | ||
| --format yaml \ | ||
| --no-materialize model/schema/mixs.yaml > mixs_6_0_0_merged.yaml | ||
| ``` |
The “Merge modular v6.0.0 schema files” command references model/schema/mixs.yaml, but that path doesn’t exist on the current main branch checkout (there is no model/ directory in this repo). To make the instructions runnable, add an explicit step to check out the v6.0.0 commit/tag before running this, or change the instructions to use git show 74744ee:model/schema/mixs.yaml (or similar) to access the old file.
Fair point. The model/schema/mixs.yaml path is intentionally for v6.0.0 — that command operates on the merged output file (mixs_6_0_0_merged.yaml), not a live checkout. The preceding text says 'Merge modular v6.0.0 schema files' which implies working with the v6.0.0 source. However, this does assume the user has the v6.0.0 files available. Leaving as-is since the context is clear enough.
| | Changed | MIXS | https://w3id.org/mixs/terms/ | https://w3id.org/mixs/ | | ||
| | Added | SO | - | http://purl.obolibrary.org/obo/SO_ | | ||
| | Unchanged | linkml, xsd, shex, schema | (same) | (same) | | ||
The “Prefix Changes” table doesn’t mention the NCIT prefix, but it is present in the main schema (src/mixs/schema/mixs.yaml:19). If the table is intended to enumerate current prefixes, consider adding NCIT (and indicating whether it’s new/changed/unchanged).
| Note: The `NCIT` prefix is present in the main schema but was not categorized in this table when the diff was originally compiled; its status relative to v6.0.0 remains to be verified. |
Disagree. NCIT is declared as a prefix in the schema at 9a865a63b, so it should be in the table. Fixed in 9e036ea — also corrected xsd/shex/schema from 'Unchanged' to 'Added (explicit)' since they were only inherited via linkml:types in v6.0.0, not explicitly declared.
Wait — actually, re-checking: NCIT is declared in the main schema. You're right that it's missing from the table. I'll add it in a follow-up.
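A prefix table like the one under discussion can be derived mechanically, which makes omissions such as `NCIT` harder to miss. The sketch below classifies prefixes from two maps; the maps are abbreviated to entries mentioned in this thread and are an illustration, not the full prefix blocks of either commit. It compares only explicitly declared prefixes, so a prefix that was merely inherited through an import in the old schema would show up as "Added".

```python
def classify_prefixes(old, new):
    """Return (status, prefix, before, after) rows for every prefix in either map."""
    rows = []
    for prefix in sorted(set(old) | set(new)):
        before, after = old.get(prefix), new.get(prefix)
        if before is None:
            status = "Added"
        elif after is None:
            status = "Removed"
        elif before != after:
            status = "Changed"
        else:
            status = "Unchanged"
        rows.append((status, prefix, before or "-", after or "-"))
    return rows


# Abbreviated prefix maps (illustrative subset only).
old_prefixes = {
    "MIXS": "https://w3id.org/mixs/terms/",
    "linkml": "https://w3id.org/linkml/",
}
new_prefixes = {
    "MIXS": "https://w3id.org/mixs/",
    "linkml": "https://w3id.org/linkml/",
    "SO": "http://purl.obolibrary.org/obo/SO_",
    "NCIT": "http://purl.obolibrary.org/obo/NCIT_",
}

rows = classify_prefixes(old_prefixes, new_prefixes)
for status, prefix, before, after in rows:
    print(f"| {status} | {prefix} | {before} | {after} |")
```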
This document synthesizes the analysis from issue #845 into a coherent markdown file documenting schema changes between MIxS v6.0.0 (commit 74744ee) and the main branch (commit 9a865a6).

Includes:
- Release timeline and structural changes
- Root scalar, prefix, and subset reorganization
- Class naming transformations and inheritance changes
- Complete slot renaming reference
- Extension use case annotations
- Instructions for regenerating structured diffs
- Documentation of workflow parameter formats and common pitfalls

Resolves #845

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Force-pushed: 7dcc002 to 3c52f01
Rebase & cross-reference note (2026-03-10) This PR was rebased onto current Dependency: #1115 should merge before this PR. This PR references
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.
| termLabel: '([^\s-]{1,2}|[^\s-]+.+[^\s-]+)' | ||
| unit: '([^\s-]{1,2}|[^\s-]+.+[^\s-]+)' | ||
| ``` | ||
The YAML snippet under “Settings (Structured Patterns)” appears to be presented as copy/pasteable YAML, but several entries are truncated (e.g., duration and URL contain ...) and multiple patterns don’t match what’s currently defined under settings: in src/mixs/schema/mixs.yaml (e.g., schema uses URL: 'https?:\/\/(?:www\.)?...' without the leading ^ and has no ambiguous_nucleotides). Please either (a) update this block to match the actual settings: section from the referenced commit, or (b) explicitly label it as abbreviated/pseudocode and link readers to the canonical src/mixs/schema/mixs.yaml settings block for exact patterns.
| > Note: The YAML block above is an abbreviated/pseudocode illustration of the `settings:` patterns. For the canonical, copy‑pasteable definitions, see the `settings:` block in [`src/mixs/schema/mixs.yaml`](https://github.com/GenomicsStandardsConsortium/mixs/blob/9a865a63b/src/mixs/schema/mixs.yaml). |
The truncated duration and URL patterns are a fair point — those ... abbreviations reduce the doc's value as a reference.
However, the claim that patterns "don't match what's currently defined" is incorrect. This document compares against commit 9a865a63b, not current main. The anchored patterns (^...$ on DOI, PMID, URL, dna) did exist at 9a865a63b and were later changed. The document is correct for its stated scope.
Fixed the missing settings (6 were omitted) in 9e036ea. The ... truncations on duration and URL remain — these are long regexes that readers should verify against the source.
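For readers who want to verify a settings regex against the source rather than trust a transcription, a small check with Python's `re` module can help. The sketch below uses the `termLabel`/`unit` pattern exactly as quoted in the diff hunk above and assumes whole-string matching; whether the schema tooling applies the pattern as a full match is an assumption here, not something this thread confirms.

```python
import re

# Pattern copied from the termLabel/unit lines quoted above.
TERM_LABEL = re.compile(r"([^\s-]{1,2}|[^\s-]+.+[^\s-]+)")


def is_valid(value: str) -> bool:
    """True if the whole string matches (full-match semantics are an assumption)."""
    return TERM_LABEL.fullmatch(value) is not None


# Values may not start or end with whitespace or a hyphen; interior ones are allowed.
for candidate in ["m", "degree Celsius", "a-b", " soil", "soil-", ""]:
    print(repr(candidate), is_valid(candidate))
```

The same approach works for the truncated `duration` and `URL` patterns once the full expressions are copied from `mixs.yaml` at the commit being documented.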
| ### Settings Usage Analysis | ||
|
|
||
| **Low usage patterns (1 use each):** | ||
| - NCBItaxon_id, adapter_A_DNA_sequence, adapter_B_DNA_sequence | ||
| - add_recov_methods, agrochemical_name, amount, boolean | ||
| - lat, lon, particulate_matter_name | ||
|
|
||
| **Unused settings (6 total):** | ||
| - adapter, country, dna, region, specific_location, storage_condition_type | ||
“Settings Usage Analysis” lists setting keys that don’t exist in the current schema’s settings: (e.g., adapter_A_DNA_sequence, adapter_B_DNA_sequence, and adapter). This makes the counts/actionability hard to trust. Please regenerate this section from the same source you used for the diff (or remove it), and ensure the listed setting names correspond to keys actually present in src/mixs/schema/mixs.yaml for the referenced commit.
| ### Settings Usage Analysis | |
| **Low usage patterns (1 use each):** | |
| - NCBItaxon_id, adapter_A_DNA_sequence, adapter_B_DNA_sequence | |
| - add_recov_methods, agrochemical_name, amount, boolean | |
| - lat, lon, particulate_matter_name | |
| **Unused settings (6 total):** | |
| - adapter, country, dna, region, specific_location, storage_condition_type |
Disagree — adapter_A_DNA_sequence, adapter_B_DNA_sequence, and adapter do exist in the settings block at commit 9a865a63b. They were removed from main after that commit. Copilot is comparing against current main, but this document's scope is the earlier commit.
That said, the usage counts were incomplete — 6 settings were missing from the analysis entirely. Fixed in 9e036ea.
| ```shell | ||
| poetry run diff-releases \ | ||
| --old "GenomicsStandardsConsortium/mixs@v6.0.0:src/mixs/schema/mixs.yaml" \ | ||
| --new "GenomicsStandardsConsortium/mixs@main:src/mixs/schema/mixs.yaml" \ | ||
| --output-dir assets/diff_results \ | ||
| --mappings-dir assets/between_diff_mappings/6_to_pre_7 | ||
| ``` |
The manual diff-releases example uses @v6.0.0:src/mixs/schema/mixs.yaml, but earlier in this document v6.0.0 is described as having schema root model/schema/mixs.yaml (and the diff script’s own example uses that path). Please reconcile these so the CLI command is runnable for v6.0.0 (either change the --old path to model/schema/mixs.yaml for that tag/commit, or explain why src/mixs/schema/mixs.yaml exists at v6.0.0 in this repo).
Valid — fixed in 9e036ea. Changed to mixs6.0.0:model/schema/mixs.yaml (correct tag name and correct path for that version).
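The `--old`/`--new` arguments appear to follow an `owner/repo@ref:path` shape, and both problems raised here (wrong tag name, wrong path for that version) live in different parts of that string. The sketch below splits such a spec so each part can be checked on its own. It is a hypothetical helper inferred from the command shown above, not the parser that `diff-releases` actually uses.

```python
from typing import NamedTuple


class SchemaRef(NamedTuple):
    repo: str  # e.g. "GenomicsStandardsConsortium/mixs"
    ref: str   # tag, branch, or commit
    path: str  # schema path inside the repository at that ref


def parse_schema_ref(spec: str) -> SchemaRef:
    """Split 'owner/repo@ref:path' into its three parts (assumed format)."""
    repo, at, rest = spec.partition("@")
    ref, colon, path = rest.partition(":")
    if not (repo and at and ref and colon and path):
        raise ValueError(f"expected 'owner/repo@ref:path', got {spec!r}")
    return SchemaRef(repo, ref, path)


old = parse_schema_ref(
    "GenomicsStandardsConsortium/mixs@mixs6.0.0:model/schema/mixs.yaml"
)
new = parse_schema_ref(
    "GenomicsStandardsConsortium/mixs@main:src/mixs/schema/mixs.yaml"
)
print(old)
print(new)
```

Checking `old.ref` against the Tag column of the Release Timeline and `old.path` against the schema root documented for that version would catch both issues before the command is run.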
| ## Release Timeline | ||
|
|
||
| | Version | Date | Tag | | ||
| |---------|------|-----| | ||
| | MIxS5 | Feb 27, 2022 | MIxS5 | | ||
| | v6.0.0 | Mar 24, 2022 | mixs6.0.0 | | ||
| | v6.1.0 | Jul 5, 2022 | mixs6.1.0 | | ||
| | v6.1.1 | Oct 9, 2023 | mixs6.1.1 | | ||
| | v6.2.0 | Oct 18, 2023 | v6.2.0 | |
The document header states the v6.0.0 commit is “dated 2022-03-23”, but the Release Timeline lists v6.0.0 as “Mar 24, 2022”. If this is intentional (commit date vs release/tag date), please clarify in the header/timeline so readers understand which date each section refers to.
Duplicate of earlier comment on line 6. See reply there — the discrepancy is a timezone artifact (commit authored at UTC-10).
- Fix xls→xlsx typo in source field reference
- Add 9 missing settings (name, country, parameters, room_name, etc.)
- Correct settings usage analysis: distinguish structured_pattern vs string_serialization usage, fix unused count from 6 to 11
- Fix prefix table: xsd/shex/schema were inherited in v6.0.0 and explicitly added in main, not "unchanged"
- Fix CLI example: use correct tag (mixs6.0.0) and path (model/schema/) for v6.0.0

All claims verified against actual schema at commits 74744ee and 9a865a6.
NCIT (http://purl.obolibrary.org/obo/NCIT_) was added in main but not present in v6.0.0. Caught by Copilot review.
Summary
74744ee) and main branch (commit `9a865a63b`)
Contents
`version` vs `diff_old`/`diff_new`) and common pitfalls
Maintenance Note
The document includes guidance on when to regenerate it (major releases, significant schema changes) and what to update.
Resolves #845
Test plan
🤖 Generated with Claude Code