Documentation: MIxS Term (linkML slot) specification documentation #944
jfy133 wants to merge 52 commits into GenomicsStandardsConsortium:main
I skimmed this and it looks like a fantastic starting point. I didn't see anything that I disagree with yet, and I'm sure we can add more over time. So I will read it again more carefully, and I am looking forward to advocating for it to be merged in! We should see if anything has come out of @only1chunts's related efforts about defining or clarifying the role of the different LinkML metaslots for MIxS terms/slots. @sierra-moxon and I have been talking about refining the definitions of LinkML metaslots, and this may serve as a contribution towards that.
Nice :D I look forward to @only1chunts's thoughts, and if they are mostly happy, we can move to a bigger discussion in one of the TWG/CIG meetings?
Woolly-at-EBI left a comment
This is really useful.
Most of my comments are minor.
| ### 8.2 All extension terms must be assigned the environment subset | ||
| A term (slot) defined in an extension (rather than a core checklist term) MUST be assigned to the 'Environment' section (subset). |
Clarification sought: as well as "core terms", there will be some terms that occur in both environmental and non-environmental use cases, but not in all use cases, so is it correct to say "must"?
For example, at ENA we are working on a "laboratory animal checklist". Yes, a laboratory is obviously an environment, but it is not quite like a pond or even a building; and when the subject is a mouse, okay, the mouse is a kind of environment too. Umm, my head is hurting.
I never actually understood why some terms seem to be repeated in the extensions, e.g., samp_name is in all the extensions, but isn't it also in the 'core' checklists? Unless this is an error? @turbomam or @mslarae13, would you happen to know?
Originally we had core terms. These were terms we anticipated being used across many (genomic) checklists.
Currently we have genomic checklists and environmental extensions. With the idea that sample metadata that is being submitted, e.g., to INSDC, would include terms from a checklist and one or more environmental extensions. However, the submission could be using just a 'genomic checklist'. Thus 'sample name' is the key term to have in any reported metadata submission.
| ### 10.2 Specifying units | ||
| Terms (slots) that require the use of a measurement unit SHOULD specify the types of units through a dedicated structured string pattern component. | ||
Agree, preferred_unit ought to be snake_case and lowercase.
The preferred_units do serve a useful purpose, in that an implementor can theoretically validate them. Many systems do hold the unit and the value separately to reduce complexity, but I get that people are worried about losing the units. (It would be a major and unwelcome short-term change for many implementors.)
Woolly-at-EBI left a comment
Agree with the changes following my suggestions. Thank you.
lschriml left a comment
It would be useful to combine similar sections of this document, and to separate the MIxS-specific and LinkML-specific sections.
| - [`modified_by`](https://linkml.io/linkml-model/latest/docs/modified_by/) | ||
| - [`last_updated_on`](https://linkml.io/linkml-model/latest/docs/last_updated_on/) | ||
| ### 15.2 Terms should record original author |
| created_by: orcid:0000-1234-1234-1234 # Erika Mustermann | ||
| ``` | ||
| ### 15.3 Term creation date should be recorded |
| The date on which the term (slot) was updated or modified SHOULD be recorded using the LinkML [`last_updated_on`](https://linkml.io/linkml-model/latest/docs/last_updated_on/) attribute. | ||
| ## 16 Importing terms from other standards |
These sections should be merged and reviewed.
| - [Darwin Core (DwC)](https://dwc.tdwg.org/) | ||
| ### 16.2 Imported external standards terms requirements |
| Minor modifications to the term (slot) structured comment (name) MAY be made to ensure compliance with MIxS naming conventions. | ||
| ## References |
Co-authored-by: lschriml <[email protected]>
…-standards-consortium-mixs into docs-slot-specifications
Pull request overview
Adds a new reference-style specification document describing how MIxS metadata terms (“slots”) should be designed in the LinkML schema, and updates MkDocs configuration to support footnotes used by the new document.
Changes:
- Add `src/docs/slot_specifications.md` with detailed guidance for MIxS slot design (naming, ranges, examples, provenance, etc.).
- Enable Markdown footnotes in `mkdocs.yml` to render the new document correctly.
- Minor cleanup in `mkdocs.yml` (removal of an extraneous line).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 16 comments.
| File | Description |
|---|---|
| src/docs/slot_specifications.md | Introduces a comprehensive slot/term specification guide (new documentation page). |
| mkdocs.yml | Enables footnotes extension and cleans YAML to support rendering the new doc. |
| ### 11.1 Range options should be valid LinkML types | ||
| See section [4](#4-data-types). |
Broken internal reference: "See section 4" points to an anchor that doesn't exist in this document (section 4 is "Attributes for MIxS terms"). Update the link target/text to the intended section (likely section 3 on range types).
| See section [4](#4-data-types). | |
| See section [3](#3-data-types). |
| A term that requires a specific value syntax or a structured string layout SHOULD use the `structured_pattern` slot attribute, and its pattern components SHOULD be predefined in the `settings:` section of the schema when they could theoretically be used more than once. | ||
| A slot MAY use `pattern:` attribute when XYZ <!-- TODO -->. |
This section still contains a placeholder ("when XYZ "). If this is intended to be merged as a reference specification, the TODO should be resolved or converted into an explicit "TBD" note with a tracking issue/link so readers don't treat it as normative guidance.
| A slot MAY use `pattern:` attribute when XYZ <!-- TODO -->. | |
| > [!NOTE] | |
| > TBD: The specific conditions under which a slot MAY use the `pattern:` attribute instead of `structured_pattern` are under discussion and will be documented in a future revision of this specification. See the project issue tracker for the latest status. |
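As an illustration of the `structured_pattern` and `settings:` guidance discussed above, here is a minimal sketch. The slot name `example_measurement` and the regular expressions assigned to `float` and `unit` are hypothetical placeholders, not the values used in the MIxS schema.

```yaml
# Hypothetical LinkML schema excerpt: slot name and regexes are placeholders
settings:
  # Reusable pattern components, defined once at schema level
  float: '[-+]?[0-9]*\.?[0-9]+'
  unit: '\S.*'
slots:
  example_measurement:          # hypothetical slot
    range: string
    structured_pattern:
      # {float} and {unit} are substituted from settings: because interpolated is true
      syntax: '^{float} {unit}$'
      interpolated: true
      partial_match: false
```

Defining the components once under `settings:` means that a change to, for example, the `float` pattern applies to every slot that interpolates it.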
| | `culture` | `cult_` | `cult_result_org` | | ||
| | `culture` | `cult_` | `cult_root_med` | | ||
| | `dissolved` | `diss_` | `diss_carb_dioxide` | | ||
| | `dissovled` | `diss_` | `diss_hydrogen` | |
Spelling: "dissovled" is misspelled; should be "dissolved".
| | `dissovled` | `diss_` | `diss_hydrogen` | | |
| | `dissolved` | `diss_` | `diss_hydrogen` | |
| - `integer` | ||
| - `float` | ||
| - `boolean` | ||
| - An '[enumeration](#145-enumerations)' (i.e., controlled vocabulary') predefined by MIxS (see top of the [schema](https://github.com/GenomicsStandardsConsortium/mixs/blob/main/src/mixs/schema/mixs.yaml#L28)). |
This bullet has mismatched quotes/parentheses ("An '[enumeration]...'" and "controlled vocabulary'") which renders oddly and may confuse readers. Clean up the quoting so the sentence reads unambiguously.
| - An '[enumeration](#145-enumerations)' (i.e., controlled vocabulary') predefined by MIxS (see top of the [schema](https://github.com/GenomicsStandardsConsortium/mixs/blob/main/src/mixs/schema/mixs.yaml#L28)). | |
| - An [enumeration](#145-enumerations) (i.e., controlled vocabulary) predefined by MIxS (see top of the [schema](https://github.com/GenomicsStandardsConsortium/mixs/blob/main/src/mixs/schema/mixs.yaml#L28)). |
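For the range options listed above, a sketch of how a slot might reference a basic LinkML type versus an enumeration. All slot and enum names here are hypothetical and do not correspond to actual MIxS terms.

```yaml
# Hypothetical slots and enum illustrating range choices
slots:
  example_count:                # hypothetical; whole numbers only
    range: integer
  example_flag:                 # hypothetical; true/false values
    range: boolean
  example_oxygen_relation:      # hypothetical; restricted to a controlled vocabulary
    range: example_oxygen_enum
enums:
  example_oxygen_enum:          # hypothetical enumeration
    permissible_values:
      aerobe:
      anaerobe:
```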
| ### 11.4 Preferred units | ||
| Terms (slots) that record a measurement SHOULD specify the preferred unit of measurement for the term (slot) within a LinkmL `annotation` slot sub-attribute called `Preferred_unit:`. |
LinkML attribute name: this refers to a LinkML annotation slot attribute, but LinkML uses annotations (plural) and MIxS uses an annotations: map with a Preferred_unit key. Also "LinkmL" capitalization looks accidental; fixing this prevents readers from copying invalid YAML.
| Terms (slots) that record a measurement SHOULD specify the preferred unit of measurement for the term (slot) within a LinkmL `annotation` slot sub-attribute called `Preferred_unit:`. | |
| Terms (slots) that record a measurement SHOULD specify the preferred unit of measurement for the term (slot) within a LinkML `annotations` slot attribute using a `Preferred_unit` key. |
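A sketch of the `annotations` map carrying a preferred unit, as described in the suggested wording. The slot name is hypothetical; the `Preferred_unit` key follows the current capitalisation described in this thread, although reviewers have suggested a lowercase `preferred_unit`.

```yaml
slots:
  example_temperature:          # hypothetical slot
    range: string
    annotations:
      # annotations is a map of tag: value pairs; the unit is stored as a plain string
      Preferred_unit: degree Celsius
```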
| - Valid example: `library size`. | ||
| - Invalid examples: | ||
| - `Library size` (capitalisation of first character). | ||
| - `Library Size` (capitalisation of of all words). |
Typo: "capitalisation of of all words" has a duplicated "of".
| - `Library Size` (capitalisation of of all words). | |
| - `Library Size` (capitalisation of all words). |
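A sketch of the naming examples above applied to a slot definition. It assumes that the rule concerns the slot's LinkML `title` (the human-readable term name) and that the structured comment name is `lib_size`; both are assumptions for illustration.

```yaml
slots:
  lib_size:                     # structured comment name (assumed)
    title: library size         # valid: all lowercase
    # title: Library size       # invalid: first character capitalised
    # title: Library Size       # invalid: every word capitalised
```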
| ### 7.1 Minimum number of examples | ||
| There MUST have minimum of 1 examples for a term (slot). |
Grammar/pluralization: "There MUST have minimum of 1 examples" should be rewritten (e.g., "There MUST be a minimum of 1 example"). As written it's ungrammatical and mixes singular/plural.
| There MUST have minimum of 1 examples for a term (slot). | |
| There MUST be a minimum of 1 example for a term (slot). |
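To illustrate the minimum-example requirement, a sketch using the LinkML `examples` list, where each entry has a `value` and an optional `description`. The slot name and example value are hypothetical.

```yaml
slots:
  example_temperature:          # hypothetical slot
    examples:
      - value: 25 degree Celsius   # at least one entry is required
        description: Illustrative value only
```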
| > The guidance in this section regarding `subset`s may be replaced with the use of `slot-group` in the future. | ||
| ### 8.1 All core terms must be assigned a subset | ||
| All core checklist terms (slot) MUST be assigned to a section (subset). | ||
| ### 8.2 All extension terms must be assigned the environment subset | ||
| A term (slot) defined in an extension (rather than a core checklist term) MUST be assigned to the 'Environment' section (subset). |
LinkML terminology mismatch: this section refers to assigning a slot to a "subset", but in the MIxS schema slots use the in_subset attribute (and subsets: defines subset objects). Consider updating wording to explicitly instruct using in_subset on the slot to avoid implementers looking for a non-existent subset slot attribute.
| > The guidance in this section regarding `subset`s may be replaced with the use of `slot-group` in the future. | |
| ### 8.1 All core terms must be assigned a subset | |
| All core checklist terms (slot) MUST be assigned to a section (subset). | |
| ### 8.2 All extension terms must be assigned the environment subset | |
| A term (slot) defined in an extension (rather than a core checklist term) MUST be assigned to the 'Environment' section (subset). | |
| > The guidance in this section currently assumes that term (slot) membership in a section is modelled using the LinkML `in_subset` attribute on the slot, pointing to subset identifiers defined under the schema's top-level `subsets:` section. This mechanism MAY be replaced with the use of `slot_group` in the future. | |
| ### 8.1 All core terms must be assigned to a subset via `in_subset` | |
| All core checklist terms (slots) MUST be assigned to a section subset via the slot's `in_subset` attribute, referring to an appropriate subset defined under `subsets:`. | |
| ### 8.2 All extension terms must be assigned the environment subset via `in_subset` | |
| A term (slot) defined in an extension (rather than a core checklist term) MUST have its slot `in_subset` attribute include the 'Environment' section subset (as defined under `subsets:`). |
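A sketch of the `in_subset` mechanism described in the suggested wording: the subset is declared once under `subsets:` and referenced from the slot. The subset key `environment` and the slot name are assumed here for illustration; the actual identifiers should be checked against the schema.

```yaml
subsets:
  environment:                  # assumed subset identifier
    description: Terms describing the sample's environment
slots:
  example_extension_term:       # hypothetical extension slot
    in_subset:
      - environment             # must match a key under subsets:
```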
| syntax: ^{particulate_matter_name};{float} {unit}$ | ||
| ``` | ||
| ### 11.4 Preferred units |
Section numbering: this heading is labeled "### 11.4" but there is already a previous "### 11.4" above (Specifying units). Renumbering will avoid confusing cross-references/anchors.
| ### 11.4 Preferred units | |
| ### 11.5 Preferred units |
| The original author or contributors SHOULD be referred to by a stable ID such as an [ORCID](https://orcid.org/), with the person's name included as a comment. | ||
| ```yaml | ||
| created_by: orcid:0000-1234-1234-1234 # Erika Mustermann |
The example for recording the modification author uses created_by, but this section is describing modified_by. This looks like a copy/paste error and could lead to incorrect provenance fields being added to terms.
| created_by: orcid:0000-1234-1234-1234 # Erika Mustermann | |
| modified_by: orcid:0000-1234-1234-1234 # Erika Mustermann |
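Putting the provenance attributes from section 15 together, a sketch of a slot carrying both creation and modification metadata via the LinkML `created_by`, `created_on`, `modified_by` and `last_updated_on` metaslots. The slot name and dates are placeholders, the ORCID is the placeholder used in the examples above, and the `orcid:` prefix is assumed to be declared in the schema's `prefixes:`.

```yaml
slots:
  example_term:                                  # hypothetical slot
    created_by: orcid:0000-1234-1234-1234        # Erika Mustermann, original author
    created_on: '2024-01-15T00:00:00'            # placeholder date
    modified_by: orcid:0000-1234-1234-1234       # Erika Mustermann, most recent modifier
    last_updated_on: '2024-06-01T00:00:00'       # placeholder date
```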
I requested Copilot reviews here because I’m helping triage/review MIxS PRs, not because I authored them. If you have the same GitHub permissions and Copilot access, you can do the same on any PR. You’ll know you’re enabled if you can see the Copilot review option in the PR review UI or related actions; if not, you likely need org/repo access and a Copilot seat or feature enablement from the repo or GitHub org admins. |
We've requested a GitHub Copilot review on this PR as part of a pass across all open MIxS PRs. Copilot catches things like unused imports, resource leaks, and naming inconsistencies; it's a lightweight first pass, not a substitute for human review. No action needed from you unless Copilot flags something you agree with.
This is a natural extension of PR #943.
Instead of providing example slots that newcomers can use as templates when writing new slots, this is meant to act as a precise reference (as far as possible) for exactly how a slot should be designed.
I have based the structure (e.g., the numbering, which could likely be automated by a website rendering engine rather than defined manually) on another bioinformatics community project I am heavily involved in (example).
This is not yet finished and will likely need extensive community input; however, I am placing it here to kick-start a conversation.
I will write based on my impression of the MIxS LinkML schema.
Warning
This page is entirely based on the experiences of a novice user, and will likely require heavy editing by experts