
Documentation: MIxS Term (linkML slot) specification documentation #944

Open
jfy133 wants to merge 52 commits into GenomicsStandardsConsortium:main from jfy133:docs-slot-specifications


@jfy133 (Collaborator) commented Apr 28, 2025

This is a natural extension of PR #943.

Instead of providing example slots that newcomers can use as templates when writing/preparing new slots, this is meant to act as a precise reference (as far as possible) for exactly how a slot should be designed.

I have based the structure (e.g., the section numbering, which could likely be automated by a website rendering engine instead of being defined manually) on that of another bioinformatics community project I am heavily involved in (example).

This is not yet finished and will likely need substantial community input; however, I am placing it here to kick-start a conversation.

I will write based on my impression of the MIxS LinkML schema.

Warning

This page is entirely based on the experiences of a novice user, and will likely require heavy editing by experts.

jfy133 added a commit to jfy133/genomics-standards-consortium-mixs that referenced this pull request Apr 28, 2025
jfy133 marked this pull request as ready for review May 5, 2025 19:37

@turbomam (Member) commented May 6, 2025

I skimmed this and it looks like a fantastic starting point. I didn't see anything that I disagree with yet and I'm sure we can add more over time. So I will read it again more carefully and am looking forward to advocating for it to be merged in!

We should see if anything has come out of @only1chunts's related efforts to define or clarify the role of the different LinkML metaslots for MIxS terms/slots.

@sierra-moxon and I have been talking about refining the definitions of LinkML metaslots, and this may serve as a contribution towards that.

@jfy133 (Collaborator, Author) commented May 7, 2025

nice :D I look forward to @only1chunts's thoughts, and if they're mostly happy, we can move to a bigger discussion in one of the TWG/CIG meetings?

@Woolly-at-EBI (Collaborator) left a comment

This is really useful. Most of my comments are minor.


### 8.2 All extension terms must be assigned the environment subset

A term (slot) defined in an extension (rather than a core checklist term) MUST be assigned to the 'Environment' section (subset).

Collaborator commented:

Clarification sought: as well as "core terms", there will be some terms that occur in both environmental and non-environmental use cases, but not in all use cases, so is it correct to say "must"?
For example, at ENA we are working on a "laboratory animal checklist". Yes, a laboratory is obviously an environment, but it is not quite like a pond or even a building; and when the subject is a mouse, okay, the mouse is a kind of environment too. Umm, my head is hurting.

Collaborator (Author) commented:

I never actually understood why some terms seem to be repeated in the extensions: e.g., `samp_name` is in all the extensions, but it is also in the 'core' checklists? Unless this is an error? @turbomam or @mslarae13, would you happen to know?

Member commented:

Originally we had core terms. These were terms we anticipated being used across many (genomic) checklists.
Currently we have genomic checklists and environmental extensions. The idea is that sample metadata being submitted, e.g., to INSDC, would include terms from a checklist and one or more environmental extensions. However, a submission could use just a 'genomic checklist'. Thus 'sample name' is the key term to have in any reported metadata submission.

### 10.2 Specifying units

Terms (slots) that require the use of a measurement unit SHOULD specify the types of units through a dedicated structured string pattern component.
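
As a rough illustration of this rule, a measurement term might declare its unit through a `{unit}` component in its `structured_pattern`. This is a sketch only: the slot name `example_sample_volume` is hypothetical, and it assumes that `float` and `unit` are pattern components already defined under the schema's `settings:` section.

```yaml
# Hypothetical slot, for illustration only.
slots:
  example_sample_volume:
    range: string
    structured_pattern:
      # {float} and {unit} are assumed to be defined under the schema-level settings:
      syntax: "^{float} {unit}$"
      interpolated: true   # expand the {...} components from settings before matching
```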

Collaborator commented:

Agree `preferred_unit` ought to be snake_case and lower case.
The preferred units do serve a useful purpose, in that an implementer can theoretically validate them. Many systems hold the unit and the value separately to reduce complexity, but I get that people are worried about losing the units. (It would be a major and unwelcome short-term change for many implementers.)

jfy133 changed the title Documentation: create a specifications document on how to write a MIxS LinkML slot Documentation: MIxS Term (linkML slot) specification documentation Sep 15, 2025

@Woolly-at-EBI (Collaborator) left a comment

Agree with the changes made following my suggestions. Thank you.

@lschriml (Member) left a comment

It would be useful to combine sections of this document that are similar, and to separate the MIxS-specific and LinkML-specific sections.

- [`modified_by`](https://linkml.io/linkml-model/latest/docs/modified_by/)
- [`last_updated_on`](https://linkml.io/linkml-model/latest/docs/last_updated_on/)

### 15.2 Terms should record original author

Member commented:

This is LinkML, not MIxS

```yaml
created_by: orcid:0000-1234-1234-1234 # Erika Mustermann
```

### 15.3. Term creation date should be recorded

Member commented:

This is LinkML, not MIxS

Member commented:

Also for the sections below.


The date in which the term (slot) was updated or modified SHOULD be recorded using the LinkML [`last_updated_on`](https://linkml.io/linkml-model/latest/docs/last_updated_on/) attribute.
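
A sketch of how the provenance attributes discussed in this part of the document (creator, creation date, modifier, last update) might be filled in on a single slot. The slot name, ORCID identifiers and timestamps are placeholders, and the `orcid:` prefix is assumed to be declared in the schema's `prefixes:` section.

```yaml
# Placeholder values, for illustration only.
slots:
  example_term:
    created_by: orcid:0000-1234-1234-1234    # original author (Erika Mustermann)
    created_on: "2024-01-01T00:00:00"        # placeholder creation timestamp
    modified_by: orcid:0000-5678-5678-5678   # most recent editor (placeholder)
    last_updated_on: "2024-06-01T00:00:00"   # placeholder modification timestamp
```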

## 16 Importing terms from other standards

Member commented:

this is covered earlier

Member commented:

These sections should be merged and reviewed.


- [Darwin Core (DwC)](https://dwc.tdwg.org/)

### 16.2 Imported external standards terms requirements

Member commented:

to be reviewed


Minor modifications to the term (slot) structured comment (name) MAY be made to ensure compliance with MIxS naming conventions.

## References

Member commented:

not part of MIxS

Copilot AI (Contributor) left a comment

Pull request overview

Adds a new reference-style specification document describing how MIxS metadata terms (“slots”) should be designed in the LinkML schema, and updates MkDocs configuration to support footnotes used by the new document.

Changes:

  • Add src/docs/slot_specifications.md with detailed guidance for MIxS slot design (naming, ranges, examples, provenance, etc.).
  • Enable Markdown footnotes in mkdocs.yml to render the new document correctly.
  • Minor cleanup in mkdocs.yml (removal of an extraneous line).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 16 comments.

| File | Description |
| --- | --- |
| src/docs/slot_specifications.md | Introduces a comprehensive slot/term specification guide (new documentation page). |
| mkdocs.yml | Enables footnotes extension and cleans YAML to support rendering the new doc. |



### 11.1 Range options should be valid LinkML types

See section [4](#4-data-types).

Copilot AI commented Mar 11, 2026:

Broken internal reference: "See section 4" points to an anchor that doesn't exist in this document (section 4 is "Attributes for MIxS terms"). Update the link target/text to the intended section (likely section 3 on range types).

Suggested change
See section [4](#4-data-types).
See section [3](#3-data-types).


A term that requires a specific value syntax or a structured string layout SHOULD use the `structured_pattern` slot attribute, where the pattern components SHOULD be predefined in the `settings:` section of the schema when they could be used more than once.
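
A minimal sketch of a pattern component defined once under `settings:` and reused by more than one slot. The regular expressions and slot names below are hypothetical placeholders, not the definitions used in the MIxS schema.

```yaml
# Hypothetical schema excerpt; the regexes and slot names are placeholders.
settings:
  float: "[-+]?[0-9]*\\.?[0-9]+"   # component defined once
  unit: "[a-zA-Z]+"                # placeholder; real unit strings may contain spaces

slots:
  example_depth:
    range: string
    structured_pattern:
      syntax: "^{float} {unit}$"   # component reused here
      interpolated: true
  example_temperature:
    range: string
    structured_pattern:
      syntax: "^{float} {unit}$"   # and reused again here
      interpolated: true
```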

A slot MAY use `pattern:` attribute when XYZ <!-- TODO -->.

Copilot AI commented Mar 11, 2026:

This section still contains a placeholder ("when XYZ "). If this is intended to be merged as a reference specification, the TODO should be resolved or converted into an explicit "TBD" note with a tracking issue/link so readers don't treat it as normative guidance.

Suggested change
A slot MAY use `pattern:` attribute when XYZ <!-- TODO -->.
> [!NOTE]
> TBD: The specific conditions under which a slot MAY use the `pattern:` attribute instead of `structured_pattern` are under discussion and will be documented in a future revision of this specification. See the project issue tracker for the latest status.

| `culture` | `cult_` | `cult_result_org` |
| `culture` | `cult_` | `cult_root_med` |
| `dissolved` | `diss_` | `diss_carb_dioxide` |
| `dissovled` | `diss_` | `diss_hydrogen` |

Copilot AI commented Mar 11, 2026:

Spelling: "dissovled" is misspelled; should be "dissolved".

Suggested change
| `dissovled` | `diss_` | `diss_hydrogen` |
| `dissolved` | `diss_` | `diss_hydrogen` |

- `integer`
- `float`
- `boolean`
- An '[enumeration](#145-enumerations)' (i.e., controlled vocabulary') predefined by MIxS (see top of the [schema](https://github.com/GenomicsStandardsConsortium/mixs/blob/main/src/mixs/schema/mixs.yaml#L28)).
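
For illustration, the range options listed above might be used as in the following sketch. All names are hypothetical; the enumeration is declared under the schema's top-level `enums:` section and then referenced by name in `range:`.

```yaml
# Hypothetical names, for illustration only.
enums:
  ExampleYesNoEnum:
    permissible_values:
      "yes": {}   # quoted so YAML 1.1 parsers do not read yes/no as booleans
      "no": {}

slots:
  example_replicate_count:
    range: integer
  example_is_control:
    range: boolean
  example_yes_no_term:
    range: ExampleYesNoEnum
```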

Copilot AI commented Mar 11, 2026:

This bullet has mismatched quotes/parentheses ("An '[enumeration]...'" and "controlled vocabulary'") which renders oddly and may confuse readers. Clean up the quoting so the sentence reads unambiguously.

Suggested change
- An '[enumeration](#145-enumerations)' (i.e., controlled vocabulary') predefined by MIxS (see top of the [schema](https://github.com/GenomicsStandardsConsortium/mixs/blob/main/src/mixs/schema/mixs.yaml#L28)).
- An [enumeration](#145-enumerations) (i.e., controlled vocabulary) predefined by MIxS (see top of the [schema](https://github.com/GenomicsStandardsConsortium/mixs/blob/main/src/mixs/schema/mixs.yaml#L28)).


### 11.4 Preferred units

Terms (slots) that record a measurement SHOULD specify the preferred unit of measurement for the term (slot) within a LinkmL `annotation` slot sub-attribute called `Preferred_unit:`.
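
A hedged sketch of how a `Preferred_unit` annotation could be written, using the LinkML `annotations` map (plural) in its expanded `tag`/`value` form. The slot name and unit are hypothetical.

```yaml
# Hypothetical slot, for illustration only.
slots:
  example_depth:
    range: string
    annotations:
      Preferred_unit:
        tag: Preferred_unit
        value: meter
```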

Copilot AI commented Mar 11, 2026:

LinkML attribute name: this refers to a LinkML annotation slot attribute, but LinkML uses annotations (plural) and MIxS uses an annotations: map with a Preferred_unit key. Also "LinkmL" capitalization looks accidental; fixing this prevents readers from copying invalid YAML.

Suggested change
Terms (slots) that record a measurement SHOULD specify the preferred unit of measurement for the term (slot) within a LinkmL `annotation` slot sub-attribute called `Preferred_unit:`.
Terms (slots) that record a measurement SHOULD specify the preferred unit of measurement for the term (slot) within a LinkML `annotations` slot attribute using a `Preferred_unit` key.

- Valid example: `library size`.
- Invalid examples:
- `Library size` (capitalisation of first character).
- `Library Size` (capitalisation of of all words).
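
Assuming this rule concerns the human-readable `title` of a slot (as opposed to its snake_case structured comment name), a compliant definition could look like the following sketch; `lib_size` is used here only as an illustration.

```yaml
slots:
  lib_size:
    title: library size     # valid: all lowercase
    # title: Library size   # invalid: first character capitalised
    # title: Library Size   # invalid: all words capitalised
```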

Copilot AI commented Mar 11, 2026:

Typo: "capitalisation of of all words" has a duplicated "of".

Suggested change
- `Library Size` (capitalisation of of all words).
- `Library Size` (capitalisation of all words).


### 7.1 Minimum number of examples

There MUST have minimum of 1 examples for a term (slot).
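
A sketch of a term carrying the single required example, using the LinkML `examples` list of `value` entries; the slot name and value are hypothetical.

```yaml
# Hypothetical slot, for illustration only.
slots:
  example_sample_volume:
    range: string
    examples:
      - value: "5 liter"   # at least one example is required
```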

Copilot AI commented Mar 11, 2026:

Grammar/pluralization: "There MUST have minimum of 1 examples" should be rewritten (e.g., "There MUST be a minimum of 1 example"). As written it's ungrammatical and mixes singular/plural.

Suggested change
There MUST have minimum of 1 examples for a term (slot).
There MUST be a minimum of 1 example for a term (slot).

Comment on lines +251 to +259
> The guidance in this section regarding `subset`s may be replaced with the use of `slot-group` in the future.

### 8.1 All core terms must be assigned a subset

All core checklist terms (slot) MUST be assigned to a section (subset).

### 8.2 All extension terms must be assigned the environment subset

A term (slot) defined in an extension (rather than a core checklist term) MUST be assigned to the 'Environment' section (subset).
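
A minimal sketch of an extension term placed in the environment section through the slot's `in_subset` attribute. It assumes a subset named `environment` is declared under the schema's top-level `subsets:`; the slot name and the subset description are hypothetical.

```yaml
# Hypothetical excerpt, for illustration only.
subsets:
  environment:
    description: Terms describing the environment of the sample.   # hypothetical wording

slots:
  example_extension_term:
    range: string
    in_subset:
      - environment
```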

Copilot AI commented Mar 11, 2026:

LinkML terminology mismatch: this section refers to assigning a slot to a "subset", but in the MIxS schema slots use the in_subset attribute (and subsets: defines subset objects). Consider updating wording to explicitly instruct using in_subset on the slot to avoid implementers looking for a non-existent subset slot attribute.

Suggested change
> The guidance in this section regarding `subset`s may be replaced with the use of `slot-group` in the future.
### 8.1 All core terms must be assigned a subset
All core checklist terms (slot) MUST be assigned to a section (subset).
### 8.2 All extension terms must be assigned the environment subset
A term (slot) defined in an extension (rather than a core checklist term) MUST be assigned to the 'Environment' section (subset).
> The guidance in this section currently assumes that term (slot) membership in a section is modelled using the LinkML `in_subset` attribute on the slot, pointing to subset identifiers defined under the schema's top-level `subsets:` section. This mechanism MAY be replaced with the use of `slot_group` in the future.
### 8.1 All core terms must be assigned to a subset via `in_subset`
All core checklist terms (slots) MUST be assigned to a section subset via the slot's `in_subset` attribute, referring to an appropriate subset defined under `subsets:`.
### 8.2 All extension terms must be assigned the environment subset via `in_subset`
A term (slot) defined in an extension (rather than a core checklist term) MUST have its slot `in_subset` attribute include the 'Environment' section subset (as defined under `subsets:`).

```yaml
syntax: ^{particulate_matter_name};{float} {unit}$
```

### 11.4 Preferred units

Copilot AI commented Mar 11, 2026:

Section numbering: this heading is labeled "### 11.4" but there is already a previous "### 11.4" above (Specifying units). Renumbering will avoid confusing cross-references/anchors.

Suggested change
### 11.4 Preferred units
### 11.5 Preferred units

Original author or contributors SHOULD be referred to by a stable ID such as an [ORCID](https://orcid.org/), with the person's name included as a comment.

```yaml
created_by: orcid:0000-1234-1234-1234 # Erika Mustermann
```

Copilot AI commented Mar 11, 2026:

The example for recording the modification author uses created_by, but this section is describing modified_by. This looks like a copy/paste error and could lead to incorrect provenance fields being added to terms.

Suggested change
created_by: orcid:0000-1234-1234-1234 # Erika Mustermann
modified_by: orcid:0000-1234-1234-1234 # Erika Mustermann

@turbomam (Member)

I requested Copilot reviews here because I’m helping triage/review MIxS PRs, not because I authored them. If you have the same GitHub permissions and Copilot access, you can do the same on any PR. You’ll know you’re enabled if you can see the Copilot review option in the PR review UI or related actions; if not, you likely need org/repo access and a Copilot seat or feature enablement from the repo or GitHub org admins.

@turbomam (Member)

We've requested a GitHub Copilot review on this PR as part of a pass across all open MIxS PRs. Copilot catches things like unused imports, resource leaks, and naming inconsistencies; it's a lightweight first pass, not a substitute for human review. No action is needed from you unless Copilot flags something you agree with.


6 participants