Looking for comments: implementing the DATS JSON schemas #1
Description
The proposed cross-cut metadata model, also known as DATS, is available as machine-readable JSON schemas. Instance files can be serialized as JSON, and linked data support is provided via one or several JSON-LD context files. We currently provide two distinct JSON-LD context files based on two complementary, community-driven vocabulary resources: (i) schema.org, and (ii) relevant OBO Foundry ontologies.
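To make the idea of one instance with several contexts concrete, here is a minimal sketch in Python. It is an illustration only: the instance, the `title` property, the helper `with_context`, and the term-to-IRI mappings are assumptions chosen for this example, and the actual DATS schemas and context files may define things differently.

```python
import json

# Hypothetical, minimal DATS-like instance (not validated against the real schemas).
instance = {"@type": "Dataset", "title": "Example dataset"}

# Illustrative term mappings only; the real DATS context files may differ.
SDO_CONTEXT = {
    "Dataset": "https://schema.org/Dataset",
    "title": "https://schema.org/name",
}
OBO_CONTEXT = {
    "Dataset": "http://purl.obolibrary.org/obo/IAO_0000100",
    "title": "http://purl.org/dc/terms/title",
}


def with_context(instance, context):
    """Return a copy of the instance with a JSON-LD @context attached."""
    doc = {"@context": context}
    doc.update(instance)
    return doc


# The same JSON instance, read through two different vocabularies.
sdo_doc = with_context(instance, SDO_CONTEXT)
obo_doc = with_context(instance, OBO_CONTEXT)

print(json.dumps(sdo_doc, indent=2))
print(json.dumps(obo_doc, indent=2))
```

The instance data itself is identical in both documents; only the `@context` changes, which is what lets a single JSON serialization serve both the discoverability and the interoperability use cases.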
Justification: we will use these two resources because: (a) they meet two different requirements, with schema.org enabling discoverability by major search engines and OBO Foundry facilitating interoperability with many biomedical databases; and (b) there is no single vocabulary that fulfills the requirements of the metadata elements.
Background: Schema.org (http://schema.org) is a collaborative, community activity with a mission to create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond. It is sponsored by Google, Microsoft, Yahoo and Yandex, and is already used by over 10 million sites to mark up their web pages and email messages. Anchoring the cross-cut metadata model to such a (potentially) powerful vocabulary is very valuable and serves the specific discoverability scope. However, although schema.org covers many topics, it is shallow when describing datasets, experiments, samples, etc. The Oxford team has contributed suggestions to extend schema.org, but the process is tightly controlled and centrally coordinated, as one would expect given the scope, and additions seem to be prioritised according to their cross-domain applicability. The Oxford team also contributes (by participating in and leading on) activities under the Bioschemas umbrella (http://bioschemas.org), which includes major data repositories, BD2K and ELIXIR resources, and is set to cover other digital objects beyond data. However, these ‘extensions’ may remain just that and will not necessarily be included in the general schema.org vocabulary; the process is still unclear. In any case, the scope of Bioschemas is also discoverability (especially if and when it becomes clear how these extensions will be added to or used by schema.org), and its vocabularies are not rich or deep.
Conclusion: the need to complement schema.org/Bioschemas with OBO Foundry ontologies, which ensure compatibility with models such as biolinks, remains. Relying on a second framework also allows us to test the reactivity and responsiveness of each community when term requests are sent as gaps are identified. This in turn makes it possible to devise key performance indicators that could be used to select one resource over another.
Future intent: This initial choice of two frameworks is by no means final. In fact, more JSON-LD contexts may be produced to support other needs, such as those found in clinical contexts (NCIT, LOINC, CDISC-RDF). Finally, it should be stressed that these frameworks are not mutually exclusive and in fact ought to be used together to maximize their effects and respective value.