Add dataset publishing and citation content #57
Merged
5 commits
266b8a7 Add guidelines on citing and publishing datasets (vmartinez-cu)
44453e7 Add images used in a couple data managment guides (vmartinez-cu)
c88603d Minor fixes to typos and file paths (vmartinez-cu)
d6d6555 Merge branch 'main' into publishing-datasets (bourque)
a36a118 Add word for codespell to ignore in toml (vmartinez-cu)
69 changes: 69 additions & 0 deletions
docs/source/data_management/citing_and_publishing_datasets.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,69 @@ | ||
| # Publishing and Citing Datasets | ||
|
|
||
| Guidelines for making datasets publicly available, creating DOIs, and properly citing datasets. | ||
|
|
||
| ## Purpose | ||
|
|
||
| This guideline supports Data Systems workflows by ensuring datasets referenced in publications are | ||
| openly accessible, properly identified with DOIs, and cited according to community standards. It | ||
| aligns with publisher and funder policies and promotes scientific reproducibility. | ||
|
|
||
| ## How to Publish and Cite Datasets | ||
|
|
||
| **Make Data Open and Accessible** | ||
| Ensure datasets associated with publications are stored in publicly accessible, machine-readable formats. | ||
|
|
||
| **Create a DOI and Landing Page** | ||
| Digital Object Identifiers (DOIs) are machine-readable identifiers that resolve to information about a resource. | ||
| Researchers can have a persistent identifier of their own, an ORCID; see https://orcid.org/. Publishers now | ||
| generally require DOIs for data referenced in publications, and often ORCIDs as well. | ||
| - See [Digital Object Identifiers](digital_object_identifiers.md) for an introduction to DOIs. | ||
| - See [Creating a DOI via CU Libraries and DataCite](creating_a_doi.md) for a quick start on creating a DOI and a | ||
| landing page for a dataset. | ||
| - LASP could build resources to create and manage DOIs and associated landing pages. | ||
|
|
||
| **Cite Datasets in Publications** | ||
| Follow established data citation principles to ensure datasets are properly cited in scholarly works. Reference | ||
| the [Force 11 Joint Declaration of Data Citation Principles](https://www.force11.org/datacitationprinciples) and | ||
| follow practices described in the [ESIP Dataset Citation Guidelines](https://doi.org/10.6084/m9.figshare.8441816). | ||
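As a rough illustration of the kind of elements these citation guidelines emphasize (creators, year, title, version, publisher, and a resolvable DOI), the sketch below assembles a citation string from a few metadata fields. The class name, field names, element order, and example values are assumptions made for illustration; they are not a format prescribed by ESIP, Force 11, or any publisher, so check the target journal's style before relying on it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetCitation:
    """Hypothetical container for the elements of a dataset citation."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    doi: str                      # bare DOI name, e.g. "10.1234/example"
    version: Optional[str] = None


def format_citation(c: DatasetCitation) -> str:
    """Join the citation elements into one line, ending with the DOI as a URL."""
    authors = ", ".join(c.creators)
    parts = [f"{authors} ({c.year})", c.title]
    if c.version:
        parts.append(f"Version {c.version}")
    parts.append(c.publisher)
    return ". ".join(parts) + f". https://doi.org/{c.doi}"


example = DatasetCitation(
    creators=["Doe, J.", "Roe, R."],        # placeholder names
    year=2020,
    title="Example Irradiance Dataset",     # placeholder title
    publisher="LASP",
    doi="10.1234/example",                  # placeholder DOI, not registered
    version="2",
)
print(format_citation(example))
```

Writing the DOI as a full `https://doi.org/` URL keeps the citation resolvable when it is copied out of a reference list.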
|
|
||
| **Understand Publisher Requirements** | ||
| Ensure that DOIs and ORCIDs are included as required by publishers to maintain compliance with submission guidelines. | ||
|
|
||
| ## Options | ||
|
|
||
| There are several options for publishing datasets: | ||
|
|
||
| 1. **CU Libraries and DataCite DOI Creation** | ||
| Researchers can create DOIs and landing pages for datasets using CU Libraries' integration with DataCite. | ||
|
|
||
| 2. **CU Scholar Hosting** | ||
| CU Scholar can host articles, reports, and datasets of limited size. CU Scholar prefers to generate and manage DOIs | ||
| for hosted datasets. | ||
|
|
||
| 3. **LASP DOI Management (Future Direction)** | ||
| LASP can develop internal resources for creating and managing DOIs and dataset landing pages, streamlining the | ||
| process for LASP-affiliated data products. | ||
|
|
||
| 4. **External Repositories** | ||
| For larger datasets or specialized data types, external repositories that support DOI assignment can be considered. | ||
|
|
||
| ## Useful Links | ||
|
|
||
| - [Creating a DOI via CU Libraries and DataCite](creating_a_doi.md) | ||
| - [CU Scholar](https://scholar.colorado.edu/about) | ||
| - [Force 11 Joint Declaration of Data Citation Principles](https://www.force11.org/datacitationprinciples) | ||
| - [ESIP Dataset Citation Guidelines](https://doi.org/10.6084/m9.figshare.8441816) | ||
| - [Zenodo DOI Citation Guide](https://doi.org/10.5281/zenodo.1451971) | ||
| - [Data Citation Roadmap (Scholarly Repositories)](https://www.biorxiv.org/content/biorxiv/early/2017/10/09/097196.full.pdf) | ||
| - [ORCID](https://orcid.org/) | ||
| - [DOIs for SORCE Data Products](https://confluence.lasp.colorado.edu/pages/viewpage.action?pageId=21464459) | ||
| (Confluence) | ||
|
|
||
| ## Acronyms | ||
|
|
||
| - **DOI** = Digital Object Identifier | ||
| - **ORCID** = Open Researcher and Contributor ID | ||
| - **ESIP** = Earth Science Information Partners | ||
|
|
||
| Credit: Content taken from a Confluence guide written by Anne Wilson and Shawn Polson. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,89 @@ | ||
| # Creating a DOI via CU Libraries and DataCite | ||
|
|
||
| As of 2018, CU Libraries is a member of DataCite. Through this membership, LASP can mint and | ||
| register DOIs for datasets housed in our repositories, enabling data to be persistently identified, | ||
| accessed, and cited. | ||
|
|
||
| This guide outlines how to assign Digital Object Identifiers (DOIs) to datasets using this | ||
| membership, including how to submit a request, the metadata requirements, and long-term | ||
| responsibilities. | ||
|
|
||
| ## Purpose | ||
|
|
||
| This guideline supports LASP’s data publishing workflows by enabling the creation and registration | ||
| of persistent identifiers (DOIs) for datasets using CU resources. These identifiers help ensure | ||
| long-term access, discoverability, and proper citation of data. | ||
|
|
||
| ## How to create a DOI | ||
|
|
||
| Dataset DOIs should resolve to a dataset landing page that provides information about the dataset, such as where it | ||
| can be accessed. CU Libraries automatically generates a generic landing page populated with the high-level dataset | ||
| metadata provided in the DOI request form. CU Scholar, another CU resource, instead requires data providers to | ||
| supply a reference to an existing landing page when creating a DOI. | ||
|
|
||
| Note that the number of DOIs allocated to LASP is limited. | ||
|
|
||
| 1. **Submit a Request** | ||
| - Researchers: File a Jira issue with type "DOI" in the [Data Management Jira project](https://jira.lasp.colorado.edu/projects/DATAMAN/). | ||
|
|
||
| 2. **Prepare Required Metadata** | ||
| - Work with the Data Management team to ensure proper metadata and landing page are available. | ||
| - Minimum required metadata for DOI creation: | ||
| - URL of the landing page (not the dataset itself) | ||
| - Creators (list of names) | ||
| - Title | ||
| - Publisher (typically LASP or a project within LASP) | ||
| - ResourceType (usually `dataset`) | ||
| - DataCite supports additional metadata. Those properties are described here: https://support.datacite.org/docs/metadata-quality. | ||
|
|
||
| 3. **Create DOI via DataCite** | ||
| - Data Management team logs into [doi.datacite.org](https://doi.datacite.org/) using the `CUB.LASP` repository ID. | ||
| - Click "DOIs" → "Create DOI (Form)" | ||
| - Use the form to enter metadata. See full field descriptions at: [DataCite Field Descriptions](https://support.datacite.org/docs/field-descriptions-for-form) | ||
| - For developers: There is an [API](https://support.datacite.org/docs/api) that supports the full metadata schema. | ||
|
|
||
| 4. **Maintain DOI Metadata** | ||
| - Keep DOI metadata up to date in the [DataCite Metadata Store](https://support.datacite.org/docs) | ||
| - If a dataset is moved, update the registry. | ||
| - If a dataset is removed, maintain a “tombstone” landing page. | ||
|
|
||
| 5. **Follow DOI Best Practices** | ||
| - Use landing pages (not direct links to datasets). | ||
| - Maintain metadata quality and completeness as information changes. | ||
| - See [Metadata Guidelines](metadata.md) for dataset metadata requirements. | ||
|
|
||
| 6. **Adhere to Roles and Responsibilities** | ||
|
|
||
| LASP (as a DataCite Client) must meet responsibilities outlined in: | ||
| - [DataCite Community Responsibility](https://support.datacite.org/docs/community-responsibility) | ||
| - [Data Citation Roadmap for Scholarly Data Repositories](https://www.biorxiv.org/content/biorxiv/early/2017/10/09/097196.full.pdf) | ||
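For developers, the metadata check in step 2 and the request body for step 3 can be sketched in a few lines. This is a minimal sketch, not the Data Management team's tooling: the dictionary keys, the helper names, and the example values are hypothetical, and the payload layout reflects one reading of the DataCite REST API's JSON:API request format, which should be confirmed against the DataCite API documentation (DataCite may also require properties beyond the five listed in step 2). The sketch makes no network calls.

```python
import json
from typing import Any, Dict, List

# The five minimum fields listed in step 2 (key names are hypothetical).
REQUIRED_FIELDS = ("url", "creators", "title", "publisher", "resource_type")


def missing_fields(metadata: Dict[str, Any]) -> List[str]:
    """Return the required fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if not metadata.get(f)]


def build_payload(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Build a JSON:API-style request body; raise ValueError if fields are missing."""
    missing = missing_fields(metadata)
    if missing:
        raise ValueError(f"missing required metadata: {', '.join(missing)}")
    return {
        "data": {
            "type": "dois",
            "attributes": {
                # The URL is the landing page, not the dataset itself.
                "url": metadata["url"],
                "creators": [{"name": n} for n in metadata["creators"]],
                "titles": [{"title": metadata["title"]}],
                "publisher": metadata["publisher"],
                "types": {"resourceTypeGeneral": metadata["resource_type"]},
            },
        }
    }


example = {
    "url": "https://example.org/datasets/demo",   # placeholder landing page
    "creators": ["Doe, Jane"],                    # placeholder creator
    "title": "Example Dataset",                   # placeholder title
    "publisher": "LASP",
    "resource_type": "Dataset",
}
print(json.dumps(build_payload(example), indent=2))
```

Checking for missing fields before building the payload surfaces incomplete requests early, before any of the limited DOI allocation is used.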
|
|
||
|  | ||
|
|
||
| ## Getting Help | ||
|
|
||
| Please use the [DATAMAN](https://jira.lasp.colorado.edu/secure/RapidBoard.jspa?rapidView=1430) project on MODS-Jira to | ||
| submit a ticket, and someone from the Data Management Working Group will respond to it. | ||
|
|
||
| ## Useful Links | ||
|
|
||
| - [Intro to Digital Object Identifiers](digital_object_identifiers.md) | ||
| - [DataCite](https://doi.datacite.org/) | ||
| - [Field Descriptions for DOI Form](https://support.datacite.org/docs/field-descriptions-for-form) | ||
| - [DataCite Metadata Quality](https://support.datacite.org/docs/metadata-quality) | ||
| - [DataCite Community Responsibility](https://support.datacite.org/docs/community-responsibility) | ||
| - [Data Citation Roadmap (Scholarly Repositories)](https://www.biorxiv.org/content/biorxiv/early/2017/10/09/097196.full.pdf) | ||
| - [Intro to DataCite REST API](https://support.datacite.org/docs/api) | ||
| - [Metadata Requirements](metadata.md) | ||
| - [NASA EOSDIS DOI Guidelines](https://wiki.earthdata.nasa.gov/display/DOIsforEOSDIS) | ||
| - [CU Scholar](https://scholar.colorado.edu/about) | ||
| - [Creating a DOI for Software](../workflows/open_source/citing_software.md) | ||
|
|
||
| ## Acronyms | ||
|
|
||
| - **DOI** = Digital Object Identifier | ||
| - **NASA** = National Aeronautics and Space Administration | ||
| - **EOSDIS** = Earth Observing System Data and Information System | ||
| - **API** = Application Programming Interface | ||
|
|
||
| Credit: Content taken from a Confluence guide written by Anne Wilson and updated by Doug Lindholm. | ||
137 changes: 137 additions & 0 deletions
docs/source/data_management/digital_object_identifiers.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,137 @@ | ||
| # Digital Object Identifiers | ||
|
|
||
| A Digital Object Identifier (DOI) is a code used to uniquely | ||
| identify content of various types. DOIs enable easy online | ||
| access to research data for discovery, attribution, and reuse, | ||
| and enable accurate data citation and other metrics. A DOI is | ||
| a persistent identifier, and as such carries expectations of | ||
| curation, persistent access, and rich metadata. | ||
|
|
||
| There is a system and practices associated with DOI usage, | ||
| for "persistent and actionable identification and interoperable | ||
| exchange of managed information on digital networks" | ||
| (https://support.datacite.org/docs/doi-basics). | ||
|
|
||
| DOIs are intended to be "resolvable," usually to information | ||
| about the object to which the DOI refers—including information | ||
| about where the object can be found. For a dataset, that would | ||
| be a dataset landing page providing information about the | ||
| dataset like where it can be accessed. The DOI should not | ||
| point to the dataset itself. The DOI remains fixed over the | ||
| lifetime of the object, whereas its location and metadata may | ||
| change. When the location changes, the publisher of the | ||
| object is responsible for updating the metadata for the DOI | ||
| to the new location. | ||
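To make "resolvable" concrete, the sketch below turns a DOI name into a resolver URL and, optionally, follows the resolver's redirects to find where the DOI currently points. The function names are hypothetical, the example DOI is the one used later in this guide, and the second function needs network access, so it is defined but not called here. It uses only the Python standard library.

```python
import urllib.request

RESOLVER = "https://doi.org/"


def doi_url(doi: str) -> str:
    """Turn a bare DOI name (or a legacy 'doi:' form) into a resolver URL."""
    name = doi.strip()
    if name.lower().startswith("doi:"):
        name = name[4:]
    if name.lower().startswith(RESOLVER):
        name = name[len(RESOLVER):]
    if not name.startswith("10."):
        raise ValueError(f"not a DOI name: {doi!r}")
    return RESOLVER + name


def landing_page(doi: str, timeout: float = 10.0) -> str:
    """Follow the resolver's redirects and return the final URL (needs network)."""
    with urllib.request.urlopen(doi_url(doi), timeout=timeout) as response:
        return response.geturl()


print(doi_url("doi:10.5281/ZENODO.31780"))
```

Because the DOI stays fixed while the landing page may move, code and documents should store the DOI URL rather than the address that `landing_page` happens to return today.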
|
|
||
| The developer and administrator of the DOI system is the | ||
| International DOI Foundation (IDF) which introduced DOIs | ||
| in 2000. Organizations that meet the contractual obligations | ||
| of the DOI system and that are willing to pay to become a | ||
| member (such as DataCite, see below) can assign DOIs. | ||
|
|
||
| The DOI system is implemented through a federation of | ||
| registration agencies coordinated by the IDF. | ||
| See https://www.doi.org/, and particularly | ||
| https://www.doi.org/hb.html, the DOI Handbook, for details. | ||
|
|
||
| ## Purpose of DOIs | ||
|
|
||
| Funding agencies and publishers increasingly recognize that | ||
| datasets and scientific software are valuable research outputs | ||
| that should be openly available, identifiable, and citable—often | ||
| through DOIs. | ||
|
|
||
| At LASP, digital objects worthy of identification include | ||
| datasets and associated outputs (e.g., documentation, papers, | ||
| workflows, algorithms, software, etc.). | ||
|
|
||
| ## DOI registries | ||
|
|
||
| To enable accessibility, a DOI needs to reside in a registry | ||
| where it can be resolved. The registry collects and provides | ||
| high-level information, assigns DOIs, and links to references. | ||
|
|
||
| [DataCite](https://datacite.org/) is a not-for-profit, global | ||
| initiative to "help the research community locate, identify, | ||
| and cite research data with confidence," through DOI minting | ||
| and registration. It is the leading global provider of DOIs | ||
| for datasets. From their website: | ||
|
|
||
| >By working closely with data centres to assign DOIs to | ||
| > datasets and other research objects, we are developing a | ||
| > robust infrastructure that supports simple and effective | ||
| > methods of data citation, discovery, and access. Citable | ||
| > data become legitimate contributions to scholarly | ||
| > communication, paving the way for new metrics and | ||
| > publication models that recognize and reward data sharing. | ||
|
|
||
| CU Libraries is now a member of DataCite. Through this | ||
| membership, LASP can mint and register DOIs for datasets | ||
| housed in our repositories, enabling data to be persistently | ||
| identified, accessed, and cited. | ||
|
|
||
| [Crossref](https://www.crossref.org/) is another registry that | ||
| is often mentioned in Earth and space science contexts. It's | ||
| a not-for-profit association of ~2000 voting member publishers | ||
| who represent 4300 societies and publishers. It exists to | ||
| facilitate the links between distributed content hosted at | ||
| other sites, and uses DOIs to do so. | ||
|
|
||
| [Zenodo](https://zenodo.org/) is a free repository developed | ||
| under the OpenAIRE program and operated by CERN. It is a general-purpose | ||
| repository that allows researchers to deposit datasets, | ||
| research software, reports, and any other research-related | ||
| digital artifacts. Zenodo assigns DOIs to the deposited | ||
| content, making it citable and discoverable. | ||
| See [citing software](../workflows/open_source/citing_software.md) | ||
| for more on using Zenodo to cite software. | ||
|
|
||
| [ORCIDs](https://orcid.org/) are like DOIs but provide | ||
| persistent digital identifiers for people. | ||
|
|
||
| ## DOI Format | ||
|
|
||
| When a LASP researcher needs a DOI, they will provide some information and receive a DOI back. | ||
| They will never actually create a DOI. Nevertheless, it is worth understanding the form of a DOI | ||
| and the goals behind its format. | ||
|
|
||
| DataCite goals for DOIs include enabling robots and crawlers to recognize DataCite DOIs as URLs, | ||
| making them easy to cut and paste, and helping users recognize that DOIs are both a persistent link | ||
| and a persistent identifier. | ||
|
|
||
| This is a DOI: | ||
|
|
||
| https://doi.org/10.5281/ZENODO.31780 | ||
| A DOI name consists of three parts: | ||
|
|
||
|  | ||
|
|
||
| The proxy is an HTTP URL. DataCite recommends that all DOIs be displayed as permanent URLs. | ||
| (Using the old DOI protocol, e.g. doi:10.5281/ZENODO.31780, is NOT recommended.) | ||
|
|
||
| A DOI prefix always starts with "10." and continues with a number. This number | ||
| defines a globally unique namespace. (The scope of "global" depends on the organization | ||
| managing multiple repositories.) Prefixes should not have semantic meaning. Adding | ||
| meaning to the identifier is risky because "despite best intentions, all names can | ||
| change over time" ([DataCite DOI Basics](https://support.datacite.org/docs/doi-basics)). | ||
|
|
||
| The suffix for a DOI can be almost any string. Here is where information provided in an | ||
| input form may be integrated into the DOI. | ||
|
|
||
| Note that DOI names are not case-sensitive, while URLs are case-sensitive: | ||
| https://support.datacite.org/docs/datacite-doi-display-guidelines. | ||
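The three-part structure and the case rule described above can be sketched as follows. This is an illustrative sketch only: the names `DoiParts`, `split_doi`, and `same_doi` are hypothetical, and the prefix check assumes the simple "10." plus digits form described in this section rather than every prefix the DOI system may allow.

```python
from typing import NamedTuple


class DoiParts(NamedTuple):
    proxy: str
    prefix: str
    suffix: str


def split_doi(url: str) -> DoiParts:
    """Split a DOI URL into its proxy, prefix, and suffix."""
    proxy = "https://doi.org/"
    if not url.startswith(proxy):
        raise ValueError("expected a URL beginning with https://doi.org/")
    prefix, sep, suffix = url[len(proxy):].partition("/")
    if not (prefix.startswith("10.") and prefix[3:].isdigit() and sep and suffix):
        raise ValueError(f"not a well-formed DOI: {url!r}")
    return DoiParts(proxy, prefix, suffix)


def same_doi(a: str, b: str) -> bool:
    """Compare two DOI URLs, ignoring letter case in the DOI name."""
    pa, pb = split_doi(a), split_doi(b)
    return (pa.prefix, pa.suffix.lower()) == (pb.prefix, pb.suffix.lower())


print(split_doi("https://doi.org/10.5281/ZENODO.31780"))
print(same_doi("https://doi.org/10.5281/ZENODO.31780",
               "https://doi.org/10.5281/zenodo.31780"))
```

Comparing DOIs through a helper like `same_doi` avoids treating `ZENODO.31780` and `zenodo.31780` as different identifiers when deduplicating references.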
|
|
||
| ## Useful Links | ||
|
|
||
| - [DataCite: DOI Basics](https://support.datacite.org/docs/doi-basics) | ||
| - [DOI Foundation: DOI Handbook](https://www.doi.org/the-identifier/resources/handbook/) | ||
| - [DataCite: DOI Display Guidelines](https://support.datacite.org/docs/datacite-doi-display-guidelines) | ||
| - [Creating a DOI via CU Libraries and DataCite](creating_a_doi.md) | ||
|
|
||
| ## Acronyms | ||
|
|
||
| - **DOI** = Digital Object Identifier | ||
| - **IDF** = International DOI Foundation | ||
| - **ORCID** = Open Researcher and Contributor ID | ||
|
|
||
| Credit: Content taken from a Confluence guide written by Anne Wilson and Shawn Polson. |