Merged
Binary file added docs/source/_static/doi_format.png
69 changes: 69 additions & 0 deletions docs/source/data_management/citing_and_publishing_datasets.md
@@ -0,0 +1,69 @@
# Publishing and Citing Datasets

Guidelines for making datasets publicly available, creating DOIs, and properly citing datasets.

## Purpose

This guideline supports Data Systems workflows by ensuring datasets referenced in publications are
openly accessible, properly identified with DOIs, and cited according to community standards. It
aligns with publisher and funder policies and promotes scientific reproducibility.

## How to Publish and Cite Datasets

**Make Data Open and Accessible**
Ensure datasets associated with publications are stored in publicly accessible, machine-readable formats.

**Create a DOI and Landing Page**
Digital Object Identifiers (DOIs) are machine-readable identifiers that resolve to information about a resource.
Researchers themselves can be identified with an ORCID digital identifier (see https://orcid.org/). Publishers
now generally require DOIs for the data referenced in publications, and often ORCIDs as well.
- See [Digital Object Identifiers](digital_object_identifiers.md) for an introduction to DOIs.
- See [Creating a DOI via CU Libraries and DataCite](creating_a_doi.md) for a quick start on creating a DOI and a
landing page for a dataset.
- LASP could build resources to create and manage DOIs and associated landing pages.

**Cite Datasets in Publications**
Follow established data citation principles to ensure datasets are properly cited in scholarly works. Reference
the [FORCE11 Joint Declaration of Data Citation Principles](https://www.force11.org/datacitationprinciples) and
follow practices described in the [ESIP Dataset Citation Guidelines](https://doi.org/10.6084/m9.figshare.8441816).
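
As a rough illustration of what a dataset citation can look like, the sketch below assembles one from a handful of fields. The element order (creators, year, title, version, publisher, DOI URL) is an assumption loosely based on common data citation practice, not a rule taken from the guidelines linked above; the class name, field names, and example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DatasetCitation:
    """Hypothetical container for the pieces of a dataset citation."""

    creators: list[str]
    year: int
    title: str
    version: str
    publisher: str
    doi: str  # bare DOI name, without the https://doi.org/ proxy

    def to_text(self) -> str:
        """Render the citation as one line, ending with the DOI as a URL."""
        authors = ", ".join(self.creators)
        return (
            f"{authors} ({self.year}). {self.title}, Version {self.version}. "
            f"{self.publisher}. https://doi.org/{self.doi}"
        )


# Made-up dataset, used only to show the shape of the output.
example = DatasetCitation(
    creators=["Doe, J.", "Roe, R."],
    year=2024,
    title="Example Solar Irradiance Dataset",
    version="1.0",
    publisher="Example Data Center",
    doi="10.12345/example-dataset",
)
print(example.to_text())
```

Publishers and repositories often prescribe their own citation format, so treat the layout above as a starting point and defer to the target journal's instructions.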

**Understand Publisher Requirements**
Ensure that DOIs and ORCIDs are included as required by publishers to maintain compliance with submission guidelines.

## Options

There are several options for publishing datasets:

1. **CU Libraries and DataCite DOI Creation**
Researchers can create DOIs and landing pages for datasets using CU Libraries' integration with DataCite.

2. **CU Scholar Hosting**
CU Scholar can host articles, reports, and datasets of limited size. CU Scholar prefers to generate and manage DOIs
for hosted datasets.

3. **LASP DOI Management (Future Direction)**
LASP can develop internal resources for creating and managing DOIs and dataset landing pages, streamlining the
process for LASP-affiliated data products.

4. **External Repositories**
For larger datasets or specialized data types, external repositories that support DOI assignment can be considered.

## Useful Links

- [Creating a DOI via CU Libraries and DataCite](creating_a_doi.md)
- [CU Scholar](https://scholar.colorado.edu/about)
- [FORCE11 Joint Declaration of Data Citation Principles](https://www.force11.org/datacitationprinciples)
- [ESIP Dataset Citation Guidelines](https://doi.org/10.6084/m9.figshare.8441816)
- [Zenodo DOI Citation Guide](https://doi.org/10.5281/zenodo.1451971)
- [Data Citation Roadmap (Scholarly Repositories)](https://www.biorxiv.org/content/biorxiv/early/2017/10/09/097196.full.pdf)
- [ORCID](https://orcid.org/)
- [DOIs for SORCE Data Products](https://confluence.lasp.colorado.edu/pages/viewpage.action?pageId=21464459)
(Confluence)

## Acronyms

- **DOI** = Digital Object Identifier
- **ORCID** = Open Researcher and Contributor ID
- **ESIP** = Earth Science Information Partners

Credit: Content taken from a Confluence guide written by Anne Wilson and Shawn Polson.
89 changes: 89 additions & 0 deletions docs/source/data_management/creating_a_doi.md
@@ -0,0 +1,89 @@
# Creating a DOI via CU Libraries and DataCite

As of 2018, CU Libraries is a member of DataCite. Through this membership, LASP can mint and
register DOIs for datasets housed in our repositories, enabling data to be persistently identified,
accessed, and cited.

This page outlines guidelines for assigning Digital Object Identifiers (DOIs) to datasets
through this membership, including how to submit a request, the required metadata, and
long-term responsibilities.

## Purpose

This guideline supports LASP’s data publishing workflows by enabling the creation and registration
of persistent identifiers (DOIs) for datasets using CU resources. These identifiers help ensure
long-term access, discoverability, and proper citation of data.

## How to create a DOI

Dataset DOIs should resolve to a dataset landing page that provides information about the dataset, such as where it can
be accessed. CU Libraries automatically generates a generic landing page populated with the high-level dataset metadata
provided in the DOI request form, whereas CU Scholar, another CU resource, requires data providers to supply a reference
to such a landing page when creating a DOI.

Note that the number of DOIs allocated to LASP is limited.

1. **Submit a Request**
- Researchers: File a Jira issue with type "DOI" in the [Data Management Jira project](https://jira.lasp.colorado.edu/projects/DATAMAN/).

2. **Prepare Required Metadata**
- Work with the Data Management team to ensure proper metadata and landing page are available.
- Minimum required metadata for DOI creation:
  - URL of the landing page (not the dataset itself)
  - Creators (list of names)
  - Title
  - Publisher (typically LASP or a project within LASP)
  - ResourceType (usually `dataset`)
- DataCite supports additional metadata. Those properties are described here: https://support.datacite.org/docs/metadata-quality.

3. **Create DOI via DataCite**
- The Data Management team logs in to [doi.datacite.org](https://doi.datacite.org/) using the `CUB.LASP` repository ID.
- Click "DOIs" → "Create DOI (Form)".
- Use the form to enter metadata. See full field descriptions at: [DataCite Field Descriptions](https://support.datacite.org/docs/field-descriptions-for-form)
- For developers: There is an [API](https://support.datacite.org/docs/api) that supports the full metadata schema.

4. **Maintain DOI Metadata**
- Keep DOI metadata up to date in the [DataCite Metadata Store](https://support.datacite.org/docs)
- If a dataset is moved, update the registry.
- If a dataset is removed, maintain a “tombstone” landing page.

5. **Follow DOI Best Practices**
- Use landing pages (not direct links to datasets).
- Maintain metadata quality and completeness as information changes.
- See [Metadata Guidelines](metadata.md) for dataset metadata requirements.

6. **Adhere to Roles and Responsibilities**

LASP (as a DataCite Client) must meet responsibilities outlined in:
- [DataCite Community Responsibility](https://support.datacite.org/docs/community-responsibility)
- [Data Citation Roadmap for Scholarly Data Repositories](https://www.biorxiv.org/content/biorxiv/early/2017/10/09/097196.full.pdf)

![DataCite_Repository_Guidelines](../_static/repository_obligations_table.png)
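
For developers, the metadata and maintenance steps above can be sketched as request bodies for the DataCite REST API. This is a minimal sketch, not the team's actual tooling: the attribute names (`creators`, `titles`, `types.resourceTypeGeneral`, `url`) follow the JSON layout of the DataCite REST API as best understood here and should be checked against the API documentation linked above. The function names, the example prefix `10.12345`, and the example metadata are hypothetical. Nothing is sent over the network, and authentication is not shown. DataCite may also require further properties (for example, a publication year) before a DOI can be made findable.

```python
import json

# The minimum metadata listed in step 2, keyed by hypothetical field names.
REQUIRED_FIELDS = ("url", "creators", "title", "publisher", "resource_type")


def build_doi_payload(metadata: dict, prefix: str) -> dict:
    """Build a JSON body for creating a DOI; raise if required fields are missing."""
    missing = [field for field in REQUIRED_FIELDS if not metadata.get(field)]
    if missing:
        raise ValueError(f"Missing required metadata: {', '.join(missing)}")
    return {
        "data": {
            "type": "dois",
            "attributes": {
                "prefix": prefix,
                "url": metadata["url"],  # landing page, not the dataset itself
                "creators": [{"name": name} for name in metadata["creators"]],
                "titles": [{"title": metadata["title"]}],
                "publisher": metadata["publisher"],
                "types": {"resourceTypeGeneral": metadata["resource_type"]},
            },
        }
    }


def build_url_update_payload(new_landing_page: str) -> dict:
    """Build a JSON body that repoints an existing DOI at a new landing page."""
    return {"data": {"type": "dois", "attributes": {"url": new_landing_page}}}


# Made-up values, for illustration only.
example_metadata = {
    "url": "https://example.org/datasets/example-landing-page",
    "creators": ["Doe, Jane"],
    "title": "Example Dataset",
    "publisher": "LASP",
    "resource_type": "Dataset",
}
payload = build_doi_payload(example_metadata, prefix="10.12345")
print(json.dumps(payload, indent=2))
```

Validating the required fields before building the body mirrors step 2: an incomplete request fails locally instead of consuming one of the limited DOIs. The second helper covers the case in step 4 where a dataset moves or is replaced by a tombstone page.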

## Getting Help

Please use the [DATAMAN](https://jira.lasp.colorado.edu/secure/RapidBoard.jspa?rapidView=1430) project on MODS-Jira to
submit a ticket, and someone from the Data Management Working Group will respond to it.

## Useful Links

- [Intro to Digital Object Identifiers](digital_object_identifiers.md)
- [DataCite](https://doi.datacite.org/)
- [Field Descriptions for DOI Form](https://support.datacite.org/docs/field-descriptions-for-form)
- [DataCite Metadata Quality](https://support.datacite.org/docs/metadata-quality)
- [DataCite Community Responsibility](https://support.datacite.org/docs/community-responsibility)
- [Data Citation Roadmap (Scholarly Repositories)](https://www.biorxiv.org/content/biorxiv/early/2017/10/09/097196.full.pdf)
- [Intro to DataCite REST API](https://support.datacite.org/docs/api)
- [Metadata Requirements](metadata.md)
- [NASA EOSDIS DOI Guidelines](https://wiki.earthdata.nasa.gov/display/DOIsforEOSDIS)
- [CU Scholar](https://scholar.colorado.edu/about)
- [Creating a DOI for Software](../workflows/open_source/citing_software.md)

## Acronyms

- **DOI** = Digital Object Identifier
- **NASA** = National Aeronautics and Space Administration
- **EOSDIS** = Earth Observing System Data and Information System
- **API** = Application Programming Interface

Credit: Content taken from a Confluence guide written by Anne Wilson and updated by Doug Lindholm.
137 changes: 137 additions & 0 deletions docs/source/data_management/digital_object_identifiers.md
@@ -0,0 +1,137 @@
# Digital Object Identifiers

A Digital Object Identifier (DOI) is a code used to uniquely
identify content of various types. DOIs enable easy online
access to research data for discovery, attribution, and reuse,
and enable accurate data citation and other metrics. DOIs are
persistent identifiers, and as such carry expectations of
curation, persistent access, and rich metadata.

DOI usage is backed by a system and a set of practices
for "persistent and actionable identification and interoperable
exchange of managed information on digital networks"
(https://support.datacite.org/docs/doi-basics).

DOIs are intended to be "resolvable," usually to information
about the object to which the DOI refers—including information
about where the object can be found. For a dataset, that would
be a dataset landing page providing information about the
dataset like where it can be accessed. The DOI should not
point to the dataset itself. The DOI remains fixed over the
lifetime of the object, whereas its location and metadata may
change. When the location changes, the publisher of the
object is responsible for updating the DOI's metadata
to point to the new location.
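
To make "resolvable" concrete, the sketch below prepares an HTTP request that asks the DOI proxy for machine-readable metadata instead of the landing page. It relies on DOI content negotiation, a service that is not described in this guide and is treated here as an assumption; the `Accept` media type and the function names are illustrative choices. The network call is wrapped in a function and is not executed by the example itself.

```python
import json
import urllib.request

# Media type for citation metadata in CSL JSON (assumed to be supported
# by the registration agency that manages the DOI).
CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_metadata_request(doi_name: str) -> urllib.request.Request:
    """Prepare a request to the DOI proxy asking for metadata rather than HTML."""
    return urllib.request.Request(
        f"https://doi.org/{doi_name}",
        headers={"Accept": CSL_JSON},
    )


def fetch_metadata(doi_name: str) -> dict:
    """Send the request and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(build_metadata_request(doi_name), timeout=30) as response:
        return json.load(response)


request = build_metadata_request("10.5281/ZENODO.31780")
print(request.full_url)
```

Opening the same URL in a browser, without the `Accept` header, follows the redirect to the human-readable landing page.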

The developer and administrator of the DOI system is the
International DOI Foundation (IDF) which introduced DOIs
in 2000. Organizations that meet the contractual obligations
of the DOI system and that are willing to pay to become a
member (such as DataCite, see below) can assign DOIs.

The DOI system is implemented through a federation of
registration agencies coordinated by the IDF.
See https://www.doi.org/, and particularly
https://www.doi.org/hb.html, the DOI Handbook, for details.

## Purpose of DOIs

Funding agencies and publishers increasingly recognize that
datasets and scientific software are valuable research outputs
that should be openly available, identifiable, and citable—often
through DOIs.

At LASP, digital objects worthy of identification include
datasets and associated outputs (e.g., documentation, papers,
workflows, algorithms, software, etc.).

## DOI registries

To enable accessibility, a DOI needs to reside in a registry
where it can be resolved. The registry collects and provides
high-level information, assigns DOIs, and links to references.

[DataCite](https://datacite.org/) is a not-for-profit, global
initiative to "help the research community locate, identify,
and cite research data with confidence," through DOI minting
and registration. It is the leading global provider of DOIs
for datasets. From their website:

> By working closely with data centres to assign DOIs to
> datasets and other research objects, we are developing a
> robust infrastructure that supports simple and effective
> methods of data citation, discovery, and access. Citable
> data become legitimate contributions to scholarly
> communication, paving the way for new metrics and
> publication models that recognize and reward data sharing.

CU Libraries is now a member of DataCite. Through this
membership, LASP can mint and register DOIs for datasets
housed in our repositories, enabling data to be persistently
identified, accessed, and cited.

[Crossref](https://www.crossref.org/) is another registry that
is often mentioned in Earth and space science contexts. It's
a not-for-profit association of ~2000 voting member publishers
who represent 4300 societies and publishers. It exists to
facilitate the links between distributed content hosted at
other sites, and uses DOIs to do so.

[Zenodo](https://zenodo.org/) is a free repository developed
under the European OpenAIRE program and operated by CERN. It is a general-purpose
repository that allows researchers to deposit datasets,
research software, reports, and any other research-related
digital artifacts. Zenodo assigns DOIs to the deposited
content, making it citable and discoverable.
See [citing software](../workflows/open_source/citing_software.md)
for more on using Zenodo to cite software.

[ORCIDs](https://orcid.org/) are like DOIs, but they provide
persistent digital identifiers for people rather than for objects.

## DOI Format

When a LASP researcher needs a DOI, they will provide some information and receive a DOI back.
They will never actually create a DOI. Nevertheless, it is worth understanding the form of a DOI
and the goals behind its format.

DataCite goals for DOIs include enabling robots and crawlers to recognize DataCite DOIs as URLs,
making them easy to cut and paste, and helping users recognize that DOIs are both a persistent link
and a persistent identifier.

This is a DOI:

https://doi.org/10.5281/ZENODO.31780

A DOI name consists of three parts:

![DOI_Format](../_static/doi_format.png)

The proxy is an HTTP URL. DataCite recommends that all DOIs be displayed as permanent URLs.
(Using the old DOI protocol, e.g. doi:10.5281/ZENODO.31780, is NOT recommended.)

A DOI prefix always starts with "10." and continues with a number. This number
defines a globally unique namespace. (The scope of "global" depends on the organization
managing multiple repositories.) Prefixes should not have semantic meaning. Adding
meaning to the identifier is risky because "despite best intentions, all names can
change over time" ([DataCite DOI Basics](https://support.datacite.org/docs/doi-basics)).

The suffix for a DOI can be almost any string. Here is where information provided in an
input form may be integrated into the DOI.

Note that DOI names are not case-sensitive, while URLs are case-sensitive:
https://support.datacite.org/docs/datacite-doi-display-guidelines.
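
The three parts described above can be pulled apart with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, and it assumes the DOI is given in its URL form. It also shows a case-insensitive comparison, in line with the note that DOI names are not case-sensitive.

```python
from urllib.parse import urlparse


def split_doi(doi_url: str) -> tuple[str, str, str]:
    """Split a DOI URL into (proxy, prefix, suffix)."""
    parsed = urlparse(doi_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not a DOI URL: {doi_url!r}")
    proxy = f"{parsed.scheme}://{parsed.netloc}/"
    # The prefix ends at the first slash; the suffix may contain more slashes.
    prefix, separator, suffix = parsed.path.lstrip("/").partition("/")
    if not prefix.startswith("10.") or not separator or not suffix:
        raise ValueError(f"Not a DOI name: {parsed.path!r}")
    return proxy, prefix, suffix


def same_doi(first_url: str, second_url: str) -> bool:
    """Compare two DOI URLs by DOI name, ignoring case and the proxy."""
    _, first_prefix, first_suffix = split_doi(first_url)
    _, second_prefix, second_suffix = split_doi(second_url)
    return (first_prefix, first_suffix.lower()) == (second_prefix, second_suffix.lower())


print(split_doi("https://doi.org/10.5281/ZENODO.31780"))
print(same_doi("https://doi.org/10.5281/ZENODO.31780", "https://doi.org/10.5281/zenodo.31780"))
```

Comparing by DOI name rather than by raw URL string avoids treating `ZENODO.31780` and `zenodo.31780` as different identifiers.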

## Useful Links

- [DataCite: DOI Basics](https://support.datacite.org/docs/doi-basics)
- [DOI Foundation: DOI Handbook](https://www.doi.org/the-identifier/resources/handbook/)
- [DataCite: DOI Display Guidelines](https://support.datacite.org/docs/datacite-doi-display-guidelines)
- [Creating a DOI via CU Libraries and DataCite](creating_a_doi.md)

## Acronyms

- **DOI** = Digital Object Identifier
- **IDF** = International DOI Foundation
- **ORCID** = Open Researcher and Contributor ID

Credit: Content taken from a Confluence guide written by Anne Wilson and Shawn Polson.
5 changes: 4 additions & 1 deletion docs/source/data_management/index.rst
@@ -8,4 +8,7 @@ Data Management
file_formats/index
metadata.md
fair_principles.md
data_stewardship.md
data_stewardship.md
citing_and_publishing_datasets.md
digital_object_identifiers.md
creating_a_doi.md
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -36,4 +36,4 @@ homepage = "https://github.com/lasp/"
repository = "https://github.com/lasp/developer-guide"

[tool.codespell]
ignore-words-list = "nd"
ignore-words-list = "nd, SORCE"