
Commit d5a5106

Add dataset publishing and citation content (#57)
* Add guidelines on citing and publishing datasets
* Add images used in a couple of data management guides
* Minor fixes to typos and file paths
* Add "SORCE" to words codespell should ignore in poetry toml

---------

Co-authored-by: Matthew Bourque <[email protected]>
1 parent 7b3f789 commit d5a5106


7 files changed

+300
-2
lines changed


docs/source/_static/doi_format.png

8.61 KB
288 KB
Lines changed: 69 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,69 @@
1+
# Publishing and Citing Datasets
2+
3+
Guidelines for making datasets publicly available, creating DOIs, and properly citing datasets.
4+
5+
## Purpose
6+
7+
This guideline supports Data Systems workflows by ensuring datasets referenced in publications are
8+
openly accessible, properly identified with DOIs, and cited according to community standards. It
9+
aligns with publisher and funder policies and promotes scientific reproducibility.
10+
11+
## How to Publish and Cite Datasets
12+
13+
**Make Data Open and Accessible**
14+
Ensure datasets associated with publications are stored in publicly accessible, machine-readable formats.
15+
16+
**Create a DOI and Landing Page**
17+
Digital Object Identifiers (DOIs) are machine-readable identifiers that resolve to information about a resource.
18+
Just as datasets can have DOIs, researchers can have an ORCID iD (see https://orcid.org/). Publishers are
19+
now generally requiring DOIs that point to data referenced in publications, and often ORCIDs as well.
20+
- See [Digital Object Identifiers](digital_object_identifiers.md) for an introduction to DOIs.
21+
- See [Creating a DOI via CU Libraries and DataCite](creating_a_doi.md) for a quick start on creating a DOI and a
22+
landing page for a dataset.
23+
- LASP could build resources to create and manage DOIs and associated landing pages.
24+
25+
**Cite Datasets in Publications**
26+
Follow established data citation principles to ensure datasets are properly cited in scholarly works. Reference
27+
the [FORCE11 Joint Declaration of Data Citation Principles](https://www.force11.org/datacitationprinciples) and
28+
follow practices described in the [ESIP Dataset Citation Guidelines](https://doi.org/10.6084/m9.figshare.8441816).
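To make the citation elements concrete, here is a minimal sketch that assembles a dataset citation string from its parts. The element order (authors, year, title, version, publisher, DOI, access date) is only an approximation of the kind of citation the ESIP guidelines describe, and the `DatasetCitation` class, its field names, and the example values are all hypothetical. Check the guidelines and the target journal's style before relying on this layout.

```python
from dataclasses import dataclass


@dataclass
class DatasetCitation:
    """Hypothetical container for the elements of a dataset citation."""

    authors: list[str]
    year: int
    title: str
    version: str
    publisher: str
    doi_url: str

    def format(self, accessed: str) -> str:
        """Return a one-line citation; `accessed` is the date the data were retrieved."""
        authors = "; ".join(self.authors)
        return (
            f"{authors} ({self.year}). {self.title}, Version {self.version}. "
            f"{self.publisher}. {self.doi_url}. Accessed {accessed}."
        )


# Placeholder values for illustration only; this is not a real dataset or DOI.
citation = DatasetCitation(
    authors=["Doe, J.", "Roe, R."],
    year=2024,
    title="Example Irradiance Dataset",
    version="1",
    publisher="Example Data Center",
    doi_url="https://doi.org/10.1234/example",
)
print(citation.format(accessed="2025-01-15"))
```

Keeping the elements in a structured record rather than a free-text string makes it easier to reformat the same citation for publishers with different style requirements.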
29+
30+
**Understand Publisher Requirements**
31+
Ensure that DOIs and ORCIDs are included as required by publishers to maintain compliance with submission guidelines.
32+
33+
## Options
34+
35+
There are several options for publishing datasets:
36+
37+
1. **CU Libraries and DataCite DOI Creation**
38+
Researchers can create DOIs and landing pages for datasets using CU Libraries' integration with DataCite.
39+
40+
2. **CU Scholar Hosting**
41+
CU Scholar can host articles, reports, and datasets of limited size. CU Scholar prefers to generate and manage DOIs
42+
for hosted datasets.
43+
44+
3. **LASP DOI Management (Future Direction)**
45+
LASP could develop internal resources for creating and managing DOIs and dataset landing pages, streamlining the
46+
process for LASP-affiliated data products.
47+
48+
4. **External Repositories**
49+
For larger datasets or specialized data types, external repositories that support DOI assignment can be considered.
50+
51+
## Useful Links
52+
53+
- [Creating a DOI via CU Libraries and DataCite](creating_a_doi.md)
54+
- [CU Scholar](https://scholar.colorado.edu/about)
55+
- [FORCE11 Joint Declaration of Data Citation Principles](https://www.force11.org/datacitationprinciples)
56+
- [ESIP Dataset Citation Guidelines](https://doi.org/10.6084/m9.figshare.8441816)
57+
- [Zenodo DOI Citation Guide](https://doi.org/10.5281/zenodo.1451971)
58+
- [Data Citation Roadmap (Scholarly Repositories)](https://www.biorxiv.org/content/biorxiv/early/2017/10/09/097196.full.pdf)
59+
- [ORCID](https://orcid.org/)
60+
- [DOIs for SORCE Data Products](https://confluence.lasp.colorado.edu/pages/viewpage.action?pageId=21464459)
61+
(Confluence)
62+
63+
## Acronyms
64+
65+
- **DOI** = Digital Object Identifier
66+
- **ORCID** = Open Researcher and Contributor ID
67+
- **ESIP** = Earth Science Information Partners
68+
69+
Credit: Content taken from a Confluence guide written by Anne Wilson and Shawn Polson.
Lines changed: 89 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,89 @@
1+
# Creating a DOI via CU Libraries and DataCite
2+
3+
As of 2018, CU Libraries is a member of DataCite. Through this membership, LASP can mint and
4+
register DOIs for datasets housed in our repositories, enabling data to be persistently identified,
5+
accessed, and cited.
6+
7+
This guide outlines how to assign Digital Object Identifiers (DOIs) to datasets using this
8+
membership, including the request steps, metadata requirements, and long-term
9+
responsibilities.
10+
11+
## Purpose
12+
13+
This guideline supports LASP’s data publishing workflows by enabling the creation and registration
14+
of persistent identifiers (DOIs) for datasets using CU resources. These identifiers help ensure
15+
long-term access, discoverability, and proper citation of data.
16+
17+
## How to create a DOI
18+
19+
Dataset DOIs should resolve to a dataset landing page that provides information about the dataset, such as where it can be
20+
accessed. CU Libraries automatically generates a generic landing page populated with the high-level dataset metadata
21+
provided in a DOI request form. CU Scholar, another CU resource, instead requires data providers to already have a reference to
22+
such a landing page when creating a DOI.
23+
24+
Note that the number of DOIs allocated to LASP is limited.
25+
26+
1. **Submit a Request**
27+
- Researchers: File a Jira issue with type "DOI" in the [Data Management Jira project](https://jira.lasp.colorado.edu/projects/DATAMAN/).
28+
29+
2. **Prepare Required Metadata**
30+
- Work with the Data Management team to ensure proper metadata and landing page are available.
31+
- Minimum required metadata for DOI creation:
32+
- URL of the landing page (not the dataset itself)
33+
- Creators (list of names)
34+
- Title
35+
- Publisher (typically LASP or a project within LASP)
36+
- ResourceType (usually `dataset`)
37+
- DataCite supports additional metadata. Those properties are described here: https://support.datacite.org/docs/metadata-quality.
38+
39+
3. **Create DOI via DataCite**
40+
- Data Management team logs into [doi.datacite.org](https://doi.datacite.org/) using the `CUB.LASP` repository ID.
41+
- Click "DOIs" → "Create DOI (Form)"
42+
- Use the form to enter metadata. See full field descriptions at: [DataCite Field Descriptions](https://support.datacite.org/docs/field-descriptions-for-form)
43+
   - For developers: DataCite also provides an [API](https://support.datacite.org/docs/api) that supports the full metadata schema.
44+
45+
4. **Maintain DOI Metadata**
46+
- Keep DOI metadata up to date in the [DataCite Metadata Store](https://support.datacite.org/docs)
47+
- If a dataset is moved, update the registry.
48+
- If a dataset is removed, maintain a “tombstone” landing page.
49+
50+
5. **Follow DOI Best Practices**
51+
- Use landing pages (not direct links to datasets).
52+
- Maintain metadata quality and completeness as information changes.
53+
- See [Metadata Guidelines](metadata.md) for dataset metadata requirements.
54+
55+
6. **Adhere to Roles and Responsibilities**
56+
57+
LASP (as a DataCite Client) must meet responsibilities outlined in:
58+
- [DataCite Community Responsibility](https://support.datacite.org/docs/community-responsibility)
59+
- [Data Citation Roadmap for Scholarly Data Repositories](https://www.biorxiv.org/content/biorxiv/early/2017/10/09/097196.full.pdf)
60+
61+
![DataCite_Repository_Guidelines](../_static/repository_obligations_table.png)
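
As an illustration of steps 2 and 3, the sketch below checks a record for the minimum metadata listed above and arranges it into a JSON body of the kind the DataCite REST API accepts. The attribute names (`creators`, `titles`, `types.resourceTypeGeneral`, `prefix`) reflect one reading of the DataCite documentation and should be verified against it. The `build_doi_payload` helper, the record layout, the `10.1234` prefix, and the example URL are hypothetical. DataCite may also require further properties (for example a publication year) before a DOI can be made findable.

```python
import json

# Fields this guide lists as the minimum for DOI creation.
REQUIRED_FIELDS = ("url", "creators", "title", "publisher", "resource_type")


def build_doi_payload(record: dict, prefix: str) -> dict:
    """Build a JSON request body for a DOI from a flat metadata record."""
    missing = [field for field in REQUIRED_FIELDS if not record.get(field)]
    if missing:
        raise ValueError(f"missing required metadata: {', '.join(missing)}")
    return {
        "data": {
            "type": "dois",
            "attributes": {
                "prefix": prefix,  # assumption: the suffix is generated by DataCite
                "url": record["url"],  # landing page, not the dataset itself
                "creators": [{"name": name} for name in record["creators"]],
                "titles": [{"title": record["title"]}],
                "publisher": record["publisher"],
                "types": {"resourceTypeGeneral": record["resource_type"]},
            },
        }
    }


# Placeholder record for illustration only.
example_record = {
    "url": "https://example.org/datasets/example-landing-page",
    "creators": ["Doe, Jane", "Roe, Richard"],
    "title": "Example Irradiance Dataset",
    "publisher": "LASP",
    "resource_type": "Dataset",
}
payload = build_doi_payload(example_record, prefix="10.1234")
print(json.dumps(payload, indent=2))
```

The sketch only builds and prints the request body. Actually submitting it requires repository credentials and should go through the Data Management team.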
62+
63+
## Getting Help
64+
65+
Please use the [DATAMAN](https://jira.lasp.colorado.edu/secure/RapidBoard.jspa?rapidView=1430) project on MODS-Jira to
66+
submit a ticket, and someone from the Data Management Working Group will respond to it.
67+
68+
## Useful Links
69+
70+
- [Intro to Digital Object Identifiers](digital_object_identifiers.md)
71+
- [DataCite](https://doi.datacite.org/)
72+
- [Field Descriptions for DOI Form](https://support.datacite.org/docs/field-descriptions-for-form)
73+
- [DataCite Metadata Quality](https://support.datacite.org/docs/metadata-quality)
74+
- [DataCite Community Responsibility](https://support.datacite.org/docs/community-responsibility)
75+
- [Data Citation Roadmap (Scholarly Repositories)](https://www.biorxiv.org/content/biorxiv/early/2017/10/09/097196.full.pdf)
76+
- [Intro to DataCite REST API](https://support.datacite.org/docs/api)
77+
- [Metadata Requirements](metadata.md)
78+
- [NASA EOSDIS DOI Guidelines](https://wiki.earthdata.nasa.gov/display/DOIsforEOSDIS)
79+
- [CU Scholar](https://scholar.colorado.edu/about)
80+
- [Creating a DOI for Software](../workflows/open_source/citing_software.md)
81+
82+
## Acronyms
83+
84+
- **DOI** = Digital Object Identifier
85+
- **NASA** = National Aeronautics and Space Administration
86+
- **EOSDIS** = Earth Observing System Data and Information System
87+
- **API** = Application Programming Interface
88+
89+
Credit: Content taken from a Confluence guide written by Anne Wilson and updated by Doug Lindholm.
Lines changed: 137 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,137 @@
1+
# Digital Object Identifiers
2+
3+
A Digital Object Identifier (DOI) is a code used to uniquely
4+
identify content of various types. DOIs enable easy online
5+
access to research data for discovery, attribution, and reuse,
6+
and enable accurate data citation and other metrics. DOIs are
7+
persistent identifiers, and as such carry expectations of
8+
curation, persistent access, and rich metadata.
9+
10+
DOI usage is supported by a system and a set of practices
11+
for "persistent and actionable identification and interoperable
12+
exchange of managed information on digital networks"
13+
(https://support.datacite.org/docs/doi-basics).
14+
15+
DOIs are intended to be "resolvable," usually to information
16+
about the object to which the DOI refers—including information
17+
about where the object can be found. For a dataset, that would
18+
be a dataset landing page providing information about the
19+
dataset, such as where it can be accessed. The DOI should not
20+
point to the dataset itself. The DOI remains fixed over the
21+
lifetime of the object, whereas its location and metadata may
22+
change. When the location changes, the publisher of the
23+
object is responsible for updating the metadata for the DOI
24+
to point to the new location.
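
The resolution step can be sketched in a few lines. The example below assumes the doi.org proxy exposes a JSON lookup at `https://doi.org/api/handles/<DOI name>` whose response contains a list of `values`, one of which has type `URL`. That response shape, the helper names, and the sample record (including its landing page URL) are assumptions for illustration and should be checked against the DOI Foundation's documentation.

```python
import json
from typing import Optional
from urllib.request import urlopen


def handle_api_url(doi_name: str) -> str:
    """Return the (assumed) JSON lookup URL for a DOI name such as '10.5281/ZENODO.31780'."""
    return f"https://doi.org/api/handles/{doi_name}"


def landing_page_from_record(record: dict) -> Optional[str]:
    """Pull the registered URL out of a handle record, if one is present."""
    for value in record.get("values", []):
        if value.get("type") == "URL":
            return value["data"]["value"]
    return None


def resolve(doi_name: str) -> Optional[str]:
    """Look up a DOI over the network and return the URL it currently points to."""
    with urlopen(handle_api_url(doi_name), timeout=10) as response:
        return landing_page_from_record(json.load(response))


# Hand-written sample record, so the parsing logic can be shown without network access.
sample_record = {
    "responseCode": 1,
    "handle": "10.1234/example",
    "values": [
        {
            "index": 1,
            "type": "URL",
            "data": {
                "format": "string",
                "value": "https://example.org/datasets/example-landing-page",
            },
        }
    ],
}
print(landing_page_from_record(sample_record))
```

Because the DOI name stays fixed while the registered URL can change, code that stores the DOI and resolves it on demand keeps working after a dataset moves.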
25+
26+
The developer and administrator of the DOI system is the
27+
International DOI Foundation (IDF), which introduced DOIs
28+
in 2000. Organizations that meet the contractual obligations
29+
of the DOI system and that are willing to pay to become a
30+
member (such as DataCite, see below) can assign DOIs.
31+
32+
The DOI system is implemented through a federation of
33+
registration agencies coordinated by the IDF.
34+
See https://www.doi.org/, and particularly
35+
https://www.doi.org/hb.html, the DOI Handbook, for details.
36+
37+
## Purpose of DOIs
38+
39+
Funding agencies and publishers increasingly recognize that
40+
datasets and scientific software are valuable research outputs
41+
that should be openly available, identifiable, and citable—often
42+
through DOIs.
43+
44+
At LASP, digital objects worthy of identification include
45+
datasets and associated outputs (e.g., documentation, papers,
46+
workflows, algorithms, software, etc.).
47+
48+
## DOI registries
49+
50+
To enable accessibility, a DOI needs to reside in a registry
51+
where it can be resolved. The registry collects and provides
52+
high-level information, assigns DOIs, and links to references.
53+
54+
[DataCite](https://datacite.org/) is a not-for-profit, global
55+
initiative to "help the research community locate, identify,
56+
and cite research data with confidence," through DOI minting
57+
and registration. It is the leading global provider of DOIs
58+
for datasets. From their website:
59+
60+
> By working closely with data centres to assign DOIs to
61+
> datasets and other research objects, we are developing a
62+
> robust infrastructure that supports simple and effective
63+
> methods of data citation, discovery, and access. Citable
64+
> data become legitimate contributions to scholarly
65+
> communication, paving the way for new metrics and
66+
> publication models that recognize and reward data sharing.
67+
68+
CU Libraries is now a member of DataCite. Through this
69+
membership, LASP can mint and register DOIs for datasets
70+
housed in our repositories, enabling data to be persistently
71+
identified, accessed, and cited.
72+
73+
[Crossref](https://www.crossref.org/) is another registry that
74+
is often mentioned in Earth and space science contexts. It's
75+
a not-for-profit association of ~2000 voting member publishers
76+
who represent 4300 societies and publishers. It exists to
77+
facilitate the links between distributed content hosted at
78+
other sites, and uses DOIs to do so.
79+
80+
[Zenodo](https://zenodo.org/) is a free repository developed
81+
under the European OpenAIRE program and operated by CERN. It is a general-purpose
82+
repository that allows researchers to deposit datasets,
83+
research software, reports, and any other research-related
84+
digital artifacts. Zenodo assigns DOIs to the deposited
85+
content, making it citable and discoverable.
86+
See [citing software](../workflows/open_source/citing_software.md)
87+
for more on using Zenodo to cite software.
88+
89+
[ORCID iDs](https://orcid.org/) are like DOIs, but they provide
90+
persistent digital identifiers for people.
91+
92+
## DOI Format
93+
94+
When a LASP researcher needs a DOI, they will provide some information and receive a DOI back.
95+
They will never actually create a DOI. Nevertheless, it is worth understanding the form of a DOI
96+
and the goals behind its format.
97+
98+
DataCite goals for DOIs include enabling robots and crawlers to recognize DataCite DOIs as URLs,
99+
making them easy to cut and paste, and helping users recognize that DOIs are both a persistent link
100+
and a persistent identifier.
101+
102+
This is a DOI:
103+
104+
https://doi.org/10.5281/ZENODO.31780
105+
A DOI name consists of three parts:
106+
107+
![DOI_Format](../_static/doi_format.png)
108+
109+
The proxy is an HTTP URL. DataCite recommends that all DOIs be displayed as permanent URLs.
110+
(Using the old DOI protocol, e.g., doi:10.5281/ZENODO.31780, is NOT recommended.)
111+
112+
A DOI prefix always starts with "10." and continues with a number. This number
113+
defines a globally unique namespace. (The scope of "global" depends on the organization
114+
managing multiple repositories.) Prefixes should not have semantic meaning. Adding
115+
meaning to the identifier is risky because "despite best intentions, all names can
116+
change over time" [DataCite DOI Basics](https://support.datacite.org/docs/doi-basics).
117+
118+
The suffix for a DOI can be almost any string. Here is where information provided in an
119+
input form may be integrated into the DOI.
120+
121+
Note that DOI names are not case-sensitive, while URLs are case-sensitive:
122+
https://support.datacite.org/docs/datacite-doi-display-guidelines.
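
The format rules above can be illustrated with a short sketch that splits a DOI URL into its proxy, prefix, and suffix, and compares two DOIs without regard to the case of the DOI name. The helper names `split_doi` and `same_doi` are hypothetical, and the checks are deliberately minimal rather than a full DOI validator.

```python
from urllib.parse import urlsplit


def split_doi(doi_url: str) -> dict:
    """Split a DOI URL into its proxy, prefix, and suffix."""
    parts = urlsplit(doi_url)
    proxy = f"{parts.scheme}://{parts.netloc}"
    # The DOI name is everything after the proxy; the prefix ends at the first "/".
    name = parts.path.lstrip("/")
    prefix, _, suffix = name.partition("/")
    if not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI URL: {doi_url!r}")
    return {"proxy": proxy, "prefix": prefix, "suffix": suffix}


def same_doi(first: str, second: str) -> bool:
    """Compare two DOI URLs, ignoring case in the DOI name only."""
    a, b = split_doi(first), split_doi(second)
    return (a["prefix"], a["suffix"].lower()) == (b["prefix"], b["suffix"].lower())


parts = split_doi("https://doi.org/10.5281/ZENODO.31780")
print(parts)
print(same_doi("https://doi.org/10.5281/ZENODO.31780",
               "https://doi.org/10.5281/zenodo.31780"))
```

Lowercasing only the DOI name, rather than the whole URL, reflects the distinction drawn above between case-insensitive DOI names and case-sensitive URLs.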
123+
124+
## Useful Links
125+
126+
- [DataCite: DOI Basics](https://support.datacite.org/docs/doi-basics)
127+
- [DOI Foundation: DOI Handbook](https://www.doi.org/the-identifier/resources/handbook/)
128+
- [DataCite: DOI Display Guidelines](https://support.datacite.org/docs/datacite-doi-display-guidelines)
129+
- [Creating a DOI via CU Libraries and DataCite](creating_a_doi.md)
130+
131+
## Acronyms
132+
133+
- **DOI** = Digital Object Identifier
134+
- **IDF** = International DOI Foundation
135+
- **ORCID** = Open Researcher and Contributor ID
136+
137+
Credit: Content taken from a Confluence guide written by Anne Wilson and Shawn Polson.

docs/source/data_management/index.rst

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,4 +8,7 @@ Data Management
8 8
file_formats/index
9 9
metadata.md
10 10
fair_principles.md
11-
data_stewardship.md
11+
data_stewardship.md
12+
citing_and_publishing_datasets.md
13+
digital_object_identifiers.md
14+
creating_a_doi.md

pyproject.toml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -36,4 +36,4 @@ homepage = "https://github.com/lasp/"
36 36
repository = "https://github.com/lasp/developer-guide"
37 37

38 38
[tool.codespell]
39-
ignore-words-list = "nd"
39+
ignore-words-list = "nd, SORCE"
