Data Card: [Dataset Name]

Guidelines for Data Quality

Dataset Overview

  • Title:
  • Version:
  • Release Date:
  • Description / Summary:
  • Motivation: Why was this dataset created?

Contributors

  • Creators: [Names, institutions]
  • Funding / Sponsors:
  • Contact Information:

Source Information

  • Data Sources: (web, books, social media, clinical, crowdworkers, etc.)
  • Collection Method: (manual, scraping, recording, surveys, etc.)
  • Time Period: (years of data collection)
  • Geographic Coverage: (India, global, specific states/regions)

Languages & Content

  • Languages: (with ISO 639-1 codes, e.g., en, hi, kn)
  • Domains: (news, healthcare, education, conversational, etc.)
  • Data Types: (text, audio, video, images, metadata)
  • Size: (# of samples, tokens, hours, etc.)
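One way to fill in the Size field per language is a small counting script. The following is a minimal sketch, assuming each record is a dict with hypothetical `lang` and `text` keys and that whitespace splitting is an acceptable token approximation; real datasets may need a language-specific tokenizer.

```python
from collections import Counter


def size_summary(records):
    """Count samples and whitespace-separated tokens per language code.

    `records` is assumed to be an iterable of dicts with "lang" and "text"
    keys (hypothetical field names chosen for this illustration).
    """
    samples = Counter()
    tokens = Counter()
    for rec in records:
        lang = rec["lang"]
        samples[lang] += 1
        # Whitespace splitting is only a rough token count.
        tokens[lang] += len(rec["text"].split())
    return {
        lang: {"samples": samples[lang], "tokens": tokens[lang]}
        for lang in sorted(samples)
    }


# Toy records for illustration only; not taken from any real dataset.
records = [
    {"lang": "en", "text": "The clinic opens at nine."},
    {"lang": "hi", "text": "क्लिनिक नौ बजे खुलता है"},
    {"lang": "en", "text": "Please bring your reports."},
]
summary = size_summary(records)
for lang, stats in summary.items():
    print(lang, stats)
```

The per-language counts can be pasted into the Size entry next to the matching language code.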

Composition

  • Train / Dev / Test Splits:
  • Annotation Process: (guidelines, annotators, inter-annotator agreement, quality checks)
  • Data Format: (file format and structure, e.g., JSONL, CSV; field names and types)
  • Demographics: (age, gender, dialect distribution, if available)
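For the inter-annotator agreement entry under Annotation Process, Cohen's kappa is one commonly reported statistic for two annotators. The sketch below computes it from scratch; the function name and the toy labels are illustrative assumptions, and other measures (for example Fleiss' kappa for more than two annotators) may suit a given dataset better.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label; agreement is trivial.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy labels for illustration only.
annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg", "pos", "pos"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.3f}")
```

Reporting the statistic together with the number of doubly annotated items makes the agreement figure easier to interpret.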

Intended Use

  • Recommended Tasks: (ASR, MT, sentiment analysis, etc.)
  • Out-of-Scope Uses: (not for medical decision-making, etc.)

Ethical Considerations

  • Known Biases / Limitations:
  • Sensitive Content: (offensive, personal information, etc.)
  • Data Filtering / Cleaning:
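When documenting Sensitive Content and Data Filtering / Cleaning, it can help to state how personal information was detected. The following is a rough sketch of a regex-based flagging pass; the patterns and the `flag_pii` helper are assumptions made for illustration and will miss many real cases, so they are not a substitute for a reviewed PII pipeline.

```python
import re

# Deliberately simple patterns; both are illustrative assumptions.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{8,}\d")


def flag_pii(text):
    """Return the kinds of likely personal information found in `text`."""
    flags = []
    if EMAIL_RE.search(text):
        flags.append("email")
    if PHONE_RE.search(text):
        flags.append("phone")
    return flags


# Made-up example strings.
print(flag_pii("Contact me at a.kumar@example.org"))
print(flag_pii("Call +91 98765 43210 today"))
print(flag_pii("The weather is nice"))
```

The data card can then record which categories were checked and whether flagged samples were removed, masked, or kept.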

Licensing & Access

  • License: (CC-BY, CC0, custom)
  • Access Conditions: (open, restricted, request-based)

Citation & References

  • Paper Link: [arXiv/ACL/IEEE/etc.]
  • DOI:
  • BibTeX Citation:

Maintenance

  • Maintainers:
  • Update Policy:
  • Errata & Feedback: