- Title:
- Version:
- Release Date:
- Description / Summary:
- Motivation: (why the dataset was created)
- Creators: (names, institutions)
- Funding / Sponsors:
- Contact Information:
- Data Sources: (web, books, social media, clinical, crowdworkers, etc.)
- Collection Method: (manual, scraping, recording, surveys, etc.)
- Time Period: (years of data collection)
- Geographic Coverage: (India, global, specific states/regions)
- Languages: (with ISO 639-1 codes, e.g., en, hi, kn)
- Domains: (news, healthcare, education, conversational, etc.)
- Data Types: (text, audio, video, images, metadata)
- Size: (number of samples, tokens, hours, etc.)
- Train / Dev / Test Splits:
- Annotation Process: (guidelines, annotators, inter-annotator agreement, quality checks)
- Data Format: (file types and structure, e.g., JSON, CSV, WAV)
- Demographics: (age, gender, dialect distribution, if available)
- Recommended Tasks: (ASR, MT, sentiment analysis, etc.)
- Out-of-Scope Uses: (not for medical decision making, etc.)
- Known Biases / Limitations:
- Sensitive Content: (offensive, personal information, etc.)
- Data Filtering / Cleaning:
- License: (CC-BY, CC0, custom)
- Access Conditions: (open, restricted, request-based)
- Paper Link: (arXiv, ACL, IEEE, etc.)
- DOI:
- BibTeX Citation:
- Maintainers:
- Update Policy:
- Errata & Feedback:
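
A filled-in card can be checked automatically for fields left blank. The sketch below is one possible approach, not part of the template itself: it assumes every field stays on a single `- Field: value` line, and the `REQUIRED_FIELDS` list, the function names, and the sample card contents are all hypothetical.

```python
import re

# Hypothetical choice of mandatory fields; adjust to your project's needs.
REQUIRED_FIELDS = ["Title", "Version", "License", "Languages"]

# Matches "- Field Name: value"; the field name is everything before the first colon.
FIELD_LINE = re.compile(r"^-\s*(?P<name>[^:]+):\s*(?P<value>.*)$")


def parse_card(text):
    """Map each '- Field: value' line to its stripped value."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_LINE.match(line.strip())
        if match:
            fields[match.group("name").strip()] = match.group("value").strip()
    return fields


def missing_fields(fields, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or left empty."""
    return [name for name in required if not fields.get(name)]


# Made-up card content, for illustration only.
example = """\
- Title: Example Speech Corpus
- Version: 1.0
- License:
- Languages: hi, kn
"""

card = parse_card(example)
print(missing_fields(card))
```

Running the check in a pre-release step is one way to catch an incomplete card before publication. Note that a template line whose parenthetical hint was never replaced would still count as filled in under this sketch.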