Knowledge Base

Documentation

Everything you need to navigate the ATLAS ecosystem — from contributing high-quality SEA language data to understanding our governance framework.

How to contribute

ATLAS lets you contribute datasets in three ways. Pick the path that matches your situation.

The three paths

Paste a HuggingFace URL

Best for: datasets already public on HuggingFace

Paste your HuggingFace dataset URL and we'll auto-extract basic metadata (name, description, tags, version, citation). You fill in the rest: languages, license, modality, task group, and domain group, plus the Responsible AI assessment.
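Before pasting, you may want to confirm that your link is a dataset URL rather than a model or Space URL. The snippet below is a minimal sketch of such a check, not ATLAS's own extraction logic: the function name `hf_dataset_repo_id`, the list of sub-page names, and the example organisation and dataset names are all hypothetical.

```python
from urllib.parse import urlparse

# Sub-pages that can follow the repository id in a dataset URL.
# Assumption: this list is illustrative, not exhaustive.
_SUBPAGES = {"tree", "blob", "resolve", "viewer", "discussions", "commits"}


def hf_dataset_repo_id(url: str) -> str:
    """Return the repository id ('owner/name' or 'name') from a dataset URL."""
    parsed = urlparse(url.strip())
    if parsed.netloc.lower() not in {"huggingface.co", "www.huggingface.co"}:
        raise ValueError(f"not a HuggingFace URL: {url!r}")

    segments = [s for s in parsed.path.split("/") if s]
    if len(segments) < 2 or segments[0] != "datasets":
        raise ValueError(f"not a dataset URL: {url!r}")

    # Keep at most two segments after 'datasets', stopping at a sub-page.
    repo_parts = []
    for segment in segments[1:3]:
        if segment in _SUBPAGES:
            break
        repo_parts.append(segment)
    if not repo_parts:
        raise ValueError(f"no repository id found in: {url!r}")
    return "/".join(repo_parts)


if __name__ == "__main__":
    # Hypothetical URL used purely for illustration.
    print(hf_dataset_repo_id(
        "https://huggingface.co/datasets/example-org/example-dataset/tree/main"
    ))
```

The URL must include the scheme (`https://`); a bare `huggingface.co/...` string is rejected because `urlparse` does not recognise a host without it.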

Upload a file

Best for: fresh data you're publishing for the first time

Upload a single file of up to 5 GB. Accepted text data formats: zip, tar, gz, csv, json, txt, pdf. To submit multiple files, bundle them into a single zip first. After upload, you fill in all the metadata yourself.
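You can check a file against these limits locally before uploading. The following is a rough sketch under stated assumptions: "5 GB" is read as 5 × 1024³ bytes (the page does not say whether it means that or 5 × 10⁹), the format is judged by file extension only, and the function name `upload_problems` is hypothetical. The checks ATLAS actually runs may differ.

```python
from pathlib import PurePath

# Assumption: "5 GB" is interpreted as 5 GiB.
MAX_UPLOAD_BYTES = 5 * 1024**3
# Extensions taken from the accepted formats listed above.
ACCEPTED_EXTENSIONS = {".zip", ".tar", ".gz", ".csv", ".json", ".txt", ".pdf"}


def upload_problems(filename: str, size_bytes: int) -> list[str]:
    """Return a list of reasons the file may be rejected (empty if none)."""
    problems = []
    # Only the last suffix is checked, so 'corpus.tar.gz' counts as '.gz'.
    extension = PurePath(filename).suffix.lower()
    if extension not in ACCEPTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'no extension'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file is larger than 5 GB; use the support form instead")
    return problems


if __name__ == "__main__":
    print(upload_problems("corpus.tar.gz", 2 * 1024**3))
    print(upload_problems("corpus.parquet", 6 * 1024**3))
```

For a file on disk, `os.path.getsize(path)` gives the value to pass as `size_bytes`.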

Need help or have a large dataset?

Best for: datasets larger than 5 GB, special formats, or anything tricky

Reach out via our support form and our team will help you onboard your dataset. Use this path if your file is too big, in a format we don't accept directly, or if you'd like guidance.

What you'll fill in

Whichever path you pick, you'll be guided through a wizard covering:

  • Core metadata — dataset name, description, languages, license, modality, task group, domain group
  • Responsible AI assessment — data provenance, intended use, known limitations, biases, PII handling
  • Final review — confirm everything, accept the contributor terms, submit
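If you like to prepare your answers before opening the wizard, the fields above can be drafted as a simple record and checked for gaps. This is only an illustrative sketch: the key names (such as `task_group` and `pii_handling`) and the helper `missing_fields` are hypothetical, and treating every field as required is an assumption. The wizard itself defines the real field names and rules.

```python
# Field names mirror the wizard sections listed above; the exact keys are
# hypothetical and chosen only for this sketch.
REQUIRED_FIELDS = {
    "core": [
        "name", "description", "languages", "license",
        "modality", "task_group", "domain_group",
    ],
    "responsible_ai": [
        "data_provenance", "intended_use", "known_limitations",
        "biases", "pii_handling",
    ],
}


def missing_fields(submission: dict) -> list[str]:
    """Return 'section.field' for every field that is absent or empty."""
    missing = []
    for section, fields in REQUIRED_FIELDS.items():
        values = submission.get(section, {})
        for field in fields:
            # Assumption: all fields are required; empty values count as missing.
            if not values.get(field):
                missing.append(f"{section}.{field}")
    return missing


if __name__ == "__main__":
    draft = {
        "core": {"name": "Example corpus", "languages": ["id", "th"]},
        "responsible_ai": {"intended_use": "Research on text classification"},
    }
    for field in missing_fields(draft):
        print("still to fill in:", field)
```

Keeping the draft in one place makes the final review step quicker, since you can confirm each section is complete before you accept the contributor terms and submit.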

What happens after you submit

Your submission enters the ATLAS contribution pipeline. The team reviews it and, when all checks pass, your dataset gets published to the catalogue. You can track its status anytime from your profile.
