Aksharantar

May 2026 · v1.0.0 · cc

⚠ Has use restrictions

Dataset Card for Aksharantar

Dataset Summary: Aksharantar is the largest publicly available transliteration dataset for 20 Indic languages. The corpus has 26M Indic language-English transliteration pairs.

Supported Tasks and Leaderboards: [More Information Needed]

Languages: Assamese (asm), Hindi (hin), Maithili (mai), Marathi (mar), Punjabi (pan), Tamil (tam), Bengali (ben), Kannada (kan), Malayalam (mal), Nepali (nep), Sanskrit (san), Telugu…

See the full description on the dataset page: https://huggingface.co/datasets/ai4bharat/Aksharantar.

Languages

Assamese, Bangla, Bodo, Dogri, Gujarati, Hindi, +15 more

Modalities

Text

Domains

Education, Language & Career

Intended use

Traditional NLP Task

Responsible AI assessment

0 complete · 0 partial · 9 missing

Collection method: Missing

Annotation protocol: Missing

Preprocessing protocol: Missing

Intended use cases: Missing

Known limitations: Missing

Known biases: Missing

Social impact: Missing

Personal / sensitive information: Missing

Maintenance plan: Missing