Methodology

How growth is measured

Growth tracks momentum: not just whether ATLAS has data, but whether the catalog is meaningfully expanding. This page explains the main growth chart, the two language groupings beneath it, and the cumulative-trend math used throughout.

The big chart at the top

A single line answering ‘how fast is ATLAS growing’, with two toggles that change what's being counted.

Axes

X
Year the dataset was added to ATLAS.
Y
Dataset count under the selected view.

View

Cumulative
Running total. The line never goes down; it stays flat in years with no additions.
New per year
Net additions per calendar year — rises and falls.

Filter

All
Every language in the catalog.
Top 5
Five languages with the largest record share.
Underrepresented
Languages flagged by the composite-percentile rule.
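
As a rough illustration of the Top 5 filter, the sketch below ranks languages by their share of all records and keeps the five largest. The function name, the input shape (a mapping from language code to record count), the tie-breaking rule, and the sample counts are all assumptions made for this example; they are not taken from the ATLAS codebase or catalog.

```python
def top_languages_by_record_share(records_per_language, n=5):
    """Return (language, share) pairs for the n languages with the most records.

    `records_per_language` is a hypothetical mapping of language code to
    record count. Ties are broken alphabetically, which is an assumption.
    """
    total = sum(records_per_language.values())
    if total == 0:
        return []
    ranked = sorted(records_per_language.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(lang, count / total) for lang, count in ranked[:n]]


# Invented counts, for illustration only.
sample_records = {
    "ind": 500, "tha": 300, "vie": 100, "tgl": 50,
    "msa": 30, "khm": 15, "lao": 5,
}
top_five = top_languages_by_record_share(sample_records)
for lang, share in top_five:
    print(f"{lang}: {share:.1%}")
```

The Underrepresented filter is not sketched here because its composite-percentile rule is defined in the Coverage methodology rather than on this page.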

Two language groupings

The language-depth section pairs the top of the catalog against the bottom — five anchors and three edge cases — so you can read both the strongest growth and the most precarious presence side by side.

The peak

Where the catalog runs deep

The five languages with the most distinct datasets in the catalog. These tend to be the languages driving aggregate growth — when ATLAS as a whole grows, it grows here first.

The edge

Languages on the margin

The three lowest-coverage languages that have any data at all — flagged by the composite-percentile underrepresentation rule documented in the Coverage methodology. Together they hold less than 1% of the catalog.

The cumulative trend column

  1. Sum forward, year by year

    Each language's per-year additions are converted to a running total. A language that gained 5 datasets in 2022 and 12 in 2023 plots as [5, 17]. The line never goes down; it stays flat in years with no additions. This is the same convention as a savings balance with no withdrawals, and it makes growth visually unambiguous.

  2. Compare the endpoints

    The multiplier next to each sparkline is the ratio of the final cumulative value to the first non-zero value. A 1 → 12 trajectory reads as 12×. We hide the multiplier (showing a dash) when fewer than two non-zero years are present — a single data point isn't growth, it's just existence.

  3. Direction has only two states

    Cumulative lines can't go down (you can't un-add a dataset), so the arrow shows growth (▲, green) or flat (→, grey). No red. If a language's catalog is shrinking, that's a bug elsewhere; this chart will never report it.
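
The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ATLAS implementation: the function names are hypothetical, "non-zero years" is read as years with at least one addition, and the direction arrow compares the last cumulative value with the first one in the window.

```python
from itertools import accumulate


def cumulative(additions):
    """Step 1: turn per-year additions into a running total."""
    return list(accumulate(additions))


def multiplier(additions):
    """Step 2: final cumulative value divided by the first non-zero value.

    Returns None (rendered as a dash) when fewer than two years have
    non-zero additions. That reading of "non-zero years" is an assumption.
    """
    nonzero_years = sum(1 for a in additions if a > 0)
    if nonzero_years < 2:
        return None
    totals = cumulative(additions)
    first_nonzero = next(t for t in totals if t > 0)
    return totals[-1] / first_nonzero


def direction(additions):
    """Step 3: growth or flat; there is no 'down' state.

    Comparing the last total with the first is an assumption about how
    the arrow is chosen.
    """
    totals = cumulative(additions)
    if len(totals) >= 2 and totals[-1] > totals[0]:
        return "▲"
    return "→"


print(cumulative([5, 12]))   # the [5, 17] example from step 1
print(multiplier([1, 11]))   # the 1 → 12 trajectory from step 2
print(multiplier([0, 4, 0])) # single non-zero year, so no multiplier
print(direction([3, 0, 0]))  # no additions after the first year
```

Returning `None` instead of a number keeps the "dash" case explicit, so a caller cannot mistake a single data point for a 1× multiplier.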

Which year does each datapoint refer to?

The year axis represents when each dataset was added to ATLAS — its contributed_at timestamp — not when the underlying corpus was originally published. The aggregation pulls from the dm.contributed_at column on dataset_metadata.
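
To make the year bucketing concrete, here is a small sketch that groups datasets by the calendar year of their contributed_at value and fills empty years with zero. It assumes the timestamps arrive as ISO 8601 strings with an explicit offset and that the year is read in the timestamp's own offset; the function name and sample values are hypothetical, and the real aggregation may be done in SQL rather than Python.

```python
from collections import Counter
from datetime import datetime


def additions_per_year(contributed_at_values):
    """Count datasets per calendar year of contribution.

    Years between the first and last contribution that have no datasets
    are included with a count of 0, so the series has no gaps.
    """
    years = Counter(
        datetime.fromisoformat(ts).year for ts in contributed_at_values
    )
    if not years:
        return {}
    first, last = min(years), max(years)
    return {year: years.get(year, 0) for year in range(first, last + 1)}


# Invented timestamps, for illustration only.
sample = [
    "2021-11-02T08:15:00+00:00",
    "2023-01-20T12:00:00+00:00",
    "2023-06-05T17:45:00+00:00",
]
per_year = additions_per_year(sample)
print(per_year)
```

The resulting per-year counts correspond to the New per year view; summing them forward gives the Cumulative view.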
