CrowLingo · Animal language processing · v0

We can finally see what we couldn't hear.

CrowLingo is a focused exploration of one species and one revolution: the American crow, and the new generation of AI audio models that turn its voice into a map.

Listen now

Three calls, three coordinates on the same map.

Each recording is one point in a high-dimensional acoustic space. The full atlas of ~12,400 calls lives at the Repertoire Atlas.

Cluster · Territorial caw

Perched flock — territorial caws · 00:25 · 200 Hz – 8 kHz

Cluster · Rattle complex

Rattle complex — affiliative · 00:33 · 200 Hz – 8 kHz

Cluster · Begging

Juvenile begging + adult exchange · 00:40 · 200 Hz – 8 kHz

What's actually new

Discrete categories gave way to a continuous map.

Figure (before/after): six discrete labeled call tiles (1970s–2010s) versus a continuous iridescent UMAP cloud (self-supervised era).
The fifty-year hand-labeling regime treated crow calls as a handful of named types. Self-supervised audio models flip the frame: every clip becomes a vector, every vector a point in a shared space, every cluster an emergent category. The old labels survive — as labels of regions in the map, not as boundaries on the world.

The change is not that we found new sounds. The change is that we stopped treating each call as a label, and started treating the whole repertoire as a geometry. Graded variation, dialect, individual signature — all of it visible at once, on the same map, in milliseconds per call.
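The clip-to-vector-to-map pipeline described above can be sketched with toy data. A minimal sketch in pure NumPy, assuming embeddings already exist: the synthetic 64-dim vectors stand in for a self-supervised encoder's output, the SVD projection stands in for UMAP, and the small k-means loop stands in for emergent-cluster discovery. None of these are CrowLingo's actual models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a self-supervised audio encoder: each "call" is already
# a 64-dim embedding vector. Three synthetic clusters mimic call types.
centers = rng.normal(size=(3, 64)) * 4.0
calls = np.vstack([c + rng.normal(size=(50, 64)) for c in centers])  # 150 x 64

# Project the shared embedding space down to a 2-D "map" via PCA
# (real pipelines often use UMAP; plain SVD keeps this dependency-free).
X = calls - calls.mean(axis=0)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
coords = X @ Vt[:2].T  # 150 x 2: one point per call on the map

# Emergent categories: a few rounds of k-means over the map coordinates.
k = 3
centroids = coords[rng.choice(len(coords), k, replace=False)]
for _ in range(20):
    dists = np.linalg.norm(coords[:, None] - centroids[None], axis=2)
    labels = dists.argmin(axis=1)
    centroids = np.array([coords[labels == i].mean(axis=0)
                          if np.any(labels == i) else centroids[i]
                          for i in range(k)])

print(coords.shape, np.bincount(labels, minlength=k))
```

The key design point is that the old named call types would re-enter only afterward, as human labels painted onto whichever regions of `coords` the clusters carve out.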

The 2026 carrion-crow bioRxiv preprint (Demartsev et al.) used wearable loggers and this same mapping discipline to recover both discrete and graded structure in grunts and caws, territory by territory, individual by individual.

Where we are, honestly

ALP turned crow vocalization into a map. We have not learned the language.

Figure (now vs. aimed): diptych comparing the current one-way decoding loop to an aspirational closed bidirectional loop with confidence meters.
Demonstrated today: automatic detection, unsupervised category discovery, caller-identity inference, behavioral-context mapping, and zero-shot captioning via NatureLM-audio. Aimed for, not yet here: compositional decoding, real-time bidirectional dialogue, and a 'crow dictionary' with human glosses. The frontier is where the next five years live.
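One of the demonstrated capabilities, caller-identity inference, reduces to a simple geometric test once calls live in a shared embedding space. A hedged sketch with synthetic vectors: the birds, their "enrolled" calls, and the nearest-centroid rule below are all illustrative assumptions, not the methods the preprint or NatureLM-audio actually use.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical caller-identity inference: each known crow contributes a few
# labeled call embeddings; a new call is attributed to the nearest centroid.
# Embeddings here are synthetic 32-dim vectors, not real model outputs.
birds = {name: rng.normal(size=32) * 3.0 for name in ["A", "B", "C"]}
enrolled = {name: base + rng.normal(size=(10, 32)) * 0.5
            for name, base in birds.items()}

# One centroid per individual: the mean of its enrolled call embeddings.
centroids = {name: e.mean(axis=0) for name, e in enrolled.items()}

def identify(call_vec, centroids):
    """Return (best caller id, distance) by nearest-centroid matching."""
    dists = {name: float(np.linalg.norm(call_vec - c))
             for name, c in centroids.items()}
    best = min(dists, key=dists.get)
    return best, dists[best]

# A fresh call perturbed from bird "B" should land near B's centroid.
query = birds["B"] + rng.normal(size=32) * 0.5
who, dist = identify(query, centroids)
print(who, round(dist, 2))
```

The same nearest-neighbor logic underlies the gap the section names: identity and context fall out of distances on the map, while compositional decoding would require structure the map alone does not provide.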