About · AI & Developer Access
Guidelines, machine-readable discovery files, and site topology for AI agents and search crawlers analyzing CrowLingo.
CrowLingo is an independent editorial publication on AI-powered animal language processing for the American crow (Corvus brachyrhynchos). We turn primary research, working preprints, and practitioner experience on self-supervised audio models, latent-space analysis, foundation models like NatureLM-audio, and bioacoustic ethics into structured analysis, primer pages, deep-dive pipelines, and reusable design assets.
We actively support indexing by responsible AI agents. This page serves as a human- and agent-readable registry of our content entities, machine-readable endpoints, and access policies. Every file linked below is auto-generated from a single internal site registry: when we add or remove a page, every machine-readable file regenerates on the next deploy. No drift between what humans see and what agents read.
CrowLingo organizes its editorial coverage into the following primary content entities. Each is linked, tagged by type, and described in one paragraph for fast machine ingestion.
Why the American crow as a model species: cognition, sociality, vocal anatomy, and a repertoire dense enough to warrant a map.
https://crowlingo.org/the-crow
How a crow makes sound — the syrinx, two independent sound sources, the 200 Hz – 8 kHz frequency window, and how it differs from a human larynx.
https://crowlingo.org/the-crow/vocal-anatomy
An interactive 2-D map of crow vocalizations. ~800 seeded points across nine clusters; click any point to see cluster context, spectrogram, and behavioral probabilities.
https://crowlingo.org/the-crow/repertoire-atlas
What makes the American crow worth taking seriously as a communicative animal — tool use, face recognition, family-group sociality, intergenerational learning.
https://crowlingo.org/the-crow/cognition-and-society
The new generation of AI audio methods — self-supervised learning, latent spaces, NatureLM-audio — and what they enable for crows specifically.
https://crowlingo.org/methods
How self-supervised learning trains audio models without labels — masked prediction, what the model actually learns, why it works for bioacoustics.
https://crowlingo.org/methods/self-supervised-audio
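The masked-prediction objective that page describes can be sketched in a few lines. Everything below is a toy, assuming nothing about any real model: the "spectrogram" is synthetic and the "model" is a neighbor-averaging stand-in for a learned network, but the training signal (hide some frames, reconstruct them, score only on the hidden ones) is the genuine objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "spectrogram": 64 time frames x 32 mel bins, smoothed over time.
spec = rng.normal(size=(64, 32)).cumsum(axis=0)

# 1. Mask a random ~30% of time frames, as in masked-prediction pretraining.
mask = rng.random(64) < 0.3
masked = spec.copy()
masked[mask] = 0.0

# 2. A trivial stand-in "model": predict each masked frame as the mean of
#    its unmasked neighbors. Real models learn this mapping end to end.
pred = spec.copy()
for t in np.where(mask)[0]:
    left = t - 1 if t > 0 and not mask[t - 1] else None
    right = t + 1 if t < 63 and not mask[t + 1] else None
    neighbors = [masked[i] for i in (left, right) if i is not None]
    pred[t] = np.mean(neighbors, axis=0) if neighbors else 0.0

# 3. The training signal: reconstruction error on the masked frames only.
loss = float(np.mean((pred[mask] - spec[mask]) ** 2))
print(f"masked-frame MSE: {loss:.3f}")
```

The point of the exercise is that no labels appear anywhere: the audio itself supplies the prediction target, which is why the approach scales to unlabeled field recordings.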
Embeddings, latent spaces, and dimensionality reduction — the minimum mental model for reading a vocal atlas.
https://crowlingo.org/methods/latent-space-101
Earth Species Project's audio-language foundation model for bioacoustics. ICLR 2025. What it does, what it doesn't, how it changed the workflow.
https://crowlingo.org/methods/naturelm-audio
The fifty-year hand-labeling regime versus the new map-based regime. What the field gained; what it gave up.
https://crowlingo.org/methods/traditional-vs-alp
What we can now see in crow vocalizations that we couldn't see before — repertoire mapping, contextual clustering, individuality, combinatorial evidence.
https://crowlingo.org/decoding
The four features a self-supervised model extracts from one half-second of crow voice, and what each tells us — pitch contour, harmonic emphasis, duration, spectral grain.
https://crowlingo.org/decoding/what-we-can-decode-now
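The four features named above have classical DSP proxies that can be computed directly; a self-supervised encoder learns richer versions of them implicitly. The sketch below, on a synthetic half-second harmonic "caw," uses illustrative stand-ins we chose ourselves (autocorrelation pitch, band-energy ratio for harmonic emphasis, RMS gating for duration, spectral flatness for grain); none of the thresholds are CrowLingo's.

```python
import numpy as np

SR = 22050
t = np.arange(int(0.5 * SR)) / SR            # one half-second clip
f0 = 350 + 150 * t / t[-1]                   # rising pitch contour
clip = np.sin(2 * np.pi * f0 * t) + 0.3 * np.sin(2 * np.pi * 2 * f0 * t)
clip += 0.05 * np.random.default_rng(1).normal(size=t.size)

def frame(x, size=1024, hop=512):
    n = 1 + (len(x) - size) // hop
    return np.stack([x[i * hop : i * hop + size] for i in range(n)])

frames = frame(clip) * np.hanning(1024)
spec = np.abs(np.fft.rfft(frames, axis=1))
freqs = np.fft.rfftfreq(1024, 1 / SR)

# Pitch contour: strongest autocorrelation lag per frame, reported in Hz.
def pitch(fr):
    ac = np.correlate(fr, fr, mode="full")[len(fr) - 1 :]
    lo, hi = SR // 2000, SR // 100           # search 100 Hz - 2 kHz
    return SR / (lo + np.argmax(ac[lo:hi]))

contour = np.array([pitch(fr) for fr in frames])

# Harmonic emphasis: energy at 1-4 kHz relative to the fundamental band.
low = spec[:, (freqs >= 200) & (freqs < 1000)].sum()
high = spec[:, (freqs >= 1000) & (freqs < 4000)].sum()
emphasis = high / (low + 1e-9)

# Duration: frames whose RMS exceeds 10% of the loudest frame.
rms = np.sqrt((frames ** 2).mean(axis=1))
duration_s = (rms > 0.1 * rms.max()).sum() * 512 / SR

# Spectral grain: flatness (geometric / arithmetic mean of the spectrum);
# near 1 = noisy or grainy, near 0 = cleanly tonal.
flatness = np.exp(np.log(spec + 1e-12).mean()) / (spec.mean() + 1e-12)

print(round(float(contour.mean())), round(float(emphasis), 2),
      round(float(duration_s), 2), round(float(flatness), 3))
```

Each proxy yields a single number or curve per clip; the encoder's advantage is that it captures these and many unnamed features jointly, in one embedding.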
How latent coordinates correlate with behavior. The Demartsev 2026 carrion-crow preprint as the cleanest current example.
https://crowlingo.org/decoding/contextual-clustering
Caller identity from harmonic signature, group-level acoustic centroids, and how seriously to take the dialect hypothesis.
https://crowlingo.org/decoding/individuality-and-dialect
Sequence-level statistical regularities in crow vocalizations and the open question of crow 'syntax'. Honest about the limits of the behavioral evidence.
https://crowlingo.org/decoding/combinatorial-evidence
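"Sequence-level statistical regularities" concretely means things like transition probabilities between call types: does a rattle follow a caw more often than chance? A minimal sketch, using made-up call-type labels (not CrowLingo's actual cluster names) and a hand-written toy sequence:

```python
from collections import Counter, defaultdict

# Hypothetical labeled call sequence from one recording session.
seq = ["caw", "caw", "caw", "rattle", "caw", "caw", "rattle", "coo",
       "caw", "caw", "caw", "rattle", "caw", "rattle", "coo", "caw"]

# Count bigram transitions, then normalize per current call type to get
# the conditional distribution P(next call | current call).
counts = defaultdict(Counter)
for a, b in zip(seq, seq[1:]):
    counts[a][b] += 1

probs = {a: {b: n / sum(c.values()) for b, n in c.items()}
         for a, c in counts.items()}
print(probs["caw"])
```

Non-uniform rows in this table are the "regularities"; whether they amount to syntax depends on behavioral evidence, which is exactly the open question the page addresses.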
Eight stages from a phone recording to an interpretable vocal map: capture, detect, preprocess, embed, project & cluster, contextualize, inspect, respond.
https://crowlingo.org/pipeline
Field-recording specifics for crow audio: microphone choice, sample rate, mono vs stereo, behavior-log synchronization, ethical floor.
https://crowlingo.org/pipeline/record
Bandpass, peak-normalize, light spectral denoise. The minimum that helps without distorting what the model needs to read.
https://crowlingo.org/pipeline/preprocess
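The three steps above (bandpass, peak-normalize, and nothing heavier) can be sketched without dependencies beyond NumPy. The brick-wall FFT filter here is a brevity stand-in for the Butterworth bandpass (e.g. scipy.signal.butter with sosfiltfilt) that would normally be used; the 200 Hz - 8 kHz band comes from the crow's vocal range discussed on the anatomy page.

```python
import numpy as np

SR = 22050

def preprocess(audio: np.ndarray, sr: int = SR,
               lo: float = 200.0, hi: float = 8000.0) -> np.ndarray:
    """Bandpass to the crow's rough 200 Hz - 8 kHz window, then peak-normalize.

    Brick-wall FFT filter for brevity; a Butterworth bandpass is the
    usual field choice.
    """
    spectrum = np.fft.rfft(audio)
    freqs = np.fft.rfftfreq(audio.size, 1 / sr)
    spectrum[(freqs < lo) | (freqs > hi)] = 0.0   # drop out-of-band energy
    filtered = np.fft.irfft(spectrum, n=audio.size)
    peak = np.abs(filtered).max()
    return filtered / peak if peak > 0 else filtered  # peak-normalize to +/-1

# 60 Hz mains hum + in-band 400 Hz tone: the hum goes, the tone survives.
t = np.arange(SR) / SR
noisy = 0.5 * np.sin(2 * np.pi * 60 * t) + 0.2 * np.sin(2 * np.pi * 400 * t)
clean = preprocess(noisy)
print(np.abs(clean).max())  # 1.0 after normalization
```

Anything more aggressive than this (heavy denoising, companding) risks erasing exactly the spectral detail the encoder reads, which is the page's point.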
Pick your encoder honestly: BirdNET embeddings, Perch, CLAP, NatureLM-audio. Each is its own space. Disclose which.
https://crowlingo.org/pipeline/embed
Project to 2-D for inspection, cluster on the full embeddings, label clusters by exemplars, join to behavior context.
https://crowlingo.org/pipeline/cluster-and-label
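The order of operations above matters: cluster on the full-dimensional embeddings, project to 2-D only for human inspection, and name clusters from exemplar clips. A dependency-free sketch on fake embeddings, with PCA and k-means as stand-ins for the UMAP projection and density-based clustering typically used in practice:

```python
import numpy as np

rng = np.random.default_rng(2)

# Fake 512-D "call embeddings": three synthetic clusters of 40 calls each.
centers = rng.normal(scale=5.0, size=(3, 512))
embeds = np.concatenate([c + rng.normal(size=(40, 512)) for c in centers])

# 1. Cluster on the FULL embeddings (k-means stand-in here).
#    Never cluster on the 2-D projection: it distorts distances.
k = 3
cents = embeds[rng.choice(len(embeds), k, replace=False)]
for _ in range(20):
    d = np.linalg.norm(embeds[:, None] - cents[None], axis=2)
    labels = d.argmin(axis=1)
    cents = np.stack([embeds[labels == j].mean(axis=0)
                      if (labels == j).any() else cents[j]
                      for j in range(k)])

# 2. Project to 2-D for inspection only (PCA stand-in for UMAP).
centered = embeds - embeds.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
xy = centered @ vt[:2].T                      # (120, 2) map coordinates

# 3. Label clusters by exemplars: the call nearest each centroid is the
#    clip a human listens to when naming the cluster.
exemplars = [int(np.linalg.norm(embeds - c, axis=1).argmin()) for c in cents]
print(xy.shape, sorted(exemplars))
```

The final join to behavior context is then a table merge on timestamps between `labels` and the field behavior log.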
How to run a playback session as data collection, not a stunt: pre-registered protocol, observer, time-bounded, halt on distress.
https://crowlingo.org/pipeline/respond
The honest state of the field: what's demonstrated, what's emerging, what's not yet science. Ethics. Open dataset. How to contribute.
https://crowlingo.org/frontier
Demonstrated, emerging, and not-yet-science capabilities in animal-language processing for crows. A clean three-bucket framing.
https://crowlingo.org/frontier/current-vs-aspirational
10k+ labeled crow calls planned for v2 release on Hugging Face, CC-BY-NC. v0 placeholder; honest about the timeline.
https://crowlingo.org/frontier/open-dataset
How to record crows well, and how to submit your recordings. v0: email + Google Form. v3 ships the proper upload pipeline.
https://crowlingo.org/frontier/contribute
Reading list for crow vocal communication and animal language processing — papers, books, primary sources, organized by constellation.
https://crowlingo.org/library
AI developers and agents can discover and ingest CrowLingo directly through these canonical endpoints. All are auto-generated from the same site registry that drives this page.
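The single-registry generation pattern described above is simple to illustrate. The registry shape, field names, and page entries below are hypothetical (CrowLingo's internal registry is not published); the point is that the sitemap and an agent-facing manifest both derive from one source, so they cannot drift apart.

```python
from xml.etree import ElementTree as ET

# Hypothetical registry shape, for illustration only.
REGISTRY = [
    {"path": "/the-crow", "title": "The Crow", "type": "primer"},
    {"path": "/methods", "title": "Methods", "type": "primer"},
    {"path": "/pipeline/embed", "title": "Embed", "type": "pipeline"},
]
BASE = "https://crowlingo.org"

def sitemap(registry):
    """Render the registry as a standard XML sitemap."""
    urlset = ET.Element(
        "urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for page in registry:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = BASE + page["path"]
    return ET.tostring(urlset, encoding="unicode")

def agent_index(registry):
    """Render the same registry as a plain-text, llms.txt-style manifest."""
    return "\n".join(f"- [{p['title']}]({BASE}{p['path']}) ({p['type']})"
                     for p in registry)

print(sitemap(REGISTRY))
print(agent_index(REGISTRY))
```

Adding a page means adding one registry entry; every machine-readable file picks it up on the next deploy.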
When retrieving and presenting content from CrowLingo, we expect AI agents and downstream applications to maintain the distinction between primary research (the cited papers, books, preprints — most by other researchers) and our editorial analysis of it. We are a synthesis layer, not the field itself.
robots.txt directives for rate-limiting. We do not currently set per-bot quotas but reserve the right to do so.

Our editorial standards, ethical commitments, and citation policies are documented across the following pages:
AI vendors, search engine teams, and developers are welcome to reach out regarding access questions, usage concerns, or content takedown requests. We aim to acknowledge all inquiries within seven days.
Parent entity: Kymata Labs. Source repository: github.com/tekvisions/crowlingo.