Metadata API

For cultural heritage and publishing

Automated Indexing

Unlock the power of AI indexing with aureka's Metadata API

Map concepts to thesauri like the Library of Congress Subject Headings, Gemeinsame Normdatei (GND), BISAC and WGS codes.
Extract highly relevant keywords for each section and the entire document, prioritized by relevance to enhance search and categorization.
Summarize the core content of your documents to improve discoverability and enable quick insights.

Schedule Demo

Behind the Scenes

Depending on the indexing task, a tailored pipeline is applied.

For large authority files used in archival and bibliographic indexing, vocabulary terms are converted into embeddings, numerical representations of their meaning, and enriched with the hierarchical relationships between terms. Each text fragment is analyzed, and the most relevant vocabulary terms are retrieved and ranked.
For subject headings used in book indexing, a graph-based structure captures the hierarchy of terms. Large language models extract information from the text and navigate the vocabulary iteratively, identifying the most relevant nodes.
Generative language models are optimized to capture the meaning of textual sources and condense it into a compact summary.
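
To make the embedding-based retrieval step more concrete, here is a minimal Python sketch of the general idea, not aureka's actual implementation. Everything in it is an assumption made for illustration: the toy `VOCABULARY`, the helper names (`embed`, `term_text`, `rank_terms`), and above all the "embedding", which is a plain bag-of-words count vector standing in for a learned embedding model. Hierarchical enrichment is approximated by appending each term's broader terms before embedding it.

```python
import math
import re
from collections import Counter

# Hypothetical toy vocabulary: term -> broader (parent) term, None for top terms.
VOCABULARY = {
    "Music": None,
    "Opera": "Music",
    "Painting": None,
    "Fresco painting": "Painting",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def embed(text):
    """Stand-in embedding: a bag-of-words count vector (assumption, not a learned model)."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def term_text(term):
    """Enrich a term with its chain of broader terms before embedding it."""
    parts = [term]
    parent = VOCABULARY[term]
    while parent is not None:
        parts.append(parent)
        parent = VOCABULARY[parent]
    return " ".join(parts)


# Pre-compute one vector per vocabulary term.
TERM_VECTORS = {term: embed(term_text(term)) for term in VOCABULARY}


def rank_terms(fragment, top_k=3):
    """Return up to top_k (term, score) pairs, most similar first; zero scores are dropped."""
    query = embed(fragment)
    scored = [(cosine(query, vector), term) for term, vector in TERM_VECTORS.items()]
    scored = [(score, term) for score, term in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(term, round(score, 3)) for score, term in scored[:top_k]]


if __name__ == "__main__":
    for term, score in rank_terms("A fresco painting in the chapel"):
        print(term, score)
```

With this toy data, the fragment "A fresco painting in the chapel" ranks "Fresco painting" ahead of its broader term "Painting", and the unrelated music terms are dropped. A production pipeline would replace `embed` with a real embedding model and a vector index, but the retrieve-and-rank shape stays the same.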

It's Customizable

Do you need to index with your own thesaurus? The modular architecture allows for fast customizations.

Get in touch

Straightforward Pricing

Schedule Demo