For cultural heritage and publishing

Automated Indexing

Unlock the power of AI indexing with aureka's Metadata API

  • Map concepts to thesauri like the Library of Congress Subject Headings, Gemeinsame Normdatei (GND), BISAC and WGS codes.
  • Extract highly relevant keywords for each section and the entire document, prioritized by relevance to enhance search and categorization.
  • Summarize the core content of your documents to improve discoverability and enable quick insights.

Behind the Curtains

How Does It Work?

Depending on the indexing task, a tailored pipeline is applied.

  • For large authority files used in archival and bibliographic indexing, each vocabulary term is converted into an embedding, a numerical vector that represents its meaning, and enriched with the hierarchical relationships between terms. Each text fragment is then analyzed, and the most relevant terms from the vocabulary are retrieved and ranked.
  • For subject headings used in book indexing, a graph-based structure captures the hierarchy of terms. Large language models extract information from the text and navigate the vocabulary iteratively, identifying the most relevant nodes.
  • For summarization, generative language models are optimized to capture the meaning of a source text and condense it into a compact summary.
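
To make the first pipeline more concrete, here is a minimal sketch of embedding-based term retrieval. It is an illustration only, not aureka's implementation: the three-dimensional toy vectors, the parent-blending scheme, and the names VOCAB, enriched and rank_terms are all assumptions made for this example. A real system would obtain its vectors from an embedding model.

```python
import math

# Toy vocabulary with hand-written 3-dimensional vectors (assumption:
# real embeddings come from a model and have hundreds of dimensions).
VOCAB = {
    "Architecture": {"vec": [0.9, 0.1, 0.0], "parent": None},
    "Gothic architecture": {"vec": [0.8, 0.3, 0.1], "parent": "Architecture"},
    "Botany": {"vec": [0.0, 0.2, 0.9], "parent": None},
}


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def enriched(term, parent_weight=0.3):
    """Blend a term's vector with its parent's vector.

    This is one hypothetical way to inject hierarchy into an embedding.
    """
    entry = VOCAB[term]
    vec = entry["vec"]
    if entry["parent"] is not None:
        parent_vec = VOCAB[entry["parent"]]["vec"]
        vec = [
            (1 - parent_weight) * a + parent_weight * b
            for a, b in zip(vec, parent_vec)
        ]
    return normalize(vec)


def rank_terms(fragment_vec, top_k=2):
    """Return the top_k vocabulary terms by cosine similarity to a text fragment."""
    query = normalize(fragment_vec)
    scored = [
        (sum(a * b for a, b in zip(query, enriched(term))), term)
        for term in VOCAB
    ]
    scored.sort(reverse=True)
    return [(term, round(score, 3)) for score, term in scored[:top_k]]


# A fragment whose (toy) embedding lies close to the architecture terms.
print(rank_terms([0.85, 0.25, 0.05]))
```

With these toy values, the fragment ranks "Gothic architecture" first and "Architecture" second, while "Botany" scores far lower and is cut off by top_k.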

It's Customizable

Our AI with Your Vocabulary

Do you need to index with your own thesaurus? The modular architecture allows for fast customizations.

Straightforward Pricing

Pay for Use

  • Free queries to test the API and your integration
  • Keyword assignment: 0.3 USD per 1,000 words
  • Summarization: 0.1 USD per 1,000 words
  • Discount for large collections
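
As a rough guide, the listed rates can be turned into a quick cost estimate. The sketch below uses only the two per-word rates stated above. It assumes billing is proportional to the word count, and it ignores free queries and volume discounts because their terms are not given here. The function name estimate_cost is hypothetical.

```python
from decimal import Decimal

# Listed rates in USD per 1,000 words.
KEYWORD_RATE = Decimal("0.3")
SUMMARY_RATE = Decimal("0.1")


def estimate_cost(words, keywords=True, summary=True):
    """Estimate the cost in USD of processing a given number of words.

    Assumes proportional billing; free queries and discounts are not modeled.
    """
    rate = Decimal("0")
    if keywords:
        rate += KEYWORD_RATE
    if summary:
        rate += SUMMARY_RATE
    return rate * Decimal(words) / Decimal(1000)


# An 80,000-word book with both keyword assignment and summarization.
print(estimate_cost(80_000))
```

Under these assumptions, an 80,000-word book costs 24 USD for keyword assignment plus 8 USD for summarization, 32 USD in total.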
