Blog

Domain-Specific Knowledge Assistants: Approaches in 2025

Written by Dr. Cecilia Maas | Aug 25, 2025 9:08:06 PM

Organisations often sit on vast collections of knowledge—legal texts, interview transcripts, reports, publications, research data—yet much of this information remains difficult to access in practice. Traditional keyword search falls short when users need nuanced answers to questions that require context, connections, and reasoning across diverse sources.

Recent advances in artificial intelligence offer a way forward. In 2025, methods such as multi-agent systems, GraphRAG, and the Model Context Protocol (MCP) are changing how organizations can unlock the value of their data. Instead of relying on closed, opaque platforms, institutions can now build open-source, transparent, and sustainable systems tailored to their domain. The promise is not just faster access to information, but a shift in how knowledge can be organized, accessed, and shared.

Understanding the Challenge

Working with complex corpora is rarely straightforward. Collections are not uniform: they often combine scanned PDFs, images and graphs, transcripts, databases, texts such as briefs or academic papers, and more, each with its own structure—or lack of it. Metadata may be incomplete or inconsistent, documents span multiple languages, and the same concept may be expressed in very different terms depending on the source.

Keyword search struggles in this environment, because it cannot bridge terminology gaps, identify conceptual relations, or weigh the relevance of different materials.

Beyond technical complexity, institutions face practical and ethical constraints. Organisations in civil society, academia or the cultural sector often operate with limited budgets, making proprietary enterprise solutions unsustainable. At the same time, they must respect intellectual property rights, protect sensitive data, and comply with emerging regulations on the ethical use of AI.

This combination of factors creates a unique challenge: how to design AI systems that are not only powerful, but also transparent, accountable, and sustainable in the long term.

Best Approaches in 2025

The landscape of AI for knowledge access has matured significantly in 2025. At the heart of most domain-specific AI systems today lies retrieval-augmented generation (RAG). By grounding large language models in a curated corpus of data, RAG reduces hallucinations and ensures that answers can be traced back to source material instead of being indirectly informed by the model's training data. It has quickly become the default method for building knowledge assistants, yet experience has shown its limitations: simple vector-based retrieval often struggles with nuanced queries, fragmented data, or the need to connect evidence across documents.
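The core retrieval-then-grounding loop can be sketched in a few lines. This is a deliberately minimal illustration, not a production pattern: the toy bag-of-words "embeddings", the three-document corpus, and the helper names (`embed`, `retrieve`, `build_prompt`) are all invented for the example—a real system would use a neural embedding model and a vector database.

```python
# Minimal RAG sketch: retrieve the most similar passages for a query,
# then build a prompt that grounds the LLM in those passages only.
from collections import Counter
import math

# Illustrative corpus; in practice this would be thousands of documents.
CORPUS = [
    "Asylum applications must be filed with the federal office.",
    "Student councils are elected at the start of the school year.",
    "Work permits for migrants depend on residence status.",
]

def embed(text: str) -> Counter:
    # Stand-in for a neural embedding model: plain word counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    # Rank the corpus by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(CORPUS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    # The model is told to answer only from the retrieved passages,
    # so every claim can be traced back to a source document.
    context = "\n".join(f"- {p}" for p in retrieve(query))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

print(build_prompt("How are work permits linked to residence status?"))
```

The limitation mentioned above is visible even here: retrieval happens once, against isolated passages, so questions whose evidence is spread across documents or phrased in different vocabulary can miss the relevant sources entirely.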

RAG provides a crucial foundation, but on its own it is not enough. Three approaches in particular are proving essential for institutions dealing with complex and diverse corpora in 2025.

  • Multi-agent systems allow different components powered by large language models (LLMs) to specialize. The exact architecture varies by solution, but for example one agent focuses on retrieving relevant material using vectors to compute semantic similarity, another reranks and filters retrieved sources based on additional context, a third synthesizes an answer, and yet another identifies the user’s intent and structures the response accordingly. This division of a complex task into several simpler ones, which mirrors the workflow of a human researcher, is key to accessing complex information and providing responses relevant to the domain.
  • GraphRAG (graph-based retrieval-augmented generation) goes a step further, adding a structured representation of how entities present in the sources (e.g. concepts, people, places, and events) interrelate. Instead of returning passages based solely on their semantic similarity to the user’s query, the graph component captures relations between the sources and can therefore answer questions that pure semantic retrieval would fail to resolve.
  • The Model Context Protocol emerged in late 2024 as a standard for interoperability, enabling LLMs to orchestrate multiple tools in order to accomplish a task—in this case, answering a question. Instead of relying on a single retrieval step, an MCP-enabled system can coordinate several complementary actions: issuing multiple queries to a vector database, reformulating them if the first attempt returns poor results, or combining database searches with calls to APIs, structured knowledge graphs, or reasoning agents. This makes information access far more robust. Rather than depending on one “best guess” passage, the model can assemble a richer evidence base, cross-check sources, and deliver responses that are more accurate.
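To make the GraphRAG idea concrete, here is a toy sketch of multi-hop traversal over an entity graph. The graph is hand-built for illustration (real GraphRAG pipelines extract entities and relations from the corpus automatically), and the entities and relation names are invented for the example.

```python
# Toy GraphRAG sketch: relations between entities let us answer
# questions that similarity search over isolated passages cannot.
GRAPH = {
    ("Advisory Centre", "advises_on"): ["asylum", "precarious work"],
    ("asylum", "regulated_by"): ["Asylum Act"],
    ("precarious work", "regulated_by"): ["Labour Code"],
}

def neighbors(entity: str, relation: str) -> list[str]:
    # Look up all entities reached from `entity` via `relation`.
    return GRAPH.get((entity, relation), [])

def multi_hop(entity: str, relations: list[str]) -> list[str]:
    # Follow a chain of relations, e.g. "which laws govern the topics
    # this centre advises on?" -- a two-hop question whose answer is
    # spread across sources that no single passage states directly.
    frontier = [entity]
    for rel in relations:
        frontier = [n for e in frontier for n in neighbors(e, rel)]
    return frontier

print(multi_hop("Advisory Centre", ["advises_on", "regulated_by"]))
# → ['Asylum Act', 'Labour Code']
```

The point of the sketch is the traversal itself: the answer is assembled by following edges between sources, not by finding one passage that happens to contain all the keywords.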
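The iterative, tool-orchestrating behaviour described for MCP can likewise be sketched as a simple retry loop. Everything here is a stand-in: `search_tool` represents a vector-database tool exposed over MCP, and `reformulate` represents an LLM-driven query rewriter; the index contents are invented for illustration.

```python
# Sketch of an MCP-style loop: query a tool, and if the result is
# poor, reformulate and retry instead of settling for one best guess.
def search_tool(query: str) -> list[str]:
    # Placeholder for a retrieval tool invoked over MCP.
    fake_index = {"asylum procedure": ["Asylum Act, section 13"]}
    return fake_index.get(query, [])

def reformulate(query: str) -> str:
    # In practice an LLM rewrites the query; here, a fixed synonym map.
    synonyms = {"refugee process": "asylum procedure"}
    return synonyms.get(query, query)

def answer(query: str, max_attempts: int = 3) -> list[str]:
    # Iterate: each failed retrieval triggers a reformulation, so the
    # system recovers from terminology gaps between user and corpus.
    for _ in range(max_attempts):
        results = search_tool(query)
        if results:
            return results
        query = reformulate(query)
    return []

print(answer("refugee process"))
# → ['Asylum Act, section 13']
```

A real MCP client would additionally discover available tools at runtime and combine several of them (retrieval, graph lookups, API calls) in one session, but the retry-and-reformulate pattern is the essential shift away from one-shot retrieval.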

Equally important is the commitment to open-source solutions. Frameworks such as LangChain, LangGraph, LlamaIndex, Weaviate, and Qdrant allow institutions to experiment, customize, and deploy without vendor lock-in. They provide building blocks that can be adapted to specific domains while keeping costs manageable and ensuring alignment with ethical and legal standards. Open source also fosters community-driven innovation: universities, NGOs, and researchers can share improvements, creating a commons of tools for knowledge access.

Examples from Practice

To understand how these challenges can be addressed, we share learnings from two recent projects implemented by aureka.

  • In the project Lupai, we collaborated with the association Bildungszentrum Lohana Berkins to develop an AI-powered application that supports migration advisors in researching complex cases more efficiently. The data corpus included laws, procedural instructions from ministries, and brochures published by advisory centers with extensive expertise on specific dimensions of the migration experience (e.g. asylum, migration and precarious work, self-employment with migrant status). The prototype, developed in 2024, used a multi-agent setup to retrieve sources from a curated database and help advisors locate precise information (more about it here). Beyond retrieval, one agent asked follow-up questions to fill in missing details (e.g. region, current migration or employment status), while another identified the intent of the query and structured the response accordingly—delivering, for example, a short validation if the intent was to confirm a fact, or a step-by-step guide if the goal was to design a strategy. In 2025, we refactored the system to include MCP: moving from one-shot retrieval to iterative queries significantly improved the quality and robustness of the responses.
  • With the association Schule ein Gesicht geben, we supported the development of an AI assistant to improve access to their extensive knowledge base on student representations in German schools. Over the years, the association had compiled detailed handbooks, guidelines, and recommendations to help students organize effectively, but it had become difficult to maintain an overview and locate the right resources. The assistant, to be launched in October 2025, will allow students and teachers to ask questions and clarify doubts using the association’s comprehensive materials. From understanding regional legal frameworks, to practical step-by-step guidance on starting a student council, to inspiration for new projects—AI will make these resources more discoverable and actionable. For this project, we implemented an MCP server from the start and designed a lightweight agentic architecture to keep the system performant and flexible. We also automated the ingestion of materials published on the association’s website and enriched the corpus with additional sources selected through the expert knowledge of the team.

Co-Creating Knowledge Access

The examples above highlight a broader lesson: building effective AI systems for information access is not simply a technical challenge, but a process of co-creation. Domain experts—whether they are migration advisors, educators, or cultural practitioners—hold the tacit knowledge needed to shape how data should be interpreted, what questions matter, and what constitutes a useful answer. Their involvement ensures that AI systems do not merely deliver text, but provide guidance aligned with real needs and institutional practices. This stands in sharp contrast to one-size-fits-all enterprise solutions, which often impose rigid structures and fail to capture the nuances of public-purpose work.

Our vision at aureka is to advance community-driven solutions that allow organizations with a public mission to benefit from each other’s experience while retaining the ability to adapt systems to their specific values and contexts. By pooling open-source tools, methodological innovations, and lessons learned across domains, organisations can create a shared ecosystem of digital knowledge assistants. These systems will not only make complex corpora more accessible, but also reflect the ethical commitments, transparency requirements, and long-term sustainability that public-oriented institutions demand.