Organisations often sit on vast collections of knowledge—legal texts, interview transcripts, reports, publications, research data—yet much of this information remains difficult to access in practice. Traditional keyword search falls short when users need nuanced answers to questions that require context, connections, and reasoning across diverse sources, not just keyword matches.
Recent advances in artificial intelligence offer a way forward. In 2025, methods such as multi-agent systems, GraphRAG, and the Model Context Protocol (MCP) are changing how organisations can unlock the value of their data. Instead of relying on closed, opaque platforms, institutions can now build open-source, transparent, and sustainable systems tailored to their domain. The promise is not just faster access to information, but a shift in how knowledge can be organized, accessed, and shared.
Working with complex corpora is rarely straightforward. Collections are not uniform: they often combine scanned PDFs, images and graphs, transcripts, databases, texts such as briefs or academic papers, and more, each with its own structure—or lack of it. Metadata may be incomplete or inconsistent, documents span multiple languages, and the same concept may be expressed in very different terms depending on the source.
Keyword search struggles in this environment, because it cannot bridge terminology gaps, identify conceptual relations, or weigh the relevance of different materials.
Beyond technical complexity, institutions face practical and ethical constraints. Organisations in civil society, academia, or the cultural sector often operate with limited budgets, making proprietary enterprise solutions unsustainable. At the same time, they must respect intellectual property rights, protect sensitive data, and comply with emerging regulations on the ethical use of AI.
This combination of factors creates a unique challenge: how to design AI systems that are not only powerful, but also transparent, accountable, and sustainable in the long term.
The landscape of AI for knowledge access has matured significantly in 2025. At the heart of most domain-specific AI systems today lies retrieval-augmented generation (RAG). By grounding large language models in a curated corpus of data, RAG reduces hallucinations and makes it possible to trace answers back to source material, rather than having them only indirectly informed by the data the model was trained on. It has quickly become the default method for building knowledge assistants, yet experience has shown its limitations: simple vector-based retrieval often struggles with nuanced queries, fragmented data, or the need to connect evidence across documents.
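To make the retrieval step concrete, here is a minimal sketch of how a RAG pipeline selects passages and keeps them traceable. It rests on loud assumptions: the three-document corpus, its ids, and the function names are invented for illustration, and a simple bag-of-words cosine similarity stands in for the embedding model and vector store a real system would use. No language model is called; the sketch stops at the prompt that would be sent to one.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus: ids and texts are invented for illustration.
CORPUS = {
    "doc-visa": "Applicants need a valid passport and a residence permit to apply for a work visa.",
    "doc-school": "Children of newly arrived families can enrol in local schools at any time of year.",
    "doc-archive": "The archive holds interview transcripts recorded between 1990 and 2005.",
}


def vectorize(text):
    """Turn text into a bag-of-words count vector (a stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=2):
    """Return up to k (doc_id, score) pairs, best first, dropping zero-score documents."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(text)), doc_id) for doc_id, text in corpus.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k] if score > 0]


def build_prompt(query, corpus, k=2):
    """Assemble a prompt whose context lines carry their source ids for traceability."""
    hits = retrieve(query, corpus, k)
    context = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id, _ in hits)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"{context}\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    print(build_prompt("What documents are needed for a work visa?", CORPUS))
```

Because every context line carries its document id, an answer generated from this prompt can cite the passages it relied on. The sketch also shows the weakness noted above: a query phrased as "employment authorisation" shares no terms with the visa passage and would retrieve nothing, which is the kind of terminology gap that embeddings and graph-based retrieval aim to close.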
RAG provides a crucial foundation, but on its own it is not enough. Three approaches in particular are proving essential for institutions dealing with complex and diverse corpora in 2025.
Equally important is the commitment to open-source solutions. Frameworks and vector databases such as LangChain, LangGraph, LlamaIndex, Weaviate, and Qdrant allow institutions to experiment, customize, and deploy without vendor lock-in. They provide building blocks that can be adapted to specific domains while keeping costs manageable and supporting alignment with ethical and legal standards. Open source also fosters community-driven innovation: universities, NGOs, and researchers can share improvements, creating a commons of tools for knowledge access.
To understand how these challenges can be addressed, we share learnings from two recent projects implemented by aureka.
The examples above highlight a broader lesson: building effective AI systems for information access is not simply a technical challenge, but a process of co-creation. Domain experts—whether they are migration advisors, educators, or cultural practitioners—hold the tacit knowledge needed to shape how data should be interpreted, what questions matter, and what constitutes a useful answer. Their involvement ensures that AI systems do not merely deliver text, but provide guidance aligned with real needs and institutional practices. This stands in sharp contrast to one-size-fits-all enterprise solutions, which often impose rigid structures and fail to capture the nuances of public-purpose work.
Our vision at aureka is to advance community-driven solutions that allow organisations with a public mission to benefit from each other’s experience while retaining the ability to adapt systems to their specific values and contexts. By pooling open-source tools, methodological innovations, and lessons learned across domains, organisations can create a shared ecosystem of digital knowledge assistants. These systems will not only make complex corpora more accessible, but also reflect the ethical commitments, transparency requirements, and long-term sustainability that public-oriented institutions demand.