OWS.EU Partner in Focus: Radboud University

Continuing our partner portrait series, today’s spotlight is on Radboud University in the Netherlands. **Prof.dr.ir. Arjen P. de Vries** and **Prof.dr.ir. Djoerd Hiemstra** lead the Information Retrieval research group at Radboud University, part of the Data Science section in the Institute for Computing and Information Sciences.
In OpenWebSearch.EU, the team, which is complemented by PhD candidates **Gijs Hendriksen** and **Daria Alexander**, has been developing a new architecture for search engines in which many parts of the system are decentralized. The key idea is to separate index construction from the search engines themselves: the most expensive step, creating index shards, can be carried out on large clusters, while the search engine itself can be operated locally.
Another vision includes an Open-Web-Search Engine Hub, where companies and individuals can share their specifications of search engines and pre-computed, regularly updated search indices.
Having recently launched the OpenWebIndex pilot, we asked Arjen and Gijs about some key results and learnings thus far while also touching on some next steps for the remaining project time.
_**Gijs and Arjen, thank you both for your time today. Could you please describe Radboud University’s tasks in the OpenWebSearch.eu project? What is the field of expertise that you bring to the project?**_
**Arjen:** The Radboud University expertise is **Information Retrieval**, which is the core field of computer science that contributes to the development of search engines. The central question is how computers can establish the relevance of information objects for people’s information needs. We look into a wide range of open questions in the field, covering topics including the mathematical modeling of information (with and without new AI techniques), scalable and resource-efficient system architectures, and, perhaps the most difficult one, how to measure the quality of retrieval systems and compare different approaches on their effectiveness.
_**Sounds like an ongoing, tedious process. Have you found any key learnings for what works and what doesn’t in combining or comparing the various approaches?**_
**Gijs:** There were many learnings along the way indeed. Without going into too much detail, some of our key learnings are published as research papers and OWS deliverables.
_**How is the project progressing overall? Which major milestones are you proud of thus far?**_
**Gijs:** From our point of view, the project is progressing very well! After 2.5 years of engineering we are now running daily workflows that produce daily index shards from crawled content across three European data centers. Now that we are getting the data out there, we can focus on improving the ease of access to these index shards.
_**Could you elaborate on that a bit more?**_
**Gijs:** Sure. We are now working on improving access to the Open Web Index. A main part of that is deciding how we want to ‘shard’ the data, i.e. how we want to distribute the data across logical partitions that can be used to efficiently query a part of the data. Currently, we split the index into language-based shards, but we want to experiment with topic-based shards and even create shards based on frequent access patterns.
We are also actively investigating how we can best integrate shards over time. We are currently producing daily index shards, but have yet to decide how we can best combine these daily subsets, and how we should deal with document updates and deletions. Finally, we recognize that many people want to be able to query our index directly without having to download all our index data. We are working on a way to offer direct querying capabilities over an inverted file hosted in a data lake. This should also enable us to efficiently propagate updates to the index.
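To make the ideas of language-based shards and daily updates more concrete, here is a minimal Python sketch. It is an illustration only and not the project's implementation: the document fields (`id`, `lang`, `text`, `deleted`), the function names, and the "latest version wins, deletions remove the document" merge rule are all assumptions made for this example. As Gijs notes, the team has yet to decide how daily shards will actually be combined.

```python
from collections import defaultdict


def merge_daily(days):
    """Combine daily document batches, oldest first (hypothetical merge rule).

    A later version of a document replaces an earlier one, and a record
    marked as deleted removes the document altogether.
    """
    current = {}
    for batch in days:
        for doc in batch:
            if doc.get("deleted"):
                current.pop(doc["id"], None)
            else:
                current[doc["id"]] = doc
    return list(current.values())


def build_shards(docs):
    """Group documents by language and build one small inverted index per shard."""
    shards = defaultdict(lambda: defaultdict(set))
    for doc in docs:
        for term in doc["text"].lower().split():
            shards[doc["lang"]][term].add(doc["id"])
    return shards


def search(shards, lang, query):
    """Return the ids of documents in one language shard that contain every query term."""
    index = shards.get(lang, {})
    postings = [index.get(term, set()) for term in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []
```

A query then only touches the shard for the requested language, for example `search(build_shards(merge_daily([day1, day2])), "en", "web index")`, where `day1` and `day2` are lists of document records in the assumed format.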
_**Sounds promising. What are some of the challenges you are facing?**_
**Arjen:** The main technical challenges stem directly from the scale of the Web and the noisiness of Web data. The really big problem, however, remains that of evaluation. How do you establish the value of innovations in search without continuously running costly user studies? We are looking into mixing ideas from what is known in our field as ‘**the Cranfield tradition**’ with new developments in LLMs, and user-oriented studies to fill in where machines would fail.
_**What makes the OWS project special?**_
**Arjen:** EU projects are often a way for partner organisations to fund their own interests, resulting in internal project frictions (large or small) about the direction and final objectives. With OpenWebSearch.eu it is nothing like that. Everyone on the team is highly motivated to make a lasting change in the distribution of online powers, and such a broadly shared target is so refreshing!
We thoroughly enjoy taking part in this enterprise, and we are convinced that OpenWebSearch.eu will produce a lasting impact, sustainable beyond the duration of the project.
_**Do you already have plans for the time after the project ends?**_
**Arjen:** The brief answer is ‘Keep going’. Hopefully we manage to keep the team together, and find funding to even expand by integrating parties that have started to contribute actively to the Open Web Search and Analysis Infrastructure. And we will work hard to make the index a fundamental building block, suitable for others to do Web search research.
_**Thank you for the insights!**_
Read more about Radboud University: https://openwebsearch.eu/partners/radboud-university/ https://openwebsearch.eu/ows-eu-partner-in-focus-radboud-university/