KEYNOTE SPEAKERS


Steffen Staab

Professor of Databases and Information Systems
Universität Koblenz-Landau, Germany

Web Futures – Inclusive, Intelligent, Sustainable

Almost from its very beginning, the Web has been ambivalent.
It has facilitated freedom of information, but this also included the freedom to spread misinformation. It has facilitated intelligent personalization, but at the cost of intrusion into our private lives. It has included more people than any other system before, but at the risk of exploiting them.
The Web is full of such ambivalences, and the use of artificial intelligence threatens to amplify them further. To further the good and to contain the negative consequences, we need a research agenda for studying and engineering the Web, as well as numerous activities by societies at large. In this talk, I will present and discuss a joint effort by an interdisciplinary team of Web Scientists to prepare and pursue such an agenda.


Steffen is a full professor of Databases and Information Systems at the Universität Koblenz-Landau, Germany, and a full professor of Web and Computer Science at the University of Southampton, UK. He studied computer science and computational linguistics in Erlangen (Germany), Philadelphia (USA) and Freiburg (Germany). In his research career he has managed to avoid almost all the good advice that he now gives to his team members. Such advice includes focusing on research (vs. company) or concentrating on only one or two research areas (vs. considering ontologies, semantic web, social web, data engineering, text mining, peer-to-peer, multimedia, HCI, services, software modelling and programming and some more). Though, actually, improving how we understand and use text and data is a good common denominator for a lot of Steffen’s professional activities.

Maarten de Rijke

Professor of Artificial Intelligence and Information Retrieval
University of Amsterdam, The Netherlands

Conversations Based on Search Engine Result Pages

How might we convey the information that is traditionally returned by a search engine in the form of a complex search engine result page (SERP) in a meaningful and natural conversation? In the talk I will start from recent work on so-called background-based conversations, where a conversational agent has access to additional background information to help it generate more natural and appropriate responses. Then, I will talk about ongoing work on our next step: SERP-based conversations. I will explain the task definitions and describe pipelines (subtasks), baselines, datasets, etc. Finally, I will describe the differences between background-based and SERP-based conversations and their relations to other, related tasks. Our work on SERP-based conversations is in its early stages, leaving lots of opportunities for follow-up research.

Based on joint work with Pengjie Ren, Nikos Voskarides, and Svitlana Vakulenko.


Maarten de Rijke is University Professor of Artificial Intelligence and Information Retrieval at the University of Amsterdam. He holds MSc degrees in Philosophy and Mathematics (both cum laude), and a PhD in Theoretical Computer Science. He worked as a postdoc at CWI, before becoming a Warwick Research Fellow at the University of Warwick, UK. He joined the University of Amsterdam in 1998, and was appointed full professor in 2004. He is a member of the Royal Netherlands Academy of Arts and Sciences (KNAW) and a recipient of a Pioneer Personal Innovation grant, the Tony Kent Strix Award, the Bloomberg Data Science Research Award, the Criteo Faculty Research Award, the Google Faculty Research Award, the Microsoft PhD Research Fellowship Award, and the Yahoo Faculty and Research Engagement Program Award as well as a large number of NWO grants. He is the director of the newly established Innovation Center for Artificial Intelligence and a former director of Amsterdam Data Science.

De Rijke leads the Information and Language Processing Systems group at the Informatics Institute of the University of Amsterdam, one of the world’s leading academic research groups in information retrieval. His research focus is at the interface of information retrieval and artificial intelligence, with projects on online and offline learning to rank, on recommender systems, and on conversational search.

A Pionier personal innovational research incentives grant laureate (comparable to an advanced ERC grant), De Rijke has helped to generate over 65 million euros in project funding. With an h-index of 69, he has published over 750 papers and published or edited over a dozen books. He is editor-in-chief of ACM Transactions on Information Systems, co-editor-in-chief of Foundations and Trends in Information Retrieval and of Springer’s Information Retrieval book series, (associate) editor for various journals and book series, and a current and former coordinator of retrieval evaluation tracks at TREC, CLEF and INEX. Recently, he was co-chair for SIGIR 2013, general co-chair for ECIR 2014, WSDM 2017, and ICTIR 2017, co-chair “web search systems and applications” for WWW 2015, short paper co-chair for SIGIR 2015, and program co-chair for information retrieval for CIKM 2015.

The retrieval and language technology developed by De Rijke’s research group is being used by organizations around the Netherlands and beyond, and has given rise to various spin-off initiatives.

Jaak Vilo

University of Tartu, Estonia
STACC Competence Center, Estonia

Biomedical data infrastructure and analysis

Life sciences are quickly becoming one of the largest producers of scientific as well as observational routine care data. What sets them apart from large centralised infrastructure like particle colliders or space telescopes is the distributed nature of data production, which happens in the smallest labs, in large international consortia, and everything in between. The challenges arise from the ever-increasing volumes of data, the huge variety and diversity of data types and, in the case of human data, also the security and privacy aspects. Bioinformatics is essential – it can often be embedded within research groups, focus on methods development, develop infrastructure for long-term data maintenance and re-use, or focus on novel biological questions by using public data for “dry” experimentation and secondary use. Large centers focus on managing the long-term usability of the data through centralised data submission and curation activities. Databases can be repositories of primary data, curated selective databases, or derivative domain-specific databases and resources. All of this requires coordination of standards, tools, end-user training, etc. ELIXIR is a bioinformatics data infrastructure effort on the ESFRI roadmap. The main role of ELIXIR is to develop and coordinate data, tools, compute, training and standards. For example, the FAIR principles (data has to be Findable, Accessible, Interoperable, Reusable) for data use have been developed for the field. These guidelines have now been widely adopted for data management and stewardship.

But not all data can be made publicly available over the Internet, for many valid reasons – this includes human DNA sequence data of patients, their electronic health records, etc. Biobanks and large health data cohorts are developing approaches for managing, analysing and sharing data in a responsible manner while also protecting data subjects. Again, large consortia and standards development are needed. OHDSI is an organisation aiming to make observational health data available for research purposes through open data standards – the common data model OMOP-CDM – and open source software and tools. The goal of the IMI EHDEN project, for example, is to make the data of at least 100M individuals available in such a distributed data network.

I will touch upon our activities on those various fronts of biomedical data and show how these data are managed and analysed in Estonia, combining health records and DNA. We are also developing personalised (or precision) medicine approaches and infrastructure that would “translate” from research to medical use and bring genetic information in usable forms to patients and doctors, for example for preventive approaches or for warnings based on a patient’s genetic background, issued already during the regular prescription process.


Prof. Jaak Vilo heads the Data Science chair and the Institute of Computer Science at the University of Tartu, Estonia, and leads the health data analytics of STACC, a public-private research organisation in Estonia. He earned his PhD in Computer Science at the University of Helsinki, Finland. From 1999 to 2002 he worked at the European Bioinformatics Institute, UK, as one of the pioneers in early gene expression microarray data analytics. There he developed the Expression Profiler toolset for various biological data analysis tasks. In 2002, after 12 years abroad, he moved back to Estonia to help create the Estonian Biobank in a PPP with VC investments, as director of informatics of EGeen Ltd. He also started his own research group, BIIT, at the University of Tartu, now about 20 people strong. His group applies data analysis, machine learning and algorithmic techniques to a broad range of biological and health data and applications. Linking genomics and many other omics data with health records is key to developing methods for the personalisation of medicine. Medical data, lab measurements, pharmacogenetics and overall multi-genic disease risk scores are complicated to handle due to organisational and national barriers, yet international research would benefit greatly from opening up and sharing such data and research results. Prof. Vilo is the head of the ELIXIR-Estonia node of the pan-European biological data infrastructure, whose mission is to facilitate global data re-use.

Dimitrios Gunopulos

National and Kapodistrian University of Athens

Urban Mobility Data: Challenges and Prospects

Human urban trajectory data collected from GPS-enabled mobile devices or vehicles are widely used in several applications, including urban planning, traffic management, and location-based services. We consider several trajectory data analysis problems that come up in such settings. These include the problem of map creation from GPS trajectories, the problem of identifying frequently travelled paths, and the problem of estimating vehicle travel times using historical data and real-time data from only a small number of floating cars.


Dimitrios Gunopulos received his PhD from Princeton University in 1995. He has held positions as a Postdoc at the Max-Planck-Institut for Informatics, Researcher at the IBM Almaden Research Center, Visiting Researcher at the University of Helsinki, Professor at the Department of Computer Science and Engineering at the University of California, Riverside, Professor in the Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, and Visiting Researcher at the Microsoft Silicon Valley Research Center. His research is in the areas of Smart Cities, Big Data, Data Mining, Databases, Sensor and Peer-to-Peer networks, Algorithms and Computational Geometry. He has co-authored over two hundred journal and conference papers that have been widely cited (h-index 72). His research has been supported by NSF (including an NSF CAREER award), the DoD, the Institute of Museum and Library Services, the Tobacco Related Disease Research Program, the European Commission, the General Secretariat of Research and Technology, AT&T, Nokia, a Yahoo FREP Award and a Google Faculty Award. He has served as a General co-Chair of SIAM SDM 2018, SIAM SDM 2017, HDMS 2011, and IEEE ICDM 2010, and as a PC co-Chair of ICDE 2020, ECML/PKDD 2011, IEEE ICDM 2008, ACM SIGKDD 2006, and SSDBM 2003.