Tutorials | Web Intelligence 2019

Maarten de Rijke

Professor of Artificial Intelligence and Information Retrieval
University of Amsterdam, The Netherlands

Conversations Based on Search Engine Result Pages

How might we convey the information that is traditionally returned by a search engine in the form of a complex search engine result page (SERP) in a meaningful and natural conversation? In the talk I will start from recent work on so-called background-based conversations, where a conversational agent has access to additional background information to help it generate more natural and appropriate responses. Then, I will talk about ongoing work on our next step: SERP-based conversations. I will explain the task definitions and describe pipelines (subtasks), baselines, datasets, etc. Finally, I will describe the differences between background-based and SERP-based conversations and their relations to other, related tasks. Our work on SERP-based conversations is in its early stages, leaving lots of opportunities for follow-up research.

Based on joint work with Pengjie Ren, Nikos Voskarides, and Svitlana Vakulenko.

Biography

Maarten de Rijke is University Professor of Artificial Intelligence and Information Retrieval at the University of Amsterdam. He holds MSc degrees in Philosophy and Mathematics (both cum laude), and a PhD in Theoretical Computer Science. He worked as a postdoc at CWI, before becoming a Warwick Research Fellow at the University of Warwick, UK. He joined the University of Amsterdam in 1998, and was appointed full professor in 2004. He is a member of the Royal Netherlands Academy of Arts and Sciences (KNAW) and a recipient of a Pioneer Personal Innovation grant, the Tony Kent Strix Award, the Bloomberg Data Science Research Award, the Criteo Faculty Research Award, the Google Faculty Research Award, the Microsoft PhD Research Fellowship Award, and the Yahoo Faculty and Research Engagement Program Award as well as a large number of NWO grants. He is the director of the newly established Innovation Center for Artificial Intelligence and a former director of Amsterdam Data Science.

De Rijke leads the Information and Language Processing Systems group at the Informatics Institute of the University of Amsterdam, one of the world’s leading academic research groups in information retrieval. His research focus is at the interface of information retrieval and artificial intelligence, with projects on online and offline learning to rank, on recommender systems, and on conversational search.

A Pionier personal innovational research incentives grant laureate (comparable to an advanced ERC grant), De Rijke has helped to generate over 65MEuro in project funding. With an h-index of 69 he has published over 750 papers, published or edited over a dozen books, is editor-in-chief of ACM Transactions on Information Systems, co-editor-in-chief of Foundations and Trends in Information Retrieval and of Springer’s Information Retrieval book series, (associate) editor for various journals and book series, and a current and former coordinator of retrieval evaluation tracks at TREC, CLEF and INEX. Recently, he was co-chair for SIGIR 2013, general co-chair for ECIR 2014, WSDM 2017, and ICTIR 2017, co-chair “web search systems and applications” for WWW 2015, short paper co-chair for SIGIR 2015, and program co-chair for information retrieval for CIKM 2015.

The retrieval and language technology developed by De Rijke’s research group is being used by organizations around the Netherlands and beyond, and has given rise to various spin-off initiatives.

SACTI: Security Analytics and Cyber Threat Intelligence on the Web

Website: https://mklab.iti.gr/sacti2019/

This workshop addresses an interdisciplinary research field involving Web Intelligence, Security Informatics, Big Data Analytics, Deep Learning/Machine Learning, and Cybersecurity and aims to investigate the deliberate misuse of technical infrastructure for subversive purposes, including (but not limited to): the spreading of extremist propaganda, antagonistic or hateful commentary; the distribution of malware; online fraud and identity theft; denial of service attacks; etc. A better understanding of such phenomena on the Web (including social media) allows for their early detection and underpins the development of effective models for predicting cybersecurity threats.

Dimitrios Gunopulos

National and Kapodistrian University of Athens

Urban Mobility Data: Challenges and Prospects

Human urban trajectory data collected from GPS-enabled mobile devices or vehicles are widely used in several applications, including urban planning, traffic management, and location-based services. We consider several trajectory data analysis problems that come up in such settings. These include the problem of map creation from GPS trajectories, the problem of identifying frequently travelled paths, and the vehicle travel time estimation problem using historical data and real-time data from only a small number of floating cars.

Biography

Dimitrios Gunopulos received his PhD from Princeton University in 1995. He has held positions as a Postdoc at the Max-Planck-Institut for Informatics, Researcher at the IBM Almaden Research Center, Visiting Researcher at the University of Helsinki, Professor at the Department of Computer Science and Engineering in the University of California Riverside, Professor in the Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, and Visiting Researcher at the Microsoft Silicon Valley Research Center. His research is in the areas of Smart Cities, Big Data, Data Mining, Databases, Sensor and Peer-to-Peer networks, Algorithms and Computational Geometry. He has co-authored over two hundred journal and conference papers that have been widely cited (h-index 72). His research has been supported by NSF (including an NSF CAREER award), the DoD, the Institute of Museum and Library Services, the Tobacco Related Disease Research Program, the European Commission, the General Secretariat of Research and Technology, AT&T, Nokia, a Yahoo FREP Award and a Google Faculty Award. He has served as a General co-Chair in SIAM SDM 2018, SIAM SDM 2017, HDMS 2011, and IEEE ICDM 2010 and as a PC co-Chair in ICDE 2020, ECML/PKDD 2011, IEEE ICDM 2008, ACM SIGKDD 2006, and SSDBM 2003.

Jaak Vilo

University of Tartu, Estonia
STACC Competence Center, Estonia

Biomedical data infrastructure and analysis

Life sciences are quickly becoming one of the largest producers of scientific as well as observational routine care data. What sets them apart from large centralised infrastructure like particle colliders or space telescopes is the distributed nature of data production, which happens in the smallest labs, in large international consortia, and everything in between. The challenges arise from the ever increasing volumes of data, the huge variety and diversity of data types, and, in the case of human data, also from security and privacy aspects. Bioinformatics is essential: it can be embedded within research groups, focus on methods development, develop infrastructure for long-term data maintenance and re-use, or focus on novel biological questions through using public data for “dry” experimentation and secondary use. Large centers focus on managing the long-term usability of the data through centralised data submission and curation activities. Databases can be repositories of primary data, curated selective databases, or derivative domain-specific databases and resources. All of this requires coordination of standards, tools, end-user training, etc. ELIXIR is a bioinformatics data infrastructure effort on the ESFRI roadmap. The main role of ELIXIR is to develop and coordinate data, tools, compute, training and standards. For example, the FAIR principles (data has to be Findable, Accessible, Interoperable, Reusable) for data use have been developed for the field. These guidelines have now been widely adopted for data management and stewardship.

However, for many valid reasons, not all data can be made publicly available over the Internet; this includes human DNA sequence data of patients, their electronic health records, etc. Biobanks and large health data cohorts are developing approaches for data management, analysis and sharing in a responsible manner while also protecting data subjects. Again, large consortia and standards development are needed. OHDSI is an organisation aiming to make observational health data available for research purposes through open data standards (the common data model OMOP-CDM) and open source software and tools. The goal of the IMI EHDEN project, for example, is to bring the data of at least 100M individuals into such a distributed data network.

I will touch upon our activities on those various fronts of biomedical data and show how these data are managed and analysed in Estonia, combining health records and DNA. We are also developing personalised (or precision) medicine approaches and infrastructure that would “translate” from research to medical use and bring genetic information in usable forms to patients and doctors, for example for preventive approaches or for warnings based on a patient’s genetic background during the regular prescription process.

Biography

Prof. Jaak Vilo heads the Data Science chair and the Institute of Computer Science at the University of Tartu, Estonia, and leads the health data analytics of STACC, a public-private research organisation in Estonia. He earned his PhD in Computer Science at the University of Helsinki, Finland. In 1999-2002 he worked at the European Bioinformatics Institute, UK, as one of the pioneers in early gene expression microarray data analytics. There he developed the Expression Profiler toolset for various biological data analysis tasks. In 2002, after 12 years abroad, he moved back to Estonia to help create the Estonian Biobank in PPP partnership with VC investments as director of informatics of EGeen Ltd. He also started his own research group BIIT at the University of Tartu, now about 20 people strong. His group applies data analysis, machine learning and algorithmic techniques to a broad range of biological and health data and applications. Linking genomics and many other omics data and health records is a key to developing methods for personalisation of medicine. Medical data, lab measurements, pharmacogenetics and overall multi-genic disease risk scores are complicated to handle due to organisational and national barriers, yet international research would benefit greatly from opening up and sharing such data and research results. Prof. Vilo is head of the ELIXIR-Estonia node of the pan-European biological data infrastructure whose mission is to facilitate global data re-use.

ACER: Affective Computing and Emotion Recognition

Website: TBA

Emotion Recognition has become a key scenario for Artificial Intelligence and Affective Computing, in particular for human-robot interaction, data mining systems, and social network analysis.
Various emotion-mining techniques can be exploited for creating and automating personalized interfaces or subcomponent technology for larger systems, e.g. in business intelligence, affective tutoring and e-learning, social robots, and recommender systems.
Unlike sentiment analysis, this approach works at a deeper level of abstraction, aiming to recognize specific emotions and not only positive/negative sentiment, in order to extract, manage and predict emotions in limited sets, based on novel or well-accepted models of emotions.
The aim of the international workshop ACER is to present, discuss and ideate new affective computing and emotion recognition techniques in any AI-related task, especially to solve problems that can make life better, bringing together researchers and practitioners to stimulate cooperation and cross-fertilization between different communities focused on research, development and applications of emotion recognition. Since cooperation among disciplines, e.g. computer science, psychology, neurology, is of great interest and benefit for this research area, we particularly invite submissions with an interdisciplinary view and participation of authors.
The ACER workshop also aims to create a research network for future events and publications on Affective Computing, as already established in the previous editions (ACER-EMORE2019@ICCSA Saint Petersburg, Russia; EMORE2018@ICCSA Melbourne, Australia; EMORE2017@ICCSA Trieste, Italy; ACER2017@IEEE/ACM/WIC WI Leipzig, Germany) and with the special issue “Emotional Machines: the next revolution” in the Web Intelligence Journal. Pursuing this collaborative path, ACER also welcomes papers on ongoing projects and PhD showcases, as well as applications, data sets, novel techniques, and multimodal or interdisciplinary approaches to emotion recognition.

SIWEB: Workshop On Social Innovation And Web Intelligence

Website: https://ptwist.eu/SIWEB2019/

Many research initiatives and projects at the crossroads of social innovation and web intelligence have so far acted in a dispersed manner; this workshop gives them the opportunity to share knowledge and experiences. Several social innovation issues demand novel solutions and new views for resolving environmental, human rights, social conflict and many other issues.

SIWEB Workshop is driven by social innovation projects like the EU projects under the umbrella of Collective Awareness Platforms, which revalue entities and materials and use open platform environments while challenging Web intelligence and cutting-edge technologies (such as blockchains, crowdsourcing, gamification, etc.). Such novel approaches can improve social innovation momentum with new forms of living, socializing, business uptake, marketing, and several disruptive tasks. The Workshop targets work from academia, communities, or business, with emphasis on how Web intelligence can offer new and unconventional solutions for social good, innovation, and societal impact.

SIWEB Workshop aims to bring together research initiatives, stakeholders, academia, business vendors, innovators, and projects that are at the edge of social innovation and Web intelligence. The presented papers should focus on initiatives that strive to stimulate, set up and sustain innovation systems which support multiple actors such as citizens, communities, inventors, innovators, entrepreneurs, or public institutions in co-creating and strengthening societal and circular economy actions in line with digital social innovation principles.

Steffen Staab

Prof. of Databases and Information Systems
Universität Koblenz-Landau, Germany

Web Futures – Inclusive, Intelligent, Sustainable

Almost from its very beginning, the Web has been ambivalent.
It has facilitated freedom of information, but this also included the freedom to spread misinformation. It has facilitated intelligent personalization, but at the cost of intrusion into our private lives. It has included more people than any other system before, but at the risk of exploiting them.

The Web is full of such ambivalences and the usage of artificial intelligences threatens to further amplify these ambivalences. To further the good and to contain the negative consequences, we need a research agenda studying and engineering the Web, as well as numerous activities by societies at large. In this talk, I will present and discuss a joint effort by an interdisciplinary team of Web Scientists to prepare and pursue such an agenda.

Biography

Steffen is full professor for Databases and Information Systems at the Universität Koblenz-Landau, Germany, and full professor for Web and Computer Science at the University of Southampton, UK. He studied computer science and computational linguistics in Erlangen (Germany), Philadelphia (USA) and Freiburg (Germany). In his research career he has managed to avoid almost all good advice that he now gives to his team members. Such advice includes focusing on research (vs. company) or concentrating on only one or two research areas (vs. considering ontologies, semantic web, social web, data engineering, text mining, peer-to-peer, multimedia, HCI, services, software modelling and programming and some more). Though, actually, improving how we understand and use text and data is a good common denominator for a lot of Steffen’s professional activities.

SPH: Security and Privacy in Healthcare

Website: https://curex-project.eu/content/sph-workshop

With the rise of new types of cyber-crimes and digital healthcare infrastructures and platforms being universally recognised as Critical Information Infrastructure, healthcare organisations have begun to realise the necessity to prepare against these challenges. This calls for the effective preparation of organisations in an ever-evolving cyber-attack landscape. In this workshop, we invite high-quality submissions in the areas of security and privacy in the healthcare sector and other related topics. Submitted papers should highlight methods and approaches that can be used to analyse security risks and requirements, highlight the security threats in the area of healthcare, and provide novel methods and approaches to assure security, safeguard patient privacy and increase patient trust in the currently vulnerable critical healthcare information infrastructures.

CMDWM: Complex methods for data and web mining

Website: http://www.feds.ac.cn/index.php/zh-CN/xwbd/2807-the-6th-workshop-on-complex-methods-for-data-and-web-mining-cmdwm

New real-world applications of data mining and machine learning have shown that popular methods may be too simple and restrictive. Mining more complex, larger and, generally speaking, “more difficult” datasets poses new challenges for researchers and calls for novel and more complex approaches. We organize this workshop to promote research and discussion on more complex and advanced methods for particularly demanding data and web mining problems. Although we welcome submissions concerning methods based on different principles, we would also like to see new research on using optimization techniques among them. The new data and web mining problems are definitely more complex than traditional ones, and they could result in more difficult non-convex optimization formulations. We would like to focus the interest of the data mining community on various challenging issues which come up while using complex methods to deal with difficult data mining problems.

ABCSS: 4th International Workshop on Application of Big Data for Computational Social Science

Website: https://css-japan.com/en/abcss2019/

Contemporary computational sciences have an important impact on many aspects of the social sciences. Simulation technologies and the ability to compute the complex systems that social scientists want to study are expanding exponentially, and thus more complex and more realistic systems can be targeted. So-called Big Data analysis allows us to quantify human behavior and social phenomena at a fine-grained level, yet it is global in scale, thereby complementing experimental data and theoretical and computational simulation results.

iCRM: 5th International Workshop on Intelligent Data Analysis in Integrated Social CRM

Website: http://www.scrc-leipzig.de/en/events/icrm2019/

Integrated Social Customer Relationship Management (Social CRM) is an emerging concept that includes strategies, processes, and technologies that use social media in CRM. Approaches from the field of web intelligence are important to transform the large mass of data available on social media into value-adding opportunities for companies. Today, a variety of software applications based on web and text-mining techniques is used for this task. However, these tools often fall short in identifying complex patterns (e.g. semantic information, intentions). Advanced techniques, such as semantic business intelligence (SBI) or computational intelligence (CI), promise a great potential to improve the capabilities in knowledge discovery and may also enable new usage scenarios in Social CRM (e.g. network analysis, topic recognition, trend prediction) in various domains (e.g. tourism, banking, energy, public sector, publishing, health, logistics, education). However, their current application in commercial tools and real-world scenarios seems limited not only because of missing expertise, but also because of aspects such as ease-of-use, configuration costs or availability of required resources. The workshop aims to shed light on current research efforts from a technical and economical perspective targeting the development and implementation of innovative tools and methods for intelligent data analysis in Social CRM, resulting in new (integrated) processes and capabilities.

TICS: Topics on Internet Censorship and Surveillance

Website: https://tics.site/cfp/

The ever-increasing demand for online content moderation and user profiling sees the adaptation of Web Intelligence concepts that were developed in good faith, into a censorship and surveillance apparatus owned by corporations and national agencies. Consequently, users are bound to an Orwellian Internet where mainstream platforms — such as search engines, social media, and content providers — place the blame for filter bubbles and extensive user behavioral analysis on Artificial Intelligence, since it is particularly difficult to detect bias, deliberate human intervention, or that inferences have unreportedly been made for purposes other than platform personalization. TICS aims to explore the technological, socio-economic, and legal means and driving forces behind these issues, and to propose alternative directions for building a semantic and human-centric Web that promotes netizen freedom.

SMA4H: Social Media Analytics for Health intelligence: How artificial intelligence transforms healthcare

Website: http://sma4h.icar.cnr.it/wordpress

Social media allows users to connect, collaborate, and debate on any topic. The result is a huge volume of user-generated content, including healthcare information that, if properly mined and analyzed, could help the public and private healthcare sectors improve the quality of their products and services while reducing costs.
In the public health area, especially, physicians could benefit greatly, since the huge volume of available data can be gathered faster and at a lower cost than from traditional sources, mainly surveys. In fact, the pervasiveness and crowdsourcing power of social media data allow modeling phenomena that could not be studied before because doing so was either too expensive or outright impossible, such as the distribution of health information in a population, tracking health information trends over time and identifying gaps between health information supply and demand. Although most individual social media posts and messages contain little informational value, aggregation of millions of such messages can generate important knowledge.
Recently, social network data have been explored to monitor and analyze health issues with applications in disease surveillance and epidemiological studies. By far the first and most common healthcare application in social media is influenza surveillance. Seminal works have shown that tweets can be used to track and predict influenza and detect depression. To this purpose, a variety of techniques have been proposed: starting from capturing the overall trend of a particular disease outbreak by monitoring social media, many other approaches have appeared, such as the ones based on linear regression, supervised machine learning and social network analysis. Beyond influenza surveillance, other topics have started to be addressed, including pharmacovigilance, user behavioral patterns, drug abuse, depression, well-being, assisted living and tracking infectious/viral disease spread.

NLPOE: 12th Natural Language Processing and Ontology Engineering

Website: http://rcei.jiangnan.edu.cn/info/1004/1394.htm

Ontology engineering is a subfield of artificial intelligence and computer science, which aims at a structured representation of terms and the relationships between the terms within a particular domain, with the purpose of facilitating knowledge sharing and knowledge reuse. An ontology project involves the development of ontology building programs, ontology life-cycle management, research on ontology building methods, support tools, and ontology languages, and a series of similar activities. Ontologies have found important applications in information sharing, system integration, knowledge-based software development, and many other issues in the software industry.
However, ontology engineering is a time-consuming and painstaking endeavor, and NLP technology has important contributions to make toward the quick and automatic development of ontologies. This workshop will focus on the recent advances made in ontology engineering and NLP, with the aim of promoting the interaction between and common growth of the two areas. We are particularly interested in the building of upper-level language ontologies in NLP and the application of NLP technology in ontology engineering.
More importantly, we expect that individuals and research institutions in the areas of both Ontology Engineering and NLP could pay attention to this workshop, which may contribute to the integration and growth of these two areas.

Web4City2019: Web for Smart Cities

Website: https://web4city.wordpress.com

The proposed workshop aims to set up an event that will focus on web intelligence for smart cities by bringing together researchers and practitioners in the fields of smart cities and artificial intelligence (AI), especially web intelligence. Specifically, the focus is on the emerging role of intelligence that transforms smart services into adaptive and self-evolving ones, which deal with the usual smart city challenges such as local growth, quality-of-life improvement, efficiency and climate change.

MLACS: Machine Learning Algorithms for Cybersecurity

Website: https://tau.usq.edu.au/MLACS2019

Cyber Security is a growth area with abundant employment opportunities. Many universities in Australia and overseas have started offering niche cybersecurity programs. Similarly, the number of research students in Cyber Security is growing, indicating demand in this emerging domain. Recent trends indicate that machine learning is a key aspect of Cyber Security due to the volume of information crossing global networks, and the individual data associated with this information. In this workshop, we will discuss the development, implementation, and utilization of modern machine learning algorithms in business scenarios specific to Cyber Security.
