TL;DR (full post below)
SEMANTiCS 2024 marking its 20th anniversary as a leading conference in Semantic Web, Knowledge Graphs (KGs), and Artificial Intelligence (AI), promises to tackle critical AI challenges in AI along three key axes: i) Neuro-symbolic AI, ii) the integration of LLMs and KGs, and iii) decentralised Semantics.
In particular, this vicennial edition emphasises research into Neuro-Symbolic AI (NAI), a field that blends the strengths of neural networks based methods (e.g., handling large datasets) and symbolic AI (e.g., reasoning, explainability).
Large Language Models (LLMs) have revolutionised the landscape of Natural Language Processing (NLP) by outperforming traditional baselines in many of the major NLP downstream tasks such as questions answering, machine translation, etc. However, they are prone to overgeneration, generating inaccurate or misleading outputs termed as hallucinations. SEMANTiCS seeks to explore novel strategies for better integrating LLMs with KGs for mitigating these drawbacks. Current developments in the field include:
- Retrieval-Augmented Generation (RAG): KGs provide structured factual information to the LLM, reducing its reliance on potentially flawed training data.
- Knowledge Injection: Embedding KG data directly into the LLM allows it to leverage knowledge during text generation.
- Answer Retrieval: Using LLMs to translate questions into SPARQL queries to obtain precise answers from KGs.
- Hallucination Detection: Cross-referencing LLM outputs with the facts from KGs to flag inconsistencies and promote transparency.
The current landscape of AI, including NAI and LLMs. poses major challenges and gives rise to data privacy concerns and regulations like GDPR. SEMANTiCS 2024 will also explore how Semantic Web technologies facilitate a decentralized approach to personal data management. Systems like the Fediverse and Solid give users control over their data using decentralized pods and access control mechanisms. This shift benefits users and opens possibilities for:
- Trustworthy data handling: Reducing risks and ensuring better compliance with privacy regulations.
- Semantic Interoperability: Enabling personalised experiences across decentralized platforms while maintaining data integrity and ownership.
Read below for more details (full post).
This year SEMANTiCS turns 20! Over the past two decades SEMANTiCS has established itself as a premiere forum where academics and industrial leaders exchange ideas, introduce new technologies, and discuss the most recent advances in the field. This ongoing synergy allowed the SEMANTiCS community to be active and responsive to the latest trends in the field.
As chairs of the Research and Innovation Track, we firmly believe that this year will be no different, with plenty of contributions tackling the most daunting challenges. Amongst all the topics, we envision at least three main areas in which SEMANTiCS 2024 will contribute to the literature: i) Neuro-symbolic AI, ii) integrating LLMs and KGs, and iii) decentralised Semantics.
Neuro-Symbolic AI
In the era of AI, neural networks have gained widespread adoption for tasks spanning NLP, computer vision, and beyond. They’ve found applications in diverse domains like biomedicine, law, materials sciences, etc. On the other hand, symbolic AI encompasses techniques like KGs and rule-based methods, offering more explainable and interpretable solutions. However, neural-network-based methods pose challenges in explainability. If neural-networks-based approaches excel at some tasks (e.g., image recognition, NLP, pattern recognition), symbolic AI excels at other kinds of tasks (e.g., logical reasoning and planning, explainability). This duality has spurred the emergence of a promising research area called NAI, which integrates both approaches to leverage their strengths. For instance, neural networks handle larger datasets efficiently, while symbolic methods offer transparency and reasoning capabilities and ensure trustworthy solutions.
Trustworthy NAI promotes transparent architecture and ensures reliable decision-making processes. Combining logics and neural processing fosters explainability and accountability in AI systems. Through rigorous validation and interpretability mechanisms, it inspires confidence in its outcomes. This stands as a beacon of reliability, paving the way for ethical and responsible AI advancements. Additionally, this combination allows inductive reasoning as well as deductive reasoning capabilities.
In alignment with this concept, one of the focal points of SEMANTICS 2024 revolves around NAI methods, encompassing tasks such as learning representations over KGs for various downstream applications, recommender systems, entity linking, and relation extraction. Due to the merits of NAI, such solutions are being widely employed in industry and academia, and they are also applicable to various domain-specific tasks, such as knowledge graph completion in the biomedical domain, cultural heritage, materials sciences, etc.
In addition, NAI also delves into the transformer-based approaches, which serve as the backbone for a myriad of language models. Whether these models are trained through masked or pre-trained techniques or designed to generate text, transformer architectures play a central role in enabling sophisticated processing of linguistic data. Moreover, the exploration of transformer-based methodologies within the context of NAI aligns seamlessly with the overarching objectives pursued in the domain of LLMs. This intersection underscores the importance of leveraging advanced language understanding capabilities to enhance the interpretability, reasoning, and overall performance of AI systems.
Integrating Knowledge Graphs and Large Language Models
LLMs are shifting paradigms within AI, and since their emergence, they have been unlocking new opportunities in NLP and beyond. However, due to their probabilistic nature, they sometimes generate text that is however far from being accurate. In this context, KGs can function as authoritative knowledge bases, providing essential support and alleviating those so-called hallucinations. Currently, LLMs and KG are combined through four main strategies.
First, KGs can support RAG by providing structured factual information in the form of entities and relationships, allowing the RAG model to retrieve and filter information, and ensuring that only relevant and verifiable data is fed to the LLM. In this case, when prompted with a new query, the retrieval component of RAG can search the KG and extract passages that directly relate to the input topic. This focused retrieval limits the LLM’s exposure to potentially misleading or unrelated information.
Second, LLMs can be augmented with various knowledge injection techniques by embedding information as vectors within the model’s parameter space. This allows the LLM to directly access and leverage that knowledge during text generation or task completion.
Both previous techniques improve the LLMs’ factual accuracy and enable greater specialisation for specific domains where the training data might be limited.
As a third strategy, LLMs can be fine-tuned to translate questions in neutral language to SPARQL queries, and then retrieve the answer from a triple store. Different from the previous approaches, this strategy leverages KG as the primary source of knowledge, with the LLMs being simply employed for their translating skills.
Finally, KGs, with their structured representation of facts and relationships, can also be used to identify hallucinations in LLMs. By mapping the output of the LLM against the KG, it allows systems to flag potential discrepancies or contradictions with established knowledge. For example, if the language model asserts an inaccurate historical date, the system can identify the correct date stored within the KG and provide it to the user, hence enhancing trustworthiness and transparency.
Research shows no single strategy dominates across all tasks. This underscores the need for improving these strategies as well as new solutions.
Decentralized Semantics for User Empowerment
80% of customers indicate they are more likely to do business with a company that offers personalised experiences. Traditionally, companies rely on hoarding personal data about their users on their own (cloud) infrastructure to drive this personalisation. This gives users little to no control over their data.
With changing laws and regulations, i.e. GDPR in Europe, privacy is of increasing concern for these companies. Non-compliance has resulted in significant fines, with Meta receiving a record €1.2 billion fine in 2023. On the other hand, consumers are also increasingly focused on online privacy. The SolidMonitor survey in Belgium revealed the public’s growing concern about how companies handle their data, with many users actively limiting data sharing and avoiding certain applications. The data siloing by companies further complicates the matter, with customers struggling to track their digital footprint.
As such, users are increasingly flocking to alternative forms of personalised data storage. Notable examples are the Fediverse and Solid ecosystem. The Fediverse is a network of federated social network platforms that can communicate with each other while remaining independent. They allow any individual or organisation to host their own servers and use decentralised protocols, such as ActivityPub, to enable interaction. Some well-known examples include Mastodon, PeerTube, and Diaspora. The Solid project and standardisation initiative combine existing Semantic Web standards, such as the Linked Data Platform (LDP) that enables to semantically describe any document using RDF, to realise data interoperability across the decentralised data pods and introduces access control mechanisms, e.g., identity, authentication, and permission lists. As such, each user can have one or more pods in which they store their personal data and are able to decide at every moment who has access to which specific data. Several Solid implementations exist, such as the Community Solid Server (CSS) and the Node Solid Server (NSS).
These decentralised initiatives ensure that personal data is no longer centralised in storage silos owned by the service providers. Indeed, users can choose which services to use based on the functionality they offer, rather than in which service ecosystem their data is hoarded. This separation of concerns has the potential to enable trustworthy data storage actors, and entirely unburden GDPR compliance from application and service developers. Semantic Web technologies offer the semantic interoperability layer that is needed to ensure that all the data spread across all these decentralised instances can still be meaningfully integrated with each other.
However, to fully realise the decentralised Web vision, a number of challenges need to be resolved to ensure that we can still enjoy all the services we enjoy today while building upon a decentralised personalised storage ecosystem:
- Semantic continuous publishing: Large-scale adoption relies on shared standards (ontologies) and efficient ways to annotate diverse data with meaningful metadata.
- Human-centric digital trust: Users need intuitive tools to manage their data, set granular permissions, ensure identity verification, and build reliable audit trails for usage tracking.
- Derivation of actionable insights: Techniques must be developed to analyse and derive meaningful insights from data spread across many sources while protecting user privacy and handling differing schemas.
- Scalable, efficient, and incremental querying: Efficient algorithms are needed to discover relevant data and query across constantly changing decentralized sources.
- Usability and User Experience: Intuitive interfaces and tools are crucial for mainstream adoption, including user-friendly mechanisms for data management and interaction.
- Economic incentives and governance: Sustainable models are needed to incentivize participation and establish effective governance structures for the decentralized web.
By tackling these obstacles, we pave the way for a future where data privacy is respected, innovation flourishes into a dynamic ecosystem of personalised services, and users reclaim sovereignty over their digital lives.
SEMANTiCS 2024 invites researchers, industry experts, and innovators to submit papers highlighting these fields. Will you be among those who unveil groundbreaking strategies for leveraging KGs and decentralized semantics?
This post has been written by Mehwish Alam, Femke Ongenae, and myself, Research & Innovation Track Chairs at the SEMANTiCS 2024 conference to be held in Amsterdam in September 2024. A copy of this post is available on https://2024-eu.semantics.cc/page/news?page=2024-04-15