Latest News

Integrating Conversational Agents and Knowledge Graphs Within the Scholarly Domain
18 March 2023

“Integrating Conversational Agents and Knowledge Graphs Within the Scholarly Domain” is a journal paper accepted at IEEE Access.

Antonello Meloni (1), Simone Angioni (1), Angelo Antonio Salatino (2), Francesco Osborne (2), Diego Reforgiato Recupero (1), Enrico Motta (2)
(1) Department of Mathematics and Computer Science, University of Cagliari (Italy)
(2) Knowledge Media Institute, The Open University, Milton Keynes (UK)

Abstract
In the last few years, chatbots have become mainstream solutions adopted in a variety of domains for automatizing communication at scale. In the same period, knowledge graphs have attracted significant attention from business and academia as robust and scalable representations of information. In the scientific and academic research domain, they are increasingly used to illustrate the relevant actors (e.g., researchers, institutions), documents (e.g., articles, patents), entities (e.g., concepts, innovations), and other related information. Following the same direction, this paper describes how to integrate conversational agents with knowledge graphs focused on the scholarly domain, a.k.a. Scientific Knowledge Graphs. On top of the proposed architecture, we developed AIDA-Bot, a simple chatbot that leverages a large-scale knowledge graph of scholarly data. AIDA-Bot can answer natural language questions about scientific articles, research concepts, researchers, institutions, and research venues. We have developed four prototypes of AIDA-Bot on Alexa products, web browsers, Telegram clients, and humanoid robots. We performed a user study evaluation with 15 domain experts showing a high level of interest and engagement with the proposed agent.

Download
Download from DOI (Open Access): https://doi.org/10.1109/ACCESS.2023.3253388
Download from Institutional Repository (ORO): https://oro.open.ac.uk/88056/...

R-Classify: Extracting Research Papers’ Relevant Concepts from a Controlled Vocabulary
12 November 2022

“R-Classify: Extracting Research Papers’ Relevant Concepts from a Controlled Vocabulary” is a software paper accepted at Software Impacts.

Tanay Aggarwal, Angelo Antonio Salatino, Francesco Osborne, Enrico Motta
Knowledge Media Institute, The Open University, Milton Keynes (UK)

Abstract
In the past few decades, we have seen a proliferation of scientific articles available online. This data-rich environment offers several opportunities but also challenges, since it is difficult to explore these resources and identify all the relevant content. Hence, it is crucial that articles are appropriately annotated with their relevant concepts, so as to increase their chance of being properly indexed and retrieved. In this paper, we present R-Classify, a web tool that assists users in identifying the most relevant concepts according to a large-scale ontology of research areas in the field of Computer Science.

Web App
R-Classify is up and running. Feel free to give it a try at https://cso.kmi.open.ac.uk/classify/

Download
Download from DOI (Open Access): https://doi.org/10.1016/j.simpa.2022.100444
Download from institutional repository: https://oro.open.ac.uk/85958/...

Leveraging Knowledge Graph Technologies to Assess Journals and Conferences at Springer Nature
12 November 2022

“Leveraging Knowledge Graph Technologies to Assess Journals and Conferences at Springer Nature” is an In-Use paper presented at the 21st International Semantic Web Conference (ISWC 2022).

Simone Angioni (1), Angelo Antonio Salatino (2), Francesco Osborne (2, 3), Aliaksandr Birukou (4), Diego Reforgiato Recupero (1), Enrico Motta (2)
(1) Department of Mathematics and Computer Science, University of Cagliari (Italy)
(2) Knowledge Media Institute, The Open University, Milton Keynes (UK)
(3) Department of Business and Law, University of Milano Bicocca, Milan (Italy)
(4) Springer-Verlag GmbH, Tiergartenstrasse 17, 69121 Heidelberg (DE)

Abstract
Research publishing companies need to constantly monitor and compare scientific journals and conferences in order to inform critical business and editorial decisions. Semantic Web and Knowledge Graph technologies are natural solutions since they allow these companies to integrate, represent, and analyse a large quantity of information from heterogeneous sources. In this paper, we present the AIDA Dashboard 2.0, an innovative system developed in collaboration with Springer Nature to analyse and compare scientific venues, now also available to the public. This tool builds on a knowledge graph which includes over 1.5B RDF triples and was produced by integrating information about 25M research articles from Microsoft Academic Graph, Dimensions, DBpedia, GRID, CSO, and INDUSO. It can produce sophisticated analytics and rankings that are not available in alternative systems. We discuss the advantages of this solution for the Springer Nature editorial process and present a user study involving 5 editors and 5 researchers, which yielded excellent results in terms of quality of the analytics and usability.

Download
Download from ORO: http://oro.open.ac.uk/84363/
Download from DOI: https://doi.org/10.1007/978-3-031-19433-7_42...

Best Paper Award at the In-Use Track ISWC 2022
12 November 2022

It is an honour to receive the Best Paper Award at the In-Use Track of ISWC – International Semantic Web Conference (the premier conference on the Semantic Web). Great work in collaboration with Springer Nature and UniCa – Università degli Studi di Cagliari.
The paper describes our recent efforts to put semantic technologies (the AIDA Dashboard, https://aida.kmi.open.ac.uk/dashboard) into production and into use in industry. Read our paper here: https://oro.open.ac.uk/84363/

Further Reading
From KMi Planet (Eng): https://kmi.open.ac.uk/news/article/19810
From University of Cagliari (Italian): https://unica.it/unica/page/it/aida_dashbord_applicazione_web_innovativa_per_lanalisi_di_riviste_autori_e_conferenze_scientifiche...

Annotating D3 dataset with the CSO Classifier
20 September 2022

Abstract
The DBLP Discovery Dataset (D3) is a newly created dataset of research papers in the field of Computer Science which can support several tasks like identifying trends in research activity, productivity, focus, bias, accessibility, and impact. This dataset stems from DBLP and integrates additional information from the full texts. We argue that papers classified with their research topics can improve the identification of research trends. To this end, we used the CSO Classifier to annotate all the papers within D3 and we made this extension available for research purposes.

Introduction
The DBLP Discovery Dataset (D3) is a dataset in the field of Computer Science, which was recently released and can support several tasks including identifying trends in research activity, productivity, focus, bias, accessibility, and impact. This dataset derives from DBLP and integrates additional information from the full texts. Each paper is associated with a set of attributes: corpusid, abstract, updated, externalids, url, title, authors, venue, year, referencecount, citationcount, influentialcitationcount, isopenaccess, s2fieldsofstudy, publicationtypes, publicationdate, and journal. We argue that annotating research papers with their research topics can improve a number of tasks, including the exploration of research trends, the recommendation of similar research articles, and the extraction of knowledge.
To this end, we ran the CSO Classifier to annotate all the papers within the D3 dataset and we made this extension available for research purposes on Zenodo (see D3 dataset annotated with CSO topics – https://zenodo.org/record/7097148).

CSO Classifier
The CSO Classifier is an application that takes as input the text from the abstract, title, and keywords of a research paper and outputs a list of relevant concepts from CSO. It consists of two main components: (i) the syntactic module and (ii) the semantic module. The syntactic module parses the input documents and identifies CSO concepts that are explicitly referred to in the document. The semantic module uses part-of-speech tagging to identify promising terms and then exploits word embeddings to infer semantically related topics. Finally, the CSO Classifier combines the results of these two modules, removes outliers, and enhances them by including relevant super-areas. The reader can refer to this article for additional details.

Dataset
In this section, we will see how to process the newly created annotations. The D3 dataset is distributed in JSONL format, meaning that each line is a JSON dictionary. This format is quite convenient for large files as it does not require the whole dataset to be parsed at once; instead, it can be parsed row by row (i.e., paper by paper). For the sake of consistency, we kept the same format for our annotated dataset.

D3 dataset
In Listing 1, we present an example of a line (paper) found in the D3 dataset, with corpus id 26. In particular, we can observe the richness of the metadata contained in this dataset.

Listing 1: JSON associated with paper (corpusid 26) within the D3 dataset.

CSO annotations
In Listing 2 we can find the topics extracted from the same paper (corpus id 26) shown in Listing 1. It is a JSON dictionary that sits as a single line within the distributed dataset. In particular, it contains 5 keys. The first is corpusid, which refers back to the original paper contained in the D3 dataset.
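The row-by-row parsing described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code. The function names (iter_records, index_by_corpusid, join_on_corpusid) are hypothetical, and the two sample rows are invented placeholders standing in for the D3 file and the annotation file. Only the field names corpusid, title, and union come from the dataset description; the values shown for them are made up.

```python
import io
import json


def iter_records(lines):
    """Yield one dict per non-empty JSONL line, one line at a time."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def index_by_corpusid(lines):
    """Build a lookup table from corpusid to the full record."""
    return {record["corpusid"]: record for record in iter_records(lines)}


def join_on_corpusid(paper_lines, annotation_lines):
    """Pair each paper with its annotation (None if no annotation exists)."""
    annotations = index_by_corpusid(annotation_lines)
    for paper in iter_records(paper_lines):
        yield paper, annotations.get(paper["corpusid"])


# Invented sample rows (placeholders, not real D3 content).
SAMPLE_PAPERS = io.StringIO('{"corpusid": 26, "title": "Placeholder title"}\n')
SAMPLE_ANNOTATIONS = io.StringIO('{"corpusid": 26, "union": ["placeholder topic"]}\n')

for paper, annotation in join_on_corpusid(SAMPLE_PAPERS, SAMPLE_ANNOTATIONS):
    print(paper["corpusid"], paper["title"], annotation)
```

With the real files, the same functions accept open file handles in place of the in-memory samples, so the papers are streamed one line at a time while only the annotation index is held in memory.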
The remaining four keys express the outcome of the CSO Classifier: syntactic, semantic, union, and enhanced. The keys syntactic and semantic contain the topics returned by the syntactic and semantic modules, respectively. The union key contains the unique topics found by these two modules. The enhanced key contains the relevant super-areas.

Listing 2: JSON obtained by the CSO Classifier for the same paper (corpusid 26).

Downloads
Dataset: https://zenodo.org/record/7097148
This article in PDF: Annotating D3 dataset with the CSO Classifier...

Sci-K 2022 – International Workshop on Scientific Knowledge: Representation, Discovery, and Assessment
17 June 2022

“Sci-K 2022 – International Workshop on Scientific Knowledge: Representation, Discovery, and Assessment” is the introductory chapter of the proceedings of the workshop of the same name, co-located with The Web Conference 2022.

Paolo Manghi (1), Andrea Mannocci (1), Francesco Osborne (2), Dimitris Sacharidis (3), Angelo Salatino (2), Thanasis Vergoulis (4)
(1) CNR-ISTI – National Research Council, Institute of Information Science and Technologies “Alessandro Faedo” (Italy)
(2) Knowledge Media Institute, The Open University, Milton Keynes (UK)
(3) Université Libre de Bruxelles (Belgium)
(4) “Athena” RC (Greece)

Abstract
In this paper we present the 2nd edition of the Scientific Knowledge: Representation, Discovery, and Assessment (Sci-K 2022) workshop. Sci-K aims to explore innovative solutions and ideas for the generation of approaches, data models, and infrastructures (e.g., knowledge graphs) for supporting, directing, monitoring and assessing the scientific knowledge and progress. This edition is also a reflection point as the community is seeking alternative solutions to the now-defunct Microsoft Academic Graph (MAG).

Download
Download from doi: https://doi.org/10.1145/3487553.3524883...

Enriching Data Lakes with Knowledge Graphs
17 June 2022

“Enriching Data Lakes with Knowledge Graphs” is a workshop paper published at “Knowledge Graph Generation from Text”, co-located with ESWC 2022.

Alessandro Chessa (1, 2), Gianni Fenu (3), Enrico Motta (4), Francesco Osborne (4, 5), Diego Reforgiato Recupero (3), Angelo Antonio Salatino (4), Luca Secchi (1)
(1) Linkalab s.r.l., Cagliari, Italy
(2) Luiss Data Lab, Rome, Italy
(3) University of Cagliari, Cagliari, Italy
(4) Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom
(5) University of Milano Bicocca, Milan, Italy

Abstract
Data lakes are repositories of data stored in natural/raw format. A data lake may include structured data from relational databases, semi-structured data (e.g., JSON, CSV), unstructured data (e.g., text data), or binary data (e.g., images, audio, video). It is usually built on top of cost-efficient infrastructures such as Hadoop, Amazon S3, MongoDB, ElasticSearch, etc. Several organisations rely on big data lakes for crucial tasks such as reporting, visualisation, advanced analytics, machine learning, and business intelligence. A major limitation of this solution is that without descriptive metadata and a mechanism to maintain it, such data tend to be noisy, making their management and analysis complex and time-consuming. Therefore, there is a need to add a semantic layer based on a formal ontology to describe the data, together with an efficient mechanism to represent them as a knowledge graph. In this paper, we present a methodology to add a semantic layer to a data lake and thus obtain a knowledge graph that can support structured queries and advanced data exploration. We describe a practical implementation of this methodology applied to a data lake consisting of text data describing the online marketplace for lodging and tourism activities. We report statistics about the data lake and the resulting knowledge graph.

Download
Link will be available soon...

The AIDA Dashboard: a Web Application for Assessing and Comparing Scientific Conferences
17 June 2022

“The AIDA Dashboard: a Web Application for Assessing and Comparing Scientific Conferences” is a research paper submitted to IEEE Access.

Simone Angioni (1), Angelo Antonio Salatino (2), Francesco Osborne (2), Diego Reforgiato Recupero (1), Enrico Motta (2)
(1) Department of Mathematics and Computer Science, University of Cagliari (Italy)
(2) Knowledge Media Institute, The Open University, Milton Keynes (UK)

Abstract
Scientific conferences are essential for developing active research communities, promoting the cross-pollination of ideas and technologies, bridging between academia and industry, and disseminating new findings. Analyzing and monitoring scientific conferences is thus crucial for all users who need to take informed decisions in this space. However, scholarly search engines and bibliometric applications only provide a limited set of analytics for assessing research conferences, preventing us from performing a comprehensive analysis of these events. In this paper, we introduce the AIDA Dashboard, a novel web application, developed in collaboration with Springer Nature, for analyzing and comparing scientific conferences. This tool introduces three major new features: 1) it enables users to easily compare conferences within specific fields (e.g., Digital Libraries) and time-frames (e.g., the last five years); 2) it characterises conferences according to 14K research topics from the Computer Science Ontology (CSO); and 3) it provides several functionalities for assessing the involvement of commercial organizations, including the ability to characterize industrial contributions according to 66 industrial sectors (e.g., automotive, financial, energy, electronics) from the Industrial Sectors Ontology (INDUSO).
We evaluated the AIDA Dashboard by performing both a quantitative evaluation and a user study, obtaining excellent results in terms of quality of the analytics and usability.

Downloads
Download paper from IEEE Access (OA): https://ieeexplore.ieee.org/document/9754584
Download from ORO: http://oro.open.ac.uk/82668/...

Characterising Research Areas in the field of AI
17 June 2022

“Characterising Research Areas in the field of AI” is a research paper submitted to the special track “Statistical Methods for Science Mapping” of the “51st Scientific Meeting of the Italian Statistical Society”.

Alessandra Belfiore (1), Angelo Salatino (2), Francesco Osborne (2)
(1) Università della Campania Luigi Vanvitelli, Caserta (Italy)
(2) Knowledge Media Institute, The Open University, Milton Keynes (UK)

Abstract
Interest in Artificial Intelligence (AI) continues to grow rapidly, hence it is crucial to support researchers and organisations in understanding where AI research is heading. In this study, we conducted a bibliometric analysis on 257K articles in AI, retrieved from OpenAlex. We identified the main conceptual themes by performing clustering analysis on the co-occurrence network of topics. Finally, we observed how such themes evolved over time. The results highlight the growing academic interest in research themes like deep learning, machine learning, and internet of things.

Downloads
Download paper from arXiv: https://arxiv.org/abs/2205.13471...

New trends in scientific knowledge graphs and research impact assessment
06 March 2022

“New trends in scientific knowledge graphs and research impact assessment” is the introductory chapter of the Special Issue on “Scientific Knowledge Graphs and Research Impact Assessment” at Quantitative Science Studies (QSS by MIT Press).

Paolo Manghi (1), Andrea Mannocci (1), Francesco Osborne (2), Dimitris Sacharidis (3), Angelo Salatino (2), Thanasis Vergoulis (4)
(1) CNR-ISTI – National Research Council, Institute of Information Science and Technologies “Alessandro Faedo” (Italy)
(2) Knowledge Media Institute, The Open University, Milton Keynes (UK)
(3) Université Libre de Bruxelles (Belgium)
(4) “Athena” RC (Greece)

Introduction
In recent decades, we have experienced a continuously increasing publication rate of scientific articles and related research objects (e.g., data sets, software packages). As this trend keeps growing, practitioners in the field of scholarly knowledge are confronted with several challenges. In this special issue, we focus on two major categories of such challenges: (a) those related to the organization of scholarly data to achieve a flexible, context-sensitive, fine-grained, and machine-actionable representation of scholarly knowledge that at the same time is structured, interlinked, and semantically rich, and (b) those related to the design of novel, reliable, and comprehensive metrics to assess scientific impact.

To address the challenges of the first category, new technical infrastructures are becoming increasingly popular, organizing and representing scholarly knowledge through scientific knowledge graphs (SKG). These are large networks describing the actors (e.g., authors, organizations), the documents (e.g., publications, patents), and other research outputs (e.g., research data, software) and knowledge (e.g., research topics, concepts, tasks, technologies) in this space as well as their reciprocal relationships. These resources provide substantial benefits to researchers, companies, and policymakers by powering several data-driven services for navigating, analyzing, and making sense of research dynamics.
Some examples include Microsoft Academic Graph (MAG), AMiner, Open Academic Graph, ScholarlyData.org, Semantic Scholar, PID Graph, Open Research Knowledge Graph, OpenCitations, and the OpenAIRE research graph. Despite their popularity, the field of SKGs has a lot of open challenges, such as the design of ontologies able to conceptualize scholarly knowledge, model its representation, and enable its exchange across different SKGs; the extraction of entities and concepts, integration of information from heterogeneous sources, identification of duplicates, finding connections between entities, and identifying conceptual inconsistencies; and the development of services that exploit knowledge as provided by one or more SKGs to discover, monitor, measure, and consume research outcomes. With regard to the second category, we seek effective and precise research assessment. In this context, there is a need for reliable and comprehensive metrics and indicators of the impact and merit of publications, data sets, research institutions, individual researchers, and other relevant entities. Research impact refers to the attention a research work receives inside its respective and related disciplines, the social/mass media, and so on. A research work’s merit, on the other hand, is relevant to its quality aspects (e.g., its novelty, reproducibility, compliance with the Findable, Accessible, Interoperable, Reusable initiative for promoting data discovery and reuse, and readability). Nowadays, due to the growing popularity of Open Science initiatives, a large number of useful science-related data sets have been made openly available, paving the way for the synthesis of more sophisticated research impact and merit indicators (and, consequently, more precise research assessment). 
For instance, in recent years, due to the systematic effort of various development teams, a variety of large SKGs has been made available, providing a very rich and relatively clean source of information about academics, their publications, and relevant metadata that can be used for the development of effective research assessment approaches.

The proposal for this special issue originated from the collaboration of two workshops, the Scientific Knowledge Graphs Workshop (SKG 2020) and the Workshop on Assessing Impact and Merit in Science (AIMinScience 2020), held (virtually) in conjunction with the 2020 edition of the International Conference on Theory and Practice of Digital Libraries (TPDL) on August 25, 2020. SKG 2020 offered a forum to discuss the themes surrounding the first set of challenges, namely methods for extracting entities and relationships from research publications; data models for the description of scholarly data; methods for the exploration, retrieval, and visualization of scientific knowledge graphs; and applications for making sense of scholarly data. AIMinScience 2020, on the other hand, focused on the second set of challenges, which include scientometrics and bibliometrics; applications utilizing scientific impact and merit to provide useful services to the research community and industry; data mining and machine learning approaches to facilitate research assessment; and insightful visualization techniques that utilize or facilitate research assessment. Given that the themes of both workshops are interlinked, because SKGs can indeed support research impact assessment, it was a joint decision to edit this special issue on Scientific Knowledge Graphs and Research Impact Assessment, with the aim of presenting the current advances in these areas to all practitioners interested in scholarly knowledge.
In addition, this collaboration catalyzed the creation of the International Workshop on Scientific Knowledge: Representation, Discovery, and Assessment (Sci-K), a new joint event that replaced SKG and AIMinScience, focusing on a wider subject and audience. Sci-K aims to explore innovative solutions and ideas for the generation of approaches, data models, and infrastructures (e.g., knowledge graphs), for supporting, directing, monitoring, and assessing scientific knowledge. Its first edition, Sci-K 2021, was held on April 13, 2021, co-organized with The Web Conference 2021. It was a successful event with 11 presented papers and two keynote talks from Prof. Ludo Waltman and Prof. Staša Milojević.

Download
Download from ORO: http://oro.open.ac.uk/80008/
Download from Source (OA): https://direct.mit.edu/qss/article/2/4/1296/108052/New-trends-in-scientific-knowledge-graphs-and...