Latest News

Sci-K 2022 – International Workshop on Scientific Knowledge: Representation, Discovery, and Assessment
17 June 2022
“Sci-K 2022 – International Workshop on Scientific Knowledge: Representation, Discovery, and Assessment” is the introductory chapter of the workshop proceedings of “Sci-K 2022 – International Workshop on Scientific Knowledge: Representation, Discovery, and Assessment” co-located with The Web Conference 2022.
Paolo Manghi1, Andrea Mannocci1, Francesco Osborne2, Dimitris Sacharidis3, Angelo Salatino2, Thanasis Vergoulis4
1 CNR-ISTI – National Research Council, Institute of Information Science and Technologies “Alessandro Faedo” (Italy)
2 Knowledge Media Institute, The Open University, Milton Keynes (UK)
3 Université Libre de Bruxelles (Belgium)
4 “Athena” RC (Greece)
Abstract
In this paper we present the 2nd edition of the Scientific Knowledge: Representation, Discovery, and Assessment (Sci-K 2022) workshop. Sci-K aims to explore innovative solutions and ideas for the generation of approaches, data models, and infrastructures (e.g., knowledge graphs) for supporting, directing, monitoring and assessing the scientific knowledge and progress. This edition is also a reflection point as the community is seeking alternative solutions to the now-defunct Microsoft Academic Graph (MAG).
Download
Download from doi: https://doi.org/10.1145/3487553.3524883...

Enriching Data Lakes with Knowledge Graphs
17 June 2022
“Enriching Data Lakes with Knowledge Graphs” is a workshop paper published at “Knowledge Graph Generation from Text” co-located with ESWC 2022.
Alessandro Chessa1,2, Gianni Fenu3, Enrico Motta4, Francesco Osborne4,5, Diego Reforgiato Recupero3, Angelo Antonio Salatino4, Luca Secchi1
1 Linkalab s.r.l., Cagliari, Italy
2 Luiss Data Lab, Rome, Italy
3 University of Cagliari, Cagliari, Italy
4 Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom
5 University of Milano Bicocca, Milan, Italy
Abstract
Data lakes are repositories of data stored in natural/raw format. A data lake may include structured data from relational databases, semi-structured data (e.g., JSON, CSV), unstructured data (e.g., text data), or binary data (e.g., images, audio, video). It is usually built on top of cost-efficient infrastructures such as Hadoop, Amazon S3, MongoDB, ElasticSearch, etc. Several organisations rely on big data lakes for crucial tasks such as reporting, visualisation, advanced analytics, machine learning, and business intelligence. A major limitation of this solution is that without descriptive metadata and a mechanism to maintain it, such data tend to be noisy, making their management and analysis complex and time-consuming. Therefore, there is a need for a semantic layer based on a formal ontology to describe the data, and for an efficient mechanism to represent them as a knowledge graph. In this paper, we present a methodology to add a semantic layer to a data lake and thus obtain a knowledge graph that can support structured queries and advanced data exploration. We describe a practical implementation of this methodology applied to a data lake consisting of text data describing the online marketplace for lodging and tourism activities. We report statistics about the data lake and the resulting knowledge graph.
Download
Link will be available soon...

The AIDA Dashboard: a Web Application for Assessing and Comparing Scientific Conferences
17 June 2022
“The AIDA Dashboard: a Web Application for Assessing and Comparing Scientific Conferences” is a research paper submitted to IEEE Access.
Simone Angioni1, Angelo Antonio Salatino2, Francesco Osborne2, Diego Reforgiato Recupero1, Enrico Motta2
1 Department of Mathematics and Computer Science, University of Cagliari (Italy)
2 Knowledge Media Institute, The Open University, Milton Keynes (UK)
Abstract
Scientific conferences are essential for developing active research communities, promoting the cross-pollination of ideas and technologies, bridging between academia and industry, and disseminating new findings. Analyzing and monitoring scientific conferences is thus crucial for all users who need to make informed decisions in this space. However, scholarly search engines and bibliometric applications only provide a limited set of analytics for assessing research conferences, preventing us from performing a comprehensive analysis of these events. In this paper, we introduce the AIDA Dashboard, a novel web application, developed in collaboration with Springer Nature, for analyzing and comparing scientific conferences. This tool introduces three major new features: 1) it enables users to easily compare conferences within specific fields (e.g., Digital Libraries) and time-frames (e.g., the last five years); 2) it characterises conferences according to 14K research topics from the Computer Science Ontology (CSO); and 3) it provides several functionalities for assessing the involvement of commercial organizations, including the ability to characterize industrial contributions according to 66 industrial sectors (e.g., automotive, financial, energy, electronics) from the Industrial Sectors Ontology (INDUSO). We evaluated the AIDA Dashboard by performing both a quantitative evaluation and a user study, obtaining excellent results in terms of quality of the analytics and usability.
Downloads
Download paper from IEEE Access (OA): https://ieeexplore.ieee.org/document/9754584
Download from ORO: http://oro.open.ac.uk/82668/...
Characterising Research Areas in the field of AI
17 June 2022
“Characterising Research Areas in the field of AI” is a research paper submitted to the special track “Statistical Methods for Science Mapping” of the “51st Scientific Meeting of the Italian Statistical Society”.
Alessandra Belfiore1, Angelo Salatino2, Francesco Osborne2
1 Università della Campania Luigi Vanvitelli, Caserta (Italy)
2 Knowledge Media Institute, The Open University, Milton Keynes (UK)
Abstract
Interest in Artificial Intelligence (AI) continues to grow rapidly; hence, it is crucial to support researchers and organisations in understanding where AI research is heading. In this study, we conducted a bibliometric analysis on 257K articles in AI, retrieved from OpenAlex. We identified the main conceptual themes by performing clustering analysis on the co-occurrence network of topics. Finally, we observed how such themes evolved over time. The results highlight the growing academic interest in research themes like deep learning, machine learning, and internet of things.
Downloads
Download paper from arXiv: https://arxiv.org/abs/2205.13471...

New trends in scientific knowledge graphs and research impact assessment
06 March 2022
“New trends in scientific knowledge graphs and research impact assessment” is the introductory chapter of the Special Issue on “Scientific Knowledge Graphs and Research Impact Assessment” at Quantitative Science Studies (QSS by MIT Press).
Paolo Manghi1, Andrea Mannocci1, Francesco Osborne2, Dimitris Sacharidis3, Angelo Salatino2, Thanasis Vergoulis4
1 CNR-ISTI – National Research Council, Institute of Information Science and Technologies “Alessandro Faedo” (Italy)
2 Knowledge Media Institute, The Open University, Milton Keynes (UK)
3 Université Libre de Bruxelles (Belgium)
4 “Athena” RC (Greece)
Introduction
In recent decades, we have experienced a continuously increasing publication rate of scientific articles and related research objects (e.g., data sets, software packages).
As this trend keeps growing, practitioners in the field of scholarly knowledge are confronted with several challenges. In this special issue, we focus on two major categories of such challenges: (a) those related to the organization of scholarly data to achieve a flexible, context-sensitive, fine-grained, and machine-actionable representation of scholarly knowledge that at the same time is structured, interlinked, and semantically rich, and (b) those related to the design of novel, reliable, and comprehensive metrics to assess scientific impact. To address the challenges of the first category, new technical infrastructures are becoming increasingly popular, organizing and representing scholarly knowledge through scientific knowledge graphs (SKG). These are large networks describing the actors (e.g., authors, organizations), the documents (e.g., publications, patents), and other research outputs (e.g., research data, software) and knowledge (e.g., research topics, concepts, tasks, technologies) in this space as well as their reciprocal relationships. These resources provide substantial benefits to researchers, companies, and policymakers by powering several data-driven services for navigating, analyzing, and making sense of research dynamics. Some examples include Microsoft Academic Graph (MAG), AMiner, Open Academic Graph, ScholarlyData.org, Semantic Scholar, PID Graph, Open Research Knowledge Graph, OpenCitations, and the OpenAIRE research graph. 
Despite their popularity, the field of SKGs has a lot of open challenges, such as the design of ontologies able to conceptualize scholarly knowledge, model its representation, and enable its exchange across different SKGs; the extraction of entities and concepts, integration of information from heterogeneous sources, identification of duplicates, finding connections between entities, and identifying conceptual inconsistencies; and the development of services that exploit knowledge as provided by one or more SKGs to discover, monitor, measure, and consume research outcomes. With regard to the second category, we seek effective and precise research assessment. In this context, there is a need for reliable and comprehensive metrics and indicators of the impact and merit of publications, data sets, research institutions, individual researchers, and other relevant entities. Research impact refers to the attention a research work receives inside its respective and related disciplines, the social/mass media, and so on. A research work’s merit, on the other hand, is relevant to its quality aspects (e.g., its novelty, reproducibility, compliance with the Findable, Accessible, Interoperable, Reusable initiative for promoting data discovery and reuse, and readability). Nowadays, due to the growing popularity of Open Science initiatives, a large number of useful science-related data sets have been made openly available, paving the way for the synthesis of more sophisticated research impact and merit indicators (and, consequently, more precise research assessment). For instance, in recent years, due to the systematic effort of various developing teams, a variety of large SKGs has been made available, providing a very rich and relatively clean source of information about academics, their publications, and relevant metadata that can be used for the development of effective research assessment approaches. 
The proposal for this special issue originated from the collaboration of two workshops, the Scientific Knowledge Graphs Workshop (SKG 2020), and the Workshop on Assessing Impact and Merit in Science (AIMinScience 2020), held (virtually) in conjunction with the 2020 edition of the International Conference on Theory and Practice of Digital Libraries (TPDL) on August 25, 2020. SKG 2020 offered a forum to discuss the themes surrounding the first set of challenges, namely methods for extracting entities and relationships from research publications; data models for the description of scholarly data; methods for the exploration, retrieval, and visualization of scientific knowledge graphs; and applications for making sense of scholarly data. On the other hand, AIMinScience 2020 focused on the second set of challenges, which include scientometrics and bibliometrics; applications utilizing scientific impact and merit to provide useful services to the research community and industry; data mining and machine learning approaches to facilitate research assessment; and insightful visualization techniques that utilize or facilitate research assessment. Given that the themes of both workshops are interlinked, because SKGs can indeed support research impact assessment, it was a joint decision to edit this special issue on Scientific Knowledge Graphs and Research Impact Assessment, with the aim of providing all practitioners interested in scholarly knowledge with the current advances in these particular aspects. In addition, this collaboration catalyzed the creation of the International Workshop on Scientific Knowledge: Representation, Discovery, and Assessment (Sci-K), a new joint event that replaced SKG and AIMinScience, focusing on a wider subject and audience.
Sci-K aims to explore innovative solutions and ideas for the generation of approaches, data models, and infrastructures (e.g., knowledge graphs), for supporting, directing, monitoring, and assessing scientific knowledge. Its first edition, Sci-K 2021, was held on April 13, 2021, co-organized with The Web Conference 2021. It was a successful event with 11 presented papers and two keynote talks from Prof. Ludo Waltman and Prof. Staša Milojević. Download Download from ORO: http://oro.open.ac.uk/80008/ Download from Source (OA): https://direct.mit.edu/qss/article/2/4/1296/108052/New-trends-in-scientific-knowledge-graphs-and... AIDA: a Knowledge Graph about Research Dynamics in Academia and Industry05 March 2022“AIDA: a Knowledge Graph about Research Dynamics in Academia and Industry” is a research paper published at the Special Issue on “Scientific Knowledge Graphs and Research Impact Assessment” at Quantitative Science Studies (QSS by MIT Press). Simone Angioni1, Angelo Antonio Salatino2, Francesco Osborne2, Diego Reforgiato Recupero1, Enrico Motta2 1 Department of Mathematics and Computer Science, University of Cagliari (Italy) 2 Knowledge Media Institute, The Open University, Milton Keynes (UK) Abstract Academia and industry share a complex, multifaceted, and symbiotic relationship. Analyzing the knowledge flow between them, understanding which directions have the biggest potential, and discovering the best strategies to harmonize their efforts is a critical task for several stakeholders. Research publications and patents are an ideal medium to analyze this space, but current data sets of scholarly data cannot be used for such a purpose because they lack a high-quality characterization of the relevant research topics and industrial sectors. In this paper, we introduce the Academia/Industry DynAmics (AIDA) Knowledge Graph, which describes 21 million publications and 8 million patents according to the research topics drawn from the Computer Science Ontology. 
5.1 million publications and 5.6 million patents are further characterized according to the type of the author’s affiliations and 66 industrial sectors from the proposed Industrial Sectors Ontology (INDUSO). AIDA was generated by an automatic pipeline that integrates data from Microsoft Academic Graph, Dimensions, DBpedia, the Computer Science Ontology, and the Global Research Identifier Database. It is publicly available under CC BY 4.0 and can be downloaded as a dump or queried via a triplestore. We evaluated the different parts of the generation pipeline on a manually crafted gold standard yielding competitive results. Download Download from ORO: http://oro.open.ac.uk/79445/ Download from Source (OA): https://direct.mit.edu/qss/article/2/4/1356/108043/AIDA-A-knowledge-graph-about-research-dynamics-in... Assessing Scientific Conferences through Knowledge Graphs04 March 2022“Assessing Scientific Conferences through Knowledge Graphs” is a paper published at the Industry Track of the 2021 International Semantic Web Conference. Simone Angioni1, Angelo Antonio Salatino2, Francesco Osborne2, Aliaksandr Birukou3, Diego Reforgiato Recupero1, Enrico Motta2 1 Department of Mathematics and Computer Science, University of Cagliari (Italy) 2 Knowledge Media Institute, The Open University, Milton Keynes (UK) 3 Springer-Verlag GmbH, Tiergartenstrasse 17, 69121 Heidelberg (DE) Abstract Springer Nature is the main publisher of scientific conferences in Computer Science and produces several well-known series of proceedings books, such as LNCS. The editorial team needs to take critical decisions about which conferences to publish as well as actively scan the horizon for identifying emerging ones. In this short paper, we present the Conference Dashboard, a new web application based on a large knowledge graph of scholarly data (1.3B triples) for assessing scientific conferences and informing editorial decisions. 
Download
Download from CEUR (OA): http://ceur-ws.org/Vol-2980/paper411.pdf...

AIDA-Bot: A Conversational Agent to Explore Scholarly Knowledge Graphs
29 August 2021
“AIDA-Bot: A Conversational Agent to Explore Scholarly Knowledge Graphs” is a demo paper accepted for presentation at the International Semantic Web Conference (ISWC 2021) poster and demo session.
Antonello Meloni1, Simone Angioni1, Angelo Antonio Salatino2, Francesco Osborne2, Diego Reforgiato Recupero1, Enrico Motta2
1 Department of Mathematics and Computer Science, University of Cagliari (Italy)
2 Knowledge Media Institute, The Open University, Milton Keynes (UK)
Abstract
Chatbots have become increasingly popular in recent years among both businesses and consumers. These conversational agents typically use natural language understanding technologies for reformulating questions as queries to a knowledge base and answer the user according to the resulting information. This demo paper proposes a general software architecture for chatbots that can be employed for exploring large-scale knowledge graphs of scholarly data. It also introduces AIDA-Bot, a working prototype that implements this architecture and can answer queries about scientific publications, research topics, researchers, universities, and conferences. In order to show the flexibility of the proposed solution, we implemented two versions of AIDA-Bot which can run, respectively, on Alexa devices and web browsers.
[Figure: Architecture of AIDA-Bot]
Downloads
Download it from our institutional repository (open access): http://oro.open.ac.uk/78716/...
Link Prediction of Weighted Triples for Knowledge Graph Completion Within the Scholarly Domain
29 August 2021
“Link Prediction of Weighted Triples for Knowledge Graph Completion Within the Scholarly Domain” is a journal paper accepted at IEEE Access.
Mojtaba Nayyeri1,2, Gökce Müge Cil1, Sahar Vahdati2, Francesco Osborne3, Andrey Kravchenko4, Simone Angioni5, Angelo Salatino3, Diego Reforgiato Recupero5, Enrico Motta3, Jens Lehmann1,6
1 SDA Research Group, University of Bonn, 53115 Bonn, Germany
2 Nature-Inspired Machine Intelligence, Institute for Applied Informatics (InfAI), 01069 Dresden, Germany
3 Knowledge Media Institute, The Open University, Milton Keynes MK7 6AA, U.K.
4 Christ Church, University of Oxford, Oxford OX1 1DP, U.K.
5 Department of Mathematics and Computer Science, University of Cagliari, 09124 Cagliari, Italy
6 Fraunhofer IAIS, 53757 Dresden, Germany
Abstract
Knowledge graphs (KGs) are widely used for modeling scholarly communication, performing scientometric analyses, and supporting a variety of intelligent services to explore the literature and predict research dynamics. However, they often suffer from incompleteness (e.g., missing affiliations, references, research topics), leading to a reduced scope and quality of the resulting analyses. This issue is usually tackled by computing knowledge graph embeddings (KGEs) and applying link prediction techniques. However, only a few KGE models are capable of taking weights of facts in the knowledge graph into account. Such weights can have different meanings, e.g., they can describe the degree of association or the degree of truth of a certain triple. In this paper, we propose the Weighted Triple Loss, a new loss function for KGE models that takes full advantage of the additional numerical weights on facts and is even tolerant to incorrect weights. We also extend the Rule Loss, a loss function that is able to exploit a set of logical rules, in order to work with weighted triples.
The evaluation of our solutions on several knowledge graphs indicates significant performance improvements with respect to the state of the art. Our main use case is the large-scale AIDA knowledge graph, which describes 21 million research articles. Our approach makes it possible to complete information about affiliation types, countries, and research topics, greatly improving the scope of the resulting scientometric analyses and providing better support to systems for monitoring and predicting research dynamics.
[Figure: RDF Schema of research articles in the Academia/Industry DynAmics (AIDA) Knowledge Graph]
Download
Download from DOI (open access): https://doi.org/10.1109/ACCESS.2021.3105183...

CSO Classifier 3.0
04 August 2021
Abstract
Classifying research papers according to their research topics is an important task to improve their retrievability, assist the creation of smart analytics, and support a variety of approaches for analysing and making sense of the research environment. In this repository, we present the CSO Classifier, a new unsupervised approach for automatically classifying research papers according to the Computer Science Ontology (CSO), a comprehensive ontology of research areas in the field of Computer Science. The CSO Classifier takes as input the metadata associated with a research paper (title, abstract, keywords) and returns a selection of research concepts drawn from the ontology. The approach was evaluated on a gold standard of manually annotated articles yielding a significant improvement over alternative methods.
v3.0
This release welcomes some improvements under the hood.
In particular:
- we refactored the code, reorganising scripts into more elegant classes
- we added functionalities to automatically set up and update the classifier to the latest version of CSO
- we added the explanation feature, which returns chunks of text that allowed the classifier to infer a given topic
- the syntactic module now takes advantage of the spaCy POS tagger (as previously done only by the semantic module)
- the grammar for the chunk parser is now more robust: {<JJ.*>*<HYPH>*<JJ.*>*<HYPH>*<NN.*>*<HYPH>*<NN.*>+}
In addition, in the post-processing module, we added the outlier detection component. This component improves the accuracy of the result set by removing erroneous topics that are conceptually distant from the others. It is enabled by default and can be disabled by setting delete_outliers = False when calling the CSO Classifier (see Parameters). Please be aware that, because we substantially restructured the code into classes, the way of running the classifier has changed too. Thus, if you are using a previous version of the classifier, we encourage you to update it (pip install -U cso-classifier) and modify your calls to the classifier accordingly. Read our usage examples. We would like to thank James Dunham @jamesdunham from CSET (Georgetown University) for suggesting how to improve the code.
Download
Full documentation available on GitHub readme file: https://github.com/angelosalatino/cso-classifier...

Detection, Analysis, and Prediction of Research Topics with Scientific Knowledge Graphs
25 June 2021
“Detection, Analysis, and Prediction of Research Topics with Scientific Knowledge Graphs” is a book chapter of “Predicting the Dynamics of Research Impact” published by Springer.
Angelo A. Salatino1, Andrea Mannocci2, and Francesco Osborne1
1 Knowledge Media Institute – The Open University, Milton Keynes, United Kingdom
2 Istituto di Scienza e Tecnologie dell’Informazione “A.
Faedo”, Italian National Research Council, Pisa, Italy Abstract Analysing research trends and predicting their impact on academia and industry is crucial to gain a deeper understanding of the advances in a research field and to inform critical decisions about research funding and technology adoption. In the last years, we saw the emergence of several publicly-available and large-scale Scientific Knowledge Graphs fostering the development of many data-driven approaches for performing quantitative analyses of research trends. This chapter presents an innovative framework for detecting, analysing, and forecasting research topics based on a large-scale knowledge graph characterising research articles according to the research topics from the Computer Science Ontology. We discuss the advantages of a solution based on a formal representation of topics and describe how it was applied to produce bibliometric studies and innovative tools for analysing and predicting research dynamics. Download chapter Download paper from ArXiv: https://arxiv.org/abs/2106.12875... CSO Classifier 3.0: A Scalable Unsupervised Method for Classifying Documents in Terms of Research Topics25 June 2021“CSO Classifier 3.0: A Scalable Unsupervised Method for Classifying Documents in Terms of Research Topics” is a journal paper accepted at the Special Issue of “TPDL 2019 & 2020” at Scientometrics. Angelo Salatino, Francesco Osborne, Enrico Motta Abstract Classifying scientific articles, patents, and other documents according to the relevant research topics is an important task, which enables a variety of functionalities, such as categorising documents in digital libraries, monitoring and predicting research trends, and recommending papers relevant to one or more topics. 
In this paper, we present the latest version of the CSO Classifier (v3.0), an unsupervised approach for automatically classifying research papers according to the Computer Science Ontology (CSO), a comprehensive taxonomy of research areas in the field of Computer Science. The CSO Classifier takes as input the metadata of a research paper (usually title, abstract, and keywords) and returns a set of research topics drawn from the ontology. This new version includes a new component for discarding outlier topics and offers improved scalability. We evaluated the CSO Classifier on a gold standard of manually annotated articles, demonstrating a significant improvement over alternative methods. We also present an overview of applications adopting the CSO Classifier and describe how it can be adapted to other fields.
[Figure: Architecture of the CSO Classifier]
Download
Download paper from our institutional repository: http://oro.open.ac.uk/78283/
Download from DOI (Gold OA): https://doi.org/10.1007/s00799-021-00305-y...

Trans4E: Link Prediction on Scholarly Knowledge Graphs
25 June 2021
“Trans4E: Link Prediction on Scholarly Knowledge Graphs” is a journal paper submitted to the Special Issue on “Knowledge Graph Representation & Reasoning” at the Neurocomputing Journal.
Mojtaba Nayyeri a, Gokce Muge Cil a, Sahar Vahdati b, Francesco Osborne d, Mahfuzur Rahman a, Simone Angioni e, Angelo Salatino d, Diego Reforgiato Recupero e, Nadezhda Vassilyeva a, Enrico Motta d and Jens Lehmann a,c
a SDA Research Group, University of Bonn (Germany)
b Institute for Applied Informatics (InfAI)
c Fraunhofer IAIS, Dresden (Germany)
d Knowledge Media Institute, The Open University, Milton Keynes (UK)
e Department of Mathematics and Computer Science, University of Cagliari (Italy)
Abstract
The incompleteness of Knowledge Graphs (KGs) is a crucial issue affecting the quality of AI-based services.
In the scholarly domain, KGs describing research publications typically lack important information, hindering our ability to analyse and predict research dynamics. In recent years, link prediction approaches based on Knowledge Graph Embedding models have become the first aid for this issue. In this work, we present Trans4E, a novel embedding model that is particularly fit for KGs which include N to M relations with N≫M. This is typical for KGs that categorize a large number of entities (e.g., research articles, patents, persons) according to a relatively small set of categories. Trans4E was applied to two large-scale knowledge graphs, the Academia/Industry DynAmics (AIDA) and Microsoft Academic Graph (MAG), for completing the information about Fields of Study (e.g., ‘neural networks’, ‘machine learning’, ‘artificial intelligence’), and affiliation types (e.g., ‘education’, ‘company’, ‘government’), improving the scope and accuracy of the resulting data. We evaluated our approach against alternative solutions on AIDA, MAG, and four other benchmarks (FB15k, FB15k-237, WN18, and WN18RR). Trans4E outperforms the other models when using low embedding dimensions and obtains competitive results in high dimensions.
Download
Download paper from our institutional repository: http://oro.open.ac.uk/77317/
Download paper from DOI (Elsevier): https://doi.org/10.1016/j.neucom.2021.02.100
Download paper from arXiv: https://arxiv.org/abs/2107.03297...

Scientific Knowledge Graphs: an Overview
12 May 2021
On 12th May 2021, I was invited by Dimitris Sacharidis to give a lecture in the master's course INFO-H509 “XML and Web Technologies” at the Université Libre de Bruxelles.
Abstract
In the last decade, several Scientific Knowledge Graphs (SKG) were released, representing scientific knowledge in a structured, interlinked, and semantically rich manner. But what kind of information do they describe? How have they been built? What can we do with them?
In this lecture, I will first provide an overview of well-known SKGs, like Microsoft Academic Graph, Dimensions, and others. Then, I will present the Academia/Industry DynAmics (AIDA) Knowledge Graph, which describes 21M publications and 8M patents according to i) the research topics drawn from the Computer Science Ontology, ii) the type of the author’s affiliations (e.g., academia, industry), and iii) 66 industrial sectors (e.g., automotive, financial, energy, electronics) from the Industrial Sectors Ontology (INDUSO). Finally, I will showcase a number of tools and approaches using such SKGs, supporting researchers, companies, and policymakers in making sense of research dynamics.
Video Slides...

Clique Percolation Method in Python
29 December 2020
Clique Percolation Method (CPM) is an algorithm for finding overlapping communities within networks, introduced by Palla et al. (2005, see references). This implementation in Python first detects cliques of size k, then creates a clique graph. Each connected component in the clique graph represents a community.
Algorithm
The algorithm performs the following steps:
1- first find all cliques of size k in the graph
2- then create a graph whose nodes are the cliques of size k
3- add an edge between two nodes (cliques) if they share k-1 common nodes
4- each connected component is a community
Example with k=3
[Figure: Graph]
The presented graph contains the following cliques: {1, 2, 3} {1, 3, 4} {4, 5, 6} {5, 6, 7} {5, 6, 8} {5, 7, 8} {6, 7, 8}
Each clique is a node in the clique graph, and two such nodes are connected to each other if they share k-1 (2 in this case) nodes.
[Figure: Clique Graph]
As a result, the clique graph has two connected components, containing {1,2,3,4} and {4,5,6,7,8}. Node 4 belongs to both communities. In other words, the graph contains two overlapping communities.
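The four steps above can be sketched in plain Python on the k=3 example. This is a minimal illustration under stated assumptions, not the repository's implementation: it avoids igraph, the function name `clique_percolation_sketch` is hypothetical, the clique search is brute force (only suitable for small graphs), and the edge list is the one implied by the seven cliques listed in the example.

```python
from itertools import combinations


def clique_percolation_sketch(edges, k=3):
    """Return overlapping communities as a list of sorted node lists.

    Hypothetical helper for illustration; not the repository's API.
    """
    # Build an undirected adjacency map from the edge list.
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    # Step 1: find all cliques of size k (brute force over node subsets).
    cliques = [
        frozenset(group)
        for group in combinations(sorted(adjacency), k)
        if all(b in adjacency[a] for a, b in combinations(group, 2))
    ]

    # Steps 2-3: build the clique graph, linking cliques sharing k-1 nodes.
    neighbours = {c: set() for c in cliques}
    for c1, c2 in combinations(cliques, 2):
        if len(c1 & c2) == k - 1:
            neighbours[c1].add(c2)
            neighbours[c2].add(c1)

    # Step 4: each connected component of the clique graph is a community.
    communities, seen = [], set()
    for start in cliques:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:
            clique = stack.pop()
            members |= clique
            for other in neighbours[clique] - seen:
                seen.add(other)
                stack.append(other)
        communities.append(sorted(members))
    return communities


# Edges implied by the seven cliques listed in the example (an assumption,
# since the original figure is not available here).
example_edges = [
    (1, 2), (1, 3), (2, 3), (1, 4), (3, 4), (4, 5), (4, 6),
    (5, 6), (5, 7), (6, 7), (5, 8), (6, 8), (7, 8),
]
communities = clique_percolation_sketch(example_edges, k=3)
print(communities)  # [[1, 2, 3, 4], [4, 5, 6, 7, 8]]
```

Node 4 appears in both returned lists, which is the overlap described in the example. For larger graphs, a dedicated clique enumeration routine would be preferable to the brute-force search used here.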
Description
Usage
clique_percolation_method(graph, k = 3): Implementation of the Clique Percolation Method
Test
import CliquePercolationMethod as cpm
cpm.text() # or cpm.test_karate()
Arguments
graph: the input graph (igraph object)
k: the size of the clique (usually = 3)
Returns
communities: a list of communities. Each element of this list is itself a list containing the nodes of such community.
Package Dependencies
This implementation requires igraph which can be installed by running: pip install python-igraph
Project info
Github repository: https://github.com/angelosalatino/CliquePercolationMethod-Python
Issue tracker: Here
References
Palla, G., Derényi, I., Farkas, I., & Vicsek, T. (2005). Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043), 814-818....

ISWC2020 – BEST DEMO OF THE DAY AWARD
24 December 2020
The Smart Topic Miner, which is an innovative state-of-the-art AI application for automating editorial processes at Springer Nature and improving access to scientific knowledge, has been shortlisted for the “Most Innovative use of AI” DataIQ 2020 Awards. Smart Topic Miner analyses scientific publications in Computer Science and classifies them with very high accuracy in terms of a catalogue of 15,000 research topics. The automation of this complex task has produced both a 75% cost reduction and a dramatic improvement in metadata quality. As a result of the latter, downloads in Computer Science from the Springer Nature portal have increased by almost 10 million units, as search engines can now identify the relevant scientific content with much higher accuracy, thus benefitting readers of scientific literature all over the world. At the SKM3 team, we are very honoured to receive such great recognition, as it further proves the impact of our work and specifically the Smart Topic Miner.
Disclaimer: This post is identical to the one published by my research team.
Relevant publications:
Integrating Knowledge Graphs for Analysing Academia and Industry Dynamics
The AIDA Dashboard: Analysing Conferences with Semantic Technologies...

Applying Machine Learning Techniques to Big Data in the Scholarly Domain
12 November 2020
On 12th Nov 2020, I was invited to give a talk at the 5th International School on Applied Probability Theory, Communications Technologies & Data Science (APTCT-2020), organised and hosted by RUDN University (Moscow, RU), jointly with Tampere University (Finland) and Brno University of Technology (Czech Republic). In this lecture, I presented the Computer Science Ontology framework (see Fig. 1) and how it has enabled us to perform several experiments in the field of Science of Science. Specifically, I described all 5 layers in detail. I started from scholarly data and the big scholarly data sources currently available. Then, I presented the Computer Science Ontology (CSO) and how it was created with the Klink-2 algorithm. CSO is a large-scale ontology of research areas in the field of Computer Science; being an ontology of scientific disciplines, it makes it possible to organise digital libraries (scholarly datasets) according to its constituents (research topics). Afterwards, I presented several approaches to topic classification, which connect scholarly datasets with taxonomies/ontologies of science: topic models (e.g., LDA), supervised machine learning approaches, approaches based on citation networks, and finally the CSO Classifier, which is based on Natural Language Processing techniques. The CSO Classifier enriches each research paper in a big scholarly dataset with its relevant topics drawn from CSO.
On top of these 4 initial layers, I presented several high-level applications: metadata extraction, showing Smart Topic Miner, a tool used by Springer Nature for annotating and extracting metadata from books and conference proceedings; book recommendation, showing Smart Book Recommender, a tool developed for Springer Nature to analyse the digital library and select the most appropriate books, journals, and proceedings to market at a scientific event; research trend forecasting, showing an ML approach able to predict the impact of a topic in industry (i.e., whether it will receive more than 50 patents in the following 10 years); and a conference dashboard, a recent tool that we developed for assessing conferences across several parameters. This framework proved to be very successful and we are eager to explore new innovative solutions and contribute to the further development of the Science of Science field. The lecture was attended by more than 100 students.
Fig. 1: Computer Science Ontology Framework
Slides
Useful Links
5th International School on Applied Probability Theory, Communications Technologies & Data Science (APTCT-2020): http://www.aptct.ru...

Finalists at DataIQ 2020 Awards
01 October 2020
The Smart Topic Miner, which is an innovative state-of-the-art AI application for automating editorial processes at Springer Nature and improving access to scientific knowledge, has been shortlisted for the “Most Innovative use of AI” DataIQ 2020 Awards. Smart Topic Miner analyses scientific publications in Computer Science and classifies them with very high accuracy in terms of a catalogue of 15,000 research topics. The automation of this complex task has produced both a 75% cost reduction and a dramatic improvement in metadata quality.
As a result of the latter, downloads in Computer Science from the Springer Nature portal have increased by almost 10 million units, as search engines can now identify the relevant scientific content with much higher accuracy, thus benefitting readers of scientific literature all over the world. At the SKM3 team, we are very honoured to receive such great recognition, as it further proves the impact of our work and specifically the Smart Topic Miner.
Disclaimer: This post is identical to the one published by my research team.
Media
From the KMi News website: OU wins two categories in DataIQ 2020 Awards
Congratulations to the following teams for making the 2020 #DataIQAwards shortlist for “Most innovative use of AI”
-AIScout
-Merkle and Spirit Energy
-NatWest
-Open University
-TUI Group
View the shortlist and register for the live reveal here: https://t.co/WbX4YUCowG pic.twitter.com/2Bu02TgNe4
— DataIQ (@TheDataIQ) September 8, 2020...