Classifying research papers according to their research topics is an important task: it improves their retrievability, assists the creation of smart analytics, and supports a variety of approaches for analysing and making sense of the research environment. On this page, we present the CSO Classifier, a new unsupervised approach for automatically classifying research papers according to the Computer Science Ontology (CSO), a comprehensive ontology of research areas in the field of Computer Science.
I have always been passionate about technology. When I bought my first computer (special thanks to my father for funding it), and got it connected to the internet, it soon became part of my life: downloading movies, music, studying, chatting, engaging with different communities, writing a blog, buying and selling stuff. The web gave me […]
Awesome Scholarly Data Analysis is a curated collection of resources that can support Scholarly Data analytics. The list covers: Datasets, which include different corpora of papers, citations, authors and others, as well as taxonomies and ontologies of research concepts; Tools for collecting and classifying research papers, information extraction, and visualization; and Venues, Summer Schools, […]
Springer Nature and the Knowledge Media Institute (KMi) of The Open University are partnering to provide a comprehensive Computer Science Ontology (CSO) to a broad range of communities engaged with scholarly data. CSO can be accessed free of charge through the CSO Portal, a web application that enables users to download, explore, and provide feedback on the ontology.
The Computer Science Ontology is a large-scale ontology of research areas that was automatically generated using the Klink-2 algorithm on a dataset of about 16 million publications, mainly in the field of Computer Science. In the rest of the paper, we will refer to this corpus as the Rexplore dataset.
The current version of CSO includes 14,164 topics and 162,121 semantic relationships. The main root is Computer Science; however, the ontology also includes a few secondary roots, such as Linguistics, Geometry, and Semantics.
CSO presents two main advantages over manually crafted categorisations used in Computer Science (e.g., 2012 ACM Classification, Microsoft Academic Search Classification). First, it can characterise higher-level research areas by means of hundreds of sub-topics and related terms, which makes it possible to map very specific terms to higher-level research areas. Second, it can be easily updated by running Klink-2 on a set of new publications.
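To illustrate how a topic hierarchy lets a very specific term be mapped to higher-level research areas, here is a minimal sketch in Python. The topic names, the `BROADER` dictionary, and the `broader_areas` function are all invented for illustration; they are not taken from CSO and do not reflect its actual data format or any official tooling.

```python
from collections import deque

# Hypothetical toy hierarchy (NOT real CSO data):
# each topic maps to its direct broader topics.
BROADER = {
    "linked data": ["semantic web"],
    "semantic web": ["world wide web", "artificial intelligence"],
    "world wide web": ["internet"],
    "internet": ["computer science"],
    "artificial intelligence": ["computer science"],
}


def broader_areas(topic, broader=BROADER):
    """Return the set of all broader areas reachable from `topic`."""
    seen = set()
    queue = deque([topic])
    while queue:
        current = queue.popleft()
        # Follow every direct "broader" link once (breadth-first).
        for parent in broader.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    print(sorted(broader_areas("linked data")))
```

In this toy hierarchy, a paper tagged only with "linked data" can also be counted under "semantic web" and, further up, "computer science", without anyone annotating those broader areas by hand.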
Simple answer: no. However, before getting into a more detailed answer, allow me to briefly introduce the concept of citation networks and then describe why citation networks can no longer be considered acyclic. In the scholarly domain, a citation network is an information network in which each node represents a scientific paper and a link between […]
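Whether a given citation network is acyclic can be checked mechanically. The sketch below is only an illustration, not the method used in the post: it assumes the network is stored as a dictionary mapping each paper to the papers it cites, and the `has_cycle` function and the paper identifiers are hypothetical. It uses Kahn's algorithm, which repeatedly removes papers that no remaining paper cites; if some papers can never be removed, they lie on or behind a citation cycle.

```python
def has_cycle(citations):
    """Return True if the directed citation graph contains a cycle.

    `citations` maps a paper id to the list of paper ids it cites
    (an assumed representation, chosen for illustration).
    """
    # Collect every paper, including those that only appear as cited.
    nodes = set(citations)
    for cited in citations.values():
        nodes.update(cited)

    # Count incoming citation links for each paper.
    indegree = {node: 0 for node in nodes}
    for cited in citations.values():
        for paper in cited:
            indegree[paper] += 1

    # Kahn's algorithm: peel off papers with no incoming links.
    ready = [node for node in nodes if indegree[node] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for paper in citations.get(node, []):
            indegree[paper] -= 1
            if indegree[paper] == 0:
                ready.append(paper)

    # Papers left over can only be explained by a cycle.
    return removed != len(nodes)


if __name__ == "__main__":
    print(has_cycle({"A": ["B"], "B": ["C"]}))  # chain of citations
    print(has_cycle({"A": ["B"], "B": ["A"]}))  # two papers citing each other
```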
Last April, my team and I attended the Springer Nature HackDay in Berlin (here is the post). Recently, Springer Nature released a short video featuring us. The video also summarises my interview, in which I discuss my research project and why we think SciGraph is important for those who work in the field of Science of Science. […]
On 2nd August 2018, I was invited by Boris Veytsman, Principal Research Scientist at Chan Zuckerberg Initiative (formerly Meta), to give a talk about my PhD work. Unlike my previous talk to the ORNL group, this time I had the opportunity to describe my doctoral work more comprehensively. More specifically, I initially showed what is available […]
On 30th July 2018, I was invited by Dasha Herrmannova, former PhD student at KMi, to give a talk at the “Machine Learning and Graph Mining for Big Scholarly Data” workshop organised for the Computational Data Analytics Group at Oak Ridge National Laboratory (ORNL). In this talk, named “AUGUR: Forecasting the Emergence of New […]