Applying Machine Learning Techniques to Big Data in the Scholarly Domain

Ontologies of research areas have proven useful in many applications for analysing and making sense of scholarly data. In this lecture, I will present how we produced the Computer Science Ontology (CSO), which is the largest ontology of research areas in the field of Computer Science, and discuss a number of applications that build on CSO to support high-level tasks, such as topic classification, research trends forecasting, metadata extraction, and recommendation of books.

Book Review: Weapons of Math Destruction by Cathy O'Neil

Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy.
Everyday activities are increasingly shifting to a digital environment. Digital gadgets such as smartphones and wearable devices are becoming an inseparable part of our lives, promising mostly convenience. New digital technologies have mainly been seen as empowering for their users. FitBit, for example, is claimed to be a motivating device that helps users lead a healthy and active life and achieve their goals by analysing their data [1]. The data collected by this kind of device include sleeping patterns, the number of steps, the amount of time users are engaged in physical activities and so forth. However, these data are available not just to the users but also to companies that can use them for multiple purposes. Health insurance companies, such as Vitality [2], already exploit their customers’ data in exchange for rewards such as free cinema tickets or hot beverages. The potential implications of the collection and manipulation of personal data at a personal and societal level, though, have been downplayed. Just imagine a National Health insurance business model that operates by classifying citizens as high- or low-risk based on their data [3]. Citizens profiled as low-risk will be granted lower health contributions, while citizens profiled as high-risk will be paying for expensive and unaffordable plans.

Imagine a society where decisions on public well-being, education and so forth depend on algorithmic predictions. Cathy O’Neil’s book Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy explores exactly these societal consequences emerging from the abuse of big data predictions.

O’Neil gives insights into how algorithms can be misused for the sake of convenience and cost efficiency, resulting in discrimination and bias, amplifying inequality and ultimately threatening democracy. Her book is written for the lay public, though it draws upon her academic expertise and her working experience in the financial sector. After earning a PhD in Mathematics at Harvard, O’Neil worked for the D. E. Shaw hedge fund, where she first felt a sense of disillusionment with mathematics for its part in the 2008 financial crisis in the U.S. The financial sector was relying on algorithmic models based on mathematical formulas that, in her words, “were more to impress than clarify”. It was when similarly incomprehensible models were adopted in other sectors that she started investigating the matter.

Read more

BigDat2017: a review

This week I have been attending the 3rd edition of the Big Data winter school: BigDat2017. It was held on my former campus, at the University of Bari (IT). It was a really nice feeling to be back for a while, sitting on those benches and following courses once again.
Big Data has recently gained a lot of interest in research, and many believe that it will keep its leading role for many years. Nowadays, we live in a world in which all information seems to be available. We are surrounded by data-driven applications (Google, Facebook, Twitter, Spotify, just to name a few), which gather data and try to provide tailor-made solutions for their users. To this end, an event like BigDat2017, with its clear mission of introducing new researchers to this fast-advancing research area and keeping them up to date, is really important.

Read more

A Visual Introduction to Machine Learning: Italian Translation

The R2D3 team (http://www.r2d3.us/) developed a visual introduction to Machine Learning. This introduction uses data visualization technologies to show a workflow for creating a machine learning model able to make accurate predictions. Recently, many people have volunteered to translate this introduction into different languages. I took care of the Italian version: Una introduzione visuale … Read more

[LAMRECOR] Logistica avanzata per la mobilità di persone e merci: modelli matematici e sperimentazioni per nuovi protocolli di recapito della corrispondenza

Introduction

The LAMRECOR project (Advanced Logistics for people and goods mobility: mathematical models and trials related to new protocols for mail delivery) develops a set of technological solutions and services for advanced logistics by tightly integrating the sorting and delivery system for mail and other postal products of Poste Italiane SpA with innovative ICT technologies for data acquisition, components, modelling, development of processing systems, data transmission, and information to customers.

Italy is the country with the highest amount of motorized mobility per capita. This scenario affects not only the mobility of people but also the mobility of goods. Private land transport covers about 82% of demand, and the use of motorcycles and mopeds has recorded sustained growth. Goods continue to travel mainly by road (71.9% in 2008), by boat (18.3%) and, to a small extent, by rail (9.8%).

Read more

Grid Search SVM

The Grid Search SVM is a Java-based application that performs a grid search for an SVM classifier. According to section 3.2 of (Hsu, Chang and Lin: A Practical Guide to Support Vector Classification) [1], the grid search consists in identifying the best (C, γ) values, that is, those that allow the classifier to accurately classify unknown data (new instances, such as the test data).
In the same work, the authors suggest a practical method to perform the grid search: trying exponentially growing sequences of C and γ. They also give ranges of values for those parameters. For example, C = [2^-5, …, 2^15] and γ = [2^-15, …, 2^3].
The software presented here uses LIBSVM to implement an SVM classifier and Weka classes as the interface to classifiers and datasets. As stated above, the software takes a particular classifier and trains and tests it on different values of C and γ. All the performance results obtained are stored in a text file given as the output file.
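
The grid search idea can be illustrated with a minimal sketch. This is not the Java application itself and does not use LIBSVM or Weka. It is written in Python for brevity, and the function names (`exponential_grid`, `grid_search`, `toy_accuracy`) are hypothetical. The exponent step of 2 is an assumption, and `toy_accuracy` is a placeholder for the real step of training and testing an SVM with a given (C, γ) pair.

```python
import math
from itertools import product


def exponential_grid(min_exp, max_exp, step=2):
    """Return 2**min_exp, 2**(min_exp + step), ... up to 2**max_exp.

    The step of 2 between exponents is an assumption of this sketch.
    """
    return [2.0 ** e for e in range(min_exp, max_exp + 1, step)]


def grid_search(evaluate, c_values, gamma_values):
    """Score every (C, gamma) pair and return (best_result, all_results).

    Each result is a tuple (C, gamma, score); higher scores are better.
    """
    results = []
    for c, gamma in product(c_values, gamma_values):
        results.append((c, gamma, evaluate(c, gamma)))
    best = max(results, key=lambda r: r[2])
    return best, results


def toy_accuracy(c, gamma):
    # Placeholder for "train an SVM with (c, gamma) and measure accuracy".
    # Hypothetical surface that peaks at C = 2**3 and gamma = 2**-5.
    return 1.0 / (1.0 + (math.log2(c) - 3) ** 2 + (math.log2(gamma) + 5) ** 2)


c_values = exponential_grid(-5, 15)      # 2^-5 ... 2^15
gamma_values = exponential_grid(-15, 3)  # 2^-15 ... 2^3
best, results = grid_search(toy_accuracy, c_values, gamma_values)
print("best (C, gamma, score):", best)
```

In a real run, `evaluate` would wrap the classifier's training and testing, and each entry of `results` could be written as one line of the output text file.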

Read more

Design and Implementation of an innovative framework for Speech Emotion Recognition

[English]

With this article I want to publish my thesis work in Human-Computer Interaction, for the Master’s Degree in Computer Systems Engineering at the Polytechnic of Bari.

The entire thesis has been written in Italian. For this reason, I have prepared a brief English summary explaining all materials, methods, results and conclusions. Use the following link to read the abstract: My Thesis Abstract.pdf.

[Italiano]

Title in Italian: “Progettazione e implementazione di un innovativo framework per il riconoscimento delle emozioni vocali”.

This post publishes the thesis work, carried out in the discipline of Human-Computer Interaction, for the Master’s Degree in Computer Engineering at the Polytechnic of Bari.

Read more