Machine Learning and Big Data Colloquium


Quantifying Patterns of Scientific Performance and Success

Roberta Sinatra, Ph.D.

Center for Complex Network Research

Northeastern University, Boston (MA, USA)

Abstract:

In most areas of human performance, the path to a major accomplishment requires a steep learning curve, long practice, and many trials. Athletes go through years of training and compete repeatedly before setting new records; musicians practice from an early age and perform in secondary venues before earning the spotlight. Science is no different: the outstanding discoveries scientists become known for are usually preceded by publications of less memorable impact. Yet little is known about the patterns that lead to scientific excellence: Are there quantifiable signs of an impending scientific hit? Will a scientist produce higher-impact work following a major discovery? Is the success of a particular work predictable? What other measures of success exist beyond the obvious ones: the number and impact factor of the published papers and the citations they collect?

Quantitative answers to these questions are now possible thanks to the emergence of Big Data, which offers information about multiple facets of scientific activity: from career paths to citation patterns and altmetrics such as download patterns, Twitter and Facebook mentions, and Wikipedia activity. Our work is driven by the hypothesis that scientific success becomes predictable if we see it not as an individual phenomenon but as a collective one. For a result to be successful, it is not enough for it to be novel or fundamental; the scientific community must also agree that it is worthy of praise and follow-up. Based on this premise, I will present a mathematically rigorous framework that offers actionable information toward the quantitative evaluation and prediction of scientific excellence.
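The "obvious" citation-based measures the abstract alludes to can be made concrete with a small example. The sketch below computes one standard metric of this kind, the h-index (the abstract does not name this metric; it is our illustration, and the citation counts are made up):

```python
def h_index(citations):
    """h-index: the largest h such that the author has at least h papers
    with at least h citations each. An example of the citation-based
    success measures the abstract calls 'the obvious ones'."""
    h = 0
    # Sort citation counts in descending order and walk down the list.
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank      # rank papers each have >= rank citations
        else:
            break
    return h

# Hypothetical career: six papers with these citation counts.
papers = [25, 8, 5, 3, 3, 0]
print(h_index(papers))  # -> 3 (three papers with at least 3 citations each)
```

The point of the talk, of course, is that such individual-level counts are only the starting point; the proposed framework treats success as a property of the community's collective response.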

Feature allocations, probability functions, and paintboxes

Tamara Broderick, Ph.D.

UC Berkeley

Abstract:

Clustering involves placing entities into mutually exclusive categories. We wish to relax the requirement of mutual exclusivity, allowing objects to belong simultaneously to multiple classes, a formulation that we refer to as "feature allocation." The first step is a theoretical one. In the case of clustering, the class of probability distributions over exchangeable partitions of a dataset has been characterized (via exchangeable partition probability functions and the Kingman paintbox). These characterizations support an elegant nonparametric Bayesian framework for clustering in which the number of clusters is not assumed to be known a priori. We establish an analogous characterization for feature allocation; we define notions of "exchangeable feature probability functions" and "feature paintboxes" that lead to a Bayesian framework that does not require the number of features to be fixed a priori. The second step is a computational one. Rather than appealing to Markov chain Monte Carlo for Bayesian inference, we develop a method to transform Bayesian methods for feature allocation (and other latent structure problems) into optimization problems with objective functions analogous to K-means in the clustering setting. These yield approximations to Bayesian inference that are scalable to large inference problems.
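The computational step described above, an optimization with a K-means-like objective over feature memberships, can be sketched in a few lines. This is our own simplified illustration under strong assumptions: the function name `feature_kmeans` is hypothetical, the number of features K is held fixed (the framework in the talk does not assume this), and the alternating updates here are a generic coordinate-descent scheme, not the speaker's algorithm:

```python
import numpy as np

def feature_kmeans(X, K, n_iters=20, seed=0):
    """Toy alternating minimization of ||X - Z A||_F^2 over a binary
    membership matrix Z and real-valued feature means A. Unlike
    clustering, a row of Z may contain several 1s: a data point can
    belong to multiple features at once (a feature allocation)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    Z = rng.integers(0, 2, size=(N, K))   # binary memberships, not one-hot
    A = rng.standard_normal((K, D))       # feature means
    costs = []
    for _ in range(n_iters):
        # Greedily flip each membership bit with A held fixed,
        # keeping whichever setting gives the smaller residual.
        for n in range(N):
            for k in range(K):
                trial = []
                for bit in (0, 1):
                    Z[n, k] = bit
                    trial.append((np.sum((X[n] - Z[n] @ A) ** 2), bit))
                Z[n, k] = min(trial)[1]
        # Exact least-squares update of the feature means given Z.
        A, *_ = np.linalg.lstsq(Z, X, rcond=None)
        costs.append(np.sum((X - Z @ A) ** 2))
    return Z, A, costs

# Synthetic data: the third and fourth points carry BOTH features,
# which a mutually exclusive clustering could not express.
Z_true = np.array([[1, 0], [0, 1], [1, 1], [1, 1]])
A_true = np.array([[5.0, 0.0], [0.0, 5.0]])
X = Z_true @ A_true + 0.1 * np.random.default_rng(1).standard_normal((4, 2))
Z, A, costs = feature_kmeans(X, K=2)
```

Restricting each row of Z to exactly one 1 recovers an objective equivalent to ordinary K-means, which is the sense in which feature allocation relaxes clustering; both updates above are non-increasing in the objective, so the procedure converges to a local optimum.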

Date: Friday, October 3, 2014

Venue: Auditorio IIMAS

Time: 12:00