Machine Learning and Big Data Colloquium                 


Quantifying Patterns of Scientific Performance and Success

Roberta Sinatra, Ph.D.
Center for Complex Network Research
Northeastern University, Boston (MA, USA)

In most areas of human performance, the path to a major accomplishment requires a steep learning curve, long practice, and many trials. Athletes go through years of training and compete repeatedly before setting new records; musicians practice from an early age and perform in secondary venues before earning the spotlight. Science is no different: the outstanding discoveries scientists become known for are usually preceded by publications of less memorable impact. Yet little is known about the patterns that lead to scientific excellence: Are there quantifiable signs of an impending scientific hit? Will a scientist produce higher-impact work following a major discovery? Is the success of a particular work predictable? What other measures of success exist beyond the obvious ones—the number and the impact factor of the published papers and the citations collected?

Quantitative answers to these questions can be given thanks to the emergence of Big Data, which offers information about multiple facets of scientific activity: from career paths to citation patterns and altmetrics, such as download patterns, Twitter and Facebook mentions, and Wikipedia activity. Our work is driven by the hypothesis that scientific success becomes predictable if we see it not as an individual phenomenon, but rather as a collective one. For a result to be successful, it is not enough to be novel or fundamental. Rather, the scientific community must agree that it is worthy of praise and follow-up. Based on this premise, I will present a mathematically rigorous framework that offers actionable information towards a quantitative evaluation and prediction of scientific excellence.

Feature allocations, probability functions, and paintboxes

Tamara Broderick, Ph.D.
UC Berkeley

Clustering involves placing entities into mutually exclusive categories. We wish to relax the requirement of mutual exclusivity, allowing objects to belong simultaneously to multiple classes, a formulation that we refer to as "feature allocation." The first step is a theoretical one. In the case of clustering the class of probability distributions over exchangeable partitions of a dataset has been characterized (via exchangeable partition probability functions and the Kingman paintbox). These characterizations support an elegant nonparametric Bayesian framework for clustering in which the number of clusters is not assumed to be known a priori. We establish an analogous characterization for feature allocation; we define notions of "exchangeable feature probability functions" and "feature paintboxes" that lead to a Bayesian framework that does not require the number of features to be fixed a priori. The second step is a computational one. Rather than appealing to Markov chain Monte Carlo for Bayesian inference, we develop a method to transform Bayesian methods for feature allocation (and other latent structure problems) into optimization problems with objective functions analogous to K-means in the clustering setting. These yield approximations to Bayesian inference that are scalable to large inference problems.
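To make the contrast concrete: in clustering, each object's assignment is a single label, whereas in feature allocation it is a binary vector, so an object can belong to several features at once. The sketch below is a hypothetical, minimal illustration of the kind of K-means-like objective mentioned in the abstract: it alternately minimizes the reconstruction error ||X − ZA||² over a binary feature-allocation matrix Z and a real feature matrix A. It is not the speaker's actual algorithm; the function name and the fixed number of features K are assumptions for illustration (a nonparametric version would let K grow).

```python
import numpy as np

def feature_allocation_kmeans(X, K, n_iter=20, seed=0):
    """Illustrative K-means-style objective for feature allocation.

    Alternating minimization of ||X - Z A||_F^2 where
      Z is an N x K binary matrix (each object may hold several features),
      A is a K x D real matrix of feature "means".
    A hypothetical sketch, not the algorithm presented in the talk.
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    Z = rng.integers(0, 2, size=(N, K)).astype(float)
    A = rng.standard_normal((K, D))
    history = []
    for _ in range(n_iter):
        # A-step: least squares for the feature matrix given the allocation Z.
        A, *_ = np.linalg.lstsq(Z, X, rcond=None)
        # Z-step: greedy coordinate update of each binary entry.
        # Each entry is set to whichever of {0, 1} lowers the row's error,
        # so the objective never increases.
        for n in range(N):
            for k in range(K):
                errs = []
                for v in (0.0, 1.0):
                    Z[n, k] = v
                    errs.append((np.sum((X[n] - Z[n] @ A) ** 2), v))
                Z[n, k] = min(errs)[1]
        history.append(np.sum((X - Z @ A) ** 2))
    return Z, A, history
```

Unlike K-means, where each row of Z would have exactly one nonzero entry, here rows of Z may contain any number of ones: an object can carry zero, one, or several features simultaneously, which is exactly the relaxation of mutual exclusivity the abstract describes.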

Date: Friday, October 3, 2014
Venue: Auditorio IIMAS
Time: 12:00 hours