Abstract:
In most areas of human performance the path to a major accomplishment
requires a steep learning curve, long practice and many trials.
Athletes go through years of training and compete repeatedly before
setting new records; musicians practice from an early age and perform
in secondary venues before earning the spotlight. Science is no
different: the outstanding discoveries scientists become known for are
usually preceded by publications of less memorable impact. Yet, little
is known about the patterns that lead to scientific excellence: Are
there quantifiable signs of an impending scientific hit? Will a
scientist produce higher-impact work following a major discovery? Is
the success of a particular work predictable? What other measures of
success exist beyond the obvious ones: the number and impact factor of
the published papers, and the citations they collect?
Quantitative answers to these questions can be given thanks to the
emergence of Big Data, which offers information about multiple facets of
scientific activity: from career paths to citation patterns and
altmetrics such as download patterns, Twitter and Facebook mentions, and
Wikipedia activity. Our work is driven by the hypothesis that
scientific success becomes predictable if we see it not as an
individual phenomenon, but rather as a collective one. For a result to be
successful, it is not enough to be novel or fundamental. Rather, the
scientific community must agree that it is worthy of praise and
follow-up. Based on this premise, I will present a mathematically
rigorous framework that offers actionable information towards a
quantitative evaluation and prediction of scientific excellence.
Feature allocations, probability functions, and paintboxes
Tamara Broderick, Ph.D.
UC Berkeley
Abstract:
Clustering involves placing entities
into mutually exclusive categories. We wish to relax the requirement of
mutual exclusivity, allowing objects to belong simultaneously to
multiple classes, a formulation that we refer to as "feature
allocation." The first step is a theoretical one. In the case of
The first step is a theoretical one. In the case of clustering, the
class of probability distributions over exchangeable
partitions of a dataset has been characterized (via exchangeable
partition probability functions and the Kingman paintbox). These
characterizations support an elegant nonparametric Bayesian framework
for clustering in which the number of clusters is not assumed to be
known a priori. We establish an analogous characterization for feature
allocation; we define notions of "exchangeable feature probability
functions" and "feature paintboxes" that lead to a Bayesian framework
that does not require the number of features to be fixed a priori.
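To make the clustering-side objects concrete (a standard example, not
spelled out in the abstract): the Chinese restaurant process with
concentration parameter \alpha induces an exchangeable partition
probability function, i.e. the probability of a partition of n objects
depends only on its block sizes n_1, ..., n_K:

\[
  p(n_1, \dots, n_K) = \frac{\alpha^{K} \prod_{k=1}^{K} (n_k - 1)!}{\prod_{i=1}^{n} (\alpha + i - 1)}.
\]

The exchangeable feature probability functions defined in the talk are
intended to play the analogous role when an object may carry several
features at once.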
The second step is a computational one. Rather than appealing to Markov
chain Monte Carlo for Bayesian inference, we develop a method to
transform Bayesian methods for feature allocation (and other latent
structure problems) into optimization problems with objective functions
analogous to K-means in the clustering setting. These yield
approximations to Bayesian inference that are scalable to large
inference problems.
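The abstract leaves this construction at a high level; for orientation,
the clustering instance of such a K-means-like objective is often
written as a sum of squared distances plus a penalty lam for every
cluster opened (a DP-means-style objective obtained from small-variance
asymptotics of a Dirichlet process mixture). The sketch below covers
only that clustering analogue, in NumPy, with the penalty lam and the
iteration count as assumed free parameters; the feature-allocation
version discussed in the talk replaces cluster labels with a binary
object-by-feature matrix.

    import numpy as np

    def dp_means(X, lam, n_iter=50):
        """K-means-style coordinate descent with a penalty lam per cluster.

        The number of clusters is not fixed in advance: a point opens a new
        cluster whenever every existing center is farther than sqrt(lam).
        """
        centers = [X.mean(axis=0)]               # start from one global cluster
        assign = np.zeros(len(X), dtype=int)
        for _ in range(n_iter):
            # Assignment step: reassign each point, possibly opening a cluster.
            for i, x in enumerate(X):
                d2 = [np.sum((x - c) ** 2) for c in centers]
                if min(d2) > lam:
                    centers.append(x.copy())
                    assign[i] = len(centers) - 1
                else:
                    assign[i] = int(np.argmin(d2))
            # Update step: recompute means and drop clusters that emptied out.
            centers = [X[assign == k].mean(axis=0)
                       for k in range(len(centers)) if np.any(assign == k)]
            _, assign = np.unique(assign, return_inverse=True)  # relabel 0..K-1
        return np.array(centers), assign

    # Hypothetical usage: X = np.random.randn(300, 2); dp_means(X, lam=4.0)

Each sweep is plain coordinate descent on the penalized objective, and
the penalty controls how readily new clusters open, much as the
concentration parameter does in the Bayesian model; this simplicity is
what makes the approach scale to large problems.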
Date: Friday, October 3, 2014
Venue: Auditorio IIMAS
Time: 12:00 hrs