Title: Biologically plausible neural networks for invariant visual recognition

I will describe a neural network architecture with binary neurons and discrete, bounded synapses that can recognize and detect visual object classes. The network has feedforward, recurrent, and feedback components. It is trained using simple Hebbian learning rules and can retrieve and retain objects from memory using the recurrent component. I will explain the crucial role of inhibition in the network's function and show how the model offers some ideas on the purpose of feedback connections in the biological visual system. The connection to multiple randomized classifiers will be discussed as well.
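As a toy illustration of the kind of learning rule involved (my own sketch; the threshold, weight bound, and update details are assumptions, not taken from the talk), here is a binary neuron with discrete, bounded synapses updated by a Hebbian rule:

```python
# Illustrative sketch: Hebbian learning with binary neurons and
# discrete, bounded synapses. The threshold theta and bound w_max
# are my own choices, not parameters from the talk.

def step(weights, pre, theta=2):
    """Binary neuron: fires iff summed weighted input reaches theta."""
    total = sum(w * x for w, x in zip(weights, pre))
    return 1 if total >= theta else 0

def hebbian_update(weights, pre, post, w_max=3):
    """Potentiate co-active synapses by one discrete step, capped at w_max."""
    return [min(w + 1, w_max) if (x == 1 and post == 1) else w
            for w, x in zip(weights, pre)]

pre = [1, 0, 1, 1]          # binary presynaptic activity
w = [1, 1, 0, 1]            # discrete synaptic weights
post = step(w, pre)         # weighted input 2 reaches theta, so it fires
w = hebbian_update(w, pre, post)
print(post, w)
```

Clipping at `w_max` keeps every synapse in a small discrete range, which is what distinguishes this setting from the unbounded real-valued weights of standard artificial networks.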

Peter Bickel

Title: Regression on manifolds

Abstract: I'll report on work of Dr. Anil Aswani done under my supervision and that of Prof. Claire Tomlin (EECS). We took advantage of low-dimensional manifold structure in high-dimensional regression, motivated in part by a study of development at the blastoderm stage in the fly.

Gunnar Carlsson

Title: The Shape of Data

Many interesting problems in the study of data can be interpreted as questions about the "shape" of the data. For example, the existence of a cluster decomposition of a data set can be viewed as an aspect of its shape, as can the presence of loops and higher dimensional features. These shape theoretic aspects are important in identifying conceptually coherent groups within a data set, or perhaps the presence of periodic or recurrent behavior. Topology can be characterized as the study of shape, including both questions about how to represent shape efficiently as well as how to measure it, in a suitable sense. Over the last decade, there has been an effort to adapt the methods of topology to the study of data, so that one can become more precise about the shape theoretic aspects. I will talk about some of these developments, with examples.
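As a concrete illustration of one such shape-theoretic quantity (my own sketch, not from the talk): zero-dimensional persistence tracks how the clusters of a point set merge as a distance threshold grows, recording the threshold at which each merge happens.

```python
# Illustrative sketch: zero-dimensional persistence of a point set.
# As a distance threshold grows, connected components merge; each
# merge "kills" one component, and the death thresholds summarize
# the cluster structure of the data.

import math
from itertools import combinations

def zero_dim_persistence(points):
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Sort all pairwise edges by length and merge components in order.
    edges = sorted(
        (math.dist(p, q), i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)  # one component dies at threshold d
    return deaths

# Two well-separated pairs: two short merge distances, then one long one,
# reflecting two clusters that persist over a wide range of scales.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(zero_dim_persistence(pts))
```

The long gap between the short and long death times is the signature of a two-cluster decomposition; higher-dimensional persistence detects loops and voids in the same spirit.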

Sanjoy Dasgupta

Title: Notions of dimension, and their use in analyzing non-parametric regression

Abstract: Many theoreticians in machine learning, statistics, and computational geometry have been inspired by Partha's paper on "Finding the homology of submanifolds with high confidence", with Steve Smale and Shmuel Weinberger. Among other things, it introduces a clean notion of curvature, which has considerably simplified subsequent work on manifold learning. I'll discuss two outgrowths of this work: first, briefly, a result on random projections of manifolds by Baraniuk-Wakin and Clarkson; and then, in more detail, some results of my own looking at general notions of intrinsic dimension and developing nonparametric regressors whose convergence rate depends only on these, rather than on ambient dimension.

Pedro Felzenszwalb

Title: Object detection grammars

Abstract: I will discuss various aspects of object detection using compositional models, focusing on the framework of object detection grammars, discriminative training and efficient computation. Object detection grammars provide a formalism for expressing very general types of models for object detection. Over the past few years we have considered a sequence of increasingly richer models. Each model in this sequence builds on the structures and methods employed by the previous models, while staying within the framework of discriminatively trained grammar models. Along the way, we have increased representational capacity, developed new machine learning techniques, and focused on efficient computation.

Antonio Galves

Title: Context tree selection and linguistic rhythm retrieval from written texts

Abstract: The starting point of this article is the question "How to retrieve fingerprints of rhythm in written texts?" We address this problem in the case of Brazilian and European Portuguese. These two dialects of Modern Portuguese share the same lexicon, and most of the sentences they produce are superficially identical. Yet they are conjectured, on linguistic grounds, to implement different rhythms. We show that this linguistic question can be formulated as a problem of model selection in the class of variable length Markov chains. To carry out this approach, we compare texts from European and Brazilian Portuguese. These texts are first encoded according to some basic rhythmic features of the sentences, which can be automatically retrieved. This is an entirely new approach from the linguistic point of view. Our statistical contribution is the introduction of the smallest maximizer criterion, a constant-free procedure for model selection. As a by-product, this provides a solution to the problem of optimally choosing the penalty constant when using the BIC to select a variable length Markov chain. Besides proving the consistency of the smallest maximizer criterion as the sample size diverges, we also present a simulation study comparing our approach with both standard BIC selection and the Peres-Shields order estimator. Applied to the linguistic sample constituted for our case study, the smallest maximizer criterion assigns different context-tree models to the two dialects of Portuguese. The features of the selected models are compatible with current conjectures discussed in the linguistic literature. This is joint work with Charlotte Galves, Jesus Garcia, Nancy L. Garcia and Florencia Leonardi. The article is dedicated to Partha Niyogi and Jean-Roger Vergnaud and will appear in an upcoming issue of The Annals of Applied Statistics.
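To make the model class concrete, here is a toy sketch (my own, not the authors' procedure) of the raw statistics underlying variable length Markov chain selection: empirical next-symbol counts for every context up to a maximal depth. Criteria such as BIC or the smallest maximizer criterion then decide which contexts the model should keep.

```python
# Illustrative sketch: empirical next-symbol counts for every context
# up to a maximal depth -- the raw material on which context-tree
# model selection operates. Toy example, not the authors' procedure.

from collections import Counter, defaultdict

def context_counts(seq, max_depth=2):
    counts = defaultdict(Counter)
    for t in range(len(seq)):
        for d in range(max_depth + 1):
            if t - d < 0:
                break
            context = seq[t - d:t]        # the d symbols preceding seq[t]
            counts[context][seq[t]] += 1
    return counts

counts = context_counts("abaabab", max_depth=2)
print(dict(counts[""]))    # marginal symbol counts
print(dict(counts["a"]))   # what follows a single 'a'
print(dict(counts["ab"]))  # what follows the longer context 'ab'
```

A context is pruned when its next-symbol distribution is statistically indistinguishable from that of its suffix; the selected tree of surviving contexts is the "fingerprint" compared across the two dialects.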

Aren Jansen

Title: Not All Frames Are Created Equal: Temporal Sparsity for Robust and Efficient ASR

Abstract: Traditional frame-based speech recognition technologies build sequence models of temporally dense vector time series representations that account for the entirety of the speech signal. However, under non-stationary distortion, the burden of accounting for everything can propagate errors beyond the corrupted frames. I will present an alternative strategy, developed by Partha and myself, where the speech signal is instead (i) transformed into a sparse set of temporal point patterns of the most salient acoustic events and (ii) decoded using explicit models of the temporal statistics of these patterns. Formalized under a point process model framework, the proposed sparse methods exhibit sufficiency for clean speech recognition, provide a new avenue to improve noise robustness, and hold potential for significantly increased computational efficiency over their frame-based counterparts.

Alexey Koloydenko

Title: A risk-based view of the conventional and new types of path inference in HMMs

Abstract: I plan to talk about recent work with my collaborator Jüri Lember concerning path inference in HMMs. We re-examine the two most popular methods of path inference, namely the Viterbi algorithm and the optimal accuracy (BCJR) algorithm. In the early days of digital communication, an opinion emerged that any difference in performance between the two might not be significant for the applications of the time. There is perhaps less ground for this opinion nowadays, as HMM-based applications have become very diverse. In fact, an algorithmic attempt to hybridize the Viterbi and optimal accuracy methods was already contemplated by practitioners a couple of decades ago. We take a more general approach and hybridize the Viterbi and optimal accuracy decoders in a natural risk-based manner, while still staying within the same efficient forward-backward algorithmic framework.
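For concreteness, a sketch of the two decoders being hybridized (the toy model and its numbers are my own): Viterbi maximizes the probability of the whole state path, while the optimal accuracy (BCJR) decoder picks the most probable state at each position via the forward-backward recursions.

```python
# Illustrative sketch: the two classic HMM decoders on a toy 2-state
# model. pi = initial probabilities, A = transition matrix,
# B = per-state emission distributions (all numbers are my own).

def viterbi(obs, pi, A, B):
    """Most probable state path (argmax over whole sequences)."""
    n = len(pi)
    V = [pi[s] * B[s][obs[0]] for s in range(n)]
    back = []
    for o in obs[1:]:
        ptr, nxt = [], []
        for s in range(n):
            p, arg = max((V[r] * A[r][s], r) for r in range(n))
            nxt.append(p * B[s][o])
            ptr.append(arg)
        back.append(ptr)
        V = nxt
    path = [max(range(n), key=lambda s: V[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

def posterior_decode(obs, pi, A, B):
    """Optimal accuracy (BCJR): most probable state at each position."""
    n, T = len(pi), len(obs)
    fwd = [[pi[s] * B[s][obs[0]] for s in range(n)]]
    for o in obs[1:]:
        fwd.append([sum(fwd[-1][r] * A[r][s] for r in range(n)) * B[s][o]
                    for s in range(n)])
    bwd = [[1.0] * n]
    for o in reversed(obs[1:]):
        bwd.append([sum(A[s][r] * B[r][o] * bwd[-1][r] for r in range(n))
                    for s in range(n)])
    bwd.reverse()
    return [max(range(n), key=lambda s: fwd[t][s] * bwd[t][s])
            for t in range(T)]

pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]                       # state transitions
B = [{'x': 0.5, 'y': 0.5}, {'x': 0.1, 'y': 0.9}]   # emissions
obs = ['x', 'y', 'y']
print(viterbi(obs, pi, A, B), posterior_decode(obs, pi, A, B))
```

The risk-based hybrids interpolate between these two extremes: Viterbi minimizes whole-path error, posterior decoding minimizes expected per-position error, and intermediate risks trade the two off within the same forward-backward machinery.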

Vladimir Koltchinskii

Title: Low Rank Matrix Estimation

We will discuss the problem of estimating a large matrix based on a finite number of noisy measurements of random linear functionals of the matrix. The goal is to construct estimators of a target matrix that is either low rank or can be well approximated by low rank matrices. Several approaches to this problem based on penalized least squares with convex complexity penalties that favor low rank solutions will be considered, and error bounds for these methods with explicit dependence on the rank of the target matrix will be discussed.
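A standard estimator in this family uses the nuclear norm as the convex penalty; its key computational primitive is soft-thresholding of singular values. A sketch with toy data (the data, threshold, and setup are my own illustration, not from the talk):

```python
# Illustrative sketch: singular-value soft-thresholding, the proximal
# operator of the nuclear norm and the building block of nuclear-norm
# penalized least squares. Toy data and threshold are my own choices.

import numpy as np

def svt(Y, tau):
    """Soft-threshold the singular values of Y at level tau."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
L = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 20))  # rank-3 target
Y = L + 0.1 * rng.normal(size=(20, 20))                  # noisy observation
Lhat = svt(Y, tau=2.0)
print(np.linalg.matrix_rank(Lhat))  # small singular values are zeroed out
```

Because the penalty is convex, such estimators can be analyzed directly, which is how error bounds with explicit dependence on the rank of the target arise.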

Natalia Komarova

Title: Mathematical modeling of color categorization in humans

All the perceivable colors in the psychophysical space form a 3-dimensional solid. This solid gets split into a finite (and small) number of regions, which are referred to as color terms. There are universal features in color categorization across different cultures/languages, and there are also differences in the number and locations of color terms. I will describe a mathematical framework for reasoning about color categorization in humans, and show how some of the universal features can be explained.

Karen Livescu

Title: Articulatory modeling for automatic speech recognition

The standard approach to speech recognition represents words as concatenations of hidden Markov models, each representing a phonetic unit such as a particular vowel or consonant in a particular context. For several decades the speech research community has also experimented with models of speech articulation, that is, models of the behavior of the lips, tongue, vocal folds, and soft palate during speech production, as an alternative to phone-based models. This work is motivated by the potential for improved pronunciation modeling, language portability, and noise robustness. However, such models have not yet made it into mainstream recognition systems, in part because of their complexity relative to traditional approaches. Recent work on structured prediction and low-resource speech recognition has renewed interest in this approach. This talk will review the current state of articulatory models and will describe one particular model family, using dynamic graphical models over articulatory variables. This approach allows for probabilistic modeling of asynchrony between articulators and of reduction in articulatory gestures, which together account for pronunciation variation.

Partha Mitra

Title: Mouse Brain Architecture Project

Abstract: Fundamental gaps remain in our understanding of brains. This is evident from our limited mechanistic understanding of neuropsychiatric disorders and the difficulty in developing therapies. In particular, we do not have a comprehensive picture of the circuit architecture of brains. A frequently mentioned reason for this gap is the complexity of the circuitry: the astronomical numbers of neurons and synapses are often cited in this context. However, although brains are complex, this complexity is not disorganized: classical neuroanatomical studies demonstrate an intermediate, "mesoscopic" level of organization, as can be seen from atlases that exhibit brain nuclei, layered structures, and organized projection patterns. We have argued for the need and feasibility of determining brain-wide circuit architecture at a mesoscopic scale in multiple species, starting with the mouse. This talk will present some historical and theoretical background, a description of ongoing experimental work, and intermediate results.

Hariharan Narayanan

Title: Testing the Manifold Hypothesis

Abstract: Increasingly, we are confronted with very high dimensional data sets in areas like computational biology and medical imaging. As a result, methods of avoiding the curse of dimensionality have come to the forefront of machine learning research. One approach, which relies on exploiting the geometry of the data, has evolved into a subfield called manifold learning. The underlying hypothesis of this field is that data tend to lie near a low dimensional submanifold, due to constraints that limit the degrees of freedom of the process generating them. This appears to be the case, for example, in speech and video data. Although there are many widely used algorithms which assume this hypothesis, the basic question of testing this hypothesis is poorly understood. We will describe joint work with Charles Fefferman and Sanjoy Mitter on developing a provably correct, efficient algorithm to test this hypothesis from random data.

D. Kimbrough Oller

Title: Automated monitoring of vocal development: The potential for screening and diagnosis of childhood handicaps

Abstract: Partha Niyogi and I worked together on a project showing that all-day audio recordings of infants and young children in their natural environments could be analyzed with automated methods to determine the child's stage of development. Using a model based on research in infant vocal development, we assessed 12 acoustic parameters with no human intervention and determined that we could predict infant age from 2 to 48 months with a high degree of accuracy (predicted age and real age correlated at about 0.8). Further, we showed that children with autism or language delay could be differentiated from typically developing children with high accuracy. The research was a collaboration with the LENA Research Foundation and was published in PNAS in 2010. The presentation for the memorial conference will bring this line of research up to date by reviewing new lines of effort extending the published findings.

Oller, D. K., Niyogi, P., Gray, S., Richards, J. A., Gilkerson, J., Xu, D., Yapanel, U., Warren, S. F. (2010). Automated Vocal Analysis of Naturalistic Recordings from Children with Autism, Language Delay and Typical Development. Proceedings of the National Academy of Sciences, 107(30), 13354-13359. http://www.pnas.org/content/early/2010/07/08/1003882107.full.pdf PMID: 20643944

Tomaso Poggio

Title: The computational magic of the ventral stream: from visual development to group theory, Hebbian learning and wavelets.

Abstract: I cannot think of a better person than Partha to tell me whether the theory I will describe is too elegant to be true...or not. I conjecture that the sample complexity of object recognition is mostly due to geometric image transformations and that a main goal of the ventral stream is to learn and discount image transformations. From this hypothesis and a few reasonable assumptions about neural mechanisms, I develop a theory predicting that the size of the receptive fields determines which transformations are learned during development; that the transformation represented in each area determines the tuning of the neurons in the area; and that class-specific transformations are learned and represented at the top of the ventral stream hierarchy. A surprising implication of these theoretical results is that the computational goals and some of the tuning properties of cells in the ventral stream may follow rather directly from symmetry properties (in the sense of physics) of the visual world through a process of unsupervised correlational learning, based on Hebbian synapses.

Robert Schapire

Title: A Theory of Multiclass Boosting

Abstract: Boosting combines weak classifiers to form highly accurate predictors. Unlike the case of binary classification, in the multiclass setting, a complete theoretical understanding of boosting is relatively lacking. We do not know the "correct" way to define the requirements on the weak classifiers, nor has the notion of optimal boosting been explored for this case. In this talk, we introduce a broad and general framework for studying multiclass boosting that formalizes the interaction between the boosting algorithm and the weak learner. Within this framework, we are able to precisely identify what is meant by correct weak learning conditions that are neither too weak nor too strong, and to judge both new and old conditions relative to this criterion. Further, using a game-theoretic approach, we show how to design optimally efficient boosting algorithms for these weak learning conditions. This is joint work with Indraneel Mukherjee.

Vikas Sindhwani

Title: Learning Vector-valued functions and Data-dependent Kernels for Manifold Regularization

Abstract: In this talk, I will recall the Manifold Regularization framework of Belkin, Niyogi and Sindhwani (JMLR 2006). I will then give a brief overview of recent work on extending manifold regularization to estimate vector-valued functions (ICML 2011, with Ha Quang Minh), and on addressing the problem of learning data-dependent geometry-aware kernels using a non-parametric group-OMP approach (NIPS 2011, with A.C. Lozano).

Amit Singer

Title: Vector Diffusion Maps and the Connection Laplacian

Abstract: Motivated by problems in structural biology, specifically cryo-electron microscopy, we introduce vector diffusion maps (VDM), a new mathematical framework for organizing and analyzing high dimensional data sets, 2D images and 3D shapes. VDM is a mathematical and algorithmic generalization of diffusion maps and other non-linear dimensionality reduction methods, such as LLE, ISOMAP and Laplacian eigenmaps. While existing methods are either directly or indirectly related to the heat kernel for functions over the data, VDM is based on the heat kernel for vector fields. VDM provides tools for organizing complex data sets, embedding them in a low dimensional space and interpolating and regressing vector fields over the data. In particular, it equips the data with a metric, which we refer to as the vector diffusion distance. In the manifold learning setup, where the data set is distributed on (or near) a low dimensional manifold M^d embedded in R^p, we prove the relationship between VDM and the connection-Laplacian operator for vector fields over the manifold. Applications to structural biology (cryo-electron microscopy and NMR spectroscopy), computer vision and shape space analysis will be discussed. (Joint work with Hau-tieng Wu.)
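For orientation, here is a sketch of the scalar diffusion maps construction that VDM generalizes (the toy data and bandwidth are my own): build Gaussian affinities, normalize them into a Markov transition matrix, and embed the data using its leading nontrivial eigenvectors.

```python
# Illustrative sketch of (scalar) diffusion maps, the construction VDM
# generalizes from functions to vector fields over the data.
# Toy data and the bandwidth eps are my own choices.

import numpy as np

def diffusion_map(X, eps, n_coords=2):
    # Gaussian affinities, row-normalized into a Markov transition matrix.
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-D2 / eps)
    P = W / W.sum(axis=1, keepdims=True)
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    # Skip the trivial constant eigenvector; embed with the next ones,
    # scaled by their eigenvalues (one diffusion step).
    idx = order[1:1 + n_coords]
    return vecs.real[:, idx] * vals.real[idx]

# Points on a circle: a 1-dimensional manifold sitting in the plane.
theta = np.linspace(0, 2 * np.pi, 40, endpoint=False)
circle = np.c_[np.cos(theta), np.sin(theta)]
emb = diffusion_map(circle, eps=0.5)
print(emb.shape)
```

In VDM the scalar heat kernel here is replaced by a heat kernel acting on vector fields, so each pairwise affinity carries an orthogonal transformation as well as a weight, yielding the vector diffusion distance.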

Steve Smale

Title: Topology and the Geometry of Manifolds from Sampling

Abstract: We will discuss some recent developments inspired by my relationship with Partha.

Morgan Sonderegger

Title: Combining data and dynamical systems models of language change

Abstract: Partha's 2006 book (The Computational Nature of Language Learning and Evolution) describes a general mathematical framework for modeling language evolution. Given an algorithm by which each language learner in a generation induces a grammar from examples drawn from members of the previous generation, a dynamical system results which describes the evolution of linguistic knowledge in the population. This approach allows one to rigorously reason about the relationship between individual learning and population-level language change. I describe joint work with Partha applying this framework to data from a complex case of language change: stress shift in English noun/verb pairs over the past several centuries. I consider a range of learning algorithms for individuals, motivated by the experimental literature on English stress, and analyze the population dynamics of each resulting dynamical system model. Each model is evaluated by comparing its dynamics to properties observed in the noun/verb data, allowing for adjudication between different possible accounts of the causes of change in this case.
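A minimal instance of the framework (my own toy, not one of the models analyzed in the talk): with two competing variants and learners who adopt whichever variant a majority of their n sampled examples uses, the population-level fraction evolves under a simple iterated map.

```python
# Illustrative sketch (my own toy instance): the framework turns a
# learning algorithm into a population-level dynamical system. Here
# each learner hears n examples from the previous generation and
# adopts variant A iff a majority of them used A.

from math import comb

def next_generation(p, n=5):
    """Fraction using variant A after one generation of majority learners."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

p = 0.6
for _ in range(20):
    p = next_generation(p)
print(round(p, 6))  # majority learning amplifies an initial advantage
```

Even this crude learner yields nontrivial dynamics (fixed points at 0, 1, and an unstable one at 1/2); comparing such dynamics against historical data is what allows adjudication between candidate learning algorithms.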

Shmuel Weinberger

Title: A Topological view of unsupervised learning

Abstract: The title of this talk is taken from Partha Niyogi's posthumously published article with Steve Smale and me. I will discuss the results of that paper, its context, and some subsequent work that has grown out of it.

Bin Yu

Title: Spectral clustering and high-dim stochastic block model for undirected and directed graphs

In recent years, network analysis has become the focus of much research in many fields, including biology, communication studies, economics, information science, organizational studies, and social psychology. Communities, or clusters of highly connected actors, form an essential feature in the structure of several empirical networks. Spectral clustering is a popular and computationally feasible method to discover these communities. The stochastic block model is a social network model with well-defined communities. This talk will give conditions for spectral clustering to correctly estimate the community membership of nearly all nodes. These asymptotic results are the first clustering results that allow the number of clusters in the model to grow with the number of nodes, hence the name high-dimensional. Moreover, I will present ongoing work on directed spectral clustering for networks whose edges are directed, including the Enron data as an example.

Jun Zhang

Title: Laplacian eigenfunctions learn population structure

This talk summarizes previous joint work with Partha on applications of geometric learning to population genetics and some ongoing progress in that direction. Principal components analysis has been used for decades to summarize genetic variation across geographic regions and to infer population migration history. More recently, with the advent of genome-wide association studies of complex traits, it has become a commonly used tool for the detection and correction of confounding due to population structure. However, principal components are generally sensitive to outliers. Motivated by geometric learning, we proposed summarizing genetic variation by Laplacian eigenfunctions. Incorporating an L1 penalty, a sparse version of the approach can effectively identify a small panel of structure-informative markers and outperforms traditional methods at ascertaining ancestry informative markers (AIMs). We validated the method using global HapMap and HGDP datasets.