LCCC  :  NIPS 2010 Workshop on
Learning on Cores, Clusters and Clouds


Time Event Speaker
7:00 Poster Setup
7:30 - 8:00 Opening remarks and overview of the field John Langford
8:00 - 9:00 Keynote: Averaging algorithms and distributed optimization John N. Tsitsiklis
9:00 - 9:20 Coffee Break and Poster Session
9:20 - 9:45 Optimal Distributed Online Prediction Using Mini-Batches Lin Xiao
9:45 - 10:10 MapReduce/Bigtable for Distributed Optimization Slav Petrov
10:10 - 10:30 Mini Talks Part I
10:30 - 15:30 Poster Session and Ski Break
14:00 - 15:30 Unofficial Tutorial on Vowpal Wabbit Langford et al.
15:30 - 16:30 Keynote: Machine Learning in the Cloud with GraphLab Carlos Guestrin
16:30 - 16:55 Distributed MAP Inference for Undirected Graphical Models Sameer Singh
16:55 - 17:15 Coffee Break and Poster Session
17:15 - 17:40 Gradient Boosted Decision Trees on Hadoop Jerry Ye
17:40 - 18:00 Mini Talks Part II
18:00 - 18:30 Panel discussion and summary All Speakers
18:30 Last chance to look at posters

Keynote: Averaging algorithms and distributed optimization


John N. Tsitsiklis

In distributed averaging and consensus algorithms, processors exchange and update certain values (or "estimates", or "opinions") by forming a local average with the values of their neighbors. Under suitable conditions, such algorithms converge to consensus (every processor ends up holding the same value) or even average-consensus (consensus is achieved on the average of the initial values held by the processors). Algorithms of this type have been proposed as a subroutine of distributed optimization methods, used to combine the results of different processors while a master algorithm is running. We overview a few applications of averaging algorithms, with a focus on gradient-like optimization methods. We then proceed to highlight some results, old and new, with a focus on convergence rates. We finally discuss some open problems.
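The local-averaging update described in the abstract can be sketched in a few lines. This is an illustrative toy (graph, values, and equal weights are all made up here); with equal weights the iteration converges to consensus, though average-consensus in general requires suitably chosen (doubly stochastic) weights.

```python
def averaging_step(values, neighbors):
    """One synchronous round of local averaging: every processor
    replaces its estimate with the mean of its own value and its
    neighbors' values."""
    new_values = {}
    for i, x in values.items():
        local = [x] + [values[j] for j in neighbors[i]]
        new_values[i] = sum(local) / len(local)
    return new_values

# Toy example: a 3-node line graph 0 -- 1 -- 2
values = {0: 0.0, 1: 3.0, 2: 6.0}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
for _ in range(100):
    values = averaging_step(values, neighbors)
# all three estimates are now (numerically) equal: consensus reached
```

In gradient-like distributed optimization, rounds of this kind are interleaved with local gradient steps so that processors agree on a shared iterate.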

Keynote: Machine Learning in the Cloud with GraphLab

Carlos Guestrin

Exponentially increasing dataset sizes have driven Machine Learning experts to explore using parallel and distributed computing for their research. Furthermore, cloud computing resources such as Amazon EC2 have become increasingly available, providing cheap and scalable platforms for large scale computation. However, due to the complexities involved in distributed design, it can be difficult for ML researchers to take full advantage of cloud resources. Existing high-level parallel abstractions like MapReduce are insufficiently expressive while low-level tools like MPI and Pthreads leave ML experts repeatedly solving the same design challenges.

By targeting common patterns in ML, we developed GraphLab, which compactly expresses asynchronous iterative algorithms with sparse computational dependencies common in ML, while ensuring data consistency and achieving a high degree of parallel performance. We demonstrate the expressiveness of the GraphLab framework by designing and implementing parallel versions of a variety of ML tasks, including learning graphical models with approximate inference, Gibbs sampling, tensor factorization, Co-EM, Lasso and compressed sensing. We show that using GraphLab we can achieve excellent parallel performance on large-scale real-world problems, and demonstrate scalability on Amazon EC2 using up to 256 processors.
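The computational pattern such abstractions target can be sketched framework-agnostically: an update function runs on one vertex at a time, reads only that vertex's neighborhood, and may reschedule neighbors whose values it changed. This is not GraphLab's actual API, only a serial simulation of the pattern; the graph, update rule, and scheduler below are illustrative.

```python
from collections import deque

def run(graph, values, update, schedule):
    """graph: vertex -> neighbor list. update(v, values, nbrs)
    returns (new_value, vertices_to_reschedule)."""
    pending = deque(schedule)
    in_queue = set(pending)
    while pending:
        v = pending.popleft()
        in_queue.discard(v)
        new_val, resched = update(v, values, graph[v])
        values[v] = new_val
        for u in resched:
            if u not in in_queue:   # avoid duplicate scheduling
                pending.append(u)
                in_queue.add(u)
    return values

# Toy update: average the neighbors, rescheduling them on any change
def update(v, values, nbrs):
    new = sum(values[u] for u in nbrs) / len(nbrs) if nbrs else values[v]
    changed = abs(new - values[v]) > 1e-6
    return new, (nbrs if changed else [])

graph = {0: [1], 1: [0, 2], 2: [1]}
values = run(graph, {0: 0.0, 1: 1.0, 2: 2.0}, update, schedule=[0, 1, 2])
```

A real engine runs many such updates concurrently while enforcing a consistency model over overlapping neighborhoods, which is the hard part the framework takes off the user's hands.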

Distributed MAP Inference for Undirected Graphical Models

Sameer Singh, Amar Subramanya, Fernando Pereira, Andrew McCallum

In this work, we distribute MCMC-based MAP inference using the Map-Reduce framework. Variables are assigned randomly to machines, which leaves some factors with neighboring variables on separate machines. Parallel MCMC chains are initiated using proposal distributions that suggest only local changes, so that factors spanning machines are never examined. After a fixed number of samples on each machine, we redistribute the variables among the machines to enable proposals across variables that previously resided on different machines. To demonstrate the distribution strategy on a real-world information extraction application, we model the task of cross-document coreference.
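The partition-sample-reshuffle loop can be simulated serially in a few lines. This is only a structural sketch under stated assumptions: binary variables, a placeholder proposal that flips a local variable (a real implementation would accept or reject moves using the local factor scores), and two simulated "machines".

```python
import random

def random_partition(variables, n_machines, rng):
    """Assign each variable to a random machine."""
    machines = [[] for _ in range(n_machines)]
    for v in variables:
        machines[rng.randrange(n_machines)].append(v)
    return machines

def local_mcmc(local_vars, state, n_samples, rng):
    """Placeholder chain: propose changes only to local variables,
    so factors spanning machines are never touched."""
    for _ in range(n_samples):
        v = rng.choice(local_vars)
        state[v] = 1 - state[v]   # stand-in for a scored proposal
    return state

rng = random.Random(0)
variables = list(range(8))
state = {v: 0 for v in variables}
for _ in range(3):                          # redistribution rounds
    machines = random_partition(variables, n_machines=2, rng=rng)
    for local_vars in machines:             # runs in parallel in practice
        if local_vars:
            state = local_mcmc(local_vars, state, n_samples=5, rng=rng)
```

Reshuffling between rounds is what eventually lets every pair of variables interact, even though each round's proposals are purely machine-local.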

Gradient Boosted Decision Trees on Hadoop

Jerry Ye, Jyh-Herng Chow, Jiang Chen, Zhaohui Zheng

Stochastic Gradient Boosted Decision Trees (GBDT) is one of the most widely used learning algorithms in machine learning today. It is adaptable, easy to interpret, and produces highly accurate models. However, most implementations today are computationally expensive and require all training data to be in main memory. As training data becomes ever larger, there is motivation to parallelize the GBDT algorithm. Parallelizing decision tree training is intuitive, and various approaches have been explored in the existing literature. Stochastic boosting, on the other hand, is inherently a sequential process and has not been applied to distributed decision trees. In this paper, we describe a distributed implementation of GBDT that uses MPI on the Hadoop grid environment, as presented by us at CIKM in 2009.
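Why boosting is sequential while tree fitting parallelizes can be seen in a minimal sketch: each tree must fit the residuals of the ensemble built so far, but fitting a single tree (here a 1-D decision stump, a deliberate simplification) is the data-parallel step. Data and parameters are illustrative.

```python
def fit_stump(xs, residuals):
    """Pick the threshold on a 1-D feature minimizing squared error.
    This per-tree search is the step that can be distributed."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean = sum(left) / len(left) if left else 0.0
        rmean = sum(right) / len(right) if right else 0.0
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    return best[1:]

def gbdt(xs, ys, n_trees=10, lr=0.5):
    """Sequential boosting loop: tree k fits the residuals left by
    trees 1..k-1, so the outer loop cannot be parallelized."""
    preds = [0.0] * len(ys)
    model = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lmean, rmean = fit_stump(xs, residuals)
        model.append((t, lmean, rmean))
        preds = [p + lr * (lmean if x <= t else rmean)
                 for x, p in zip(xs, preds)]
    return model, preds

model, preds = gbdt(xs=[0.0, 1.0, 2.0, 3.0], ys=[0.0, 0.0, 1.0, 1.0])
```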

MapReduce/Bigtable for Distributed Optimization

Keith Hall, Scott Gilpin, Gideon Mann (presented by Slav Petrov)

For large data it can be very time consuming to run gradient-based optimization, for example to minimize the log-likelihood for maximum entropy models. Distributed methods are therefore appealing, and a number of distributed gradient optimization strategies have been proposed, including: distributed gradient, asynchronous updates, and iterative parameter mixtures. In this paper, we evaluate these various strategies with regard to their accuracy and speed over MapReduce/Bigtable and discuss the techniques needed for high performance.
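One of the strategies compared, iterative parameter mixing, can be sketched as a map/reduce loop: each worker trains on its data shard from the current mixed weights, and the per-worker weights are averaged to seed the next epoch. The 1-D linear model, data, and learning rate below are illustrative stand-ins, not the paper's setup.

```python
def sgd_epoch(w, shard, lr=0.1):
    """One pass of gradient descent on a shard of (x, y) pairs,
    minimizing squared error of the linear model y ~ w * x."""
    for x, y in shard:
        grad = 2 * (w * x - y) * x
        w -= lr * grad
    return w

def iterative_parameter_mixing(shards, n_epochs=20):
    w = 0.0
    for _ in range(n_epochs):
        # "map": each worker trains from the current mixed weights
        local = [sgd_epoch(w, shard) for shard in shards]
        # "reduce": mix (average) the per-worker weights
        w = sum(local) / len(local)
    return w

# Two shards drawn from y = 2x
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (0.5, 1.0)]]
w = iterative_parameter_mixing(shards)
```

The alternatives differ mainly in when mixing happens: distributed gradient averages gradients every step, while asynchronous updates drop the synchronization barrier entirely.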

Optimal Distributed Online Prediction using Mini-Batches

Ofer Dekel, Ran Gilad-Bachrach, Ohad Shamir, Lin Xiao

Online prediction methods are typically presented as serial algorithms running on a single processor. However, in the age of web-scale prediction problems, it is increasingly common to encounter situations where a single processor cannot keep up with the high rate at which inputs arrive. In this work we present the distributed mini-batch algorithm, a method of converting any serial gradient-based online prediction algorithm into a distributed algorithm that scales nicely to multiple cores, clusters, and grids. We prove a regret bound for this method that is asymptotically optimal for smooth convex loss functions and stochastic inputs. Moreover, our regret analysis explicitly takes into account communication latencies that occur on the network. Our method can also be adapted to optimally solve the closely related distributed stochastic optimization problem.
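The core conversion can be sketched as follows: each worker accumulates gradients over its share of a mini-batch, the partial sums are aggregated, and a single update is applied as if the whole batch had been processed serially. The squared loss, data stream, and learning rate are illustrative; a faithful implementation would also pipeline batches to hide the communication latency the regret analysis accounts for.

```python
def gradient(w, x, y):
    """Gradient of the squared loss (w*x - y)**2 at one example."""
    return 2 * (w * x - y) * x

def distributed_minibatch_step(w, batches, lr):
    # Each "worker" computes a partial gradient sum over its inputs
    partials = [sum(gradient(w, x, y) for x, y in b) for b in batches]
    n = sum(len(b) for b in batches)
    # Aggregate, then apply one serial-equivalent update on the mean gradient
    return w - lr * sum(partials) / n

w = 0.0
stream = [(1.0, 3.0), (2.0, 6.0), (0.5, 1.5), (1.5, 4.5)]  # y = 3x
for _ in range(200):
    # two workers, each handling half of each mini-batch
    w = distributed_minibatch_step(w, [stream[:2], stream[2:]], lr=0.1)
```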

Mini Talks I

  • Building Heterogeneous Platforms for End-to-end Online Learning Based on Dataflow Computing Design. Benoit Corda, Clement Farabet, Marco Scoffier, Yann LeCun.

  • A Convenient Framework for Efficient Parallel Multipass Algorithms. Markus Weimer, Sriram Rao, Martin Zinkevich.

  • Parallel Online Learning. Daniel Hsu, Nikos Karampatziakis, John Langford.

  • The Learning Behind the Gmail Priority Inbox. Douglas Aberdeen, Ondrej Pacovsky, Andrew Slater.

Mini Talks II

  • Learning to Rank on a Cluster using Boosted Decision Trees. Christopher J. C. Burges, Krysta M. Svore.

  • Parallel Splash Gibbs Sampling. Joseph E. Gonzalez, Yucheng Low, Arthur Gretton, Carlos Guestrin.

  • Distributed Markov chain Monte Carlo. Lawrence Murray.

  • All-Pairs Nearest Neighbor Search on Manycore Systems. Lawrence Cayton.