LCCC : NIPS
2010 Workshop on 
Schedule 
Time  Event  Speaker 

7:00  Poster Setup  
7:30  8:00  Opening remarks and overview of the field  John Langford 
8:00  9:00  Keynote: Averaging algorithms and distributed optimization  John N. Tsitsiklis 
9:00  9:20  Coffee Break and Poster Session  
9:20  9:45  Optimal Distributed Online Prediction Using MiniBatches  Lin Xiao 
9:45  10:10  MapReduce/Bigtable for Distributed Optimization  Slav Petrov 
10:10  10:30  Mini Talks Part I  
10:30  15:30  Poster Session and Ski Break  
14:00  15:30  Unofficial Tutorial on Vowpal Wabbit  Langford et al. 
15:30  16:30  Keynote: Machine Learning in the Cloud with GraphLab  Carlos Guestrin 
16:30  16:55  Distributed MAP Inference for Undirected Graphical Models  Sameer Singh 
16:55  17:15  Coffee Break and Poster Session  
17:15  17:40  Gradient Boosted Decision Trees on Hadoop  Jerry Ye 
17:40  18:00  Mini Talks Part II  
18:00  18:30  Panel discussion and summary  All Speakers 
18:30  Last chance to look at posters 
Keynote: Averaging algorithms and distributed optimizationSlidesJohn N. Tsitsiklis In distributed averaging and consensus algorithms, processors exchange and update certain values (or "estimates", or "opinions") by forming a local average with the values of their neighbors. Under suitable conditions, such algorithms converge to consensus (every processor ends up holding the same value) or even averageconsensus (consensus is achieved on the average of the initial values held by the processors). Algorithms of this type have been proposed as a subroutine of distributed optimization methods, used to combine the results of different processors while a master algorithm is running. We overview a few applications of averaging algorithms, with a focus on gradientlike optimization methods. We then proceed to highlight some results, old and new, with a focus on convergence rates. We finally discuss some open problems. 
Keynote: Machine Learning in the Cloud with GraphLabCarlos GuestrinExponentially increasing dataset sizes have driven Machine Learning experts to explore using parallel and distributed computing for their research. Furthermore, cloud computing resources such as Amazon EC2 have become increasingly available, providing cheap and scalable platforms for large scale computation. However, due to the complexities involved in distributed design, it can be difficult for ML researchers to take full advantage of cloud resources. Existing highlevel parallel abstractions like MapReduce are insufficiently expressive while lowlevel tools like MPI and Pthreads leave ML experts repeatedly solving the same design challenges. By targeting common patterns in ML, we developed GraphLab, which compactly expresses asynchronous iterative algorithms with sparse computational dependencies common in ML, while ensuring data consistency and achieving a high degree of parallel performance. We demonstrate the expressiveness of the GraphLab framework by designing and implementing parallel versions for a variety of ML tasks, including learning graphical models with approximate inference, Gibbs sampling, tensor factorization, CoEM, Lasso and Compressed Sensing. We show that using GraphLab we can achieve excellent parallel performance on largescale realworld problems and demonstrate their scalability on Amazon EC2, using up to 256 processors. 
Distributed MAP Inference for Undirected Graphical ModelsSameer Singh, Amar Subramanya, Fernando Pereira, Andrew McCallumIn this work, we distribute the MCMCbased MAP inference using the MapReduce framework. The variables are assigned randomly to machines, which leads to some factors that neighbor vari ables on separate machines. Parallel MCMCchains are initiated using proposal distributions that only suggest local changes such that factors that lie across machines are not examined. After a fixed number of samples on each machine, we redistribute the variables amongst the machines to enable proposals across variables that were on different machines. To demonstrate the distribution strategy on a realworld information extraction application, we model the task of crossdocument coreference. 
Gradient Boosted Decision Trees on HadoopJerry Ye, JyhHerng Chow, Jiang Chen, Zhaohui ZhengStochastic Gradient Boosted Decision Trees (GBDT) is one of the most widely used learning algorithms in machine learning today. It is adaptable, easy to interpret, and produces highly accurate models. However, most implementations today are computationally expensive and require all training data to be in main memory. As training data becomes ever larger, there is motivation for us to parallelize the GBDT algorithm. Parallelizing decision tree training is intuitive and various approaches have been explored in existing literature. Stochastic boosting on the other hand is inherently a sequential process and have not been applied to distributed decision trees. In this paper, we describe a distributed implementation of GBDT that utilizes MPI on the Hadoop grid environment as presented by us at CIKM in 2009. 
MapReduce/Bigtable for Distributed OptimizationKeith Hall, Scott Gilpin, Gideon Mann (presented by Slav Petrov)For large data it can be very time consuming to run gradient based optimizat ion,for example to minimize the loglikelihood for maximum entropy models.Distributed methods are therefore appealing and a number of distributed gradientoptimization strategies have been proposed including: distributed gradient, asynchronousupdates, and iterative parameter mixtures. In this paper, we evaluatethese various strategies with regards to their accuracy and speed over MapReduce/Bigtable and discuss the techniques needed for high performance. 
Optimal Distributed Online Prediction using MiniBatchesOfer Dekel, Ran GiladBachrach, Ohad Shamir, Lin XiaoOnline prediction methods are typically presented as serial algorithms running on a single processor. However, in the age of webscale prediction problems, it is increasingly common to encounter situations where a single processor cannot keep up with the high rate at which inputs arrive. In this work we present the distributed minibatch algorithm, a method of converting any serial gradientbased online pre diction algorithm into a distributed algorithm that scales nicely to multiple cores, clusters, and grids. We prove a regret bound for this method that is asymptotically optimal for smooth convex loss functions and stochastic inputs. Moreover, our regret analysis explicitly takes into account communication latencies that occur on the network. Our method can also be adapted to optimally solve the closely related distributed stochastic optimization problem. 
Mini Talks I

Mini Talks II

Posters

Organizers
