Overview
In the current era of web-scale datasets, high-throughput biology and
astrophysics, and multilingual machine translation, modern datasets
no longer fit on a single computer and traditional machine learning
algorithms often have prohibitively long running times. Parallelized
and distributed machine learning is no longer a luxury; it has become
a necessity. Moreover, industry leaders have already declared that
clouds are the future of computing, and new computing platforms such
as Microsoft's Azure and Amazon's EC2 are bringing distributed
computing to the masses. The machine learning community has been slow
to react to these important trends in computing, and it is time for us
to step up to the challenge.
While some parallel and distributed machine learning algorithms
already exist, many relevant issues are yet to be
addressed. Distributed learning algorithms should be robust to node
failures and network latencies, and they should be able to exploit the
power of asynchronous updates. Some of these issues have been tackled
in other fields where distributed computation is more mature, such as
convex optimization and numerical linear algebra, and we can learn
from their successes and their failures.
The goals of our workshop are:
- To draw the attention of machine learning researchers to this rich
and emerging area of problems and to establish a community of
researchers interested in distributed learning.
- To define a number of common problems for distributed learning
(online/batch, synchronous/asynchronous, cloud/cluster/multicore) and
to encourage future research that is comparable and compatible.
- To expose the learning community to relevant work in fields such as
distributed optimization and distributed linear algebra.
- To identify research problems that are unique to distributed
learning.