MLbase: A Distributed Machine-learning System.
03 Nov 2015Takeaways from MLbase: A Distributed Machine-learning System, Kraska et al, CIDR, 2013.
MLbase is a system to make the use and development of distributed ML algorithms more accessible to non-experts.
Existing systems lack the ability to free the user from specifics of ML algorithmic choices and automatic optimization for distributed execution. MLbase is the first system trying to address this and makes the following contributions: A declarative language to specify machine learning tasks. A optimizer to select ML algorithms and associated parameters based on a cost-based model. A framework for continuous refinement and re-optimization of the cost-based model. A distributed runtime optimized for the data-access patterns of machine learning.
Architecture.
- The system consists of a master and several worker nodes.
- The master accepts, parses and materializes the incoming request as executable ML operations which are distributed among the worker nodes.
- The worker nodes execute these ML operations through the MLbase runtime.
Workflow.
- A user issues requests using the MLbase declarative task language to the MLbase master.
- The request is parsed into a Logical Learning Plan (LLP).
- Many operations are mapped 1-to-1 to LLP operators (as data-loading) whereas ML functions are expanded to best-practice workflows.
- The LLP specifies the search-space with respect to the parameters for the algorithms, the data-subsamplings to evaluate, and the cross-validations for quality tests.
- The LLP is transformed to a Physical Learning Plan (PLP) by the Optimizer.
- The Optimizer narrows down the complete search-space to give concrete parameters and data-subsampling strategies in the PLP. These choices are driven by several factors as accuracy of the algorithms, statistics about the datasets, data layouts, algorithmic speed and parallel execution strategies.
- It also estimates the execution time and the algorithmic quality based on several heuristics.
- PLP is executed to yield a learned model (fn-model) and a summary of the quality assessment of the learning process (model lineage).
- The PLP composes several data-centric primitives for machine learning tasks.
- The master distributes these primitives to the workers, monitors progress and handles node failures.
- The choice of these primitives is driven by the contract that the ML algorithm has with the runtime. The contract specifies the computational guarantees and the consistency properties requested (or not requested).
- Additional exploration in the background to improve fn-model.
Extensibility Framework.
ML experts can develop new ML algorithms in the MLbase system subject to the following constraints.
- The new algorithms need to be specified in MLbase primitives.
- The algorithm must specify a contract with regards to the algorithm type, parameters, time-complexity and consistency constraints.
A few of the above system-specifics are explained via examples and a lot of them are abstractly envisioned.