Global Analytics in the Face of Bandwidth and Regulatory Constraints
11 Nov 2015
Takeaways from Global Analytics in the Face of Bandwidth and Regulatory Constraints, Vulimiri et al., NSDI 2015.
- Existing Approach for Global Analytics: Transfer data to a central data center.
- Transfers significant data volumes.
- Limited cross-border network bandwidth caps the number of applications that can be supported.
- Network capacity growth is decelerating.
- Regulatory constraints on data movement.
Problem Geode tries to solve:
Provide wide-area analytics over geo-distributed data structured as SQL tables while minimizing cross-data center bandwidth usage.
Resources within a single data center are relatively cheap compared to cross-data center bandwidth.
- Central Command Layer.
- Thin proxy layer atop local analytics stack.
- Central Workload Optimizer.
- Subquery Deltas.
- Cache query results at the source and destination with the query’s signature.
- Instead of sending fresh results, send only a diff (delta) between the new and old results.
- Cache results on a sub-query basis.
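The subquery delta idea above can be sketched roughly as follows. This is a minimal illustration and not Geode's implementation: the names `query_signature`, `Source`, and `Destination` are hypothetical, the signature is assumed to be a hash of the whitespace-normalized query text, and results are assumed to be sets of distinct, hashable rows.

```python
import hashlib


def query_signature(sql):
    # Assumption: the signature is a hash of the whitespace-normalized query text.
    normalized = " ".join(sql.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def compute_delta(old_rows, new_rows):
    # Diff between the cached result and the fresh result (set semantics assumed).
    old, new = set(old_rows), set(new_rows)
    return {"added": sorted(new - old), "removed": sorted(old - new)}


def apply_delta(old_rows, delta):
    # Rebuild the fresh result from the cached result plus the delta.
    rows = set(old_rows)
    rows -= set(delta["removed"])
    rows |= set(delta["added"])
    return sorted(rows)


class Source:
    """Data center that runs the subquery and ships only the delta."""

    def __init__(self):
        self.cache = {}  # signature -> last result sent

    def answer(self, sql, fresh_rows):
        sig = query_signature(sql)
        delta = compute_delta(self.cache.get(sig, []), fresh_rows)
        self.cache[sig] = list(fresh_rows)
        return sig, delta


class Destination:
    """Data center that receives deltas and reconstructs full results."""

    def __init__(self):
        self.cache = {}  # signature -> last reconstructed result

    def receive(self, sig, delta):
        self.cache[sig] = apply_delta(self.cache.get(sig, []), delta)
        return self.cache[sig]
```

On the first run the delta carries the whole result. On later runs only changed rows cross the wide-area link, which is where the bandwidth saving would come from when results change slowly.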
- Workload Optimizer.
- Identify optimal centralized plan.
- Use default Calcite implementation.
- Use rough SQL table statistics from Hive to select an optimized join algorithm.
- Use pseudo-distributed measurement to estimate data movement.
- Simulate a virtual topology in which each base data partition is a separate datacenter, even if the data is centralized.
- Achieved by rewriting SQL queries to simulate data partitions.
- Evaluate alternative data partitionings to estimate the optimal one. To reduce the search space, evaluate only the worst case (the worst data partitioning).
- Combine all measurements to jointly solve site selection and data replication problem.
- Formulate an ILP to jointly solve both problems to minimize total bandwidth cost.
- Because the ILP has scalability limits, use a greedy approach as an alternative.
- Solve the site selection problem in isolation based on sovereignty constraints.
- Greedily pick the datacenter to copy input data needed per-task based on lowest cost.
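A rough sketch of the greedy alternative, under stated assumptions: each task lists its inputs, each input has a current location, a size, and a set of data centers where sovereignty rules allow it to reside, and the cost of a placement is the number of bytes copied in. The function name `greedy_site_selection` and all parameter names are hypothetical and do not come from the paper.

```python
def greedy_site_selection(tasks, input_location, input_size, allowed_sites, datacenters):
    """Pick, per task, the cheapest data center that satisfies sovereignty rules.

    tasks:          task name -> list of input names
    input_location: input name -> data center currently holding it
    input_size:     input name -> size in bytes (assumed cost metric)
    allowed_sites:  input name -> set of data centers where it may reside
    datacenters:    candidate data centers, in tie-breaking order
    """
    placement = {}
    for task, inputs in tasks.items():
        best_site, best_cost = None, None
        for dc in datacenters:
            # Sovereignty check: every input must be allowed to reside at dc.
            if any(dc not in allowed_sites[i] for i in inputs):
                continue
            # Cost: bytes that must be copied to dc from elsewhere.
            cost = sum(input_size[i] for i in inputs if input_location[i] != dc)
            if best_cost is None or cost < best_cost:
                best_site, best_cost = dc, cost
        if best_site is None:
            raise ValueError("no data center satisfies the constraints for " + task)
        placement[task] = (best_site, best_cost)
    return placement
```

Each task is decided independently, so this can miss placements that a joint ILP would find. That is the trade-off accepted in exchange for scalability.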