Abstract
We present C YCLADES, a general framework for parallelizing stochastic optimization algorithms in a shared memory setting. C YCLADES is asynchronous during model updates, and requires no memory locking mechanisms, similar to H OG WILD !-type algorithms. Unlike H OGWILD !, C YCLADES introduces no conflicts during parallel execution, and offers a black-box analysis for provable speedups across a large family of algorithms. Due to its inherent cache locality and conflictfree nature, our multi-core implementation of C YCLADES consistently outperforms H OGWILD !-type algorithms on sufficiently sparse datasets, leading to up to 40% speedup gains compared to H OGWILD !, and up to 5 gains over asynchronous implementations of variance reduction algorithms.