We study two communication-efficient algorithms for distributed statistical optimization on large-scale data. The first algorithm is an averaging method that distributes the N data samples evenly to m machines, performs separate minimization on each subset, and then averages the estimates. We provide a sharp analysis of this average mixture algorithm, showing that under a reasonable set of conditions, the combined parameter achieves mean-squared error that decays as Whenever this guarantee matches the best possible rate achievable by a centralized algorithm having access to all N samples. The second algorithm is a novel method, based on an appropriate form of the bootstrap. Requiring only a single round of communication, it has mean-squared error that decays as and so is more robust to the amount of parallelization. We complement our theoretical results with experiments on largescale problems from the internet search domain. In particular, we show that our methods efficiently solve an advertisement prediction problem from the Chinese SoSo Search Engine, which consists of samples and dimensions.