We consider the problem of testing whether two unequal-sized samples were drawn from identical distributions, versus distributions that differ significantly. Specifically, given a target error parameter ε > 0, m1 independent draws from an unknown distribution p with discrete support, and m2 draws from an unknown distribution q with discrete support, we describe a test for distinguishing the case that p = q from the case that ||p − q||_1 ≥ ε. If p and q are supported on at most n elements, then our test succeeds with high probability provided m1 ≥ n^{2/3}/ε^{4/3} and m2 = Ω(max{n/(√m1 · ε²), √n/ε²}). We show that this tradeoff is information-theoretically optimal throughout this range, in the dependencies on all parameters, to constant factors for worst-case distributions. As a consequence, we obtain an algorithm for estimating the mixing time of a Markov chain on n states, up to a log n factor, that uses Õ(n^{3/2} τ_mix) queries to a "next node" oracle. The core of our testing algorithm is a relatively simple statistic that seems to perform well in practice, on both synthetic and natural language data. We believe that this statistic may prove to be a useful primitive within larger machine learning and natural language processing systems.
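The abstract does not spell out the statistic itself. As an illustration only, the sketch below implements a chi-square-style closeness statistic of the kind used in this line of work (patterned on the equal-sample statistic of Chan et al., rescaled for unequal sample sizes m1 and m2); the exact form of the paper's statistic, and the function name `closeness_stat`, are assumptions, not taken from the text above.

```python
import numpy as np

def closeness_stat(x, y, m1, m2):
    """Chi-square-style closeness statistic on per-symbol counts.

    x[i] and y[i] are the occurrence counts of symbol i in the
    m1-sample from p and the m2-sample from q, respectively.
    Hypothetical form: the counts from each sample are rescaled by
    the other sample's size so that the numerator has mean zero,
    symbol by symbol, when p = q. Symbols with x[i] + y[i] == 0
    contribute nothing to the sum.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Squared rescaled discrepancy, minus a term that cancels its
    # expectation under p = q (Poissonized sampling heuristic).
    num = (m2 * x - m1 * y) ** 2 - (m2 ** 2 * x + m1 ** 2 * y)
    den = x + y
    mask = den > 0  # skip unseen symbols to avoid division by zero
    return float(np.sum(num[mask] / den[mask]))

# Usage: compute the statistic from the two count vectors and reject
# the hypothesis p = q when it exceeds a threshold calibrated to ε,
# n, m1, and m2.
z = closeness_stat([3, 1], [2, 2], m1=4, m2=4)
```

A statistic of this shape is cheap to compute in one pass over the counts, which is consistent with the abstract's claim that the core primitive is simple enough to embed in larger systems.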