Abstract In [Hodosh et al., 2013], we establish a rankingbased framework for sentence-based image description and retrieval. We introduce a new dataset of images paired with multiple descriptive captions that was specififically designed for these tasks. We also present strong KCCA-based baseline systems for description and search, and perform an in-depth study of evaluation metrics for these two tasks. Our results indicate that automatic evaluation metrics for our ranking-based tasks are more accurate and robust than those proposed for generation-based image description