Abstract
Recent progress in Automatic Image Annotation (AIA) has been achieved by exploiting either low-level visual features or high-level semantic context. Integrating these two paradigms to further improve AIA performance is promising, yet very few previous works have studied the issue in a unified framework. In this paper, we propose a unified model based on Conditional Random Fields (CRF) that establishes tight interaction between visual features and semantic context. In particular, Kernelized Logistic Regression (KLR) with multiple visual distance learning is embedded into the CRF framework. We introduce L1 and L2 regularization terms into the unified learning process, for the distance learning and the parameter penalty respectively. Experiments are conducted on two benchmarks, the Corel and TRECVID-2005 data sets. The results show that, compared with state-of-the-art methods, the unified model achieves significant improvement in annotation performance and is more robust as the number of different visual features increases.