Model-based Iterative Restoration for Binary Document Image Compression with Dictionary Learning
Abstract
The inherent noise in an observed (e.g., scanned) binary document image degrades image quality and harms the compression ratio by breaking pattern repetition and adding entropy to the image. In this paper, we design a cost function in a Bayesian framework with dictionary learning. Minimizing this cost function produces two outputs: a restored image whose quality is better than that of the observed noisy image, and a dictionary for representing and encoding the image. After restoration, we use this dictionary (obtained from the same cost function) to encode the restored image, following the symbol-dictionary framework of the JBIG2 standard in lossless mode. Experimental results on a variety of document images demonstrate that our method improves image quality relative to the observed image and simultaneously improves the compression ratio. For test images with synthetic noise, our method reduces the number of flipped pixels by 48.2% and improves the compression ratio by 36.36% compared with the best of the competing encoding methods. For test images with real noise, our method visibly improves image quality and outperforms the state-of-the-art method by 28.27% in terms of compression ratio.
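The abstract's central claim is that noise breaks pattern repetition, which inflates the symbol dictionary and hurts compression. The sketch below shows that effect on a toy page and computes the two reported metrics. It is only an illustration under stated assumptions, not the paper's cost function or encoder. `dictionary_size` models a deliberately simplified, hypothetical coder that stores one entry per distinct bitmap; real JBIG2 encoders are more elaborate. "Flipped pixels" is taken to be the pixel-wise mismatch count against the clean reference. The percentages are computed as relative changes, with compression ratio assumed to be uncompressed size divided by compressed size, since the abstract defines neither. All function names are hypothetical.

```python
"""Toy sketch, NOT the paper's algorithm: how pixel noise breaks pattern
repetition for an exact-match symbol dictionary, and how the two reported
metrics (flipped pixels, relative compression-ratio change) are computed."""
import random

# A binary glyph bitmap: a tuple of rows, each row a tuple of 0/1 pixels.
Glyph = tuple[tuple[int, ...], ...]


def flip_pixels(glyph: Glyph, prob: float, rng: random.Random) -> Glyph:
    """Synthetic noise: flip each pixel independently with probability `prob`."""
    return tuple(tuple(p ^ int(rng.random() < prob) for p in row) for row in glyph)


def dictionary_size(symbols: list[Glyph]) -> int:
    """Entries needed by a hypothetical coder that reuses only exact matches."""
    return len(set(symbols))


def flipped_pixels(reference: list[Glyph], image: list[Glyph]) -> int:
    """Count pixels that differ between two sequences of equally sized glyphs."""
    return sum(
        p_ref != p_img
        for g_ref, g_img in zip(reference, image)
        for r_ref, r_img in zip(g_ref, g_img)
        for p_ref, p_img in zip(r_ref, r_img)
    )


def percent_reduction(baseline: float, new: float) -> float:
    """Relative decrease (e.g. of the flipped-pixel count), in percent."""
    return 100.0 * (baseline - new) / baseline


def percent_improvement(baseline: float, new: float) -> float:
    """Relative increase (e.g. of the compression ratio), in percent."""
    return 100.0 * (new - baseline) / baseline


if __name__ == "__main__":
    rng = random.Random(0)
    glyph_a = ((0, 1, 1, 0), (1, 0, 0, 1), (1, 1, 1, 1), (1, 0, 0, 1))
    glyph_o = ((0, 1, 1, 0), (1, 0, 0, 1), (1, 0, 0, 1), (0, 1, 1, 0))
    # A "page" with heavy pattern repetition: only two distinct glyphs.
    clean = [glyph_a, glyph_o, glyph_a, glyph_a, glyph_o, glyph_a, glyph_o, glyph_a]
    noisy = [flip_pixels(g, 0.1, rng) for g in clean]

    print("dictionary size, clean:", dictionary_size(clean))  # 2
    print("dictionary size, noisy:", dictionary_size(noisy))
    print("flipped pixels vs. clean:", flipped_pixels(clean, noisy))
```

On the clean page, every repeated glyph reuses one of two entries. Once a few pixels flip, near-identical copies become distinct keys and the dictionary grows. This is the loss of repetition that restoration aims to undo before encoding.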