Title: Farsi and Arabic document images lossy compression based on the mixed raster content model
Authors: H. Grailu, M. Lotfizad, Hadi Sadoghi Yazdi
Abstract: Recently, the mixed raster content (MRC) model was proposed for compound document image compression. Most state-of-the-art document image compression methods, such as DjVu, are based on this model, but they have some disadvantages, especially for Farsi and Arabic document images. First, the Farsi/Arabic script has characteristics that can be exploited to further improve compression performance. Second, existing segmentation methods have focused on cleanly separating the textual objects from the background and/or optimizing the rate-distortion trade-off; they have not considered text readability and OCR facility. Third, these methods usually suffer from the undesired jaggy artifact and from misclassification of important textual details. In this paper, an MRC-based document image compression method is proposed that achieves a better rate-distortion trade-off than existing state-of-the-art document compression methods. The proposed method performs better in segmentation, bi-level mask layer compression, OCR facility, and overall compression. It uses a 1D pattern matching technique to compress the mask layer, and a segmentation method that is sufficiently sensitive to small textual objects. Experimental results show that the proposed method achieves considerably higher compression performance than the state-of-the-art compression method DjVu, by a factor of as much as 1.75–2.3.
Keywords: Document image compression · Bi-level textual image compression · Document segmentation · MRC model · OCR facility
@article{paperid:1014030,
author = {H. Grailu and M. Lotfizad and Sadoghi Yazdi, Hadi},
title = {Farsi and Arabic document images lossy compression based on the mixed raster content model},
journal = {International Journal on Document Analysis and Recognition},
year = {2009},
volume = {12},
number = {4},
month = {June},
issn = {1433-2833},
pages = {227--247},
numpages = {21},
keywords = {Document image compression · Bi-level textual image compression · Document segmentation · MRC model · OCR facility},
}
%0 Journal Article
%T Farsi and Arabic document images lossy compression based on the mixed raster content model
%A H. Grailu
%A M. Lotfizad
%A Sadoghi Yazdi, Hadi
%J International Journal on Document Analysis and Recognition
%@ 1433-2833
%D 2009