Extraction Of Layout Entities And Sub-layout Query-based Retrieval Of Document Images

Anukriti Bansal, Sumantra Dutta Roy, Gaurav Harit. arXiv 2016

ARXIV Graph

Layouts and sub-layouts are important clues when searching for a document on the basis of its structure, or when its textual content is unknown or irrelevant. A sub-layout specifies the arrangement of document entities within a smaller portion of the document. We propose an efficient graph-based matching algorithm, integrated with hash-based indexing, to prune a possibly large search space. A user can specify a combination of sub-layouts of interest using sketch-based queries. The system supports partial matching for unspecified layout entities. We handle segmentation pre-processing errors (for text/non-text blocks) with a symmetry maximization-based strategy and by accounting for multiple plausible, domain-specific segmentation hypotheses. We show promising results for our system on a database of unstructured entities containing 4776 newspaper images.
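
The abstract does not spell out the index structure, so the following is only a minimal sketch of how hash-based indexing could prune candidates before any graph matching runs. It is not the authors' algorithm. Everything in it is an assumption made for illustration: the entity labels ("text", "image"), the four coarse spatial relations, the (label, relation, label) key format, and the helper names `relation`, `layout_keys`, `build_index` and `candidates`.

```python
from collections import defaultdict
from itertools import permutations


def relation(a, b):
    """Coarse position of box a relative to box b.

    Boxes are (x0, y0, x1, y1) in image coordinates (y grows downward).
    The four relation names are illustrative, not taken from the paper.
    """
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    dx, dy = bx - ax, by - ay
    if abs(dx) >= abs(dy):
        return "left_of" if dx > 0 else "right_of"
    return "above" if dy > 0 else "below"


def layout_keys(entities):
    """Hashable (label_a, relation, label_b) keys for every ordered entity pair."""
    return {
        (la, relation(ba, bb), lb)
        for (la, ba), (lb, bb) in permutations(entities, 2)
    }


def build_index(docs):
    """Inverted index: relation key -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, entities in docs.items():
        for key in layout_keys(entities):
            index[key].add(doc_id)
    return index


def candidates(index, query):
    """Documents that contain every relation key of the query sub-layout.

    This only prunes the search space; a real system would still verify
    the surviving documents with a graph matching step.
    """
    keys = layout_keys(query)
    if not keys:
        return set()
    return set.intersection(*(index.get(k, set()) for k in keys))


# Hypothetical toy database: entity label plus bounding box per page.
docs = {
    "page_a": [("text", (0, 0, 100, 50)), ("image", (0, 60, 100, 200))],
    "page_b": [("image", (0, 0, 100, 140)), ("text", (0, 150, 100, 200))],
    "page_c": [("text", (0, 0, 40, 100)), ("text", (60, 0, 100, 100))],
}

# Sketch-style query: a text block placed above an image block.
query = [("text", (0, 0, 10, 10)), ("image", (0, 20, 10, 30))]

print(candidates(build_index(docs), query))  # {'page_a'}
```

Only `page_a` survives, because it is the only page whose text block sits above an image block. Requiring every query key to be present is the strictest form of pruning. Supporting the partial matches mentioned in the abstract would need a looser rule, for example ranking documents by the number of shared keys.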
