
NIST Structured Forms Reference Set of Binary Images (SFRS) II image data

2019-12-18

The second NIST database of structured forms consists of 5,595 pages of binary, black-and-white images of synthesized documents containing hand-print.

The documents in this database are 12 different tax forms from the IRS 1040 Package X for the year 1988. These include Forms 1040, 2106, 2441, 4562, and 6251 together with Schedules A, B, C, D, E, F, and SE. Eight of these forms contain two pages or form faces; therefore, there are 20 different form faces represented in the database.

The document images in this database appear to be real hand-printed forms prepared by individuals, but the images have been automatically derived and synthesized using a computer and contain no "real" tax data. There are 900 simulated tax submissions represented in the database averaging 6.22 form faces per submission.
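The counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check on the published figures; the variable names are illustrative and not part of the database.

```python
# Figures quoted in the description above; names are illustrative only.
FORMS = ["1040", "2106", "2441", "4562", "6251"]
SCHEDULES = ["A", "B", "C", "D", "E", "F", "SE"]
TWO_PAGE_FORMS = 8      # forms that have two pages (form faces)
NUM_IMAGES = 5595       # form-face images in the database
NUM_SUBMISSIONS = 900   # simulated tax submissions

num_forms = len(FORMS) + len(SCHEDULES)
single_page_forms = num_forms - TWO_PAGE_FORMS
form_faces = single_page_forms + 2 * TWO_PAGE_FORMS
avg_faces = NUM_IMAGES / NUM_SUBMISSIONS

print(num_forms)            # 12
print(form_faces)           # 20
print(round(avg_faces, 2))  # 6.22
```

Four single-page forms plus eight two-page forms give the 20 form faces, and 5,595 images over 900 submissions gives the stated average of about 6.22 faces per submission.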



A representative image file of a completed form in NIST Special Database 6


The database has the following features:

  • 900 simulated tax submissions

  • 5,595 images of completed structured form faces containing hand-printed data

  • 5,595 text files containing entry field answers

  • 20 tables of entry field types and contexts


Suitable for both document processing and automated data capture research, development and evaluation, the database can be used for:

  • forms identification

  • field isolation: locating entry fields on the form

  • character segmentation: separating entry field values into characters

  • character recognition: identifying specific hand-printed characters
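To make the character segmentation task concrete, here is a minimal sketch of one generic approach, a vertical projection profile, applied to a made-up binary field image. This is not a method prescribed by the database or by NIST; the function name `segment_columns` and the toy data are assumptions for illustration, and the approach does not separate touching characters.

```python
def segment_columns(field):
    """Split a binary field image (rows of 0/1, 1 = ink) into column spans.

    Returns a list of (start, end) column indices, end exclusive, one per
    run of consecutive columns that contain at least one ink pixel.
    """
    if not field:
        return []
    width = len(field[0])
    # Column-wise ink counts: the vertical projection profile.
    profile = [sum(row[x] for row in field) for x in range(width)]
    spans, start = [], None
    for x, ink in enumerate(profile):
        if ink and start is None:
            start = x                 # a run of inked columns begins
        elif not ink and start is not None:
            spans.append((start, x))  # the run ends at a blank column
            start = None
    if start is not None:
        spans.append((start, width))  # run reaches the right edge
    return spans


# Hypothetical 3x9 field containing three separated glyph-like blobs.
toy_field = [
    [0, 1, 1, 0, 0, 1, 0, 1, 1],
    [0, 1, 0, 0, 0, 1, 0, 0, 1],
    [0, 1, 1, 0, 0, 1, 0, 1, 1],
]
print(segment_columns(toy_field))  # [(1, 3), (5, 6), (7, 9)]
```

Each returned span could then be cropped and passed to a character recognizer, and the results compared against the entry field answers supplied with the database.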


The database is a valuable tool for measurement of system performance and system comparison on complex forms.

