ScanPack: a digitization solution that 'peels' document layers

ScanPack's document processing system can recognize text in heavily damaged originals and can easily peel away the layers printed over it, such as seals, signatures ...

Russian developer Cognitive Technologies has announced that it will launch Cognitive ScanPack, a document processing system with special features. The system is intended for traditional jobs such as scanning, processing and compressing documents. However, it includes several technologies that set it apart from today's similar systems.

The key novelty in ScanPack is its image analysis algorithms. Their most significant capabilities are working with complexly structured documents (with several layers, such as signatures and stamps, printed over the text) and working with damaged documents (a background degraded by dirt, creasing or fading). After processing with ScanPack, the document regains "acceptable image quality".

Storing the output in PDF/A format allows the original documents to be compressed by a factor of 4 to 10. The resulting file supports text search. ScanPack automates the digitization process from the scanning stage through to the data compression stage. Cognitive said that these properties make the system particularly suitable for work on job records.

Grigory Lipich, chief executive officer of Abbyy Russia, was not ready to evaluate Cognitive's ScanPack technology, claiming that "similar technologies have been on the market for a long time". Lipich said that his company's products use MRC (Mixed Raster Content) technology, which significantly reduces PDF file size and produces small files that remain searchable across the entire text while preserving the original appearance. The technology is available to developers in Abbyy FineReader Engine and in the document and data capture systems Abbyy FlexiCapture and Abbyy Recognition Server.

Picture 1: Test results comparing Abbyy FineReader and Cognitive ScanPack. Recognized text is marked in purple.

When MRC technology is used, the image goes through a stage called "layer separation" before compression: the elements of the page are divided into three types (text; images such as pictures, diagrams and graphs; and areas filled with a single color), and these layers are then handled independently by compression algorithms. In Abbyy's solutions, ADRT (Adaptive Document Recognition Technology) additionally allows documents with complex formatting to be processed.
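The layer separation idea can be illustrated with a minimal sketch. It is not Abbyy's implementation: it assumes a tiny grayscale page, a hypothetical fixed threshold (`DARK_THRESHOLD`), and a simplified two-way split (text versus everything else) stored as a mask, a foreground and a background layer, which is one common way of describing MRC. The function names are invented for illustration.

```python
# Toy MRC-style layer separation on a grayscale page (values 0..255).
# Assumption: any pixel darker than DARK_THRESHOLD is treated as text.
# Real systems use far more elaborate segmentation than a fixed threshold.
DARK_THRESHOLD = 128  # hypothetical cut-off, chosen for this example only


def separate_layers(page, threshold=DARK_THRESHOLD, paper=255):
    """Split a page into (mask, foreground, background) layers."""
    mask, foreground, background = [], [], []
    for row in page:
        mask_row = [1 if value < threshold else 0 for value in row]
        mask.append(mask_row)
        # Foreground keeps the ink values; other positions are zeroed.
        foreground.append([v if m else 0 for v, m in zip(row, mask_row)])
        # Background keeps everything else; text positions become paper.
        background.append([paper if m else v for v, m in zip(row, mask_row)])
    return mask, foreground, background


def recombine(mask, foreground, background):
    """Rebuild the page: take the foreground wherever the mask is set."""
    return [
        [f if m else b for m, f, b in zip(mask_row, fg_row, bg_row)]
        for mask_row, fg_row, bg_row in zip(mask, foreground, background)
    ]


page = [
    [250, 250, 30, 250],
    [200, 20, 25, 200],
    [250, 250, 250, 250],
]
mask, fg, bg = separate_layers(page)
restored = recombine(mask, fg, bg)
```

Once the layers are separate, each one could be handed to a codec suited to its content, for example a bilevel codec for the mask and a lossy photographic codec for the background.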

Picture 2: The experiment comparing Abbyy FineReader with Cognitive ScanPack. Recognized text is marked in purple.

Vladimir Arlazarov, head of a laboratory at the Moscow Institute of Theoretical and Practical Physics, said that the PDF/A format is in fact already used by many developers in their products and technologies to compress images and store documents. In particular, MRC (Mixed Raster Content) technology is an extension of the approach used in the DjVu format. With MRC, geometric segmentation based on recognition technology is performed: the image is separated into graphic layers (pictures and text), which are compressed with different algorithms.

According to Arlazarov, this approach has a major drawback: if the system cannot recognize an object (text on a picture, a seal or signature over printed text, a poor-quality copy, yellowed books or newspapers), it processes that object as an image, and the document's content cannot be searched there after processing.

Arlazarov explains that Cognitive ScanPack applies both geometric and color segmentation, which splits a document into several layers of information so that the text can still be processed where it is overlapped or overwritten, crossed out, or degraded by copying or stains. Separating a document into independent layers is important in document processing where the paper background itself is significant, as in passport handling.
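A rough sketch of what color-based separation might look like follows. It is not Cognitive's algorithm, which the article does not describe. The thresholds (`dark`, `saturation_gap`), the layer names and both function names are assumptions made up for illustration: strongly colored pixels are treated as an overlay such as a stamp, dark near-gray pixels as text, and everything else as paper.

```python
# Toy color segmentation: assign each RGB pixel to one of three layers.
# All thresholds are hypothetical and tuned only for this example.


def classify_pixel(rgb, dark=100, saturation_gap=60):
    """Label a pixel as 'overlay', 'text' or 'paper' from its color."""
    spread = max(rgb) - min(rgb)  # crude measure of how colorful it is
    if spread >= saturation_gap:
        return "overlay"  # strongly colored ink, e.g. a blue stamp
    if max(rgb) < dark:
        return "text"  # dark and nearly gray: printed text
    return "paper"


def split_by_color(image):
    """Return one grid per layer; pixels outside a layer become None."""
    layers = {"text": [], "overlay": [], "paper": []}
    for row in image:
        labels = [classify_pixel(pixel) for pixel in row]
        for name, grid in layers.items():
            grid.append(
                [p if label == name else None for p, label in zip(row, labels)]
            )
    return layers


PAPER, INK, STAMP = (240, 235, 220), (20, 20, 20), (40, 60, 200)
INK_UNDER_STAMP = (10, 10, 40)  # dark pixel where stamp and text overlap

image = [
    [PAPER, INK, STAMP],
    [STAMP, INK_UNDER_STAMP, PAPER],
]
layers = split_by_color(image)
```

In this toy case the pixel where the stamp crosses the text is still assigned to the text layer, which is the behavior the paragraph above describes; a real document would need far more robust color modeling.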

Also, according to Arlazarov, "the binarization methods ScanPack uses to recover text increase the image quality of the text in the final document compared to the original document". After that, each layer of information is processed more effectively by its own compression algorithm (text is compressed in TIFF format, while images are compressed in JPG format).
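The article does not say which binarization method ScanPack uses. As a stand-in, the sketch below uses Otsu's method, a well-known general-purpose technique that picks the gray level best separating dark ink from light paper. The function names and sample data are assumptions for illustration only.

```python
# Otsu's method as a stand-in binarization step (not ScanPack's own method).


def otsu_threshold(pixels):
    """Return the gray level t that maximizes between-class variance.

    Pixels with value <= t form the dark class, the rest the light class.
    """
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_t, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for t in range(256):
        weight_dark += histogram[t]
        sum_dark += t * histogram[t]
        if weight_dark == 0:
            continue  # no dark pixels yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_variance, best_t = variance, t
    return best_t


def binarize(pixels, threshold):
    """Map each pixel to 1 (ink) if it is at or below the threshold, else 0."""
    return [1 if value <= threshold else 0 for value in pixels]


# Three dark "ink" pixels followed by five light "paper" pixels.
scan = [30, 35, 40, 200, 210, 220, 215, 205]
threshold = otsu_threshold(scan)
bilevel = binarize(scan, threshold)
```

A bilevel text layer of this kind is what makes compact bitonal storage of text practical, while the photographic remainder is left to a lossy codec.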

Nikolai Nikolsky, vice president of marketing at Cognitive Technologies, asserts that ScanPack-based products will not compete directly with Abbyy's solutions. Meanwhile, Vladimir Arlazarov added that ScanPack uses the Cuneiform recognition engine by default, but users can connect Abbyy FineReader instead if they wish.

Interestingly, because ScanPack can identify and separate images of seals and signatures, it inadvertently "abets" the forgery of paper documents. Vladimir Arlazarov admits that once ScanPack-based products reach the mass market, falsifying documents will become easy. However, he also said that anyone proficient with Photoshop can already do the same.

Arlazarov said the developers are trying to reduce the risk of the technology being abused, either by adding recognizable marks to the output documents or by lowering the quality of the reproduced signatures and seals.

Cognitive said that the ScanPack system is currently in use at two insurance companies, "Zurich Insurance" and "Renessans Insurance", and at the Magnitogorsk Metallurgical Plant, and possibly in the armed forces.

Nikolai Nikolsky said that solutions built on the Cognitive ScanPack platform will go on mass sale in 2011. Nikolsky values the Russian market for "document processing systems" at USD 1 billion (VND 20,833 billion). He added that, because no equivalent systems exist, Cognitive ScanPack could capture a significant share of the world market.

Another interesting point is that ScanPack is based mainly on open technologies: the Cuneiform recognition engine, developed by Cognitive and released in 2008 under the free BSD license, and PDF/A, a subset of PDF that has been standardized by ISO. The image recognition and processing components remain under license, according to Cognitive.
