Network Distribution of Scanned Documents

Wednesday, September 26, 2001 - 17:30
TH 331
Radovan Krtolica, San Francisco State University

This talk relates experience from three software development projects for handling documents in small offices within a client-server architecture. All three projects focus on storing and distributing documents scanned from paper over a LAN. In each project, the document image database physically resides on a server connected to one or more office scanners. Client PCs retrieve documents from the database by their content. Content-based retrieval is augmented by conventional (manual) indexing, which speeds up retrieval in certain cases.
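As a rough illustration of combining content-based retrieval with manual indexing, the following minimal sketch keeps a word index built from recognized text alongside a separate index of hand-assigned tags. The class `DocumentIndex`, its methods, and the tag names are hypothetical and chosen only for illustration; the talk does not describe how the actual products implement retrieval.

```python
from collections import defaultdict
from typing import Iterable, Optional, Set


class DocumentIndex:
    """Toy index: words from recognized text plus manually assigned tags."""

    def __init__(self) -> None:
        self._by_word = defaultdict(set)  # word -> ids of documents containing it
        self._by_tag = defaultdict(set)   # manual tag -> ids of documents carrying it

    def add(self, doc_id: str, text: str, tags: Iterable[str] = ()) -> None:
        # Content index: every whitespace-separated word of the text.
        for word in text.lower().split():
            self._by_word[word].add(doc_id)
        # Manual index: tags entered by a person, not derived from the text.
        for tag in tags:
            self._by_tag[tag.lower()].add(doc_id)

    def search(self, query: str, tag: Optional[str] = None) -> Set[str]:
        words = query.lower().split()
        if not words:
            return set()
        # A manual tag, when given, narrows the candidate set up front.
        candidates = None
        if tag is not None:
            candidates = set(self._by_tag.get(tag.lower(), set()))
        # Keep only documents that contain every query word.
        for word in words:
            hits = self._by_word.get(word, set())
            candidates = set(hits) if candidates is None else candidates & hits
        return candidates
```

In this sketch the manual tag acts as a cheap pre-filter, which is one plausible way manual indexing could speed up retrieval in certain cases.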

The focus of the first project is the development of proprietary software for high-accuracy, multifont, machine-printed character recognition. The OCR procedure associates text (ASCII) files with document images in the database and is meant to run as a background process.

The purpose of the second project was to allow immediate distribution of scanned document images over the LAN and across different database locations. In a stack of paper sheets on the scanner input tray, pages belonging to different documents are separated by special transmittal sheets. A preprinted form on each transmittal sheet contains a box for hand-entered transmittal symbols. Software was developed to recognize the form on the transmittal sheets and, subsequently, the hand-entered transmittal symbols. All the images belonging to one document are then distributed according to the hand-entered instructions. This solution reduces continuous monitoring of the scanning process to intermittent supervision.
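The grouping step can be sketched as follows. This is a minimal illustration, not the method used in the products: it assumes that form and symbol recognition have already run, that each transmittal sheet precedes the pages of its document, and that the recognized symbols arrive as a plain string. The names `Page`, `Document`, and `split_by_transmittal` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class Page:
    image_id: str
    # Assumption: a recognizer sets this only when the page is a transmittal sheet.
    transmittal_symbols: Optional[str] = None


@dataclass
class Document:
    # Routing instructions read from the transmittal sheet; None means unrouted.
    destinations: Optional[str]
    pages: List[str] = field(default_factory=list)


def split_by_transmittal(pages: Iterable[Page]) -> List[Document]:
    """Group a scanned page stream into documents, one per transmittal sheet."""
    documents: List[Document] = []
    current: Optional[Document] = None
    for page in pages:
        if page.transmittal_symbols is not None:
            # A transmittal sheet starts a new document and carries its routing.
            current = Document(destinations=page.transmittal_symbols)
            documents.append(current)
        else:
            if current is None:
                # Pages before any transmittal sheet are kept but left unrouted.
                current = Document(destinations=None)
                documents.append(current)
            current.pages.append(page.image_id)
    return documents
```

Each resulting `Document` could then be handed to whatever distribution step interprets the `destinations` string. Keeping pages that arrive before any transmittal sheet in an unrouted document, rather than dropping them, is a design choice of this sketch.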

The goal of the third project is to separate text from nontext in document images, and to develop a fast engine for JPEG compression and decompression of document images. Both software solutions were eventually implemented in Canon's products. The software developed in all three projects is part of Canon's proprietary packages DOX and Dr DOX for storing and handling documents in small and medium-sized offices. It has been successfully applied by the Canon USA Medical branch and sold to several major clinics in the USA.


Radovan Krtolica is an adjunct professor at San Francisco State and Santa Clara universities, teaching courses in computer science and electrical engineering. He has spent ten years with Canon Research Center America in Palo Alto, California, designing C/C++ software solutions for character recognition, image processing, and image compression problems. Previously, he was a Visiting Scientist at the Scientific Laboratory of Ford Motor Company in Dearborn, Michigan, and Director of the Mathematical Institute of the Serbian Academy of Sciences and Arts in Belgrade, Yugoslavia, where he was involved in research and teaching of stochastic control systems, signal processing, and encryption/decryption systems. He has published two books and more than 30 papers in well-known international scientific journals. He holds seven U.S. patents, two E.U. patents, and ten pending patents in image processing and compression.