
What file type should I use to scan a document?

Why should this be considered?

Anyone who frequently works with documents has surely wondered which file type to use for scanning. Many different types are available today, each suited to different applications. The choice of file type influences the following aspects:

  • Text recognition (OCR)
  • Storage requirements
  • Image quality
  • Editing options

Text recognition is one of the most important points when working with a document management system. Here, a suitable file type minimizes recognition errors. Storage requirements must also be taken into account if you are archiving a large number of large documents over the long term. Good compression greatly reduces file size, which lightens the load on the hard disk or server.

Image quality is self-explanatory, but its importance varies from case to case. In some companies, many documents are archived automatically and only reviewed by a person when necessary. In such cases you can weigh up whether slightly poorer legibility is acceptable in exchange for better compression. Lastly, editing options are limited for some file types. However, this often also depends on the software used to edit the file.


The best file type for scanning documents
 

TIFF and PDF are the most popular file types for scanning documents. TIFF (Tagged Image File Format) was created by the Aldus Corporation to provide high-resolution images in printable, lossless quality. The trade-off is that these files can be several times the size of a lossy compressed JPEG image. However, TIFF files can nowadays also be compressed.

A PDF (Portable Document Format) consists of text and layout. In many cases, the text is embedded, which eliminates the need for OCR, as it can be extracted directly without conversion. However, this is not always the case. The PDF/A format supports this text embedding. Other file types are:

  • RAW
  • GIF
  • BMP
  • JPG

We recommend TIFF files for scanning for several reasons. First, the recognition rate is in many cases better than with a PDF. In addition, with proper compression (CCITT 4), a smaller file size is possible, so less storage space is required.

When it comes to editing options, PDF is often preferred because more programs support it. However, many modern document management systems (such as bitfarm-Archiv) also support the editing of TIFF files, so that pages can subsequently be rotated, cropped or annotated with graphic elements (stamps, notes, etc.).



TIFF files in bitfarm-Archiv

bitfarm-Archiv converts all files into TIFF format during archiving. This ensures that a preview of the document is always available. The original file is nevertheless stored separately on the server in every case. This TIFF file is therefore also audit-proof, and a copy can subsequently be edited in the document editor. Together with strong OCR modules such as Tesseract and OmniPage, an optimal recognition rate can also be ensured here.

For good OCR combined with a small file size, we strongly recommend setting black and white as the default scan profile, with CCITT 4 compression and 300 DPI. If necessary, you can also fine-tune the contrast and brightness if the scan result is not good enough in this respect.

Higher resolutions are of course also possible, but they are down-converted during archiving to save storage space. However, this conversion process slows down server throughput.
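To give a feeling for why colour depth and resolution matter so much for storage, here is a minimal Python sketch that estimates the raw, uncompressed size of a single scanned page. It assumes an A4 page (210 x 297 mm), and the function name raw_page_bytes is purely illustrative; it is not part of bitfarm-Archiv or any scanner software. Compression is applied on top of these raw sizes and is not modelled here.

```python
# Rough estimate of the raw (uncompressed) size of one scanned page.
# Assumption: A4 paper, 210 x 297 mm. The helper name is illustrative only.

MM_PER_INCH = 25.4


def raw_page_bytes(dpi, bits_per_pixel, width_mm=210.0, height_mm=297.0):
    """Return the uncompressed size in bytes of one page scanned at `dpi`."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    # Pad every pixel row to a whole number of bytes.
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


if __name__ == "__main__":
    profiles = [
        ("black and white, 300 DPI", 300, 1),
        ("greyscale, 300 DPI", 300, 8),
        ("colour, 300 DPI", 300, 24),
        ("black and white, 600 DPI", 600, 1),
    ]
    for label, dpi, bits in profiles:
        size_mb = raw_page_bytes(dpi, bits) / 1_000_000
        print(f"{label}: {size_mb:.1f} MB")
```

Under these assumptions, a black-and-white page at 300 DPI takes roughly 1.1 MB before compression, while the same page in 24-bit colour takes about 26 MB. Doubling the resolution to 600 DPI roughly quadruples the raw size.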


Are you interested in more information on the topic of document management? We gladly invite you to watch our YouTube series (please enable English subtitles) and look forward to your email. We also invite you to check out our free open source DMS, bitfarm-Archiv.


