Note: TessOCR is no longer under development, and it is no longer available for download.
TessOCR is a free OCR tool that uses tesseract, ImageMagick and Xpdf, and runs on the JVM. TessOCR is released and distributed under the Apache License, Version 2.0.
- Supported languages: Japanese, English, French and others. Additional character recognition dictionaries are supported.
- Layout recognition: Detects horizontal and vertical writing automatically. For tabular material, only the content is recognized.
- Recognizable image formats: JPEG, PNG, GIF, BMP, TIFF and PDF.
- Recognizable image dimensions: There is no particular limitation.
- Recognizable character size: (Under investigation)
- Elimination of noise in the image: Manual control.
- Correction of image skew (inclination): Manual control.
- Cropping the image: Manual control. Spread pages can be specified.
- Conversion to a grayscale image by threshold: Manual control.
- Training the character recognition dictionary: Semi-automatic control. You can edit the character boxes.
- Text editing: You can enter, edit and save text. You can also search the text and replace matches with another string.
TessOCR uses tesseract, ImageMagick and Xpdf internally to process images. However, these tools are not bundled with TessOCR. If tesseract, ImageMagick or Xpdf is already installed in your environment, TessOCR links to it. If any of them has not been installed yet, TessOCR notifies you, and you have to install the missing tool using MacPorts.
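
A dependency check of this kind could be sketched roughly as follows. This is not TessOCR's actual code, only a minimal illustration under stated assumptions: the class name `DependencyCheck` and its methods are hypothetical, the executable names (`tesseract`, `convert` for ImageMagick, `pdftoppm` for Xpdf) are guesses about which commands would be needed, and `/opt/local/bin` is searched first because it is the default MacPorts install location.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class DependencyCheck {
    // Assumed executable names; the description above only names the tools themselves.
    static final String[] TOOLS = {"tesseract", "convert", "pdftoppm"};

    /** Returns the first executable file called `name` in the given directories, or null. */
    static File findExecutable(String name, List<String> dirs) {
        for (String dir : dirs) {
            File candidate = new File(dir, name);
            if (candidate.isFile() && candidate.canExecute()) {
                return candidate;
            }
        }
        return null;
    }

    /** Directories to search: the default MacPorts bin directory, then every PATH entry. */
    static List<String> searchDirs() {
        List<String> dirs = new ArrayList<>();
        dirs.add("/opt/local/bin");
        String path = System.getenv("PATH");
        if (path != null) {
            for (String dir : path.split(File.pathSeparator)) {
                if (!dir.isEmpty()) {
                    dirs.add(dir);
                }
            }
        }
        return dirs;
    }

    /** Returns the names of the tools for which no executable was found. */
    static List<String> missingTools(List<String> dirs) {
        List<String> missing = new ArrayList<>();
        for (String tool : TOOLS) {
            if (findExecutable(tool, dirs) == null) {
                missing.add(tool);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingTools(searchDirs());
        if (missing.isEmpty()) {
            System.out.println("All external tools were found.");
        } else {
            // Mirrors the notification step: report what still has to be installed.
            System.out.println("Not found, install with MacPorts: " + String.join(", ", missing));
        }
    }
}
```

Searching a fixed list of directories keeps the check free of side effects, since no external process is started just to find out whether a tool exists.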