How to build Tesseract 3.03 with Visual Studio 2013

by Paul Vorbach, 2014-04-10Comments


This article has been translated to Korean.


Previously I wrote about how to compile Tesseract OCR using Cygwin. While this is nice if you want to compile Tesseract for your own system where you can install Cygwin on your own, compiling with Visual Studio is better if you intend to distribute the compiled package so users don’t have to install Cygwin.

Compiling Tesseract 3.02.02 with Visual C++ 2008 (Express) is covered by the documentation whereas compiling Tesseract 3.03 isn’t covered at all, though.

Unfortunately newer versions of Tesseract also require a new version of Leptonica, a C library for image processing and image analysis applications, which in turn requires new versions of zlib, libpng, libtiff, libjpeg and giflib. Tesseract provides pre-compiled versions of Leptonica, which prevents you from having to collect and set up projects for all of these libraries in Visual Studio, which can be a tedious task.

Yesterday I found a project on GitHub that includes a Visual Studio solution file for all dependencies required to compile Tesseract 3.03: charlesw/tesseract-vs2012. While following the build instructions there, I stumpled over several build errors, which I could easily resolve by removing a definition. The necessary change is in my fork of the repository mentioned above.

This is a write-up of all steps that are required to compile Tesseract 3.03 with Visual Studio 2013.

Prerequisites

  1. Install Git.
  2. Install SVN. There are many versions of SVN. You can, for example, install the binary package from SlickSVN for free.
  3. Install Visual Studio 2013 for Windows Desktop (the Express version will be enough). You don’t need the optional features except for “Microsoft Foundation Classes for C++”.

Building the dependencies

  1. Create a directory where you want to compile Tesseract. In this document, I’ll assume it’s C:\Tesseract-Build\.
  2. Open a CMD prompt and change to that directory.

    cd \Tesseract-Build\
  3. Clone the dependencies repository from GitHub.

    git clone git://github.com/pvorb/tesseract-vs2013.git
  4. Open the “VS 2013 Developer Command Prompt”. (It can be found in the Start Menu.)
  5. Change to the newly cloned repository.

    cd \Tesseract-Build\tesseract-vs2013
  6. Build the dependencies

    msbuild build.proj
  7. You can close the “VS 2013 Developer Command Prompt”.

Building Tesseract

  1. Re-open the first command prompt and ensure it’s still in C:\Tesseract-Build\.
  2. Get the latest source from SVN.

    svn checkout http://tesseract-ocr.googlecode.com/svn/trunk/ tesseract-ocr
  3. Change to the newly checked-out repository.

    cd tesseract-ocr
  4. Apply the patch provided in tesseract-vs2013.

    svn patch ..\tesseract-vs2013\vs2013+64bit_support.patch
  5. Copy both directories in C:\Tesseract-Build\tesseract-vs2013\release\ to C:\Tesseract-Build\. Now you should have

    • C:\Tesseract-Build\include\
    • C:\Tesseract-Build\lib\
  6. Open C:\Tesseract-Build\tesseract-ocr\vs2013\tesseract.sln with Visual Studio 2013.
  7. Press F7 on your keyboard. Both libtesseract303 and tesseract should compile without errors.

The Visual Studio solution file contains configurations for dynamic and static compilation as well as debugging and release configurations for both 32-Bit and 64-Bit. Select whichever configuration you need and recompile with F7.

You can find the compiled binaries in C:\Tesseract-Build\tesseract-ocr\vs2013\bin\.