PDF Creator – about project and development process

In this article I describe the organization of the PDF Creator project and the tools that we use to develop and promote the product.

Development Tools

1. Subversion (SVN) Control

The life of a programmer would be unbearable without source control :-). As of this writing, we have made 2387 commits in the PDF Creator project using TortoiseSVN (http://tortoisesvn.tigris.org/). Since every mistake is costly, we try to follow one rule that we would recommend to everyone: always commit small portions of code. Every commit must correspond to a single task!

This way it is (a) easier to find an error during rollbacks (every commit changes a small amount of source code), (b) often easy to discover an error before the commit just by looking at it, and (c) easier for colleagues to be aware of the changes in the source code, since all they have to do is read the comments and look at the smaller portions of the code.

Of course, this is not possible in all cases, and sometimes we do have to make huge commits. Still, this is a rule worth following. After all, it is one of the rules of refactoring: change small portions of the code at a time, keeping it in a working state between changes.

Two more source-control rules:

  1. The version in SVN must always both compile and work. It is easy to create code that compiles, but the real aim is correctness.
  2. Every commit should come with a text comment. This is also a good way to check whether the commit implements only a single task. If a comment describes several goals instead of just one, the rule has been violated.

2. Bug Tracker

We use Mantis (http://www.mantisbt.org/). PDF Creator is the base for a number of other products: converters, a virtual printer, etc. (see http://www.colorpilot.com/developer.html). The number of its users is much higher than the number of those who purchased the base product, so we needed to simplify communication between the developers and the users of those products.

To make things clear for those who have never used a bug tracker, here is an example of how it works. The virtual printer developers receive a report from one of their users about an incorrect conversion into a PDF. The virtual printer receives an EMF file as input, and our library converts it. The information from the user is used to create a “ticket” for the library developers. The ticket includes a description of the problem and the problematic EMF file. We then assess the severity of the problem and assign a specific developer to work on a fix. After the problem has been resolved, the ticket is closed. Everybody concerned is notified and can keep an eye on what is going on.

3. Technical Support

To manage our correspondence with clients, we use RT (http://bestpractical.com/rt/). We need the system to store the archive conveniently and to make sure that everyone who should read a message actually receives it.

Ideally, (2) and (3) would be combined into a single service, so that clients could follow the status of their problems. In the example above, the virtual printer user would be able to follow the developers’ correspondence about the problem. One system that could help is FogBugz (http://fogcreek.com/FogBugz/).

4. Technical Documentation

Our internal technical documentation project, Squirrel (http://www.colorpilot.com/squirrel.html), isn’t widely known, but it is publicly available. The program makes it easier to create a CHM version of the documentation, integrates with SVN, and can also produce online documentation. See http://www.colorpilot.com/pdfcreatorpilotmanual/PDF_Creator_Pilot.html for an example of Squirrel’s output.

5. Development Environment

We use Visual Studio 2008, C++, and C#.

6. Refactoring Tools

There are no built-in tools for C++ refactoring in Visual Studio, so we use a third-party plug-in, Visual Assist X (http://www.wholetomato.com/).

7. Unit Tests in C++

Here we use UnitTest++ (http://unittest-cpp.sourceforge.net/).

8. Profiling Tools

To diagnose memory leaks, we use the tool at http://www.colorpilot.com/~vit.shibaev/mmgr.zip. (I described it in one of my previous articles: http://www.colorpilot.com/blog/about-pdf-creator-39/.) To monitor performance, we also use Intel VTune from time to time.
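To give a feeling for what a leak-diagnosis tool does in principle, here is a minimal sketch, assuming C++11 or later. It is not the API or the implementation of the tool linked above: the class AllocationTracker and the macro TRACKED_ALLOC are hypothetical names invented for this illustration. The idea is simply to remember every allocation together with its call site and to report whatever has not been released.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper for illustration; not the API of any real tool.
class AllocationTracker {
public:
    // Allocates `size` bytes and remembers the requesting call site.
    void* allocate(std::size_t size, const char* file, int line) {
        assert(file != nullptr);
        void* p = std::malloc(size);
        if (p != nullptr) {
            live_[p] = Record{size, file, line};
        }
        return p;
    }

    // Frees a tracked block and forgets it. Unknown pointers are ignored
    // here; a real tool would report them (double free, foreign block).
    void release(void* p) {
        auto it = live_.find(p);
        if (it == live_.end()) return;
        live_.erase(it);
        std::free(p);
    }

    // Number of blocks that were allocated but not yet released.
    std::size_t leakCount() const { return live_.size(); }

    // Total size in bytes of the outstanding blocks.
    std::size_t leakedBytes() const {
        std::size_t total = 0;
        for (const auto& entry : live_) total += entry.second.size;
        return total;
    }

    // One human-readable line per outstanding block.
    std::vector<std::string> report() const {
        std::vector<std::string> lines;
        for (const auto& entry : live_) {
            const Record& r = entry.second;
            lines.push_back(r.file + ":" + std::to_string(r.line) + ": " +
                            std::to_string(r.size) + " bytes not released");
        }
        return lines;
    }

private:
    struct Record {
        std::size_t size;
        std::string file;
        int line;
    };
    std::map<void*, Record> live_;
};

// Captures the call site automatically.
#define TRACKED_ALLOC(tracker, size) \
    (tracker).allocate((size), __FILE__, __LINE__)
```

At the end of a test run, a non-zero leakCount() together with the lines from report() points at the places in the source where the lost blocks were requested.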

Project Organization

PDF Creator is a library for PDF processing: creation, reading, modification, text extraction, etc. Currently, the project exists as a solution with 18 sub-projects in Visual Studio 2008. There are five general types of sub-projects.

1. The Library Core

Here we define all the logic of PDF processing. This project is constantly being modified. One of our old dreams (and still a dream today) is to make the core cross-platform. The biggest obstacle is the fact that EMF is a Windows-specific format. To make our dream come true, we need to separate the EMF conversion from the cross-platform part.

2. Client Interfaces

Two of the projects implement the client interfaces. One of them implements the COM interface; the other one implements a static library. The COM interface delegates its tasks to the static library; the static library forwards them to the core. No code is duplicated.
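The delegation chain described above can be sketched roughly as follows. This is only an illustration, assuming C++11 or later: all class and method names are hypothetical, and ComLikeDocument is a plain C++ stand-in rather than a real COM object (a real one would implement IUnknown and return HRESULT values).

```cpp
#include <string>
#include <vector>

// All names below are hypothetical; they only illustrate the layering.

// Core: the only place where document logic lives.
class CoreDocument {
public:
    void addPage() { ++pageCount_; }
    void writeText(const std::string& text) { texts_.push_back(text); }
    int pageCount() const { return pageCount_; }
    const std::vector<std::string>& texts() const { return texts_; }

private:
    int pageCount_ = 0;
    std::vector<std::string> texts_;
};

// Static library layer: a thin C++ API that forwards every call to the core.
class StaticLibDocument {
public:
    void addPage() { core_.addPage(); }
    void writeText(const std::string& text) { core_.writeText(text); }
    int pageCount() const { return core_.pageCount(); }

private:
    CoreDocument core_;
};

// Stand-in for the COM layer: it validates arguments, converts them to C++
// types, and delegates to the static library. Methods return 0 on success,
// mirroring the COM convention that S_OK is 0.
class ComLikeDocument {
public:
    int AddPage() {
        lib_.addPage();
        return 0;
    }

    int WriteText(const char* text) {
        if (text == nullptr) return 1;  // invalid argument
        lib_.writeText(text);
        return 0;
    }

    int GetPageCount(int* out) const {
        if (out == nullptr) return 1;  // invalid argument
        *out = lib_.pageCount();
        return 0;
    }

private:
    StaticLibDocument lib_;
};
```

Each outer layer only adapts the calling convention; none of them re-implements what the layer below already does, which is why no code is duplicated.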

The static library isn’t advertised on our site, but every client may receive it together with the COM version. The static library contains some undocumented features that let users get additional information about a document. If you are a developer and your product is intended to view PDFs, these features may be very useful to you.

3. Test Projects

  1. Non-automated tests for the COM interface. A set of APIs that form and render PDF documents immediately after their launch.
  2. Unit tests for the static library and the core. In addition to those described immediately above, the unit tests check the internal state of the classes, handle errors, etc.
  3. Projects that check the EMF conversion. The first one (written in WTL) converts the specified EMF file into a PDF. Its interface is minimalistic, looking very much like our online service http://www.colorpilot.com/pdflibrary_convert-emf-to-pdf.html.
    The second one (written in C#) is a work in progress. However, it is already possible to convert all the EMF files in a folder and to review the results instantly. This speeds up testing significantly, which is especially welcome since we already have hundreds of test metafiles.

4. Auxiliary Projects

These include a font-processing library (which parses TrueType and Type1 fonts, works with encodings, and modifies TrueType fonts), a font installer, code for working with keys, etc.

5. Third Party Open-Source Products

PDF is a complex format. It is based on many areas of science, multiple algorithms, and formats. It involves compression and encryption algorithms, various image and font formats, color spaces, etc. A big part of our implementation is based on open source projects with a free license, for instance, CxImage, ZLib, LibJPEG, Lcms, and UnitTest++.

Vitaly Shibaev
Developer of PDF Library
