PDF & Perl

PDF is a well-defined and essentially text-based format. If you understand the format, you can work with the content of PDF files in almost any language. I like to use Perl to manipulate and modify PDF files outside Acrobat, and to this end I have developed a Perl module that I am constantly expanding.

When new concepts are introduced in the PDF specification, I consider whether the module should be updated to support them. Good examples are object streams and cross-reference streams, which were introduced in PDF 1.5 (at the same time as Acrobat 6).

An object stream is a stream that contains a number of other objects. The object stream is itself stored as an ordinary ("normal") indirect object: a stream whose dictionary has /Type /ObjStm. An object stored inside an object stream is located by the object number of the enclosing object stream and the object's position (index) within that stream. So, when I want to add an object to a PDF file, I must specify whether it is to be added to an object stream and, if so, to which one. Some housekeeping is needed to keep track of how many objects are in each object stream (the PDF specification recommends certain limits).
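The bookkeeping described above can be sketched in a few lines. This is a minimal Python illustration, not the Perl module itself. `ObjStm`, `add`, `serialize` and `parse_objstm` are hypothetical names. The default cap of 100 objects per stream is an arbitrary assumption, not a figure taken from the specification, and compression is left out. The sketch relies on the PDF 1.5 layout of a decoded object stream: N pairs of integers (object number, byte offset relative to /First), followed by the objects themselves.

```python
class ObjStm:
    """Collects the objects destined for one object stream (hypothetical helper)."""

    def __init__(self, stream_num, max_objects=100):
        self.stream_num = stream_num    # object number of the enclosing stream object
        self.max_objects = max_objects  # arbitrary cap, not a figure from the spec
        self.members = []               # (object number, serialized object bytes)

    def is_full(self):
        return len(self.members) >= self.max_objects

    def add(self, obj_num, body):
        """Append an object; return (enclosing stream number, index)."""
        if self.is_full():
            raise ValueError(f"object stream {self.stream_num} is full")
        self.members.append((obj_num, body))
        return (self.stream_num, len(self.members) - 1)

    def serialize(self):
        """Return (/N, /First, uncompressed stream data)."""
        pairs, body = [], b""
        for obj_num, obj in self.members:
            pairs.append(f"{obj_num} {len(body)}")  # offset is relative to /First
            body += obj + b"\n"
        header = (" ".join(pairs) + "\n").encode("ascii")
        return len(self.members), len(header), header + body


def parse_objstm(n, first, data):
    """Map each object number in decoded stream data to (index, object bytes)."""
    nums = [int(tok) for tok in data[:first].split()[: 2 * n]]
    pairs = list(zip(nums[0::2], nums[1::2]))
    result = {}
    for index, (obj_num, offset) in enumerate(pairs):
        # Each object ends where the next one starts (or at the end of the data)
        end = first + pairs[index + 1][1] if index + 1 < n else len(data)
        result[obj_num] = (index, data[first + offset:end].strip())
    return result


if __name__ == "__main__":
    stm = ObjStm(stream_num=10, max_objects=2)
    print(stm.add(11, b"<< /Type /Font >>"))  # (10, 0)
    print(stm.add(12, b"[1 2 3]"))            # (10, 1)
    n, first, data = stm.serialize()
    print(parse_objstm(n, first, data))
```

The pair returned by `add` (enclosing stream number, index) is exactly what has to be recorded for the object in the cross-reference stream. Checking `is_full` before adding is where a writer would decide to start a new object stream.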

A cross-reference stream contains the information traditionally found in a cross-reference table, but in a more compact binary format and, because it is a stream, with the option of compression.
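Each cross-reference stream entry is a fixed-width row of big-endian integers. The stream's /W array gives the field widths, and the first field is the entry type:

- 0: a free object.
- 1: an uncompressed object at a byte offset.
- 2: an object inside an object stream, identified by the stream's object number and the index.

The Python sketch below packs and unpacks such rows with zlib, which implements the Flate compression used by /FlateDecode. It makes three simplifying assumptions. Every width is non-zero. No /DecodeParms predictor is applied, although real files often use one. The function names, the /W values and the byte offsets are all hypothetical.

```python
import zlib

# Entry types defined for cross-reference streams (PDF 1.5)
ENTRY_TYPES = {0: "free", 1: "uncompressed", 2: "in object stream"}


def pack_xref_rows(rows, widths):
    """Encode (type, field2, field3) rows as big-endian fields, then Flate-compress.

    Assumes every width in the /W array is non-zero.
    """
    raw = bytearray()
    for row in rows:
        for value, width in zip(row, widths):
            raw += value.to_bytes(width, "big")
    return zlib.compress(bytes(raw))


def unpack_xref_rows(data, widths):
    """Decompress a cross-reference stream and split it into fixed-width rows."""
    raw = zlib.decompress(data)
    row_len = sum(widths)
    rows = []
    for start in range(0, len(raw), row_len):
        fields, pos = [], start
        for width in widths:
            fields.append(int.from_bytes(raw[pos:pos + width], "big"))
            pos += width
        rows.append(tuple(fields))
    return rows


if __name__ == "__main__":
    w = [1, 2, 2]  # hypothetical /W array
    rows = [
        (0, 0, 65535),  # object 0: head of the free list
        (1, 15, 0),     # object 1: uncompressed, at byte offset 15
        (2, 4, 0),      # object 2: index 0 inside object stream 4
        (2, 4, 1),      # object 3: index 1 inside object stream 4
        (1, 120, 0),    # object 4: the object stream itself, at byte offset 120
    ]
    packed = pack_xref_rows(rows, w)
    for num, (kind, f2, f3) in enumerate(unpack_xref_rows(packed, w)):
        print(num, ENTRY_TYPES[kind], f2, f3)
```

With /W [1 2 2], each entry takes 5 bytes before compression. A classic cross-reference table uses a fixed 20 bytes per entry.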

First, support for reading cross-reference streams and object streams was added to the module. After due thought, routines for writing cross-reference and object streams have now also been added.