Examining the Structure of a PDF
- Written by Matthew
When learning about the PDF format, or trying to track down problems with a PDF, being able to see the structure of the file is a great help. This utility shows a PDF file as a hierarchy of objects in a tree view.
The program is written exclusively in Perl and makes extensive use of our Perl module for handling PDF objects. Because the output is shown in a Windows TreeView control, the tool only runs under Windows. All the components needed to run it on machines that do not have Perl installed are packaged using Cava Packager (www.cava.co.uk), and the installer is built with Inno Setup (www.jrsoftware.org).
Drag a file onto the window or use File/Open, then click 'Show Structure' to display the content. Files that are encrypted using normal password security are also handled.
The utility has two modes: 'normal mode' in which the tree is expanded under user control, and 'expanded mode' in which the whole tree is available to the user at the click of a button.
'Normal mode' lets you view an object by expanding it, and the same object can be open more than once in different contexts. In 'expanded mode' the content of an object is shown once, and further occurrences are replaced by a link (XREF) that takes you to the expanded object.
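The show-once behaviour of 'expanded mode' can be illustrated with a short sketch. This is not the utility's Perl code: it is a minimal Python illustration that assumes a hypothetical in-memory model in which each object number maps to the object numbers it references. The names `render_expanded`, `sample_objects` and `demo_lines` are invented for the example.

```python
def render_expanded(objects, root):
    """Return indented tree lines for the objects reachable from root.

    `objects` is a hypothetical model: object number -> list of the
    object numbers it references. Each object is expanded the first
    time it is met; later occurrences become an (XREF) link line.
    """
    lines = []
    seen = set()

    def visit(num, depth):
        indent = "  " * depth
        if num in seen:
            # Already expanded elsewhere: emit a link instead of the content.
            lines.append(f"{indent}(XREF) -> object {num}")
            return
        seen.add(num)
        lines.append(f"{indent}object {num}")
        for child in objects.get(num, []):
            visit(child, depth + 1)

    visit(root, 0)
    return lines


# Object 4 is referenced twice, and object 3 refers back to object 1.
sample_objects = {1: [2, 3], 2: [4], 3: [4, 1], 4: []}
demo_lines = render_expanded(sample_objects, 1)
print("\n".join(demo_lines))
```

Tracking visited objects also keeps circular references (such as a page pointing back to its parent) from expanding forever. A recursive walk like this one would need to be made iterative for very deeply nested files.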
In 'expanded mode' four buttons are available:
'Start with full tree' puts the program in 'expanded mode';
'Expand streams' shows stream contents in the tree (truncated after 260 characters);
'Expand All' expands the whole tree.
Stream contents are shown in full in a separate window by clicking the label (STREAM) in the tree representation. Null characters are shown as "\x00".
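The two display rules for streams (the 260-character limit in the tree and the "\x00" rendering of null characters) could be sketched as follows. This is a Python illustration, not the tool's own code. The function names are hypothetical, and the sketch assumes that the limit is applied to the raw stream data before null characters are replaced; the article does not say which order the utility uses.

```python
PREVIEW_LIMIT = 260  # truncation length quoted in the article


def escape_nulls(data: bytes) -> str:
    """Return the stream as text with each null byte shown as the four
    characters \\x00. Latin-1 is used so every byte maps to one character."""
    return data.decode("latin-1").replace("\x00", "\\x00")


def tree_preview(data: bytes, limit: int = PREVIEW_LIMIT) -> str:
    """Text for the tree view: the first `limit` bytes, nulls escaped.

    Assumption: truncation happens before escaping, so an escape
    sequence is never cut in half.
    """
    return escape_nulls(data[:limit])


def full_view(data: bytes) -> str:
    """Text for the separate stream window: the whole stream, nulls escaped."""
    return escape_nulls(data)


print(tree_preview(b"BT\x00ET"))
```

Every byte has to be inspected to find the nulls, which is why the cost of `full_view` grows with the size of the stream.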
The time taken to show the initial output depends on the number of objects in the PDF file. For many files the results appear almost instantaneously, but in a test a file with more than 350,000 objects took 15 s before the initial output was shown. It is not feasible to open such a file with the Start with full tree option selected, because you are likely to run out of memory. Because stream contents have to be scanned to replace null characters, large streams can take a few seconds to display. Handling an encrypted file is slightly slower than handling the same file without encryption.
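Since the cost depends on the object count, it can be useful to get a rough count before choosing a mode. The sketch below is not part of the utility. It is a hypothetical Python helper that counts object headers of the form "12 0 obj" in the raw bytes. The result is only an estimate: objects stored inside compressed object streams are not counted, and binary stream data could contain a matching byte sequence by chance.

```python
import re

# An object header is an object number, a generation number and the keyword
# "obj", separated by whitespace. "endobj" cannot match: it has no digits.
OBJ_HEADER = re.compile(rb"(?<![0-9])[0-9]+[ \t\r\n]+[0-9]+[ \t\r\n]+obj\b")


def estimate_object_count(pdf_bytes: bytes) -> int:
    """Rough number of indirect objects visible in the raw file bytes."""
    return sum(1 for _ in OBJ_HEADER.finditer(pdf_bytes))


sample = b"%PDF-1.4\n1 0 obj\n<< >>\nendobj\n2 0 obj\n<< >>\nendobj\n"
print(estimate_object_count(sample))
```

For a real file the bytes would come from something like `open(path, "rb").read()`, where `path` is whatever file is about to be opened.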