com.itextpdf.text.pdf.parser
Class TaggedPdfReaderTool

java.lang.Object
  extended by com.itextpdf.text.pdf.parser.TaggedPdfReaderTool

public class TaggedPdfReaderTool
extends Object

Converts a tagged PDF document into an XML file.

Since:
5.0.2

Field Summary
protected  PrintWriter out
          The writer object to which the XML will be written.
protected  PdfReader reader
          The reader object from which the content streams are read.
 
Constructor Summary
TaggedPdfReaderTool()
           
 
Method Summary
 void convertToXml(PdfReader reader, OutputStream os)
          Converts the structured content of a tagged PDF into XML and writes it to the output stream, using the default charset.
 void convertToXml(PdfReader reader, OutputStream os, String charset)
          Converts the structured content of a tagged PDF into XML and writes it to the output stream, using the given charset.
 void inspectChild(PdfObject k)
          Inspects a child of a structured element.
 void inspectChildArray(PdfArray k)
          If the child of a structured element is an array, we need to loop over the elements.
 void inspectChildDictionary(PdfDictionary k)
          If the child of a structured element is a dictionary, we inspect the child; we may also write a tag to the output.
 void inspectChildDictionary(PdfDictionary k, boolean inspectAttributes)
          If the child of a structured element is a dictionary, we inspect the child; we may also write a tag to the output.
 void parseTag(String tag, PdfObject object, PdfDictionary page)
          Searches for a tag in a page.
protected  String xmlName(PdfName name)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

reader

protected PdfReader reader
The reader object from which the content streams are read.


out

protected PrintWriter out
The writer object to which the XML will be written.

Constructor Detail

TaggedPdfReaderTool

public TaggedPdfReaderTool()
Method Detail

convertToXml

public void convertToXml(PdfReader reader,
                         OutputStream os,
                         String charset)
                  throws IOException
Converts the structured content of a tagged PDF into XML and writes it to the output stream, encoded with the given charset.

Parameters:
reader - the PdfReader that has access to the PDF file
os - the OutputStream to which the resulting XML will be written
charset - the name of the charset used to encode the XML output
Throws:
IOException
Since:
5.0.5

convertToXml

public void convertToXml(PdfReader reader,
                         OutputStream os)
                  throws IOException
Converts the structured content of a tagged PDF into XML and writes it to the output stream. The output is encoded using the default charset.

Parameters:
reader - the PdfReader that has access to the PDF file
os - the OutputStream to which the resulting XML will be written
Throws:
IOException
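
The effect of the charset parameter can be illustrated without iText. The following is a minimal sketch, assuming the XML text is pushed through a PrintWriter wrapped around the OutputStream; the class CharsetSketch and its encode methods are hypothetical names made up for this example and are not part of the library. The one-argument overload falls back to the default charset, in the same spirit as the two convertToXml variants.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

// Hypothetical helper, not part of iText: shows how a charset name
// determines the bytes that end up in an OutputStream.
class CharsetSketch {

    // Encodes the XML text with the named charset and returns the bytes.
    // Charset.forName throws an unchecked exception for unknown names.
    static byte[] encode(String xml, String charset) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(
                new OutputStreamWriter(os, Charset.forName(charset)));
        out.print(xml);
        out.close(); // flushes buffered characters through the encoder
        return os.toByteArray();
    }

    // Convenience overload: delegates with the platform default charset.
    static byte[] encode(String xml) {
        return encode(xml, Charset.defaultCharset().name());
    }

    public static void main(String[] args) {
        String xml = "<P>caf\u00e9</P>";
        // The accented character takes two bytes in UTF-8, one in ISO-8859-1.
        System.out.println(encode(xml, "UTF-8").length);
        System.out.println(encode(xml, "ISO-8859-1").length);
    }
}
```

Because the default charset differs between platforms, passing an explicit charset name such as "UTF-8" is usually the safer choice when the XML is consumed elsewhere.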

inspectChild

public void inspectChild(PdfObject k)
                  throws IOException
Inspects a child of a structured element. This can be an array or a dictionary.

Parameters:
k - the child to inspect
Throws:
IOException

inspectChildArray

public void inspectChildArray(PdfArray k)
                       throws IOException
If the child of a structured element is an array, we need to loop over the elements.

Parameters:
k - the child array to inspect
Throws:
IOException

inspectChildDictionary

public void inspectChildDictionary(PdfDictionary k)
                            throws IOException
If the child of a structured element is a dictionary, we inspect the child; we may also write a tag to the output.

Parameters:
k - the child dictionary to inspect
Throws:
IOException

inspectChildDictionary

public void inspectChildDictionary(PdfDictionary k,
                                   boolean inspectAttributes)
                            throws IOException
If the child of a structured element is a dictionary, we inspect the child; we may also write a tag to the output.

Parameters:
k - the child dictionary to inspect
inspectAttributes - whether the attributes of the dictionary are inspected as well
Throws:
IOException
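
The dispatch between inspectChild, inspectChildArray and inspectChildDictionary described above amounts to a recursive walk over the structure tree. The following is a simplified sketch of that idea, not the iText implementation: a List stands in for PdfArray, a Map for PdfDictionary, and a plain String for the marked content that the real class locates on the page. The class StructureWalkSketch, the node helper, and the use of the keys "S" (structure type) and "K" (kids) on a Map are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in model, not iText code: a List plays the role of
// PdfArray, a Map the role of PdfDictionary, a String the role of content.
class StructureWalkSketch {

    private final StringBuilder out = new StringBuilder();

    // Dispatches on the kind of child: array, dictionary, or leaf.
    void inspectChild(Object k) {
        if (k == null) {
            return;
        }
        if (k instanceof List) {
            inspectChildArray((List<?>) k);
        } else if (k instanceof Map) {
            inspectChildDictionary((Map<?, ?>) k);
        } else {
            out.append(k); // leaf: text standing in for page content
        }
    }

    // An array child is handled by visiting each element in order.
    void inspectChildArray(List<?> k) {
        for (Object element : k) {
            inspectChild(element);
        }
    }

    // A dictionary with a structure type "S" becomes an XML element;
    // its kids entry "K" is inspected between the opening and closing tag.
    void inspectChildDictionary(Map<?, ?> k) {
        Object s = k.get("S");
        if (s != null) {
            out.append('<').append(s).append('>');
            inspectChild(k.get("K"));
            out.append("</").append(s).append('>');
        } else {
            inspectChild(k.get("K"));
        }
    }

    String result() {
        return out.toString();
    }

    // Hypothetical convenience for building a dictionary node.
    static Map<String, Object> node(String tag, Object kids) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("S", tag);
        m.put("K", kids);
        return m;
    }

    public static void main(String[] args) {
        Object tree = node("Document",
                Arrays.asList(node("H1", "Title"), node("P", "Body")));
        StructureWalkSketch walk = new StructureWalkSketch();
        walk.inspectChild(tree);
        // prints <Document><H1>Title</H1><P>Body</P></Document>
        System.out.println(walk.result());
    }
}
```

The sketch leaves out XML escaping, attribute handling and tag name sanitising, which a real converter would need.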

xmlName

protected String xmlName(PdfName name)

parseTag

public void parseTag(String tag,
                     PdfObject object,
                     PdfDictionary page)
              throws IOException
Searches for a tag in a page.

Parameters:
tag - the name of the tag
object - an identifier to find the marked content
page - a page dictionary
Throws:
IOException


Copyright © 2013. All Rights Reserved.