com.itextpdf.text.pdf.parser
Class SimpleTextExtractionStrategy

java.lang.Object
  extended by com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy
All Implemented Interfaces:
RenderListener, TextExtractionStrategy

public class SimpleTextExtractionStrategy
extends Object
implements TextExtractionStrategy

A simple text extraction renderer. It tracks the Y position of each string as it is rendered and inserts a line break into the output whenever that position changes. Because text is captured in the order the PDF renders it, a PDF that renders text in a non-top-to-bottom order yields output that does not match how the text appears on the page. The renderer also uses a simple strategy based on font metrics to decide whether to insert a blank space into the output.
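
The heuristic described above can be illustrated with a minimal, library-free sketch. This is a simplified model and not the class's actual implementation: the Chunk type, its fields, and the half-space gap threshold are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of the heuristic described above; NOT iText's actual code.
// Chunk and all of its fields are hypothetical stand-ins for what a renderer
// would read from each rendered string.
final class LineBreakSketch {

    /** One rendered string: its text, baseline Y, start/end X, and the width of a space in its font. */
    static final class Chunk {
        final String text;
        final float y, startX, endX, spaceWidth;

        Chunk(String text, float y, float startX, float endX, float spaceWidth) {
            this.text = text;
            this.y = y;
            this.startX = startX;
            this.endX = endX;
            this.spaceWidth = spaceWidth;
        }
    }

    /** Concatenates chunks in the order given, which models the order the PDF renders them. */
    static String extract(List<Chunk> chunks) {
        StringBuilder out = new StringBuilder();
        Chunk last = null;
        for (Chunk c : chunks) {
            if (last != null) {
                if (c.y != last.y) {
                    // Y position changed: treat it as a new line.
                    out.append('\n');
                } else if (!endsWithSpace(out) && !c.text.startsWith(" ")
                        && c.startX - last.endX > c.spaceWidth / 2f) {
                    // Same line, visible gap, no space on either side: insert one.
                    // The half-space threshold is an assumption of this sketch.
                    out.append(' ');
                }
            }
            out.append(c.text);
            last = c;
        }
        return out.toString();
    }

    private static boolean endsWithSpace(StringBuilder sb) {
        return sb.length() > 0 && sb.charAt(sb.length() - 1) == ' ';
    }

    public static void main(String[] args) {
        List<Chunk> chunks = Arrays.asList(
                new Chunk("Hello", 700f, 50f, 80f, 3f),
                new Chunk("world", 700f, 84f, 112f, 3f),  // gap of 4 units: space inserted
                new Chunk("!", 700f, 112.5f, 115f, 3f),   // gap of 0.5 units: no space
                new Chunk("Next", 686f, 50f, 75f, 3f));   // different Y: line break
        System.out.println(extract(chunks));
    }
}
```

Because the sketch appends chunks strictly in the order it receives them, passing a lower line before an upper one produces the lines in that same reversed order, which mirrors the caveat about non-top-to-bottom rendering.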

Since:
2.1.5

Constructor Summary
SimpleTextExtractionStrategy()
          Creates a new text extraction renderer.
 
Method Summary
protected  void appendTextChunk(CharSequence text)
          Used to actually append text to the text results.
 void beginTextBlock()
          Called when a new text block is beginning (i.e. BT)
 void endTextBlock()
          Called when a text block has ended (i.e. ET)
 String getResultantText()
          Returns the result so far.
 void renderImage(ImageRenderInfo renderInfo)
          No-op method: this renderer is not interested in image events.
 void renderText(TextRenderInfo renderInfo)
          Captures text using a simplified algorithm for inserting hard returns and spaces
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SimpleTextExtractionStrategy

public SimpleTextExtractionStrategy()
Creates a new text extraction renderer.

Method Detail

beginTextBlock

public void beginTextBlock()
Description copied from interface: RenderListener
Called when a new text block is beginning (i.e. BT)

Specified by:
beginTextBlock in interface RenderListener
Since:
5.0.1

endTextBlock

public void endTextBlock()
Description copied from interface: RenderListener
Called when a text block has ended (i.e. ET)

Specified by:
endTextBlock in interface RenderListener
Since:
5.0.1

getResultantText

public String getResultantText()
Returns the result so far.

Specified by:
getResultantText in interface TextExtractionStrategy
Returns:
a String with the resulting text.

appendTextChunk

protected final void appendTextChunk(CharSequence text)
Used to actually append text to the text results. Subclasses can use this to insert text that wouldn't normally be included in text parsing (e.g. result of OCR performed against image content)

Parameters:
text - the text to append to the text results accumulated so far
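
The subclassing pattern mentioned above might look like the following sketch. To stay self-contained it does not use the iText classes: MiniStrategy is a hypothetical stand-in that mirrors only the protected final appendTextChunk and public getResultantText pair documented here, renderImage takes a plain byte[] rather than an ImageRenderInfo, and the ocr function is an assumed placeholder for a real OCR engine.

```java
import java.util.function.Function;

// Hypothetical stand-in for the strategy class; it models only the
// appendTextChunk / getResultantText pair and a no-op image callback.
class MiniStrategy {
    private final StringBuilder result = new StringBuilder();

    /** Appends text to the accumulated result; final, so subclasses call it but cannot override it. */
    protected final void appendTextChunk(CharSequence text) {
        result.append(text);
    }

    /** No-op by default, like the image callback documented on this page. */
    public void renderImage(byte[] imageBytes) {
    }

    public String getResultantText() {
        return result.toString();
    }
}

// Subclass that feeds text recognized from images into the normal text result.
class OcrAwareStrategy extends MiniStrategy {
    private final Function<byte[], String> ocr; // assumed OCR hook, not a real library API

    OcrAwareStrategy(Function<byte[], String> ocr) {
        this.ocr = ocr;
    }

    @Override
    public void renderImage(byte[] imageBytes) {
        String recognized = ocr.apply(imageBytes);
        if (recognized != null && !recognized.isEmpty()) {
            appendTextChunk(recognized);
        }
    }
}

final class OcrAwareStrategyDemo {
    public static void main(String[] args) {
        // The lambda fakes an OCR engine by returning a fixed string.
        OcrAwareStrategy strategy = new OcrAwareStrategy(bytes -> "scanned text");
        strategy.renderImage(new byte[0]);
        System.out.println(strategy.getResultantText());
    }
}
```

Keeping appendTextChunk final means every subclass funnels extra text through the same accumulation path that getResultantText reads from.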

renderText

public void renderText(TextRenderInfo renderInfo)
Captures text using a simplified algorithm for inserting hard returns and spaces

Specified by:
renderText in interface RenderListener
Parameters:
renderInfo - information specifying what to render

renderImage

public void renderImage(ImageRenderInfo renderInfo)
No-op method: this renderer is not interested in image events.

Specified by:
renderImage in interface RenderListener
Parameters:
renderInfo - information specifying what to render
Since:
5.0.1
See Also:
RenderListener.renderImage(com.itextpdf.text.pdf.parser.ImageRenderInfo)


Copyright © 2013. All Rights Reserved.