LocationTextExtractionStrategy (iText, a Free Java-PDF library 5.4.2 API)


com.itextpdf.text.pdf.parser
Class LocationTextExtractionStrategy

java.lang.Object
  com.itextpdf.text.pdf.parser.LocationTextExtractionStrategy

All Implemented Interfaces:: RenderListener, TextExtractionStrategy

public class LocationTextExtractionStrategy
extends Object
implements TextExtractionStrategy

Development preview: this class (and all of the parser classes) is still under heavy development, and both its behavior and its interface are subject to change.
A text extraction renderer that keeps track of the relative position of text on the page. The resultant text will be relatively consistent with the physical layout that most PDF files have on screen.
This renderer keeps track of the orientation of each piece of text and its distance (both perpendicular and parallel) relative to the unit vector of that orientation. Text is ordered by orientation, then by perpendicular distance, then by parallel distance. Text with the same perpendicular distance but a different parallel distance is treated as being on the same line.
This renderer also uses a simple strategy based on the font metrics to determine if a blank space should be inserted into the output.
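As a rough illustration of the ordering described above, the following self-contained Java sketch models each chunk by the baseline start and end points of its text. `ChunkOrderingSketch`, `Chunk` and `assemble` are hypothetical names, not iText's `TextChunk` or its internals. Computing the orientation as an angle, the perpendicular distance as a 2D cross product and the parallel distance as a dot product is an assumed way of realizing the description, and space insertion is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ChunkOrderingSketch {

    /** Hypothetical stand-in for a located text chunk; NOT the iText TextChunk class. */
    static final class Chunk {
        final String text;
        final int orientationMagnitude; // baseline direction angle in milliradians (truncated)
        final int distPerpendicular;    // signed distance of the baseline from the origin (truncated)
        final float distParallelStart;  // start position measured along the baseline direction
        final float distParallelEnd;    // end position measured along the baseline direction

        // Assumes the baseline start (x0, y0) and end (x1, y1) points differ.
        Chunk(String text, float x0, float y0, float x1, float y1) {
            this.text = text;
            float dx = x1 - x0, dy = y1 - y0;
            float len = (float) Math.hypot(dx, dy);
            float ux = dx / len, uy = dy / len; // unit orientation vector
            this.orientationMagnitude = (int) (Math.atan2(uy, ux) * 1000);
            // z component of (start x unit vector): perpendicular distance from the origin
            this.distPerpendicular = (int) (x0 * uy - y0 * ux);
            this.distParallelStart = x0 * ux + y0 * uy;
            this.distParallelEnd = x1 * ux + y1 * uy;
        }

        boolean sameLine(Chunk other) {
            return orientationMagnitude == other.orientationMagnitude
                && distPerpendicular == other.distPerpendicular;
        }
    }

    /** Orientation first, then perpendicular distance, then parallel distance. */
    static final Comparator<Chunk> ORDER =
        Comparator.<Chunk>comparingInt(c -> c.orientationMagnitude)
                  .thenComparingInt(c -> c.distPerpendicular)
                  .thenComparingDouble(c -> c.distParallelStart);

    /** Sorts the chunks and starts a new line whenever orientation or perpendicular distance changes. */
    static String assemble(List<Chunk> chunks) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        sorted.sort(ORDER);
        StringBuilder sb = new StringBuilder();
        Chunk previous = null;
        for (Chunk c : sorted) {
            if (previous != null && !c.sameLine(previous)) {
                sb.append('\n');
            }
            sb.append(c.text); // space insertion is intentionally omitted in this sketch
            previous = c;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Chunk> chunks = List.of(
            new Chunk("World", 40, 700, 70, 700),
            new Chunk("Line2", 10, 680, 40, 680),
            new Chunk("Hello", 10, 700, 35, 700));
        System.out.println(assemble(chunks)); // HelloWorld, then Line2 on the next line
    }
}
```

Because PDF user space has y increasing upward, the line at y = 700 gets a smaller (more negative) perpendicular distance than the line at y = 680 in this model, so it sorts first even though its chunks arrived out of order.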

Since:: 5.0.2

Nested Class Summary
`static class`	`LocationTextExtractionStrategy.TextChunk` Represents a chunk of text, its orientation, and its location relative to the orientation vector.
`static interface`	`LocationTextExtractionStrategy.TextChunkFilter` Specifies a filter for filtering `LocationTextExtractionStrategy.TextChunk` objects during text extraction

Constructor Summary
`LocationTextExtractionStrategy()` Creates a new text extraction renderer.

Method Summary
`void`	`beginTextBlock()` Called when a new text block is beginning (i.e. BT).
`void`	`endTextBlock()` Called when a text block has ended (i.e. ET).
`String`	`getResultantText()` Returns the result so far.
`String`	`getResultantText(LocationTextExtractionStrategy.TextChunkFilter chunkFilter)` Gets text that meets the specified filter.
`protected boolean`	`isChunkAtWordBoundary(LocationTextExtractionStrategy.TextChunk chunk, LocationTextExtractionStrategy.TextChunk previousChunk)` Determines if a space character should be inserted between a previous chunk and the current chunk.
`void`	`renderImage(ImageRenderInfo renderInfo)` no-op method - this renderer isn't interested in image events
`void`	`renderText(TextRenderInfo renderInfo)` Called when text should be rendered

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

LocationTextExtractionStrategy

public LocationTextExtractionStrategy()

Creates a new text extraction renderer.

Method Detail

beginTextBlock

public void beginTextBlock()

Description copied from interface: RenderListener

Called when a new text block is beginning (i.e. BT)

Specified by:: beginTextBlock in interface RenderListener

See Also:: RenderListener.beginTextBlock()

endTextBlock

public void endTextBlock()

Description copied from interface: RenderListener

Called when a text block has ended (i.e. ET)

Specified by:: endTextBlock in interface RenderListener

See Also:: RenderListener.endTextBlock()

isChunkAtWordBoundary

protected boolean isChunkAtWordBoundary(LocationTextExtractionStrategy.TextChunk chunk,
                                        LocationTextExtractionStrategy.TextChunk previousChunk)

Determines if a space character should be inserted between a previous chunk and the current chunk. This method is exposed as a callback so subclasses can fine-tune the algorithm for determining whether a space should be inserted. By default, this method indicates that a space is needed if there is a gap of more than half the font's space character width between the end of the previous chunk and the beginning of the current chunk. It also indicates that a space is needed if the starting point of the new chunk appears *before* the end of the previous chunk (i.e. overlapping text).

Parameters:: chunk - the new chunk being evaluated; previousChunk - the chunk that appeared immediately before the current chunk
Returns:: true if the two chunks represent different words (i.e. should have a space between them), false otherwise.
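The default rule above, and the way a subclass can override it, can be sketched in self-contained Java. `WordBoundarySketch`, `Chunk`, `LineJoiner` and `StrictLineJoiner` are hypothetical; the chunk is reduced to its start and end parallel distances plus a space width, and taking the space width from the current chunk is an assumption. This is a model of the documented behavior, not iText's code.

```java
import java.util.List;

public class WordBoundarySketch {

    /** Hypothetical chunk reduced to what the boundary test needs; not iText's TextChunk. */
    static final class Chunk {
        final String text;
        final float start;          // parallel distance where the chunk starts
        final float end;            // parallel distance where the chunk ends
        final float charSpaceWidth; // width of a space character in the chunk's font

        Chunk(String text, float start, float end, float charSpaceWidth) {
            this.text = text;
            this.start = start;
            this.end = end;
            this.charSpaceWidth = charSpaceWidth;
        }

        /** Gap from the end of the previous chunk to the start of this one; negative means overlap. */
        float distanceFromEndOf(Chunk previous) {
            return start - previous.end;
        }
    }

    /** Joins chunks already known to lie on one line, in reading order. */
    static class LineJoiner {
        /** Documented default: a gap wider than half a space, or any overlap, marks a word boundary. */
        protected boolean isChunkAtWordBoundary(Chunk chunk, Chunk previousChunk) {
            float dist = chunk.distanceFromEndOf(previousChunk);
            return dist > chunk.charSpaceWidth / 2.0f || dist < 0;
        }

        String join(List<Chunk> line) {
            StringBuilder sb = new StringBuilder();
            Chunk previous = null;
            for (Chunk c : line) {
                if (previous != null && isChunkAtWordBoundary(c, previous)) {
                    sb.append(' ');
                }
                sb.append(c.text);
                previous = c;
            }
            return sb.toString();
        }
    }

    /** Example subclass: only gaps wider than a full space count; overlaps never do. */
    static class StrictLineJoiner extends LineJoiner {
        @Override
        protected boolean isChunkAtWordBoundary(Chunk chunk, Chunk previousChunk) {
            return chunk.distanceFromEndOf(previousChunk) > chunk.charSpaceWidth;
        }
    }

    public static void main(String[] args) {
        List<Chunk> line = List.of(
            new Chunk("Hel", 0, 15, 4),
            new Chunk("lo", 15.5f, 25, 4),
            new Chunk("World", 29, 55, 4));
        System.out.println(new LineJoiner().join(line));       // Hello World
        System.out.println(new StrictLineJoiner().join(line)); // HelloWorld
    }
}
```

With a space width of 4, the 0.5-unit gap before "lo" stays inside the word, while the 4-unit gap before "World" exceeds half a space under the default rule but not the full space required by the stricter override.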

getResultantText

public String getResultantText(LocationTextExtractionStrategy.TextChunkFilter chunkFilter)

Gets text that meets the specified filter. If multiple text extractions will be performed for the same page (i.e. for different physical regions of the page), filtering at this level is more efficient than filtering using FilteredRenderListener, but not nearly as powerful, because most of the RenderInfo state is not captured in LocationTextExtractionStrategy.TextChunk.

Parameters:: chunkFilter - the filter to apply
Returns:: the text results so far, filtered using the specified filter
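The efficiency point above (collect the chunks once, then filter them several times for different regions) can be illustrated with a hypothetical, self-contained Java sketch. `ChunkFilterSketch`, `ChunkFilter`, `Chunk` and `region` are invented for illustration; `ChunkFilter` only mimics the idea of a single-method predicate over chunks and is not the real `TextChunkFilter` interface. Treating a null filter as "accept everything" is a choice made in this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

public class ChunkFilterSketch {

    /** Hypothetical chunk: text plus the baseline start point in user-space units. */
    static final class Chunk {
        final String text;
        final float x, y;

        Chunk(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Hypothetical single-method filter over chunks. */
    @FunctionalInterface
    interface ChunkFilter {
        boolean accept(Chunk chunk);
    }

    // Chunks collected once per page, assumed to be already in reading order.
    private final List<Chunk> chunks = new ArrayList<>();

    void add(Chunk chunk) {
        chunks.add(chunk);
    }

    /** Joins accepted chunks with single spaces; a null filter accepts every chunk. */
    String getResultantText(ChunkFilter filter) {
        StringJoiner out = new StringJoiner(" ");
        for (Chunk c : chunks) {
            if (filter == null || filter.accept(c)) {
                out.add(c.text);
            }
        }
        return out.toString();
    }

    /** Accepts chunks whose start point lies inside the axis-aligned rectangle (inclusive). */
    static ChunkFilter region(float llx, float lly, float urx, float ury) {
        return c -> c.x >= llx && c.x <= urx && c.y >= lly && c.y <= ury;
    }

    public static void main(String[] args) {
        ChunkFilterSketch page = new ChunkFilterSketch();
        page.add(new Chunk("Header", 50, 800));
        page.add(new Chunk("Body", 50, 400));
        page.add(new Chunk("Footer", 50, 30));
        // Two regional extractions reuse the same collected chunks; nothing is re-parsed.
        System.out.println(page.getResultantText(region(0, 700, 600, 842))); // Header
        System.out.println(page.getResultantText(region(0, 0, 600, 100)));   // Footer
    }
}
```

The filter sees only what the chunk stores (here, text and a point), which mirrors the documented trade-off: cheap to re-run, but with far less state available than a RenderInfo-level filter.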

getResultantText

public String getResultantText()

Returns the result so far.

Specified by:: getResultantText in interface TextExtractionStrategy

Returns:: a String with the resulting text.

renderText

public void renderText(TextRenderInfo renderInfo)

Description copied from interface: RenderListener

Called when text should be rendered

Specified by:: renderText in interface RenderListener

Parameters:: renderInfo - information specifying what to render
See Also:: RenderListener.renderText(com.itextpdf.text.pdf.parser.TextRenderInfo)
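To show the order in which a listener's callbacks fire (beginTextBlock at BT, renderText for each text-showing operation, endTextBlock at ET, and renderImage for images, which this class documents as a no-op), here is a hypothetical, self-contained Java sketch. `ListenerSketch`, `Listener`, `TextEvent` and `RecordingStrategy` are invented stand-ins: the real callback receives a TextRenderInfo, and this sketch returns text in arrival order rather than sorting it by location as the class description says LocationTextExtractionStrategy does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

public class ListenerSketch {

    /** Hypothetical, simplified text event standing in for TextRenderInfo. */
    static final class TextEvent {
        final String text;
        final float x, y; // baseline start point in user-space units

        TextEvent(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Hypothetical listener with the same four callbacks as the documented RenderListener. */
    interface Listener {
        void beginTextBlock();            // BT
        void renderText(TextEvent event); // one text-showing operation
        void endTextBlock();              // ET
        void renderImage(Object image);   // image events
    }

    /** Keeps every text event; block markers and images are deliberately ignored. */
    static final class RecordingStrategy implements Listener {
        private final List<TextEvent> events = new ArrayList<>();

        @Override public void beginTextBlock() { /* nothing to do in this sketch */ }
        @Override public void renderText(TextEvent event) { events.add(event); }
        @Override public void endTextBlock() { /* nothing to do in this sketch */ }
        @Override public void renderImage(Object image) { /* not interested in images */ }

        int count() {
            return events.size();
        }

        /** Text in arrival order, joined with spaces; no location-based sorting here. */
        String resultSoFar() {
            StringJoiner out = new StringJoiner(" ");
            for (TextEvent e : events) {
                out.add(e.text);
            }
            return out.toString();
        }
    }

    public static void main(String[] args) {
        RecordingStrategy strategy = new RecordingStrategy();
        // Simulated callback sequence for one BT ... ET block followed by an image
        strategy.beginTextBlock();
        strategy.renderText(new TextEvent("Hello", 72, 720));
        strategy.renderText(new TextEvent("PDF", 110, 720));
        strategy.endTextBlock();
        strategy.renderImage(null);
        System.out.println(strategy.resultSoFar()); // Hello PDF
    }
}
```

In iText itself the content parser, not user code, drives these callbacks; the simulated sequence in `main` only stands in for that.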

renderImage

public void renderImage(ImageRenderInfo renderInfo)

no-op method - this renderer isn't interested in image events

Specified by:: renderImage in interface RenderListener

Parameters:: renderInfo - information specifying what to render
Since:: 5.0.1
See Also:: RenderListener.renderImage(com.itextpdf.text.pdf.parser.ImageRenderInfo)
