java.lang.Object
    com.itextpdf.text.pdf.parser.LocationTextExtractionStrategy
public class LocationTextExtractionStrategy
implements TextExtractionStrategy
Development preview: this class (and all of the parser classes) is still under heavy development and is subject to change in both behavior and interface.
A text extraction renderer that keeps track of the relative position of text on the page.
The resultant text will be relatively consistent with the physical layout that most
PDF files have on screen.
This renderer keeps track of the orientation of each piece of text and of its distance,
both perpendicular and parallel, relative to the unit vector of that orientation. Text is
ordered by orientation, then by perpendicular distance, then by parallel distance. Text
with the same perpendicular distance but a different parallel distance is treated as
being on the same line.
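The ordering rule above can be sketched in plain Java. This is a minimal illustration, not iText's implementation. The Chunk class is hypothetical, and two of its choices are assumptions of the sketch: the angle is quantized to milliradians, and the perpendicular distance is rounded to whole units. Each chunk is reduced to three values. The first is its orientation angle. The second is its signed perpendicular distance, computed as the cross product of its start point with the unit orientation vector. The third is its parallel position, computed as the dot product. Chunks are then sorted on those three keys in that order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of orientation / perpendicular / parallel ordering.
// Chunk is a hypothetical stand-in, not iText's LocationTextExtractionStrategy.TextChunk.
public class ChunkOrdering {

    static final class Chunk {
        final String text;
        final int orientation;          // baseline angle, quantized to milliradians (assumption)
        final int distPerpendicular;    // signed distance across the baseline, rounded (assumption)
        final double distParallelStart; // position of the start point along the baseline

        Chunk(String text, double startX, double startY, double endX, double endY) {
            this.text = text;
            double len = Math.hypot(endX - startX, endY - startY);
            double ux = (endX - startX) / len; // unit vector of the orientation
            double uy = (endY - startY) / len;
            this.orientation = (int) Math.round(Math.atan2(uy, ux) * 1000);
            // z component of (start x unit vector): perpendicular distance from the origin
            this.distPerpendicular = (int) Math.round(startX * uy - startY * ux);
            // dot product: distance of the start point along the orientation
            this.distParallelStart = startX * ux + startY * uy;
        }

        // Same orientation and same perpendicular distance: same line.
        boolean sameLine(Chunk other) {
            return orientation == other.orientation
                && distPerpendicular == other.distPerpendicular;
        }
    }

    static final Comparator<Chunk> READING_ORDER =
        Comparator.comparingInt((Chunk c) -> c.orientation)
                  .thenComparingInt(c -> c.distPerpendicular)
                  .thenComparingDouble(c -> c.distParallelStart);

    static List<Chunk> sorted(List<Chunk> chunks) {
        List<Chunk> copy = new ArrayList<>(chunks);
        copy.sort(READING_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        // PDF user space: y grows upward, so the line at y = 700 sits above y = 680.
        List<Chunk> page = List.of(
            new Chunk("world", 60, 700, 90, 700),
            new Chunk("Second", 10, 680, 60, 680),
            new Chunk("Hello", 10, 700, 50, 700));
        for (Chunk c : sorted(page)) {
            System.out.println(c.text); // Hello, world, Second
        }
    }
}
```

Quantizing the orientation and the perpendicular distance lets nearly aligned chunks compare as equal. As a result, small floating-point differences do not split one visual line into two.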
This renderer also uses a simple strategy based on the font metrics to determine if
a blank space should be inserted into the output.
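One common form of such a heuristic compares two things: the gap between the end of the previous chunk and the start of the current one, and the width of a space character in the current font. The sketch below shows that form. Its half-space threshold and its overlap rule are assumptions for illustration. Positions are plain numbers measured along the baseline rather than iText types.

```java
// Sketch of a font-metric word-boundary test. The half-space threshold and the
// overlap rule are assumptions for illustration, not iText's documented behavior.
public class WordBoundary {

    /**
     * @param prevEnd    position along the baseline where the previous chunk ends
     * @param curStart   position along the baseline where the current chunk starts
     * @param spaceWidth width of a space character in the current font and size
     * @return true if a blank space should be inserted between the two chunks
     */
    static boolean isAtWordBoundary(double prevEnd, double curStart, double spaceWidth) {
        double gap = curStart - prevEnd;
        // A visible gap wider than half a space, or an overlap wider than a full
        // space (text drawn back over earlier text), both count as a word break.
        return gap > spaceWidth / 2.0 || gap < -spaceWidth;
    }

    public static void main(String[] args) {
        double space = 2.5; // e.g. a 250/1000 em space glyph at 10 pt
        System.out.println(isAtWordBoundary(50.0, 50.2, space)); // false: glyphs nearly touch
        System.out.println(isAtWordBoundary(50.0, 53.0, space)); // true: gap 3.0 > 1.25
    }
}
```

Scaling the threshold by the space width, rather than using a fixed distance, keeps the test meaningful across font sizes.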
Nested Class Summary | |
---|---|
static class | LocationTextExtractionStrategy.TextChunk: Represents a chunk of text, its orientation, and its location relative to the orientation vector. |
static interface | LocationTextExtractionStrategy.TextChunkFilter: Specifies a filter for filtering LocationTextExtractionStrategy.TextChunk objects during text extraction. |
Constructor Summary | |
---|---|
LocationTextExtractionStrategy() | Creates a new text extraction renderer. |
Method Summary | |
---|---|
void | beginTextBlock(): Called when a new text block is beginning (i.e. at a BT operator). |
void | endTextBlock(): Called when a text block has ended (i.e. at an ET operator). |
String | getResultantText(): Returns the result so far. |
String | getResultantText(LocationTextExtractionStrategy.TextChunkFilter chunkFilter): Gets text that meets the specified filter. |
protected boolean | isChunkAtWordBoundary(LocationTextExtractionStrategy.TextChunk chunk, LocationTextExtractionStrategy.TextChunk previousChunk): Determines if a space character should be inserted between a previous chunk and the current chunk. |
void | renderImage(ImageRenderInfo renderInfo): No-op method; this renderer isn't interested in image events. |
void | renderText(TextRenderInfo renderInfo): Called when text should be rendered. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public LocationTextExtractionStrategy()
Method Detail |
---|
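public void beginTextBlock()
Description copied from interface: RenderListener
Called when a new text block is beginning (i.e. at a BT operator).
Specified by: beginTextBlock in interface RenderListener
See Also: RenderListener.beginTextBlock()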
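public void endTextBlock()
Description copied from interface: RenderListener
Called when a text block has ended (i.e. at an ET operator).
Specified by: endTextBlock in interface RenderListener
See Also: RenderListener.endTextBlock()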
protected boolean isChunkAtWordBoundary(LocationTextExtractionStrategy.TextChunk chunk, LocationTextExtractionStrategy.TextChunk previousChunk)
Determines if a space character should be inserted between a previous chunk and the current chunk.
Parameters:
chunk - the new chunk being evaluated
previousChunk - the chunk that appeared immediately before the current chunk
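Because isChunkAtWordBoundary is protected, a subclass can change the spacing decision without touching the rest of the extraction. The sketch below shows that template-method pattern. It uses hypothetical stand-in classes (Piece, LineAssembler, EagerSpacingAssembler) rather than the real iText types. The base rule is an assumption chosen for illustration: a boundary is a gap wider than half a space, or an overlap larger than a space. The sketch also assumes the pieces already arrive in reading order, with a line number attached.

```java
import java.util.List;

// Hypothetical stand-ins: Piece and LineAssembler only mirror the shape of an
// assembly step with an overridable boundary hook; they are not iText classes.
public class WordBoundaryHook {

    static final class Piece {
        final String text;
        final int line;          // pieces with the same value are on the same line
        final double start, end; // positions along the baseline
        final double spaceWidth; // width of a space in the piece's font

        Piece(String text, int line, double start, double end, double spaceWidth) {
            this.text = text; this.line = line;
            this.start = start; this.end = end; this.spaceWidth = spaceWidth;
        }
    }

    static class LineAssembler {
        // Overridable hook; the default rule here is an assumption of this sketch.
        protected boolean isChunkAtWordBoundary(Piece chunk, Piece previousChunk) {
            double gap = chunk.start - previousChunk.end;
            return gap > chunk.spaceWidth / 2.0 || gap < -chunk.spaceWidth;
        }

        // Assumes pieces are already in reading order.
        final String assemble(List<Piece> pieces) {
            StringBuilder sb = new StringBuilder();
            Piece prev = null;
            for (Piece p : pieces) {
                if (prev != null) {
                    if (p.line != prev.line) {
                        sb.append('\n'); // new line
                    } else if (isChunkAtWordBoundary(p, prev)
                            && !p.text.startsWith(" ") && !prev.text.endsWith(" ")) {
                        sb.append(' ');  // word break, unless a space is already present
                    }
                }
                sb.append(p.text);
                prev = p;
            }
            return sb.toString();
        }
    }

    // Example override: any gap wider than 10% of a space is a word break,
    // e.g. for documents set with very tight word spacing.
    static class EagerSpacingAssembler extends LineAssembler {
        @Override
        protected boolean isChunkAtWordBoundary(Piece chunk, Piece previousChunk) {
            return chunk.start - previousChunk.end > chunk.spaceWidth * 0.1;
        }
    }

    public static void main(String[] args) {
        List<Piece> pieces = List.of(
            new Piece("Hello", 0, 10.0, 35.0, 2.5),
            new Piece("world", 0, 36.0, 61.0, 2.5), // gap of 1.0: below half a space
            new Piece("Next", 1, 10.0, 30.0, 2.5));
        System.out.println(new LineAssembler().assemble(pieces));         // Helloworld / Next
        System.out.println(new EagerSpacingAssembler().assemble(pieces)); // Hello world / Next
    }
}
```

In the real class, an override would take the two LocationTextExtractionStrategy.TextChunk arguments from the signature of isChunkAtWordBoundary.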
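public String getResultantText(LocationTextExtractionStrategy.TextChunkFilter chunkFilter)
Gets text that meets the specified filter. If multiple text extractions will be performed for the same page (i.e. for different physical regions of the page), filtering at this level is more efficient than filtering using FilteredRenderListener, but not nearly as powerful, because most of the RenderInfo state is not captured in LocationTextExtractionStrategy.TextChunk.
Parameters:
chunkFilter - the filter to apply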
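The efficiency argument rests on parsing the page once and then filtering the collected chunks as many times as needed. The sketch below shows region-based filtering in that style. It uses a hypothetical ChunkFilter interface that only mirrors the idea of TextChunkFilter, namely one accept decision per chunk. Several details are assumptions of the sketch: the Chunk class, its start-point fields, the rectangle test, and treating a null filter as "keep everything". It also returns the accepted texts rather than assembling them into lines.

```java
import java.util.ArrayList;
import java.util.List;

public class RegionFilterDemo {

    // Hypothetical chunk: text plus the start point of its baseline.
    static final class Chunk {
        final String text;
        final double x, y;
        Chunk(String text, double x, double y) { this.text = text; this.x = x; this.y = y; }
    }

    // Hypothetical stand-in mirroring the idea of TextChunkFilter.
    interface ChunkFilter {
        boolean accept(Chunk chunk);
    }

    // Accepts chunks whose start point lies inside an axis-aligned rectangle.
    static ChunkFilter region(double llx, double lly, double urx, double ury) {
        return c -> c.x >= llx && c.x <= urx && c.y >= lly && c.y <= ury;
    }

    // Filters chunks that were collected once; the page is not parsed again.
    static List<String> filteredText(List<Chunk> collected, ChunkFilter filter) {
        List<String> out = new ArrayList<>();
        for (Chunk c : collected) {
            if (filter == null || filter.accept(c)) { // null keeps everything (assumption)
                out.add(c.text);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Chunk> page = List.of(
            new Chunk("Header", 50, 800),
            new Chunk("Body", 50, 400),
            new Chunk("Footer", 50, 30));
        System.out.println(filteredText(page, region(0, 700, 600, 842))); // [Header]
        System.out.println(filteredText(page, region(0, 0, 600, 100)));   // [Footer]
    }
}
```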
public String getResultantText()
Returns the result so far.
Specified by: getResultantText in interface TextExtractionStrategy
public void renderText(TextRenderInfo renderInfo)
Description copied from interface: RenderListener
Called when text should be rendered.
Specified by: renderText in interface RenderListener
Parameters:
renderInfo - information specifying what to render
See Also: RenderListener.renderText(com.itextpdf.text.pdf.parser.TextRenderInfo)
public void renderImage(ImageRenderInfo renderInfo)
No-op method; this renderer isn't interested in image events.
Specified by: renderImage in interface RenderListener
Parameters:
renderInfo - information specifying what to render
See Also: RenderListener.renderImage(com.itextpdf.text.pdf.parser.ImageRenderInfo)
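The RenderListener callbacks are driven by the content-stream parser. It announces the start of a text object, reports each piece of text (and each image), and then announces the end. The sketch below simulates that sequence against a hypothetical Listener interface. Listener stands in for RenderListener but takes simplified arguments: plain strings and coordinates instead of TextRenderInfo and ImageRenderInfo. The collector joins text in arrival order, so it deliberately omits the location-based sorting that this class performs.

```java
import java.util.ArrayList;
import java.util.List;

public class ListenerFlowDemo {

    // Hypothetical stand-in for the RenderListener callbacks, with simplified arguments.
    interface Listener {
        void beginTextBlock();
        void renderText(String text, double x, double y);
        void renderImage(String imageName);
        void endTextBlock();
    }

    static final class CollectingStrategy implements Listener {
        private final List<String> chunks = new ArrayList<>();
        private int blocks = 0;

        @Override public void beginTextBlock() { blocks++; } // a BT ... ET text object starts
        @Override public void endTextBlock() { }             // nothing to flush in this sketch
        @Override public void renderText(String text, double x, double y) {
            chunks.add(text); // positions are ignored in this simplified collector
        }
        @Override public void renderImage(String imageName) { } // no-op: images are ignored

        String getResultantText() { return String.join(" ", chunks); }
        int blockCount() { return blocks; }
    }

    public static void main(String[] args) {
        CollectingStrategy strategy = new CollectingStrategy();
        // Simulated event sequence, in the order a content-stream parser might emit it.
        strategy.beginTextBlock();
        strategy.renderText("Hello", 10, 700);
        strategy.renderText("world", 40, 700);
        strategy.endTextBlock();
        strategy.renderImage("logo");
        System.out.println(strategy.getResultantText()); // Hello world
    }
}
```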