public class TextAbsorber extends Object
Represents a text absorber object. Performs text extraction and provides access to the
result through the getText() method.
The following example demonstrates how to extract text from the first page of a PDF document:
// open the document
Document doc = new Document(inFile);
// create a TextAbsorber object to extract text
TextAbsorber absorber = new TextAbsorber();
// accept the absorber for the first page
doc.getPages().get(1).accept(absorber);
// get the extracted text
String extractedText = absorber.getText();
The TextAbsorber object is used to extract text from a PDF document or from one of its pages.
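To make the visitor relationship between documents, pages and the absorber concrete, here is a minimal, self-contained sketch in plain Java. It does not use Aspose.PDF: SketchDocument, SketchPage and SketchAbsorber are hypothetical stand-ins. Two behaviors are assumptions of the sketch, not statements about the library: visiting a document is treated as visiting each of its pages in order, and page texts are joined with a line break.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a PDF page: it only carries its text.
final class SketchPage {
    private final String text;

    SketchPage(String text) {
        this.text = text;
    }

    String text() {
        return text;
    }

    // Mirrors page.accept(absorber): the page hands itself to the visitor.
    void accept(SketchAbsorber absorber) {
        absorber.visit(this);
    }
}

// Hypothetical stand-in for a PDF document: an ordered list of pages.
final class SketchDocument {
    private final List<SketchPage> pages = new ArrayList<>();

    SketchDocument add(String pageText) {
        pages.add(new SketchPage(pageText));
        return this;
    }

    List<SketchPage> pages() {
        return pages;
    }
}

// Hypothetical absorber: accumulates the text of every page it visits.
final class SketchAbsorber {
    private final StringBuilder buffer = new StringBuilder();

    // Assumption: visiting a document means visiting each page in order.
    void visit(SketchDocument doc) {
        for (SketchPage page : doc.pages()) {
            visit(page);
        }
    }

    void visit(SketchPage page) {
        if (buffer.length() > 0) {
            buffer.append('\n'); // assumption: pages are separated by a line break
        }
        buffer.append(page.text());
    }

    String getText() {
        return buffer.toString();
    }
}

class SketchAbsorberDemo {
    public static void main(String[] args) {
        SketchDocument doc = new SketchDocument().add("first page").add("second page");

        // Whole document: every page is visited.
        SketchAbsorber whole = new SketchAbsorber();
        whole.visit(doc);
        System.out.println(whole.getText()); // "first page" and "second page" on two lines

        // Single page: only that page is visited.
        SketchAbsorber firstOnly = new SketchAbsorber();
        doc.pages().get(0).accept(firstOnly);
        System.out.println(firstOnly.getText()); // "first page"
    }
}
```

The design choice being illustrated is double dispatch: page.accept(absorber) and absorber.visit(page) reach the same code, which is why the examples on this page use both forms interchangeably.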
Constructor | Description |
---|---|
TextAbsorber() | Initializes a new instance of the TextAbsorber. |
TextAbsorber(TextExtractionOptions extractionOptions) | Initializes a new instance of the TextAbsorber with extraction options. |
TextAbsorber(TextExtractionOptions extractionOptions, TextSearchOptions textSearchOptions) | Initializes a new instance of the TextAbsorber with extraction and text search options. |
TextAbsorber(TextSearchOptions textSearchOptions) | Initializes a new instance of the TextAbsorber with text search options. |
Modifier and Type | Method and Description |
---|---|
List<TextExtractionError> | getErrors() Returns the list of TextExtractionError objects. |
TextExtractionOptions | getExtractionOptions() Gets the text extraction options. |
String | getText() Gets the text that the TextAbsorber extracted from the PDF document or page. |
TextSearchOptions | getTextSearchOptions() Gets the text search options. |
boolean | hasErrors() Indicates whether errors were found during text extraction. |
void | setExtractionOptions(TextExtractionOptions value) Sets the text extraction options. |
void | setTextSearchOptions(TextSearchOptions value) Sets the text search options. |
void | visit(IDocument pdf) Extracts text from the specified document. |
void | visit(Page page) Extracts text from the specified page. |
void | visit(XForm form) Extracts text from the specified XForm. |
public TextAbsorber()
Initializes a new instance of the TextAbsorber.
The following example demonstrates how to extract text from all pages of a PDF document:
// open the document
Document doc = new Document(inFile);
// create a TextAbsorber object to extract text
TextAbsorber absorber = new TextAbsorber();
// accept the absorber for all pages of the document
doc.getPages().accept(absorber);
// get the extracted text
String extractedText = absorber.getText();
Performs text extraction and provides access to the extracted text through the getText() method.
public TextAbsorber(TextExtractionOptions extractionOptions)
Initializes a new instance of the TextAbsorber with extraction options.
The following example demonstrates how to extract text from all pages of a PDF document using the Pure formatting mode:
// open the document
Document doc = new Document(inFile);
// create a TextAbsorber object that extracts text in Pure formatting mode
TextAbsorber absorber = new TextAbsorber(new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure));
// accept the absorber for all pages of the document
doc.getPages().accept(absorber);
// get the extracted text
String extractedText = absorber.getText();
Performs text extraction and provides access to the extracted text through the getText() method.
extractionOptions
- Text extraction options
public TextAbsorber(TextExtractionOptions extractionOptions, TextSearchOptions textSearchOptions)
Initializes a new instance of the TextAbsorber with extraction and text search options.
extractionOptions
- Text extraction options
textSearchOptions
- Text search options
Performs text extraction and provides access to the extracted text through the getText() method.
public TextAbsorber(TextSearchOptions textSearchOptions)
Initializes a new instance of the TextAbsorber with text search options.
textSearchOptions
- Text search options
Performs text extraction and provides access to the extracted text through the getText() method.
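The four constructors differ only in which option objects the caller supplies. One common way to implement such overloads in Java is to have each one delegate to the most general constructor and fill in defaults. The sketch below is hypothetical: FormattingModeSketch, ExtractionOptionsSketch, SearchOptionsSketch and OptionsAbsorberSketch are illustrative types, not the library's classes. The only default taken from this page is the Pure formatting mode; the "logging off" default for search options is an assumption, and how the real TextAbsorber stores its options is not documented here.

```java
// Illustrative formatting modes; only Pure is named as the default on this page.
enum FormattingModeSketch { PURE, RAW }

// Hypothetical stand-in for TextExtractionOptions.
final class ExtractionOptionsSketch {
    final FormattingModeSketch mode;

    ExtractionOptionsSketch(FormattingModeSketch mode) {
        this.mode = mode;
    }
}

// Hypothetical stand-in for TextSearchOptions (assumed to default to logging off).
final class SearchOptionsSketch {
    final boolean logErrors;

    SearchOptionsSketch(boolean logErrors) {
        this.logErrors = logErrors;
    }
}

final class OptionsAbsorberSketch {
    private ExtractionOptionsSketch extractionOptions;
    private SearchOptionsSketch searchOptions;

    OptionsAbsorberSketch() {
        this(new ExtractionOptionsSketch(FormattingModeSketch.PURE), new SearchOptionsSketch(false));
    }

    OptionsAbsorberSketch(ExtractionOptionsSketch extractionOptions) {
        this(extractionOptions, new SearchOptionsSketch(false));
    }

    OptionsAbsorberSketch(SearchOptionsSketch searchOptions) {
        this(new ExtractionOptionsSketch(FormattingModeSketch.PURE), searchOptions);
    }

    // The most general constructor: every other overload ends up here.
    OptionsAbsorberSketch(ExtractionOptionsSketch extractionOptions, SearchOptionsSketch searchOptions) {
        this.extractionOptions = extractionOptions;
        this.searchOptions = searchOptions;
    }

    ExtractionOptionsSketch getExtractionOptions() { return extractionOptions; }

    void setExtractionOptions(ExtractionOptionsSketch value) { extractionOptions = value; }

    SearchOptionsSketch getTextSearchOptions() { return searchOptions; }

    void setTextSearchOptions(SearchOptionsSketch value) { searchOptions = value; }
}

class OptionsAbsorberSketchDemo {
    public static void main(String[] args) {
        OptionsAbsorberSketch absorber = new OptionsAbsorberSketch();
        System.out.println(absorber.getExtractionOptions().mode); // PURE
        absorber.setExtractionOptions(new ExtractionOptionsSketch(FormattingModeSketch.RAW));
        System.out.println(absorber.getExtractionOptions().mode); // RAW
    }
}
```

Delegating to a single constructor keeps the defaults in one place, and the setters show why passing options to a constructor and setting them afterwards can be interchangeable in a design like this.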
public String getText()
Gets the text that the TextAbsorber extracted from the PDF document or page.
The following example demonstrates how to extract text from all pages of a PDF document:
// open the document
Document doc = new Document(inFile);
// create a TextAbsorber object to extract text
TextAbsorber absorber = new TextAbsorber();
// accept the absorber for all pages of the document
doc.getPages().accept(absorber);
// get the extracted text
String extractedText = absorber.getText();
public boolean hasErrors()
Indicates whether errors were found during text extraction. Errors are searched for only if TextSearchOptions.LogTextExtractionErrors is set to true, and enabling this option may decrease performance.
public List<TextExtractionError> getErrors()
Returns the list of TextExtractionError objects. The list contains information about errors found
during text extraction. Errors are searched for only if
TextSearchOptions.LogTextExtractionErrors is set to true, and enabling this option may decrease performance.
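Both hasErrors() and getErrors() depend on error logging having been switched on before extraction. The sketch below models that contract in plain Java: the hypothetical ExtractionErrorLog records errors only when its logging flag is true, so hasErrors() reports false when logging is off even if problems occurred. ExtractionErrorLog and its report() method are illustrative, not part of the library; the boolean flag merely stands in for TextSearchOptions.LogTextExtractionErrors.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the error-logging contract described above.
final class ExtractionErrorLog {
    private final boolean loggingEnabled; // stands in for LogTextExtractionErrors
    private final List<String> errors = new ArrayList<>();

    ExtractionErrorLog(boolean loggingEnabled) {
        this.loggingEnabled = loggingEnabled;
    }

    // Called by a (hypothetical) extractor whenever it hits a problem.
    void report(String message) {
        if (loggingEnabled) {
            errors.add(message); // errors are recorded only when logging is on
        }
    }

    boolean hasErrors() {
        return !errors.isEmpty();
    }

    List<String> getErrors() {
        return Collections.unmodifiableList(errors);
    }
}

class ExtractionErrorLogDemo {
    public static void main(String[] args) {
        ExtractionErrorLog off = new ExtractionErrorLog(false);
        off.report("unknown glyph");
        System.out.println(off.hasErrors()); // false: logging was disabled

        ExtractionErrorLog on = new ExtractionErrorLog(true);
        on.report("unknown glyph");
        System.out.println(on.hasErrors() + " " + on.getErrors()); // true [unknown glyph]
    }
}
```

The practical consequence, under this model, is that a false result from hasErrors() only means "no errors were found" when logging was enabled for that extraction.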
public void visit(Page page)
Extracts text from the specified page.
The following example demonstrates how to extract text from the first page of a PDF document:
// open the document
Document doc = new Document(inFile);
// create a TextAbsorber object to extract text
TextAbsorber absorber = new TextAbsorber();
// accept the absorber for the first page only
absorber.visit(doc.getPages().get(1));
// get the extracted text
String extractedText = absorber.getText();
page
- PDF document page object.
public void visit(XForm form)
Extracts text from the specified XForm.
The following example demonstrates how to extract text from the XForm named "Xform1" on the first page of a PDF document:
// open the document
Document doc = new Document(inFile);
// create a TextAbsorber object to extract text
TextAbsorber absorber = new TextAbsorber();
// accept the absorber for the XForm
absorber.visit(doc.getPages().get(1).getResources().getForms().get("Xform1"));
// get the extracted text
String extractedText = absorber.getText();
form
- PDF form (XForm) object.
public void visit(IDocument pdf)
Extracts text from the specified document.
The following example demonstrates how to extract text from a PDF document:
// open the document
Document doc = new Document(inFile);
// create a TextAbsorber object to extract text
TextAbsorber absorber = new TextAbsorber();
// accept the absorber for all pages of the document
absorber.visit(doc);
// get the extracted text
String extractedText = absorber.getText();
pdf
- PDF document object.
public TextExtractionOptions getExtractionOptions()
Gets the text extraction options.
The following example demonstrates how to set the Pure text formatting mode and perform text extraction:
// open the document
Document doc = new Document(inFile);
// create a TextAbsorber object to extract text
TextAbsorber absorber = new TextAbsorber();
// set the Pure text formatting mode
absorber.setExtractionOptions(new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure));
// accept the absorber for all pages of the document
doc.getPages().accept(absorber);
// get the extracted text
String extractedText = absorber.getText();
Allows you to define the text formatting mode (TextExtractionOptions) used during extraction. The
default mode is TextExtractionOptions.TextFormattingMode.Pure.
public void setExtractionOptions(TextExtractionOptions value)
Sets the text extraction options.
The following example demonstrates how to set the Pure text formatting mode and perform text extraction:
// open the document
Document doc = new Document(inFile);
// create a TextAbsorber object to extract text
TextAbsorber absorber = new TextAbsorber();
// set the Pure text formatting mode
absorber.setExtractionOptions(new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure));
// accept the absorber for all pages of the document
doc.getPages().accept(absorber);
// get the extracted text
String extractedText = absorber.getText();
Allows you to define the text formatting mode (TextExtractionOptions) used during extraction. The
default mode is TextExtractionOptions.TextFormattingMode.Pure.
value
- TextExtractionOptions value
public TextSearchOptions getTextSearchOptions()
Gets the text search options. These options allow you to define a rectangle that delimits the extracted text. By default the rectangle is empty, which means that only the page boundaries define the text extraction region.
public void setTextSearchOptions(TextSearchOptions value)
Sets the text search options. These options allow you to define a rectangle that delimits the extracted text. By default the rectangle is empty, which means that only the page boundaries define the text extraction region.
value
- TextSearchOptions value
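The rectangle rule described for the text search options (an empty rectangle means the page boundaries alone limit extraction) can be illustrated with a small geometric filter. This is a hypothetical sketch in plain Java 16+ rather than Aspose.PDF code: Box, Fragment and RegionFilter are illustrative types. Two choices are assumptions: a fragment counts as inside the region only when its bounding box lies entirely within it, and the kept fragments are joined with single spaces. The library may use a different inclusion test.

```java
import java.util.ArrayList;
import java.util.List;

// Axis-aligned rectangle in page coordinates; a zero-area box counts as "empty".
record Box(double llx, double lly, double urx, double ury) {
    boolean isEmpty() {
        return urx <= llx || ury <= lly;
    }

    // Assumed inclusion test: the other box must lie entirely inside this one.
    boolean contains(Box other) {
        return other.llx() >= llx && other.lly() >= lly
            && other.urx() <= urx && other.ury() <= ury;
    }
}

// A piece of text together with its bounding box.
record Fragment(String text, Box bounds) {}

final class RegionFilter {
    // Keeps the fragments inside the search rectangle, falling back to the
    // page box when the rectangle is empty.
    static String extract(List<Fragment> fragments, Box searchRect, Box pageBox) {
        Box region = searchRect.isEmpty() ? pageBox : searchRect;
        List<String> kept = new ArrayList<>();
        for (Fragment f : fragments) {
            if (region.contains(f.bounds())) {
                kept.add(f.text());
            }
        }
        return String.join(" ", kept);
    }
}

class RegionFilterDemo {
    public static void main(String[] args) {
        Box page = new Box(0, 0, 612, 792); // illustrative US Letter page in points
        List<Fragment> fragments = List.of(
            new Fragment("Header", new Box(50, 740, 200, 760)),
            new Fragment("Body", new Box(50, 400, 300, 420)));

        Box empty = new Box(0, 0, 0, 0);
        System.out.println(RegionFilter.extract(fragments, empty, page)); // Header Body

        Box topBand = new Box(0, 700, 612, 792);
        System.out.println(RegionFilter.extract(fragments, topBand, page)); // Header
    }
}
```

Falling back to the page box keeps a single code path: an unset rectangle and "the whole page" are handled by the same containment check.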