public final class TextDevice extends PageDevice
Converts PDF document pages into text.
The example demonstrates how to extract text from the first page of a PDF document.

    Document doc = new Document(inFile);
    String extractedText;
    ByteArrayOutputStream ms = new ByteArrayOutputStream();
    try {
        // create text device
        TextDevice device = new TextDevice();
        // convert the page and save text to the stream
        device.process(doc.getPages().get_Item(1), ms);
        // use the extracted text
        extractedText = Encoding.getUnicode().getString(ms.toByteArray());
        ms.close();
    } catch (IOException e) {
        e.printStackTrace();
    }

The TextDevice object is used to extract text from a PDF page.
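
The decoding step in the example above relies on `Encoding.getUnicode()`, a helper from the library's ported .NET types. As a minimal sketch of the same step using only the JDK, and assuming that "Unicode" here means UTF-16 little-endian as it does in .NET, the captured bytes could be decoded as shown. The class name `ExtractedTextDecoder` and the helper `decodeUtf16Le` are hypothetical, and this reference does not say whether the device writes a byte order mark, so the sketch tolerates both cases.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class ExtractedTextDecoder {

    // Hypothetical helper: decodes bytes captured from an OutputStream,
    // assuming they are UTF-16LE text, and drops a leading BOM if present.
    public static String decodeUtf16Le(byte[] bytes) {
        String text = new String(bytes, StandardCharsets.UTF_16LE);
        if (!text.isEmpty() && text.charAt(0) == '\uFEFF') {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream ms = new ByteArrayOutputStream();
        // Stand-in for device.process(doc.getPages().get_Item(1), ms):
        // write some UTF-16LE bytes so the sketch runs without the library.
        byte[] sample = "Hello, PDF".getBytes(StandardCharsets.UTF_16LE);
        ms.write(sample, 0, sample.length);

        String extractedText = decodeUtf16Le(ms.toByteArray());
        System.out.println(extractedText);
    }
}
```

If the device is constructed with a different encoding, the same charset would need to be used when decoding the stream contents.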
Constructor and Description |
---|
TextDevice() Initializes a new instance of the TextDevice with the Raw text formatting mode and Unicode text encoding. |
TextDevice(Charset encoding) Initializes a new instance of the TextDevice for the specified encoding. |
TextDevice(TextEncodingInternal encoding) Initializes a new instance of the TextDevice for the specified encoding. |
TextDevice(TextExtractionOptions extractionOptions) Initializes a new instance of the TextDevice with text extraction options. |
TextDevice(TextExtractionOptions extractionOptions, Charset encoding) Initializes a new instance of the TextDevice for the specified encoding with text extraction options. |
TextDevice(TextExtractionOptions extractionOptions, TextEncodingInternal encoding) Initializes a new instance of the TextDevice for the specified encoding with text extraction options. |
Modifier and Type | Method and Description |
---|---|
Charset | getEncoding() Gets encoding of extracted text. |
TextEncodingInternal | getEncodingInternal() Gets encoding of extracted text. |
TextExtractionOptions | getExtractionOptions() Gets text extraction options. |
void | process(Page page, OutputStream output) Converts the page and saves it as a text stream. |
void | processInternal(Page page, com.aspose.ms.System.IO.Stream output) Converts the page and saves it as a text stream. |
void | setEncoding(Charset value) Sets encoding of extracted text. |
void | setEncodingInternal(TextEncodingInternal value) Sets encoding of extracted text. |
void | setExtractionOptions(TextExtractionOptions value) Sets text extraction options. |
Methods inherited from class PageDevice: process, process
public TextDevice(TextExtractionOptions extractionOptions)
Initializes a new instance of the TextDevice
with text extraction options.
Parameters:
extractionOptions - Text extraction options.
public TextDevice()
Initializes a new instance of the TextDevice
with the Raw text formatting mode and
Unicode text encoding.
public TextDevice(TextEncodingInternal encoding)
Initializes a new instance of the TextDevice
for the specified encoding.
Parameters:
encoding - Encoding of extracted text.
public TextDevice(Charset encoding)
Initializes a new instance of the TextDevice
for the specified encoding.
Parameters:
encoding - Encoding of extracted text.
public TextDevice(TextExtractionOptions extractionOptions, TextEncodingInternal encoding)
Initializes a new instance of the TextDevice
for the specified encoding with text
extraction options.
Parameters:
extractionOptions - Text extraction options.
encoding - Encoding of extracted text.
public TextDevice(TextExtractionOptions extractionOptions, Charset encoding)
Initializes a new instance of the TextDevice
for the specified encoding with text
extraction options.
Parameters:
extractionOptions - Text extraction options.
encoding - Encoding of extracted text.
public TextExtractionOptions getExtractionOptions()
Gets text extraction options.
The example demonstrates how to extract text in raw order.

    Document doc = new Document(inFile);
    // create text device
    TextDevice device = new TextDevice(new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Raw));
    // convert the page and save text to the stream
    device.process(doc.getPages().get_Item(1), outFile);

public void setExtractionOptions(TextExtractionOptions value)
Sets text extraction options.
Parameters:
value - TextExtractionOptions element
The example demonstrates how to extract text in raw order.

    Document doc = new Document(inFile);
    // create text device
    TextDevice device = new TextDevice(new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Raw));
    // convert the page and save text to the stream
    device.process(doc.getPages().get_Item(1), outFile);

public TextEncodingInternal getEncodingInternal()
Gets encoding of extracted text.
The example demonstrates how to represent extracted text in UTF-8 encoding.

    Document doc = new Document(inFile);
    // create text device
    TextDevice device = new TextDevice(java.nio.charset.Charset.forName("UTF-8"));
    // convert the page and save text to the stream
    device.process(doc.getPages().get_Item(1), outFile);

public Charset getEncoding()
Gets encoding of extracted text.
The example demonstrates how to represent extracted text in UTF-8 encoding.

    Document doc = new Document(inFile);
    // create text device
    TextDevice device = new TextDevice(java.nio.charset.Charset.forName("UTF-8"));
    // convert the page and save text to the stream
    device.process(doc.getPages().get_Item(1), outFile);

public void setEncodingInternal(TextEncodingInternal value)
Sets encoding of extracted text.
Parameters:
value - TextEncodingInternal element
The example demonstrates how to represent extracted text in UTF-8 encoding.

    Document doc = new Document(inFile);
    // create text device
    TextDevice device = new TextDevice(TextEncodingInternal.getUTF8());
    // convert the page and save text to the stream
    device.process(doc.getPages().get_Item(1), outFile);

public void setEncoding(Charset value)
Sets encoding of extracted text.
Parameters:
value - Charset element
The example demonstrates how to represent extracted text in UTF-8 encoding.

    Document doc = new Document(inFile);
    // create text device
    TextDevice device = new TextDevice(java.nio.charset.Charset.forName("UTF-8"));
    // convert the page and save text to the stream
    device.process(doc.getPages().get_Item(1), outFile);

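
The `Charset` overloads above take a standard `java.nio.charset.Charset`. As a small JDK-only sketch (the class name `CharsetChoice`, the helper `encodedLength`, and the sample string are illustrative and not part of the library), `StandardCharsets.UTF_8` is the same charset as `Charset.forName("UTF-8")` without the lookup by name, and the chosen charset determines how many bytes a given piece of text occupies in the output.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetChoice {

    // Number of bytes a string occupies when encoded with the given charset.
    public static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        Charset byName = Charset.forName("UTF-8");
        Charset constant = StandardCharsets.UTF_8;
        // Both refer to the same charset, so either can be passed to an API
        // that expects a Charset.
        System.out.println(byName.equals(constant));

        String sample = "Text";
        // ASCII characters take one byte each in UTF-8 and two in UTF-16LE.
        System.out.println(encodedLength(sample, StandardCharsets.UTF_8));
        System.out.println(encodedLength(sample, StandardCharsets.UTF_16LE));
    }
}
```

With the library on the classpath, the constant could be passed wherever a `Charset` is expected, for example `new TextDevice(StandardCharsets.UTF_8)`.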
public void processInternal(Page page, com.aspose.ms.System.IO.Stream output)
Converts the page and saves it as a text stream.
The example demonstrates how to extract text from the first page of a PDF document.

    Document doc = new Document(inFile);
    String extractedText;
    ByteArrayOutputStream ms = new ByteArrayOutputStream();
    // create text device
    TextDevice device = new TextDevice();
    // convert the page and save text to the stream
    device.process(doc.getPages().get_Item(1), ms);
    // use the extracted text
    extractedText = Encoding.getUnicode().getString(ms.toByteArray());
    ms.close();

processInternal in class PageDevice
Parameters:
page - The page to convert.
output - Result stream.
public void process(Page page, OutputStream output)
Converts the page and saves it as a text stream.
The example demonstrates how to extract text from the first page of a PDF document.

    Document doc = new Document(inFile);
    String extractedText;
    ByteArrayOutputStream ms = new ByteArrayOutputStream();
    // create text device
    TextDevice device = new TextDevice();
    // convert the page and save text to the stream
    device.process(doc.getPages().get_Item(1), ms);
    // use the extracted text
    extractedText = Encoding.getUnicode().getString(ms.toByteArray());
    ms.close();

process in class PageDevice
Parameters:
page - The page to convert.
output - Result stream.
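
Because `process` accepts any `java.io.OutputStream`, the text can also be written straight to a file stream. The following is a minimal sketch of that pattern using try-with-resources, so the stream is closed even if the conversion fails. The `PageWriter` interface and the `writeTo` helper are hypothetical stand-ins that let the sketch run without the library; in real code the lambda body would be the `device.process(page, out)` call.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamToFile {

    // Hypothetical stand-in for a call such as device.process(page, out).
    public interface PageWriter {
        void write(OutputStream out) throws IOException;
    }

    // Opens the target file, hands the stream to the writer, and always closes it.
    public static void writeTo(Path target, PageWriter writer) throws IOException {
        try (OutputStream out = Files.newOutputStream(target)) {
            writer.write(out);
        }
    }

    public static void main(String[] args) throws IOException {
        Path outFile = Files.createTempFile("page-text", ".txt");
        // Real code would call: device.process(doc.getPages().get_Item(1), out)
        writeTo(outFile, out -> out.write("page text".getBytes(StandardCharsets.UTF_8)));
        System.out.println(new String(Files.readAllBytes(outFile), StandardCharsets.UTF_8));
        Files.delete(outFile);
    }
}
```

The bytes in the file are in whatever encoding the device was configured with, so a reader of the file should decode it with the same charset.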