com.aspose.pdf


Class TextFragmentAbsorber



  • public final class TextFragmentAbsorber
    extends TextAbsorber

    Represents an absorber object of text fragments. Performs a text search and provides access to the search results through the TextFragments collection (getTextFragments()).


     The example demonstrates how to find text on the first PDF document page and replace the text and its font.
    
     // Open document
     Document doc = new Document("D:\\Tests\\input.pdf");
     // Find font that will be used to change document text font
     com.aspose.pdf.Font font = FontRepository.findFont("Arial");
     // Create TextFragmentAbsorber object to find all "hello world" text occurrences
     TextFragmentAbsorber absorber = new TextFragmentAbsorber("hello world");
     // Accept the absorber for first page
     doc.getPages().get(1).accept(absorber);
     // Change text and font of the first text occurrence
     absorber.getTextFragments().get_Item(1).setText ( "hi world");
     absorber.getTextFragments().get_Item(1).getTextState().setFont ( font);
     // Save document
     doc.save("D:\\Tests\\output.pdf");
     

    The TextFragmentAbsorber object is used in text search scenarios. When the search completes, each occurrence is represented by a TextFragment object in the TextFragmentAbsorber.TextFragments collection. A TextFragment gives access to the occurrence's text and text properties, and lets you edit the text and change its text state (font, font size, color, etc.).

    • Constructor Detail

      • TextFragmentAbsorber

        public TextFragmentAbsorber()

        Initializes a new instance of the TextFragmentAbsorber that performs search of all text segments of the document or page.


         The example demonstrates how to find text on the first PDF document page and replace the text.
        
         // Open document
         Document doc = new Document("D:\\Tests\\input.pdf");
         // Find font that will be used to change document text font
         Font font = FontRepository.findFont("Arial");
         // Create TextFragmentAbsorber object
         TextFragmentAbsorber absorber = new TextFragmentAbsorber();
         // Make the absorber search for all "hello world" text occurrences
         absorber.setPhrase ( "hello world");
         // Accept the absorber for first page
         doc.getPages().get(1).accept(absorber);
         // Change text of the first text occurrence
         absorber.getTextFragments().get_Item(1).setText ( "hi world");
         // Save document
         doc.save("D:\\Tests\\output.pdf");
         

        Performs text search and provides access to search results via TextFragmentAbsorber.TextFragments collection.

      • TextFragmentAbsorber

        public TextFragmentAbsorber(TextEditOptions textEditOptions)

        Initializes a new instance of the TextFragmentAbsorber with text edit options, that performs search of all text segments of the document or page.


         The example demonstrates how to find all text fragments on the first PDF document page and replace font for them.
        
          // Open document
          Document doc = new Document("D:\\Tests\\input.pdf");
        
          // Create TextFragmentAbsorber object
          TextFragmentAbsorber absorber = new TextFragmentAbsorber(
              new TextEditOptions(TextEditOptions.FontReplace.RemoveUnusedFonts));
        
          // Accept the absorber for first page
          doc.getPages().get_Item(1).accept(absorber);
        
          // Find Courier font
          Font font = FontRepository.findFont("Courier");
          // Set the font for all the text fragments
          for (TextFragment textFragment :  (Iterable<TextFragment>)absorber.getTextFragments())
          {
              textFragment.getTextState().setFont ( font);
          }
          // Save document
          doc.save("D:\\Tests\\output.pdf");
         
        Parameters:
        textEditOptions - Text edit options (Allows to turn on some edit features).

        Performs text search and provides access to search results via TextFragmentAbsorber.TextFragments collection.

      • TextFragmentAbsorber

        public TextFragmentAbsorber(String phrase)

        Initializes a new instance of the TextFragmentAbsorber class for the specified text phrase.


         The example demonstrates how to find text on the first PDF document page and replace the text and its font.
        
         // Open document
         Document doc = new Document("D:\\Tests\\input.pdf");
         // Find font that will be used to change document text font
         com.aspose.pdf.Font font = FontRepository.findFont("Arial");
         // Create TextFragmentAbsorber object to find all "hello world" text occurrences
         TextFragmentAbsorber absorber = new TextFragmentAbsorber("hello world");
         // Accept the absorber for first page
         doc.getPages().get_Item(1).accept(absorber);
         // Change text and font of the first text occurrence
         absorber.getTextFragments().get_Item(1).setText ( "hi world");
         absorber.getTextFragments().get_Item(1).getTextState().setFont ( font);
         // Save document
         doc.save("D:\\Tests\\output.pdf");
         
        Parameters:
        phrase - Phrase that the TextFragmentAbsorber searches

        Performs text search of the specified phrase and provides access to search results via TextFragmentAbsorber.TextFragments collection.

      • TextFragmentAbsorber

        public TextFragmentAbsorber(Pattern regex)

        Initializes a new instance of the TextFragmentAbsorber class for the specified java.util.regex.Pattern object.


        The example demonstrates how to find text that matches a regular expression on the first PDF document page and replace it.
          // Open document
          Document doc = new Document("input.pdf");
          // Find font that will be used to change document text font
          Font font = FontRepository.findFont("Arial");
          // Create TextFragmentAbsorber object to find all matches of the regex (java.util.regex.Pattern)
          TextFragmentAbsorber absorber = new TextFragmentAbsorber(Pattern.compile("h\\w*?o"));
          // Accept the absorber for first page
          doc.getPages().get_Item(1).accept(absorber);
          // we should find "hello" word and replace it with "Hi"
          absorber.getTextFragments().get_Item(1).setText("Hi");
          // Save document
          doc.save("output.pdf");
          
        Parameters:
        regex - java.util.regex.Pattern object that the TextFragmentAbsorber searches for

        Performs text search of the specified pattern and provides access to search results via the TextFragments collection (getTextFragments() / setTextFragments(TextFragmentCollection)).
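        Because the constructor takes an ordinary java.util.regex.Pattern, the pattern can be tried against plain strings before it is used on a PDF. The sketch below uses only the JDK and makes no Aspose calls; the class name RegexPreviewSketch and the helper findAll are illustrative and not part of the library. It shows what the example's pattern h\w*?o matches in a sample sentence, including a short match that may not be intended.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexPreviewSketch {
    // Collects every substring of text that the pattern matches, in order of appearance.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // 'h', then as few word characters as possible, then 'o'
        Pattern p = Pattern.compile("h\\w*?o");
        // prints [hello, ho] ("ho" comes from "how")
        System.out.println(findAll(p, "hello world, how are you"));
    }
}
```

        Previewing matches this way makes it easier to see that get_Item(1) refers to the first match in reading order, which is not necessarily the longest one.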

      • TextFragmentAbsorber

        public TextFragmentAbsorber(String phrase,
                                    TextSearchOptions textSearchOptions)

        Initializes a new instance of the TextFragmentAbsorber class for the specified text phrase and text search options.


         The example demonstrates how to find text with regular expression on the first PDF document page and replace
         the text.
        
         // Open document
         Document doc = new Document("D:\\Tests\\input.pdf");
         // Create TextFragmentAbsorber object that searches all words starting 'h' and ending 'o' using regular
         // expression.
         TextFragmentAbsorber absorber = new TextFragmentAbsorber("h\\w*?o", new TextSearchOptions(true));
         // we should find "hello" word and replace it with "Hi"
         doc.getPages().get_Item(1).accept(absorber);
         absorber.getTextFragments().get_Item(1).setText ( "Hi");
        
         // Save document
         doc.save("D:\\Tests\\output.pdf");
         
        Parameters:
        phrase - Phrase that the TextFragmentAbsorber searches
        textSearchOptions - Text search options (Allows to turn on some search features. For example, search with regular expression)

        Performs text search of the specified phrase and provides access to search results via TextFragmentAbsorber.TextFragments collection.
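        The same phrase string can behave very differently depending on whether it is interpreted as a regular expression. The JDK-only sketch below illustrates that difference, assuming the boolean passed to TextSearchOptions switches between literal and regular-expression interpretation, as the parameter description suggests. RegexFlagSketch and occurs are hypothetical names, and java.util.regex stands in for the library's own matching.

```java
import java.util.regex.Pattern;

class RegexFlagSketch {
    // Hypothetical stand-in for a search: isRegex = true treats the phrase as a
    // regular expression, isRegex = false treats it as literal text.
    static boolean occurs(String phrase, boolean isRegex, String text) {
        Pattern p = isRegex
                ? Pattern.compile(phrase)
                : Pattern.compile(phrase, Pattern.LITERAL);
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        // In Java source the backslash must be doubled; the string holds h\w*?o
        String phrase = "h\\w*?o";
        System.out.println(phrase.length());                 // prints 6
        System.out.println(occurs(phrase, true, "hello"));   // prints true
        System.out.println(occurs(phrase, false, "hello"));  // prints false
    }
}
```

        Writing the phrase as "h\w*?o" with a single backslash does not compile in Java, because \w is not a valid string escape.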

      • TextFragmentAbsorber

        public TextFragmentAbsorber(Pattern regex,
                                    TextSearchOptions textSearchOptions)

        Initializes a new instance of the TextFragmentAbsorber class for the specified regular expression and text search options.


        The example demonstrates how to find text with regular expression on the first PDF document page and replace the text.
          // Open document
          Document doc = new Document("input.pdf");
          // Create TextFragmentAbsorber object that searches all words starting 'h' and ending 'o' using regular expression.
          TextFragmentAbsorber absorber = new TextFragmentAbsorber(Pattern.compile("h\\w*?o"), new TextSearchOptions(true));
          // we should find "hello" word and replace it with "Hi"
          doc.getPages().get_Item(1).accept(absorber);
          absorber.getTextFragments().get_Item(1).setText("Hi");
          // Save document
          doc.save("output.pdf");
          
        Parameters:
        regex - java.util.regex.Pattern object that the TextFragmentAbsorber searches for
        textSearchOptions - Text search options (Allows to turn on some search features.)

        Performs text search of the specified pattern and provides access to search results via the TextFragments collection (getTextFragments() / setTextFragments(TextFragmentCollection)).

      • TextFragmentAbsorber

        public TextFragmentAbsorber(String phrase,
                                    TextSearchOptions textSearchOptions,
                                    TextEditOptions textEditOptions)

        Initializes a new instance of the TextFragmentAbsorber class for the specified text phrase, text search options and text edit options. The text edit options are not supported yet.


         The example demonstrates how to find text with regular expression on the first PDF document page and replace
         the text.
        
         // Open document
         Document doc = new Document("D:\\Tests\\input.pdf");
         // Create TextFragmentAbsorber object that searches all words starting 'h' and ending 'o' using regular
         // expression.
         TextFragmentAbsorber absorber = new TextFragmentAbsorber("h\\w*?o", new TextSearchOptions(true));
         // we should find "hello" word and replace it with "Hi"
         doc.getPages().get_Item(1).accept(absorber);
         absorber.getTextFragments().get_Item(1).setText ( "Hi");
         // Save document
         doc.save("D:\\Tests\\output.pdf");
         
        Parameters:
        phrase - Phrase that the TextFragmentAbsorber searches
        textSearchOptions - Text search options (Allows to turn on some search features. For example, search with regular expression)
        textEditOptions - Text edit options (Allows to turn on some edit features. For example, define special behavior when requested symbol cannot be written with font). The parameter is not supported yet.

        Performs text search of the specified phrase and provides access to search results via TextFragmentAbsorber.TextFragments collection.

      • TextFragmentAbsorber

        public TextFragmentAbsorber(Pattern regex,
                                    TextEditOptions textEditOptions)

        Initializes a new instance of the TextFragmentAbsorber class for the specified regular expression and text edit options.

        Parameters:
        regex - java.util.regex.Pattern object that the TextFragmentAbsorber searches for
        textEditOptions - Text edit options (Allows to turn on some edit features).

        Performs text search of the specified pattern and provides access to search results via the TextFragments collection (getTextFragments() / setTextFragments(TextFragmentCollection)).

      • TextFragmentAbsorber

        public TextFragmentAbsorber(String phrase,
                                    TextEditOptions textEditOptions)

        Initializes a new instance of the TextFragmentAbsorber class for the specified text phrase and text edit options.

        Parameters:
        phrase - Phrase that the TextFragmentAbsorber searches
        textEditOptions - Text edit options (Allows to turn on some edit features).

        Performs text search of the specified phrase and provides access to search results via TextFragmentAbsorber.TextFragments collection.

    • Method Detail

      • getTextFragments

        public TextFragmentCollection getTextFragments()

        Gets collection of search occurrences that are presented with TextFragment objects.

        Returns:
        TextFragmentCollection object
         The example demonstrates how to find text on the first PDF document page and replace all search occurrences
         with new text.
        
         // Open document
         Document doc = new Document("D:\\Tests\\input.pdf");
         // Find font that will be used to change document text font
         Font font = FontRepository.findFont("Arial");
         // Create TextFragmentAbsorber object to find all "hello world" text occurrences
         TextFragmentAbsorber absorber = new TextFragmentAbsorber("hello world");
         // Accept the absorber for first page
         doc.getPages().get(1).accept(absorber);
         // Change text of all search occurrences
         for (TextFragment textFragment :  (Iterable<TextFragment>)absorber.getTextFragments())
         {
             textFragment.setText ( "hi world");
         }
         // Save document
         doc.save("D:\\Tests\\output.pdf");
         
      • setTextFragments

        public void setTextFragments(TextFragmentCollection value)

        Sets collection of search occurrences that are presented with TextFragment objects.

        Parameters:
        value - TextFragmentCollection object
                      The example demonstrates how to find text on the first PDF document page and replace all search
                      occurrences with new text.
        
                      // Open document
                      Document doc = new Document("D:\\Tests\\input.pdf");
                      // Find font that will be used to change document text font
                      Font font = FontRepository.findFont("Arial");
                      // Create TextFragmentAbsorber object to find all "hello world" text occurrences
                      TextFragmentAbsorber absorber = new TextFragmentAbsorber("hello world");
                      // Accept the absorber for first page
                      doc.getPages().get(1).accept(absorber);
                      // Change text of all search occurrences
                      for (TextFragment textFragment :  (Iterable<TextFragment>)absorber.getTextFragments())
                      {
                          textFragment.setText ( "hi world");
                      }
                      // Save document
                      doc.save("D:\\Tests\\output.pdf");
                      
      • getPhrase

        public String getPhrase()

        Gets phrase that the TextFragmentAbsorber searches on the PDF document or page.

        Returns:
        String value
         The example demonstrates how to perform search text several times and perform text replacements.
        
         // Open document
         Document doc = new Document("D:\\Tests\\input.pdf");
         // Create TextFragmentAbsorber object to find all "hello" text occurrences
         TextFragmentAbsorber absorber = new TextFragmentAbsorber("hello");
         doc.getPages().get(1).accept(absorber);
         absorber.getTextFragments().get_Item(1).setText ( "Hi");
         // search another word and replace it
         absorber.setPhrase ( "world");
         doc.getPages().get(1).accept(absorber);
         absorber.getTextFragments().get_Item(1).setText ( "John");
         // Save document
         doc.save("D:\\Tests\\output.pdf");
         
      • setPhrase

        public void setPhrase(String value)

        Sets phrase that the TextFragmentAbsorber searches on the PDF document or page.

        Parameters:
        value - String value
                      The example demonstrates how to perform search text several times and perform text replacements.
        
                      // Open document
                      Document doc = new Document("D:\\Tests\\input.pdf");
                      // Create TextFragmentAbsorber object to find all "hello" text occurrences
                      TextFragmentAbsorber absorber = new TextFragmentAbsorber("hello");
                      doc.getPages().get(1).accept(absorber);
                      absorber.getTextFragments().get_Item(1).setText ( "Hi");
                      // search another word and replace it
                      absorber.setPhrase ( "world");
                      doc.getPages().get(1).accept(absorber);
                      absorber.getTextFragments().get_Item(1).setText ( "John");
                      // Save document
                      doc.save("D:\\Tests\\output.pdf");
                      
      • getTextSearchOptions

        public TextSearchOptions getTextSearchOptions()

        Gets search options. The options enable search using regular expressions.

        Overrides:
        getTextSearchOptions in class TextAbsorber
        Returns:
        TextSearchOptions object
         The example demonstrates how to perform search text using regular expression.
        
         // Open document
         Document doc = new Document("D:\\Tests\\input.pdf");
         // Create TextFragmentAbsorber object
         TextFragmentAbsorber absorber = new TextFragmentAbsorber();
         // make the absorber search all words starting 'h' and ending 'o' using regular expression.
         absorber.setPhrase ( "h\\w*?o");
         absorber.setTextSearchOptions ( new TextSearchOptions(true));
         // we should find "hello" word and replace it with "Hi"
         doc.getPages().get(1).accept(absorber);
         absorber.getTextFragments().get_Item(1).setText ( "Hi");
         // Save document
         doc.save("D:\\Tests\\output.pdf");
         
      • setTextSearchOptions

        public void setTextSearchOptions(TextSearchOptions value)

        Sets search options. The options enable search using regular expressions.

        Overrides:
        setTextSearchOptions in class TextAbsorber
        Parameters:
        value - TextSearchOptions object
                      The example demonstrates how to perform search text using regular expression.
        
                      // Open document
                      Document doc = new Document("D:\\Tests\\input.pdf");
                      // Create TextFragmentAbsorber object
                      TextFragmentAbsorber absorber = new TextFragmentAbsorber();
                      // make the absorber search all words starting 'h' and ending 'o' using regular expression.
                      absorber.setPhrase ( "h\\w*?o");
                      absorber.setTextSearchOptions ( new TextSearchOptions(true));
                      // we should find "hello" word and replace it with "Hi"
                      doc.getPages().get(1).accept(absorber);
                      absorber.getTextFragments().get_Item(1).setText ( "Hi");
                      // Save document
                      doc.save("D:\\Tests\\output.pdf");
                      
      • getTextEditOptions

        public TextEditOptions getTextEditOptions()

        Gets text edit options. The options define special behavior when requested symbol cannot be written with font.

        Returns:
        TextEditOptions object
      • setTextEditOptions

        public void setTextEditOptions(TextEditOptions value)

        Sets text edit options. The options define special behavior when requested symbol cannot be written with font.

        Parameters:
        value - TextEditOptions object
      • getTextReplaceOptions

        public TextReplaceOptions getTextReplaceOptions()

        Gets text replace options. The options define the behavior when a text fragment is replaced with shorter text.

        Returns:
        TextReplaceOptions value
      • setTextReplaceOptions

        public void setTextReplaceOptions(TextReplaceOptions value)

        Sets text replace options. The options define the behavior when a text fragment is replaced with shorter text.

        Parameters:
        value - TextReplaceOptions value
      • hasErrors_Fragment

        public boolean hasErrors_Fragment()

        Indicates whether errors were found during text extraction. Errors are searched for only if TextSearchOptions.LogTextExtractionErrors is set to true, which may decrease performance.

        Returns:
        boolean value
      • getErrors

        public List<TextExtractionError> getErrors()

        List of TextExtractionError objects. It contains information about errors that were found during text extraction. Errors are searched for only if TextSearchOptions.LogTextExtractionErrors is set to true, which may decrease performance.

        Overrides:
        getErrors in class TextAbsorber
        Returns:
        List of TextExtractionError objects
      • getText

        public String getText()

        Gets the text that the absorber extracted from the PDF document or page.

        Overrides:
        getText in class TextAbsorber
        Returns:
        String value
         The example demonstrates how to extract text from all pages of the PDF document.
         
         // open document
         Document doc = new Document(inFile);
         // create TextAbsorber object to extract text
         TextAbsorber absorber = new TextAbsorber();
         // accept the absorber for all document's pages
         doc.getPages().accept(absorber);
         // get the extracted text
         String extractedText = absorber.getText();
         
      • visit

        public void visit(Page page)

        Performs search on the specified page.


         The example demonstrates how to find text on the first PDF document page and replace the text.
        
         // Open document
         Document doc = new Document("D:\\Tests\\input.pdf");
         // Find font that will be used to change document text font
         Font font = FontRepository.findFont("Arial");
         // Create TextFragmentAbsorber object to find all "hello world" text occurrences
         TextFragmentAbsorber absorber = new TextFragmentAbsorber("hello world");
         // Accept the absorber for first page
         absorber.visit(doc.getPages().get(1));
         // Change text of all search occurrences
         for (TextFragment textFragment :  (Iterable<TextFragment>)absorber.getTextFragments())
         {
             textFragment.setText ( "hi world");
         }
         // Save document
         doc.save("D:\\Tests\\output.pdf");
         
        Overrides:
        visit in class TextAbsorber
        Parameters:
        page - PDF document page object.
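        The examples in this reference use both page.accept(absorber) and absorber.visit(page). The sketch below illustrates how the two forms relate under the usual visitor pattern, assuming accept simply hands the page to the absorber's visit method. All types here (VisitorSketch, SketchPage, SketchAbsorber) are hypothetical stand-ins that search plain strings; they are not the Aspose classes.

```java
import java.util.ArrayList;
import java.util.List;

class VisitorSketch {
    // Hypothetical stand-in for a page: holds plain text only.
    static class SketchPage {
        final String text;

        SketchPage(String text) {
            this.text = text;
        }

        // accept dispatches to the absorber, so both call forms run the same search.
        void accept(SketchAbsorber absorber) {
            absorber.visit(this);
        }
    }

    // Hypothetical stand-in for an absorber: records where the phrase occurs.
    static class SketchAbsorber {
        final String phrase;
        final List<Integer> offsets = new ArrayList<>();

        SketchAbsorber(String phrase) {
            this.phrase = phrase;
        }

        void visit(SketchPage page) {
            if (phrase.isEmpty()) {
                return; // nothing to search for
            }
            int from = 0;
            int at;
            while ((at = page.text.indexOf(phrase, from)) >= 0) {
                offsets.add(at);
                from = at + phrase.length();
            }
        }
    }

    public static void main(String[] args) {
        SketchPage page = new SketchPage("hello world, hello again");
        SketchAbsorber viaAccept = new SketchAbsorber("hello");
        page.accept(viaAccept);
        SketchAbsorber viaVisit = new SketchAbsorber("hello");
        viaVisit.visit(page);
        System.out.println(viaAccept.offsets.equals(viaVisit.offsets)); // prints true
    }
}
```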
      • visit

        public void visit(IDocument pdf)

        Performs search on the specified document.


         The example demonstrates how to find text on PDF document and replace text of all search occurrences.
        
         // Open document
         Document doc = new Document("D:\\Tests\\input.pdf");
         // Find font that will be used to change document text font
         Font font = FontRepository.findFont("Arial");
         // Create TextFragmentAbsorber object to find all "hello world" text occurrences
         TextFragmentAbsorber absorber = new TextFragmentAbsorber("hello world");
         // Run the search over the whole document
         absorber.visit(doc);
         // Change text of the first text occurrence
         absorber.getTextFragments().get_Item(1).setText ( "hi world");
         // Save document
         doc.save("D:\\Tests\\output.pdf");
         
        Overrides:
        visit in class TextAbsorber
        Parameters:
        pdf - PDF document object.
      • applyForAllFragments

        public void applyForAllFragments(Font font)

        Applies the font to all text fragments that were absorbed. It works faster than looping through the fragments if all fragments on the page(s) were absorbed. Otherwise it works similarly to looping.

        Parameters:
        font - Font of the text.
      • applyForAllFragments

        public void applyForAllFragments(float fontSize)

        Applies the font size to all text fragments that were absorbed. It works faster than looping through the fragments if all fragments on the page(s) were absorbed. Otherwise it works similarly to looping.

        Parameters:
        fontSize - Font size of the text.
      • applyForAllFragments

        public void applyForAllFragments(Font font,
                                         float fontSize)

        Applies the font and font size to all text fragments that were absorbed. It works faster than looping through the fragments if all fragments on the page(s) were absorbed. Otherwise it works similarly to looping.

        Parameters:
        font - Font of the text.
        fontSize - Font size of the text.
      • reset

        public void reset()

        Clears TextFragments collection of this TextFragmentAbsorber object.

      • removeAllText

        public void removeAllText(Page page)

        Removes all text from the specified page.

        Parameters:
        page - PDF document page object.
      • removeAllText

        public final void removeAllText(Page page,
                                        Rectangle rect)

        Removes text inside the specified rectangle from the specified page.

        Parameters:
        page - PDF document page object.
        rect - Rectangle to remove text inside.
      • removeAllText

        public void removeAllText(Document document)

        Removes all text from the document.

        Parameters:
        document - PDF document object.
      • visit

        public void visit(XForm xForm)

        Performs search on the specified form object.

        Overrides:
        visit in class TextAbsorber
        Parameters:
        xForm - PDF form object.