com.aspose.pdf

Class ParagraphAbsorber



  • public class ParagraphAbsorber
    extends Object

    Represents an absorber object for page structure objects such as sections and paragraphs. Searches for sections and paragraphs of text and provides access to the rectangles and polygons that describe them in the text coordinate space. Also searches for text segments and provides access to the search results via TextFragment collections grouped by structure element.


    The example demonstrates how to find the first text segment of each paragraph on the first page of a PDF document and highlight it.

        // Open document
        Document doc = new Document("input.pdf");
        // Create ParagraphAbsorber object
        ParagraphAbsorber absorber = new ParagraphAbsorber();
        // Accept the absorber for the first page
        absorber.visit(doc.getPages().get_Item(1));
        // Get markup object of the first page
        PageMarkup markup = absorber.getPageMarkups().get(0);
        // Loop through structure elements of the page text to find the first text fragment of each paragraph
        for (MarkupSection section : markup.getSections()) {
            for (MarkupParagraph paragraph : section.getParagraphs()) {
                TextFragment fragment = paragraph.getFragments().get_Item(0);
                // Update text properties
                fragment.getTextState().setBackgroundColor(Color.getLightBlue());
            }
        }
        // Save document
        doc.save("output.pdf");

    When the search is completed, the ParagraphAbsorber.PageMarkups collection contains PageMarkup objects that represent the page structure as collections of MarkupSection and MarkupParagraph objects. The TextFragment object provides access to the text of a search occurrence and its text properties, and allows editing the text and changing the text state (font, font size, color, etc.).
    • Constructor Summary

      Constructors 
      Constructor and Description
      ParagraphAbsorber()
      Initializes a new instance of ParagraphAbsorber that searches for sections and paragraphs of a document or page.
      ParagraphAbsorber(int sectionsSearchDepth)
      Initializes a new instance of ParagraphAbsorber that searches for sections and paragraphs of a document or page, using the specified search depth.
    • Constructor Detail

      • ParagraphAbsorber

        public ParagraphAbsorber()

        Initializes a new instance of ParagraphAbsorber that searches for sections and paragraphs of a document or page.

      • ParagraphAbsorber

        public ParagraphAbsorber(int sectionsSearchDepth)

        Initializes a new instance of ParagraphAbsorber that searches for sections and paragraphs of a document or page, using the specified search depth.

        Parameters:
        sectionsSearchDepth - Number of sequential searches for finer structure elements that will be performed.
        See the ParagraphAbsorber.SectionsSearchDepth property for more details about this parameter.
    • Method Detail

      • getPageMarkups

        public List<PageMarkup> getPageMarkups()

        Gets the collection of PageMarkup objects that were absorbed.

        Returns:
        List of PageMarkup instances
      • getSectionsSearchDepth

        public int getSectionsSearchDepth()

        Gets the value that determines how many sequential searches for finer structure elements are performed. The default search depth is 3, which means three searches for horizontally divided sections (headers, paragraphs, etc.) and three searches for vertically divided ones (columns).

        Increasing this value may slightly reduce performance with no visible change in the search results. Decreasing it may lead to incorrect detection of paragraphs within sections. We do not recommend setting a value lower than the default unless you only want the 'rough' elements of the page structure.
        Returns:
        int value
      • setSectionsSearchDepth

        public void setSectionsSearchDepth(int value)

        Sets the value that determines how many sequential searches for finer structure elements are performed. The default search depth is 3, which means three searches for horizontally divided sections (headers, paragraphs, etc.) and three searches for vertically divided ones (columns).

        Increasing this value may slightly reduce performance with no visible change in the search results. Decreasing it may lead to incorrect detection of paragraphs within sections. We do not recommend setting a value lower than the default unless you only want the 'rough' elements of the page structure.
        Parameters:
        value - int value
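
        A minimal sketch of adjusting the search depth, assuming the Aspose.PDF for Java library is on the classpath. The class name SearchDepthConfig, the helper configure, and the value 4 are illustrative only and are not part of the API; the default of 3 is taken from the description above.

```java
import com.aspose.pdf.ParagraphAbsorber;

public class SearchDepthConfig {

    // Hypothetical helper: creates an absorber and applies the requested depth.
    static ParagraphAbsorber configure(int depth) {
        ParagraphAbsorber absorber = new ParagraphAbsorber();
        absorber.setSectionsSearchDepth(depth);
        return absorber;
    }

    public static void main(String[] args) {
        // A freshly constructed absorber uses the documented default depth.
        ParagraphAbsorber defaults = new ParagraphAbsorber();
        System.out.println("Default depth: " + defaults.getSectionsSearchDepth());

        // A slightly deeper search; per the description, this may cost some performance.
        ParagraphAbsorber deeper = configure(4);
        System.out.println("Configured depth: " + deeper.getSectionsSearchDepth());
    }
}
```

        Passing the depth to the ParagraphAbsorber(int) constructor should be an equivalent way to apply the same setting before the first call to visit.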
      • visit

        public void visit(Document doc)
        Performs search for sections and paragraphs on the specified Document.
        Parameters:
        doc - PDF document object.
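
        A minimal sketch of a whole-document pass, assuming the Aspose.PDF for Java library is on the classpath. The class name CountParagraphs, the helper countParagraphs, and the fallback file name input.pdf are illustrative and not part of the API; the sketch relies only on visit(Document), getPageMarkups(), getSections(), and getParagraphs() as they appear on this page.

```java
import com.aspose.pdf.Document;
import com.aspose.pdf.MarkupParagraph;
import com.aspose.pdf.MarkupSection;
import com.aspose.pdf.PageMarkup;
import com.aspose.pdf.ParagraphAbsorber;

import java.util.List;

public class CountParagraphs {

    // Hypothetical helper: counts paragraphs across all absorbed page markups.
    static int countParagraphs(List<PageMarkup> markups) {
        int total = 0;
        for (PageMarkup markup : markups) {
            for (MarkupSection section : markup.getSections()) {
                for (MarkupParagraph paragraph : section.getParagraphs()) {
                    total++;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Input path is an assumption; pass a real PDF path as the first argument.
        String path = args.length > 0 ? args[0] : "input.pdf";
        Document doc = new Document(path);
        try {
            ParagraphAbsorber absorber = new ParagraphAbsorber();
            // Search every page of the document rather than a single page.
            absorber.visit(doc);
            System.out.println("Paragraphs found: " + countParagraphs(absorber.getPageMarkups()));
        } finally {
            doc.close();
        }
    }
}
```

        Keeping the counting logic separate from the file handling makes it possible to reuse the helper with the result of visit(Page) as well.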
      • visit

        public void visit(Page page)

        Performs search for sections and paragraphs on the specified Page.

        Parameters:
        page - PDF document page object.