com.aspose.ocr

Class OcrEngine



  • public final class OcrEngine
    extends Object

    The main Aspose.OCR class. Most users work with an instance of this class. OcrEngine provides methods for setting the source image, choosing recognition languages, and running the recognition process.


     
     OcrEngine ocr = new OcrEngine();
     ocr.setImage(ImageStream.fromFile("image.tiff"));
     if (ocr.process())
     {
         System.out.println(ocr.getText());
     }
     
    • Constructor Detail

      • OcrEngine

        public OcrEngine()

        Initializes a new instance of the OcrEngine class. By default, the English language is loaded into the LanguageContainer.

    • Method Detail

      • addNotifier

        public void addNotifier(INotifier processor)

        Adds a notifier.

        Each notifier raises an event when a unit of text has been recognized (for example, a word or a group of several characters). Multiple notifiers can be added.


         
         OcrEngine ocr = new OcrEngine();
         // pictureFileName holds the path to the source image.
         ocr.setImage(ImageStream.fromFile(pictureFileName));
         final INotifier wordNotifier = NotifierFactory.wordNotifier();
         wordNotifier.Elapsed.add(new NotifierHandler()
         {
             // Print the text reported by the word notifier.
             public void invoke(Object sender, IRecognizedText recognizedText)
             {
                 System.out.println(" text : " + wordNotifier.getText());
             }
         });
         final INotifier blockNotifier = NotifierFactory.blockNotifier();
         blockNotifier.Elapsed.add(new NotifierHandler()
         {
             // Print the text reported by the block notifier.
             public void invoke(Object sender, IRecognizedText recognizedText)
             {
                 System.out.println(" block : " + blockNotifier.getText());
             }
         });
         ocr.addNotifier(wordNotifier);
         ocr.addNotifier(blockNotifier);
         if (ocr.process())
         {
             System.out.println(ocr.getText());
         }
         
        Parameters:
        processor - The processor to add.
      • clearNotifies

        public void clearNotifies()

        Clears the list of notifiers.

      • dispose

        public void dispose()

        Disposes of the engine.

      • getConfig

        public OCRConfig getConfig()

        Gets the engine configuration.


         
         OcrEngine ocr = new OcrEngine();
         ocr.getConfig().setAdjustRotation(AdjustRotationMode.Automatic);
         
        Throws:
        com.aspose.ms.System.ArgumentNullException - Thrown when value is null.
      • getLanguageContainer

        public LanguageContainer getLanguageContainer()

        Gets the container of recognition languages. Use it to define the list of languages used for recognition; several languages may be added. By default, the English language is loaded.

        When multiple languages are loaded, text is recognized word by word, and each recognized word is assigned one language. Languages have a priority order: a language added to the collection earlier has a higher priority. If a word is identical in several languages, the language that was added earliest is selected.


          
          OcrEngine ocr = new OcrEngine();
          ocr.setImage(ImageStream.fromFile("image.tiff"));
          ocr.getLanguageContainer().addLanguage(LanguageFactory.load(
                  "Portuguese-RSC-HS-PB-ResourcesAllCharsNet.zip")); // Resource file name
          if (ocr.process())
          {
              System.out.println(ocr.getText());
          }
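
        The priority rule described above can be illustrated with a small self-contained sketch. It does not use the Aspose.OCR API: the class LanguagePrioritySketch, the method pickLanguage, and the sample word sets are hypothetical stand-ins, and the real engine's matching is likely more involved. The sketch only shows that when a word is valid in several languages, the language added first wins.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the priority rule; not part of Aspose.OCR.
final class LanguagePrioritySketch {

    // Returns the name of the first language (in insertion order) whose
    // vocabulary contains the word, or null if no language contains it.
    static String pickLanguage(Map<String, Set<String>> languages, String word) {
        for (Map.Entry<String, Set<String>> entry : languages.entrySet()) {
            if (entry.getValue().contains(word)) {
                return entry.getKey();
            }
        }
        return null;
    }

    // Builds a two-language container with English added first (illustrative data).
    static Map<String, Set<String>> sampleLanguages() {
        // LinkedHashMap preserves insertion order, which stands in for priority.
        Map<String, Set<String>> languages = new LinkedHashMap<>();
        languages.put("English", Set.of("animal", "text"));
        languages.put("Portuguese", Set.of("animal", "texto"));
        return languages;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> languages = sampleLanguages();
        // "animal" exists in both vocabularies, so the earlier language is chosen.
        System.out.println(pickLanguage(languages, "animal"));
        // "texto" exists only in the Portuguese vocabulary.
        System.out.println(pickLanguage(languages, "texto"));
    }
}
```

        In this model the insertion order of the map plays the role of the order in which languages are added to the container; adding Portuguese first would reverse the result for the shared word.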
          
      • getPages

        public Page[] getPages()

        Gets the recognized text divided into pages. The result is available only after recognition is complete; otherwise an exception is thrown.

        Throws:
        OcrException - Thrown when called before recognition.
      • getPreprocessedImages

        public PreprocessedImages getPreprocessedImages()

        Contains several modified versions of the source image that were obtained during different preprocessing steps.


        
         OcrEngine ocr = new OcrEngine();
         ocr.setImage(ImageStream.fromFile("image.tiff"));
         ocr.getConfig().setSavePreprocessedImages(true);
         if (ocr.process())
         {
             BufferedImage im = ocr.getPreprocessedImages().getBinarizedImage();
         }
         
      • getProcessAllPages

        public boolean getProcessAllPages()

        Gets a value indicating whether all frames in the image must be processed.

      • getText

        public IRecognizedText getText()

        Gets the recognized text. The result is available only after recognition is complete; otherwise an exception is thrown.

        Throws:
        OcrException - Thrown when called before recognition.
      • process

        public boolean process()

        Runs the recognition process.

        The engine must be configured before this method is called; otherwise an OcrException is thrown. Before calling it, set the image with setImage and make sure at least one language is present in the LanguageContainer. After the method has run, the recognized text is available from getText.

        Returns:
        A value indicating whether text has been recognized successfully.
        Throws:
        OcrException - Thrown if the instance is not configured.
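
        The calling contract described above (set an image, keep at least one language loaded, call process, then read the text) can be modelled with a short stand-in. EngineContractSketch below is hypothetical and is not the Aspose.OCR implementation: it uses IllegalStateException where the real engine documents OcrException, it takes a plain file path instead of an IImageStream, and it recognizes nothing. It only illustrates the order of calls and when the exceptions are raised.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the documented calling contract; not Aspose.OCR code.
final class EngineContractSketch {
    private final List<String> languages = new ArrayList<>();
    private String imagePath;
    private String text;

    EngineContractSketch() {
        languages.add("English"); // mirrors the documented default language
    }

    void setImage(String path) {
        this.imagePath = path;
    }

    // Fails if the engine is not configured, as process() is documented to do.
    boolean process() {
        if (imagePath == null || languages.isEmpty()) {
            throw new IllegalStateException("engine is not configured");
        }
        text = "recognized text from " + imagePath; // placeholder result
        return true;
    }

    // Fails if called before process(), as getText() is documented to do.
    String getText() {
        if (text == null) {
            throw new IllegalStateException("call process() first");
        }
        return text;
    }

    public static void main(String[] args) {
        EngineContractSketch engine = new EngineContractSketch();
        try {
            engine.getText(); // too early: nothing has been recognized yet
        } catch (IllegalStateException e) {
            System.out.println("before process(): " + e.getMessage());
        }
        engine.setImage("image.tiff");
        if (engine.process()) {
            System.out.println(engine.getText());
        }
    }
}
```

        With the real engine, the same ordering applies to setImage, process, and getText, with OcrException in place of the stand-in exception.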
      • setConfig

        public void setConfig(OCRConfig value)

        Sets the engine configuration.


         
         OcrEngine ocr = new OcrEngine();
         
        Throws:
        com.aspose.ms.System.ArgumentNullException - Thrown when value is null.
      • setImage

        public void setImage(IImageStream value)

        Sets the image to recognize text from. The image must be set before process is called.

      • setProcessAllPages

        public void setProcessAllPages(boolean value)

        Sets a value indicating whether all frames in the image must be processed.