UIMA Pipeline - Apache UIMA project - Part 2
In my previous article, we looked at descriptor files and used them to create a custom annotator, which we then initialized both with and without UIMA tool support such as the CAS Visual Debugger (CVD). In this article, we will chain several annotators in series, one after the other (hence the name pipeline), in order to analyze a document.
As shown in the diagram above, we can set up a series of annotators to analyze a document. This is one way to arrange annotators in UIMA; they can also be arranged in parallel, which will be discussed in a future article. My previous article showed how to create a custom annotator (the 'RoomNumberAnnotator'), and here we will use two such sample annotators (let's call them HashtagAnnotator and NounPhraseAnnotator) to demonstrate the pipeline in action.
To make these two annotators act in series, we have to create an aggregate analysis engine, which defines the flow configuration along with the annotators being used. The steps for creating the aggregate descriptor are listed below.
1. Create an Analysis Engine descriptor (preferably within the 'desc' folder of the project).
2. Open it with the 'Component Descriptor Editor' and set the engine type to 'Aggregate'.
3. In the Aggregate tab, add the component engines (their descriptor files); these bring along the Type System Definitions used by the respective annotators.
4. Once you have added both descriptors, save the file by pressing 'CTRL + S' to complete the process.
After the configuration, the aggregate analysis engine would look like this.
Once this is completed, make sure the code that initiates the analysis process points to the correct descriptor file, that is, the aggregate one rather than one of the individual annotators.
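import java.io.File;
import java.io.IOException;

import org.apache.uima.UIMAFramework;
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.resource.ResourceInitializationException;
import org.apache.uima.resource.ResourceSpecifier;
import org.apache.uima.util.InvalidXMLException;
import org.apache.uima.util.XMLInputSource;

public class AnalysisEngineInitiator {

    private AnalysisEngine analysisEngine = null;

    public AnalysisEngineInitiator() throws ResourceInitializationException {
        // This path should point at the aggregate descriptor.
        File descriptorFile = new File("./desc/AnalysisEngine.xml");
        try {
            // Read the descriptor XML and parse it into a resource specifier.
            XMLInputSource descriptorSource = new XMLInputSource(descriptorFile);
            ResourceSpecifier specifier = UIMAFramework.getXMLParser().parseResourceSpecifier(descriptorSource);
            // Build the engine; for an aggregate this also creates its delegate annotators.
            analysisEngine = UIMAFramework.produceAnalysisEngine(specifier);
        } catch (IOException | InvalidXMLException e) {
            // Fail fast instead of printing the stack trace and continuing with a null specifier.
            throw new ResourceInitializationException(e);
        }
    }
}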
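If you are unsure whether a given file really is the aggregate descriptor, one option is to peek at its XML before handing it to the framework. The sketch below is only an illustration: it uses the JDK's XML parser rather than any UIMA API, the class and method names (DescriptorCheck, fixedFlowNodes) are made up for this example, and it assumes the descriptor marks an aggregate with <primitive>false</primitive> and lists the delegates in order as <node> elements under <fixedFlow>. The sample XML is a trimmed-down fragment, not a complete descriptor, and the delegate keys in it are assumed names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of UIMA: inspects a descriptor with the JDK XML parser.
class DescriptorCheck {

    // Returns the delegate keys listed under <fixedFlow>, in the order they appear.
    // Throws IllegalStateException if the descriptor is not marked as an aggregate.
    static List<String> fixedFlowNodes(InputStream descriptorXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(descriptorXml);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not read descriptor XML", e);
        }

        // Assumption: an aggregate descriptor carries <primitive>false</primitive>.
        NodeList primitive = doc.getElementsByTagName("primitive");
        if (primitive.getLength() == 0
                || !"false".equals(primitive.item(0).getTextContent().trim())) {
            throw new IllegalStateException("Descriptor is not an aggregate analysis engine");
        }

        // Collect the <node> entries of the first <fixedFlow> element, if there is one.
        List<String> nodes = new ArrayList<>();
        NodeList fixedFlow = doc.getElementsByTagName("fixedFlow");
        if (fixedFlow.getLength() > 0) {
            NodeList children = ((Element) fixedFlow.item(0)).getElementsByTagName("node");
            for (int i = 0; i < children.getLength(); i++) {
                nodes.add(children.item(i).getTextContent().trim());
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        // Trimmed-down sample for illustration only; a real descriptor has more elements.
        String sample =
              "<analysisEngineDescription xmlns=\"http://uima.apache.org/resourceSpecifier\">"
            + "<primitive>false</primitive>"
            + "<analysisEngineMetaData><flowConstraints><fixedFlow>"
            + "<node>HashtagAnnotator</node>"
            + "<node>NounPhraseAnnotator</node>"
            + "</fixedFlow></flowConstraints></analysisEngineMetaData>"
            + "</analysisEngineDescription>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(fixedFlowNodes(in));
    }
}
```

In a real project you would pass a FileInputStream for the descriptor in the 'desc' folder instead of the in-memory sample, and compare the returned list with the order you expect the annotators to run in.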
Now you can run the project and pass it plain text, which will be annotated by both annotators, one after the other. Cheers!
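To make the "one after the other" idea concrete, here is a small stand-alone sketch of annotators running in series over the same text. It is a simplified model and deliberately does not use UIMA: Span, Annotator, RegexAnnotator and run are hypothetical names invented for this illustration, and the second annotator is a crude capitalized-word matcher standing in for a real noun phrase annotator. What it shows is that each annotator sees the same document and adds its results to a shared collection, in the order fixed by the pipeline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified stand-in for an annotator pipeline; none of these types are UIMA classes.
class PipelineSketch {

    // A region of the document, labelled with the type of annotation it represents.
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        String coveredText(String document) {
            return document.substring(begin, end);
        }
    }

    // Hypothetical annotator contract: read the text, append spans to the shared list.
    interface Annotator {
        void process(String document, List<Span> spans);
    }

    // Marks every match of a regular expression with the given annotation type.
    static final class RegexAnnotator implements Annotator {
        private final String type;
        private final Pattern pattern;

        RegexAnnotator(String type, String regex) {
            this.type = type;
            this.pattern = Pattern.compile(regex);
        }

        @Override
        public void process(String document, List<Span> spans) {
            Matcher m = pattern.matcher(document);
            while (m.find()) {
                spans.add(new Span(type, m.start(), m.end()));
            }
        }
    }

    // Runs the annotators in list order over the same document (a fixed, serial flow).
    static List<Span> run(String document, List<Annotator> pipeline) {
        List<Span> spans = new ArrayList<>();
        for (Annotator annotator : pipeline) {
            annotator.process(document, spans);
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "Meet in #UIMA room for the #pipeline demo";
        List<Annotator> pipeline = Arrays.asList(
            new RegexAnnotator("Hashtag", "#\\w+"),
            // Crude stand-in for a noun phrase annotator: capitalized words only.
            new RegexAnnotator("CapitalizedWord", "\\b[A-Z][a-z]+\\b"));

        for (Span span : run(text, pipeline)) {
            System.out.println(span.type + ": " + span.coveredText(text));
        }
    }
}
```

Swapping the order of the two entries in the list changes only the order in which the spans are produced, which is the same kind of decision the flow configuration of the aggregate analysis engine makes for the real annotators.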











