Direct Document Capture in Nuxeo Using Ephesoft and CMIS
Our partner, Nuxeo, blogs about how document capture can be a critically important part of any system that uses a content repository. They also mention how another partner of Netlocity’s, Ephesoft, offers an excellent solution for document capture, data extraction, and built-in support for exporting documents via CMIS.
Ephesoft (in this case the Enterprise Edition) offers “intelligent document capture”. It provides the ability to scan physical and electronic documents, automatically process them for arbitrary content (ICR, OCR, images, etc.), and export/report on the results.
Content Management Interoperability Services (CMIS) is an open standard that allows different content repositories to interoperate. Specifically, CMIS defines an abstraction layer for document management using web protocols.
To be clear, the support for CMIS in Ephesoft means that it can export content to any repository that properly supports CMIS, with no special tooling or integration. Nuxeo happens to have excellent CMIS support, so this kind of integration is really easy.
If you would like to find out more about connecting content management applications using CMIS, watch our webinar where CMIS visionary Jeff Potts and I discuss CMIS and its value.
Here’s a short rundown on how to capture documents directly in Nuxeo using Ephesoft and CMIS. Note that in this example Ephesoft is running on Windows, so all file paths are in Windows format.
You can find a complete tutorial on setting up document capture here:
http://www.ephesoft.com/wiki/index.php?title=Tutorial
I will just summarize the basic steps and follow with some helpful tips:
Define a “document type”.
Define the “index fields”.
Define the “key value extraction” for those fields.
Tip: I recommend using the Advanced Extraction in most cases, as opposed to the Key-Value Extraction, because it’s more explicit and intuitive. With Advanced Extraction you visually define the explicit capture area by marking the label (green) and the field (red).
Here’s a helpful video about how to set up Advanced Extraction:
http://wiki.ephesoft.com/advanced-key-value-extraction
Tip: Ephesoft uses “confidence” scores to determine if a document matches a particular Document Type, and if the fields match or not. If something does not match, user intervention is generally required. Confidence scores are not a percentage but an index that is capped at 40. For basic testing and development it’s perfectly acceptable to set the confidence score to 0 to avoid any human intervention.
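To make the threshold idea concrete, here is a minimal sketch of why a threshold of 0 avoids manual review. It is not Ephesoft's actual scoring logic, which is internal to the product; the function names, the field names, the scores, and the "below the threshold means review" comparison are all assumptions for illustration.

```python
def needs_review(confidence: float, threshold: float) -> bool:
    """Return True when a score falls below the threshold (assumed comparison)."""
    return confidence < threshold


def fields_needing_review(scores: dict[str, float], threshold: float) -> list[str]:
    """List the field names whose score is below the threshold."""
    return [name for name, score in scores.items() if needs_review(score, threshold)]


# Hypothetical field names and scores, for illustration only.
scores = {"InvoiceNumber": 32.5, "InvoiceDate": 8.0}
print(fields_needing_review(scores, threshold=20))  # prints ['InvoiceDate']
print(fields_needing_review(scores, threshold=0))   # prints []
```

With the threshold at 0, no non-negative score can fall below it, so nothing is routed to a person.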
Tip: If you’re accessing the Ephesoft UI from somewhere other than the host, you may find that document images do not show up. To fix this you need to modify the file “C:\Ephesoft\Application\WEB-INF\classes\META-INF\dcma-batch\dcma-batch.properties”. Set the property “batch.base_http_url” to match the IP address or hostname of the Ephesoft server.
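If you prefer to script that change rather than edit the file by hand, the following is a rough sketch of updating one key in properties-style text. The helper name `set_property` is made up for this example, and the URL in the usage comment is a placeholder: keep whatever scheme, port, and path your own file already uses and change only the host.

```python
import re


def set_property(text: str, key: str, value: str) -> str:
    """Return properties text with `key` set to `value`, appending it if absent."""
    # Match "key=" or "key:" at the start of a line, then the rest of that line.
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*[=:])[^\r\n]*", re.MULTILINE
    )
    # A function replacement avoids backslash-escape surprises in `value`.
    new_text, count = pattern.subn(lambda m: m.group(1) + value, text)
    if count == 0:
        separator = "" if text.endswith("\n") or not text else "\n"
        new_text = f"{text}{separator}{key}={value}\n"
    return new_text


# Usage sketch (the path is the one from the tip above; the value is a placeholder):
# path = r"C:\Ephesoft\Application\WEB-INF\classes\META-INF\dcma-batch\dcma-batch.properties"
# with open(path, encoding="utf-8") as f:
#     updated = set_property(f.read(), "batch.base_http_url", "http://my-ephesoft-host:8080/...")
```

Back up the original file first. A change like this usually only takes effect after the application is restarted, though that is an assumption worth checking for your installation.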
In Nuxeo, you need to create a folderish document (a container such as a folder or workspace) into which Ephesoft will export the documents.
You may need to create a new document type in Nuxeo to support the information coming from Ephesoft. This depends on whether or not you want to reuse an existing document type – in this case beware of any events/automation for that document type – or create a new one to decouple the documents coming from Ephesoft from any existing content. In the latter case this gives you complete control over what happens after the documents arrive, without affecting any existing business logic.
For security reasons you may want to create a user specifically for Ephesoft to use, with appropriate permissions so the user doesn’t have full access to the whole repository.
To integrate Nuxeo and Ephesoft via CMIS you only need to complete two steps:
Configure the CMIS plug-in.
Configure the field mapping.
Use the Ephesoft Admin Client to perform these steps.
From the “Batch Class Management” tab, open your batch class.
Double-click “CMIS-Export”.
Then click the Edit button to make the necessary changes.
Configure the following options:
Cmis Root Folder Name – this is the folder you created in Nuxeo to receive the documents. The path should be relative to the repository name. Do not enter a leading slash.
Cmis Upload File Extension – can be “pdf” or “tiff”.
Cmis Server URL – Use the format “http://server:port/nuxeo/atom/cmis”. Do not enter a trailing slash.
Cmis Server User Name – Nuxeo username that has write access to the “Cmis Root Folder Name”.
Cmis Server User Password – password for the Nuxeo user.
Cmis Server Repository Id – the name of the Nuxeo repository, usually “default”.
Cmis Server Switch ON/OFF – make sure this is set to “ON”.
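The slash rules are easy to get wrong when copying values around, so here is a small sketch that normalizes the two values before you paste them into the plug-in settings. It assumes the leading-slash rule applies to the root folder and the trailing-slash rule to the server URL; the function names and the example folder path are hypothetical.

```python
from urllib.parse import urlsplit


def normalize_root_folder(path: str) -> str:
    """Strip surrounding whitespace and slashes from the root folder path."""
    return path.strip().strip("/")


def normalize_server_url(url: str) -> str:
    """Strip trailing slashes and check that the URL looks like http(s)://host/..."""
    cleaned = url.strip().rstrip("/")
    parts = urlsplit(cleaned)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an http(s) URL: {url!r}")
    return cleaned


# Hypothetical folder path; the URL follows the format given in the list above.
print(normalize_root_folder("/default-domain/workspaces/Ephesoft/"))
print(normalize_server_url("http://localhost:8080/nuxeo/atom/cmis/"))
```

The folder helper strips slashes from both ends, which goes slightly further than the rule requires, on the assumption that a trailing slash on the folder is never wanted either.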
Click “OK” to save the edit, and be sure to click “Apply” to permanently commit the changes. Finally, click “Validate” and then “Deploy Workflow” any time you make plug-in changes.
Locate the file “C:\Ephesoft\SharedFolders\BC4\cmis-plugin-mapping\DLF-Attribute-mapping.properties”. Here you must define the mapping between your Ephesoft document type and the corresponding Nuxeo document type. Ephesoft values are on the left, Nuxeo on the right.
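Before deploying, it can help to confirm that the mapping file parses the way you expect. Below is a minimal sketch that reads left=right lines into a dictionary. The sample entries are placeholders invented for this example, not the documented key format, so check the Ephesoft documentation for the exact shape of the keys.

```python
def parse_mapping(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and comments."""
    mapping: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        mapping[key.strip()] = value.strip()
    return mapping


# Hypothetical entries: Ephesoft names on the left, Nuxeo names on the right.
sample = """
# document type mapping
Invoice=Invoice
Invoice.InvoiceNumber=inv:number
"""
for ephesoft_name, nuxeo_name in parse_mapping(sample).items():
    print(f"{ephesoft_name} -> {nuxeo_name}")
```

This handles only the plain key=value subset of the properties format (no line continuations or escapes), which is enough for a quick sanity check.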
When you configured your batch class, you defined a folder where Ephesoft will expect to find documents to import (the “UNC Folder” property). Drop a PDF or TIFF in this folder and Ephesoft will work its magic. After a few minutes you’ll end up with a document in Nuxeo at the path you configured. Easy peasy!
Tip: If something doesn’t work, open the “Batch Instance Management” tab in the Ephesoft Admin client, locate the failing batch, click the “>>” button and then the “Troubleshoot” button.
This allows you to download a copy of all the logs and involved documents for that batch. Generally the Application Log contains the most useful information.
Tip: A failing batch can be restarted using the “Restart” button; this restarts the failing step, not the entire batch! If the CMIS export isn’t working, you can easily make changes and retry just the export.
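When the export step keeps failing, one quick check outside Ephesoft is to confirm that the CMIS endpoint answers with the credentials you configured. The sketch below assumes the Nuxeo AtomPub URL accepts HTTP Basic authentication; the function names are made up for this example, and the host, user, and password in the usage comment are placeholders.

```python
import base64
import urllib.request


def build_service_request(server_url: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request for the CMIS AtomPub URL with a Basic auth header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(server_url)
    request.add_header("Authorization", f"Basic {token}")
    return request


def check_endpoint(server_url: str, username: str, password: str, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status; raises on 4xx/5xx or network errors."""
    request = build_service_request(server_url, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Usage sketch (placeholder host and credentials):
# print(check_endpoint("http://localhost:8080/nuxeo/atom/cmis", "ephesoft", "secret"))
```

A 200 response suggests the URL and credentials are fine, which points the investigation at the root folder path or the field mapping instead. An `HTTPError` with status 401 points at the username or password.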