
Kofax Transformation Modules – format locators and dynamic regular expressions – Part 2

1.2.2013 | 4 minutes of reading time

Part 2: Dynamic regular expressions in KTM

In the first part of this blog article I explained the use of KTM format locators and regular expressions. In this part I will show how flexible KTM projects can be designed by using KTM's internal scripting language. To follow along, you should be familiar with that scripting language and with the KTM object model.

KTM format locators (see part 1) are static expressions once they have been defined in the KTM Project Builder. At runtime, within the Kofax Capture workflow, they are used with exactly the values defined there.

There may be the (admittedly very rare) case in which you have to change the regular expression of a format locator at runtime because of external conditions. Unfortunately this doesn't work 'out of the box', but KTM's rich toolset includes a library that enables this functionality.

Recently we had to set up a document classification/extraction project at a scan service provider that works for financial institutions. The challenge was to develop one project that works for several clients. We had to deal with document types for which the 'static' format locators described above could not deliver sufficient results. We needed a format locator whose regular expression could be modified at runtime, depending on client-specific data. Since KTM provides a VB-compatible scripting language, and with some knowledge of the KTM object model, we were able to master this challenge.

The documents of this particular document type had the same layout and content for all clients. The only difference was a certain piece of text on the document, which depended on the client. Depending on the location of this text (upper left or upper right corner), an account number had to be read from a field located at the bottom left or bottom right of the document.

The mapping between the client and the specific text part was provided in an initialization file outside of KTM:

100 ; Hamburg
110 ; Berlin
120 ; Bremen
:
:

If needed, this file can be edited at any time, independently of the KTM project. The client number (100, 110, 120, …) is read by the KTM project at runtime of the Kofax Capture scanning system.
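Looking up the text for a client number in such a file can be sketched in a few lines of script. This is only a minimal sketch, not the code of the original project: the function name LookupClientText and its parameters are hypothetical, and it assumes one "number ; text" pair per line, exactly as in the sample above. Lines without a semicolon are skipped.

```vb
' Hypothetical helper (not part of KTM): returns the text mapped to a client
' number, or an empty string if the number is not listed in the file.
' Assumes one "number ; text" pair per line, as in the sample above.
Private Function LookupClientText(ByVal sFilePath As String, ByVal sClientNo As String) As String
   Dim iFile As Integer
   Dim sLine As String
   Dim iPos As Long

   LookupClientText = ""
   iFile = FreeFile
   Open sFilePath For Input As #iFile
   Do While Not EOF(iFile)
      Line Input #iFile, sLine
      iPos = InStr(sLine, ";")
      ' lines without a separator carry no mapping and are ignored
      If iPos > 0 Then
         If Trim(Left(sLine, iPos - 1)) = sClientNo Then
            LookupClientText = Trim(Mid(sLine, iPos + 1))
            Exit Do
         End If
      End If
   Loop
   Close #iFile
End Function
```

With the sample file, a call such as LookupClientText(sPath, "110") would return "Berlin", where sPath is whatever location the file is stored at.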

Within the KTM project we defined a format locator that checks whether the client-specific text (Hamburg, Berlin, Bremen, …) is printed in the upper left or upper right corner. At runtime of the KTM project, the regular expression of this format locator was fed dynamically with the client's specific text. That way we succeeded in changing the regular expression of a format locator at runtime just by editing an external initialization file.
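Feeding a client text into a format locator could look roughly like the following sketch. It uses the same object-model calls as the SL_ChangeRE script later in this article. The locator name FL_ClientText and both procedure names are hypothetical, and the list of metacharacters to escape is an assumption about the regular expression dialect, so check it against the KTM documentation before relying on it.

```vb
' Hypothetical helper: prefixes common regex metacharacters with a backslash
' so that a client text is matched literally.
' Assumption: KTM treats these characters as metacharacters.
Private Function EscapeRegEx(ByVal sText As String) As String
   Const META As String = "\.^$|?*+()[]{}"
   Dim i As Long
   Dim sChar As String
   Dim sResult As String

   For i = 1 To Len(sText)
      sChar = Mid(sText, i, 1)
      If InStr(META, sChar) > 0 Then sResult = sResult & "\"
      sResult = sResult & sChar
   Next i
   EscapeRegEx = sResult
End Function

' Hypothetical procedure: writes the client text into the first regular
' expression of a format locator named FL_ClientText (assumed name).
Private Sub SetClientRegEx(ByVal sClientText As String)
   Dim oLocator As CscRegExpLib.CscRegExpLocator

   Set oLocator = Project.ClassByName("Dokumente").Locators.ItemByName("FL_ClientText").LocatorMethod
   oLocator.RegularExpressions(0).RegularExpression = EscapeRegEx(sClientText)
End Sub
```

Escaping matters as soon as a client text contains characters such as a dot or parentheses; for plain names like Hamburg the text is passed through unchanged.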

Because of the complexity of the project described above, I will explain the dynamic change of the regular expression using the simple example of the insurance number from part 1 of this blog article.

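First of all you have to create a reference to a KTM library. This is done in the Project Builder scripting environment on the appropriate document class. The script further down declares a variable of type CscRegExpLib.CscRegExpLocator, which requires this reference: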

In part 1 of this article we set up a format locator FL_VSNR with the regular expression 20\d{2}/\d{1,10}:

In order to change the regular expression at runtime, you have to insert a script locator (SL_ChangeRE in this example) ABOVE the format locator whose regular expression is to be changed. Because it is defined above the format locator, the script locator is executed first and can change the regular expression before the format locator is evaluated.

The script locator SL_ChangeRE consists of the following script, which changes the regular expression of the format locator FL_VSNR to the new value 20\d{2}/\d{2}:

' Class script: Dokumente
Private Sub SL_ChangeRE_LocateAlternatives(ByVal pXDoc As CASCADELib.CscXDocument, _
            ByVal pLocator As CASCADELib.CscXDocField)

   Dim NewRegEx As String
   Dim oLocator As CscRegExpLib.CscRegExpLocator

   ' get format locator FL_VSNR
   Set oLocator = Project.ClassByName("Dokumente").Locators.ItemByName("FL_VSNR").LocatorMethod
   ' set new regex for FL_VSNR
   NewRegEx = "20\d{2}/\d{2}"
   oLocator.RegularExpressions(0).RegularExpression = NewRegEx
End Sub

The behaviour of this script can be tested directly within the KTM Project Builder with our test document:

*** Remark: Versicherungsnummer = insurance number ***

The original format locator (see part 1 ):

If the script locator is not used, the document extraction returns the known value 2011/47123:

If the script locator is used, the result changes to 2011/47:
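The difference between the two results follows from the quantifiers: \d{1,10} takes up to ten digits after the slash, while \d{2} stops after exactly two. The following sketch reproduces this outside of KTM. It uses the Windows VBScript.RegExp engine, not KTM's own format locator engine, so it only illustrates the quantifier behaviour; the function name FirstMatch is hypothetical.

```vb
' Hypothetical helper: returns the first match of sPattern in sText,
' or an empty string if there is no match.
' Uses the Windows VBScript.RegExp engine, not the KTM format locator.
Private Function FirstMatch(ByVal sPattern As String, ByVal sText As String) As String
   Dim oRegEx As Object
   Dim oMatches As Object

   Set oRegEx = CreateObject("VBScript.RegExp")
   oRegEx.Pattern = sPattern
   Set oMatches = oRegEx.Execute(sText)
   If oMatches.Count > 0 Then
      FirstMatch = oMatches.Item(0).Value
   Else
      FirstMatch = ""
   End If
End Function

' FirstMatch("20\d{2}/\d{1,10}", "2011/47123") returns "2011/47123"
' FirstMatch("20\d{2}/\d{2}", "2011/47123")    returns "2011/47"
```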

If you take a look at the format locator in Project Builder after the extraction, you will see that the regular expression has actually been changed:

I hope this inside view of some parts of the KTM toolbox shows that KTM is indeed a very configurable product. I am looking forward to any further hints or tricks on the use of KTM tools and its scripting language. Over the next months I will try to publish more articles about KTM here.

New: article about document classification with KTM

New: KTM and insurance companies: Document Process Automation
