In addition to fuzzy databases, KTM offers so-called dictionaries to optimize recognition. For example, a dictionary can be used in the regular expressions of a format locator to find dates of the form “01. December 2015”; the dictionary would then contain all the month names (January, February, March, …).
The KTM fuzzy databases can be searched with the KTM script language, and Kofax provides sample programs for this (for example in the KTM scripting help or in “Best Practices”). To my knowledge, there are no such examples for searching a dictionary. In a recognition project for German license plates I had to search a dictionary by script. I would like to briefly explain why and present a sample script.
A regular expression of the following form is usually enough to recognize a German license plate number (one to three letters, delimiter, one to two letters, delimiter, one to four digits; example: SG CC 876):
[A-ZÄÖÜ]{1,3}\x20?[\.|\x20|-]\x20?[A-Z]{1,2}[\.|\x20|-]?[0-9]{1,4}
The first characters are a code for the city or the administrative district where the car is registered.
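To get a feel for what this expression accepts, here is a minimal sketch in Python, run outside KTM. It assumes that Python's `re` module treats the pattern the same way as the KTM format locator does, which has not been verified; the helper name `find_plate` is made up for this illustration.

```python
import re

# The article's expression, copied unchanged.
PLATE_RE = re.compile(
    r"[A-ZÄÖÜ]{1,3}\x20?[\.|\x20|-]\x20?[A-Z]{1,2}[\.|\x20|-]?[0-9]{1,4}"
)


def find_plate(text):
    """Return the first plate-shaped substring in text, or None."""
    match = PLATE_RE.search(text)
    return match.group(0) if match else None


# Try the plate formats mentioned in the article plus one non-plate string.
samples = ["SG CC 876", "COE.EW.247", "COE - EW 247", "no plate here"]
for sample in samples:
    print(repr(sample), "->", find_plate(sample))
```

Note that in Python's `re` a `|` inside `[...]` is a literal character, not an alternation, so the delimiter class also accepts a pipe character. Whether the KTM engine behaves the same way is an open assumption.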
However, this regular expression may also find strings that are not valid plates, because it accepts any one to three letters at the front. It would be better to use a list of valid city codes instead of [A-ZÄÖÜ]{1,3}, which would make the list of recognized but invalid number plates much smaller. This is where KTM dictionaries come in handy. Lists of valid city codes are available on the Internet, and a suitable dictionary file (KFZ-Staedte) with the valid codes is quickly created:
AIC
AK
AM
AN
ANA
AÖ
AP
AS
ASL
ASZ
AUR
:
:
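If the codes come from a downloaded list, a few lines of Python can turn them into the one-entry-per-line layout shown above. This is only a sketch: the function name `build_dictionary_text` is hypothetical, and the character encoding KTM expects when importing the file is not covered here and would need to be checked.

```python
def build_dictionary_text(raw_codes):
    """Return the codes as dictionary text: one unique code per line."""
    # Trim whitespace, drop empty entries, upper-case and de-duplicate.
    cleaned = sorted({code.strip().upper() for code in raw_codes if code.strip()})
    return "\n".join(cleaned) + "\n"


# Messy input as it might come from a copied web page.
print(build_dictionary_text(["ak", " AIC ", "AK", "", "am"]), end="")
```

The returned text can then be saved to a file and imported as the dictionary. The sort order is by code point, so entries with umlauts end up in a different position than in the list above.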
This format locator delivers the correct result:
However, the test of another document failed – no number plate was detected:
This reveals a weakness in the integration of dictionaries into regular expressions: it only works if the dictionary string is separated from the rest of the regular expression by whitespace (a space, a tab, and so on).
This is the case in the first example, “COE – EW 247”. The second example, “COE.EW.247”, uses periods as separators between the individual parts of the number plate, so the dictionary integration does not work as desired.
I did not want to give up the improved recognition of the city codes, though. So I went back to the ‘original’ regular expression:
[A-ZÄÖÜ]{1,3}\x20?[\.|\x20|-]\x20?[A-Z]{1,2}[\.|\x20|-]?[0-9]{1,4}
This time, however, I take the recognized city code and check it against the dictionary by script. If the check succeeds, the license plate is accepted; otherwise it is discarded.
Here is a sample script showing how to search for strings in a KTM dictionary:
Function ExistiertStadtAusAMKZ(kennzeichen) As Boolean

   'The following format for the number plate (kennzeichen) is expected: COE.EW.247

   Dim DictResItems As CscDictionaryResItems
   Dim Dict As CscDictionary
   Dim strData As String
   Dim strReplaceVal As String
   Dim QueryText As String
   Dim pos As Integer

   ExistiertStadtAusAMKZ=False
   pos=InStr(kennzeichen,".")

   If pos>0 Then
      QueryText=Left(kennzeichen,pos-1) 'city code
      Set Dict = Project.Dictionaries.ItemByName("KFZ-Staedte")
      Set DictResItems=Dict.Search(QueryText,CscEvalMatchQuery,5)
      If DictResItems.Count>0 Then
         'strData holds the code
         'strReplaceVal holds the optional replacement value from dictionary
         Dict.GetRecordData(DictResItems(0).RecID,strData,strReplaceVal)
         ExistiertStadtAusAMKZ=True 'something was found
      Else
         'nothing was found
         ExistiertStadtAusAMKZ=False
      End If
   End If

End Function
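The same two-step idea (match the plate with the plain regular expression, then look up the city code in a list) can be tried outside KTM. The following Python sketch is not KTM code and uses none of the KTM API: `VALID_CITY_CODES` is a tiny stand-in for the KFZ-Staedte dictionary, filled with the codes listed above plus COE from the examples, and `city_code_exists` is a made-up name. Unlike the script above, it captures the city code with a group, so it does not depend on a period as separator.

```python
import re

# Stand-in for the KFZ-Staedte dictionary (assumption: a real list is far longer).
VALID_CITY_CODES = {
    "AIC", "AK", "AM", "AN", "ANA", "AÖ", "AP", "AS", "ASL", "ASZ", "AUR", "COE",
}

# Same shape as the article's expression; the city code is captured as group 1.
PLATE_RE = re.compile(
    r"([A-ZÄÖÜ]{1,3})\x20?[\.|\x20|-]\x20?[A-Z]{1,2}[\.|\x20|-]?[0-9]{1,4}"
)


def city_code_exists(plate):
    """Return True if plate has the expected shape and a known city code."""
    match = PLATE_RE.fullmatch(plate)
    if match is None:
        return False  # not plate-shaped at all
    return match.group(1) in VALID_CITY_CODES


for plate in ["COE.EW.247", "COE - EW 247", "XQZ.EW.247"]:
    print(plate, "->", city_code_exists(plate))
```

Here XQZ is simply a prefix that is missing from the stand-in set; the sketch makes no claim about which codes are actually assigned.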
Blog author
Jürgen Voss