Diesel city driving bans as a use case for machine and deep learning applications
In this series of articles, we would like to give you an understanding of different machine and deep learning approaches using the example of detecting diesel cars by recognition of environmental zone badges or type labels on vehicles. In this article, we implement a HOG detector (Histogram of Oriented Gradients) based on the dlib C++ library.
The links to the other parts can be found here:
>codecentric.ai Homepage
>Deep Diesel – Part 1: Machine & Deep Learning for diesel car detection
>Deep Diesel – Part 2: Machine Learning Diesel car detection using a HOG detector
>Deep Diesel – Part 3: Deep Learning diesel car detection using the AWS DeepLens
>codecentric.ai youtube channel
dlib C++ Library
The dlib library (http://dlib.net/) offers a quick entry into machine learning for computer vision and bundles a wide variety of algorithms. It is written in C++ but also provides a Python interface for a subset of its functions. A look at the extensive, well-commented examples shipped with the library is definitely worthwhile: the scenarios range from face recognition and vehicle detection to deep-learning-based dog hipsterizers.
Good installation instructions can be found here.
Detection by the HOG detector
We decided to start with a HOG detector (Histogram of Oriented Gradients), which primarily relies on the geometric properties of the object. The big advantage of this detector is that very little training data is required for initial results. The basic idea is a localized evaluation (counting) of the occurrence of gradients (edges) and their aggregation into a characteristic histogram. This histogram can then be used as a template to identify similar structures in other images. Localized in this context means that many histograms are created and evaluated along a fine grid laid over the image.
This procedure makes it possible to recognize geometric objects after translation and, to some extent, after scaling. Unfortunately, the recognition is not rotation- or perspective-invariant, since these transformations change the orientation of the gradients in the image to be evaluated.
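To illustrate the basic idea, here is a minimal sketch (plain NumPy/OpenCV, not part of dlib, and the image file name is just a placeholder) that computes a gradient-orientation histogram for a single cell of an image. dlib's FHOG features are more elaborate, but follow the same principle:

import cv2
import numpy as np

def cell_histogram(gray, x, y, cell_size=8, bins=9):
    """Histogram of gradient orientations for one cell, weighted by gradient magnitude."""
    cell = gray[y:y + cell_size, x:x + cell_size].astype(np.float32)
    # horizontal and vertical gradients via Sobel filters
    gx = cv2.Sobel(cell, cv2.CV_32F, 1, 0, ksize=1)
    gy = cv2.Sobel(cell, cv2.CV_32F, 0, 1, ksize=1)
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    # unsigned orientation in [0, 180) degrees, as commonly used for HOG
    orientation = (np.rad2deg(np.arctan2(gy, gx)) + 180.0) % 180.0
    hist, _ = np.histogram(orientation, bins=bins, range=(0, 180), weights=magnitude)
    return hist

gray = cv2.imread("badge.jpg", cv2.IMREAD_GRAYSCALE)  # assumed example image
print(cell_histogram(gray, x=0, y=0))

A HOG detector evaluates many such histograms over a grid of cells and compares them against the template learned from the training data.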
In our use case, the HOG detector does not use an obvious characteristic – the color of the environmental badge – but is exclusively oriented towards the round shape and the high-contrast “4”. To get an impression of what the algorithm is based on, you can visualize the feature gradients.
The following method extracts the features from our image:
void extract_fhog_features(
const image_type& img,
array2d<matrix<T,31,1>,mm>& hog,
int cell_size = 8,
int filter_rows_padding = 1,
int filter_cols_padding = 1
);
The corresponding call of the method to visualize 32-pixel blocks is:
extract_fhog_features(img,hog,32);
A good starting point to understand this for your own pictures and objects is the example file delivered with the dlib distribution: fhog_ex.cpp (http://dlib.net/fhog_ex.cpp.html), which can easily be adapted to your own needs. Here you can see some examples for the environmental badge, analyzed with different cell sizes.
Visualization of feature gradients for 256², 128², 32², and 8² pixel grids
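If you prefer to stay in Python, a comparable visualization can be produced with scikit-image's HOG implementation instead of dlib's C++ API. This is only a rough sketch under assumptions: a recent scikit-image version (keyword visualize), and badge.jpg plus the chosen cell sizes are placeholders:

import cv2
from skimage import exposure
from skimage.feature import hog

gray = cv2.imread("badge.jpg", cv2.IMREAD_GRAYSCALE)  # assumed example image
for cell_size in (8, 32, 128):
    # visualize=True additionally returns an image of the dominant gradient orientations
    features, hog_image = hog(gray, orientations=9,
                              pixels_per_cell=(cell_size, cell_size),
                              cells_per_block=(1, 1), visualize=True)
    hog_image = exposure.rescale_intensity(hog_image, out_range=(0, 255))
    cv2.imwrite("hog_{}.png".format(cell_size), hog_image.astype("uint8"))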
To implement the detector, we use the Python interface and follow the typical steps to create a detector:
- Collect training data
- Labeling of the data
- Training and parameterization of the detector
- Testing the detector
- Evaluation of the live data
Collect training data
While working on the detector, we tried several methods of collecting and influencing the performance of the detector.
- Taking photos from the central perspective (camera) of parked vehicles
- Taking photos from a two-point perspective (camera) of parked vehicles
- Extracting frames from a two-point perspective (webcam) of moving vehicles
Photos frontal, two-point perspective and frame from the video stream
Initially, without taking the characteristics of the algorithm into account, we took frontal photos of parked vehicles, then adjusted the perspective, and finally switched to videos of vehicles in motion. The performance improved from step to step, with the biggest boost occurring when we switched to videos. In particular, the correct perspective and the use of the same device for recording training data, test data, and later the live detection are the factors with the biggest influence. The easiest way to collect training and test data is to record a video of passing vehicles and then save selected frames on which the environmental sticker is clearly visible.
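Saving frames from such a recording can be done with a few lines of OpenCV. This is only a sketch; the file names, directory, and key handling are our own assumptions:

import cv2

cam = cv2.VideoCapture("passing_cars.mp4")  # assumed recording of passing vehicles
frame_idx = 0
while cam.isOpened():
    ret, frame = cam.read()
    if not ret:
        break
    cv2.imshow("recording", frame)
    key = cv2.waitKey(1) & 0xff
    if key == ord("s"):
        # save the current frame as a training candidate
        cv2.imwrite("trainingsdata/frame_{:06d}.jpg".format(frame_idx), frame)
    elif key == 27:  # ESC
        break
    frame_idx += 1
cam.release()
cv2.destroyAllWindows()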
Furthermore, it should be noted that for the HOG detector it is more important to find a few representative images than a large amount of varied training data. Too many heterogeneous images dilute the histogram we are looking for and make the result worse rather than better.
Labeling the data
For labeling the data we use the tool imglab, which can be found in the dlib/tools/ folder after compiling the dlib C++ library. With the command
'./imglab -c deepsnow.xml ../outside/'
an xml file with the list of training data is initially generated, which is then loaded with
'./imglab deepsnow.xml'
to attach the labels.
After entering the label name, the Regions of Interest (RoIs) are marked with bounding boxes. Experience has shown that objects should not be outlined with pixel accuracy, since the gradients within the marking are interpreted as characteristic features. The result is an .xml file with the paths to the data and the coordinates of the RoIs.
<?xml version='1.0' encoding='ISO-8859-1'?>
<?xml-stylesheet type='text/xsl' href='image_metadata_stylesheet.xsl'?>
<dataset>
  <name>imglab dataset</name>
  <comment>Created by imglab tool.</comment>
  <images>
    <image file='trainingsdata/20180313_085705.jpg'>
      <box top='1779' left='855' width='90' height='65'>
        <label>4</label>
      </box>
    </image>
  </images>
</dataset>
Example of an XML file with meta information about the training data
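Before training, it can be useful to quickly sanity-check the annotation file. A small sketch using only Python's standard library (the file name is the one generated above):

import xml.etree.ElementTree as ET

tree = ET.parse("deepsnow.xml")
images = tree.getroot().findall("./images/image")
boxes = tree.getroot().findall("./images/image/box")
print("{} labeled boxes in {} images".format(len(boxes), len(images)))
for box in boxes:
    # width/height of each RoI, useful to estimate the sliding-window size
    print(box.get("width"), box.get("height"))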
Training and parameterization of the algorithm
To train the classifier, we create a minimal Python script that reads the xml file with the labels and sets the necessary training options (see http://dlib.net/python/index.html#dlib.train_simple_object_detector).
import dlib

options = dlib.simple_object_detector_training_options()
options.add_left_right_image_flips = False
options.num_threads = 16
options.be_verbose = True
options.C = 5
# trains the detector on the labeled data and writes it to dieselfilter.svm
dlib.train_simple_object_detector("./deepsnow.xml", "dieselfilter.svm", options)
For mirror-symmetric objects, the training data can optionally be doubled (data augmentation) by mirroring each image left to right; this is not applicable to characters such as the ‘4’ in our case. Depending on the machine on which we train, we can adjust the number of threads spawned to parallelize the training. The parameter C corresponds to the SVM regularization parameter and controls how strongly the training algorithm punishes misclassifications. The higher C, the more tightly the classification boundary is fitted to the training data, which can ultimately lead to overfitting. A suitable C can be found by systematic trial and error. A good illustration of the effect can be found here: https://datascience.stackexchange.com/questions/4943/intuition-for-the-regularization-parameter-in-svm
An example call with
$> python2.7 dieseltraining.py
leads to the following output:
Training with C: 5
Training with epsilon: 0.01
Training using 8 threads.
Training with sliding window 85 pixels wide by 75 pixels tall.
Upsample images...
Upsample images...
objective: 142.648
objective gap: 142.642
risk: 28.5284
risk gap: 28.5284
num planes: 3
iter: 1
objective: 47.5514
objective gap: 47.5011
risk: 9.50024
risk gap: 9.50022
num planes: 4
iter: 2
...
iter: 385
Training complete.
Trained with C: 5
Training with epsilon: 0.01
Trained using 4 threads.
Trained with sliding window 85 pixels wide by 75 pixels tall.
Upsampled images 1 time to allow detection of small boxes.
Saved detector to file dieselfilter.svm
The output documents the optimization of the objective function (‘objective’) while minimizing the misclassification risk (‘risk’) by adjusting the parameter vector, which is not printed here. The solver stops once the risk gap falls below epsilon. A detailed introduction to the topic can be found in the paper ‘Predicting Structured Objects with Support Vector Machines’ – https://dl.acm.org/citation.cfm?id=1592783&dl=ACM&coll=DL (closed access).
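As a plausibility check, the printed values fit the usual regularized-risk formulation minimized by structural SVM solvers (this is our reading of the output, not something the log states explicitly):

objective ≈ ½ · ‖w‖² + C · risk        objective gap = C · risk gap

With C = 5, the first iteration above gives 5 × 28.5284 = 142.642 for the objective gap, and the small remainder of 142.648 − 142.642 ≈ 0.006 corresponds to the regularization term.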
Testing the detector
A basic methodology in machine learning is the division of the available labeled images into training and test sets. The detector is trained with the images of the training set and the performance is measured by classifying the images from the test set.
print("Testing accuracy: {}".format(dlib.test_simple_object_detector("isitdiesel.xml", "dieselfilter.svm")))
Example Output:
Testing accuracy: precision: 1, recall: 1, average precision: 1
Precision measures the accuracy of the detections: precision = TP / (TP + FP), i.e. the fraction of detections that are actually correct. A precision of 1 means there were no false positive (‘false alarm’) detections; a precision of 0 means every detection was a false positive.
Recall measures how completely the objects were found: recall = TP / (TP + FN), i.e. the fraction of labeled objects that were actually detected. A recall of 1 corresponds to a detector that has found all objects; a recall of 0 means that no object was found.
The average precision is a measure of the overall quality of the detector; it summarizes precision over all recall levels (the area under the precision-recall curve). The closer to 1, the better the detector performs on the test set.
A detailed description can be found here.
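Putting the training/test split and the parameterization from above together, a small sweep over C can help pick the regularization strength. This is only a sketch; the file names for the training xml, the test xml, and the detector outputs are placeholders:

import dlib

for c in (1, 5, 10, 50):
    options = dlib.simple_object_detector_training_options()
    options.num_threads = 4
    options.C = c
    detector_file = "dieselfilter_C{}.svm".format(c)
    dlib.train_simple_object_detector("deepsnow.xml", detector_file, options)
    # evaluate on images the detector has not seen during training
    result = dlib.test_simple_object_detector("isitdiesel.xml", detector_file)
    print("C={}: {}".format(c, result))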
Detecting Diesel Cars
The aim of the scenario was to perform a real-time analysis of the traffic flow, i.e. to perform the detection on a live video stream. To do this, we have chosen a simple setup, which is easily reproducible.
Checkpoint on the company car park
To get an evaluation of the live data of the traffic, we built a small application that takes a video stream and runs the single images through our detector.
import cv2
import dlib
import imutils

CAM_URL = 0  # 0 = first local webcam; a stream URL also works
cam = cv2.VideoCapture(CAM_URL)
frame_idx = 0
detector = dlib.simple_object_detector("dieselfilter.svm")

while cam.isOpened():
    ret, frame = cam.read()
    if not ret:
        break
    frame = imutils.resize(frame, width=1024)
    orig = frame.copy()
    frame_idx += 1
    # run the trained HOG detector on the current frame
    detections = detector(frame)
    if len(detections) > 0:
        print("Number of Diesels detected: {}".format(len(detections)))
        for k, d in enumerate(detections):
            print("Diesel {}: Left: {} Top: {} Right: {} Bottom: {}".format(
                k, d.left(), d.top(), d.right(), d.bottom()))
            cv2.rectangle(frame, (d.left(), d.top()), (d.right(), d.bottom()), (0, 0, 255), 2)
    cv2.imshow("raw", frame)
    cv2.moveWindow("raw", 0, 0)
    key = cv2.waitKey(1) & 0xff
    if key == 27:  # ESC ends the evaluation
        break

cam.release()
cv2.destroyAllWindows()
Since the frames are evaluated live and cam.read() only fetches a new image between two detector runs, the effective frame rate of the evaluation is extremely low and frames arriving while the detector is busy are skipped. For a smooth evaluation, a pre-recorded video can be run through the detector frame by frame.
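The same loop as above can be pointed at a recording instead of the camera. A sketch that additionally writes the annotated frames to an output video (file names, codec, and frame rate are assumptions):

import cv2
import dlib
import imutils

cam = cv2.VideoCapture("traffic_recording.mp4")  # assumed pre-recorded video
detector = dlib.simple_object_detector("dieselfilter.svm")
writer = None

while cam.isOpened():
    ret, frame = cam.read()
    if not ret:
        break
    frame = imutils.resize(frame, width=1024)
    if writer is None:
        h, w = frame.shape[:2]
        writer = cv2.VideoWriter("detections.avi",
                                 cv2.VideoWriter_fourcc(*"MJPG"), 25, (w, h))
    for d in detector(frame):
        cv2.rectangle(frame, (d.left(), d.top()), (d.right(), d.bottom()), (0, 0, 255), 2)
    writer.write(frame)

cam.release()
if writer is not None:
    writer.release()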
Screen-grabbing of the detector output (with added lettering ‘Detection’)
Furthermore, it is possible to detect the type labels of cars, in this case the Audi TDI logo.
TDI Diesel type label detection
Conclusion and performance
The simple detector we presented is easy to implement, but in this setup its performance is barely sufficient for the application scenario. Increasing it would still require some effort in the parameterization of the algorithm. Aspects such as the strong dependence on perspective are particularly challenging. Motion blur and the low frame rate also become a problem in the test setup as vehicle speed increases, because the detector depends entirely on clear contours and on the environmental sticker being visible at sufficient size. These aspects can certainly be addressed with better hardware in terms of optical quality and computing power. Since the detector does not draw any information from the color channels, it would also be possible to switch to the infrared range and take single shots with a flash, as the devices currently used in traffic monitoring already do.
For simple object detection scenarios in controlled environments, the algorithm is very well suited to deliver fast, reliable results. A particularly positive feature is the fact that very little training data is required.
In the next part, we will look at the features and performance of detectors based on deep learning approaches.
The links to the other parts can be found here:
>codecentric.ai Homepage
>Deep Diesel – Part 1: Machine & Deep Learning for diesel car detection
>Deep Diesel – Part 2: Machine Learning diesel car detection using a HOG detector
>Deep Diesel – Part 3: Deep Learning diesel car detection using the AWS DeepLens
>codecentric.ai youtube channel