The intelligent recognition of objects is a basic prerequisite for individually controlling the automation process in versatile and highly flexible production. For the 2019 Hannover Messe trade show, HARTING teamed up with the German Research Center for Artificial Intelligence (DFKI) to implement AI-based object recognition on the MICA® in the HAII4You Factory, the Industry 4.0 demonstrator that serves as the testbed for the latest enabling technologies in Integrated Industry. Here, a camera and a trained system detect whether a product has been assembled correctly. In the event of faulty assembly, the MICA® intervenes directly in the production process. This testbed can be generalised to numerous questions in a classic production environment: in the future, intelligent image recognition will be able to identify defective parts so that the relevant process can be stopped or adjusted.
In contrast to classification, in which an incoming image is assigned in its entirety to one or more categories with a certain probability, object recognition can localise and classify several objects within a single image. To simplify somewhat: the incoming image is divided into thousands of overlapping "subareas", each of which is classified individually. The likelihood of belonging to a particular class must exceed a threshold for a subarea to be considered further. Spatially proximate subareas that belong together based on their classification are then combined, and each remaining subarea is assigned its most probable class. The results can be visualised for humans for control purposes or integrated directly into existing processes via suitable interfaces.
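The post-processing described above can be sketched in a few lines: candidate subareas ("boxes") with class scores are filtered by a confidence threshold, and overlapping candidates of the same class are merged, keeping the most confident one (a simplified form of non-maximum suppression). All names, boxes and scores below are illustrative assumptions, not HARTING/DFKI code.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def merge_detections(candidates, threshold=0.5, overlap=0.5):
    """Drop low-confidence candidates; keep the best box per cluster of
    spatially overlapping candidates of the same class."""
    kept = []
    # Sort by score so the most confident candidate of each cluster wins.
    for box, cls, score in sorted(candidates, key=lambda c: -c[2]):
        if score < threshold:
            continue
        if all(k[1] != cls or iou(box, k[0]) < overlap for k in kept):
            kept.append((box, cls, score))
    return kept

candidates = [
    ((10, 10, 50, 50), "connector", 0.92),
    ((12, 11, 52, 49), "connector", 0.81),  # overlaps the first -> merged
    ((60, 60, 90, 90), "housing",   0.30),  # below threshold -> dropped
]
print(merge_detections(candidates))  # [((10, 10, 50, 50), 'connector', 0.92)]
```

In a real pipeline, frameworks perform this suppression internally; the sketch only illustrates the principle of thresholding and combining subareas.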
Still, as an end user, how exactly do you arrive at such a solution with the least possible expenditure of time and money? This starts with the choice of the edge device, such as the HARTING MICA®. To deploy AI on an edge device, a hybrid edge-cloud architecture is used: before AI-based models can be used, they must be trained.
The artificial Deep Neural Network (DNN), which ultimately performs the object detection, is trained within a virtual machine (VM) on a server with at least one powerful graphics card. This VM can run either on the servers of a cloud provider or on on-premises servers if, for example, sensitive data must not leave the company network.
But before training can begin, a dataset of training and validation images must first be created. In our experience, at least 150 pictures of the corresponding objects should be taken for each class, from different perspectives and distances and under changing lighting conditions. In addition, at least some of the images should also contain several objects from other classes, which may also partially overlap. This increases the robustness of AI-based object recognition, which, in contrast to classic manually programmed filters, can cope flexibly with a wide range of conditions.
All the images created must subsequently be labelled manually by marking every object in each individual image with a rectangle (a so-called bounding box) and assigning it to a class. This step must be carried out very carefully, since the dataset created in this way forms the training basis and determines the accuracy of the object recognition.
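A labelled image can be thought of as a simple record of boxes and class names. The following sketch shows a hypothetical annotation record together with a sanity check before it enters the training set; the field names are assumptions for illustration (real labelling tools typically emit formats such as Pascal VOC XML or COCO JSON).

```python
import json

# Illustrative annotation for one labelled image (field names are assumed).
annotation = {
    "filename": "img_0001.jpg",
    "width": 1280,
    "height": 720,
    "objects": [
        {"class": "connector", "bbox": [410, 220, 655, 470]},   # x1, y1, x2, y2
        {"class": "housing",   "bbox": [700, 180, 1050, 600]},
    ],
}

def validate(ann, known_classes):
    """Basic sanity checks: known class labels, boxes inside the image."""
    for obj in ann["objects"]:
        x1, y1, x2, y2 = obj["bbox"]
        assert obj["class"] in known_classes, "unknown class label"
        assert 0 <= x1 < x2 <= ann["width"], "bbox outside image (x)"
        assert 0 <= y1 < y2 <= ann["height"], "bbox outside image (y)"
    return True

print(validate(annotation, {"connector", "housing"}))  # True
print(json.dumps(annotation["objects"][0]))
```

Automated checks like this catch careless labelling early, which matters precisely because the dataset determines the achievable accuracy.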
The architecture of the DNN used also influences the achievable accuracy of the object recognition. The limits here, however, are set by the requirements of the application and the achievable inference speed on the edge device, so a compromise must be found between speed and accuracy. Depending on the selected architecture, the incoming images must be reduced to a fixed input size; common sizes are 224x224, 300x300 or 640x640 RGB pixels. It should be kept in mind that, within the same architecture family, larger input images can achieve higher accuracy at the cost of corresponding losses in speed.
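Resizing the camera frame to the fixed network input also means that the labelled bounding boxes must be scaled with it. A minimal arithmetic sketch, assuming a 300x300 input as used by typical small detection models (the frame size and box values are made up for illustration):

```python
def scale_bbox(bbox, src_size, dst_size=(300, 300)):
    """Map a box (x1, y1, x2, y2) from the source frame to the resized input."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    x1, y1, x2, y2 = bbox
    return (round(x1 * sx), round(y1 * sy), round(x2 * sx), round(y2 * sy))

# A 1200x600 frame squeezed to 300x300: x shrinks by factor 4, y by factor 2.
print(scale_bbox((400, 100, 800, 500), (1200, 600)))  # (100, 50, 200, 250)
```

Note that non-square frames are distorted by this kind of direct resize; some pipelines instead pad the image ("letterboxing") to preserve the aspect ratio.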
If, as in this example, one uses TensorFlow1 and the associated Object Detection API2, there are several models with different architectures available for download3. These models are pre-trained, for example, on the COCO4 (Common Objects in COntext) dataset with 80 different classes. For most industrial applications, however, such a pre-trained model needs to be adapted.
Here, following a "transfer learning" procedure, the reusable "feature extraction" part of the pre-trained network is "frozen", i.e. left unchanged during the subsequent post-training. The classification part of the source network, which cannot be reused, is replaced by a new classification part, which is trained with the previously created dataset. Since only part of the network has to be re-trained, the necessary computational effort is reduced considerably. This saves both time and costs.
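The freezing idea can be illustrated without any framework: each layer carries a trainable flag, and a training step updates only the layers whose flag is set. This is a deliberately toy sketch of the concept; in practice the TensorFlow Object Detection API handles freezing via its training configuration, and the names and numbers below are invented.

```python
class Layer:
    """Toy stand-in for a network layer: a name, weights and a trainable flag."""
    def __init__(self, name, weights, trainable):
        self.name, self.weights, self.trainable = name, weights, trainable

def train_step(layers, grads, lr=0.1):
    """Apply one (fake) gradient step; frozen layers stay untouched."""
    for layer in layers:
        if layer.trainable:
            layer.weights = [w - lr * g
                             for w, g in zip(layer.weights, grads[layer.name])]

model = [
    Layer("feature_extractor", [1.0, 2.0], trainable=False),  # frozen, pre-trained
    Layer("classifier_head",   [0.0, 0.0], trainable=True),   # new, to be trained
]
grads = {"feature_extractor": [9.9, 9.9], "classifier_head": [1.0, -1.0]}
train_step(model, grads)
print(model[0].weights, model[1].weights)  # [1.0, 2.0] [-0.1, 0.1]
```

The frozen feature extractor keeps its pre-trained weights regardless of the gradients computed for it, which is exactly why post-training needs far less computation.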
After training is completed, the model in this example is converted to a TensorFlow Lite model, which in turn is executed inside a container by a TensorFlow Lite5 runtime. This container can, e.g. in the case of Azure, be provided as an "IoT Edge module" via Azure IoT Edge on the MICA®. Alternatively, a container can also be deployed via the local web interface of the MICA®.
The model executed locally on the MICA® now provides an inference service. Images can be transferred to it from another service via SOAP, and the inference results are returned in XML or JSON format. Additional services6 can be leveraged to forward these results via OPC UA, MQTT or supported fieldbus protocols to MES/ERP systems as well as controllers.
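A downstream consumer of such a JSON result might look like the following sketch, which decides whether an assembly step passed. The response structure, field names and class names are hypothetical assumptions for illustration, not the MICA® service's actual schema.

```python
import json

# Hypothetical JSON response from the local inference service.
response = json.loads("""
{
  "detections": [
    {"class": "housing_ok",    "score": 0.94, "bbox": [120, 80, 420, 360]},
    {"class": "screw_missing", "score": 0.71, "bbox": [300, 200, 340, 240]}
  ]
}
""")

def assembly_ok(result, min_score=0.6):
    """Flag the part as faulty if any defect class is confidently detected."""
    defects = [d for d in result["detections"]
               if d["score"] >= min_score and d["class"].endswith("_missing")]
    return len(defects) == 0, defects

ok, defects = assembly_ok(response)
print(ok, [d["class"] for d in defects])  # False ['screw_missing']
```

A verdict like this is what would let the MICA® stop or adjust the process, or be forwarded via OPC UA or MQTT to an MES/ERP system.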
DFKI ranks as Germany's leading business-oriented research institution in the field of innovative software technologies based on artificial intelligence methods. In the international scientific world, the DFKI is considered one of the most important “Centers of Excellence”. The research area Innovative Factory Systems (IFS) under the direction of Prof. Dr. Martin Ruskowski deals with research questions surrounding Industrie 4.0 and the factory of the future.
5 TensorFlow Lite is an embedded/mobile-optimised version of TensorFlow.