Some topics are currently hyped and attract a great deal of publicity. One of them is artificial intelligence, a basic requirement for the Smart Factory in the world of Industry 4.0. And whenever AI comes up, "Big Data" is never far away. The reason: most popular machine learning methods rely on Big Data, because they learn from large volumes of sample data.
"Machine learning" is one of the most sought-after AI approaches when it comes to turning data into added value. The manufacturing industry, mechanical engineering companies and businesses that have already switched to networked production, in particular, are seeing exponential growth in data volumes, and there is no end to this growth in sight. As a result, demand for machine learning methods has been rising for some time: they are meant to help companies turn more than just a fraction of their data into added value.
First, it helps to pin down what the term "Big Data" actually means. It is best explained by the 5 "Vs". They go back to an article from 2001 that described three of them (Volume, Velocity and Variety); the list has been extended in various ways in the years since. The 5 "Vs" stand for:
- Volume: Data volumes that are so large that they can no longer be efficiently stored or processed using traditional methods.
- Variety: The variety of data sources as well as data types, which are often unstructured or only semi-structured.
- Velocity: The speed at which data is generated and at which it can, and must, be analysed.
- Veracity: Data are error-prone, but because the volumes are so large, individual errors need not diminish the overall value of the data.
- Value: The ability to use large amounts of data to create added business value rather than just costs.
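The Veracity point can be made concrete with a small simulation. It is a hypothetical illustration, not HARTING data: it assumes a sensor whose true reading is 20.0, a 5 % share of faulty readings that report 1000.0, and a simple plausibility filter with an assumed valid range. The raw mean is badly distorted, yet because there are thousands of readings, discarding the faulty ones still leaves plenty of valid data for an accurate estimate.

```python
import random
import statistics

def simulate_readings(n, true_value=20.0, error_rate=0.05, seed=0):
    """Simulate n sensor readings, a fraction of which are faulty.

    Hypothetical error model (an assumption for illustration only):
    faulty readings report 1000.0, normal ones have small Gaussian noise.
    """
    rng = random.Random(seed)
    return [
        1000.0 if rng.random() < error_rate else true_value + rng.gauss(0, 0.5)
        for _ in range(n)
    ]

def clean_mean(readings, low=-40.0, high=150.0):
    """Drop values outside an assumed plausible range and average the rest."""
    valid = [r for r in readings if low <= r <= high]
    return statistics.mean(valid), len(valid)

readings = simulate_readings(10_000)
raw = statistics.mean(readings)           # distorted by the faulty readings
cleaned, kept = clean_mean(readings)      # close to the true value again
print(f"raw mean: {raw:.1f}, cleaned mean: {cleaned:.2f} from {kept} readings")
```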
Team made up of data scientists and engineers
The key to creating "value" from the data is often artificial intelligence, because AI enables complex, efficiency-enhancing automation solutions both in production and in the office. Getting there, however, requires an infrastructure that addresses the other "Vs". The HARTING Technology Group has therefore set up a team of data scientists and data engineers. This team has implemented an infrastructure based on the so-called Lambda architecture, which it also operates when carrying out AI projects.
MICA® as part of the Lambda architecture
Besides the typical core elements of a Lambda architecture, a "cold" path for cyclical computation and a "hot" path for real-time analysis, the infrastructure contains elements added specifically for HARTING's use case: an Industrial IoT approach in which IoT devices are installed in manufacturing and customers are also to be supplied with data from the infrastructure.
These additions include a system for connecting a variety of data sources from different IT systems, a solution for integrating IIoT devices, and an interface for securely providing information to customers and partners and billing them for it. Naturally, our MICA® product is part of the architecture. As an IIoT device, it helps us obtain the large and diverse amounts of data we need for AI training. It also makes Velocity manageable, because trained AI systems preprocess the data at the Edge, and Variety, thanks to standardised interfaces and data verification.
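The kind of edge preprocessing described above can be sketched in a few lines of Python. This is not MICA® software and uses no MICA® API; the record fields (`sensor_id`, `timestamp`, `value`), the one-second window and the helper names `verify` and `summarise` are hypothetical. For simplicity, the preprocessing step is a plain statistical aggregation rather than a trained AI model. The sketch shows the two ideas from the paragraph: records are checked against a fixed schema (Variety), and high-frequency readings are reduced to per-window summaries before they leave the device (Velocity).

```python
import math
from dataclasses import dataclass
from statistics import mean

# Hypothetical standardised record schema; field names and types are assumptions.
SCHEMA = {"sensor_id": str, "timestamp": float, "value": float}

def verify(record: dict) -> bool:
    """Data verification: accept only records with every expected field and type."""
    return all(isinstance(record.get(field), kind) for field, kind in SCHEMA.items())

@dataclass
class WindowSummary:
    sensor_id: str
    window_start: float
    count: int
    average: float
    minimum: float
    maximum: float

def summarise(records: list[dict], window_s: float = 1.0) -> list[WindowSummary]:
    """Edge preprocessing: reduce raw readings to one summary per sensor and time window."""
    buckets: dict[tuple[str, float], list[float]] = {}
    for r in records:
        if not verify(r):
            continue  # malformed records are dropped before they leave the device
        start = math.floor(r["timestamp"] / window_s) * window_s
        buckets.setdefault((r["sensor_id"], start), []).append(r["value"])
    return [
        WindowSummary(sid, start, len(vals), mean(vals), min(vals), max(vals))
        for (sid, start), vals in sorted(buckets.items())
    ]

# Usage: two seconds of hypothetical 1 kHz readings plus one malformed record.
raw = [{"sensor_id": "press-1", "timestamp": i / 1000, "value": float(i % 10)} for i in range(2000)]
raw.append({"sensor_id": "press-1", "timestamp": "bad"})
summaries = summarise(raw)
print(len(raw), "->", len(summaries))  # 2001 -> 2
```

Two thousand raw readings leave the device as two summary records, which is the kind of reduction that keeps the downstream data rate manageable.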
Conceptually, the Lambda architecture describes a way of structuring a Big Data architecture. Its name probably derives either from the resemblance of its diagram to a left-rotated "lambda" or from the lambda calculus, the basis of functional programming.
The architecture consists of two core components, the speed layer and the batch layer. Both are usually preceded by a data ingestion layer, which buffers the data and keeps it recoverable for a short time. So that the results of the processing layers can be displayed, they feed a serving layer, which prepares and holds the data. When data generated by a sensor reach the ingestion layer, they are passed on to both the speed layer and the batch layer. The speed layer usually processes the data immediately and in full, whereas the batch layer usually processes it with a delay, at specified intervals.
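The data flow just described can be illustrated with a minimal, single-process Python sketch. It is a toy model under simplifying assumptions, not HARTING's implementation: the record fields `machine` and `kwh`, all class names, and the choice of a per-machine energy total as the computed view are hypothetical. Every record passes through the ingestion layer to both processing layers; the speed layer updates its view at once, the batch layer only stores records until `run_batch()` recomputes its view over the full dataset, and the serving layer merges both views.

```python
from collections import defaultdict, deque

# All class names and record fields below are hypothetical, for illustration only.

class IngestionLayer:
    """Buffers incoming records for short-term recoverability and fans them out."""
    def __init__(self, retention: int = 1000):
        self.buffer = deque(maxlen=retention)  # the last N records can be replayed
        self.consumers = []

    def publish(self, record: dict) -> None:
        self.buffer.append(record)
        for consume in self.consumers:  # every record goes to speed AND batch layer
            consume(record)

class BatchLayer:
    """Stores all records; recomputes its view only when run() is called."""
    def __init__(self):
        self.master = []  # append-only master dataset
        self.view = {}

    def receive(self, record: dict) -> None:
        self.master.append(record)  # no processing yet: delayed until the next run

    def run(self) -> None:
        totals = defaultdict(float)
        for r in self.master:  # full recomputation over all data received so far
            totals[r["machine"]] += r["kwh"]
        self.view = dict(totals)

class SpeedLayer:
    """Updates an incremental view immediately for each record."""
    def __init__(self):
        self.view = defaultdict(float)

    def receive(self, record: dict) -> None:
        self.view[record["machine"]] += record["kwh"]

class ServingLayer:
    """Merges batch and speed views to answer queries."""
    def __init__(self, batch: BatchLayer, speed: SpeedLayer):
        self.batch, self.speed = batch, speed

    def query(self, machine: str) -> float:
        return self.batch.view.get(machine, 0.0) + self.speed.view.get(machine, 0.0)

class LambdaPipeline:
    """Wires the layers together."""
    def __init__(self):
        self.ingestion = IngestionLayer()
        self.batch = BatchLayer()
        self.speed = SpeedLayer()
        self.ingestion.consumers += [self.speed.receive, self.batch.receive]
        self.serving = ServingLayer(self.batch, self.speed)

    def run_batch(self) -> None:
        # Single-threaded simplification: the new batch view covers every record
        # so far, so the speed layer's interim results can be discarded.
        self.batch.run()
        self.speed.view.clear()

p = LambdaPipeline()
p.ingestion.publish({"machine": "press-1", "kwh": 1.5})
p.ingestion.publish({"machine": "press-1", "kwh": 2.0})
print(p.serving.query("press-1"))  # 3.5, served entirely from the speed layer
p.run_batch()
p.ingestion.publish({"machine": "press-1", "kwh": 0.5})
print(p.serving.query("press-1"))  # 4.0 = batch view 3.5 + speed view 0.5
```

Clearing the speed view after each batch run is only safe here because everything runs in one thread; in a distributed system the speed layer would discard only the results that the new batch view has already absorbed.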
The most important machine learning methods
The most important machine learning methods can be assigned to two classes: supervised machine learning and unsupervised machine learning. The main difference lies in the learning process. In supervised machine learning, output data sets are available at the start of the learning process. They already contain valid results, mostly based on calculations, which are used to train the algorithm so that it can then be applied to other data. In unsupervised machine learning, the expected output is not known at the start of the learning process. Because its results are open-ended, this approach is exploratory in nature: the algorithm learns by trying to find structure in the data, for example by clustering it or by identifying anomalies.
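The difference between the two classes can be shown with two deliberately simple algorithms in plain Python: a 1-nearest-neighbour classifier for the supervised case and k-means clustering for the unsupervised case. Both are illustrative choices, not methods named in this article, and the vibration figures are invented for the example. The classifier needs labelled training data; k-means receives the same kind of measurements without labels and finds the two groups on its own.

```python
import statistics

# --- Supervised learning ---------------------------------------------------
# Hypothetical training data: vibration amplitude (mm/s) with known, valid
# results ("ok" / "worn") supplied in advance.
labelled = [(0.8, "ok"), (1.1, "ok"), (1.4, "ok"), (4.2, "worn"), (4.7, "worn"), (5.0, "worn")]

def predict_1nn(x: float, training: list[tuple[float, str]]) -> str:
    """1-nearest-neighbour: label x like the closest training example."""
    return min(training, key=lambda pair: abs(pair[0] - x))[1]

# --- Unsupervised learning -------------------------------------------------
# Same kind of measurement, but without labels: the algorithm has to find
# structure (here: two clusters) on its own.
unlabelled = [0.9, 1.2, 1.0, 4.4, 4.9, 1.3, 4.6]

def kmeans_1d(values: list[float], k: int = 2, iterations: int = 10):
    """Plain k-means on a single feature (k >= 2); centroids start evenly spread."""
    lo, hi = min(values), max(values)
    centroids = [lo + i * (hi - lo) / (k - 1) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:  # assignment step: each value joins its nearest centroid
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # update step: move each centroid to the mean of its cluster
        centroids = [statistics.mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

print(predict_1nn(1.0, labelled), predict_1nn(4.5, labelled))  # ok worn
centroids, clusters = kmeans_1d(unlabelled)
print(clusters)  # [[0.9, 1.2, 1.0, 1.3], [4.4, 4.9, 4.6]]
```

Note that k-means only reports two groups; deciding that the second group means "worn" still requires domain knowledge, which is exactly what the labels supply in the supervised case.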