Integrated Machine Learning

Why Machine Learning?

Incorporating machine learning techniques into real-time digital twins takes their power and simplicity to the next level. Creating streaming analytics code that surfaces emerging issues hidden within a stream of telemetry can be challenging. In many cases, the algorithm itself may be unknown because the underlying processes which lead to device failures are not well understood. In these cases, a machine learning (ML) algorithm can be trained to recognize abnormal telemetry patterns by feeding it thousands of historic telemetry messages that have been classified as normal or abnormal. No manual analytics coding is required. After training and testing, the ML algorithm can then be put to work monitoring incoming telemetry and alerting when it observes suspected abnormal behavior.

Adding Machine Learning to Real-Time Digital Twins

To enable machine learning (ML) within real-time digital twins, ScaleOut Software has integrated ML.NET, Microsoft’s popular machine learning library, into its real-time digital twin architecture. Using the ScaleOut Model Development Tool, users can select, train, evaluate, deploy, and test ML algorithms within their real-time digital twin models. Once deployed, the selected ML algorithm runs independently for each data source, examining incoming telemetry within milliseconds after it arrives and logging abnormal events. Real-time digital twins can also be configured to generate alerts and send them to popular alerting providers, such as Splunk, Slack, and PagerDuty. In addition, business rules can optionally be used to further extend real-time analytics.

The following diagram illustrates the use of an ML algorithm to analyze engine and cargo parameters being monitored by a real-time digital twin tracking each truck in a fleet. When abnormal parameters are detected by the ML algorithm (as illustrated by the spike in the telemetry), the real-time digital twin records the incident and sends a message to the alerting provider:

Straightforward Development Workflow

Training an ML algorithm to recognize abnormal telemetry requires only a training set of historic data that have been classified as normal or abnormal. Using this training data, the ScaleOut Model Development Tool lets the user train and evaluate up to ten binary classification algorithms supplied by ML.NET using a technique called supervised learning. The user can then select the appropriate trained algorithm to deploy based on metrics for each algorithm generated during training and testing. (The algorithms are tested using a portion of the data supplied for training.)
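The train, evaluate, and select steps described above can be sketched in a few lines of Python. This is only an illustration of the general supervised-learning workflow, not the tool's implementation: the two candidate classifiers (nearest centroid and k-nearest neighbors), the synthetic readings, the 80/20 split, and the F1 metric are all assumptions chosen for brevity, and none of them are ML.NET algorithms.

```python
import math
import random

def make_samples(rng, n, abnormal):
    """Generate hypothetical (temperature, rpm, voltage) readings with a label."""
    t, r, v = (95.0, 1200.0, 215.0) if abnormal else (70.0, 1800.0, 230.0)
    return [((rng.gauss(t, 3), rng.gauss(r, 50), rng.gauss(v, 2)), abnormal)
            for _ in range(n)]

def fit_centroid(train):
    """Nearest-centroid classifier: one mean vector per class."""
    centroids = {}
    for label in (False, True):
        rows = [x for x, y in train if y == label]
        centroids[label] = tuple(sum(col) / len(rows) for col in zip(*rows))
    return lambda x: min(centroids, key=lambda lbl: math.dist(x, centroids[lbl]))

def fit_knn(train, k=3):
    """k-nearest-neighbors classifier: majority vote among the k closest samples."""
    def predict(x):
        nearest = sorted(train, key=lambda s: math.dist(x, s[0]))[:k]
        return sum(y for _, y in nearest) * 2 > k
    return predict

def f1_score(predict, test):
    """F1 for the 'abnormal' class on held-out samples."""
    tp = sum(1 for x, y in test if y and predict(x))
    fp = sum(1 for x, y in test if not y and predict(x))
    fn = sum(1 for x, y in test if y and not predict(x))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Labeled historic data: 100 normal and 100 abnormal samples (synthetic).
rng = random.Random(0)
data = make_samples(rng, 100, False) + make_samples(rng, 100, True)
rng.shuffle(data)

# Hold out a portion of the supplied data for testing.
split = int(len(data) * 0.8)
train, test = data[:split], data[split:]

# Train every candidate, score it on the held-out portion, and pick the best.
candidates = {"nearest_centroid": fit_centroid, "knn": fit_knn}
scores = {name: f1_score(fit(train), test) for name, fit in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

In practice the choice of metric matters: when abnormal samples are rare, accuracy alone can look high for a model that never reports an anomaly, which is why this sketch scores the abnormal class with F1.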

For example, consider an electric motor which periodically supplies three parameters (temperature, RPM, and voltage) to its real-time digital twin for monitoring by an ML algorithm that detects anomalies and generates alerts when they occur:

Training the real-time digital twin’s ML model follows this workflow:

Additional Support for Spike and Trend Detection

In addition to enabling multi-parameter, supervised learning for anomaly detection, the Model Development Tool provides support for detecting spikes in individual telemetry parameters. Spike detection uses an ML.NET algorithm (called an adaptive kernel density estimation algorithm) which detects rapid changes in telemetry for a single parameter.
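To make the idea of single-parameter spike detection concrete, here is a minimal Python sketch. It deliberately substitutes a much simpler technique, a rolling z-score over recent readings, for the adaptive kernel density estimation used by ML.NET; the class name, window size, and threshold are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev

class SpikeDetector:
    """Flags a reading that deviates sharply from a window of recent readings."""

    def __init__(self, window=20, threshold=4.0, min_samples=5):
        self.history = deque(maxlen=window)
        self.threshold = threshold
        self.min_samples = min_samples

    def update(self, value):
        """Return True if value is a spike relative to the recent window."""
        is_spike = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            # Guard against a perfectly flat window, where sigma is zero.
            is_spike = abs(value - mu) > self.threshold * max(sigma, 1e-9)
        # Spikes are kept out of the window so they do not mask later ones.
        if not is_spike:
            self.history.append(value)
        return is_spike

detector = SpikeDetector()
readings = [20.0, 20.1, 19.9, 20.2, 20.0, 20.1, 35.0, 20.0]
flags = [detector.update(r) for r in readings]
print(flags)  # only the 35.0 reading is flagged
```

Like the spike detection described above, this approach needs no labeled training data; it only compares each new value with the parameter's own recent history.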

It is also useful to detect unusual but subtle changes in a parameter’s telemetry over time. For example, if the temperature for an electric motor is expected to remain constant, it may be important to detect a slow rise in temperature that might otherwise go unobserved. To address this need, the ScaleOut Model Development Tool uses a ScaleOut-developed, linear regression algorithm to detect and report inflection points in the telemetry for selected parameters.
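The slow temperature rise in this example can be illustrated with an ordinary least-squares slope over a window of readings. This is a sketch of the general linear regression idea only, not ScaleOut's algorithm; the function names and the slope tolerance are assumptions.

```python
def slope(values):
    """Least-squares slope of values against their sample index (needs >= 2 values)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den

def has_trend(values, max_slope):
    """Report a trend when the fitted slope exceeds the allowed tolerance."""
    return abs(slope(values)) > max_slope

# A steady motor temperature with small noise, and one rising 0.05 deg. per sample.
steady = [70.0, 70.1, 69.9, 70.0, 70.1, 69.9, 70.0, 70.0]
rising = [70.0 + 0.05 * i for i in range(8)]

print(has_trend(steady, max_slope=0.02))  # False
print(has_trend(rising, max_slope=0.02))  # True
```

No single reading in the rising series would look like a spike, yet the fitted slope makes the drift visible, which is the case trend detection is meant to catch.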

These two techniques for tracking changes in a telemetry parameter are illustrated below:

Automatic Event Correlation

ScaleOut StreamServer automatically correlates incoming messages from each data source for delivery to its respective real-time digital twin. This simplifies design by eliminating the need for application code to extract messages from a combined event pipeline for analysis.

State Information for Each Data Source

Each real-time digital twin maintains state information about its corresponding data source — without having to access external databases or caches. For example, ML algorithms can maintain a time-stamped list of significant events to enable additional analysis.
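Conceptually, per-source correlation and per-source state look like the following Python sketch. The class names, message fields, and the 250-degree check are hypothetical and stand in for whatever a real model defines; the streaming service performs the routing step itself, so application code never writes a dispatcher like this one.

```python
from dataclasses import dataclass, field

@dataclass
class TwinInstance:
    """In-memory state for one data source (hypothetical fields)."""
    source_id: str
    last_temp: float = 0.0
    events: list = field(default_factory=list)  # (timestamp, description) pairs

    def process(self, message):
        self.last_temp = message["temp"]
        if message["temp"] > 250.0:
            self.events.append((message["timestamp"], "high temperature"))

class Dispatcher:
    """Routes each message to the twin instance for its data source."""

    def __init__(self):
        self.twins = {}

    def deliver(self, message):
        source_id = message["source_id"]
        # Create the instance the first time a data source is seen.
        if source_id not in self.twins:
            self.twins[source_id] = TwinInstance(source_id)
        self.twins[source_id].process(message)

# An interleaved stream from two sources.
stream = [
    {"source_id": "thermo-1", "timestamp": 1, "temp": 110.0},
    {"source_id": "thermo-2", "timestamp": 1, "temp": 255.0},
    {"source_id": "thermo-1", "timestamp": 2, "temp": 260.0},
    {"source_id": "thermo-2", "timestamp": 2, "temp": 120.0},
]
dispatcher = Dispatcher()
for message in stream:
    dispatcher.deliver(message)

for twin in dispatcher.twins.values():
    print(twin.source_id, twin.last_temp, twin.events)
```

Because each instance holds its own event list, analysis code for one data source never has to filter out messages or history belonging to another.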

Easy-to-Use Rules Engine

Rules-Based Real-Time Digital Twins

In many applications, a rules-based formulation of analytics logic can simplify application code and open up development of real-time digital twins to analysts who lack programming experience and want to avoid coding in Java, C#, or JavaScript. Rules-based algorithms have been widely adopted over the years and proven to provide a straightforward technique for expressing business logic in numerous applications and expert systems. ScaleOut Software has developed a simple, easy-to-learn rules language targeted at business analysts and engineers that enables fast development of real-time digital twin algorithms for streaming analytics.

In their simplest form, rules are expressed as “IF condition THEN action” statements which are executed sequentially by a “rules engine.” Other rules which just perform actions, such as calculations or message sending, can be expressed with “DO action” statements. These rules replace programming code with simple, highly readable statements that can be used in many applications where more complex logic is not required. They also can be used to extend the automatic analysis performed by machine learning (ML), for example, to calculate instance parameter values subsequently analyzed by ML.
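The sequential evaluation model can be pictured with a small Python sketch. It is an illustration of the "IF condition THEN action" and "DO action" concepts only, not ScaleOut's rules engine; the helper names and the state fields (samples, alerts, limit, reading) are invented for the example.

```python
def do(action):
    """DO action: a rule whose condition always holds."""
    return (lambda state: True, action)

def when(condition, action):
    """IF condition THEN action."""
    return (condition, action)

def run_rules(rules, state):
    """Evaluate the rules in order against one twin's state."""
    for condition, action in rules:
        if condition(state):
            action(state)
    return state

rules = [
    # DO samples = samples + 1
    do(lambda s: s.update(samples=s["samples"] + 1)),
    # IF reading > limit THEN alerts = alerts + 1
    when(lambda s: s["reading"] > s["limit"],
         lambda s: s.update(alerts=s["alerts"] + 1)),
]

state = {"samples": 0, "alerts": 0, "limit": 50.0, "reading": 0.0}
for reading in (42.0, 57.5):
    state["reading"] = reading  # one incoming message
    run_rules(rules, state)

print(state["samples"], state["alerts"])  # 2 1
```

Because rules run in order, a later rule sees the values written by earlier ones, which is what allows a calculation rule to feed a subsequent check.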

Users can create lists of time-stamped instance and telemetry values to track significant events in the state of a data source (such as a dangerously high value). In addition, the rules engine includes a comprehensive set of built-in functions that enable rules to perform arithmetic, string, and data aggregation operations on lists.

Example of a Rules-Based Real-Time Digital Twin

Here’s a simple example that illustrates how easy it is to construct a real-time digital twin using ScaleOut’s rules engine. Consider an IoT application in which a real-time digital twin is monitoring messages sent from a thermometer and looking for a situation in which the temperature either spikes beyond an allowed limit of 250 deg. or exceeds an allowed average value of 112 deg. This logic could be expressed with the following rules, which are executed for each incoming message. Note that the temperature reading within the message is called Incoming.Temp here, and the other variables maintain state information within the real-time digital twin’s instance for this thermometer. For example, the number of temperature spikes is maintained in the variable NumEvents.

DO CurrentTemp = Incoming.Temp
IF CurrentTemp > MaxTemp THEN MaxTemp = CurrentTemp
DO AverageTemp = AverageTemp * NumSamples + CurrentTemp
DO NumSamples = NumSamples + 1
DO AverageTemp = AverageTemp / NumSamples
IF CurrentTemp > 250.0 THEN NumEvents = NumEvents + 1
IF CurrentTemp > 250.0 THEN LogMessage.Message = "Max temp exceeded" AND LogMessage
IF AverageTemp > 112.0 THEN LogMessage.Message = "Average temp exceeded" AND LogMessage
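The three rules that maintain AverageTemp implement an incremental mean: multiply the old average by the old sample count to recover the running total, add the new reading, then divide by the new count. The Python sketch below mirrors that arithmetic (the function name and sample readings are illustrative) and checks it against a mean computed over all readings at once.

```python
from statistics import mean

def update_average(average, num_samples, current):
    """Mirror of the AverageTemp / NumSamples rules for one incoming reading."""
    total = average * num_samples + current  # AverageTemp * NumSamples + CurrentTemp
    num_samples += 1                         # NumSamples + 1
    return total / num_samples, num_samples  # AverageTemp / NumSamples

readings = [100.0, 110.0, 120.0, 130.0]
average, count = 0.0, 0
for reading in readings:
    average, count = update_average(average, count, reading)

print(average, mean(readings))  # both are 115.0
```

The benefit of this formulation is that the twin stores only two numbers per thermometer, however many readings have arrived.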

The following diagram shows how messages are delivered to a thermometer’s real-time digital twin instance and are analyzed by its rules engine. A rules engine runs independently for every real-time digital twin and evaluates the rules when a message is received. This allows the rules to analyze incoming telemetry within a few milliseconds and immediately update state information about the data source.

In addition to logging a message, the rules engine can be used to send alerts to popular alerting providers, such as Splunk, Slack, and PagerDuty. To learn about the full capabilities of the ScaleOut Rules Engine, please see the Rules Engine Models section of the ScaleOut Digital Twin Streaming Service User Guide™.

Intuitive GUI-Based Development

GUI-Based Development for Real-Time Digital Twins

The ScaleOut Model Development Tool simplifies the development of real-time digital twins using machine learning, business rules, or both together. This Windows-based graphical development environment enables application developers to create and test real-time digital twin models prior to deploying them on the streaming service for production use. Once deployed, the streaming service automatically creates a real-time digital twin instance for each unique data source as messages are received and new data sources are identified.

Using the tool’s graphical user interface (GUI), developers create a new real-time digital twin model by first specifying:

Instance properties to be tracked, such as AverageTemp, MaxTemp, and NumEvents in the above example
Message properties that will be used, such as Incoming.Temp for incoming messages
(Optional) Rules to be executed (like the ones listed above)

Building a Machine Learning Model

When creating a machine learning (ML) model, the user only needs to specify options for selected instance properties and/or message properties; no coding is required. For multi-variable anomaly detection with supervised learning, the user first selects a set of properties as a group (called a data collection) to be analyzed for anomalies using ML. Next, the ScaleOut Model Development Tool guides the user through ML training and testing for a set of candidate algorithms implemented by Microsoft’s ML.NET library. Finally, the user selects an ML algorithm for deployment based on metrics calculated during testing.

In addition, users can select individual parameters for spike and trend detection using GUI options. No training is required for spike and trend detection.

Here’s a screenshot showing a data collection of three incoming telemetry parameters (temperature, RPM, and voltage for an electric motor) that have been selected for ML-based anomaly detection within a real-time digital twin. (Note that this model has additional instance parameters, like MaxRPM, which are calculated by the rules engine.)

Once the Configure button is selected, the GUI’s wizard takes the user through a series of steps to train the ML model and select the ML algorithm to deploy. Here’s the third step in the process in which the user selects an ML algorithm to deploy:

After that, the real-time digital twin model is ready to be deployed. The entire process takes just a few minutes to complete. Now, real-time digital twins are ready to simultaneously analyze telemetry from thousands of data sources using powerful ML techniques.

Building and Testing a Rules-Based Model

The ScaleOut Model Development Tool makes it easy for users to create rules that execute when messages are received by a real-time digital twin. These rules also can run alongside an ML model to provide additional processing of incoming messages and state tracking for data sources. The tool validates the rules when they are created to make sure that they will execute.

Once a set of rules has been created, the user can click on the Test model tab in the ScaleOut Model Development Tool to test the model by sending it messages and observing changes in the values of the properties. The rules can be run as a group or one at a time for each message to verify that they are creating the desired state changes and outgoing messages. The development tool can simulate sending messages back to the data source, to another real-time digital twin instance, to the message log in the service’s UI, or to an alerting provider.

Here is a screenshot of the development tool during a test of a rules-based model for a thermometer which shows how messages can be sent to a simulated real-time digital twin for testing.

Once the real-time digital twin model has been deployed, the streaming service can aggregate instance properties across all real-time digital twin instances and visualize the results. The rules engine running in each real-time digital twin instance updates property values as it processes incoming messages, and the results are immediately aggregated by the streaming service. For example, if the thermometers supplied their locations, the average temperature could be plotted by region. This allows managers to immediately spot patterns in the data across all data sources and direct responses where they are most urgently needed.
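The by-region example amounts to a group-and-average over instance properties. A minimal Python sketch follows, assuming each instance exposes hypothetical region and average_temp properties; the streaming service performs this aggregation itself, so the code only illustrates the computation.

```python
from collections import defaultdict

def average_by_region(instances):
    """Average the average_temp property across instances, grouped by region."""
    totals = defaultdict(lambda: [0.0, 0])  # region -> [sum, count]
    for inst in instances:
        entry = totals[inst["region"]]
        entry[0] += inst["average_temp"]
        entry[1] += 1
    return {region: total / count for region, (total, count) in totals.items()}

# Snapshot of instance properties for three thermometers (made-up values).
instances = [
    {"source_id": "thermo-1", "region": "north", "average_temp": 100.0},
    {"source_id": "thermo-2", "region": "north", "average_temp": 110.0},
    {"source_id": "thermo-3", "region": "south", "average_temp": 120.0},
]

print(average_by_region(instances))  # {'north': 105.0, 'south': 120.0}
```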

The ScaleOut Model Development Tool combines ease of use and power to simplify the development of real-time digital twins. It lets users harness the power of machine learning with ML.NET without writing code. The rules engine lets analysts and engineers who lack programming experience express application logic quickly and easily. Together, these capabilities enable real-time digital twins to dramatically enhance situational awareness for live, mission-critical systems.

ScaleOut Model Development Tool™

Simplify Development and Harness the Power of Machine Learning
