Powerful Stream-Processing Model

What is the Digital Twin Model?

Traditional stream-processing and complex event processing systems have focused on extracting interesting patterns from incoming data with stateless applications. While these applications maintain state information about the data stream itself, they don’t generally make use of information about the dynamic state of data sources. For example, if an IoT application is attempting to detect whether data from a temperature sensor is predicting the failure of a medical freezer, it typically just looks at patterns in the temperature readings, such as sudden spikes or a continuously upward trend, without regard to the freezer’s usage or service history.

The following diagram depicts a typical stream processing pipeline processing events from many data sources:

[Diagram: a typical stream-processing pipeline processing events from many data sources]

Imagine if the stream-processing application for medical freezers could instantly access relevant information about each freezer’s specific model, service history, environmental conditions, and usage patterns. This would give its predictive analytics algorithm much richer context in which to analyze incoming temperature readings, leading to more informed predictions in real time about possible impending failures with fewer false alarms.

The digital twin model offers an answer to this challenge. While this term was coined for use in product life cycle management, it was recently popularized for IoT because it offers key insights into how state data can be organized within stream-processing applications for maximum effectiveness. In particular, it suggests that applications implement a stateful model of the physical data sources that generate event streams, and that the application maintain separate state information for each data source.
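The idea of keeping separate state for each data source can be sketched in a few lines of Java. This is only an illustration of the medical freezer example: the class names (FreezerTwin, FreezerMonitor), the three-reading window, and the temperature limits are all assumptions made for the sketch and are not taken from any product.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-freezer state: context plus a short window of recent readings.
class FreezerTwin {
    final String model;
    final int daysSinceService;
    private final Deque<Double> recent = new ArrayDeque<>();

    FreezerTwin(String model, int daysSinceService) {
        this.model = model;
        this.daysSinceService = daysSinceService;
    }

    // Returns true when the last three readings rise steadily and the latest one
    // exceeds a limit that is stricter for freezers that are overdue for service.
    boolean onReading(double celsius) {
        recent.addLast(celsius);
        if (recent.size() > 3) {
            recent.removeFirst();
        }
        if (recent.size() < 3) {
            return false;
        }
        Double[] r = recent.toArray(new Double[0]);
        boolean rising = r[0] < r[1] && r[1] < r[2];
        double limit = daysSinceService > 180 ? -25.0 : -20.0; // assumed thresholds
        return rising && r[2] > limit;
    }
}

// Keeps one twin per data source, keyed by the source's identifier.
class FreezerMonitor {
    private final Map<String, FreezerTwin> twins = new HashMap<>();

    void register(String id, FreezerTwin twin) {
        twins.put(id, twin);
    }

    // Analyzes a reading in the context of the freezer that produced it.
    boolean onReading(String id, double celsius) {
        FreezerTwin twin = twins.get(id);
        return twin != null && twin.onReading(celsius);
    }
}
```

With these assumed limits, the same rising sequence of readings (-30, -27, -24) raises a warning for a freezer last serviced 400 days ago but not for one serviced 30 days ago, which is the kind of context-dependent result that a purely stream-oriented analysis cannot produce.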

For example, using the digital twin model, a rental car company can track and analyze telemetry from each car in its fleet using digital twins that hold detailed knowledge about each car’s rental contract, the driver’s demographics and driving record, and maintenance issues. With this information it could alert managers when a driver repeatedly exceeds the speed limit according to criteria specific to the driver’s age and driving history or otherwise deviates from the rental contract. All of this leads to deeper and more timely insights on telemetry received from the vehicles:

[Diagram: digital twins tracking telemetry from each car in a rental fleet]

The digital twin model provides an intuitive approach to organizing state data, and, by shifting the focus of analysis from the event stream to the data sources, it potentially enables much deeper introspection than previously possible. With the digital twin model, an application can conveniently track all relevant information about the evolving state of physical data sources and then analyze incoming events in this rich context to provide high quality insights, alerting, and feedback.

Digital twin models add value in almost every imaginable stream-processing application. They enable real-time streaming analytics that previously could only be performed in offline, batch processing. Here are a few examples:

They help IoT applications do a better job of predictive analytics when processing event messages by tracking the parameters of each device, when maintenance was last performed, known anomalies, and much more.
They assist healthcare applications in interpreting real-time telemetry, such as blood-pressure and heart-rate readings, in the context of each patient’s medical history, medications, and recent incidents, so that more effective alerts can be generated when care is needed.
They enable ecommerce applications to interpret website clickstreams with the knowledge of each shopper’s demographics, brand preferences, and recent purchases to make more targeted product recommendations.

In summary, the digital twin model provides a powerful organizational tool that refocuses stream-processing on the state of data sources instead of just the data within event streams. This additional context magnifies the developer’s ability to implement deep introspection and represents a new way of thinking about stateful stream-processing and real-time streaming analytics.

Digital Twins Are Just Objects

Beyond providing a powerful semantic model for stateful stream-processing, digital twins also offer advantages for software engineering because they can take advantage of well understood object-oriented programming techniques. A digital twin can be implemented as a data class which encapsulates both state data (including a time-ordered event collection) and methods for updating and analyzing that data. Analytics methods can range from simple sequential code to machine learning algorithms or rules engines. These methods also can reach out to databases to access and update historical data sets.

As illustrated in the following diagram, a digital twin can receive event messages from data sources (or other digital twins). It also can receive command messages from other digital twins or applications. In turn, it can generate alert messages to applications and feedback messages (including commands) to data sources. Having all of this contextual data immediately available to assist in analyzing event messages enables better, faster decision-making than previously possible:

[Diagram: a digital twin receiving event and command messages and generating alerts and feedback]
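The message flows described above can be sketched as a plain Java class. Everything here is hypothetical and chosen only for illustration: the PumpTwin and Outgoing names, the pressure limit, and the use of strings for message kinds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical outgoing message: an alert for applications or feedback for the data source.
class Outgoing {
    final String kind; // "ALERT" or "FEEDBACK"
    final String text;

    Outgoing(String kind, String text) {
        this.kind = kind;
        this.text = text;
    }
}

// Hypothetical digital twin for a pump. It encapsulates state data (including a
// time-ordered event collection) together with the methods that update and analyze it.
class PumpTwin {
    private final List<Double> pressures = new ArrayList<>(); // time-ordered events
    private double maxPressure = 8.0;                          // assumed default limit

    // Command message from an application or another digital twin.
    void onCommand(double newMaxPressure) {
        maxPressure = newMaxPressure;
    }

    // Event message from the data source; returns any messages to send out.
    List<Outgoing> onEvent(double pressure) {
        pressures.add(pressure);
        List<Outgoing> out = new ArrayList<>();
        if (pressure > maxPressure) {
            out.add(new Outgoing("ALERT", "pressure " + pressure + " exceeds " + maxPressure));
            out.add(new Outgoing("FEEDBACK", "reduce-speed"));
        }
        return out;
    }

    int eventCount() {
        return pressures.size();
    }
}
```

In this sketch a command changes the twin's state, and that state in turn changes how later events are interpreted: a reading of 7.0 produces no output under the default limit but produces an alert and a feedback message after a command lowers the limit to 6.0.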

The granularity of a digital twin can encompass a model of a single sensor or that of a subsystem comprising multiple sensors. The application developer makes choices about which data (and event streams) are logically related and need to be encapsulated in a single entity for analysis to meet the application’s goals.

Although the in-memory state of a digital twin consists of only the event and state data needed for real-time processing, an application optionally can reference historical data from external database servers to broaden its context if needed, as shown below.

[Diagram: a digital twin referencing historical data in an external database]

Building a Hierarchy of Digital Twins

Digital twin models can simplify the implementation of stream-processing applications for complex systems by organizing them into a hierarchy at multiple levels of abstraction, from device handling to strategic analysis and control. By partitioning an application into a hierarchy of digital twins, code can be modularized and thereby simplified with a clean separation of concerns and well-defined interfaces for testing.

For example, consider an application that analyzes telemetry from the components of a wind turbine. This application receives telemetry for each component and combines this with relevant contextual data, such as the component’s make, model, and service history, to enhance its ability to predict impending failures. As illustrated below for a hypothetical wind turbine, the stream-processing system correlates telemetry from three components (blades, generator, and control panel) and delivers it to associated digital twin objects, where event handlers analyze the telemetry and generate feedback and alerts:

[Diagram: digital twins for a wind turbine’s blades, generator, and control panel]

Taking advantage of the hierarchical organization shown above, digital twins for the blades and generator could feed telemetry to a higher-level digital twin model called the Blade System that manages the rotating components within the tower and their common concerns, such as avoiding over-speeds, while not dealing with the detailed issues of directly managing these two components. Likewise, the digital twins for the Blade System and the control panel feed telemetry to a yet higher-level digital twin model which coordinates overall operations and generates alerts as necessary.
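A two-level slice of such a hierarchy might look like the following Java sketch. The class names, the single rotational-speed measurement, and the over-speed limit are assumptions made for illustration; a real turbine model would track far more than this.

```java
import java.util.ArrayList;
import java.util.List;

// Higher-level twin: sees only summarized reports from its children and
// handles their common concern, which here is avoiding over-speeds.
class BladeSystemTwin {
    static final double MAX_RPM = 20.0; // assumed limit for illustration
    final List<String> alerts = new ArrayList<>();

    void onChildReport(String child, double rpm) {
        if (rpm > MAX_RPM) {
            alerts.add("over-speed reported by " + child + ": " + rpm);
        }
    }
}

// Lower-level twin: handles device detail, then forwards a summary upward.
class ComponentTwin {
    private final String name;
    private final BladeSystemTwin parent;
    private double lastRpm;

    ComponentTwin(String name, BladeSystemTwin parent) {
        this.name = name;
        this.parent = parent;
    }

    void onTelemetry(double rpm) {
        lastRpm = rpm;                   // component-specific state update
        parent.onChildReport(name, rpm); // feed the next level of the hierarchy
    }

    double lastRpm() {
        return lastRpm;
    }
}
```

Each level only knows about the interface of the level above it, so the component twins can be tested on their own and the Blade System twin can be tested with hand-written reports.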

Building and Running Digital Twin Models

The ScaleOut Digital Twin Builder software toolkit enables developers to easily build digital twin models of data sources and deploy them to ScaleOut StreamServer for execution. The toolkit’s goal is to dramatically simplify the construction of these models by hiding the details of their implementation, deployment, connection to data sources, and management within ScaleOut StreamServer’s in-memory data grid (IMDG).

The toolkit includes class definitions in Java and C# for defining digital twin models in a form that can be deployed to ScaleOut StreamServer. It also includes APIs for deploying them on ScaleOut StreamServer so that instances of each model (one per data source) can receive and analyze incoming messages.

As messages arrive from data sources, ScaleOut StreamServer automatically creates an instance of a digital twin model within its IMDG as needed for each physical data source. It then correlates incoming messages from each data source for delivery to the associated instance of a digital twin model, as depicted in the following diagram for a fleet of rental cars. In many applications, the IMDG can host thousands (or more) digital twins to handle the workload from its data sources.

[Diagram: messages from a fleet of rental cars delivered to their digital twin instances]
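Conceptually, creating instances on demand and correlating messages by data source resembles a keyed lookup that creates an entry on first contact. The sketch below shows that idea with a standard Java map. It is not how ScaleOut StreamServer is implemented, and the CarTwin and TwinDispatcher names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical digital twin instance for one rental car.
class CarTwin {
    final String carId;
    private int messagesSeen;

    CarTwin(String carId) {
        this.carId = carId;
    }

    // Handles one message; synchronized so concurrent deliveries update state safely.
    synchronized void handle(String payload) {
        messagesSeen++;
    }

    synchronized int messagesSeen() {
        return messagesSeen;
    }
}

// Delivers each message to the twin for its data source.
class TwinDispatcher {
    private final Map<String, CarTwin> instances = new ConcurrentHashMap<>();

    // Looks up the twin by source identifier, creating it the first time the source is seen.
    CarTwin dispatch(String sourceId, String payload) {
        CarTwin twin = instances.computeIfAbsent(sourceId, CarTwin::new);
        twin.handle(payload);
        return twin;
    }

    int instanceCount() {
        return instances.size();
    }
}
```

The application code in CarTwin never has to pick its own messages out of the stream; it only sees messages that were already matched to its car.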

Using the ScaleOut Digital Twin Builder API Libraries, an application can create connections to Azure IoT Hub, Kafka, and REST as illustrated in the following diagram. These connections deliver event messages from data sources to digital twin models and return alerts and commands back to data sources from these models.

[Diagram: connections to Azure IoT Hub, Kafka, and REST]

As part of event-processing, digital twins can create alerts for human attention and/or feedback directed at the corresponding data source. In addition, the collection of digital twin objects stored in the IMDG can be queried or analyzed using ScaleOut StreamServer’s data-parallel APIs to extract important aggregate patterns and trends. For example, in the rental car application, data-parallel analytics could allow a manager to compute the maximum excessive speeding for all cars in a specified region. These data flows are illustrated in the following diagram:

[Diagram: alerts, feedback, and data-parallel analytics over digital twins in the IMDG]
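The rental car query can be approximated on a single machine with a Java parallel stream, which stands in here for the grid's data-parallel APIs. The CarState fields and the FleetAnalytics helper are hypothetical names used only for this sketch.

```java
import java.util.List;
import java.util.OptionalDouble;

// Hypothetical snapshot of one car's digital twin state.
class CarState {
    final String region;
    final double maxKphOverLimit; // worst observed speed above the posted limit

    CarState(String region, double maxKphOverLimit) {
        this.region = region;
        this.maxKphOverLimit = maxKphOverLimit;
    }
}

class FleetAnalytics {
    // Filters the twins by region in parallel and reduces to the maximum excess speed.
    // Returns an empty result when no car is in the requested region.
    static OptionalDouble maxExcessSpeed(List<CarState> cars, String region) {
        return cars.parallelStream()
                   .filter(car -> car.region.equals(region))
                   .mapToDouble(car -> car.maxKphOverLimit)
                   .max();
    }
}
```

The filter-then-reduce shape is the important part: each partition of the data can be scanned independently and the partial maxima merged, which is what allows this kind of query to scale across the servers of a grid.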

What makes ScaleOut StreamServer’s IMDG an excellent fit for hosting digital twin models is its ability to transparently host both state information and application code for thousands of digital twins within a fast, highly scalable, in-memory computing platform and then automatically direct incoming events to their respective digital twins within the grid for processing. These two key capabilities enable a breakthrough new approach to stateful stream-processing.

Automatic Event Correlation

ScaleOut StreamServer automatically correlates incoming messages from each data source for delivery to its respective digital twin. This simplifies application design by eliminating the need to pick out relevant messages from the event pipeline for analysis.

Immediate Access to State Information

By providing each digital twin’s message processing code immediate access to state information about the data source, applications have the context they need for deep introspection in real time — without having to access external databases or caches.

Fast, Easy Development

Object-Oriented Design Keeps It Simple

To keep application development fast and easy, the ScaleOut Digital Twin Builder enables digital twin models to be constructed using standard, object-oriented techniques in Java and C#. The use of popular programming languages and object-oriented techniques makes it easy for most developers to build digital twin models. This approach simplifies design, reduces overall development time, and enhances maintainability.

Because a digital twin encapsulates state information and associated analysis code, it naturally can be represented as a user-defined data type (often called a class) within an object-oriented language, such as Java or C#. For example, consider the digital twin for a basic controller with class properties (status and event collection) describing the controller’s status and methods for analyzing events and performing device commands. The use of an object class to represent the controller conveniently encapsulates the data and code as a single unit and allows an application to create many instances of this type to manage different data sources. This class can be depicted graphically as follows:

[Diagram: the basic controller class with its properties and methods]

Here’s how this basic controller class could be written in Java:

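import java.util.List;

// Event and DeviceStatus are application-defined types.
public class BasicController {
    // Time-ordered collection of events received from the device
    private List<Event> eventCollection;
    // Current status of the controller
    private DeviceStatus status;

    public void start() {…}
    public void stop() {…}
    public void handleEvent() {…}
}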

An application also can make use of this class definition to construct various special purpose digital twins as subclasses, taking advantage of the object-oriented technique called inheritance, which maximizes code reuse and saves development time. For example, it can define the digital twin for a hot water valve as a subclass of a basic controller that adds new properties, such as temperature and flow rate, with associated methods for managing them. This subclass inherits all of the properties of a basic controller while adding new capabilities to manage specialized controller types.
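The hot water valve example could be sketched as follows. To keep the example compilable on its own, it includes a simplified basic controller with assumed method bodies, minimal Event and DeviceStatus types, and a handleEvent method that takes the event as a parameter. The temperature limit and the shut-off behavior are likewise assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

enum DeviceStatus { STOPPED, RUNNING }

class Event {
    final String description;

    Event(String description) {
        this.description = description;
    }
}

// Simplified base class; the method bodies are assumptions for this sketch.
class BasicController {
    protected final List<Event> eventCollection = new ArrayList<>();
    protected DeviceStatus status = DeviceStatus.STOPPED;

    public void start() { status = DeviceStatus.RUNNING; }
    public void stop() { status = DeviceStatus.STOPPED; }
    public void handleEvent(Event event) { eventCollection.add(event); }
    public DeviceStatus getStatus() { return status; }
    public int getEventCount() { return eventCollection.size(); }
}

// Subclass that inherits status and event handling and adds valve-specific state.
class HotWaterValve extends BasicController {
    static final double MAX_TEMPERATURE = 60.0; // assumed safety limit in degrees Celsius
    private double temperature;
    private double flowRate; // liters per minute

    // Records the new temperature and shuts the valve if it exceeds the limit.
    public void setTemperature(double temperature) {
        this.temperature = temperature;
        if (temperature > MAX_TEMPERATURE) {
            handleEvent(new Event("over-temperature: " + temperature));
            stop();
        }
    }

    public void setFlowRate(double flowRate) { this.flowRate = flowRate; }
    public double getTemperature() { return temperature; }
    public double getFlowRate() { return flowRate; }
}
```

The subclass reuses start, stop, and handleEvent unchanged, so the only new code is what is specific to a hot water valve.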

Java and C# API Libraries

The ScaleOut Digital Twin Builder provides a complete set of API libraries for building stateful stream-processing applications using the digital twin model. Their goal is to enable developers to focus on building and deploying digital twins without the need to handle low level details regarding their in-memory storage and message processing within ScaleOut StateServer’s IMDG.

API libraries for implementing digital twin models include the following components:

a base class definition for a digital twin model’s state object which describes the state information to be maintained by each instance of the model
a base class definition for a message processor which describes the application code that processes incoming messages using the state object
a processing context class that supplies context information to a message processor and includes utility methods, for example, for sending a message back to a data source
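How these three components fit together can be sketched with stand-in classes. The names below (TwinState, TwinContext, TwinMessageProcessor, replyToSource) are hypothetical and are deliberately not the toolkit's own class or method names; consult the toolkit documentation for the actual API. The thermostat model and its threshold are also invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a state object base class: the data kept for each twin instance.
abstract class TwinState {
    String id;
}

// Stand-in for a processing context with a utility method for replying to the data source.
class TwinContext {
    final List<String> sentToSource = new ArrayList<>();

    void replyToSource(String message) {
        sentToSource.add(message);
    }
}

// Stand-in for a message processor base class: application code that handles
// an incoming message using the twin's state object.
abstract class TwinMessageProcessor<S extends TwinState, M> {
    abstract void processMessage(TwinContext context, S state, M message);
}

// Application-defined state for a hypothetical thermostat twin.
class ThermostatState extends TwinState {
    double lastReading;
    int consecutiveHighReadings;
}

// Application-defined processor: replies to the device after two high readings in a row.
class ThermostatProcessor extends TwinMessageProcessor<ThermostatState, Double> {
    static final double HIGH = 30.0; // assumed threshold in degrees Celsius

    @Override
    void processMessage(TwinContext context, ThermostatState state, Double reading) {
        state.lastReading = reading;
        if (reading > HIGH) {
            state.consecutiveHighReadings++;
            if (state.consecutiveHighReadings >= 2) {
                context.replyToSource("cool-down");
            }
        } else {
            state.consecutiveHighReadings = 0;
        }
    }
}
```

The application supplies only the two subclasses. In this division of labor, everything about storing the state object and invoking the processor when a message arrives belongs to the hosting platform.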

Both Java and C# APIs are provided to deploy these digital twin models to ScaleOut StreamServer. In addition, to enable connections to external data sources, the ScaleOut Digital Twin Builder includes:

API libraries for connecting ScaleOut StreamServer to Microsoft Azure IoT Hub and Kafka to exchange messages with data sources
API libraries for sending messages to digital twins from Java and C# client applications

The ScaleOut Digital Twin Builder’s Java APIs can be found on GitHub, and the C# APIs can be downloaded as NuGet packages. For more details, please consult the ScaleOut Digital Twin Builder User Guide and other documentation.

Fast Development Cycle

The ScaleOut Digital Twin Builder’s APIs make it easy for developers to build stateful stream-processing applications for real-time streaming analytics. The first step is to build the required digital twin models and then deploy them to ScaleOut StreamServer. Once connections have been established to data sources, ScaleOut StreamServer automatically creates instances of these models as needed to handle incoming messages. Here’s a depiction of the workflow:

[Diagram: workflow for building and deploying digital twin models]

Instances of digital twin models (or simply, “digital twins”) receive event messages from their respective data sources and can send command or alert messages back to these data sources or to other digital twins in a hierarchy. Each digital twin model implements a message processor class that defines the message processing method used to handle incoming messages, and it also implements a state object class that defines the information to be maintained for the digital twin and passed to the message processor’s method. When a message arrives, ScaleOut StreamServer delivers it to the correct digital twin by calling the message processor’s method.

That’s all there is to it. This simple but powerful object-oriented model provides a clean separation of application-specific code and the orchestration mechanisms required for stream-processing. All of the details of correlating incoming messages, accessing and updating state information, communicating with data sources, and deploying digital twin models (not to mention scaling performance and ensuring high availability) are taken care of by the ScaleOut Digital Twin Builder’s API libraries and ScaleOut StreamServer.

Fast, Scalable Performance

Transparent Scaling for Fast Message Handling

ScaleOut StreamServer stores instances of digital twin models as memory-based objects within its in-memory data grid (IMDG), which automatically distributes them across a cluster of servers. This provides transparent scaling of both memory capacity and throughput, keeping message-handling latency low even as the number of instances grows to many thousands of digital twins:

[Diagram: digital twin instances distributed across a cluster of servers]

ScaleOut StreamServer’s IMDG implements a software-based, key-value store of serialized objects that spans a cluster of commodity servers (or cloud instances). This object storage is used to host state objects for digital twins. The grid’s architecture provides cost-effective scalability and high availability, while hiding the complexity of distributed in-memory storage from applications.

Avoiding Network Bottlenecks

ScaleOut StreamServer automatically correlates incoming event messages for each digital twin based on the corresponding data source’s unique identifier and then delivers these messages to the server on which the digital twin’s state object is stored. It then runs the application’s message processing code within the IMDG on this server. This avoids network overhead and bottlenecks which could increase event-handling latency and restrict throughput scaling. It also delivers vastly superior performance in comparison to stream-processing platforms that rely on external databases or caches to store state information for data sources, as illustrated here:

[Diagram: a stream-processing platform that stores state in an external database or cache]

In contrast, ScaleOut StreamServer takes full advantage of the cluster’s computing power to run application code within the IMDG — where the data lives — to maximize performance and avoid network bottlenecks. Message processing code for each digital twin runs on the same IMDG server where the twin’s state object is located. This avoids the need for a network transfer to retrieve and update the state object, thereby avoiding a network bottleneck. It also minimizes the latency to access the state object, which keeps message-processing response time as low as possible.
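The principle of sending each message to the server that already holds the twin's state can be illustrated with simple hash partitioning. This is an assumption made for the sketch, not a description of the product's actual partitioning scheme, and a production grid would also have to handle rebalancing and failover. The Grid and GridServer classes are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One server in the cluster; it holds the state for the twins it owns.
class GridServer {
    final Map<String, Integer> messageCounts = new HashMap<>();

    // Processes a message against local state, with no network hop to fetch it.
    void process(String sourceId) {
        messageCounts.merge(sourceId, 1, Integer::sum);
    }
}

class Grid {
    private final List<GridServer> servers = new ArrayList<>();

    Grid(int serverCount) {
        if (serverCount <= 0) {
            throw new IllegalArgumentException("serverCount must be positive");
        }
        for (int i = 0; i < serverCount; i++) {
            servers.add(new GridServer());
        }
    }

    // Maps a data source identifier to the index of the server that owns its twin.
    // floorMod keeps the index non-negative even when hashCode() is negative.
    int ownerOf(String sourceId) {
        return Math.floorMod(sourceId.hashCode(), servers.size());
    }

    // Routes the message to the owning server, where the processing code runs.
    void deliver(String sourceId) {
        servers.get(ownerOf(sourceId)).process(sourceId);
    }

    GridServer server(int index) {
        return servers.get(index);
    }
}
```

Because the owner is computed from the identifier alone, every message from a given data source lands on the same server, and the state it needs is already in that server's memory.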

Real-Time, Data-Parallel Analytics

One of the most exciting opportunities created by the use of digital twin models is the ability they offer to detect aggregate patterns and trends in real time. These aggregate results can be used to generate alerts and also to provide real-time feedback for use in message processing by the digital twins. ScaleOut StateServer’s scalable, in-memory computing platform brings data-parallel analytics into the “fast path,” enabling applications to tap into dynamic data that was previously unavailable in real time.

Consider an IoT application tracking telemetry from heart-rate tracking watches worn by the customers of a company’s nationwide network of fitness centers. Using the digital twin model, these trackers can supply telemetry to digital twin instances in a cloud-based application that evaluate this telemetry in real time based on each customer’s diverse parameters, such as age, BMI, medical history, current medications, exercise history, and recent incidents. With this information, the digital twin model offers much deeper introspection on heart-rate telemetry than just tracking its patterns over time.

In addition, ScaleOut StreamServer enables an application to implement periodic, data-parallel analytics in real time to assess aggregate trends across all customers, such as the maximum observed heart-rate by age or BMI. This information can be reported to managers and also can be fed back to the digital twin model to generate alerts for customers who deviate too far from aggregate trends. For example, a customer could be alerted if heart-rate spikes appear to be atypical for the customer’s age, BMI, or medical history.

Real-time feedback from aggregate statistics is depicted in the following illustration:

[Diagram: real-time feedback from aggregate statistics to digital twins]
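A small Java sketch of this feedback loop follows. It groups customers into ten-year age bands and uses the average peak heart rate per band as the aggregate, rather than the maximum, because an average is easier to compare individual readings against. The class names, the banding, and the tolerance parameter are all assumptions, and a parallel stream stands in for the grid's data-parallel analytics.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical digital twin state for one fitness-center customer.
class CustomerTwin {
    final String id;
    final int age;
    final double peakHeartRate;

    CustomerTwin(String id, int age, double peakHeartRate) {
        this.id = id;
        this.age = age;
        this.peakHeartRate = peakHeartRate;
    }

    // Ten-year age band, for example 30 for ages 30 through 39.
    int ageBand() {
        return (age / 10) * 10;
    }

    double getPeakHeartRate() {
        return peakHeartRate;
    }
}

class HeartRateAnalytics {
    // Aggregate step: average peak heart rate for each age band, computed in parallel.
    static Map<Integer, Double> averagePeakByAgeBand(List<CustomerTwin> twins) {
        return twins.parallelStream().collect(
            Collectors.groupingBy(
                CustomerTwin::ageBand,
                Collectors.averagingDouble(CustomerTwin::getPeakHeartRate)));
    }

    // Feedback step: flags a customer whose peak exceeds the band average by more
    // than the given tolerance. Returns false when the band has no aggregate yet.
    static boolean isAtypical(CustomerTwin twin, Map<Integer, Double> averages, double tolerance) {
        Double average = averages.get(twin.ageBand());
        return average != null && twin.peakHeartRate > average + tolerance;
    }
}
```

Running the aggregate step periodically and handing its result to each twin's message processing is what closes the loop between fleet-wide trends and per-customer alerts.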

With the digital twin model, data-parallel analytics no longer has to be relegated to batch processing on the data lake. Now it can be performed in real time, adding even more value to streaming analytics.

ScaleOut Digital Twin Builder™

Bringing the Digital Twin to Streaming Analytics
