From isolation to the compute continuum
We live surrounded by safety-critical cyber-physical systems, even when we are not aware of it: the cars we drive, the planes we travel in, the devices used for medical procedures. The variety is astonishing, yet most of these systems have something in common. Their computing resources are limited to the compute power available on the local device, and they only execute the processes that are strictly necessary to fulfil their purpose. The rationale is simple: these systems control safety-critical or mission-critical processes and must be error-proof because, if they fail, someone may get hurt or even die.
TRANSACT aims to transform such monolithic, localized, standalone cyber-physical systems into distributed systems. This makes it possible to outsource functionality to the edge or cloud tiers and to use these tiers to run additional processes that enrich the device's capabilities. However, achieving these goals is not trivial.
TRANSACT partners have devised a three-tier architecture, shown in Figure 1, that includes several core and value-added services and functions to support this transformation. For instance, outsourcing functionality to the edge or cloud tiers allows it to run with more computing resources. These extra resources can reduce execution time or improve the quality of the results, e.g., by increasing the precision with which such processes run or by parallelizing them. There are several aspects to consider, though.
Outsourcing functionality: safety-, mission-, or non-critical?
First, we must consider whether the functionality being outsourced is safety-critical, mission-critical or non-critical. Safety-critical processes will only be outsourced to the edge tier, because they require reliable networks that offer dependability guarantees where possible. Other processes, for which latency is not critical, can be moved to the cloud (the cloud cannot provide latency guarantees). To perform this outsourcing, TRANSACT will rely on core modules such as the Operational Mode Manager and Coordinator, the Monitoring Services, and the Data Services and Comms. The Operational Mode Manager will use the Monitoring Services to track several KPIs associated with the processes that can be outsourced; the KPI data are exchanged through the Data Services and Comms. When conditions are favorable, the manager will communicate with the Operational Mode Coordinator in the edge or cloud tier (note that one or both tiers may be available) and launch the process there, letting that tier take control. This change of mode can last as long as conditions remain favorable. However, if the manager detects a substantial degradation of any KPI, it will retake control, which is critical to ensure safety. It must be especially cautious during transitions between modes, because a poorly handled transition can lead to oscillations that endanger operation.
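The switching behavior described above can be illustrated with a minimal Python sketch. It assumes a single latency KPI and uses hysteresis (two separate thresholds plus a minimum number of consecutive favorable samples), a common technique for avoiding oscillation between modes. The class name, thresholds and `dwell` parameter are hypothetical and are not taken from the TRANSACT design.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    LOCAL = "local"          # process runs on the device itself
    OFFLOADED = "offloaded"  # process runs on the edge/cloud tier


@dataclass
class HysteresisModeManager:
    """Hypothetical mode manager: offload only after the KPI has been favorable
    for `dwell` consecutive samples; retake control at once on degradation."""
    offload_below_ms: float  # latency must stay below this to offload
    retake_above_ms: float   # latency above this forces a return to local mode
    dwell: int = 3           # consecutive favorable samples required to offload
    mode: Mode = Mode.LOCAL
    _good_samples: int = field(default=0, init=False, repr=False)

    def __post_init__(self) -> None:
        # The gap between the two thresholds is what prevents oscillation.
        if self.offload_below_ms >= self.retake_above_ms:
            raise ValueError("offload threshold must be below retake threshold")

    def update(self, latency_ms: float) -> Mode:
        """Feed one KPI sample and return the (possibly new) operating mode."""
        if self.mode is Mode.OFFLOADED:
            if latency_ms > self.retake_above_ms:
                # Substantial degradation: the device retakes control immediately.
                self.mode = Mode.LOCAL
                self._good_samples = 0
        elif latency_ms < self.offload_below_ms:
            self._good_samples += 1
            if self._good_samples >= self.dwell:
                self.mode = Mode.OFFLOADED
        else:
            # Any unfavorable sample resets the dwell counter.
            self._good_samples = 0
        return self.mode
```

With `offload_below_ms=20`, `retake_above_ms=80` and `dwell=3`, a latency trace of 10, 10, 10, 40, 60, 90 ms keeps the process local for the first two samples, offloads it on the third, tolerates the moderate rise to 40 and 60 ms, and returns control to the device at 90 ms. Returning to local mode immediately while requiring a dwell before offloading is a deliberate asymmetry: safety favors a fast fallback and a cautious handover.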
Value-added services: Adding Machine Learning and Artificial Intelligence
Second, TRANSACT will make it possible to run additional processes in the edge and cloud tiers. The goal is to enrich device functionality with processes that would be impossible to run on the device itself by leveraging edge or cloud resources. This opens up a whole new range of possibilities through TRANSACT's optional services and functions: Big Data, Machine Learning (ML) and Artificial Intelligence (AI) processes can come into play and combine data from different devices, and marketplaces can even be created that put new services at customers' disposal and let customers create, share or sell their own services. In general, these “blue box” functionalities (see Figure 1) will depend on the Data Services & Comms module to obtain the data they need from the device, on the Identity & Access Service to ensure that no data is consumed by unauthorized parties, and on the Privacy Services to ensure that data is handled properly. The potential impact of ML and AI capabilities is large, as reflected in the requirements collected during the initial phase of the project from stakeholders associated with the different TRANSACT pilots.
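How a value-added service might receive device data only after identity and privacy checks can be sketched in Python. This is an illustrative pattern under stated assumptions, not the TRANSACT interface: the `DataGateway` class, the per-consumer KPI grants and the HMAC-based pseudonymization of device IDs are all hypothetical stand-ins for the Data Services & Comms, Identity & Access Service and Privacy Services.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    device_id: str
    kpi: str
    value: float


class DataGateway:
    """Hypothetical gateway: releases records only to authorized consumers
    (identity & access) and replaces device IDs with pseudonyms (privacy)."""

    def __init__(self, grants: dict[str, set[str]], secret: bytes) -> None:
        self._grants = grants  # consumer id -> set of KPI names it may read
        self._secret = secret  # key used to derive pseudonyms

    def _pseudonym(self, device_id: str) -> str:
        # Keyed hash: stable per device, and cannot be recomputed from a
        # device ID without the secret key.
        digest = hmac.new(self._secret, device_id.encode(), hashlib.sha256)
        return digest.hexdigest()[:16]

    def fetch(self, consumer: str, records: list[Record]) -> list[Record]:
        """Return only the KPIs the consumer is granted, with pseudonymized IDs."""
        allowed = self._grants.get(consumer)
        if allowed is None:
            raise PermissionError(f"unknown consumer: {consumer}")
        return [
            Record(self._pseudonym(r.device_id), r.kpi, r.value)
            for r in records
            if r.kpi in allowed
        ]
```

Keeping the authorization check and the pseudonymization in one gateway means an ML service never sees raw device identifiers or KPIs it was not granted. A real deployment would delegate both decisions to dedicated services.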
Managing remote (OTA) updates
Besides these main goals, TRANSACT will offer other relevant capabilities for safety-critical devices, such as auditing services, data management, and the aforementioned privacy and identity services. One particularly interesting outcome, however, will be the ability to manage remote updates to the devices. This core component will manage updates both for individual devices and for update campaigns that affect sets of devices controlled by, or connected to, the same edge or cloud tier. Managing updates remotely will not only spare specialized personnel from performing on-site operations on the device but also avoid downtime caused by updates during standard working hours. Automated remote updates will allow these processes to be scheduled when their impact on regular operation is minimal.
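A minimal Python sketch shows how an update campaign might be scheduled outside working hours. It assumes fixed working hours of 08:00 to 18:00 on weekdays, naive local timestamps, and a simple staggering of devices into batches; `WORK_START`, `WORK_END`, `next_update_slot` and `schedule_campaign` are hypothetical names, not part of the TRANSACT update service.

```python
from datetime import datetime, time, timedelta

WORK_START = time(8, 0)  # hypothetical start of standard working hours
WORK_END = time(18, 0)   # hypothetical end of standard working hours


def next_update_slot(now: datetime) -> datetime:
    """Earliest moment at or after `now` that lies outside working hours."""
    if now.weekday() >= 5:  # Saturday or Sunday: no working hours
        return now
    if now.time() < WORK_START or now.time() >= WORK_END:
        return now
    # Inside working hours: wait until the end of the working day.
    return now.replace(hour=WORK_END.hour, minute=WORK_END.minute,
                       second=0, microsecond=0)


def schedule_campaign(device_ids: list[str], now: datetime, batch_size: int = 2,
                      batch_gap: timedelta = timedelta(minutes=30)) -> dict[str, datetime]:
    """Assign each device an update time, staggering devices in batches so that
    not all of them are down at once. Simplification: later batches are not
    re-checked against working hours."""
    start = next_update_slot(now)
    return {dev: start + (i // batch_size) * batch_gap
            for i, dev in enumerate(device_ids)}
```

For a campaign launched on Wednesday, June 1, 2022 at 10:30, the first two devices would be scheduled for 18:00 that day and the third for 18:30. A production scheduler would also account for each device's operating state and time zone.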
On May 31, 2022, we released a first deliverable, “D7 (D2.1) Reference architectures for distributed safety-critical distributed cyber-physical systems (V1)”, which studies the state of the art from an academic point of view and analyzes current industry practices. We are currently evaluating the requirements collected from the different pilots, matching them to the identified topics of interest and assigning priorities. In the coming months, we will start implementing the different core TRANSACT services so that they can be a) smoothly integrated into devices and b) deployed in the edge or cloud tiers in a customized and automated way, with their interaction with the devices easy to configure. From there, we will validate the proposed implementation in the different TRANSACT pilots.
For more information
The result of the work described in this article is available in the public TRANSACT D2.1 deliverable: “D7 (D2.1) Reference architectures for distributed safety-critical distributed cyber-physical systems (V1)”.