AI applications are now widely deployed, but monitoring them and keeping their performance on track remains a complex challenge. Unlike non-AI software, AI applications require continuous monitoring throughout their lifecycle because their input data and application context keep changing, which can quickly render a released software version outdated.
While several monitoring solutions exist, traditional post-deployment techniques often lack flexibility and functionality. They typically require predefined data formats and integration into the core application, which can be problematic for established software or highly regulated industries like healthcare. Additionally, basic add-on packages designed for AI software, such as EvidentlyAI, Great Expectations, and DeepQu, often lack customizability and mainly focus on simple classification tasks. This leaves out more complex AI tasks, such as 3D segmentation and landmark detection. Furthermore, these tools focus primarily on data metrics and overlook operational outputs like inference speed and memory footprint, which can significantly affect the user experience. Lastly, they generally lack monitoring algorithms such as anomaly detection and risk-of-failure prediction, which further limits their effectiveness.
In the TRANSACT project, we have developed an approach to AI monitoring that addresses these limitations of existing tools. We propose a plug-and-play AI monitoring solution, or “AI monitoring toolbox,” that gives users full control over monitoring their AI applications. The toolbox lets users define custom operational and AI-specific metrics, visualizations, and analysers, providing insight into application performance, accuracy, and resource utilization (e.g., network speed or hardware temperature). Its customizability and modularity allow experts to monitor and visualize complex data, such as annotated DICOM images. The solution extends AI monitoring beyond simple classification tasks and helps users maintain the performance of their AI applications throughout their full lifecycle.
Figure 1 visualizes a possible real-world application of our solution. Before an AI application is deployed, a research and development team builds the AI models and the associated application. Once the application is deployed, the data generated by these AI models is stored in a separate data store. Optionally, a human in the loop can review or correct the output to provide early feedback. The AI monitoring toolbox continuously monitors the application’s performance and can trigger model adjustments to maintain high-quality output.
By keeping data generation, storage, and monitoring separate, our solution allows users to work with their preferred data formats and storage systems, ensuring compatibility with their specific business requirements. The only requirement is that the stored data is readable by external software, for example through REST API calls. The toolbox can monitor both newly generated data and existing production data, enabling real-time performance evaluation.
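To make this separation concrete, here is a minimal sketch of how monitoring code could depend only on a narrow read interface rather than on a particular storage system. All names (`DataSource`, `StoredResult`, `InMemorySource`, `count_results`) are hypothetical and are not part of the toolbox; a REST-backed source would be one more class implementing the same `fetch` method.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Protocol


@dataclass(frozen=True)
class StoredResult:
    """One stored AI output (hypothetical record layout)."""
    case_id: str
    created_at: datetime
    payload: dict


class DataSource(Protocol):
    """Anything that can hand stored results to the monitoring code."""

    def fetch(self, since: Optional[datetime] = None) -> Iterable[StoredResult]:
        ...


class InMemorySource:
    """Stand-in for a real store; a REST client could expose the same method."""

    def __init__(self, results: List[StoredResult]):
        self._results = results

    def fetch(self, since: Optional[datetime] = None) -> Iterable[StoredResult]:
        # With no cut-off, return existing data too; otherwise only newer results.
        return [r for r in self._results if since is None or r.created_at >= since]


def count_results(source: DataSource, since: Optional[datetime] = None) -> int:
    """Monitoring-side code: it only knows about the DataSource interface."""
    return sum(1 for _ in source.fetch(since))


source = InMemorySource([
    StoredResult("case-001", datetime(2023, 1, 10), {"label": "a"}),
    StoredResult("case-002", datetime(2023, 6, 1), {"label": "b"}),
])
all_count = count_results(source)
recent_count = count_results(source, since=datetime(2023, 3, 1))
```

Passing no `since` value covers existing production data, while passing a cut-off restricts the query to newly generated data.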
The toolbox architecture consists of four main components: metrics, analysers, exporters, and dashboards.
- Metrics: These are the core of the monitoring tool, as they capture the performance, data drift, target drift, explainability, Gaussian analysis, and similar aspects of the AI models being monitored. Users can add any desired metric to the toolbox, including other AI-based algorithms, to evaluate AI predictions. Examples of metrics include classification accuracy, segmentation Dice score, and Euclidean distance.
- Analysers: This component reads data from the storage using filters, sorting parameters, and (custom) metrics. It enables the software to create its own live feed from the data, which can then be used by dashboards for visualization or by exporters for report writing. In a simplified form, an analyser could fetch two 3D points, one predicted and one corrected (or ground truth), and apply given metrics, such as the Euclidean distance, to provide values for further analysis.
- Exporters: To avoid slowing down a live feed with excessive data, exporters allow users to generate custom reports that can be stored and read at any time. These reports capture a snapshot of the AI application’s state at a specific moment, making them useful for quarterly reporting of application performance or other purposes.
- Dashboards: If users want a live feed of the current status of the AI application, the dashboard components provide insights using visualization tools. They include generic plotting components like histograms, scatter plots, and image viewers, using attributes from analysers and metrics to populate a web UI. An example preview is shown in Figure 2, depicting an overview of some operational and AI metrics, along with some distributions and time-stepped values.
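The interplay between metrics, analysers, and exporters can be sketched as follows. This is an illustrative sketch only, not the toolbox's actual API: the names `Record`, `Analyser`, and `export_report` are hypothetical, and the input records are made-up values. It follows the simplified example from the list above, where an analyser applies a Euclidean distance metric to a predicted and a corrected 3D point.

```python
import json
import math
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Sequence

Point3D = Sequence[float]
Metric = Callable[[Point3D, Point3D], float]


def euclidean_distance(predicted: Point3D, reference: Point3D) -> float:
    """Metric: straight-line distance between two 3D points."""
    return math.dist(predicted, reference)


@dataclass
class Record:
    """One stored prediction with its corrected counterpart (hypothetical layout)."""
    case_id: str
    predicted: Point3D
    corrected: Point3D


class Analyser:
    """Filters records, applies the named metrics, and returns sorted rows."""

    def __init__(self, metrics: Dict[str, Metric],
                 keep: Optional[Callable[[Record], bool]] = None):
        self.metrics = metrics
        self.keep = keep or (lambda record: True)

    def run(self, records: Iterable[Record]) -> List[dict]:
        rows = []
        for record in records:
            if not self.keep(record):
                continue
            row = {"case_id": record.case_id}
            for name, metric in self.metrics.items():
                row[name] = metric(record.predicted, record.corrected)
            rows.append(row)
        return sorted(rows, key=lambda r: r["case_id"])


def export_report(rows: List[dict], metric_name: str) -> str:
    """Exporter: serialise a point-in-time snapshot of one metric as JSON."""
    values = [row[metric_name] for row in rows]
    summary = {
        "metric": metric_name,
        "count": len(values),
        "mean": sum(values) / len(values) if values else None,
        "max": max(values) if values else None,
        "rows": rows,
    }
    return json.dumps(summary, indent=2)


# Made-up records for illustration only.
records = [
    Record("case-002", (0.0, 0.0, 0.0), (3.0, 4.0, 0.0)),
    Record("case-001", (1.0, 1.0, 1.0), (1.0, 1.0, 1.0)),
]
analyser = Analyser({"euclidean_distance": euclidean_distance})
rows = analyser.run(records)
report = export_report(rows, "euclidean_distance")
```

In this sketch a dashboard would consume `rows` directly for plotting, while `report` is the stored snapshot that can be read back later.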
The versatility of the AI monitoring architecture makes it applicable to various use cases. For example, in a medical use case, the solution can monitor AI models to detect anomalies or changes in performance in anatomical segmentation masks. It can also analyse data divided into categories, such as age or gender of a patient population, to identify which categories are negatively impacting performance, allowing experts to focus on those cases. Additional components, such as notifications and triggers, can be added to the architecture to activate alerts when model degradation exceeds a specified threshold, enabling proactive measures like automated retraining.
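As a rough illustration of the category analysis and threshold-based alerting described above, the sketch below groups per-case scores by category and flags the categories whose mean score falls below a threshold. The function names, category labels, scores, and the 0.80 threshold are all hypothetical values chosen for the example, not results from the project.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def mean_score_by_category(samples: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average a per-case score (e.g. a Dice score) within each category."""
    grouped = defaultdict(list)
    for category, score in samples:
        grouped[category].append(score)
    return {category: mean(scores) for category, scores in grouped.items()}


def categories_below_threshold(means: Dict[str, float], threshold: float) -> List[str]:
    """Return the categories whose mean score is below the alert threshold."""
    return sorted(c for c, m in means.items() if m < threshold)


# Made-up (category, score) pairs for illustration only.
samples = [
    ("age<40", 0.92), ("age<40", 0.90),
    ("age40-65", 0.88), ("age40-65", 0.86),
    ("age>65", 0.70), ("age>65", 0.74),
]
means = mean_score_by_category(samples)
alerts = categories_below_threshold(means, threshold=0.80)
```

A notification or trigger component could then act on a non-empty `alerts` list, for instance by notifying an expert or scheduling retraining.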
Similarly, the solution can be leveraged in other scenarios, such as monitoring the deviation between real-world data and predicted digital twin data (see use case 1 of the TRANSACT project) or monitoring the discrepancy between predicted and chosen navigation routes (see use case 2 of the TRANSACT project).
Unlike most existing tools, our solution does not require integration into core software, making it highly suitable for industries with strict regulatory processes. With this flexible solution, organizations can unlock the full potential of their AI systems while maintaining transparency, explainability, and compliance.