Safety-Aware Anomaly Detection and RUL Analytics for Industrial Machinery

Developing an auditable, safety-aware anomaly detection and Remaining Useful Life (RUL) analytics pipeline for industrial machinery, validated on our bottle inspection and sorting demonstrator, combining multi-sensor time-series learning, uncertainty-aware alarms, and human-in-the-loop verification.

NED demonstrator, bottle inspection and sorting system, used for anomaly detection and RUL validation.

Overview

This project develops a safety-aware anomaly detection and RUL analytics pipeline for industrial machinery, grounded in demonstrator evidence rather than purely offline datasets. Using the bottle inspection and sorting system, we capture multi-sensor telemetry (signals from actuators, conveyors, inspection modules, and control logic), align it into machine-state segments, then learn normal operational signatures to detect deviation patterns early. Beyond detection, the framework estimates degradation trends and Remaining Useful Life with uncertainty bounds, enabling maintenance planning that is both operationally useful and defensible.

A core requirement is auditability, every alert is paired with traceable evidence (when the deviation started, which signals contributed, confidence/uncertainty, and operator validation outcome), enabling post hoc analysis, governance, and cross-shift handover.

Motivation

Industrial anomaly detection is often evaluated on static datasets, but deployment fails when alarms are noisy, poorly explained, or misaligned with safety and operational decision-making. In inspection and sorting cells, late fault detection can create cascading issues, mis-sorts, quality escapes, downtime spikes, and unsafe recovery actions. This project focuses on early and explainable detection, RUL-style forecasting, and human verification, so the outputs are usable in real maintenance and operations workflows.

System Architecture

Sensing & Control Layer

Demonstrator telemetry from PLC/control events plus sensor streams (time-series + event logs)
Production context signals, e.g., mode, recipe, throughput rate, inspection outcomes

Data Layer

Synchronisation, cleaning, and machine-cycle segmentation
Feature store for time-domain, frequency-domain, and event-derived features
Ground-truth support via fault logs, interventions, and controlled fault injection (where feasible)

Learning Layer (Detection + Prognostics)

Anomaly Detection
- Learns normal operational signatures per mode/recipe
- Detects deviations using confidence-aware thresholds and persistence logic
RUL / Degradation Modelling
- Health indicator construction and trend modelling
- RUL prediction with uncertainty bounds (intervals), not just point estimates

Human-in-the-Loop Layer

Alarm triage panel, operator labels, “confirmed / not an issue / needs monitoring”
Feedback improves calibration, reduces false positives, and builds trust

Governance & Audit Layer

Alarm evidence packs, timestamps, signal attribution, and decision rationale
Machine-readable artefacts enabling traceability and reproducibility

Key Components

Cycle & State Segmenter
Partitions telemetry into meaningful machine phases (conveyor motion, inspection window, sorting actuation), ensuring anomalies are contextual rather than global.
Mode-Aware Normality Model
Separates “expected variation” (different bottle types, throughput levels, lighting changes) from true abnormality.
Anomaly Scoring + Alarm Logic
Combines raw anomaly scores with persistence, cooldown, and confidence gates to avoid alert storms.
RUL / Degradation Module
Builds interpretable health indicators, estimates trends, and outputs RUL with uncertainty intervals.
Explainability & Evidence Pack Builder
Produces “what changed” summaries, signal contributions, and local time windows around onset.
Operator Panel + Feedback Loop
Lets users validate alarms, attach notes, and trigger targeted re-analysis.

Evaluation

The framework is evaluated on the NED bottle inspection and sorting demonstrator under controlled and naturally occurring variability.

Disruption / anomaly classes include:

Conveyor/drive anomalies
Speed drift, intermittent stalls, rising vibration signatures, abnormal cycle-time variance.
Inspection degradation
Sensor noise increase, lighting drift, mis-detection patterns, inspection latency spikes.
Sorting/actuation issues
Delay or misfire behaviours, increasing rejection errors, timing misalignment with conveyor state.
Process/context shifts
Recipe changes, throughput changes, environmental shifts, which must not be misclassified as faults.

Metrics and evidence focus on:

Detection lead time before failure or intervention
False alarm rate, missed detection rate, and alert stability
RUL calibration (interval coverage, not only point error)
Robustness across operating modes/recipes
Operator agreement and triage effectiveness
Audit completeness (can we reconstruct why an alert fired)

Key Findings

Earlier fault visibility
Mode-aware baselines reduce nuisance alarms while preserving sensitivity to true degradation.
Operationally usable RUL
Uncertainty-bounded outputs support planning decisions rather than overconfident predictions.
Trust via auditability
Evidence packs and operator validation pathways make alarms defensible and easier to adopt.
Demonstrator-first realism
Demonstrator evaluation exposes deployment issues (context shifts, phase alignment, alert storms) that static datasets often hide.

Hamidu Barrie

MRes Researcher

My research interests include Process Mining, Multi-Modal Anomaly Detection and Industry 4/5.0.

Bugra Alkan

Senior Lecturer in AI and Robotics

My research interests include human–robot collaboration, industrial AI and cyber-physical production systems.