Time Lies Broke Our Factory Alerts: Camera vs PLC (Real Failure, Real Fix)

Our unsafe-motion alert fired before the guard-door event, and the logs said so. The real bug was software time, not the model.

What we were running

In UEDC (Universal Edge Device Connector - our edge device software) we fuse two streams:

  • RTSP camera frames - Rust RTSP adapter on the edge, Go simulator for tests.
  • PLC (Modbus/TCP) events - Rust Modbus adapter on the edge, Go simulator for tests.

Symptom: the "unsafe-motion" alert triggered before guard_door_open. We almost rewrote fusion.

Root cause: timestamps

  • Camera: frame stamped in userspace after decode.
  • PLC: stamped from CLOCK_REALTIME via a time.Now()-equivalent call on Modbus receive.
  • Under CPU and network load, the NTP-disciplined CLOCK_REALTIME drifted by milliseconds, and the Modbus "arrival" time mirrored scheduler mood. Meanwhile, camera decode added variable latency, so its timestamps skewed the other way. Our fusion learned scheduler jitter, not the workcell. (A workcell is our deployable hardware unit on the factory floor, with edge adapters and an edge box UI.)

Here is our micro latency table:

Path | Component                    | Idle (ms) | Loaded (ms) | Note
PLC  | RX→userspace (NTP)           | 0.9       | 12–20       | scheduler jitter
Cam  | Decode→stamp (userspace)     | 6.0       | 9–18        | GOP/codec variability
Both | With PTP+PHC (HW RX stamp)   | ≤0.05     | ≤0.20       | bounded, stable (μs-level)
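
To make the failure mode concrete, here is a minimal sketch of how per-path latency alone can flip the apparent order of two events. The helper names and the idea of modelling a userspace stamp as "true on-wire time plus path latency" are illustrative assumptions, not code from UEDC; the latencies you plug in would come from a table like the one above.

```rust
/// Hypothetical model: a userspace stamp equals the true on-wire time plus
/// whatever latency the path added before the stamp was taken.
fn userspace_stamp_ms(true_time_ms: f64, path_latency_ms: f64) -> f64 {
    true_time_ms + path_latency_ms
}

/// Returns true when the stamped order of events A and B contradicts
/// their true order.
fn order_inverted(true_a_ms: f64, lat_a_ms: f64, true_b_ms: f64, lat_b_ms: f64) -> bool {
    let stamped_a = userspace_stamp_ms(true_a_ms, lat_a_ms);
    let stamped_b = userspace_stamp_ms(true_b_ms, lat_b_ms);
    (true_a_ms < true_b_ms) != (stamped_a < stamped_b)
}
```

For example, a door event at 100 ms with 16 ms of loaded PLC latency and a motion frame at 103 ms with 10 ms of decode latency are stamped 116 ms and 113 ms, so the alert appears to precede the door. With both latencies bounded at 0.20 ms or less, the true order survives.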

What fixed it (and why it works)

  1. Make the NIC's PHC the source of truth
    1. Turn on PTP (IEEE 1588) so the NIC's PHC (/dev/ptp0) is synchronized, not just the NTP-disciplined system clock.
    2. ptp4l syncs the NIC's hardware clock to the grandmaster, so the hardware timestamps that arrive with packets are on grandmaster time.
    3. phc2sys -s /dev/ptp0 -c CLOCK_REALTIME -w disciplines the kernel clock to the PHC so CLOCK_REALTIME is finally usable.
  2. Timestamp at the earliest defensible point of each adapter.
    1. Modbus/TCP: enable hardware RX timestamps with SO_TIMESTAMPING and read the SCM_TIMESTAMPING control message, so the timestamp is the on-wire time captured by the NIC, not the moment your code wakes up.
    2. RTSP (UDP/TCP): set the same socket option on the RTP/RTCP sockets, and carry the hardware RX timestamp alongside the frame through your decode pipeline.
    3. Serial/USB (if any): you can't get PHC on UART, so rely on CLOCK_REALTIME via phc2sys and take the stamp at the first byte read that unblocks userspace ("byte-one rule").
  3. Carry timing provenance through your schema
    Every record includes its time source and quality:
{
  "ts": "2025-10-26T10:13:37.123456Z",
  "clock": { "source": "ptp_phc0", "skew_ms": -0.006 },
  "seq": 8127361,
  "device_id": "plc_01",
  "type": "plc.coil.change",
  "payload": { "...": "..." }
}

Consumers down-weight or drop records with non-PTP sources or large skew_ms.
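
A minimal sketch of that consumer-side policy, assuming the "clock" object from the record above has already been deserialized. The enum, the "ntp" source label, the 1 ms skew limit, and the 0.25 weight are hypothetical values chosen for illustration, not UEDC's actual thresholds.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ClockSource {
    PtpPhc,
    Ntp,
    Unknown,
}

struct ClockInfo {
    source: ClockSource,
    skew_ms: f64,
}

// Assumed policy knobs; tune them for your own fusion stage.
const MAX_SKEW_MS: f64 = 1.0;
const NON_PTP_WEIGHT: f64 = 0.25;

/// Maps the record's "source" string to an enum. "ptp_phc0" follows the
/// example record; the "ntp" label is a hypothetical convention.
fn parse_source(s: &str) -> ClockSource {
    if s.starts_with("ptp_phc") {
        ClockSource::PtpPhc
    } else if s == "ntp" {
        ClockSource::Ntp
    } else {
        ClockSource::Unknown
    }
}

/// Returns None to drop the record, or Some(weight) to keep it.
fn record_weight(clock: &ClockInfo) -> Option<f64> {
    if !clock.skew_ms.is_finite() || clock.skew_ms.abs() > MAX_SKEW_MS {
        return None; // skew too large or missing: drop
    }
    match clock.source {
        ClockSource::PtpPhc => Some(1.0),
        ClockSource::Ntp => Some(NON_PTP_WEIGHT), // keep, but trust less
        ClockSource::Unknown => None,
    }
}
```

Returning an Option keeps "drop" distinct from "weight zero", so a consumer can count dropped records separately from down-weighted ones.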


Adapter-side code that matters (correct, compilable snippets)

Rust (Modbus/TCP) — receive with hardware timestamps

Uses nix to set SO_TIMESTAMPING and read the ancillary data. Real code plumbs this into our adapter trait and unified record.
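// Written against nix 0.26; newer nix releases change some of these signatures
// (for example, setsockopt takes a borrowed fd and cmsgs() returns a Result).
use nix::sys::socket::{
    recvmsg, setsockopt, sockopt::Timestamping, ControlMessageOwned, MsgFlags,
    TimestampingFlag, Timestamps,
};
use nix::sys::time::TimeSpec;
use std::io::IoSliceMut;
use std::os::fd::AsRawFd;

/// Call once per socket, right after connect/bind. Generic over the socket type
/// so the same helper serves the Modbus TcpStream and the RTP UdpSocket.
fn enable_hw_rx_ts(sock: &impl AsRawFd) -> nix::Result<()> {
    // RX_HARDWARE / RX_SOFTWARE ask the stack to generate RX stamps;
    // RAW_HARDWARE / SOFTWARE ask it to report them in the control message.
    let flags = TimestampingFlag::SOF_TIMESTAMPING_RX_HARDWARE
        | TimestampingFlag::SOF_TIMESTAMPING_RAW_HARDWARE
        | TimestampingFlag::SOF_TIMESTAMPING_RX_SOFTWARE
        | TimestampingFlag::SOF_TIMESTAMPING_SOFTWARE;
    setsockopt(sock.as_raw_fd(), Timestamping, &flags)
}

/// Returns (bytes read, RX timestamp in ns). Prefers the raw hardware stamp and
/// falls back to the kernel software stamp when the NIC did not provide one.
fn recv_with_hwts(
    sock: &impl AsRawFd,
    buf: &mut [u8],
) -> nix::Result<(usize, Option<i128>)> {
    let mut iov = [IoSliceMut::new(buf)];
    let mut cmsgspace = nix::cmsg_space!(Timestamps);

    let msg = recvmsg::<()>(
        sock.as_raw_fd(),
        &mut iov,
        Some(&mut cmsgspace),
        MsgFlags::empty(),
    )?;

    let is_set = |t: &TimeSpec| t.tv_sec() != 0 || t.tv_nsec() != 0;
    let mut t_ns: Option<i128> = None;

    for cmsg in msg.cmsgs() {
        if let ControlMessageOwned::ScmTimestampsns(ts) = cmsg {
            // Kernel order: [software, legacy transformed hw (unused), raw hw]
            let best = if is_set(&ts.hw_raw) { ts.hw_raw } else { ts.system };
            if is_set(&best) {
                t_ns = Some(best.tv_sec() as i128 * 1_000_000_000 + best.tv_nsec() as i128);
            }
        }
    }

    Ok((msg.bytes, t_ns))
}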

Rust (RTSP/RTP) - propagate RX timestamp with the frame

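struct Frame {
    data: bytes::Bytes,
    hw_rx_unix_ns: Option<i128>, // from socket ancillary data
    rtp_ts: u32,                 // codec clock domain (e.g. 90 kHz for video)
}

fn on_rtp_packet(pkt: &[u8], hw_rx_unix_ns: Option<i128>) -> Option<Frame> {
    // The RTP timestamp occupies bytes 4..8 of the header, big-endian.
    let rtp_ts = u32::from_be_bytes(pkt.get(4..8)?.try_into().ok()?);
    // ... depacketize; when a frame boundary completes, emit the frame and
    // keep the RX timestamp you captured for it (typically its first packet):
    let assembled = bytes::Bytes::copy_from_slice(pkt); // placeholder
    Some(Frame { data: assembled, hw_rx_unix_ns, rtp_ts })
}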
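
Once both streams carry an ingress timestamp, the fusion stage can pair a PLC event with the closest frame on that shared timebase. The sketch below is one way to do that, not our production matcher: StampedFrame is a hypothetical stand-in for Frame that keeps only the fields the lookup needs, and the tolerance is an assumed parameter.

```rust
/// Hypothetical, trimmed-down frame record: an id plus the hardware RX stamp.
struct StampedFrame {
    id: u64,
    hw_rx_unix_ns: Option<i128>,
}

/// Finds the frame whose RX timestamp is closest to `event_ns`, ignoring
/// frames without a timestamp and frames further away than `tolerance_ns`.
fn nearest_frame(
    frames: &[StampedFrame],
    event_ns: i128,
    tolerance_ns: i128,
) -> Option<&StampedFrame> {
    frames
        .iter()
        // Pair each stamped frame with its distance to the event.
        .filter_map(|f| f.hw_rx_unix_ns.map(|t| (f, (t - event_ns).abs())))
        .filter(|(_, dist)| *dist <= tolerance_ns)
        .min_by_key(|(_, dist)| *dist)
        .map(|(f, _)| f)
}
```

Frames that lack a timestamp are skipped rather than guessed at, which matches the provenance rule above: no trustworthy stamp, no alignment.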

Here are some commands you can try. The ethtool check is read-only; ptp4l and phc2sys steer the clocks, so run them on a test box first:

# Verify NIC supports HW timestamping + PHC
ethtool -T enp1s0 | grep -E 'SOF_TIMESTAMPING|PTP'
# Expect lines including: SOF_TIMESTAMPING_RX_HARDWARE … and "PTP Hardware Clock: 0"

# Discipline kernel clock to PHC (example; adapt interface/paths)
sudo ptp4l -i enp1s0 -m -2 -f /etc/ptp4l.conf &
sudo phc2sys -s /dev/ptp0 -c CLOCK_REALTIME -w -m

# Sample phc2sys output (illustrative values; you'll see similar)
# offset and delay are in ns, freq is in ppb
# phc2sys[2119.123]: CLOCK_REALTIME phc offset      -432 s2 freq   +5230 delay    281
# phc2sys[2120.124]: CLOCK_REALTIME phc offset        91 s2 freq   +5229 delay    279
# phc2sys[2121.125]: CLOCK_REALTIME phc offset        -7 s2 freq   +5229 delay    280

Why this works: PLC edges and camera frames now share a common physical timebase (the PHC, disciplined into CLOCK_REALTIME). Fusion aligns on hardware RX time, not on scheduler latency or decode delay.


Before (NTP, userspace stamps)

Wire → NIC (no PHC) → Kernel queues → Userspace (stamp) → Decode → Fusion: timestamp taken after decode, so scheduler + codec jitter masquerade as ground truth.

After (PTP, hardware RX stamps)

Wire (SoF) → NIC (PHC HW RX stamp) → Kernel (ancillary) → Userspace (carry) → Decode (carry) → Fusion: timestamp captured at ingress; causality preserved under load.


We didn’t fix the model. We fixed time. With PTP on the NIC, phc2sys disciplining CLOCK_REALTIME, and hardware RX timestamps in Modbus/RTSP, millisecond jitter vanished. Once clocks told the truth, fusion learned the workcell, not our scheduler.