The real work of AI today is writing a Rust parser for a 20-year-old sensor
Everyone in our industry is racing to train bigger models. The chatter is all about parameter counts, GPU clusters, and novel architectures. We're building ever-more-powerful brains in a frantic race for AGI or market dominance.
But we're building them in the dark. A large model without a high-fidelity connection to physical data is like a philosopher without eyesight, hearing, or touch. It can reason brilliantly about a world it has only ever read about.
I've come to believe the frontier of AI isn't more parameters; it's proximity.
The Asymmetry of Glamour
When we talk about "AI infrastructure", we're almost always talking about the glamorous half: the compute stack. The world of CUDA cores, distributed training frameworks, and model quantization. This is where the benchmarks are set and the headlines are written.
The other half of the infrastructure—the part that actually touches the world—is the physical stack. It’s the sprawling, messy collection of cameras, microphones, accelerometers, PLCs, lidar, and drones. This is the part that captures reality.
This physical stack is 90 percent of the effort and 0 percent of the glamour. And it's where the truth lives. Pixels, voltages, vibrations, locations—these are the raw, unaliased timestamps on reality.
For every engineer optimizing a Transformer's attention mechanism, there's another one in a factory in the middle of nowhere trying to figure out why a sensor's calibration drifts every time the morning sun hits a specific pipe. One of these engineers gets a keynote presentation; the other gets grease on their laptop.
The Data Moat is Dug in the Dirt, Not the Cloud
For a decade, the conventional wisdom was that "data is the new oil." We took it to mean massive, web-scraped datasets. But that oil is now a commodity. Everyone is training on a similar slice of the public internet. The real, proprietary moat for the next generation of AI will be built on data no one else can get.
Consider a factory trying to predict machine failure. You could train an LLM on every maintenance manual ever written. It could tell you, hypothetically, that "high-frequency vibrations followed by a current spike often precede motor failure." That's pattern matching on text.
But what if you had a live stream of data from an accelerometer and a temperature sensor on the machine? Now you're not predicting based on what others have written. You are observing the precursor to failure directly.
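To make that concrete, here is a minimal sketch of what "observing the precursor directly" can mean in code. The thresholds, window size, and the idea that a vibration spike followed by a temperature rise signals trouble are all illustrative assumptions, not a real maintenance model:

```rust
// Hypothetical precursor check: flag trouble when a vibration spike is
// followed, within a few samples, by a temperature rise. All limits here
// are made-up illustration values.

#[derive(Debug, Clone, Copy)]
struct Sample {
    vibration_hz: f32,
    temperature_celsius: f32,
}

/// Returns true if any vibration reading above `vib_limit` is followed,
/// within the next `window` samples, by a temperature above `temp_limit`.
fn precursor_detected(samples: &[Sample], vib_limit: f32, temp_limit: f32, window: usize) -> bool {
    samples.iter().enumerate().any(|(i, s)| {
        s.vibration_hz > vib_limit
            && samples[i + 1..]
                .iter()
                .take(window)
                .any(|later| later.temperature_celsius > temp_limit)
    })
}

fn main() {
    let stream = [
        Sample { vibration_hz: 60.0, temperature_celsius: 40.0 },
        Sample { vibration_hz: 180.0, temperature_celsius: 41.0 }, // spike
        Sample { vibration_hz: 90.0, temperature_celsius: 78.0 },  // heat follows
    ];
    println!("precursor: {}", precursor_detected(&stream, 120.0, 70.0, 5));
}
```

A real system would learn these patterns rather than hard-code them, but even the learned version consumes exactly this kind of structured sample stream.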
The problem is, that data doesn't arrive as clean JSON. It arrives as a raw, unforgiving stream of bytes over a serial port or packed into a UDP datagram.
Let's say our sensor sends a 12-byte packet. The protocol, documented in some PDF from 2005, says:
- Bytes 0-3: A `u32` magic number (`0xDEADBEEF`) to validate the packet.
- Bytes 4-7: An `f32` for vibration frequency in Hz (little-endian).
- Bytes 8-11: An `f32` for temperature in Celsius (little-endian).
Before any AI/ML model can see this data, a systems engineer has to write the tedious, error-prone code to parse it. This is the unglamorous, forgotten work.
Here’s what that looks like in Rust, a language prized for its safety and correctness in these environments:
```rust
// main.rs
use byteorder::{LittleEndian, ReadBytesExt};
use std::io::Cursor;

const MAGIC_NUMBER: u32 = 0xDEADBEEF;

#[derive(Debug)]
struct SensorReading {
    vibration_hz: f32,
    temperature_celsius: f32,
}

#[derive(Debug, PartialEq)]
enum ParseError {
    InvalidLength,
    InvalidMagicNumber,
    ReadError,
}

fn parse_sensor_packet(data: &[u8]) -> Result<SensorReading, ParseError> {
    if data.len() != 12 {
        return Err(ParseError::InvalidLength);
    }

    let mut reader = Cursor::new(data);

    let magic = reader
        .read_u32::<LittleEndian>()
        .map_err(|_| ParseError::ReadError)?;
    if magic != MAGIC_NUMBER {
        return Err(ParseError::InvalidMagicNumber);
    }

    let vibration_hz = reader
        .read_f32::<LittleEndian>()
        .map_err(|_| ParseError::ReadError)?;
    let temperature_celsius = reader
        .read_f32::<LittleEndian>()
        .map_err(|_| ParseError::ReadError)?;

    Ok(SensorReading {
        vibration_hz,
        temperature_celsius,
    })
}

fn main() {
    // A valid packet from the sensor
    let raw_data: [u8; 12] = [
        0xef, 0xbe, 0xad, 0xde, 0x33, 0x33, 0xf3, 0x42, 0xcd, 0xcc, 0x4c, 0x42,
    ];
    match parse_sensor_packet(&raw_data) {
        Ok(reading) => println!("Successfully parsed sensor data: {:?}", reading),
        Err(e) => println!("Failed to parse packet: {:?}", e),
    }

    // A corrupted packet: the magic number's high byte is wrong
    let corrupted_data: [u8; 12] = [
        0xef, 0xbe, 0xad, 0x00, 0x33, 0x33, 0xf3, 0x42, 0xcd, 0xcc, 0x4c, 0x42,
    ];
    match parse_sensor_packet(&corrupted_data) {
        Ok(reading) => println!("Successfully parsed sensor data: {:?}", reading),
        Err(e) => println!("Failed to parse packet: {:?}", e),
    }
}
```
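The `byteorder` crate is the idiomatic choice for this kind of work, but it is an external dependency. For a fixed 12-byte frame, a sketch using only the standard library's `u32::from_le_bytes` and `f32::from_le_bytes` works just as well; the layout is the same one described above, and the `Option` return is a simplification of the error enum:

```rust
// Same 12-byte layout, parsed with the standard library only.
// from_le_bytes takes a fixed-size array, so each slice is converted
// with try_into after the length check guarantees the bounds.

const MAGIC_NUMBER: u32 = 0xDEADBEEF;

#[derive(Debug)]
struct SensorReading {
    vibration_hz: f32,
    temperature_celsius: f32,
}

fn parse_sensor_packet(data: &[u8]) -> Option<SensorReading> {
    if data.len() != 12 {
        return None;
    }
    let magic = u32::from_le_bytes(data[0..4].try_into().ok()?);
    if magic != MAGIC_NUMBER {
        return None;
    }
    Some(SensorReading {
        vibration_hz: f32::from_le_bytes(data[4..8].try_into().ok()?),
        temperature_celsius: f32::from_le_bytes(data[8..12].try_into().ok()?),
    })
}

fn main() {
    let raw: [u8; 12] = [
        0xef, 0xbe, 0xad, 0xde, 0x33, 0x33, 0xf3, 0x42, 0xcd, 0xcc, 0x4c, 0x42,
    ];
    println!("{:?}", parse_sensor_packet(&raw));
}
```

Zero dependencies matters more than usual here: this code often runs on an embedded gateway where every crate has to be vetted.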
And here's the same logic in Go, whose simplicity and built-in concurrency make it a natural fit for data ingestion pipelines:
```go
// main.go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

const MagicNumber uint32 = 0xDEADBEEF

type SensorReading struct {
	VibrationHz        float32
	TemperatureCelsius float32
}

func parseSensorPacket(data []byte) (SensorReading, error) {
	var reading SensorReading
	if len(data) != 12 {
		return reading, fmt.Errorf("invalid packet length: got %d, want 12", len(data))
	}

	reader := bytes.NewReader(data)

	var magic uint32
	if err := binary.Read(reader, binary.LittleEndian, &magic); err != nil {
		return reading, fmt.Errorf("failed to read magic number: %w", err)
	}
	if magic != MagicNumber {
		return reading, fmt.Errorf("invalid magic number: got %x", magic)
	}

	if err := binary.Read(reader, binary.LittleEndian, &reading.VibrationHz); err != nil {
		return reading, fmt.Errorf("failed to read vibration: %w", err)
	}
	if err := binary.Read(reader, binary.LittleEndian, &reading.TemperatureCelsius); err != nil {
		return reading, fmt.Errorf("failed to read temperature: %w", err)
	}

	return reading, nil
}

func main() {
	// A valid packet from the sensor
	rawData := []byte{0xef, 0xbe, 0xad, 0xde, 0x33, 0x33, 0xf3, 0x42, 0xcd, 0xcc, 0x4c, 0x42}
	if reading, err := parseSensorPacket(rawData); err != nil {
		fmt.Printf("Failed to parse packet: %v\n", err)
	} else {
		fmt.Printf("Successfully parsed sensor data: %+v\n", reading)
	}

	// A corrupted packet: the magic number's high byte is wrong
	corruptedData := []byte{0xef, 0xbe, 0xad, 0x00, 0x33, 0x33, 0xf3, 0x42, 0xcd, 0xcc, 0x4c, 0x42}
	if reading, err := parseSensorPacket(corruptedData); err != nil {
		fmt.Printf("Failed to parse packet: %v\n", err)
	} else {
		fmt.Printf("Successfully parsed sensor data: %+v\n", reading)
	}
}
```
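Neither parser does anything until packets actually reach it. Here is a sketch in Rust of the last mile: an ingestion step that receives one UDP datagram and hands its payload to the parser. The loopback pair in `main` is a stand-in for a real sensor on the network; the 64-byte buffer and the socket addresses are illustrative assumptions:

```rust
// Hypothetical ingestion step: the sensor broadcasts its 12-byte packets
// over UDP; we receive one datagram at a time and try to parse it.
use std::net::UdpSocket;

const MAGIC_NUMBER: u32 = 0xDEADBEEF;

#[derive(Debug)]
struct SensorReading {
    vibration_hz: f32,
    temperature_celsius: f32,
}

// Same layout as above, parsed with the standard library only.
fn parse_sensor_packet(data: &[u8]) -> Option<SensorReading> {
    if data.len() != 12 || u32::from_le_bytes(data[0..4].try_into().ok()?) != MAGIC_NUMBER {
        return None;
    }
    Some(SensorReading {
        vibration_hz: f32::from_le_bytes(data[4..8].try_into().ok()?),
        temperature_celsius: f32::from_le_bytes(data[8..12].try_into().ok()?),
    })
}

/// Block until one datagram arrives, then attempt to parse its payload.
fn recv_one(socket: &UdpSocket) -> std::io::Result<Option<SensorReading>> {
    // Larger than any valid packet, so oversize frames fail the length check.
    let mut buf = [0u8; 64];
    let (len, _src) = socket.recv_from(&mut buf)?;
    Ok(parse_sensor_packet(&buf[..len]))
}

fn main() -> std::io::Result<()> {
    // Loopback stand-in for a real sensor: one socket sends, one receives.
    let receiver = UdpSocket::bind("127.0.0.1:0")?;
    let sender = UdpSocket::bind("127.0.0.1:0")?;
    let packet: [u8; 12] = [
        0xef, 0xbe, 0xad, 0xde, 0x33, 0x33, 0xf3, 0x42, 0xcd, 0xcc, 0x4c, 0x42,
    ];
    sender.send_to(&packet, receiver.local_addr()?)?;
    println!("{:?}", recv_one(&receiver)?);
    Ok(())
}
```

A production loop would run `recv_one` forever, fan readings out to storage and inference, and cope with reordering and loss, but the shape is the same: bytes in, validated structs out.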
This code, which handles byte order, validates magic numbers, and checks lengths, is the connective tissue between the physical world and the world of AI. It's not glamorous. It won't get you a paper at NeurIPS. But without it, your trillion-parameter model is just a blind philosopher.
The Future Streams from Connected Matter
The endgame isn't more sophisticated ways to re-process our digital exhaust. Someday, the most valuable training data won’t come from scraped text—it will stream from connected matter.
When that happens, the companies that own the next wave of AI won't be the ones with the slickest training loop. They will be the ones who have mastered the messy, unglamorous art of interfacing with the physical world. The ones who can deploy, maintain, and calibrate a thousand sensors as easily as we spin up a thousand cloud instances.
We're building incredible engines, but we've neglected the supply chain. It's time to start paying attention to the forgotten half of the stack, because the companies closest to physical reality will be the closest to the truth. And in the long run, the truth always wins.