Intelligence
at the Source

We build custom on-device AI and embedded machine learning solutions — from TinyML on microcontrollers to full edge AI deployment on NVIDIA Jetson. Ultra-low latency, zero cloud dependency, and complete data sovereignty by design.

38 KB: smallest production model deployed (anomaly detection, MCU)
94%: accuracy on STM32F4 with 192 KB RAM after proper input normalization
8: edge AI verticals covered — from industrial PDM to military-grade tactical AI
5×: OTA update size reduction via differential model updates on constrained links
Edge vs Cloud

When Edge AI is the Right Choice

Most clients arrive with "cloud vs edge" already decided. Our job is to understand what actually concerns them — latency, privacy, cost, or reliability — and engineer the right answer for that specific constraint.

Choose Edge AI when…

  • Latency under 10–20 ms is non-negotiable (safety systems, real-time control)
  • Data cannot leave the device — medical records, industrial trade secrets, GDPR constraints
  • 500+ devices making continuous inferences — cloud bills become painful within 6–12 months
  • Offline-first environments: field inspectors, remote industrial sites, tactical deployments
  • Specialized accuracy beats general-purpose cloud — your sensor, your environment, your model

Consider Cloud when…

  • Fewer than 10 devices with infrequent inference — cloud is cheaper and simpler
  • 500 ms response time is acceptable for the use case
  • Reliable connectivity is always available and data egress is permitted
  • Rapid model iteration is more important than on-device deployment cost

Our rule: If the problem can be solved without ML — solve it without ML. A door sensor costs $2. A camera with a CV model costs 50× more to build and maintain. We tell clients this even when it means a smaller engagement.

What We Build

Edge AI Service Verticals

Eight edge AI service verticals — each with a defined hardware stack, sensor configuration, neural network architecture, and edge MLOps pipeline. From industrial IoT to enterprise-grade systems, our edge AI development and consulting covers the full lifecycle.

Computer Vision & Smart Video Analytics

Real-time object, face, and license plate recognition on edge hardware. Streams processed locally — no video ever leaves the premises.

YOLO v8/v9/v10 · TensorRT · DeepStream · OpenVINO · NVIDIA Jetson

Predictive Maintenance (PDM)

Vibration, acoustic, and thermal anomaly detection on industrial equipment. Deployed on MCUs — runs entirely within the machine, no internet required.

TinyML / Edge Impulse · STM32 / ESP32 · Autoencoder · 1D CNN · MQTT / AWS IoT Edge

Safety & Non-Compliance Detection

PPE detection (helmets, vests), restricted zone violations, fall detection, and behavioral analysis — integrated directly with VMS platforms.

OpenPose / MediaPipe · DeepSORT · Milestone / Genetec API · Smart Camera NPUs

Healthcare & Wearables

On-device ECG analysis, SpO2 monitoring, and portable medical imaging — built for ARM Cortex-M and Apple/Qualcomm wearable SoCs. All inference stays on the device.

LSTM · 1D CNN · U-Net · Core ML / Android NNAPI · TF Lite · BLE Gateways

Smart Retail & Cashier-Free Stores

Customer tracking, shelf product recognition, and loss prevention via sensor fusion (cameras + weight sensors + RFID) on in-store edge servers.

Vision Transformers · Re-ID (privacy-safe) · Sensor Fusion · Kafka / CUDA C++

Precision Agriculture & Drone AI

Crop analysis from agri-drones, autonomous tractor guidance, and precision spraying — running on onboard computers in offline field conditions.

Mask R-CNN · NDVI · ROS / MAVLink · Hailo-8 / RPi 5 · LoRaWAN

Tactical Edge AI & Defense

Autonomous drone navigation, EW signal analysis, and sensor fusion for targeting systems on MIL-STD-810G ruggedized hardware with FPGA acceleration.

FPGA (Xilinx / Altera) · Sensor Fusion · RL · MANET / ATAK · C++ / CUDA

On-Device Voice & Local NLP

Offline speech transcription, translation, and voice assistants using Whisper-family models optimized for edge NPUs and DSPs — no API calls, no data leaving the device.

Whisper.cpp · RNN-T / Conformer · Vosk / Kaldi / ONNX · Apple Neural Engine

Hardware Stack

Edge AI Hardware Expertise: From MCUs to Multi-GPU Edge Servers

From microcontrollers with kilobytes of RAM to multi-GPU edge servers — we select and validate hardware based on your actual deployment constraints, not what looks good in a demo.

Microcontrollers (TinyML)
STM32F4 (192 KB RAM) · STM32F103 (20 KB RAM) · ESP32 · ARM Cortex-M4/M7

Use case: anomaly detection, vibration classification, simple audio events. int8 quantized models at 38–200 KB.

Single-Board Computers
Raspberry Pi 4 / 5 · Google Coral Edge TPU · Intel Movidius VPU · Hailo-8 Accelerator

Use case: computer vision pipelines, multi-sensor processing. OpenVINO + TF Lite optimized.

NVIDIA Jetson Platform
Jetson Nano · Jetson Xavier NX · Jetson Orin Nano · Jetson Orin

Use case: real-time video analytics, on-device LLMs (15–18 tok/s on Orin Nano), DeepStream pipelines.

FPGA & Specialized Silicon
Xilinx / AMD FPGA · Intel Altera · Apple M1/M2 (edge server) · Qualcomm Snapdragon Wear

Use case: ultra-low latency, hardware security, on-device LLM office/clinic edge servers, defense applications.

Field truth: real hardware never behaves as well as its documentation says. We've written direct register code when HAL libraries added more latency than the inference itself. We've dealt with thermal throttling at 60% on Pi 4. We know what surprises to expect — and how to solve them.

Why WebbyLab

Real Experience. Real Numbers. Honest Thinking.

Edge AI development spans embedded systems engineering, on-device machine learning, edge software, connectivity layers, and MLOps. Finding a single team with deep expertise across all of these is rare. We are that team — with custom model optimization, full-stack edge software, wireless connectivity integration, and deployment experience across 100,000+ field devices.

Hardware-First Thinking

We start from hardware constraints, then design the model — not the other way around. The device is a constant, not a variable. Decisions made before manufacturing are permanent.

Production, Not Demos

We've shipped models as small as 38 KB to MCUs in industrial environments. We've dealt with flash wear at 12 months, thermal calibration drift at −15°C, and OTA failures mid-update. These aren't hypotheticals.

Quantization Expertise

Float32 → int8 gives a 4× size reduction immediately. We use quantization-aware training by default — it typically closes the 2–3% accuracy gap that post-training quantization leaves behind. We always show per-class accuracy, never a single misleading aggregate.
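The affine (scale, zero-point) scheme behind that 4× reduction can be sketched in a few lines. This is a hypothetical pure-Python illustration of the idea; real pipelines use the framework converter (e.g. TF Lite's), not hand-rolled code.

```python
# Illustrative affine int8 quantization: each float32 weight (4 bytes)
# becomes one int8 value (1 byte) plus a shared scale and zero point.

def quantize_int8(weights):
    """Map float weights to int8 with an affine (scale, zero_point) pair."""
    lo, hi = min(weights), max(weights)
    lo, hi = min(lo, 0.0), max(hi, 0.0)       # representable range must contain zero
    scale = (hi - lo) / 255.0 or 1.0          # guard against constant tensors
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    return [(v - zero_point) * scale for v in q]

weights = [-0.42, 0.0, 0.13, 0.37, 0.91]
q, s, zp = quantize_int8(weights)
restored = dequantize_int8(q, s, zp)
# Round-trip error stays within one quantization step.
assert all(abs(a - b) <= s for a, b in zip(weights, restored))
```

Quantization-aware training simulates exactly this rounding during the forward pass, so the network learns weights that survive it.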

Edge MLOps Built In

Differential OTA updates (5–6× smaller payload), dual-slot flash with atomic rollback, confidence distribution monitoring, hardware-profile-aware model routing. Not add-ons — built in from day one.

Security as Standard

We've extracted unencrypted models from production devices in under 20 minutes to demonstrate the risk to clients. Model encryption, secure boot, disabled debug interfaces (JTAG/SWD), and signed OTA are standard in every deployment.

Honest Scope Assessment

We've talked clients out of edge AI when it wasn't the right answer. We've reframed "impossible" requirements by separating the solution the client proposed from the actual problem they needed to solve.

POC Insights

Hard Problems We Solved in Proof-of-Concept Work

We haven't shipped millions of edge AI devices — but we've run the POCs where the real engineering questions get answered. These are the technical traps we found, and how we got through them.

01 Industrial · STM32
20KB RAM available

Digital Signature on STM32F103 — Every Existing Library Hardfaulted

Challenge

Client needed a DSTU 4145-compliant digital signature (elliptic curves) directly on-device. Every available implementation assumed desktop-class memory — 256-bit point coordinates required multiple large simultaneous buffers. First run: immediate hardfault on scalar multiplication.

What we found

Mapped the full double-and-add dependency graph manually. Reduced live buffers to 3 × 32 bytes at any point. Rewrote GOST 34.311 hashing as a streaming block processor. Zero dynamic allocation. Signing took 4 seconds — acceptable for the field inspector workflow.
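The streaming rewrite above follows a standard pattern: absorb input block by block into a small fixed state instead of buffering the whole message. The sketch below is NOT GOST 34.311 — it uses a trivial XOR fold as a stand-in compression step — but it shows the constant-memory shape of the solution.

```python
# Generic streaming block processor: one 32-byte state plus at most one
# partial block in memory, regardless of total input size.

BLOCK = 32

class StreamingHasher:
    def __init__(self):
        self.state = bytearray(BLOCK)   # fixed-size running state
        self.pending = bytearray()      # at most BLOCK-1 buffered bytes

    def update(self, chunk: bytes):
        self.pending += chunk
        while len(self.pending) >= BLOCK:
            block, self.pending = self.pending[:BLOCK], self.pending[BLOCK:]
            for i in range(BLOCK):      # "compress" one block into the state
                self.state[i] ^= block[i]

    def digest(self) -> bytes:
        if self.pending:                # zero-pad and absorb the tail
            self.update(bytes(BLOCK - len(self.pending)))
        return bytes(self.state)

# Chunk boundaries don't matter: byte-by-byte feeding matches one big call.
h1, h2 = StreamingHasher(), StreamingHasher()
h1.update(b"sensor-frame-" * 10)
for b in b"sensor-frame-" * 10:
    h2.update(bytes([b]))
assert h1.digest() == h2.digest()
```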

02 Manufacturing · Audio
93% bench accuracy (broke in winter)

int8 Audio Classifier That Passed All Benchmarks — Then Failed When the Factory Got Cold

Challenge

Industrial sound classifier performed at 93% post-int8 in benchmarks. One week in the field: perfect. Then winter. An unheated factory floor shifted the microphone's frequency response. The quantized model began missing events and generating false alarms. Float32 had absorbed the shift; int8 didn't.

What we found

int8 activation clamping in early layers was too tight to handle thermal-induced sensor drift. Fix: collected cold-condition data, retrained with temperature augmentation, explicitly widened clamp range in layers 1–3, added a confidence-gate confirmation pass. The real world is always messier than the test set.
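The confidence-gate confirmation pass can be sketched as a simple debounce: an event is reported only after several consecutive inferences agree above a threshold. This is a hypothetical illustration of the mechanism; thresholds and window length are made up.

```python
# Hypothetical confidence gate: suppress one-off spikes by requiring
# N consecutive high-confidence agreements before confirming an event.

def confidence_gate(stream, threshold=0.8, consecutive=3):
    """stream: iterable of (class_id, confidence). Yields confirmed class_ids."""
    run_class, run_len = None, 0
    for cls, conf in stream:
        if conf >= threshold and cls == run_class:
            run_len += 1
        elif conf >= threshold:
            run_class, run_len = cls, 1   # new candidate run
        else:
            run_class, run_len = None, 0  # low confidence breaks the run
        if run_len == consecutive:
            yield cls

readings = [("bearing_fault", 0.91), ("bearing_fault", 0.55),   # spike, run broken
            ("bearing_fault", 0.88), ("bearing_fault", 0.86), ("bearing_fault", 0.90)]
assert list(confidence_gate(readings)) == ["bearing_fault"]
```

The cost is latency (N inference cycles before an alarm), which is usually acceptable for slowly developing mechanical faults.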

03 Industrial · Vision
89% aggregate (94–95% on critical classes)

89% Accuracy the Client Rejected — Until We Showed the Per-Class Breakdown

Challenge

5-class defect detection POC. Float32 on server: 96%. Int8 on STM32: 89%. Client said "not acceptable" before seeing the details. Aggregate accuracy was the only number on the table.

What we found

Per-class confusion matrix told a different story: the two critical defect classes held at 94–95%. Accuracy dropped on three minor classes — where parts are re-routed to manual inspection anyway. Three meetings, one detailed breakdown, client approved. Aggregate metrics mislead: never present a single number without the per-class breakdown.
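The arithmetic behind that story is worth making concrete. With toy numbers (not the client's data), the same predictions can produce a mediocre aggregate while the critical class stays perfect:

```python
# Per-class accuracy vs a single aggregate number, on illustrative data.
from collections import defaultdict

def per_class_accuracy(y_true, y_pred):
    hit, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hit[t] += (t == p)
    return {c: hit[c] / total[c] for c in total}

# Critical class "crack" holds up; minor class "scratch" degrades.
y_true = ["crack"] * 10 + ["scratch"] * 10
y_pred = ["crack"] * 10 + ["scratch"] * 6 + ["crack"] * 4

aggregate = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
breakdown = per_class_accuracy(y_true, y_pred)
assert aggregate == 0.8              # looks mediocre on its own
assert breakdown["crack"] == 1.0     # but the critical class is perfect
assert breakdown["scratch"] == 0.6
```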

04 Retail · Architecture
120 stores — why we proposed hybrid

We Talked a Client Out of Full Edge — and Scoped a Hybrid That Scaled to 120 Stores

Challenge

Retail chain wanted full on-device footfall analytics across 50 stores for privacy. They had reliable corporate internet everywhere. Full edge per-device meant significant capex, multi-version OTA infrastructure, and ongoing firmware management at scale.

What we found

Privacy concern was the real driver, not a technical constraint. Proposed hybrid: video stays local, processed on one in-store server, only aggregated counts go to cloud. Client scaled to 120 stores six months later — the per-device approach would have been unmanageable at that size.

Edge MLOps

Edge MLOps: Drift Detection, OTA Updates & Model Management at Scale

80% of edge AI projects focus on model accuracy, then neglect the infrastructure that keeps models alive in the field. We build the 20% that makes production actually work.

01

Telemetry & Confidence Monitoring

Devices send compact hourly packets: model version, inference count, confidence score distribution, class distribution, and raw sample fingerprints. A confidence drop of 10–15% from baseline over a week is a drift signal — not an alarm, a flag for review.
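The drift signal described above reduces to one comparison. A minimal sketch (threshold and data are illustrative, not from a real deployment):

```python
# Hypothetical drift flag: compare the week's mean confidence against a
# stored baseline; a relative drop past the threshold raises a review
# flag, not an automatic rollback.

def drift_flag(baseline_mean, weekly_confidences, drop_threshold=0.10):
    weekly_mean = sum(weekly_confidences) / len(weekly_confidences)
    drop = (baseline_mean - weekly_mean) / baseline_mean
    return drop >= drop_threshold, drop

flagged, drop = drift_flag(0.90, [0.78, 0.80, 0.76, 0.79])
assert flagged                      # ~13% relative drop: flag for review
```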

02

Atomic Dual-Slot OTA

New model writes to the inactive flash slot. Hash verified. Pointer atomically flipped. Power loss during the 2-minute critical window? Device reboots on the old model. Differential updates reduce payload size 5–6× — critical on NB-IoT or LoRa where 200 KB takes hours.
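The invariant that makes this scheme safe — write, verify, then flip the pointer as the very last step — can be modeled in a few lines. This is an in-memory sketch of the logic, not firmware code:

```python
# Simplified dual-slot OTA model: the active slot is never written, and
# the boot pointer only flips after the new image's hash verifies.
import hashlib

class DualSlotDevice:
    def __init__(self, initial_image: bytes):
        self.slots = [initial_image, b""]
        self.active = 0                      # boot pointer

    def ota_update(self, image: bytes, expected_sha256: str) -> bool:
        inactive = 1 - self.active
        self.slots[inactive] = image         # write only to the inactive slot
        if hashlib.sha256(self.slots[inactive]).hexdigest() != expected_sha256:
            return False                     # corrupt download: keep old model
        self.active = inactive               # atomic flip, last operation
        return True

dev = DualSlotDevice(b"model-v1")
good = b"model-v2"
# A truncated payload stands in for power loss mid-transfer:
assert not dev.ota_update(good[:4], hashlib.sha256(good).hexdigest())
assert dev.slots[dev.active] == b"model-v1"  # device still boots the old model
assert dev.ota_update(good, hashlib.sha256(good).hexdigest())
assert dev.slots[dev.active] == b"model-v2"
```

On real hardware the "pointer flip" is a single flash word or boot-config register write, which is what makes it effectively atomic.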

03

Hardware-Profile-Aware Routing

When a supplier changes the accelerometer mid-production run, you have two hardware revisions with different noise profiles in the "same" device. Each unit runs a self-test at boot, sends a hardware fingerprint to the OTA server, and receives the model trained for its actual silicon — not the generic one.
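Server-side, this routing is essentially a lookup keyed on the reported fingerprint with a generic fallback. The fingerprint format and model file names below are hypothetical:

```python
# Hypothetical hardware-profile routing table on the OTA server.
MODEL_REGISTRY = {
    "accel=LIS3DH;rev=A": "vibration_net_revA_int8.tflite",
    "accel=ADXL345;rev=B": "vibration_net_revB_int8.tflite",
}
GENERIC_MODEL = "vibration_net_generic_int8.tflite"

def route_model(fingerprint: str) -> str:
    """Return the artifact trained for this silicon, or the generic build."""
    return MODEL_REGISTRY.get(fingerprint, GENERIC_MODEL)

assert route_model("accel=ADXL345;rev=B") == "vibration_net_revB_int8.tflite"
assert route_model("accel=UNKNOWN;rev=C") == GENERIC_MODEL
```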

04

Safe Update Scheduling

An update that starts mid-measurement or with 15% battery destroys trust fast. Devices evaluate a readiness condition before accepting updates: idle state, battery above threshold, not in a critical measurement window, confirmed connectivity. The device decides — not the server.
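The readiness condition above is just a conjunction of device-local checks. A hypothetical device-side predicate (the 40% battery floor is an assumed value):

```python
# The device, not the server, decides whether to accept an update.
from dataclasses import dataclass

@dataclass
class DeviceState:
    idle: bool
    battery_pct: int
    in_measurement_window: bool
    link_up: bool

def ready_for_update(s: DeviceState, min_battery: int = 40) -> bool:
    return (s.idle
            and s.battery_pct >= min_battery
            and not s.in_measurement_window
            and s.link_up)

assert not ready_for_update(DeviceState(True, 15, False, True))   # battery too low
assert ready_for_update(DeviceState(True, 80, False, True))
```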

On-Device LLMs

On-Device LLMs & Small Language Models for Enterprise Edge AI

On-device LLMs are no longer a research prototype. We've validated the real performance numbers — and the real enterprise use cases where businesses actually pay.

Validated Hardware & Performance

Hardware               | Model            | Speed        | Notes
Raspberry Pi 5, 8 GB   | Phi-2 Q4 (GGUF)  | 3–4 tok/s    | Structured output, offline docs
Jetson Orin Nano       | Gemma 2B         | 15–18 tok/s  | GPU-accelerated, interactive
Apple M1/M2 Mac Mini   | 7B+ models       | 20+ tok/s    | Office/clinic edge server
STM32 / MCU            | Any LLM          | Not feasible | Physically impossible at this scale

⚠ Model load time on weak hardware: 20–30 seconds. For most use cases the model must stay resident in memory permanently.

Enterprise Use Cases That Actually Get Budget

Offline Field Documentation

Inspectors and field engineers structure reports, fill forms, and extract data points from observations — with no internet. No sensitive operational data leaves the device. Companies pay for zero-egress assurance.

Local RAG on Corporate Documents

Companies that won't allow internal document content near OpenAI or Google infrastructure. On-premise Q&A with no internet dependency. Already being sold to enterprises in legal, finance, and pharma.
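The retrieval half of such a pipeline can run entirely offline. Below is a deliberately minimal sketch ranking documents by bag-of-words cosine similarity; production systems would use a locally hosted embedding model instead of word counts, and the document names are made up:

```python
# Minimal offline retrieval for a local RAG sketch: no network, no APIs.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: dict) -> str:
    """Return the name of the best-matching on-premise document."""
    qv = Counter(query.lower().split())
    return max(docs, key=lambda name: cosine(qv, Counter(docs[name].lower().split())))

docs = {
    "hr_policy.txt": "vacation days leave policy for employees",
    "pump_manual.txt": "pump bearing vibration limits and maintenance schedule",
}
assert retrieve("what are the vibration limits of the pump", docs) == "pump_manual.txt"
```

The retrieved passage is then fed as context to the local LLM, so document content never crosses the network boundary.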

Industrial Equipment Assistants

An operator beside a machine queries a model that knows only that equipment, its documentation, and its failure modes. Replaces a thick manual nobody reads. Works without factory Wi-Fi.

What unites all three: clients aren't paying for AI as such — they're paying for zero data-leakage risk and internet-independent operation.

Security

Edge AI Security Risks Your Team Is Not Thinking About

We've demonstrated these attacks to clients on their own hardware. This is not theoretical.

1

Model Theft via JTAG

An unencrypted model in flash can be physically extracted in under 20 minutes with a standard JTAG adapter. Your model is your IP. We encrypt models at rest, verify integrity at boot, and treat model extraction as a primary attack vector.
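A boot-time integrity check is the minimum counter to this attack. The sketch below uses a symmetric HMAC tag for brevity; a real deployment would pair at-rest encryption with an asymmetric signature (e.g. Ed25519) whose public key lives in secure boot storage. The key below is a placeholder:

```python
# Hypothetical boot-time model integrity check: verify the blob's HMAC
# tag before loading, so a patched or swapped model is rejected.
import hmac, hashlib

DEVICE_KEY = b"per-device-key-from-secure-element"   # placeholder

def seal_model(model_blob: bytes) -> bytes:
    return hmac.new(DEVICE_KEY, model_blob, hashlib.sha256).digest()

def verify_at_boot(model_blob: bytes, tag: bytes) -> bool:
    expected = hmac.new(DEVICE_KEY, model_blob, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)          # constant-time compare

blob = b"int8-weights..."
tag = seal_model(blob)
assert verify_at_boot(blob, tag)
assert not verify_at_boot(blob + b"tampered", tag)     # modification detected
```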

2

Adversarial Physical Inputs

In production, we observed operators learning the behavioral pattern that produced "green" results faster — effectively performing an adversarial attack on the model by shifting the input distribution through their own actions. Users are part of distribution shift.

3

Firmware Supply Chain

Who signs the build? How is it verified that exactly the intended firmware lands on the device? Most teams have no answer. Build signing, device-side verification, and secure boot are non-negotiable baseline hygiene — not optional extras.

4

Debug Interfaces in Production

We have seen UART, JTAG, and SWD debug interfaces left open on production devices "because it's easier for debugging." This is equivalent to an SSH server with root/root exposed externally. We audit and close all debug interfaces in production builds.

FAQ

What Clients Ask — and What We Actually Think

"Isn't an edge model less accurate than a big cloud model?"

This is the most common misconception — and it's simply wrong in narrow-domain contexts. A cloud model is generalized, trained on millions of diverse scenarios. An edge model can be trained exclusively on data from your sensor, in your facility, under your lighting conditions. We have achieved better accuracy on edge than clients had with cloud solutions. Specialization beats generalization in a constrained context.

"Isn't edge AI simpler to operate, since there's no cloud infrastructure to manage?"

The opposite is true. Cloud infrastructure is managed by someone else. Edge infrastructure is your device, at your client's site, often without internet, sometimes in harsh environments. Edge AI is harder to deploy, harder to update, harder to debug, and harder to monitor than cloud AI. Anyone telling you otherwise has not shipped edge AI to production at scale.

"We already have a trained model. Isn't deploying it to the device straightforward?"

Between a PyTorch model and working inference on an MCU there is a separate project with its own risks: quantization (int8 or lower), architecture redesign if the model doesn't fit, ONNX export, framework-specific optimization (TensorRT / OpenVINO / TF Lite Micro), driver-level integration, and input normalization on the device. The ML part is often the easiest part of this project.

"Doesn't on-device inference automatically guarantee privacy?"

No. If the device can be physically stolen, the firmware is unprotected, and the model is stored unencrypted — there is no more privacy than cloud. The attack vector is just different: physical access instead of network access. On-device inference reduces network data exposure but does not eliminate privacy risk. Security engineering is still required.

"At what scale does edge become cheaper than cloud?"

Our heuristic: above 500–1,000 inferences per device per day running continuously, cloud costs typically become painful within 6–12 months. But this depends heavily on model size — a lightweight classifier in cloud costs almost nothing; a vision model scales differently. The right question to ask: what is your projected cloud inference cost, annualized? If that number exceeds the edge hardware cost, the conversation becomes concrete.
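That break-even question is simple arithmetic once you plug in your own numbers. A back-of-envelope sketch (all prices below are illustrative assumptions, not quotes):

```python
# Annualized cloud inference spend vs one-time edge hardware cost.

def annual_cloud_cost(devices, inferences_per_day, cost_per_inference):
    return devices * inferences_per_day * 365 * cost_per_inference

def edge_breaks_even(devices, inferences_per_day, cost_per_inference,
                     edge_hw_cost_per_device, years=1):
    cloud = annual_cloud_cost(devices, inferences_per_day, cost_per_inference) * years
    return cloud > devices * edge_hw_cost_per_device

# 500 devices at 1,000 inferences/day, an assumed $0.0004 per inference,
# vs an assumed $120 edge board per device:
assert edge_breaks_even(500, 1000, 0.0004, 120.0)   # ~$73k/yr cloud vs $60k hardware
```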

"Won't ubiquitous 5G coverage make edge AI obsolete?"

No. There are entire classes of applications where edge is required not due to lack of connectivity but because of latency (safety systems need sub-20 ms), privacy (medical data sovereignty laws), or reliability (a network hiccup cannot stop a production line). These requirements do not disappear with better 5G coverage. Edge AI is a permanent architectural category, not a workaround.

Get in Touch

Start Your Custom Edge AI Development Project

Whether you're evaluating Edge vs Cloud, scoping an MCU deployment, or need a second opinion on your current architecture — we're happy to have a technical conversation first.

London, UK · Kyiv, Ukraine

    Name*

    Email*

    Phone number

    Company

    Project description*




    I give consent to the processing of my personal data given in the contact form above as well as receiving commercial and marketing communications under the

© 2026 WEBBYLAB LLC. All rights reserved.