Ethos-U55

The Arm Ethos-U55 is a configurable, area- and power-efficient microNPU (Neural Processing Unit) accelerator for machine learning inference in embedded and IoT endpoint devices.

What is Ethos-U55?

The Arm Ethos-U55 is a micro Neural Processing Unit (microNPU) designed as a configurable, area- and power-efficient accelerator for machine learning inference in embedded and IoT endpoint devices using Cortex-M CPUs.

The Ethos-U55 offloads machine learning inference workloads from the main Cortex-M CPU in deeply embedded and IoT devices and runs them far faster than the CPU could alone. It is a configurable, licensable intellectual property (IP) block that silicon vendors integrate into their system-on-chip (SoC) designs to deliver high performance per watt for convolutional neural networks within strict area and power budgets.

Deploying models to the Ethos-U55 requires an offline compiler toolchain that translates standard neural network formats into optimized, executable code for the accelerator. Arm's open-source Vela compiler fills this role for quantized TensorFlow Lite models, and silicon vendors typically wrap it in their own SDKs. This hardware-software co-design is critical for TinyML deployment: it lets complex vision and audio models run in real time on microcontrollers where software-only execution would be infeasible due to memory and latency constraints.
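
As a rough illustration of the compile step, the sketch below builds a command line for Arm's Vela compiler (installable as `ethos-u-vela` from PyPI). The model file name, output directory, and the choice of the `ethos-u55-128` configuration are assumptions made for the example; the helper function `vela_command` is hypothetical and only assembles arguments, it does not run the compiler.

```python
import shlex


def vela_command(model_path, accelerator="ethos-u55-128", output_dir="vela_out"):
    """Build the argument list for compiling a quantized .tflite model with Vela.

    The default accelerator and output directory are illustrative assumptions.
    """
    return [
        "vela",
        model_path,
        "--accelerator-config", accelerator,  # which Ethos-U MAC configuration to target
        "--output-dir", output_dir,           # where Vela writes the optimized model
    ]


cmd = vela_command("model_int8.tflite")  # hypothetical model file name
print(shlex.join(cmd))
# To actually compile (requires Vela on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

The input model is expected to be fully quantized before this step. Real projects usually also pass a system configuration and memory mode that match the target SoC, which are omitted here.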

Key Architectural Features

01

MicroNPU Core Architecture

The Ethos-U55 is a dedicated neural processing unit architected from the ground up for constrained devices. Its core is a parallel compute engine built around an array of processing elements (PEs) that perform multiply-accumulate (MAC) operations. Key features include:

  • Configurable PE Count: The size of the PE array, expressed as MACs per cycle, is scalable (32, 64, 128, or 256) to balance performance, silicon area, and power for the target application.
  • Weight Streaming: The architecture employs a weight decoder that streams compressed weights directly from external memory (e.g., Flash) into the PE array, minimizing the need for large on-chip SRAM to hold the entire model.
  • Activation Buffering: Intermediate activation tensors are kept in small on-chip SRAM buffers, reducing costly main memory accesses.
02

Memory System & Dataflow

Efficient data movement is critical for low-power operation. The Ethos-U55 uses a hierarchical memory system and a weight-stationary dataflow.

  • Hierarchical Memory: Comprises small, fast buffers close to the PEs, a larger shared SRAM, and interfaces to external Flash and RAM. This minimizes energy-intensive accesses to main memory.
  • Weight-Stationary Flow: Weights are loaded into the PE array once and held stationary, while activation data streams through. This maximizes data reuse and reduces power consumption compared to activation-stationary or output-stationary flows.
  • Direct Memory Access (DMA): Dedicated DMA engines manage data transfers between memory hierarchies, operating in parallel with computation to hide latency.
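
To make the latency-hiding claim concrete, here is a minimal timing model, not a description of the actual hardware scheduler. It compares running each tile's transfer and compute back to back against a double-buffered schedule in which the transfer for the next tile overlaps the current tile's compute. The tile count and cycle figures are invented for illustration.

```python
def serial_cycles(tiles):
    """Total cycles when each tile's DMA transfer and compute run back to back."""
    return sum(dma + compute for dma, compute in tiles)


def overlapped_cycles(tiles):
    """Total cycles with double buffering.

    The transfer for tile i+1 runs while tile i computes, so each step costs
    the longer of the two. The first transfer cannot be hidden.
    """
    if not tiles:
        return 0
    total = tiles[0][0]
    for i, (_, compute) in enumerate(tiles):
        next_dma = tiles[i + 1][0] if i + 1 < len(tiles) else 0
        total += max(compute, next_dma)
    return total


# Four tiles, each needing 100 cycles of DMA and 300 cycles of compute (assumed numbers).
tiles = [(100, 300)] * 4
print(serial_cycles(tiles), overlapped_cycles(tiles))  # prints "1600 1300"
```

When compute dominates, as in this example, only the first transfer remains visible. If transfers took longer than compute, the schedule would become bandwidth-bound and the overlap would hide the compute instead.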
03

Supported Operations & Precision

The accelerator is optimized for the most common layers and data types used in efficient neural networks for edge sensing.

  • Layer Support: Natively accelerates convolutions, fully connected layers, transposed convolutions, pooling (max/average), and element-wise operations (such as add and mul).
  • Data Types: Primarily optimized for int8 (8-bit integer) and int16 arithmetic, which provide a favorable accuracy-to-efficiency trade-off for microcontroller deployments. Support for asymmetric quantization is integral.
  • Activation Functions: Hardware-accelerated support for common non-linearities like ReLU, ReLU6, and sigmoid/tanh (via lookup tables).
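
The following sketch shows what asymmetric int8 quantization and a lookup-table activation look like in plain Python. It illustrates the general technique rather than the accelerator's internal implementation. The scales and zero points are assumptions chosen for the example; the output parameters (scale 1/256, zero point -128) map the sigmoid's range onto the int8 range.

```python
import math


def quantize(x, scale, zero_point):
    """Map a real value to int8 using an asymmetric (scale, zero_point) pair."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))  # saturate to the int8 range


def dequantize(q, scale, zero_point):
    """Recover the approximate real value represented by an int8 code."""
    return (q - zero_point) * scale


def build_sigmoid_lut(in_scale, in_zp, out_scale, out_zp):
    """Precompute sigmoid for all 256 int8 inputs, so inference is a table lookup."""
    lut = []
    for q in range(-128, 128):
        y = 1.0 / (1.0 + math.exp(-dequantize(q, in_scale, in_zp)))
        lut.append(quantize(y, out_scale, out_zp))
    return lut


in_scale, in_zp = 0.05, 0          # assumed input quantization
out_scale, out_zp = 1 / 256, -128  # assumed output quantization covering [0, 1)

lut = build_sigmoid_lut(in_scale, in_zp, out_scale, out_zp)
q_in = quantize(0.0, in_scale, in_zp)
q_out = lut[q_in + 128]            # index is offset because int8 starts at -128
print(q_out, dequantize(q_out, out_scale, out_zp))  # prints "0 0.5"
```

Because an int8 input has only 256 possible values, the table is tiny, and the non-linearity costs one memory read per element instead of an exponential.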
04

Cortex-M System Integration

The Ethos-U55 is designed as a coherent accelerator tightly coupled with an Arm Cortex-M CPU, typically via an AMBA AXI system bus.

  • Programmer's Model: The Cortex-M core acts as the host processor, controlling the U55 via a simple register interface. It offloads entire subgraphs or layers for execution.
  • Shared Memory: The U55 operates on tensors located in system SRAM accessible to the CPU, enabling zero-copy data passing. Coherency is managed by the system.
  • Interrupt-Driven: The U55 signals task completion via an interrupt, allowing the CPU to sleep or perform other tasks during inference, maximizing system power efficiency.
06

Power & Performance Profiles

The architecture is designed for the extreme efficiency required in always-on, battery-powered endpoints.

  • Performance: Delivers peak throughput ranging from tens to hundreds of GOPS (giga-operations per second), depending on configuration and clock frequency, enabling real-time audio and vision tasks.
  • Power Efficiency: Achieves TOPS/W (Tera Operations Per Second per Watt) figures significantly higher than CPU-only execution, often in the range of 2-5 TOPS/W.
  • Area Efficiency: As a microNPU, it occupies a small silicon footprint (e.g., ~0.5 mm² for a 128-PE configuration in a 22nm process), making it viable for cost-sensitive MCU/SoC integration.
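
The relationship between configuration, clock frequency, and peak throughput can be worked through with simple arithmetic. The sketch below assumes the configuration number counts multiply-accumulates per cycle and that each one counts as two operations; the 500 MHz clock is an assumed example value, not a figure from a datasheet.

```python
def peak_gops(macs_per_cycle, clock_hz):
    """Peak throughput in GOPS, counting each multiply-accumulate as two operations."""
    return 2 * macs_per_cycle * clock_hz / 1e9


def power_mw(gops, tops_per_watt):
    """Power in milliwatts needed to sustain `gops` at a given TOPS/W efficiency.

    GOPS divided by TOPS/W yields milliwatts because 1 TOPS = 1000 GOPS.
    """
    return gops / tops_per_watt


for macs in (32, 64, 128, 256):
    gops = peak_gops(macs, 500e6)  # assumed 500 MHz clock
    print(f"{macs:>3} MACs/cycle: {gops:.0f} GOPS peak, {power_mw(gops, 2.0):.0f} mW at 2 TOPS/W")
```

These are peak figures. Sustained throughput on a real model is lower, since it depends on how well each layer fills the array and on memory bandwidth.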

How It Works: System Integration & Dataflow

The Ethos-U55 microNPU operates as a tightly coupled coprocessor to an Arm Cortex-M host CPU, offloading neural network inference workloads. It connects to the system bus through AMBA AXI interfaces, giving it direct access to model weights and input data stored in system memory. The host CPU initiates inference by writing control registers, which triggers the U55 to autonomously fetch and execute the compiled neural network graph. This hardware acceleration reduces CPU load and system power consumption compared to software-only execution on the Cortex-M core.

Data flows through a dedicated SRAM tensor buffer within the U55, minimizing costly external memory accesses. The accelerator's PE array performs the dense multiply-accumulate operations central to convolutional and fully connected layers. Activation functions such as ReLU are applied on the fly as results leave the array. The host is notified via interrupt upon completion, at which point the final output tensors are available in main memory. This pipelined dataflow is managed entirely by the U55's internal sequencer, requiring minimal CPU oversight after job submission.
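
The submit-then-wait pattern described above can be mimicked on a desktop with a small mock. This is purely a host-side illustration: `MockNpu`, its methods, and the list-of-functions "command stream" are hypothetical stand-ins, not the real driver API or command format. A thread plays the role of the accelerator and a `threading.Event` plays the role of the completion interrupt.

```python
import threading


class MockNpu:
    """Hypothetical stand-in for the accelerator; not the real driver interface."""

    def __init__(self):
        self.done = threading.Event()  # models the completion interrupt
        self.output = None             # models the output tensor in shared memory

    def start(self, command_stream, input_tensor):
        """Models the host writing control registers to launch a job."""
        self.done.clear()
        worker = threading.Thread(target=self._run, args=(command_stream, input_tensor))
        worker.start()
        return worker

    def _run(self, command_stream, input_tensor):
        data = input_tensor
        for op in command_stream:      # the sequencer walks the job without host help
            data = [op(v) for v in data]
        self.output = data
        self.done.set()                # raise the "interrupt"


def double(v):
    return 2 * v


def relu(v):
    return max(0, v)


npu = MockNpu()
worker = npu.start([double, relu], [-3, 1, 4])
# The host is free to sleep or do other work here.
npu.done.wait(timeout=1.0)
worker.join()
print(npu.output)  # prints "[0, 2, 8]"
```

On real hardware the wait would typically be a low-power sleep until the interrupt fires, which is where the system-level power saving comes from.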

Primary Use Cases & Applications

The Arm Ethos-U55 microNPU is designed to bring efficient machine learning inference to deeply embedded systems. Its primary applications are in power- and area-constrained endpoint devices where local intelligence is critical.

02

Anomaly Detection in Industrial IoT

Processes sensor data streams (vibration, acoustic, current) in real time to predict mechanical failures. The microNPU executes time-series classification or regression models directly on the sensor node, enabling predictive maintenance.

  • Example: Monitoring pump vibrations for bearing wear, detecting electrical anomalies in motors.
  • Benefit: Reduces unplanned downtime by enabling condition-based maintenance without constant cloud connectivity.
03

Visual Wake Words & People Detection

Brings computer vision to battery-powered cameras and sensors. The Ethos-U55 accelerates lightweight convolutional neural networks (CNNs) for tasks like person detection, object counting, or simple classification.

  • Example: A security camera that activates recording only when a person is detected, conserving energy and bandwidth.
  • Benefit: Enables privacy-preserving, reactive systems that filter data at the source.
04

Predictive Sensor Fusion

Combines data from multiple sensors (IMU, environmental) using small neural networks to infer complex states. The Ethos-U55's efficiency allows for running multi-input models that would be too heavy for a Cortex-M core alone.

  • Example: In a wearable, fusing accelerometer and gyroscope data for advanced activity recognition or fall detection.
  • Benefit: Creates more context-aware and intelligent sensing applications from simple sensor hardware.
05

Low-Power Biometric Authentication

Enables on-device biometric verification for enhanced security. The accelerator can run models for fingerprint matching or face recognition locally, ensuring biometric data never leaves the device.

  • Example: Access control systems, secure payment terminals, or personal lockboxes.
  • Benefit: Provides a high-security, low-latency authentication path that is resilient to network outages.
06

Smart Health & Wearable Monitoring

Processes physiological signals for real-time health insights. The Ethos-U55 can run models for heart rate variability analysis, sleep stage classification, or abnormal ECG detection directly on a wearable device.

  • Example: A smartwatch that provides real-time atrial fibrillation detection without sending raw ECG data to a phone.
  • Benefit: Maximizes user privacy and enables immediate, life-critical alerts.

Ethos-U55 vs. Software-Only Inference on Cortex-M

This table quantifies the performance, efficiency, and system impact of using the Arm Ethos-U55 microNPU accelerator versus executing neural network inference solely on the CPU cores of a Cortex-M microcontroller.

| Metric / Feature | Software-Only on Cortex-M | Ethos-U55 Accelerated |
| --- | --- | --- |
| Peak INT8 Throughput | ~2-5 GOPS (varies by core) | Up to 512 GOPS (256 MACs/cycle at 1 GHz; configurable) |
| Typical Inference Speedup | 1x (baseline) | 10x to 50x |
| Power Efficiency (TOPS/W) | ~0.1-0.5 TOPS/W | ~2-5 TOPS/W |
| CPU Utilization During Inference | 80-100% | < 10% (manages NPU) |
| System Power Draw (Active Inference) | High (CPU fully active) | Low (CPU idle, NPU efficient) |
| Latency for a 50-layer CNN | 500-2000 ms | 10-100 ms |
| Memory Bandwidth Pressure | High (weights and activations via bus) | Reduced (compressed weights streamed and decoded in the NPU) |
| Supported Data Types | INT8, INT16, FP32 (via CMSIS-NN) | INT8 (primary), INT16 |
| Model Porting Effort | Lower (CMSIS-NN kernels) | Requires NPU-specific compilation and tuning |
| Deterministic Execution Timing | Yes (CPU-based) | Yes (NPU is a deterministic hardware block) |
| Hardware Cost & Silicon Area | N/A (uses existing CPU) | Additional die area for NPU macro |

Frequently Asked Questions

Essential questions and answers about the Arm Ethos-U55 microNPU, a dedicated hardware accelerator for machine learning inference in deeply embedded systems.

The Arm Ethos-U55 is a micro Neural Processing Unit (microNPU), a dedicated hardware accelerator designed to offload and execute machine learning inference workloads from an Arm Cortex-M series CPU. It works by receiving compiled neural network subgraphs, which it processes using highly optimized, parallel compute units for operations like convolutions and activations. The U55 features a configurable architecture, allowing silicon designers to scale its performance and area by adjusting the number of MACs per cycle and the size of its internal weight and activation memory. It shares system memory with the host CPU, enabling low-latency data exchange, and is managed through a driver that integrates with frameworks like TensorFlow Lite Micro.

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.