YC S26 · Pre-tape-out · San Francisco

Self-adapting silicon for AI.

Modern accelerators run agent workloads at <30% utilization. Silmir builds a flexible matrix of compute, memory, and interconnect blocks that reorganizes itself around the workload at runtime.

Domain: Adaptive Silicon · Inference
Scope: Block · Chip · System
Status: FPGA prototype in development
FIG.01 Adaptive compute / memory / interconnect matrix · live power reallocation
<30%
Accelerator utilization on agent workloads.
10–100×
Per-device work spread inside one inference step.
1000+
Node-scale deployments studied to surface the bottleneck.
// 01 · Thesis

The bottleneck isn't compute. It's the loop.

Agent loops swing between memory-bound calls, I/O-bound tool use, and compute-bound orchestration. They branch and backtrack dozens of times per task.

Accelerators built for batch training leave most of the silicon idle. At cluster scale the imbalance compounds. The network between nodes becomes the dominant bottleneck.

Any architecture that does not treat network and memory as first-class, dynamically allocated resources will hit a hard ceiling.

A ~10–100× spread in per-device work inside a single inference step: the busiest device does one to two orders of magnitude more than the idlest.

FIG.02 Per-GPU utilization · single inference step · 16-device node
// 02 · Architecture

A learning matrix, not a fixed pipeline.

01

Fine-grained allocation

Power moves at the block level. Idle blocks yield budget to bottlenecked ones in real time.

02

Global visibility

Every block sees every other block's runtime state. The allocator decides with full information.

03

Online-learned policy

Allocation is learned, not hand-coded. The policy adapts as workload patterns shift.

04

Anomaly substrate

A safety layer catches bad policy calls before they propagate, making learning at the silicon level safe to ship.
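Taken together, 01–04 describe a gated control loop: telemetry in, a proposed power map out, a commit only if the safety layer clears it. Below is a minimal Python sketch of that loop; every name here (BlockTelemetry, propose_budgets, anomaly_gate) is an illustrative stand-in, not a Silmir API.

```python
from dataclasses import dataclass

@dataclass
class BlockTelemetry:
    block_id: str        # e.g. "ALU0", "MEM2", "NET1"
    utilization: float   # 0.0-1.0 over the last sampling window
    stall_cycles: int    # cycles spent waiting on memory or network
    power_mw: float      # measured draw

def propose_budgets(telemetry, total_mw):
    """Stand-in for the learned policy: shift power toward busy or
    stalled blocks. A real policy would be a trained model over the
    same inputs; this heuristic only illustrates the interface."""
    demand = {t.block_id: t.utilization + (0.5 if t.stall_cycles else 0.0)
              for t in telemetry}
    total = sum(demand.values()) or 1.0
    return {b: total_mw * d / total for b, d in demand.items()}

def anomaly_gate(current, proposed, max_step=0.25):
    """Safety layer: veto any proposal that moves a block's budget by
    more than max_step of its current value in a single tick."""
    for block, new_mw in proposed.items():
        old_mw = current.get(block, new_mw)
        if old_mw and abs(new_mw - old_mw) / old_mw > max_step:
            return current             # reject: keep the last safe map
    return proposed                    # commit

blocks = [BlockTelemetry("ALU0", 0.95, 1200, 900.0),
          BlockTelemetry("MEM0", 0.40, 0, 600.0),
          BlockTelemetry("NET0", 0.10, 0, 300.0)]
current = {"ALU0": 600.0, "MEM0": 600.0, "NET0": 600.0}
committed = anomaly_gate(current, propose_budgets(blocks, 1800.0))
# The large jump proposed for ALU0 trips the gate, so the last safe
# allocation is kept for this tick.
```

The heuristic in propose_budgets stands in for the learned policy; swapping in a trained model changes the proposal, not the gate.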

FIG.03 Stack: workload signals → learned allocator → adaptive matrix → physical substrate
// 03 · Differentiator

Adaptation, end to end.

Block level

ALU, CPU, and memory blocks adjust voltage, clock, and routing per cycle. Adaptation begins below the core boundary.

Chip level

A learned controller redistributes power across the full block matrix. Compute, memory, and interconnect compete in one budget.

System level

Heterogeneous big-little arrays reorganize execution across nodes per inference loop. The whole system adapts as one.

Fixed-function accelerators win at one pattern. General-purpose chips spread thin across all of them. Neither adapts at runtime. Silmir treats adaptation as the architecture — from the block to the cluster.
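One way to picture the three scopes is as nested control loops with widening time constants. The tick values and callback names below are assumptions for illustration, not measured Silmir parameters.

```python
BLOCK_TICK = 1             # cycles: voltage / clock / routing nudges
CHIP_TICK = 10_000         # cycles: power redistribution across the matrix
SYSTEM_TICK = 1_000_000    # cycles: re-mapping work across big-little nodes

def adapt(cycle, block_ctl, chip_ctl, system_ctl):
    """Run each controller at its own cadence within one cycle count."""
    if cycle % BLOCK_TICK == 0:
        block_ctl(cycle)       # below the core boundary, every cycle
    if cycle % CHIP_TICK == 0:
        chip_ctl(cycle)        # learned reallocation across the matrix
    if cycle % SYSTEM_TICK == 0:
        system_ctl(cycle)      # cross-node reorganization per inference loop
```

The point is the nesting: the inner loop never waits on the outer ones.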

// 04 · Stack

Open substrate. Custom silicon next.

Hardware

  • Blocks: ALU · CPU · MEM · NET
  • HDL: RTL · HLS
  • Substrate: Open-source RISC-V
  • Sim: Cycle-accurate · FPGA
  • Path: Shuttle → full-mask ASIC

Runtime

  • Core: Rust · Python
  • Targets: CPU · GPU · NPU clusters
  • Compiler: MLIR-based IR
  • Phase 1: Scheduling on GPU clusters
  • Phase 2: FPGA prototype

Policy

  • Models: GBT · light transformers
  • Inputs: Per-block runtime telemetry
  • Outputs: Power · clock · route maps
  • Safety: Anomaly-gated commit
  • Loop: Online learning, on-die
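As a hedged sketch of how these rows could compose in the Phase 1 runtime: train gradient-boosted trees (scikit-learn's GradientBoostingRegressor here) to predict step latency from (telemetry, candidate allocation) pairs, then commit the candidate with the best prediction. The data below is synthetic and the feature layout is an assumption; none of this is Silmir's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic log: 8 telemetry features + 4 allocation knobs -> latency.
# A real log would come from per-block counters on a GPU cluster.
X = rng.random((2048, 12))
y = (1.0 + X[:, :8].sum(axis=1)           # load raises latency
     - 0.5 * X[:, 8:].sum(axis=1)         # good allocations lower it
     + 0.05 * rng.standard_normal(2048))  # measurement noise

latency_model = GradientBoostingRegressor().fit(X, y)

def choose_allocation(telemetry, candidates):
    """Score candidate power/clock/route encodings against the current
    telemetry snapshot and return the one with the lowest predicted
    step latency."""
    rows = np.array([np.concatenate([telemetry, c]) for c in candidates])
    return candidates[int(np.argmin(latency_model.predict(rows)))]

snapshot = rng.random(8)                          # current per-block telemetry
candidates = [rng.random(4) for _ in range(16)]   # candidate knob maps
best = choose_allocation(snapshot, candidates)    # commit after the anomaly gate
```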
// 05 · Roadmap

Software first. Silicon when the policy is proven.

  1. T+0

    Runtime layer on GPU clusters

Show that learned allocation beats static partitioning on real agent workloads.

  2. T+6

    FPGA prototype

    Adaptation layer in silicon. Live demo under shifting workload.

  3. T+18

    Shuttle tape-out

    Adaptive subsystem on a shuttle run. Validate physical-level primitives.

  4. T+36

    Full-mask tape-out

    First adaptive ASIC for inference. Hyperscaler design wins.

// 06 · Contact

Building, hiring, talking.

Inference at scale. Adaptive systems. RTL. Agent workloads. Get in touch.