Self-adapting silicon for AI.
Modern accelerators run agent workloads at <30% utilization. Silmir builds a flexible matrix of compute, memory, and interconnect blocks. It reorganizes itself around the workload at runtime.
The bottleneck isn't compute. It's the loop.
Agent loops swing between memory-bound calls, I/O-bound tool use, and compute-bound orchestration. They branch and backtrack dozens of times per task.
Accelerators built for batch training leave most of the silicon idle. At cluster scale the imbalance compounds. The network between nodes becomes the dominant bottleneck.
Any architecture that does not treat network and memory as first-class, dynamically allocated resources will hit a hard ceiling.
~10–100× spread in per-device work inside a single inference step. The busiest device does one to two orders of magnitude more work than the idlest.
A learning matrix, not a fixed pipeline.
Fine-grained allocation
Power moves at the block level. Idle blocks yield budget to bottlenecked ones in real time.
Global visibility
Every block sees every other block's runtime state. The allocator decides with full information.
Online-learned policy
Allocation is learned, not hand-coded. The policy adapts as workload patterns shift.
Anomaly substrate
A safety layer catches bad policy calls before they propagate. Learning at the silicon level becomes safe to ship.
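A minimal sketch of how these four pieces compose, in Python. Everything here is illustrative rather than Silmir's runtime API: `BlockState`, `propose_allocation`, and `anomalous` are assumed names, and simple heuristics stand in for the learned policy. The shape is the point: the allocator sees every block's telemetry, proposes a new power map, and an anomaly gate can refuse the commit before it reaches the silicon.

```python
from dataclasses import dataclass

@dataclass
class BlockState:
    """Runtime telemetry for one block (illustrative fields)."""
    block_id: int
    utilization: float   # fraction of cycles doing useful work
    power_w: float       # current draw in watts
    stall_ratio: float   # fraction of cycles stalled

POWER_BUDGET_W = 300.0   # assumed chip-level power envelope

def propose_allocation(blocks: list[BlockState]) -> dict[int, float]:
    """Stand-in for the learned policy: shift budget toward
    bottlenecked blocks, proportional to stall ratio."""
    total = sum(b.stall_ratio for b in blocks) or 1.0
    return {b.block_id: POWER_BUDGET_W * b.stall_ratio / total
            for b in blocks}

def anomalous(blocks: list[BlockState], proposal: dict[int, float]) -> bool:
    """Stand-in safety check: refuse proposals that overshoot the
    envelope or starve a block that is visibly busy."""
    if sum(proposal.values()) > POWER_BUDGET_W + 1e-9:
        return True
    return any(b.utilization > 0.5 and
               proposal[b.block_id] < 0.1 * b.power_w
               for b in blocks)

def step(blocks: list[BlockState]) -> dict[int, float] | None:
    """One allocation cycle: propose, gate, commit. Returning None
    keeps the previous map; a bad policy call never propagates."""
    proposal = propose_allocation(blocks)
    return None if anomalous(blocks, proposal) else proposal

print(step([BlockState(0, 0.9, 80.0, 0.6),
            BlockState(1, 0.1, 40.0, 0.05)]))
```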
Adaptation, end to end.
Block level
ALU, CPU, and memory blocks adjust voltage, clock, and routing per cycle. Adaptation begins below the core boundary.
Chip level
A learned controller redistributes power across the full block matrix. Compute, memory, and interconnect compete in one budget.
System level
Heterogeneous big-little arrays reorganize execution across nodes per inference loop. The whole system adapts as one.
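To make the system level concrete, a small Python sketch under assumed names (the phase tags, node classes, and `schedule` function are illustration, not Silmir's scheduler): each phase of an agent loop is tagged by its bound resource and reassigned to the least-loaded node of the class that fits it, once per loop iteration.

```python
import heapq

# Hypothetical phase tags and node classes, for illustration only.
NODE_CLASS = {"compute-bound": "big",
              "memory-bound": "little",
              "io-bound": "little"}

def schedule(phases, nodes):
    """Reassign each phase of one inference-loop iteration to the
    least-loaded node of the class that fits its bound resource."""
    heaps = {cls: [(0.0, n) for n in ns] for cls, ns in nodes.items()}
    for h in heaps.values():
        heapq.heapify(h)
    placement = []
    for phase, cost in phases:                    # e.g. ("io-bound", 0.2)
        busy, node = heapq.heappop(heaps[NODE_CLASS[phase]])
        heapq.heappush(heaps[NODE_CLASS[phase]], (busy + cost, node))
        placement.append((phase, node))
    return placement

print(schedule(
    [("memory-bound", 1.0), ("io-bound", 0.2), ("compute-bound", 0.7)],
    {"big": ["b0"], "little": ["l0", "l1"]},
))
```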
Fixed-function accelerators win at one pattern. General-purpose chips spread themselves thin across all of them. Neither adapts at runtime. Silmir treats adaptation as the architecture, from block to cluster.
Open substrate. Custom silicon next.
Hardware
- Blocks: ALU · CPU · MEM · NET
- HDL: RTL · HLS
- Substrate: Open-source RISC-V
- Sim: Cycle-accurate · FPGA
- Path: Shuttle → full-mask ASIC
Runtime
- Core: Rust · Python
- Targets: CPU · GPU · NPU clusters
- Compiler: MLIR-based IR
- Phase 1: Scheduling on GPU clusters
- Phase 2: FPGA prototype
Policy
- Models: GBT · light transformers
- Inputs: Per-block runtime telemetry
- Outputs: Power · clock · route maps
- Safety: Anomaly-gated commit
- Loop: Online learning, on-die (see the sketch below)
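Read as a loop, the stack above might look like the following Python sketch. The toy `OnlinePolicy` stands in for the GBT or light-transformer model, the gate and reward are placeholder heuristics, and the output maps collapse to a power map for brevity. What the sketch keeps is the specified shape: telemetry in, maps out, anomaly-gated commit, online update.

```python
import random

class OnlinePolicy:
    """Toy stand-in for the learned allocator (the real stack names
    GBTs and light transformers). One weight per block: how strongly
    that block's stall ratio attracts power budget."""
    def __init__(self, n_blocks: int, lr: float = 0.05):
        self.w = [1.0] * n_blocks
        self.lr = lr

    def predict(self, stalls: list[float], budget: float = 300.0) -> list[float]:
        scores = [w * s for w, s in zip(self.w, stalls)]
        total = sum(scores) or 1.0
        return [budget * s / total for s in scores]   # power map

    def update(self, stalls: list[float], reward: float) -> None:
        # Crude online step: reinforce weights on blocks that were
        # stalled when the committed map paid off, and vice versa.
        for i, s in enumerate(stalls):
            self.w[i] = max(0.1, self.w[i] + self.lr * reward * s)

def gated_step(policy: OnlinePolicy, stalls: list[float],
               utilization: float) -> list[float] | None:
    """Telemetry in, map out, anomaly-gated commit, online update."""
    maps = policy.predict(stalls)
    if max(maps) > 0.8 * sum(maps):   # gate: no single block hogs it all
        return None                    # refuse; previous map stays live
    policy.update(stalls, reward=utilization - 0.5)  # toy baseline
    return maps

policy = OnlinePolicy(n_blocks=4)
for _ in range(10):
    stalls = [random.random() for _ in range(4)]
    gated_step(policy, stalls, utilization=random.random())
print(policy.w)
```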
Software first. Silicon when the policy is proven.
- T+0: Runtime layer on GPU clusters. Show learned allocation beats static partitioning on real agent workloads.
- T+6: FPGA prototype. Adaptation layer in silicon; live demo under shifting workload.
- T+18: Shuttle tape-out. Adaptive subsystem on a shuttle run; validate physical-level primitives.
- T+36: Full-mask tape-out. First adaptive ASIC for inference. Hyperscaler design wins.
Building, hiring, talking.
Inference at scale. Adaptive systems. RTL. Agent workloads. Get in touch.