Your GPUs are burning money.
Strided tells you where.
Strided is a kernel-level causal diagnosis engine for production LLM inference. Feed it an Nsight Compute report or DCGM telemetry; get a ranked, confidence-scored explanation of the bottleneck, plus a specific fix.
A diagnosis engine, not a profiler.
Existing tools tell you your GPU is slow. Strided tells you why it's slow, what to do about it, and how confident we are in the answer.
For people who can't keep ten senior CUDA engineers on staff.
A senior engineer with two days of Nsight can find the bottleneck. Strided turns that capability into a CLI you can run on every dump.
The gap is the business.
You bought H100-hours. Your workload used only 60% of theoretical peak FLOPS. The gap between the rental rate and realized compute is real money. Strided measures it.
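To make the gap concrete, here is a minimal back-of-the-envelope sketch. It is not Strided's actual measurement: the function name `wasted_spend`, the hourly rate, and the GPU-hour count are all hypothetical, and it assumes that the unrealized fraction of theoretical FLOPS maps linearly to wasted dollars.

```python
def wasted_spend(hourly_rate_usd: float, gpu_hours: float, utilization: float) -> float:
    """Dollars paid for compute the workload did not realize.

    Assumes cost scales linearly with the unrealized fraction of
    theoretical FLOPS, which is a simplification.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return hourly_rate_usd * gpu_hours * (1.0 - utilization)


# Hypothetical inputs: $2.50 per H100-hour, 1,000 GPU-hours,
# and the 60% utilization figure from the text above.
gap = wasted_spend(hourly_rate_usd=2.50, gpu_hours=1000, utilization=0.60)
print(f"Unrealized compute: ${gap:,.2f}")
```

With these assumed inputs, 40% of the bill pays for compute the workload never used.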
Parse. Diagnose. Done.
Capture
Run Nsight Compute, DCGM, or vLLM /metrics against your inference workload. Standard tools, standard outputs.
Parse
Strided normalizes the dump into a canonical schema. Phase timings, layer-level metrics, cluster signals, cache state.
Fire rules
Ten deterministic rules evaluate the schema. KV fragmentation, NCCL dominance, TP imbalance, quantization opportunity, and more.
Rank
Each fired rule carries a confidence score and a suggested fix. Ambiguous case → no diagnosis. We don't guess.
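The fire-and-rank steps above can be sketched as a small deterministic rule engine. This is an illustration only, not Strided's implementation: the metric names (`kv_cache_fragmentation`, `nccl_time_fraction`), the thresholds, the confidence formulas, and the fix strings are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Diagnosis:
    rule: str
    confidence: float  # 0.0 to 1.0
    fix: str


Metrics = dict[str, float]
Rule = Callable[[Metrics], Optional[Diagnosis]]


def kv_fragmentation(m: Metrics) -> Optional[Diagnosis]:
    # Hypothetical signal: fraction of allocated KV-cache blocks left unused.
    frag = m.get("kv_cache_fragmentation")
    if frag is None or frag < 0.30:
        return None  # rule does not fire
    return Diagnosis("kv_fragmentation", min(1.0, frag / 0.60),
                     "placeholder: revisit KV-cache block sizing")


def nccl_dominance(m: Metrics) -> Optional[Diagnosis]:
    # Hypothetical signal: share of step time spent in NCCL collectives.
    frac = m.get("nccl_time_fraction")
    if frac is None or frac < 0.40:
        return None
    return Diagnosis("nccl_dominance", min(1.0, frac / 0.80),
                     "placeholder: revisit the parallelism layout")


def diagnose(m: Metrics, rules: list[Rule], min_confidence: float = 0.6) -> list[Diagnosis]:
    """Fire every rule, drop low-confidence results, rank the rest.

    An empty list means "no diagnosis": the engine declines to guess.
    """
    fired = [d for rule in rules if (d := rule(m)) is not None]
    confident = [d for d in fired if d.confidence >= min_confidence]
    return sorted(confident, key=lambda d: d.confidence, reverse=True)


RULES: list[Rule] = [kv_fragmentation, nccl_dominance]

for d in diagnose({"kv_cache_fragmentation": 0.45, "nccl_time_fraction": 0.80}, RULES):
    print(f"{d.rule}: confidence={d.confidence:.2f}, fix={d.fix}")
```

Because each rule is a pure function of the normalized metrics, the same dump always yields the same ranking, and a borderline signal that falls under `min_confidence` produces an empty result rather than a guess.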
Strided is in the 90-day validation window.
We're testing one question: can a deterministic rule engine match a senior CUDA engineer with two days of Nsight, on real customer dumps?
If you operate LLM inference at scale and have a redacted Nsight or DCGM dump from a real workload, we want to test against it. We will not retain raw data. We will share the diagnosis with you.