Open position
Engineering Architect, Performance
Why this role exists
Silicon companies benchmark their own hardware. What they rarely do is challenge the results — question whether the benchmark is measuring what matters, whether the compiler is generating what it should, whether the ISA is missing an instruction that would change the outcome. Their performance teams report numbers. We change them.
VRULL’s performance work is not a support function. It’s the intelligence layer that drives compiler priorities, ISA extension proposals, and framework optimisation strategy — across both AI and HPC workloads. When we tell a silicon partner that their matrix extension is leaving 25% on the table because the LLVM backend can’t schedule across a specific dependency, or that their Fortran loops run at half the throughput the hardware supports because the MLIR lowering misses a tiling opportunity — that’s not a bug report. It’s an architectural insight that changes their product.
AI compresses the data-collection cycle. Automated profiling, regression triage, report generation — all faster. What AI can’t do is look at a 3% regression on SPEC CPU and know that it signals a cost-model problem that will compound across every HPC workload for the lifetime of the product. Or recognise that a PyTorch inference benchmark is hiding a vectorisation failure that would be catastrophic on a real deployment model. That’s your job.
What you’ll do
- Design and run performance campaigns across AArch64 and RISC‑V — AI inference, HPC benchmarks (SPEC, HPCG, HPL), and customer-specific workloads in languages from C/C++ and Fortran to Julia
- Diagnose performance gaps architecturally: connect a benchmark number to a scheduling decision, a missed vectorisation, a pipeline stall, an ISA limitation — whether the code came through GCC, LLVM, MLIR, or Julia’s compiler
- Drive compiler and ISA priorities with data — identify the optimisations worth pursuing and the extensions worth proposing, across both AI and HPC domains
- Build and evolve benchmark infrastructure that scales with AI-assisted automation
- Produce competitive analyses that give silicon partners actionable intelligence, not just spreadsheets
- Present findings at internal reviews and industry forums with the clarity that turns data into decisions
What we’re looking for
- Experience correlating performance data with microarchitectural behaviour — cache effects, branch prediction, pipeline forwarding, memory ordering
- Deep enough compiler knowledge to read generated code from GCC, LLVM, or MLIR and know whether the performance gap is in the compiler, the ISA, or the workload
- Proficiency with profiling tools (perf, hardware PMU counters, trace analysis) and the judgement to interpret them architecturally
- Experience with both AI and HPC workload characterisation — inference latency, training throughput, FLOPS efficiency, memory bandwidth utilisation
- Strong scripting and automation skills — you build the infrastructure, not just use it
- The communication skills to present findings to silicon architects and compiler engineers — and to make them act
What sets you apart
- Experience proving that a proposed ISA extension delivers its claimed performance on AI and HPC workloads — or proving that it doesn’t
- Background in both compiler optimisation (GCC, LLVM, MLIR) and microarchitectural analysis
- Performance analysis across multiple language ecosystems — you know why a Fortran code vectorises differently from a Julia kernel and what that means for the hardware
- A track record of identifying performance opportunities that internal teams missed
- The confidence to tell a partner their benchmark numbers are wrong — and the data to back it up
Performance analysis is easy. Performance architecture — knowing which number matters and why, whether the workload is an AI model or a Fortran simulation — is rare. That’s what we hire for.
Interested in this role?
Send your CV and a note about why this role interests you to careers@vrull.eu.