May 7

P2 - Under the Hood of SP1. How Your Rust Code Becomes a Proof

Welcome back, intrepid explorer! After our whirlwind introduction to SP1 in Part 1, you’re probably buzzing with excitement about this zero-knowledge virtual machine (zkVM) that runs Rust code and spits out cryptographic proofs. We saw SP1’s big picture: a “truth machine” for code, proving programs correct with blazing speed (up to 28× faster than some alternatives) and a healthy dose of engineering magic. But how does it actually work under the hood? That’s our mission in Part 2. In this article, we’ll trace the journey from writing Rust code to obtaining that final proof, step by step. Consider this a guided tour through SP1’s engine room – with the same playful, slightly chaotic vibe as before. By the end, you’ll understand exactly how your Rust code becomes a proof (and why each step is so darn cool). Okay, seatbelts on – here we go!

From Rust to LLVM IR. Code Meets a Universal Translator

So you’ve written some Rust code. (Maybe it’s a function to verify a cryptographic signature, or a little game logic – whatever floats your boat.) How does this high-level Rust turn into something SP1 can run? The first stage is compilation to LLVM IR. If you’re not a compiler guru, LLVM IR might sound like gibberish, but it’s actually the unsung hero of many programming languages. Think of LLVM IR as Esperanto for code: a universal intermediate language that many compilers use as a common step. Rust’s compiler (like many others) translates your code into this LLVM Intermediate Representation before going to machine code. Why? Because LLVM IR is a standardized, well-understood format where a ton of optimizations can be applied. It’s like having a master translator convert your novel into a universally understood draft, which can later be translated into any specific language (be it machine code for an Intel x86 processor, an ARM chip, or… you guessed it, RISC-V!).

Simplified analogy. Imagine you wrote a recipe in English, and you want cooks around the world to follow it. You might first translate that recipe into a universal pictographic cookbook language that all chefs know. LLVM IR is kind of like that universal cookbook language for computers – a way to represent your program in a form that’s not tied to any one type of CPU. The Rust compiler takes your Rust code (with all its fancy abstractions, iterators, etc.) and lowers it down into LLVM IR, which is basically a portable assembly language. This IR strips away the super-high-level stuff and focuses on operations closer to what a CPU understands (loads, stores, arithmetic, function calls, etc.), but in a platform-agnostic way.

Once in LLVM IR form, your program undergoes a bunch of optimizations (the compiler’s chance to make it faster and more efficient). Then comes the next step: turning that optimized IR into actual machine code for a specific architecture. In our case, the target architecture is RISC-V, which is SP1’s native tongue. Why RISC-V? Great question – and the topic of our next section!
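You can actually see this intermediate form with your own eyes: `rustc` will dump the IR it generated via its standard `--emit=llvm-ir` flag. Here’s a throwaway function (the name and the program are just illustrative) that you could compile and inspect yourself:

```rust
// Compile with `rustc --emit=llvm-ir ir_demo.rs` to produce ir_demo.ll,
// a text file containing the LLVM IR this code was lowered into.
// The IR shows arithmetic, loads, and calls - none of Rust's high-level sugar.

pub fn scale_and_offset(x: u32, factor: u32) -> u32 {
    // Wrapping arithmetic keeps the lowered IR free of overflow-panic branches.
    x.wrapping_mul(factor).wrapping_add(x)
}

fn main() {
    // 3 * 4 + 3 = 15
    println!("{}", scale_and_offset(3, 4));
}
```

Open the resulting `.ll` file and you’ll find your function as a sequence of platform-agnostic instructions – the “universal cookbook” version of your recipe, ready to be translated to any target CPU.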

From LLVM IR to RISC-V. Speaking SP1’s Language

After the Rust compiler has done its thing with LLVM IR, it’s time to produce real machine code. SP1 is built around the RISC-V instruction set architecture (ISA), specifically a 32-bit variant (RV32IM). This means that to run your program in SP1, we compile it into RISC-V machine instructions. Think of RISC-V as the language of a simple, no-frills computer processor. By targeting RISC-V, SP1 leverages a well-supported, open ISA – and more importantly, we can use all of Rust’s existing compiler infrastructure to get there. In other words, Rust+LLVM already know how to speak RISC-V, so we didn’t have to reinvent the wheel. We just ask Rust’s compiler, “Hey, could you compile this code as if it were going to run on a RISC-V processor?” and it happily obliges.

Analogy time. You have that universally translated recipe (LLVM IR). Now you want a chef in Paris to cook it, so you translate the recipe into French. RISC-V is like the “French” in this scenario – a specific language/dialect for a particular “kitchen” (the SP1 VM). The translation from IR to RISC-V yields a RISC-V binary – essentially an executable file containing RISC-V instructions (in ELF format, just like a Linux binary). This binary is something you could run on a physical RISC-V chip if you had one. And in our case, we do have one… sort of! Instead of silicon, our “chip” will be the SP1 virtual machine.

Why did we choose RISC-V for SP1? Remember, SP1 aims to prove program execution, so every instruction executed needs to be checked in the proof. RISC-V is simple and minimalist by design – it has a clean, small instruction set without a lot of crazy complex operations. For a zkVM like SP1, simplicity = fewer things to turn into math later. The official Succinct blog put it nicely: using a simple general ISA like RISC-V gives flexibility (we can support any language that compiles to it) while keeping the proving logic lean. Fewer instructions and a simpler design mean it’s easier to create the mathematical constraints for each instruction’s behavior. It’s like choosing a basic vocabulary to tell a story so that an extremely strict grammar checker (our proof system) can verify it without getting overwhelmed by obscure words.

In practice, SP1’s toolchain (sp1-sdk) sets up Rust to target RISC-V and even provides standard library support. That’s right – you can use many normal Rust library features; SP1 isn’t limiting you to some weird freestanding subset. Write your Rust code with std if you want, and compile it for RISC-V using the SDK. The result is a RISC-V executable of your program. Now we take that executable and load it into SP1’s zkVM… time to run it! But we’re not just going to run it normally; we’re going to run it in a way that every step can later be proven. Enter the execution trace, the star of the show in the proving process.
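To make “simple ISA, simple semantics” concrete, here’s a toy two-instruction interpreter in the spirit of RV32IM. This is emphatically not SP1’s emulator – just a minimal sketch showing how little logic each instruction needs (and therefore how little must later be turned into mathematical constraints):

```rust
// A two-instruction toy in the spirit of RV32IM. Registers are u32 and
// arithmetic wraps, exactly as on a real 32-bit RISC-V core.
#[derive(Clone, Copy)]
enum Instr {
    Add { rd: usize, rs1: usize, rs2: usize }, // regs[rd] = regs[rs1] + regs[rs2]
    Addi { rd: usize, rs1: usize, imm: u32 },  // regs[rd] = regs[rs1] + imm
}

fn execute(program: &[Instr]) -> [u32; 32] {
    let mut regs = [0u32; 32];
    for instr in program {
        match *instr {
            Instr::Add { rd, rs1, rs2 } => regs[rd] = regs[rs1].wrapping_add(regs[rs2]),
            Instr::Addi { rd, rs1, imm } => regs[rd] = regs[rs1].wrapping_add(imm),
        }
        regs[0] = 0; // register x0 is hard-wired to zero in RISC-V
    }
    regs
}

fn main() {
    let program = [
        Instr::Addi { rd: 1, rs1: 0, imm: 5 }, // x1 = 5
        Instr::Addi { rd: 2, rs1: 0, imm: 7 }, // x2 = 7
        Instr::Add { rd: 3, rs1: 1, rs2: 2 },  // x3 = x1 + x2 = 12
    ];
    let regs = execute(&program);
    println!("x3 = {}", regs[3]);
}
```

Each instruction’s entire behavior fits in one line. That’s the payoff of a minimalist ISA: a small, precise rulebook that a proof system can encode without drowning in edge cases.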

Execution Trace. The zkVM as a Courtroom Stenographer

When SP1 runs your RISC-V program, it behaves a bit differently than a regular computer. A normal CPU just executes instructions and moves on, maybe updating some registers and memory as it goes, and in the end you get your output. SP1’s virtual CPU does all that but also keeps a detailed log of every single step. This log is what we call the execution trace. If you recall our courtroom analogy hint: SP1’s zkVM is like a courtroom where your program is on trial to prove its correctness. In this courtroom, there’s a stenographer recording every word (instruction) and action (state changes) meticulously. The execution trace is essentially a complete transcript of the program’s execution – every instruction executed, the state of registers, any memory reads/writes, etc., step by step.

Why log everything? Because later, we’re going to use this trace as evidence to prove that the program did what it was supposed to do. In a normal setting, recording every CPU cycle would be overkill (and slow). But in a zkVM, the trace is golden. It’s the bridge between “I ran the program” and “I can prove I ran the program correctly.” Think of the trace as a giant table of state snapshots. Each row in this table represents one step of execution (one instruction), and the columns might include things like: the instruction opcode, the program counter (address of the instruction), the values of all CPU registers at that step, any input to the instruction, any output produced (like a value written to a register or memory), and so on. By the final row, the trace contains everything that happened from start to finish.
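Here’s a toy version of that “giant table of snapshots” idea – a hypothetical `TraceRow` struct (my naming, not SP1’s) that records a tiny machine’s state at every step. SP1’s real trace has many more columns, but the shape is the same:

```rust
// One row per executed step: which instruction ran, where the program
// counter was, and what the (tiny) register file looked like afterwards.
#[derive(Debug, Clone)]
struct TraceRow {
    pc: u32,              // address of the instruction (4 bytes each, as in RISC-V)
    opcode: &'static str, // which instruction executed at this step
    regs: [u32; 4],       // a 4-register toy machine; SP1 traces all 32
}

fn run_and_trace() -> Vec<TraceRow> {
    let mut regs = [0u32; 4];
    let mut trace = Vec::new();

    regs[1] = 5; // step 0: r1 = 5
    trace.push(TraceRow { pc: 0, opcode: "addi r1, 5", regs });

    regs[2] = 7; // step 1: r2 = 7
    trace.push(TraceRow { pc: 4, opcode: "addi r2, 7", regs });

    regs[3] = regs[1].wrapping_add(regs[2]); // step 2: r3 = r1 + r2
    trace.push(TraceRow { pc: 8, opcode: "add r3, r1, r2", regs });

    trace
}

fn main() {
    let trace = run_and_trace();
    for row in &trace {
        println!("{:?}", row); // the stenographer's transcript, row by row
    }
}
```

Notice that from this table alone you could replay or audit any moment of the run – exactly the property the prover is about to exploit.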

Now, SP1’s architecture actually breaks this trace into multiple interconnected tables for sanity and performance – for example, one table for CPU state, one for memory operations, one for any special accelerators, etc. This is called a cross-table lookup design, but you can imagine it like this: We have one detailed log of CPU steps, and another log of memory events, and we cross-check them. If at step 100 the CPU log says “loaded value 42 from memory address X,” there better be an entry in the memory log that at some point provided value 42 at address X. SP1 ensures these logs are consistent via clever checks (permutation checks). It’s akin to a diligent auditor comparing two records of the same event – if there’s any mismatch, the whole process is deemed invalid. No sneaky cheating allowed!
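A toy version of that auditor’s cross-check, assuming a simplified world where every CPU-side memory event must be matched one-for-one by a memory-table entry. Real SP1 proves multiset equality in-circuit with randomized permutation arguments rather than sorting, but the intent is the same:

```rust
// A memory event: (address, value). The CPU log records what the CPU
// claims it read or wrote; the memory log records what memory actually served.
type MemEvent = (u32, u32);

// The two logs are consistent if they contain exactly the same multiset
// of events. Sorting and comparing is the naive stand-in for the
// permutation (multiset-equality) checks a real zkVM enforces in-circuit.
fn logs_consistent(cpu_log: &[MemEvent], mem_log: &[MemEvent]) -> bool {
    let mut a = cpu_log.to_vec();
    let mut b = mem_log.to_vec();
    a.sort_unstable();
    b.sort_unstable();
    a == b
}

fn main() {
    let cpu_log = [(0x100, 42), (0x104, 7)];
    let honest_mem = [(0x104, 7), (0x100, 42)];   // same events, different order: fine
    let tampered_mem = [(0x100, 41), (0x104, 7)]; // value changed: caught!

    println!("honest:   {}", logs_consistent(&cpu_log, &honest_mem));
    println!("tampered: {}", logs_consistent(&cpu_log, &tampered_mem));
}
```

If even one entry disagrees between the logs, the check fails and the whole execution is rejected – that’s the “no sneaky cheating” guarantee in miniature.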

To put it playfully - SP1 is like an overzealous diary-writer who writes down everything that happens during execution, and then cross-references every detail to make sure it all adds up. By the time your program finishes running on the SP1 VM, we have this massive diary (the execution trace) that could be used to replay the program exactly or inspect any moment in time. But instead of replaying it for verification (which would be slow, basically re-executing the program), SP1 is about to do something way cooler: turn that diary into a succinct proof that the diary is internally consistent and corresponds to a valid execution of your Rust program.

Before we move on, it’s worth emphasizing how crucial the trace is. It’s the raw material for the proof. If we lost the trace, we couldn’t prove anything – we’d just have an output with no evidence how it came about. With the trace, however, we have all the evidence needed to convince a skeptic. All that’s left is to package that evidence in a cryptographic way… which is exactly what happens next.

From Trace to Proof. STARK Magic / Math to the Rescue!

Alright, take a deep breath – this is where the zero-knowledge magic happens! We have our complete execution trace (that exhaustive log of the program run). Our goal now is to produce a cryptographic proof that this trace is valid (i.e., the program followed all the rules and produced the correct result) without requiring someone to re-run the whole program themselves. SP1 accomplishes this using a flavor of proof system known as a STARK, which stands for Scalable Transparent ARgument of Knowledge. Let’s unpack that briefly in a non-scary way:

  • Scalable. It can handle large computations and traces efficiently (important, because our trace could be huge).
  • Transparent. No fancy trusted setup ceremony needed. Everything is based on public randomness and math – no secret keys or initial rituals. This makes it easier to use and very secure in a public setting.
  • Argument of Knowledge. It’s a proof that we know something (in this case, the valid execution trace) without revealing the trace itself. We convince others that “a correct trace exists” without giving away every line of that trace.

In simpler terms, a STARK is like a super-compressed representation of our execution that anyone can check. It’s as if we had a massive 500-page transcript (the trace) and we somehow condense it into a one-page certificate that says: “Trust me, all the details in those 500 pages are consistent and correct according to the rules,” and importantly, anyone can verify that certificate quickly. How is this possible? Math, math, math! Specifically, lots of polynomial algebra and a dash of cryptographic randomness.

Here’s the high-level idea (with a bit of technical seasoning, but don’t worry about the nitty-gritty): we’re going to treat the execution trace as a set of sequences of numbers, which we then interpret as polynomials over a finite field. If you haven’t thought about polynomials since high school, it might sound wild that we use them here, but polynomials are the workhorse of many zk proofs. Each column of our trace table (e.g., the column of “register X’s value at each step”) can be seen as a polynomial where the row index is the input and the register value is the output. We then establish a bunch of polynomial equations (constraints) that encode the rules of execution. For example, one rule might be “the value of the program counter in the next row equals the previous value plus 4 (since each RISC-V instruction is 4 bytes)” – that rule can be written as a polynomial relationship between successive entries in the program counter column. Another rule might encode that “if an instruction is a load, then the memory table must have a matching entry,” which becomes a relationship between the CPU trace polynomial and the memory trace polynomial (enforced by those cross-lookups we mentioned). It’s like turning the CPU’s spec into algebraic equations.
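That “pc advances by 4” rule can be written down directly: take the pc column as a sequence and demand that pc[i+1] − pc[i] − 4 = 0 at every straight-line step. A minimal sketch of evaluating that constraint over a toy trace (ignoring jumps and branches, which get their own selector-gated constraints in the real AIR):

```rust
// The pc column of a toy trace. In the real AIR this constraint is a
// polynomial identity between a column and its one-row shift; here we
// simply evaluate it row by row over plain integers.
fn pc_constraint_holds(pc_column: &[u32]) -> bool {
    pc_column
        .windows(2)
        .all(|w| w[1].wrapping_sub(w[0]) == 4) // pc[i+1] - pc[i] - 4 == 0
}

fn main() {
    let good = [0u32, 4, 8, 12, 16];
    let bad = [0u32, 4, 12, 16]; // skipped an instruction: constraint fails

    println!("good trace: {}", pc_constraint_holds(&good));
    println!("bad trace:  {}", pc_constraint_holds(&bad));
}
```

Every rule of the VM gets encoded in this same spirit: a local algebraic relation between a few trace cells that must hold at every row.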

All these equations together form what’s called an AIR (Algebraic Intermediate Representation) – essentially the math version of the VM’s logic. Crafting the AIR is a one-time effort done by the SP1 developers (don’t worry, you as a user of SP1 never have to write these equations by hand!). The AIR defines what it means for a trace to be a valid execution of a RISC-V program. Now proving the program ran correctly boils down to proving that the numbers in our trace satisfy all those AIR equations.

So how do we prove that without sharing the whole trace? We use a protocol (courtesy of STARKs and some polynomial commitment tricks) that lets us assert “these equations hold for some set of secret polynomials, trust me” and then interactively (or via Fiat-Shamir, non-interactively) convince a verifier that it’s true. A key component here is something called FRI (Fast Reed-Solomon Interactive Oracle Proof of Proximity) – basically a fancy way to check polynomial properties by sampling just a few points. Imagine verifying a huge Sudoku solution by checking just a couple of spots in the grid; FRI is kind of like that, but for polynomials: it drastically reduces the amount of data that needs to be sent. SP1’s proving system uses FRI and other techniques to keep the proof size small and verification fast, even if the trace had millions of steps.
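The “check a couple of spots” intuition rests on a simple fact about polynomials: two distinct low-degree polynomials over a big field agree at very few points, so a match at a random point is overwhelming evidence that they’re identical. Here’s a toy demonstration over a prime field – this is the Schwartz-Zippel idea underpinning spot-check protocols like FRI, not FRI itself:

```rust
// Work modulo a prime so we have a finite field, as real STARKs do.
const P: u64 = 2_147_483_647; // the Mersenne prime 2^31 - 1

// Claimed identity: (x - 1)(x + 2) == x^2 + x - 2  (mod P).
fn lhs(x: u64) -> u64 {
    ((x + P - 1) % P) * ((x + 2) % P) % P
}
fn rhs(x: u64) -> u64 {
    (x * x % P + x % P + P - 2) % P
}

fn main() {
    // A verifier who can't read the whole polynomial samples a few points
    // and checks agreement there. Two distinct degree-2 polynomials can
    // agree on at most 2 of the P points, so even a single honest random
    // match is convincing with probability about 1 - 2/P.
    let samples = [123_456u64, 987_654_321, 31_337];
    for &x in &samples {
        assert_eq!(lhs(x), rhs(x), "identity failed at x = {x}");
    }
    println!("identity spot-checked at {} points", samples.len());
}
```

Checking three points instead of all ~2 billion is the whole game: the verifier’s work stays tiny no matter how large the polynomials (and hence the trace) get.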

In the end, SP1 produces a STARK proof – essentially a bundle of cryptographic commitments and spot-check responses that convince anyone who looks at them that “there exists an execution trace that satisfies all the rules and produces the outputs claimed.” The verifier (which could be a program, a blockchain smart contract, or any observer) can typically check this proof in milliseconds, which is way faster than running the original program (which might have taken seconds or minutes of computation). We’ve compressed the computation’s correctness into a succinct proof.

To illustrate, it’s like we had a super long courtroom transcript, and now we have a notarized certificate signed by Math itself that says “The proceedings were fair and followed all the laws” – and anyone can validate that notarization quickly, without reading the whole transcript. The actual content of the program’s execution (the trace) can remain secret if needed, or just impractical to transmit; all that matters is the proof attesting to its validity.

One more thing to mention: SP1’s proof can optionally be made even more convenient for certain uses. STARK proofs, while quick to verify and transparent, can be somewhat large in size (on the order of a few MB for big programs). If you want to stick a proof on a blockchain like Ethereum, smaller is better. That’s why SP1 supports an additional step called SNARK wrapping – essentially taking the STARK proof and proving it again with a tiny SNARK proof (like Groth16) that’s only a few hundred bytes. This hybrid approach gives the best of both worlds: the heavy lifting done by a STARK, and a small wrapper proof for on-chain verification. The end result is you can verify an SP1 proof inside an Ethereum smart contract with minimal gas, courtesy of this SNARK wrapper. Cool, right? It’s like taking our one-page certificate and laminating it onto a postcard for easy carrying.

But I digress – the SNARK wrapping and recursive proof tricks (proving proofs of proofs!) were already touched on in Part 1 and could be whole topics on their own. The key takeaway for our purposes is this: SP1 translates your Rust program into RISC-V, runs it while recording a trace, and then uses that trace to generate a STARK proof that the execution was valid. Everything else (recursion, SNARK, etc.) is icing on the cake to make proofs smaller or aggregate them – powerful features, but not required to get a basic proof of one program’s execution.

Wrapping Up (and What’s Next)

Congratulations, you just navigated the journey of code-to-proof under the hood of SP1! We started with your Rust source, compiled down through LLVM’s universal IR to RISC-V machine code. We loaded that code into the SP1 zkVM, which ran it step-by-step while logging an exhaustive transcript of execution. Then we saw how SP1’s circuit logic (our strict “judge” and diligent “stenographer” duo) turns that execution trace into a STARK proof – effectively compressing a complete run of a program into a tiny cryptographic certificate of correctness. It’s like turning a long novel into a haiku that still somehow captures the story. Wild, isn’t it?

The beauty of SP1 is that you, the developer, don’t have to worry about most of these complex inner workings. You write and compile Rust code as usual, and SP1 handles the heavy lifting of proof generation. Yet, understanding what’s happening under the hood (as we’ve done here) demystifies the “moon math” aspect. It’s not black magic; it’s careful engineering built on some ingenious mathematics. Zero-knowledge proofs might have a lot of sophisticated theory, but at the end of the day, using them with SP1 feels a lot like using a normal computer – just with an extra proof as output!

Now, if you’re itching to tinker with SP1 yourself, good news: SP1 is 100% open-source and available to play with. You can find the code on GitHub, including an examples directory full of cool programs that have been proven (from Fibonacci to light clients). The Succinct team has put together a handy developers portal on their website and detailed docs to get you started (check out the Succinct developers page and the SP1 documentation site for guides). Whether you want to write your first “Hello, ZK World” in Rust or dive into the deep end, those resources have you covered. And if you’re curious about the latest updates or design discussions, the Succinct Labs blog is a great place to peek – they regularly post about performance benchmarks, new features, and roadmap plans.

Speaking of which – remember those mysterious “precompiles” and the talk of SP1 being ridiculously fast? We hinted at special accelerator circuits and even threw out numbers like 20× improvement for certain operations. Stay tuned for Part 3, where we’ll uncover how SP1 turbocharges performance using precompiles (a.k.a. built-in cheat codes for the zkVM) to make proving blazing fast. In that next article, we’ll explore how heavy operations (like hashing, elliptic curve crypto, etc.) get superpowers in SP1, and how that leads to jaw-dropping speedups in proof generation. If you thought today’s deep dive was fun, the next one will be a real treat for the performance junkies out there.

Until next time, keep coding, keep dreaming, and as always – keep it succinct and zk-on!

Written by alexanderblv for Succinct, May 2025 x.com/alexander_blv ERC20 - 0x1e1Aa06ff5DC84482be94a216483f946D0bC67e7