September 29

Leo Compiler: From Source Code to Arithmetic Circuits

Aleo’s Leo is a high-level, Rust-like programming language designed for writing provable programs backed by zero-knowledge succinct proofs. Unlike traditional compilers, the Leo compiler not only emits R1CS circuits but also formally verifies each step: at each stage it generates machine-checkable proofs (via ACL2) that the transformation was performed correctly. This makes Leo unique among ZK languages, and the first known language to combine a testing framework, package manager, remote compiler, and theorem prover for general-purpose ZK applications. In practice, developers write familiar constructs (functions, loops, structs, etc.) in Leo, and the compiler lowers them through multiple phases into efficient Rank-1 Constraint Systems (R1CS) for proving. We examine these compiler stages in detail, show how high-level Leo code turns into circuits, and compare Leo’s approach to that of Circom and ZoKrates.

Compilation Pipeline Overview

The Leo compiler pipeline is a sequence of well-defined phases that gradually transform Leo source text into an R1CS circuit. At a high level it works as follows: the parser reads Leo code (using a PEG grammar) and produces a Grammar AST. This is immediately converted into a cleaner AST (removing punctuation, comments, etc.). The AST is then transformed into an Abstract Semantic Graph (ASG), enriching nodes with context (scopes, types, parent functions) to support further analysis. Next, a canonicalization pass simplifies syntax (e.g. resolving aliases and ensuring a unique representation of types). Then the type checker and inference engine assign explicit types to all variables and expressions (in Leo, every program must have fully specified types before R1CS generation). Finally, the program is optimized and synthesized into a circuit, as described below.

In summary, the Leo pipeline flows Source Code → (Parsing) → Grammar AST → (AST conversion) → AST → (ASG conversion) → ASG → (Canonicalization) → ASG → (Type Checking) → ASG → (Optimizations) → ASG → (Circuit Synthesis) → R1CS. (Each “→” represents a transformation phase, with intermediate artifacts.) In practice this means Leo code is first checked for syntactic and basic semantic correctness (parsing, AST, ASG), then semantically normalized, then optimized, and finally compiled into constraints.

Parsing and AST Conversion

Parsing. The compiler begins by reading Leo code text and tokenizing it according to a parsing expression grammar (PEG). Leo’s grammar is unambiguous: if a file parses, it has a unique parse tree. Lexical and syntactic errors are detected immediately by the PEG parser. (Leo’s formal development even includes a verified ABNF grammar to prove correct parsing.) This phase outputs a Grammar AST capturing all syntactic elements (including punctuation, keywords, etc.).

AST conversion. Immediately after parsing, Leo converts the grammar AST into a clean Abstract Syntax Tree (AST). The AST strips out superfluous nodes (e.g. semicolons, raw syntax tokens, comments) and restructures the parse tree into higher-level nodes (e.g. combining the tokens of a declaration into a single declaration node). For example, parsing the declaration

function add(a: u32, b: u32) -> u32 {
    return a + b;
}

yields a Grammar AST with separate tokens for function, identifier, (, ), etc. The AST conversion would then produce a node like FunctionDecl(name="add", params=[("a",u32),("b",u32)], returnType=u32, body=Block(...)). No semantic checks are done here; the goal is simply to reorganize the tree. The compiler also sets up an error-reporting framework at this stage (annotating nodes with source spans).

At the end of parsing and AST conversion, we have an AST that exactly represents the program’s structure. For instance, the AST for a simple code snippet might look like:

Program(
  Transition(name="place_bid", params=[("bidder",address),("amount",u64)], returnType=Void,
    Block([
      LetStmt(var="highest_bid", type=u64, value=<expr>),
      IfStmt(cond=<expr>, then=Block([...]), else=Block([...])),
      ReturnStmt(expr=Var("highest_bid"))
    ])
  )
)

Each node records identifiers, literal values, and nested blocks. This AST still closely mirrors the source text, with high-level constructs such as IfStmt and ForLoop intact.

ASG Conversion: The Abstract Semantic Graph

ASG conversion. Next, the compiler transforms the AST into an Abstract Semantic Graph (ASG). Unlike the tree-shaped AST, an ASG is a graph where nodes are connected by semantic relationships. In practice, this means linking each variable and function call to its declaration, typing each literal, and annotating scopes and types on every node. For example, each occurrence of a variable x in the ASG points back to the single LetStmt(var="x", ...) that defines it. Each expression node in the ASG carries type information (if known) and a pointer to its parent function and circuit. This richer structure makes it much easier to analyze semantics: the ASG can detect undefined variables, duplicate declarations, or attempts to mutate immutable variables.

Concretely, the AST-to-ASG pass might convert the tree above into something like:

LetDecl(name="highest_bid", type=u64, value=Call(func="max", args=[Var("a"),Var("b")]),
        children=[], parent=Transition("place_bid"))
If(cond=BinOp(">", Var("a"),Var("b")),
   then=Block([ ... ]), else=Block([ ... ]),
   children=[Var("a"),Var("b")],
   parent=Transition("place_bid"))

where each node has links to its children and context. Crucially, the ASG can be converted back to an AST, so that the compiler can serialize it for proof-checking. But the ASG itself is more powerful: it supports complex constructs (cycles via variables referencing definitions, context from parent functions, etc.) that a tree cannot. Errors found during AST→ASG conversion (such as unknown identifiers or illegal mutations) immediately abort compilation.

Canonicalization

After building the ASG, Leo performs canonicalization to simplify syntactic forms. This phase enforces a normalized representation of types and operations so that later phases don’t have to worry about multiple equivalent syntaxes. For example (depending on the precise rules in Section 3.3.1 of the whitepaper), canonicalization might ensure that all integer types are represented in a single form, or that certain syntactic sugar (like implicit casts) is rewritten to core constructs. Because all types are then in canonical form, the type checker can simply compare type nodes for equality.

As an illustration, suppose the AST used a custom type alias or a shorthand notation; canonicalization would replace these with the fully expanded core types. Likewise, if the program used any deprecated syntax (which the grammar might already flag earlier), canonicalization could rewrite it to the current form. In practice this pass is relatively straightforward and mostly bookkeeping: its main role is to prepare for type inference and to simplify reasoning, both for the compiler and for the formal proofs. (The ACL2 formalization of Leo includes a precise definition of canonicalization, and the compiler emits a proof that its output matches the specification.) A small before-and-after sketch is shown below.
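
The following fragment illustrates the kind of rewrite canonicalization performs. This is a hedged sketch: the exact set of rewrites depends on the compiler version, and the compound-assignment expansion shown here is only an assumed example of the syntactic sugar such a pass removes.

// Before canonicalization (fragment, shown without the surrounding program block):
let mut total: u64 = 0u64;
total += amount;            // compound assignment: syntactic sugar (assumed example)

// After canonicalization, only the core assignment form remains:
let mut total: u64 = 0u64;
total = total + amount;     // sugar expanded into an explicit addition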

Type Checking and Inference

Next comes type checking/inference. Leo is statically typed: every variable and expression must have a type known at compile time. The compiler verifies that operations are applied to compatible types (e.g. adding two u64 is allowed, adding a u64 and a bool is not). Leo supports implicit typing in some situations, so the compiler infers types where they are omitted. For example, in a function return statement or a constant declaration, if the type is not explicitly written, it is inferred from context (function signature, constructor types, etc.). By the end of this phase, every variable and expression in the ASG has an explicit type node attached. If any type errors remain (unified types fail to match, missing type info, etc.), compilation fails with an error message telling the developer where to insert an explicit type.

A key guarantee is: before generating R1CS, all types are fully explicit and checked. This matches Section 3.3.2 of the Leo specification. For example, if a programmer writes let x = a + b; without specifying : u32, the compiler will infer that type from a and b. If a and b were of type u32, x is inferred u32. If a and b had mismatched types, a compile-time error is raised. The compiler emits a proof that the resulting AST is a correct type-inference solution to the input AST.
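
A minimal sketch of how inference plays out in practice (the program name and bindings below are made up for illustration):

program infer_demo.aleo {
    transition compute(a: u32, b: u32) -> u32 {
        let x = a + b;             // no annotation: x is inferred to be u32 from a and b
        let doubled: u32 = x + x;  // an explicit annotation must agree with the inferred type
        return doubled;            // the return type is checked against the signature (u32)
    }
}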

Optimizations: Folding, Inlining, Unrolling

With types resolved, Leo performs high-level optimizations to simplify the program before circuit generation. The three main optimizations are as follows (a combined before-and-after sketch follows the list):

  • Constant folding and propagation: Any expression whose operands are all constant values is evaluated at compile time. For instance, let y = 3u8 + 5u8; becomes let y = 8u8;. After folding, the compiler replaces uses of that constant and propagates it into other expressions: e.g. let z = y * 2u8; becomes let z = 16u8;. Folding also surfaces dead or always-failing code (such as a constraint that can never be satisfied) as early errors. Multiple passes of folding and propagation may be needed: one pass can reveal new constant expressions, which are then folded again.
  • Loop unrolling: Loops in Leo must have a statically known bound. For example, for i: u8 in 0u8..4u8 { ... } always iterates exactly 4 times. The compiler replaces such a loop with multiple copies of the loop body, one per iteration, which “flattens” control flow into straight-line code. For instance, let sum: u64 = 0u64; for i: u8 in 0u8..4u8 { sum = sum + arr[i]; } becomes, after unrolling, let sum: u64 = 0u64; sum = sum + arr[0u8]; sum = sum + arr[1u8]; sum = sum + arr[2u8]; sum = sum + arr[3u8];. Since i is a constant on each iteration, each copy of the loop body uses the corresponding literal index. (Leo’s docs note that array indices must be constant expressions, so this unrolling is always possible.) As a result, no loops remain in the AST after this pass.
  • Function inlining: Similarly, user-defined functions and circuit member functions are inlined at their call sites: every call foo(x,y) is replaced by the body of foo, with parameters substituted by the arguments. After inlining, no function calls remain in the AST; all user functions are expanded at compile time.
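
Putting the three passes together, the sketch below shows a before-and-after view. It is illustrative only: max4 is a hypothetical helper used to demonstrate inlining, and the real compiler’s output may differ in detail.

// Before optimization (fragments, shown without the surrounding program block):
function max4(x: u64, y: u64) -> u64 {
    return x > y ? x : y;
}

let mut best: u64 = 0u64;
for i: u8 in 0u8..4u8 {
    best = max4(best, arr[i]);
}
let scale: u64 = 2u64 * 4u64;            // constant expression

// After constant folding, loop unrolling, and inlining (conceptually):
let mut best: u64 = 0u64;
best = best > arr[0u8] ? best : arr[0u8];
best = best > arr[1u8] ? best : arr[1u8];
best = best > arr[2u8] ? best : arr[2u8];
best = best > arr[3u8] ? best : arr[3u8];
let scale: u64 = 8u64;                   // folded at compile time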

Because these optimizations remove high-level constructs (loops and functions), they “flatten” the program into a straight-line sequence of primitive operations. This flattening is crucial: the final circuit generator only handles basic arithmetic and boolean gadgets, not complex control flow. Leo may run these optimizations repeatedly until a fixpoint is reached (no new constant expressions appear). Notably, the compiler generates a proof after each optimization pass, so we end up with a chain of proofs guaranteeing that the final flattened AST is semantically equivalent to the original program. The final proof also asserts that the resulting AST has no loops, no function calls, and only statically known array sizes.

Circuit Synthesis (R1CS Code Generation)

The final phase is circuit synthesis: translating the optimized ASG into an R1CS relation. Leo has a library of R1CS gadgets, each one encoding a primitive operation as fixed constraints. For example, a gadget for addition outputs one R1CS constraint enforcing z = x + y. A gadget for boolean checks enforces a variable is 0 or 1 (e.g. b*(b-1)=0). To compile the ASG, Leo walks the graph of operations and substitutes each primitive with its gadget. This leverages the notion of “handcrafted circuits”: instead of generating thousands of low-level multiply-add gates manually, Leo reuses tested components.

Practically, this means: every arithmetic expression like a + b becomes a constraint a + b - t1 = 0 (introducing a new witness t1 for the result), every multiplication a * b yields a constraint a * b - t2 = 0, and boolean comparisons become constraints over binary values. A conditional like if cond { X } else { Y } is handled by introducing a boolean selector p for cond, constrained by p*(p-1) = 0, and computing the output with a selector: result = p*X + (1-p)*Y. In the R1CS language, each of these relations (flattened into linear combinations with one multiplication term per constraint) is added to the circuit.
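
To make this mapping tangible, here is a tiny Leo program with rough constraint shapes noted in comments. This is a hedged sketch: the program name is made up, and the actual gadget output and constraint counts depend on the compiler’s gadget library.

program mul_add_demo.aleo {
    transition mul_add(x: field, y: field, z: field) -> field {
        let t: field = x * y;   // one rank-1 constraint of the form x * y - t = 0
        let r: field = t + z;   // addition: a constraint t + z - r = 0, or folded into
                                // the linear combination of a later constraint
        return r;               // the output wire is constrained to equal r
    }
}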

The outcome is a complete R1CS relation that is logically equivalent to the original Leo program: for every valid input, the constraints have a satisfying assignment of intermediate “wire” values. Leo’s compiler can then output this R1CS in serialized form for a SNARK prover, and it provides tooling to count how many constraints were generated (letting the developer measure cost). For example, the u8 type is represented as 8 boolean wires under the hood, and every operation on a u8 expands into constraints over those wires; a narrower type such as a hypothetical u4 would need only 4 boolean wires and correspondingly fewer constraints. The bit-width of a type thus directly determines its constraint cost.

Finally, the Aleo platform uses a Marlin-based universal SNARK for proving. Leo’s circuit synthesis supports this by structuring the circuit for a universal structured reference string (SRS): instead of needing a fresh trusted setup per circuit, Leo compiles every program against a single universal SRS that can be updated by any party. In summary, the compiler produces an R1CS circuit together with proofs of correct compilation; the Leo developer then uses Aleo’s proving system to generate and verify zero-knowledge proofs of program executions.

Example: High-Level Leo to R1CS

To see this process in action, consider a concrete Leo snippet. Suppose we have a transition (program) that computes a conditional sum of an array:

program sum_example.aleo {
    // Compute either a constant or the sum of an array
    transition compute(a: u64, b: u64, arr: [u64;4]) -> u64 {
        let mut total: u64 = 0u64;
        if a > b {
            // unrolled loop will iterate from 0..4u8
            for i: u8 in 0u8..4u8 {
                total = total + arr[i];
            }
        } else {
            total = b;
        }
        return total;
    }
}

Parsing/AST. The parser reads the above text into a Grammar AST; the AST conversion then produces a tree roughly like:

TransitionDecl(name="compute", params=[("a",u64),("b",u64),("arr",[u64;4])], returnType=u64,
  Block([
    LetStmt(var="total", type=u64, value=0u64, mutable=true),
    IfStmt(cond=BinOp(">", Var("a"),Var("b")),
      then=Block([
        ForLoop(init=("i",u8,0u8), end=4u8, 
          body=Block([
            Assignment(var="total", value=BinOp("+", Var("total"), Access(arr, Var("i"))))
          ])
        )
      ]),
      else=Block([
        Assignment(var="total", value=Var("b"))
      ])
    ),
    ReturnStmt(expr=Var("total"))
  ])
)

This AST captures the high-level structure: a let-binding, an if, a for, and return. At this stage, no changes have been made except cleaning up syntax.

ASG. Converting to the ASG adds cross-links: each use of a, b, arr, total, and i is linked to its declaration. The >, +, and array access are tagged with expected types (u64 for the sum, u8 index, etc.). For example, the ForLoop node knows that i: u8 ranges from 0 to 4. If we imagine part of this ASG, the Var("total") nodes in the loop body point back to the LetStmt("total"). No semantic errors occur since types are consistent (e.g. arr[i] is valid: i is u8, arr is [u64;4]).

Canonicalization. After canonicalization, the AST/ASG looks the same (types were already explicit). We now know the loop bound is the constant 4, and types match syntactic expectations.

Type Inference. In this snippet all types were given (u64, u8, etc.), so type inference simply confirms consistency. One check: 0u64 is a u64 literal, so total: u64 is correct. The compiler verifies no types are missing or mismatched.

Optimizations. Now we optimize. The loop for i in 0..4 has a constant bound 4, so we unroll it. The compiler replaces the loop with four copies of its body (for i=0,1,2,3). The code becomes:

let mut total: u64 = 0u64;
if a > b {
    total = total + arr[0u8];
    total = total + arr[1u8];
    total = total + arr[2u8];
    total = total + arr[3u8];
} else {
    total = b;
}
return total;

Here, no loops remain, and each index 0u8 through 3u8 appears as a literal constant. There are also no function calls to inline. Constant folding does little in this example (total starts at 0, and nothing else is constant). All u8 and u64 types remain the same.

ASG after optimization. The ASG now has four consecutive Assignment("total", total + arr[i]) nodes (with i replaced by concrete constants). The IfStmt still exists at this level.

Circuit Synthesis. Finally, Leo emits constraints. It introduces a boolean witness p for the comparison a > b using a comparison gadget. As a simplified view: compute t1 = a - b, constrain p * (p - 1) = 0 (forcing p to be 0 or 1), and constrain p to be 1 exactly when a > b (real comparison gadgets do this by decomposing the operands into bits, but the difference t1 conveys the idea). Then, for the assignments inside the if: on the “then” branch, total accumulates arr[0] + arr[1] + arr[2] + arr[3]; on the “else” branch, total = b. The compiler enforces a final selection constraint

total = p*(arr[0]+arr[1]+arr[2]+arr[3]) + (1-p)*b.

In R1CS form this becomes linear equations. For example, writing a few constraints:

t1 = a - b
p*(p-1) = 0                    // p is boolean 0/1
total_then = arr[0] + arr[1] + arr[2] + arr[3]
total = p * total_then + (1-p) * b

(Each equation would be turned into one or more rank-1 constraints by introducing intermediary variables.) What matters is that the same logic has been captured by algebraic constraints. The boolean p effectively selects the path, and the sum of the array elements is computed by four additions (one per unrolled step). In practice, Leo’s gadget library might do this slightly differently (e.g. adding one by one), but the net result is the same. The final R1CS encodes both branches of the if and ensures total is correct.
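
Equivalently, the constraints above correspond to the following straight-line, Leo-level view of the branch (a hedged sketch using a ternary select; this form lives inside the compiler rather than in the visible source):

let p: bool = a > b;                                               // comparison gadget
let total_then: u64 = arr[0u8] + arr[1u8] + arr[2u8] + arr[3u8];   // unrolled sum
let total: u64 = p ? total_then : b;                               // total = p*then + (1-p)*else
return total;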

In summary, the original high-level code with a loop and branch is lowered to straight-line arithmetic relations. Each high-level construct becomes a small network of constraints: comparison gadgets for >, addition gadgets for +, boolean gadgets for if, etc. This example illustrates the general pattern: complex control flow is flattened, and only basic arithmetic/logic remains at circuit time.

Leo vs Circom vs ZoKrates

Leo’s compilation model shares goals with other ZK DSLs like Circom and ZoKrates, but there are important differences in design and developer experience. Below we compare along several dimensions:

  • Language Paradigm: Leo is a general-purpose, Rust-like language with rich features: functions, structs, records, tuples, static arrays, and so on. Circom is a domain-specific “circuit” language: programs are built from templates (components) that define constraints, a single main component defines the circuit, and Circom’s function construct computes values at compile time rather than declaring constraints. ZoKrates is an imperative DSL: it has functions, mutable variables (with mut), for loops, structs, and static arrays, but no objects and only limited generic support.
  • Control Flow and Loops: Leo supports nested loops and conditionals in code, but loop bounds must be statically known (so all loops can be unrolled). Circom allows for loops and if statements whose bounds and conditions are known at compile time, and larger structures are built by instantiating templates repeatedly. ZoKrates allows for-loops with constant bounds (e.g. for u32 i in 0..5 { ... }) and conditional expressions. All three effectively unroll loops: Leo explicitly as a compiler pass, ZoKrates as part of compilation, and Circom while expanding templates and loops into constraints.
  • Type System: Leo has a strong static type system with inference: types must be fully explicit by the time R1CS is generated, and the compiler infers missing annotations (emitting errors if they are ambiguous). It includes field elements, integers (u8, u32, etc.), booleans, addresses, and user-defined structs/records. ZoKrates also has a static type system, with field, bool, and fixed-width unsigned integers (with overflow semantics), plus static arrays and structs. Circom’s type system is more limited: it has signals (wires) marked public or private, and all data is ultimately field elements; it allows arrays of signals (and, in newer versions, tuples) but no user-defined structs (as of Circom 2). Importantly, Leo’s AST and ASG passes are formally verified to respect the type rules, whereas the Circom and ZoKrates compilers are not formally checked.
  • Optimizations: All three compilers perform similar optimizations (constant folding, dead code elimination, etc.), but the tooling differs. Circom’s compiler has built-in constraint simplification: its --O2 mode propagates constants and eliminates linear constraints directly on the constraint system, so unused wires can disappear at compile time. ZoKrates similarly folds constants and unrolls loops while lowering its IR to a circuit, though it exposes fewer optimization knobs. Leo explicitly documents its optimization passes: constant folding, propagation, unrolling, and inlining run until a fixpoint. Because Leo emits a proof for each optimization step, it can guarantee those transforms are semantics-preserving. In contrast, Circom and ZoKrates do not produce such proofs (and Circom’s aggressive simplification has occasionally removed constraints that developers intended to keep, as discussed in the community).
  • Proof System Integration: Leo is built for the Aleo stack, which uses a Marlin-based universal SNARK with an updatable SRS. The Leo compiler generates circuits that work with Aleo’s proving setup: developers rely on a single universal trusted setup across all programs. Circom is commonly paired with Groth16 proofs (e.g. via snarkjs and EVM verifiers), though newer tooling also supports PLONK. ZoKrates supports multiple schemes (e.g. Groth16, GM17, Marlin) and can output Solidity verifiers for Ethereum. Leo’s distinguishing feature here is formal: its entire pipeline is designed around formally verified compiler steps, whereas the Circom and ZoKrates compilers are not.
  • Developer Experience: Leo provides a modern developer toolkit: a Cargo-like package manager for Leo packages, a local/remote compiler, an official playground, and built-in testing utilities. Circom has a simpler setup (each circuit is a .circom file compiled with command-line tools); it has a library of template components (circomlib) but no dedicated package system. ZoKrates offers an interactive CLI and a browser IDE and defines a JSON ABI, but its compiler is not formally verified. Leo’s Rust-inspired syntax and formal tooling can be more approachable for experienced developers, while Circom’s “hardware description” style is more specialized.

Overall, Leo aims for expressiveness and correctness: developers can write familiar high-level code, rely on formal checks, and still target optimized SNARK circuits. Circom and ZoKrates sacrifice some high-level convenience for simplicity: they expose a more direct view of constraints but lack Leo’s proven guarantees.

Practical Insights for ZK Developers

For developers, using Leo offers both high-level abstraction and insight into how code maps to proofs. Because Leo is a compiled language with Rust-like syntax, experienced programmers can leverage loops, functions, and types instead of manually wiring gates. At the same time, Leo’s compile-time feedback is rigorous: type errors or unsupported patterns are caught early, and developers see how constants and control flow are handled. For example, knowing loops will be unrolled encourages writing loops only when bounds are small. The built-in testing framework lets developers write unit tests for transitions without crafting proofs manually.

Working with Leo deepens one’s understanding of zero-knowledge circuits. Seeing a high-level conditional turn into a boolean selector gadget, or observing how an array sum yields multiple addition constraints, helps developers internalize the cost model of ZK proofs. Leo’s tooling even lets you count the constraints generated for a given code snippet, so you can compare different implementations and optimize. In short, Leo hides the low-level constraint boilerplate, but doesn’t obscure it: you can always audit the generated R1CS or witness code if needed. This balance of abstraction and transparency makes Leo a strong pedagogical tool.

Moreover, Leo is tightly integrated with the Aleo ecosystem: compiled programs can be deployed to Aleo’s privacy-focused network, where proofs are verified natively on-chain. The universal SNARK setup means developers don’t need a separate trusted-setup ceremony for each circuit. Learning Leo thus directly builds skills applicable to secure application development on Aleo (and, by extension, to understanding ZK-SNARK systems in general).

Finally, Leo’s formal approach raises confidence: every Leo program comes with a machine-checked guarantee that its compiled circuit matches the source code. In an industry where trust in compiler correctness is crucial, this is a distinctive advantage. For ZK practitioners, Leo offers a mature, well-engineered platform where one can focus on application logic while the compiler handles the heavy lifting of circuit construction and verification.

In summary, the Leo compiler provides a clear, rigorous pathway from high-level code to arithmetic circuits. Its multi-stage pipeline (parsing → AST → ASG → optimized ASG → R1CS) ensures both efficiency and correctness. By comparison, Circom and ZoKrates offer lighter-weight, DSL-centric toolchains. Developers choosing Leo benefit from its expressiveness, optimization, and formal guarantees, making it an attractive entry into the world of zero-knowledge programming and the growing Aleo ecosystem.

Written by alexanderblv for Aleo, September 2025

x.com/alexander_blv

ERC20 - 0x1e1Aa06ff5DC84482be94a216483f946D0bC67e7