Types That Depend on Values

The deeper ideas of Chapter 2 of Theorem Proving in Lean 4, picking up where Lean by Example deliberately stopped.

One question runs under Chapter 2: what changes about a type system when a type can depend on a value? The short answer is expressive power. Properties other languages enforce at runtime (a list's length, an index's bound, a number's positivity) live in the type itself, and the type checker becomes a proof-checker for those properties.

From List α to Vector α n

TPIL4 §2.8

Lean follows the standard type-theory convention of using lowercase Greek letters (α, β, γ) as type variables. With that in mind:

The type List α is polymorphic: it takes a type argument, and α picks the element type. Lists of Nat, lists of Bool, lists of String all come from the same schema, with α deciding which.

#check [1, 2, 3]
  | [1, 2, 3] : List Nat
#check [true, false]
  | [true, false] : List Bool

This is already a small instance of a type depending on a parameter: List Nat and List Bool are distinct types, and the parameter α decides which. The book opens §2.8 by pointing at exactly this.

"The short explanation is that types can depend on parameters. You have already seen a nice example of this: the type List α depends on the argument α, and this dependence is what distinguishes List Nat and List Bool."

But notice what List α does not distinguish. A list of three Nats and a list of three thousand Nats have the same type: List Nat.

#check [1, 2, 3]
  | [1, 2, 3] : List Nat
#check [1, 2, 3, 4, 5]
  | [1, 2, 3, 4, 5] : List Nat

The length of the list is not visible to the type checker. If a function expects a list of exactly five, the check happens at runtime, the failure is handled somehow, and you hope no caller was missed.
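For contrast, here is what the runtime version of that check looks like, with a hypothetical takeFive that wants exactly five elements:

```lean
-- hypothetical runtime version: the length check happens when the
-- function runs, and every caller must handle the failure case
def takeFive (xs : List Nat) : Option (List Nat) :=
  if xs.length == 5 then some xs else none
```

Nothing stops a caller from passing a list of three; the type checker is not involved, and the Option has to be threaded through everywhere.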

Imagine instead a type that does carry the length. Call it Vector α n: the type of vectors with elements of type α and length exactly n. The type takes two parameters, a type α and a natural number n. The real definition in mathlib has shifted over time; the sketch below is the shape, not the literal source:

def Vector (α : Type) (n : Nat) : Type := ...

The jump: one of the two parameters is a value. α is a type; n is a Nat. Vector Nat 3 and Vector Nat 5 are now different types, even though the element type is the same. A type that takes a value parameter is said to be indexed by that value.
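One way to realize that sketch is as an inductive family. The name Vec here is made up, to avoid colliding with the real definition; the point is the index, not the exact source:

```lean
inductive Vec (α : Type) : Nat → Type where
  | nil  : Vec α 0
  | cons : {n : Nat} → α → Vec α n → Vec α (n + 1)

-- the length is part of the type: this term has type Vec Nat 2
#check Vec.cons 1 (Vec.cons 2 .nil)
```

Each use of cons bumps the index by one, so the length is computed by the type checker as the term is built.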

The payoff shows up in the constructors. Given the sketch above, the cons that prepends an element to a vector would have this signature:

Vector.cons :
  {α : Type} →
  {n : Nat} →
  α →
  Vector α n →
  Vector α (n+1)

Read the return type. Vector α (n+1). Not just any vector: a vector of length one more than the input vector. The (n+1) in the return type is a real expression. The argument n flows into it, and the result is a type computed from that argument.

That is the move. The type on the right of the final arrow is computed from the values of the arguments to the left. List couldn't say "output length = input length + 1" because List didn't have a length to talk about. Vector does, because it built the length into the type.

Pi types: when the return depends on the input

TPIL4 §2.8

The (n+1) in Vector.cons's return type is an instance of a general pattern. When the type on the right of an arrow is computed from the value on the left, the arrow is called a dependent function type, or a Pi type. The name comes from a mathematical notation: Π (a : α), β a with the Greek letter Π. Lean's surface syntax uses the arrow form.

The shape is (a : α) → β a. The a is an ordinary argument of type α. The β a on the right is the return type, and it mentions the argument a. β is a function from α to types: a family of types indexed by α.

Read (a : α) → β a out loud: "for each a of type α, give me a value of type β a." The return type varies with the specific a.

When β doesn't reference a (when the return type is the same for every input), the Pi collapses back to the ordinary function arrow. α → β is notation for (a : α) → β where β happens not to mention a. Every ordinary function type is a degenerate Pi.
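A Pi type where β genuinely varies, using made-up names Family and pick:

```lean
-- a family of types indexed by Bool
def Family : Bool → Type
  | true  => Nat
  | false => String

-- a dependent function: the return type is computed from the argument
def pick : (b : Bool) → Family b
  | true  => 0        -- Family true reduces to Nat
  | false => "zero"   -- Family false reduces to String
```

The two branches return values of different types, and that is fine: the signature promised Family b, not any one fixed type.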

With the name in hand, a signature Lean by Example showed without fully unpacking now reads differently:

#check @List.cons
  | @List.cons : {α : Type u_1} → α → List α → List α

This is a Pi type where the dependency is on the first argument, α. The types of the subsequent arguments and the return type all mention α, the value passed in first. Polymorphism, the thing where List.cons works on any element type, is exactly this: a Pi type whose first argument happens to be a Type. The move Vector α n made in the first section is to let the dependency be on an ordinary Nat instead, or on anything else a value can be.
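To see the same shape with a plain Nat in the dependent position, here is a hypothetical Tuple family, where Tuple α n is the n-fold product of α:

```lean
-- Tuple α n: Unit, α × Unit, α × α × Unit, ...
def Tuple (α : Type) : Nat → Type
  | 0     => Unit
  | n + 1 => α × Tuple α n

-- (n : Nat) → Tuple Nat n: the return type is computed from n
def zeros : (n : Nat) → Tuple Nat n
  | 0     => ()
  | n + 1 => (0, zeros n)
```

zeros 3 and zeros 5 have different types, computed from the argument, which is exactly the pattern Vector.cons exhibited.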

Sigma types: when the second component's type depends on the first

TPIL4 §2.8

Products generalize the same way arrows did. The ordinary pair α × β holds a value of type α with an independent value of type β. The dependent pair (a : α) × β a lets the second component's type depend on the first component's value.

An ordinary pair:

#check (5, "hello")
  | (5, "hello") : Nat × String

The String type makes no reference to 5. Swap the 5 for any other Nat and the second component's type doesn't change.

A dependent pair bundles a value with something whose type mentions it. The canonical example uses Fin n, Lean's type of natural numbers strictly less than n. The inhabitants of Fin 5 are 0, 1, 2, 3, 4. Now bundle a size with a valid index into a collection of that size:

(n : Nat) × Fin n

The second component's type is computed from the first. A pair of 3 and 2 is well-typed (since 2 < 3); a pair of 3 and 4 is not, because 4 is not in Fin 3. The type checker rules the mismatch out before any runtime check would.

Construction uses the anonymous-constructor syntax ⟨a, b⟩:

def p : (n : Nat) × Fin n := ⟨3, 2⟩

Under the hood, the dependent-pair type is Lean's Sigma; (a : α) × β a is surface syntax for Sigma (fun a => β a). The non-dependent form is Prod. Most code uses the angle-bracket syntax and doesn't name either.
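The projections of the p defined above make the dependency visible; the type of the second projection mentions the first:

```lean
#check p.fst   -- : Nat
#check p.snd   -- : Fin p.fst
```

In an ordinary Prod, the type of .snd never mentions .fst; in a Sigma, it can, and here it does.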

As with Pi, there's a Greek-letter form: Σ a : α, β a. And as with Pi, the ordinary product is the degenerate case: α × β is (a : α) × β where β doesn't mention a.

Reach for × when the components are independent. Reach for (a : α) × β a when the second component's type needs to mention the first component's value.

The universe hierarchy

TPIL4 §2.2

Once types are first-class values, a question arises that simpler type systems don't have to ask. If Nat : Type, what does Type have on its right-hand side when you #check it?

#check Nat
  | Nat : Type
#check Type
  | Type : Type 1
#check Type 1
  | Type 1 : Type 2

Lean answers with a tower. Type (sometimes written Type 0) has type Type 1; Type 1 has type Type 2; and this continues forever. There is a Type n for every natural number, each one living in the next one up.

The reason the hierarchy exists is paradox avoidance. A naive Type : Type lets you form a type that, in effect, contains itself, and the system derives a contradiction: Girard's paradox, the type-theoretic cousin of Russell's set-of-all-sets trouble. Stratifying into Type 0, Type 1, Type 2, and so on rules the self-reference out structurally.

In practice you almost never write a universe level by hand. Polymorphic definitions like List are universe polymorphic, with a signature that reads List.{u} (α : Type u) : Type u, where u is a universe variable. Lean infers u at each use. The universe u command opts in explicitly when inference needs help, but most of the time the hierarchy is invisible.
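A small demonstration with a made-up myId; the same definition is used at two different levels, and Lean solves for u each time:

```lean
universe u

-- one definition, polymorphic over the universe level u
def myId {α : Type u} (x : α) : α := x

#check myId 5      -- here u := 0 and α := Nat
#check myId Nat    -- here u := 1 and α := Type
```

The second #check is the interesting one: the argument is itself a type, so α must live one level up, and Lean bumps u accordingly without being told.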

The one-sentence answer to "why can't Type have type Type?" is: because it would contain itself and make the system inconsistent. The hierarchy is a structural workaround, not a feature you reach for.

Elaboration: solving for the arguments you didn't type

TPIL4 §2.9

The type system the previous sections describe would be unusable in practice if every call had to spell out every argument. Vector.cons's signature names α, n, the element, and the tail. In practice you write Vector.cons x v, not Vector.cons Nat 2 x v. Curly braces in a definition mark arguments as implicit, and Lean infers them. This section is about what "infers them" actually means.

The mechanism has a name: unification. When Lean elaborates a call, it creates a placeholder (called a metavariable) for each implicit argument, walks the expression to collect constraints of the form "this type must equal that type," and solves the system by substituting the solutions back in.

The smallest example is ident, with one implicit α:

def ident {α : Type} (x : α) : α := x

#check ident 5
  | ident 5 : Nat

The call ident 5 forces Lean to unify the argument type α with the type of 5, which is Nat. The solution is α := Nat, and the whole expression has type Nat.

Sometimes context supplies enough information; sometimes it doesn't. A bare #check List.nil prints the polymorphic signature as-is:

#check List.nil
  | List.nil.{u} {α : Type u} : List α

Wrapping it in parentheses forces Lean to elaborate it as a term, and the unresolved metavariable becomes visible:

#check (List.nil)
  | [] : List ?m.1

The ?m.1 is a metavariable that never got resolved; Lean could not determine α from context. Two escape valves pin it down. Type ascription (List.nil : List Nat) feeds the expected type backwards. The @ form @List.nil Nat turns the implicit back into an explicit argument you can pass directly.
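Both escape valves, applied to List.nil:

```lean
-- ascription: the expected type flows backwards into the metavariable
#check (List.nil : List Nat)   -- [] : List Nat

-- @: implicits become explicit, so α can be passed positionally
#check @List.nil Nat
```

Either way, the metavariable for α ends up solved as Nat; the difference is only which direction the information arrives from.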

The short mental model: implicit arguments are not magic; they are constraint solving over metavariables. When the constraints pin everything down, you get a clean expression. When they don't, you see a ?m in the output, and that is your cue to add an ascription or an @. Lean by Example sketched the four related syntactic tools ({x : α}, _, @, (e : T)) without explaining what the elaborator does with them. This is what it does.
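The _ placeholder is the same machinery pointed the other way: it puts a metavariable where an explicit argument would go. With a hypothetical explicit-argument variant ident':

```lean
def ident' (α : Type) (x : α) : α := x

-- _ becomes a metavariable; unifying with 5 solves it as Nat
#check ident' _ 5
```

Curly braces insert the metavariable for you; _ inserts it by hand. Both feed the same unifier.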

Invariants in types

Back to where we started. Once a type can depend on a value, invariants can live in the type itself, and the type checker verifies them. The canonical example is appending two vectors:

Vector.append :
  {α : Type} →
  {n m : Nat} →
  Vector α n →
  Vector α m →
  Vector α (n + m)

The return type says "the length of the concatenation is the sum of the input lengths." That is not a comment or a runtime assertion; it is the signature the type checker is checking. A function claiming to concatenate two vectors but returning the wrong length fails to type-check, at definition time. The property moves from "hopefully correct" to "proved before it runs."
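A sketch of how such an append can be written against an inductive family (Vec is a made-up stand-in for the real Vector, and the sum is written m + n rather than n + m so each case lines up with how Nat addition reduces):

```lean
inductive Vec (α : Type) : Nat → Type where
  | nil  : Vec α 0
  | cons : {n : Nat} → α → Vec α n → Vec α (n + 1)

def Vec.append {α : Type} {n m : Nat} : Vec α n → Vec α m → Vec α (m + n)
  | .nil,       ys => ys                      -- m + 0 reduces to m
  | .cons x xs, ys => .cons x (xs.append ys)  -- m + (k+1) reduces to (m+k)+1
```

Change the cons case to xs.append ys, dropping the head, and the definition is rejected: the computed length no longer matches the declared one. That rejection is the invariant doing its job at definition time.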

This also opens the door to thinking of propositions as types. A claim like "n is positive" can be represented as a type whose inhabitants are proofs of that claim. A function returning a value of that type is a function that produces a proof. The dependent-type machinery from the previous sections is what makes this work: types can depend on terms, so a type can express a claim about specific values. That story is the subject of Propositions as Types, Annotated.

Worth sitting with: what would you want the type checker to verify for you that it currently can't? Buffer-bound invariants, schema matches, protocol states, authorization checks: once types can depend on values, those are all expressible. The question is what you'd reach for first.