Note: AI was used to edit this post. As a proof-of-human thought and input, I am also publishing the original draft here which was written fully before asking AI to workshop the post with me.
This post is aimed at Lean language beginners that are interested in writing proofs in Lean, but still feel lost when reading lean code.
A very simplified mental model of Lean is that at the core there are two systems:
- a very fancy functional programming language. The objects here are
Nat,List Nat, functions of those, etc
def n: Nat := 3
def l: List Nat := [1, 2, 3]
def f (n : Nat) := n + 1
The type system is fancy in the way that types have they own types (and so on recursively). But for the purposes
of this post we will just talk about the first layer Type (which is shorthand for Type 0)
#check Nat -- Nat : Type
#check List Nat -- List Nat : Type
#check Nat -> Nat -- Nat → Nat : Type
Finally, parametric polymorphism is very pleasingly homoiconic with rest of the system, meaning we can write generic parameters using exactly the same syntax for declaration or application (no <> or [] like Java or TypeScript.) So far this should be very familiar to people with exposure to typed FP.
function append (a: Type) (arr: List a) := arr ++ [a]
- a system to state and manipulate logical propositions about the objects of 1). By convension these are
named
h*which stands for hypothesis.
def n := 3
def hnpos : n > 2 := by decide
Again this very homoiconic - system 2) uses the same keywords for defining and declaring types as 1).
Notably, the type of the proposition is allowed to depend on variables. In the example above, <
is a fancy way of writing LT.lt n 3. On the surface it is a generic type like List a, note that n
is a variable of type Nat, unlike a which was of type Type. Allowing, such dependency is why
we say Lean is based on dependent types. Finally, if the logical proposition is the type, the object
of that type is the proof, in this case by decide. Don’t worry if you don’t understand what it does, there
are many ways to prove that 3 is bigger than 2. If the proposition was not true like n < 2 we can still state it,
but will have very hard time proving, i.e. filling out the other side of def h: n < 2 := ???.
Any statement from first order logic can be writen in Lean, n > 2, n = n, ∀ n: Nat, ∃ m: Nat, m * 2 = n, etc.
Again since all types have their own type, it is natural to ask what’s the type of those statements - it is Prop.
#check n > 2 -- n > 2 : Prop
#check n = n -- n = n : Prop
#check ∀ n: Nat, ∃ m: Nat, m * 2 = n -- ∀ (n : Nat), ∃ m, m * 2 = n : Prop
Props can be manupulated using the usual logical ways - creating conjuctions, disjunctions, implications, iff, etc.
def and_comm (p q: Prop) : p ∧ q ↔ q ∧ p := And.comm p q
You might see the ones above stated with theorem not def, which is more ideomatic and ergonomic. but the differences are minor.
The keyword example is like an anonymous theorem.
So objects of Type and Prop are the core ingredients of any Lean proof. Usually, one passes some number of objects, props about
those and tries to prove some new prop.
example (a b c n : Nat) (ha : a > 1) (hb : b > 1) (hc : c > 1) (hn : n > 2) : a ^ n + b ^ n ≠ c ^ n := by sorry
Ergonomics
I hope the above exposition convinces you that there is a minimal core to Lean. I like to think of it as the explicit model. It is nice a simple but sadly has bad ergonomics once you start doing non-trivial proofs. The argument lists become long and unwildly, so a lot of the complexity in Lean comes from adding different systems of implicits, inference, etc, to make it more ergonomic to write proofs in lean.
The situation reminds me of this quote from Wadler’s monad paper
Pure functional languages have this advantage: all flow of data is made explicit. And this disadvantage: sometimes it is painfully explicit.
In the rest of the post I will go over some of the niceties Lean overlays on this simple model of types and props. I will use the following very contrived example. Suspend disbeliev and imagine a much more complex proof as we will be doing various refactoring to make it “simpler” while it already is quite simple:
lemma t1 (n : Nat) (hn: n > 0) : n + 1 = 1 + n := by sorry
lemma t2 (n : Nat) (hn: n > 0) : n + n = 2 * n := by sorry
lemma t3 (n : Nat) (hn: n + 1 = 1 + n) : n + 2 = 2 + n := by sorry
theorem t4 (n : Nat) (hn : n > 0) : True := by
have h1 := t1 n hn
have h2 := t2 n hn
have h3 := t3 n h1
trivial
Basic inference
One can ask Lean to infer an expression by replacing it with _. Depending on the context it might
succeed or fail. In the example below n can be inferred from the h*.
theorem t4 (n : Nat) (hn : n > 0) : True := by
have h1 := t1 _ hn
have h2 := t2 _ hn
have h3 := t3 _ h1
trivial
Inferred Arguments declaration
Instead of asking every user of the theorem h1 to write _ the author of the theorem can flip the script
and declare the argument as always inferable, since they know that n is present in the dependent type of hn.
To do that we use {} curly brackets.
lemma t1 {n : Nat} (hn: n > 0) : n + 1 = 1 + n := by sorry
lemma t2 {n : Nat} (hn: n > 0) : n + n = 2 * n := by sorry
lemma t3 {n : Nat} (hn: n + 1 = 1 + n) : n + 2 = 2 + n := by sorry
theorem t4 (n : Nat) (hn : n > 0) : True := by
have h1 := t1 hn
have h2 := t2 hn
have h3 := t3 h1
trivial
If the inference gets in the way one can always explicitly overwrite it with t1 (n := ...) ....
Variable keyword
The next step for ergonomics is noticing that {n : Nat} repeats in every t* theorem. Lean
allows you declare a variable for all declarations in an namespace using the variable keyword.
variable {n: Nat}
lemma t1 (hn: n > 0) : n + 1 = 1 + n := by sorry
lemma t2 (hn: n > 0) : n + n = 2 * n := by sorry
lemma t3 (hn: n + 1 = 1 + n) : n + 2 = 2 + n := by sorry
theorem t4 (n : Nat) (hn : n > 0) : True := by
have h1 := t1 hn
have h2 := t2 hn
have h3 := t3 h1
trivial
Note, t4 can keep an explicit n argument or not, there is no clash with the variable declaration.
Structure packing
One might notice that we keep talking about the natural numbers bigger than zero, i.e. the positive natural numbers. Naturally one might want to create a new type to define those. This should be a very natural refactoring to software people.
Lean has the structure keyword for that, and we can just combine the underlying carrier object
in Nat with the prop that it is bigger than zero.
structure PNat where
val : ℕ
hpos : val > 0
variable (n : PNat)
lemma t1 : n.val + 1 = 1 + n.val := by sorry
lemma t2 : n.val + n.val = 2 * n.val := by sorry
lemma t3 (hn: n.val + 1 = 1 + n.val) : n.val + 2 = 2 + n.val := by sorry
theorem t4 (n : PNat): True := by
have h1 := t1 n
have h2 := t2 n
have h3 := t3 n h1
trivial
Note, that we had to make our variable explicit (n : PNat) instead of implicit {n : PNat} because we trimmed the
argument list so much there is nothing to infer from.
Operator overloading
You might have noticed that the structure refactoring forced us to add .val everywhere, making it hardly look
any cleaner. As it turns out Lean has a form of operator overloading tha allows it to define + and * on PNat which
does the .val under the hood.
structure PNat where
val : ℕ
hpos : val > 0
instance PNatAdd : Add PNat where
add a b := ⟨a.val + b.val, add_pos a.hpos b.hpos⟩
instance PNatMul : Mul PNat where
mul a b := ⟨a.val * b.val, mul_pos a.hpos b.hpos⟩
def one : PNat := ⟨1, by norm_num⟩
def two : PNat := ⟨2, by norm_num⟩
variable (n : PNat)
lemma t1 : n + one = one + n := by sorry
lemma t2 : n + n = two * n := by sorry
lemma t3 (hn: n + one = one + n) : n + two = two + n := by sorry
theorem t4 (n : PNat): True := by
have h1 := t1 n
have h2 := t2 n
have h3 := t3 n h1
trivial
So the actual proof got noticably simpler, but we had to add more are more infrastrucure, including a proofs that
adding or multiplying two positive numbers gives us back a positive numbers (turns out those were already in Lean).
Finally, we had to move away from the literals 1 and 2 and create custom PNat vars one and two. Lean has
overloading mechanisms to allow us to keep the literals, but that would take us too far.
Custom type classes
Finally, you might have notived the operator overloading of + and * was done through the type class system, using
the insteance keyword. As it turns out that system is extremely flexible and open to customization.
At its core it is another type of “implicit” resolution, where a piece of lean data - type or prop is attached
to some other one. One mental picture is of a big global hashmap. Then whenever we need that extra data, lean can
run a query to fetch what is needed. One can view it as a functional type of dependency injection (if coming from
more OOP world) and very similar to Haskell’s type classes (if coming from FP, but note there are some differences).
Let’s use a custom type class to attach implicitly the proof that n is positive to a given n. First we extend
the type class system by defining a new type class with the class keyword. The key to use in the search is a
given a : Nat and the returned data is pos : a > 0.
class IsPos (a: Nat) where
pos : a > 0
Now every time we need a positive proof for some n we pull it with [IsPos n] and every next function
invokation that needs it also gets it inferred in form the one present in the local context.
Of course, we can use [IsPos n] with the variable keyword.
variable (n: Nat) [IsPos n]
lemma t1 : n + 1 = 1 + n := by sorry
lemma t2 : n + n = 2 * n := by sorry
lemma t3 (hn: n + 1 = 1 + n) : n + 2 = 2 + n := by sorry
theorem t4 (n : Nat) [IsPos n]: True := by
have h1 := t1 n
have h2 := t2 n
have h3 := t3 n h1
trivial
Finally, while the [IsPos n] instance says anonymous and it is implicilty passed between function calls, we
can always give it a name in a local variable with have hnpos : IsPos n := by infer_instance and now
we can treat hnpos as a structure and get hnpos.pos for the actual Prop.
Also, if we want to explicitly put a proof that n is positive in the type system, we can like this:
def a := 3
instance : IsPos a where
pos := by decide
def h := t4 a
In summary, the type class system is another way of implicitly passing data around, but it has more
of a global flavor - the IsPos a instance can be pulled out of the global hashmap with [IsPos a]
whenever one has access to a.
Learn more
Hopefully, this contrived example has given you a more concrete understanding of how the various language features exists to improve the ergonomics of writing compliated proofs, by showing you how they are used on a very minimal example (doesn’t even need Mathlib). I encourage you to paste there examples in the lean playground and try to extend or break then.
You can learn a lot more about implicits, variable declarations and type classes by reading the pofficial guide](https://leanprover.github.io/theorem_proving_in_lean4/) and the language reference.