The Kaleidoscope Language
=========================
-This tutorial will be illustrated with a toy language that we'll call
+This tutorial is illustrated with a toy language called
"`Kaleidoscope <http://en.wikipedia.org/wiki/Kaleidoscope>`_" (derived
from "meaning beautiful, form, and view"). Kaleidoscope is a procedural
language that allows you to define functions, use conditionals, math,
etc. Over the course of the tutorial, we'll extend Kaleidoscope to
support the if/then/else construct, a for loop, user defined operators,
-JIT compilation with a simple command line interface, etc.
+JIT compilation with a simple command line interface, debug info, etc.
-Because we want to keep things simple, the only datatype in Kaleidoscope
+We want to keep things simple, so the only datatype in Kaleidoscope
is a 64-bit floating point type (aka 'double' in C parlance). As such,
all values are implicitly double precision and the language doesn't
require type declarations. This gives the language a very nice and
# This expression will compute the 40th number.
fib(40)
-We also allow Kaleidoscope to call into standard library functions (the
-LLVM JIT makes this completely trivial). This means that you can use the
+We also allow Kaleidoscope to call into standard library functions - the
+LLVM JIT makes this really easy. This means that you can use the
'extern' keyword to define a function before you use it (this is also
-useful for mutually recursive functions). For example:
+useful for mutually recursive functions). For example:
::
little Kaleidoscope application that `displays a Mandelbrot
Set <LangImpl06.html#kicking-the-tires>`_ at various levels of magnification.
-Lets dive into the implementation of this language!
+Let's dive into the implementation of this language!
The Lexer
=========
as its ASCII value. If the current token is an identifier, the
``IdentifierStr`` global variable holds the name of the identifier. If
the current token is a numeric literal (like 1.0), ``NumVal`` holds its
-value. Note that we use global variables for simplicity, this is not the
+value. We use global variables for simplicity, but this is not the
best choice for a real language implementation :).
The actual implementation of the lexer is a single function named
return tok_number;
}
-This is all pretty straight-forward code for processing input. When
+This is all pretty straightforward code for processing input. When
reading a numeric value from input, we use the C ``strtod`` function to
convert it to a numeric value that we store in ``NumVal``. Note that
this isn't doing sufficient error checking: it will incorrectly read
"1.23.45.67" and handle it as if you typed in "1.23". Feel free to
-extend it :). Next we handle comments:
+extend it! Next we handle comments:
.. code-block:: c++
My First Language Frontend with LLVM Tutorial
=============================================
+**Requirements:** This tutorial assumes you know C++, but no previous
+compiler experience is necessary.
+
Welcome to the "My First Language Frontend with LLVM" tutorial. Here we
run through the implementation of a simple language, showing
how fun and easy it can be. This tutorial will get you up and running
iteratively over the course of several chapters, showing how it is built
over time. This lets us cover a range of language design and LLVM-specific
ideas, showing and explaining the code for it all along the way,
-and reduces the amount of overwhelming details up front. We strongly
+and reduces the overwhelming amount of detail up front. We strongly
encourage that you *work with this code* - make a copy and hack it up and
experiment.
-Warning: In order to focus on teaching compiler techniques and LLVM
+**Warning:** In order to focus on teaching compiler techniques and LLVM
specifically,
this tutorial does *not* show best practices in software engineering
principles. For example, the code uses global variables
-all over the place, doesn't use nice design patterns like
+pervasively, doesn't use
`visitors <http://en.wikipedia.org/wiki/Visitor_pattern>`_, etc... but
instead keeps things simple and focuses on the topics at hand.
This tutorial is structured into chapters covering individual topics,
-allowing you to skip ahead or over things as you wish:
+allowing you to skip ahead as you wish:
-- `Chapter #1 <LangImpl01.html>`_: Introduction to the Kaleidoscope
- language, and the definition of its Lexer. This shows where we are
+- `Chapter #1: Kaleidoscope language and Lexer <LangImpl01.html>`_ -
+ This shows where we are
going and the basic functionality that we want to build. A lexer
is also the first part of building a parser for a language, and we
use a simple C++ lexer which is easy to understand.
-- `Chapter #2 <LangImpl02.html>`_: Implementing a Parser and AST -
+- `Chapter #2: Implementing a Parser and AST <LangImpl02.html>`_ -
With the lexer in place, we can talk about parsing techniques and
basic AST construction. This tutorial describes recursive descent
parsing and operator precedence parsing.
-- `Chapter #3 <LangImpl03.html>`_: Code generation to LLVM IR - with
+- `Chapter #3: Code generation to LLVM IR <LangImpl03.html>`_ - With
the AST ready, we show how easy it is to generate LLVM IR, and show
a simple way to incorporate LLVM into your project.
-- `Chapter #4 <LangImpl04.html>`_: Adding JIT and Optimizer Support
- - One great thing about LLVM is its support for JIT compilation, so
+- `Chapter #4: Adding JIT and Optimizer Support <LangImpl04.html>`_ -
+ One great thing about LLVM is its support for JIT compilation, so
we'll dive right into it and show you the 3 lines it takes to add JIT
support. Later chapters show how to generate .o files.
-- `Chapter #5 <LangImpl05.html>`_: Extending the Language: Control
- Flow - With the basic language up and running, we show how to extend
+- `Chapter #5: Extending the Language: Control Flow <LangImpl05.html>`_ -
+ With the basic language up and running, we show how to extend
it with control flow operations ('if' statement and a 'for' loop). This
gives us a chance to talk about SSA construction and control
flow.
-- `Chapter #6 <LangImpl06.html>`_: Extending the Language:
- User-defined Operators - This chapter extends the language to let
- users define arbitrary unary and binary operators (with assignable
- precedence!). This lets us build a significant piece of the
+- `Chapter #6: Extending the Language: User-defined Operators
+ <LangImpl06.html>`_ - This chapter extends the language to let
+ users define arbitrary unary and binary operators - with assignable
+ precedence! This allows us to build a significant piece of the
"language" as library routines.
-- `Chapter #7 <LangImpl07.html>`_: Extending the Language: Mutable
- Variables - This chapter talks about adding user-defined local
+- `Chapter #7: Extending the Language: Mutable Variables
+ <LangImpl07.html>`_ - This chapter talks about adding user-defined local
variables along with an assignment operator. This shows how easy it is
to construct SSA form in LLVM: LLVM does *not* require your front-end
to construct SSA form in order to use it!
-- `Chapter #8 <LangImpl08.html>`_: Compiling to Object Files - This
+- `Chapter #8: Compiling to Object Files <LangImpl08.html>`_ - This
chapter explains how to take LLVM IR and compile it down to object
files, like a static compiler does.
-- `Chapter #9 <LangImpl09.html>`_: Extending the Language: Debug
- Information - A real language needs to support debuggers, so we add
- debug information that allows setting breakpoints in Kaleidoscope
+- `Chapter #9: Extending the Language: Debug Information
+ <LangImpl09.html>`_ - A real language needs to support debuggers, so we
+ add debug information that lets you set breakpoints in Kaleidoscope
functions, print out argument variables, and call functions!
-- `Chapter #10 <LangImpl10.html>`_: Conclusion and other useful LLVM
- tidbits - This chapter wraps up the series by talking about
- potential ways to extend the language, and includes some
- pointers to info on "special topics" like adding garbage
+- `Chapter #10: Conclusion and other tidbits <LangImpl10.html>`_ - This
+ chapter wraps up the series by discussing ways to extend the language
+ and includes pointers to info on "special topics" like adding garbage
collection support, exceptions, debugging, support for "spaghetti
- stacks", and random tips and tricks.
+ stacks", etc.
By the end of the tutorial, we'll have written a bit less than 1000 lines
-of non-comment, non-blank, lines of code. With this small amount of
+of (non-comment, non-blank) code. With this small amount of
code, we'll have built up a nice little compiler for a non-trivial
language including a hand-written lexer, parser, AST, as well as code
-generation support with a JIT compiler. The breadth of this
-tutorial is a great testament to the strengths of LLVM and shows why
-it is such a popular target for language designers.
+generation support - both static and JIT! The breadth of this tutorial is
+a great testament to the strengths of LLVM and shows why it is such a
+popular target for language designers and others who need high-performance
+code generation.