Compilers — Code Generation

Kevin Da Silva
4 min readJan 27, 2022

Hey reader, welcome to the last compilation phase, we have been through incredible phases in the last few weeks: lexical analysis, syntactic analysis, semantic analysis, and now code generation. I know compilers may not be a trivial subject, but certainly, it's totally worth learning it, so it's super nice to see you here.

What is Code Generation?

Some people in the compilers world eventually mention that nowadays code generation is the less complex part due to this phase having been eased and simplified a lot since assembly.

But basically, code generation in compilers normally follow one or both of these rules:

  • Compilation from one high-level language to another high-level language

It's very common that a high-level language translates to another high-level language, normally to get the advantage of running in a certain environment like languages like Java, Clojure, Scala that compile to byte code to run in the environment of JVM, or languages like Typescript or modern ECMA versions of JS use babel to compile JS like code to JS browser compatible code to be able to use the browser environment. (Klang also compile to JS to be able to run in the browser).

And the second rule

  • Compilation from a high-level language to a binary file

This one is more common for compiled languages like c, c++, Haskell, where the compiler gets the hardware instructions and translates a high-level language to processor instructions in one single.exe file.

If we are building an interpreter, how do we execute our code without code generation?

Well, normally instead of generating source files, we process every instruction line by line of the file and generate the processor instruction instead of generating files.

Of course, we can mix both alternatives, like JVM languages that compile to byte code that is interpreted by the JVM.

Implementing the last phase of the interpreter

lisp-eval/lib/lisp_eval.ex

Let's do some imports:

defmodule LispEval do
import LexicalAnalysis, only: [lexical_analysis: 1]
import SyntaticAnalysis, only: [syntatic_analysis: 3]
import SemanticAnalysis, only: [semantic_analysis: 1]
import IO.ANSI, only: [blue_background: 0, reset: 0]
end

With that done lets define a main function

defmodule LispEval do
import LexicalAnalysis, only: [lexical_analysis: 1]
import SyntaticAnalysis, only: [syntatic_analysis: 3]
import SemanticAnalysis, only: [semantic_analysis: 1]
import IO.ANSI, only: [blue_background: 0, reset: 0]
def main(filepath) do

end

end

And now we just pipe all our compilation phases together

defmodule LispEval do
import LexicalAnalysis, only: [lexical_analysis: 1]
import SyntaticAnalysis, only: [syntatic_analysis: 3]
import SemanticAnalysis, only: [semantic_analysis: 1]
import IO.ANSI, only: [blue_background: 0, reset: 0]
def main(filepath) do
program = File.read!(filepath)
|> String.split("", trim: true)
|> lexical_analysis()
|> Enum.reverse()
|> syntatic_analysis(0, [])
{out, _} = semantic_analysis program

end
end

And to finish, we just print the response given by our interpreter in the terminal with the IO.puts function

defmodule LispEval do
import LexicalAnalysis, only: [lexical_analysis: 1]
import SyntaticAnalysis, only: [syntatic_analysis: 3]
import SemanticAnalysis, only: [semantic_analysis: 1]
import IO.ANSI, only: [blue_background: 0, reset: 0]
def main(filepath) do
program = File.read!(filepath)
|> String.split("", trim: true)
|> lexical_analysis()
|> Enum.reverse()
|> syntatic_analysis(0, [])
{out, _} = semantic_analysis program
IO.puts blue_background() <> "[ Output: #{out} ]" <> reset()
end
end

And now to test it we just need to:

iex -S mix

create an example file:

(+ 5 3)

and call:

LispEval.main(my_created_file_path)

And we should get the value 8 printed on a blue background, and we have an interpreter for our little language.

Extra challenges:

Now that you are feeling the power of compilers and are still wanting some side quests, here is some for you:

  • Add multi-line comments to your interpreter

[TIP] compilers ignore comments in your source files, so to implement it you will need to define a multi-line comment symbol(like /* */) and will have to create an automaton to ignore it in the lexical analysis phase.

  • Add power operator

[TIP] Add one more symbol to the @operators(^ symbol, for example) in lexical analysis and a rule to perform the exponential operation in semantic analysis.

  • Make syntactic analysis call lexical analysis when needing the next token

[TIP] Currently our interpreter first process the entire program in the lexical analysis phase, but let's suppose that in the first line of our program we have a syntactic error like 5(*5) the lexical analysis will process the entire file just for us to discover that the file is not well-formatted in syntactic analysis what would be a waste of resources, to solve that instead of returning a list of tokens in lexical analysis phase change it to return a tuple like {token, remaining of the program} and in the syntactic analysis change it instead of processing a list just call LexicalAnalysis module to get each token when it is needed in syntactic analysis. (Klang implements that)

  • Turn your interpreter into a Cli

[TIP] Search for escriptsthe current lisp-eval already supports it in a cli, and you can find the implementation here.

And if you would like to learn more about compilers and Elixir, here are some resources

-Elixir

-Compilers

And if you are looking for academic experience with both topics at the same time, UFMG university has some master’s degree positions open for students willing to work in the Elixir compiler, get more details here.

And here we are at the end of one more series, the first one in the year, well what a way to start a year :). It was incredible to have your companionship until here, and I hope you could learn a bit more about compilers and hope to see you in the next series. Bye!!

--

--

Kevin Da Silva

I'm a back-end developer, functional programming lover, fascinated by computer science and languages. From the south part of Brazil to the world