Compilers — Syntactic Analysis
Hello from the other siiiiiiiide reader(Adele vibes here), nice to see you again in the second episode of this series where we are covering the main aspects of compiler theory by building a lisp expression interpreter in Elixir. If you still don't read the first part or just wanna review the concept, click here!
Let's get started
What’s Syntactic Analysis?
Well similar to a real-world language like English, the position of the words should follow a phrasal order like:
The cat and the dog become friends.
It makes sense because I make it explicit that the dog and the cat are the subjects and after it, I express their action of becoming friends, but if the sentence was written like this:
The friends cat and the become dog .
It doesn't make any sense because the correct order expected by the language is completely messed up
And the same can be seen in our program, for example, if we have an expression like this:
)(6(+(*) 3
This program is lexically correct, but syntactically it follows no order, and it's without meaning, so in order to make our programs correct we need a free context grammar, and after it, we have to build a parsing tree.
Our free context grammar for our lisp expressions will be the following:
O -> (
C -> )
N -> 0..9|ORNNC
R -> {+, -, *, /}
EXP -> ORNNC
And our program: “(+ 5 5)” would generate the following parse tree:
EXP
/ / | \ \
O R N N C
| | | | |
( + 5 5 )
Now that we get the theory, let's jump into Elixir and implement our sintatic analyser
Let's create our module file:
lisp_eval/lib/lisp_eval/sintatic_analysis.ex
defmodule SyntaticAnalysis do end
And after it, let's create our first rule: the balanced parenthesis
defmodule SyntaticAnalysis do def syntatic_analysis([{:open_paren, _} | tail], parens, pt), do: operator tail, parens+1, pt
def syntatic_analysis([{:close_paren, _} | tail], parens, pt), do: operator tail, parens-1, pt
def syntatic_analysis([], 0, pt), do: pt
def syntatic_analysis([], _, _), do: raise "Open close parens not equally balanced."
end
The function basically creates the rule that if our program:
Is starting with an open parenthesis, we call the operator step(still not implemented) because the operator is the next one (+ 5 5)
If we receive a close parenthesis as an argument, we just send it to the operator function for the exception to be held by the next function
If we receive an empty list with the second parameter(the balance of parenthesis) equal to zero, the program is syntactically correct, and we just return the parse tree
If the value is not zero that means the parentheses are not balanced if it's positive there are more open parentheses and if it is negative that means that there are more closing parentheses in the expression
Now it's the operator turn
defmodule SyntaticAnalysis do def syntatic_analysis([{:open_paren, _} | tail], parens, pt), do: operator tail, parens+1, pt
def syntatic_analysis([{:close_paren, _} | tail], parens, pt), do: operator tail, parens-1, pt
def syntatic_analysis([], 0, pt), do: pt
def syntatic_analysis([], _, _), do: raise "Open close parens not equally balanced." def operator([{:operator, op} | tail], parens, pt), do: value(tail, parens, pt ++ [op])
def operator(_, _, _), do: raise "Operator expected."
end
The operator function is simpler than the previous one, where we declared only two branches:
The one that, if the symbol is an operator, we call the value step(the next function we are going to declare)
And if the symbol is not an operator, we just raise an exception saying that we were expecting an operator symbol.
And to finish: the value step
defmodule SyntaticAnalysis do def syntatic_analysis([{:open_paren, _} | tail], parens, pt), do: operator tail, parens+1, pt
def syntatic_analysis([{:close_paren, _} | tail], parens, pt), do: operator tail, parens-1, pt
def syntatic_analysis([], 0, pt), do: pt
def syntatic_analysis([], _, _), do: raise "Open close parens not equally balanced." def operator([{:operator, op} | tail], parens, pt), do: value(tail, parens, pt ++ [op])
def operator(_, _, _), do: raise "Operator expected." def value([{:number, num} | tail], parens, pt), do: value(tail, parens, pt ++ [num])
def value([{:open_paren, _} | tail], parens, pt), do: operator tail, parens+1, pt
def value([{:close_paren, _} | tail], parens, pt), do: value tail, parens-1, pt
def value([], parens, pt), do: syntatic_analysis [], parens, pt
def value(_, _, _), do: raise "Value expected."
end
Our value function is a bit more complex, so let's break it one step at a time
First, if we receive a number token It's what we are expecting, and then we call the value step again hoping for a second value, ending of expression(“)” symbol) or the start of a new nested expression(“(” symbol).
If we receive an open parenthesis symbol, we call operator step because a nested expression is being evaluated, and we increase the counter that controls the balance of the parentheses.
If the symbol is a close parenthesis symbol, we call value again because it may be the end of a nested expression or the end of the program and the balance of parentheses are decreased in one
If the program is an empty list that means that the program was entirely evaluated, so we just call the syntatic_analysis step to validate the final result
And last, if we receive any other thing, we raise an exception because a value was expected.
Now let's check the final result
Run:
iex -S mix
Make sure you repeated all the steps of the previous article and get the list of tokens(the one stored in the output variable)
iex(3)> program = Enum.reverse(output)
Then we just have to call our syntatic_analysis function with our list of tokens
iex(4)> SyntaticAnalysis.syntatic_analysis(program, 0, [])
And we should get something like this:
["+", "5", "5"]
And it is done we make sure all the symbols in our program are following the correct order, and we filtered all the necessary and meaningful symbols for the next step(the semantic analysis).
Thank you again for your time, I hope I could put the concepts in a didactic way, compilers are not a trivial subject, but it is certainly worth it. See you at the next one, Bye!