¡AST FTW!
The most revolutionary and vivid idea (pattern, paradigm, design, younameit) in computer science was invented in 1958. It is still extremely underrated and very few developers fully understand it’s crucial importance to have a robust and easy to maintain code. It costs all the patterns invented after. If I were to be faced with a choice to pick up only one key property of my language of choice, that would be it.
Clean and exposable to the developer Abstract Syntax Tree. An ability to have AST on hand. An ability to modify AST.
That ability alone makes the development process hundreds of times faster. The resulting code becomes concise and readable. DRY comes out for free out of the box. The only goodness AST does not bring into the development process is you cannot ask it to serve a whiskey for you.
Back in 1958 there was not much room for imagination in the language design, mostly due to a necessity to be as near to the machine codes as possible. John McCarthy found the best way to build a Turing-complete language for algorithms. He did not try to invent a fancy syntax, he just showed the simplest way to write the decision tree down to the paper. Yeah, Lots of Irritating Stupid Parentheses, or LISP. Since then the von Neumann architecture has not changed a lot, despite that we use silicon semiconductors performing GazziFlops in a cheap smartphone. Still, the stupid piece of iron (ok, ok, silicon) inside our supercomputers does operations on two inputs for it’s whole life. There were attempts to build computers build upon tri-state logic (the most successful was Setun built in USSR in 1959,) but what we have now in front of us, does indeed call functions with two parameters only.
For some weird reason, the profit of having access to AST is still not evident to the vast majority of computer engineers. Half of the century we were trying to make the computer language as readable for the human beings as possible (the first genius attempt was done by Grace Hopper with COBOL, and I would consider it being the most successful one towards this direction.) While trying to make it more and more readable we lost ties to the machine core. We found ourselves in a need to build the bloated complicated transpilers from not-yet-readable-enough code to what the semiconductors might understand. Even worse, we attempted to fake AST for the most popular languages.
Ruby has Ripper
in stdlib:
pp Ripper.sexp('def hello(world) "Hello, #{world}!"; end') #⇒
[:program,
[[:def,
[:@ident, "hello", [1, 4]],
[:paren,
[:params, [[:@ident, "world", [1, 10]]], nil, nil, nil, nil, nil, nil]],
[:bodystmt,
[[:string_literal,
[:string_content,
[:@tstring_content, "Hello, ", [1, 18]],
[:string_embexpr, [[:var_ref, [:@ident, "world", [1, 27]]]]],
[:@tstring_content, "!", [1, 33]]]]],
nil,
nil,
nil]]]]
Javascript has some 3-rd party leftpads to spit out an AST.
Java has somewhat implemented in Eclipse.
All the above is fooling us, developers, because nobody needs read-only AST. The language syntax was explicitly designed to be more readable than AST. That’s its main purpose, for God’s sake. Why would I ever need to read something way more verbose and less readable?—There is no reason and that’s why many professionals even have no idea their languages of choice have kinda AST representation.
Read-write AST is a game changer. Rust seems to start understanding that. Maybe some other less common languages I am not aware of.
Elixir was born with AST being the first class citizen. Mostly because it runs over ErlangVM and the necessity to reuse as much as possible made AST handling a must. Both ways. No compromises.
We have a very well readable source, that flawlessly transpiles back and forth into AST on demand. That simple thing made it possible to have macros that are both ready to read and extremely powerful. In ruby (python, javascript, lua, java, c++) we cannot have a syntactically legit construct representing one branch in switch-case. In both LISP and Elixir we surely can. Natively in LISP and with quote do
in Elixir.
AST is a tree in the first place. Meaning one might easily plug out branches and plug in other branches. That makes possible such things as not including the whole AST with calls to Logger.debug
into release version—there are exactly zero processor ticks in prod, the code is simply not there, no conditionals needed.
Once we have AST, we might traverse it, modify and plug in back. Macros in Elixir receive the AST and return AST back. Since macros are expanded on the compilation stage, we might do literally whatever we want.
Metaprogramming abilities are pathetic like provided by other languages ruby, python or java. I am positive, that sooner or later people will all of a sudden realize that AST solves most issues with the unreadable and unmaintainable code. The real development liberty is impossible without a direct access to AST.
Try it and you’ll never consider switching back.
Addendum 1. Elixir AST.
iex|1 ▶ quote do: ({:ok, result} -> result)
[{:->, [], [[ok: {:result, [], Elixir}], {:result, [], Elixir}]}]
Happy treeing!