Statement Expressions
This post uses code blocks, which might not come out very well in your Tumblr feed. For the best experience, view this post on mathdevelopment.tumblr.com
Let’s consider a hypothetical C-like language. Here's some example code so you get the idea:
while (true) { print("Pick a number!"); let x = input(); if (x > 0) { print("You chose " + x); break; } else { print("Sorry, pick a POSITIVE number"); } }
We understand this code by breaking it up into statements and control structures. Using the antlr parsing syntax, we might parse this program as:
entry: (control | statement)*; control: while | if; while: WHILE '(' expression ')' block; if: IF '(' expression ')' block (ELSE block)?; block: '{' (statement | block)* '}'; statement: (let | break | call) ';'; let: LET NAME '=' expression; break: BREAK; call: expression '(' callargs? ')'; callargs: (expression ',')* expression; expression: operator | atom; operator : operator '>' atom | operator '<' atom | operator '=' atom | operator '+' atom | operator '-' atom | operator '*' atom | operator '/' atom ; atom : '(' expression ')' | call | NAME | NUMBER | STRING | BOOL ;
That might be a lot to take in, especially if you've never seen that syntax before, but if you stare at it long enough, you should be able to get the gist of what's going on. Take your time if you need it. We'll be coming back to this later.
This model seems intuitive. It's how we as humans understand the program, and it properly conforms with all of our needs. But of course you're thinking something's up. Why would this blog post author go through all the trouble to explain something that his readers intuitively know already?
Something is up, and I'll explain that in a bit, but first I want to put up a sort of disclaimer. This first model works. It works very well. This is the parsing model (plus or minus a few things) that languages with this kind of syntax tend to use. This post is not about why the intuitive parsing strategy is bad, but rather to explore an alternative parsing strategy, and what benefits and drawbacks it may have.
Here's a question: why do we have a semicolon at the end of each line? It denotes the end of a statement, right? Some languages have done away with it and been fine, but proponents argue that the semicolon improves legibility, and maybe makes syntax errors more apparent. Plus, it can avoid the widely-hated significant whitespace, one of the only things that can bring C, LISP, and Malbolge developers together.
Okay blog author, you've made your point. You like semicolons and hate python. What are you getting at?
Semicolons are used for terminating statements, but what if we viewed it another way? What if instead of saying that a semicolon ends a statement, what if we said it lets a new statement begin? This seems like a subtle distinction, and it is, but it's a distinction that lets us bridge to a more powerful concept. Semicolons indicate both the end of a statement and the start of a new statement. If you want to run two statements, you need a semicolon between them. To join them, in a sense. The semicolon is the statement join operator.
You might have some concerns at this point, and I'll get to those, but first let's think about the syntax. Here's what I propose:
entry: statement?; statement : LET NAME '=' expression ';' statement? | BREAK ';' statement? | call ';' statement? | WHILE '(' expression ')' block statement? | IF '(' expression ')' block (ELSE block)? statement? | ';' statement? // empty statement ; block: '{' statement? '}'; // everything else as before
We can represent everything we could represent in the old parse syntax, but now instead of having a list of statements, we have a tree of statement expressions.
Now I know what you're thinking: either, "but how does let work?", or "hehe... I know how let is gonna work... this is cool."
Thank you for your input group 2, but I'm going to cater to group 1 for now. You may not have noticed it, but the grammar I chose for statement expressions didn't actually need to be as verbose as I wrote it. An alternative, more compact version could have worked just as well:
statement : (let | break | call) ';' statement? | (while | if) statement? ;
But that would have presented a problem once we got to implementation. You see, using let instead of var or def like many other languages do was not accidental. In functional programming, let has the connotation of "let [variable = value] in [expression]". And that's exactly what we're doing here. let passes information to the trailing statement expression about that variable we just declared, sort of like opening a new scope. If we had used the more compact parsing grammar, the definition in let wouldn't have been able to access the trailing statement very easily (of course, it still would have been possible, but doing what we're doing here with the original grammar also would have been possible; it just wouldn't have been very pretty).
Using indentation to represent the tree structure of our expression statements, we get this for the original program:
while (true) { print("Pick a number!"); let x = input(); if (x > 0) { print("You chose " + x); break; } else { print("Sorry, pick a POSITIVE number"); } }
You might have a new concern, and understandably so. This sample program was carefully created as to avoid something common in imperative languages and much less common in functional languages:
let x = input(); if (x < 0) { let y = x*x; x = y; } print(x);
Mutation.
We're stuck, right? This is the end? Our exploration of treating statements as expressions has come to a close?
No, not quite.
Our statements are expressions, and expressions return something. So what data type does x = y return? State. It returns the state of x having been assigned to y. Every time we evaluate the next statement in the tree, we pass it the state of the previous one, and at the end we return not a value, but the newly constructed state after the statement has been evaluated. When we enter a block we create a new state whose parent state is the state passed into the block's statement, and the block returns the parent of the state returned from its last statement.
Update: Let me interject for a moment to clarify something. I'm assuming here that the state is an immutable data structure. It doesn't need to be of course, but this whole state-passing system starts to feel extremely over-engineered if it is not.
Say we input 3:
// For state, we use the format [parentState, [myBindings]] let x = input(); // returns [global, [x: 3]] if (x < 0) { let y = x*x; x = y; } // returns [global, [x: 3]] print(x); // returns [global, [x: 3]]
But if we input -3:
let x = input(); // returns [global, [x: -3]] if (x < 0) { let y = x*x; // returns [[global, [x: -3]], [y: 9]] x = y; // returns [[global, [x: 9]], [y: 9]] } // returns [global, [x: 9]] print(x); // returns [global, [x: 9]]
So why do we want this? What's the point? Part of it is a sense of idealism. There's a general feeling that immutability is good practice, and this brings immutability to... mutation, I suppose. In some cases it could be easier to develop a language with this structure. Static code analysis could catch bugs that would be harder to detect with mutable state.
But there are drawbacks too. This doesn't permit shared program state across threads, for example, though this model could be extended with some sort of "unsafe assignment" mechanism to mutate a shared state. And if a compiler developer is seeking optimal performance, this model will likely be reduced down to exactly what we had before anyways.
In the end, as mentioned near to the start of this post, this alternative model is not "better" than our more traditional sequential statement model, but it's not worse either. It's an alternative approach, a new way to tackle an old problem. But that's also kind of the point. Even problems which appear to have only one solution can often be elegantly solved with alternative approaches. The solution is not defined by the problem, the problem is defined by the solution.









