In the previous post of this series, we walked through how to create a grammar for the Simple language (which is similar to a Javascript subset) using our new grammar/AST framework. We showed how easy it was to write the grammar using EBNF-like notation but completely from within your native C#/VB code.
We also showed how the grammar automatically generates an AST of the parsed document for you. The generated tree generally has much more information than what is useful though. Today’s post is going to show how we can completely customize the AST output so that it focuses only on the information we care about.
Our grammar/AST framework has an extremely powerful tree construction mechanism with numerous built-in helper methods to do common tree construction tasks. You also have the ability to create completely custom tree construction nodes via C#/VB code if you like.
Default tree construction
The default tree constructors will output nearly everything that is parsed. Each AST node that is generated automatically contains the offset range of the text that created it.
Productions
Any EBNF production will create a scope and will output a new hierarchy level node in an AST. The AST node that is created can have zero to many children, depending on what was matched within the production.
If the production is the root production for a non-terminal, then the AST node that is output for the production will have a value that is the non-terminal’s key. Note that productions can be used within other productions. In that case, the nested productions generate an AST node that has children but no value.
Terminals
Terminals, which match tokens, will output an AST node with no children that has a value of the text that was read.
Quantifiers
“Optional” quantifiers will output the tree construction results for the quantifier’s child term if the term matched successfully. Otherwise an empty structural AST node will be output.
For any other quantifier (“zero-or-more”, “one-or-more”, etc.), a new scope is created within the quantifier. The quantifier will output a new hierarchy level node in the AST. The AST node that is created can have zero to many children, depending on what was matched in the quantifier.
Scopes and labels
We mentioned a couple places above where scopes are created. Namely, scopes occur in productions and in quantifiers (ones other than “optional”).
Any EBNF term can be assigned a label. A label is a way for a tree constructor to identify something that was matched in its scope. Labels are only useful when creating custom tree constructors.
Terminals and non-terminals, represented by the Terminal and NonTerminal classes can be auto-converted to their EBNF term equivalents (EbnfTerminal and EbnfNonTerminal) by calling their indexer and passing the label as a string parameter.
So say we have a terminal within a production for an identifier and we want to apply a label to it. We can do so like this:
For any object that is EbnfTerm-based, there is a SetLabel method you can call instead like:
1: statement.ZeroOrMore().SetLabel("stmt")
In that example, statement is a NonTerminal, its ZeroOrMore method call converted it to an EbnfNonTerminal and wrapped it with an EbnfQuantifier. Finally we set the label stmt on the quantifier via the SetLabel method.
Customizing tree constructors
Adding tree constructors to a production is easy. So far we’ve only seen how to indicate the EBNF pattern expression for a production. A tree constructor can be added to this by using the > operator followed by an ITreeConstructionNode instance like this:
1: expression.Production = equalityExpression["exp"] > AstFrom("exp");
In the above sample code I created a production for an expression non-terminal, which simply calls into an equalityExpression non-terminal. It applies a label to that non-terminal’s results so that the tree constructor can work with them. The AstFrom tree constructor simply looks for a match with the specified label and returns its value.
Thus the result of a match with this production is the result of the equalityExpression non-terminal. If we didn’t include the tree constructor portion of the above production, the result would be an Expression AST node whose child was the result of the equalityExpression non-terminal.
Built-in tree construction nodes
There are a number of built-in tree construction nodes that should handle nearly all your tree construction needs. These nodes are created by method calls on the Grammar class.
Ast(string value, params ITreeConstructionNode[] childNodes)
The ITreeConstructionNode generated by this method will create an AST node with the specified value and optional child nodes.
This example creates an AST node with value ReturnStatement that has a child node resulting from an EBNF term match whose label is exp:
1: Ast("ReturnStatement", AstFrom("exp"))
AstValueOf(ITreeConstructionNode valueOfNode, params ITreeConstructionNode[] childNodes)
The ITreeConstructionNode generated by this method works the same way as Ast except that the root node created will have the value of the valueOfNode tree construction node result’s value.
This example for an XML attribute name and value shows how a parent AST node can be created whose value is the attribute name and the child AST node will be the attribute value:
1: AstValueOf(AstFrom("attrName"), AstFrom("attrValue"))
AstFrom(string label)
The ITreeConstructionNode generated by this method will create an AST node from an EBNF term match with the specified label.
This example creates an AST node resulting from an EBNF term match whose label is exp:
AstChildFrom(string label)
The ITreeConstructionNode generated by this method will create an AST node from the first child of an EBNF term match with the specified label.
This example creates an AST node resulting from the first child of an EBNF term match whose label is op:
AstChildrenFrom(string label)
The ITreeConstructionNode generated by this method will create an AST node from the children of an EBNF term match with the specified label.
This example creates an AST node with value Block that contains the children of an EBNF term match whose label is stmt:
1: Ast("Block", AstChildrenFrom("stmt"))
AstConditional(string value, params ITreeConstructionNode[] childNodes)
The ITreeConstructionNode generated by this method will create an AST node with the specified value and optional child nodes, but only if there is more than one child available. If one child is available, it is directly returned.
This example creates an AST node with value Parameters that has child nodes from the union of the children of EBNF term matches params1 and params2:
1: AstConditional("Parameters", AstChildrenFrom("params1"),
2: AstChildrenFrom("params2"))
Note that if only one result came from the union of params1 and params2 children, that single result would be returned and would not be wrapped by a Parameters node.
AstConditionalFrom(string value, string label)
The ITreeConstructionNode generated by this method will create an AST node with the specified value whose child nodes come from the children of the specified labeled EBNF term match, but only if at least one child node is available. If no child nodes are available, nothing is created.
This example creates a FunctionDeclaration AST node whose first child comes from an EBNF term match with label name. The next child will be a Parameters node but only if there was at least one child within an EBNF term match with label params. The last child will be the result of an EBNF term match with label block.
1: Ast("FunctionDeclaration", AstFrom("name"),
2: AstConditionalFrom("Parameters", "params"), AstFrom("block"))
This tree construction node is very handy when there are things such as optional parameter or argument lists in a grammar and you don’t want to clutter up the resulting AST when a particular member doesn’t have parameters or arguments.
AstValueOfConditional(ITreeConstructionNode valueOfNode, params ITreeConstructionNode[] childNodes)
The ITreeConstructionNode generated by this method works the same way as AstConditional except that the root node created when there is more than one child available will have the value of the valueOfNode tree construction node result’s value.
This example for a binary operator expression shows how the operator used will be the value of the root node created, but only if the leftexp and rightexp returned matches. If only one match occurred (no operator was used), then that match will be directly passed up.
1: AstValueOfConditional(AstChildFrom("op"), AstFrom("leftexp"),
2: AstFrom("rightexp"))
This tree construction node is pretty advanced and will be easier to understand in the next post, where we show practical usage of it.
Creating custom tree construction nodes
The built-in tree construction nodes should handle nearly all the tree construction scenarios you have. But what if there is something that requires some additional programmatic logic? No problem, you can simply create a class that implements ITreeConstructionNode and inject it anywhere just like you do with the results of the above methods.
Next steps
In this post, we reviewed all the basics on how tree construction works. In the next post, we’ll revisit our Simple language and will apply tree construction techniques to its productions. The default tree constructors create an enormous AST, but once we apply our own tree construction, the AST will be short and concise.