The Actipro Blog

All the latest UI control development news from Actipro

SyntaxEditor grammar/AST framework part 5: Optimizing the Simple grammar’s AST output

In the previous post we gave an introduction to the powerful tree construction mechanism that is built into the grammar/AST framework and talked about each of the built-in tree construction nodes that are available.  We also mentioned that for very rare scenarios where advanced logic is needed to build a particular AST node, you can create and inject your own ITreeConstructionNode-based classes to handle the task.

In today’s post, we’re going to revisit the Simple language and will enhance our grammar productions so that we make the resulting AST very concise. 

Simple language grammar review

To review, our Simple language is similar to a subset of Javascript.  Here are the grammar productions we created in part 3:

   1: this.Root.Production = functionDeclaration.ZeroOrMore();
   2:  
   3: functionDeclaration.Production = @function + @identifier + @openParenthesis + 
   4:     functionParameterList.Optional() + @closeParenthesis + block;
   5:  
   6: functionParameterList.Production = @identifier + (@comma + @identifier).ZeroOrMore();
   7:  
   8: statement.Production = block
   9:     | emptyStatement
  10:     | variableDeclarationStatement
  11:     | assignmentStatement
  12:     | returnStatement;
  13:  
  14: block.Production = @openCurlyBrace + statement.ZeroOrMore() + @closeCurlyBrace;
  15:         
  16: emptyStatement.Production = semiColon.ToTerm().ToProduction();
  17:  
  18: variableDeclarationStatement.Production = @var + @identifier + @semiColon;
  19:  
  20: assignmentStatement.Production = @identifier + @assignment + expression + @semiColon;
  21:  
  22: returnStatement.Production = @return + expression + @semiColon;
  23:  
  24: expression.Production = equalityExpression.ToTerm().ToProduction();
  25:  
  26: equalityExpression.Production = additiveExpression + 
  27:     ((@equality | @inequality) + equalityExpression).Optional();
  28:  
  29: additiveExpression.Production = multiplicativeExpression + 
  30:     ((@addition | @subtraction) + additiveExpression).Optional();
  31:  
  32: multiplicativeExpression.Production = primaryExpression + 
  33:     ((@multiplication | @division) + multiplicativeExpression).Optional();
  34:  
  35: primaryExpression.Production = numberExpression
  36:     | functionAccessExpression
  37:     | simpleName
  38:     | parenthesizedExpression;
  39:  
  40: numberExpression.Production = @number.ToTerm().ToProduction();
  41:  
  42: simpleName.Production = @identifier.ToTerm().ToProduction();
  43:  
  44: functionAccessExpression.Production = @identifier + @openParenthesis + 
  45:     functionArgumentList.Optional() + @closeParenthesis;
  46:  
  47: functionArgumentList.Production = expression + (@comma + expression).ZeroOrMore();
  48:  
  49: parenthesizedExpression.Production = @openParenthesis + expression + @closeParenthesis;

The grammar productions above all use the default tree constructors so the AST output when parsing documents is extremely verbose.  As an example, let’s look at this input:

   1: function IncrementAndMultiply(x, y) {
   2:     var result;
   3:     result = Increment(x);
   4:     return result * y;
   5: }

With our grammar as-is, this is the verbose AST output:

   1: CompilationUnit[
   2:     [
   3:         FunctionDeclaration[
   4:             "function"
   5:             "IncrementAndMultiply"
   6:             "("
   7:             FunctionParameterList[
   8:                 "x"
   9:                 [
  10:                     ","
  11:                     "y"
  12:                 ]
  13:             ]
  14:             ")"
  15:             Block[
  16:                 "{"
  17:                 [
  18:                     Statement[
  19:                         VariableDeclarationStatement[
  20:                             "var"
  21:                             "result"
  22:                             ";"
  23:                         ]
  24:                     ]
  25:                     Statement[
  26:                         AssignmentStatement[
  27:                             "result"
  28:                             "="
  29:                             Expression[
  30:                                 EqualityExpression[
  31:                                     AdditiveExpression[
  32:                                         MultiplicativeExpression[
  33:                                             PrimaryExpression[
  34:                                                 FunctionAccessExpression[
  35:                                                     "Increment"
  36:                                                     "("
  37:                                                     FunctionArgumentList[
  38:                                                         Expression[
  39:                                                             EqualityExpression[
  40:                                                                 AdditiveExpression[
  41:                                                                     MultiplicativeExpression[
  42:                                                                         PrimaryExpression[
  43:                                                                             SimpleName[
  44:                                                                                 "x"
  45:                                                                             ]
  46:                                                                         ]
  47:                                                                         []
  48:                                                                     ]
  49:                                                                     []
  50:                                                                 ]
  51:                                                                 []
  52:                                                             ]
  53:                                                         ]
  54:                                                         []
  55:                                                     ]
  56:                                                     ")"
  57:                                                 ]
  58:                                             ]
  59:                                             []
  60:                                         ]
  61:                                         []
  62:                                     ]
  63:                                     []
  64:                                 ]
  65:                             ]
  66:                             ";"
  67:                         ]
  68:                     ]
  69:                     Statement[
  70:                         ReturnStatement[
  71:                             "return"
  72:                             Expression[
  73:                                 EqualityExpression[
  74:                                     AdditiveExpression[
  75:                                         MultiplicativeExpression[
  76:                                             PrimaryExpression[
  77:                                                 SimpleName[
  78:                                                     "result"
  79:                                                 ]
  80:                                             ]
  81:                                             [
  82:                                                 "*"
  83:                                             ]
  84:                                             MultiplicativeExpression[
  85:                                                 PrimaryExpression[
  86:                                                     SimpleName[
  87:                                                         "y"
  88:                                                     ]
  89:                                                 ]
  90:                                                 []
  91:                                             ]
  92:                                         ]
  93:                                         []
  94:                                     ]
  95:                                     []
  96:                                 ]
  97:                             ]
  98:                             ";"
  99:                         ]
 100:                     ]
 101:                 ]
 102:                 "}"
 103:             ]
 104:         ]
 105:     ]
 106: ]

When we look at that output, we see lots of empty value nodes that contain other nodes, we see nodes for tokens such as ( that we don’t care about, and lots of extra nodes for various expressions (like AdditiveExpression) that don’t really apply here, but which are inserted due to how expressions are defined.

Let’s fix all these problems right now and modify our grammar productions to add tree constructors.

Simple language grammar tree constructor enhancements

Here is the modified set of grammar productions we’ll use.  Note the tree constructors that are now present in nearly all of the productions as compared to our original grammar productions listed above.

   1: // Output a root 'CompilationUnit' node that contains the children of the ZeroOrMore quantifier, 
   2: //   which is zero or more FunctionDeclarations
   3: this.Root.Production = functionDeclaration.ZeroOrMore().SetLabel("decl")
   4:     > Ast("CompilationUnit", AstChildrenFrom("decl"));
   5:  
   6: // Outputs a 'FunctionDeclaration' node with children:
   7: //   1) The function declaration name value
   8: //   2) A 'Parameters' node that contains each parameter name; or nothing gets inserted
   9: //      if there are no parameters
  10: //   3) The Block statement result
  11: functionDeclaration.Production = @function + @identifier["name"] + @openParenthesis + 
  12:     functionParameterList["params"].Optional() + @closeParenthesis + block["block"]
  13:     > Ast("FunctionDeclaration", AstFrom("name"), 
  14:         AstConditionalFrom("Parameters", "params"), AstFrom("block"));
  15:  
  16: // Outputs a 'FunctionParameterList' node whose children are parameter name values...
  17: //   Is an excellent sample of how to build node for a delimited list
  18: functionParameterList.Production = @identifier["param"] + 
  19:     (@comma + @identifier["param"] > AstFrom("param")).ZeroOrMore().SetLabel("moreparams")
  20:     > Ast("FunctionParameterList", AstFrom("param"), AstChildrenFrom("moreparams"));
  21:  
  22: // Outputs the result that came from the appropriate statement type that was matched...
  23: //   We don't want to wrap the results with a 'Statement' node so we use AstFrom
  24: statement.Production = block["stmt"] > AstFrom("stmt")
  25:     | emptyStatement > null
  26:     | variableDeclarationStatement["stmt"] > AstFrom("stmt")
  27:     | assignmentStatement["stmt"] > AstFrom("stmt")
  28:     | returnStatement["stmt"] > AstFrom("stmt");
  29:  
  30: // Outputs a 'Block' node whose children are the contained statements
  31: block.Production = @openCurlyBrace + statement.ZeroOrMore().SetLabel("stmt") + @closeCurlyBrace
  32:     > Ast("Block", AstChildrenFrom("stmt"));
  33:         
  34: // Don't output anything since we don't care about empty statements
  35: emptyStatement.Production = semiColon > null;
  36:  
  37: // Outputs a 'VariableDeclarationStatement' node whose child is the variable name
  38: variableDeclarationStatement.Production = @var + @identifier["name"] + @semiColon
  39:     > Ast("VariableDeclarationStatement", AstFrom("name"));
  40:  
  41: // Outputs a 'AssignmentStatement' node whose children are the variable name 
  42: //   and the assigned expression
  43: assignmentStatement.Production = @identifier["varname"] + @assignment + 
  44:     expression["exp"] + @semiColon
  45:     > Ast("AssignmentStatement", AstFrom("varname"), AstFrom("exp"));
  46:  
  47: // Outputs a 'ReturnStatement' node whose child is the expression to return
  48: returnStatement.Production = @return + expression["exp"] + @semiColon
  49:     > Ast("ReturnStatement", AstFrom("exp"));
  50:  
  51: // Outputs the resulting node from the EqualityExpression
  52: expression.Production = equalityExpression["exp"] > AstFrom("exp");
  53:  
  54: // Outputs the resulting node from the AdditiveExpression; or if an equality 
  55: //   operator is found, outputs a '==' or '!=' node whose children are the 
  56: //   left and right child expressions of the operator
  57: equalityExpression.Production = additiveExpression["leftexp"] + 
  58:     ((@equality | @inequality).SetLabel("op") + equalityExpression["rightexp"]).Optional()
  59:     > AstValueOfConditional(AstChildFrom("op"), AstFrom("leftexp"), AstFrom("rightexp"));
  60:  
  61: // Outputs the resulting node from the MultiplicativeExpression; or if an additive 
  62: //   operator is found,  outputs a '+' or '-' node whose children are the 
  63: //   left and right child expressions of the operator
  64: additiveExpression.Production = multiplicativeExpression["leftexp"] + 
  65:     ((@addition | @subtraction).SetLabel("op") + additiveExpression["rightexp"]).Optional()
  66:     > AstValueOfConditional(AstChildFrom("op"), AstFrom("leftexp"), AstFrom("rightexp"));
  67:             
  68: // Outputs the resulting node from the PrimaryExpression; or if an multiplicative 
  69: //   operator is found, outputs a '*' or '/' node whose children are the 
  70: //   left and right child expressions of the operator
  71: multiplicativeExpression.Production = primaryExpression["leftexp"] + 
  72:     ((@multiplication | @division).SetLabel("op") + multiplicativeExpression["rightexp"]).Optional()
  73:     > AstValueOfConditional(AstChildFrom("op"), AstFrom("leftexp"), AstFrom("rightexp"));
  74:  
  75: // Outputs the result that came from the appropriate expression type that was matched...
  76: //   We don't want to wrap the results with a 'Expression' node so we use AstFrom
  77: primaryExpression.Production = numberExpression["exp"] > AstFrom("exp")
  78:     | functionAccessExpression["exp"] > AstFrom("exp")
  79:     | simpleName["exp"] > AstFrom("exp")
  80:     | parenthesizedExpression["exp"] > AstFrom("exp");
  81:  
  82: // Outputs the numeric value
  83: numberExpression.Production = @number.ToTerm().ToProduction();
  84:  
  85: // Outputs the name value
  86: simpleName.Production = @identifier.ToTerm().ToProduction();
  87:  
  88: // Outputs a 'FunctionAccessExpression' node with children:
  89: //   1) The function name
  90: //   2) An 'Arguments' node that contains each argument name; or nothing gets inserted
  91: //      if there are no arguments
  92: functionAccessExpression.Production = @identifier["name"] + @openParenthesis + 
  93:     functionArgumentList["args"].Optional() + @closeParenthesis
  94:     > Ast("FunctionAccessExpression", AstFrom("name"), AstConditionalFrom("Arguments", "args"));
  95:  
  96: // Outputs a 'FunctionArgumentList' node whose children are argument expressions...
  97: //   Is an excellent sample of how to build node for a delimited list
  98: functionArgumentList.Production = expression["exp"] + 
  99:     (@comma + expression["exp"] > AstFrom("exp")).ZeroOrMore().SetLabel("moreexps")
 100:     > Ast("FunctionArgumentList", AstFrom("exp"), AstChildrenFrom("moreexps"));
 101:  
 102: // Outputs a 'ParenthesizedExpression' node whose child is the contained expression
 103: parenthesizedExpression.Production = @openParenthesis + expression["exp"] + @closeParenthesis
 104:     > Ast("ParenthesizedExpression", AstFrom("exp"));

You can see we’re now making use of labels and all the various Ast* methods described in the previous post.  Let’s use the same input as above and see how our AST now looks:

   1: CompilationUnit[
   2:     FunctionDeclaration[
   3:         "IncrementAndMultiply"
   4:         Parameters[
   5:             "x"
   6:             "y"
   7:         ]
   8:         Block[
   9:             VariableDeclarationStatement[
  10:                 "result"
  11:             ]
  12:             AssignmentStatement[
  13:                 "result"
  14:                 FunctionAccessExpression[
  15:                     "Increment"
  16:                     Arguments[
  17:                         SimpleName[
  18:                             "x"
  19:                         ]
  20:                     ]
  21:                 ]
  22:             ]
  23:             ReturnStatement[
  24:                 *[
  25:                     SimpleName[
  26:                         "result"
  27:                     ]
  28:                     SimpleName[
  29:                         "y"
  30:                     ]
  31:                 ]
  32:             ]
  33:         ]
  34:     ]
  35: ]


Now that’s a big improvement.  We only have nodes for information that is important to us.  And as mentioned in the past, each AST node that is generated automatically adds which start/end offsets it occurs over, which is invaluable information for consuming the AST later on.

Elimination of empty value nodes

You’ll note that we completely eliminated all empty value nodes that are generated by quantifiers.  An example is where in the root non-terminal’s production, we create a CompilationUnit whose child nodes come directly from the child nodes of the ZeroOrMore quantifier.

Elimination of unimportant terminal nodes

We no longer have any AST nodes that relate to unimportant terminals such as ( or {.  We now only have AST nodes for terminals that are meaningful to the code structure.

Elimination of expression nodes that don’t apply

Due to how expressions must be defined to properly set up operator precedence, the default tree constructors would create a lot of nodes such as AdditiveExpression that didn’t apply for this input.  With our modifications, we’ve completely eliminated those extra expressions.  You’ll note our return statement at the end generates an AST node whose value is * (the operator used) and whose two children are the left and right child expressions to be multiplied.  This functionality is achieved via the tree construction node provided by the AstValueOfConditional method.

More conditional output

Although the above input doesn’t show it, say IncrementAndMultiply didn’t have any parameters.  Via use of the tree construction node created by AstConditionalFrom, the Parameters node would have been left out of the AST output in that case.

Next steps

We showed today how to slim down our AST and have it only return information that is important to use.  In the next post, we’ll examine some of the error handling features available in the grammar/AST framework.

Comments (2) -

August 11, 2010 at 19:03  

chadbr United States

Any ETA on the next release?

August 12, 2010 at 03:04  

Bill Henning (Actipro) United States

Probably sometime in September.  We're doing major updates to some other products in WPF Studio as well.

Pingbacks and trackbacks (2)+

Comments are closed