• Products
  • Download
  • Purchase
  • Support
  • Company
Actipro Software company logo
Twitter Follow Actipro RSS Subscribe (RSS Feed)

The Actipro Blog

Tag Cloud

  • aero
  • blog
  • docking
  • editors
  • gauge
  • intelliprompt
  • navigation
  • propertygrid
  • ribbon
  • shared library
  • silverlight
  • syntaxeditor
  • themes
  • views
  • winforms
  • wpf

Latest Twitter News

November 21, 2011 at 11:14 AM
#WPF Studio 2011.2 is out now! Includes enhanced themes for native WPF conrtols and new SyntaxEditor features. http://t.co/uEMCaGPG

September 26, 2011 at 1:25 PM
If you'd like to see our #WPF / #Silverlight SyntaxEditor code editor control ported to Metro, provide feedback here: http://t.co/xXBNIDTi

September 15, 2011 at 8:31 PM
If you want to see SyntaxEditor eventually show up in Win8's #xaml UI, be sure to add your support to this MS thread: http://t.co/FBjz6TuC

August 15, 2011 at 1:47 PM
New SyntaxEditor IntelliPrompt parameter info feature docs/samples ready for the 2011.2 #WPF and #Silverlight releases. http://t.co/ezoYIjv

August 2, 2011 at 2:40 PM
First look at new automated IntelliPrompt parameter info coming to our C#/VB editor control in #WPF / #Silverlight http://t.co/CUz6O1T

Twitter Follow us on Twitter

Month List

  • 2012
    • February (3)
    • January (2)
  • 2011
    • December (2)
    • November (7)
    • October (2)
    • September (1)
    • August (5)
    • July (3)
    • June (6)
    • May (5)
    • April (8)
    • March (4)
    • February (5)
    • January (9)
  • 2010
    • December (9)
    • November (10)
    • October (4)
    • September (8)
    • August (12)
    • July (9)
    • June (7)
    • May (6)
    • April (7)
    • March (6)
    • February (6)
    • January (4)
  • 2009
    • December (2)
    • November (2)
    • October (12)
    • September (3)
    • August (11)
    • July (10)
    • June (6)
    • May (3)
    • April (7)
    • March (6)
    • February (8)
    • January (10)
  • 2008
    • December (10)
    • November (2)
    • October (3)
    • September (5)
    • August (5)
    • July (8)
    • June (4)
    • May (4)
    • April (10)
    • March (8)
    • February (1)
    • January (2)

Category List

  • RSS feed for ActiproActipro (289)
  • RSS feed for Blog SummaryBlog Summary (13)
  • RSS feed for GeneralGeneral (34)
  • RSS feed for In developmentIn development (150)
  • RSS feed for New featuresNew features (140)
  • RSS feed for New productNew product (30)
  • RSS feed for PromotionPromotion (2)
  • RSS feed for SilverlightSilverlight (71)
  • RSS feed for Tips and tricksTips and tricks (4)
  • RSS feed for Visual Studio 2008Visual Studio 2008 (2)
  • RSS feed for Windows FormsWindows Forms (20)
  • RSS feed for Windows VistaWindows Vista (10)
  • RSS feed for WPFWPF (235)
  • RSS feed for XAMLXAML (23)

About Us

Actipro Software is a leading provider of .NET user interface controls for the WPF, Silverlight, and WinForms frameworks, and is most well-known for their SyntaxEditor syntax-highlighting code editor control.

Please take some time to learn more about us and our product offerings.

SyntaxEditor grammar/AST framework part 2: Introduction to symbols and EBNF terms

August 6, 2010 at 1:37 AM
by Bill Henning (Actipro)

In the previous post, we gave an overview about what we’ll be covering in this multi-part walkthrough of the next generation grammar/AST framework we’re developing for SyntaxEditor, our code editor control.

Today’s post will dive right into a grammar and the building blocks needed to create one.

Grammar

A grammar is a set of production rules for parsing text in a formal language, where there is a single starting symbol from which parsing should begin.  All code languages should have some sort of EBNF grammar specification available for reference on the Internet, so that compilers and other products know how to read the code and interpret it.

Symbols

A grammar is comprised of symbols, which are either terminals or non-terminals.

Terminals are the lowest-level fixed-input lexical element used in a grammar production rule.  In the case of SyntaxEditor’s framework, terminals equate to tokens that are read in from a lexer.

NonTerminals each have a production rule, that is comprised of some combination of terminal and non-terminal symbol references.

EBNF sample

Take this EBNF for a Simple language FunctionDeclaration non-terminal:

   1: FunctionDeclaration:
   2:     "function" "identifier" "(" FunctionParameterList? ")" Block

That is saying the non-terminal’s name is FunctionDeclaration and a function declaration is built by a concatenation of a function terminal, followed by an identifier terminal, followed by an open parenthesis terminal, followed by an optional FunctionParameterList non-terminal, followed by a close parenthesis terminal, and ending with a Block non-terminal. 

EBNF terms

In the Actipro grammar framework, we have the concept of EBNF terms, all of which inherit an EbnfTerm base class.  There are multiple classes use to represent various types of EBNF terms, each of which can be used within a production rule for a non-terminal:

  • EbnfProduction
  • EbnfTerminal
  • EbnfNonTerminal
  • EbnfQuantifier
  • EbnfConcatenation
  • EbnfAlternation

EbnfProduction

An EbnfProduction represents a production rule.  Each NonTerminal has exactly one EbnfProduction associated with it that defines the EBNF for that non-terminal.

EbnfProduction objects have one child EbnfTerm that provides the EBNF pattern to match for the production rule.  Productions can also optionally be assigned custom tree constructors, but we’ll get into that more in several posts from now.

EbnfTerminal

An EbnfTerminal represents the usage of a pre-defined Terminal instance within a production rule.

For instance, a Terminal for an Identifier is likely to be used many places in a grammar.  The Terminal for an Identifier is defined once at the top of the grammar declaration and each usage of it within a production is made via separate EbnfTerminal instances that reference the Terminal.

EbnfNonTerminal

An EbnfNonTerminal is the same as an EbnfTerminal except that instead of referencing a Terminal, it is referencing a NonTerminal.

EbnfQuantifier

An EbnfQuantifier wraps a child EbnfTerm and allows you to specify the minimum and maximum number of matches that can be made of the child term.

There are pre-defined helpers for standard Kleene operators such as optional (?), zero-or-more (*), and one-or-more (+).

EbnfConcatenation

An EbnfConcatenation is a sequence of other EbnfTerms that are matched at the same scope level when performing tree construction.

EbnfAlternation

An EbnfAlternation is a way for one of multiple supplied EbnfTerm options to match.  In standard EBNF notation, alternations are typically represented with a vertical bar (|).

First look at building a non-terminal

Now that we’ve covered the basics, let’s dig in and see how it works.  We’ll write some grammar creation code that builds the FunctionDeclaration non-terminal example above.

Our grammar framework has been designed so that you code the grammar right in C#/VB and no code generation is necessary via an external tool.  A grammar must inherit our Grammar class.

A grammar is generally initialized in its constructor with four steps:

  1. Create terminals
  2. Create non-terminals
  3. Configure non-terminal can-match callbacks
  4. Configure non-terminal productions

Create terminals

The first step is where we simply declare Terminal variables, one for each token that will be supplied to us by a lexer and used in our non-terminal productions.

For the above example EBNF, we have four terminals so let’s declare them:

   1: var @closeParenthesis = new Terminal(SimpleTokenId.CloseParenthesis, "CloseParenthesis") 
   2:     { ErrorAlias = "')'" };
   3: var @function = new Terminal(SimpleTokenId.Function, "Function") 
   4:     { ErrorAlias = "'function'" };
   5: var @identifier = new Terminal(SimpleTokenId.Identifier, "Identifier");
   6: var @openParenthesis = new Terminal(SimpleTokenId.OpenParenthesis, "OpenParenthesis") 
   7:     { ErrorAlias = "'('" };

You’ll notice that each variable has a @ in front of it.   In C#, the @ character is simply an escape character in case the identifier is a keyword.  In our case, we’re going to use it just so later in our non-terminal productions, we can easily tell which variables are terminals and which are non-terminals.

The Terminal constructor takes a token ID, and the string key that identifies the terminal.  Note that the SimpleTokenId class contains const token ID values and was generated for us by the Actipro Language Designer application (a new feature for the next version).  Token IDs are how the grammar parser matches terminals with tokens that are being read.  The tokens come from ILexer instances that you already have defined if you use SyntaxEditor.

Some of the Terminals are also assigned an ErrorAlias.  When parsing, if a terminal is expected, but the current token doesn’t match, a parse error will be reported that will use the error alias if it is specified.  If not specified, it will fall back to using the terminal’s key.

Now that our four terminals are defined, let’s create our non-terminals.

Create non-terminals

We now declare NonTerminal variables, one for each non-terminal in our grammar. 

For our blog post, we will just declare one non-terminal variable even though a real grammar would have a lot more and we are even referencing other non-terminals for FunctionParameterList and Block in our production below.

   1: var functionDeclaration = new NonTerminal("FunctionDeclaration");

Configure non-terminal can-match callbacks

Some grammars have more than one non-terminal that start with the same terminal and since our grammar is LL(*)-based, it is not allowed to have two or more non-terminals starting with the same terminal in an alternation together.  If we did allow that, then the parser would never have the possibility of matching anything but the first non-terminal in the alternation that started with that terminal.  This is called ambiguity and we automatically detect it for you and warn you about it when it is found.

We allow you to easily resolve ambiguity by setting a can-match callback on non-terminals that require it.  In that case, we execute your custom callback code to determine if a non-terminal can match rather than the normal way that is ambiguous.

This is a confusing concept but the next post will show an example of why and when to do this.

Configure non-terminal productions

Finally we come to the meat of the grammar.  All terminals and non-terminal variables have been declared and we’re ready to get into some EBNF-like notation for our non-terminal production rules that consists of various EbnfTerm instances.

This is where the real magic of our grammar framework happens.  We’ve made tons of operator overloads, methods, indexers, and implicit casts so that you can define production rules with a minimal amount of C#/VB code.

Let’s define our example non-terminal production rule now:

   1: functionDeclaration.Production = @function + @identifier + @openParenthesis + 
   2:     functionParameterList.Optional() + @closeParenthesis + block;

It is our convention to have @ characters at the start of our terminal variables so they stand out differently than non-terminal variables.  The + operator means concatenation.  Even though we are referencing Terminal and NonTerminal-based variables here, operator overloading is actually converting each Terminal reference to an EbnfTerminal and each NonTerminal reference to an EbnfNonTerminal behind the scenes.  Finally note the Optional method call which wraps the EbnfNonTerminal for FunctionParameterList with an EbnfQuantifier that has a minimum of 0 and a maximum of 1.

Putting it all together

Wasn’t that easy?  Of course a real grammar would have a lot more terminals, non-terminals, and production rules but you just follow the same process for them too. 

Before we end this post, let’s assume we have the following input to the parser for this grammar:

   1: function Add() {}

The default tree constructors will build a full AST for you without you needing to do anything.  Of course it includes much more information than is practical, which is why we’ll show in a future post how to optimize tree construction.  Anyhow, here is a text representation of the AST node output for that function using the default tree constructor:

   1: FunctionDeclaration[
   2:     "function"
   3:     "Add"
   4:     "("
   5:     []
   6:     ")"
   7:     Block[
   8:         "{"
   9:         []
  10:         "}"
  11:     ]
  12: ]

Each AST node in the output automatically stores the proper start/end offsets for you too making it possible to query which node contains an offset later on.

In our next post in this series we’ll show a complete grammar example based on what we’ve learned so far.  Then in posts after that, we’ll show how to customize tree construction and how to process errors.

Tags: wpf, silverlight, syntaxeditor
Filed under: Actipro, In development, WPF, Silverlight
Submit to DotNetKicks...
Permalink | Comments (4)

Related posts

SyntaxEditor grammar/AST framework part 3: Creating a grammar for the Simple languageIn the previous post we gave a detailed introduction to symbols, EBNF terms, and how you can transla...SyntaxEditor grammar/AST framework part 4: Introduction to customizing tree constructionIn the previous post of this series, we walked through how to create a grammar for the Simple langua...SyntaxEditor grammar/AST framework part 6: Introduction to callbacks and error handlingIn the previous post, we optimized the tree construction output of our Simple language to be very co...

Comments

August 6, 2010 at 03:51  

trackback

ActiPro SyntaxEditor discussion of forthcoming grammar/AST framework

You've been kicked (a good thing) - Trackback from DotNetKicks.com

DotNetKicks.com

August 6, 2010 at 14:50  

Ken Brown

This is very exciting news Bill!

The short tutorial was also very informative and I really like the programmatic approach.

Regards,
Ken Brown –CSMicro Systems

Ken Brown United States

August 7, 2010 at 02:33  

Bill Henning (Actipro)

Glad you like the tutorial Ken.  You'll really like the upcoming posts in this series too as they will walk you completely through how easy it is to create a grammar using this framework.

Bill Henning (Actipro) United States

August 27, 2010 at 01:40  

trackback

FeedBurner blog post RSS feed issue fixed

FeedBurner blog post RSS feed issue fixed

The Actipro Blog - WPF and WinForms Development

Comments are closed
Copyright © 1999-2012 Actipro Software LLC. All rights reserved.
Home Actipro Software | Products | Download | Contact Us