One major area we’ve been working on for the past several weeks is how best to store data so that syntax highlighting can be fast yet not use much memory.
In SyntaxEditor for WinForms, the Document had a Tokens collection that stored every parsed token. The tokens were then used to provide syntax highlighting info to the views. It was easy to pick up at any point for incremental parsing (when typing changes were made, etc.) since the existing tokens were readily available, and thus we could get lexical context information (which lexical scopes and states we are nested in at any given offset) very quickly.
While this works fine, there are some drawbacks:
- On initial document load, the entire document must be lexically parsed, which means a brief pause when opening large documents.
- The larger the document is, the more memory is used since tokens are stored for the entire document text.
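To make the trade-off concrete, here is a minimal sketch of the stored-token approach in Python. It is not SyntaxEditor code: the toy language (plain code plus `/* ... */` comments) and the names `Token`, `lex_all`, `EagerDocument`, and `context_at` are all assumptions made up for illustration. It shows why context lookup is quick (a binary search over stored tokens) and why both drawbacks follow (everything is lexed and kept up front).

```python
import bisect
from dataclasses import dataclass


@dataclass
class Token:
    start: int   # offset of the first character
    end: int     # offset just past the last character
    kind: str    # "code" or "comment" in this toy language


def lex_all(text):
    """Eagerly lex the entire text; cost grows with document size."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text.startswith("/*", pos):
            close = text.find("*/", pos + 2)
            end = len(text) if close == -1 else close + 2
            tokens.append(Token(pos, end, "comment"))
        else:
            opener = text.find("/*", pos)
            end = len(text) if opener == -1 else opener
            tokens.append(Token(pos, end, "code"))
        pos = end
    return tokens


class EagerDocument:
    """Hypothetical document that keeps every token in memory."""

    def __init__(self, text):
        self.text = text
        self.tokens = lex_all(text)               # full parse on load
        self._starts = [t.start for t in self.tokens]

    def context_at(self, offset):
        """Find the lexical context at an offset by binary search."""
        if not self.tokens:
            return "code"
        i = bisect.bisect_right(self._starts, offset) - 1
        return self.tokens[i].kind
```

For example, `EagerDocument("int a; /* note */ int b;").context_at(9)` lands inside the comment token, so an incremental parse could resume there knowing it is in a comment.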
Next Generation Updates
As mentioned in some previous posts, we’ve really been focusing on improving core features for our next generation text/parsing model. Improvements fall into three areas: feature set, performance, and memory reduction.
We are really focused on enhancing the editing experience for large documents. Our next gen design makes heavy use of virtualization techniques. We have eliminated the need to parse the entire document on load, which gives a near-instant load time for large documents. Additionally, we no longer persist tokens in the document, which means a huge memory reduction for large documents.
UPDATE: I should add that right now in the WPF control we can open a 10MB C# document almost instantly and can start typing in it right away. There is no noticeable slowdown in typing response speed or scrolling speed in a large document like this either, compared with a small document.
Since document tokens are no longer persisted like in SyntaxEditor for WinForms and are instead retrieved on demand, determining the lexical context for a given offset was tricky. We’ve been at it for a while and finally have a good system for tracking and retrieving context info to allow incremental lexical parsing to resume near a specified offset after text updates. This was a big hurdle to cross, so…
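The post does not describe how the context tracking actually works, so the following Python sketch shows just one possible way to get the same effect, under loudly stated assumptions: a toy language with `/* ... */` comments, one cached lexer state every few lines (a "checkpoint"), and tokens produced only for the lines a view asks for. The names `scan_line`, `LazyDocument`, `tokens_for_lines`, and `replace_line` are hypothetical. Note that this simple version still scans forward from the last known checkpoint to reach a line the first time, although it stores only one small state per checkpoint rather than any tokens.

```python
def scan_line(line, state):
    """Lex one line starting in `state`; return (tokens, state at line end).

    Tokens are (start_col, end_col, kind) tuples; state is "code" or "comment".
    """
    tokens = []
    pos = 0
    while pos < len(line):
        if state == "comment":
            # Continuing a comment opened on an earlier line.
            close = line.find("*/", pos)
            end = len(line) if close == -1 else close + 2
            tokens.append((pos, end, "comment"))
            state = "comment" if close == -1 else "code"
        elif line.startswith("/*", pos):
            close = line.find("*/", pos + 2)
            end = len(line) if close == -1 else close + 2
            tokens.append((pos, end, "comment"))
            state = "comment" if close == -1 else "code"
        else:
            opener = line.find("/*", pos)
            end = len(line) if opener == -1 else opener
            tokens.append((pos, end, "code"))
        pos = end
    return tokens, state


class LazyDocument:
    """Hypothetical document that stores checkpoints instead of tokens."""

    def __init__(self, text, checkpoint_every=2):
        self.lines = text.split("\n")
        self.every = checkpoint_every
        # checkpoints[k] is the lexer state at the start of line k * every.
        self.checkpoints = ["code"]

    def _state_at_line(self, line_no):
        index = line_no // self.every
        # Extend the cached checkpoints only as far as this request needs.
        while len(self.checkpoints) <= index:
            start = (len(self.checkpoints) - 1) * self.every
            state = self.checkpoints[-1]
            for line in self.lines[start:start + self.every]:
                _, state = scan_line(line, state)
            self.checkpoints.append(state)
        # Walk from the nearest checkpoint to the requested line.
        state = self.checkpoints[index]
        for line in self.lines[index * self.every:line_no]:
            _, state = scan_line(line, state)
        return state

    def tokens_for_lines(self, first, last):
        """Produce tokens on demand for lines first..last (inclusive)."""
        state = self._state_at_line(first)
        result = {}
        for n in range(first, min(last, len(self.lines) - 1) + 1):
            result[n], state = scan_line(self.lines[n], state)
        return result

    def replace_line(self, line_no, new_text):
        """Apply an edit; checkpoints after the edited line become stale."""
        self.lines[line_no] = new_text
        del self.checkpoints[line_no // self.every + 1:]
```

With the text `"a /* open\nstill comment\nend */ b\nc"`, asking for lines 2 to 3 resumes from a checkpoint that says "inside a comment", so line 2 is lexed correctly without keeping tokens for lines 0 and 1. After `replace_line(0, "a")` removes the comment opener, the stale checkpoint is dropped and line 2 lexes as plain code.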
What’s Next
We’re going to clean up some of the code and add in a way to load dynamic language definitions that you created for SyntaxEditor for WinForms. Then we’ll finally be ready to start a closed alpha test on what has been implemented so far!