itsatomic
Joined: 12 Jun 2006 Posts: 38
Posted: Sun Jun 25, 2006 7:19 am Post subject: Lexer block rules example
Hi again, Michael.
I have been reading the Lexer help file / documentation and trying my best to understand how it all works. I am beginning to understand, but the purpose and implementation of block rules are a big stumbling block for me.
In the help file you have the following:
Code: | Rule "class derived" of LexLib.Pascal:
0. EQUAL <identifier>
1. EQUAL ) <symbol>
2. SKIP <symbol, identifier>
3. EQUAL ( <symbol>
4. EQUAL class <identifier>
5. EQUAL = <symbol>
                  TComponent  =   class  (   TPersistent  ,   IInterface  ,   IInterfaceComponentReference  )
Conditions start  9           8   7      6   5            4   3           2   1                             0
Conditions end    1           0   -1     -2  -3           -4  -5          -6  -7                            -8
|
with 0 under the ) character, etc.
I understand that you are going backwards from the end of the string, but I do not understand why condition 1 is ) in the list when the 0 is under the ). It seems to imply there is an identifier AFTER the ). Can you please help me understand this?
Also, other than block highlighting, what is a block used for? I could not catch all the uses of this lexer construct from the help files.
It would be fantastic if you could explain the process of creating a lexer from scratch, even if it was a simple one. The lexer I am using needs a lot of tweaking, and I would much rather learn how to do it than bug you with questions.
I cannot even see this rule in the current lexer library entry for Pascal either. Hmmm....
Could you provide a simple and maybe a complex block rule definition / example please?
Thanks!
Aaron |
econtrol Site Admin
Joined: 09 Jun 2006 Posts: 202
Posted: Mon Jun 26, 2006 6:37 am
Hi, Aaron,
Yes, this rule has already been removed from the Pascal lexers (replaced with a grammar rule reference).
Condition 0 (<identifier>) means that the token after ( ... ) must be an identifier (not a symbol), for example "private" or "end".
Maybe it is not a good example.
Code: | Another simple example "record" from Pascal lexer:
0: EQUAL record
1: EQUAL =
2: EQUAL <identifier>
TMyRecord  =  record
2          1  0 |
When you write a rule, make sure that it succeeds only in the situations it is meant for.
For example, suppose there is a rule (range start):
Code: | 0: EQUAL class
1: EQUAL =
2: EQUAL <identifier>
and corresponding rule (range end)
0: EQUAL end
1: EQUAL ; |
In this case ranges will be detected incorrectly in files that use forward class declarations (for example, "TMyClass = class;").
The range start rule will succeed, but such declarations have no "end;", so all following ranges may be incorrect.
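The mismatch can be reproduced with a small Python sketch. It is an illustration only, not how the component is implemented: the crude tokenizer and the helper names (`tokens_of`, `prev`, `range_events`) are made up for this example. The `strict` variant shows one possible tightening, assuming a "not equal" condition (like the `<>` used in other rules) is available, by checking one token further so that "class" followed by ";" is rejected.

```python
import re

def tokens_of(source):
    # Crude tokenizer (identifiers and single symbols), for illustration only.
    return re.findall(r"[A-Za-z_]\w*|[^\sA-Za-z0-9_]", source)

def is_identifier(tok):
    return re.fullmatch(r"[A-Za-z_]\w*", tok) is not None

def prev(tokens, i, k):
    """Token k places before position i, or "" when out of range."""
    return tokens[i - k] if i >= k else ""

def range_events(tokens, strict=False):
    events = []
    for i, tok in enumerate(tokens):
        if strict:
            # Hypothetical tightened start rule, evaluated one token later:
            # 0: <> ;   1: class   2: =   3: <identifier>
            started = (tok != ";"
                       and prev(tokens, i, 1).lower() == "class"
                       and prev(tokens, i, 2) == "="
                       and is_identifier(prev(tokens, i, 3)))
        else:
            # Start rule from the post: 0: class   1: =   2: <identifier>
            started = (tok.lower() == "class"
                       and prev(tokens, i, 1) == "="
                       and is_identifier(prev(tokens, i, 2)))
        if started:
            events.append("start")
        # End rule from the post: 0: end   1: ;
        if tok.lower() == "end" and prev(tokens, i, 1) == ";":
            events.append("end")
    return events

source = """
TBar = class;
TFoo = class
  procedure Run;
end;
"""
tokens = tokens_of(source)
print(range_events(tokens))               # ['start', 'start', 'end']
print(range_events(tokens, strict=True))  # ['start', 'end']
```

With the original start rule the forward declaration of TBar opens a range that is never closed; with the tightened rule every start has a matching end.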
Michael. |
itsatomic
Joined: 12 Jun 2006 Posts: 38
Posted: Mon Jun 26, 2006 7:31 am Post subject: Stylesheets lexer
The stylesheets lexer displays the following result in the config dialog:
body {
font-family: Tahoma, Verdana, Arial, Helvetica, sans-serif;
font-size: 8pt
}
The list of font faces should all be blue, I think. What change would need to be made to the lexer to effect this change?
I am trying to learn how to do lexers, so simply changing the lexer and emailing it will not be as helpful as describing the process you go through to correct the lexer here.
Thanks
Aaron |
econtrol Site Admin
Joined: 09 Jun 2006 Posts: 202
Posted: Mon Jun 26, 2006 9:38 am
I've updated rule "Param":
0: <> ':'
1: = <string>
and set "Identifier index" = 1
Michael. |
itsatomic
Joined: 12 Jun 2006 Posts: 38
Posted: Mon Jun 26, 2006 2:57 pm
Thanks, that worked. I have no idea why, though.
Can you provide any suggestions for learning how to construct a lexer without developing a syntax highlight component myself?
Is it too involved to provide a step-by-step explanation of how you develop even a simple lexer?
I am really trying my very best to understand what is involved, honestly, I am!
Aaron |
econtrol Site Admin
Joined: 09 Jun 2006 Posts: 202
Posted: Wed Jun 28, 2006 7:26 am
The main sequence of analysis:
1. Extracting tokens
The whole text is treated as one continuous stream. On each iteration the parser rules are applied at the current position. As soon as one rule succeeds, the remaining rules are not checked: a token is created and the current position is advanced by the length of the token.
After this step we have an array of tokens (words). Each token has a token index (which corresponds to the token type name) and a position in the text (the position is used to get the token string).
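As a rough illustration of this step, here is a minimal Python sketch of first-match-wins token extraction. It is not the component's actual code: the three parser rules, the `Token` tuple and the `tokenize` function are made up for this example, and characters that no rule matches (such as whitespace) are simply skipped.

```python
import re
from typing import NamedTuple

class Token(NamedTuple):
    type_index: int   # index into TOKEN_TYPES
    start: int        # position of the token in the text
    end: int          # start + length of the token

# Hypothetical parser rules, tried in this order at every position.
TOKEN_TYPES = ["identifier", "number", "symbol"]
PARSER_RULES = [
    (0, re.compile(r"[A-Za-z_]\w*")),
    (1, re.compile(r"\d+")),
    (2, re.compile(r"[^\sA-Za-z0-9_]")),
]

def tokenize(text):
    tokens = []
    pos = 0
    while pos < len(text):
        for type_index, pattern in PARSER_RULES:
            m = pattern.match(text, pos)
            if m:
                tokens.append(Token(type_index, pos, m.end()))
                pos = m.end()   # advance by the length of the token
                break           # the remaining rules are not checked
        else:
            pos += 1            # no rule matched: skip one character
    return tokens

text = "TMyRecord = record"
for tok in tokenize(text):
    print(TOKEN_TYPES[tok.type_index], tok.start, text[tok.start:tok.end])
```

For "TMyRecord = record" this yields three tokens: an identifier at position 0, a symbol at position 10 and an identifier at position 12. The token string is not stored; it is recovered from the text by position.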
2. Processing "Block" rules
These rules are applied to the tokens array.
Each "block" rule is a collection of conditions and attributes.
Each condition is checked against one token.
Conditions are checked in backward order, i.e. the first condition is applied to the current token, the second to the previous token, and so on.
A "block" rule succeeds only if all of its conditions succeed.
All block rules are checked for each token index (to stop checking further rules, use the property "Cancel next rules", which breaks off rule checking when this rule succeeds).
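The backward order of the conditions can be sketched in Python as follows. This is only an illustration under assumptions: `Token`, `Condition`, `BlockRule`, `rule_matches` and `apply_rules` are hypothetical names, token text is compared case-insensitively (as one would expect for Pascal), and `cancel_next_rules` mimics the "Cancel next rules" property.

```python
from typing import NamedTuple, Optional

class Token(NamedTuple):
    kind: str   # token type name, e.g. "identifier" or "symbol"
    text: str

class Condition(NamedTuple):
    text: Optional[str] = None   # required token text (lower case), None = any
    kind: Optional[str] = None   # required token type, None = any

class BlockRule(NamedTuple):
    name: str
    conditions: list             # conditions[0] applies to the current token
    cancel_next_rules: bool = False

def rule_matches(rule, tokens, index):
    """Condition k is checked against tokens[index - k]."""
    for k, cond in enumerate(rule.conditions):
        j = index - k
        if j < 0:
            return False         # not enough tokens before the current one
        tok = tokens[j]
        if cond.text is not None and tok.text.lower() != cond.text:
            return False
        if cond.kind is not None and tok.kind != cond.kind:
            return False
    return True

def apply_rules(rules, tokens):
    hits = []
    for index in range(len(tokens)):
        for rule in rules:
            if rule_matches(rule, tokens, index):
                hits.append((index, rule.name))
                if rule.cancel_next_rules:
                    break        # skip the remaining rules for this token
    return hits

record_rule = BlockRule("record", [
    Condition(text="record"),        # 0: EQUAL record
    Condition(text="="),             # 1: EQUAL =
    Condition(kind="identifier"),    # 2: EQUAL <identifier>
])
tokens = [Token("identifier", "TMyRecord"), Token("symbol", "="),
          Token("identifier", "record")]
print(apply_rules([record_rule], tokens))   # [(2, 'record')]
```

The rule fires at token index 2 ("record"), because only there do conditions 0, 1 and 2 line up with the current token and the two tokens before it.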
To design "block" rules you should first design a full set of parser rules and check them for errors in the demo editor (the hint for a token shows the required information: token index, position, ...).
After that, look for sequences of tokens that can be described by a block rule.
The simplest kind of block rule is the "tag detector". If this rule succeeds, the style of the identifier token is changed (the index of that token is defined by the corresponding property of the block rule).
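A tag detector can be sketched in Python using the "Param" rule quoted earlier in this thread (0: <> ':', 1: = <string>, "Identifier index" = 1). This is a guess at the behaviour, not the real implementation: the token types assigned to the CSS words, the `Token` class and `apply_param_rule` are assumptions, and "Identifier index" is read here as counting backward from the current token, the same way the conditions do.

```python
from dataclasses import dataclass

@dataclass
class Token:
    kind: str              # token type name (assumed: "string" or "symbol")
    text: str
    style: str = "default"

IDENTIFIER_INDEX = 1       # the "Identifier index" property of the rule

def apply_param_rule(tokens, new_style="param"):
    """Tag detector for the rule  0: <> ':'   1: = <string>."""
    for i in range(1, len(tokens)):
        cond0 = tokens[i].text != ":"            # current token is not ':'
        cond1 = tokens[i - 1].kind == "string"   # previous token is a <string>
        if cond0 and cond1:
            # The restyled token is IDENTIFIER_INDEX places before the
            # current one, i.e. the <string> matched by condition 1.
            tokens[i - IDENTIFIER_INDEX].style = new_style

line = [
    Token("string", "font-family"), Token("symbol", ":"),
    Token("string", "Tahoma"), Token("symbol", ","),
    Token("string", "Verdana"), Token("symbol", ";"),
]
apply_param_rule(line)
print([(t.text, t.style) for t in line])
```

Under these assumptions "font-family" keeps its style because the token after it is ":", while "Tahoma" and "Verdana" are restyled because each is a string token followed by something other than ":".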
Michael. |