EControl Ltd.

Michael Pro · Joined: 20 Aug 2007 Posts: 3

Let me explain my problem.
Now I'm writing modified lexer for Pascal Script and I've wrote some grammar constructions for my new lexer.

For example, here is a part of my definitions

IdentList = <Identifier> (',' <Identifier>)*;
TypeRes = ':' <Identifier>;
IdentWithType = IdentList TypeRes;
IdentWithTypeList = (IdentWithType ';')+;

It's defining a kind of typical pascal declarations

FMode, FStatus: Boolean;
Counter: Integer;
Velocity: Double;

etc

Although, as you see, this definitions use greedy algorithm for token finding - it will find only two first strings in memo, ignoring the last one with Velocity.
I've tried to use standard '+?' for IdentWithTypeList to use non-greedy method:

IdentWithTypeList = (IdentWithType ';')+?;

But it doesn't work - grammar isn't compiling. What should I do in this situation? How should I use non-greedy algorithm in grammar? It's vital, because there is no other way in defining recursive constructions.

Best Regards,
Michael.

Michael Pro · Joined: 20 Aug 2007 Posts: 3

Well, I've fixed some definitions in Rules.
I'm using rule for variable list detecting with this grammar and Self closing range.
Despite this the analyzer doesn't use greedy algorythm at all.
In Memo

FMode, FStatus: Boolean;
Counter: Integer;
Velocity: Double;

It finds only two first as desired block and doesn't affects the last block - it seems like it's work as "non-greedy". But declaration

IdentWithTypeList = (IdentWithType ';')+;

means that all three strings must be found - because selection method is "greedy".

econtrol · Site Admin Joined: 09 Jun 2006 Posts: 202

Hello.

Back scanning algorithm is used, so if you link range (with "Initially closed" flag) to rule

IdentWithTypeList = (IdentWithType ';')+;

there will be created 3 ranges.

FMode, FStatus: Boolean;

FMode, FStatus: Boolean;
Counter: Integer;

FMode, FStatus: Boolean;
Counter: Integer;
Velocity: Double;

This occurs due to rule is checked at each token and in back order.
It is not true grammar, it is only extended test token sequence algorithm.
It is intended to improve condition checking, but not for real parsing of text.

To correctly extract this text you should define terminate condition, for example:

IdentWithTypeList = (IdentWithType ';')+ NextSectionStart;
NextSectionStart = 'begin' | 'const' | 'var';

In this case first will be checked 'NextSectionStart' and then (IdentWithType ';')+.
Disadvantage 'NextSectionStart' will be included in range, that is not good.

I think it will be good to combine grammar rule and usual conditions. For example, if usual condition is defined and is tested successfully grammar rule is tested.

LALR algorithm is not appropriate for purpose of sequence testing (it requiers code consistency around tested sequence). Implemented algorithm is intended for testing sequences independently on surrounding text.

P.S. I have plans to implement LALR algorithm as additional method of text analysis.

Michael.

Michael Pro · Joined: 20 Aug 2007 Posts: 3

Thanks a lot.
Yesterday I've solved this problem in similiar way - created a list with standard Script Pascal "ending sequences" for variables section Smile