EControl Ltd. Forum Index EControl Ltd.
VCL libraries and software support forum
 
 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

Regular expressions in lexer grammar

 
Post new topic   Reply to topic    EControl Ltd. Forum Index -> EControl Syntax Editor
View previous topic :: View next topic  
Author Message
Michael Pro



Joined: 20 Aug 2007
Posts: 3

PostPosted: Tue Sep 18, 2007 5:33 am    Post subject: Regular expressions in lexer grammar Reply with quote

Let me explain my problem.
Now I'm writing modified lexer for Pascal Script and I've wrote some grammar constructions for my new lexer.

For example, here is a part of my definitions

IdentList = <Identifier> (',' <Identifier>)*;
TypeRes = ':' <Identifier>;
IdentWithType = IdentList TypeRes;
IdentWithTypeList = (IdentWithType ';')+;

It's defining a kind of typical pascal declarations

FMode, FStatus: Boolean;
Counter: Integer;
Velocity: Double;

etc

Although, as you see, this definitions use greedy algorithm for token finding - it will find only two first strings in memo, ignoring the last one with Velocity.
I've tried to use standard '+?' for IdentWithTypeList to use non-greedy method:

IdentWithTypeList = (IdentWithType ';')+?;

But it doesn't work - grammar isn't compiling. What should I do in this situation? How should I use non-greedy algorithm in grammar? It's vital, because there is no other way in defining recursive constructions.

Best Regards,
Michael.
Back to top
View user's profile Send private message
Michael Pro



Joined: 20 Aug 2007
Posts: 3

PostPosted: Tue Sep 18, 2007 5:45 am    Post subject: Reply with quote

Well, I've fixed some definitions in Rules.
I'm using rule for variable list detecting with this grammar and Self closing range.
Despite this the analyzer doesn't use greedy algorythm at all.
In Memo

FMode, FStatus: Boolean;
Counter: Integer;
Velocity: Double;

It finds only two first as desired block and doesn't affects the last block - it seems like it's work as "non-greedy". But declaration

IdentWithTypeList = (IdentWithType ';')+;

means that all three strings must be found - because selection method is "greedy".
Back to top
View user's profile Send private message
econtrol
Site Admin


Joined: 09 Jun 2006
Posts: 202

PostPosted: Tue Sep 18, 2007 6:16 pm    Post subject: Reply with quote

Hello.

Back scanning algorithm is used, so if you link range (with "Initially closed" flag) to rule

IdentWithTypeList = (IdentWithType ';')+;

there will be created 3 ranges.

FMode, FStatus: Boolean;

FMode, FStatus: Boolean;
Counter: Integer;

FMode, FStatus: Boolean;
Counter: Integer;
Velocity: Double;

This occurs due to rule is checked at each token and in back order.
It is not true grammar, it is only extended test token sequence algorithm.
It is intended to improve condition checking, but not for real parsing of text.

To correctly extract this text you should define terminate condition, for example:

IdentWithTypeList = (IdentWithType ';')+ NextSectionStart;
NextSectionStart = 'begin' | 'const' | 'var';

In this case first will be checked 'NextSectionStart' and then (IdentWithType ';')+.
Disadvantage 'NextSectionStart' will be included in range, that is not good.

I think it will be good to combine grammar rule and usual conditions. For example, if usual condition is defined and is tested successfully grammar rule is tested.

LALR algorithm is not appropriate for purpose of sequence testing (it requiers code consistency around tested sequence). Implemented algorithm is intended for testing sequences independently on surrounding text.

P.S. I have plans to implement LALR algorithm as additional method of text analysis.

Michael.
Back to top
View user's profile Send private message Send e-mail
Michael Pro



Joined: 20 Aug 2007
Posts: 3

PostPosted: Wed Sep 19, 2007 5:03 am    Post subject: Reply with quote

Thanks a lot.
Yesterday I've solved this problem in similiar way - created a list with standard Script Pascal "ending sequences" for variables section Smile
Back to top
View user's profile Send private message
Display posts from previous:   
Post new topic   Reply to topic    EControl Ltd. Forum Index -> EControl Syntax Editor All times are GMT
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2005 phpBB Group