[TransWarp] fmtparse - recursive grammar

alexander smishlajev alex at ank-sia.com
Fri Jun 20 23:10:16 EDT 2003


Phillip J. Eby wrote:
> 
>>> openingChars is used to tell a preceding item at the same rule level 
>>> what its terminators are.  So  if I have a rule that's like 
>>> Sequence(Named('foo'), ',' Named('bar')),  the openingChars of ',' is 
>>> ',', and we take the 'withTerminators()' of Named('foo') so it has 
>>> the chance to take the ',' terminator into account.
>>
>> as far as i uderstand, the terminators are used only to break the 
>> greedyness of MatchString.  and the only real provider of openingChars 
>> is StringConstant.  am i right?
> 
> Yes - but if MatchString is a regular expression, then it should also 
> supply openingChars.  It's just left to the user to do this.

i see.  i did miss that point.

> Note that this also doesn't mean you can't define your
> own syntax elements that supply openingChars as well.

yes, but are there many uses for that?

> Finally, note that Optional() supplies a combination of its
> contents' opening characters, and those of its successor, if any.

i understood that.

>> also i see that you have reversed the flow of the parsing: the papers 
>> on packrat parsing show right-to-left parsing, starting from the end 
>> of the string.  but fmtparse does left-to-right parsing.  i suppose 
>> this was done in order to utilize the regexp engine.  i do not mean 
>> that there is something wrong; i just want to ensure that i understand 
>> things right.
> 
> Actually, I'm surprised.  I had absolutely no idea that packrat parsing 
> went right to left!  Are you sure?  Honestly, I did not study the code 
> very well, just the concept of memoizing partial parse results in a 
> backtracking recursive descent parser.

well, maybe that my statement was not intelligible.  i meant that 
"classic" tabular top-down parsing fills it table starting from the 
right, and packrat parsing only applies a lazy evaluation to that scheme.

>> i propose to use the Input object to keep track of the rules being 
>> applied.  since Input.parse is called from each non-trivial 
>> Rule.parse, i think there will be no problems in memorizing the rule 
>> before calling rule.parse.  then, a rule having need in terminators 
>> could request them from it's envelope Sequence via the Input object.  
>> this wold allow the lazy evaluation of the terminators at the time 
>> when they are really needed.
>>
>> what do you think?
> 
> If I understand you correctly, I think you have underestimated the 
> problem.

maybe.  but i do not see, where.

> First, terminators can propagate laterally as well as 
> vertically within a structure

you mean opening chars, not terminators, don't you?

> and second, rules don't know about their containers.

that's where Input could help.  the last rule that called input.parse is 
the container for current rule.

> I don't think you can transfer the necessary knowledge to 
> the Input without making it part of the parsing state, which will make 
> the system much slower.

i did not intend to transfer the knowledge to Input.  i want the input 
object to act as a proxy to the container of the rule.

a new method is added to containing rules: getTerminatorsFor(rule).  in 
Sequence, this shall scan the following siblings of the rule, calling 
getOpening() on each of them, starting from the very right.  Optional, 
Repeat and Alternatives should just return their own terminators 
(computed at the start of their parse).

> Currently, for all the concrete rules, the 
> parse state is just an integer position in the string.

at first, i wanted to use the state for that, but later i found that 
using Input as a container proxy is much more appropriate.

still, i think that state should be a special object rather than plain 
integer, to allow graceful error reporting as described in chapter 
3.2.4. of the Bryan's thesis.

best wishes,
alex.





More information about the PEAK mailing list