[TransWarp] fmtparse - recursive grammar

Mon Jun 16 12:34:08 EDT 2003

At 03:16 PM 6/16/03 +0400, Oleg Broytmann wrote:
> >
> > So, what we really need here is to change 'fmtparse' to do its adapt()
> > calls on demand at a later stage, instead of up-front.  Converting to Once
> > bindings would be ideal for this, although I originally intended fmtparse
> > to be a standalone demo of what kinds of things can be accomplished with
> > PEP 246 adapt().  Guess I'll have to give up on that.
> >
> > Perhaps you could contribute a patch to change fmtparse to use Once
> > bindings for all adapt() on input arguments?  E.g. Alternatives() would
> > save its args in initArgs or _alternatives, and then have .alternatives be
> > a Once binding that adapts, instead of __init__ doing the adapting.  There
> > are only about 5-6 places that need changing for this, I think.
>
>    My first attempt was unsuccsessful. There is _computeRules() and
>openingChars in Sequence; I tryied to make _computeRules a delayed function
>(I renamed it to rules() and made it binding.Once)... and there is the
>problem - _computeRules() calls getOpening() on subrules, so openingChars
>must be defined in the subrules; but one of the subrules is actually
>recursivly referenced rule, that did not defined openingChars yet - it will
>be defined in the end of _computeRules(). Catch 22. Calling self.rules in
>getOpening() produces infinite recursion, od course.
>    How can I calculate openingChars in non-recursive way?

openingChars is used to tell a preceding item at the same rule level what 
its terminators are.  So  if I have a rule that's like 
Sequence(Named('foo'), ',' Named('bar')),  the openingChars of ',' is ',', 
and we take the 'withTerminators()' of Named('foo') so it has the chance to 
take the ',' terminator into account.

For a right-recursive rule, we don't have to recurse to find out the 
opening characters, because they will be defined by the first subrule.  In 
other words, if we have:

expr ::=  addend + addend | '(' expr ')'

Then the openingChars of "expr" should be addend.openingChars+'('.  In 
practice, for a sequence, its openingChars should always be the 
openingChars of its first subrule.

I think this is the way to break the recursion...  A Sequence should 
initially define its openingChars as the 'getOpening()' of its first 
sub-rule.  As long as the rule is not left-recursive, this should result in 
a stable value.  Rewriting the above:

class Expression(model.Type):
     class addend1(model.Feature): referencedType=model.Integer
     class addend2(model.Feature): referencedType=model.Integer
     class subexpr(model.Feature): referencedType='Expression'
     mdl_syntax = Alternatives(
         Sequence(addend1, '+' addend2),
         Sequence('(', subexpr, ')'),
     )

When calculating 'rules' for the second alternative sequence, if 
openingChars is computed from firstSubrule.getOpening(''), it ends up '(', 
which is correct for the sequence.

The only case in which this will *not* be correct is when the first subrule 
(in depth-first traversal) is an Optional, or any other construct that can 
yield an empty match.  This is because constructs that yield empty matches 
can open with characters from their next sibling.  So, openingChars would 
not be correct in such a case.

It seems it might be necessary to expand the Rule interface to include an 
indication of whether it is capable of matching an empty value; in that 
case, we could scan the subrules forward until we find the first non-empty 
construct, then go backwards to compute the opening characters.  This seems 
overly complicated, though.

Perhaps there is an answer, though.  If a recursive sequence contains only 
possibly empty constructs before the recursion occurs, is it meaningful any 
more to consider it a right-recursive rule?  So, the recursion can be 
broken by having rules() compute an initial "guess" for openingChars, and 
set that value *before* looping through the subrules.  getOpening() would 
invoke self.rules before returning self.openingChars.  Finally, self.rules 
would need to save an initial value in self.rules so that it would not be 
re-entered by getOpening().  It would look something like:

def getOpening(self,closing):
     self.rules
     return self.openingChars

def rules(self,d,a):

     obs = [adapt(ob,Rule) for ob in self.initArgs]

     if obs:
          self.openingChars = obs[0].getOpening('')

     obs.reverse()

     closeAll = self.closingChars
     closeCtx = ''

     # prevent re-entry
     self.rules = syn = []

     try:
         for ob in obs:
             ob = ob.withTerminators(closeAll+closeCtx)
             closeCtx = ob.getOpening(closeCtx)
             syn.insert(0,ob)

         self.openingChars = closeCtx

     except:
         # if we fail, don't leave debris behind
         del self.rules
         raise

     return syn

So here's the flow: calling rules sets a default opening chars and an empty 
rules list.  In the process of going through the rules list, when we make 
the recursive call to getOpening, which re-invokes the rules computation, 
nothing happens because we've already put in a dummy 'rules' value.  So the 
recursive call returns the "guessed" openingChars.  We then get to the end 
of the list, and set the *correct* openingChars.  If the sequence was 
non-recursive, there is no difference from before.  But if it's recursive, 
we've "fixed" it.

There's only one piece of recursion left: the 'withTerminators()' call.  We 
need to change 'Sequence.withTerminators()' to *not* return a new object if 
the current one's 'closingChars' are correct.  Otherwise, we will get 
infinite recursion there too: a new object will be created every time, even 
if it should happen at most once.  I suspect it may be necessary to do this 
for all rules' 'withTerminators()' methods, not just Sequence.  But maybe not.

I think maybe I need to create a non-trivial expression grammar using 
peak.model and fmtparse, just to have a good test suite for this sort of 
thing.  But first I need to find the time.  :)