[PEAK] PEAK-Rules indexing performance

Fri Jan 9 15:12:02 EST 2009

Phillip J. Eby wrote:
> At 01:03 PM 1/9/2009 +0100, Alberto Valverde wrote:
>> Hi,
>>
>> We were struggling trying to optimize the bad performance (well, 
>> horrible: about 4 seconds just to build the tree the first time a gf 
>> was called with a different type signature) of a function heavily 
>> extended with peak.rules. Finally, an "intuition driven" optimization 
>> [1] hit the nail and performance improved *dramatically* (to < 0.1 
>> s). This is the magic changeset:
>>
>> http://hg.python-rum.org/tw.rum-exp/rev/1446664f1423
>>
>> Basically, some rules which were checking for set inclusion of an 
>> argument in a set bound to "self" were changed so the inclusion check 
>> was made against a list "inlined" into the same rule. Note that the 
>> performance improvement is not at runtime, the performance here is 
>> fine, but during the indexed tree building process when a gf is first 
>> called. The profiler reveals many, many calls to "implies" and 
>> "overrides". More details, wild guesses  and profiler output are here:
>>
>> http://python-rum.org/ticket/53
>>
>> The thing is that we'd like to further optimize if possible following 
>> the same pattern or at least do it in a cleaner way. However, not 
>> knowing the real reasons doesn't make us feel too comfortable... So, 
>> please, can some light be shed over the reasons of this magical 
>> improvement?
>
> The specific thing going on with your optimization is that your 
> original code was doing a lot of unindexable comparisons.  When you 
> have an operation like 'x in y', and 'y' is not a registration-time 
> constant, it can't be indexed as a set of ORed == operations.  It has 
> to be indexed as 'x in y: true' instead of a series of x==1: true, 
> x==2: true, operations.
>
> Why this results in such a huge performance tank, I have no idea.  But 
> you *can* do the optimization more cleanly.  It's not necessary to 
> stringify the constants, they can just be module-global values or even 
> function-local values.  (e.g. register_widget() in your patch doesn't 
> need to stringify 'actions' to get the benefit.)
>
> They just have to be reachable in the declarer's namespace at the time 
> of the declaration, and the sequence will be treated as a constant and 
> unfolded into a series of ORed == operations.
>
> (For that matter, you could use SomeClass.attribute, and it would 
> still be unfolded at compile time -- whether SomeClass is 
> module-global or function-local at the declaration point.  It just 
> can't be an expression based on an *argument* to the target function.)

Hmm, good to know... this brings to mind some other places which we can 
apply the same pattern to optimize...
>
> As far as performance goes, the Chambers-and-Chen tree building 
> algorithm *is* subject to exponential blowup, but the way PEAK-Rules 
> does it, it should be amortized over a larger number of calls.  
> PEAK-Rules builds dispatch trees lazily, it doesn't build them 
> all-at-once.
>
> So what I suspect is happening, is actually expensive creation of leaf 
> nodes, *as different combinations of classes and actions are 
> invoked*.  Looking at your profile, I see 13.7 seconds (?) spent in 
> build_leaf(), and build_leaf() is what constructs a combined method 
> for the final call operation.

All those 13.7 seconds in build_leaf are for a couple of call to the 
same GF with arguments of different types. Unfortunately they all occur 
in the same request the first time certain views are called (hence the 
urgent need to optimize it)
>
> Aha... yes, I see where the blowup would be 2**sequences...  that is, 
> you have sequences like actions, new_actions, and input_actions...  
> which I would guess then a 2**3 == 8X multiplier on the number of 
> required leaf nodes.
>
> You see, when the tree is optimized, what you have is a dispatch table 
> that looks like:
>
>        root[class][action] == method
>
> That is, the root has an entry for each target class, and then that 
> entry has entries for each action, which then leads directly to a 
> target method (possibly a combined method).
>
> When it's *not* optimized, however, it looks more like *this*:
>
>        root[class][action][action in self.new_actions][action in 
> self.edit_actions][action in self.input_actions] == method
>
> Which is a crap-load more nodes to build out, a lot of duplication, 
> and extra runtime overhead, because all three of those sequence 
> lookups have to happen on *every* call (due to being dynamically 
> valued by way of 'self').  So 8X as many nodes to build, plus three 
> extra levels of dispatch -- and three extra indexes to prioritize in 
> best_expr()! -- probably accounts for all your performance loss.  And 
> every additional condition you added of this nature would've halved 
> your speed again.
>
> This 2X penalty applies whenever you use a criterion that's opaque to 
> optimization.  It's rather like trying to do a join or select on a 
> function in a database without any ability to index functions: it 
> forces a table scan, so to speak.  PEAK-Rules tries to confine the 
> effect by ensuring the test is only applied where it's applicable, so 
> if there are only a few types to which you're applying the conditions, 
> and the conditions *depend on each other*, it can prevent the blowup.  
> But in your case, the various 'action in' tests are *independent*, so 
> it can't trim the tree that way.
>
> That is, if you have a pair of tests like  'A in B and A in C' and 'A 
> not in B and A in D', then PEAK-Rules can skip testing 'A in C' if 'A 
> in B' comes up false.  So instead of a tree like:
>
>        root[A in B][A in C][A in D]
>
> You get a tree where the "true" branch of 'A in B' goes to an 'A in C' 
> test, and where the false branch goes to an 'A in D' test.  But in 
> your case, that optimization wasn't possible because PEAK-Rules 
> doesn't know the implication relationships between the sequences.
>
> So, if there is a clear implication relationship of this sort, you 
> could explicitly write it into your rules (e.g. 'action not in 
> self.input_actions and action in self.new_actions') and that would 
> mitigate the blowup.  However, I suspect from the code I see in your 
> patch that this isn't really possible, because there aren't any 
> mutually exclusive sequences in the code I saw.
>
> In the long run, I'd like to handle the sort of use case you're doing 
> by allowing "predicate functions" to exist; then you could define a GF 
> like 'is_edit_action(ob, action)', and thus be able to redefine the 
> edit actions for a given class, but still get the optimization 
> benefits.  The idea is that the predicate function would be expanded 
> into its full logical form inside the tree building of the outer function.

Hmm, sort of like "generic meta-functions"?  After some peeking at 
peak.rules internals, motivated by the meta function recipe you posted 
some months ago to optimize the "isclass" test, I thought about using 
them to simplify some rules by "wrapping" several related checks under a 
function without incurring in performance penalties due to 
non-statically analyzable predicates. Making these generic would be useful.
> I have some notes on how to do this, but it requires a refactoring of 
> certain key logical functions in the core, to allow subscribing to 
> callbacks to recompute a logical expression.  Otherwise, when you 
> added a new rule to 'is_edit_action' (or whatever), the GF's using 
> that function in their rules wouldn't get updated.
>
This has been a very useful explanation. Extremely enlightening, thanks 
for taking the time to elaborate.

Cheers,
Alberto