[PEAK] Persistence styles, MDA, AOP, PyProtocols, and PEAK

Phillip J. Eby pje at telecommunity.com
Wed Jul 7 02:50:46 EDT 2004


Preface
-------

This is a rather long article, even for me.  (And that's saying 
something!)  However, it addresses a coming "sea change" that will affect 
PEAK at many levels, including at least:

* PyProtocols

* peak.model and peak.storage

* peak.metamodels

* the AOP/module inheritance facilities in peak.config.modules

Naturally, I think the changes will be very good ones for PEAK, and they 
are unlikely to have effects on existing code that doesn't depend on any of 
the above.  In particular, I do not expect any destabilizing effects on the 
core (except perhaps peak.model) or any primitives.  (Some API changes in 
PyProtocols 1.0 are almost certain, however, with backwards-compatible APIs 
remaining available at least through 1.1.)

However, for the non-core packages, especially metamodels and the AOP 
stuff, major upheavals and/or outright replacement or removal are likely.

So... now that I have your attention, let me begin.  :)


How We Got Here
---------------

My work at Verio has for a long time been the driving force that defined 
requirements for how PEAK would develop.  That is, my goals for PEAK 
reflected my long-term vision for how software would be developed there, 
and thus the software design goals included both social and organizational 
targets, as part of an overall development communications methodology 
covering matters from early requirements gathering all the way through 
deployment and ongoing support/maintenance.  You can see this, for example, 
in tools like the 'peak.ddt' package, which facilitates requirements 
communication and quality control by making a system's conformance to 
requirements visible to a group.

Part of this long-term vision included an MDA, or Model-Driven 
Architecture, based on UML and other OMG technologies.  You can see this 
influence as far back as the earliest days of the TransWarp frameworks, and 
as far forward as the current peak.metamodels and peak.storage.xmi packages.

However, with my departure from Verio, the company's needs are no longer a 
primary source of requirements for future PEAK development.  On the 
downside, this removes some of the clarity and focus from the requirements. 
But on the upside, I now feel more able to take an evolutionary approach to 
certain aspects of the system.  While at Verio, I felt a (self-imposed) 
pressure to build into PEAK only reasonably complete solutions to 
the requirements I knew about.  This tended to leave some areas of PEAK not 
developed at all, while they waited for me to find a "final answer".

Now, I feel more comfortable with prioritizing feature development with 
even more YAGNI (You Ain't Gonna Need It) and STASCTAP (Simple Things Are 
Simple, Complex Things Are Possible) than ever before.  You may notice this 
in some of my recent moves to consolidate subsystems and eliminate 
dependencies, while simplifying the code base...  and there's a lot more 
where that came from.

Currently, I consider the bulk of the PEAK framework to be a smashing 
success.  Nearly all of the core, and even some of the non-core frameworks 
are in good shape.  While some are far from complete, they have relatively 
few unresolved questions or "architecture smells", as noted in February's 
STATUS.txt.  But there is one gigantic unresolved architecture smell in 
PEAK right now, that stinks so bad it gives me headaches.  And that's our 
story with respect to persistence and MDA.

I think I've probably been railing about this issue for almost a year now, 
talking at length about conceptual queries, relational algebra and 
calculus, business rules, predicate dispatch, fact orientation and all of 
that.  Mostly these have been concepts I've grasped at to try and 
unify/solidify PEAK's existing persistence and MDA philosophy.  And in the 
last few weeks, I've come to realize that PEAK's existing philosophy on 
these matters is what needs to be revisited.  Most importantly, our concept 
of the central role of the "domain model" is flawed.

Historically speaking, PEAK's persistence philosophy evolved from Zope's by 
way of ZPatterns.  The basic concept was that objects were retrieved from 
containers, and those containers made the objects seem like ordinary Zope 
objects, even though they were retrieved from LDAP or an RDBMS instead of 
ZODB.  These containers, called "Racks", were then combined with 
focal-points (called "Specialists") for methods on multiple (or zero) 
objects of a particular interface.

While I'd like to call this concept a brilliant invention by Ty and myself, 
it really falls more under the heading of identifying and clarifying a 
pattern that we and others were haphazardly following, and turning it into 
a library.  What we didn't really clarify were the limits of applicability 
of this pattern, and that's a big part of what's gotten PEAK's persistence 
and MDA philosophies into trouble.

Specifically, the ZPatterns approach was well suited to web-based 
applications.  Indeed, it's still IMO an ideal way to approach the design 
of a web-based application's URL space, and I intend to carry it forward in 
peak.web.  But it's not only not the best way to structure an application's 
internals, it's a *lousy* way.  Just look at the PEAK "bulletins" example 
code, or any non-trivial example of a PEAK application that uses data managers!

It worked well for Zope 2 applications, of course, because Zope 2 
applications are built in the image of their URL space, but this is not so 
for "normal" Python apps.  You can see in the evolution of TransWarp and 
PEAK how there was an implicit assumption of this sort of hierarchical 
structuring according to URLs, that we carried with us from ZPatterns.

Another assumption that was carried forward has to do with MDA and the 
theory of building an application based on a "domain model", independent of 
its storage mechanism.  The idea of the domain model is that you put an 
application's core behaviors into objects that reflect the application domain.

Unfortunately, this idea scales rather poorly when dealing with large 
numbers of objects.  Object-oriented languages are oriented towards dealing 
with individual objects more so than collections of them.  Loops over large 
collections are inefficient when compared to bulk operations on an RDBMS, 
for example.  This means that building practical applications requires 
these bulk operations to be factored out, somehow.  ZPatterns took the 
approach of moving the operations to Specialists, and having domain objects 
delegate bulk operations to the specialists.

But this just moves the scalability issue from performance to 
development.  Now we're writing routines in the domain model that do 
nothing but pass the buck to Specialists or DMs, which then have to do 
something "real".  And every new reporting or analysis requirement for the 
application leads to more paired methods of this kind.
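
To make the buck-passing concrete, here's a minimal sketch of the kind of 
paired methods I mean (the class and method names are invented purely for 
illustration):

     class Customer:
         # domain object: knows nothing about storage, so any bulk or
         # reporting operation just gets delegated...
         def pastDueInvoices(self):
             return self.getInvoiceSpecialist().pastDueInvoicesFor(self)

     class InvoiceSpecialist:
         # storage-aware component that does the "real" work, typically
         # by issuing a single bulk query against the RDBMS
         def pastDueInvoicesFor(self, customer):
             return self.query(
                 "status='pastdue' AND customer_id=%s", customer.id)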

And, it still doesn't address a key scalability issue for virtually any 
enterprise-class business application: business rules!  Of course, the 
domain model itself may have some facility for storing and implementing 
rules that are native to the model (such as discount rules), but what about 
operational rules like "e-mail the customer a notice when X 
happens"?  These rules have to be incorporated into the code that does X, 
either in a DM or the domain model, and in either case making the system 
more brittle.  That is, the code is less reusable in other contexts.


Persistence Styles
------------------

These observations have helped me to realize that there are different 
"persistence styles", with corresponding scopes of applicability for the 
use of domain models.  I have dubbed them "fact base", "persistent root", 
and "document", where "document" persistence can be thought of as a 
specialized form of "persistent root" persistence.  The existence of these 
styles conflicts with my multi-year assumption that truly "transparent 
persistence" in a single framework was possible and practical.  That is, my 
assumption that one could create a domain model and have it be meaningfully 
persisted to arbitrary forms of storage without requiring any change to the 
domain model.

In fact, this is *only* really practical with a domain model that is 
entirely suited to the "document" style (at least with our current 
technology -- more on that later).  Thus, attempting to build a 
one-size-fits-all mechanism will tend to produce a mechanism that's at its 
best only for the document style -- as PEAK's is.

But I digress.  Let me explain the styles, as I understand them at the 
moment.  A "fact base" model is one that requires operations over large 
numbers of objects, with complex queries and reporting.  (By their nature, 
most serious "business" applications fall into this category.)  In a "fact 
base" model, objects are almost always initially retrieved using keys known 
to humans or other systems, rather than by navigation from a standard 
starting point.

A "persistent root" model is one where there is some distinguished root 
object from which the rest of the objects descend, as in ZODB.  Large 
collections and mass operations are infrequent; direct manipulation of 
individual objects is the rule.  Zope 2 applications are mostly like this, but 
ZCatalog can be seen as an effort to overcome the inherent limitations of 
this style once the structure becomes large enough and mass operations 
(like searches) become more desirable.

A "document" model is one where the persistent root and all its children 
are sufficiently small to allow them to be loaded into memory all at once, 
and written out all at once when changes are required.  It does not suffer 
from the sorts of issues ZCatalog tries to work around, because it does not 
need to scale to a size that needs such a thing.

These styles are not 100% mutually exclusive (since a "fact base" might 
contain "documents"), nor are their boundaries really fixed.  For example, 
the "Prevayler" philosophy of persistence effectively says that if you have 
enough memory to play with, you can use the "document" model for anything, 
as long as you also keep transaction logs.  In theory, this would mean that 
in another 5 or 10 years, we might be able to use the Prevayler approach 
for all applications.

But that's baloney, actually.  At best, Prevayler covers the "persistent 
root" and "document" models well, where navigation occurs on the basis of 
object trees with relatively low fan-out.  Without additional structures 
performing functions analogous to those of ZCatalog, applications with 
"fact base" characteristics (like needing to look up one out of millions of 
transactions by an order number) just don't work out-of-the-box under a 
"prevalent" architecture: you end up having to design your own data 
structures to deal with these issues.

Currently, PEAK's DM system attempts to support all of the styles, and thus 
ends up really only being *good* at the "document" model.  This is 
well-illustrated by the fact that the only implemented DM's in PEAK that 
are not just tests or examples are document-based: the XMI DM in 
peak.storage.xmi, and the HTMLDocument DM in peak.ddt!  The only drawback 
to PEAK DM's in the "document" and "persistent root" models is that DM's 
require a special key to indicate the root object.  Apart from this, they 
work quite well.

But for the "fact base" model, DM's aren't really very good at dealing 
with multiple objects, and you have to write a *lot* of them, typically one 
per type, plus one for every kind of query.  Of course, I've mentioned that 
a lot of times before, and that I intend to do something about it, often 
accompanied by lots of hand-waving about "fact orientation".  :)

Mostly, though, I've been thinking in terms of how to expand or revise DM's 
to accommodate fact-orientation or SQL mapping.  I see now, however, that 
this is really not the right place to start, since DM's do just fine for 
the models they currently cover, and they are not well-suited to such a 
transformation.

So, instead, we need tools that are fact-oriented from the ground up.  For 
the most part, this means simply abstracting SQL and a physical schema, 
replacing them with a logical schema.  Queries and commands issued against 
the logical schema can then be translated to SQL or other operations 
appropriate to the back-end.

In order to make that work, we're going to need several things, including:

* A model for facts
* A query language that can be mapped to either SQL or in-memory Python objects
* A data management API that's query-focused and easily extended
* A mechanism for defining mappings between an abstract fact model, and one 
or more concrete storage models


MDA and AOP
-----------

As I said, our theory -- hypothesis, really, or perhaps just an article of 
faith -- was that the answer to all of these issues could be found in an 
MDA (Model-Driven Architecture) implemented using AOP (Aspect-Oriented 
Programming).

More specifically, I knew that we needed to separate storage-related 
concerns from application concerns, and that further, we needed a way to 
reuse, extend, and refine models without changing source code (so that 
versions of an application that were simultaneously targeted for different 
markets could share common source code despite differences in domain models 
and business rules).

This was the theory driving TransWarp through most of its history, and to 
some extent it continued on into PEAK.  TransWarp, however, never really 
produced much of direct value, and to the best of my knowledge PEAK's AOP 
facilities are currently only used by maybe one person/company -- and it's 
not me or anybody at Verio.  :)

There are several reasons for this.  At least one of them is that I never 
succeeded in writing a really good AOP tool!  But I think an even bigger 
reason is that most of the things I wanted to do with AOP, turned out to be 
easier to do and understand in other ways.  For example, PyProtocols and 
protocol adaptation made possible lots of things I'd planned to do with 
AOP.  So did the evolution of peak.binding and peak.config.

There are only two things left that the AOP stuff was intended to do (in 
the sense of being important enough to spend the time on developing it in 
the first place):

  * Allow variations of an application to change domain object classes to 
use different collaborator classes

  * Allow mixing domain-specific behavior into classes that were 
automatically generated from UML or other modelling tools

The first of these is in fact the only use case I know of where anybody 
might actually be using PEAK's AOP facility.  I don't personally expect to 
have either use case any time soon, though.  I *do* expect to still 
accommodate both scenarios in the future, it's just that the way of 
achieving them may have to change.

Whether the way changes or not, though, I do not currently intend to 
continue maintaining the AOP code or distributing it with PEAK, as there 
are much better ways of doing this now than my quirky bytecode recompiler 
mechanism.  However, if there's enough interest or need, I might be willing 
to spin the code off into a separate, "user-supported" distribution.

So, if you are currently using PEAK's AOP facilities in any way, *please* 
post what you're using it for now, so I can make sure I'm not overlooking 
any use cases in the transition to a post-AOP PEAK.



Going Beyond AOP (with 40-Year Old Technology!)
-----------------------------------------------

"Greenspun's Tenth Rule of Programming: any sufficiently complicated C or 
Fortran program contains an ad hoc informally-specified bug-ridden slow 
implementation of half of Common Lisp."

   -- Phil Greenspun

Apparently, this rule applies to Java as well.  I recently ran across the 
book-in-progress, "Practical Common Lisp" (available at 
http://www.gigamonkeys.com/book/ ), only to discover a surprising 
similarity between this chapter:

     http://www.gigamonkeys.com/book/object-reorientation-generic-functions.html

And the AspectJ language.  In fact, as I did more research, I began to 
discover that the only real plusses AspectJ has over what's available in 
Common Lisp are that it 1) works with "oblivious" code that wasn't written 
with AOP or extensibility in mind, and 2) uses predicates.

Now, before anybody panics, I am *not* planning to rewrite PEAK in Common 
Lisp!  But I *do* want what they've got, which includes many things AspectJ 
does not.  And, it addresses numerous things we need in the 
currently-unstable parts of PEAK...  the parts of PEAK that have been 
unstable precisely because they've been lacking this sort of functionality.

I'm speaking here of "multiple dispatch".  More specifically, I'm talking 
about a kind of symmetric multi-dispatch that's closer in nature to the 
kind found in the Cecil and Dylan programming languages than what's in 
CLOS, but the basic idea is the same.

Specifically, one defines a "generic function" that can have multiple 
implementations (aka "methods").  The appropriate method to invoke is 
selected at runtime using information about *all* of the function's 
arguments, not just the first argument, as Python implicitly does for you.

What good does that do?  Well, think about storage.  Suppose you defined a 
generic function like this:

     def save_to(ob, db):
         ...

And what if you could define implementations of this function for different 
combinations of object type and database type?  Maybe something like:

     [when("isinstance(ob,Invoice) and isinstance(db,XMLDocument)")]
     def save_to(ob,db):
         # code to write out invoice as XML

     [when("isinstance(db,PickleFile)")]
     def save_to(ob,db):
         # code to write out arbitrary object as a pickle
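
Calling the function then looks like any ordinary call; the dispatcher 
picks the applicable implementation based on *both* arguments (the 
variable names below are just placeholders):

     save_to(invoice, xml_document)   # the Invoice/XMLDocument method
     save_to(anything, pickle_file)   # falls back to the PickleFile method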

Doesn't this look a *lot* easier to you than writing DM classes?  It sure 
does to me.

You might be wondering, however, how this is different from writing a bunch 
of "if:" statements.  Well, an "if:" statement has to be written in *one 
place*, and has the *same set of branches*, all the time.  But generic 
functions' methods are different.

First, they can be written all over the place. The two methods above could 
live in completely different modules -- and almost certainly would.  Which 
means that if you don't need, say, the pickle use case, you could just not 
import that module, which means that branch of our "virtual if statement" 
simply wouldn't exist, thus consuming no excess memory or CPU time.  So, 
they can be "write anywhere, run any time".  Now that's what I call 
"modular separation of concerns".  :)

In addition to these very basic examples, expanding the approach to support 
full "predicate dispatching", and to support CLOS-style "method combining" 
and "qualifiers", would allow us to write things like this "adventure game" 
example:


    [when("not target.isDrinkable()"]
    def drink(actor,target):
        print "You can't drink that."

    [when("target.isDrinkable()")]
    def drink(actor,target):
        print "glug, glug..."
        target.consume()

    [after("target.isDrinkable() and target.isPoisonous()"
           " and not actor.isWearing(amulet_against_poison)")]
    def drink(actor, target):
        print "oops, you're dead!"
        actor.die()

The idea here is that the "after method" runs *after* the successful 
completion of any "primary methods" (defined with 'when()'), as long as its 
conditions apply.

In some ways, this is a bit like the old SkinScript of ZPatterns, except 
that it's 1) pure Python rather than a new language, and 2) applicable to 
any and every kind of function you want, rather than a handful of 
pre-defined "events".

And speaking of events, generic functions make *great* extension or 
"plug-in" points in general for systems that need them.  For example, an 
e-commerce framework could call generic functions at various stages of 
processing an order, such as "completed order", allowing custom business 
rules to execute according to the defined triggering conditions.
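
For instance, here's a sketch of such a plug-in point using the 
hypothetical 'when()/after()' notation from above (both the framework hook 
and the business rule are made up for illustration):

     # in the e-commerce framework: an extension point with no built-in rules
     def order_completed(order):
         """Generic function called once an order has been finalized"""

     # in a site-specific rules module: a custom business rule hooked onto it
     [after("order.total > 1000 and not order.customer.is_vip")]
     def order_completed(order):
         email_customer(order.customer, "Thanks for your large order!")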

Indeed, almost anything that needs to be "rule-driven" can be expressed as 
a generic function.


So what's the Catch?
--------------------

You may be wondering what the catch to all this is, or perhaps you've 
already made up your mind as to what that catch *must* be.  You're maybe 
thinking it must be hellishly slow to eval() all those strings.  Or maybe 
that it's all a pipe dream that will take ages to implement.

But actually, no, neither one is the case, as you might already know if 
you've been paying close attention to recent PyProtocols checkins.  The 
new, experimental 'protocols.dispatch' module on the CVS trunk has already 
proven that it's possible to implement highly efficient generic functions 
in Python, complete with predicate dispatch.  They don't yet have the nice 
API I've presented in this article, but they do exist, and their execution 
speed in typical cases can actually approach that of the PyProtocols 
'adapt()' function!

In addition to the functionality proof-of-concept, I've also got a 
proof-of-concept Python expression parser (that so far handles everything 
but list comprehensions and lambdas) for what's needed to implement the 
fancy 'when/before/after()' API.  And there's a proof-of-concept for the 
"function decorator syntax" as well.

So, actually, all of the major technical pieces needed to make this happen 
(expression parser, decorator syntax, and dispatch algorithm) have been 
developed to at least the proof-of-concept stage.  The parser will reduce 
normal Python expressions to fast-executing objects, so there's no need to 
eval() expression strings at runtime.  Further, the in-CVS prototype 
dispatcher automatically recognizes common subexpressions between rules, so 
that e.g. 'target.isDrinkable()' will get called only *once* per call to 
the generic function, even if the expression appears in dozens of rules.

Also, the prototype dispatcher automatically checks "more discriminating" 
tests first.  So, for example, if one of the tests is on the type of an 
argument, and there are methods for lots of different types, the type of 
that argument will be checked first, to narrow down the applicable methods 
faster.  Only then will pure boolean tests (like 'target.isDrinkable()') be 
checked, in order to further narrow down the options.

Finally, the already highly-optimized expression evaluation and dispatching 
code is destined to move to Pyrex, where I hope to make it evaluate 
expressions as quickly as the Python interpreter itself does, by way of a 
few tricks I've come up with.

The net result is that the production version of this code should be 
roughly comparable in speed to a series of hand-written 'if:' 
statements!  For some tests, like type/protocol tests and range/equality 
tests, it's likely to actually be *faster* than 'if:' statements, due to 
the use of hash tables and binary searches.  (But this will likely be 
balanced out by other overhead factors.)

Even if, however, it were to end up being, say, twice as slow as 'if:' 
statements would be, it's still a heck of a bargain in development time, 
compared to having to write the 'if:' statements yourself.  Consider, for 
example, this set of example rules from the prototype dispatcher's unit tests:

         classify = GenericFunction(args=['age'])

         classify[(Inequality('<',2),)]   = lambda age:"infant"
         classify[(Inequality('<',13),)]  = lambda age:"preteen"
         classify[(Inequality('<',5),)]   = lambda age:"preschooler"
         classify[(Inequality('<',20),)]  = lambda age:"teenager"
         classify[(Inequality('>=',20),)] = lambda age:"adult"
         classify[(Inequality('>=',55),)] = lambda age:"senior"
         classify[(Inequality('=',16),)]  = lambda age:"sweet sixteen"

As an exercise, try to write the 'if:' statements (or design a lookup 
table) to implement this by hand.  More to the point, try to do it 
*correctly*, and without duplicating any of the comparisons.  Then, try to 
do it without actually rewriting the rules.  That is, try to do it using 
*only* the comparisons shown above, without (for example) changing the 
'>='  rules into '<' rules.  (After all, in a business application, the 
closer the code is to the human-readable requirements spec, the better off 
you are.  If you rewrite the rules to make them dispatchable, you've just 
introduced a "requirements traceability" issue.)

Have you tried it yet?  Even ignoring the traceability issue, it's quite 
tedious  to implement, because you have to explicitly work out what rules 
are "more specific" than others.  If somebody asked you to make the above 
classifications by age, it's trivial for you to "execute" the rules in your 
head, because our brains just "know" what's more specific than something 
else -- they go by what's "closest" or "most specific".  The advantage of 
predicate dispatching -- above and beyond the advantages of generic 
functions in general -- is that it teaches the computer how to figure out 
what "closest" means, just from the contents of the rules themselves.

It's not perfect, of course.  The current dispatcher only knows about types 
and value ranges (and it'll soon know about booleans).  But it doesn't know 
that one test might affect the applicability of another test.  For example, 
if an object is of some type other than NoneType, then obviously an 'is 
None' or 'is not None' test doesn't need to be executed.  However, short of 
coding this and dozens of other rules into the system, there's no way for 
the dispatch system to know that.

In practice, though, I suspect it will be rare that this causes anything 
other than a little inefficiency here and there.  And, the long term 
solution in any case will be to use a generic function to compare the 
rules.  In this way, one could do something like:

     [when("isinstance(r1,ClassTerm) and r1.klass is not NoneType"
           " and isinstance(r2,IsNotTest) and r2.value is None")]
     def implies(r1,r2):
         return True

to extend the dispatch system with knowledge about the relationship between 
various kinds of tests.  (Which just gives another example of how 
extensible a system based on generic functions can be!)


Back to Reality
---------------

So how does all this relate back to PyProtocols, PEAK, persistence, MDA, 
and all that?  Well, for PyProtocols the relationship is simply this: 
sometimes you don't really need/want an adapter object, especially when the 
interface has only one method.  It's sometimes surprising to realize how 
many useful interfaces are just that: one method.  In fact, it's wasteful 
to create an adapter for such an interface, as you only throw it away after 
you call that one method.  So, generic functions are an obvious 
replacement.  If you only do type tests on one argument, generic function 
dispatch is basically the same speed as regular adaptation, as it follows 
the exact same lookup algorithm.  (And note, by the way, that the prototype 
dispatcher can dispatch on protocol tests as easily as it can on type tests.)
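
As a sketch of the kind of one-method interface I mean (the names here are 
hypothetical): instead of defining an 'ISized' interface, writing an 
adapter class per concrete type, and throwing each adapter away right 
after calling its one method, you'd just register one method per type on a 
single generic function:

     import os

     [when("isinstance(ob,basestring)")]
     def size_of(ob):
         return len(ob)

     [when("isinstance(ob,file)")]
     def size_of(ob):
         return os.fstat(ob.fileno()).st_size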

Most of PEAK's core (binding/config/naming) APIs aren't going to be 
affected, though.  If they really needed this kind of complicated 
dispatching, odds are good they wouldn't already be as stable as they 
are.  However, in the less stable areas of the system (like storage), one 
of the major reasons *why* they're not stable is because of how hard they 
are to write *without* something like predicate dispatch and generic functions.

A while back in this article, I wrote that there were two remaining AOP use 
cases in my mind for PEAK:

  * Allow variations of an application to change domain object classes to 
use different collaborator classes

  * Allow mixing domain-specific behavior into classes that were 
automatically generated from UML or other modelling tools

Both of these can be accomplished with generic functions.  The first, by 
having features use a generic function to look up their target type, and 
the second, by having the code generation tools insert "abstract" generic 
functions into the generated code.  That is, stubs without any 
implementations.  One then simply writes the implementations in a separate 
module.
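
Here's a sketch of that second case, with all of the names invented for 
illustration:

     # generated_model.py -- produced by the UML code generator
     def overdue_fee(invoice):
         """Abstract generic function: a stub with no implementations"""

     # invoice_rules.py -- hand-written, domain-specific behavior that
     # registers an implementation against the generated stub
     [when("invoice.daysOverdue() > 30")]
     def overdue_fee(invoice):
         return invoice.total * 0.05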

Poof!  We're done.  No more AOP, no module inheritance, no bytecode 
hacks.  All gone, but the potential for a model-driven architecture and for 
modular separation of concerns remain intact.

Okay, so what about the persistence stuff, fact bases, and all that?  I 
said we needed:

* A model for facts
* A query language that can be mapped to either SQL or in-memory Python objects
* A data management API that's query-focused and easily extended
* A mechanism for defining mappings between an abstract fact model, and one 
or more concrete storage models

I haven't narrowed these down 100%, but here's what I think so far.  The 
query language is probably going to end up being Python, specifically list 
or generator comprehensions, e.g.:

     [(invoice,invoice.customer) for invoice in Invoices if invoice.status=="pastdue"]

However, these will be specified as *strings*.  If the objects being 
queried are in memory, the string can just be eval()'d, but if they are in 
a database, the query can be converted to SQL, by a mechanism similar to 
the one I'm writing now for doing predicate dispatch.  Of course, the 
decision to eval() or to translate (and translate to what SQL dialect, 
etc.) will be made by generic functions.
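
To make that slightly more concrete, here's a rough sketch of how the 
decision itself might be a generic function (every name below is an 
assumption, not an existing API):

     query = '[(i,i.customer) for i in Invoices if i.status=="pastdue"]'

     [when("isinstance(db,InMemoryFactBase)")]
     def run_query(db, query):
         # the objects are already in memory: just evaluate the comprehension
         return eval(query, db.namespace())

     [when("isinstance(db,SQLConnection)")]
     def run_query(db, query):
         # translate the comprehension to the back-end's SQL dialect first
         return db.execute(translate_to_sql(query, dialect=db.dialect))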

The data management API and mapping mechanisms will probably end up being 
mostly generic functions, possibly accessed via methods on an "editing 
context" object (as I've mentioned in previous mailing list posts), but the 
back-end implementation code that you write will look more like the 
'save_to' function examples I gave earlier in this article, rather than 
looking anything like today's DM objects.  Indeed, this will ultimately be 
the death of DM's, and I believe everyone will be happy to see them 
go.  Overall, the storage API is probably going to end up looking somewhat 
like that of Hibernate, a popular Java object-relational mapper.  (The main 
difference being that instead of their specialized object query language, 
we'll just use Python.)
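
A very rough sketch of how those pieces might fit together (this is purely 
my guess at the eventual shape of the API; none of it exists yet):

     ec = EditingContext(storage_for("some.database.url"))

     invoice = ec.find(Invoice, number="1234")   # query-focused lookup
     invoice.markPaid()
     ec.commit()   # internally dispatches to save_to(invoice, backend),
                   # i.e. the generic function shown earlier, not a DM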

As for the fact model itself, I haven't yet absorbed the full effect of 
this new technology, but I feel fairly good about the ability to 
potentially express model constraints such that they can be tested 
(immediately, or at commit time, or at a later time) using predicate-driven 
rules.
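
For instance, one could imagine a constraint expressed as a predicate rule 
along these lines (purely speculative, with hypothetical names):

     [when("isinstance(ob,Invoice) and ob.total < 0")]
     def check_constraints(ob):
         raise ConstraintViolation("Invoice total may not be negative")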

Finally, note that the ability to dispatch on both the type of an object 
*and* the type of database being persisted to, means that we can even 
escape the tyranny of "persistence styles", by selecting different 
implementations.  Using generic functions, the "storage framework" becomes 
so loosely coupled that its dominant structuring no longer influences what 
persistence style it's "most suited" for.  In fact, it doesn't *have* a 
dominant structuring, so there's nothing to get in the way.

Of course, all of this is still extremely "hand-wavy", but the use of 
predicate dispatch and generic functions should completely eliminate major 
areas of concern that I previously had -- especially with respect to how to 
register things in the configuration system such that they end up with the 
right precedence ordering, given the various types they apply to and what 
kind of database they're for.  Indeed, just the fact that we won't need 
to define any new configuration *syntaxes* (such as .ini file section 
types) makes the generic function route pretty appealing.

Indeed, speaking of configuration, there are other subsystems in PEAK that 
have been crying for more sophisticated configuration, for which generic 
functions fit the bill quite well.  For example, 'peak.web' would like 
to have "views" and other kinds of adaptation defined "in context" of 
particular parts of a site, or possibly depending on various conditions 
that might apply to the object at a point in time, or what kind of web 
request it is (e.g. browser vs. XML-RPC).
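
A sketch of what such "in context" rules might look like, with 
hypothetical request classes and templates:

     [when("isinstance(request,BrowserRequest) and user.isAdmin()")]
     def render_view(ob, request, user):
         return admin_template.render(ob)

     [when("isinstance(request,XMLRPCRequest)")]
     def render_view(ob, request, user):
         return ob.asDict()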

'peak.security' also has a system that's *sort of* like predicate 
dispatching, in that it evaluates rules to determine whether a user has 
permission to do something.  But its syntax isn't exactly easy-to-use, and 
there's lots of extra junk involved in the mechanisms.  I don't know if 
I'll actually go back and "fix" it, but I know if I were writing it today, 
I'd want to do it with generic functions rather than building up all the 
special-purpose framework code that's in there now for rule evaluation.

(By the way, when I speak here of "configuration", I'm not really talking 
about file formats like .ini and ZConfig and all that, so much as I am the 
assembling of the components needed to implement a specific 
application.  Such assembly is done by its developer, or by someone who is 
extending the application.  Today, it's often done in config files, but in 
the future more of it may be done in code, with the configuration files 
mainly saying what modules to import.  But don't confuse this with 
configuration *settings* used by an application -- these will stay in 
configuration files where they belong.)


Conclusion
----------

Generic functions can replace AOP, in code that uses them.  This makes 
PEAK's AOP facilities moot for their original intended uses.  Generic 
functions will also greatly simplify the design and implementation of 
future storage and query features, while opening up many new possibilities 
for extensibility, business rules development, and similar 
capabilities.  They are also a natural complement/extension to protocol 
adaptation.

(For example, if you use a generic function as an adapter, you can have 
flexible dynamic adaptation, with no ambiguity as far as PyProtocols is 
concerned.  Indeed, it's possible that you could replace 'adapt()' itself 
with a generic function, meaning that you could even register adaptations 
to protocols that aren't PyProtocols protocols.)

Therefore, the plan is to finish implementing generic functions, phase out 
the AOP subsystem (and perhaps also separate peak.metamodels into a 
different package and distribution), and then begin developing the new 
storage and query APIs.

Currently, my intent is to put generic functions in a 'protocols.generic' 
subpackage, which you'll use in a way that looks something like:

     from protocols.generic import when, before, after

     [when("something()")]
     def do_it(...):
         ...

However, I'm at least somewhat open to the possibilities of:

* Making a separate top-level package for generics (e.g. 'from generics 
import when') instead of a subpackage of 'protocols'

* Making a separate 'PyGenerics' package (that includes PyProtocols)

* Doing something else altogether that I haven't thought of, but which 
someone else suggests.  :)

So, here's your chance to change history.  (Or at least a HISTORY.txt file 
or two!)  Tell me, what am I missing?  What am I doing wrong?  Ask me 
questions, tell me this is crazy, or whatever you want.  Your feedback 
(including questions) about this plan is respectfully requested, and will 
be greatly appreciated.



