[PEAK] PROPOSAL: Remove ZODB/Zope X3 dependencies/"compatibility"

Sat May 29 17:36:40 EDT 2004

I've been reviewing the latest version of Zope X3, and it appears to me 
that it's a waste of time to keep chasing compatibility with it in peak.web 
and peak.model.  So, I'd like to propose some changes to the affected packages:

ZODB 4 and peak.model
---------------------

PEAK's persistence machinery was built on an early version of ZODB 4.  But 
Zope X3 has dropped use of ZODB 4 and gone to an enhanced ZODB 3.3 
instead.  So, we're completely incompatible at this point, and if we're 
going to make a change, we might as well change to something that suits 
PEAK better.

My plan is basically as follows: create a dictionary-like mapping "record" 
object that can be given functions that load individual attributes, as well 
as a catch-all "load everything else" function, and that provides an event 
source for changes made to loaded values.  Then, model.Element will be 
changed to no longer subclass the ZODB "persistent" base class, and the 
various DataManager base classes will be changed to set objects' '__dict__' 
attribute to a "record" object, and to subscribe to the event source.  (And 
before anybody asks: no, this will not make data managers asynchronous, 
although it would in principle become *possible* for you to make them so, 
with sufficient effort.)
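
As a rough illustration of the "record" idea, here is a minimal sketch in
current Python.  The names 'Record', 'loaders', 'load_rest' and 'subscribe'
are invented for this example and are not an existing PEAK API, and the
sketch covers only the mapping and its change events, not the step of
installing it as an object's '__dict__':

```python
class Record(dict):
    """Dictionary-like record that loads missing keys on demand.

    Hypothetical sketch: every name here is illustrative only.
    """

    def __init__(self, loaders=None, load_rest=None):
        dict.__init__(self)
        self._loaders = dict(loaders or {})  # key -> zero-argument function
        self._load_rest = load_rest          # catch-all: returns a dict
        self._subscribers = []               # called as callback(record, key, value)

    def subscribe(self, callback):
        """Register a callback to be told about every explicit change."""
        self._subscribers.append(callback)

    def __missing__(self, key):
        # dict.__getitem__ calls this only when 'key' is not loaded yet
        if key in self._loaders:
            value = self._loaders.pop(key)()
            dict.__setitem__(self, key, value)  # loading is not a change
            return value
        if self._load_rest is not None:
            load_rest, self._load_rest = self._load_rest, None
            for name, value in load_rest().items():
                # never clobber loaded values or pending per-key loaders
                if name not in self and name not in self._loaders:
                    dict.__setitem__(self, name, value)
            if key in self:
                return dict.__getitem__(self, key)
        raise KeyError(key)

    def __setitem__(self, key, value):
        dict.__setitem__(self, key, value)
        self._loaders.pop(key, None)  # an explicit write supersedes a loader
        for callback in list(self._subscribers):
            callback(self, key, value)
```

In this sketch a data manager would subscribe a callback that marks the
owning object as changed; values filled in by a loader deliberately do not
fire the event, since loading isn't a modification.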

I don't know if these changes can be accomplished with full 
backwards-compatibility.  I'd like to leave the current DM extension API 
intact if possible, but doing so will necessarily involve performance 
compromises.  For example, today the normal return from a _load() method is 
a dictionary containing values or LazyLoader instances.  To preserve that, 
I'll need code that scans the return value for LazyLoaders and handles them 
separately, which will impose some overhead that wasn't there 
before.  (Unless, of course, everybody tells me they aren't using 
LazyLoaders and I don't need to preserve that functionality.)
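
To give an idea of what that scan might look like, here is a small sketch.
The 'LazyLoader' class below is only a stand-in with a simplified,
zero-argument 'load()' method (an assumption made for this example, not
PEAK's actual signature), and 'split_state' is a hypothetical helper name:

```python
class LazyLoader(object):
    """Stand-in for PEAK's LazyLoader (simplified, hypothetical API)."""

    def __init__(self, func):
        self.func = func

    def load(self):
        return self.func()


def split_state(state):
    """Split a _load()-style dict into eager values and deferred loaders."""
    values, loaders = {}, {}
    for name, value in state.items():
        if isinstance(value, LazyLoader):
            loaders[name] = value.load   # call later, only if needed
        else:
            values[name] = value
    return values, loaders
```

The overhead in question is the extra pass over every returned dictionary,
which is paid even when no LazyLoaders are present.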

One particularly dicey area is '_p_jar' and '_p_oid'.  Technically these 
are not part of the public interface of peak.model objects, but some of you 
are probably using them (especially since I used them in peak.ddt).  As it 
happens, any attribute or method beginning with '_p_' would go away under 
this proposal.  Instead, one would have to use some extrinsic information 
to track an object's oid or DM, such as via a closure created when the 
object's ghost is created.
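
One possible shape for the closure approach is sketched below.  This is
only an illustration: 'make_ghost', 'storage' and 'load_state' are made-up
names, and a plain dictionary stands in for whatever the DM really reads
from:

```python
def make_ghost(storage, oid, factory):
    """Create an empty object plus a loader that remembers its oid.

    'storage' maps oid -> dict of attribute values (a toy stand-in for a
    data manager's backing store).
    """
    obj = factory()

    def load_state():
        # 'oid' and 'storage' live in this closure, not on the object,
        # so the object itself carries no '_p_' attributes
        obj.__dict__.update(storage[oid])
        return obj

    return obj, load_state
```

Whoever creates the ghost would hold on to 'load_state' (for instance, as
the catch-all load function of the object's record), so the oid remains
reachable without being stored on the object.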

This isn't really as bad as it sounds: I believe I can make it so that 
code written to the old API won't notice any changes, as long as you can 
identify one or more attributes that the old '_p_oid' was based 
on.  However, let this be a warning now: if you're using '_p_' attributes 
in your code for any reason, you should probably get rid of them ASAP, as 
they will definitely not be backward compatible.

The ultimate goal will be to support a new, '_p_'-less and DM-free API, 
using "editing context" objects with an API that's something like:

     # Find by primary key
     something = ec.find(SomeType, some_field=27)

     # Find by alternate key
     something = ec.find(SomeType, foo=99)

     # Query for multiple items
     for item in ec.find(SomeType, some_non_unique_field="baz"):
         pass

     # Query for all items
     for item in ec.find(SomeType):
         pass

     # Add an object
     newObj = SomeType(some_field=42, other_field="test")
     ec.add(newObj)

     # Delete object(s)
     ec.delete(SomeType, some_field=27)

instead of the current:

     # Find by unique key
     something = self.someDM[27]

     # Find by alternate key
     something = self.anotherDM[99]

     # Query for multiple items
     for item in self.someQueryDM["baz"]:
         pass

     # Query for all items
     # XXX write your own special method

     # Add an object
     newObj = self.someDM.newItem(SomeType)
     newObj.some_field = 42
     newObj.other_field="test"

     # Delete object(s)
     # XXX write your own special method

My experience with DMs so far is that it's really tedious to keep track of 
them, even in trivial applications like the 'bulletins' example.  So, this 
new format eliminates the need for FacadeDMs when searching on alternate 
keys, *and* it eliminates the need for QueryDMs to manage collections.  In 
other words, the single 'find' method should be usable for finding both 
individual items (whenever all the fields for a unique key are supplied) 
and doing multi-item queries (when no unique keys are present).  This API 
should also be much more amenable to simple object-relational mappings, and 
should be able to integrate with peak.query in a straightforward fashion.

For this API to be implementable, editing contexts will need to know what 
the unique keys are, and to have a rough idea of collection sizes 
(specifically, whether a given query or collection can or should be 
cached).  It's not clear to me at present whether this info will be part of 
the peak.model objects, or of the underlying storage mechanism.  It seems to 
me, though, that it needs to be part of the program's static model, because 
when you issue a 'find()' call it should be clear whether you expect a 
single item or multiple ones.  (And asking for a single item that doesn't 
exist should result in an error.)
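
The single-versus-multiple rule can be illustrated with a toy, in-memory
editing context.  Everything below is hypothetical: the 'unique_keys'
constructor argument, the 'NotFound' exception and the list-based storage
are assumptions made for this sketch, not a settled design:

```python
_MISSING = object()


class NotFound(LookupError):
    """Raised when a lookup by unique key matches nothing."""


class SomeType(object):
    def __init__(self, **fields):
        self.__dict__.update(fields)


class EditingContext(object):
    """Toy editing context; 'unique_keys' maps type -> list of field tuples."""

    def __init__(self, unique_keys):
        self.unique_keys = unique_keys
        self.objects = []

    def add(self, obj):
        self.objects.append(obj)

    def _matches(self, type_, criteria):
        return [
            ob for ob in self.objects
            if isinstance(ob, type_) and all(
                getattr(ob, name, _MISSING) == value
                for name, value in criteria.items())
        ]

    def find(self, type_, **criteria):
        matches = self._matches(type_, criteria)
        for key in self.unique_keys.get(type_, ()):
            if all(field in criteria for field in key):
                # all fields of a unique key supplied: single-item lookup
                if not matches:
                    raise NotFound(criteria)
                return matches[0]
        return iter(matches)   # otherwise: a multi-item query

    def delete(self, type_, **criteria):
        doomed = set(id(ob) for ob in self._matches(type_, criteria))
        self.objects = [ob for ob in self.objects if id(ob) not in doomed]
```

With 'EditingContext({SomeType: [("some_field",), ("foo",)]})', a call
like 'ec.find(SomeType, some_field=27)' returns one object or raises,
while 'ec.find(SomeType, other_field="baz")' returns an iterator, because
'other_field' is not declared as a unique key.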

Anyway, this API should fix a lot of quirks and warts in the current 
DM-based API, such as the inability to check whether an object "exists" at 
retrieval time.  I'll probably write other posts later to flesh out this 
design further, and the path along which the existing code will be migrated.

Zope X3 and peak.web
--------------------

PEAK's web application package is based on Zope X3's 'zope.publisher' 
package.  My original intent in doing this was to:

* Avoid having to maintain HTTP request/response classes
* Take advantage of Zope's 'publish()' algorithm
* Take advantage of Zope's I18N support for determining browser-preferred 
languages and character sets (and possibly other locale preferences later)

The downside to all this is that one must currently install a significant 
portion of Zope X3 (zope.testing, zope.interface, zope.component, 
zope.proxy, zope.security, zope.i18n, and zope.publisher, to be 
precise).  PEAK doesn't really need most of this stuff.  Indeed, 
'zope.i18n.locales' and 'zope.publisher' are all we're really trying to use 
at present.

And, even the packages we do use have stuff we don't necessarily need.  For 
example, much of what HTTPRequest and HTTPResponse do is there (IMO) to 
support the traditional Zope 2 APIs and marshalling functions, which are 
just cruft where PEAK is concerned.

So, it's tempting to consider replacing them with simpler objects.  The 
"lingua franca" used by PEAK for HTTP requests and responses is just an 
environment dictionary, and a set of in/out/error streams.  Passing them 
directly to published objects would give them total control over how their 
inputs were parsed and their outputs formatted.

Another advantage of this approach is that it would allow a functional 
interface for HTTP: pass in an environment and input stream, and receive 
back a set of headers and an iterator over the output.  The disadvantage is 
that it couples knowledge of HTTP to the function.  That is, the function 
is defined in terms of HTTP and can't be migrated to something else.  I 
don't see this as a huge concern, however.

What else do we lose by not having request/response objects?  Not much, 
that I can tell.  Pretty much every piece of functionality that they 
provide can be replaced with function calls: for example, a 
'get_cookie(environ)' function.  Such functions can even cache their 
results inside the 'environ' mapping, so that the work is only done 
once.  In essence, we're centralizing all of the system's mutability in an 
environment mapping.  Actually, we could make 'environ' immutable and have 
the functions return a new environment, but that seems like overkill.
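
A minimal sketch of such a caching function follows, written for current
Python using the standard library's 'http.cookies' module.  The extra
'name' and 'default' parameters and the "peak.web.cookies" cache key are
assumptions made for this example:

```python
from http.cookies import SimpleCookie

_CACHE_KEY = "peak.web.cookies"   # hypothetical key name for the cache


def get_cookie(environ, name, default=None):
    """Return one cookie's value, parsing HTTP_COOKIE at most once."""
    cookies = environ.get(_CACHE_KEY)
    if cookies is None:
        parsed = SimpleCookie()
        parsed.load(environ.get("HTTP_COOKIE", ""))
        cookies = dict((key, morsel.value) for key, morsel in parsed.items())
        environ[_CACHE_KEY] = cookies   # cache inside the environ mapping
    return cookies.get(name, default)
```

Note that the cache makes later changes to 'HTTP_COOKIE' invisible within
the same 'environ', which is the kind of mutability being centralized here.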

So, throughout the existing peak.web interfaces, the 'interaction' and 
'ctx' parameters could be replaced by an 'environ' parameter.  Most of the 
functionality of the current 'web.TraversalContext' would move to functions 
that operate on an 'environ', possibly returning a new 'environ'.

In this way, we could replace linear traversal with recursive traversal, 
making all renderings capable of functional composition.  Or, to put that 
in English, it means you can use a "pipes and filters" pattern to assemble 
components.  (Ulrich will be happy because this means he'll be able to do 
his XSL transforms over arbitrary renderables, without having to create a 
dummy Response object, for example.)

The end result is that we'd have a uniform interface at all levels of 
peak.web: a single-method interface, perhaps something like:

     def handle_http(environ, input_stream, error_stream):
         return status_code, header_sequence, output_iterable

You could then create an arbitrary number of processing stages over 
this.  For example, PEAK's first processing stage could simply call the 
next stage, wrapped in a 'try' block that does transaction and error 
handling.  Each stage can delegate to a subsequent stage following a 
traversal operation, if there is anything to traverse.  Interestingly, this 
approach completely eliminates the need to have complex logic like Zope's 
"traversal stack" system, as each object being traversed has total control 
over the subsequent traversal.
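
To show how stages might compose, here is a rough sketch of two of them:
an outer stage that handles errors, and a traversal stage that consumes one
path segment and delegates.  The names 'with_error_handling' and
'traverse', the string status values, and the "Name: value" header strings
are all assumptions made for illustration; transaction handling is omitted:

```python
def with_error_handling(next_stage):
    """Wrap a stage so that exceptions become a 500 response."""
    def handle_http(environ, input_stream, error_stream):
        try:
            return next_stage(environ, input_stream, error_stream)
        except Exception as exc:
            error_stream.write("%s\n" % exc)
            return ("500 Internal Server Error",
                    ["Content-Type: text/plain"],
                    ["Internal error\n"])
    return handle_http


def traverse(children, default):
    """Stage that dispatches on the next segment of PATH_INFO.

    'children' maps a segment name to another stage; 'default' handles
    the case where nothing is left to traverse.
    """
    def handle_http(environ, input_stream, error_stream):
        path = environ.get("PATH_INFO", "").lstrip("/")
        if not path:
            return default(environ, input_stream, error_stream)
        name, _, rest = path.partition("/")
        if name not in children:
            return ("404 Not Found",
                    ["Content-Type: text/plain"],
                    ["Not found\n"])
        # pass a new environ with the consumed segment removed
        sub_environ = dict(environ, PATH_INFO=("/" + rest) if rest else "")
        return children[name](sub_environ, input_stream, error_stream)
    return handle_http
```

Because every stage has the same signature, 'traverse' stages can be nested
to any depth, and each one decides for itself how the rest of the path is
interpreted.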

This interface is easy to adapt to the existing IRerunnableCGI interface, too:

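     def runCGI(self, input, output, errors, env, argv=()):
         status, headers, iterable = self.subject.handle_http(env, input, errors)
         # CGI response: status line, header lines, a blank line, then the body
         output.write("Status: %s\n" % status)
         for header in headers:
             output.write("%s\n" % header)
         output.write("\n")
         for chunk in iterable:
             output.write(chunk)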

So, it can integrate nicely with our current CGI, FastCGI, and HTTP wrappers.

Thus, I believe we can end our dependency on Zope X3 for publishing 
support, but the changes to peak.web will be substantial.  In addition to 
the changes I've outlined above, we would also be dropping the use of 
adaptation to convert components to their decorator/view objects.  We'll 
still have decorators, but they'll be registered via the configuration 
system instead, allowing "placeful" lookups, and getting rid of the need to 
have the page, error, and traversal protocols for registration 
purposes.  Indeed, it should become possible to define most decorators by 
simple configuration, without the coding that is currently required.

Wrap-up
-------

So, those are the proposals.  They involve quite a bit of work to 
implement, so I don't know how long they'll take me, or even when I'll be 
getting started.  In the meantime, I'd like to hear your feedback, 
comments, questions, objections, etc.