[PEAK] DM refactoring and asynchrony (was Re: PEAK and Twisted, revisited)

Stephen Waterbury golux at comcast.net
Tue Apr 20 02:57:19 EDT 2004


Phillip,

Thanks for the out-loud thinking -- this message gave me more useful
information about the direction of PEAK than any I've read before!
Comments inserted below.

Phillip J. Eby wrote:
> At 09:46 AM 4/16/04 +0200, Ulrich Eck wrote:
>> the reason why i didn't spend too much time on it is,
>> that pje plans to completely refactor the DataManager API to integrate
>> with peak.events and peak.query. a concrete timetable and design for
>> this refactoring is not yet available i think.
> 
> The design is getting a bit more concrete, but not the timetable.  :(  
> It may not so much be a refactoring of the current API, as addition of 
> another completely different API.  But there is some theoretical work to 
> sort out first.

The addition of a completely different API makes sense to me.

In fact that's a fundamental aspect of my application, which
deals with both a "standard" set of domain schemas -- having
mostly to do with facts and assertions -- and a set of
domain objects that have APIs and carry state around.
Just different views of the same stuff, though, so the
effects of querying, modifying, and interacting with each
must be consistent and sometimes "concurrent".

The OO/DM view vs. the FO/EC view seems to me a classical
dualism -- each of them valid/optimal for different use-cases,
operating on different views of the same underlying shared
data/model/state.  I'm guessing you agree -- am I right?

> In a fact orientation approach, one deals with "elementary facts" that 
> relate 1 or more objects in some way.  For example, "Order #5573 totals 
> $127" is a "binary fact" relating the order object referenced by "#5573" 
> and the dollar amount object referenced by "$127".  Most facts are 
> binary (i.e. involve two objects), but unaries and ternaries are also 
> possible.  Quaternaries and higher-order facts are uncommon, when 
> dealing with elementary (non-decomposable) facts.
> 
> (By the way, 'x[y]==z' can be considered a ternary fact, which may later 
> play into the implementation of mapping types in peak.model.)

Your description of "FO" (of which I confess my ignorance :)
both here and further on reminds me strongly of the formalism
of Description Logics, which provides the rigorous foundation
for the W3C Semantic Web (SW) experiments, RDFS, OWL, triples,
etc.  This connection is of great interest to me, and I
think some of the recent work in those areas provides some
potential ingredients for defining and managing FO domains.
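To make the "elementary facts" idea concrete, here's a toy sketch in Python -- all names here are mine for illustration, nothing from PEAK's actual API:

```python
# Toy model of "elementary facts": each fact is a predicate name
# plus 1..n object references (arity = number of references).
facts = set()

def assert_fact(predicate, *args):
    """Record an elementary fact such as ('totals', '#5573', '$127')."""
    facts.add((predicate,) + args)

def holds(predicate, *args):
    """True if the given fact has been asserted."""
    return (predicate,) + args in facts

# A binary fact: "Order #5573 totals $127"
assert_fact("totals", "#5573", "$127")

# A unary fact: "Order #5573 is shipped"
assert_fact("shipped", "#5573")

# A ternary fact, in the spirit of x[y] == z for mapping types:
# "in mapping m, key 'color' maps to 'red'"
assert_fact("maps_to", "m", "color", "red")
```

(The predicate-plus-tuple shape is just the simplest encoding; a real fact type would presumably also carry role names and constraints.)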

I'd like to help work on synthesizing the OO and FO approaches
at the MOF level -- of course, that too will require a lot of
work to make it really useful, and it's best done while
prototyping more concrete implementations at the level I think
you're talking about -- something I am doing somewhat more
primitively in my app.

For my application, the practical benefit of making the connection
at the MOF level would be the leverage it would provide for
interacting consistently in both modes: bulk data/fact
import/export with inferencing, and API/services with
event/transactional semantics.

> The tremendous advantage of this approach is that it has no impedance 
> mismatch with databases, rule systems, or form-based UIs, and only a low 
> mismatch with direct-manipulation UIs.  By contrast, OO-paradigm storage 
> systems like peak.storage and ZODB have a high impedance mismatch with 
> rule systems and form-based UIs, a medium mismatch with relational DBs, 
> and a low mismatch with direct-manipulation UIs.

I agree.  In my application, I'm including a "Triples" table (a la
OWL and SW -- read "facts" ;) in which domain objects known to my
application participate in facts (triples) whose properties could
be defined either locally or remotely, and at all levels.  Triples
in which either the subject or object is a domain object could be
joined with the domain object tables to provide facts about those
objects.

I haven't done an implementation yet, but my idea is to
allow the Triples to span any number of metalevels,
as I think you are also proposing.  Of course, this means
that not all sets of facts would be "decidable" in a formal
sense, but my application wouldn't be trying to find *all*
"entailments" (inferences), anyway.  Also, I suppose it might
be possible to define meta-layers within which views would be
decidable, if desired ... ;)
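Here's a toy sketch of what "triples spanning metalevels" might look like -- predicates and classes are themselves subjects of other triples, as in OWL Full.  All the names are hypothetical:

```python
# Triples in which predicates and classes are themselves subjects
# of other triples, so facts span metalevels (a la OWL Full).
triples = [
    ("Order#5573", "instanceOf", "Order"),  # M0 instance -> M1 class
    ("Order", "instanceOf", "Class"),       # M1 class -> M2 metaclass
    ("totals", "instanceOf", "Property"),   # the predicate is an object too
    ("Order#5573", "totals", "$127"),       # an ordinary domain fact
]

def about(subject):
    """All facts whose subject is the given name, at any metalevel."""
    return [(s, p, o) for (s, p, o) in triples if s == subject]
```

Nothing in the store distinguishes the metalevels; that's exactly what makes the full set undecidable in general, and also what makes it flexible.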

> This unifying characteristic makes the FO viewpoint a perfect 
> "intermediate representation" or "lingua franca" between different 
> levels of a system.  In fact, because of the ease of expressing complex 
> constraints in FO systems, and because FO models and constraints can be 
> specified or rendered as user-friendly English (or other language) 
> expressions, I believe it is in fact preferable to such modelling 
> languages as UML for gathering end-user requirements and defining the 
> domain model for an application.

I completely agree.  UML is designed and best suited for modeling
solution spaces, IMO.  I think it is weak for domain modeling.
BTW, I think your "'lingua franca' between different levels of a
system" is the same concept as the triples/facts across any number
of meta-levels to which I alluded above.  Right?

> The hard part of the design, that I'm chewing on in my spare time, is 
> how to create a Pythonic FO notation, that won't end up looking like 
> Prolog or Lisp! ...

You may have already seen this, but I like "N3" notation as a
clean lexical format for triples (sort of the triples "rst". ;)
( See http://www.w3.org/DesignIssues/Notation3.html )
There are some Python N3 parsers, including Dan Connolly's cwm.
N3 might not be the basis for the ideal Pythonic FO notation,
but it might provide some ideas.  A common Python API for the
extent of the shared semantics between FO and N3/OWL would be
very desirable from my point of view.
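For a flavor of how lightweight the N3 surface syntax is, here's a toy parser that handles only the simplest "subject predicate object ." statements.  Real N3 (prefixes, quoted literals, formulae) needs a real parser like cwm or rdflib's; this is just to show the shape:

```python
def parse_simple_n3(text):
    """Parse only the simplest N3 statements, one 'subj pred obj .'
    per line.  Real N3 has prefixes, literals with spaces, formulae,
    etc. -- use a real parser (e.g. cwm, rdflib) for those."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        parts = line.rstrip(" .").split()
        if len(parts) == 3:
            triples.append(tuple(parts))
    return triples

doc = """
# a comment
:order5573 :totals :usd127 .
:order5573 a :Order .
"""
```

Even this crude subset shows why N3 appeals as a triples "rst": the statement form is already close to how you'd say the fact aloud.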

[BTW, just because I seem to be harping on W3C stuff, please don't
infer that I'm an XML freak; to me XML is way over-hyped and
abused, but it *is* a fact of life ... *sigh*.  And I am *so*
happy that you developed an XMI import capability for PEAK, and
that you did it as an MOF model!  I definitely hope to use that
in my app.  From your remarks, I'm sure that dealing with the
vague XMI specs was not too much fun.  That may or may not be
related to it being XML, but knowing XML, I suspect it was. ;) ]

> ...  A little further thought on this for such things as 
> business rules, DB access, and so on, quickly reveals that the actual 
> use cases for an "object oriented" model for business objects are 
> becoming rather scarce.

Rules seem to be an area of current work in the SW world.
They are beyond the scope of OWL as it exists today,
but there is a proposed OWL extension to support them:

http://www.cs.man.ac.uk/~horrocks/DAML/Rules

(written by two of the authors of OWL)
Then there's the RuleML stuff, which I haven't looked at yet.

> Indeed, the more I think about it the more I realize that trying to 
> implement a lot of typical business functions by direct mapping to the 
> OO paradigm actually *adds* complexity and effort.  Business in the real 
> world actually does consist mostly of data and rules.  Objects in the 
> real world are not at all encapsulated and they certainly do not enforce 
> their own business rules!

That is the case in my domain (manufacturing, engineering).
Within a specific tool (e.g., mechanical CAD), there may be
all kinds of nice controls and constraints, but once the data is
released into the wild, all bets are off -- especially from a
Systems Engineering point of view, which is the most important
one for my application.

> So, the fundamental paradigm for peak.query applications will be to have 
> an EditingContext against which queries are made, and that facts are 
> added to or removed from.  EditingContexts will also publish validation 
> facts.  That is, one source of facts from an editing context will be 
> facts about what fact changes you've made are violating business rules 
> or other validation.  But it's likely that these "constraint metafacts" 
> (to coin a phrase) ...

Works for me. :)

> will not be anything "special" in the system; it's 
> just that business rules will be able to be sources of facts.  Indeed, 
> anything that you would compute in an object's methods could be moved to 
> a business rule implementing a fact type.  For that matter, it becomes 
> silly to even talk about business rules: there are really only fact 
> types, and a fact type can be implemented by any computation you wish.

> This is both more and less granular than a DM.

IOW, they may be fully "dual" (inside-out views of each other).

> ... for 
> most applications you'll define relatively trivial mappings from fact 
> types to 2 or more columns from a relational DB, using relational 
> algebra operators similar to the prototypes that are already in 
> peak.query today.

For me, that resonates pretty strongly with my "triples" table
concept.  BTW, I don't mean just the 3 columns (subject, predicate,
and object); in mine, I add admin data -- owner, creator, modifier,
timestamps, etc.  Certainly there could be more.
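A rough sketch of such a table, using sqlite via Python just for illustration -- the column names are my guess at the shape, not my actual schema:

```python
import sqlite3

# A "triples plus admin data" table: the 3 fact columns, plus
# ownership and audit columns.  Illustrative only.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE triples (
        subject   TEXT NOT NULL,
        predicate TEXT NOT NULL,
        object    TEXT NOT NULL,
        owner     TEXT,
        creator   TEXT,
        modifier  TEXT,
        created   TEXT,   -- ISO timestamp
        modified  TEXT
    )
""")
db.execute(
    "INSERT INTO triples VALUES (?,?,?,?,?,?,?,?)",
    ("Order#5573", "totals", "$127",
     "steve", "steve", "steve",
     "2004-04-20T02:57:19", "2004-04-20T02:57:19"),
)
```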

> But, unlike the peak.query of today, fact retrieval will be able to be 
> asynchronous.  That is, you'll be able to "subscribe" to a query, or 
> yield a task's execution until a new fact is asserted.  Even if your 
> application isn't doing event-driven I/O or using a reactor loop, you 
> could use these subscriptions to e.g. automatically raise an error when 
> a constraint is violated.  (In practice, the EditingContext will do this 
> when you ask that it commit your changes to its parent EditingContext, 
> if any.)  If you're writing a non-web GUI, you'd more likely subscribe 
> to such events in order to display status bar text or highlight an input 
> error.

Sure, but this stuff is just "framework sugar".  (Of course,
I like it. :)
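Even the sugar is easy to picture, though.  Here's a minimal publish/subscribe sketch of "subscribing" to a fact type -- again, all names are mine, not PEAK's:

```python
# Minimal pub/sub sketch: callers subscribe to a fact "predicate",
# and asserting a matching fact notifies them.  A GUI might
# subscribe to a constraint-violation fact type to highlight an
# input error; a commit path could raise instead.
subscribers = {}
violations = []

def subscribe(predicate, callback):
    """Register a callback for facts with the given predicate."""
    subscribers.setdefault(predicate, []).append(callback)

def assert_fact(predicate, *args):
    """Assert a fact, notifying any subscribers."""
    for cb in subscribers.get(predicate, []):
        cb(*args)

# Subscribe to "constraint metafacts" and just record them:
subscribe("violates", lambda fact, rule: violations.append((fact, rule)))
assert_fact("violates",
            ("totals", "#5573", "-$5"), "total_must_be_nonnegative")
```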

> Writing this email has actually helped me sort out quite a few of the 
> implementation details I had been wondering about, such as how business 
> rules and constraints should work.  Now I see that they do not require 
> anything "special", as they are just the same as any other kind of 
> fact.  They only appear to be "meta", but in truth they are constructed 
> by the same fundamental joining of facts as any other compound fact!  
> Very nice.

Well, they *are* meta in a DL sense, but it's only an important distinction
if you are concerned about the "decidability" of collections of facts,
which I think is only a problem if you're doing some aggressive
inferencing that's trying to compute some significant fraction of "all"
entailments of the set, as opposed to just trying to verify some rules.
It seems to be very important to the Semantic Web theorists:  hence two
of the three "species" of OWL (OWL Lite and OWL-DL) are designed to be
decidable -- e.g., in OWL-DL, classes aren't allowed to be objects.
We would need OWL Full.  I suspect there are lots of apps that don't need
massive entailment computing.  Mine, for example.

> ... mapping to an RDBMS could be 
> rather interesting.  The model as I've described it so far has no 
> built-in way to ensure that inserts and updates get done in a way that 
> honors RDBMS-level constraints.  Hm.  Actually, the mapping from fact 
> types to tables actually has to incorporate information about joins, so 
> the information to do it is there.

With facts represented as triples, the join would be on subject
or object id.  In OWL, that would be a URI, but I plan to generalize
that to include any suitable name.  (Some of them would only have
local significance.)
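That join is straightforward to sketch in SQL (via Python's sqlite here); table and column names are made up for the example:

```python
import sqlite3

# A domain-object table plus a triples table, joined on subject id
# to recover facts about the domain object.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders  (oid TEXT PRIMARY KEY, customer TEXT);
    CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT);
    INSERT INTO orders  VALUES ('Order#5573', 'Acme');
    INSERT INTO triples VALUES ('Order#5573', 'totals', '$127');
""")
rows = db.execute("""
    SELECT o.customer, t.predicate, t.object
    FROM orders o JOIN triples t ON t.subject = o.oid
""").fetchall()
```

A second join on `t.object = o.oid` would pick up facts in which the domain object plays the object role instead.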

> ... Hm.  This is sounding simpler than I thought it would be, at least in 
> principle.  But I still need to:
> 
> * Work out the metamodel for fact types, how to map from constraints to 
> derived fact types that signal the violation of those constraints, and a 
> Pythonic notation for defining a fact-oriented domain model.  (For 
> example, to define a constraint on a fact type or set of fact types, one 
> must have a way to reference those types in code.)

I would think at the MOF level facts and their types would be
structurally isomorphic to objects.

> * Define a mechanism for implementations of fact types, that allows for 
> the possibility of segmenting compound fact implementations (e.g. 
> database tables) as well as compounding elementary fact implementations, 
> in a way that doesn't lose the connectedness.  That is, if I issue a 
> compound query that has 3 facts answerable by a DB table, I don't want 
> to generate 3 queries against that table, just one.  And if the other 
> needed facts are in another table in the same DB, I want to generate a 
> query that joins them!  As it happens, a lot of the ground work for this 
> has already been done in peak.query.algebra.

Interesting ... I'll have to look at that!

> * Define a framework for defining concrete schema mappings, in a way 
> that allows reuse.  That is, I might want to define mappings from a fact 
> schema to two different database schemas, but share an implementation of 
> various business rules between them.

Absolutely.

> Oh yeah, and then I need to actually implement all that.  Does that 
> maybe answer any questions you might have about whether this will be 
> done soon?  :)

Um, yeah.  So I won't be using PEAK in my app for a while yet.  :(
But I *will* be playing with it in my sandbox!  :)

> This still isn't perfect, in that this API doesn't yet explicitly deal 
> with asynchronous results, or how to "listen" to assertions or 
> retractions, or how to deal with queries when there are assertions or 
> retractions that haven't been "flushed" to the physical storage scheme.  
> But it's getting there.  In fact, it's probably close enough to allow 
> prototyping some toy schema implementations (sans metamodels) to explore 
> how editing contexts and fact type implementations could/should work.

Cool!  I will observe (and try to play along) with great interest.

I hope you agree about the relevance of Description Logics, etc.
Another related Python app that might be of interest is GraphPath:

http://www.langdale.com.au/GraphPath

Cheers,
Steve


