[TransWarp] Mapping Elements to/from records in tables

Phillip J. Eby pje at telecommunity.com
Sun Jun 30 21:07:37 EDT 2002


Background: for large and complex data models, it may be best to use a lot 
of simple, automatically generated jars whose contents simply represent 
rows, or collections thereof.  But domain-level objects will often have 
data from more than one table, and of course they will need to be instances 
of domain level classes, rather than of a generic "row" class.  So, we need 
a "dispatch" layer which maps between domain and data-level concepts.

We could implement the dispatch layer using plain AbstractJars, defining 
load() and save() methods that retrieved and mapped and saved the necessary 
attributes.  However, this would force loading state from all of the tables 
that made up the object, even if they were never used.  It would also force 
writing back to all the tables, even if only one of them had changed data.

In principle, that doesn't sound like a bad thing, especially since Ty and 
I already have apps that effectively do that.  Where this *won't* work 
well, is multi-row queries, where the query doesn't join in all the tables 
that provide the object's state.  Either the ghost() method will have to 
reject these partial states (except perhaps to determine the a ghost's new 
class), or it will have to issue a single-row query to get the rest of the 
data.  Either way, the result won't be efficient for any kind of 
multi-object processing.

I believe it would be more modular and more efficient (in terms of DB 
retrievals) to consider the "state" of an Element as being mapped to some 
number of "records", where a record is a row from one of the tables that 
make up the Element's state.  If an Element's "state" is just a dictionary 
of names to records, then its state can be loaded without necessarily 
loading the state of any of its records, except to the extent needed to 
determine the Element's class.  Any state preloading done by multi-row 
queries will be on the records themselves, but of course the element can 
take advantage of it.

Now the issue is that either 1) the element classes need to be changed to 
access their state from records, or 2) there needs to be some kind of 
intervening dispatch object which the element can use to interact with its 
records.

The first approach is certainly possible, since one can simply use module 
inheritance and add extra information to the element classes' features to 
tell them what attribute of what record they should use, and any mapping 
information.

But the second approach may have some added benefits.  Notably, it might be 
possible to define the mapping in a UML model, or to use an automated 
mapping for data models that follow standardized mapping patterns.

Either way, we need to consider the types of structural features that 
elements can have (fields, references, collections, and sequences), and 
look at the options for implementing them in common storage mechanisms (SQL 
and LDAP).


Fields
------

Fields contain immutable values, whose contents, if any, are also 
immutable.  Strings, numbers, and tuples consisting only of strings, 
numbers, or tuples are obvious examples.  There are only two possible 
mappings for these:

* One-to-one -- the Element field maps to one field in an underlying 
record.  It may or may not have the same name, and there may be read and 
write expressions to convert the data type or value in some way.  So this 
mapping's knowledge needs to include: which element field, which record, 
which record field, and any transformation expressions.

* One-to-many -- the Element field corresponds to more than one 
field.  This might occur, for example, when the Element wants a date/time 
value, but the DB has separate fields for the date and the time.  This is a 
more complex mapping, in that the write transformation probably needs to be 
a function rather than just an expression.  But it's still pretty 
straightforward.


References
----------

References are a (possibly None) pointer to another Element.  At the 
Element level, they're actually implemented as a special form of 
Collection.  At the database level, they really only have one kind of mapping:

* Foreign key reference -- A field in a record contains a value that can be 
used as an object ID for another jar to retrieve the referenced 
object.  The mapping for this needs to know what jar the object should be 
gotten from, and an expression to compute a new foreign key value when the 
reference is changed.  Of course, it also needs to know the element's name 
for the reference feature and the underlying record and field.  (The 
foreign key expression will typically be just the referenced object's oid 
as given by the referenced Element jar, since for simplicity you'll 
normally want the Element-level jars to use the same primary oids as the 
tables they're based on.  Alternate key jars will suffice to get you to any 
domain-level keys that aren't database oids.)

It would probably be more correct to say that this mapping has two 
expressions: one to compute the oid for lookup in the referenced element 
jar, and one to compute the value placed in the record when the reference 
is changed.  By default, both of these expressions won't do any 
transformation.  I keep thinking that it's also possible you might have a 
1-1 query jar to implement something like this, but I can't for the life of 
me think of a use case.  So YAGNI.  Heck, having expressions at all for 
this mapping is probably YAGNI.  (YAGNI=You Ain't Gonna Need It.)


Collections
-----------

Collections are (possibly ordered) sequences of related Elements.  There 
are two ways to store them in a database:

* Pointer/ID list -- this is the way LDAP often does it.  You'll have a 
multi-valued attribute that contains DN's, which are effectively the OIDs 
of the targeted objects.  This mapping needs to know the name of the 
underlying field, and the jar to retrieve the items from.  This is where 
"lazy" state management of a dispatch layer really shines.  If we were 
loading state for one of these without a dispatch layer, we'd have to get 
ghosts for all of these references right away.  Which would be a big 
headache, because guess what, LDAP objects are polymorphic!  That means 
that to load a ghost, you need to know the class, and to know the class, 
you have to get the state, which means that just to load the state of an 
LDAP group object you'd have to load the entire freaking state of every 
member and admin of the group.  Anyway, the mapping here is still pretty 
straightforward.  To load, it creates a list of ghosts from the ID values, 
and to save, it creates a list of IDs from the ghosts.  (Note: the 
model.Collection feature implementation flags the Element as having changed 
when the collection changes, so we don't need to have a persistent list or 
anything like that.  The downside is that it'll flag a change for the other 
kind of collection mapping, too, even though it's not needed...  Probably 
all of these mappings should check when saving things that they're actually 
changing the underlying values!)

* Inverse Foreign Key relationship -- this is the way SQL does it.  There's 
no field on any of the records that comprise the Element; instead, you need 
to use a query jar that returns a ghosted PersistentList of the objects 
that point to the Element.  This mapping just needs to know the jar to use, 
and how to determine the OID.  Normally, you'll just use the OID of the 
Element itself.  This mapping is the only one that doesn't have to 
implement any kind of "save" capability, because it's the sole 
responsibility of the other side of the relationship.


Sequences
---------

A sequence is an ordered Collection.  It has the same mapping types as a 
Collection, but with some added twists.  For pointer/ID list mappings, 
there's nothing really special to do.  The order is the order; either the 
underlying DB supports ordering or not.  But for foreign key relationships, 
the query jar needs to have a sort key, *and* it needs to actually 
implement a "save()" operation on its PersistentList!  That save() 
operation will go through the list of referencing objects and ensure that 
their sort keys match the sequence in the list.  This is a highly 
application- and data model-specific operation.  But STASCTAP.  (Simple 
Things Are Simple, Complex Things Are Possible.)



Wrap-Up, and Implementation Notes
=================================

It sounds like an application will have a minimum of: one jar per table, 
plus one jar per root Element type (root types are ones with no DB-level 
types they inherit from), and one jar per inverse foreign key reference 
type.  Plus of course any actual *query* jars, and any DB-level jars that 
the foreign key and query jars rely on.

For a WarpCORE-based DB, there will be few Element jars, because WarpCORE 
has few root types (objects and events are the main ones).  And if the 
table jars and DB-level foreign and alternate key jars are instantiated 
from metadata, there's little programming to be done there.  Probably some 
of the Element-level alternate key and foreign key jars could be generated 
from metadata as well.  A set of mapping classes that implemented the 
roughly 5-7 mapping strategies described in this posting, would probably 
make the specification of the rest pretty straightforward coding.  Probably 
the design of a compact notation for expressing the parameters of the 
mappings will be the most interesting part of developing them.

Let's look again at the issue of dispatch layer code vs. dispatch 
objects.  It still seems like both are practical approaches to take; none 
of the mapping styles inherently rule out being used by either.  However, 
many of the mappings are computationally intensive, implying that values 
requiring transformation should be cached.  But cached values can become 
out-of-date, and if the lower level objects aren't directly updated on 
writes, they can become out-of-date as well.  I'm going to have to think 
about this issue some more, as there are different ways to handle it with 
each dispatch approach.

Another open issue is record format: a standardized way of representing 
DB-level states.  We may wish to devise classes that have tuple-like 
storage requirements, ala ZODB's Record type.  But this is partly dependent 
on how we end up addressing the caching/transformation issues.




More information about the PEAK mailing list