Versioned Storage Re: [TransWarp] Basic "storage jar" design

Wed Jul 10 17:02:37 EDT 2002

On Tue, 2002-07-09 at 15:38, Phillip J. Eby wrote:
> At 02:09 PM 7/9/02 -0500, John D. Heintz wrote:
> >What are your thoughts on versioning at the storage level?  Do you have
> >very many actual requirements for versioning business objects?
> 
> Not really.  I generally consider versioning to be something which needs to
> be an explicit part of the application domain model, and is usually not
> something I want to hide in the implementation layer.

In the domain of document management, I've found that to be the reality
of most DMS/CMS systems.  My personal feeling is that the current
"infrastructure" support for versioning isn't sufficient for managing
content.  It leaves too much to be rebuilt for every application, which
is why I tried to define a versioning model that could be used for
business analysis as well as an implementation guide.  There hasn't been
much interest in it, but I had to scratch that itch.

> 
> 
> >I've been working with hyperdocuments (XML and XLink kind of stuff) the
> >past few years and have gotten pretty interested in versioning linked
> >information objects with the ability to flexibly manage the
> >configurations of those objects. It's my opinion that stuff like
> >workflow builds on lifecycle capabilities and lifecycle builds on
> >versioning capabilities.  While I think these are incredibly interesting
> >technical problems, I haven't yet determined if there is a strong
> >business need for solutions based on these ideas.
> 
> Depends on what you mean by "workflow".  If you're talking about content
> management workflow, I agree.  If you're talking about general process
> workflow, versioning support usually isn't a significant part, IMHO.

I mean pretty much all of workflow: in any given workflow a "reject"
choice might require going back to historical content to start over. If
every unit of content is independent that's pretty easy, but if there
are relationships then the complexity starts.

> 
> 
> >The versioning model that I've written up [1] and extended to
> >hyperdocuments [2] is designed to enable policied version lookup based
> >on the context of that resolution. Objects would associate with each
> >other not just through raw oids, but rather through some resourceId and
> >other data to specify how to resolve the correct version.
> >
> >An easy example would be 1) show me the latest public version of my
> >website, 2) show me what my website publicly contained on March 30th,
> >and 3) show me Eliot's draft branch of the website on March 30th. In all
> >of these cases I want the hyperlinks between objects to resolve to the
> >versions of resources that were effective at the right point in time and
> >specific "branch".
> 
> I think that I've been using the term "oid" much more flexibly than people
> realize, and that everyone projects their own limited meaning of "oid" onto
> my use of the term.  :)
> 
> To me, an oid is simply something that uniquely identifies an item, and
> which may be used as a Python dictionary key.  That's it.  So there's
> nothing that stops you from having an OID that means "newest version of
> document X" or "version Y of document X".  You just need to have in the
> appropriate racks, the mechanism for "understanding" such OID's and mapping
> them to/from an external storage.  For example, if a field loaded by rack A
> needs to refer to "item Q in rack B with the same version as me", then rack
> A must look at the version of the item it's referring from, in order to
> provide rack B with an OID that indicates the version to be loaded.  Does
> that make sense?

That's what I suspected.
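Eby's description of oids can be made concrete with a small sketch. It assumes a hypothetical rack class backed by a plain dict (the names `Rack`, `resolve`, `referenced_oid`, and the `appendix` field are illustrative, not PEAK's API). Oids are ordinary hashable tuples of (document id, version), with a version of None meaning "newest", and rack A hands rack B an oid carrying the concrete version of the item it refers from.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical oid scheme: (document_id, version), where version None means
# "newest version". Any hashable value would do; oids are just dict keys.
Oid = Tuple[str, Optional[int]]


class Rack:
    """Hypothetical rack backed by a dict of (document_id, version) -> state."""

    def __init__(self, records: Dict[Tuple[str, int], Dict[str, Any]]) -> None:
        self.records = records

    def resolve(self, oid: Oid) -> Tuple[str, int]:
        """Turn a possibly symbolic oid ("newest") into a concrete key."""
        doc_id, version = oid
        if version is None:
            version = max(v for (d, v) in self.records if d == doc_id)
        return (doc_id, version)

    def load(self, oid: Oid) -> Dict[str, Any]:
        return self.records[self.resolve(oid)]


def referenced_oid(rack_a: Rack, oid: Oid, field: str) -> Oid:
    """Build an oid for "the item named in `field`, with the same version
    as me": rack A reads the concrete version of the referring item and
    pairs it with the target id, so rack B needs no extra context."""
    _, version = rack_a.resolve(oid)
    target_id = rack_a.load(oid)[field]
    return (target_id, version)


docs = Rack({("X", 1): {"appendix": "Q"}, ("X", 2): {"appendix": "Q"}})
appendices = Rack({("Q", 1): {"text": "old"}, ("Q", 2): {"text": "new"}})

print(appendices.load(referenced_oid(docs, ("X", None), "appendix")))  # {'text': 'new'}
print(appendices.load(referenced_oid(docs, ("X", 1), "appendix")))     # {'text': 'old'}
```

Because the oid is just a tuple, it can serve directly as a dictionary or cache key; all the version-matching logic lives in the rack that builds the reference, which is where Eby places it.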

> 
> 
> >1) allowing the oid of an object to contain (resourceId,
> >resolutionPolicy=None, *args). I think you already would allow this, but
> >I'm being explicit. Here the *args would be policy dependent data, like
> >a specific versionId. Examples of resolutionPolicy include OnSnapshot
> >and Fixed. (See papers for details.)
> 
> I consider the whole issue of version-dependent retrieval to fall on the
> "CTAP" side of STASCTAP; that is, "complex things are possible".  :)  So
> you certainly could do something like this, but until I personally actually
> try to do something that needs this, I'm not going to indulge in designing
> a framework for it.  I'd rather just implement racks that embed the
> specific policies I need in the application context, and then later try and
> figure out what would make it easier to write lots of these, if I found
> myself doing lots of these.

I have found myself doing a lot of these, and worse than that, I've found
that DMS/CMS solutions all fail to sufficiently support versioning for
linked content.
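The (resourceId, resolutionPolicy=None, *args) oid from the quoted proposal can be sketched as a resolver function. The policy names Fixed and OnSnapshot come from the proposal, but their semantics here are simplified guesses rather than SnapCM's definitions, and the store layout, the "public" default branch, and the dates are hypothetical. The three calls mirror the website example: the latest public version, public content on March 30th, and Eliot's draft branch on March 30th.

```python
import bisect
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical store: resource_id -> branch -> [(effective_date, version_id)],
# each list sorted by date. Far simpler than the SnapCM model; a sketch only.
Store = Dict[str, Dict[str, List[Tuple[date, str]]]]


def resolve(store: Store, oid: tuple) -> str:
    """Resolve an oid of the form (resource_id, policy=None, *args).

    Assumed policies (names from the proposal, semantics guessed):
      "Fixed"       args = (version_id,)   -> exactly that version
      "OnSnapshot"  args = (branch, as_of) -> version effective on `branch`
                                              at `as_of` (None = branch head)
      None          -> head of the "public" branch (an assumed default)
    """
    resource_id, *rest = oid
    policy = rest[0] if rest else None
    args = tuple(rest[1:])
    if policy == "Fixed":
        (version_id,) = args
        return version_id
    if policy is None:
        policy, args = "OnSnapshot", ("public", None)
    if policy != "OnSnapshot":
        raise ValueError(f"unknown resolution policy: {policy!r}")
    branch, as_of = args
    entries = store[resource_id][branch]
    if as_of is None:
        return entries[-1][1]
    # Index just past the last entry whose effective date is <= as_of.
    i = bisect.bisect_right([d for d, _ in entries], as_of)
    if i == 0:
        raise LookupError(f"{resource_id!r} has no version on {branch!r} at {as_of}")
    return entries[i - 1][1]


site: Store = {
    "index.html": {
        "public": [(date(2002, 3, 1), "v1"), (date(2002, 4, 15), "v3")],
        "eliot-draft": [(date(2002, 3, 20), "v2"), (date(2002, 5, 1), "v4")],
    }
}

print(resolve(site, ("index.html",)))  # latest public: v3
print(resolve(site, ("index.html", "OnSnapshot", "public", date(2002, 3, 30))))  # v1
print(resolve(site, ("index.html", "OnSnapshot", "eliot-draft", date(2002, 3, 30))))  # v2
```

Each of these oids is still a plain hashable tuple, so it fits Eby's definition of an oid; only the rack that understands the policy needs to know how to turn it into a concrete version.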

Zope takes a step in the right direction with ZODB Versions: a site-wide
Version that can get pushed to the main site when ready. I don't think
it's used very often, though, because of some limitations and challenges
in the ZODB semantics and implementation details of Versions.

Most other DMS/CMS systems I'm familiar with provide about as much
support as RCS: not much.

> 
> One issue in particular is that the versioning *mechanism* could vary
> significantly by application and backend.  I know of certain applications
> at the company I work for which contain data about revisions of things.  If
> I wrote an application which needed to access that data, I'd need to take
> into account the back-end format, so I might be limited in how much of a
> generic framework I could use.

Ugh, I've been thinking hard about how to integrate my versioning model
with other systems. I still don't have anything terribly insightful to
share on the topic. I think your idea of just stuffing an oid in there
and letting the back end resolve it is probably the best one.

> 
> However, if you want to build atop the basic racks and other tools provided
> by PEAK to offer a more advanced framework, I think that would be cool, and
> if it complied or could be made to comply with PEAK architectural
> requirements and standards, we could maybe even include it in the
> distribution.

I may get there someday; I've just about finished writing down most of
the abstract stuff. We've already built the glorified prototype (which
now lies in a CVS coffin ;) and learned a lot from it.  I pretty much
just need to start implementing slices and layers to see how far I can
get.  I'll definitely be following PEAK's evolution, though.

> 
> 
> >2) and some "versioning" Jars that can act as facades for the series of
> >Snapshots and Branches from a Versioning Storage. A "versioning" Jar
> >could then be set to some particular timestamp and branch to always
> >return versions relative to that context. This could yield read-only
> >content from those snapshots. The "version" jar could also point to the
> >HEAD on some branch and provide read-write access. (This is pretty much
> >what my team called a "Sandbox", but we haven't written up anything
> >about it yet. This thing also does conflict detection and merging
> >support.)
> 
> Wow.  That's way on the far side of CTAP, for me.  :)  I do want to comment
> that "setting" a rack to a timestamp sounds like a bad idea to me; I'd
> want to instead create *instances* of a rack, each configured for a
> timestamp, because I'm inclined to "immutability as a way of life".  :)
> That is, I find that when application components are stateful, it leads to
> debugging, debugging leads to fear, fear leads to anger, and anger leads to
> the dark side.  :)

A previous draft of that email had a better description of this: one
rack could act as a facade for the set of racks representing each
snapshot (a timestamp, branch pair). The only state would be which other
rack to delegate to. Dropping the facade is fine with me, though; I
appreciate your advice.

The Sandbox-ish bit is just complicated however you slice it.  When
merging n branches together, with arbitrary subsets of resources at
arbitrary version choices, you end up with a fair bit of code.  We've
written it twice now; I think a third time will be the charm. ;-)
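As a much-simplified illustration of the n-branch case, here is a per-resource three-way merge over snapshot configurations, where each configuration maps a resource id to the version it selects. This is a generic technique, not the algorithm John's team wrote; it only chooses which version each resource points to, treats a missing resource as deleted, and reports a conflict when branches moved the same resource to different versions. The function name and data are hypothetical, and merging the content of conflicting versions is left out.

```python
from typing import Dict, List, Optional, Tuple

Config = Dict[str, str]  # hypothetical: resource_id -> version_id in a snapshot


def merge_configs(
    base: Config, branches: List[Config]
) -> Tuple[Config, Dict[str, List[Optional[str]]]]:
    """Merge n branch configurations against a common base, per resource.

    A resource missing from a config is treated as deleted (None).
    Returns (merged, conflicts); conflicted resources are left out of
    `merged` and listed in `conflicts` with the competing versions.
    """
    merged: Config = {}
    conflicts: Dict[str, List[Optional[str]]] = {}
    for rid in sorted(set(base).union(*branches)):
        original = base.get(rid)
        # Distinct versions the branches moved this resource to (None = deleted).
        changes: List[Optional[str]] = []
        for cfg in branches:
            v = cfg.get(rid)
            if v != original and v not in changes:
                changes.append(v)
        if len(changes) > 1:
            conflicts[rid] = changes
            continue
        result = changes[0] if changes else original
        if result is not None:
            merged[rid] = result
    return merged, conflicts


base = {"a": "a1", "b": "b1", "c": "c1"}
branch_1 = {"a": "a2", "b": "b1", "c": "c1"}
branch_2 = {"a": "a1", "b": "b2", "c": "c2"}
branch_3 = {"a": "a1", "b": "b3", "c": "c2", "d": "d1"}

merged, conflicts = merge_configs(base, [branch_1, branch_2, branch_3])
print(merged)     # {'a': 'a2', 'c': 'c2', 'd': 'd1'}
print(conflicts)  # {'b': ['b2', 'b3']}
```

Two branches that independently pick the same new version (`c2` above) do not conflict; only divergent choices do. A real Sandbox would also have to handle links between resources, which is where the complexity John mentions comes from.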

> 
> 
> 
> >These are just pretty much a raw dump of my ideas based on your
> >mailings, so take it with a grain of salt.  ;-)
> >
> >[1] SnapCM: Versioning Object Model.  This paper defines in UML and OCL
> >an object model for precisely describing the branches, snapshots and
> >version link resolution behavior.  (Very abstract model, just the
> >concepts).
> >http://www.isogen.com/papers/snapCM/index.html
> >http://www.isogen.com/papers/snapCM.pdf
> >
> >[2] Versioned Hyperdocuments: Support for Lifecycle Models. This paper
> >extends the SnapCM model to Documents and Hyperlinking. It includes a
> >much more complete narrative of how/why/when kind of stuff.
> >http://www.isogen.com/papers/versioned-hyperdocuments/index.html
> >http://www.isogen.com/papers/versioned-hyperdocuments.pdf
> 
> I'll take a look at these when I get the chance.  I do have some curiosity
> about versioning models, I just don't have a mandate to *do* anything about
> them at present.  :)

I understand.  Nowadays I tend to see everything as a versioned
lifecycle problem, but that's just my daim bramage!

> 
> 
> _______________________________________________
> TransWarp mailing list
> TransWarp at eby-sarna.com
> http://www.eby-sarna.com/mailman/listinfo/transwarp
> 
-- 
John D. Heintz | Senior Developer

1016 La Posada Dr. | Suite 240 | Austin TX 78752
T 512.380.0347 | jheintz at isogen.com

http://www.isogen.com