[PEAK] Re: Trellis-fork

Sergey Schetinin maluke at gmail.com
Fri Jul 3 16:30:55 EDT 2009


I want to explain one big change that I want to introduce to Trellis.
I will not do it right now, as it's something that should be done all
at once; doing it incrementally would be problematic. The change is to
move most of the logic from cells to the controller. The rationale is
that snapshots are a must for consistency outside of txns, for
concurrency and for a few other things. If the cells were involved in
that, it would take a lot of coordination, while if the cells were
relieved of their role of storing their values, snapshots come
naturally.

Here's a draft of how I think things will work.

First of all, the cells hold no values; their role is more that of a
*key* into the storage managed by the controller.

Then there's a controller and a new kind of object: transaction. Right
now the transaction is represented by a set of values in certain ctrl
attributes. To make that more explicit, let's assume that when there's
a transaction active, there's a ctrl.txn attribute holding a
Transaction instance, so ctrl.active is pretty much the same thing as
"ctrl.txn is not None".

So the cells have no _value attribute; instead, get_value would look
something like:

def get_value(self):
    if ctrl.txn is None:
        return ctrl.data[self]
    else:
        return ctrl.txn.snapshot[self]

txn.snapshot is a copy of ctrl.data taken when the transaction
started. Both data and snapshot are read-only! When the transaction
commits, ctrl.data is replaced with a new dictionary.
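
To make the snapshot idea concrete, here is a rough sketch. The
Controller and Transaction classes and the begin/replace_data methods
are hypothetical names made up for illustration, not existing Trellis
API. It assumes that, because data is only ever replaced and never
mutated, the snapshot can share the same object instead of copying it;
types.MappingProxyType is used only to make the read-only intent
explicit.

```python
from types import MappingProxyType


class Transaction:
    def __init__(self, snapshot):
        # Whatever ctrl.data was when the txn started (read-only view).
        self.snapshot = snapshot


class Controller:
    def __init__(self, initial=None):
        # data is never mutated in place, only replaced as a whole.
        self.data = MappingProxyType(dict(initial or {}))
        self.txn = None

    def begin(self):
        self.txn = Transaction(self.data)
        return self.txn

    def replace_data(self, new_values):
        # Stand-in for commit: build a new dict, then swap it in.
        self.data = MappingProxyType(dict(new_values))
        self.txn = None


ctrl = Controller({"a": 1})
txn = ctrl.begin()
ctrl.replace_data({"a": 2})   # e.g. some other txn committed meanwhile

# The old snapshot is unaffected by the replacement.
print(txn.snapshot["a"], ctrl.data["a"])
```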

Now, cell values can change inside a transaction, but as we are using
snapshots we have to store those changes, and read them back,
somewhere else:

def set_value(self, value):
    # atomically
    ctrl.txn.writes[self] = value

def get_value(self):
    if ctrl.txn is None:
        return ctrl.data[self]
    else:
        try:
            return ctrl.txn.writes[self]
        except KeyError:
            return ctrl.txn.snapshot[self]

At this point we ignore the dependency and input conflict tracking,
because it's not that different from the system used now.
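
As a side note, the "writes first, then snapshot" lookup has the same
shape as the standard library's collections.ChainMap. The snippet
below is only an illustration of the layering, with plain dicts
standing in for txn.writes and txn.snapshot; it is not a suggestion
that Trellis should use ChainMap.

```python
from collections import ChainMap

snapshot = {"x": 1, "y": 2}   # stand-in for txn.snapshot
writes = {}                   # stand-in for txn.writes

# Lookups try writes first and fall back to snapshot;
# assignments always go to the first mapping (writes).
view = ChainMap(writes, snapshot)
view["x"] = 10

print(view["x"], view["y"], snapshot["x"], writes)
```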

The interesting part is what happens on commit. We need to create a
new ctrl.data and make sure it's consistent. Remember, we want to
support concurrent transactions, so ctrl.data might be different from
txn.snapshot by now. Let's look at that case.

First of all we need to make sure that the current ctrl.data has the
same values we have read from the snapshot, which means every read has
to be recorded. So get_value would look like this:

def get_value(self):
    txn = ctrl.txn
    if txn is None:
        return ctrl.data[self]
    else:
        try:
            return txn.writes[self]
        except KeyError:
            try:
                return txn.reads[self]
            except KeyError:
                val = txn.reads[self] = txn.snapshot[self]
                return val

The commit phase is synchronized, that is, it happens sequentially
even for concurrent transactions. Here's how the creation of the new
.data might look:

# method of Transaction class
def merge_into_data(self, data):
    # first, make sure the read values are still valid
    if data is not self.snapshot:
        for key, value in self.reads.items():
            if data[key] != value:
                raise RollForward # or just retry the affected cells!
    # next merge the writes
    new_data = data.copy()
    new_data.update(self.writes)
    return new_data
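
Here is a small runnable sketch of that commit check with two
transactions started from the same snapshot. The get/set methods on
Transaction, the keys "a" and "b", and the RollForward exception class
are assumptions made for the example (the draft above only names
RollForward without defining it).

```python
class RollForward(Exception):
    """Hypothetical signal that the txn has to be retried."""


class Transaction:
    def __init__(self, snapshot):
        self.snapshot = snapshot
        self.reads = {}
        self.writes = {}

    def get(self, key):
        if key in self.writes:
            return self.writes[key]
        if key not in self.reads:
            # Remember what we saw so commit can validate it.
            self.reads[key] = self.snapshot[key]
        return self.reads[key]

    def set(self, key, value):
        self.writes[key] = value

    def merge_into_data(self, data):
        # Validate reads only if someone committed since we started.
        if data is not self.snapshot:
            for key, value in self.reads.items():
                if data[key] != value:
                    raise RollForward(key)
        new_data = dict(data)
        new_data.update(self.writes)
        return new_data


data = {"a": 1, "b": 2}
t1 = Transaction(data)
t2 = Transaction(data)

t1.set("a", t1.get("a") + 1)     # t1 reads a, writes a
t2.set("b", t2.get("a") + 10)    # t2 reads a, writes b

data = t1.merge_into_data(data)  # t1 commits first
try:
    data = t2.merge_into_data(data)
    conflict = False
except RollForward:
    conflict = True              # t2's read of "a" is stale now

print(conflict, data)
```

Note that t2 never wrote "a"; it is rejected purely because a value it
read has changed in the meantime.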

Another interesting change is how rule rollback happens -- we delete
the items the rule set in txn.writes and do the same for the rules
that read them (that obviously has to be tracked and is not
represented in the source code above). This means there's no linear
undo log and only the rules that *had* to be undone will run again.
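
A possible shape for that tracking, as a sketch only: the
record_read/record_write/undo methods and the written_by/readers
tables are hypothetical names, and rules are represented by plain
strings to keep the example short.

```python
from collections import defaultdict


class Transaction:
    def __init__(self):
        self.writes = {}
        self.written_by = defaultdict(set)  # rule -> keys it wrote
        self.readers = defaultdict(set)     # key -> rules that read it

    def record_read(self, rule, key):
        self.readers[key].add(rule)

    def record_write(self, rule, key, value):
        self.writes[key] = value
        self.written_by[rule].add(key)

    def undo(self, rule):
        """Undo rule and, transitively, every rule that read its writes.

        Returns the set of rules that have to run again.
        """
        undone, pending = set(), [rule]
        while pending:
            current = pending.pop()
            if current in undone:
                continue
            undone.add(current)
            for key in self.written_by.pop(current, ()):
                self.writes.pop(key, None)
                pending.extend(self.readers.pop(key, ()))
        return undone


txn = Transaction()
txn.record_write("rule_a", "x", 1)
txn.record_read("rule_b", "x")
txn.record_write("rule_b", "y", 2)
txn.record_write("rule_c", "z", 3)   # unrelated to x

to_rerun = txn.undo("rule_a")
print(sorted(to_rerun), txn.writes)
```

rule_c is left alone because it never read anything that was undone.
This sketch does not clean up reader entries that an undone rule left
on other keys; a real implementation would have to.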

BTW, the discrete cells wouldn't need to store anything in the snapshots:

def set_value(self, value):
    # atomically
    # for new txns ethereal is an empty dict
    ctrl.txn.ethereal[self] = value


def get_value(self):
    txn = ctrl.txn
    if txn is None:
        return self.resets_to
    else:
        return txn.ethereal.get(self, self.resets_to)
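
A self-contained version of the same idea, to show the value
disappearing with the transaction. The Controller, Transaction and
DiscreteCell classes here are minimal stand-ins invented for the
example.

```python
class Transaction:
    def __init__(self):
        self.ethereal = {}   # discrete values live only in the txn


class Controller:
    txn = None


ctrl = Controller()


class DiscreteCell:
    def __init__(self, resets_to=None):
        self.resets_to = resets_to

    def set_value(self, value):
        ctrl.txn.ethereal[self] = value

    def get_value(self):
        txn = ctrl.txn
        if txn is None:
            return self.resets_to
        return txn.ethereal.get(self, self.resets_to)


clicked = DiscreteCell(resets_to=False)

ctrl.txn = Transaction()
clicked.set_value(True)
inside = clicked.get_value()    # seen while the txn is active

ctrl.txn = None                 # txn ends, ethereal goes away with it
outside = clicked.get_value()   # back to resets_to

print(inside, outside)
```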


Cell initialization is pretty straightforward too. When we look the
value up in snapshot (or data) and the cell wasn't initialized, we'll
get a KeyError. At that point we can run the associated rule and put
the result into writes. This way even if the same cell is initialized
in more than one concurrent txn it will work out.
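
A sketch of that lazy initialization, under some simplifying
assumptions: the transaction is passed to get_value explicitly instead
of being looked up through ctrl.txn, read tracking is left out, and
the Cell and Transaction classes are stand-ins made up for the
example.

```python
class Transaction:
    def __init__(self, snapshot):
        self.snapshot = snapshot
        self.writes = {}


class Cell:
    def __init__(self, rule):
        self.rule = rule

    def get_value(self, txn):
        try:
            return txn.writes[self]
        except KeyError:
            pass
        try:
            return txn.snapshot[self]
        except KeyError:
            # Not initialized anywhere yet: run the rule and record
            # the result as an ordinary write of this txn.
            value = txn.writes[self] = self.rule()
            return value


calls = []


def rule():
    calls.append(1)
    return 42


cell = Cell(rule)
t1 = Transaction({})
t2 = Transaction({})

v1 = cell.get_value(t1)
v2 = cell.get_value(t2)   # initialized independently in a second txn
cell.get_value(t1)        # already in t1.writes, rule does not rerun

print(v1, v2, len(calls))
```

Each transaction ends up with the initial value in its own writes, so
whichever commits first simply publishes it.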

Now the part I'm still thinking about is how cell initialization,
rollbacks and @perform interact.

The rule of the cell might create yet another cell while initializing,
for example a new maintain that writes some cell that was already
read. Some rules will have to be rolled back because of that. But what
if some of those rules are performers? Those cannot be rolled back,
generally. This is a problem that currently exists in Trellis too.

There are a few solutions to this:
 * prohibit performers from using lazily initialized cells -- not
   satisfactory
 * require performers to be undoable -- will not work for a lot of
   things like networking etc
 * add support for deferring the effect itself

It seems that the first and the last ones together would cover it
nicely. So if a @perform accesses something that ends up initializing
a rule that runs in non ctrl.readonly mode, it's an error. Plus
there's a new effect() method, callable from @maintain rules,
something like this:

@maintain
def write_file(self):
    effect(self.file.write, self.status)

self.file.write will run after the transaction is finished (just
before .data is replaced) and with the same restrictions as our new
@perform would have. There's a problem though. The performers must see
discrete values as well, which means that they have to run much
earlier than the point where the entire transaction is complete. And
with concurrency, transactions may be retried even when there were no
errors, so a performer could run more than once for what is logically
a single transaction. That cannot be allowed.

So IMO the solution is to remove @perform completely and only keep
effect() called from @maintain rules. This way even effects that
depend on discrete values will run after the txn itself. One can even
add support for effect ordering (something like "showing the windows
is the last thing to do") and effect retries. There's a [partially]
ordered set of effects, and questions like what should happen if any
of them fail can be considered without trying to keep all the required
transactional properties in mind. The developer might even choose for
effects to happen in a different thread, or just get queued globally
without blocking related transactions.
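
A rough sketch of such a deferred effect queue. Everything here is
hypothetical: effect() is shown as a method on a stand-in Transaction
class, the "order" keyword is an invented way to express ordering, and
a simple numeric priority is used where the text talks about a partial
order. The point is only that queued effects can be thrown away on a
retry, because none of them has run yet.

```python
import heapq
import itertools


class Transaction:
    def __init__(self):
        self._effects = []
        self._counter = itertools.count()

    def effect(self, func, *args, order=0):
        # Queue the call instead of making it. The counter keeps
        # insertion order among effects with the same priority and
        # prevents the heap from ever comparing the functions.
        entry = (order, next(self._counter), func, args)
        heapq.heappush(self._effects, entry)

    def discard_effects(self):
        # On retry/rollback: nothing has run, so nothing to undo.
        self._effects.clear()

    def run_effects(self):
        # Called once, after the txn itself has succeeded.
        while self._effects:
            _, _, func, args = heapq.heappop(self._effects)
            func(*args)


log = []
txn = Transaction()

txn.effect(log.append, "show window", order=10)
txn.effect(log.append, "write file")
txn.discard_effects()                 # simulate a retried txn

txn.effect(log.append, "show window", order=10)
txn.effect(log.append, "write file")
txn.run_effects()

print(log)
```

Running the effects in another thread or from a global queue would
only change run_effects; the rules calling effect() would stay the
same.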



-- 
Best Regards,
Sergey Schetinin

http://s3bk.com/ -- S3 Backup
http://word-to-html.com/ -- Word to HTML Converter

