[TransWarp] Towards a query theory, part 1: filters and correlation

Tue Oct 14 12:43:16 EDT 2003

On Tuesday, October 14, 2003, at 11:12 AM, Phillip J. Eby wrote:
> At 02:20 PM 10/14/03 +0300, Niki Spahiev wrote:
>> IIRC SQLObjects has many similar ideas implemented.
>
> Thanks, but as far as I can tell from its documentation, it does not 
> support correlation, aggregation, or grouping.  It only implements 
> what I consider "the easy part": pure filtering based on traversal 
> expressions from a single start set.  Such "easy" queries can be 
> executed now on Python objects, using the functions in 
> peak.model.queries, although SQL generation is not yet supported.

No, SQLObject doesn't do this (though it can both generate SQL and also 
execute its queries on Python objects).  But imagining that it might...

> Generating SQL from "easy" queries involves a lot of details, but is 
> straightforward, on the whole.  Mostly it's a process of walking the 
> query to extract table aliases, joins, and criteria.
>
> Here are PEAK's "hard" query requirements:
>
> * Support correlations (e.g. the "find employees that lives in the 
> same city as their supervisor, but was born in a different country" 
> example)

If SQLObject were extended to do this, it might look like:

var = Employee.var
query = AND(var.address.city == var.supervisor.address.city,
             var.birthplace.city != var.supervisor.birthplace.city)

SQLObject's basic metaphor is one of tightly encapsulated row/objects, 
so we have to produce objects, not arbitrary tuples/relations.  I.e., 
we have to find instances of Employee, not just cities, or a 
combination of employees and their supervisors.

I guess, given the above query, you could extend this to do:

select(query, query.var, query.var.supervisor)

The iterator would generate (employee, supervisor) tuples.

> * Support aggregation (count/min/max/avg) over groups (GROUP BY in SQL)

I would not put this in the query.  It doesn't effect which objects we 
find, but rather what results we return.  Maybe if we wanted the total 
salary paid, listed according to supervisor:

var = Employee.var
select(var.supervisor, groupBy=var.supervisor, 
aggregate=SUM(var.salary))

Implementing that, though, would be challenging.  Maybe easier if 
select() was given more explicit information about what it was expected 
to do.

> * Support cross-database joins, where an object's data resides in more 
> than one system, or where some of an object's features are the result 
> of computation in Python over data retrieved from an external source.

Well, this is just a nice feature that's hard.  If you support Python 
execution you can always do it brute-force.  If you want to do it right 
you need a query optimizer.  Holding out for a single theory of queries 
that supports this on top of everything else seems like a hurdle that 
is too high to leap (at least in one bound).

--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org