[TransWarp] Input wanted: Streams, Factories, sessions, etc.

Phillip J. Eby pje at telecommunity.com
Sun Nov 17 15:53:41 EST 2002


Ty and I have been batting around the idea of representing files and 
network connections as "stream factories".  Specifically, a stream factory 
is an object with an 'open()' method (and possibly others) for manipulating 
the referenced stream/file/connection.  Examples of other methods might 
include exists(), stat(), etc. The open() method would return a normal 
"file-like" object, possibly an actual file, for reading and writing.

What good is this?  Well, it allows for some interesting things like being 
able to manipulate zipfile contents as if they were real files, or maybe 
having transactional streams which actually write to a (locked) temporary 
file and then rename themselves to overwrite the original file at 
transaction commit, and so on.  It could also be used to represent 
os.popen() targets.
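
As a rough illustration of the transactional case, here is a minimal sketch 
in current Python.  The class name and its commit()/abort() methods are 
assumptions made up for this example, not existing PEAK APIs, and the file 
locking mentioned above is left out to keep it short:

```python
import os
import tempfile


class TransactionalFileFactory:
    """Hypothetical factory whose writes only become visible on commit()."""

    def __init__(self, filename):
        self.filename = filename
        self._tmpname = None
        self._stream = None

    def open(self, mode='w'):
        # Write to a temporary file in the same directory, so that the
        # final rename stays on one filesystem.
        dirname = os.path.dirname(os.path.abspath(self.filename))
        fd, self._tmpname = tempfile.mkstemp(dir=dirname)
        self._stream = os.fdopen(fd, mode)
        return self._stream

    def commit(self):
        # Close the stream, then replace the original with one rename.
        self._stream.close()
        os.replace(self._tmpname, self.filename)
        self._tmpname = self._stream = None

    def abort(self):
        # Discard the temporary file and leave the original untouched.
        self._stream.close()
        os.remove(self._tmpname)
        self._tmpname = self._stream = None
```

In a real system the commit() and abort() calls would presumably be driven 
by the transaction machinery rather than called by hand.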

A similar pattern is needed for certain types of network connections.  For 
example, although right now our SMTP URL factory returns an open SMTP 
connection, it really should return some kind of factory with a method to 
open a connection or session.  This also makes sense for things like HTTP 
connections (after all, suppose you want to do a POST instead of a GET, or 
want to control the headers?).

In addition, it seems to make sense for almost any sort of messaging API - 
have a more or less static reference (via naming.lookup()) to an object 
that lets you "tear off" sessions, similar to the way ManagedConnection 
objects let you "tear off" cursors to perform queries.

So, what should the actual interface be?  For file-like objects, it seems 
it should include:

open(mode='r',bufsize=0)
exists()
isfile()
islink()
isdir()
stat()
mimeType(), guessType()...?
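
For an ordinary file on disk, most of that list maps directly onto os.path.  
A minimal sketch in current Python follows; the class name 
FileStreamFactory is hypothetical, and the type-guessing methods are left 
out since their shape is still an open question:

```python
import os


class FileStreamFactory:
    """Hypothetical stream factory for an ordinary file on disk."""

    def __init__(self, filename):
        self.filename = filename

    def open(self, mode='r', bufsize=-1):
        # Nothing is opened until this call.  The default here is -1
        # (system default buffering) because current Python rejects
        # unbuffered (0) streams in text mode.
        return open(self.filename, mode, bufsize)

    def exists(self):
        return os.path.exists(self.filename)

    def isfile(self):
        return os.path.isfile(self.filename)

    def islink(self):
        return os.path.islink(self.filename)

    def isdir(self):
        return os.path.isdir(self.filename)

    def stat(self):
        return os.stat(self.filename)
```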

It doesn't seem to make sense to include the ability to delete the file, or 
rename it, since those are properly functions of the file's container (e.g. 
a naming context).  We'll probably need a helper class or functions to 
parse a file mode string, so that file-like objects that aren't really 
files will have a consistent interpretation of mode strings.
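
Such a mode-parsing helper could be quite small.  The sketch below is one 
possible shape for it; the function name parse_mode and the keys of the 
returned dictionary are assumptions for illustration only:

```python
def parse_mode(mode='r'):
    """Split a file mode string into boolean flags (hypothetical helper)."""
    if not mode or mode[0] not in 'rwa':
        raise ValueError("mode must begin with 'r', 'w' or 'a': %r" % (mode,))
    extras = mode[1:]
    unknown = set(extras) - set('+bt')
    if unknown:
        raise ValueError("unsupported mode flags: %r" % (sorted(unknown),))
    kind = mode[0]
    update = '+' in extras
    return {
        'read': kind == 'r' or update,      # 'r', or any mode with '+'
        'write': kind in 'wa' or update,    # 'w', 'a', or any mode with '+'
        'append': kind == 'a',              # writes go to the end
        'truncate': kind == 'w',            # existing content is discarded
        'binary': 'b' in extras,
    }
```

A non-file factory (say, for HTTP) could then check the 'write' flag 
instead of re-parsing the mode string itself.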

For file:// URLs, we could implement this interface directly on the URL 
class, since instances have all the information needed (i.e. the 
filename!).  HTTP URLs could implement an 'r' open mode as a simple GET, 
optionally using other modes to have more control over headers sent, 
etc.  In theory, FTP URLs could interpret 'w' as returning the data 
connection over which the data upload should be sent.  Anyway, it seems 
that URLs in general can and should be their own stream/connection 
factories if there's a need to have one.  This would allow us to have some 
default object/state factories that would use this stream factory interface 
to load or save objects in a naming context.
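
To show what "the URL is its own factory" might mean for file:// URLs, 
here is a small sketch in current Python.  FileURL is a stand-in written 
for this example, not the actual PEAK URL class, and it relies only on the 
standard library's URL parsing:

```python
import os
from urllib.parse import urlparse
from urllib.request import url2pathname


class FileURL:
    """Hypothetical file:// URL object that doubles as a stream factory."""

    def __init__(self, url):
        parts = urlparse(url)
        if parts.scheme != 'file':
            raise ValueError("not a file: URL: %r" % (url,))
        self.url = url
        # The URL already carries everything needed: the filename.
        self.filename = url2pathname(parts.path)

    def open(self, mode='r'):
        return open(self.filename, mode)

    def exists(self):
        return os.path.exists(self.filename)
```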

One of the big questions is where to put the interface definitions 
themselves, though.  They don't quite fit under any of the ideas of 
binding, naming, config, or even really storage!  Perhaps Ty's idea of 
'peak.networking' might make more sense, although even there it's an odd 
one out when it comes to files.  The interfaces also don't seem ubiquitous 
enough to deserve placement in peak.api.

For things like SMTP and other messaging interfaces, open() seems wrong, 
since you don't really want a stream.  (Or do you?)  Perhaps it would make 
more sense to call a 'session()' method for such kinds of objects, which 
returns a session object that supports 'open()', but that open() would take 
a different set of parameters than the usual.  For example, an SMTP 
session's open() could take all the parameters of smtplib's 'sendmail()' 
method, except for the actual message.

Okay, let's take a use case and see how it works in a short script:

from peak.api import *

storage.beginTransaction()

s = naming.lookup('smtp://some.where').session().open(
    'me@nowhere',['you@somewhere']
)

print >>s, "From: me@nowhere"
print >>s, "To: you@somewhere"
print >>s, "Subject: test"
print >>s
print >>s, "Here's my test e-mail"

storage.commitTransaction()

# s is closed by the transaction, so writing to it
# past this point causes an error

Interestingly, a session is rather like a ManagedConnection, in that it 
needs to keep track of its cursors (streams).  Unlike a managed connection, 
it needs to do so to keep you from trying to send two e-mails on the 
session at the same time, or else to keep a connection pool and 
automatically handle it behind the scenes (which would definitely be YAGNI 
for us right now).
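
The simpler of those two options, refusing a second stream while one is 
still open, can be sketched as follows in current Python.  The names 
Session and MessageStream and the deliver callback are assumptions for 
illustration; the callback stands in for whatever real transport (such as 
smtplib) would actually send the message:

```python
import io


class MessageStream:
    """Buffers one message and hands it to its session on close()."""

    def __init__(self, session, sender, recipients):
        self._session = session
        self._buffer = io.StringIO()
        self.sender = sender
        self.recipients = recipients
        self.closed = False

    def write(self, text):
        if self.closed:
            raise ValueError("write to closed message stream")
        return self._buffer.write(text)

    def close(self):
        # Closing twice is harmless; the message is delivered only once.
        if not self.closed:
            self.closed = True
            self._session._finish(self, self._buffer.getvalue())


class Session:
    """Hypothetical session allowing only one open stream at a time."""

    def __init__(self, deliver):
        # deliver(sender, recipients, text) stands in for the transport.
        self._deliver = deliver
        self._active = None

    def open(self, sender, recipients):
        if self._active is not None:
            raise RuntimeError("a message is already open on this session")
        self._active = MessageStream(self, sender, recipients)
        return self._active

    def _finish(self, stream, text):
        self._deliver(stream.sender, stream.recipients, text)
        self._active = None
```

In the transactional version, the close() call would presumably be made by 
the transaction at commit time rather than by the caller.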

So, here are some of the open questions...

* Where should the interfaces for these ideas live?  (If all else fails, I 
suppose peak.naming.interfaces would be okay, since you'll mainly *get* 
instances of these factories via the naming system.)

* Should all messaging services (such as e-mail) be transactional?  (My 
inclination is yes, for data integrity; Ty's inclination is no, for 
simplicity.)

* Should an explicit close() operation be required for a stream's content 
to be valid?  (My inclination is yes, if the resource isn't controlled by a 
transaction, but no if it is controlled.)

* Are there any parameters in common for session() methods across different 
kinds of systems (e.g. e-mail, spread, ...)?  Are there other things this 
could be used for?

Your thoughts and suggestions, on these questions or anything else, are 
appreciated.



