[PEAK] Command-line option parsing

Fri Nov 12 19:42:05 EST 2004

The peak.running.commands framework currently doesn't handle command line 
options, which rather limits its utility for creating command-line 
tools.  For example, it would be nice if 'peak serve' let you specify
the port or host as a command-line argument.

While Python 2.3 includes a rather nice 'optparse' module for argument 
processing, it (understandably) lacks several important features for 
integration with PEAK.  For example, it's not designed to allow tying 
options to attribute descriptors.

So, what would a PEAK command-line option framework look like?  Well, here 
are some of the requirements I have:

* Inherit existing options from base class

* Ability to *not* inherit (i.e. delete) a specific option, or to avoid 
inheriting any options from a superclass

* Generate help message automatically

* In response to a specific option:

   - Set a value (or append to a list of values) with optional conversion 
from a string (and wrap any errors in an InvocationError)

   - Set a boolean true or false value

   - Increment or decrement a value

   - Perform an arbitrary action, with access to (and ability to modify) 
the remaining arguments

   - Raise InvocationError when a non-repeatable option (e.g. a "set" 
option) appears more than once

* Map non-option arguments to attributes or methods, both
positionally and "*args" style, with optional type conversion from 
string.  That is, you should be able to declare attributes or methods that 
receive either the Nth argument, or that receive arguments from the Nth 
argument on.

* Raise InvocationError for missing "required options" or required arguments

* Compact notation allowing easy association of an option with attributes 
or methods
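To make the per-option behaviours above concrete, here is a minimal, framework-free sketch. Everything in it ('InvocationError' as a local stand-in, 'OptionAction', 'apply_option', 'Demo') is a hypothetical illustration, not PEAK code. It only shows how set/add/append/arbitrary-action options, plus the "non-repeatable" check, might be applied to a command object.

```python
class InvocationError(Exception):
    """Local stand-in for the command framework's InvocationError (hypothetical)."""


class OptionAction:
    """How one option affects its target attribute (hypothetical helper).

    kind is 'set', 'add', 'append', or 'call'.  'set' options default to
    non-repeatable; the other kinds default to repeatable.
    """

    def __init__(self, attr, kind, value=None, repeatable=None, func=None):
        self.attr = attr
        self.kind = kind
        self.value = value
        self.func = func
        self.repeatable = (kind != "set") if repeatable is None else repeatable


def apply_option(target, action, optname, seen, remaining_args, optval=None):
    """Apply one occurrence of an option to 'target'.

    'seen' holds the actions already used in this parse, so a
    non-repeatable option raises InvocationError the second time, even
    when spelled differently (e.g. '-v' and then '--verbose').
    """
    if action in seen and not action.repeatable:
        raise InvocationError("%s may only be given once" % optname)
    seen.add(action)
    value = action.value if optval is None else optval
    if action.kind == "set":
        setattr(target, action.attr, value)
    elif action.kind == "add":
        # a negative value decrements
        setattr(target, action.attr, getattr(target, action.attr) + value)
    elif action.kind == "append":
        getattr(target, action.attr).append(value)
    elif action.kind == "call":
        # arbitrary action; it may consume or rewrite remaining_args in place
        action.func(target, optname, value, remaining_args)
    else:
        raise ValueError("unknown option kind %r" % action.kind)


class Demo:
    def __init__(self):
        self.verbose = False
        self.debugFlags = 0
        self.configSources = []


verbose = OptionAction("verbose", "set", value=True)
debug_bar = OptionAction("debugFlags", "add", value=2)
config = OptionAction("configSources", "append")

cmd, seen = Demo(), set()
apply_option(cmd, verbose, "-v", seen, [])
apply_option(cmd, debug_bar, "--debug-bar", seen, [])
apply_option(cmd, config, "-c", seen, [], optval="a.ini")
apply_option(cmd, config, "--config", seen, [], optval="b.ini")
```

Keying the repeat check on the action object rather than the option string is a design choice here: it treats '-v' and '--verbose' as the same option.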

So, with these requirements in mind, let's try some syntax 
possibilities.   I'm assuming this will live in 'peak.running.options', and 
be used with an 'options' prefix.  I'm also assuming that we will add a 
metadata option to all binding attributes, and that it's one place where 
option metadata can be defined.  We can also use decorators to define 
methods/actions.

So, we'll probably have some things like:

* Metadata for options:

   - options.Set(*option_names, help/value/type+metavar, repeatable=False)

   - options.Add(*option_names, help/value/type+metavar, repeatable=True)

   - options.Append(*option_names, help/value/type+metavar, repeatable=True)

* Metadata for non-option arguments:

   - options.Argument(argnum, metavar)

   - options.After(argnum, metavar)  (arguments following argument 'argnum')

* Other attribute metadata:

   - options.Required(message)  (indicates that if none of the
options/arguments associated with this attribute was set, an error with
the supplied message should be raised at parse completion).

* Function Decorators:

   - [options.argument_handler(help/type)]
     def _handler(self, argument, remaining_args):

   - [options.option_handler(*option_names, help/type+metavar,
                               repeatable=False)]
     def _handler(self, parser, optname, optval, remaining_args):

* Class Advisors:

   - options.reject_inheritance(*option_names)  (reject named inherited 
options/required attrs, or reject all inherited options/required attrs if 
no names specified)

Items above that say 'help/value/type+metavar' would take keyword arguments 
to describe the help string, or to determine the value or type being 
set.  Either a value or a type *must* be set.  If it's a value, the option 
is just a flag, but if it's a type, the option will accept an option 
argument, and the type will be called to convert the argument string to the 
value to be set, added, appended, or whatever.  The handler decorators 
allow an optional 'type' keyword as well, to allow conversion of arguments 
or options from a string to something else.  The 'type' invocation should 
be wrapped in a handler that traps ValueError for conversion to an 
InvocationError.
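A minimal sketch of the value-vs-type rule and the ValueError trap, assuming 'value' and 'type' are mutually exclusive (my reading of "either a value or a type *must* be set"). 'make_converter' and the local 'InvocationError' are hypothetical names, and the parameter is called 'type' only to mirror the proposed keyword.

```python
class InvocationError(Exception):
    """Local stand-in for the command framework's InvocationError (hypothetical)."""


_MISSING = object()  # sentinel, so that value=False or value=None still count


def make_converter(optname, value=_MISSING, type=_MISSING):
    """Return (takes_argument, convert) for one option declaration.

    With 'value' the option is a plain flag; with 'type' it takes an
    argument string, which is passed to 'type' for conversion.
    """
    if (value is _MISSING) == (type is _MISSING):
        raise TypeError("%s: give exactly one of 'value' or 'type'" % optname)
    if type is _MISSING:
        return False, lambda: value

    def convert(arg):
        try:
            return type(arg)
        except ValueError as exc:
            # bad user input becomes an invocation error, not a traceback
            raise InvocationError("%s: invalid argument %r (%s)" % (optname, arg, exc))

    return True, convert


takes_arg, to_port = make_converter("--port", type=int)
is_flag_arg, quiet = make_converter("-q", value=False)
```

Raising TypeError for a bad declaration (rather than InvocationError) reflects that it is a programmer error, not a user error.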

The 'metavar' keyword, if specified, indicates the placeholder name to be 
used in "usage" output, for either option arguments or positional 
arguments.  The 'repeatable' keyword argument on various items indicates 
that the option can occur multiple times.
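For the "generate help message automatically" requirement, here is one possible way a metavar could show up in help output. The layout deliberately imitates 'optparse' ("-c SRC, --config=SRC"), and 'format_option_help' and its 'width' parameter are hypothetical.

```python
def format_option_help(names, help="", metavar=None, width=24):
    """Format one line of option help, optparse-style.

    The metavar is shown only for options that take an argument: short
    options get '-c SRC', long options get '--config=SRC'.
    """
    shown = []
    for name in names:
        if metavar is None:
            shown.append(name)
        elif name.startswith("--"):
            shown.append("%s=%s" % (name, metavar))
        else:
            shown.append("%s %s" % (name, metavar))
    left = "  " + ", ".join(shown)
    if not help:
        return left
    # pad the option column so help strings line up
    return "%-*s %s" % (width, left, help)


for line in (
    format_option_help(["-v", "--verbose"], "Be talkative"),
    format_option_help(["-c", "--config"], "Configuration source", metavar="SRC"),
):
    print(line)
```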

Here's some example code showing some of these features in hypothetical use:

# ========

from peak.api import binding
from peak.running import commands, options

class SomeCommand(commands.AbstractCommand):

     first_arg = binding.Require(
         "First positional argument", [options.Argument(1)]
     )

     second_arg = binding.Require(
         "Second positional argument", [options.Argument(2)]
     )

     extra_args = binding.Make(list, [options.After(2)])

     verbose = binding.Attribute(
         [options.Set('-v','--verbose', value=True,  help="Be talkative"),
          options.Set('-q','--quiet',   value=False, help="Shhh!!")],
         defaultValue = False
     )

     debugFlags = binding.Attribute(
         [options.Add('--debug-foo', value=1, help="Enable foo debugging",),
          options.Add('--debug-bar', value=2, help="Enable bar debugging",)],
         defaultValue = 0
     )

     configSources = binding.Make(list, [
         options.Append('-c', '--config', type=str,
                        help="Configuration source")
     ])

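     [options.option_handler('--log-url', type=str, help="Logfile URL")]
     def _setLogFile(self, parser, optname, optval, remaining_args):
         try:
             self.logfile = self.lookupComponent(optval)
         except Exception:   # XXX narrow this to the relevant lookup errors
             raise commands.InvocationError("Invalid --log-url: %s" % optval)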

# ==========

So, what are the open design issues here?  Well, there is some potential 
for conflicting/inconsistent metadata.  For example, what happens if you 
specify both some positional 'Argument()' attributes and an argument 
handler?  More than one attribute for the same 'Argument()'?  How does 
'After()' relate?  What if you use an option name more than once?

I think we can safely ignore option name conflicts between a class and its 
superclass, or more precisely, we can simply resolve them in favor of the 
subclass.  Conflicts *within* a class, however, should be considered 
programmer error.
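Here is a rough sketch of that resolution rule, folding in 'reject_inheritance' as well. It walks the MRO from base to subclass, so later (more derived) classes win, while a duplicate inside one class raises an error. The 'option_decls' and 'rejected_options' attributes are hypothetical stand-ins for whatever class metadata the real advisors would record.

```python
class OptionConflict(Exception):
    """Programmer error: one class declares the same option name twice."""


def collect_options(cls):
    """Merge option declarations along the MRO; subclasses win.

    Each class may define 'option_decls', a list of (names, spec) pairs,
    and 'rejected_options', a tuple of inherited option names to drop
    (an empty tuple drops every inherited option).  Only each class's
    own __dict__ is consulted, so nothing is counted twice.
    """
    merged = {}
    for klass in reversed(cls.__mro__):
        own = {}
        for names, spec in klass.__dict__.get("option_decls", ()):
            for name in names:
                if name in own:
                    raise OptionConflict("%s declares %s twice" % (klass.__name__, name))
                own[name] = spec
        rejected = klass.__dict__.get("rejected_options")
        if rejected is not None:
            if not rejected:
                merged.clear()          # reject everything inherited
            for name in rejected:
                merged.pop(name, None)  # reject only the named options
        merged.update(own)              # this class overrides its bases
    return merged


class Base:
    option_decls = [(("-v", "--verbose"), "Base verbose"), (("--debug",), "Base debug")]


class Sub(Base):
    option_decls = [(("-v",), "Sub verbose")]
    rejected_options = ("--debug",)


class Bare(Base):
    rejected_options = ()  # inherit nothing
```

Note that overriding is per option name in this sketch, so redefining only '-v' leaves the inherited '--verbose' in place; whether that is desirable is part of the open override question.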

Hm.  Maybe there's another way to handle arguments.  Suppose it looked like 
this:

     [options.argument_handler()]
     def _handle_args(self, foo, bar, *etc):
         # ...

The idea here is that non-option arguments are simply passed to the handler 
positionally.  If you have a '*' argument, you get any remaining arguments 
supplied therein.  If you don't, you accept only a limited number of 
arguments.  If you use function defaults, those positional arguments are 
optional and take their default values.  Finally, the argument names
themselves can be used to produce an automatic usage message.  For
example, the above might be rendered as:

     usage: progname [options] foo bar etc...
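The usage line can be derived mechanically from the handler's signature. Here is a minimal sketch using 'inspect.signature'. 'usage_from_handler' is a hypothetical helper, and showing defaulted arguments in [brackets] is my assumption, since the text doesn't say how optional arguments should be rendered.

```python
import inspect


def usage_from_handler(progname, handler):
    """Build a usage line from an argument handler's signature.

    Required positional parameters appear bare, parameters with defaults
    appear in [brackets], and a '*args' parameter appears as 'name...'.
    """
    parts = [progname, "[options]"]
    for param in inspect.signature(handler).parameters.values():
        if param.name == "self":
            continue  # allow passing the plain function from the class
        if param.kind is param.VAR_POSITIONAL:
            parts.append(param.name + "...")
        elif param.kind in (param.POSITIONAL_ONLY, param.POSITIONAL_OR_KEYWORD):
            if param.default is param.empty:
                parts.append(param.name)
            else:
                parts.append("[%s]" % param.name)
    return "usage: " + " ".join(parts)


class Cmd:
    def _handle_args(self, foo, bar, *etc):
        pass

    def go(self, source, dest="-"):
        pass


print(usage_from_handler("progname", Cmd._handle_args))
print(usage_from_handler("copy", Cmd().go))
```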

Anyway, if we take this approach, we can get rid of the Argument/After 
metadata, and thus reduce this to a question of whether or not a given 
class has an argument handler, and consider multiple handlers per class to 
be an error.  I don't see a need to have a way to reject inheriting an 
argument handler, because you just define a new one.  If your command 
doesn't take arguments, then:

     [options.argument_handler()]
     def _handle_args(self):
         # ...

is sufficient to cause an invocation error if any arguments are supplied.
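One way to get that error without guessing is to bind the arguments to the handler's signature before calling it. This keeps a TypeError raised *inside* the handler body from being misreported as a usage problem. A sketch, with 'call_argument_handler' and the local 'InvocationError' as hypothetical names:

```python
import inspect


class InvocationError(Exception):
    """Local stand-in for the command framework's InvocationError (hypothetical)."""


def call_argument_handler(handler, args):
    """Call 'handler' with the non-option arguments, positionally.

    If the argument count doesn't fit the signature (missing required
    arguments, or extras with no '*args' to absorb them), raise
    InvocationError instead of calling the handler.
    """
    try:
        inspect.signature(handler).bind(*args)
    except TypeError as exc:
        raise InvocationError("wrong number of arguments: %s" % exc)
    return handler(*args)


class NoArgs:
    def _handle_args(self):
        return "ran"


class Copy:
    def go(self, source, dest="-"):
        return (source, dest)
```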

Hm.  Should the argument handler be invoked at argument parse time, or at 
command run time?  What is the difference?  Actually, when does parsing 
happen, period?

It seems to me that option parsing should take place relatively 
early.  This is because some PEAK command interpreters actually want to 
replace themselves with another object, when they are being used as an 
argument to some higher-level interpreter.  For example, when running 'peak 
CGI WSGI import:foo.bar', the 'WSGI' interpreter wants to substitute a 
wrapped version of the 'import:foo.bar' object for itself, so that the 
'CGI' command sees the wrapper when it goes to do "CGI things" to that object.

Currently, commands.AbstractInterpreter contains an ugly kludge to actually 
attempt to parse arguments at __init__ time, in order to replace the 
interpreter with the target object.  This is kind of sick, to say the 
least, and has led to quirky bugs in the past, not to mention various 
kludges in the commands framework like the 'NoSuchSubcommand' crap.

I think, however, that the best thing to do here is to fix the kludginess, 
such that commands that act on a subcommand always first ask the subcommand 
if it wants to replace itself with a target object.  Interpreters would 
pass this request along to their subcommands, too, so that the parent 
command always receives the "innermost" replacement possible.  Hm.  This 
could probably be part of the 'getSubcommand()' method, actually, so that 
all commands would inherit it.  And, if I made AbstractInterpreter support 
"replacing itself", then all existing command objects would still work, and 
I could see about phasing out the quirky bits.
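Here is a rough sketch of the "replaceable subcommand" idea. These are simplified stand-in classes, not PEAK's real ones. The 'replacement()' method name and the 'Wrapper' class are hypothetical, and 'getSubcommand()' here takes just the subcommand object.

```python
class Command:
    """Minimal stand-in for a command; names here are hypothetical."""

    def replacement(self):
        # ordinary commands stand in for themselves
        return self

    def getSubcommand(self, sub):
        # ask the subcommand for its innermost replacement up front,
        # instead of parsing arguments at __init__ time
        return sub.replacement()


class Wrapper:
    """What an interpreter such as 'WSGI' might substitute for itself."""

    def __init__(self, app):
        self.app = app


class WrappingInterpreter(Command):
    """Interpreter that replaces itself with a wrapper around its target."""

    def __init__(self, target):
        self.target = target

    def replacement(self):
        inner = self.target
        if isinstance(inner, Command):
            # pass the request along, so nested interpreters collapse too
            inner = inner.replacement()
        return Wrapper(inner)


app = object()           # e.g. whatever 'import:foo.bar' resolves to
cgi = Command()          # plays the role of the 'CGI' command
seen = cgi.getSubcommand(WrappingInterpreter(app))
```

Because the default 'replacement()' returns 'self', commands that know nothing about replacement keep working unchanged.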

Hm.  One thing that would be really interesting would be if we could make
it so that '_run()' is normally the argument-processing method.  Maybe we 
can arrange things such that, if you haven't defined any option metadata, 
your '_run()' method gets called.  There might be some backward 
compatibility issues, however, with command classes that inherit from 
non-abstract PEAK commands (i.e., other than AbstractCommand and 
AbstractInterpreter), and override '_run()' now.  For example, many people 
subclass commands.EventDriven, and if we added a '--stop-after' argument, 
and they overrode '_run()', there would be a conflict.

So, I think maybe we'll need another method name, like 'go', e.g.:

     [options.argument_handler()]
     def go(self, source, dest):
         # ...

And then the '_run()' in AbstractCommand can just run the option parser, 
triggering the argument handler once everything's 
parsed.  AbstractInterpreter's 'go' might then look like:

     [options.argument_handler()]
     def go(self, cmd, *args):
         self.subCmdArgs = args
         return self.interpret(cmd).run()
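Here is a minimal sketch of that control flow, with '_run()' parsing options first and then handing the leftover positional arguments to 'go()'. The option parser is a trivial placeholder that stops at the first non-option argument, roughly what an interpreter wants so that the subcommand's own options are left alone. 'parse_options', 'Echo', and the 'commands' lookup table are hypothetical.

```python
import inspect


class InvocationError(Exception):
    """Local stand-in for the command framework's InvocationError (hypothetical)."""


class AbstractCommand:
    """Sketch: '_run()' parses options, then calls 'go()' with the rest."""

    def __init__(self, argv):
        self.argv = list(argv)  # argv[0] is the command name

    def parse_options(self, args):
        # Placeholder parser: leading '--name' arguments become flags;
        # parsing stops at the first non-option argument.
        i = 0
        while i < len(args) and args[i].startswith("--"):
            i += 1
        self.flags = args[:i]
        return args[i:]

    def _run(self):
        positional = self.parse_options(self.argv[1:])
        try:
            inspect.signature(self.go).bind(*positional)
        except TypeError as exc:
            raise InvocationError("%s: %s" % (self.argv[0], exc))
        return self.go(*positional)

    def run(self):
        return self._run()

    def go(self):
        # default handler: accept no arguments at all
        return 0


class Echo(AbstractCommand):
    def go(self, *words):
        return " ".join(words)


class Interpreter(AbstractCommand):
    """Mirrors the 'go(self, cmd, *args)' shape from the text."""

    commands = {"echo": Echo}

    def go(self, cmd, *args):
        self.subCmdArgs = args
        return self.interpret(cmd).run()

    def interpret(self, cmd):
        # hypothetical lookup; a real interpreter would resolve names/URLs
        return self.commands[cmd]([cmd] + list(self.subCmdArgs))


print(Interpreter(["peak", "echo", "hello", "world"]).run())
```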

I don't much care for 'go()' as a method name.  Maybe 'cmd()'?  Anybody got 
any suggestions?

Anyway, looks like we've got the basic structure figured out, such that 
existing commands should still work the way they do today, but there are 
still some open design items:

   * partial option name matches -- should we support this?

   * what about tab-completion of command line options?  (e.g. via 
'optcomplete', see http://furius.ca/optcomplete/ for info)

   * precise definition of how usage messages are generated, including the 
ability for a command to participate in generating the help message (e.g. 
commands.BootStrap wants to list available subcommands)

   * how to override default usage messages/help strings, etc.  (e.g., can 
you override *just* an option or argument's help message from a base class, 
or do you have to just redefine the whole option?)

   * should we allow "interspersed arguments", and if so, how do we control 
that on a per-class basis?

   * should we even have the 'argument_handler()' decorator, or should we
just declare type information for the arguments and let '_run()' call
'go(self,*args)', checking for an argument count mismatch?

   * do we want to allow "option groups", à la 'optparse'?

I'll have to tackle these questions in a follow-up post at a later time, 
along with implementation issues like, should we use 'optparse' to 
implement the underlying parse mechanism?  There are advantages and 
disadvantages that need to be weighed.

In the meantime, these prerequisite tasks can be started:

* Create a basic metadata registration facility for peak.binding, such that 
metadata given to bindings is invoked at class creation time, and told what 
class+attribute it's being applied to

* Allow 'binding.Attribute' to have a 'defaultValue' specified, so we have 
a simple way to define bindings with a default value, besides Make/Obtain.

* Fix the interpretation kludge in the commands framework, by defining a 
"replaceable subcommand" facility (and investigate how the IExecutable 
stuff might be cleaned up, too.)

* Investigate whether 'optparse' has sufficient hooks to allow it to drive 
the option parsing we want.
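As a starting point for that investigation, here is a sketch of how Set/Add/Append-style options might be driven by 'optparse' callbacks. It uses only documented 'optparse' features: callback actions with 'callback_kwargs', and overriding 'OptionParser.error()' so that errors raise instead of exiting. The helper names and the local 'InvocationError' are hypothetical. Arbitrary 'type' callables are handled inside the callback because 'optparse' only knows a fixed set of type names.

```python
import optparse


class InvocationError(Exception):
    """Local stand-in for the command framework's InvocationError (hypothetical)."""


class RaisingParser(optparse.OptionParser):
    """OptionParser that raises instead of printing usage and exiting."""

    def __init__(self, *args, **kw):
        optparse.OptionParser.__init__(self, *args, **kw)
        self.seen_once = set()

    def error(self, msg):
        raise InvocationError(msg)


def set_once(option, opt_str, value, parser, dest, const=None):
    # non-repeatable 'Set': keyed on the Option, so -v and --verbose match
    if option in parser.seen_once:
        raise optparse.OptionValueError("%s may only be given once" % opt_str)
    parser.seen_once.add(option)
    setattr(parser.values, dest, const if value is None else value)


def add_value(option, opt_str, value, parser, dest, const):
    setattr(parser.values, dest, getattr(parser.values, dest) + const)


def append_converted(option, opt_str, value, parser, dest, convert):
    try:
        item = convert(value)
    except ValueError as exc:
        raise optparse.OptionValueError("%s: %s" % (opt_str, exc))
    getattr(parser.values, dest).append(item)


def build_parser():
    p = RaisingParser(usage="%prog [options] args...")
    p.set_defaults(verbose=False, debugFlags=0, configSources=[])
    p.add_option("-v", "--verbose", action="callback", callback=set_once,
                 callback_kwargs={"dest": "verbose", "const": True}, help="Be talkative")
    p.add_option("-q", "--quiet", action="callback", callback=set_once,
                 callback_kwargs={"dest": "verbose", "const": False}, help="Shhh!!")
    p.add_option("--debug-foo", action="callback", callback=add_value,
                 callback_kwargs={"dest": "debugFlags", "const": 1})
    p.add_option("--debug-bar", action="callback", callback=add_value,
                 callback_kwargs={"dest": "debugFlags", "const": 2})
    p.add_option("-c", "--config", action="callback", callback=append_converted,
                 type="string", metavar="SRC", help="Configuration source",
                 callback_kwargs={"dest": "configSources", "convert": str})
    return p


opts, rest = build_parser().parse_args(
    ["-v", "--debug-foo", "--debug-bar", "-c", "a.ini", "--config=b.ini", "x"])
```

Two of the open questions map onto existing 'optparse' behaviour. It already accepts unambiguous prefixes of long options (e.g. '--verb'), and 'disable_interspersed_args()' stops parsing at the first positional argument.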

Finally, I'll recap the currently-anticipated, moderately well-defined APIs 
that would be needed:

   - options.Set(*option_names, help/value/type+metavar, repeatable=False)
   - options.Add(*option_names, help/value/type+metavar, repeatable=True)
   - options.Append(*option_names, help/value/type+metavar, repeatable=True)
   - options.Required(message)

   - [options.option_handler(*option_names, help/type+metavar,
                               repeatable=False)]
     def _handler(self, parser, optname, optval, remaining_args):

   - options.reject_inheritance(*option_names)

   - maybe an 'options.Group(description, sortPosn=None)', that can then be 
used as a 'group' kwarg for other option definitions.

Hm.  That doesn't look too bad, especially since 
Set/Add/Append/option_handler are mostly trivial variants of each other, 
while Required and reject_inheritance just set some simple 
metadata.  Really, most of the complexity will be buried in the parsing, 
and in the assembling of a parser from the metadata.