[PEAK] Organizing documentation of Python code

Thu Sep 23 14:16:54 EDT 2004

At 11:17 AM 9/23/04 -0500, Ian Bicking wrote:

>Or, you could do a hybrid.  Load the objects, then do minimal parsing on 
>the module, adding attributes to objects you loaded to indicate their 
>order, and perhaps some other small things, like finding interstitial 
>docstrings.

Sure, you could.  But why?  Isn't it almost as easy to do:

     [doc.begin("Abstract Methods (must be overridden in subclasses)")]

or:

     [doc.begin(ABSTRACT_METHODS)]

as it is to use an interstitial docstring?

This technique needs no parsing at all, and is nicely explicit as to what 
the docstrings are for.
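
As a rough illustration of why no parsing is needed, here is a minimal 
sketch of a begin() marker that records sections at import time.  This is 
not PEAK's implementation: begin(), sections_of() and the 
"__doc_sections__" key are hypothetical names, and the frame trick assumes 
CPython on Python 3.6 or later, where a class namespace keeps definition 
order.

```python
import sys

SECTIONS_KEY = "__doc_sections__"   # hypothetical attribute name


def begin(title):
    """Mark the start of a documentation section in the calling namespace."""
    # In a class body, the caller's f_locals is the class namespace itself.
    namespace = sys._getframe(1).f_locals
    markers = namespace.setdefault(SECTIONS_KEY, [])
    # Remember which names already existed when this section started.
    markers.append((title, set(namespace)))


def sections_of(cls):
    """Return [(title, [names])] in the order the sections were begun."""
    markers = cls.__dict__.get(SECTIONS_KEY, [])
    # Skip dunder names; the class dict preserves definition order.
    names = [n for n in cls.__dict__ if not n.startswith("__")]
    result = []
    for i, (title, before) in enumerate(markers):
        if i + 1 < len(markers):
            end = markers[i + 1][1]      # names known when the next section began
        else:
            end = set(cls.__dict__)      # last section runs to the end of the class
        result.append((title, [n for n in names if n not in before and n in end]))
    return result


class Shape:
    begin("Abstract Methods (must be overridden in subclasses)")

    def area(self):
        raise NotImplementedError

    def perimeter(self):
        raise NotImplementedError

    begin("Convenience Methods")

    def describe(self):
        return "a shape"
```

The bracketed form used above, [doc.begin(...)], would behave the same 
way here, since it is just an expression statement that calls the function.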

The only reasons I see for wanting to parse source are:

1. Support documenting arbitrary modules with sequence info

2. Document modules that can't be imported on the platform where the docs 
are generated

The first is a non-goal for PEAK, but I could accomplish it anyway by 
running imports under a trace hook, if I really wanted to.  (But I probably 
wouldn't, because alphabetic order is better than an arbitrary order if the 
author didn't intentionally organize the code that way.)

The second is more interesting, since e.g. peak.storage.DDE uses 
Windows-specific features, but our documentation is generated on 
Linux.  But, the hybrid approach you're suggesting won't work for such 
modules anyway, because it's still initially import-based.

But, in truth, PEAK modules that use platform-specific libraries are 
usually written to delay those imports: for example, peak.storage.DDE only 
imports win32ui and dde when you actually open a DDE connection.  Following 
this convention for any Windows-specific code in PEAK would allow an 
import-based documentation tool to still generate docs on Unix.
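
The delayed-import convention might look roughly like the following.  This 
is a sketch, not the actual peak.storage.DDE code: the class name and its 
attributes are made up for illustration, and the pywin32 calls inside 
open() follow the usual pywin32 DDE client pattern as I understand it.

```python
class DDEConnection:
    """Hypothetical stand-in for a connection class using Windows-only modules."""

    def __init__(self, service, topic):
        self.service = service
        self.topic = topic
        self._conversation = None

    def open(self):
        # The platform-specific imports happen here, not at module level,
        # so merely importing this module works on any platform.
        import win32ui  # noqa: F401  (must be loaded before dde)
        import dde

        server = dde.CreateServer()
        server.Create("docs-example")       # arbitrary client name
        conversation = dde.CreateConversation(server)
        conversation.ConnectTo(self.service, self.topic)
        self._conversation = conversation
        return conversation
```

A documentation tool can import the module and inspect DDEConnection on 
Linux; the ImportError would only occur if something actually called open().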

I've been doing a bit more thinking about topics and the like, and 
borrowing terminology from XFML ( http://www.xfml.org/ ), I think that I'll 
use the idea of "facets" containing hierarchies of "topics", but there will 
be two kinds of facets: "TOC"s (tables of contents) and "Index"es.  An 
Index is a docset-wide listing of symbols grouped by topics in that index.

For example, "Subclasses" could be an Index whose topics are classes.  The 
classes listed under the topic for a given class would be the classes 
registered as subclasses of the class topic.  This is a simple way to 
implement relationships and links.

Other indexes would be alphabetical, such as "Methods" - its topics would 
be names, and all methods named a given name would be collected within 
those topics.
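
To make the two examples concrete, here is a small sketch that builds a 
"Subclasses" index and a "Methods" index from a list of classes.  The 
function name and the use of plain dictionaries are assumptions made for 
illustration only.

```python
import inspect
from collections import defaultdict


def build_indexes(classes):
    """Return (subclasses, methods) indexes as topic -> list-of-classes maps."""
    subclasses = defaultdict(list)   # topic: a base class
    methods = defaultdict(list)      # topic: a method name
    for cls in classes:
        # Register the class under each of its direct bases.
        for base in cls.__bases__:
            subclasses[base].append(cls)
        # Register the class under the name of each method it defines itself.
        for name, value in vars(cls).items():
            if inspect.isfunction(value):
                methods[name].append(cls)
    return subclasses, methods


class Storage:
    def open(self): pass
    def close(self): pass


class FileStorage(Storage):
    def open(self): pass


class Cache:
    def close(self): pass


subclasses, methods = build_indexes([Storage, FileStorage, Cache])
```

Listing the known subclasses of Storage is then just a lookup of 
subclasses[Storage], and sorted(methods) gives the alphabetical topic list.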

There are many ways to format an index, of course, ranging from grouping a 
namespace's contents by the topics in that index, to listing the items in a 
topic related to the current item (e.g. listing known subclasses of the 
current class).

The difference between an Index and a TOC is that Indexes only list links 
to actual documentation items, whereas a TOC is used to order and group the 
contents of a namespace.  Actually, I guess you could use a designated 
Index as the TOC, simply by convention.  After all, what if you wanted to 
generate docs sorted by something else?

So, the overall process still looks something like:

* Import API modules and create Symbols for them, adding them to a DocSet 
and populating the symbols with references to the actual objects they represent

* Scan through all of the objects, running a function on each one to 
produce additional index entries

* Generate documentation, using methods on the DocSet to query the indexes 
and symbols
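
The three steps above could be sketched along these lines.  Symbol and 
DocSet are the names used in this message, but every method, attribute and 
helper function below is a hypothetical placeholder, not a settled API.

```python
import importlib
import inspect


class Symbol:
    def __init__(self, name, obj):
        self.name = name
        self.obj = obj          # reference to the actual object documented


class DocSet:
    def __init__(self):
        self.symbols = {}       # qualified name -> Symbol
        self.indexes = {}       # index name -> {topic: [Symbol, ...]}

    def add_module(self, module_name):
        """Step 1: import a module and create Symbols for its public contents."""
        module = importlib.import_module(module_name)
        self.symbols[module_name] = Symbol(module_name, module)
        for attr, obj in vars(module).items():
            if attr.startswith("_"):
                continue
            # Only document objects that were defined in this module.
            if getattr(obj, "__module__", None) == module_name:
                qualified = module_name + "." + attr
                self.symbols[qualified] = Symbol(qualified, obj)

    def scan(self, indexer):
        """Step 2: run a function on each symbol to produce index entries."""
        for symbol in list(self.symbols.values()):
            for index_name, topic in indexer(symbol):
                topics = self.indexes.setdefault(index_name, {})
                topics.setdefault(topic, []).append(symbol)

    def query(self, index_name, topic):
        """Return the names of the symbols listed under a topic of an index."""
        return [s.name for s in self.indexes.get(index_name, {}).get(topic, [])]


def kind_indexer(symbol):
    """Example indexer: file each symbol under a 'Kinds' index."""
    if inspect.isclass(symbol.obj):
        yield "Kinds", "Classes"
    elif inspect.isfunction(symbol.obj):
        yield "Kinds", "Functions"


def render(docset, index_name):
    """Step 3: generate (very plain) output by querying the DocSet."""
    lines = []
    for topic in docset.indexes.get(index_name, {}):
        lines.append(topic)
        lines.extend("  " + name for name in docset.query(index_name, topic))
    return "\n".join(lines)
```

For example, calling add_module("textwrap") and then scan(kind_indexer) on 
a DocSet should file textwrap.TextWrapper under "Classes" and 
textwrap.dedent under "Functions".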

It may also be that somewhere in there, there should be a pass to parse the 
docstrings and extract other metadata to put in the indexes, create new 
symbols, tagged values, etc.  And, there might be a configuration file 
being read to insert other metadata and tagged values.

At this point, the main vagueness in the design is in formatting hints, 
such as the sequence among general-purpose topics.  I'm thinking that 
the "topics" passed to doc APIs should be able to be strings, so that you 
can just say whatever's on your mind when doing something new.  But, if you 
mix that in with existing general-purpose topics like "Abstract Methods" 
that might already be defined by their use in another class or module, what 
order do they end up in?

One way to deal with this is to use hierarchy to register all-purpose 
topics like "Abstract Methods" as subtopics under various standard topics 
with an overall ordering, and then put any new topics generated with 
strings under a "Contents" topic.  But this still presents the possibility 
of topic overlap and change-of-sequence, unless the namespace has its own 
sequence stored for the topics.

But keeping the exact sequence isn't always what you want 
either!  Sometimes you'd rather let the system group the methods sensibly 
on its own.  It seems you'd have to have an option or something.

And that's really where formatting bugs me at the library level: too many 
options with regard to sequencing.  (Output formats are another concern, 
but apart from the sequencing issue, I don't think they really affect the 
structure of the metadata library.)

Maybe the right thing to do is distinguish "sections" and "topics".  You 
could record an item or items under multiple "topics", but only one 
"section", and a namespace's sections are linearly sequenced.

Of course, inheritance leads to some interesting issues.  Should you list 
all inherited items under a section for inherited methods?  I guess if 
sections are hierarchical, then we could list inherited methods under the 
same sections as the base class used.  But it would probably be better to 
match up sections, adding any sections that the subclass is missing, and 
simply tag the methods as inherited.
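
The "match up sections" idea might work something like this sketch.  It 
assumes each class's sections are available as a list of (title, names) 
pairs; merge_sections() is a hypothetical helper, and skipping overridden 
names is my own assumption about the desired behavior.

```python
def merge_sections(own, base):
    """Merge base-class sections into a subclass's sections.

    Both arguments are lists of (title, [names]).  The result is a list of
    (title, [(name, inherited)]) that keeps the subclass's section order and
    appends any sections the subclass lacks.
    """
    merged = [(title, [(name, False) for name in names]) for title, names in own]
    by_title = {title: entries for title, entries in merged}
    defined = {name for _, names in own for name in names}

    for title, names in base:
        # Names the subclass overrides are already listed as its own.
        inherited = [(name, True) for name in names if name not in defined]
        if not inherited:
            continue
        if title not in by_title:
            # A section the subclass is missing: add it after the existing ones.
            by_title[title] = []
            merged.append((title, by_title[title]))
        by_title[title].extend(inherited)
    return merged


base_sections = [("Abstract Methods", ["load", "save"]), ("Utilities", ["describe"])]
own_sections = [("Abstract Methods", ["load"]), ("Extras", ["flush"])]
merged = merge_sections(own_sections, base_sections)
```

Here "save" lands in the subclass's existing "Abstract Methods" section 
tagged as inherited, and "Utilities" is added at the end because the 
subclass did not define it.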

Sections could probably be implemented as topics; in effect, when you 
define sections under a symbol, the symbol will create a private Index to 
serve as its table of contents, and use that index to sequence its output.
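
A minimal sketch of that arrangement, assuming a hypothetical Namespace 
class: each item is recorded under exactly one section, kept in a private 
table of contents in order of definition, and under any number of topics.

```python
class Namespace:
    """Hypothetical symbol that sequences its own contents by section."""

    def __init__(self, name):
        self.name = name
        self._toc = {}          # private index: section -> [item names]
        self._section_of = {}   # item name -> its one section
        self.topics = {}        # topic -> [item names]

    def record(self, item, section, topics=()):
        # An item may belong to only one section.
        current = self._section_of.setdefault(item, section)
        if current != section:
            raise ValueError("%s is already in section %r" % (item, current))
        members = self._toc.setdefault(section, [])
        if item not in members:
            members.append(item)
        # But it may appear under any number of topics.
        for topic in topics:
            entries = self.topics.setdefault(topic, [])
            if item not in entries:
                entries.append(item)

    def contents(self):
        """Sections in the order they were first used, with their items."""
        return [(section, list(members)) for section, members in self._toc.items()]
```

Recording "load" under "Abstract Methods" with topics "I/O" and "Hooks" 
puts it in one place in the output sequence while still letting two 
docset-wide indexes link to it.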

So, is a topic anything more sophisticated than a string?  If index output 
is either in order-of-definition or alphabetical, what else do we 
need?  For hierarchy, it could be a sequence of strings, or perhaps a 
nested set of tuples, such that:

     PARENT = "Top-Level Topic"
     SUB1   =  PARENT, "Subtopic at Level 1"
     SUB2   =  SUB1,      "Subtopic at Level 2"

In other words, a topic is either a string, or a tuple of a topic and a 
string, recursively.  Then, an Index is little more than an ordered 
collection of topics, providing methods to walk its tree or add new topics.
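
Here is a small sketch of that representation.  The topic values are the 
ones shown above; topic_path() and the TopicIndex class are hypothetical 
names chosen for illustration.

```python
PARENT = "Top-Level Topic"
SUB1 = PARENT, "Subtopic at Level 1"
SUB2 = SUB1, "Subtopic at Level 2"


def topic_path(topic):
    """Flatten a topic into a list of strings, from root to leaf."""
    path = []
    while isinstance(topic, tuple):
        topic, label = topic     # a tuple is always (parent topic, string)
        path.append(label)
    path.append(topic)
    path.reverse()
    return path


class TopicIndex:
    """An ordered tree of topics, stored as nested dictionaries."""

    def __init__(self):
        self._tree = {}

    def add(self, topic):
        # Create any missing ancestors on the way down to the topic.
        node = self._tree
        for label in topic_path(topic):
            node = node.setdefault(label, {})

    def walk(self, alphabetical=False):
        """Yield (depth, label) pairs in definition or alphabetical order."""
        def visit(node, depth):
            labels = sorted(node) if alphabetical else list(node)
            for label in labels:
                yield depth, label
                yield from visit(node[label], depth + 1)
        return visit(self._tree, 0)
```

Adding SUB2 alone is enough to create its two ancestors, and 
walk(alphabetical=True) gives the other output order without changing how 
topics are stored.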

Not bad.  Not bad at all.  I think that wraps up sequencing and grouping 
issues/choices.

At some point, I need to write up a prioritized feature list for this thing.