[PEAK] checking arbitrary bytecode for potential crashes

PJ Eby pje at telecommunity.com
Wed Oct 1 13:04:00 EDT 2014


On Wed, Oct 1, 2014 at 2:19 AM, Dima Tisnek <dimaqq at gmail.com> wrote:
> Makes sense.
>
> Do you think it's realistic to map arbitrary bytecode to opcode or ast
> api and let bytecodeassembler recreate said bytecode with some
> validation, or is that just not possible in general case?

Not really, no.  All it'll get you is some stack-level checking and --
*maybe* -- detection of unreachable code.  Like I said, it's not an
all-purpose verifier, it just has some sanity checks to help catch
code generation bugs.  It's for use in tools that compile DSLs or
other languages to Python bytecode.

The specific verification that happens is that it keeps track of the
stack depth at each bytecode location, and verifies that any jumps to
that location come from a location with the same stack depth.
Any unconditional jump or return sets the stack depth at the next
location to -1 (meaning unknown), because of course execution cannot
continue there unless it is a jump target from somewhere else.  This
also means that trying to generate code at such a location will fail
(due to the unknown stack depth) unless some jump or block has
targeted that location, in which case the stack level will be known.

That's how both stack level analysis and dead code detection are done,
in one algorithm.  And that's the only real verification done that
would be relevant for arbitrary bytecode.  Its intended purpose is to
help catch code generation bugs like mixing up your jump labels, not
pushing or popping everything you were supposed to before looping, or
having different stack levels when two branches merge.
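
To make the depth tracking concrete, here is a rough sketch of the
bookkeeping in plain Python.  It is a toy written for illustration,
not BytecodeAssembler's actual code or API: ToyAssembler, StackError,
and the methods emit, jump, ret, and label are all made-up names, and
the pops/pushes counts are passed in by hand rather than derived from
real opcodes.

```python
UNKNOWN = -1  # depth after an unconditional jump or return


class StackError(Exception):
    """Raised by the toy checker on unknown or inconsistent stack depth."""


class ToyAssembler:
    """Hypothetical stand-in for illustration; not the real library API."""

    def __init__(self):
        self.depth = 0     # stack depth at the current location
        self.targets = {}  # label -> stack depth expected at that label
        self.ops = []      # names of the operations emitted so far

    def _merge(self, label, depth):
        # Every path reaching a label must arrive with the same depth.
        known = self.targets.setdefault(label, depth)
        if known != depth:
            raise StackError("label %r: depth %d != %d" % (label, depth, known))

    def emit(self, name, pops=0, pushes=0):
        # Generating code where the depth is unknown means dead code.
        if self.depth == UNKNOWN:
            raise StackError("unreachable code at %r" % name)
        if self.depth < pops:
            raise StackError("stack underflow at %r" % name)
        self.depth += pushes - pops
        self.ops.append(name)

    def jump(self, label, conditional=False, pops=0):
        self.emit("JUMP " + label, pops=pops)
        self._merge(label, self.depth)
        if not conditional:
            self.depth = UNKNOWN  # execution cannot fall through

    def ret(self):
        self.emit("RETURN", pops=1)
        self.depth = UNKNOWN

    def label(self, label):
        if self.depth == UNKNOWN:
            # Reachable only if an earlier jump targeted this label.
            self.depth = self.targets.get(label, UNKNOWN)
        else:
            self._merge(label, self.depth)


def balanced():
    """if/else where both branches leave one value on the stack."""
    a = ToyAssembler()
    a.emit("LOAD cond", pushes=1)
    a.jump("else", conditional=True, pops=1)
    a.emit("LOAD x", pushes=1)
    a.jump("end")
    a.label("else")
    a.emit("LOAD y", pushes=1)
    a.label("end")
    a.ret()
    return a


def unbalanced():
    """Same shape, but the else branch forgets to push its value."""
    a = ToyAssembler()
    a.emit("LOAD cond", pushes=1)
    a.jump("else", conditional=True, pops=1)
    a.emit("LOAD x", pushes=1)
    a.jump("end")
    a.label("else")
    a.label("end")  # raises StackError: branches merge at different depths
    return a
```

Calling balanced() goes through cleanly, while unbalanced() raises
StackError at the merge point, and emitting anything right after
ret() raises because the depth there is unknown.  Real bytecode has
more cases (blocks, backward jumps into loops, opcodes with variable
stack effects) that this sketch does not try to cover.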

