Thursday, 2 May 2013

The Dynamics of Catching Exceptions in Python

In which I discuss dynamism in catching exceptions - something which took me by surprise and could hide bugs - or enable fun...

The Problem Code

The following code - abstracted just slightly(!) from production code - looks perfectly good. It's calling a function to get some statistics and then process them in some way. Getting the values in the first place involves a socket connection, which could fail with a socket error. Since statistics aren't vital to the running of the system, we simply log the error and move on.

(Note I'm using doctest to check this article - this is representative of a script doing real things!)

>>> def get_stats():
...     pass
...
>>> def do_something_with_stats(stats):
...     pass
...
>>> try:
...     stats = get_stats()
... except socket.error:
...     logging.warning("Can't get statistics")
... else:
...     do_something_with_stats(stats)

The Find

Our tests didn't find anything wrong, but actually paying some attention to our static analysis reports showed a problem:

$ flake8 filename.py
filename.py:351:1: F821 undefined name 'socket'
filename.py:352:1: F821 undefined name 'logging'

The problem with the code was that the socket and logging modules weren't imported in the module - and we clearly weren't testing for that case. What surprised me was that this didn't cause a NameError up front - I had assumed that exception clauses would have some eager name lookup - after all, if it needs to catch these exceptions, it needs to know what they are!

It turns out not so much - except clause lookups are done lazily, only evaluated if an exception is raised. Not only are the name lookups lazy, but the 'argument' of an except statement can be any arbitrary expression.

This can be good, bad, or just downright ugly.

The Good

Exception specifications can be handed around as any other values. This allows dynamic specification of the exceptions to be caught.

>>> def do_something():
...    blob
...
>>> def attempt(action, ignore_spec):
...     try:
...         action()
...     except ignore_spec:
...         pass
...
>>> attempt(do_something, ignore_spec=(NameError, TypeError))
>>> attempt(do_something, ignore_spec=TypeError)
Traceback (most recent call last):
  ...
NameError: global name 'blob' is not defined

The Bad

The downside of this dynamism is that mistakes in exception specifications often won't be noticed until it's too late - when the exception is triggered. When using exceptions to catch rare events (failure to open a file for writing for example), unless there is a test for that specific case, it won't be known until an exception (any exception) is triggered, at which point a check kicks in to see whether an exception matches, and causes an error all of its own - typically a NameError.

>>> def do_something():
...     return 1, 2
...
>>> try:
...     a, b = do_something()
... except ValuError:  # oops - someone can't type
...     print("Oops")
... else:
...     print("OK!")   # we are 'ok' until do_something returns a triple...
OK!

The Ugly

>>> try:
...    TypeError = ZeroDivisionError  # now why would we do this...?!
...    1 / 0
... except TypeError:
...    print("Caught!")
... else:
...    print("ok")
...
Caught!

The exception specification needn't just be a name lookup - arbitrary expressions also work:

>>> try:
...     1 / 0
... except eval(''.join('Zero Division Error'.split())):
...     print("Caught!")
... else:
...     print("ok")
...
Caught!

Not only can the exception spec be decided at run-time, it can even use the active exceptions' information. The following is a convoluted way to always catch the exception which is being raised - but nothing else:

>>> import sys
>>> def current_exc_type():
...     return sys.exc_info()[0]
...
>>> try:
...     blob
... except current_exc_type():
...     print ("Got you!")
...
Got you!

Clearly this is what we are _really_ looking for when we write exception handlers, and this should immediately be suggested as best practice :-p

The (Byte) Code

To confirm how it appears that exception handling works, I ran dis.dis() on an exception example. (Note the disassembly here is under Python2.7 - different byte code is produced under Python 3.3, but it's basically similar):

>>> import dis
>>> def x():
...     try:
...         pass
...     except Blobbity:
...         print("bad")
...     else:
...         print("good")
...
>>> dis.dis(x)  # doctest: +NORMALIZE_WHITESPACE
  2           0 SETUP_EXCEPT             4 (to 7)
<BLANKLINE>
  3           3 POP_BLOCK
              4 JUMP_FORWARD            22 (to 29)
<BLANKLINE>
  4     >>    7 DUP_TOP
              8 LOAD_GLOBAL              0 (Blobbity)
             11 COMPARE_OP              10 (exception match)
             14 POP_JUMP_IF_FALSE       28
             17 POP_TOP
             18 POP_TOP
             19 POP_TOP
<BLANKLINE>
  5          20 LOAD_CONST               1 ('bad')
             23 PRINT_ITEM
             24 PRINT_NEWLINE
             25 JUMP_FORWARD             6 (to 34)
        >>   28 END_FINALLY
<BLANKLINE>
  7     >>   29 LOAD_CONST               2 ('good')
             32 PRINT_ITEM
             33 PRINT_NEWLINE
        >>   34 LOAD_CONST               0 (None)
             37 RETURN_VALUE

This shows the 'issue' with my original expectations. Exception handling is done exactly as it 'looks' in the Python itself. The setup doesn't need to know anything about the subsequent 'catching' clauses, and they will be completely ignored if no exception is raised. SETUP_EXCEPT doesn't care what happens, just that if there is an exception, the first handler should be evaluated, and then the second, and so on.

Each handler has two parts: getting an exception spec, and comparing it to the just-raised exception. Everything is lazy, and everything appears exactly as you might expect from just looking at the code line-by-line, thinking about things from the point of view of a naive interpreter. Nothing clever happens, and that's what suddenly makes it seem very clever.

Summary

The dynamism of exception specs caught me by surprise slightly, but it has some interesting applications. Of course actually implementing many of those would probably be a bad idea ;-)

It isn't always intuitive how much dynamism certain Python features support - for example it isn't obvious that both expressions and statements are happily accepted directly in class scope (rather than function, method, or global scope), but not everything is so flexible. Although (I think) it would be nice, expressions are forbidden when applying decorators - the following is a syntax error in Python:

@(lambda fn: fn)
def x():
   pass

Here's a final example of playing with dynamic exception specifications to only propagate the first exception of a given type, silently swallowing repeated exceptions:

>>> class Pushover(object):
...     exc_spec = set()
...
...     def attempt(self, action):
...         try:
...             return action()
...         except tuple(self.exc_spec):
...             pass
...         except BaseException as e:
...             self.exc_spec.add(e.__class__)
...             raise
...
>>> pushover = Pushover()
>>>
>>> for _ in range(4):
...     try:
...         pushover.attempt(lambda: 1 / 0)
...     except:
...         print ("Boo")
...     else:
...         print ("Yay!")
Boo
Yay!
Yay!
Yay!

4 comments:

  1. Didn't know this before!!!

    BTW, in the machine code, SETUP_EXCEPT jumps to the `else` part directly, how does it know when an exception happens?

    ReplyDelete
  2. One word...

    pylint

    Unit testing is not a substitute for static analysis. In Java and C# the compiler does static analysis for you (and the IDE) but in Python you need to run pylint in your build process.

    ReplyDelete
  3. @Michael: Although, proper unit tests should exercise the exception handler and would have caught the error described above.

    ReplyDelete
  4. Just like unit testing, error handling and debugging are underrated. Nice explanations and keep it up. :)

    ReplyDelete