Saturday, 13 July 2013

Twisted Network Programming Essentials book review


Twisted Network Programming Essentials by Jessica McKellar & Abe Fettig (O'Reilly Media) gives an introduction to Twisted, a Python framework devoted to event-driven programming, and particularly it's application to networking. The book covers both high level general concepts of Twisted, as well as detailed examples covering some of the protocols Twisted supports, including my particular interests of HTTP and SSH.



In common with many technical books, things start slowly with a chapter on installation which (IMO) would have been better relegated to an Appendix. I like a technical book to start with motivating examples followed by an overview of the technology, and that's exactly how the second chapter 'Building Basic Clients and Servers' works - it is a really good introduction, describing Twisted's core architectural separation of Transports, Protocols, and the Reactor, with some solid introductory examples (including the obligatory echo server).

The next chapter, 'Writing Asynchronous Code with Deferreds', is slightly disappointing. According to the preface, this is a new chapter in the second edition (I've not read the first), and it certainly describes 'How' to use Deferreds, but I would have appreciated more on the 'Why' - the examples are contrived demonstrations of how things work, rather than demonstrating any real use. The chapter opens by stating 'Practice will help you develop an intuition for how to structure asynchronous code', which is undoubtedly true. But this chapter won't - and I'm not sure whether it's the book or Twisted that's at fault here. Again, maybe an appendix would have been more appropriate for this content, so the use could be seen in realistic examples first?

The remainder of the book covers a variety of protocols (HTTP, Mail, IRC, SSH) and various deployment and best practices, including authentication, integration with databases, and concurrency. The section on deployment was particularly useful, and I enjoyed learning about (and using) the range of features available 'out of the box' in the twistd program. The use of Twisted's 'manhole' functionality to provide Python shell access to a remote program over SSH was also a highlight.

Although the book hasn't yet motivated me to start using Twisted regularly, I do feel I now understand the basic approach and could apply it to the sort of tasks (primarily HTTP servers and clients) I'm interested in. The only things I think could have been improved would be to move chapters 1 & 3 to appendices, and some consideration about how Twisted fits into both the Python and wider event-driven world - to answer the question of why and when I should use Twisted rather than NodeJS or gevent, or for example how Twisted's deferreds compare to Python's own concurrent.futures, or Javascript's promises.

If you already know you are going to be using Twisted, but feel slightly apprehensive about it, I think this is an excellent place to start.


Thursday, 2 May 2013

The Dynamics of Catching Exceptions in Python

In which I discuss dynamism in catching exceptions - something which took me by surprise and could hide bugs - or enable fun...

The Problem Code

The following code - abstracted just slightly(!) from production code - looks perfectly good. It's calling a function to get some statistics and then process them in some way. Getting the values in the first place involves a socket connection, which could fail with a socket error. Since statistics aren't vital to the running of the system, we simply log the error and move on.

(Note I'm using doctest to check this article - this is representative of a script doing real things!)

>>> def get_stats():
...     pass
...
>>> def do_something_with_stats(stats):
...     pass
...
>>> try:
...     stats = get_stats()
... except socket.error:
...     logging.warning("Can't get statistics")
... else:
...     do_something_with_stats(stats)

The Find

Our tests didn't find anything wrong, but actually paying some attention to our static analysis reports showed a problem:

$ flake8 filename.py
filename.py:351:1: F821 undefined name 'socket'
filename.py:352:1: F821 undefined name 'logging'

The problem with the code was that the socket and logging modules weren't imported in the module - and we clearly weren't testing for that case. What surprised me was that this didn't cause a NameError up front - I had assumed that exception clauses would have some eager name lookup - after all, if it needs to catch these exceptions, it needs to know what they are!

It turns out not so much - except clause lookups are done lazily, only evaluated if an exception is raised. Not only are the name lookups lazy, but the 'argument' of an except statement can be any arbitrary expression.

This can be good, bad, or just downright ugly.

The Good

Exception specifications can be handed around as any other values. This allows dynamic specification of the exceptions to be caught.

>>> def do_something():
...    blob
...
>>> def attempt(action, ignore_spec):
...     try:
...         action()
...     except ignore_spec:
...         pass
...
>>> attempt(do_something, ignore_spec=(NameError, TypeError))
>>> attempt(do_something, ignore_spec=TypeError)
Traceback (most recent call last):
  ...
NameError: global name 'blob' is not defined

The Bad

The downside of this dynamism is that mistakes in exception specifications often won't be noticed until it's too late - when the exception is triggered. When using exceptions to catch rare events (failure to open a file for writing for example), unless there is a test for that specific case, it won't be known until an exception (any exception) is triggered, at which point a check kicks in to see whether an exception matches, and causes an error all of its own - typically a NameError.

>>> def do_something():
...     return 1, 2
...
>>> try:
...     a, b = do_something()
... except ValuError:  # oops - someone can't type
...     print("Oops")
... else:
...     print("OK!")   # we are 'ok' until do_something returns a triple...
OK!

The Ugly

>>> try:
...    TypeError = ZeroDivisionError  # now why would we do this...?!
...    1 / 0
... except TypeError:
...    print("Caught!")
... else:
...    print("ok")
...
Caught!

The exception specification needn't just be a name lookup - arbitrary expressions also work:

>>> try:
...     1 / 0
... except eval(''.join('Zero Division Error'.split())):
...     print("Caught!")
... else:
...     print("ok")
...
Caught!

Not only can the exception spec be decided at run-time, it can even use the active exceptions' information. The following is a convoluted way to always catch the exception which is being raised - but nothing else:

>>> import sys
>>> def current_exc_type():
...     return sys.exc_info()[0]
...
>>> try:
...     blob
... except current_exc_type():
...     print ("Got you!")
...
Got you!

Clearly this is what we are _really_ looking for when we write exception handlers, and this should immediately be suggested as best practice :-p

The (Byte) Code

To confirm how it appears that exception handling works, I ran dis.dis() on an exception example. (Note the disassembly here is under Python2.7 - different byte code is produced under Python 3.3, but it's basically similar):

>>> import dis
>>> def x():
...     try:
...         pass
...     except Blobbity:
...         print("bad")
...     else:
...         print("good")
...
>>> dis.dis(x)  # doctest: +NORMALIZE_WHITESPACE
  2           0 SETUP_EXCEPT             4 (to 7)
<BLANKLINE>
  3           3 POP_BLOCK
              4 JUMP_FORWARD            22 (to 29)
<BLANKLINE>
  4     >>    7 DUP_TOP
              8 LOAD_GLOBAL              0 (Blobbity)
             11 COMPARE_OP              10 (exception match)
             14 POP_JUMP_IF_FALSE       28
             17 POP_TOP
             18 POP_TOP
             19 POP_TOP
<BLANKLINE>
  5          20 LOAD_CONST               1 ('bad')
             23 PRINT_ITEM
             24 PRINT_NEWLINE
             25 JUMP_FORWARD             6 (to 34)
        >>   28 END_FINALLY
<BLANKLINE>
  7     >>   29 LOAD_CONST               2 ('good')
             32 PRINT_ITEM
             33 PRINT_NEWLINE
        >>   34 LOAD_CONST               0 (None)
             37 RETURN_VALUE

This shows the 'issue' with my original expectations. Exception handling is done exactly as it 'looks' in the Python itself. The setup doesn't need to know anything about the subsequent 'catching' clauses, and they will be completely ignored if no exception is raised. SETUP_EXCEPT doesn't care what happens, just that if there is an exception, the first handler should be evaluated, and then the second, and so on.

Each handler has two parts: getting an exception spec, and comparing it to the just-raised exception. Everything is lazy, and everything appears exactly as you might expect from just looking at the code line-by-line, thinking about things from the point of view of a naive interpreter. Nothing clever happens, and that's what suddenly makes it seem very clever.

Summary

The dynamism of exception specs caught me by surprise slightly, but it has some interesting applications. Of course actually implementing many of those would probably be a bad idea ;-)

It isn't always intuitive how much dynamism certain Python features support - for example it isn't obvious that both expressions and statements are happily accepted directly in class scope (rather than function, method, or global scope), but not everything is so flexible. Although (I think) it would be nice, expressions are forbidden when applying decorators - the following is a syntax error in Python:

@(lambda fn: fn)
def x():
   pass

Here's a final example of playing with dynamic exception specifications to only propagate the first exception of a given type, silently swallowing repeated exceptions:

>>> class Pushover(object):
...     exc_spec = set()
...
...     def attempt(self, action):
...         try:
...             return action()
...         except tuple(self.exc_spec):
...             pass
...         except BaseException as e:
...             self.exc_spec.add(e.__class__)
...             raise
...
>>> pushover = Pushover()
>>>
>>> for _ in range(4):
...     try:
...         pushover.attempt(lambda: 1 / 0)
...     except:
...         print ("Boo")
...     else:
...         print ("Yay!")
Boo
Yay!
Yay!
Yay!

Thursday, 3 January 2013

New pylibftdi release - 0.11

'pylibftdi' is a library for talking to FTDI devices via the libftdi library. FTDI make a wide range of chipsets and modules for interfacing to a number of protocols via USB, including 8-bit parallel and RS232 serial modes. They're a great way of interfacing to other electronics from your computer.

I've just released pylibftdi 0.11. I'm at the point where I'm looking at getting to RC and then stable status, which I'll release as 1.0 - at which point the API will be considered stable. While it isn't yet, I've taken the opportunity to tidy a couple of things, as well as add some improvements.

Raspberry Pi support; better documentation

Though it worked previously, I've taken the opportunity to test it a bit on Raspberry Pi, and I've updated the docs describing udev rules which allow access to the devices without needing sudo / root access. I think this is now a good option if you want a bidirectional 8 bit port for your Raspberry Pi, and it's certainly lower risk of damaging your Pi than using the GPIO pins directly.

BitBangDevice changes

The new latch property

BitBangDevices provide a simple abstraction of a parallel IO device; a 'direction' property which controls which line is an input or output, and a 'port' property for the actual reads and writes. This is based on systems going all the way back to the BBC micro user port and earlier. direction maps to the 'Data Direction Register' of the Beeb, the 'TRISx' register of the Microchip PIC, and so on. port maps to the 'data' register of the Beeb, or the PORTx register of the Microchip PIC. Just as the PIC18F series introduced the 'LATx' register, so too does pylibftdi 0.11 introduce the latch. Read the documentation for more information - in most cases you simply don't need to care about this.

Initialisation

If a physical FTDI device is not reset between program runs, then it retains its output register state; a pin set high in one run of the program would still be high when the device was opened in a subsequent program run. Prior to pylibftdi v0.11, this was not taken into account, and the assumed state of all output pins was that they were at the reset state, i.e. all low. This meant that operations such as read-modify-write on port bits would not reflect the current state, as they do not do a read based on the output state of the port, but based on the internal view of what output values are set to.

With the change, the following will work as expected:

$ python
>>> from pylibftdi import BitBangDevice
>>> bb = BitBangDevice()
>>> bb.port |= 1
>>> ^D
$ python
>>> from pylibftdi import BitBangDevice
>>> bb = BitBangDevice()
>>> bb.port |= 2
>>> ^D

Previously, the final state of the device pins would have been '2'; the read-modify-write implied by |= 2 would have used '0' is its 'read' source, and have output '2'. The new code initialises the internal latch state to the value read from the pins (it's possible to read the actual state of output pins as well as input pins). With the latest version, the final state of the pins after the above will be '3' - both D0 and D1 set high.

API changes

I've always said in the README for pylibftdi that the API won't be stable until version 1.0, and I've changed two parameters only introduced in 0.10.x to have clearer names.

The following two parameters to the Device constructor have changed name:

interface -> interface_select
I considered interface too generic and unintuitive here. The values and behaviour for this parameter (which selects which interface to use on a multi-interface device) haven't changed.
buffer_size -> chunk_size
This is the maximum number of bytes which will be written / read at a time in read/write calls to the libftdi library, designed to ensure we are regularly executing at least some Python byte code, which we can then interrupt (timely Ctrl-C interruption is the primary use-case for this parameter). It was never about buffering, so I've changed the name to reflect this.

Other changes

The bit_server example now works properly; this can be run as:

$ python -m pylibftdi.examples.bit_server

and will start a basic CGI-based web server, open a web-browser talking to it (on port 8008 by default), and allow you to control the state of each of the 8 output lines on the connected device (which it sets to async bit-bang mode).

This will be further developed in the future - it looks somewhat rough right now :)

The led_flash example has also gained a feature in taking a command line argument of the rate at which to flash - defaulting to 1 Hz. To cause an LED (or a piezo buzzer works just as well - and more annoyingly!) to flash at 10Hz, run:

$ python -m pylibftdi.examples.led_flash 10

Coming next

I'm still trying to improve test coverage. I spent some time trying to port the tests to the Mock library, though my efforts at effectively patching at the ctypes DLL level weren't very successful.

Documentation continues, and thanks to the wonderful readthedocs.org, the documentation isn't necessarily tied to the more sedate release cycle - it always shows the latest version from Bitbucket. If more API changes happen this could be counter-productive, but I'll try really hard to note if this is the case, and it makes things much nicer when updating things like installation instructions (which I have done, adding tested udev rules instructions etc).

libftdi 1.0 is just going through release candidate stage at the moment, so I'll test against that. I expect only the installation docs will need changes.

I've never tested pylibftdi on Windows, and I'm keen to do this in the near future, though I don't have regular access to a Windows machine, so no guarantees about this. I suspect it all 'just works'...