tag:blogger.com,1999:blog-30294388163242830012024-02-07T04:20:23.393+00:00Coded StructureThoughts on programming, systems, structure and Truth...Ben Basshttp://www.blogger.com/profile/02283998269313171389noreply@blogger.comBlogger38125tag:blogger.com,1999:blog-3029438816324283001.post-73804940255012493602014-04-26T22:41:00.000+01:002014-04-26T22:42:16.985+01:00brotmap - precomputing the Mandelbrot set<p>I’ve not written a blog post for ages. Maybe sporadic posts are inevitable. Anyway, here’s one which has been sitting in draft form for a couple of years and I’ve just managed to drag it up to date.</p>
<p><em><strong>tl;dr</strong> Compute and store high-resolution sampling of the Mandelbrot set, in a way which can be incrementally updated (e.g. to increase maximum iteration count) and is independent of any image which can then be generated from it.</em></p>
<div class="separator" style="clear: both; text-align: center;"><a href="http://codedstructure.net/images/mandel7.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="http://codedstructure.net/images/mandel7.png" /></a></div>
<p>I’ve been somewhat fascinated by fractals for over two decades now (that makes me feel old :-) ), and the Mandelbrot set is both common and relatively easy to understand and program. I’m not going to go into details here - take a look at the <a href="http://en.wikipedia.org/wiki/Mandelbrot_set">Wikipedia page</a>.</p>
<p>The usual thing with Mandelbrot plotters is to evaluate the Mandelbrot set over a given area of the complex plane and render the result as a colourful picture. Depending on the hardware, area selected, and precision, this can take from milliseconds (rough rendering on a GPU) to many hours. But however long it takes, the typical process is to re-evaluate it in real time, each time. I’ve done an example of that in JavaScript <a href="http://codedstructure.net/projects/mandel_canvas/">here</a>. There are many others in all sorts of programming languages.</p>
<p><code>brotmap</code> is a bit different - it’s thinking about the question “What if we <strong>pre-calculated</strong> and stored the Mandelbrot set, to a sensible degree of accuracy, such that we could render images from the pre-calculated version?”</p>
<p>An analogy could be a sampling synthesizer. The work required to produce a tone from a sampler is considerably less than from a complex synth. Back in the day (two decades ago) I would pre-generate tables of sines for graphical plasma effects and so on, because a table lookup was much faster than a sin(x) calculation even on a top-of-the-range 486. Today that would be crazy; memory is now the bottleneck, and table lookups of just about any sort are to be regarded with suspicion.</p>
<p>But that is exactly the point and purpose of <code>brotmap</code>. Its grand but insane idea is this: <strong><em>let’s precalculate the Mandelbrot set</em></strong>. (Well, actually the point and purpose of <code>brotmap</code> is to have a play around and maybe try out some new (or not-so-new) things along the way, but that’s not very profound).</p>
<p>There are a couple of things which need to be decided before we go off and do such a silly thing. What are the input parameters? What is the end result? Starting with the output format, a coloured image isn’t much use to anyone; we need something lower-level. What we really want is an iteration count at bailout; that is what the colours in funky fractal images are based on anyway. By storing the iteration count, we can apply any colour map we like at a later point, or turn the map into a 3D height map, or anything else which may or may not be interesting.</p>
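<p><code>brotmap</code> itself is written in C++, but the per-point value being stored - the escape-time iteration count - can be sketched in a few lines of Python (using the conventional bailout radius of 2; this is an illustration of the idea, not brotmap's actual code):</p>

```python
def iterations_at_bailout(c, max_iter=1024, bailout=2.0):
    """Return the iteration count at bailout, or None if the point has not
    escaped after max_iter iterations (and so may be in the M-set)."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > bailout:
            return n
    return None

# 0 is in the M-set (never escapes); 1+0j escapes quickly
assert iterations_at_bailout(0j) is None
assert iterations_at_bailout(1 + 0j) == 2
```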
<p>On the input side, we need to specify the area we are interested in, the resolution, and the maximum iteration count. A square area from –2..+1 on the real axis and –1.5..+1.5 on the imaginary works well as an outer boundary, and the resolution can be as high as we like. For performance and accuracy we want each point to be accurately representable by a floating point number, so <code>brotmap</code> uses a step size of 2<sup>-n</sup> for some n.</p>
<p>There is no point having high resolution if we don’t also have a high maximum iteration count. One key ‘feature’ of brotmap is that it allows incremental increases in iteration count. If a map is made with a MAX_ITER count of 1024, then the work generating that map can be reused by using it as a starting point in further iterations. To achieve this, not only is the iteration-count-at-bailout stored for each point, but also (for points which have so far not reached bailout), the current value of the complex number in the iterative calculation. To prevent precision loss, these are stored as a pair of double precision numbers (2x8 bytes per point). But if the point is definitely <em>not</em> in the M-set, then we no longer need that information - just the iteration count.</p>
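<p>The resumable-iteration scheme can be sketched as follows (in Python, purely to illustrate the idea; the state representation here is an assumption, not brotmap's on-disk format):</p>

```python
def advance(state, c, extra_iters, bailout=2.0):
    """state is (z, n) for an undecided point, or ('escaped', n) once the
    point has bailed out. Run up to extra_iters more iterations and return
    the new state - so a map built with a low MAX_ITER can be extended."""
    if state[0] == 'escaped':
        return state                      # already decided; nothing to do
    z, n = state
    for _ in range(extra_iters):
        z = z * z + c
        n += 1
        if abs(z) > bailout:
            return ('escaped', n)
    return (z, n)                         # still undecided; resumable later

# two passes of 3 iterations give the same result as one pass of 6
c = -1
assert advance(advance((0j, 0), c, 3), c, 3) == advance((0j, 0), c, 6)
```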
<h2 id="anonymousunionstotherescue">Anonymous unions to the rescue</h2>
<p>These maps clearly get rather large. At a step size of just 2<sup>–10</sup>, there are 3*3 (the image area on the complex plane) * 2<sup>10</sup> (the number of points per unit in each row) * 2<sup>10</sup> (the number of rows per unit) = 9.4 million points. And each of these has to store a good few bytes of data - at least two double precision floating point values for points which could still be found to be in the M-set, and the bailout iteration count for those that have been excluded from the set.</p>
<p>Since we only care about either the current iteration values of <em>re</em> and <em>im</em>, <strong>or</strong> the number of iterations at which we exceeded our bailout condition, we can use unions to store both sets of information in the same space. But we also need a way of determining which type of data each point contains. Fortunately, IEEE754 floating point comes to our rescue here, because there are some special bit patterns we can use as sentinels - they will never appear in the course of (our) floating point evaluations, but we can set them and detect them. Amongst these values are the NaNs. Not-a-Number values allow us to use one of the pair of double floats to indicate that the point is outside the M-set, and that the other value should be treated as an integer iteration count.</p>
<pre>
struct pinfo {
    double x;
    union {
        double y;
        long itercount;
    };
};
</pre>
<p>One of the great things about C++ is support for anonymous unions. That union in the <code>pinfo</code> struct? No name. <em>Anonymous</em>, you might say. Anonymous unions allow access to all members of the union to be transparent - nothing in the code needs to know the structure even <em>is</em> a union.</p>
<p>To make the point clearer, the <code>pinfo</code> struct could have looked like this instead:</p>
<pre>
struct pinfo {
    double x;
    double y;
    long itercount;
};
</pre>
<p>and nothing else in the code would have to change, except that we would be using 50% more storage (assuming the size of a <code>long</code> is also 8 bytes, typically true on 64 bit machines).</p>
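<p>To show the 16-byte dual-purpose record in a testable form, here is a Python sketch using the <code>struct</code> module. The field layout mirrors the <code>pinfo</code> struct above; the helper names are illustrative rather than anything from brotmap itself:</p>

```python
import math
import struct

RECORD = struct.Struct('<dd')        # two little-endian doubles: 16 bytes
ESCAPED = struct.Struct('<dq')       # a double plus a 64-bit integer

def pack_undecided(x, y):
    """An undecided point stores the current z = x + iy as two doubles."""
    return RECORD.pack(x, y)

def pack_escaped(itercount):
    """An escaped point stores a NaN sentinel plus the iteration count."""
    return ESCAPED.pack(float('nan'), itercount)

def unpack(record):
    x, y = RECORD.unpack(record)
    if math.isnan(x):                # sentinel: reinterpret y as a long
        return ('escaped', ESCAPED.unpack(record)[1])
    return ('undecided', x, y)

assert RECORD.size == ESCAPED.size == 16
assert unpack(pack_escaped(1024)) == ('escaped', 1024)
assert unpack(pack_undecided(0.5, -0.25)) == ('undecided', 0.5, -0.25)
```

The key property is that both record types occupy the same 16 bytes, and the NaN bit pattern can never arise from the x coordinate of a genuine iteration.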
<p>OK, so we have a basic input spec, output spec, and the M-set calculation itself is straightforward. But we’ve still got to write out gigabytes or more of data for anything interesting. We don’t want messy IO code cluttering up the rest of the code, do we?</p>
<h2 id="mmaptotherescue">mmap to the rescue</h2>
<p><code>mmap</code> is awesome. It’s not the easiest API to set up and clean up, but neither is it difficult, and in-between these steps it gets out of your way. Like totally-invisible out of your way. I can imagine that using it with a 32 bit virtual address space would be a pain, as you’d have to continually re-map different sections of a large (multi-gigabyte) file into the limited address space, but with a 64 bit VAS, it feels like magic. That structure of millions of 16 byte points? Wave a wand, and it’s backed by a file. No read operations, write operations, or anything else at the user level. No stdio buffering, flushing, seeking. Just the C(++) memory model, and the OS does the rest. It feels like cheating - and maybe it is to use it like this - but remember this is a crazy pointless program, right?</p>
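<p>The same pattern can be sketched in Python, whose <code>mmap</code> module wraps the same system call; the filename and sizes here are illustrative only:</p>

```python
import mmap
import os
import struct

RECORD_SIZE = 16                      # one 16-byte point record
n_points = 1024                       # tiny example; real maps use millions

with open('points.dat', 'w+b') as f:
    f.truncate(n_points * RECORD_SIZE)        # allocate the backing file
    mm = mmap.mmap(f.fileno(), 0)             # map the whole file
    # write point 42 as if the file were an in-memory array of records -
    # no explicit read/write/seek calls; the OS pages data in and out
    mm[42 * RECORD_SIZE:43 * RECORD_SIZE] = struct.pack('<dd', -0.5, 0.25)
    x, y = struct.unpack('<dd', mm[42 * RECORD_SIZE:43 * RECORD_SIZE])
    mm.close()
os.remove('points.dat')
```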
<h2 id="pthreadstotherescue">pthreads to the rescue</h2>
<p>Mandelbrot calculation is a trivially parallelizable problem. And I have multiple cores in my machine (only two, but…), so it would be nice to get a speedup from them. Sadly I’m more than a little late to this party. The C++11 standard has threading support, and I’ll use this as an opportunity to learn that later, but for now I’ve learnt a minimum of pthreads coding to get this working. It’s simple enough: use <code>pthread_create</code> to create each thread, and have a mutex lock around shared data.</p>
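<p>A Python sketch of the same scheme, with <code>threading.Lock</code> standing in for a pthread mutex; the row-counter handout is an assumption about how the work might be divided, not necessarily brotmap's exact approach:</p>

```python
import threading

next_row = 0                          # shared state, guarded by the lock
total_rows = 64
done = []
lock = threading.Lock()

def worker():
    global next_row
    while True:
        with lock:                    # pthread_mutex_lock equivalent
            row = next_row
            if row >= total_rows:
                return
            next_row += 1
        done.append(row)              # compute_row(row) would go here

# pthread_create equivalent: one worker per core
threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# every row was claimed exactly once
assert sorted(done) == list(range(total_rows))
```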
<h1 id="renderingthedata">Rendering the data</h1>
<p>Of course, this wouldn’t be much fun without actually being able to have some visual representation of the output, so <code>make_ppm</code> is a separate program which reads the data files and outputs a PPM file rendering the M-set in basic greyscale. Colour maps can wait :-)</p>
<p>Note I’m just using PPM as a lowest-common-denominator file format. It’s trivial for this sort of thing, though it does produce large (uncompressed) files, taking 3 bytes per pixel.</p>
<p>pnmtopng will convert a PPM file to the more useful png. (pnmtopng is part of <a href="http://netpbm.sourceforge.net">netpbm</a> - available for most Linux distributions or as part of <a href="http://brew.sh">homebrew</a> for Mac, though ppm2tiff seems to be pre-installed on Mac and will suffice).</p>
<h1 id="runningit">Running it</h1>
<p>The code for <code>brotmap</code> is available on <a href="http://bitbucket.org/codedstructure/brotmap">bitbucket</a>, or <a href="http://github.com/codedstructure/brotmap">github</a> if you prefer that.</p>
<p>The makefile includes a target which will build and display the output (subject to dependencies - tested on Linux & Mac OS X with netpbm installed):</p>
<pre><code>make show
</code></pre>
<p>This will compile the two programs (<code>brotmap</code> and <code>make_ppm</code>), and then run things (ignoring directories etc) as follows:</p>
<pre><code>./brotmap mandel.dat 10
./make_ppm mandel.dat out.ppm
pnmtopng out.ppm > image.png
open image.png
</code></pre>
<p>This computes a set of data for a 3072x3072 sampling of the Mandelbrot set, then renders a PPM file from it, converts to a more friendly format, and then (hopefully) displays it on-screen.</p>
<p><code>brotmap</code> takes two arguments: the target filename, and a ‘binary digits’ value, dictating the resolution of the computed map. Note the output files will be large:</p>
<table>
<colgroup>
<col style="text-align:left;"/>
<col style="text-align:left;"/>
<col style="text-align:left;"/>
<col style="text-align:left;"/>
</colgroup>
<thead>
<tr>
<th style="text-align:left;">bit_size</th>
<th style="text-align:left;">res (x*y)</th>
<th style="text-align:left;">points</th>
<th style="text-align:left;">file size</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;">10</td>
<td style="text-align:left;">3072</td>
<td style="text-align:left;">9437184</td>
<td style="text-align:left;">144 MB</td>
</tr>
<tr>
<td style="text-align:left;">11</td>
<td style="text-align:left;">6144</td>
<td style="text-align:left;">37748736</td>
<td style="text-align:left;">576 MB</td>
</tr>
<tr>
<td style="text-align:left;">12</td>
<td style="text-align:left;">12288</td>
<td style="text-align:left;">150994944</td>
<td style="text-align:left;">2.3 GB</td>
</tr>
<tr>
<td style="text-align:left;">13</td>
<td style="text-align:left;">24576</td>
<td style="text-align:left;">603979776</td>
<td style="text-align:left;">9.2 GB</td>
</tr>
<tr>
<td style="text-align:left;">14</td>
<td style="text-align:left;">49152</td>
<td style="text-align:left;">2415919104</td>
<td style="text-align:left;">36.86 GB</td>
</tr>
<tr>
<td style="text-align:left;">15</td>
<td style="text-align:left;">98304</td>
<td style="text-align:left;">9663676416</td>
<td style="text-align:left;">147.5 GB</td>
</tr>
<tr>
<td style="text-align:left;">16</td>
<td style="text-align:left;">196608</td>
<td style="text-align:left;">38654705664</td>
<td style="text-align:left;">589.8 GB</td>
</tr>
</tbody>
</table>
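<p>The table rows follow directly from the step size; here is a quick check of the arithmetic in Python, assuming 16 bytes per point as described above:</p>

```python
def map_stats(binary_digits):
    """Side length is 3 * 2**n points (the set is sampled over a 3x3
    square of the complex plane); each point takes 16 bytes."""
    side = 3 * 2 ** binary_digits
    points = side * side
    return side, points, points * 16          # resolution, points, bytes

assert map_stats(10) == (3072, 9437184, 150994944)       # ~144 MB
assert map_stats(12) == (12288, 150994944, 2415919104)   # ~2.3 GB
```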
<p>The default which the various Make targets use is a binary digits value of 10. 12 is fairly quick, and I’ve tried 14 once or twice.</p>
<p><code>make_ppm</code> takes two arguments: the input file generated by <code>brotmap</code>, and the output file, which will be in PPM format (subformat P6).</p>
<p>See an example png (a 12288x12288 resolution greyscale image) <a href="http://codedstructure.net/images/mandel12.png">here</a> - though note it may stress your browser slightly. This is computed to an iteration count of 4096, with binary digits of 12. Note that the 2.3 GB of source data for this results in a PNG file of only 4 MB…</p>
<p>A smaller example (binary digits of 10) is <a href="http://codedstructure.net/images/mandel10.png">here</a>.</p>
<h1 id="whatsnext">What’s next?</h1>
<ul>
<li><p>Better command line parsing (e.g. for iteration count, step size…) - there’s some in there, but it’s very crude.</p></li>
<li><p>Incremental spatial updates - incremental updates based on iteration count are nice, but what’s really needed are incremental <em>resolution</em> increases. It should be possible to increase resolution by a factor of two in each direction by keeping the current set of data as one of the four points being evaluated for each of the original points, so doubling the number of points takes the same amount of time as the previous round (assuming that data is available). It might make sense to completely change the structure of the data in memory / on-disk to support this operation.</p></li>
<li><p>C++11 based concurrency - it won’t get much new, though I’ll get round to automatically working out the appropriate number of threads to use.</p></li>
<li><p>Use of mmap-based IO in <code>make_ppm</code> as well as <code>brotmap</code>. Again, won’t get anything new, but will clean things up.</p></li>
<li><p>Improvements to <code>make_ppm</code> - it should be possible to pull out a small section of the data and only render a selected area. Selectable colourmaps (something other than greyscale) would be nice too.</p></li>
<li><p>Distributed parallelism - this is a major step up in terms of complexity, but definitely doable. I like to keep things low-level and primitive (and yet still portable - that’s what POSIX is all about), so I’ll probably do something socket based first, or maybe zeromq…</p></li>
<li><p>Improved performance per core - the M-set calculation per point is very basic, with a single optimisation that it knows that points within the major circle and cardioid are within the M-set. Further optimisations could be to use SIMD parallelism (SSE3).</p></li>
<li><p>Smooth colouring - most Mandelbrot plotters don’t just map the iteration count directly to a colour, but compute some ‘distance’ factor from which to derive the colour.</p></li>
</ul>Ben Basshttp://www.blogger.com/profile/02283998269313171389noreply@blogger.com0tag:blogger.com,1999:blog-3029438816324283001.post-4509017162870621352013-07-13T22:46:00.000+01:002013-07-13T22:46:10.757+01:00Twisted Network Programming Essentials book review<br />
Twisted Network Programming Essentials by Jessica McKellar & Abe Fettig (O'Reilly Media) gives an introduction to Twisted, a Python framework devoted to event-driven programming, and particularly its application to networking. The book covers both high level general concepts of Twisted, as well as detailed examples covering some of the protocols Twisted supports, including my particular interests of HTTP and SSH.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJiwSXDCcMZ_aIvSm5T6n0n-FHKBZ0-nPbtOIUvfBWAq_hJLPLrM2L3AXbSM0rEW4NXpncvPlh84SYB5SV1daiReun50dFif6Lgy7WCMf8XUyPYNzlHON56jnr5cLMxMExvQsMgxDlZ0c/s1600/cat.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJiwSXDCcMZ_aIvSm5T6n0n-FHKBZ0-nPbtOIUvfBWAq_hJLPLrM2L3AXbSM0rEW4NXpncvPlh84SYB5SV1daiReun50dFif6Lgy7WCMf8XUyPYNzlHON56jnr5cLMxMExvQsMgxDlZ0c/s1600/cat.gif" /></a></div>
<div style="text-align: center;">
<br /></div>
<div style="text-align: center;">
<a href="http://shop.oreilly.com/product/0636920025016.do">http://shop.oreilly.com/product/0636920025016.do</a></div>
<br />
In common with many technical books, things start slowly with a chapter on installation which (IMO) would have been better relegated to an Appendix. I like a technical book to start with motivating examples followed by an overview of the technology, and that's exactly how the second chapter 'Building Basic Clients and Servers' works - it is a really good introduction, describing Twisted's core architectural separation of Transports, Protocols, and the Reactor, with some solid introductory examples (including the obligatory echo server).<br />
<br />
The next chapter, 'Writing Asynchronous Code with Deferreds', is slightly disappointing. According to the preface, this is a new chapter in the second edition (I've not read the first), and it certainly describes 'How' to use Deferreds, but I would have appreciated more on the 'Why' - the examples are contrived demonstrations of how things work, rather than demonstrating any real use. The chapter opens by stating 'Practice will help you develop an intuition for how to structure asynchronous code', which is undoubtedly true. But this chapter won't - and I'm not sure whether it's the book or Twisted that's at fault here. Again, maybe an appendix would have been more appropriate for this content, so the use could be seen in realistic examples first?<br />
<br />
The remainder of the book covers a variety of protocols (HTTP, Mail, IRC, SSH) and various deployment and best practices, including authentication, integration with databases, and concurrency. The section on deployment was particularly useful, and I enjoyed learning about (and using) the range of features available 'out of the box' in the twistd program. The use of Twisted's 'manhole' functionality to provide Python shell access to a remote program over SSH was also a highlight.<br />
<br />
Although the book hasn't yet motivated me to start using Twisted regularly, I do feel I now understand the basic approach and could apply it to the sort of tasks (primarily HTTP servers and clients) I'm interested in. The only things I think could have been improved would be to move chapters 1 & 3 to appendices, and to add some consideration of how Twisted fits into both the Python and wider event-driven worlds - answering the question of why and when I should use Twisted rather than NodeJS or gevent, or how Twisted's deferreds compare to Python's own concurrent.futures or JavaScript's promises.<br />
<br />
If you already know you are going to be using Twisted, but feel slightly apprehensive about it, I think this is an excellent place to start.<br />
<br />
<br />
Ben Basshttp://www.blogger.com/profile/02283998269313171389noreply@blogger.com0tag:blogger.com,1999:blog-3029438816324283001.post-23764522403361122872013-05-02T23:32:00.000+01:002013-05-02T23:56:29.500+01:00The Dynamics of Catching Exceptions in Python<p>In which I discuss dynamism in catching exceptions - something which took
me by surprise and could hide bugs - or enable fun...</p>
<h3>The Problem Code</h3>
<p>The following code - abstracted just slightly(!) from production code - looks
perfectly good. It calls a function to get some statistics and then processes them in some
way. Getting the values in the first place involves a socket connection, which
could fail with a socket error. Since statistics aren't vital to the running
of the system, we simply log the error and move on.</p>
<p>(Note I'm using doctest to check this article - this is representative of a
script doing real things!)</p>
<pre class="literal-block prettyprint">>>> def get_stats():
... pass
...
>>> def do_something_with_stats(stats):
... pass
...
>>> try:
... stats = get_stats()
... except socket.error:
... logging.warning("Can't get statistics")
... else:
... do_something_with_stats(stats)
</pre>
<h4>
The Find</h4>
<p>Our tests didn't find anything wrong, but actually paying some attention to
our static analysis reports showed a problem:</p>
<pre class="literal-block">$ flake8 filename.py
filename.py:351:1: F821 undefined name 'socket'
filename.py:352:1: F821 undefined name 'logging'
</pre>
<p>The problem with the code was that the socket and logging modules weren't
imported in the module - and we clearly weren't testing for that case.
What surprised me was that this didn't cause a NameError up front - I had
assumed that exception clauses would have some eager name lookup - after all,
if it needs to catch these exceptions, it needs to know what they are!</p>
<p>It turns out not so much - <tt class="docutils literal">except</tt> clause lookups are done lazily, only
evaluated if an exception is raised. Not only are the name lookups lazy, but
the 'argument' of an <tt class="docutils literal">except</tt> statement can be any arbitrary expression.</p>
<p>This can be good, bad, or just downright ugly.</p>
<h4>
The Good</h4>
<p>Exception specifications can be handed around as any other values. This allows
dynamic specification of the exceptions to be caught.</p>
<pre class="literal-block prettyprint">>>> def do_something():
... blob
...
>>> def attempt(action, ignore_spec):
... try:
... action()
... except ignore_spec:
... pass
...
>>> attempt(do_something, ignore_spec=(NameError, TypeError))
>>> attempt(do_something, ignore_spec=TypeError)
Traceback (most recent call last):
...
NameError: global name 'blob' is not defined
</pre>
<h4>
The Bad</h4>
<p>The downside of this dynamism is that mistakes in exception specifications
often won't be noticed until it's too late - when the exception is triggered.
When using exceptions to catch rare events (failure to open a file for writing
for example), unless there is a test for that specific case, it won't be known
until an exception (any exception) is triggered, at which point a check kicks
in to see whether an exception matches, and causes an error all of its own -
typically a NameError.</p>
<pre class="literal-block prettyprint">>>> def do_something():
... return 1, 2
...
>>> try:
... a, b = do_something()
... except ValuError: # oops - someone can't type
... print("Oops")
... else:
... print("OK!") # we are 'ok' until do_something returns a triple...
OK!
</pre>
<h4>
The Ugly</h4>
<pre class="literal-block prettyprint">>>> try:
... TypeError = ZeroDivisionError # now why would we do this...?!
... 1 / 0
... except TypeError:
... print("Caught!")
... else:
... print("ok")
...
Caught!
</pre>
<p>The exception specification needn't just be a name lookup - arbitrary
expressions also work:</p>
<pre class="literal-block prettyprint">>>> try:
... 1 / 0
... except eval(''.join('Zero Division Error'.split())):
... print("Caught!")
... else:
... print("ok")
...
Caught!
</pre>
<p>Not only can the exception spec be decided at run-time, it can even use the
active exception's information. The following is a convoluted way to always
catch the exception which is being raised - but nothing else:</p>
<pre class="literal-block prettyprint">>>> import sys
>>> def current_exc_type():
... return sys.exc_info()[0]
...
>>> try:
... blob
... except current_exc_type():
... print ("Got you!")
...
Got you!
</pre>
<p><em>Clearly this is what we are really looking for when we write exception
handlers, and this should immediately be suggested as best practice :-p</em></p>
<h4>
The (Byte) Code</h4>
<p>To confirm how it <em>appears</em> that exception handling works, I ran <tt class="docutils literal">dis.dis()</tt>
on an exception example. (Note the disassembly here is under Python2.7 -
different byte code is produced under Python 3.3, but it's basically similar):</p>
<pre class="literal-block prettyprint">>>> import dis
>>> def x():
... try:
... pass
... except Blobbity:
... print("bad")
... else:
... print("good")
...
>>> dis.dis(x) # doctest: +NORMALIZE_WHITESPACE
2 0 SETUP_EXCEPT 4 (to 7)
<BLANKLINE>
3 3 POP_BLOCK
4 JUMP_FORWARD 22 (to 29)
<BLANKLINE>
4 >> 7 DUP_TOP
8 LOAD_GLOBAL 0 (Blobbity)
11 COMPARE_OP 10 (exception match)
14 POP_JUMP_IF_FALSE 28
17 POP_TOP
18 POP_TOP
19 POP_TOP
<BLANKLINE>
5 20 LOAD_CONST 1 ('bad')
23 PRINT_ITEM
24 PRINT_NEWLINE
25 JUMP_FORWARD 6 (to 34)
>> 28 END_FINALLY
<BLANKLINE>
7 >> 29 LOAD_CONST 2 ('good')
32 PRINT_ITEM
33 PRINT_NEWLINE
>> 34 LOAD_CONST 0 (None)
37 RETURN_VALUE
</pre>
<p>This shows the 'issue' with my original expectations. Exception handling is
done exactly as it 'looks' in the Python itself. The setup doesn't need to know
anything about the subsequent 'catching' clauses, and they will be completely
ignored if no exception is raised. <tt class="docutils literal">SETUP_EXCEPT</tt> doesn't care what happens,
just that if there is an exception, the first handler should be evaluated, and
then the second, and so on.</p>
<p>Each handler has two parts: getting an exception spec, and comparing it to the
just-raised exception. Everything is lazy, and everything appears exactly as
you might expect from just looking at the code line-by-line, thinking about
things from the point of view of a naive interpreter. Nothing clever happens,
and that's what suddenly makes it seem very clever.</p>
<h4>
Summary</h4>
<p>The dynamism of exception specs caught me by surprise slightly, but it has
some interesting applications. Of course actually implementing many of those
would probably be a bad idea ;-)</p>
<p>It isn't always intuitive how much dynamism certain Python features support -
for example it isn't obvious that both expressions and statements are happily
accepted directly in class scope (rather than function, method, or global
scope), but not everything is so flexible. Although (I think) it would be nice,
expressions are forbidden when applying decorators - the following is a syntax
error in Python:</p>
<pre class="literal-block prettyprint">@(lambda fn: fn)
def x():
pass
</pre>
<p>Here's a final example of playing with dynamic exception specifications to
only propagate the first exception of a given type, silently swallowing
repeated exceptions:</p>
<pre class="literal-block prettyprint">>>> class Pushover(object):
... exc_spec = set()
...
... def attempt(self, action):
... try:
... return action()
... except tuple(self.exc_spec):
... pass
... except BaseException as e:
... self.exc_spec.add(e.__class__)
... raise
...
>>> pushover = Pushover()
>>>
>>> for _ in range(4):
... try:
... pushover.attempt(lambda: 1 / 0)
... except:
... print ("Boo")
... else:
... print ("Yay!")
Boo
Yay!
Yay!
Yay!
</pre>Ben Basshttp://www.blogger.com/profile/02283998269313171389noreply@blogger.com4tag:blogger.com,1999:blog-3029438816324283001.post-62550073903182038622013-01-03T23:12:00.001+00:002013-01-06T09:12:21.442+00:00New pylibftdi release - 0.11<p>'pylibftdi' is a library for talking to FTDI devices via the libftdi library. FTDI make a wide range of chipsets and modules for interfacing to a number of protocols via USB, including 8-bit parallel and RS232 serial modes. They're a great way of interfacing to other electronics from your computer.</p>
<p>I've just released <a href="http://pypi.python.org/pypi/pylibftdi">pylibftdi</a> 0.11. I'm at the point where I'm looking at
getting to RC and then stable status, which I'll release as 1.0 - at which
point the API will be considered stable. While it isn't yet, I've taken
the opportunity to tidy a couple of things, as well as add some
improvements.</p>
<h3>Raspberry Pi support; better documentation</h3>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhikp8fIsknE1HYmjxzOaS1-stYIZr1udCUX8SGP3-kGXJlTVPTZW_Qtki3J_R6G4XH2o5-4r8peD5T0ggXgEMaebyOVPV0xXUtNvPjMdfuMgDoenayu0zGgSNW93j2JT952Mg1jW6qV4o/s1600/DSCF2808.JPG" imageanchor="1" style="clear:right; float:right; margin-left:1em; margin-bottom:1em"><img border="0" height="298" width="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhikp8fIsknE1HYmjxzOaS1-stYIZr1udCUX8SGP3-kGXJlTVPTZW_Qtki3J_R6G4XH2o5-4r8peD5T0ggXgEMaebyOVPV0xXUtNvPjMdfuMgDoenayu0zGgSNW93j2JT952Mg1jW6qV4o/s400/DSCF2808.JPG" /></a></div><p>Though it worked previously, I've taken the opportunity to test it a bit on
Raspberry Pi, and I've updated the <a class="reference external" href="http://pylibftdi.readthedocs.org/en/latest/installation.html#debian-raspberry-pi-ubuntu-etc">docs</a> describing udev rules which allow
access to the devices without needing sudo / root access. I think this is now
a good option if you want a bidirectional 8 bit port for your Raspberry Pi,
and it's certainly lower risk of damaging your Pi than using the GPIO pins
directly.</p>
<h3>BitBangDevice changes</h3>
<h4>The new <tt>latch</tt> property</h4>
<p>BitBangDevices provide a simple abstraction of a parallel IO device;
a 'direction' property which controls which lines are inputs or outputs, and
a 'port' property for the actual reads and writes. This is based on systems
going all the way back to the BBC micro user port and earlier. <tt>direction</tt>
maps to the 'Data Direction Register' of the Beeb, the 'TRISx' register of the
Microchip PIC, and so on. <tt class="docutils literal">port</tt> maps to the 'data' register of the Beeb, or
the PORTx register of the Microchip PIC. Just as the PIC18F series introduced
the 'LATx' register, so too does pylibftdi 0.11 introduce the <tt>latch</tt>.
Read the <a class="reference external" href="http://pylibftdi.readthedocs.org/en/latest/how_to.html#what-is-the-difference-between-the-port-and-latch-bitbangdevice-properties">documentation</a> for more information - in most cases you simply don't
need to care about this.</p>
<h4>Initialisation</h4>
<p>If a physical FTDI device is not reset between program runs, then it retains
its output register state; a pin set high in one run of the program would still
be high when the device was opened in a subsequent program run. Prior to
pylibftdi v0.11, this was not taken into account, and the assumed state of all
output pins was that they were at the reset state, i.e. all low. This meant
that operations such as read-modify-write on port bits would not reflect the
current state: the 'read' is based not on the actual output state of the
port, but on an internal view of what the output values are assumed to be.</p>
<p>With the change, the following will work as expected:</p>
<pre class="prettyprint"><code>$ python
>>> from pylibftdi import BitBangDevice
>>> bb = BitBangDevice()
>>> bb.port |= 1
>>> ^D
$ python
>>> from pylibftdi import BitBangDevice
>>> bb = BitBangDevice()
>>> bb.port |= 2
>>> ^D
</code></pre>
<p>Previously, the final state of the device pins would have been '2'; the read-modify-write implied by <tt class="docutils literal">|= 2</tt> would have used '0' as its 'read' source,
and have output '2'. The new code initialises the internal latch state to
the value read from the pins (it's possible to read the actual state of
output pins as well as input pins). With the latest version, the final state
of the pins after the above will be '3' - both D0 and D1 set high.</p>
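<p>As a toy model - emphatically not the real pylibftdi API - the difference the latch initialisation makes can be shown in a few lines of Python:</p>

```python
class ToyPort:
    """A toy model of output-latch behaviour (not pylibftdi's code)."""

    def __init__(self, initial_pins):
        # v0.11 behaviour: seed the latch from the device's actual pin
        # state, rather than assuming the all-low reset state
        self._latch = initial_pins

    @property
    def port(self):
        return self._latch            # reads come from the latch

    @port.setter
    def port(self, value):
        self._latch = value           # writes update the latch (and pins)

# a previous program run left D0 high; the device remembers this
bb = ToyPort(initial_pins=1)
bb.port |= 2                          # read-modify-write sees the latch
assert bb.port == 3                   # both D0 and D1 high, as in v0.11
```

Seeding the latch with 0 instead of the real pin state reproduces the old behaviour: the same <code>|= 2</code> would leave the port at '2'.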
<h3>API changes</h3>
<p>I've always said in the README for pylibftdi that the API won't be stable until
version 1.0, and I've changed two parameters only introduced in 0.10.x to have
clearer names.</p>
<p>The following two parameters to the <tt class="docutils literal">Device</tt> constructor have changed name:</p>
<dl class="docutils">
<dt><tt class="docutils literal">interface</tt> -> <tt class="docutils literal">interface_select</tt></dt>
<dd>I considered <tt class="docutils literal">interface</tt> too generic and unintuitive here. The values and
behaviour for this parameter (which selects which interface to use on a
multi-interface device) haven't changed.</dd>
<dt><tt class="docutils literal">buffer_size</tt> -> <tt class="docutils literal">chunk_size</tt></dt>
<dd>This is the maximum number of bytes which will be written / read at a time
in read/write calls to the libftdi library, designed to ensure we are
regularly executing at least some Python byte code, which we can then
interrupt (timely Ctrl-C interruption is the primary use-case for this
parameter). It was never about buffering, so I've changed the name to
reflect this.</dd>
</dl>
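<p>The intent behind <tt class="docutils literal">chunk_size</tt> can be sketched in pure Python (an illustration of the idea, not the pylibftdi implementation): splitting one large transfer into many smaller library calls means the interpreter regains control between calls, so Ctrl-C can be handled promptly:</p>

```python
def chunked_write(write_fn, data, chunk_size):
    """Write data via write_fn in pieces of at most chunk_size bytes.

    A chunk_size of 0 means 'no chunking', matching the old default.
    """
    if chunk_size <= 0:
        chunk_size = len(data) or 1
    written = 0
    while written < len(data):
        # each call returns to Python, where KeyboardInterrupt can fire
        written += write_fn(data[written:written + chunk_size])
    return written
```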
<h3>Other changes</h3>
<p>The <tt>bit_server</tt> example now works properly; this can be run as:</p>
<pre class="prettyprint"><code>$ python -m pylibftdi.examples.bit_server
</code></pre>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjlJnWsBXfjO6gx9bodkf4Bq17XurRnt2Ges-xrH92nH1zLjmbKQz7abI2tUqIxR0Ynk0M9_4PiZjGoejjbToa_RmRj1KMy6C8Au-trIaEwiAbxoAH3ktCJ3HWefkDsLnT3jSpmIQdijB0/s1600/Screen+Shot+2013-01-03+at+22.20.11.png" imageanchor="1" style="clear:right; float:right; margin-left:1em; margin-bottom:1em"><img border="0" height="145" width="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjlJnWsBXfjO6gx9bodkf4Bq17XurRnt2Ges-xrH92nH1zLjmbKQz7abI2tUqIxR0Ynk0M9_4PiZjGoejjbToa_RmRj1KMy6C8Au-trIaEwiAbxoAH3ktCJ3HWefkDsLnT3jSpmIQdijB0/s320/Screen+Shot+2013-01-03+at+22.20.11.png" /></a></div>
<p>and will start a basic CGI-based web server, open a web-browser talking to it
(on port 8008 by default), and allow you to control the state of each of the
8 output lines on the connected device (which it sets to async bit-bang mode).</p>
<p>This will be further developed in the future - it looks somewhat rough right
now :)</p>
<p>The <tt>led_flash</tt> example has also gained a feature: it now takes a
command-line argument specifying the rate at which to flash, defaulting to
1 Hz. To cause an LED (or a piezo buzzer, which works just as well - and more
annoyingly!) to flash at 10 Hz, run:</p>
<pre class="prettyprint"><code>$ python -m pylibftdi.examples.led_flash 10
</code></pre>
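<p>The core of such a flasher is just a toggle loop; here's a minimal sketch assuming a BitBangDevice-like object with a <tt>port</tt> attribute (the shipped example lives in <tt>pylibftdi.examples.led_flash</tt>; this is not its actual code):</p>

```python
import time

def flash(dev, rate_hz=1.0, half_cycles=10):
    """Toggle bit D0 of dev.port at rate_hz, half_cycles times."""
    half_period = 0.5 / rate_hz  # one on + one off per full cycle
    for _ in range(half_cycles):
        dev.port ^= 1  # toggle D0
        time.sleep(half_period)
```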
<h3>Coming next</h3>
<p>I'm still trying to improve test coverage. I spent some time trying to port
the tests to the Mock library, though my efforts at effectively patching
at the ctypes DLL level weren't very successful.</p>
<p>Documentation continues, and thanks to the wonderful <a href="http://pylibftdi.readthedocs.org">readthedocs.org</a>, the
documentation isn't necessarily tied to the more sedate release cycle - it
always shows the latest version from Bitbucket. If more API changes happen
this could be counter-productive, but I'll try really hard to note if this is
the case, and it makes things much nicer when updating things like installation
instructions (which I have done, adding tested udev rules instructions etc).</p>
<p>libftdi 1.0 is just going through release candidate stage at the moment, so
I'll test against that. I expect only the installation docs will need changes.</p>
<p>I've never tested pylibftdi on Windows, and I'm keen to do this in the near
future, though I don't have regular access to a Windows machine, so no
guarantees about this. I suspect it all 'just works'...</p>
Ben Basshttp://www.blogger.com/profile/02283998269313171389noreply@blogger.com3tag:blogger.com,1999:blog-3029438816324283001.post-45580246009093866772012-10-04T23:48:00.000+01:002012-10-04T23:48:50.338+01:00Typing to Yourself
<p>I had an awesome time at PyConUK last weekend. I went to my first code dojo where I helped write a text-based adventure game (with a disturbing plot!), played with using Python on a RaspberryPi to access the GPIO, started a new Python module for my own use, and gave my second ever lightning talk, titled ‘Typing to Yourself’. This is 'the blog of the talk'.</p>
<h3>What's this about then?</h3>
<p>I’d started finding that IM chat logs often gave a lot of information, and often the timestamps were useful. The conversational nature of the chats also often gave subtle and useful clues about things such as confidence levels which a more formal report would lose. I started to think that it would be worth having that even if I wasn’t chatting to someone else. <em>And so the madness started…</em>.</p>
<p>Typing to yourself. About stuff. Preferably as it happens, in ‘real time’ (is there another kind?). I suppose some people use Twitter like this, but I (and I'm sure my employer) like it that I keep at least some things to myself.</p>
<p>I've been doing this for a few months now, and have a single file with about 1300 lines of info I've written. Originally I cleared it out every few days, but then thought that maybe keeping it all around would be of some benefit.</p>
<h3>Why Type to Yourself?</h3>
<br />
<h4>Record snippets of new knowledge</h4>
<p>There are hundreds of small things I’ll find out about and then not look at again for 6 months. And chances are, I’ll forget all about them. It’s worth recording that sort of stuff. Things like <a href="http://linux.die.net/man/1/pv">pv</a>, a new and useful iptables rule, the name of a nice vim colour scheme.</p>
<h4>Decouple recording from reporting</h4>
<p>Part of a knowledge-based job, where the work involves continual learning and research, is that there is always the risk of going off into blind alleys, dead ends, or things more interesting than what you / I should really be working on. Chances are, even if it’s tangential to the work you / I should be doing, it’s still useful in itself, and worth recording. If I’ve just spent half an hour reading about ZeroMQ, I’ll include that. I might not record it in a list of training activities for the week, though. It <em>defers disclosure</em>, allowing selection to take place at a later point, and therefore encourages more interesting and accurate reporting. By separating out recording from (for example) time reporting systems, we can post-process and filter that raw data later. It’s the same as RAW and JPEG files from a camera; it’s not a bad thing to have the RAW data even if the end result is somewhat different. We are likely to be more honest if we type to ourselves, including feelings, distractions, etc., some of which will be useful at a later point. </p>
<h4>Record <em>why</em> decisions were made</h4>
<p>We make dozens of design decisions every day, and the vast majority of these seem obvious at the time. But there are some that aren’t ever obvious, and some that won’t be tomorrow even if they are now. Recording why we make the choices we do is important, even if just to force us to make them consciously. And it can be very useful to document dead-end design decisions which we try and ultimately give up on, in the hope of avoiding repeating them in the future.</p>
<h4>Overcome creative blocks</h4>
<p>Writer’s block affects programmers as well as novelists. Or at least it affects me from time to time. Sometimes I sit there for minutes on end, simply staring at the screen. I’ve found that explaining my dilemma to myself through the medium of typing to myself can often overcome this. Sometimes <em>any</em> activity can be a key to being able to think clearly about a problem again. Not only that, but regularly writing down what you're doing can be a great antidote to distraction and procrastination. This comes back to being able to be honest with ourselves about what we're doing - writing this down makes us think about it, be able to criticise it, and therefore more quickly be able to change direction.</p>
<h4>Rubber duck debugging</h4>
<div class="separator" style="clear: both; text-align: center;">
<figure style="clear:right; float:right;margin-left:1em; margin-bottom:1em"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi59DJHE2156vaHjymE8y9OghIoMjrjVQROJ4hZ_mLhU__2rdbLFy0mZolnFOHHPzb_qYGJ3R61nc9FookdBy7t6CyTP4L2C7BueItPz-AsZU2mGEVh12xc6M2lVu1v-13a7shSFw3kHL4/s1600/Rubber_duck_assisting_with_debugging.jpg" imageanchor="1" ><img border="0" height="320" width="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi59DJHE2156vaHjymE8y9OghIoMjrjVQROJ4hZ_mLhU__2rdbLFy0mZolnFOHHPzb_qYGJ3R61nc9FookdBy7t6CyTP4L2C7BueItPz-AsZU2mGEVh12xc6M2lVu1v-13a7shSFw3kHL4/s320/Rubber_duck_assisting_with_debugging.jpg" /></a><figcaption style="font-size:small">© Tom Morris / Wikimedia Commons / CC-BY-SA–3.0 / GFDL</figcaption></figure></div>
<p>This is a technique which uses vocalisation of a problem we’re facing to make us think more clearly about the problem; to take a step back, and explain to a toy rubber duck - ideally one with no previous knowledge of the problem we’re facing - what’s going on and how we’re trying to fix it. Just explaining it often helps us realise the problem. But rubber ducks are tricky to find at the crucial moment, and people think programmers are mad enough already without seeing us all talking to little ducks sitting on our desks. No, typing to ourselves, writing down the problem, is clearly much safer. After all, a programmer writing down a problem to themselves looks highly productive, rather than slightly mad.</p>
<h4>Searchable history</h4>
<p>We version control our code. Why not version our thoughts and activities? Write stuff down. Be able to go back in time and revisit those thoughts at a later date. Use it to store our short-term thoughts just before a meeting or break, so picking up where we left off is easy. That sort of stuff. Or to record surprising errors which we can’t reproduce and just put down to ‘something must have been set up wrong’. But then we start to find that we’ve already recorded it two months earlier…</p>
<h3>How should I go about that?</h3>
<br />
<h4>Timestamped</h4>
<p>Typing to yourself is an activity best done in real-time. Doing it later may still have some benefit, but the stream of consciousness brain-dump in the background has a lot of value which is lost if we’re just typing a historical report on what happened earlier. The point of typing to yourself is that having a record is useful; trying to remember stuff to record after the fact is lossy and a waste of time. Having things timestamped is a motivation (‘I’ve not written anything for 2 hours!’) and useful for searching history - finding out just when that bug appeared last.</p>
<h4>Centralised</h4>
<p>For a given context (e.g. work), there should be a single log on which you type to yourself. Perhaps there shouldn’t even be multiple contexts; everything should go in one big fat log. But it should be a single log, and yet available everywhere. Having to merge logs, or wondering where the latest version is, or knowing but not having access to it - all bad things. Dropbox is good.</p>
<h4>Low friction</h4>
<p>The whole point of ‘typing to yourself’ is that it shouldn’t be a context switch. I tend to keep a <a href="http://tmux.sourceforge.net/">tmux</a> pane open with editfile running (as <code>track -t</code>). Switching into it is just a case of Ctrl-A/cursor key. Then type stuff. Then Ctrl-A/cursor the other way. There’s no alt-tabbing, no windows changing focus or popping in front of each other. And importantly, I can see what’s there at all times, so it’s always in my consciousness - I don’t have to ‘swap it back in’ when I switch to it. Another aspect of low-friction is that the data itself should be widely available to programs to use, whether for searching, editing, or anything else. A text file is ideal.</p>
<h3>An Implementation</h3>
<p>I’m more keen about the ideas here than the implementation, but without an implementation it couldn’t work. I use my <a href="http://bitbucket.org/codedstructure/editfile">editfile</a> program for almost all longer pieces of writing - blog posts, ideas, plans. And my ‘typing to myself’ log, which is just an editfile ‘instance’ used in ‘time track’ mode, which keeps a single file on Dropbox with all the content in a text file. I wrote about editfile in an <a href="http://blog.codedstructure.net/2012/01/linux-laziness-or-editing-text-files.html">earlier blog post</a>.
<code>editfile</code> started out as a very simple bash script:</p>
<pre class="prettyprint"><code>#!/bin/bash
"$EDITOR" "$HOME/Dropbox/editfile/$(basename "$0")"
</code></pre>
<p>but is now a more complex bash script, including search, a two level hierarchy (I had that before iCloud decided it was a good idea!), command-line completion, and the time track mode I use for typing to myself.</p>
<p>The time-track mode has a couple of useful features - readline & history integration, and prompting and storing a timestamp. It’s not perfect; one of the key things is that the timestamp prompt doesn’t update in real time (although it does store the current time in the text file rather than the potentially out-of-date displayed time). The implementation of the time-track loop is the following:</p>
<pre class="prettyprint"><code>now=$(date '+%Y/%m/%d %H:%M')
# read history from previous
history -r $HIST_FILE
while read -ep "$now >> " track_input ; do
    now=$(date '+%Y/%m/%d %H:%M')
    if [[ -z $track_input ]] ; then
        # don't store blank lines
        continue
    fi
    # use -- to indicate end of options, e.g. if track_input
    # starts with '->', which previously caused errors
    history -s -- "$track_input"
    echo "$now $track_input" >> "${TARGET_PATH}"
done
# append current session to history
history -a $HIST_FILE
# ensure bash prompt starts on a new line
echo
</code></pre>
<p>I use this every day at work, and it's got to the stage where I want to use it more. I've got plenty of ideas for things to integrate into my implementation, though the real essence of it doesn't need anything clever really.</p>Ben Basshttp://www.blogger.com/profile/02283998269313171389noreply@blogger.com3tag:blogger.com,1999:blog-3029438816324283001.post-48579462614597068922012-09-24T23:05:00.002+01:002012-09-24T23:06:16.586+01:00Project Naming in a Google World
<p>I’m a great fan of Python; not only do I think the language itself is clean and readable, the community polite and helpful, and the ecosystem diverse and fascinating, but also the <a href="http://www.python.org/dev/peps/pep-0020/">Zen of Python</a> resonates with me.</p>
<p>I think there is significant value in that ‘there should be one - and preferably only one - obvious way to do it’, and that ‘namespaces are one honking great idea’. To me, it is sad that this essence of Python philosophy isn’t applied more widely.</p>
<p>Of course there is an element of tension in the Zen - Namespaces are about nesting, but ‘Flat is better than nested’. Nevertheless, flat within namespaces isn’t the same as not having any namespaces at all.</p>
<h3>Namespaces don’t exist in a Google world.</h3>
<p>I bet that most project name searches on Google are a single word. ‘jquery’ gets me what I want. ‘requests’ gets me what I want. Even one of my own projects - ‘pylibftdi’ gets me where I want to go. Getting to this point is probably part of choosing a good name. But that’s exactly the problem: how do I choose a good name for my new project? It’s one thing already knowing what project I’m interested in and simply using Google to get me there (sometimes a language name qualifier helps, e.g. ‘python flask’); it’s quite another to face two problems: a) searching for a project to solve a given problem, not knowing what might be available, and b) searching for a project name I can use for my shiny new thing.</p>
<h3>
Searchable Project Names</h3>
<p>One of the technologies I use the most at work is SSH. I tend to use it mostly in a fairly crude way, via its normal client and server programs <code>ssh</code> and <code>sshd</code> with many configuration options, but I have used the <code>paramiko</code> library. Which works well, and has a great name - easily remembered, especially after reading about its etymology on the project page. And very easily searchable. Recently, however, its development has slowed. I read in some places that it is now ‘deprecated’, but I’m not sure about that - the github project was last updated 11 days ago as of now… Anyhow, recently it has been forked, and its ‘successor’ has the brilliant name of… wait for it… ‘ssh’. Yes, brilliant. No, actually, it isn’t that helpful. Search for ‘ssh’, and it obviously won’t be there, straightaway, on the first page. Search for ‘python ssh’, and it still won’t be there. I guess it might be in a few months or years once it (presumably) takes off as the ‘one way to do it’, but now? It’s not helpful. Maybe it’s only aimed at people who use the PyPI search engine? And even if / when it is ‘obvious’, it’s still going to be a pain to do web searches for problems relating to use of the package. If I want to know which to use, then ‘paramiko vs ssh’ is of no help. Is the new ssh module ‘preferred’ by the community going forward? Or is it just a random fork by the Fabric guys? Other than the download stats on PyPI, it’s difficult to tell, because searching for info about it is... tricky.</p>
<p>As another example, the <code>pbs</code> package has recently changed its name to <code>sh</code>. Now <code>pbs</code> might not be the bestest name, but changing it to <code>sh</code> causes exactly the same kind of problem as <code>ssh</code>. There can be a real feeling of ‘hijacking’ when something <em>so</em> domain specific is used for a general project name. Using such a name is a clear signal: <em>this</em> is the module <em>you</em> should want to use for this task - you’d be crazy to try anything else! That may or may not be intended or justified, but when it is a trivial thing for anyone to do, we developers have to be very careful and deliberate. Domain-specific project names, with massively overloaded meanings, only make sense in a very defined namespace: in these cases, the set of Python packages on PyPI.</p>
<p>Except, in a Google world, there aren’t namespaces.</p>
<h3>
Finding a project name (or rather finding the absence of one)</h3>
<p>One of the problems with project naming in a flat unified project namespace (because of course there is <em>one</em> namespace) is project name squatting. For a variety of reasons - good and bad - developers decide that ‘release early, release often’ is a good policy. And one of the first things needed for that first visible release - perhaps the <em>only</em> thing needed - is a project name. So names are snapped up in an eager race. Project names have become the new hot property. So we have lots of great project ideas, which need and find an awesome project name, make that first release, … and then do nothing. Stagnate. Just like the dot-com crazy days, we have project-name squatting, and permanent project-name ‘under construction’ empty shells… And, like defunct satellites cluttering low-earth orbit, the debris of project names now unused is a danger to every other project, trying to find its own space and path through the knowledge-sphere, avoiding the no-man’s land which has been staked out and left barren, taking juicy spectrum in an interference-causing blackout. Soon there will be <em>no more names</em> left and [Sorry, I seem to have got carried away. Ahem.]</p>
<h3>
So…?</h3>
<p>The following are some more thoughts and examples. Most of this is subjective. Hurrah for being able to dump half-finished ideas in a well name-spaced environment!</p>
<h4>
Over-general names:</h4>
<ul>
<li>‘node’ - really unhelpful.</li>
<li>‘windows’ - key element in GUI programming. WIMP.</li>
<li>‘dropbox’ - to a certain extent.</li>
<li>‘color’ - remember them? Good thing they didn’t take this word away…</li>
<li>‘word’ - a tool for writing words?</li>
<li>eliminate a name not just from the <em>project</em> namespace, but increasingly from the <em>word</em> namespace.</li>
<li>makes web searching harder</li>
</ul>
<h4>
Unpleasant / generally bad names:</h4>
<ul>
<li>git</li>
<li>gimp</li>
<li>My[anything] ;-)</li>
<li>Any number of ‘offensive’ or ‘wrong connotation’ names, often leading to name changes, which help no one, except in an ‘any publicity is good publicity’ kind of way:
<ul>
<li><a href="http://github.com/vmg/sundown/issues/36">http://github.com/vmg/sundown/issues/36</a></li>
<li><a href="https://github.com/whit537/assertEquals/blob/master/ANNOUNCEMENT.rst#why-i-am-renaming-testosterone">testosterone</a> - an example of deliberately trying to change the perception of a project in a positive way</li>
</ul>
</li>
</ul>
<h4>
Duplicate projects with the same name:</h4>
<ul>
<li><a href="http://pyrorobotics.org/">pyro</a> / <a href="http://packages.python.org/Pyro4/">pyro</a></li>
</ul>
<h4>
Create or recognise our own namespaces:</h4>
<ul>
<li>blog articles: author + title</li>
<li>PyPI / CPAN etc</li>
<li>‘hungarian notation’ e.g. pyxyz, where the ‘py’ prefix includes some indicator of what namespace it lives in.</li>
<li>domain name country code extensions - ‘.io’ etc</li>
<li>‘file extension’ as part of project name: ‘node.js’ etc</li>
<li>identification by company or organisation: iOS / iPod / i*, gmail, google maps, etc</li>
<li>identification by well-known patterns: xUnit, [j/py]Query etc.</li>
</ul>
<h3>
Summary</h3>
<p>If I were to produce a new vacuum cleaner and call it ‘Vacuum’, then various people might get upset. We (in software development) don’t really want to have to deal with all the legal & trademark clutter - the fact that we can have an
idea, create a project and ‘market’ it all in a weekend is awesome, but requires us to act responsibly. Just because we <em>can</em> launch a new project into the orbital (name)space around us, doesn’t mean we should. Though it is awfully tempting…
In addition we need to recognise, use, and educate ourselves and others about the namespaces all around us. </p>
<p>So I guess what I’m really saying, is (to quote Tim Peters)...</p>
<blockquote>
Namespaces are one honking great idea - <em>let’s do more of those!</em></blockquote>Ben Basshttp://www.blogger.com/profile/02283998269313171389noreply@blogger.com3tag:blogger.com,1999:blog-3029438816324283001.post-23795944792295748632012-06-10T15:59:00.000+01:002012-06-10T18:15:27.968+01:00pylibftdi v0.10 released<p>I’ve recently released pylibftdi v0.10. <a href="http://pypi.python.org/pypi/pylibftdi">pylibftdi</a> is a ‘minimal Pythonic interface to libftdi’, and is intended to be (possibly?) the easiest way to get up and running for simple use cases of FTDI’s range of USB to serial and parallel interface chips and modules. v0.10 adds a couple of new features and bug fixes.</p>
<p>For a long time I suffered under the misapprehension that version numbers should follow the rules of decimal numbers, and that by all reasonable accounts v1.0 should have followed 0.9, and since I want(ed) 1.0 to be ‘stable’ (I currently classify it as ‘beta’), I’d reached an impasse. I can’t remember the exact moment, but I had a realisation that I didn’t have to approach 1.0 via smaller and smaller increments from 0.9 (as in Zeno’s race), but that I could go from 0.9 to 0.10. Anyway, I still want to do better documentation (and a few other things) before reaching 1.0.</p>
<p>Changes in v0.10:</p>
<ul>
<li><p>Running the unit tests is now easier due to some reorganisation - just run <code>python -m unittest discover</code> in the top level directory.</p></li>
<li><p>Support for the FT232H device - this has a different USB product ID (PID) to the previous devices I’d been testing with and using - mainly FT245BM/RL, FT232R/RL. All those devices have a PID of <code>0x6001</code>, while the newer FT232H has a PID of <code>0x6014</code>. I experimented for a while with having (defaulted) extra parameters for specifying the VID and PID of the target device, but this pushed too much complexity up to the user - I really want pylibftdi to be something which can be used with default options and next-to-no set up code for most basic operations. The approach taken is to have two lists (<code>USB_VID_LIST</code>, <code>USB_PID_LIST</code>) and have the device finding code iterate over the cartesian product of these (i.e. a nested loop, but implemented through the wonderful <code>itertools.product</code>). So adding new PIDs in the future is as simple as appending to <code>USB_PID_LIST</code>, and a device can be opened with no parameters to the <code>Device()</code> constructor if it’s the only FTDI device on the USB bus.</p></li>
<li><p>Resetting the device to serial mode on open. There’s been discussion about implementing this logic in the library on the libftdi mailing list, but doing it in pylibftdi as well doesn’t hurt. This fixes the unexpected condition that if a previous application had used a device in BitBang mode, reopening it just using <code>Device()</code> would leave it in BitBang mode, rather than the expected serial mode (for devices which have support both).</p></li>
<li><p>Added a ‘<code>buffer_size</code>’ parameter to the <code>Device()</code> constructor (defaulted to zero, which retains previous behaviour) which chunks reads and writes into accesses of that length at most. This avoids the issue that a call of (for example) <code>dev.write('hello' * 100000)</code> over a 9600 baud serial link would take an incredibly long time, and since it is all running in the library context (via a <code>ctypes</code> call), it wouldn’t be interruptible by Ctrl-C.</p></li>
<li><p>Removed the deprecated use of <code>Driver()</code> to be a synonym for <code>Device()</code>.</p></li>
<li><p><b>Update:</b> I've already done two maintenance releases in the hours since originally writing this - v0.10.2 is now current. One of the major changes is that the <code>examples</code> subpackage is now included in the sdist - so <code>python -m pylibftdi.examples.led_flash</code> should work if you have an LED attached to D0 on your device.</p></li>
</ul>
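<p>The device-search approach described above can be sketched as follows (0x0403 is FTDI's usual vendor ID; the list names follow the description, but the rest is illustrative rather than the library's actual code):</p>

```python
from itertools import product

USB_VID_LIST = [0x0403]          # FTDI's vendor ID
USB_PID_LIST = [0x6001, 0x6014]  # FT245BM/FT232R family, plus FT232H

def candidate_device_ids():
    """All (vid, pid) pairs to try when searching for a device."""
    return list(product(USB_VID_LIST, USB_PID_LIST))
```

<p>Supporting a new chip is then just a matter of appending its PID to <tt>USB_PID_LIST</tt>.</p>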
<p>The plan for the next release is just more tidy-ups, examples and more documentation, but I might squeeze a few other things in there…</p>Ben Basshttp://www.blogger.com/profile/02283998269313171389noreply@blogger.com4tag:blogger.com,1999:blog-3029438816324283001.post-27223086629711496102012-06-08T22:42:00.001+01:002012-06-08T22:43:30.213+01:00My First 3 hours with a Raspberry Pi<p>My Raspberry Pi arrived a couple of weeks ago, but I've been too busy to play with it til now. And tomorrow I'll be too busy with different things :-)</p>
<p>I'd intended to put Arch Linux on it, since that's what I use on my Pogoplug, and it works really nicely there. But, starting at the downloads page (<a class="reference external" href="http://www.raspberrypi.org/downloads">http://www.raspberrypi.org/downloads</a>) I decided that since the Debian install seemed to have some 'official'ness about it, I'd start there. It pointed to <a class="reference external" href="http://elinux.org/RPi_Easy_SD_Card_Setup">http://elinux.org/RPi_Easy_SD_Card_Setup</a>, which contains a list of 'lazy/easiest/easy/...' and so on approaches to getting a card set up.</p>
<p>A link on that page led to a Mac-only (for now) script which downloads and writes an image to an SD card all in one operation - nothing too difficult to do manually, but I'm not one to ignore easy options. Mostly. <a class="reference external" href="http://exaviorn.com/raspiwrite/">http://exaviorn.com/raspiwrite/</a></p>
<p>I'd bought a Sandisk Extreme 8GB SDHC card - which is on 'the list' by the way - for use with my RPi. I ran the RasPiWrite script firstly with Debian, then Arch Linux. Neither worked. No 'OK' LED, only the red Power LED. Somewhat disappointed.
As a side-note, the RasPiWrite script worked better if I reformatted (standard FAT) the card between attempts; otherwise it initially considered selecting my external hard drive, which wasn't great - but it did warn and I was aware, so no worries (for me) there.</p>
<p>I then wondered what the OK LED was supposed to indicate, and read (here: <a class="reference external" href="http://www.raspberrypi.org/phpBB3/viewtopic.php?f=28&t=6735">http://www.raspberrypi.org/phpBB3/viewtopic.php?f=28&t=6735</a>) that it should be expected to at least flicker a little on power up - and it wasn't.
I had another SDHC card (an old Transcend 4GB SDHC Class 6) and decided to try that. After re-running RasPiWrite on it with Debian again, ... the OK light did stuff! It appears to be an SD access light - this wasn't mentioned anywhere I noticed - it isn't constantly on when the system is booted, but flickers on activity.
So I was sure it was finally booting. I'd read a bunch of things suggesting that as well as dodgy SD cards, flaky PSUs could be a fault, but the two I used (a Blackberry phone charger and an iPad charger) both worked fine in the end.</p>
<p>Sadly however, there was no output on HDMI, but I could see from my router's DHCP client list that it was taking a lease. A quick OUI scan showed the Raspberry Pi foundation as the owner of a new MAC in the table :-)
I'd foolishly assumed ssh would work out of the box, but nmap showed just port 111 open. Following the instructions here (<a class="reference external" href="http://www.youtube.com/watch?v=SmMMKojOE4U">http://www.youtube.com/watch?v=SmMMKojOE4U</a>) to enable SSH worked great. (Basically '<tt>ssh-keygen; service ssh start; update-rc.d ssh defaults</tt>' as root), and I could ssh in! Hurrah!</p>
<p>After reading a few places that sometimes the RCA could work even if HDMI didn't, I found an RCA-SCART adapter and magic - it was actually doing stuff!
Finally, after searching and finding <a class="reference external" href="http://elinux.org/R-Pi_ConfigurationFile">http://elinux.org/R-Pi_ConfigurationFile</a> and <a class="reference external" href="http://www.raspberrypi.org/phpBB3/viewtopic.php?f=28&t=6762">http://www.raspberrypi.org/phpBB3/viewtopic.php?f=28&t=6762</a> I got HDMI working (while logged in over SSH) by creating a /boot/config.txt file (there wasn't an old one) containing the following:</p>
<pre class="literal-block">
config_hdmi_boost=4
hdmi_force_hotplug=1
</pre>
<p>And that was it - task accomplished.</p>
<p>Since it's listed in the 'working SD card' list, presumably more recent kernels might work with the bigger / faster card I bought for the purpose of running my RPi; IMO it should be a priority to keep images on the downloads page current - containing the latest kernels supporting as wide a range of SD cards as possible.
I'm sure many have had fewer hassles than me, and equally sure many have had far more problems. But for this whole endeavour to really work, it needs to get easier. Is RPi about teaching programming, fun, and 'computing', or about the harsh realities of embedded Linux?
Fortunately there is an awesome community which I'm confident will keep pushing things forward - thanks!</p>Ben Basshttp://www.blogger.com/profile/02283998269313171389noreply@blogger.com0tag:blogger.com,1999:blog-3029438816324283001.post-37481682103097536012012-05-07T23:13:00.000+01:002012-05-07T23:13:52.683+01:00urlsearch - web searching from the command line<div class="document" id="urlsearch-web-searching-from-the-command-line">
<p>Continuing my pursuit of having a custom vocabulary on my command line, preferably via <tt>argv[0]</tt> abuse, I've now addressed the subject of web searching.
I probably make somewhere between 20 and 50 searches in a typical day,
mostly on Google, but Wikipedia comes high up on the list too.</p>
<p><tt>urlsearch</tt> is a small script which kicks off a browser search from the command line. The plan is to eliminate the task switch from the command line (where you usually are, right?) to the browser. By complex, spurious and - to be frank - non-existent calculations, I estimate that this reduction in friction should make you 4.6% more productive, and thus make the world a better place.</p>
<p>Using Python's <tt>webbrowser</tt> module, it's straightforward to open a webbrowser to a particular page:</p>
<pre class="prettyprint">
>>> import webbrowser
>>> webbrowser.open('http://google.com/search?q=standard+library+pep8')
</pre>
<p>What <tt>urlsearch</tt> gives is the equivalent to the above from the following at a Bash prompt:</p>
<pre class="prettyprint">
$ google standard library pep8
</pre>
<p>It's simple, short, and in its basic form is just a couple of lines of Python:</p>
<pre class="prettyprint">
#!/usr/bin/env python
import sys, urllib, webbrowser
webbrowser.open('http://google.com/search?q=' +
                urllib.quote_plus(' '.join(sys.argv[1:])))
</pre>
<p>Make that executable, put it in the path, and you're good-to-go with google <a class="footnote-reference" href="#id3" id="id1">[1]</a> searching from the command line. However, as always, complexity is lurking, and desires to have its way...</p>
<p>The following things are addressed in <tt>urlsearch</tt>:</p>
<div class="section" id="automatic-gtld-checking">
<h4>Automatic gTLD checking</h4>
<p>A range of gTLDs are searched in turn using <tt>socket.getaddrinfo(HOSTNAME, 'http')</tt>. By default the list starts with the empty gTLD (so local search domains are tried first), then .com, .org, .net, and .co.uk are tried in that order - these being most relevant to my uses. Defaulting to '.fr' first might be more reasonable for French speakers, for example, but the point is that the gTLD becomes one more thing not to have to think about.</p>
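<p>A minimal sketch of the idea - hypothetical code, rather than the actual <tt>urlsearch</tt> implementation - is to try each suffix in turn and return the first hostname which resolves:</p>

```python
import socket

def first_resolving_host(name, suffixes=('', '.com', '.org', '.net', '.co.uk')):
    """Return the first name + suffix which resolves, or None."""
    for suffix in suffixes:
        host = name + suffix
        try:
            # with the empty suffix, local search domains get first go
            socket.getaddrinfo(host, 'http')
        except socket.gaierror:
            continue  # no DNS entry; try the next suffix
        return host
    return None
```

<p>Resolution failures are cheap compared to the browser launch which follows, so trying a handful of suffixes adds no noticeable delay.</p>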
</div>
<div class="section" id="generic-search-engine-support">
<h4>Generic search engine support</h4>
<p>This is where the wonder of <tt>argv[0]</tt> fits in :-) Via symlinks to urlsearch, various search engines can be supported. An <tt>argv[0]</tt> of 'google' will cause a google.com search, while 'wiki' is special-cased to wikipedia. The search query format also needs special-casing for many search engines - the default of <tt><span class="pre">/search?q={terms}</span></tt> works for Google, Bing and several other sites.</p>
<p>The following sites are directly supported or special cased:</p>
<table border="1" class="docutils"><colgroup><col width="20%" /><col width="80%" /></colgroup><thead valign="bottom"><tr><th class="head">argv[0]</th>
<th class="head">search engine</th>
</tr></thead><tbody valign="top"><tr><td>google</td>
<td>Google</td>
</tr><tr><td>bing</td>
<td>Bing</td>
</tr><tr><td>wiki</td>
<td>Wikipedia</td>
</tr><tr><td>duckduckgo</td>
<td>DuckDuckGo</td>
</tr><tr><td>pylib</td>
<td>Python standard libraries (direct jump)</td>
</tr><tr><td>jquery</td>
<td>jQuery API search (direct jump)</td>
</tr></tbody></table>
<p>These are managed in the code by a very dull <tt>if</tt>/<tt>elif</tt> chain, though something a bit less 'hackish' would probably be wanted to scale to further engines.</p>
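<p>One less 'hackish' shape would be a dispatch table keyed on <tt>argv[0]</tt> - a sketch with made-up URL templates, not the actual <tt>urlsearch</tt> code:</p>

```python
import os

try:
    from urllib.parse import quote_plus  # Python 3
except ImportError:
    from urllib import quote_plus  # Python 2, as in the original script

# Hypothetical mapping from command name to search URL template
ENGINES = {
    'google': 'http://google.com/search?q={terms}',
    'bing': 'http://bing.com/search?q={terms}',
    'wiki': 'http://en.wikipedia.org/wiki/Special:Search?search={terms}',
    'duckduckgo': 'http://duckduckgo.com/?q={terms}',
}

def search_url(argv):
    """Build a search URL from the command name (argv[0]) and query terms."""
    name = os.path.basename(argv[0])
    template = ENGINES.get(name, ENGINES['google'])  # default to Google
    return template.format(terms=quote_plus(' '.join(argv[1:])))
```

<p>Adding a new engine is then a one-line change, and anything needing more than a template (such as the Trac integer handling) can still be special-cased around the table lookup.</p>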
</div>
<div class="section" id="trac-support">
<h4>Trac support</h4>
<p>Trac <a class="footnote-reference" href="#id4" id="id2">[2]</a> follows the same search query format as Google, and has a great 'quickjump' feature, where certain search query formats take the user directly to the relevant page. For example, a search for <tt>r5678</tt> goes directly to the changeset for revision 5678, and a search for <tt>#1234</tt> goes directly to ticket 1234. The ticket form can't be typed at a Bash prompt as-is, however, since everything from the '#' onwards would be treated as a comment and ignored. <tt>urlsearch</tt> special-cases this: if the search term is a bare integer, it is prefixed with '#'.</p>
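<p>The special case itself is tiny - something along these lines (a sketch of the idea, not the exact code):</p>

```python
def trac_query(terms):
    """Prefix an all-digit query with '#' so Trac quick-jumps to that ticket."""
    if terms.isdigit():
        return '#' + terms  # '1234' becomes '#1234'
    return terms
```
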
</div>
<div class="section" id="other-tweaks">
<h4>Other tweaks</h4>
<p>Output from the browser writing into the command prompt (as happens with Chrome, for example) is redirected to <tt>/dev/null</tt>.</p>
</div>
<div class="section" id="the-code">
<h4>The Code</h4>
<p>The code is here: <a class="reference external" href="http://bitbucket.org/codedstructure/urlsearch">http://bitbucket.org/codedstructure/urlsearch</a></p>
<table class="docutils footnote" frame="void" id="id3" rules="none"><colgroup><col class="label" /><col /></colgroup><tbody valign="top"><tr><td class="label"><a class="fn-backref" href="#id1">[1]</a></td><td>Other search vendors are available</td></tr></tbody></table><table class="docutils footnote" frame="void" id="id4" rules="none"><colgroup><col class="label" /><col /></colgroup><tbody valign="top"><tr><td class="label"><a class="fn-backref" href="#id2">[2]</a></td><td><a class="reference external" href="http://trac.edgewall.org">http://trac.edgewall.org</a></td></tr></tbody></table></div>
</div>Ben Basshttp://www.blogger.com/profile/02283998269313171389noreply@blogger.com1tag:blogger.com,1999:blog-3029438816324283001.post-62174665368394361122012-01-29T00:37:00.000+00:002012-05-13T22:52:28.782+01:00editfile - editing text files made twice as easy<p>It's been far too long since I wrote anything on this blog, so I'm easing myself back into it with what I've found to be a nice shortcut over the last few months. It uses one of my favourite POSIX tricks - symlinks to executables, together with behaviour based on the original symlink's name - the 'zeroth' argument to the executable.</p>
<p>As someone who identifies fully - and then some - with this xkcd comic: <a target="_blank" href="http://xkcd.com/974/">The General Problem</a>, I would really rather not have to type commands which require options. I like aliases, preferably one or two characters long, and I would prefer to type one long word than a short word with an option or two. A single word for a specific action is exactly the level of mental effort I'm after.</p>
<p>Which leads me to the problem. Text files are great. I am (or was) regularly opening editors, making notes, and saving them somewhere random. But like standards, there quickly become too many of them. Which one to use? How to keep track of them? The action 'edit my notes file' has to be split into two: a 'verb' (edit), and an 'object' (that particular file). Now you might think me irredeemably lazy, but that's twice as many things to remember - or type - as I'd like. I could just add an alias to .bash_aliases, but that's no fun, and while I could have a bunch of them, if I wanted to do anything more complex I'd be repeating myself too much. Laziness, remember? Got to solve 'The General Problem', and then never have to worry about the specifics ever again :-)</p>
<p>One of my fun projects many moons ago was about adding '<code>--xml</code>' options to ordinary posix commands (<a href="http://blog.codedstructure.net/2011/04/xmlcmd-adding-xml-option-one-command-at.html">xmlcmd</a> - because XML may have serious backing in the enterprise, but needs all the help it can get to make this 'the year of XML on the Linux desktop'). In it I created symlinks from <code>~/bin/<em>name-of-posix-command</em></code> to a Python script which looks up and runs the original command (using the Python <a href="http://pypi.python.org/pypi/which/1.1.0">which</a> package), before encoding that command's output into XML to be returned to the user. So basically:</p>
<ul>
<li>Symlinks in a path early in $PATH to a common script</li>
<li>Use of <code>argv[0]</code> / <code>$0</code> in that script to know from which command it was called, and act accordingly.</li>
</ul>
<p><em>(And <code>which ls</code> on all my computers still shows $HOME/bin/ls, even though I hardly ever use the <code>--xml</code> option - isn't transparency nice?)</em></p>
<p>Applying these techniques to the rather simpler - yet more useful - task of editing files, and jumping to the punchline, I have this in <code>~/bin/editfile</code>:</p>
<pre class="prettyprint"><code>#!/bin/bash
EDITFILE_DIR=~/Dropbox/editfile
mkdir -p "$EDITFILE_DIR"  # ensure this exists
TARGET_PATH="$EDITFILE_DIR/$(basename "$0").txt"
case $1 in
    '-a')
        # append command arguments to file and exit
        shift  # don't include the '-a'
        echo "$*" >> "$TARGET_PATH"
        ;;
    '-l')
        # list file and exit
        cat "$TARGET_PATH"
        ;;
    *)
        # Determine editor to use
        if [[ -n "${EDITOR}" ]] ; then
            EDIT=$EDITOR
        elif which gedit >/dev/null 2>&1 ; then
            EDIT="gedit -b"  # default fallback if present
        else
            EDIT="vim"  # fallback if no gedit there
        fi
        # Edit it...
        $EDIT "$TARGET_PATH"
        ;;
esac
</code></pre>
<p>And this:</p>
<pre class="prettyprint">
$ ls -l ~/bin
-rwxr-xr-x 1 ben staff 714 28 Jan 23:55 editfile
lrwxr-xr-x 1 ben staff 8 19 Dec 22:13 notes -> editfile
lrwxr-xr-x 1 ben staff 8 19 Dec 22:50 report -> editfile
lrwxr-xr-x 1 ben staff 8 19 Dec 22:50 todo -> editfile
lrwxr-xr-x 1 ben staff 8 19 Dec 22:50 track -> editfile
</pre>
<p>This has been developed from the much simpler starting point which looked something like this:</p>
<pre class="prettyprint">
#!/bin/bash
gedit ~/$(basename $0)
</pre>
<p>The result of this is that I can type '<code>notes</code>' from anywhere on my system, and up will pop my editor with the notes file, which happens to live in Dropbox and is therefore available on all my computers. I can add as many extra files for different purposes as I like, and they all act in the same way. The files themselves aren't tied into any particular system, and even have helpful '.txt' extensions on the 'actual' files. And since there's a single script behind all of these, they all inherit the 'extras' - such as '<code>-l</code>' to list the file and '<code>-a</code>' to add a line - and when I add extra features and commands, I only have to do it once.</p>
<p>But the main point of all of this is that I've turned a verb + object into a more specific verb, and as long as the first verb is something like 'edit this text file', then I've solved The General Problem too. It might have taken a while longer than <code>'alias notes="vim ~/Dropbox/editfile/notes.txt"'</code>, but that wouldn't have been half as fun. And if you haven't looked at the <a target="_blank" href="http://xkcd.com/974/">xkcd comic</a> I linked to earlier, you should now... :-)</p>
<p>I feel the need for a '<code>today</code>' action, and while I'm at it I should add a todo:</p>
<pre class="prettyprint">
$ (cd ~/bin; ln -s editfile today)
$ todo -a blog post about recently released pylibftdi 0.9
</pre>Ben Basshttp://www.blogger.com/profile/02283998269313171389noreply@blogger.com0tag:blogger.com,1999:blog-3029438816324283001.post-72936492633439008952011-07-18T23:35:00.002+01:002011-07-18T23:40:12.282+01:00HTTP or it doesn't exist.<p>This is my 'or it doesn't exist' blog post<a href="#bpb-hode1">[1]</a><a href="#bpb-hode2">[2]</a><a href="#bpb-hode3">[3]</a>. I think everyone should have one ;-)</p>
<p>A big chunk of my life is processing electronic information. Since I would like it to be a (slightly) smaller chunk of my life, I want to automate as much as possible. Now ideally, I don't want a massive disconnect between what I have to do as a human processor of information and what I need to tell a computer in order to get the same job done without my help. Because it's easier that way.</p>
<p>So when I hear that the information I need to process is in some spreadsheet or other on a Windows share, it makes me a little sad. When I hear that it is available via a sensible REST interface in a sensible format, my heart leaps for joy just a little.</p>
<p>With something like Python's standard library (and third-party package) support for HTTP (<a href="http://docs.python-requests.org/en/latest/index.html">requests</a>), XML (<a href="http://docs.python.org/library/xml.etree.elementtree.html">ElementTree</a>) and <a href="http://docs.python.org/library/json.html">JSON</a>, I should be able to get my computer to do most of the manual data processing tasks which involve 'documents' of some form or other.</p>
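<p>As a concrete (if hypothetical) example, fetching and parsing a JSON resource needs only a few lines - here using just the standard library, though <tt>requests</tt> makes it nicer still:</p>

```python
import json
from urllib.request import urlopen  # Python 3; urllib2.urlopen in Python 2

def fetch_json(url):
    """Fetch a URL and parse the response body as JSON."""
    with urlopen(url) as response:
        return json.loads(response.read().decode('utf-8'))

# e.g. widgets = fetch_json('http://example.com/api/widgets')  # made-up endpoint
```

<p>Compare that to scripting an office suite to pull rows out of a spreadsheet on a Windows share.</p>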
<p>In a previous job I worked at convincing anyone who would listen that 'XML over HTTP' was the best thing since sliced bread. With appropriate XSLT and CSS links, the same data source (i.e. URI) could be happily consumed by both man and machine. Admittedly most of the information was highly structured data - wire protocols and the like, but it still needed to be understandable by real people and real programs.</p>
<p>I'm not an XML expert, but I think I 'get' it. I never understood why it needed so much baggage though, and can't say I'm sad that the whole web services thing seems to be quietly drifting into the background - though maybe it was always trying to.</p>
<p>A lot changes in web technology in a short time, and XML is no longer 'cool', so I won't be quite as passionate about 'XML over HTTP' as I once was. For short fragments it is far more verbose than JSON, though I'd argue that for longer documents, XML's added expressiveness makes the verbosity worth it. Maybe it was ever thus, but whenever two technologies have even the slightest overlap, there seems to be a territorial defensiveness which makes the thought of using both in one project seem somewhat radical. So while I've used JSON much more than XML in the last couple of years, I've not turned against it. If done right (Apple, what were you thinking with plist files!?) - it is great. Compared to JSON-like representations, the ability to have attributes for every node in the tree is a whole new dimension in making a data source more awesome or usable (or terribly broken and rubbish). I've seen too many XML documents where either everything is an attribute or nothing is, but it's not exactly rocket science.</p>
<p>
Things I liked about XML:
<dl>
<dt>Simplicity</dt><dd>I like to think I could write a parser for XML 1.0 without too much effort. If it's not well formed, stop. Except for trivial whitespace normalisation etc., there is a one-to-one mapping of structure to serialisation. Compare that with the mess of HTML parsers. While HTML5 might now specify how errored documents should be parsed (i.e. what the resulting DOM should be), I suspect that an HTML5 -> DOM parser is a far more complex beast.</dd>
<dt>Names! Sensible Names!</dt><dd>Because HTML is limited in its domain, it has a fixed (though growing, thanks to the living standard<a href="#bpb-hode4">[4]</a> which is HTML) set of tags. When another domain is imposed on top of that, the <code>class</code> attribute tends to get pressed into service in an ugly and overloaded way. By allowing top-level tags to be domain-specific, we can make the document abstraction more 'square'<a href="#bpb-hode5">[5]</a>.</dd>
<dt>Attributes</dt><dd>Attributes allow metadata to be attached to document nodes. Just as a lower-level language is fully capable of creating a solution to any given problem, having 'zero mental cost' abstractions (such as the data structures provided by high-level languages) enables new ways of thinking about problems. In the same way, having attributes on data nodes doesn't give us anything we couldn't implement without them, but it provides another abstraction which I've found invaluable and missed when using or creating JSON data sources.</dd>
</dl>
<p>What does make me slightly(!) sad though is the practical demise of XHTML and any priority that browsers might give to processing XML. There is now a many-to-one mapping of markup to DOM, and pre HTML5 (and still in practice for the foreseeable future considering browser idiosyncrasies and bugs) - a many-to-many mapping. It wouldn't surprise me if XSLT transform support eventually disappeared from browsers.</p>
<p>Maybe there's a bit of elitism here - if you can't code well-formed markup and some decent XSLT (preferably with lots of convoluted functional programming thrown in) - then frankly 'get off my lawn!'. I love the new features in HTML(5), but part of me wishes that there was an implied background 'X' unquestionably preceding that, for all things. The success of the web is that it broke out of that mould. But in doing so it has compromised the formalisms which machines demand and require. Is the dream of the machine-readable semantic web getting further away - even as cool and accessible (and standards compliant - at last) web content finally looks like it might possibly start to achieve its goal? Is it too much (and too late) to dream of 'data' (rather than ramblings like this one) being available in the same form for both the human viewer and the computer automaton?</p>
<p>I'm prepared to be realistic and accept where we've come to. It's not all bad, and the speed with which technology is changing has never been faster. It's an exciting time to wield electronic information, and we've got the tools to move forward from inaccessible files stored on closed, disconnected systems. So where I used to say 'XML over HTTP', my new mantra shall now be 'HTTP or it doesn't exist'. At least for a while.</p>
<ul>
<li><a name="bpb-hode1"><a href="http://lukeplant.me.uk/blog/posts/docs-or-it-doesnt-exist/">[1] http://lukeplant.me.uk/blog/posts/docs-or-it-doesnt-exist/</a></a></li>
<li><a name="bpb-hode2"><a href="http://blog.wearpants.org/elitism-and-the-importance-of-pypi">[2] http://blog.wearpants.org/elitism-and-the-importance-of-pypi</a></a></li>
<li><a name="bpb-hode3"><a href="http://epc.buffalo.edu/authors/goldsmith/if_it_doesnt_exist.html">[3] http://epc.buffalo.edu/authors/goldsmith/if_it_doesnt_exist.html</a></a></li>
<li><a name="bpb-hode4"><a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/">[4] http://www.whatwg.org/specs/web-apps/current-work/multipage/</a></a></li>
<li><a name="bpb-hode5"><a href="http://codedstructure.net/articles/square_abstractions.html">[5] http://codedstructure.net/articles/square_abstractions.html</a></a></li>
</ul>Ben Basshttp://www.blogger.com/profile/02283998269313171389noreply@blogger.com3tag:blogger.com,1999:blog-3029438816324283001.post-70834270615885399412011-06-05T22:51:00.000+01:002011-06-05T22:51:51.726+01:00Bugs: they hide where you're not looking<p>I bought a new <a href="http://www.pogoplug.com/">Pogoplug</a> over the weekend (only £50 new at PC World), and after being genuinely impressed by the Pogoplug software, decided it was far too easy and put <a href="http://plugapps.com/">PlugApps</a> Linux on it. These 'plug' type devices are fairly amazing - cheap, very low power (measured mine at just under 4 watts, with only a single USB flash stick), but with a decent 1.2GHz ARM processor. I'm already thinking my <em>next</em> computer might be another 'plug.</p>
<p>After hacking for a while (<em>why won't my printer work?</em>), I decided to check whether my <a href="http://pypi.python.org/pypi/pylibftdi">pylibftdi</a> package worked on it. To my shock, a quick '<code>pacman -S libftdi; pip install pylibftdi</code>' installed fine, and I could open a device connection to an FTDI device! But then things got worse. Trying to run <code>examples/lcd.py</code> failed with an exception in <code>BitBangDevice</code>, and I quickly realised that the API changes I'd made in 0.8 to make device access more 'file-like' had broken things in <code>BitBangDriver</code>. I was slightly sad that I'd released something where the <em>examples</em> didn't even work, and part of the whole reason the package might be useful to people (the abstraction over bit-bang device operation) was broken.</p>
<p><code>pylibftdi</code> is fairly simple, and basically consists of <code>Driver</code>, <code>Device</code>, and <code>BitBangDevice</code> classes. Most of the interesting code is in the <code>Device</code> class - so this is where I started when I finally got round to adding some tests for the 0.8 release. Having achieved reasonable coverage (though shamefully less than the 100% Uncle Bob demands), I considered my initial testing 'done'. I knew there was more to add later, and had (and still have) full intentions to 'get around to it'.</p>
<div style="clear:both;border:1px solid transparent">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgWFuqlv3_jS-lpa_3oQ7YTDYx53bWx-HtbprU51_5MsU93FKLx_yoUlABDrDaMiWmL-Y2FWETjgvQgfsuP99gsDXv9Vr-s3NBVquBUOVq92Bk_Ozo1Ikooa5jbKnQ6azOFTc6JsD1WV1M/s1600/pogo_pylibftdi.jpg" imageanchor="1" style="clear:left; float:left;margin-right:1em; margin-bottom:1em"><img border="0" height="320" width="302" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgWFuqlv3_jS-lpa_3oQ7YTDYx53bWx-HtbprU51_5MsU93FKLx_yoUlABDrDaMiWmL-Y2FWETjgvQgfsuP99gsDXv9Vr-s3NBVquBUOVq92Bk_Ozo1Ikooa5jbKnQ6azOFTc6JsD1WV1M/s320/pogo_pylibftdi.jpg" /></a></div>
<p>What I failed to anticipate was an unintended side-effect of writing tests. In the same way teachers might teach how to pass an exam rather than the depth and breadth of a subject, once tests exist, the purpose can simply become to pass them. Old manual acceptance tests get ignored as the 'old way' of doing things. Ideally of course this isn't a problem, because full test cases exist for every feature and code-path in the system, but that was very far from the case here. So somehow, my standard acceptance test (do the example programs still work?) got omitted, in preference for 'there are tests now, so it must be better! And the tests pass!'</p>
</div>
<p>So beware - a little testing can be a dangerous thing. The bugs hide where you're not looking for them. This is great motivation for achieving full test coverage, for automating acceptance testing (as well as unit / component level testing) so far as possible, and for being humble when it comes to writing tests. My motivations for writing them in the first place were two-fold: the feeling it was 'what I should do', and the idea that at some future point when I added or refactored things later I could be confident I hadn't broken things. I had no thought that the software was already broken; that I <em>needed</em> tests.</p>
<p>Anyway, <a href="http://pypi.python.org/pypi/pylibftdi">pylibftdi</a> 0.8.1 is now out, with trivial but important fixes and lots more tests.</p>Ben Basshttp://www.blogger.com/profile/02283998269313171389noreply@blogger.com1tag:blogger.com,1999:blog-3029438816324283001.post-50004723925591026412011-05-07T22:57:00.001+01:002011-05-07T23:01:57.415+01:00pylibftdi 0.8 released; testing, coverage, and mocking<p><a href="http://pypi.python.org/pypi/pylibftdi">pylibftdi</a> is a file-like wrapper to <a href="http://www.ftdichip.com/Products/Modules/DevelopmentModules.htm">FTDI USB devices</a> such as the UB232R (a USB<->serial converter) and the UM245R (8 bit parallel I/O).</p>
<p>No big changes for the 0.8 release, but a couple of new things:
<ul>
<li>ability to specify the device identifier in the <code>Device(<em>[device_id]</em>)</code> constructor as either a serial number (as previously) or a device description, so you can now specify <code>Device('UB232R')</code> and the first attached UB232R device will be opened. The code first tries to open by serial number, and if that fails tries to open by description - which I'm fairly confident will be useful rather than annoying :-)</li>
<li>more file-like API functions (<code>flush</code>, <code>readline()</code>/<code>readlines()</code>/<code>writelines()</code>, iterating over lines). These probably only make sense for text over serial lines, but that's a use-case worth supporting, considering pylibftdi already has unicode support.</li></ul></p>
<p>As well as that, I finally got round to adding some tests, and discovered something wonderful: checking <strong>test coverage</strong> isn't just practical icing on the cake to make sure things are tested well, but is a powerful and effective <strong>motivation</strong> for writing tests. I'm using <a href="http://nedbatchelder.com/code/coverage/">coverage</a>, and have to say it's one of those things I wish I had got round to sooner.</p>
<p>Speaking of which, at some point I'll probably end up saying the same about <a href="http://www.voidspace.org.uk/python/mock/">Mock</a>, which I've read around and know I should probably start using, but it's just so easy in Python to knock up something like this:
</p>
<pre class="prettyprint">
fn_log = []

class SimpleMock(object):
    """
    This is a simple mock plugin for fdll which logs any calls
    made through it to fn_log, which is currently rather ugly
    global state.
    """
    def __init__(self, name="&lt;base&gt;"):
        self.__name = name

    def __getattr__(self, key):
        # This makes me smile :)
        return self.__dict__.get(key, SimpleMock(key))

    def __call__(self, *o, **k):
        fn_log.append(self.__name)
        # most fdll calls return 0 for success
        return 0

def get_calls(fn):
    "return the called function names which the fdll mock object made"
    del fn_log[:]
    fn()
    return fn_log
</pre>
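<p>For comparison, a rough equivalent using <tt>Mock</tt> (these days available as <tt>unittest.mock</tt> in the standard library) might look like this - a sketch, not pylibftdi's actual test code:</p>

```python
from unittest.mock import MagicMock

# A stand-in for the ftdi library: attribute access auto-creates child
# mocks, and every call made through them is recorded for us.
fdll = MagicMock()
fdll.ftdi_init.return_value = 0      # most fdll calls return 0 for success
fdll.ftdi_usb_open.return_value = 0

fdll.ftdi_init()
fdll.ftdi_usb_open()

# mock_calls is the call log that SimpleMock kept by hand in fn_log
called = [name for name, args, kwargs in fdll.mock_calls]
```
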
<p>Sometimes I think Python makes 'clever' things like that too easy, and is perhaps the reason that although in the Python <em>language</em> there is only-one-way-to-do-it, in the Python ecosystem there is perhaps a tendency to reinvent the wheel over and over again. Because it's easy - and it's fun.</p>
<p>As always code is at <a href="http://bitbucket.org/codedstructure/pylibftdi">bitbucket</a>. For the next release (0.9) I'm planning to add more tests and docs (which are rather scarce), as well as one or two of the other things I've got planned (possible D2XX support, or at least some notes on libftdi on Windows, more examples & protocol adapters, maybe even a web interface for 8 bit IO...)</p>Ben Basshttp://www.blogger.com/profile/02283998269313171389noreply@blogger.com0tag:blogger.com,1999:blog-3029438816324283001.post-87869592952675802812011-04-18T22:56:00.001+01:002011-05-07T23:08:54.014+01:00piefull - a simple javascript & canvas visualisation tool<p>OK, so calling <a href="http://code.codedstructure.net/piefull"><strong><code>piefull</code></strong></a> a visualisation tool is going a bit over the top, but it is a tool, and it does help with visualisation. It does one thing and one thing only - plot pie-charts indicating a single value. And it's even more restricted than that - the value it plots needs to be a percentage value.</p>
<p>The main use cases for this are things like task completion status (project and outliner applications, test coverage, etc) or resource allocation (disk space, CPU usage).</p>
<p>By choosing contrasting colours (these are configurable in <strong><code>piefull</code></strong>), the overall outlook can be ascertained at a glance from a distance. By default the charts it generates are fairly small - 24px - which allows them to be used in-line in text, or as entries in tables. The general approach is derived from the sparklines example given in David Flanagan's <a href="http://oreilly.com/catalog/0636920016045">Canvas Pocket Reference</a> - replacing some textual data with a pictorial equivalent in the hope(!) it will be more quickly understandable. Of course this approach also lends itself well to graceful degradation, as the data is already there in the document itself.</p>
<p>There are plenty of other pie-chart generators - after all it's a fairly trivial thing to write with HTML5 canvas elements. But most of these tend to be fairly complex, with lots of options. I needed something where a fixed size display could represent a single value clearly, and <strong><code>piefull</code></strong> is the result.</p>
<script>
var piefull={selector:'.piefull',yescol:'#00F',nocol:'#0F0',size:24,PERCENT_RE:/(-?\d+(?:\.\d*)?(?:e[+\-]?\d+)?)%?/i,START_ANGLE:(2*Math.PI)*0.75,main:function(selector,size,yescol,nocol){var that=this;if(typeof document.querySelectorAll==='undefined'){return;}
var _selector=selector||this.selector;var _piesize=size||this.size;var _yescol=yescol||this.yescol;var _nocol=nocol||this.nocol;var _drawArc=function(ctx,arclen,full){ctx.beginPath();ctx.moveTo(_piesize/2,_piesize/2);ctx.arc(_piesize/2,_piesize/2,_piesize/2,that.START_ANGLE,that.START_ANGLE+arclen,!full);ctx.fillStyle=full?_yescol:_nocol;ctx.fill();ctx.closePath();};var pies=document.querySelectorAll(_selector);for(var i=0;i<pies.length;i++){var pie=pies[i];var valueText=pie.textContent||pie.innerText;var percent_match=that.PERCENT_RE.exec(valueText);if(percent_match!==null){var value=percent_match[0];var arclen=(2*Math.PI)*parseFloat(value)/100;var canvas=document.createElement('canvas');var attr=pie.getAttribute('class');if(attr){canvas.setAttribute('class',attr);}
attr=pie.getAttribute('id');if(attr){canvas.setAttribute('id',attr);}
pie.parentNode.replaceChild(canvas,pie);canvas.setAttribute('title',valueText);canvas.width=_piesize;canvas.height=_piesize;if(typeof canvas.getContext==='undefined'){canvas=window.G_vmlCanvasManager.initElement(canvas);}
var ctx=canvas.getContext('2d');_drawArc(ctx,arclen,true);_drawArc(ctx,arclen,false);}}}};
</script>
<script>
window.addEventListener('load', (function() { piefull.main("table.piefull td div"); piefull.main(); }));
</script>
<style>
table.piefull-example td {
padding: 0 5px 0 5px;
}
</style>
<p>Basically <strong><code>piefull</code></strong> turns this:</p>
<table class="piefull-example">
<tr>
<td><div>10%</div></td>
<td><div>20%</div></td>
<td><div>33%</div></td>
<td><div>20%</div></td>
<td><div>10%</div></td>
</tr>
<tr>
<td><div>18%</div></td>
<td><div>55%</div></td>
<td><div>33%</div></td>
<td><div>23%</div></td>
<td><div>12%</div></td>
</tr>
<tr>
<td><div>14%</div></td>
<td><div>35%</div></td>
<td><div>40%</div></td>
<td><div>21%</div></td>
<td><div>11%</div></td>
</tr>
<tr>
<td><div>12%</div></td>
<td><div>29%</div></td>
<td><div>14%</div></td>
<td><div>11%</div></td>
<td><div>5%</div></td>
</tr>
<tr>
<td><div>12%</div></td>
<td><div>17%</div></td>
<td><div>10%</div></td>
<td><div>9%</div></td>
<td><div>5%</div></td>
</tr>
</table>
<p>(which looks like this in code:)</p>
<pre class="prettyprint lang-html">
<table class="piefull">
<tr>
<td><div>10%</div></td>
...
<td><div>5%</div></td>
</tr>
</table>
</pre>
<p>into this:</p>
<table class="piefull-example piefull">
<tr>
<td><div>10%</div></td>
<td><div>20%</div></td>
<td><div>33%</div></td>
<td><div>20%</div></td>
<td><div>10%</div></td>
</tr>
<tr>
<td><div>18%</div></td>
<td><div>55%</div></td>
<td><div>33%</div></td>
<td><div>23%</div></td>
<td><div>12%</div></td>
</tr>
<tr>
<td><div>14%</div></td>
<td><div>35%</div></td>
<td><div>40%</div></td>
<td><div>21%</div></td>
<td><div>11%</div></td>
</tr>
<tr>
<td><div>12%</div></td>
<td><div>29%</div></td>
<td><div>14%</div></td>
<td><div>11%</div></td>
<td><div>5%</div></td>
</tr>
<tr>
<td><div>12%</div></td>
<td><div>17%</div></td>
<td><div>10%</div></td>
<td><div>9%</div></td>
<td><div>5%</div></td>
</tr>
</table>
<p>by doing this:</p>
<pre class="prettyprint lang-html">
<script>
window.onload = (function() { piefull.main("table.piefull td div"); });
</script>
</pre>
<p>where <code>'table.piefull td div'</code> is a selector passed to <code>querySelectorAll()</code> to locate elements which will be replaced by little canvas piecharts. The contents of the selected elements are matched against a regular expression looking for a percentage value to extract (generally speaking, the first number), and the element is replaced with a canvas element displaying that value. The <code>class</code>es and <code>id</code> of the original element are preserved in the new element, allowing sensible CSS styling, and the canvas <code>title</code> takes on the text which it has replaced. As well as the selector, there are a (small) number of other parameters - size, 'yes' and 'no' colours. A value of e.g. 10 will display a 10% pie-segment in the 'yes' colour - the remainder will be the 'no' colour. (Like this: <span class="piefull">10%</span>.) These are all optional - even the selector defaults to <code>'.piefull'</code>, which works great for a small number of spans or divs in some prose:</p>
In other news, at least <span class="piefull">66%</span> of statistics are made up. No, wait - it should be a little higher now.
<p><em>Note:</em> If you're viewing this on IE8 or below, this probably makes no sense, as I've not included excanvas here. It is supported for IE8 with <a href="http://code.google.com/p/explorercanvas/">excanvas</a> (but not less than IE8). One gotcha with IE is that block-level elements such as canvas don't work inside <code>&lt;p&gt;</code> elements. But you know you want to get a better browser... And why not get one <a href="http://www.mozilla.com/firefox/">with</a> <a href="http://www.google.com/chrome">webgl</a> <a href="http://www.khronos.org/webgl/wiki/Getting_a_WebGL_Implementation">support</a> while you're at it?</p>
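<p>For illustration, piefull's <code>PERCENT_RE</code> transliterates directly into Python - an optional sign, digits, an optional fractional part and exponent, optionally followed by '%':</p>

```python
import re

# Transliteration of piefull's PERCENT_RE (illustrative, not part of piefull)
PERCENT_RE = re.compile(r'(-?\d+(?:\.\d*)?(?:e[+\-]?\d+)?)%?', re.IGNORECASE)

def extract_percent(text):
    """Return the first percentage-like number found in text, or None."""
    m = PERCENT_RE.search(text)
    return float(m.group(1)) if m else None
```

<p>Anything without a recognisable number is simply left alone, which is why elements whose content doesn't match keep their original text.</p>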
<p>The code for <strong><code>piefull</code></strong> can be found <a href="http://code.codedstructure.net/piefull">here</a></p>Ben Basshttp://www.blogger.com/profile/02283998269313171389noreply@blogger.com0tag:blogger.com,1999:blog-3029438816324283001.post-9586705654894173702011-04-06T23:37:00.001+01:002011-04-06T23:39:45.025+01:00xmlcmd: adding an --xml option, one command at a time<p>In my <a href="http://blog.codedstructure.net/2011/03/xpath-is-to-xml-as-regex-is-to-text.html">last post</a>, I wrote some thoughts on how using the text based meta-language<a href="#ref_1">[1]</a> of regular expressions to filter and manipulate structured data from UNIX commands was not fully exploiting the jigsaw-puzzle approach of 'the unix philosophy'<a href="#ref_2">[2]</a>, and that XPath (and by implication XML) provided an alternative where structured data on the command line was concerned.<a href="#ref_3">[3]</a></p>
<p>I also mentioned how great things could be if, like subversion, every POSIX command line tool had an --xml switch which could output XML. (There are many programs with XML output, but the main POSIX ones<a href="#ref_4">[4]</a> don't have this as an option.)</p>
<h3>Here's one I made earlier</h3>
<p>I was always aware of the danger of overstating the case, but sometimes that can be helpful. Or at least fun. And I'd already started prototyping something which looked fun, dangerous, and potentially useful. This is intended to be illustrative rather than a serious suggestion, but there might be many other cases where the concepts can be used more seriously.</p>
<h3>1. Add a path</h3>
<p>There isn't any magic to what we're doing in adding --xml options, and we're not touching the original programs. We're just using the fact that the PATH in POSIX operating systems contains an <em>ordered list</em> of entries, and we're simply inserting a 'hook' early on in the path which can catch and redirect certain formats of command, while transparently forwarding others.
</p>
<p>I tend to have a ~/bin directory on my path anyway (keeping good care that it is only writable by myself) - so I'm already set, but if not, you'll need a directory which appears first on the PATH.
<pre class="prettyprint lang-sh">
ben$ mkdir -p ~/bin
</pre>
add that path to the start of your login path (e.g. in .bashrc or .bash_profile):
<pre class="prettyprint lang-sh">
export PATH=$HOME/bin:$PATH
</pre>
Once that is done, anything in that directory will be run in preference to anything else. Put an '<code>ls</code>' file in there, something like the following:
<pre class="prettyprint">
#!/usr/bin/env python
print("These are not the files you're looking for")
</pre>
make it executable (<code>chmod +x ~/bin/ls</code>) and you won't be able to run '<code>ls</code>' anymore. Except you <em>are</em> running it, it's just a different <code>ls</code>, and not doing anything particularly helpful. You can always run the original ls with a fully specified path (or try using <code>$(whereis ls)</code>).
</p>
<p>
Two more things make this potentially useful:
<ul>
<li>Finding the next program on the PATH, which would have been run if something else hadn't sneaked in first</li>
<li>Selectively running either this 'original' program or some different code based on relevant criteria (e.g. existence of <code>--xml</code> in the command line options)</li>
</ul>
and the following makes things practical:
<ul>
<li>Making the two things above easily reusable for any command.</li>
</ul>
</p>
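<p>Neither of those steps needs anything heavyweight. Here's a minimal sketch of the first - finding the next match on the PATH - as a hypothetical stand-in for the PyPI <code>which</code> package used below (the <code>next_on_path</code> name and behaviour are my own, not part of xmlcmd):</p>

```python
import os

def next_on_path(cmd):
    """Return the second match for `cmd` on $PATH - i.e. the program
    which would have run if our shadowing copy weren't there."""
    matches = []
    for d in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(d, cmd)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    # matches[0] is (presumably) our ~/bin shadow; matches[1] the original
    return matches[1] if len(matches) > 1 else None
```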
<h3>2. The magic of args[0]</h3>
<p>Most of the time, programs ignore args[0] - the program's own name. But what if args[0] could be treated as a command line option, just like all the others? What makes this possible is having multiple symbolic links to a single program. args[0] is then the name of the symlink by which the process was invoked, so although the same program is ultimately running, it can determine how it was called and change its own operation accordingly. This technique is used in the <a href="http://www.busybox.net/about.html">busybox</a> project to implement a generous number of commands in a single executable.</p>
<h3>3. Introducing xmlcmd</h3>
<p><a href="http://pypi.python.org/pypi/xmlcmd">xmlcmd</a> is a Python package which supports this terrible corruption of POSIX as it should always be. The main xmlcmd module code is fairly straightforward, and is shown below. This finds the original program (which would have been run if we weren't first on the path), and then either <code>exec</code>s that (if no --xml option is provided), or runs some Python code in a dynamically imported Python module (_xyz from the xmlcmd package, where xyz is the 'original' command name) if --xml is present.</p>
<p>
<pre class="prettyprint">
#!/usr/bin/python
"""
xmlcmd.py - support for adding an --xml option to various commands
Ben Bass 2011. (Public Domain)
"""
import sys
import os
import which  # from PyPI 'which' package

def process_cmd(cmd_name, args, orig_cmd_path):
    """
    import and call the main() function from the module
    xmlcmd._{cmd}
    """
    module = __import__('xmlcmd._%s' % cmd_name,
                        fromlist=['_%s' % cmd_name])
    raise SystemExit(module.main(args, orig_cmd_path))

def main(args=None):
    """
    run system command from sys.argv[:], where sys.argv[0]
    implies the real command to run (e.g. via symlinks to us)
    """
    if args is None:
        args = sys.argv
    # args[0] will be a full path - we only want the command name
    cmd_name = os.path.basename(args[0])
    if cmd_name.startswith('xmlcmd'):
        raise SystemExit('xmlcmd should not be called directly')
    # get the command which would have run if we hadn't sneaked
    # ahead of it in the $PATH
    cmd_path_gen = which.whichgen(cmd_name)
    cmd_path_gen.next()  # skip first match (us)
    orig_cmd_path = cmd_path_gen.next()
    if '--xml' in args:
        args.remove('--xml')
        # forward to our xmlized version...
        process_cmd(cmd_name, args, orig_cmd_path)
    else:
        # execv *replaces* this process, so it has no idea it
        # wasn't called directly. Total transparency.
        os.execv(orig_cmd_path, args)

if __name__ == '__main__':
    main()
</pre>
</p>
<h3>4. The implementations</h3>
<p>The real work is all handled in the _{cmd} modules of course, so admittedly we've really only moved the problem around a bit. But the point of this exercise is about the ease with which we can add these new entry points into existing systems. Nothing slows down in any noticeable way, and it would be easy to extend an entire class of commands, one at a time, by nothing more than adding a Python module and creating a symlink.</p>
<p>For reference, the <code>main()</code> function from _ls.py looks something like this:
<pre class="prettyprint">
import os
import sys
from xml.etree import ElementTree as ET

def main(args=None, orig_cmd_path=None):
    """very basic xml directory listing"""
    if len(args) > 1:
        target_dir = args[-1]
        if not os.path.isdir(target_dir):
            raise SystemExit('%s is not a directory' % (target_dir,))
    else:
        target_dir = os.getcwd()
    root = ET.Element('directory', name=target_dir)
    for fn in os.listdir(target_dir):
        stat = os.stat(os.path.join(target_dir, fn))
        f_el = ET.SubElement(root, 'file', mtime=str(stat.st_mtime))
        ET.SubElement(f_el, 'name').text = fn
        ET.SubElement(f_el, 'size').text = str(stat.st_size)
    ET.ElementTree(root).write(sys.stdout, 'utf-8')
    sys.stdout.write('\n')
</pre></p>
<h3>5. Example</h3>
<p>
<pre class="prettyprint lang-sh">
ben$ sudo pip install which xmlcmd
</pre>
(yup, it's on <a href="http://pypi.python.org/pypi/xmlcmd">PyPI</a>) will install the xmlcmd Python package (and the 'which' dependency), and an xmlcmd wrapper script which should end up on the path. With that done, you can now create the magic symlinks:
<pre class="prettyprint lang-sh">
ben$ ln -sf $(which xmlcmd) ~/bin/ls
ben$ ln -sf $(which xmlcmd) ~/bin/ps
</pre>
And now, assuming things are working properly (a quick <code>hash -r</code>/<code>rehash</code> can't hurt), you should be able to do wonderful things like this:
<pre class="prettyprint lang-sh">
ben$ ps --xml aux | xpath '//process/command/text()[../../cpu > 2.5]'
</pre>
which in this case displays the command name of all processes currently taking more than 2.5% of the CPU. Sure, the XPath isn't exactly elegant. But the point is that <em>patterns</em> of this micro-language would be shared between tasks, and manipulating structured data on the UNIX command line would become as easy as text manipulation is now.</p>
<h3>Here's some they made earlier...</h3>
<p>Having said and done all that, a few searches later (for 'posix commands' in this case) brought up <a href="http://www.xmlsh.org">xmlsh.org</a>, which seems to do some very similar things.</p>
<p>I also found (via <a href="#ref_2">[2]</a>) <a href="http://xmltk.sourceforge.net/">xmltk</a>, which at first glance seems to have beaten me to these ideas by about 9 years... :-)</p>
<h4>Notes</h4>
<dl><dt><a name="ref_1">[1]</a></dt><dd>'Regular expressions are notations for describing patterns of text and, in effect, make up a special-purpose language for pattern matching.' Brian Kernighan, Beautiful Code (ed. Andy Oram & Greg Wilson, O'Reilly Media Inc).</dd>
<dt><a name="ref_2">[2]</a></dt><dd><a href="http://www.catb.org/~esr/writings/taoup/">The Art of Unix Programming</a>, Eric S. Raymond. Especially the 'Rule of Composition'; see Chapter 1. (Note this book also praises text of course...)</dd>
<dt><a name="ref_3">[3]</a></dt><dd>What a pointlessly long sentence.</dd>
<dt><a name="ref_4">[4]</a></dt><dd>POSIX 2 (Commands and Utilities) covers these, e.g. see reference <a href="http://refspecs.freestandards.org/LSB_3.2.0/LSB-Core-generic/LSB-Core-generic/command.html">here</a></dd>
</dl>Ben Basshttp://www.blogger.com/profile/02283998269313171389noreply@blogger.com1tag:blogger.com,1999:blog-3029438816324283001.post-82505223438405030342011-03-21T21:29:00.001+00:002011-03-21T21:40:44.572+00:00XPath is to XML as regex is to text<p>Anyone who has been a developer for a while gets familiar with <i>regular expressions</i>. They eat text for breakfast, and spit out desired answers. For all their cryptic terseness, they are at least in part reusable, and are based on only a handful (or two...) of rules. 'regexes' are a domain-specific micro-language for searching and filtering text.</p>
<p>But once we get outside of text, what is there?</p>
<p>With XML, we have XPath. I had one of those light-bulb realisations recently that what regexes are to text, XPath is to XML. And it made me think:</p>
<p><em>Why would I want to use a data substrate which doesn't have such a tool?</em></p>
<p>What I mean is this; text has regex. XML has XPath. RDBMS have SQL. Markup language of the day has... oh, it doesn't. Not really, in the sense of a standardised domain-specific micro-language. Regular expressions, XPath and SQL have history and mindshare. They 'work' specifically because they are DSLs, rather than high-level code. (OK, SQL is pushing it further than I'd like here, but it's still a DSL. Just not a micro-sized one.) To me, this is a problem which many 'NoSQL' tools have. I want the features of them, but CouchDB wants me to write map-reduce functions in JavaScript. MongoDB wants me to use a JSON-based query language. There is no commonality; no reuse; no <i>lingua franca</i> which will let me abstract the processing concepts away from the tools. Perhaps that will come in time for more data-representations (<a href="http://mypatterns.free.fr/">this</a> seems to be an attempt for JSON, for example), but there is a significant barrier before such a tool gains widespread acceptance as a common abstraction across an entire data layer.</p>
<h3>Pipelines and Data Processing</h3>
<p>The 'UNIX philosophy' of connecting together a number of single-purpose programs to accomplish larger tasks is one of the keys to its power. These tools can be plugged together in ways which the original creators may never have thought of. Tools such as sed and awk are often employed as regex-based filters to command pipelines. I wish more tools had XML output options, because the tools we use in our pipelines often output <em>structured</em> data in textual format, often in tabular form. Tables are great for human consumption (provided they are modest in size), but when we start getting empty columns, cells flowing onto multiple lines, and other inconsistencies, it becomes a pain to parse. How great things could be if every tool followed subversion's lead and had an --xml option:</p>
<p><pre>svn diff -r $(svn log --stop-on-copy --xml | xpath -q -e '//log/logentry[last()]/@revision' | cut -d '"' -f 2):HEAD</pre>
(This command does a diff from a branch base to the most recent revision. It still does some basic text processing, because the end result of an XPath expression is still a text node.)</p>
<p>Just imagine if POSIX defined an XML schema for each relevant command, and mandated an --xml option. Life would be so much easier. In many environments, data is structured but we still represent it as text. The pipeline philosophy might be nice, but it isn't exploited to the full when we need to write convoluted awk scripts and inscrutable regular expressions (or worse, Perl ;) ) to try and untangle the structure from the text. Consider something straightforward like the output of 'mount' on a *nix box. On my Mac it looks like this:</p>
<pre>
ben$ mount
/dev/disk0s2 on / (hfs, local, journaled)
devfs on /dev (devfs, local, nobrowse)
map -hosts on /net (autofs, nosuid, automounted, nobrowse)
map auto_home on /home (autofs, automounted, nobrowse)
/dev/disk1s1 on /Volumes/My Book (msdos, local, nodev, nosuid, noowners)
</pre>
<p>This is structured data, but getting the information out of that text blob would not be trivial, and would probably take many minutes of trial and error with regexes to get something reasonable. And the crucial thing is that you couldn't be sure it would always work. Plug a new device in which gets mounted in some new and interesting way, and who is to say that the new output of mount won't suddenly break your hand-crafted regex? That's where XML shines. Adding new information doesn't change anything in the old information. The way to access it doesn't change. Nothing breaks in the face of extension. Compare this to something like CSV, where the insertion of an extra column means all the indices from that column onwards need to change in every producer and consumer of the data.</p>
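<p>To make the fragility concrete, here's the sort of hand-crafted parsing that text output forces on us (a quick sketch of my own - the regex and field names are assumptions, not anything mount guarantees):</p>

```python
import re

# '<device> on <mountpoint> (<opt>, <opt>, ...)' - or so we hope...
MOUNT_RE = re.compile(r'^(?P<device>.+?) on (?P<mountpoint>.+?) \((?P<options>[^)]*)\)$')

def parse_mount_line(line):
    m = MOUNT_RE.match(line)
    if m is None:
        return None  # ...and here is where things quietly fall apart
    return {'device': m.group('device'),
            'mountpoint': m.group('mountpoint'),
            'options': [o.strip() for o in m.group('options').split(',')]}

parse_mount_line('/dev/disk0s2 on / (hfs, local, journaled)')
# {'device': '/dev/disk0s2', 'mountpoint': '/', 'options': ['hfs', 'local', 'journaled']}
```

<p>A device name containing ' on ' would be silently mis-parsed, and any unexpected line shape just returns <code>None</code> - exactly the brittleness an --xml option would remove.</p>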
<h3>XML and the Web</h3>
<p>I'm somewhat saddened that XHTML didn't win outright in the last decade, and that XML on the web never really took off. I spent months at a previous job trying to convince everyone that 'XML-over-HTTP' was the best thing since sliced bread. A single source of data, which could be consumed by man (via XSLT & CSS in the browser) and machine alike. Just think how much energy the likes of Google could save if our web content didn't focus almost entirely on human consumption and discriminate against machines ;-)</p>
<p>One interesting thing which has happened as XML-on-the-web has declined is the increase in use of CSS selectors, first via frameworks such as Sizzle (used in jQuery), and later in the standard <code>querySelectorAll</code> DOM method. There is clearly a need for these DSL micro-languages, and as the 'CSS selector' DSL shows, they can quickly establish themselves if there is a clear need and sufficient backing from the community. Also apparent is that existing solutions can be usurped - users could do virtually everything CSS selectors could do (and far more besides) with XPath, but that didn't happen. Simplicity won here. But just because XPath was (arguably) wrong for Web development, doesn't mean it is wrong everywhere, and I contend that there are places where we have over-simplified, forcing regular expressions and text manipulation to (and beyond) breaking point, when XML processing would make things simpler everywhere.</p>
<h3>Conclusion</h3>
<p>In terms of practicalities, if I had ever spent too long in the world of Java, I would probably see XML as an unwelcome and persistent pest. But living in the happier climes of Python-ville, I have access to the wonderful ElementTree API, via both <a href="http://docs.python.org/library/xml.etree.elementtree.html">ElementTree</a> itself (included in the standard library) and <a href="http://lxml.de/">lxml</a>. Both of these support XPath as well as high-level abstractions of XML documents to and from lists and dictionaries. With ElementTree, XML access from Python is (almost) as easy as JSON access from JavaScript. And with technologies like XPath and XSLT available, I think it's worth it.</p>
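<p>A small self-contained illustration of that claim (the document below is invented, shaped like a simple directory listing):</p>

```python
from xml.etree import ElementTree as ET

doc = ET.fromstring(
    '<directory name="/tmp">'
    '<file><name>a.txt</name><size>12</size></file>'
    '<file><name>b.txt</name><size>3400</size></file>'
    '</directory>')

# element access reads like the document structure
names = [f.findtext('name') for f in doc.findall('file')]   # ['a.txt', 'b.txt']
big_files = [f.findtext('name') for f in doc.findall('file')
             if int(f.findtext('size')) > 100]              # ['b.txt']
```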
<p>As a final thought, I've just had a quick glance through Greg Wilson's excellent <a href="http://www.amazon.co.uk/gp/product/0974514071/ref=as_li_ss_tl?ie=UTF8&tag=codedstruc-21&linkCode=as2&camp=1634&creative=19450&creativeASIN=0974514071">Data Crunching</a><img src="http://www.assoc-amazon.co.uk/e/ir?t=&l=as2&o=2&a=0974514071" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" />, which contains chapters on Text, Regular Expressions, Binary data (rather a short ad-hoc chapter), XML, and Relational Databases. Perhaps the 'binary data' chapter is short because there simply aren't many patterns available. There is no language to describe unabstracted data. And perhaps when we consider the data layer we should be using, we should think not only of features and performance, but also the power, expressiveness, and concision of the languages available to reason about the information. Perhaps too often we settle for a lowest common denominator solution (text) when a higher level one might be more powerful, especially if we don't have to give up on the concepts of fine-grained interoperability which micro-DSLs such as XPath give us.</p>
<p><em>To be continued...</em></p>Ben Basshttp://www.blogger.com/profile/02283998269313171389noreply@blogger.com3tag:blogger.com,1999:blog-3029438816324283001.post-12104550871212951112011-02-20T00:16:00.000+00:002011-02-20T00:16:06.693+00:00Concurrent Queue.get() with timeouts eats CPU<h4>...or how adding a timeout can make your program suffer</h4>
<p>
Call me lazy, but I like threads. Or at least I like the programming model they provide. I very rarely use explicit locks, and find the combination of threads and queues a great mental abstraction of parallel processing. Often though, the abstraction is so <a href="http://www.joelonsoftware.com/articles/LeakyAbstractions.html">leaky</a> that it gets me annoyed. Here is a case in point...
</p>
<p>I noticed this problem in a Python process which sits in a loop doing <code>readline()</code> on a file object and dispatching the incoming lines to different worker threads to do some various asynchronous actions. With no input on the source file, the process was still taking 5% CPU. I would have expected next-to-nothing, since everything should have been blocking.</p>
<p><code>strace -fc -p $PID</code> showed that the process was anything but idle though, and after further investigation, I found the culprit.
</p>
<p>
<b><i>
Concurrent <code>Queue.get()</code> with timeouts eats CPU.
</i></b>
</p>
<p>A test case for this is the following Python (2 & 3) code. It intentionally doesn't do anything, simply starting <code><i>WORKERS</i></code> threads, each of which performs a blocking <code>Queue.get</code>. The main thread simply waits for a newline on stdin. I wouldn't expect this to take any significant CPU time - in theory all the threads are blocked - either waiting on stdin input, or on something to be available in the various worker queues (which nothing ever gets sent to).</p>
<pre class="prettyprint lang-py">
import threading, sys, time

try: import queue
except ImportError: import Queue as queue

WORKERS = 100

class Worker(threading.Thread):
    def __init__(self):
        self.queue = queue.Queue()
        threading.Thread.__init__(self)
        self.daemon = True

    def run(self):
        while True:
            next_item = self.queue.get()
            print(next_item)

def test():
    w_set = set()
    for i in range(WORKERS):
        new_w = Worker()
        new_w.start()
        w_set.add(new_w)
    print('Running: Press Enter to finish')
    sys.stdin.readline()

if __name__ == '__main__':
    test()
</pre>
<p>
Sure enough, running and monitoring this shows 0% CPU usage, but WORKERS+1 threads in use (I'm using OS X's Activity Monitor at the moment).
</p>
<p>
But let's suppose we want to change the worker threads to wake up occasionally to do some background activity. No problem: provide a <code>timeout</code> on the <code>Queue.get()</code>:
</p>
<pre class="prettyprint lang-py">
class TimeoutWorker(Worker):
    def run(self):
        while True:
            try:
                next_item = self.queue.get(<i style="font-weight:bold">timeout=1</i>)
            except queue.Empty:
                # do whatever background check needs doing
                pass
            else:
                print(next_item)
</pre>
<p>
OK, so now the threads can wake up occasionally and perform whatever activity they want.
</p>
<h4>Except...</h4>
<p>
CPU usage just went up from ~0% to 10%. Increasing <code>WORKERS</code> shows that the CPU load of this program which still does nothing (the queues never get anything put in them) is proportional to the number of threads (95% at 1000 worker threads). I'm not inclined to look further than assuming this is some artifact of the GIL (pthread activity seems to be the culprit).
</p>
<p>
This is fairly independent of the length of the timeout. For very short timeouts, I'd expect CPU usage to go up, as the worker thread is spending more time doing work rather than being blocked. But there is no noticeable difference between <code>timeout=10</code> and <code>timeout=sys.maxint</code>. In the latter case, the <code>get()</code> is never plausibly going to timeout, but the same high-CPU behaviour still occurs.
</p>
<h4>Fixing the code</h4>
<p>I'm not inclined to delve deep into CPython to look at what <code>Queue.get()</code> is doing under the hood. It's clearly something very different depending on whether it has a timeout or not. For now I'm content to fix the code to eliminate the situations where these problems can occur. Hopefully the fact that I've written this will keep me aware of this potential issue and I'll manage to avoid it in future :)</p>
<p>The code where I found this issue was using a 1 second timeout to continually check the while condition and exit if required. This was easily fixed by sending a poison-pill of None into the queue rather than setting a flag on the thread instance, and checking for this once we've got a next_item. This is cleaner anyway, allowing immediate thread termination and the use of timeout-less <code>get()</code>. For more complex cases where some background activity is required in the worker threads, it might make more sense to keep all threads using timeout-less <code>Queue.get()</code>s and have a separate thread sending sentinel values into each queue according to some schedule, which cause the background activity to be run.</p>
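<p>A stand-alone sketch of that poison-pill pattern (the class and queue contents here are invented for illustration - this isn't the code in question):</p>

```python
import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

class PillWorker(threading.Thread):
    """Worker using a blocking, timeout-less get(); None is the poison pill."""
    def __init__(self):
        threading.Thread.__init__(self)
        self.queue = queue.Queue()
        self.results = []

    def run(self):
        while True:
            next_item = self.queue.get()   # blocks without burning CPU
            if next_item is None:          # poison pill - exit immediately
                break
            self.results.append(next_item)

w = PillWorker()
w.start()
w.queue.put('job-1')
w.queue.put(None)   # ask the thread to finish
w.join()            # returns promptly - no timeout polling involved
```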
<h4>Conclusion</h4>
<p>It seems fairly unintuitive that simply adding a timeout to a <code>Queue.get()</code> can totally change the CPU characteristics of a multi-threaded program. Perhaps this could be documented and explained. But then in CPython it seems many threading issues are <a href="http://www.dabeaz.com/GIL/">entirely unintuitive</a>. The scientific part of my brain won't stop thinking threads are wonderful, but the engineering part is becoming increasingly sceptical about threads and enamoured with <a href="http://www.dabeaz.com/coroutines/">coroutines</a>, especially with <a href="http://www.python.org/dev/peps/pep-0380/">PEP 380</a> on the horizon.
</p>Ben Basshttp://www.blogger.com/profile/02283998269313171389noreply@blogger.com5tag:blogger.com,1999:blog-3029438816324283001.post-36038884047551642642011-02-09T22:42:00.000+00:002011-02-09T22:42:05.285+00:00pylibftdi 0.7 - multiple device support<p>pylibftdi has always been about minimalism, which means that if you wanted to do something it didn't support, things got tricky. One of its glaring deficiencies until now was that it only supported a single FTDI device - if you had multiple devices plugged in, it would pick one - seemingly - at random.</p>
<p>With pylibftdi 0.7, that has finally changed, and devices can now be opened by name. Or at least by serial number, which is nearly as good. A new example script (which I've just remembered is hideously raw and lacks any tidying up at all) <code>examples/list_devices.py</code> in the source distribution will enumerate the attached devices, displaying the manufacturer (which should be FTDI in all cases), description, and serial number.</p>
<p>The API has changed slightly to cope with this; whereas previously there was just a single <code>Driver</code> class, now the primary interface is the <code>Device</code> class. <code>Driver</code> still exists, and holds the CDLL reference, as well as supporting device enumeration and providing backwards compatibility.</p>
<p>(As an aside, using ftdi_usb_find_all was (not) fun - it sets a pointer to pointer which is then used to traverse a linked list. Trivial in C, an hour of frustration in ctypes. Anyway, I got there in the end).</p>
<p><code><pre class="prettyprint">
>>> from pylibftdi import Device
>>> import time
>>>
>>> # make some noise
>>> with Device('FTE4FFVQ', mode='b') as midi_dev:
...     midi_dev.baudrate = 31250
...     for count in range(3):
...         midi_dev.write(b'\x90\x3f\x7f')
...         time.sleep(0.5)
...         midi_dev.write(b'\x90\x3f\x00')
...         time.sleep(0.25)
...
</pre></code></p>
<p>Both <code>Device()</code> and <code>BitBangDevice()</code> take <code>device_id</code> as the (optional) first parameter to select the target device. If porting from an earlier version, one of the first changes is probably to use named parameters for options when instantiating these classes. My intention is that <code>device_id</code> will always be the first parameter, but the order and number of subsequent parameters could change.</p>
<p>Another change is that Devices are now opened implicitly on instantiation unless told not to (see the docstrings). Previously the Driver class only opened automatically when used as a context manager. There is no harm in opening devices multiple times though - subsequent <code>open()</code>s have no effect.</p>
<p>I've also finally figured out that I need to set long_description in setup.py to get documentation to appear on the PyPI front page. After all, without docs, <a href="http://lukeplant.me.uk/blog/posts/docs-or-it-doesnt-exist/">it doesn't exist</a>.
</p>
<p>It's only been a few days since 0.6, but I wanted to get this release out - I think it is a big improvement since 0.5, and it'll probably be a while till the next release. In the meantime, I'll try and get a vaguely useful example going - which will probably involve MIDI and an LCD...</p>Ben Basshttp://www.blogger.com/profile/02283998269313171389noreply@blogger.com0tag:blogger.com,1999:blog-3029438816324283001.post-15938673746259775122011-02-06T20:09:00.000+00:002011-02-06T20:09:52.761+00:00pylibftdi 0.6 released: now with Python 3 goodness<p><a href="http://pypi.python.org/pypi/pylibftdi/"><b>pylibftdi 0.6</b></a> has been out the door and onto PyPI for the last few days, but I'm only just getting round to blogging about it. It's basically some minor work for Python 3 compatibility - the same code now works on both Python 2 (2.6/2.7) and Python 3. This means support for Python 2.5 has been dropped (due to use of <code>bytearray</code>/<code>bytes</code> types). I can always add it back in if people shout.</p>
<p>Other than trivially fixing a print statement to be a function call, the main change required was the expected bytes/string issue. The driver also gains a couple of parameters: <code>mode = 't'</code> ('t':text, 'b':binary) and <code>encoding = 'latin1'</code>.</p>
<p>In binary mode (the default - so no user code changes are required for this release), read() and write() take and return instances of type <code>bytes</code>. For text mode, <code>write()</code> will take either <code>bytes</code>/<code>bytearray</code>, or a string which it will encode with the given driver encoding, and <code>read()</code> will return a string. I've set the default to be <code>latin1</code> rather than using <code>utf-8</code> as it is an equivalence mapping over the first 256 code points.</p>
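<p>That 'equivalence mapping' is easy to verify - latin1 maps byte value <i>n</i> to code point <i>n</i>, so any byte sequence survives a decode/encode round trip:</p>

```python
data = bytes(bytearray(range(256)))   # all 256 byte values; works on Python 2 and 3
text = data.decode('latin1')
assert all(ord(ch) == n for n, ch in enumerate(text))   # byte n -> code point n
assert text.encode('latin1') == data                    # lossless round trip
```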
<h4>Coming soon...</h4>
<p>I've started work on 0.7 - the main feature of which is support for multiple devices. I had a few problems getting the right ctypes incantations to follow the linked-list which ftdi_usb_find_all sets, but that's sorted now. The bigger issue is that it really needs a split between driver and device, which could cause the API to change. I'm thinking of various ways to keep existing code working, and will probably go for something like:
<ul><li><b>0.7</b> - set pylibftdi.SUPPORT_MULTIPLE to True to use new API / support multiple devices</li>
<li><b>0.8</b> - set pylibftdi.SUPPORT_MULTIPLE to False to use old API / only support a single device / get a deprecation warning</li>
<li><b>0.9</b> - SUPPORT_MULTIPLE no longer used; old API disappears.</li>
</ul>
</p>
<p>
So <b>0.7</b> is all about multiple device support, <b>0.8</b> will probably be support for Windows (supporting D2XX, for example), and <b>0.9</b> (or maybe just <b>1.0</b>) will be a tidy-up / bug-fix / improve docs release. In parallel with all of this I'm writing some test code which will gradually bring this side of things up to standard. I'm not allowing myself to do a 1.0 release without decent testing & docs. All that will probably take two months; I only get a couple of hours a week looking at this. But it could be sooner - or later.</p>
<p>pylibftdi 0.7 should be out within a week or so, and I'll elaborate more then, hence the lack of any examples here. I'm on the case!
</p>Ben Basshttp://www.blogger.com/profile/02283998269313171389noreply@blogger.com0tag:blogger.com,1999:blog-3029438816324283001.post-21192324205629532232010-12-31T17:28:00.000+00:002010-12-31T17:28:19.823+00:00HTTP 'streaming' from Python generators<p>
One of the more annoying things about HTTP is that it wants to send things in complete chunks: you ask for an object with a particular URL, and some point later you get that object. There isn't a lot you can do (at least from JavaScript) until the complete resource has loaded.
For more fine-grained control, it's a bit annoying.</p>
<p>Of course web sockets will solve all of this (<a href="http://hacks.mozilla.org/2010/12/websockets-disabled-in-firefox-4/">maybe</a>) once the spec has gone through the mill a few more times. But in the meantime there is often an impedance mismatch between how we'd like to be able to do things on the server side and how we are forced to do them because of the way HTTP works.
</p>
<p>
The following is an example of one way to manage splitting things up. It allows Python generators to be used on the server side and sends an update to the client on every yield, with the client doing <a href="http://en.wikipedia.org/wiki/Comet_(programming)">long-polling</a> to get the data. This shouldn't be confused with CherryPy's support for using <code>yield</code> to stream a single response to the client (<a href="http://www.cherrypy.org/wiki/ReturnVsYield">which is discouraged</a>) - the yield functionality is hijacked for other purposes if the method decorator is applied. Also note that this is only of any help for clients which can use AJAX to repeatedly poll the generator.
</p>
<h4>Example</h4>
<p>
Let's suppose that we want to generate a large (or infinite) volume of data and send it to a web client. It could be a long text document served line-by-line. But let's use the sequence of prime numbers (because <a href="http://www.imdb.com/title/tt0118884/">that's good enough for aliens</a>). We want to send it to the client, and have it processed as it arrives. The principle is to use a generator on the server side rather than a basic request function, but wrap that in something which translates the generator into a sequence of responses, each serving one chunk of the response.</p>
<p>
Server implementation using CherryPy - note the json_yield decorator.
</p>
<pre class="prettyprint">
import cherrypy

class PrimeGen(object):
    @cherrypy.expose
    def index(self):
        return INDEX_HTML  # see below

    @cherrypy.expose
    @json_yield  # see below
    def prime(self):
        # this isn't supposed to be efficient.
        probe = 2
        while True:
            for i in range(2, probe):
                if probe % i == 0:
                    break
            else:
                yield probe
            probe += 1

cherrypy.quickstart(PrimeGen())
</pre>
<p>
The thing which turns this generator into something usable with long-polling is the following 'json_yield' decorator.
</p>
<p>
Because we might want more than one such generator on the server (not to mention generator instances from multiple clients), we need a key - passed in from the client - which associates a particular client with its generator instance. This isn't really handled in this example; see the source file download at the end of the post for that.
</p>
<p>
The major win is that the client doesn't have to store a 'next-index' or anything else. State is stored implicitly in the Python generator on the server side. Both client and server code should be simpler. Of course this goes against <a href="http://en.wikipedia.org/wiki/Representational_State_Transfer">REST</a> principles, where one of the fundamental tenets is that state should be stored only on the client. But there is a place for everything.
</p>
<pre class="prettyprint">
import functools
import json

def json_yield(fn):
    # each application of this decorator has its own id
    json_yield._fn_id += 1
    # put it into the local scope so our internal function
    # can use it properly
    fn_id = json_yield._fn_id

    @functools.wraps(fn)
    def _(self, key, *o, **k):
        """
        key should be unique to a session.
        Multiple overlapping calls with the same
        key should not happen (will result in
        ValueError: generator already executing)
        """
        # create generator if it hasn't already been
        if (fn_id, key) not in json_yield._gen_dict:
            new_gen = fn(self, *o, **k)
            json_yield._gen_dict[(fn_id, key)] = new_gen
        # get next result from generator
        try:
            # get, assuming there is more.
            gen = json_yield._gen_dict[(fn_id, key)]
            content = gen.next()
            # send it
            return json.dumps({'state': 'ready',
                               'content': content})
        except StopIteration:
            # remove the generator object
            del json_yield._gen_dict[(fn_id, key)]
            # signal we are finished.
            return json.dumps({'state': 'done',
                               'content': None})
    return _

# some function data...
json_yield._gen_dict = {}
json_yield._fn_id = 0
</pre>
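<p>
As a sanity check, the decorator's behaviour can be exercised outside CherryPy entirely. Here is a hypothetical standalone sketch - a condensed restatement of json_yield rewritten for Python 3 (so <code>next(gen)</code> replaces <code>gen.next()</code>), with a made-up <code>Countdown</code> class and key:
</p>

```python
import functools
import json

# Condensed Python 3 restatement of json_yield, purely for demonstration -
# the CherryPy version above is the real thing.
def json_yield(fn):
    json_yield._fn_id += 1
    fn_id = json_yield._fn_id

    @functools.wraps(fn)
    def _(self, key, *o, **k):
        # create the generator on first call with this key
        if (fn_id, key) not in json_yield._gen_dict:
            json_yield._gen_dict[(fn_id, key)] = fn(self, *o, **k)
        try:
            content = next(json_yield._gen_dict[(fn_id, key)])
            return json.dumps({'state': 'ready', 'content': content})
        except StopIteration:
            del json_yield._gen_dict[(fn_id, key)]
            return json.dumps({'state': 'done', 'content': None})
    return _

json_yield._gen_dict = {}
json_yield._fn_id = 0

class Countdown(object):
    @json_yield
    def count(self):
        for n in (3, 2, 1):
            yield n

c = Countdown()
for _ in range(4):
    print(c.count(key='demo'))
# {"state": "ready", "content": 3}
# {"state": "ready", "content": 2}
# {"state": "ready", "content": 1}
# {"state": "done", "content": null}
```

<p>
Each call with the same key advances the same generator; once it is exhausted the 'done' state is returned and the generator is discarded.
</p>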
<p>
The HTML to go with this is basic long-polling, and separating out the state from the content. Here I'm using jQuery:
</p>
<pre class="prettyprint">
INDEX_HTML = """
<html>
  <head>
    <script src="http://code.jquery.com/jquery-1.4.4.min.js">
    </script>
    <script>
      $(function() {
        function update() {
          $.getJSON('/prime?key=1', {}, function(data) {
            if (data.state != 'done') {
              $('#status').text(data.content);
              //// alternative for appending:
              // $('#status').append($('<div>'+data.content+'</div>'));
              setTimeout(update, 0);
            }
          });
        }
        update();
      });
    </script>
  </head>
  <body>
    <div id="status"></div>
  </body>
</html>
"""
</pre>
<h4>Uses</h4>
<p>
The example above is contrived, but if the json_yield decorator is extended in a few ways there are plenty of possibilities. Long-running server-side processes can send status information back to the client with minimal hassle. Client-side processing of large text documents can begin before they have finished downloading. One caveat is that each chunk of the data should be semantically meaningful on its own; using this on binary files or XML (which is only valid once the root element is closed) won't give sensible results.
</p>
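<p>
For instance, serving a text document line-by-line satisfies this constraint, since each line is meaningful on its own. A minimal sketch (the document text is a made-up stand-in):
</p>

```python
def document_lines(text):
    """Yield a document one line at a time - each yield would become
    one long-poll response if this were wrapped with json_yield."""
    for line in text.splitlines():
        yield line

doc = "first line\nsecond line\nthird line\n"
print(list(document_lines(doc)))
# ['first line', 'second line', 'third line']
```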
<p>
This could be extended in various directions; the decorator could accumulate the content on the server (rather than leaving that to the client) and send back everything produced so far, or (given finite memory) just a recent portion of it using a length-limited deque. Additional meta-data (e.g. a count of messages so far, or the session key) could be added to the JSON information sent to the client on each poll.
</p>
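<p>
The length-limited deque idea might look something like this - a hypothetical wrapper, not part of json_yield as posted:
</p>

```python
import collections

def accumulate_recent(gen, maxlen=3):
    """Wrap a generator so each step yields the last few values
    produced so far; maxlen bounds server-side memory use."""
    recent = collections.deque(maxlen=maxlen)
    for value in gen:
        recent.append(value)
        yield list(recent)

stepped = accumulate_recent(iter([2, 3, 5, 7, 11]))
print(next(stepped))  # [2]
print(next(stepped))  # [2, 3]
print(next(stepped))  # [2, 3, 5]
print(next(stepped))  # [3, 5, 7]
```

<p>
A generator could be wrapped this way before json_yield sees it, so that each poll returns the recent history rather than only the newest chunk.
</p>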
<p><i><b>Disclaimer:</b> I've not done any research into how others do this, because coding is more fun than research. There are undoubtedly better ways of accomplishing similar goals. In particular there are issues with memory usage and timeouts which aren't handled with json_yield. Also note that the example is obviously silly - it would be much faster to compute the primes on the client.</i></p>
<a href="http://bitbucket.org/codedstructure/json_yield">Download Files</a>Ben Basshttp://www.blogger.com/profile/02283998269313171389noreply@blogger.com1tag:blogger.com,1999:blog-3029438816324283001.post-81009726536331536312010-12-16T22:27:00.000+00:002010-12-16T22:27:18.386+00:00Compiling "Essential Mathematics for Games" sample code on Mac OSX 10.6 Snow Leopard<p>
Before getting into a spot of games programming I thought I would buy a book which covered a few of the relevant topics. Wanting something a bit deeper and not likely to be outdated in the too-near future, I plumped for Van Verth and Bishop's <a href="http://www.amazon.co.uk/gp/product/0123742978?ie=UTF8&tag=codedstruc-21&linkCode=as2&camp=1634&creative=19450&creativeASIN=0123742978">Essential Mathematics for Games and Interactive Applications: A Programmer's Guide</a><img src="http://www.assoc-amazon.co.uk/e/ir?t=codedstruc-21&l=as2&o=2&a=0123742978" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" />[Amazon Associates link]. Seems fairly good so far, mainly because it establishes <i>why</i> a certain approach is most applicable, rather than just telling you 'the' way to do things.
</p>
<p>
Anyway, the point of this blog post is to share some minor modifications required to the code included on the CD (which doesn't seem to be available on-line) to get it building on Mac OS X 10.6 (Snow Leopard). From the documentation it seems the code was tested against OS X 10.4 and 10.5. It fails to build straight off the CD on 10.6, and I didn't find any updates on the website (<a href="http://www.essentialmath.com/">www.essentialmath.com</a>), but perhaps that will change in time.
</p>
<p>
Required changes are:
<ol>
<li>in common/MakefileCommon, change lines 15-17 from:
<pre class="prettyprint">
ifeq ($(PLATFORM), OSX)
CFLAGS_EXT = -fvisibility-inlines-hidden
endif
</pre>
to
<pre class="prettyprint">
ifeq ($(PLATFORM), OSX)
CFLAGS_EXT = -fvisibility-inlines-hidden <strong>-ffriend-injection</strong>
endif
</pre>
</li>
<li>
change the equivalent line 12 CFLAGS_EXT setting in Examples/MakefileExamples in the same way, i.e. adding <code>-ffriend-injection</code>. Presumably this is a requirement of a more recent version of GCC in Snow Leopard.
</li>
<li>In common/Graphics/OGL/IvRendererOGL.cpp line 204, remove the GLvoid in the InitGL definition, i.e. change from
<pre class="prettyprint">
int
IvRendererOGL::InitGL(GLvoid)
</pre>
to
<pre class="prettyprint">
int
IvRendererOGL::InitGL()
</pre>
</li>
</ol>
<p>
To build it (this is all in the relevant README files, as is the basic requirement of having the Mac Developer tools - i.e. XCode - installed):
<pre class="prettyprint">
cd <i><root of directory structure></i>
chmod -R +w * # only required once, files/directories copied from CD are read-only
pushd common
make PLATFORM=OSX
popd
pushd Examples
make PLATFORM=OSX
popd
</pre>
And then the example executables are available under the relevant chapter/section directory as <code>./Example</code>, e.g. <code>Examples/Ch13-Simulation/Simulation-02-Integration/Example</code>. Note that some of the examples seem to segfault unless started from their own directory...
</p>
<p>Anyway, hope this helps someone. Now, on to developing some games...!</p>Ben Basshttp://www.blogger.com/profile/02283998269313171389noreply@blogger.com2tag:blogger.com,1999:blog-3029438816324283001.post-74701480894390016902010-11-25T23:16:00.000+00:002010-11-25T23:16:34.253+00:00pylibftdi updated to 0.5<p>
I've done some tidying up of <a href="http://pypi.python.org/pypi/pylibftdi">pylibftdi</a>: fixing a few bugs, refactoring the pylibftdi package to contain several modules instead of everything dumped in a single __init__.py file, and generally making it a bit cleaner. It even has docstrings for most things now, and a test or two!</p>
<p>
pylibftdi is a simple interface to <a href="http://www.intra2net.com/en/developer/libftdi/">libftdi</a>, which in turn allows access to <a href="http://www.ftdichip.com/">FTDI</a>'s range of USB parallel and serial chips. See <a href="http://codedstructure.blogspot.com/2010/09/announcing-pylibftdi-minimal-pythonic.html">here</a> for more details, but very briefly, serial access is provided by a file-like (read/write) interface with a <code>baudrate</code> property, and parallel access is provided by a pair of properties - <code>direction</code> (data direction register) and <code>port</code> (data IO register).</p>
<p>
I haven't yet added any more examples, hopefully I'll get round to that in the next week or so. I have been using it as a MIDI interface though, which is fun - I'll get an appropriate example of that out in the next version, together with some diagrams / photos etc.</p>Ben Basshttp://www.blogger.com/profile/02283998269313171389noreply@blogger.com2tag:blogger.com,1999:blog-3029438816324283001.post-46041690988533789002010-11-09T23:38:00.001+00:002010-11-09T23:38:31.954+00:00BlogPress, iPod, websiteSo this is the obligatory 'testing' post from BlogPress on my new-ish iPod touch 4th gen. <br /><br />I wish the battery life was better, but the retina display is <b>awesome</b> for reading tiny text, making ebooks surprisingly practical. It's also far too easy to spend lots of money on the app store. One nice thing is that three fingered touch typing is just about doable in landscape mode - not a skill I anticipated learning!<br /><br />I've been spending lots of time recently on my website <a target="_blank" href="http://codedstructure.net">codedstructure.net</a>, and especially on a project for playing with / learning GLSL in the context of WebGL: <a target="_blank" href="http://codedstructure.net/projects/webgl_shader_lab/">WebGL Shader Lab</a>, which I'll write more about next time.<br /><br />If only iOS Safari supported WebGL...<br /><br />- Posted using BlogPress from my iPod<br />Ben Basshttp://www.blogger.com/profile/02283998269313171389noreply@blogger.com0tag:blogger.com,1999:blog-3029438816324283001.post-68435079246246430792010-10-04T22:27:00.000+01:002010-10-04T22:27:59.292+01:00dsPIC33 (and PIC18) programming on my EEEPC at last<p>I've got an EEEPC 701, and although it is annoying in some ways (small screen, limited SSD space), it's great for portability. 
One of my goals when I got it was to be able to use it to help developing various microcontroller projects I've got on the go, and I especially like that it doesn't take up lots of space on my (tiny) desk when I'm hacking around. It works great with <a href="http://arduino.cc">Arduino</a>, but I prefer PICs to be honest. The issue with this is that Microchip (producers of PIC microcontrollers) want everyone to use MPLAB, which only works on Windows. And even if it worked on my Linux-based EEEPC, it probably wouldn't be too great on a 7 inch screen. Anyway, being born before 1980 (but not by much!) I still prefer command lines to IDEs anyway. My first development environment was <a href="http://en.wikipedia.org/wiki/Debug_(command)">debug.com</a> on MS-DOS, and it's never been bettered :-)</p><p>At some point I'll get round to changing the OS (Linux Mint looks like the front runner at the mo...), but for now it's still got the Xandros stock install (though I've removed unionfs).
</p><p>Rather than mess around with getting the (GCC based) Microchip tools compiled on the machine, I'm using wine - both <a href="http://www.microchip.com/stellent/idcplg?IdcService=SS_GET_PAGE&nodeId=1406&dDocName=en010014">C18</a> and <a href="http://www.microchip.com/stellent/idcplg?IdcService=SS_GET_PAGE&nodeId=1406&dDocName=en535363">C30</a> install and work without any problems.
</p><p>For actually burning the image onto the controller, I use the perfectly-working-without-lots-of-hassle <a href="http://www.microchip.com/pickit2">pk2cmd</a> and the brilliant PICkit 2 (which was so successful that Microchip went and <a href="http://www.youtube.com/watch?v=LjfIS65mwn8">broke it</a>).
</p><p>The following gives the commands I use to build and download the target .hex file - I've only got a single .c file (<a href="https://bitbucket.org/codedstructure/picmusic/src/tip/dspic_synth/DacMusic.c">this one</a>) for input at the moment, and haven't even got a makefile together yet. But getting this working took a couple of hours, so this is as much for reference as anything else...
</p>
<pre class="prettyprint"><code>#!/bin/bash
# exit on errors
set -e
C30_BASE=/home/user/.wine/drive_c/Program\ Files/Microchip/MPLAB\ C30/
echo "Building..."
wine "${C30_BASE}/bin/bin/pic30-coff-gcc.exe" -o dac_music.coff -mcpu=33fj64gp802 -Wl,--script "${C30_BASE}/support/dsPIC33F/gld/p33FJ64GP802.gld" DacMusic.c
echo "bin2hex..."
wine "${C30_BASE}/bin/bin/pic30-coff-bin2hex.exe" dac_music.coff
echo "burning..."
pk2cmd -Pdspic33fj64GP802 -Fdac_music.hex -Q -M -R -T
</code></pre>
For reference, I've also got a similar setup for something on the PIC18. I've had this running from fairly soon after I got my EEEPC, and didn't have problems getting it up and running:
<br />
<pre class="prettyprint"><code>#!/bin/bash
# exit on errors
set -e
C18_BASE=/home/user/.wine/drive_c/MCC18
echo "Compiling..."
wine ${C18_BASE}/bin/mcc18-traditional.exe -ml -p=18f252 -k -Oi+ music.c
echo "Linking..."
wine ${C18_BASE}/bin/mplink.exe \\MCC18\\lkr\\18f252i.lkr /l\\MCC18\\lib /aINHX32 music.o
echo "burning..."
pk2cmd -P18f252 -Fa.hex -M -R -T
</code></pre>
<p>At some point I'll turn them into makefiles, but for now - I can program tiny microcontrollers with my almost-as-tiny EEEPC. Which is nice and cosy.</p>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjHMBQHEYk7pT2Zv4e51fmekW2AzGMWDsMmtGrY0PCmmHDU5ZnmQt3xzMdeQi51CPFNJssUB849jMFuwXzLUEHhm2i3MDKIZFTE9Mr2d1bs8obSdg7dn7xPoQZBCmUldBTxluL3pK_TJ7s/s1600/DSCN3429.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjHMBQHEYk7pT2Zv4e51fmekW2AzGMWDsMmtGrY0PCmmHDU5ZnmQt3xzMdeQi51CPFNJssUB849jMFuwXzLUEHhm2i3MDKIZFTE9Mr2d1bs8obSdg7dn7xPoQZBCmUldBTxluL3pK_TJ7s/s400/DSCN3429.JPG" width="400" /></a></div>
<br />
<p>Some time soon I'll actually write a blog post or two about the projects I'm using this with - mostly synthesizers and other various things to do with MIDI.</p>Ben Basshttp://www.blogger.com/profile/02283998269313171389noreply@blogger.com0tag:blogger.com,1999:blog-3029438816324283001.post-19824570919100995142010-09-26T20:46:00.000+01:002010-09-26T20:46:13.052+01:00Non-evil SEO?<p>
I've always thought SEO was evil (in the non-theological sense!), but <a href="http://promotejs.com/">this effort</a> (via <a href="http://twitter.com/rem">@rem</a>) seems well worth supporting. According to my Facebook page my favourite quote is 'It is amazing what you can accomplish if you do not care who gets the credit' - so said Harry Truman. Perhaps amazing is overstating the goal, but maybe it could be paraphrased 'It is amazing what you can make Google do if it is actually helpful and not (directly) self-serving'. I guess we'll find out...
</p>
<a href='https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/Function' title='JavaScript JS Documentation: JS Function apply, JavaScript Function apply, JS Function .apply, JavaScript Function .apply'><img src='http://static.jsconf.us/promotejsvs.png' height='280' width='160' alt='JavaScript JS Documentation: JS Function apply, JavaScript Function apply, JS Function .apply, JavaScript Function .apply'/></a>Ben Basshttp://www.blogger.com/profile/02283998269313171389noreply@blogger.com0