Monday 18 April 2011

piefull - a simple javascript & canvas visualisation tool

OK, so calling piefull a visualisation tool is going a bit over the top, but it is a tool, and it does help with visualisation. It does one thing and one thing only - plot pie-charts indicating a single value. And it's even more restricted than that - the value it plots needs to be a percentage value.

The main use cases for this are things like task completion status (project and outliner applications, test coverage, etc) or resource allocation (disk space, CPU usage).

By choosing contrasting colours (these are configurable in piefull), the the overall outlook can be ascertained by a glance from a distance. By default the charts it generates are fairly small - 24px - which allows them to be used in-line in text, or as entries in tables. The general approach is derived from the sparklines example given in David Flanagan's Canvas Pocket Reference - replacing some textual data with a pictorial equivalent in the hope(!) it will be more quickly understandable. Of course this approach also lends itself well to graceful degradation, as the data is already there in the document itself.

There are plenty of other pie-chart generators - after all it's a fairly trivial thing to write with HTML5 canvas elements. But most of these tend to be fairly complex, with lots of options. I needed something where a fixed size display could represent a single value clearly, and piefull is the result.

Basically piefull turns this:

10%
20%
33%
20%
10%
18%
55%
33%
23%
12%
14%
35%
40%
21%
11%
12%
29%
14%
11%
5%
12%
17%
10%
9%
5%

(which looks like this in code:)

<table class="piefull">
    <tr>
       <td><div>10%</div></td>
       ...
       <td><div>5%</div></td>
    </tr>
</table>                 

into this:

10%
20%
33%
20%
10%
18%
55%
33%
23%
12%
14%
35%
40%
21%
11%
12%
29%
14%
11%
5%
12%
17%
10%
9%
5%

by doing this:

<script>
  window.onload = (function() { piefull.main("table.piefull td div"); });
</script>

where 'table.piefull td div' is a selector passed to querySelectorAll() to locate elements which will be replaced by little canvas piecharts. The contents of the selected elements are matched against a regular expression looking for a percentage value to extract (generally speaking, the first number), and the element is replaced with a canvas element displaying that value. The classes and id of the original element are preserved in the new element, allowing sensible CSS styling, and the canvas title takes on the text which it has replaced. As well as the selector, there are a (small) number of other parameters - size, 'yes' and 'no' colours. A value of e.g. 10 will display a 10% pie-segment in the 'yes' colour - the remainder will be the 'no' colour. (Like this: 10%.) These are all optional - even the selector defaults to '.piefull', which works great for a small number of spans or divs in some prose:

In other news, at least 66% of statistics are made up. No, wait - it should be a little higher now.

Note: If you're viewing this on IE8 or below, this probably makes no sense, as I've not included excanvas here. It is supported for IE8 with excanvas (but not less than IE8). One gotcha with IE is that block-level elements such as canvas don't work inside <p> elements. But you know you want to get a better browser... And why not get one with webgl support while you're at it?

The code for piefull can be found here

Wednesday 6 April 2011

xmlcmd: adding an --xml option, one command at a time

In my last post, I wrote some thoughts on how using the text based meta-language[1] of regular expressions to filter and manipulate structured data from UNIX commands was not fully exploiting the jigsaw-puzzle approach of 'the unix philosophy'[2], and that XPath (and by implication XML) provided an alternative where structured data on the command line was concerned.[3]

I also mentioned how great things could be if, like subversion, every POSIX command line tool had an --xml switch which could output XML. (There are many programs with XML output, but the main POSIX ones[4] don't have this as an option)

Here's one I made earlier

I was always aware of the danger of overstating the case, but sometimes that can be helpful. Or at least fun. And I'd already started prototyping something which looked fun, dangerous, and potentially useful. This is intended to be illustrative rather than a serious suggestion, but there might be many other cases where the concepts can be used more seriously.

1. Add a path

There isn't any magic to what we're doing in adding --xml options, and we're not touching the original programs. We're just using the fact that the PATH in POSIX operating systems contains an ordered list of entries, and we're simply inserting a 'hook' early on in the path which can catch and redirect certain formats of command, while transparently forwarding others.

I tend to have a ~/bin directory on my path anyway (keeping good care that it is only writable by myself) - so I'm already set, but if not, you'll need a directory which appears first on the PATH.

ben$ mkdir -p ~/bin
add that path to the start of your login path (e.g. in .bashrc or .bash_profile):
export PATH=$HOME/bin:$PATH
Once that is done, anything in that directory will be run in preference to anything else. Put an 'ls' file in there something like the following:
#!/usr/bin/env python
print("These are not the files you're looking for")
make it executable (chmod +x ~/bin/ls) and you won't be able to run 'ls' anymore. Except you are running it, it's just a different ls, and not doing anything particularly helpful. You can always run the original ls with a fully specified path (or try using $(whereis ls)).

Two more things make this potentially useful:

  • Finding the next program on the PATH, which would have been run if something else hadn't sneaked in first
  • Selectively running either this 'original' program or some different code based on relevant criteria (e.g. existence of --xml in the command line options)
and the following makes things practical:
  • Making the two things above easily reusable for any command.

2. The magic of args[0]

Most of the time most programs ignore args[0] - the program's own name. But what if args[0] could be treated as a command line option, just like all the others? What makes this possible is having multiple symbolic links to a single program. args[0] is then the name of the original symlink by which the process was called, so although the same program is ultimately running, it can determine in what way it was called. It can therefore change its own operation. This technique is used in the busybox project to implement a generous number of commands in a single executable.

3. Introducing xmlcmd

xmlcmd is a Python package which supports this terrible corruption of POSIX as it should always be. The main xmlcmd module code is fairly straightforward, and is shown below. This finds the original program (which would have been run if we weren't first on the path), and then either execs that (if no --xml option is provided), or runs some Python code in an dynamically imported Python module (_xyz from the xmlcmd package, where xyz is the 'original' command name) if --xml is present.

#!/usr/bin/python
"""
xmlcmd.py - support for adding an --xml option to various commands

Ben Bass 2011. (Public Domain)
"""

import sys
import os
import which  # from PyPI 'which' package

def process_cmd(cmd_name, args, orig_cmd_path):
    """
    import and call the main() function from the module
    xmlcmd._{cmd}
    """
    module = __import__('xmlcmd._%s' % cmd_name,
                        fromlist=['_%s' % cmd_name])
    raise SystemExit(module.main(args, orig_cmd_path))

def main(args=None):
    """
    run system command from sys.argv[:], where sys.argv[0]
    implies the real command to run (e.g. via symlinks to us)
    """
    if args is None:
        args = sys.argv

    # args[0] will be a full path - we only want the command name
    cmd_name = os.path.basename(args[0])
    if cmd_name.startswith('xmlcmd'):
        raise SystemExit('xmlcmd should not be called directly')

    # get the command which would have run if we hadn't sneaked
    # ahead of it in the $PATH
    cmd_path_gen = which.whichgen(cmd_name)
    cmd_path_gen.next()   # skip first match (us)
    orig_cmd_path = cmd_path_gen.next()

    if '--xml' in args:
        args.remove('--xml')
        # forward to our xmlized version...
        process_cmd(cmd_name, args, orig_cmd_path)
    else:
        # execv *replaces* this process, so it has no idea it
        # wasn't called directly. Total transparency.
        os.execv(orig_cmd_path, args)

if __name__ == '__main__':
    main()

4. The implementations

The real work is all handled in the _{cmd} modules of course, so admittedly we've really only moved the problem around a bit. But the point of this exercise is about the ease with which we can add these new entry points into existing systems. Nothing slows down in any noticeable way, and it would be easy to extend an entire class of commands, one at a time, by nothing more than adding a Python module and creating a symlink.

For reference, the main() function from _ls.py looks something like this:

def main(args=None, orig_cmd_path=None):
    """very basic xml directory listing"""
    if len(args) > 1:
        target_dir = args[-1]
        if not os.path.isdir(target_dir):
            raise SystemExit('%s is not a directory' % (target_dir,))
    else:
        target_dir = os.getcwd()

    root = ET.Element('directory', name=target_dir)
    for fn in os.listdir(target_dir):
        stat = os.stat(os.path.join(target_dir, fn))
        f_el = ET.SubElement(root, 'file', mtime=str(stat.st_mtime))
        ET.SubElement(f_el, 'name').text = fn
        ET.SubElement(f_el, 'size').text = str(stat.st_size)
    ET.ElementTree(root).write(sys.stdout, 'utf-8')
    sys.stdout.write('\n')

5. Example

ben$ sudo pip install which xmlcmd
(yup, it's on PyPI) will install the xmlcmd Python package (and the 'which' dependency), and an xmlcmd wrapper script which should end up on the path. With that done, you can now create the magic symlinks:
ben$ ln -sf $(which xmlcmd) ~/bin/ls
ben$ ln -sf $(which xmlcmd) ~/bin/ps
And now, assuming things are working properly (a quick hash -r/rehash can't hurt), you should now be able to do wonderful things like this:
ben$ ps --xml aux | xpath '//process/command/text()[../../cpu > 2.5]'
which in this case displays the command name of all processes currently taking more than 2.5% of the CPU. Sure the XPath isn't exactly elegant. But the point is that patterns of this micro-language would be shared between tasks, and manipulating structured data on the UNIX command line would become as easy as text manipulation is now.

Here's some they made earlier...

Having said and done all that, a few searches later (for 'posix commands' in this case) brought up xmlsh.org, which seems to do some very similar things.

I also found (via [2]) xmltk, which at first glance seems to have beaten me to these ideas by about 9 years... :-)

Notes

[1]
'Regular expressions are notations for describing patterns of text and, in effect, make up a special-purpose language for pattern matching.' Brian Kerninghan, Beautiful Code (ed. Andy Oram & Greg Wilson, O'Reilly Media Inc).
[2]
The Art of Unix Programming, Eric S. Raymond. Especially the 'Rule of Composition'; see Chapter 1. (Note this book also praises text of course...)
[3]
What a pointlessly long sentence.
[4]
POSIX 2 (Commands and Utilities) covers these, e.g. see reference here