Monday 7 May 2012

urlsearch - web searching from the command line

Continuing my pursuit of having a custom vocabulary on my command line, preferably via argv[0] abuse, I've now addressed the subject of web searching. I probably make somewhere between 20 and 50 searches in a typical day, mostly on Google, but Wikipedia comes high up on the list too.

urlsearch is a small script which kicks off a browser search from the command line. The plan is that the task switch associated with switching from the command line (where you usually are, right?) to the browser is eliminated. By complex, spurious and - to-be-frank - non-existent calculations, I estimate that this reduction in friction getting your work done should make you 4.6% more productive, and thus make the world a better place.

Using Python's webbrowser module, it's straightforward to open a webbrowser to a particular page:

>>> import webbrowser
>>> webbrowser.open('http://google.com/search?q=standard+library+pep8')

What urlsearch gives is the equivalent to the above from the following at a Bash prompt:

$ google standard library pep8

It's simple, short, and in it's basic form is just a couple of lines of Python:

#!/usr/bin/env python
import sys, urllib, webbrowser

webbrowser.open('http://google.com/search?q=' +
                urllib.quote_plus(' '.join(sys.argv[1:])))

Make that executable, put it in the path, and you're good-to-go with google [1] searching from the command line. However, as always, complexity is lurking, and desires to have it's way...

The following things are addressed in urlsearch:

Automatic gTLD checking

A range of gTLDs are searched in turn using socket.getaddrinfo(HOSTNAME, 'http'). By default this list starts with the empty gTLD (so local search domains are tried first), then .com, .org, .net, and .co.uk are tried in that order - these being most relevant to my uses. Changing it to default to '.fr' first might be reasonable for French-speakers, for example, but avoiding having to think about this is one more thing not to have to think about. As it were.

Generic search engine support

This is where the wonder of argv[0] fits in :-) Via symlinks to urlsearch, various search engines can be supported. An argv[0] of 'google' will cause a google.com search, while 'wiki' is special-cased to wikipedia. The search query format also needs special-casing for many search engines - the default of /search?q={terms} works for Google, Bing and several other sites.

The following sites are directly supported or special cased:

argv[0] search engine
google Google
bing Bing
wiki Wikipedia
duckduckgo DuckDuckGo
pylib Python standard libraries (direct jump)
jquery jQuery API search (direct jump)

These are managed in the code by a very dull if/elif chain, though something a bit less 'hackish' would probably be wanted to scale to further engines.

Trac support

Trac [2] follows the same search query format as Google, and has a great 'quickjump' feature, where certain search query formats take the user directly to the relevant page. For example, a search for r5678 will go directly to the changeset for revision 5678, and a search for #1234 will go directly to ticket 1234. This ticket search can't be done from a Bash prompt however, as it will be treated as a comment and ignored. This is special cased such that if the search term is an integer, it will be preceded with '#'.

Other tweaks

Output from the browser writing into the command prompt (as happens with Chrome, for example) is redirected to /dev/null.

The Code.

The code is here: http://bitbucket.org/codedstructure/urlsearch

[1]Other search vendors are available
[2]http://trac.edgewall.org