Monday 24 September 2012

Project Naming in a Google World

I’m a great fan of Python; not only do I think the language itself is clean and readable, the community polite and helpful, and the ecosystem diverse and fascinating, but also the Zen of Python resonates with me.

I think there is significant value in that ‘there should be one - and preferably only one - obvious way to do it’, and that ‘namespaces are one honking great idea’. To me, it is sad that this essence of Python philosophy isn’t applied more widely.

Of course there is an element of tension in the Zen - Namespaces are about nesting, but ‘Flat is better than nested’. Nevertheless, flat within namespaces isn’t the same as not having any namespaces at all.

Namespaces don’t exist in a Google world.

I bet that most project name searches on Google are a single word. ‘jquery’ would get me want I want. ‘requests’ gets me what I want. Even one of my own projects - ‘pylibftdi’ gets me where I want to go. Getting to this point is probably part of choosing a good name. But that’s exactly the problem: how do I choose a good name for my new project? It’s one thing already knowing what project I’m interested in and simply using Google to get me there (sometimes a language name qualifier helps, e.g. ‘python flask’), it’s quite another two problems a) searching for a project to meet a given problem, not knowing what might be available b) searching for a project name I can use for my shiny new thing.

Searchable Project Names

One of the technologies I use the most at work is SSH. I tend to use it mostly in a fairly crude way, via it’s normal client and server programs ssh and sshd with many configuration options, but I have used the paramiko library. Which works well, and has a great name - easily remembered, especially after reading about its etymology on the project page. And very easily searchable. Recently, however, it’s development has slowed. I read in some places that it is now ‘deprecated’, but I’m not sure about that - the github project was last updated 11 days ago as of now… Anyhow, recently it has been forked, and its ‘successor’, has the brilliant name of… wait for it… ‘ssh’. Yes, brilliant. No, actually, it isn’t that helpful. Search for ‘ssh’, and it obviously won’t be there, straightaway, on the first page. Search for ‘python ssh’, and it still won’t be there. I guess it might be in a few months or years once it (presumably) takes off as the ‘one way to do it’, but now? It’s not helpful. Maybe it’s only aimed at people who use the PyPI search engine? And even if / when it is ‘obvious’, it’s still going to be a pain to do web searches for problems relating to use of the package. If I want to know which to use, then ‘paramiko vs ssh’ is of no help. Is the new ssh module ‘preferred’ by the community going forward? Or is it just a random fork by the Fabric guys? Other than the download stats on PyPI, it’s difficult to tell, because searching for info about it is... tricky.

As another example, the pbs package has recently changed its name to sh. Now pbs might not be the bestest name, but changing it to sh causes exactly the same kind of problem as ssh. There can be a real feeling of ‘hijacking’ when something so domain specific is used for a general project name. Using such a name is a clear signal: this is the module you should want to use for this task - you’d would be crazy to try anything else! That may or may not be intended or justified, but when it is a trivial thing for anyone to do, we developers have to be very careful and deliberate. Domain-specific project names, with massively overloaded meanings, only make sense in a very defined namespace: in these cases, the set of Python packages on PyPI.

Except, in a Google world, there aren’t namespaces.

Finding a project name (or rather finding the absence of one)

One of the problems with project naming in a flat unified project namespace (because of course there is one namespace) is project name squatting. For a variety of reasons - good and bad - developers decide that ‘release early, release often’ is a good policy. And one of the first things needed for that first visible release - perhaps the only thing needed - is a project name. So names are snapped up in an eager race. Project names have become the new hot-property. So we have lots of great project ideas, which need and find an awesome project name, make that first release, … and then do nothing. Stagnate. Just like the dot-com crazy days, we have project-name squatting, and permanent project-name ‘under construction’ empty shells… And, like defunct satellites cluttering low-earth orbit, the debris of project names now unused is a danger to every other project, trying to find its own space and path through the knowledge-sphere, avoiding the no-man’s land which has been staked out and left barren, taking juicy spectrum in an interference causing blackout. Soon there will be no more names left and [Sorry, I seem to have got carried away. Ahem.]

So…?

The following are some more thoughts and examples. Most of this is subjective. Hurrah for being able to dump half-finished ideas in a well name-spaced environment!

Over-general names:

  • ‘node’ - really unhelpful.
  • ‘windows’ - key element in GUI programming. WIMP.
  • ‘dropbox’ - to a certain extent.
  • ‘color’ - remember them? Good thing they didn’t take this word away…
  • ‘word’ - a tool for writing words?
  • eliminate a name not just from the project namespace, but increasingly from the word namespace.
  • makes web searching harder

Unpleasant / generally bad names:

  • git
  • gimp
  • My[anything] ;-)
  • Any number of ‘offensive’ or ‘wrong connotation’ names, often leading to name changes, which help no one, except in an ‘any publicity is good publicity’ kind of way:

Duplicate projects with the same name:

Create or recognise our own namespaces:

  • blog articles: author + title
  • PyPI / CPAN etc
  • ‘hungarian notation’ e.g. pyxyz, where the ‘py’ prefix includes some indicator of what namespace it lives in.
  • domain name country code extensions - ‘.io’ etc
  • ‘file extension’ as part of project name: ‘node.js’ etc
  • identification by company or organisation: iOS / iPod / i*, gmail, google maps, etc
  • identification by well-known patterns: xUnit, [j/py]Query etc.

Summary

If I were to produce a new vacuum cleaner and call it ‘Vacuum’, then various people might get upset. We (in software development) don’t really want to have to deal with all the legal & trademark clutter - the fact that we can have an idea, create a project and ‘market’ it all in a weekend is awesome, but requires us to act responsibly. Just because we can launch a new project into the orbital (name)space around us, doesn’t mean we must. Though it is awfully tempting… In addition we need to recognise, use, and educate ourselves and others about the namespaces all around us.

So I guess what I’m really saying, is (to quote Tim Peters)...

Namespaces are one honking great idea - let’s do more of those!