August 6, 2002
At last night's Seapig meeting, we had a couple of guests from the Perl community who offered to help Python use some of the resources of CPAN.
I want to put forward the anarchist's viewpoint as food for thought. I think Python shouldn't have a CPAN. Basically, I think CPAN is too centralized a mechanism for finding modules, and the work to create a CPAN for Python could be better spent elsewhere.
Getting a Python module involves these major steps:
- SEARCH - You need to find candidate modules.
- FILTER - You need to weed out the bad ones.
- DOWNLOAD - You need to reliably download the module.
- INSTALL - You need to easily install the module.
SEARCH/FILTER - Right now the most common mechanisms for finding Python modules are probably Google and word-of-mouth. Google is crude, because you get too many John Cleese hits, and you only find modules that are well-referenced on the web. Word-of-mouth is a far better way to find modules. How do you enhance word-of-mouth communication? I don't think CPAN does this well. It's better for the Python community to establish a reputation for effective and quick interpersonal communication about module quality. Mailing lists, IRC, and MoinMoin are the answers here.
Another idea: search on Google for ".py" and ".pyc"?
DOWNLOAD - If you create a useful module, you need a place to upload it, where other people can download it. Most developers have access to hosting services. There's no reason that all Python modules need to be downloaded from the same web location. That defeats the whole purpose of the web.
But what about mirroring? Is one download location enough? Maybe not. A single point of failure is a bad thing. Downloading a module directly from Australia when someone in Tacoma already has the same source seems like a waste of network resources. But this raises the question of peer-to-peer. There are now open source peer-to-peer technologies that use MD5 to reliably identify binary files--they work just as well for Python tarballs as for Britney Spears downloads.
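To make the MD5 point concrete, here is a minimal Python sketch of identifying a tarball by its digest, so that a copy fetched from any host can be checked against the value the author publishes. The function names and the sample bytes are invented for illustration, and no particular peer-to-peer tool is implied.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=64 * 1024):
    """Return the hex MD5 digest of a binary file-like object, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def is_same_file(stream, published_digest):
    """True if the stream's content matches the digest the author published."""
    return md5_of_stream(stream) == published_digest.lower()


# Hypothetical bytes standing in for a downloaded tarball.
original = b"pretend this is spam-1.0.tar.gz"
published = hashlib.md5(original).hexdigest()

copy_from_tacoma = io.BytesIO(original)
corrupted_copy = io.BytesIO(original + b"!")

print(is_same_file(copy_from_tacoma, published))
print(is_same_file(corrupted_copy, published))
```

The digest, not the download location, is what identifies the file, which is why the same check works no matter which peer or website supplied the bytes. MD5 is used here only because it is the hash named above; a stronger hash such as SHA-256 would slot into the same code.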
INSTALL - Python has distutils. CPAN solves no problems here.
Don't take this post too seriously. I don't want to subvert good cooperation between the Perl and Python communities. I don't want Python to reinvent the wheel, and CPAN's been a good wheel for Perl. But I do think that there are other ways to solve the problem.
-- SteveHowell
The arguments here are a bit odd. I think there's some confusion here between what CPAN is and what it's perceived to be. CPAN is just a somewhat organized mirror network.
SEARCH/FILTER - CPAN does not preclude word-of-mouth. In fact, they work very well together. One of the best ways to find the best modules on CPAN is to ask someone. CPAN's more organized search utilities (compared with just Googling) make telling someone about a module even easier: all you have to remember is the name, not the URL. Just look up the name in CPAN.
Also, there are many programmers who work in near isolation, either self-inflicted or forced upon them by policy (e.g., no IRC at work). For these folks, it's nice to have something more than Google.
So CPAN and word-of-mouth searches are really symbiotic.
DOWNLOAD - The point about modules having "to be downloaded from the same web location" is confusing. CPAN is a mirror network. There are hundreds of web and ftp mirrors. www.cpan.org and search.cpan.org just happen to be two of the best put together. There's nothing stopping anyone from making their own mirror and their own interface to CPAN. Hell, put up a gopher interface if you want! search.cpan.org is simply the killer app that slew all the lesser CPAN search sites.
Sorry about the confusion. My comment "downloaded from the same web location" referred to the non-CPAN approach of just putting your distribution up on a website somewhere. I was then saying that CPAN addresses the inadequacy of that approach. But then I also say that peer-to-peer solutions might do it just as well.
As for peer-to-peer, it's not worth the effort when you have an existing, worldwide, high-speed, high-capacity, freely-available mirror network. Namely, CPAN. You wouldn't download a module from Australia, you'd download it from one of the THREE Washington state CPAN mirrors. No wasted bandwidth, no single point of failure.
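The "no single point of failure" property of a mirror network can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mirror hostnames are invented, and `fetch` is a caller-supplied placeholder rather than any real CPAN client API.

```python
def fetch_from_mirrors(path, mirrors, fetch):
    """Try each mirror in order; return (mirror, data) from the first that works.

    `fetch(mirror, path)` is a caller-supplied function that returns bytes
    or raises OSError on failure. It is a placeholder, not a real API.
    """
    errors = []
    for mirror in mirrors:
        try:
            return mirror, fetch(mirror, path)
        except OSError as exc:
            errors.append((mirror, exc))
    raise OSError(f"all {len(mirrors)} mirrors failed for {path}: {errors}")


# Hypothetical mirrors, nearest first; the hostnames are invented.
MIRRORS = ["mirror.seattle.example", "mirror.tacoma.example", "mirror.sydney.example"]


def fake_fetch(mirror, path):
    """Stand-in for a real HTTP/FTP download: the first mirror is 'down'."""
    if mirror == "mirror.seattle.example":
        raise OSError("connection refused")
    return f"{path} from {mirror}".encode()


mirror, data = fetch_from_mirrors("modules/Foo-1.0.tar.gz", MIRRORS, fake_fetch)
print(mirror)
```

Listing the mirrors nearest-first means the distant one is contacted only when every closer mirror has failed.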
Peer-to-peer actually creates lower availability. "Oh damn, I need this.python.library but the author isn't online at the moment." Or try downloading a popular module over some guy's 56k modem while 20 other people are, too. Peer-to-peer gives you a heap of problems with almost zero benefit.
Peer-to-peer technologies are a little more sophisticated than "you-have-to-download-from-the-author-while-he's-online." Peer-to-peer networks are self-organizing. The more people that use a module, the more people have it, and the quicker and more reliably you can acquire the module. My only personal experience with peer-to-peer was Napster, which was crude and unreliable. But my buddy who downloads a lot of music and software assures me that things have gotten a lot better. State-of-the-art technologies use MD5 to verify the contents of what you're getting, and they grab data streams from multiple sources, and they recover from broken connections. So, maybe a year from now mirroring technology will seem crude and outdated. You have a module? Put it up on your website. Is your hosting service too unreliable or too remote from possible downloaders? Let peer-to-peer fill in the gap.
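Here is a rough Python sketch of the idea described above: pull each piece of a file from whichever peer can supply it, verify every piece with MD5, and move on to another peer when a connection breaks or the data is bad. It does not model any real peer-to-peer protocol; the peers are plain functions and all names are hypothetical.

```python
import hashlib


def fetch_verified(chunk_digests, peers):
    """Assemble a file chunk by chunk, trying peers until each chunk verifies.

    chunk_digests: list of hex MD5 digests, one per chunk, from a trusted source.
    peers: list of callables peer(index) -> bytes; they may raise OSError
           (broken connection) or return bad data.
    """
    parts = []
    for index, expected in enumerate(chunk_digests):
        for peer in peers:
            try:
                chunk = peer(index)
            except OSError:
                continue  # broken connection: try the next peer
            if hashlib.md5(chunk).hexdigest() == expected:
                parts.append(chunk)
                break
        else:
            raise OSError(f"no peer could supply a valid chunk {index}")
    return b"".join(parts)


# Hypothetical file split into three chunks.
chunks = [b"import ", b"this", b"\n"]
digests = [hashlib.md5(c).hexdigest() for c in chunks]


def flaky_peer(index):
    """Drops the connection on every chunk except the first."""
    if index > 0:
        raise OSError("connection reset")
    return chunks[index]


def corrupt_peer(index):
    """Always answers, but serves damaged data for chunk 1."""
    return b"garbage" if index == 1 else chunks[index]


def good_peer(index):
    """Serves every chunk correctly."""
    return chunks[index]


result = fetch_verified(digests, [flaky_peer, corrupt_peer, good_peer])
print(result)
```

No single peer has to be reliable: the download succeeds as long as, for each chunk, at least one peer supplies data that matches the trusted digest.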
Finally, remember that ubiquitous, cheap, fast Internet connections are only a reality in the USofA. Perl and Python programmers are world-wide. Folks anywhere else in the world usually have to pay for connections by the minute. CPAN provides them a free way to make their software available.
Basically, Python is in search of technology that lets people quickly and reliably get bitbuckets of data. CPAN has been attacking this problem, but so have folks in the peer-to-peer community. CPAN may very well be the best-of-breed solution, but the fact that CPAN is designed for software, whereas many peer-to-peer solutions are presumably designed with music or other non-software media in mind, doesn't necessarily mean CPAN is better for Python.
INSTALL - Thank god Python has distutils! There's no problem to solve. CPAN works better when there's a single way (or at least a finite set of ways) to install modules, so the installation can be automated. Python has that single way. Perl has that single way. Ruby doesn't, and that will be a problem for Ruby.
-- MichaelSchwern -- responses from SteveHowell
More thoughts coming...
Let's start with what Python needs. Modules are hard to find and uncategorized. Some attempts at cataloging module home pages have been made (e.g., The Vaults of Parnassus, Python Topic Guides), but these are perceived as inadequate. Here's what we'd like: (feel free to add entries)
- Easy to find a module/package (a good category hierarchy)
- Module accessible when you need it (store the modules themselves, not just links, and have a robust worldwide mirror system)
- Easy to download (one-click operation or CPAN-like shell)
- Easy to install distutils-compatible modules (one-click operation or CPAN-like shell)
- Almost easy to install distutils-incompatible modules (Zope?)
- Browse modules online (README, documentation, meta-info). Link to topic guides?
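To give the wish list something concrete to argue over, here is a minimal Python sketch of a catalog entry with meta-info and a category hierarchy that can be browsed by prefix. The record fields, the slash-separated category format, and the sample modules are all assumptions made up for illustration, not an existing catalog format.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleEntry:
    """Hypothetical catalog record; the field names are assumptions."""
    name: str
    version: str
    category: str  # slash-separated path, e.g. "Networking/HTTP"
    summary: str
    download_urls: list = field(default_factory=list)  # author site plus mirrors


def in_category(entries, prefix):
    """Return names of entries filed under `prefix` or any of its subcategories."""
    wanted = prefix.strip("/").split("/")
    return sorted(
        e.name for e in entries
        if e.category.strip("/").split("/")[:len(wanted)] == wanted
    )


# Invented sample entries.
catalog = [
    ModuleEntry("httpthing", "0.3", "Networking/HTTP", "toy HTTP client"),
    ModuleEntry("ftpthing", "1.1", "Networking/FTP", "toy FTP client"),
    ModuleEntry("plotthing", "2.0", "Graphics", "toy plotting library"),
]

print(in_category(catalog, "Networking"))
print(in_category(catalog, "Networking/HTTP"))
```

Matching on whole path components, rather than on a raw string prefix, keeps a category such as "Net" from accidentally matching "Networking".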
If we get serious about this, we'll need to turn this into a project page.
PS. What exactly is wrong with Parnassus? It seems to be all right for what it does. Does it really deserve its bad rap?
-- MikeOrr
Here are some Parnassus pros and cons:
Parnassus is cool because...
- it provides a searchable list of Python modules
- it provides a categorized list of Python modules
- it tries to be open to everyone, not just "approved" modules
- it has a cool name
- there's nothing else like it in Python-land
Parnassus sucks because...
- it doesn't provide a standardized way to download the modules
- it doesn't provide places to download the modules from
- it doesn't have standardized metadata about modules
- the module maintainer doesn't control information about the module
- the site is down frequently