Idea: Curated Open Source Libraries

2014-08-07

I have a feeling this idea is a lot like documentation. Everyone wants it but no one wants to do it.

Since about mid May (~3 months) I've been busy writing a bunch of server code using node.js. I quickly learned about npm, the "node package manager", and its repository of modules, npmjs.org. It's really awesome and really inspiring. At the time I wrote this there were over 81,000 modules you could add to your project. Adding them is usually as simple as typing

    npm install name_of_module --save

npm itself is also kind of amazing. It's not just something that downloads and installs modules, it also helps you make them. Write some code, type

    npm init
    npm publish

and you just published your code for everyone else to use. It's nearly that easy. I quickly wrote a few modules myself and posted them.

I also started noticing "build passing" badges on project pages.

And clicking the badge led to this amazing continuous integration service that's free for open source projects. Every time you check in some code on github it runs your tests and tells you if your changes are passing the tests.

But, after being so excited about all this code that was going to make my life easier and help me get my stuff done quicker I quickly found out that far more often than not those modules don't actually work or are buggy or have issues. They might be abandoned. The docs are out of date or flat out wrong. I might even try to fix them and find the code is horrid. Sometimes it's hard to tell if it's a net plus or a net minus.

I'd spend 15 minutes to an hour looking for a library. To pick a fresh example, I needed a library that would launch a browser from node.js. It needed to be cross platform. That doesn't sound all that hard. In fact, at one level, if you want to launch the default browser just issue start url on Windows, open url on Mac, or xdg-open url on Linux and you're done. That's probably 10-15 lines of code max.
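To make that concrete, here is a minimal sketch of the default-browser case. The function names are made up for illustration, and the Windows branch assumes cmd.exe is on the path; a URL containing characters like & would need extra escaping there.

```javascript
'use strict';
const { spawn } = require('child_process');

// Hypothetical helper: pick the command that opens a URL in the default
// browser for a given platform (defaults to the current one).
function defaultBrowserCommand(url, platform) {
  switch (platform || process.platform) {
    case 'win32':
      // `start` is a cmd.exe builtin; the empty "" is its window-title argument.
      return { command: 'cmd', args: ['/c', 'start', '""', url] };
    case 'darwin':
      return { command: 'open', args: [url] };
    default:
      // Assumes a Linux-style desktop with xdg-utils installed.
      return { command: 'xdg-open', args: [url] };
  }
}

// Hypothetical helper: actually run the command.
function launchDefaultBrowser(url) {
  const { command, args } = defaultBrowserCommand(url);
  // windowsVerbatimArguments keeps Node from re-quoting the "" on Windows;
  // it is ignored on other platforms.
  return spawn(command, args, { stdio: 'ignore', windowsVerbatimArguments: true });
}
```

Calling launchDefaultBrowser('http://greggman.com') is all the default case needs. It does nothing to help with choosing a specific browser, which is the part that sent me looking for a library.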

But, because the project I'm working on doesn't run in IE (nor Safari as of August 2014) and since so many people have IE and Safari as their default browsers I needed code that would let me choose which browser to launch.

So I go search. I find a few. I check them out. The first one, skim the docs, browse some of the code. Realize that's not the one for me. Keep searching.

Eventually I find one, and I have learned to check the "issues" to see if there's anything obviously wrong. If there are 300+ issues and no one is paying attention to them, that's no good. Even if there are only 5 issues, if they're serious that's a no go either. I found one that had a discussion of unifying a bunch of libraries, with links to 4 or 5 other libraries. The person that opened the issue pointed out that many libraries had hard coded paths. That means they weren't very robust. He claimed to have written more robust support for Windows and was hoping certain low-level parts of the various libraries could be split into something common. For example, the part that finds the installed browsers, their versions, and where they are seems like something all those libraries could share. I go check out these other libraries.

For whatever reason, maybe bad intuition, I pick one. I think it was because the guy that started that thread had contributed his Windows fixes to it and I took that as a positive indicator since he'd pointed out why the others were wanting. So I decide to install the module and set out to use it. Assuming it works the feature I want to add should only take 20-30 minutes. Hahahahahahahaha!!! 😎

Let me start out by saying to the guys that wrote this library: You guys are awesome! Seriously. It might hurt to read the stuff below and you might get defensive but I truly and sincerely appreciate the work you guys put into it. In the end you did save me time with all the hard earned knowledge that's in the library. So please don't take this as a personal attack. This is about the experience and the idea that follows, not about your library. I'm just using it because it's fresh (happened today).

Ok, install

$ npm install launchpad
npm http GET https://registry.npmjs.org/launchpad
npm http 304 https://registry.npmjs.org/launchpad
npm http GET https://registry.npmjs.org/q
npm http GET https://registry.npmjs.org/async
npm http GET https://registry.npmjs.org/browserstack
npm http GET https://registry.npmjs.org/plist
npm http GET https://registry.npmjs.org/underscore
npm http GET https://registry.npmjs.org/restify
npm http 304 https://registry.npmjs.org/q
npm http 304 https://registry.npmjs.org/async
npm http 304 https://registry.npmjs.org/underscore
npm http 304 https://registry.npmjs.org/restify
npm http 304 https://registry.npmjs.org/plist
npm http 304 https://registry.npmjs.org/browserstack
npm http GET https://registry.npmjs.org/bunyan/0.10.0
npm http GET https://registry.npmjs.org/byline/2.0.2
npm http GET https://registry.npmjs.org/formidable/1.0.11
npm http GET https://registry.npmjs.org/dtrace-provider/0.0.9
npm http GET https://registry.npmjs.org/http-signature/0.9.9
npm http GET https://registry.npmjs.org/lru-cache/1.1.0
npm http GET https://registry.npmjs.org/mime/1.2.5
npm http GET https://registry.npmjs.org/node-uuid/1.3.3
npm http GET https://registry.npmjs.org/qs/0.5.0
npm http GET https://registry.npmjs.org/retry/0.6.0
npm http GET https://registry.npmjs.org/semver/1.0.14
npm http GET https://registry.npmjs.org/xmlbuilder
npm http GET https://registry.npmjs.org/xmldom
npm http 304 https://registry.npmjs.org/dtrace-provider/0.0.9
npm http 304 https://registry.npmjs.org/formidable/1.0.11
npm http 304 https://registry.npmjs.org/bunyan/0.10.0
npm http 304 https://registry.npmjs.org/lru-cache/1.1.0
npm http 304 https://registry.npmjs.org/mime/1.2.5
npm http 304 https://registry.npmjs.org/node-uuid/1.3.3
npm http 304 https://registry.npmjs.org/qs/0.5.0
npm http 304 https://registry.npmjs.org/byline/2.0.2
npm http 304 https://registry.npmjs.org/xmlbuilder
npm http 304 https://registry.npmjs.org/semver/1.0.14
npm http 304 https://registry.npmjs.org/xmldom
npm http 304 https://registry.npmjs.org/retry/0.6.0
npm http 304 https://registry.npmjs.org/http-signature/0.9.9
npm http GET https://registry.npmjs.org/asn1/0.1.11
npm http GET https://registry.npmjs.org/ctype/0.5.0
npm http 304 https://registry.npmjs.org/asn1/0.1.11
npm http 304 https://registry.npmjs.org/ctype/0.5.0
launchpad@0.3.0 node_modules/launchpad
├── browserstack@0.2.0
├── async@0.1.22
├── q@1.0.1
├── underscore@1.4.4
├── plist@0.4.3 (xmlbuilder@0.4.3, xmldom@0.1.19)
└── restify@1.4.4 (byline@2.0.2, lru-cache@1.1.0, semver@1.0.14, retry@0.6.0, mime@1.2.5, node-uuid@1.3.3, dtrace-provider@0.0.9, formidable@1.0.11, bunyan@0.10.0, qs@0.5.0, http-signature@0.9.9)

W.T.F!??!?!. Hmm

$ find node_modules/launchpad -name "*.js" | xargs wc -l
...
46873 total

Hooray! 46873 lines of code just to launch a browser (T_T). Well, maybe I can fork it and remove the stuff I don't need. Let's just try to use it first.

First thing I need to do is get a list of browsers installed. The docs say

var launch = require('launchpad');
launch.<type>(configuration, function(error, launcher) {
  launcher.browsers(function(error, browsers) {
    // -> List of available browsers with version
  });
...

That's the sum total of the docs for that feature. What is configuration? It's not specified anywhere. What's the format of browsers? Not specified.

Ok, fine ... let's write some code and see if we can guess.

// test.js
var launch = require('launchpad');
var configuration = {};
// "local" is the launcher type for browsers installed on this machine
launch.local(configuration, function(error, launcher) {
  launcher.browsers(function(error, browsers) {
    // -> List of available browsers with version
    console.log(JSON.stringify(browsers, undefined, "  "));
  });
});

I run that and cross my fingers that browsers is not some abstract JavaScript object that JSON.stringify can't handle.

Out comes

   [
      {
        "name": "chrome",
        "version": "33.0.1985.2284"
      },
      {
        "name": "safari",
        "version": "7.0.5"
      }
   ]

So far so good. Okay, it's running on OSX. Let's try Windows XP. It said it worked on Windows XP and the thread I was reading was even talking specifically about making the Windows code more robust.

> node test.js

path.js:204
        throw new TypeError('Arguments to path.join must be strings');
              ^
TypeError: Arguments to path.join must be strings
    at f (path.js:204:15)
    at Object.filter (native)
    at Object.exports.join (path.js:209:40)
    at getPath (Z:\src\launchpadlite\lib\local\platform\windows.js:7:20)
    at Object.<anonymous> (Z:\src\launchpadlite\lib\local\platform\windows.js:12:22)
    at Module._compile (module.js:456:26)
    at Object.Module._extensions..js (module.js:474:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Module.require (module.js:364:17)

Hrm.

So I go digging. The library is hard coded to expect an environment variable called LOCALAPPDATA but for whatever reason my XP doesn't have a LOCALAPPDATA. As I recently ran into a similar issue with another library I fix that part of the code. File an issue with the fix.
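This is not the fix I filed, just a sketch of the general idea: don't assume LOCALAPPDATA exists. The function name is made up, and the fallback path assumes the default Windows XP profile layout (Local Settings\Application Data under the user profile).

```javascript
'use strict';
const path = require('path');

// Hypothetical helper: resolve the per-user local application data folder
// on Windows. `env` defaults to process.env. Returns null when nothing
// usable is set, so callers never hand undefined to path.join.
function localAppDataDir(env) {
  env = env || process.env;
  if (env.LOCALAPPDATA) {
    return env.LOCALAPPDATA;
  }
  if (env.USERPROFILE) {
    // Assumption: default Windows XP layout, where LOCALAPPDATA is not defined.
    return path.win32.join(env.USERPROFILE, 'Local Settings', 'Application Data');
  }
  return null;
}
```

The point is less the exact path and more that a missing variable becomes a value the caller can check, instead of a TypeError thrown while the module is loading.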

So, now I try it on Windows 8.1

> node test.js
{ [Error: Command failed: The network name cannot be found.
] killed: false, code: 1, signal: null }

sigh...

and so it goes. I noticed it taking like 3+ seconds to come back with the list of browsers. I remove the part that looks up the version since I don't personally need that. Now it starts working in Windows 8.1. Lucky!!!

Fine, let's try to launch stuff. Change the code to launch

var launch = require('launchpad');
var configuration = {};
// "local" is the launcher type for browsers installed on this machine
launch.local(configuration, function(error, launcher) {
  launcher.browsers(function(error, browsers) {
    // -> List of available browsers with version
    console.log(JSON.stringify(browsers, undefined, "  "));
    launcher["chrome"]('http://greggman.com', function(e, i) {
      if (e) {
        console.error("error launching browser");
      } else {
        console.log("launched?");
      }
    });
  });
});

Whoa! It worked!!! Hmm, node is not exiting. Ctrl-C. The browser gets killed. Oops. Dig dig dig. Oh, I see, tracing through the code, that configuration thing that's not documented is passed to child_process.spawn, so if we put detached: true in it, it should work. It does. Yea!
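For reference, this is roughly what that option does when you call Node's child_process.spawn directly. It's a sketch that doesn't go through the library at all; launchDetached is a made-up name, and the command in the usage note is only an example.

```javascript
'use strict';
const { spawn } = require('child_process');

// Hypothetical helper: start a program that keeps running after node exits.
function launchDetached(command, args) {
  const child = spawn(command, args, {
    detached: true,   // put the child in its own process group / session
    stdio: 'ignore'   // don't share stdio handles with the parent
  });
  // Stop node's event loop from waiting on the child.
  child.unref();
  return child;
}
```

On a Mac, something like launchDetached('open', ['-a', 'Google Chrome', 'http://greggman.com']) would be one way to use it. Without detached and unref(), node waits on the child, and Ctrl-C in the terminal takes the browser down with it.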

And so on. By the time I get it all working it's been 6 hours, for something that I was hoping would take 20-40 minutes. And this is not rare, this is the norm. I went through 4 zip file libs. The most popular failed on the first example zip file I tried to decompress with it. I checked the code; it was atrocious. It just plain didn't work. How the heck could it be so popular? I ended up writing my own.

I tried downloading a library to help colorize the output in the terminal. It was 60k lines of dependencies. Deleted. I downloaded one to provide command line prompts. It was 120k lines of dependencies. Like REALLY? Seriously? Something there didn't strike the developer as bloated? Even the one I settled on is 30k lines of dependencies... FOR A COMMAND PROMPT!!

Except for a very few super mature libraries, more often than not they're just plain broken: undocumented, docs out of date, examples wrong, projects abandoned, issues filed 2 or 3 years ago ignored. And I'm not just talking about small projects. CouchDB's official docs are 3 years out of date! The examples in their "book" don't work! Some of them have pull requests 3 years old not applied.

And so, to the whole point of this post. I don't know how you'd make any money. Maybe ask for donations like Wikipedia. But dang, it sure seems like it would be awesome to figure out a way to have a curated repository of libraries and code. I have no idea how it would work because it seems unlikely anyone would have the time to check all the libraries, and on top of that, if you made it crowd sourced you'd have all the problems that come with that: bad, buggy libraries that are voted up because they're popular or the people that wrote them are popular, but that in fact should be curated into oblivion.

The goal would be to kind of set a standard. A target for open source awesomeness. If your library shows up on the curated list with a 10 rating you've got a solid library. Gold star and 3 happy faces for you. If it's only got a 6 or lower it's basically "warning, you're going to run into lots of issues and bugs". Somehow there'd need to be points for things like

  • bloat: How bloated is it? Is it 50k lines for something that could be 500 lines?

    Hopefully a low score here would encourage people to keep things small.

  • dependencies: Does it have too many dependencies? Does it pull in a few very large dependencies for no good reason?

    Hopefully this would encourage people to figure out how to decouple that stuff. Don't pull in all of express just so your collection library can provide a debugging backend status webpage. Just provide a way that people can connect express with one or 2 lines and leave express out of your dependencies.

  • modular: Is it broken in pieces so you can remove the parts you don't need? Or, to put it another way, is it made from other smaller parts so that if you only need the smaller part you can just use that smaller part?

    An example is the library above. It supports launching across the network and supports launching remotely through some semi-standard remote management system. Maybe it would be better if there was one tiny library and separate driver modules that you can add with 1 line per module.

  • cross platform: If it should run in more than one place, does it? In particular Windows often gets no love.

  • testing: Is it running tests and are they thorough? Just having some tests is not enough. They actually have to exercise the code.

  • active: Is the project still active? Are there issues going unanswered? Are pull requests being ignored? Is it stable? Old but stable is fine as long as it's still working. Though maybe libraries get bit rot as the stuff around them changes.

  • documentation: Does it have docs? Are they up to date? A very small project might need almost no docs, so what counts is the appropriate amount of docs for the project, but so many projects are inadequately documented.

  • examples: Are there examples? Do they still run?

  • code quality: Is the code relatively well written or is it the worst spaghetti ever? I imagine this might be one of the hardest to judge. It would have to not be based on style because people violently disagree on style. Basically, looking at the code, do you trust it works and is not too smelly?

I imagine you'd need some kind of colorful badge like

vs

so it would be very obvious you want everything to be green to have your library used.
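To make the idea a bit more concrete, here is one entirely hypothetical way the ratings could roll up into badge colors. The criteria names come from the list above and "6 or lower" is the warning line mentioned earlier; the yellow band, the plain average, and every function name are assumptions of this sketch, not a proposal for how a real service should weigh things.

```javascript
'use strict';

// The criteria from the list above.
const CRITERIA = [
  'bloat', 'dependencies', 'modular', 'cross platform', 'testing',
  'active', 'documentation', 'examples', 'code quality'
];

// Assumed color bands: 6 or lower is a warning (red), 9 and up is green.
function badgeColor(score) {
  if (score <= 6) return 'red';
  if (score < 9) return 'yellow';
  return 'green';
}

// `scores` maps each criterion name to a number from 0 to 10.
function rateLibrary(scores) {
  const missing = CRITERIA.filter((name) => typeof scores[name] !== 'number');
  if (missing.length > 0) {
    throw new Error('missing scores: ' + missing.join(', '));
  }
  const badges = {};
  let total = 0;
  for (const name of CRITERIA) {
    badges[name] = badgeColor(scores[name]);
    total += scores[name];
  }
  return {
    overall: Math.round(total / CRITERIA.length), // plain average, an assumption
    badges: badges,
    allGreen: CRITERIA.every((name) => badges[name] === 'green')
  };
}
```

Keeping a color per criterion, rather than only the overall number, is what would let a single red "bloat" square stand out on an otherwise green badge.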

Maybe you could get Google and Facebook to sponsor this. They seem to like, use, and encourage open source software. Anything that helps improve the entire ecosystem of open source has to be good for them too. Google tried to help once with code.google.com. It's been superseded by github but maybe it's time for something else in addition.

I have no idea how that would work. How you'd prevent bad reviewers, popularity contests, downvoting by competing libraries or just downvoting by griefers. I also realize that it might backfire and end up having unintended consequences. I'm sure Joel Spolsky or Jeff Atwood would have some better ideas.

It just seems like there's got to be some way to positively encourage us all to strive for better because, at least in my experience over the last 3 months, we all need to step up. We need a goal. We need encouragement. I've seen those "build passing" badges spreading like crazy, almost as if your project isn't ready if it doesn't have CI. Maybe a curated badge like the ones above would gently push us all to fix our projects until their badges were green? I'm not saying my libraries are good either. But I know if those badges were common the OCD part of me would feel compelled to get my badge to be all green.
