Google was worth 1,838,389 workers in 1998, maybe
What is an innovation worth? I'm not asking how much money it makes, because that's just part of it. To take an extreme example, if you give an invention away to the public it can still provide value to people, it's just that you're not getting any of it, or no more than anyone else. But the cash value of the uncaptured part is notoriously hard to quantify. How about a different approach?
Human labor has always been a fundamental good. And lots of new technologies have been called "labor saving devices". If we can figure out a way to calculate an innovation's equivalent in human work, we'd have a measure that works across history, even prehistory. Plus we wouldn't care whether the invention was 'monetizable', e.g. whether it appeared in a period with a legal system defending private property and maybe patents. 1 Maybe best of all, we don't have to worry about the value of various currencies over time, and in fact can value innovations that predate money itself.
The value of some new ideas seems well captured by measuring how much human work they replace. Manufacturing and hanging drywall needs much less effort than lath and plaster. Dynamite and bulldozers remove rock with much less effort than picks and shovels. 2 But what labor is saved by the jet engine? All the laborers in the world couldn't get you from San Francisco to New York in six hours. 3
Google's search engine seems to be like that - not a labor saver but something that does what was before impossible. Is it though? Could you measure the value of Google by how much labor it would take to replace it? I think you can. What if, instead of Google's new software, you just had people? Could you build such a system that could rival, if not today's Google, then the first Google search engine, from 1998? Google was searching over only 26 million pages at the time. Couldn't you fulfill a query over those pages given enough 'librarians'? If we can value even Google this way, then maybe we've got a useful scale for innovation.
How about if you divided up the web among your librarians? Before reporting for duty, each would read each page in his or her bailiwick and remember, more or less, what they say. It's not so unrealistic if you assign people to pages pertaining to things they already know something about. Of course most pages weren't really about anything, then perhaps more than now. The blog hadn't been formally invented 4 but 'home pages' seemed to make up a majority of the web and few of them were about anything other than the author and his interests. Here's a surviving example that exemplifies the species: http://jerrypournelle.com/ Notice the multiple sections, "Books and Movie reviews", "What's new", "Reader email", etc. Remembering what was mentioned in one of those pages wouldn't be easy.
We can make it easier. Let's give each librarian the software from one of the existing, crummy pre-Google search engines (or maybe just grep) and set it up to search only their 100 pages. That will give the librarian a good quick start, jog his or her memory, and help a lot with the kind of things that unsophisticated software is good at, like finding exact matches of sentence fragments.
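To make the librarian's tool concrete, here is a minimal sketch of the "maybe just grep" option: an exact-phrase match over one librarian's own pages. The function name `find_exact`, the example URLs, and the page texts are all made up for illustration; a real pre-Google engine would have done more than this.

```python
# Hypothetical stand-in for a librarian's pre-Google search tool:
# a case-insensitive exact-phrase match over that librarian's own pages.

def find_exact(pages, phrase):
    """Return the URLs of pages whose text contains the phrase verbatim."""
    needle = phrase.lower()
    return [url for url, text in pages.items() if needle in text.lower()]


# Toy data: in the essay each librarian would hold about 100 pages.
my_pages = {
    "http://example.com/holmes": "Notes on The Adventure of the Speckled Band.",
    "http://example.com/home": "My home page: book reviews and reader email.",
}

matches = find_exact(my_pages, "speckled band")
```

The tool only finds literal matches. Deciding which of those matches is actually worth showing is left to the librarian.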
If we assign 100 web pages to each person and their search engine, we'd have the 26 million pages covered by 260,000 librarians. But what if you search for something common, like bill clinton and most of those 260,000 librarians have results? How to pick among them? This is really what the search engine that Google launched in 1998 did that was so great. Its results were ordered in a way that seemed like magic. You searched for that sherlock holmes story with the snake and, sure enough, the first result was The Adventure of the Speckled Band. To replicate that we're going to need more people to sort out the work of those first 260k people. We need editors.
Let's start with a layer of editors above the librarians. We assigned 100 pages to each librarian, so why not 100 librarians per editor? That'd be 2,600 editors. When a dump of those bill clinton results comes in to an editor, he or she picks the 10 best, in order, and passes them on, declaring them the best 10 results from the 100 librarians he edits. Each of those is assigned 100 web pages, so the editor's top 10 results are the best from the 10,000 pages his librarians cover. Now we've got 10 results from each of our 2,600 editors. We need to whittle these down to 10 results to show to the user, who is still staring at the screen, waiting. You can see that all we need are more layers of editors. Log base 100 of 260,000 is 2.7, so a total of three layers of these editors is enough. That'll give us 260,000 librarians, 2,600 first level editors, 26 second level ones, and 1 chief editor: 262,627 workers. If it takes a minute for each layer to do its work, which seems reasonable, then a user gets a result back in three minutes, and the system can handle one query per minute. 5 That's not much. Luckily this system is easily parallelized. To get another query per minute we simply add another 262,627 workers searching over the same 26 million pages. Apparently Google was doing 10,000 searches per day in 1998. That's about 7 per minute. 6 To handle that at a steady state, we'll need 262,627 * 7 = 1,838,389 workers. 7
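The layered selection described above can be sketched as a tree of top-10 merges. This is only an illustration under loud assumptions: each page gets a numeric relevance score standing in for human judgment, the scores are random, and the web is scaled down to 26,000 pages so the example runs instantly. The names `top_k` and `search` are invented for this sketch.

```python
import heapq
import random

FANOUT = 100   # pages per librarian and reports per editor, as in the text
TOP_K = 10     # results each worker passes upward


def top_k(scored, k=TOP_K):
    """Pick the k highest-scoring (score, page_id) pairs, best first."""
    return heapq.nlargest(k, scored)


def search(scores, fanout=FANOUT, k=TOP_K):
    """Run one query through librarians, then editor layers, to a final top k.

    `scores` maps page_id -> relevance for this query; the numeric score is
    an illustrative assumption standing in for human judgment.
    Returns (final results, number of editor layers used).
    """
    items = [(s, p) for p, s in scores.items()]
    # Librarians: each covers `fanout` pages and reports their best k.
    reports = [top_k(items[i:i + fanout], k)
               for i in range(0, len(items), fanout)]
    if not reports:
        return [], 0
    layers = 0
    # Editors: each merges up to `fanout` reports, until one list remains.
    while len(reports) > 1:
        reports = [
            top_k([r for rep in reports[i:i + fanout] for r in rep], k)
            for i in range(0, len(reports), fanout)
        ]
        layers += 1
    return reports[0], layers


# Toy web of 26,000 pages (the essay's web scaled down 1000x), random scores.
rng = random.Random(0)
scores = {page: rng.random() for page in range(26_000)}
results, editor_layers = search(scores)
```

Because every worker passes up its own best 10, the overall best 10 always survive to the top: a page in the global top 10 must also be in the top 10 of whatever subset it belongs to. At this toy scale two editor layers are enough. The full 26 million pages need the three layers worked out in the text.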
There you go. On the day Google launched they were providing, free of charge and with less than 1/120th the latency, what you'd need 1,838,389 smart workers to do the day before.
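The figures behind that claim can be rechecked with a few lines of arithmetic. The sketch below uses only the essay's own inputs (26 million pages, a fan-out of 100, 10,000 searches per day, three minutes per query). The variable names are invented, and rounding the load up to whole teams is an assumption that happens to match the essay's "about 7 per minute".

```python
import math

PAGES = 26_000_000          # pages indexed in 1998, per the essay
FANOUT = 100                # pages per librarian, reports per editor
SEARCHES_PER_DAY = 10_000   # the essay's figure for 1998
HUMAN_LATENCY_S = 3 * 60    # three one-minute layers, per the essay

librarians = PAGES // FANOUT

# Add editor layers until a single chief editor remains.
editors = []
n = librarians
while n > 1:
    n = math.ceil(n / FANOUT)
    editors.append(n)

workers_per_team = librarians + sum(editors)

# One team answers one query per minute, so staff enough teams for the
# average per-minute load, rounded up (an assumption of this sketch).
queries_per_minute = SEARCHES_PER_DAY / (24 * 60)
teams = math.ceil(queries_per_minute)
total_workers = workers_per_team * teams

# "Less than 1/120th the latency" implies Google answered in under this:
implied_google_latency_s = HUMAN_LATENCY_S / 120

print(editors)                   # [2600, 26, 1]
print(workers_per_team)          # 262627
print(total_workers)             # 1838389
print(implied_google_latency_s)  # 1.5
```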
Does this technique work as a scale of innovation? Well, it's got the nice advantages I mentioned above. But it can only give you an upper limit on the value of the innovation, since if it paid to do it the labor intensive way, that would have been happening. 8 It needs improvement. What do you think?
Or whether it appealed to the richer segments of society. Of course this last item is controversial. On one hand perhaps serving the needs of people who are themselves effectively creating (and capturing) value is morally better than otherwise. But that == "it's morally better to benefit the wealthy than the poor" which surely isn't true. It's not surprising that I've waded into this swamp since I'm more or less writing about the labor theory of value, a staple of Marxism. ↩︎
Of course you need to amortize the work to build the bulldozers and dynamite. ↩︎
You might be able to do better than you'd think though. The way to see what could be done with enough manpower is to imagine yourself a Pharaoh. Better yet, the Pharaoh's head engineer with unlimited cooperative laborers. Now how fast can you move the Pharaoh from Luxor to Alexandria? I explored that here: The boat engine is worth 33500 Egyptian slaves. ↩︎
Although there were proto-bloggers already. ↩︎
I'm describing the worst-case scenario. Often an editor will have less to do for some searches, as when his reports give him fewer than 100 results. You could take advantage of this and drop the lockstep architecture. But no closed form solution to calculate how much more productive you could make the tree of workers comes to mind. You'd probably do best with a Monte Carlo simulation. This optimization would be an interesting problem. ↩︎
Actually a lot more than that at peak periods and fewer late at night. But for simplicity we'll stick with this. ↩︎
There are complications. For one thing, if the user asks for the second page of results, everyone has to do the same thing except each editor must pass up the 20 best results, since there's no way for such an editor to know which, if any, of those could ultimately be in the overall top 20. All his fellow editors at his level do the same and now the editor above him has twice as much work to do. Clicking further into the search results makes it worse (only linearly, though). But most people don't do that and anyway this is supposed to be a first version. ↩︎
In this case we don't really have an upper limit either, since the army of librarians and editors is so much slower than Google, and speed is so important in a search engine. ↩︎