“In my experience, people who don’t work in web design and hosting just have no concept of how heavy a load something like AO3 would have. Not only is the traffic absolutely buck wild, but the quantity of data that archive needs to store is fuckoff crazy. I’m talking “more than the library of congress” crazy. The only reason it doesn’t require Netflix levels of data serving is that it’s text based rather than video.”
I co-founded a startup a few years ago (rhinobird.tv) to build a collaborative video platform (live video streaming + multi-device HTML5 video conversion + P2P networking + AWS hosting, among other things), and our yearly costs during initial development were right up there, with just our tiny team and a handful of usability testers. We didn’t even have users at that point.
Hosting is expensive. Design and development are expensive. Adding a new feature, or expanding an existing one, is expensive.
That seems simple, but it’s not. Each time a feature is tweaked or expanded, it requires changes to the database, which could involve completely restructuring it or moving to a different kind of database architecture. It involves changes to the search engine, because different search methodologies interact differently with different kinds of databases and with the languages used.
Meanwhile, the technology and methodologies behind systems architecture, hosting, databases, and search are constantly evolving.
Which means that if your whole Thing is about providing:
Hosting that can withstand hundreds of thousands of requests per hour;
A database that can work with unstructured, semi-structured, and highly structured data, which may have one-to-one relationships with other data or may be non-relational and multi-dimensional;
A database that can handle the importation of data from other databases (e.g. FF.net, Tumblr, etc.) whose schema, controlled vocabularies, taxonomy, and metadata can differ widely;
A database and backend that can normalize all of that data coming in (so, you know, the Thing actually works);
A robust search that is intelligent enough to include and exclude across a variety of boolean or natural-language options, and to understand the difference between tags and content (and presumably other categories within the taxonomy);
AND DO ALL OF THIS QUICKLY.
Then you’ll need people who must continuously improve their skills and knowledge to implement these evolving technologies and methodologies or else the thing you’ve built will die on the vine.
That is neither cheap nor easy.
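To make the import, normalize, and search items above a bit more concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the `Work` shape, the two made-up source formats (`from_source_a`, `from_source_b`), and the field names are hypothetical and not how AO3 or any other real archive does it. It only shows the idea of forcing differently shaped records into one schema, then filtering on tags separately from body text.

```python
from dataclasses import dataclass, field


@dataclass
class Work:
    """Common shape every imported record is normalized into (hypothetical)."""
    title: str
    tags: frozenset = field(default_factory=frozenset)
    body: str = ""


def canonical_tag(raw: str) -> str:
    # Collapse case and whitespace so "Slow  Burn" and "slow burn" match.
    return " ".join(raw.lower().split())


def from_source_a(record: dict) -> Work:
    # Hypothetical source A: tags arrive as one comma-separated string.
    tags = frozenset(
        canonical_tag(t) for t in record["genres"].split(",") if t.strip()
    )
    return Work(title=record["story_title"].strip(), tags=tags, body=record["text"])


def from_source_b(record: dict) -> Work:
    # Hypothetical source B: tags arrive as a list of "#hashtag" strings.
    tags = frozenset(canonical_tag(t.lstrip("#")) for t in record["tags"])
    return Work(title=record["title"].strip(), tags=tags, body=record["content"])


def search(works, include=(), exclude=(), text=""):
    """Return works with every `include` tag, no `exclude` tag, and `text` in the body."""
    want = {canonical_tag(t) for t in include}
    avoid = {canonical_tag(t) for t in exclude}
    needle = text.lower()
    return [
        w for w in works
        if want <= w.tags and not (avoid & w.tags) and needle in w.body.lower()
    ]


works = [
    from_source_a({"story_title": " First ", "genres": "Slow Burn, Angst",
                   "text": "A long road home."}),
    from_source_b({"title": "Second", "tags": ["#slow  burn", "#Fluff"],
                   "content": "Angst is only mentioned here."}),
]

# Excluding the *tag* "angst" drops the first work but keeps the second,
# whose body merely mentions the word.
tag_hits = [w.title for w in search(works, include=["slow burn"], exclude=["angst"])]
print(tag_hits)
```

Even this toy version needs one adapter per source and a rule for what counts as "the same tag". Multiply that by every import source, every tag synonym, and millions of works, and the real thing gets expensive fast.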
And that’s not even getting into the costs of maintenance and security, or the front-end development whose features can be broken by browser updates. Like that Rich Text Editor? Try supporting that feature cross-platform, browser- and device-agnostic.
If people want to question the cost of Things On The Internet, then direct thy gaze at JSTOR, which profits from paywalling research that is not always privately funded (e.g. studies funded by public universities). But again, JSTOR provides a service, and that service is not cheap to expand and maintain.
But really, it’s not about costs and never was. Bitching about costs is a straw man. It’s a cover for authoritarian censorship. It was the same old bullshit even before the LJ Strikethrough, and it’s the same bullshit now.