Blurb One
“I often find myself with bloody, bruised hands and the sheets all torn up with the headboard cracked, after waking from a dream with that hellish nightmare of a clown.”
-P.H.
The Catalog of the Future, pt. 1
At work yesterday, the topic of the future library catalog came up. I briefly told my boss how I would want the search aspect of this hypothetical catalog to work, but here's my full idea. Fair warning, this is a technical post.
Google proved, with its awesome search engine, how well it works to index unstructured data and search the whole index. That's exactly what I propose for the library catalog of the future. How do we get this done?
The strategy of parsing through everything for search only works if you have all of the data*. For libraries, 'all of the data' is the full text of materials. Every word of every book you own. Picture books, graphic novels, comics, and everything visual pose problems here, which is why I call this the catalog of the future, not the perfect catalog. But everything else can be converted to plain text and thrown into an index easily.
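To make "thrown into an index" a little more concrete, here's a minimal sketch of an inverted index in Python: a map from each word to the set of books whose full text contains it. It assumes the full text has already been converted to plain text and that each book has a made-up ID. The names `build_index` and `search_all` are hypothetical and for illustration only, not part of any real OPAC.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(books: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of book IDs whose full text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for book_id, full_text in books.items():
        for word in tokenize(full_text):
            index[word].add(book_id)
    return dict(index)


def search_all(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the IDs of books containing every query word (boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first posting set so intersecting doesn't modify the index.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


if __name__ == "__main__":
    # Hypothetical one-line "full texts" standing in for whole books.
    books = {
        "b1": "The whale surfaced near the ship.",
        "b2": "A ship sank in the storm.",
        "b3": "Whales and storms at sea.",
    }
    index = build_index(books)
    print(search_all(index, "ship whale"))  # {'b1'}
```

Note that a search for `storm` won't match `storms` here. A real system would also need stemming, phrase search, and ranking, which this sketch skips.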
Publishers won't give up this data easily, but I think libraries could negotiate carefully and make this happen. I imagine publishers would increase the cost of each book, and that would be fair. Libraries would definitely have to agree not to use the full text to create local copies of books. The full text of each book would only be visible on staff systems; there's no way publishers would allow it to be seen in OPACs. Plus, I wouldn't want all of that data in the actual OPACs, because it would make them too slow when patrons click into records to see more info. But, generally, when we purchase physical or digital materials for this catalog in the future, they'll have to come with full-text records, in the same way that Amazon's Kindle MatchBook lets consumers buy physical and digital books in one purchase.
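One way to square "full text on staff systems only" with full-text search is to match queries against the full text but hand the OPAC nothing except metadata. Here's a minimal Python sketch under that assumption. `CatalogRecord`, `opac_view`, `staff_view`, and `search_opac` are hypothetical names, not any vendor's API.

```python
from dataclasses import dataclass


@dataclass
class CatalogRecord:
    """Hypothetical catalog record: public metadata plus licensed full text."""
    record_id: str
    title: str
    author: str
    full_text: str  # Licensed from the publisher; staff-only, never shown in the OPAC.


def opac_view(record: CatalogRecord) -> dict[str, str]:
    """Fields a patron-facing OPAC page would show: lightweight metadata only."""
    return {"record_id": record.record_id, "title": record.title, "author": record.author}


def staff_view(record: CatalogRecord) -> dict[str, str]:
    """Fields a staff system could show, including the full text."""
    return {**opac_view(record), "full_text": record.full_text}


def search_opac(records: list[CatalogRecord], term: str) -> list[dict[str, str]]:
    """Match the term against the full text, but return only OPAC-safe metadata."""
    needle = term.lower()
    # A plain substring scan keeps the sketch short; a real catalog would use an index.
    return [opac_view(r) for r in records if needle in r.full_text.lower()]


if __name__ == "__main__":
    records = [
        CatalogRecord("r1", "Moby-Dick", "Herman Melville", "Call me Ishmael. Some years ago..."),
        CatalogRecord("r2", "Emma", "Jane Austen", "Emma Woodhouse, handsome, clever, and rich..."),
    ]
    print(search_opac(records, "ishmael"))
```

This split also keeps patron-facing record pages light, since the OPAC never has to load a whole book's text when someone clicks into a record.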
So we've got the data in the catalog at this point. This was the easy part. Step 1 complete.
This Google strategy for library catalogs also only works if we've got an algorithm to do all the parsing through the full-text records. So step 2 is searching the data. This is the tricky part. Existing OPAC algorithms are decent, in my experience, but they leave something to be desired. I'd be interested to see their results if they had all of the data: the full text of books rather than just today's metadata. Obviously OPAC vendors would have to rework their algorithms to use that kind of data, but my instinct is that even with only the minimal changes needed to use the full text, the results would be way better immediately. If the OPAC vendors were to totally rewrite their algorithms, or if libraries were to license an algorithm from Google, it could be totally awesome.
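What might "using the full text" look like in a ranking algorithm? One well-known textbook approach is Okapi BM25, which scores a book higher when the query words appear in it often and when those words are rare across the whole collection. To be clear, this is a swapped-in standard technique, not what any OPAC vendor or Google actually uses. Here's a minimal Python sketch; the parameter values `k1 = 1.5` and `b = 0.75` are common defaults, treated here as assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(docs: dict[str, str], query: str,
              k1: float = 1.5, b: float = 0.75) -> list[tuple[str, float]]:
    """Rank documents against a query with Okapi BM25, highest score first."""
    if not docs:
        return []
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    # Average document length in tokens (fall back to 1.0 if every doc is empty).
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs or 1.0
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    query_terms = set(tokenize(query))

    scores: dict[str, float] = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF stays positive; rarer terms contribute more.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates (via k1) and is normalized by length (via b).
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical tiny "full texts" of equal length.
    books = {
        "a": "whale whale whale ship",
        "b": "whale ship ship ship",
        "c": "garden flowers",
    }
    print([doc_id for doc_id, _ in bm25_rank(books, "whale")])  # ['a', 'b', 'c']
```

With metadata alone, a book whose title and subject headings never mention "whale" can't match at all; with the full text, it can, and a scoring function like this one decides how high it lands.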
Now, there's one aspect of this I can't speak to. Google uses huge data centers of computers to do all of our searches instantly. Libraries with full-text indexes won't have anywhere near as much data as Google, but it will still be a lot of data to process. I don't know the hardware requirements to search the data quickly. It's a problem of having enough raw processor horsepower and enough RAM to keep up. I'm not concerned, however, about storing the actual data that will be parsed. The plain-text records won't take up that much storage space**. But in terms of the hardware required to process all of the searches, I'm counting on you, Moore's Law.
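For a rough sense of scale on the storage side, here's the back-of-the-envelope arithmetic as a tiny Python function. Every number in it is an assumption, not a measured figure: about 90,000 words per book, about 6 bytes per word of plain English text (counting the trailing space and punctuation), and a hypothetical collection of 250,000 titles.

```python
def estimate_full_text_storage_gb(titles: int,
                                  words_per_book: int = 90_000,
                                  bytes_per_word: float = 6.0) -> float:
    """Rough raw plain-text storage for a collection, in gigabytes (10**9 bytes).

    The default values are illustrative assumptions, not measured figures.
    """
    total_bytes = titles * words_per_book * bytes_per_word
    return total_bytes / 1e9


if __name__ == "__main__":
    # A hypothetical collection of 250,000 titles.
    print(estimate_full_text_storage_gb(250_000))  # 135.0
```

Under those assumptions, that's about 135 GB of raw text, which fits on one ordinary hard drive. The search index built on top of it takes additional space, and the CPU and RAM needed to query it quickly are the part left to Moore's Law.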
This is easy to describe, but none of it is easy to do. Keep an eye out for pt. 2; this is going to be a series on the catalog of the future. There are a whole bunch of other features needed, but the algorithms and the data to be searched are the biggest part. What do you think the library catalog of the future will be like?
--
*This is the same reason that the NSA etc. try to get as much data as possible. If you want to find the terrorist in the digital haystack, you have to have the whole haystack at your disposal to do a complete search. If you're missing anything, that's where the bad guy could be hiding online.
**Source: that's just the way plain text works, OR, if you must have a real source, see this.
A library search portal for academic OPACs.
What the hell is this? I've tried it. It's smooth as silk. Is it a replacement OPAC for a backend like Koha?