Monday, January 01, 2007

Indexing Copyrighted Works

I watched a panel discussion on satellite last night, well into 2007, regarding Google's intent to make the pre-Internet body of print media searchable.

Publishers feel it's their right to capitalize on copyrights they might own, perhaps owing to deals with authors. Librarians, meanwhile, exult in the new ways their great libraries might serve as front ends to whatever electronic stash Google manages to amass in the long run, including by data mining in Princeton's Firestone and Fine Hall libraries.

I think the trick is to see it both ways, and to recognize that authors, aka authorities, deserve compensation one way or another. Exactly how they get it is what keeps changing.

Publishers too add value, not just through distribution, but by banking on stables of authors and assuming some risk. Universities, also publishing ventures (as academic presses), likewise assume such risks in providing tenured and other tracks to their human resources.

Strong business models help stabilize this picture, giving authors more time to focus on exercising their best skill set, meaning less worrying about where the next publishing deal is coming from.

In the old days, Google's intended function, which is to "bring to mind" the books you might wanna look at, based on some keyword scans, was best served by living scholars. Indeed, living scholars are still Google's chief competition in that we're each proprietary search engines, able to connect things up pretty neatly, especially in some chosen field or discipline.

When it comes to quality linking and matchmaking within a tight field, you really can't beat a real human.

Google, on the other hand, is a response to the exponential growth curves we've been seeing in print for quite some time now, curves made yet steeper by the Internet itself.

The need for machine indexing, à la Vannevar Bush's Memex, à la CERN's need for hypertext (to cohere particle physics), has never been greater, in the sense that individual human beings, capacious and intelligent though they are, can't master a database of that scale without these new power tools.

Google, like a multi-story crane, bulldozer, or supertanker, is a result of coordination and group effort by individual humans to make a superhuman difference. Google is not the only such player, but it well represents our need for media indexing on a gigantic scale.

And let's remember that as authors, not just as readers, we crave these new levels of access, akin to higher security clearances in the spy novels and TV shows. "So what secrets top these top secrets I'm getting?" is the perennial question.

A lot of the new stuff worth reading, watching, and listening to will have availed itself of such higher levels of access as the new indexing technologies provide, we may lip-smackingly anticipate. That we'll understand our history better, and ourselves better, is the promise of what might be characterized as a philosophical enterprise (building a new meta level).

We really need that element of brute force, those harnessed teams of big dino gigahertz and terabyte computers, to plough through such a vast acreage of electronified materials, and Google has primitive muscle of that nature (e.g. in The Dalles, Oregon), and not only Google. Several panelists expressed their belief that the USA itself would have more potential. But why see it as either/or?

Peter Suber's Open Access News (blog)