Landgrab For Ownership Of Library Catalog Data

from the not-good dept

Wed, Dec 10th 2008 9:15am — Mike Masnick

There's been an interesting (and somewhat troubling) behind-the-scenes fight going on over library catalog data for the past few months. The Online Computer Library Center (OCLC) is a nonprofit made up of member libraries that basically tries to facilitate access to information among libraries. That seems like a good thing. One of its offerings is WorldCat -- basically a big online catalog of library collections, so that it's easy for anyone to find books that are available at other libraries. This, obviously, seems quite useful, and many libraries agree and are a part of WorldCat. However, a month ago, OCLC announced new policies for WorldCat that effectively allowed OCLC to claim ownership over the records that any library put in its system -- and, in doing so, to limit what libraries could do with that data (such as, say, giving it to competing cataloging services).

This has many in the library community quite reasonably worried, with specific questions about who should be allowed to "own" library records. As that last link shows, there are a number of different people and organizations involved in the creation of a basic library database record, and basically the only thing OCLC is doing is putting it online. It's difficult to see how they can then claim ownership of it.

While this may be new in the library space, this type of debate has raged for years in other arenas, and some of the findings from those earlier battles may be instructive. The issue has to do with the concept of "database rights." Normally, factual information is not subject to any sort of copyright or ownership rights for rather obvious reasons (how do you own a fact?). However, some believe that there should be separate "database rights" that allow ownership of a compilation of factual information. For the most part, the US has denied this right, while Europe has allowed it -- and the results have shown, quite clearly, that the US made the right decision. Ownership of database rights tends to be damaging to business, while allowing the data to remain free can help build booming industries.

In this case, the scenario is a little different, because OCLC isn't trying to claim a government-backed "database right" over the content, but instead wants to achieve the same effective result via a unilateral change to its terms of service -- including a bit of viral licensing code that forces the "ownership" to travel with the data. OCLC doesn't really appear to have any legal authority here, but is trying to force it through by contract -- and I'd say there's a decent chance that wouldn't hold up in court, though no one wants it to get that far. Between the unilateral change, the claim of ownership over others' works (including public domain contributions from the Library of Congress) and the lack of database rights in the US, you could probably make a good argument that OCLC's policy change has no weight. Still, in the short term, a much better solution would be for OCLC to back off its silly ownership claim, recognize the power of open sharing of information, and focus on adding benefits and services that give libraries a reason to work with OCLC over competitors, rather than trying to use slimy contract terms to block out competitors. And, of course, hopefully OCLC learns that pissing off your partners and customers by dumping draconian ownership claims on them is never a good business strategy.

Filed Under: catalog info, copyright, database rights, libraries, openness, ownership, worldcat
Companies: oclc

10 Comments

Reader Comments

Future Boy, 10 Dec 2008 @ 9:37am

"I own" ownership seems like a dumb concept...

I'm amazed we humans got this far while carrying such incredible intellectual garbage around.

In my time, we have a resource based economy: almost everything we need is produced en masse via automation. The cost (in time and resources--not "money") of most necessities is so close to zero that nobody worries about it.
Anonymous Coward, 10 Dec 2008 @ 9:56am

Libraries entered the data
The idea behind OCLC is that individual libraries actually enter the data. That way the data only has to be entered one time instead of every library repeating the effort. The work can be checked by other libraries, thereby giving every member library access to a catalog that is reasonably accurate. Everything depends on a cooperative spirit. That is what makes this information grab so odd to me. I just can't see libraries going along with it.
- Doug, 10 Dec 2008 @ 11:42am
  
  Re: Libraries entered the data
  Actually, this is not quite the case. I used to work at OCLC, and have friends and acquaintances who still work there. Many of them do nothing more than look at stuff (books, maps, recordings, etc.) that library X sends them, determine the cataloging information (including supplying the Dewey Decimal number), and enter the data. Some specialize in materials written in certain, sometimes obscure, languages (such as Arabic, Welsh, or Ancient Greek). Others are more generalist, or specialize in other ways. But almost every one has an MLS or better, and they all determine the data and enter it. This means that Future Boy's idea that the cost in time/resources is close to zero is sometimes far off the mark, given that some of the people doing the cataloging work are specialists who do hours of research just to figure out where an item goes in the great scheme of things. It is not just a matter of glancing at an item for 30 seconds and deciding that it gets catalog number 521.075 instead of 520.684 (just pulling numbers out of the air). Instead, each item can potentially cost hundreds of dollars or more to catalog.
  
  Now, regarding ownership of the data, and sidestepping the broader ownership argument for a second: I would say that if OCLC cataloged the item and entered the data, then the rights to the records for that item should depend on the contract between the "library" and OCLC in effect at the time. But if the "library" entered it themselves, then the rights should belong to them rather than to OCLC. I would also agree that, were it not for the fact that OCLC and the libraries spend money to catalog the items, it would be better to make the data public, with the idea that you profit not from the data itself but from providing the service. And there are ways to work around even that last bit.
  
  As for OCLC's unilateral change... yes, it will probably harm them badly in both the short and the long run. These are probably just the first echoes heard here, and, to say the least, not the last to be heard anywhere.
Jesse, 10 Dec 2008 @ 10:21am

That bit of viral code seems interesting...if you duplicate the data, do you duplicate the ownership? Could the libraries just keep a copy of the data to maintain ownership then? Then the OCLC can do what it wants with its copy and individual libraries can do what they want...I don't understand how this code changes anything.
Rob, 10 Dec 2008 @ 10:49am

Greed - yet again
The only rhyme or reason to any of this is the one thing that remains a constant in America - Greed. Something catches their eye and they think, "Hey, I can make a buck off this. It doesn't even matter if I have a right to." and off they go. They will not stop unless they are stopped. Libraries - take them to court and set a precedent so this junk stops.

The libraries are the ones that created the databases and populated them, this group is really only maintaining it and keeping it accessible. Are things a company makes now "owned" by the janitor?
hegemon13, 10 Dec 2008 @ 11:19am

Isn't this claiming ownership of facts?
The quantity of a particular book in a library's collection is a fact. A list of those quantities is a list of facts. I can see how OCLC could claim ownership of their presentation of a library's collection. However, isn't claiming ownership of the actual data the same as the MLB claiming ownership of stats?

The whole concept of a database right is outrageous. Facts are facts; they aren't created. To create another, similar example, should a library or video rental store be able to have a "collection right," so that other libraries/stores could not have the same or very similar collection to theirs? Simply putting facts together in one place does not denote creation. The copyrightable parts of a database should be restricted solely to the table layout (iffy), interface, and reports, charts, and other presentations of the data.
twitter, 10 Dec 2008 @ 3:14pm

It's the CDDB all over again.
They don't want to own the database, they want to control who can research. Protect your right to read by protecting your right to share.
Anon2, 10 Dec 2008 @ 8:03pm

bizarre
This all seems a bit bizarre to me, though Doug's comment helps shed some light. There seem to be several dimensions to the problem. On the one hand, it's incredibly difficult for me to imagine any justification for claiming a proprietary interest in data that merely reflects facts such as which library holds copies of which work. To the extent that libraries are compiling the initial catalogs and simply exporting that data into a central metadatabase, it doesn't make sense even under the current IP legal structure. OTOH, if this organization is actually involved in creating a useful way to organize, index and search that data, and is providing the backbone to run that system, it is not simply compiling and re-transmitting data; it is contributing something creative and transformative to the process, and offering a new and valuable tool to researchers.

Leaving aside how you characterize the resources it invests in having staff whose expertise can assist libraries in properly cataloguing the works in their collection, I think there is some justification for it to expect some return on its investment. Some of that can, and probably should, come from the libraries, who are receiving a valuable service. But some ought to come from researchers (and more importantly in the case of most academic research, their institutions), who are utilizing the system to make their work more efficient and fruitful. Having still some dim recollection of my senior year in college, at a huge research university that even with its vast library did not hold the kinds of materials I needed to do my senior honors thesis, and the ridiculous amount of time I had to put in making phone calls, writing letters (no email back then), and ultimately having to travel to depositories I identified as likely prospects for the primary materials I needed, I would gladly have paid some small sum -- or better, seen it included in my tuition or my annual student fees (whatever they were for) -- rather than spend hundreds of hours just locating the materials I needed.

But nobody has to assert ownership in the raw data to accomplish this. If they've really built a better mousetrap, then I would think it a fantastic product to market to colleges, universities, private research institutions, corporations, pretty much anywhere real research is being done. And if I personally wanted to really track some stuff down for my own research, I would no doubt pay some reasonable fee for shorter-term access to the data, the same as my law firm does to access privately owned databases that do a heck of a lot better job of indexing, cross-indexing and providing other value-added aspects that make the task of doing legal research vastly more efficient and comprehensive (and ultimately, accurate) than the free databases out there, let alone the nightmare it used to be when we all still used difficult, dense and inevitably outdated digests to find the books we needed, and even more arcane sets of books to check whether the cases, statutes and regs we wanted to rely upon were still good law.

This seems to me to come down to the difference between bare compilation of facts and the manipulation and enhancement of those facts in ways that render the compilation transformative and useful.
- John Jackson, 11 Dec 2008 @ 8:44am
  
  Re: bizarre
  Because it seems there are a lot of comments from non-librarians, I just wanted to give an example of the type of data we are talking about. It's more than just a record of holdings. Catalogers create a huge amount of information for each item the library holds. For example, here is the record data for an edition of David Weinberger's Everything is Miscellaneous:
  
  000: : am4a0c
  001: : ocn122291427
  008: : 070323s2007 nyua b 001 0 eng
  010: : 2007012024
  020: : 9780805080438
  020: : 0805080430
  035: : (OCoLC)122291427
  040: : DLC|cDLC|dBAKER|dBTCTA|dC#P
  049: : CSLG
  050: 00 : HD30.2|b.W4516 2007
  082: 00 : 303.48/33|222
  100: 1 : Weinberger, David,|d1950-|?UNAUTHORIZED
  245: 10 : Everything is miscellaneous :|bthe power of the new digital disorder /|cDavid Weinberger.
  250: : 1st ed.
  260: : New York :|bTimes Books,|c2007.
  300: : 277 p. :|bill. ;|c25 cm.
  504: : Includes bibliographical references (p. [235]-257) and index.
  505: 0 : The new order of order -- Alphabetization and its discontents -- The geography of knowledge -- Lumps and splits -- The laws of the jungle -- Smart leaves -- Social knowing -- What nothing says -- Messiness as a virtue -- The work of knowledge.
  650: 0 : Knowledge management.
  650: 0 : Information technology|xManagement.
  650: 0 : Information technology|xSocial aspects.
  650: 0 : Personal information management.
  650: 0 : Information resources management.
  650: 0 : Order.
  856: 41 : |3Table of contents only|uhttp://www.loc.gov/catdir/toc/ecip0714/2007012024.html
  945: : 322011|b31275044721398|d2007-05-30|fBUADY|hDOHSTK4WK|i149710|l25|n20.75|t1.71|vYANKEEH
  949: : HD30.2.W4516 2007|wLC|c1|hDOHSTK4WK
  994: : 92|bCSL
  596: : 7
  
  And this doesn't include everything, only the fields that the patrons see. Catalogers (I am not one) know what each of these MARC numbers means (e.g., 650 = subject headings). This record was first created by the Library of Congress, later touched by a vendor, then by my institution. OCLC never had a hand in it. So who owns the data? We all had a part in creating it.
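To make the field and subfield structure of a record like the one above a little more concrete, here is a minimal Python sketch that parses its display form: a three-digit tag, optional indicators, then data in which `|` introduces each one-character subfield code. The line format and the rule that text before the first `|` is subfield `a` are assumptions read off this particular display, not a description of OCLC's or any catalog system's software. Real MARC records are normally exchanged in the ISO 2709 structure or as MARCXML, which this sketch does not handle. `Field`, `parse_line`, and `describe` are hypothetical names, and `TAG_NAMES` covers only a small subset of MARC 21 bibliographic tags.

```python
import re
from dataclasses import dataclass, field

# A small subset of MARC 21 bibliographic tag names (not exhaustive).
TAG_NAMES = {
    "020": "International Standard Book Number",
    "082": "Dewey Decimal Classification Number",
    "100": "Main Entry - Personal Name",
    "245": "Title Statement",
    "260": "Publication, Distribution, etc. (Imprint)",
    "650": "Subject Added Entry - Topical Term",
}

# Assumption: each line looks like "TAG: INDICATORS : DATA",
# as in the display above; the indicators may be empty.
LINE_RE = re.compile(r"^(\d{3}):\s?([^\s:]*)\s?:\s(.*)$")


@dataclass
class Field:  # hypothetical container, not a library type
    tag: str
    indicators: str
    subfields: list = field(default_factory=list)  # (code, value) pairs
    value: str = ""  # used only for leader/control fields (000-009)


def parse_line(line: str) -> Field:
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    tag, indicators, data = m.groups()
    if tag.startswith("00"):
        # Leader and control fields hold one value and have no subfields.
        return Field(tag, indicators, value=data)
    pieces = data.split("|")
    subfields = []
    # Assumption: text before the first "|" is the implied subfield "a".
    if pieces[0]:
        subfields.append(("a", pieces[0]))
    for piece in pieces[1:]:
        if piece:
            # The first character after "|" is the subfield code.
            subfields.append((piece[0], piece[1:]))
    return Field(tag, indicators, subfields)


def describe(f: Field) -> str:
    name = TAG_NAMES.get(f.tag, "(not in this sketch's table)")
    return f"{f.tag} {name}"


if __name__ == "__main__":
    heading = parse_line("650: 0 : Information technology|xManagement.")
    print(describe(heading), heading.subfields)
    # prints: 650 Subject Added Entry - Topical Term [('a', 'Information technology'), ('x', 'Management.')]
```

Run on the 650 line, the sketch separates the topical heading (`a`) from its general subdivision (`x`), which is the kind of distinction catalogers encode by hand for every item.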
Ben, 11 Dec 2008 @ 6:26am

And who compiled the data?
Let's not forget that these library records were compiled first by the library's employees, some of them as volunteers who did not think their charitable efforts were for corporate profit. Also, libraries are funded by taxpayers. So this landgrab is to get free information backed by local government funding without a contract.

This won't get far if it goes to court.