Google Book Search Critics Ignore The Non-Exclusive Nature Of Scanning Contracts

from the no-monopolies-here dept

Fri, Nov 30th 2007 11:06am — Timothy Lee

Three years ago, Google announced an ambitious effort to scan millions of books in order to create a search engine that would do for books what the original Google search engine did for the web. The project quickly ran into criticism from publishers who claimed the program was an infringement of the publishers' copyright. Others pointed out that Google's activities were well within the bounds of fair use. The debate has continued on and off ever since. Ars Technica points us to the latest round of this debate. On one side is economist Paul Courant, who was the provost of the University of Michigan when the University became one of Google's first library partners and is now the University's librarian. In his newly created blog, he vigorously defends Michigan's participation in the Google project, pointing out that Google will have the entire seven-million-volume collection digitized within six years, for free, while the competing Open Content Alliance charges "thousands of dollars to digitize books at a rate of tens of thousands of volumes a year." The University of Virginia's Siva Vaidhyanathan responds with a number of criticisms of the deal. In addition to copyright concerns, he's got a number of concerns about what Google will do with the digitized books. He worries about whether Google's search results will be fair, whether Google will promptly correct scanning quality problems, and whether Google will do a good enough job of preserving the files over the long term.

These are somewhat puzzling concerns to raise, given that Google has historically been absolutely obsessive about improving the quality of its search results and archiving useful data. But they also ignore a more fundamental point: Michigan and Google's other library partners aren't granting Google exclusive access to anything. Under the terms of the Google-Michigan agreement, Google returns each book after scanning it, and Michigan is free to sign up with other scanning projects, including Google's competitors. It's true that Michigan has agreed not to share the Google-created digital files with others. But the important point here is that those files wouldn't exist at all if not for the agreement. It would hardly be reasonable to expect Google to spend tens of millions of dollars to create digital files that would immediately be available to Google's competitors.

In short, Google is anything but a monopoly. There are already competing book-scanning efforts under way, and if Google's project is a success we can expect more such efforts to be launched in the future. And because Google isn't a monopoly, it doesn't make sense for universities to treat it like one by trying to micromanage every aspect of the service it ultimately offers. In the unlikely event that Google Book Search turns out to be a lousy product, consumers will punish Google by switching to the competing offerings of Microsoft, Yahoo, or others. It's pointless to try to force Google to produce a high-quality product when its competitors already give it plenty of reasons to do so.

Vaidhyanathan also characterizes the Michigan scanning program as "massive corporate welfare," but this, again, doesn't make a lot of sense. The vast majority of the books Google is scanning spend most of their time sitting on shelves unread. In principle, Google is no different from any other library patron: it checks out books, reads them, and returns them. The only difference is that it's doing it on a much larger scale than a normal library patron would. But there's no evidence that Michigan has been playing favorites. If another company approached Michigan seeking to scan its books on the same terms and were turned down, then people would have strong grounds for criticism. But that doesn't appear to have happened. Google's just made the best offer so far. The "corporate welfare" label just doesn't fit.

Filed Under: book scanning, libraries, michigan
Companies: google, university of michigan

17 Comments

Reader Comments

Anonymous Coward, 30 Nov 2007 @ 11:24am

"But the important point here is that those files wouldn't exist at all if not for the agreement. It would hardly be reasonable to expect Google to spend tens of millions of dollars to create digital files that would immediately be available to Google's competitors."

So if those files found their way on the Internet, would Google have a right to go after the people who put them up there?

Maybe they could hire the RIAA?
Tim Lee, 30 Nov 2007 @ 11:32am

Re:
No, but they'd have a cause of action under either contract or trade secret law against whoever leaked them. And the copyright holders would of course have a cause of action against anyone who was making it available to the general public.
Anonymous Coward, 30 Nov 2007 @ 1:01pm

"The debate quickly ran into criticism from publishers who claimed the program was an infringement of Google's copyright."

Isn't this sentence incorrectly stated? I don't think the publishers think Google's copyright is infringed but rather that Google is infringing on the publishers' copyright.
ChurchHatesTucker, 30 Nov 2007 @ 1:12pm

Re: Anon Coward #3
I was just going to ask the same thing. I think that second sentence should end "..an infringement of the publishers' copyrights." Unless I'm missing something.
shanoboy, 30 Nov 2007 @ 1:22pm

Hmm....
Google is taking tons of content you would otherwise have to pay for and making it free across the net. Well, on one hand you look at it like a digital library.
You can borrow and look at it for as long as you like, all for free. On the other it's almost like illegal filesharing.
Normally you'd have to pay for these books. Google is giving them to you digitally, all for free. But since most literature is generally free to use through the library system anyway, I guess we've already decided in our society that literature should be shared freely (just not movies and music). Am I right in this?
Tim Lee, 30 Nov 2007 @ 1:27pm

Re: Hmm....
They are not making books available for free on the Internet. They are displaying short excerpts in search results in order to give users context. Users who want to read the rest of the book still have to pay for it or get it from the library.
Anonymous Coward, 30 Nov 2007 @ 1:29pm

Re: Re: Anon Coward #3
The only mar in an otherwise interesting and well thought out post.
shanoboy, 30 Nov 2007 @ 1:45pm

Re: Re: Hmm....
"They are not making books available for free on the Internet. They are displaying short excerpts in search results in order to give users context. Users who want to read the rest of the book still have to pay for it or get it from the library."

Well crap, if that's the case I wish this technology was around when I was in college! Research would have been so much easier.

Thanks for the clarification Tim.
Tim Lee, 30 Nov 2007 @ 1:55pm

Re: Re: Re: Anon Coward #3
Should be fixed now. Thanks.
Lord, 30 Nov 2007 @ 3:00pm

"It would hardly be reasonable to expect Google to spend tens of millions of dollars to create digital files that would immediately be available to Google's competitors." It would be good if they were eventually made available though. It could start an industry in information tools that they themselves could prosper from.
R.Will, 30 Nov 2007 @ 5:08pm

Michigan/Google Digitization
The value of the project to Michigan is somewhere between $100MM and $250MM. Is that corporate welfare? In a climate of declining state funding, should Michigan forgo that benefit?

Let's assume that Barnes and Noble stocks its shelves with unsold books (it is my understanding that they currently do that). How much does the publisher earn on those stocks? Zero. The publisher is depending upon forward expectations. How much does a publisher make on the books sold to Michigan? Full retail or whatever. In the case of B&N, the publisher MIGHT earn money; in the Michigan case, the publisher has already banked the money. Which seems like a better deal, expected earnings in the future, or banked earnings today?
Siva Vaidhyanathan, 1 Dec 2007 @ 7:59am

Hi. Thanks for all these comments.

I have outlined my issues in various places far better suited for subtle arguments than the Web. And I will continue to do so. But let me just summarize some of these here.

Let's be clear here. Michigan and the other libraries control archives worth US$ billions. They don't need Google. Google needs the archives to enlarge its commercial venture. So it's clearly corporate welfare. The libraries are only in this for expediency -- not a core value of librarianship.

In addition, the Michigan contract is one of the few in which Michigan gets a complete archive of its collection. Most of the libraries in the project get only slivers of their collections back -- and only when Google decides to give them. In addition, the most recent contracts stipulate that the libraries may not make the digital files available until Google gives an OK. Google has all the power, yet the libraries have all the resources. This is a very bad deal. All of these restrictions work against the ethics and ideology of public and academic libraries.

And as far as exclusivity: there is exclusivity de jure and exclusivity de facto. The competition Tim mentions, such as the OCA, is very small compared to Google. And it has very little cash to spend on this project. In addition, in-house and consortia-led library digitization projects, done with the care and comprehensiveness that libraries have demonstrated for 600 years, have all been stopped thanks to Google's "crowding out" effect.

So Google, by virtue of its size, status, and market power has an effectively exclusive deal.

And Tim, please recognize that creating a PageRank-type of index for hyperlinked text is easy compared to doing such a thing with documents that lack metadata tags on the text itself. Full-text search without good metadata is folly. Web search and book search are two completely different projects. Book search is much more challenging and so far a complete failure. Google has no incentive to improve it if no one else offers anything close and Google's rankings themselves serve as a proxy for quality!

Complicate all this with the fact that such blind faith in Google's commitment to quality ignores the basic historical principle that past results do not mean anything to future ventures, the clear sense on Wall Street and elsewhere that Google is overextending itself, the fact that Google no longer answers to some egalitarian muse but to shareholders, and the fact that Google does not work for us (and never did). This is about accountability.

Google is a company. It should be acting in its own interest. Libraries serve and answer to the public. Conflating these missions is a grave mistake.

Consider all that in the light of the fact that Google cannot win in court against these publishers. You can read my UC Davis Law Review article to see why.

Google is doing more harm than good. It is threatening to undermine fair use with a ridiculous legal argument. It is crowding out quality efforts to extend knowledge with its emphasis on quantity.

Plus, snippets are useless research tools!
Anonymous Coward, 1 Dec 2007 @ 11:37pm

Re: Re:
I don't think it would be contract law or trade secret law. The claim would be founded on a claim to control of the IP in the images that result from the scanning process. I am not a lawyer and it's a while since I read the Michigan/Google contract, but I had the impression that Google was claiming property rights in the copies of the images of the books they were making (they keep and don't display the high quality copies and data they extract). It looks as though they are effectively claiming copyright in the copies. In the context of having scanned without permission of the copyright holders, this really is ODD and this behaviour is not "do no evil". Since Google is tightly limiting the use that can be made of the copies that it makes in its process, one can see why publishers and authors might be pissed off at Google's unwillingness to ask for permission to make digital copies of books which clearly are in copyright.
bowerbird, 2 Dec 2007 @ 11:20am

siva's got a couple good points. libraries should, for instance,
negotiate far better terms from google than they have thus far.

but siva also has a number of _bad_ points that hurt his cause.
and he's heard lots of counterarguments to them, but he still
clings to them. i guess otherwise he just doesn't have a book.

-bowerbird
Mike, 2 Dec 2007 @ 4:27pm

Re:
Hi Siva,

"Let's be clear here. Michigan and the other libraries control archives worth US$ billions. They don't need Google. Google needs the archives to enlarge its commercial venture. So it's clearly corporate welfare."

I don't get that statement at all. Because these libraries (in aggregate) have assets worth billions, they don't need Google? Assets are not the same as liquid assets, and you know that. To spend money on digitizing these books takes away from many other projects they can work on.

"Google has all the power, yet the libraries have all the resources."

Again, that is not accurate. Google has the resources for scanning. That's the fair trade that's being made here. Libraries provide books, Google provides the labor and technology for the scanning and gives back the scanned copies in exchange for being able to index them online.

Both sides have resources here.
Anonymous Coward, 3 Dec 2007 @ 10:49am

What a bunch of tools.

So it's OK for Google to not want its digital files out there but if a studio or musician wants the same right, you think it's OK to file-share?

Guess fair and balanced never made its way to Techdirt. Where is Mike talking about how much more Google would get were the information out there for free?
Shun, 3 Dec 2007 @ 11:25am

Much of the information is out there for free
I do not even pretend to understand why Google is going this route. I do know that there are many books available as torrents. Copyright does not seem to be an issue, as a practical matter.

One major difference with books, from music and movies, is that most literature is in the public domain. Writing has been around far longer than recorded music, and the fact that the majority of the United States supports libraries, as opposed to actively burning them down, shows that they are a valuable institution.

Given that, it is a bit surprising to me that we do not already have a distributed, free library, that anyone can use to download public domain books. There is virtually no storage cost per book (I know the aggregate cost to store all these books on-line may be prohibitive, but compared to a physical library, I'm sure that they are not a whole lot).

The only issue would be accurate record keeping (making sure that no more than 1 copy of a given edition is in the library at any one time). In event of extreme threat to the collection, itself, one would also want mirrors of the library distributed around the world.

When you look for sites that distribute BSD, or the newest flavor of Linux, you get a ton of sites based throughout the world. Where can I get a copy of "Origin of Species"?