UK Publishers Moan About Content Mining's Possible Problems; Dismiss Other Countries' Actual Experience

from the why-bother-looking-at-the-evidence? dept

Mon, Nov 21st 2011 11:17pm — Glyn Moody

One of the recommendations made by the Hargreaves Review in the UK was that a text- and data-mining exception to copyright should be created, with the following explanation of why that made sense (PDF):

We therefore recommend below that the Government should press at EU level for the introduction of an exception allowing uses of a work enabled by technology which do not directly trade on the underlying creative and expressive purpose of the work (this has been referred to as "non-consumptive" use). The idea is to encompass the uses of copyright works where copying is really only carried out as part of the way the technology works. For instance, in data mining or search engine indexing, copies need to be created for the computer to be able to analyse; the technology provides a substitute for someone reading all the documents. This is not about overriding the aim of copyright - these uses do not compete with the normal exploitation of the work itself - indeed, they may facilitate it. Nor is copyright intended to restrict use of facts. That these new uses happen to fall within the scope of copyright regulation is essentially a side effect of how copyright has been defined, rather than being directly relevant to what copyright is supposed to protect.

Who could possibly object to that? Certainly not the UK government, which accepted the recommendation (PDF):

The Government will therefore bring forward proposals in autumn 2011 for a substantial opening up of the UK's copyright exceptions regime on this basis. This will include proposals for a limited private copying exception; to widen the exception for noncommercial research, which should also cover both text- and data-mining to the extent permissible under EU law.

Nonetheless, the UK Publishers Association, which describes its "core service" as "representation and lobbying, around copyright, rights and other matters relevant to our members, who represent roughly 80 per cent of the industry by turnover", is unhappy. Here's Richard Mollet, the Association's CEO, explaining why it is against the idea of such a text-mining exception:

If publishers lost the ability to manage access to allow content mining, three things would happen. First, the platforms would collapse under the technological weight of crawler-bots. Some technical specialists liken the effects to a denial-of-service attack; others say it would be analogous to a broadband connection being diminished by competing use. Those who are already working in partnership on data mining routinely ask searchers to "throttle back" at certain times to prevent such overloads from occurring. Such requests would be impossible to make if no-one had to ask permission in the first place.

Large-scale academic content mining is a pretty new and specialized field, so it's hardly likely that there is going to be a sudden mass attack of crawler-bots taking down sites. Publishers would have ample time to expand their infrastructure to handle demand as it developed, which would be to their advantage: the more their holdings are mined, the more they are likely to be cited and read. And if content mining did take off suddenly, that would suggest there is a huge pent-up demand that the current system of licensing has stifled - one more reason why it should be abolished.

Then there is the commercial risk. It is all very well allowing a researcher to access and copy content to mine if they are, indeed, a researcher. But what if they are not? What if their intention is to copy the work for a directly competing-use; what if they have the intention of copying the work and then infringing the copyright in it? Sure they will still be breaking the law, but how do you chase after someone if you don't know who, or where, they are? The current system of managed access allows the bona fides of miners to be checked out. An exception would make such checks impossible.

This makes no sense. Infringing uses are either easy or hard to find using search engines. If they are easy to find, they are easy to pursue. If they are hard - internal uses, for example - then even miners with "bona fides" will be able to use copyright material in exactly these ways, and the publishers won't know.

Which leads to the third risk. Britain would be placing itself at a competitive disadvantage in the European & global marketplace if it were the only country to provide such an exception (oh, except the Japanese and some Nordic countries). Why run the risk of publishing in the UK, which opens its data up to any Tom, Dick & Harry, not to mention the attendant technical and commercial risks, if there are other countries which take a more responsible attitude.

The fact that some countries are already allowing content mining ought to be a hint that the other two fears are groundless. Instead, these inconvenient facts are dismissed out of hand as if the experience of "the Japanese and some Nordic countries" somehow doesn't count for UK publishers.

But as it turns out, there's actually a simple way to allay all of Mollet's fears at a stroke. At the beginning of his post he writes:

In coming to its recommendation on content mining, the [Hargreaves] Review drew heavily on the views of various strands of academia, most of which claimed that their vital research was being hampered by the lack of such an exception. The process of requesting licences of publishers was too time-consuming, it was claimed, and so an exception would make life easier.

This confirms that the text-mining issue is only being considered in an academic context - it's about giving scholars the ability to extract extra information from academic articles by performing analyses on their texts.

Now, most of that academic research is funded by the public through government grants to educational institutions and researchers, both in the UK and elsewhere. The open access movement has been pointing out for a decade that it would therefore not be unreasonable if the general public had free online access to the results of all this research it paid for - the articles published in academic journals. It would also allow many more scholars to access such publicly-funded work - including those who wanted to carry out text mining.

This would answer Mollet's fear that publishers' "platforms would collapse under the technological weight of crawler-bots." Since the papers could be freely downloaded from any one of the servers holding copies around the Internet, and then analysed on the researcher's own machine, there would be no crawler-bots involved at all. Open access would also eliminate the commercial risk: after all, what's the point in pirating material that is already freely available?

As for that competitive disadvantage Mollet is worried about, moving their academic titles to open access would actually give UK publishers a big advantage, since open access continues to sweep through the academic sector. It would mean that UK publishers were leading the way, rather than dragging their heels at the back.

Filed Under: copyright, data mining, education, exceptions, uk

10 Comments

Reader Comments

Anonymous Coward, 22 Nov 2011 @ 1:10am

"Content mining is a pretty new ..." !!!
I would have thought a techno blogger would be more familiar with the techno world.

Ralph-J, 22 Nov 2011 @ 1:11am

They don't want a meritocracy
If works could be freely analyzed, and potential customers could find the works that are really interesting to them, publishers would lose control over the marketability of their works.

To them, there is a risk that previously lesser known artists are going to be more successful than the ones they were hoping to earn big money with, and which they are supporting with big marketing efforts and investments.

Glyn Moody, 22 Nov 2011 @ 1:47am

Re:
Thanks - I've clarified this in the text.

mike allen, 22 Nov 2011 @ 2:32am

Re: They don't want a meritocracy
NOTHING wrong with that. If a lesser known artist produces a good work, it deserves to be known and succeed.

Dr Evil, 22 Nov 2011 @ 4:39am

keep your hands off my content
after all, if you run a crawler bot over my content, I won't have any left to sell or consume myself!

Anonymous Coward, 22 Nov 2011 @ 6:25am

Re: keep your hands off my content
Well, crawler boys keep my house running, so it's win-win for me.

Anonymous Coward, 22 Nov 2011 @ 8:49am

Problem is there are "search engines" that well know they are facilitating consumptive uses, and their entire purpose for existing is to knowingly, actively, and deliberately encourage consumptive use. Then, when they are called to task for what they are doing, they do all in their power to manufacture excuses for what they have been doing, looking everywhere else to place the blame.

BTW, I am pleased that you have used a quote containing the term "non-consumptive" use. The distinction between it and a "consumptive" use is a very beneficial way to contrast that which is of no moment and that which is problematic.

Terry Bucknell, 22 Nov 2011 @ 10:40am

Surely text mining requires the ability to download the site's entire content (probably in NLM DTD XML), not just the odd researcher downloading a few OA articles to their own PC? And surely the owners of the servers are entitled to know who is harvesting their content and when so that they can ensure that their servers can meet all the demands placed upon it?

Gwiz, 22 Nov 2011 @ 11:13am

Re: Re: keep your hands off my content
crawler boys ???

Anonymous Coward, 22 Nov 2011 @ 12:34pm

Re:
What did you expect a search engine to do? Hide content from the eyes of everyone?

In your view, apparently, search engines should show nothing: you type in "LoL" and it shows nothing, because otherwise they would be facilitating consumptive uses, right?