Elsevier Says Downloading And Content-Mining Licensed Copies Of Research Papers 'Could Be Considered' Stealing
from the gotta-protect-that-39%-profit-margin dept
Elsevier has pretty much established itself as the most hated company in the world of academic publishing, a fact demonstrated most recently when all the editors and editorial board resigned from one of its top journals to set up their own, open access rival. A blog post by the statistician Chris H.J. Hartgerink shows that Elsevier is still an innovator when it comes to making life hard for academics. Hartgerink's work at Tilburg University in the Netherlands concerns detecting potentially problematic research that might involve data fabrication -- obviously an important issue for the academic world. A key technique he is employing is content mining -- essentially bringing together large bodies of text and data in order to extract interesting facts from them:
I am trying to extract test results, figures, tables, and other information reported in papers throughout the majority of the psychology literature. As such, I need the research papers published in psychology that I can mine for these data. To this end, I started 'bulk' downloading research papers from, for instance, [Elsevier's] Sciencedirect. I was doing this for scholarly purposes and took into account potential server load by limiting the amount of papers I downloaded per minute to 9. I had no intention to redistribute the downloaded materials, had legal access to them because my university pays a subscription, and I only wanted to extract facts from these papers.
He spread out the downloads over ten days so as not to hammer Elsevier's servers -- which in any case are doubtless pretty beefy given the 39% profit margin the company enjoys:
I downloaded approximately 30GB of data from Sciencedirect in approximately 10 days. This boils down to a server load of 35KB/s, 0.0021GB/min, 0.125GB/h, 3GB/day.
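As a quick sanity check on those figures, here is a minimal Python sketch that recomputes the average rates from the quoted totals. It assumes decimal units (1 GB = 1,000,000 KB) and a perfectly even spread over the ten days. The function name `average_rates` is purely illustrative and does not come from Hartgerink's post.

```python
# Figures quoted in the post: roughly 30 GB downloaded over roughly 10 days.
# Assumption (not stated in the post): decimal units, 1 GB = 1,000,000 KB.
TOTAL_GB = 30
DAYS = 10
KB_PER_GB = 1_000_000


def average_rates(total_gb: float, days: float) -> dict:
    """Return average transfer rates for a download spread evenly over `days`."""
    gb_per_day = total_gb / days
    gb_per_hour = gb_per_day / 24
    gb_per_min = gb_per_hour / 60
    kb_per_sec = gb_per_min * KB_PER_GB / 60
    return {
        "GB/day": gb_per_day,
        "GB/h": gb_per_hour,
        "GB/min": gb_per_min,
        "KB/s": kb_per_sec,
    }


rates = average_rates(TOTAL_GB, DAYS)
for unit, value in rates.items():
    # Four significant figures is enough to compare with the quoted numbers.
    print(f"{unit}: {value:.4g}")
```

Under those assumptions the average works out to a little under 35 KB/s, in line with the numbers Hartgerink reports.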
Elsevier's response to this super-considerate researcher is a classic:
Approximately two weeks after I started downloading psychology research papers, Elsevier notified my university that this was a violation of the access contract, that this could be considered stealing of content, and that they wanted it to stop. My librarian explicitly instructed me to stop downloading (which I did immediately), otherwise Elsevier would cut all access to Sciencedirect for my university.
There are clear parallels with the situation that Aaron Swartz found himself in, but with a key difference. Elsevier is not only stopping Hartgerink from carrying out his research, but threatening to cut off all access to the company's journals and books for everyone working at Tilburg University if he tries to continue. Alicia Wise, Elsevier's Director of Access & Policy, added the following comment on Hartgerink's blog post:
We are happy for you to text mine content that we publish via the ScienceDirect API, but not via screen scraping.
When she was asked why it was necessary to use the API, rather than simply downloading articles, she replied:
The reason that we require miners to use the API is so that we can meet their needs AND ALSO the needs of our human users who can continue to read, search and download articles and not have their service interrupted in any way.
But that doesn't make any sense when Hartgerink had taken such pains to avoid any such adverse effects. Moreover, another commenter noted that Elsevier’s API often fails to work, rendering it useless for content mining. Even when it does work:
In many cases the API returns only metadata in the XML, compared to the fulltext PDF I can access on the website. Simply downloading the paper via the normal web service for readers is easy -- much easier than using the API.
What is really at stake here is control. Elsevier wants to be acknowledged as the undisputed gatekeeper for all possible uses of the research it publishes -- most of which was paid for by the public through taxes. And as far as the company is concerned, daring to use that knowledge in new ways without additional permission is simply "stealing."
Follow me @glynmoody on Twitter or identi.ca, and +glynmoody on Google+
Filed Under: chris hartgerink, downloads, knowledge, research
Companies: elsevier
Reader Comments
[ link to this | view in chronology ]
Re:
[ link to this | view in chronology ]
Re: Re:
Elsevier adds value by 'protecting' the content behind a troll gate paywall. Since you must pay to access the research, it must now (somehow) have become more valuable.
But I suppose Elsevier is lazy. If they wanted to add even more value, the research papers would have DRM and you would only be able to view them on special viewer software that runs on Windows. (Don't all scientists run only Windows?)
Copy / Paste and the ability to make screenshots would enable thieving pirates to read the research without paying through the nose.
[ link to this | view in chronology ]
just keep publishing their passwords until they have to die (and release the knowledge they have kidnapped)
[ link to this | view in chronology ]
Most of the research is taxpayer funded so the odds of our broken governments pillorying this thief are a quantum leap from zero.
Now if we could just find a couple of self serving corporations inconvenienced by this, then there would definitely be blood in the water.
[ link to this | view in chronology ]
The project makes tens of thousands of peer reviewed journals, books, and other documents freely available through a search engine. The back end repository for all this data is a torrent pool, invisible to the search engine users, but open to participation via seeding or extending the pool.
It mirrors quite a bit of otherwise paywalled content.
And it's dedicated to the memory of Aaron Swartz. How cool is that?
[ link to this | view in chronology ]
Could Be Considered Stealing
COULD BE CONSIDERED STEALING
[ link to this | view in chronology ]
Re: Could Be Considered Stealing
and locking them up behind a paywall for private profit"
well, that sounds like a very profitable business model!
where can we find the annual earnings of elsevier?
can we buy stock or shares?
[ link to this | view in chronology ]
Peter Murray-Rust has been fighting this for years
[ link to this | view in chronology ]
Re:
[ link to this | view in chronology ]
Re: Re:
[ link to this | view in chronology ]
Re: Re:
[ link to this | view in chronology ]
Thieves....
[ link to this | view in chronology ]
Elsevier supports content mining, contra your salacious headline
As I mentioned in the comment thread to the original blog post, the reason that we require miners to use our API is so that we can meet their needs AND ALSO the needs of our human users. Our platforms provide access to 11 million pieces of content, serve millions of researchers, and provide infrastructure for a number of services including ScienceDirect, Scopus, and ClinicalKey. We are not alone in providing an API for this sort of high-volume content-intensive service – others including Wikipedia and Twitter take the same approach. We also appreciate that researchers might wish to text mine across publisher platforms, and this is why we also participate in the multi-publisher cross-platform text and data mining service by CrossRef (http://tdmsupport.crossref.org/).
With kind wishes,
Alicia
Dr Alicia Wise
Elsevier
Director of Access & Policy
a.wise@elsevier.com
@wisealic
[ link to this | view in chronology ]
Re: Elsevier supports content mining, contra your salacious headline
It's also been stated that there are shortfalls in your API, and that this user took all reasonable actions to use minimal resources. If your alternate service for high-volume use is not usable for whatever reason, then it's fully reasonable to use the normal service for your research, so long as you take as much care as this user is stated to have taken to avoid harming it.
If using your normal service in an automated manner is a problem, then please explain why, so that we can take appropriate care. Please do not be afraid to give us the technical reason why this use is an issue, as we will likely be able to understand the issue, and possibly propose a fix to this issue.
Sincerely,
Anonymous Coward
[ link to this | view in chronology ]
Re: Elsevier supports content mining, contra your salacious headline
1) Did Elsevier obtain this information in accordance with the law and with existing contractual rights and obligations, yes or no?
2) Does Elsevier have property rights in this information, yes or no?
3) Do Elsevier's protocols for allowing access to this information comply with law and with existing contractual rights and obligations, yes or no?
If the answer to all of these questions is "yes," then any rights to access that Elsevier grants beyond what it is obligated to grant are irrelevant.
All of you folks, if you think Elsevier needs to do more based on a "moral imperative," I can think of a lot more things that people don't do voluntarily that they should which have a greater impact on humanity. Go after them first. If I read one more anonymous coward plugging away at what, in his not so humble opinion, some random company "should" or "shouldn't" do, I will barf.
[ link to this | view in chronology ]
Re: Re: Elsevier supports content mining, contra your salacious headline
Depends on exactly how you interpret "this information" but almost certainly no. Elsevier has the copyright on the specific expression written by the authors, but it does not have any property rights to the underlying facts and information. The big beef is that they are trying to get such rights by only letting you read their text if you sign a license that has such terms in it.
I am not aware of any copyright law, anywhere, that gives the copyright holder control of how the facts and information in the work are used by someone who legally accesses the work, thus there is no copyright law basis for the distinction between reading and mining. They get this distinction in by putting it into the licensing agreement. Is it legal to put it in the licensing agreement? My understanding is that, at least in the UK, it is not legal. Might be legal in the Netherlands.
[ link to this | view in chronology ]
Re: Re: Re: Elsevier supports content mining, contra your salacious headline
Response: Agreed. Rephrasing the question, does Elsevier have a legal or contractual obligation to provide non-copyrightable facts and information that has been organized in the way that this information is organized?
If it does, what is the extent of that obligation?
[ link to this | view in chronology ]
Re: Re: Re: Re: Elsevier supports content mining, contra your salacious headline
Note: the one caveat to the first sentence is that Elsevier does have a program where authors can pay a fee to make their paper free and open access. Peter Murray-Rust and others found numerous examples of papers where such fees were paid and yet there still was a charge to access the papers. Elsevier claimed it was due to some bugs (funny how the bugs only go one way) and I don't know how many papers were affected or if the problem persists, but there were certainly concrete cases where Elsevier violated their agreement with the author. Come to think of it, the best way to get stats is to use some kind of web crawler to scan through all the open papers, which of course Elsevier says violates your license agreement. Hmmmm.
[ link to this | view in chronology ]
Re: Re: Re: Re: Re: Elsevier supports content mining, contra your salacious headline
And the fee that Elsevier considered reasonable is why an entire editorial team has resigned to start an open access journal. See “The editor had requested a price of 400 euros, an APC that is not sustainable”, where according to Elsevier:
As far as I know we are discussing a per page charge.
[ link to this | view in chronology ]
Re: Re: Re: Re: Re: Re: Elsevier supports content mining, contra your salacious headline
I am sure someone can provide a service where authors can pay a fee to make their paper free and open access.
let's say for a symbolic 1 cent
[ link to this | view in chronology ]
Re: Re: Re: Re: Re: Elsevier supports content mining, contra your salacious headline
Wrong, copyright is the right to control the production of new copies, and not what use is made of the copies once sold. Unfortunately this does not fit well in a digital world, where copyright is being distorted into control over information and the uses that can be made of it.
[ link to this | view in chronology ]
Re: Re: Re: Re: Re: Re: Elsevier supports content mining, contra your salacious headline
[ link to this | view in chronology ]
Re: Re: Re: Re: Re: Re: Re: Elsevier supports content mining, contra your salacious headline
You mean insisting that access to research done by scientists and paid for by tuition and grants from taxpayers and philanthropists should be unfettered? I fail to see why anyone needs to suffer the likes of Elsevier sticking their rapaciously greedy, self-entitled noses in there. They've long overstayed their welcome.
[ link to this | view in chronology ]
Re: Re: Re: Re: Re: Re: Re: Re: Elsevier supports content mining, contra your salacious headline
As long as they have legally obtained this, it should not matter from a legal perspective if they obtained these texts from Warren Buffett or from teenage orphans living in a nunnery.
Come back to me when you see these noble scientists foregoing nicer homes, cars, etc. if an opportunity arises
[ link to this | view in chronology ]
Re: Re: Re: Re: Re: Re: Re: Re: Re: Elsevier supports content mining, contra your salacious headline
The good old it is the law, a common justification for maintaining the status quo used by those benefiting from the labours of others, from nobles enforcing serfdom, through slavery to modern corporations. Problem is, when those with the money have to fall back on this justification, they are ignoring the winds of change, and will likely lose more by clinging to the old ways than they would if they adapted their business to the changes in society.
[ link to this | view in chronology ]
Re: Re: Re: Re: Re: Re: Re: Re: Re: Elsevier supports content mining, contra your salacious headline
I'm not one to much care about legal perspectives. There are other, far more important, perspectives besides the legal one, such as morality and ethics. Legality should be the last resort tool you reach for. No, I don't expect corporations to care about morality and ethics (they're ill equipped to do so, and by law constrained from doing so), but we do, and we should. I understand Elsevier wants to enrich its shareholders. That doesn't at all mean it would be smart or correct for us to let them get away with what to me looks like outright theft stirred with slavery.
Wow. Think of where Elsevier gets the content it publishes. Yes, those same "noble scientists" whose face you just spit on. They spent years, or decades, learning their chosen field and the tools they need to understand to practice in their field, competing against all those thousands of others who also want in, yet you can dismiss all of that with "they're greedy wanting nice homes and cars." What an asshole!
I look forward to the day Elsevier enters chapter eleven bankruptcy.
[ link to this | view in chronology ]
Re: Re: Elsevier supports content mining, contra your salacious headline
[ link to this | view in chronology ]
Kill Decision
you do not want scientists to do a private search
nor private data mining in their private and secure labs,
but you want to have in a file each search associated to each account? just to help us?
hm, that is interesting,
scary but interesting anyway:
-is this information safe? exactly how safe?
-who does have authorized access to this information?
-can this information be used to find out WHAT you are researching into?
-can this information be used to find WHO is researching around specific topics?
-can you think HOW MUCH this information is worth?
-and how dangerous it is for scientists to be in such a list?
have you read Daniel Suarez- Kill Decision?
[ link to this | view in chronology ]
Re: Elsevier supports content mining, contra your salacious headline
Could you explain in what way the access described in this scenario (data transfer amounting to 35 KB / second, sustained over a week and a half) in any way serves to prevent you from meeting the needs of the human users?
[ link to this | view in chronology ]
Are they maybe, oh, I don't know, hiding something?
[ link to this | view in chronology ]
Re:
which university would that be?
[ link to this | view in chronology ]
11 million pieces of content
11 Million pieces of content could easily fit on a 5 GB DVD or two, or a cheap 64GB usb drive.
Where is your option for universities to get ALL documents on a stick for internal distribution, mining and other 'approved' purposes? It saves a bundle on server hosting costs too.
You would have to trust your sole suppliers of pieces of content not to distribute it to the world. But that's the premise of copyright, isn't it?
[ link to this | view in chronology ]
Nobel Prize Committee
Who could dare stand against them?
[ link to this | view in chronology ]
Re: Nobel Prize Committee
[ link to this | view in chronology ]
Re: Re: Nobel Prize Committee
In addition, declaring all prior academic publications as open sourced via changes to copyright laws should be done.
[ link to this | view in chronology ]
Define a successful parasite?
Elsevier.
[ link to this | view in chronology ]
And this is threatening a contract violation...
While there is a clause
But that, too, was not being violated. No evidence has been put forth of the use of automated downloading or of disruption of services.
So what we have here is a simple case of someone consuming a much larger amount of the services Elsevier provides than normal, while still not violating the contract terms.
In comcast terms, "he violated our unannounced bandwidth cap and must be terminated".
[ link to this | view in chronology ]
As a data center sysadmin with ca. thirty years in the trenches, this is bullshit. She's a corporate liar. I'd discount anything she says as corporate PR BS. Elsevier lost the moral high ground long ago, but they're desperate to not learn they're morally and ethically bankrupt. There's too much money at stake for them to acknowledge the facts of reality. She's been told to say this and has no idea what she's talking about. She's saying it because her employer told her to.
Yes. The corporate bottom line depends on their not accepting the truth of the situation. Elsevier's shareholders should be ashamed for consorting with the likes of this. Some people can ignore anything as long as it's to their financial benefit.
[ link to this | view in chronology ]
'Could Be Considered'
[ link to this | view in chronology ]
Elsevier is 'snake bit and doomed to die'
Once scientists stop sending them papers, they will wither and die. It is underway now.
[ link to this | view in chronology ]
What would it take to immediately take the ball away from Elsevier?
[ link to this | view in chronology ]
Re: What would it take to immediately take the ball away from Elsevier?
[ link to this | view in chronology ]
Re: What would it take to immediately take the ball away from Elsevier?
actually it looks VERY EASY:
1) hack elsevier
2) dump it to the net
the net will then manage to translate everything to searchable open format
and store it in multiple open repositories
If we can do this with movies, tv series, software and videogames I do not see why this has not been done with humanity's knowledge
[ link to this | view in chronology ]
Re: What would it take to immediately take the ball away from Elsevier?
I don't understand why universities haven't yet banded together to do this. It would be a sweet revenue stream that would fund their students' research and/or university operations. They could charge a tenth of what Elsevier is skimming off just to enrich third party investors, and still make enough to have plenty left over to fund their students' research.
Letting Elsevier get away with this seems the silliest way possible, or else somebody's getting a sweet unearned free ride for the lousiest return imaginable.
[ link to this | view in chronology ]
Re: Re: What would it take to immediately take the ball away from Elsevier?
Because they will still need to pay the academic publisher for access to existing papers, and that is a big lever that these publishers wield over the universities.
[ link to this | view in chronology ]
Re: Re: Re: What would it take to immediately take the ball away from Elsevier?
Yeah, it's the same problem as moving to Open Source software. The initial cost is expensive and disruptive short term. Explaining you'll make up that cost big time on the other side doesn't seem to fly for short term profit addicts.
[ link to this | view in chronology ]
All I hear from Elsevier...
I think they should get used to disappointment.
[ link to this | view in chronology ]
"Alicia Wise
a.wise@elsevier.com"
did she just publish her email?
[ link to this | view in chronology ]
Aaron
[ link to this | view in chronology ]
Speaking out for copyright reform
[ link to this | view in chronology ]
Snake Bit and going to die = Elsevier
They now practice 'microkerning', which means that each copy they supply to a college in electronic format has the letter spacing and word spacing changed a little. It is a form of text based steganography. By this method they police the subscribers by threat of service withdrawal. Every researcher makes scans and sends them to friends by e-mail for free. Whenever Elsevier finds one, they analyze it to see who made the scan = threat.
That is the club they bear - a product of a forced monopoly that would take government copyright action to rectify.
What governments should do is enforce zero copyright on publicly financed papers. Other paper financiers should do the same. It is in all their interests that papers all become open ASAP. It is only in Elsevier's monopoly interest that the current systems persist.
[ link to this | view in chronology ]
Re: Snake Bit and going to die = Elsevier
[ link to this | view in chronology ]
Re: Re: Snake Bit and going to die = Elsevier
It is so common now, that leakers have learned to retype and paraphrase things they want to leak.
some clues here, https://www.google.ca/search?q=micro-kerning+document+control&oq=micro-kerning+document+control&aqs=chrome..69i57.11966j0j8&sourceid=chrome&es_sm=93&ie=UTF-8
[ link to this | view in chronology ]
Re: Re: Re: Snake Bit and going to die = Elsevier
I knew about micro-kerning and its purpose - I was specifically interested in the actual algorithms used - was it glyph widths, or heights, was it inter-character spacing, etc.
If so, let me know.
(There's also the cruder annotation of the name of the library subscribing.)
[ link to this | view in chronology ]
Re: Re: Re: Re: Snake Bit and going to die = Elsevier
To combat this, documents need to be OCR recognised and all words re-word processed to standard kerning. Images can also be stripped of steganographic data via projection and re-photographing with a slightly different resolution.
As to the precise methods used, it is hard to say. But a number of different subscribers at different locations could each download the same document through the Elsevier API, which causes the system to create a uniquely coded copy. With a few of these, the copies can be compared to see what means is used to encode them.
[ link to this | view in chronology ]
An aggregation service is needed.
I think I will suggest it to Google.
[ link to this | view in chronology ]
You sicken me, computer programming punk.
[ link to this | view in chronology ]
Re:
Who appointed Elsevier in charge of deciding what scientists' published results would cost other researchers to keep up on and continue their research?
The Jews have a great word for this. It's chutzpah.
You sicken me moocher, hanger on, know nothing person. I don't want to share a planet with the likes of you. You're a predatory a-hole which none of the rest of us wants to be here. Die screaming in a fire. Consider it an act of humanity. Or, just go away. You won't be missed.
[ link to this | view in chronology ]
Thanks! ;)
[ link to this | view in chronology ]
Peter Murray-Rust's views.
[ link to this | view in chronology ]
Peter Murray-Rust's views
I will spread it around.
[ link to this | view in chronology ]
Elsevier and anti-trust
Seems like a class-action lawsuit for anti-competitive behavior would be a no-brainer, given the grotesqueness of their actions.
[ link to this | view in chronology ]