New Research Shows Digitization Results In Routine Lock-Down Of Public Domain Books
from the what-about-our-rights? dept
The public domain is supposed to be what we receive in return for, and after the expiry of, time-limited, government-backed intellectual monopolies that are granted to creators. As Mike noted recently, that neat equation does not reflect today's reality for copyright, where the situation is so complicated that it requires a 52-page handbook to determine whether or not something is in the public domain.
But the situation is actually far worse than that, because the public is being denied access to many works that are unambiguously in the public domain because of new restrictions being placed on them when they are digitized. That's something that Techdirt has discussed before, but such stories have been largely anecdotal. Research from New Zealand provides us with more detailed information of what's going on:
In order to establish the extent to which digitized public domain books are being restricted, a sample of 100 pre-1890 books was selected from the New Zealand National Bibliography (NZNB). This sample was chosen on the assumption that these works had entered the public domain under New Zealand copyright law. Each book in the sample was searched for within six online repositories: Google Books, Hathi Trust, Internet Archive, Early New Zealand Books (ENZB), New Zealand Electronic Text Collection (NZETC) and Project Gutenberg. In addition, Google and Bing searches were conducted for all sample books that could not be located within these repositories.
Here's what the researchers discovered:
The findings of this research suggest that a high proportion of digitized public domain books are being restricted by online repositories. Out of a sample of 100 public domain books, only three are hosted by repositories that do not impose any form of usage restriction. Furthermore, 48 percent (24) of all digitized books [50 out of the 100 public domain sample] are hosted by a repository that restricts or blocks access, with the most restrictive repository limiting or blocking access to 91 percent (21) of sample books within its collection.
They also managed to pinpoint the key problem:
Almost all access restrictions applied to public domain books within the sample were the result of repositories using a process of estimation to assess copyright status. Within the sample, a one-minute search located accurate biographical information about authors two-thirds of the time. This task takes a fraction of the time required to digitize a book, which involves 30 minutes to scan 500 pages (Kelly, 2006).
A solution is the following:
Digitizers should incorporate the sourcing of copyright information within the overall process of digitization, and copyright estimation should only be used as an option of last resort. Furthermore, copyright estimation periods should better reflect statistical norms regarding the actual duration of copyright protection. The current estimation period of 140 years, used by Google Books and Hathi Trust, is far too conservative. If hosted under this policy, 47 percent of sample books would be restricted. This is despite the fact that all books with locatable biographical information were confirmed as being in the public domain for between 30 and 132 years.
This goes back to the problem of determining whether a work is in the public domain or not. Because that can be complex, those carrying out the digitization of works simply assume the worst, just to be on the safe side. That's something that needs to change, otherwise we risk losing not just the benefits of digitized public domain works, but also our undoubted rights to access them freely.
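To make the gap between estimation and an actual lookup concrete, here is a rough Python sketch. It is not the researchers' method or any repository's real code. It assumes the 140-year estimation period is counted from the publication year, and it assumes a life-plus-50-years term (my reading of New Zealand's rule, passed in as a parameter so it can be changed). The book entries, function names and field names are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Book:
    title: str
    published: int              # year of first publication
    author_died: Optional[int]  # None when no biographical data was found


def restricted_by_estimation(book: Book, current_year: int, window: int = 140) -> bool:
    """Assumed estimation rule: restrict anything published within `window` years."""
    return current_year - book.published < window


def in_public_domain(book: Book, current_year: int, term: int = 50) -> Optional[bool]:
    """Life-plus-`term` check; returns None when the death year is unknown."""
    if book.author_died is None:
        return None
    # Assumption: protection runs to the end of the year `term` years after death.
    return current_year > book.author_died + term


def needlessly_restricted(books: List[Book], current_year: int) -> List[str]:
    """Titles that estimation would restrict even though a lookup shows them free."""
    return [
        b.title
        for b in books
        if restricted_by_estimation(b, current_year)
        and in_public_domain(b, current_year) is True
    ]


# Hypothetical sample entries, not taken from the study.
SAMPLE = [
    Book("Hypothetical A", published=1885, author_died=1900),
    Book("Hypothetical B", published=1870, author_died=1890),
    Book("Hypothetical C", published=1889, author_died=None),
]

RESULT = needlessly_restricted(SAMPLE, current_year=2014)
print(RESULT)
```

Under these assumptions only the first entry is flagged: it falls inside the 140-year window, yet a death-date lookup shows its term ended decades ago. The third entry, with no biographical data, is the only case where falling back on estimation is defensible.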
Follow me @glynmoody on Twitter or identi.ca, and +glynmoody on Google+
Filed Under: archives, books, copyright, ebooks, libraries, lockdown, new zealand, public domain
Reader Comments
Congress has the power to create copyright laws, not the responsibility. The laws aren't necessary, effective, or proportionate. Enforcing them requires reduction of the common carrier principle and mass monitoring of who is doing what online.
I'd be curious to see a comparison of the percentage of voters who want marijuana legalized vs who want copyright law reformed. If people can spin that as pragmatic, it says something about our society when we can't spin something that affects more than our private lives as pragmatic enough to get off our asses and tend to.
[ link to this | view in chronology ]
Congress?
[ link to this | view in chronology ]
Re: Congress?
I suspect host law is more likely to be involved than destination law, which means that the article isn't, in fact, about New Zealand (law)... or perhaps it is for those sites which have local hosts in New Zealand (Google?).
See also https://www.techdirt.com/articles/20131231/23434825735/grinch-who-stole-public-domain.shtml
[ link to this | view in chronology ]
http://seegras.discordia.ch/Blog/stealing-from-the-public-domain/
[ link to this | view in chronology ]
re
[ link to this | view in chronology ]
This argument is so stupid. Neither you nor Mike actually try to figure out the public domain status of a given work. If you did, you'd see how simple it is to do. You don't need all 52 pages for one work.
"But the situation is actually far worse than that, because the public is being denied access to many works that are unambiguously in the public domain because of new restrictions being placed on them when they are digitized."
Even if a work is in the public domain, it can be locked up behind any paywall the owner of the COPY wants. Another stupid argument.
"This goes back to the problem of determining whether a work is in the public domain or not. Because that can be complex, those carrying out the digitization of works simply assume the worst, just to be on the safe side."
Again, rather than alarmist bullshit, why don't you walk us through the determination of the public domain status of a given work? The handbook is simple to apply. They even released an 8-page flow chart version, and you only need one page for a given work. One page.
"That's something that needs to change, otherwise we risk losing not just the benefits of digitized public domain works, but also our undoubted rights to access them freely."
"Undoubted rights"?? That's hilarious. If I have a copy of a public domain work on my bookshelf or on my server, you have ZERO rights to access it. Terrible argument, Glyn.
[ link to this | view in chronology ]
Re:
[ link to this | view in chronology ]
Re: Re:
[ link to this | view in chronology ]
Re: Re: Re:
[ link to this | view in chronology ]
Re: Re: Re:
My Brilliant Career (1901) - Miles Franklin, died 1954
Animal Farm (1945) - George Orwell, died 1950
The Great Gatsby (1925) - F Scott Fitzgerald, died 1940
Tender is the Night (1933) - F Scott Fitzgerald, died 1940
Lady Chatterley's Lover (1928) - D H Lawrence, died 1930
Gone with the Wind (1936) - Margaret Mitchell, died 1949
Between the Acts (1941) - Virginia Woolf, died 1941
All were published with copyright notices except for the first, which had copyright at the time of creation under blanket copyright structures.
Whether they were renewed or not is irrelevant to the above, due to the dates of death.
So come on... you are so knowledgeable and have decided that you can determine copyright with a simplistic flowchart. Have a go at them, it should be easy. Oh, and remember the answer should be contextually based upon the article above too.
[ link to this | view in chronology ]
Re: Re: Re: Re:
*crickets*
[ link to this | view in chronology ]
Re: Re: Re: Re:
My Brilliant Career (1901) - Miles Franklin, died 1954 - Under copyright until 2025
Animal Farm (1945) - George Orwell, died 1950 - Under copyright until 2021
The Great Gatsby (1925) - F Scott Fitzgerald, died 1940 - Public Domain since 2011
Tender is the Night (1933) - F Scott Fitzgerald, died 1940 - Public Domain since 2011
Lady Chatterley's Lover (1928) - D H Lawrence, died 1930 - Public Domain from 1981-1996 then since 2001
Gone with the Wind (1936) - Margaret Mitchell, died 1949 - Under copyright until 2020
Between the Acts (1941) - Virginia Woolf, died 1941 - Public Domain
[ link to this | view in chronology ]
Re: Re: Re: Re: Re:
[ link to this | view in chronology ]
Re: Re: Re:
[ link to this | view in chronology ]
Re:
But I agree that a company is not obligated to make their own copies of public domain works freely available to the public.
As long as no one gets any crazy ideas that there are any restrictions on what anyone can do, once they have access through a paywall or whatever, with the copies that appear on their own devices.
[ link to this | view in chronology ]
Re: Re:
So, you do care since if you didn't care then you *couldn't* care less...
[ link to this | view in chronology ]
Re: Re: Re:
Fun fact: this is one of the perversions that was exported from the UK rather than imported from the colonies
[ link to this | view in chronology ]
Re: Re: Re: Re:
[ link to this | view in chronology ]
Re: Re: Re: Re: Re:
Now, to figuratively run literally into the ground...
[ link to this | view in chronology ]
Re: Re: Re: Re:
[ link to this | view in chronology ]
Re:
That only applies when someone has read and understood the implications of all 52 pages. Until they have done that, they cannot answer the question: do any other pages in the book change anything I have read so far?
[ link to this | view in chronology ]
How can you call something intellectual PROPERTY if nobody can know who it belongs to?
How can you call something INTELLECTUAL property if most of it, is, well, FORGOTTEN?
[*]Yes, I'm speaking from experience, researching a book by a citizen of the Austro-Hungarian empire who came to the U.S. as a teenager and remained there the rest of his life. How am I as a U.S. citizen supposed to know what the Austro-Hungarian empire's copyright laws were--since the Empire didn't exist or even have a unique successor on the date the book was written! And how can I know whether/when someone became a U.S. citizen?
[ link to this | view in chronology ]
Har!
Copyright has been turned on its head. Thanks to that %$%*@ Sonny Bono, copyright lasts longer than the lifespan of almost the entire population. That's like not having a copyright law at all.
[ link to this | view in chronology ]