New Research Shows Digitization Results In Routine Lock-Down Of Public Domain Books

Culture

from the what-about-our-rights? dept

Tue, Jun 24th 2014 12:32am — Glyn Moody

The public domain is supposed to be what we receive in return for, and after the expiry of, time-limited, government-backed intellectual monopolies that are granted to creators. As Mike noted recently, that neat equation does not reflect today's reality for copyright, where the situation is so complicated that it requires a 52-page handbook to determine whether or not something is in the public domain.

But the situation is actually far worse than that, because the public is being denied access to many works that are unambiguously in the public domain because of new restrictions being placed on them when they are digitized. That's something that Techdirt has discussed before, but such stories have been largely anecdotal. Research from New Zealand provides us with more detailed information about what's going on:

"In order to establish the extent to which digitized public domain books are being restricted, a sample of 100 pre-1890 books was selected from the New Zealand National Bibliography (NZNB). This sample was chosen on the assumption that these works had entered the public domain under New Zealand copyright law. Each book in the sample was searched for within six online repositories: Google Books, Hathi Trust, Internet Archive, Early New Zealand Books (ENZB), New Zealand Electronic Text Collection (NZETC) and Project Gutenberg. In addition, Google and Bing searches were conducted for all sample books that could not be located within these repositories."

Here's what the researchers discovered:

"The findings of this research suggest that a high proportion of digitized public domain books are being restricted by online repositories. Out of a sample of 100 public domain books, only three are hosted by repositories that do not impose any form of usage restriction. Furthermore, 48 percent (24) of all digitized books [50 out of the 100 public domain sample] are hosted by a repository that restricts or blocks access, with the most restrictive repository limiting or blocking access to 91 percent (21) of sample books within its collection."

They also managed to pinpoint the key problem:

"Almost all access restrictions applied to public domain books within the sample were the result of repositories using a process of estimation to assess copyright status. Within the sample, a one-minute search located accurate biographical information about authors two-thirds of the time. This task takes a fraction of the time required to digitize a book, which involves 30 minutes to scan 500 pages (Kelly, 2006)."

A solution is the following:

"Digitizers should incorporate the sourcing of copyright information within the overall process of digitization, and copyright estimation should only be used as an option of last resort. Furthermore, copyright estimation periods should better reflect statistical norms regarding the actual duration of copyright protection. The current estimation period of 140 years, used by Google Books and Hathi Trust, is far too conservative. If hosted under this policy, 47 percent of sample books would be restricted. This is despite the fact that all books with locatable biographical information were confirmed as being in the public domain for between 30 and 132 years."

This goes back to the problem of determining whether a work is in the public domain or not. Because that can be complex, those carrying out the digitization of works simply assume the worst, just to be on the safe side. That's something that needs to change; otherwise we risk losing not just the benefits of digitized public domain works, but also our undoubted rights to access them freely.
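
To make the gap between estimation and an actual lookup concrete, here is a minimal Python sketch. It rests on several assumptions that are not in the research itself: that the 140-year estimate is applied to the publication year, that New Zealand's life-plus-50-years term applies to the work, and that copyright runs to the end of the calendar year. The function names and the example book are hypothetical.

```python
# Illustrative sketch only: names and the example book are hypothetical,
# and the way the 140-year estimate is applied is an assumption.

def restricted_by_estimate(publication_year: int, current_year: int,
                           window: int = 140) -> bool:
    """Blanket estimate: restrict unless published at least `window` years ago."""
    return current_year - publication_year < window


def public_domain_year(author_death_year: int, term: int = 50) -> int:
    """First calendar year in the public domain under a life-plus-`term` rule.

    Assumes copyright expires at the end of the calendar year that falls
    `term` years after the year of the author's death.
    """
    return author_death_year + term + 1


# Hypothetical book: published 1880, author died 1900, checked in 2014.
current_year = 2014
blocked = restricted_by_estimate(1880, current_year)
pd_since = public_domain_year(1900)

print(f"Restricted under 140-year estimate: {blocked}")
print(f"Public domain since {pd_since} ({current_year - pd_since} years)")
```

Under these assumptions the hypothetical book would be blocked by the estimate even though a death-date lookup puts it in the public domain decades earlier, which is the kind of mismatch the researchers describe.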



Filed Under: archives, books, copyright, ebooks, libraries, lockdown, new zealand, public domain

24 Comments


Reader Comments



Anonymous Coward, 24 Jun 2014 @ 2:09am

Why don't people denounce copyright law?

Congress has the power to create copyright laws, not the responsibility. The laws aren't necessary, effective, or proportionate. Enforcing them requires reduction of the common carrier principle and mass monitoring of who is doing what online.

I'd be curious to see a comparison of the percentage of voters who want marijuana legalized vs who want copyright law reformed. If people can spin that as pragmatic, it says something about our society when we can't spin something that affects more than our private lives as pragmatic enough to get off our asses and tend to.
- Anonymous Coward, 24 Jun 2014 @ 3:31am
  
  Congress?
  Wait, are you talking about the USA congress? This article is about New Zealand.
  - Anonymous Coward, 24 Jun 2014 @ 6:31pm
    
    Re: Congress?
    That's actually part of the problem. Do the websites in the study need to conform to the copyright legislation of the country they are hosted in, or the country the requests come from, or both, or take the worst case scenario from around the world just to be safe?
    
    I suspect host law is more likely to be involved than destination law, which means that the article isn't, in fact, about New Zealand (law)... or perhaps it is for those sites which have local hosts in New Zealand (Google?).
    
    See also https://www.techdirt.com/articles/20131231/23434825735/grinch-who-stole-public-domain.shtml
Seegras (profile), 24 Jun 2014 @ 3:11am

I already wrote about it, about Copyfraud and about repositories supporting that fraud.

http://seegras.discordia.ch/Blog/stealing-from-the-public-domain/
broken, 24 Jun 2014 @ 3:51am

re
Copyrights are not for the public good. Simplistic Disney effects... Perpetual milking machines for corporations.
Anonymous Coward, 24 Jun 2014 @ 4:30am

As Mike noted recently, that neat equation does not reflect today's reality for copyright, where the situation is so complicated that it requires a 52-page handbook to determine whether or not something is in the public domain.

This argument is so stupid. Neither you nor Mike actually try to figure out the public domain status of a given work. If you did, you'd see how simple it is to do. You don't need all 52 pages for one work.

But the situation is actually far worse than that, because the public is being denied access to many works that are unambiguously in the public domain because of new restrictions being placed on them when they are digitized.

Even if a work is in the public domain, it can be locked up behind any paywall the owner of the COPY wants. Another stupid argument.

This goes back to the problem of determining whether a work is in the public domain or not. Because that can be complex, those carrying out the digitization of works simply assume the worst, just to be on the safe side.

Again, rather than alarmist bullshit, why don't you walk us through the determination of the public domain status of a given work. The handbook is simple to apply. They even released an 8-page flow chart version, and you only need one page for a given work. One page.

That's something that needs to change, otherwise we risk losing not just the benefits of digitized public domain works, but also our undoubted rights to access them freely.

"Undoubted rights"?? That's hilarious. If I have a copy of a public domain work on my bookshelf or on my server, you have ZERO rights to access it. Terrible argument, Glyn.
- Anonymous Coward, 24 Jun 2014 @ 5:00am
  
  Re:
  Since you are an apparent expert in the field, why don't you walk us through the determination of the public domain status of a given work? I hear the handbook is simple to apply and you only need one page of a given work, or so I'm told anyways.
  - Anonymous Coward, 24 Jun 2014 @ 5:19am
    
    Re: Re:
    Give me the specifics, such as date of publication, date of author's death, whether published with a copyright notice, whether renewed, etc. It might be a while before I can answer since I'm heading out the door right now, but I'll check back this afternoon.
    - Anonymous Coward, 24 Jun 2014 @ 5:31am
      
      Re: Re: Re:
      How about you pick a good example, supply all the details and show why the work should or should not be considered public domain.
    - G Thompson (profile), 24 Jun 2014 @ 6:01am
      
      Re: Re: Re:
      Ok I'll bite.
      
      My Brilliant Career (1901) - Miles Franklin, died 1954
      Animal Farm (1945) - George Orwell, died 1950
      The Great Gatsby (1925) - F Scott Fitzgerald, died 1940
      Tender is the Night (1933) - F Scott Fitzgerald, died 1940
      Lady Chatterley's Lover (1928) - D H Lawrence, died 1930
      Gone with the Wind (1936) - Margaret Mitchell, died 1949
      Between the Acts (1941) - Virginia Woolf, died 1941
      
      All were published with copyright notices except for the first, which had copyright at time of creation under blanket copyright structures.
      
      Whether renewed or not is irrelevant to the above due to the dates of death.
      
      So come on.. you are so knowledgeable and have decided that you can determine copyright in a simplistic flowchart. Have a go at them, should be easy. Oh and remember the answer should be contextually based upon the article above too.
      - G Thompson (profile), 26 Jun 2014 @ 1:55am
        
        Re: Re: Re: Re:
        Yep.. as I suspected...
        
        *crickets*
      - Sheogorath (profile), 4 Nov 2014 @ 8:45am
        
        Re: Re: Re: Re:
        I can answer this one from a UK perspective:
        My Brilliant Career (1901) - Miles Franklin, died 1954: Under copyright until 2025
        Animal Farm (1945) - George Orwell, died 1950: Under copyright until 2021
        The Great Gatsby (1925) - F Scott Fitzgerald, died 1940: Public Domain since 2011
        Tender is the Night (1933) - F Scott Fitzgerald, died 1940: Public Domain since 2011
        Lady Chatterley's Lover (1928) - D H Lawrence, died 1930: Public Domain from 1981-1996 then since 2001
        Gone with the Wind (1936) - Margaret Mitchell, died 1949: Under copyright until 2020
        Between the Acts (1941) - Virginia Woolf, died 1941: Public Domain
        
        Sheogorath (profile), 4 Nov 2014 @ 9:15am
        
        Re: Re: Re: Re: Re:
        My comment got cut off. The last part should read Between the Acts (1941) - Virginia Woolf, died 1941 Public Domain since 2012
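
The dates in the UK answer above are consistent with a simple life-plus-70-years calculation, sketched below in Python. This is only an illustration under stated assumptions: the function name is hypothetical, the term is taken to run to the end of the 70th calendar year after the author's death, and the sketch does not model earlier terms or revived copyright (such as the 1981-1996 window mentioned for Lady Chatterley's Lover).

```python
# Illustrative sketch only: a plain life-plus-70 rule, with no handling of
# earlier terms, revived copyright, or non-UK rules.

UK_TERM_YEARS = 70


def uk_public_domain_from(death_year: int) -> int:
    """First calendar year a work is in the public domain under life + 70."""
    return death_year + UK_TERM_YEARS + 1


# Death years as given in the thread above.
authors = {
    "Miles Franklin": 1954,
    "George Orwell": 1950,
    "F Scott Fitzgerald": 1940,
    "D H Lawrence": 1930,
    "Margaret Mitchell": 1949,
    "Virginia Woolf": 1941,
}

for name, died in authors.items():
    print(f"{name} (died {died}): public domain from {uk_public_domain_from(died)}")
```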
    - PaulT (profile), 24 Jun 2014 @ 8:26am
      
      Re: Re: Re:
      Missed the point, didn't you? Even if you're given specifics, you still have to follow the steps contained in those 52 pages to determine copyright status. When the answer should actually just be "if the work is over X years old, it's public domain". Or, preferably "has the author got a current registration on file?".
- Anonymous Coward, 24 Jun 2014 @ 6:58am
  
  Re:
  Since I disagree with the entire concept of copyright, I could care less how difficult it is to figure out.
  
  But I agree that a company is not obligated to make their own copies of public domain works freely available to the public.
  
  As long as no one gets any crazy ideas that there are any restrictions on what anyone can do, once they have access through a paywall or whatever, with the copies that appear on their own devices.
  - PaulT (profile), 24 Jun 2014 @ 8:29am
    
    Re: Re:
    "I could care less how difficult it is to figure out"
    
    So, you do care since if you didn't care then you *couldn't* care less...
    - Anonymous Coward, 24 Jun 2014 @ 6:22pm
      
      Re: Re: Re:
      It's colloquial, not literal. English is imperfect, make the most of it and try to keep up. This one has been around for more than 50 years, anyway!
      
      Fun fact: this is one of the perversions that was exported from the UK rather than imported from the colonies
      - Anonymous Coward, 25 Jun 2014 @ 5:12am
        
        Re: Re: Re: Re:
        It has always been 'I couldn't care less', and I am from the UK.
        
        Anonymous Coward, 25 Jun 2014 @ 11:55pm
        
        Re: Re: Re: Re: Re:
        I always got the impression that both were valid: "couldn't" is a simple statement of fact, while "could" carried a sarcastic tone ("I could care less, but it would be hard.")
        
        Now, to figuratively run literally into the ground...
      - PaulT (profile), 26 Jun 2014 @ 12:54am
        
        Re: Re: Re: Re:
        Well, I'm from the UK and I'd never heard the incorrect term being used until I started seeing it online. Where I'm from, it was always "I couldn't care less", which is accurate.
- Anonymous Coward, 24 Jun 2014 @ 7:30am
  
  Re:
  This argument is so stupid. Neither you nor Mike actually try to figure out the public domain status of a given work. If you did, you'd see how simple it is to do. You don't need all 52 pages for one work.
  
  That only applies when someone has read and understood the implications of all 52 pages. Until they have done that, they cannot answer the question: do any other pages in the book change anything I have read so far?
Zakida Paul (profile), 24 Jun 2014 @ 6:18am

The real copyright theft is what this is.
hutcheson, 24 Jun 2014 @ 11:07am

The rules (in the U.S.) are indeed horrifically complex, and include such facts as author's citizenship at the time of creation (and the copyright laws in that jurisdiction), author date of death, location and date (including month) of first publication anywhere, location and date (including month) of first U.S. publication ... and, as impossible as most of this is to find[*] there are additional, even-more-obscure details mentioned in the Stanford SUMMARY of copyright law that could impact the result.

How can you call something intellectual PROPERTY if nobody can know who it belongs to?

How can you call something INTELLECTUAL property if most of it, is, well, FORGOTTEN?

[*]Yes, I'm speaking from experience, researching a book by a citizen of the Austro-Hungarian empire who came to the U.S. as a teenager and remained there the rest of his life. How am I as a U.S. citizen supposed to know what the Austro-Hungarian empire's copyright laws were--since the Empire didn't exist or even have a unique successor on the date the book was written! And how can I know whether/when someone became a U.S. citizen?
1st Dread Pirate Roberts (profile), 25 Jun 2014 @ 11:32am

Har!
Prior to copyright enactment in England, authors had full control of works, essentially forever. Copyright law was intended to force works into the public domain. If you wanted a continuing income stream, you needed to produce new works. You were granted a limited period during which to earn income from your works.

Copyright has been turned on its head. Thanks to that %$%*@ Sonny Bono, copyright lasts longer than the lifespan of almost the entire population. That's like not having a copyright law at all.