Awesomeness: Millions Of Public Domain Images Being Put Online

from the go-use-them dept

Here's some nice news. Kalev Leetaru has been liberating a ton of public domain images from books and putting them all on Flickr. He's been going through Internet Archive scans of old, public domain books, isolating the illustrations, and saving each one as an individual image. That matters because, while the books and images are all public domain, very few of the images have been separated from the books and released in a digital format.

To achieve his goal, Mr Leetaru wrote his own software to work around the way the books had originally been digitised.

The Internet Archive had used an optical character recognition (OCR) program to analyse each of its 600 million scanned pages in order to convert the image of each word into searchable text.

As part of the process, the software recognised which parts of a page were pictures in order to discard them.

Mr Leetaru's code used this information to go back to the original scans, extract the regions the OCR program had ignored, and then save each one as a separate file in the Jpeg picture format.
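
For the curious, the general idea described above (use the OCR layout data to find the picture regions, then crop them out of the original scan) can be sketched in a few lines of Python. To be clear, this is not Leetaru's actual code, and nothing here reflects the Internet Archive's real OCR output format: the `Block` record, its `kind` labels, and the representation of a page as a plain grid of grayscale pixels are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical stand-in for a scanned page: rows of grayscale pixel values.
Page = List[List[int]]


@dataclass(frozen=True)
class Block:
    """One layout region reported by an OCR pass (hypothetical format)."""
    kind: str    # assumed labels: "text" or "picture"
    left: int    # inclusive pixel column
    top: int     # inclusive pixel row
    right: int   # exclusive pixel column
    bottom: int  # exclusive pixel row


def crop(page: Page, block: Block) -> Page:
    """Return the pixels of `page` that fall inside `block`'s bounding box."""
    return [row[block.left:block.right] for row in page[block.top:block.bottom]]


def extract_pictures(page: Page, blocks: List[Block]) -> List[Page]:
    """Keep only the regions the OCR pass marked as pictures, cropped from the scan."""
    return [crop(page, b) for b in blocks if b.kind == "picture"]


if __name__ == "__main__":
    # A tiny 4x6 "scan" where each pixel value encodes its row and column.
    page = [[r * 10 + c for c in range(6)] for r in range(4)]
    blocks = [
        Block("text", left=0, top=0, right=6, bottom=2),
        Block("picture", left=1, top=2, right=4, bottom=4),
    ]
    for picture in extract_pictures(page, blocks):
        print(picture)
```

A real pipeline would read actual scan files and write each crop out as its own image file (the quoted description says JPEG), which needs an imaging library; that part is left out here to keep the sketch self-contained.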
Already over 2.6 million images have been posted to Flickr in this manner -- all completely in the public domain. From a historical perspective, the images are fascinating -- and the fact that anyone can do anything with them, free of charge, is important culturally as well. Just scrolling through the images is amazing. Here are a few interesting ones that I spotted:




There seem to be lots of images of musical scores, sewing machines, individual portraits, buildings and machinery. The Flickr page for each image gives information about the book it came from, including the text before and after the image, which is pretty cool. The one (only slightly) annoying thing is that on the Flickr pages, rather than saying these are public domain images, it says that there are "no known copyright restrictions." While that's accurate, and a potentially reasonable hedge against some miraculous finding that says these images are covered by copyright, it's really too bad that it's so problematic to come out and say "this is in the public domain, do whatever the hell you want with it."

Filed Under: book scans, copyright, flickr, internet archive, kalev, leetaru, old books, public domain


Reader Comments


  • That One Guy, 29 Aug 2014 @ 5:35pm

    Come one, come all, and place your bets!

    While awesome for archival purposes if nothing else, I give it a week at most before some bot starts tagging and demanding pictures be removed and claiming that at least some of them are still under copyright, followed shortly thereafter (assuming Flickr doesn't just pull them immediately) by the ones running the bot doubling down and insisting that yes, they do indeed own the rights to the images, and will be filing a lawsuit if they aren't taken down immediately.

    Because when there's absolutely no penalty for copyfraud, well, why not try to claim everything you can, on the off chance that at least some of the claims will stick and/or the target will pay up?


  • Toestubber, 29 Aug 2014 @ 7:28pm

    Anyone know a good way...

    ...to download the originals en masse? This archive seems too important to entrust to Flickr.


  • s7, 29 Aug 2014 @ 9:36pm

    Haha, I stumbled upon some of these last night while browsing Flickr looking for some PD images to use for a project. Re-did my search today, and yep, it was them.


  • Anonymous Coward, 29 Aug 2014 @ 11:38pm

    Oh, boy. Whatever's not going to like this, not one bit.


    • Anonymous Coward, 31 Aug 2014 @ 9:06am

      Re:

      IP extremists will try to argue that this is going to kill art and make all artists starve or something.


  • Anonymous Coward, 30 Aug 2014 @ 7:02am

    Thanks for the info and story.


  • orbitalinsertion, 30 Aug 2014 @ 7:53am

    In jpeg format?

    Once more, just to help me mentally process this:
    Retrieving and publishing public domain images in jpeg format?


  • 1st Dread Pirate Roberts, 30 Aug 2014 @ 12:27pm

    Way Cool!

    This is way cool! This guy needs to patent this technique.


  • bob, 30 Aug 2014 @ 2:54pm

    Lather, rinse, repeat

    It's interesting that Leetaru has taken on images. He is a major force behind GDELT, the Global Database of Events, Language, and Tone, which uses automated techniques to mine news sources for event summaries (among other things).

    Unlike GDELT, here all the source material is demonstrably public domain, so publishing the image extracts (in whatever form) should not cause any hiccoughs.


  • Antsan, 31 Aug 2014 @ 3:49am

    Unfortunately there seems to be something strange going on on Flickr. I cannot just right click on the images and save them like I am used to.
    Would be nice if the pictures were uploaded somewhere where they are more easily accessible.


    • NikFromNYC, 2 Sep 2014 @ 3:54pm

      Re:

      There's a little, hard-to-hit three-dot icon leading to various sizes, including the original, which I can download just fine in an iPhone browser. I just have to zoom in so I don't miss the dots button, since the hot area for the next image is the whole right edge of the image right down to that button, irritatingly.


    • Victoria Love, 9 Sep 2014 @ 5:06pm

      Re: saving images

      I was able to isolate and save an image by playing around with the "all sizes" option on Flickr. Once the image was displayed without the caption information, I was able to use the "save image" option. This was on my iPad, where I was able to save a single image to "my photos".


  • Anonymous Coward, 31 Aug 2014 @ 9:12am

    Torrent?

    Would be cool if someone could create a .torrent file of all the images.


