Awesomeness: Millions Of Public Domain Images Being Put Online
from the go-use-them dept
Here's some nice news. Kalev Leetaru has been liberating a ton of public domain images from books and putting them all on Flickr. He's been going through Internet Archive scans of old, public domain books, isolating the images, and turning them into individual images. Because, while the books and images are all public domain, very few of the images have been separated from the books and released in a digital format.
Already over 2.6 million images have been posted to Flickr in this manner -- all completely in the public domain. From a historical perspective, the images are fascinating -- and the fact that anyone can do anything with them, free of charge, is important culturally as well. Just scrolling through the images is amazing. Here are a few interesting ones that I spotted:
To achieve his goal, Mr Leetaru wrote his own software to work around the way the books had originally been digitised.
The Internet Archive had used an optical character recognition (OCR) program to analyse each of its 600 million scanned pages in order to convert the image of each word into searchable text.
As part of the process, the software recognised which parts of a page were pictures in order to discard them.
Mr Leetaru's code used this information to go back to the original scans, extract the regions the OCR program had ignored, and then save each one as a separate file in the Jpeg picture format.
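The approach described above (reuse the OCR program's layout data to find the regions it treated as pictures, then cut those regions out of the original scan) can be sketched roughly as follows. This is only an illustration, not Mr Leetaru's actual code: the XML layout format, the `type`, `l`, `t`, `r` and `b` attribute names, and the helper functions are all assumptions made for the example, and the "page scan" is a small grid of stand-in pixels rather than a real image.

```python
import xml.etree.ElementTree as ET

# Hypothetical OCR layout output for one page (assumed format, not the
# Internet Archive's real one): each block has a type and a bounding box
# given as left, top, right, bottom in pixels.
SAMPLE_LAYOUT = """<page width="8" height="6">
  <block type="Text" l="0" t="0" r="8" b="2"/>
  <block type="Picture" l="2" t="2" r="6" b="5"/>
</page>"""


def picture_boxes(layout_xml):
    """Return (left, top, right, bottom) for each block marked as a picture."""
    root = ET.fromstring(layout_xml)
    boxes = []
    for block in root.iter("block"):
        if block.get("type") == "Picture":
            boxes.append(tuple(int(block.get(k)) for k in ("l", "t", "r", "b")))
    return boxes


def crop(pixels, box):
    """Cut a box out of a page stored as a list of pixel rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in pixels[top:bottom]]


def extract_pictures(pixels, layout_xml):
    """Go back to the scan and pull out every region the OCR set aside."""
    return [crop(pixels, box) for box in picture_boxes(layout_xml)]


# Stand-in for a page scan: an 8x6 grid whose "pixels" are (x, y) pairs.
page = [[(x, y) for x in range(8)] for y in range(6)]
pictures = extract_pictures(page, SAMPLE_LAYOUT)
```

With real scans, an imaging library would take the place of the list-based `crop`. For example, Pillow's `Image.crop` accepts the same (left, upper, right, lower) box, and `Image.save` can write each region out as a separate JPEG file.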
Filed Under: book scans, copyright, flickr, internet archive, kalev, leetaru, old books, public domain
Reader Comments
Come one, come all, and place your bets!
Because when there's absolutely no penalty for copyfraud, well, why not try to claim everything you can, on the off chance that at least some of the claims will stick and/or the target will pay up?
Anyone know a good way...
Re: Anyone know a good way...
In jpeg format?
Retrieving and publishing public domain images in jpeg format?
Way Cool!
Lather, rinse, repeat
Unlike GDELT, here all the source material is demonstrably public domain, so publishing the image extracts (in whatever form) should not cause any hiccoughs.
Would be nice if the pictures were uploaded somewhere where they are more easily accessible.
Re: saving images
Torrent?
Re: Torrent?