Awesomeness: Millions Of Public Domain Images Being Put Online
from the go-use-them dept
Here's some nice news. Kalev Leetaru has been liberating a ton of public domain images from books and putting them all on Flickr. He's been going through Internet Archive scans of old, public domain books, isolating the images, and turning them into individual image files. That matters because, while the books and images are all public domain, very few of the images have been separated from the books and released in a digital format.
Already over 2.6 million images have been posted to Flickr in this manner -- all completely in the public domain. From a historical perspective, the images are fascinating -- and the fact that anyone can do anything with them, free of charge, is important culturally as well. Just scrolling through the images is amazing.
To achieve his goal, Mr Leetaru wrote his own software to work around the way the books had originally been digitised.
The Internet Archive had used an optical character recognition (OCR) program to analyse each of its 600 million scanned pages in order to convert the image of each word into searchable text.
As part of the process, the software recognised which parts of a page were pictures in order to discard them.
Mr Leetaru's code used this information to go back to the original scans, extract the regions the OCR program had ignored, and then save each one as a separate file in the Jpeg picture format.
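To make that extraction step concrete, here is a rough sketch of the idea, not Leetaru's actual code. It assumes the OCR pass left behind a list of bounding boxes for the areas it treated as pictures. The `Region` type, the function names, and the file-naming scheme are all made up for illustration, and the "scan" is a tiny grid of numbers so the example runs without an imaging library. Real code would decode the scan and write each crop out as a JPEG with such a library.

```python
from dataclasses import dataclass
from typing import Dict, List

Page = List[List[int]]  # page[row][column], grayscale values (assumed format)


@dataclass(frozen=True)
class Region:
    """Hypothetical bounding box for an area the OCR pass flagged as a picture."""
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def crop(page: Page, region: Region) -> Page:
    """Return the pixels of `page` inside `region`, clamped to the page edges."""
    height = len(page)
    width = len(page[0]) if page else 0
    top = max(0, region.top)
    left = max(0, region.left)
    bottom = max(0, min(height, region.bottom))
    right = max(0, min(width, region.right))
    return [row[left:right] for row in page[top:bottom]]


def extract_pictures(page: Page, picture_regions: List[Region],
                     page_id: str) -> Dict[str, Page]:
    """Map an output file name to the cropped pixels of each non-empty region."""
    pictures: Dict[str, Page] = {}
    for index, region in enumerate(picture_regions, start=1):
        pixels = crop(page, region)
        if pixels and pixels[0]:  # skip regions that fall outside the page
            pictures[f"{page_id}_img{index:03d}.jpg"] = pixels
    return pictures


# A 4x6 toy "scan": zeros stand for text areas, nines for an illustration.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 9, 9, 0, 0],
    [0, 9, 9, 9, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
regions = [Region(left=1, top=1, right=4, bottom=3)]
pictures = extract_pictures(page, regions, page_id="book0001_p0042")
print(pictures)  # {'book0001_p0042_img001.jpg': [[9, 9, 9], [9, 9, 9]]}
```

The point of the sketch is that no new image detection is needed: the boxes the OCR software already computed in order to throw pictures away are reused to cut those same pictures out of the original scan.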
Filed Under: book scans, copyright, flickr, internet archive, kalev, leetaru, old books, public domain