New Research Shows How Copyright Law Is Keeping Useful Info Off Wikipedia
from the too-bad dept
The Atlantic has an interesting article about some forthcoming research from MIT PhD student Abhishek Nagaraj (though, oddly, the article never introduces him, never mentions his first name, and just refers to him throughout by his last name only). It's the latest in an increasingly long line of evidence showing how copyright is stifling content and keeping it from reaching the public in useful ways. Nagaraj found a particularly useful natural experiment in the archives of Baseball Digest, "the oldest and longest-running journal of matters baseball-related," which has been published continuously since 1942. For various reasons (sounds like they didn't renew...) the issues from 1942 until 1964 are in the public domain. Everything after that... not so much. Google's book scanning project scanned nearly every issue from July 1945 until 2008.
Nagaraj realized that Wikipedians were using this as good source material for Wikipedia pages -- especially on the profiles of older baseball players. He noted that there was little stopping the text from being rewritten, but the real issue was around images. People could use the scanned images to illustrate the profiles, but clearly they could only use the public domain ones without permission.
But what Nagaraj found was that the availability of public domain material dramatically improved the articles' images. Before the digitization, players from between '44 and '64 had an average of .183 pictures on their articles. The '64 to '84 group had about .158 pictures. But after digitization, those numbers dramatically changed: there were 1.15 pictures on each of the older group's articles -- but only .667 in the new group. More recent players, covered by privately-owned parts of Baseball Digest, had half as many images on their pages as did old-timers.
And, yes, the article notes that he put in place various controls to correct for unrelated differences. Basically, the only observable difference in why the pages have more images is the public domain status of some of those works vs. others. Some might argue that this is no big deal, but he found a second bit of useful data as well:
And the effects of this -- of just having an image on the page -- cascaded to other metrics. "Out-of-copyright" players' pages saw a significant boost in traffic. Articles from the pre-'64 that were already in the top 10 percent saw their hits increase more than 70 percent. Articles from that group in the least-popular ten percent saw traffic to their articles increase by 25 percent. Those pages were more frequently edited across the board, too. And this makes sense: Google rewards updated content, and it rewards images. The out-of-copyright players provided more of both.
I'm reminded, yet again, of that chart of the now infamous gap in books under copyright that you can't find any more -- even though older books in the public domain are widely available. Once again, we're seeing not only the massive value of the public domain, but how much useful content is being locked away by excessively strict (and excessively long) copyright law.
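For anyone who wants to check the image numbers quoted above, here is a rough back-of-the-envelope sketch in Python. It only compares the four raw averages reported in the article; the group labels and variable names are my own, and this is not Nagaraj's actual method, which reportedly includes controls that a simple subtraction like this does not.

```python
# Average images per player article, as quoted from the article.
# Group labels are illustrative names, not terms from the research.
before = {"public_domain_44_64": 0.183, "in_copyright_64_84": 0.158}
after = {"public_domain_44_64": 1.15, "in_copyright_64_84": 0.667}

# Change within each group after digitization.
change = {group: after[group] - before[group] for group in before}

# Naive difference-in-differences: how much more the public domain
# group gained than the in-copyright group (raw means, no controls).
extra_gain = change["public_domain_44_64"] - change["in_copyright_64_84"]

# Ratio of post-digitization averages between the two groups.
ratio_after = after["in_copyright_64_84"] / after["public_domain_44_64"]

print(f"Gain, public domain group: {change['public_domain_44_64']:.3f}")
print(f"Gain, in-copyright group:  {change['in_copyright_64_84']:.3f}")
print(f"Extra gain (public domain minus in-copyright): {extra_gain:.3f}")
print(f"In-copyright average as a share of public domain: {ratio_after:.2f}")
```

On these raw figures, both groups gained images after digitization, but the public domain group gained roughly 0.46 more images per article, and the in-copyright pages ended up with about 58 percent as many images as the older ones, which is the "half as many" in the quote, loosely rounded.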