Harvard Opens Up Its Massive Caselaw Access Project
from the good-to-see dept
Almost exactly three years ago, we wrote about the launch of an ambitious project by Harvard Law School to scan all federal and state court cases and get them online (for free) in a machine readable format (not just PDFs!), with open APIs for anyone to use. And, earlier this week, case.law officially launched, with 6.4 million cases, some going back as far as 1658. There are still some limitations -- some placed on the project by its funding partner, Ravel, which was acquired by LexisNexis last year (though, the structure of the deal will mean some of these restrictions will likely decrease over time).
Also, the focus right now is really on providing this setup as a tool for others to build on, rather than as a straight up interface for anyone to use. As it stands, you can access the data either via the site's API or through bulk downloads. Of course, the bulk downloads are, unfortunately, part of what's limited by the Ravel/LexisNexis deal. Bulk downloads are available for cases in Illinois and Arkansas, but that's only because both of those states already make cases available online. Still, even with the Ravel/LexisNexis limitation, individual users can download up to 500 cases per day.
The real question is what will others build with the API. The site has launched with four sample applications that are all pretty cool.
- H2O is a tool that law professors can use to easily create casebooks for students in various areas of law. Anything published on H2O gets a Creative Commons license and can then be shared widely. I wonder if professors like Eric Goldman, who offers an Internet Law Casebook, or James Grimmelmann, who has a different Internet Law Casebook, will eventually port them over to a platform like H2O.
- A wordcloud app that currently shows the "most used words" in California cases in various years. Here, for example, are the word clouds in California cases from 1871... and 2012. See if you can tell which one's which.
- Caselaw Limericks, which appears to randomly generate what it believes is a rhyming limerick from the case law. Here's what I got:
Her son Julius is a confirmed thief.
He did not turn over a new leaf.
The vessel, not.
the parking lot.
Respondent concedes this in its brief.
The quality overall is... a bit mixed. But it's fun.
- And, finally, in time for Halloween, Witchcraft in Law, which totals up cases that cite "witchcraft" by state.
Hopefully this inspires a lot more on the development side as well.
Filed Under: caselaw, caselaw access project, legal data, public info, public records, transparency
Companies: harvard, lexisnexis, ravel
Reader Comments
"Lowering The Bar"...
Re: "Lowering The Bar"...
Lowering the Bar's subtle humor deserves a link.
Non-Free
From the H2O Terms of Service:
and
Re: Non-Free
Re: Non-Free
That only means that people other than the copyright holder are not allowed to use the data in a book if they obtain it under that license. The contributor is free to license or sell their own works as part of a commercial enterprise. Similarly, anybody with a commercial enterprise in mind is free to approach the copyright holder to obtain a license that permits commercial use; they just have to live with and compete with the Creative Commons version.
Re: Re: Non-Free
Yes, but the "noncommercial" CC licenses are considered non-free, by the definition of free culture licenses. See the NonCommercial interpretation page on the CC Wiki:
Re: Re: Re: Non-Free
If that is what the person wants, they can distribute the work via one of those platforms under a suitable free license, or if they have already done so, they can submit to the project under an NC license.
Nothing in the rules stops a copyright owner distributing a work under several licenses.
Re: Re: Re: Re: Non-Free
I...don't see anyone claiming otherwise?
Huh?
Re: Huh?
Thing is, eventually all of whatever Harvard harvests will be re-harvested and then become freely available and unencumbered, though the fancy apps might not be included. Seems to me some folks were doing this with PACER, or some other unreasonably encumbered system.
500 downloads per day would only need the cooperation of 13,000 people for one day to capture the entire database. There certainly could be permutations of people and days. To think that anyone might be able to control this public information beyond the download (incorporating it in the apps is different) would be naive.
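As a rough check on that arithmetic, here is a minimal sketch using only the two figures quoted in the post (6.4 million cases, 500 downloads per user per day). The function name is hypothetical, and the calculation assumes every participant uses the full daily allowance with no overlap.

```python
TOTAL_CASES = 6_400_000  # case count quoted in the post
DAILY_LIMIT = 500        # per-user daily download cap quoted in the post


def downloaders_needed(total_cases: int, daily_limit: int, days: int = 1) -> int:
    """Smallest number of people who could cover every case in `days` days,
    assuming each person downloads `daily_limit` distinct cases per day."""
    capacity_per_person = daily_limit * days
    # Integer ceiling division, so a partial final batch still needs a person.
    return (total_cases + capacity_per_person - 1) // capacity_per_person


print(downloaders_needed(TOTAL_CASES, DAILY_LIMIT))            # one day
print(downloaders_needed(TOTAL_CASES, DAILY_LIMIT, days=30))   # spread over 30 days
```

With these inputs the one-day figure comes out to 12,800 people, which is where the "13,000" estimate comes from; spreading the work over 30 days drops it to 427.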
Re-publishing and archiving
And yeah, I was thinking, someone should definitely coordinate this. I'd certainly run a cron job on one of my systems to pull down another 500 downloads per day, orchestrated by some central server to avoid duplication of effort, like how bitcoin mining pools work.
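The coordination idea could look something like the sketch below. It is only an illustration of handing out non-overlapping batches: the `BatchCoordinator` class is hypothetical, it runs in memory rather than as a real server, and it assumes cases can be addressed by a simple integer index, which the post does not confirm. It makes no calls to the actual case.law API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class BatchCoordinator:
    """Hands out disjoint index ranges so no two volunteers fetch the same case."""
    total_cases: int
    batch_size: int = 500  # matches the daily per-user cap quoted in the post
    next_start: int = 0
    assignments: Dict[Tuple[int, int], str] = field(default_factory=dict)

    def claim(self, worker_id: str) -> Optional[range]:
        """Return the next unclaimed batch for worker_id, or None when all are taken."""
        if self.next_start >= self.total_cases:
            return None
        start = self.next_start
        stop = min(start + self.batch_size, self.total_cases)
        self.next_start = stop
        # Record who holds each batch so unfinished work could be reassigned later.
        self.assignments[(start, stop)] = worker_id
        return range(start, stop)


# Tiny example: 1,200 cases split among volunteers in batches of 500.
coordinator = BatchCoordinator(total_cases=1200)
first = coordinator.claim("volunteer-a")
second = coordinator.claim("volunteer-b")
third = coordinator.claim("volunteer-c")
fourth = coordinator.claim("volunteer-d")  # nothing left to hand out
```

Each volunteer's daily cron job would ask the coordinator for one batch and fetch only those cases. A real version would also need to track completion and re-issue batches that were claimed but never finished.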
Re: Re-publishing and archiving
Limericks
"Threes and fours, mostly rejects;
He questioned all of the suspects.
A particular bank,
A cylindrical tank,
Affirmed in all other respects."
But how will censorship be properly implemented?
Despite lacking any actual political power, this online "encyclopedia" could be viewed as a kind of democracy in action, a way to determine what kind of information could be considered fit for public consumption and what is not. And judging by Wikipedia's standards, much of the information in court records is considered private and thus must not be seen by the public (even when it can already be easily found on the internet).
So the question is, will these court documents get reviewed and scrubbed of personally identifying (and potentially embarrassing) information, and perhaps even corrected to meet modern day standards of social etiquette (like using the "correct" pronouns), or will this be a kind of massive data leak sure to upset everyone from traditional privacy advocates to modern social-justice activists?
Re: But how will censorship be properly implemented?
Public documents, on the other hand, whoever posted them, are not actually actionable, though there are some in the EU who might differ with that.
Maybe if Wikipedia opened a set of public documents pages and then linked to those, it might spare them some of the legal angst that would come their way if they didn't. Then again, maybe not.
I will be looking forward to hearing here on Techdirt about the lawsuits from folks in the EU against Harvard for the publication of these documents, even though those lawsuits should go nowhere.
machine readable
Re: machine readable
Next App: Which cops perjured themselves??