Transitions for the Caselaw Access Project

The Library Innovation Lab is excited to announce that the original limitations on the data available through the Caselaw Access Project expired this month, and that the data can now be released in full, without restrictions on access or use.

Our original collaboration agreement with Ravel Law, Inc. (now part of LexisNexis) for the Caselaw Access Project placed access limitations on the full-text and bulk data, and those limitations have now expired. Over the next few months, we will partner with other organizations in the open legal data space, such as the Free Law Project, to shepherd this data into its next phase. The Free Law Project already includes all CAP cases, as well as cases scraped from court websites, in its CourtListener search engine.

We will continue hosting the CAP data in bulk for researchers, and as individual readable cases, at case.law. However, we will be winding down services that can be better provided elsewhere, such as the search function and API.

The previous version of the site will remain available at old.case.law until September of this year. If there are features of the previous site that are not well covered by the current site or by CourtListener, we welcome feedback at info@case.law.

This transition will open new avenues for users to access the data produced by the Caselaw Access Project, and will consolidate efforts to create centralized access points for the law. We are very proud of the contribution that CAP has made to the open legal data movement, and we will continue working to expand and support free, open, and fair access to information.

History of the Caselaw Access Project

In 2018, the Library Innovation Lab launched case.law to host and distribute data created by the Caselaw Access Project. Its release was the culmination of several years of work at the Harvard Law Library to digitize a corpus of 6 million cases representing almost all precedential law in the United States. The cases were digitized from Harvard’s own collection of hardbound court reporters from across the nation, an archive that predates the founding of the United States. The digitization process involved removing the binding of each volume, scanning 40 million pages, and using OCR technology to convert the page images into human- and machine-readable text. You can see parts of that process in this video we released about the project.

Although most government documents, including case law, are in the public domain, United States case law on this scale had never before been made easily accessible to the public.