November Business Meeting - David Schwartz and Adam Pah from SCALES-OKN

November 19, 2020 Business Meeting at 12:00 p.m. via Zoom

Our November Business Meeting had 59 attendees. President Lindsey Carpino opened the meeting by welcoming new members and introducing our sponsor, Thomson Reuters. Blythe McCoy discussed the various COVID-19 and civil rights resources that Thomson Reuters has made available to researchers. She also discussed Thomson Reuters’ financial support to the city of Minneapolis and expanded opportunities for employees to volunteer in the community.

Jamie Sommer introduced our speakers, Professors David Schwartz and Adam Pah of Northwestern University, who joined us to discuss their project, SCALES – OKN (Systematic Content Analysis of Litigation EventS Open Knowledge Network). This is a project to build an open, searchable platform to provide the public and researchers with access to federal court records and analytics.

Profs. Schwartz and Pah are two members of a larger team from multiple universities and disciplines. The team has over 20 scholars from seven different schools, though the lion’s share are from Northwestern. The four main disciplines are law, computer science/engineering, data science, and journalism. They also have a number of partners, including the Free Law Project, other advocacy groups like clinics, law firms, and more.

Their goal is to bring transparency to federal court operations to make sure they’re fair and accurate. Federal courts see hundreds of thousands of civil actions and around 75,000 criminal actions each year, and related documents are stored in PACER. One problem SCALES – OKN seeks to address is that PACER charges 10 cents per printed page to access court documents, even though those documents are PDFs. This makes it expensive to do systematic research on how courts operate. With that in mind, the project’s initial goal is to help researchers who regularly do court research.

The team started by interviewing potential PACER users, whose concerns were the cost of doing research, the clunkiness of the PACER interface, narrow search fields, various inefficiencies, and the fact that each court has its own means of access to PACER. So to start, the team downloaded and paid for docket reports for all of 2016, both civil and criminal, in all 94 district courts. They also downloaded the last 10 years from the Northern District of Illinois to take a longitudinal look. They scraped information including judge, attorneys, parties, etc. They then categorized litigants as companies, government, or private individuals. One of the more challenging aspects of the project has been that different courts use different terms for things that are the same for the purposes of this project.

Their proof-of-concept project involved the rate at which fee waivers were granted among the courts. Recently, Andrew Hammond wrote a law review article discussing how district courts have different ways of figuring out if someone’s financial situation qualifies for a fee waiver. Some districts have a simple form, while others’ forms are more detailed. Regardless of the court, the decision to grant the waiver is left to the judge’s discretion. The argument is that this is a recipe for problems. SCALES decided to do an empirical study on this and used machine learning to determine how frequently people asked for in forma pauperis status and how often that was granted or denied. If this were a consistent determination, there would be a standard bell-curve distribution, which ended up not being the case. Not only are there differences between districts, but among judges within a given district as well. Removing PACER’s paywall would make more studies like this feasible and potentially uncover further inconsistencies in court operations.

SCALES is focused on building machine models that can deal with natural language so that people can go and ask the questions they need answered. This process involves downloading more dockets, downloading the underlying documents, integrating those with other data, and building a search interface. The idea is to “generalize [the] ability to ask questions, analyze data, and find answers.” Other questions they seek to answer include how long cases last, motion grant-rates, rates of settlement, and the rates at which documents are sealed and redacted. The more time the machine learning process has to work, the easier it should be to answer these questions. Broadly speaking, they hope to systematically identify differences or potential differences between courts and judges.

Question and Answer

Q: On fee waiver research, is there any basis for statistical significance? Have you gotten any pushback from judges?
A: Our analysis looks at each district and compares each judge to the mean of the other judges in that district. We identify how many judges in a district vary from the cohort. Overall, across judges, 40% differ from their cohort. Also, on fee waivers, we’ve restricted our analysis to judges that have seen 10 or more motions to proceed in forma pauperis per year. On the second question, the only kind of pushback is on questions of comparisons between courts. Further, because cases are randomly assigned, there will sometimes be differences by chance. We tested to see if the distribution matched what you might get by chance, which was not the case.
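
The kind of comparison described in this answer can be sketched in a few lines of Python. This is only an illustration, not the SCALES team's method: the recap does not say which statistical test they used, so the pooled two-proportion z-test, the 0.05 significance level, the function names, and the sample counts below are all assumptions. The only detail taken from the talk is the cutoff of 10 or more motions, which is applied here to a judge's total count rather than per year.

```python
import math


def two_proportion_p_value(k1, n1, k2, n2):
    """Two-sided p-value for the hypothesis that two groups share one grant rate.

    Uses a pooled two-proportion z-test (an assumed choice of test).
    k = motions granted, n = motions decided.
    """
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Every motion was granted (or every one denied): no difference to detect.
        return 1.0
    z = (k1 / n1 - k2 / n2) / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def judges_differing_from_cohort(counts, min_motions=10, alpha=0.05):
    """Return the judges whose grant rate differs from the rest of their district.

    counts maps a judge's name to (granted, decided) for one district.
    Judges with fewer than min_motions decided motions are left out entirely.
    """
    eligible = {j: c for j, c in counts.items() if c[1] >= min_motions}
    flagged = []
    for judge, (granted, decided) in eligible.items():
        # Pool every other eligible judge in the district into one cohort.
        rest_granted = sum(g for j, (g, _) in eligible.items() if j != judge)
        rest_decided = sum(d for j, (_, d) in eligible.items() if j != judge)
        if rest_decided == 0:
            continue
        p = two_proportion_p_value(granted, decided, rest_granted, rest_decided)
        if p < alpha:
            flagged.append(judge)
    return sorted(flagged)


# Hypothetical counts for one district, invented for illustration only.
district = {
    "Judge A": (90, 100),
    "Judge B": (50, 100),
    "Judge C": (72, 100),
    "Judge D": (3, 4),  # below the 10-motion cutoff, so excluded
}
print(judges_differing_from_cohort(district))
```

With these made-up numbers, Judges A and B are flagged as differing from their cohort, Judge C is not, and Judge D is excluded for having too few motions. A real analysis would also need to account for testing many judges at once, which this sketch does not attempt.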

Q: Are you suggesting there are myriad differences between courts and judges, which make it harder to determine certain things by machine learning?
A: Yes, courts and judges use language differently, and the variations are rampant.

Q: Is there a timeline to wrap up or make something available to the public?
A: In a perfect world, we’re looking at a year out for something for the public, though it’ll be limited in the beginning because we know it won’t be perfect. But yes, next year. Feel free to email if you want access and would like to help with data testing.

Committee Announcements

Community Service – Jesse Bowman announced that we’re supporting the Public Interest Law Initiative today. PILI is an Illinois-based organization committed to equal access to justice. You can make a donation on the website.
Nominations and Elections – Joe Mitzenmacher announced the slate of candidates for our next election. Carrie Port and Patricia Scott are running for Director, Sarah Andeen and Mike McMillan are running for Secretary, and Shari Berkowitz Duff and Scott Vanderlin are running for Vice President/President-Elect.
Continuing Education – Todd Hillmer announced their next program, which will be a bystander intervention training held in coordination with Asian Americans Advancing Justice. The event will be online, December 3 at noon.
Grants and Chapter Awards – Jessie LeMar reminded everyone that we still have grant money for this year. If there’s continuing education outside of a conference that could be funded, let the committee know. They’re a bit more flexible this year on what they’ll fund.
Meetings – Carrie Port announced a trivia night on December 17 at 6:00 p.m. It’s free to all and will be team-based. Even if you don’t have a team going in, sign up in advance so they know who to expect. Jill Meyer also announced the CALL Book Club. A survey is out to determine the selection, and the meeting will be in January.

CALL Bulletin

Newsletter of the Chicago Association of Law Libraries