Today’s posting is the last of a four-part series on PACER provided by G. Thomas Sandbach, Esq., owner of Justice Technology Consulting. Tom’s contact information appears at the end of the post.

We have a very good idea about the immediate consequences of removing the PACER “paywall.” During the free PACER pilot program, Carl Malamud and Aaron Swartz were able to capture and make available to the public 20% of the 100 million pages of documents contained in the PACER repository. Indeed, according to a profile in Wired Magazine, “Malamud dreams of a day [when all of] PACER’s legal documents are free, so that…custom search engines and new tools… make the information available to American citizens.” He insists that if he had the money he would free PACER himself: “If I had $10 million, I’d make a copy of all the documents and be done.” During a new period of free access, Malamud or other enterprising folks like him could create a new database containing all of PACER’s contents. Eventually every word and number in the documents would be indexed, making their contents truly a matter of public knowledge.

What about Privacy?

These documents contain a vast amount of information that can be directly linked to individuals. The privacy policy of the United States Courts makes it very clear that some information, specifically Social Security numbers (SSN), names of minor children, financial account numbers, dates of birth and (in criminal cases) home addresses must be redacted from electronic documents filed with the court. The filers, both attorneys and pro se litigants themselves, are responsible for redaction. Their responsibilities are not always scrupulously performed.

After acquiring 20 million pages of PACER records, Malamud scanned them for SSNs and, according to the same Wired article:

[Malamud’s organization] Public.Resource.org used some primitive software tools to search for social security numbers in court filings from 32 district courts. The results: 1,700 confirmed documents, including one from a Massachusetts court that had a 54-page list of the names, medical problems, Social Security numbers and birth dates of 353 patients.
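To give a sense of what a “primitive” scan of this kind might look like, here is a minimal sketch in Python. It is not the tool Public.Resource.org used; the pattern and the function name `find_ssn_candidates` are assumptions made for illustration. It looks for the familiar three-two-four digit layout and discards number ranges that the Social Security Administration has never issued. A match is only a candidate, and a person would still need to confirm it.

```python
import re

# Hypothetical pattern: 3 digits, 2 digits, 4 digits, separated by a hyphen
# or a space, and not embedded in a longer run of digits.
SSN_PATTERN = re.compile(r"(?<!\d)(\d{3})[- ](\d{2})[- ](\d{4})(?!\d)")


def find_ssn_candidates(text):
    """Return strings in text that look like Social Security numbers."""
    hits = []
    for match in SSN_PATTERN.finditer(text):
        area, group, serial = match.groups()
        # Area numbers 000, 666 and 900-999 have never been issued.
        if area in ("000", "666") or area.startswith("9"):
            continue
        # Group 00 and serial 0000 have never been issued either.
        if group == "00" or serial == "0000":
            continue
        hits.append(match.group(0))
    return hits


if __name__ == "__main__":
    sample = "Patient SSN 123-45-6789, office phone 555-123-4567."
    print(find_ssn_candidates(sample))
```

A scan like this will miss numbers written without separators and will flag some strings that are not SSNs at all, which is why the Wired excerpt speaks of “confirmed” documents.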

According to Wired, Malamud alleges “that there are … massive privacy violations lurking inside some court filings, since clerks, judges and lawyers aren’t adhering to rules about what can and can’t be in legal filings.” To his credit, Malamud reported the results to the federal judiciary in an effort to make sure that the documents would be redacted.

Unfortunately, automated tools don’t currently exist to identify and redact children’s names, account numbers, birth dates and home addresses. As a result, automated redaction, whether by the courts or by data aggregators like Malamud, is unlikely, if not impossible, in the near future.

Further, the federal privacy rules don’t require removal of such other sensitive information as records of medical conditions and treatment or items that may enable identity theft like schools attended, maiden names, etc.

Beyond normal privacy issues, pleadings in legal actions are unlike other forms of public records. They often contain a plethora of allegations, filled with hyperbole and intended to cover all manner of actionable conduct, both provable and unprovable. In general they are exempt from a defamation action, but once they become part of the Googlesphere, they may baselessly harm reputations and jeopardize careers. Though the underlying lawsuit may be long dismissed or settled, naked unanswered pleadings may remain forever stored, indexed and publicly available, easily subject to misinterpretation and abuse.

Criminal Case Concerns

Criminal cases present their own special kind of challenges. While indictments and other charging documents will be readily available in a free PACER, defense information and even dismissals will be more difficult, if not impossible, to find. Criminal court records, even those that have been expunged, will remain forever in a public database after being mined using free access to PACER.

There are also issues related to physical safety of criminal victims, witnesses and defendants to be considered. While, according to the federal privacy rules, home addresses should be redacted in criminal filings, many documents “fall through the cracks” and redaction simply does not occur. Privacy rules do not require redaction of other identifying information (business addresses, schools, relatives’ names, etc.), thereby endangering those who may potentially testify. Sites like whosarat.com use documents like plea agreements that have become public to “out” defendants who are cooperating with law enforcement, placing them in peril from attacks by other defendants. Once this information becomes public, there is no control over its potential abuse.

And Did I Mention…

Authenticity? While using PACER, we can have some confidence that the documents we find are actually documents that have been filed or created during the course of litigation. After all, PACER uses the actual Electronic Case Management system used for filing the documents with the courts. Unfortunately, these documents do not carry an electronic signature that would allow their authenticity to be verified outside of PACER. As a result, Joe’s Database of Court Documents may contain documents of questionable origin that cannot be relied upon, yet they may be difficult to distinguish from the genuine article.
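To make the gap concrete, here is a sketch of the simplest building block such a check would need: comparing a downloaded copy against a cryptographic fingerprint published by the court. This is purely hypothetical. As noted above, no such published fingerprint exists for PACER documents, and the function names below are invented for illustration. A bare fingerprint is also weaker than a true digital signature, since it only helps if the fingerprint itself comes from a trusted source.

```python
import hashlib
import hmac


def fingerprint(document_bytes):
    """Return the SHA-256 fingerprint of a document as a hex string."""
    return hashlib.sha256(document_bytes).hexdigest()


def matches_published_fingerprint(document_bytes, published_hex):
    """Check a copy against a fingerprint assumed to come from the court.

    Hypothetical: assumes the court publishes published_hex through some
    trusted channel, which PACER does not currently do.
    """
    actual = fingerprint(document_bytes)
    return hmac.compare_digest(actual, published_hex.lower())


if __name__ == "__main__":
    original = b"ORDER granting motion to dismiss."
    published = fingerprint(original)  # stands in for the court's record
    altered = b"ORDER denying motion to dismiss."
    print(matches_published_fingerprint(original, published))
    print(matches_published_fingerprint(altered, published))
```

With something like this in place, a third-party archive could show that its copy is byte-for-byte identical to what was filed. Without it, users of such an archive have only the aggregator’s word.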

The Answer – Trust Us!

So how do we know that the results of a free PACER will successfully deal with these issues about privacy, unsafe disclosure and authenticity? We can look to those who either have or are in the process of liberating PACER documents for free access:

Carl Malamud’s Public.Resource.org has placed its 20 million pages of documents online, in blocks of large compressed files. He has left the industry-standard request for search engines not to index the files. Although most search engines (like Google) will comply with the request, they are free to ignore it. Anyone is free to download the documents and create their own completely searchable database.
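The “industry standard request” is presumably a robots.txt file, the convention by which a site tells crawlers which pages to leave alone; the post does not name the mechanism, so that is an assumption. The sketch below uses Python’s standard `urllib.robotparser` module to show the check a well-behaved crawler performs before fetching a page. The robots.txt contents, the crawler name and the URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to stay out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_crawl(robots_txt, user_agent, url):
    """Return True if a rule-following crawler would be allowed to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.org/pacer/document.pdf"
    print(may_crawl(ROBOTS_TXT, "ExampleBot", url))
```

The point to notice is that the check runs inside the crawler. Nothing on the server enforces it, so a crawler that simply skips this step sees every file.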

The folks at the Center for Information Technology Policy at Princeton University, who have developed RECAP, the Firefox add-on that uploads PACER documents to the Internet Archive, advise us that “At our request, the Internet Archive has disallowed search engine indexing of the documents we submit.” They also reserve the right to change that when they can better address privacy concerns. (Presumably those would be their concerns, not necessarily mine.) They also advise that they scan for SSNs and suppress documents containing them.

At best, these safeguards seem to rely on the old system of practical obscurity to protect us from potential abuse. At worst, we must depend on the good intentions of the aggregators and the voluntary standards compliance of search engines. Perhaps we can trust Carl Malamud and these folks at Princeton at this time to safeguard this private data from search engines that obey the rules. But what about a new generation of hackers with more malevolent intent and more intrusive software, or search engines with less concern for the rules? What will they mean for a free PACER?

Perhaps the situation may be best summarized by the Public.Resource.org Moral License that Malamud attaches to his trove of liberated PACER data:

These documents are distributed under a MORAL LICENSE.

If you build a site using PACER documents, you should do so with a sense of responsibility. In particular, you must maintain your archive or take it offline, and you must provide a feedback channel where the public may contact you if there are issues, such as Social Security numbers, names of minor children, or other violations by the filing attorneys of the E-Government Act. When you do find problems, you have an obligation to let the clerks of the courts know.

While it is the responsibility of the courts to obey the privacy requirements, if you redistribute this data you have a moral obligation to help the courts make this data better.

This is not the law, this is just the right thing to do.

We probably need to check back in a while to see how that sense of responsibility and moral obligation is working out.

NOTE: Although this is the last of a four-part series, Tom will follow up with a postscript covering related issues … this is important because all of the issues presented in this series arise in state and local court public access debates and the same principles apply.

G. Thomas Sandbach, Esq.
tom.sandbach@justicetech.org
www.justicetech.org
Phone - 302-824-0760
Member: Delaware State Bar Association
Affiliate: IJIS Institute
Member: National Criminal Justice Association

Photo Credit = Pacer

NOTE: The views expressed in this posting are the author’s and do not necessarily reflect the position of Justice Served.
2 Responses to “Free Pacer? - The implications (guest post)”
  1. Extract Systems has been providing redaction software to courts and other public and private entities for a long time. The author incorrectly states that tools “don’t currently exist to identify and redact children’s names, account numbers, birth dates and home addresses”. These and more, are exactly the fields we DO redact (automatically). For example, we handled 22 distinct field types for the New York Secretary of State. Full disclosure… I am employed by Extract Systems. For more information see http://www.extractsystems.com.

  2. @Rasmussen - Thanks for your comments. I apologize for understating the ability of automated redaction tools to identify and redact specific types of data. I am not an expert in redaction, but common sense suggests some potential factors bear on redaction success rates:

    - Data to be redacted. Generalized data types are easier to recognize than specific data types. Recognizing proper names is different from finding all proper names. Not all dates are birth dates. As a result, in order to redact home addresses, but not those of businesses, the tool must be able to understand these data items in context.

    - Document types. Context is easier to recognize in some documents than in others. In forms, for instance, a field may be labeled “DOB”, enabling software to identify a date in proximity as a birth date. I’m sure that the identification task becomes more difficult as contents become more complex. Registration forms present less of a challenge than depositions, for instance.

    - Accuracy. The white papers on your website suggest that redaction of a given document can provide different (and predictable) levels of confidence in the results of data to be redacted. Those confidence levels would apparently reflect a combination of the success of the automated tool and the amount of verification necessary to produce a finished product.

    My initial statement obviously failed to reflect that level of complexity.
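    The “DOB” point above can be sketched in a few lines of Python. This is an illustration only, not how Extract Systems or any other product works; the pattern and the function name are assumptions. It redacts a date only when a birth-date label sits directly in front of it.

```python
import re

DATE = r"\d{1,2}/\d{1,2}/\d{4}"

# Hypothetical rule: a date is treated as a birth date only when a label
# such as "DOB" or "Date of Birth" immediately precedes it.
LABELED_DOB = re.compile(
    r"\b(DOB|date of birth)\b(\s*[:\-]?\s*)(" + DATE + r")",
    re.IGNORECASE,
)


def redact_labeled_birth_dates(text):
    """Replace dates that follow a birth-date label; leave other dates alone."""
    return LABELED_DOB.sub(
        lambda m: m.group(1) + m.group(2) + "[REDACTED]", text
    )


if __name__ == "__main__":
    form = "DOB: 01/02/1980. Hearing set for 03/04/2009."
    deposition = "I was born on 01/02/1980 in Dover."
    print(redact_labeled_birth_dates(form))
    print(redact_labeled_birth_dates(deposition))
```

    The form line is redacted and the hearing date is left alone, but the deposition sentence passes through untouched because no label marks the date. That is the difference between registration forms and depositions described above.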
