Judge Peck's Predictive Coding Game-Changer

On Friday, Southern District of New York Magistrate Judge Andrew Peck issued what LTN referred to as “a much-anticipated opinion” on the use of computer-assisted review (aka “Predictive Coding”) technology and workflow in civil litigation. The 49-page opinion was an instant sensation for one simple reason best described by Judge Peck himself: it was “the first in which a Court has approved of the use of computer-assisted review.” Commentators immediately heralded the opinion as one that will guide litigants for years to come. In one fell swoop, Judge Peck likely had as big an impact as thousands before him who spent years working to lift the eDiscovery industry out of its inefficient, antiquated, overly manual and keyword-centric past.
First, a caveat. For reasons of client confidentiality which become clear in the case’s now-public ESI protocol and order, we and others similarly situated have declined to comment on the case itself, an approach from which we will not deviate. However, beginning with this post and continuing over the next days and weeks, we will provide our thoughts about what Judge Peck's landmark opinion means for eDiscovery specifically, for the legal industry in general and as a microcosm of the larger trend toward the data-driven enterprise (aka “Big Data”).
Congratulations are in order to all those whose work provided the foundation for Judge Peck’s opinion. Many are unknown and those that are known are too numerous to list here, but the most high-profile groups include the Sedona Conference and the TREC Legal Track. Both helped push the industry – parties, litigators, judiciary and vendors alike – to use and embrace better technology and workflow to achieve the FRCP's goal of a “just, speedy, and inexpensive determination of every action.” Forward-looking vendors have also played a role. Fierce competition between different approaches has forced all to “up their game” to capture more business. Thanks to this large and impressive body of work, embracing Predictive Coding was not a hard choice for the court to make; as Judge Peck stated, “the decision to allow computer-assisted review in this case was relatively easy.”
This opinion is nothing short of a game-changer for the eDiscovery industry because it signals that eDiscovery, and indeed discovery, has entered a new phase where Predictive Coding will quickly go from cutting-edge approach to mainstream technology and workflow embraced by most, and eventually all. Lest anyone miss this key takeaway, Judge Peck made the point as clearly as can be done:
“Counsel no longer have to worry about being the "first" or "guinea pig" for judicial acceptance of computer-assisted review. As with keywords or any other technological solution to ediscovery, counsel must design an appropriate process, including use of available technology, with appropriate quality control testing, to review and produce relevant ESI while adhering to Rule 1 and Rule 26(b)(2)(C) proportionality. Computer-assisted review now can be considered judicially-approved for use in appropriate cases.”
Predictive Coding was already emerging as the preferred approach for document analysis and review, and this opinion will undoubtedly expedite the process. Given the clarity of Judge Peck's opinion and the momentum behind this trend, it's not hard to imagine a time in the near future when the use of older methodologies, such as keyword-only search, keyword “clustering” and lack of integrated sampling will themselves be considered unreasonable. How’s that for rapid change?
For the sizable community of leading enterprises and AmLaw 200 firms already using Predictive Coding, this opinion will confirm their foresight in early adoption. Of greater importance to this group – especially inside counsel – is that the opinion should quickly mitigate the perceived risk of plaintiffs’ counsel claiming the use of computer-assisted methodology is unreasonable (note that this risk has already effectively disappeared with regulatory authorities, as most regulators already use Predictive Coding). Seemingly overnight, the shoe appears to have moved to the other foot…which raises the question: how long until not using Predictive Coding creates a strategic risk to a party?
Much of the rest of the industry will likely view Judge Peck's opinion as the green light it has been watching and waiting for. Bedeviled by ever-increasing volumes and variety of corporate information, and the velocity at which employees create it, corporate counsel have been embracing new approaches for years while similarly pushing outside counsel to do the same (with mixed results in the latter effort). With this decision, outside counsel now have judicial “air cover” to use Predictive Coding to dramatically improve their ability to get to the heart of any matter quickly, thereby allowing them to provide better, more timely service to clients. We can’t speak for others, but based on our recent results this transition was well underway before Judge Peck’s opinion – and will surely accelerate significantly in the months and quarters ahead.
What happens next? The first order of business is education. Many will want to get up to speed on exactly what Predictive Coding is, how it works, how it is employed and how they can get started using it. The best place for them to go for guidance is to those practitioners who have been using Predictive Coding for years and thus have the extensive, hard-won experience with which to educate. Among others, examples of such sources can be found at www.predictivecoding.com, at end user-driven seminars and from leading firms like WilmerHale, Morgan Lewis and Fulbright & Jaworski. As the leader in Predictive Coding, Recommind will be formalizing the educational options it has offered clients for years – most of which will also now be made available to virtually anyone who wants to learn more.
Through these and other educational efforts, we are confident Predictive Coding will become the de facto method of conducting document review as part of civil litigation, regulatory investigations and internal investigations by 2013. The benefits of Predictive Coding are many, including better and quicker strategy formation, a leveling of the playing field (as defendants can avoid “eDiscovery blackmail”), and making proportionality an honest-to-goodness argument to be made in and by courts of law. Predictive Coding’s uses go well beyond document review, with inside counsel already deriving enormous benefits and cost savings using Predictive Coding as part of the preservation, collection and ECA processes. But most of all, to quote Bob Trenchard of WilmerHale, this development will let lawyers "get back to lawyering."
While it had been anticipated for some time, Judge Peck’s opinion approving the use of Predictive Coding in the Southern District of New York is still a momentous decision and the culmination of an enormous amount of work by many smart, driven organizations and people. The opinion, and the rapid mainstream adoption of Predictive Coding it portends, clearly point to a brighter future. The sooner we as an industry begin to reap its benefits, the better.
Preservation & Proportionality after Pippins v. KPMG

When I first read about the District Court’s decision in the KPMG/Pippins case, I tweeted that this is another example of bad facts making bad law. Now that I’ve had time to ponder the decisions – both the Magistrate’s and the District Court’s – I find that I was wrong. This is a case where bad facts make good law. In fact, the main lesson to be learned from these decisions isn’t about preservation at all.
The driving fact here isn’t that KPMG wanted to inject proportionality into its preservation obligations. Rather, it’s a lesson in cooperation during discovery. In 22 pages, Judge McMahon mentioned 12 times that KPMG hadn’t turned over even a single hard drive for examination. The third time is a good example of the tone of the decision: “KPMG, hiding behind the stay of discovery, insisted it could not produce even one hard drive for inspection by Plaintiffs.” Or the ninth:
“I certainly do not intend to reverse Judge Cott’s Order on the purported ground that he erred by concluding that KPMG failed to demonstrate that preserving the hard drives was unreasonable. Frankly, the only things that were unreasonable were: (1) KPMG’s refusal to turn over so much as a single hard drive so its contents could be examined….”
Or my personal favorite:
“It smacks of chutzpah (no definition required) to argue that the Magistrate failed to balance the costs and benefits of preservation when KPMG refused to cooperate with that analysis by providing the very item that would, if examined, demonstrate whether there was any benefit at all to preservation.”
On the law, the District Judge got it exactly right. If you divorce reasonableness from a party’s preservation obligations, you open a loophole to the very idea of proportionality that would dwarf any benefit. Preservation is a huge cost. In a potential class action with thousands of plaintiffs, preservation costs can contribute to a “settle to avoid costs” mentality that discovery rules are supposed to avoid.
Judge McMahon specifically acknowledged that proportionality applied to preservation obligations: “[p]reservation and production are necessarily interrelated. The application of the proportionality principle to preservation flows from the existence of that principle” in the discovery rules.
But since KPMG failed to allow examination of even one of the hard drives it seeks to destroy, neither the Magistrate nor the District Judge after him was able to balance costs against benefits.
The lesson of this case is in the necessity of cooperation, not in the application of proportionality. The Sedona Principles advocate for parties to cooperate in discovery. KPMG’s failure to do so here led to a preservation bill it didn’t want to pay. Bad facts made for good law, but an adverse decision for KPMG.
Taming Big Data

My dozen or so years as an industry analyst focused on information management have led me to one principal overarching conclusion, which is that success in the future for companies and other organizations that rely on information for their day-to-day operations – i.e., all of them – will be based on their ability to unlock hitherto-hidden intelligence and value from information. And they will need to do that regardless of the structure of the information – structured in a relational database, unstructured text in a file system, or anything in between.
Such things wouldn’t matter if the growth in the amount of information being created within an organization were easily manageable. But any information worker from the CIO down knows that that is no longer the case.
Call it Big Data, Total Data or any other variant; the realization has hit people that, no matter your area of responsibility within the organization, the information explosion and the resulting complexity it creates are a concern for you.
If security is your concern, then your information risk is heightened as the volume of information increases and spreads across more connected devices, in more formats and in more countries.
If storage is your area, then you don’t want your storage costs rising in parallel with the growth of information in your organization, so identifying information that can be defensibly deleted or at least archived off to cheaper storage is a major priority.
And if you’re a user of business applications, you want to be sure that decisions you make using those applications are based on the totality of information available to you; not merely a subset of transactional data from a relational database. If you rely on that structured information alone, you’re missing out on what customers are saying to your sales people, support people, call center operatives, or saying to the whole world via social media, such as Facebook and Twitter.
So how do you achieve this? Extracting value from relational databases has long been understood. An entire sector of the software industry – business intelligence – was built upon adding a layer atop the relational database business. What hasn’t been achieved yet is the same level of application sophistication leveraging unstructured information.
The key piece of technology needed to make that happen is categorization; software that can analyze unstructured information to identify words and phrases in documents, emails, tweets and so on and determine what they’re about.
It is at the heart of application areas such as e-Discovery, information governance, data loss prevention, compliance and a whole swathe of search-based applications; that is, applications that use a search index as their primary source of information, rather than a relational database (though they may pull information from such a database).
Categorization software enables users to automate processes that some people still think computers can’t handle: the understanding and interpreting of text. I’ve been watching the market for unstructured information management for more than a decade and I can say with confidence that such technology is ready for prime time.
One of the earliest and most successful adoption areas for categorization has been e-Discovery, where a whole industry has been transformed from one based on manual categorization of documents to one where it is now generally accepted that software can do it more accurately than humans - and doesn’t get tired in the process. Predictive Coding is a great example of how software - utilizing powerful categorization technology - can automate a repetitive human process, namely linear review. And if you look back at the history of the software business, the most successful applications have been the ones that have done just that - applied software to a repetitive manual process and proven that it can be done more accurately and efficiently than it could be by hand, thus freeing up knowledge workers of various stripes to focus on higher-value tasks.
I’m looking forward to seeing what other areas can be transformed by categorization-driven information management.
Legal Tech Recap
I think we’ve all recovered from Predictive Coding 2012. I mean, Legal Tech 2012. If you were there, you can forgive my confusion. I don’t have a final tally on how many panels and supersessions were on Predictive Coding because I’ve run out of fingers and toes. Suffice it to say that there were plenty.
Now, a week later, emails are getting answered again, phone calls returned. And lessons learned.
It seemed this year that all everyone talked about was Predictive Coding. Now that Recommind has validated the technology and the process, and driven widespread adoption, it seems everyone wants to jump on board.
If there's one purchasing rule, it's "beware the marketing claims of late entrants." Businesses need to put vendors to the test, and dig into whether what they’re being sold is the genuine article or just the same old technology, repackaged. In the next few weeks, I'm going to be working and writing on definitional issues: what exactly is "Predictive Coding?" What distinguishes it from TAR? [ed. note: I personally wouldn't want my review technology identified by something slow and sticky, but to each their own.] What questions should you, as corporations or law firms, be asking your vendors to help you follow that one ancient rule: caveat emptor?
Keep your eyes peeled and watch this space.