The Year is New. Is Predictive Coding 3.0?

Bob Tennant
January 12, 2016

To paraphrase some of my favorite dialogue from The West Wing, I make it a point never to disagree with Ralph Losey when he’s right. In his recent “Predictive Coding 3.0” blog posts, Ralph is right about a great many things: using control sets to validate the completeness of predictive coding results is problematic; at the beginning of discovery, one’s case knowledge is limited, so any initial control set is based on ill-informed review; and so-called “stabilization” models do end the system’s training prematurely, based on a poor standard of quality. What is inaccurate in these posts, however, is the characterization of Recommind’s machine learning approach, and the suggestion that what Ralph refers to as “Predictive Coding 3.0” is new. So, with all respect to Ralph, let’s set the record straight.

Recommind’s Axcelerate platform was architected, from the very beginning, to be flexible and to incorporate reviewer feedback interactively. It leverages advanced, proprietary technology that many people now call continuous learning. The “continuous” part means that machine learning is integrated into the review process itself. Axcelerate easily adapts to the different shapes of those review processes, and has from the outset.

Product architecture aside, the workflows we have recommended for many years are both flexible and data-driven. From the first moment our CTO and I sat down with early eDiscovery thought leaders to learn what problems eDiscovery practitioners were facing, flexibility and the changing shape of cases were key criteria incorporated into our thinking. We agree with Ralph’s outline of an interactive model for working with data, but we think about it more simply, as consisting of three basic steps:

    1. Analyze: Gain insight into the data, define review success criteria, and estimate the effort required to meet those criteria. Leverage all the analytical tools at your disposal—including keyword search, phrase extraction, concepts and concept search, communication visualizations, metadata-based filtering, estimation sampling, and more.
    2. Identify: Identify sets of documents that are potentially relevant and then batch those documents out to reviewers to code. In the most efficient reviews, both strategic search and machine learning (i.e. documents categorized as “suggested” after an iteration of Predictive Coding) are utilized in tandem to identify responsive material. As the number of reviewed documents grows over time, the system refines its training and suggests additional documents likely to be responsive in light of review team feedback. This process continues until the team believes that the review goals have been satisfied.
    3. Validate: Ensure that the review goals have been met. At this stage, the team usually performs a validation test, evaluates the results, and considers the overall completeness of the review. Identifying when the review is complete is a decision made by the case team, and different matters will require different types of validation. As with any review, there needs to be confidence that the review process was reasonable and comprehensive.
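To illustrate the arithmetic behind a validation test like the one in step 3, here is a minimal Python sketch of a sampling-based recall estimate. Everything in it is a simulated assumption for illustration (the corpus, the 10% prevalence, the 85% find rate) and it is not a description of Axcelerate’s actual methodology:

```python
import random

random.seed(42)

# Hypothetical corpus: 10,000 documents, each with a (hidden) true
# relevance label. In a real matter these labels are unknown; here we
# simulate them so the recall arithmetic can be demonstrated end to end.
corpus = [{"id": i, "relevant": random.random() < 0.10} for i in range(10_000)]

# Simulate a review that found roughly 85% of the relevant documents.
found = {d["id"] for d in corpus if d["relevant"] and random.random() < 0.85}

# Validation: draw a random sample, have the case team label it, and
# estimate recall as (relevant docs found) / (relevant docs in sample).
sample = random.sample(corpus, 500)
sampled_relevant = [d for d in sample if d["relevant"]]
sampled_found = [d for d in sampled_relevant if d["id"] in found]

recall_estimate = len(sampled_found) / len(sampled_relevant)
print(f"Estimated recall: {recall_estimate:.0%}")
```

The point estimate will vary with the sample, which is why the case team, not a fixed formula, decides whether the result supports calling the review complete.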

This approach, paired with our continuous learning technology, yields numerous benefits:

  • It incorporates rolling data loads and even rolling productions seamlessly and without disruption.  
  • It anticipates that all relevant, produced documents will be reviewed by humans (though this is not required and is therefore at the discretion of the client).
  • It lets you track your goals and results at all times on an issue-specific basis, so you can gauge success and enhance your training, where needed, in a targeted way.
  • It is not aimed at reaching a level of recall that must be defined and negotiated upfront, but rather at finding as many relevant documents as possible, to the point where continuing manual review is no longer reasonable or proportional.
  • It removes the onus of generating a gold standard “seed set” or “control set” upfront, instead letting the reviewers’ understanding evolve and improve the training at each iteration, with the entire corpus re-ranked each time.
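To make the re-ranking idea in the last bullet concrete, here is a deliberately tiny, purely illustrative sketch, not Recommind’s actual algorithm: word weights are re-learned from the reviewers’ coding decisions, and the entire corpus, reviewed and unreviewed alike, is re-scored each iteration. The documents and labels are hypothetical:

```python
from collections import Counter

# Toy corpus: doc id -> (text, reviewer label). None means not yet reviewed.
corpus = {
    1: ("contract invoice payment", True),
    2: ("lunch menu friday", False),
    3: ("invoice overdue payment", True),
    4: ("contract renewal terms", True),
    5: ("weekend plans hiking", False),
    6: ("payment schedule contract", None),  # not yet reviewed
    7: ("team offsite agenda", None),        # not yet reviewed
}

def train(reviewed):
    """Learn per-word weights: +1 for words seen in relevant docs, -1 otherwise."""
    weights = Counter()
    for text, label in reviewed:
        for word in text.split():
            weights[word] += 1 if label else -1
    return weights

def rank(corpus, weights):
    """Re-rank the ENTIRE corpus by the current model's relevance score."""
    scores = {doc_id: sum(weights[w] for w in text.split())
              for doc_id, (text, _) in corpus.items()}
    return sorted(scores, key=scores.get, reverse=True)

reviewed = [(text, label) for text, label in corpus.values() if label is not None]
ranking = rank(corpus, train(reviewed))
print(ranking)  # unreviewed doc 6 ranks above every non-relevant doc
```

Each pass through the loop, newly coded documents are appended to `reviewed`, the weights are retrained, and the full ranking is regenerated, so the next review batch always reflects the team’s latest feedback rather than a frozen initial seed set.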

Given that our technology can be leveraged in a variety of different workflows, why have we been recommending this approach for some time? And why have we embedded these concepts directly into Axcelerate 5’s interactive review dashboards? Because using machine learning as part of a flexible, prioritized review strategy adds value to virtually every review project. And such an approach avoids the rigid protocols that can lead to protracted motion practice and disputes over validation methodologies.

So that’s the straight record on Recommind and Predictive Coding. In many ways, however, the discussion around Predictive Coding version numbers is too narrow in scope. Machine learning is, after all, just one part of an efficient review strategy. And review efficiency is ultimately about spending more time with relevant content and less with the irrelevant. That is how you quickly find the documents that make a difference in your case.

In the coming weeks, you’ll hear more from Recommind about new ways we’re enabling our clients to visualize review efficiency. Knowing what efficiency levels you’re achieving, and which strategies are yielding the best results, is crucial to repeating your success. Come try us in 2016, and let us help you understand how Axcelerate can help you succeed.

As for “Predictive Coding 3.0”? At the end of the day, it’s not the version number that matters. It’s about finding the documents that matter.