In a recently published study by FTI Consulting titled “Advice from Counsel: Can Predictive Coding Deliver on Its Promise?”, leading corporations and law firms were asked for their perspectives on Predictive Coding technology. The study listed the top reasons respondents gave for using Predictive Coding, as well as their top four concerns about it. In this series of blog posts, I will address each of those four concerns.
The first concern is how the technology works. Respondents expressed concern about understanding, and explaining in court, “what goes on underneath the hood” with predictive coding software.
Predictive Coding is not just a piece of software. To be considered a true Predictive Coding solution, it must be a process that includes input from a case expert; keyword-agnostic analytics to search for and find the key documents used as seed sets to train the system; a proven workflow that delivers statistically sound results; iterative machine learning (utilizing mathematical algorithms) to find documents based on meaning, not keywords; and an integrated sampling process for unparalleled defensibility.
The respondents in the study are no doubt getting hung up on the “iterative machine learning” and “mathematical algorithms” terminology in the Predictive Coding description. What the heck is an algorithm… it sounds medical. “Machine learning” sounds dangerous… Isn’t that called “Skynet”? I can’t balance my checkbook… how am I supposed to describe this in court?
Algorithms and Machine Learning Explained
Let’s look at some simple descriptions of the two terms in question: algorithm and machine learning. Simply put, an algorithm is any series of steps that, if followed properly, always yields a correct result. Below is an example of a simple algorithm:
Above graphic from the website: http://personal.denison.edu/~havill/algorithmics/everyday_algs/Login.jpg
To take the description further, a mathematical algorithm is a set of steps used to solve a mathematical problem, such as “divide 73 by 3”. The steps you would follow to answer that question are the mathematical algorithm, and they produce the answer 24.333… (the 3 repeats forever).
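To make that concrete, here is a minimal sketch of the long division most of us learned in school, written out as steps a computer can follow. The function name long_divide and its decimals parameter are my own choices for illustration, and the sketch only handles a non-negative whole number divided by a positive whole number.

```python
def long_divide(dividend: int, divisor: int, decimals: int = 5) -> str:
    """Divide two whole numbers by schoolbook long division.

    Returns the answer as text, cut off (not rounded) after `decimals` places.
    """
    if dividend < 0 or divisor <= 0:
        raise ValueError("this sketch expects dividend >= 0 and divisor > 0")

    # Step 1: how many whole times does the divisor fit, and what is left over?
    whole, remainder = divmod(dividend, divisor)

    # Step 2: for each decimal place, "bring down a zero" and divide again.
    digits = []
    for _ in range(decimals):
        remainder *= 10
        digit, remainder = divmod(remainder, divisor)
        digits.append(str(digit))

    # Step 3: put the pieces together.
    if not digits:
        return str(whole)
    return f"{whole}." + "".join(digits)


print(long_divide(73, 3))
```

Follow the same three steps with any other pair of numbers and you get the right answer every time, which is all that "algorithm" really means.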
Machine learning is a discipline concerned with the design and development of algorithms that allow computers to improve their behavior based on empirical data. A major focus of machine learning techniques is for the computer to automatically “learn”, through iterative training on a small set of examples, to recognize complex patterns, and then to apply that learning to make intelligent decisions on much larger data sets.
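A toy illustration may help show what “iterative training” means. The sketch below uses a perceptron, one of the simplest learning algorithms. It is not how any particular Predictive Coding product works, and the training data is invented: I assume each document has already been boiled down to two made-up numbers, and that a reviewer has labeled each one relevant or not.

```python
# Hypothetical training data: each document is reduced to two invented numbers,
# (count of "contract", count of "lunch"), with a label 1 = relevant, 0 = not.
TRAINING_SET = [
    ((3, 0), 1), ((0, 3), 0),
    ((2, 1), 1), ((1, 2), 0),
    ((4, 1), 1), ((0, 4), 0),
]


def predict(weights, bias, features):
    """Score a document and call it relevant (1) if the score is positive."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train(examples, max_passes=100):
    """Repeat passes over the examples, nudging the weights after each mistake."""
    weights = [0] * len(examples[0][0])
    bias = 0
    for _ in range(max_passes):
        mistakes = 0
        for features, label in examples:
            error = label - predict(weights, bias, features)  # -1, 0 or +1
            if error != 0:
                mistakes += 1
                weights = [w + error * x for w, x in zip(weights, features)]
                bias += error
        if mistakes == 0:  # a full pass with no mistakes: training is done
            break
    return weights, bias


weights, bias = train(TRAINING_SET)

# Apply what was learned to documents the system has never seen.
print(predict(weights, bias, (5, 1)))
print(predict(weights, bias, (1, 5)))
```

Nobody told the program which word matters. It started with no opinion, made guesses, and corrected itself each time a guess disagreed with the human reviewer's label.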
Everyday examples of machine learning applications utilizing mathematical algorithms include facial recognition software, optical character recognition software, high-speed financial trading programs, traffic-light controls, credit card fraud detection systems, and email spam filtering. For example, many email spam filters use Bayesian filtering, which applies Bayes’ theorem to estimate the probability that a message is spam from the words it contains, and then act on that estimate.
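For the curious, here is a minimal sketch of the idea behind a Bayesian spam filter, in the form usually called naive Bayes. The four training messages are invented for illustration, the function names are my own, and real filters are considerably more elaborate. The sketch also assumes the training set contains at least one spam message and one legitimate message.

```python
import math
from collections import Counter

# Hypothetical training messages, each labeled True (spam) or False (not spam).
EXAMPLES = [
    ("win free money now", True),
    ("free prize claim now", True),
    ("meeting agenda for monday", False),
    ("lunch on monday", False),
]


def train_filter(messages):
    """Count how often each word appears in spam and in legitimate mail."""
    word_counts = {True: Counter(), False: Counter()}
    class_counts = Counter()
    for text, is_spam in messages:
        class_counts[is_spam] += 1
        word_counts[is_spam].update(text.lower().split())
    return word_counts, class_counts


def spam_probability(text, word_counts, class_counts):
    """Estimate P(spam | words) with Bayes' theorem and add-one smoothing."""
    vocabulary = set(word_counts[True]) | set(word_counts[False])
    total_messages = sum(class_counts.values())
    log_scores = {}
    for label in (True, False):
        # Start from how common this kind of message is overall (the prior).
        score = math.log(class_counts[label] / total_messages)
        words_in_class = sum(word_counts[label].values())
        for word in text.lower().split():
            # Adding 1 keeps a never-seen word from zeroing out the estimate.
            likelihood = (word_counts[label][word] + 1) / (words_in_class + len(vocabulary))
            score += math.log(likelihood)
        log_scores[label] = score
    # Turn the two log scores back into a probability between 0 and 1.
    top = max(log_scores.values())
    spam = math.exp(log_scores[True] - top)
    ham = math.exp(log_scores[False] - top)
    return spam / (spam + ham)


word_counts, class_counts = train_filter(EXAMPLES)
print(spam_probability("free money", word_counts, class_counts))
print(spam_probability("monday meeting", word_counts, class_counts))
```

With this training data the first message scores well above 0.5 and the second well below it. Millions of people rely on this kind of arithmetic every day without ever needing to read it.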
Circling back to the main question posed at the beginning of this post: is Predictive Coding too complicated for judges and attorneys to understand, and therefore indefensible in court? Or, to rephrase the question: do all parties to a case need to fully understand the mathematical algorithms and probability theory used in Predictive Coding solutions for those solutions to be defensible in court?
The answer to the question is NO, of course not. The fact is that our society relies on machine learning and mathematical algorithms every day. Vendors who have recently jumped into the Predictive Coding market like to throw around marketing terms like “black box” and “transparent” to try to differentiate their unproven, and in many cases technically lacking, Predictive Coding offerings. They tend not to mention that true Predictive Coding solutions have already been adopted by leading judges as a suggested best practice for reducing eDiscovery costs and raising accuracy.
The documented fact that true Predictive Coding solutions, used successfully in hundreds of cases, produce results with much higher accuracy rates and much lower error rates than traditional linear review means the “indefensible” argument is a red herring.