Ending the Debate on TAR Seed Sets

Hal Marcus
March 19, 2015

In the wake of Judge Peck’s recent Rio Tinto opinion on technology-assisted review, the ediscovery blogosphere has repeatedly quoted its bold pronouncements that judicial acceptance of TAR “is now black letter law” and that “it is inappropriate to hold TAR to a higher standard than keywords or manual review.” And rightly so: these statements appear intended to put outdated predictive coding debates to rest once and for all. Yet much of the attention has turned to a question Judge Peck raises but does not fully resolve: whether disclosure of TAR seed sets may be required. Though the issue was moot in Rio Tinto, Judge Peck offers guidance on this “robust” debate. He notes conflicting judicial guidance, implicitly questions whether courts have the authority to order disclosure, and acknowledges the uncertainty over what constitutes a seed set in certain workflows. Given the pathway Judge Peck illuminates in Rio Tinto, here are three reasons why we should finally be ready to end the debate on seed set disclosure:

Reason 1: Continuous learning defies the notion of a seed set. Judge Peck states in Rio Tinto that continuous learning workflows make the seed set “much less significant.” Put another way, when you use continuous learning for a prioritized review workflow, the documents used for the initial iteration should ultimately hold no more significance than any others in the production set. With continuous learning, the predictive coding engine doesn’t “stabilize” after a few rounds and prematurely end the algorithm’s training. Rather, the iterations continue until counsel decides to conclude them. With each iteration, documents coded as relevant are fed back into the system to refine the training, which in turn updates the rankings of all documents in the database. At the end of the day, the entire production set has effectively served as training data. Apart from documents withheld for privilege and the limited number of irrelevant documents reviewed before the first iteration, the producing party has already shared the “seed set” by delivering its production. And while not all TAR systems currently support continuous learning, or do so precisely in the manner described above, the evolution of TAR technology points clearly in that direction.
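To make the mechanics of Reason 1 concrete, here is a minimal sketch of a continuous learning loop in Python. Everything in it is a hypothetical illustration rather than how any particular TAR product works: the toy term-weight “model,” the batch size, the stopping rule (standing in for counsel’s discretion), and the sample documents are all assumptions. The sketch retrains on every coding decision made so far, then re-ranks all remaining documents before the next batch is reviewed.

```python
from collections import Counter

def train(coded, docs):
    """Learn toy term weights from every document coded so far (+1 relevant, -1 not)."""
    weights = Counter()
    for doc_id, relevant in coded.items():
        for term in docs[doc_id]:
            weights[term] += 1 if relevant else -1
    return weights

def rank(candidates, docs, weights):
    """Order candidates by summed term weight, highest first; ties broken by id."""
    return sorted(candidates, key=lambda d: (-sum(weights[t] for t in docs[d]), d))

def continuous_review(docs, seed, code_doc, batch_size=2):
    """Hypothetical continuous learning loop: code a batch, retrain, re-rank, repeat."""
    coded = {d: code_doc(d) for d in seed}  # the initial "seed set"
    trained_on = set()
    while True:
        weights = train(coded, docs)  # retrain on all coding decisions to date
        trained_on = set(coded)       # record which documents fed this training round
        unreviewed = [d for d in docs if d not in coded]
        batch = rank(unreviewed, docs, weights)[:batch_size]
        if not batch:
            break  # every document has been reviewed
        results = {d: code_doc(d) for d in batch}
        coded.update(results)
        if not any(results.values()):
            break  # hypothetical stopping rule standing in for counsel's decision
    produced = {d for d, relevant in coded.items() if relevant}
    return coded, produced, trained_on

# Toy collection; document ids, terms, and "true" relevance are all hypothetical.
docs = {
    "d1": {"merger", "price"}, "d2": {"merger", "board"},
    "d3": {"lunch", "menu"},   "d4": {"price", "board"},
    "d5": {"holiday", "menu"}, "d6": {"merger", "holiday"},
}
truly_relevant = {"d1", "d2", "d4", "d6"}
coded, produced, trained_on = continuous_review(
    docs, ["d1", "d3"], lambda d: d in truly_relevant
)
print(sorted(produced))         # ['d1', 'd2', 'd4', 'd6']
print(produced <= trained_on)   # True: every produced document was training input
```

In this toy run, every produced document was also used to train the model, and once later rounds were added, the two seed documents (d1 and d3) carried no more weight than any other coded document.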

Reason 2: Seed sets (to the degree they exist) are work product. My colleague Phil Favro, along with the Honorable John M. Facciola (retired), wrote a recently published law review article making a compelling case that seed sets, under many circumstances, are properly considered attorney work product. While I won’t attempt to do their analysis justice here, they argue effectively that creating a targeted, “judgmental” seed set clearly involves counsel’s exercise of skill, judgment, and reasoning, which shields it from discovery under FRCP 26(b)(3) and pertinent case law. Judge Peck wisely calls attention to this publication in Rio Tinto, and I expect courts will take further notice of this important article.

Reason 3: It’s the result that matters, not the process. The September 2014 Dynamo Holdings opinion, which Judge Peck cites liberally in Rio Tinto, held that the appropriate time for the requesting party to object to the producing party’s discovery response is after production. Dynamo Holdings teaches that the process behind a production is of little significance if the result is a satisfactory production of responsive documents. Counsel should nonetheless be prepared to defend the completeness of the review process, particularly to explain why any unreviewed document sets did not warrant full human review.
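One way to support that explanation is to review a random sample of the unreviewed documents and project how many relevant documents remain. The Python sketch below is a minimal illustration under stated assumptions, not a court-endorsed protocol: it assumes a simple random sample drawn from the unreviewed set, uses the Wilson score interval to put a rough upper bound on the relevant documents left behind, and relies entirely on hypothetical figures (100,000 unreviewed documents, 15 relevant hits in a 1,500-document sample, and an assumed $1.00 per-document review cost).

```python
import math

def wilson_upper(hits, n, z=1.96):
    """Upper bound of the Wilson score interval for a sample proportion (z=1.96 ~ 95%)."""
    p = hits / n
    center = p + z * z / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (center + spread) / (1 + z * z / n)

def estimate_remaining(hits, sample_size, unreviewed_count, cost_per_doc):
    """Project relevant documents left in an unreviewed set and the cost of a full review."""
    point = hits * unreviewed_count / sample_size              # point estimate
    upper = wilson_upper(hits, sample_size) * unreviewed_count  # rough upper bound
    full_review_cost = unreviewed_count * cost_per_doc
    return point, upper, full_review_cost

# Hypothetical figures only; none come from Rio Tinto or any real matter.
point, upper, cost = estimate_remaining(15, 1500, 100_000, 1.00)
print(f"~{point:,.0f} relevant (upper bound ~{upper:,.0f}); full review ~${cost:,.0f}")
```

Whether roughly 1,000 remaining relevant documents justify a $100,000 review is the reasonableness and proportionality judgment that counsel, and ultimately the court, must make; the code only supplies the estimates.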

In Rio Tinto, Judge Peck lists various methodologies that can provide assurance that a review is complete. Although these and other validation methods vary in name, cost, and depth of analysis, at their core they all involve sampling documents and reviewing them. When validation establishes that unreviewed document sets contain few relevant documents relative to the projected cost of reviewing them in full, the producing party should be fully capable of satisfying the traditional discovery touchstones of reasonableness and proportionality. At the very least, there should be no question of its needing to disclose any documents beyond the production set, particularly non-responsive seed documents, to support its position.

The reasons above provide a compelling basis to settle the issue of seed set disclosure. A contrary result would surely “hold TAR to a higher standard than keywords or manual review.”