Nov 7

Originally posted on November 13, 2012


My last two posts have focused on the predictive coding metrics that so many eDiscovery professionals are waiting for with bated breath.  What is the real problem here?  It’s not that we lack standards (we have reasonableness and proportionality), or that we lack metrics, which are present almost everywhere you look when considering these tools.  The problem is that lawyers don’t think in large enough numbers to understand the meaning of the basic metrics they have in front of them.  Most of these metrics can be validated with simple sampling principles, which lets lawyers not only argue that their own approach is reasonable, but also know when to disagree with an opponent in the heat of battle.  It is much easier for an opponent of technology assisted review (TAR) to make a blanket statement about not wanting to miss any documents, and about having smart people instead of computers look at the documents because that is supposedly the surest way not to miss responsive ESI.  By the way, there are no reasonable metrics behind that position: the search for relevant ESI is never perfect, and not a single number from practice supports it.
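To make the "simple sampling principles" mentioned above concrete, here is a minimal sketch of how a random sample can be used to estimate a proportion, such as the share of responsive documents in a collection, along with a margin of error.  The function name, the sample counts, and the choice of the textbook normal-approximation interval are all assumptions made for illustration; they are not taken from any particular TAR tool or case.

```python
import math


def proportion_interval(hits, sample_size, z=1.96):
    """Estimate a proportion from a simple random sample.

    Returns (estimate, low, high) using the normal-approximation
    interval; z=1.96 corresponds to roughly 95% confidence.
    All names here are hypothetical, chosen for this illustration.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= hits <= sample_size:
        raise ValueError("hits must be between 0 and sample_size")
    p = hits / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    # Clamp to [0, 1] because a proportion cannot fall outside that range.
    return p, max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical numbers: 1,500 randomly drawn documents are reviewed
# and 150 of them are coded responsive.
estimate, low, high = proportion_interval(150, 1500)
print(f"{estimate:.1%} responsive, 95% interval {low:.1%} to {high:.1%}")
```

With these made-up numbers the estimate is 10% responsive, give or take about a point and a half.  The same arithmetic applies whether the proportion being sampled is prevalence, precision, or the share of responsive documents left behind in the unreviewed set.  The normal approximation is a simplification that becomes unreliable for very small samples or proportions near 0 or 1.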

Even without metrics, we all know that most collections contain far too much information to review, and we need a way to decide what to look at, because litigation holds are cesspools of unstructured data.  Intuitively, using technology is the only way to search for data to review.  As a result, lawyers have used date ranges, file types, custodian selection, and key words to cull the review set and avoid looking at far too much unresponsive data.  These are all forms of technology assisted review, because technology is used to filter the documents on their underlying metadata or key word hits.

We do have plenty of metrics showing that this approach does not work very well.  Maura Grossman and Professor Gordon Cormack’s groundbreaking article, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, XVII Rich. J.L. & Tech. 11 (2011), revealed how much more precise machine-assisted review was than manual review at identifying responsive documents in TREC.  It also found that machine-assisted review was marginally better than traditional manual review at finding responsive documents.

The Electronic Discovery Institute’s study, overseen by Herb Roitblat, Anne Kershaw and Patrick Oot, showed similar results but, to be less controversial, elected to say only that TAR was at least as good as human review, given the huge advantage TAR has in time and money.  See Document Categorization in Legal Electronic Discovery: Computer Classification vs. Manual Review, Journal of the American Society for Information Science and Technology, by Herbert L. Roitblat, Anne Kershaw and Patrick Oot, Vol. 61, No. 1, 2010.

The infamous Blair and Maron study from 1985 showed just how poorly key word searching does in the aggregate.  Seasoned litigators and paralegals using key word searching estimated they had found 75% of the relevant documents in a collection but, in fact, had found only 20% (Blair and Maron, Communications of the ACM, 28, 1985, 289-299).  There is also the question of consistency within a set of reviewers.  Studies show that, on average, two reviewers agree with each other about 50% of the time, no better than a coin toss, and that the number drops to 30% when a third reviewer is added.  See Roitblat, H. L., Kershaw, A. & Oot, P. (2010). Document Categorization in Legal Electronic Discovery: Computer Classification vs. Manual Review, Journal of the American Society for Information Science and Technology, 61(1):70-80.

Add to the mix the RAND Institute for Civil Justice’s recent study on economic costs, which found that $0.73 of every eDiscovery dollar is spent on review, dwarfing the amounts spent on collecting and processing the data.  The resulting avalanche of metrics against the status quo makes a reasonableness argument for using TAR that impresses almost everyone I speak to.  Everyone, that is, except litigators in the trenches and their clients, who need to be able to argue whether or not they have done enough with TAR but perhaps don’t understand the metrics well enough.  See the not-for-profit RAND Institute for Civil Justice study Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery, by Nicholas M. Pace and Laura Zakaras, Santa Monica, CA: RAND Corporation, 2012.

I will explore predictive coding metrics and some of the real world cases in my next post.

eDiscoveryJournal Contributor Karl Schieneman
