Since my server died with the entire collection of DocReviewMD blog posts, I figured I would update these posts based on where the predictive coding field is today and treat them as a re-release. I remember drafting this post last year, when I rather naively thought that we had just changed the litigation world. Global Aerospace then continued: the predictive coding process was run, meet and confers were held with the opposing lawyers, and an agreement was reached to produce the documents we coded using predictive coding. While the case eventually settled, Global Aerospace for me stands as the best example of a successful predictive coding case. We got the order we needed to use the process when opposing counsel wouldn't agree to predictive coding. We then ran the software and trained it using an expert reviewer who was a partner in the law firm's construction litigation group. We stabilized the tool, then met with opposing counsel and talked through how we coded documents, including sharing the null set. Opposing counsel disagreed on a small number of documents we had coded as not responsive. We added those documents to the training set and pushed the button. No other case has made it this far. I believe our success came from the transparent approach we followed: making discovery about getting agreement on the process, which is best accomplished when you are transparent. Contrast that with Da Silva Moore, which is still an ongoing case, and you can get a sense of how successful we were. The end result is that we culled the 1.3 million documents in the collection down to 200,000 documents, family members included, and got the review done with a handful of attorneys. A great outcome, which resulted in two Wall Street Journal articles. Yet for some reason, the acceptance of predictive coding still hasn't taken off today.
That is one reason why we have been touring the country offering inexpensive CLEs through the Predictive Coding Thought Leadership Series on how to validate predictive coding results. Read on and enjoy….
Originally posted on March 27, 2012
When a server crashed and I lost many of my original blog posts, it gave me the opportunity to update the DocReviewMD posts and look back. Sorry to say, I have been unable to locate the first Da Silva Moore blog post, but I will keep looking. Today that case continues to grow and has become a battle between experts. In essence, it is a train wreck. But that is no surprise: it is the first case using predictive coding, and there are many issues surrounding the use of these tools that we are still figuring out. That is the basis of the 2013 Predictive Coding Thought Leadership Series I have been touring the country with. But it is fun to look back at the catalogue of what we were saying in early 2012 about these tools. Enjoy…..
Since losing access to the server which hosted our blog site, I have been trying to update my blog posts with what I have on my hard drive. This has given me the opportunity to add a new paragraph updating where we stand today, as opposed to where we stood almost 3 years ago when this blog site was launched. The thesis of DocReviewMD rings even truer today. Linear review lives on and is still the primary review approach being used. Nothing wrong with that. There are ways to do good reviews even with older tools. But it takes more effort and costs more, and studies show it is not more effective. So why do it? A lack of education. That is why we launched the Predictive Coding Thought Leadership Series tour in 2013: to teach simple statistics so lawyers can become more comfortable using technology to support reviews. This is just what the Doctor ordered. Read on and enjoy….
Originally posted on April 11, 2012
da Silva Moore + Kleen Products = It’s All About the Math
This was one of my favorite early blog posts and is the theme of the Predictive Coding Thought Leadership Series I have been teaching around the country. I think it is more relevant today than when I initially posted it. Due to a server issue, we lost all of the original blog posts from DocReviewMD which is giving me an opportunity to “re-release” these posts, with an updated introduction paragraph identifying the relevance of the post today. Enjoy…..
It is nice to know that people are reading these blog posts. It's amazing how much more immediate the reaction to a blog post is than to a podcast. Now that I have had a chance to digest where we are, I thought this would be a good chance to summarize where I believe the tea leaves say we are with Da Silva Moore and Kleen. The posturing over judicial and vendor conspiracies, untested technologies, whether an expert is competent, and who should control the process selected for document review and production are really side issues in both of these cases. The common strand in both matters is offering some comfort to an adversary that most of the ESI related to a case is in fact being produced, and that the gaps in production are not intentional but are caused by search and retrieval limitations. Both cases are about this same issue and are less about predictive coding than people realize.
The Da Silva Moore plaintiffs are comfortable that predictive coding works better than key word searching. What they are not comfortable with is how they can be sure they received the relevant documents, and that is what is at the heart of their debate over the size of the sample to be used to validate the process. The Kleen plaintiffs want predictive coding to be used because they know identifying key words in an antitrust case is really hard to do. They believe their best chance to find the relevant ESI they hope to find is to use some form of predictive coding across a wide number of data sources and custodians. The key theme with both parties is the receiving party's lack of comfort about what it is getting, and a desire to receive what it is entitled to.
The answer to this issue has to come from statistics. There is no way to measure a large data set and say what was done without sampling. It is too expensive, reviewing everything is prone to huge inconsistencies, and it would take too long. Providing these assurances is what sampling is intended to do and has been doing since the time of the Greeks. So the easiest way for lawyers to understand these cases about predictive coding is to forget about predictive coding even being involved in the case. The analysis I am blogging about works the same with predictive coding, computer assisted review, key word guessing, linear review, producing every 5th document, or any other method of document review or production going forward. The test is a simple two part test:
- How did you find and produce your relevant ESI and discharge your obligation under Rule 26(g)(1), or a similar state court rule, to certify that you have done a reasonable job?
- How are you going to show you satisfied this obligation to the other side and the court?
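To make the simple statistics concrete, here is a sketch of the standard sample-size calculation a producing party might use to plan a validation sample. The confidence level and margin of error below are my own illustrative assumptions, not figures from either case:

```python
import math

def sample_size(confidence_z: float = 1.96, margin_of_error: float = 0.02,
                expected_proportion: float = 0.5) -> int:
    """Sample size needed to estimate a proportion, using the normal
    approximation. expected_proportion=0.5 is the conservative worst case."""
    n = (confidence_z ** 2 * expected_proportion * (1 - expected_proportion)
         / margin_of_error ** 2)
    return math.ceil(n)

# A 95% confidence (z = 1.96), +/-2% margin-of-error sample needs about
# 2,401 documents, whether the collection holds 200,000 or 1.3 million.
print(sample_size())                       # -> 2401
print(sample_size(margin_of_error=0.05))   # -> 385
```

Note that the required sample size depends on the confidence level and margin of error, not on the size of the collection, which is why sampling stays affordable even for million-document populations.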
The issue of what you did really becomes less relevant to this test. It only becomes important if you can't show that you met your burden under the second question. That is when picking the right approach really becomes an issue. It is for this reason that, in my mind, predictive coding will eventually take off. Kleen cries out for this result as you hear the defense talk about the thousands of hours spent, over a million documents produced, and teams of experts used to find key words. But what the Kleen defendants have not shown is a sampling process across the population of what has been left behind, or what is potentially still there. This information is what any party on the other side needs to see to be comfortable it has received what it is entitled to. This is where the ugliness can occur. Going forward, these disputes need to be addressed earlier, before resources are spent. No one wants to pay to redo work. But knowing this redo risk exists once we start measuring will undoubtedly aid judges and parties in deciding whether a production is complete and in choosing the most efficient review approach going forward, just in case they have to do more work! This is one of the strengths of predictive coding: it enables parties to do more work if they have to or if the facts change. Linear review does not. But parties should be able to choose their own processes. They should also have to show results that prove they have done a reasonable job. Again, these cases are about statistical sampling more than predictive coding.
So my final conclusion about both of these cases is that, regardless of what the decisions are, the issues are not going away. Large data sets, fear that the other side is hiding something, unknown technology, mistakes being made, high costs… it's a long list of fears parties have, compounded by the uncertainty of what one is receiving in a production. Understanding how to make reasonable statistical offerings and providing sampling results is, I believe, the answer in both of these cases, and in the many more like them that will follow, regardless of whether predictive coding, stratified key word selection, linguistic analysis or key word guessing is used.
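As an illustration of the kind of statistical offering I have in mind, here is a sketch of estimating the "elusion" rate, the fraction of relevant documents left behind in the null set, from a simple random sample. All of the numbers below are hypothetical, chosen only to show the shape of the calculation:

```python
import math

def elusion_estimate(sample_size: int, relevant_found: int,
                     confidence_z: float = 1.96) -> tuple:
    """Point estimate and confidence-interval half-width for the rate of
    relevant documents remaining in the null set (normal approximation)."""
    p = relevant_found / sample_size
    half_width = confidence_z * math.sqrt(p * (1 - p) / sample_size)
    return p, half_width

# Hypothetical review: a 2,401-document random sample of a 1.1 million
# document null set turns up 12 relevant documents.
null_set_size = 1_100_000
p, hw = elusion_estimate(sample_size=2401, relevant_found=12)
print(f"estimated elusion rate: {p:.2%} +/- {hw:.2%}")
print(f"roughly {round(null_set_size * p):,} relevant documents left behind")
```

A producing party can share numbers like these with the other side; a receiving party can judge whether the estimated volume of relevant material left behind is tolerable, which is exactly the comfort both the Da Silva Moore and Kleen plaintiffs are asking for.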