Losing my server, and with it the original DocReviewMD blog posts, has given me the opportunity to revisit and re-issue the posts I have been able to locate. This was not my favorite post because it was a response to the mud slinging which occurred in Da Silva Moore. My original draft posting never saw the light of day. To put it mildly, I was incensed at the personal attacks made in this case, which time has shown were highly ineffective. There is no vendor conspiracy in this field. Judges who offer their time to educate are doing just that, educating. Education remains the biggest challenge in the advancement of document review, and that is why we launched the Predictive Coding Thought Leadership Series, which has the involvement of many judges who are all freely donating their time to help educate lawyers on their reaction to new technology like predictive coding. Enjoy….
Since my server died with the entire collection of DocReviewMD blog posts, I figured I would update these posts based on where the predictive coding field is today and treat them as a re-release. I remember drafting this post last year when I thought, rather naively, that we had just changed the litigation world. Global Aerospace then continued: we ran the predictive coding process, held meet and confers with the opposing lawyers, and reached an agreement to produce the documents we coded using predictive coding. While the case eventually settled, Global Aerospace for me stands as the best example of a successful predictive coding case. We got the order we needed to use the process when opposing counsel wouldn’t agree to use predictive coding. We then ran the software program and trained it using an expert reviewer who was a partner in the law firm’s construction litigation group. We stabilized the tool, then met with opposing counsel and talked through how we coded documents, including sharing the null set. Opposing counsel disagreed on a small number of documents we had coded not responsive. We added these documents into the language model and pushed the button. No other case has made it this far. I believe what made the difference was the transparent approach we followed: discovery should be about getting agreement on the process, and that is best accomplished when you are transparent. Contrast that with Da Silva Moore, which is still an ongoing case, and you can get a sense of how successful we were. The end result is we culled the 1.3 million documents in the collection to 200,000 documents with family members included and got the review done with a handful of attorneys. A great outcome, which resulted in two Wall Street Journal articles. Yet for some reason, acceptance of predictive coding still hasn’t taken off, which is one reason why we have been touring the country offering inexpensive CLEs through the Predictive Coding Thought Leadership Series on how to validate predictive coding results. Read on and enjoy….
Originally posted on March 27, 2012
When a server crashed and I lost many of my original blog posts, it gave me the opportunity to update the DocReviewMD posts and look back. Sorry to say, I have been unable to locate the first Da Silva Moore blog post, but I will keep looking. Today that case continues to grow and has become a battle between experts. In essence it is a train wreck. But this is no surprise, as it is the first case using predictive coding and there are many issues surrounding the use of these tools which we are still figuring out. That is the basis of the 2013 Predictive Coding Thought Leadership Series I have been touring the country with. But it is fun to look through the back catalogue at what we were saying back in early 2012 about these tools. Enjoy…..
Kleen Hearing Day Two – The Battle of Boolean Searches versus Sampling and Predictive Coding and Attacking Expert Witnesses
Losing my entire collection of blog posts due to a server error has allowed me to revisit the posts I could find on my hard drive and update them like artists do with box collections. So this first paragraph updates the blog post. I actually updated this post with an eDJ Group post last year when the Kleen Products litigants agreed to cooperate and continue the current Boolean searches. Many pundits claimed a victory for the anti-predictive coding camp. I said it was an uphill battle for the plaintiffs because of the burden of changing, in midstream, a review approach that the defendants had been using across six defendants. I was thrilled to get to watch this argument first hand, but the real challenge today isn’t arguing the process. It is that lawyers are not yet comfortable enough with how to validate these tools to propose, in large numbers, that they start using predictive coding. That is the purpose behind the Predictive Coding Thought Leadership Series, with which I have been touring the country and leading CLE programs on validation and statistics. So enjoy my accurate assessment of this case, though I admit I overstated the pace of future change to predictive coding….
Originally posted on April 11, 2012
Da Silva Moore + Kleen Products = It’s All About the Math
This was one of my favorite early blog posts and is the theme of the Predictive Coding Thought Leadership Series I have been teaching around the country. I think it is more relevant today than when I initially posted it. Due to a server issue, we lost all of the original blog posts from DocReviewMD which is giving me an opportunity to “re-release” these posts, with an updated introduction paragraph identifying the relevance of the post today. Enjoy…..
It is nice to know that people are reading these blog posts. It’s amazing how much more immediate the reaction to a blog post is than the reaction to a podcast. Now that I have had a chance to digest where we are, I thought it would be a good time to summarize where I believe the tea leaves say we are with Da Silva Moore and Kleen. The posturing over judicial and vendor conspiracies, untested technologies, whether an expert is competent, and who should control the process selected for document review and production are really side issues in both of these cases. The common strand in both of these matters is offering some comfort to an adversary that most of the ESI that is related to a case is in fact being produced, and that the gaps in production are not intentional but are caused by search and retrieval limitations. Both cases are about this same issue and are less about predictive coding than people realize.
The Da Silva Moore plaintiffs are comfortable that predictive coding works better than key word searching. What they are not comfortable with is how they can be sure they received the relevant documents, and that is what is at the heart of their debate over the size of the sampling to be done to validate the process. The Kleen plaintiffs want predictive coding to be used because they know identifying key words in an antitrust case is really hard to do. They believe their best chance to find the relevant ESI they hope to find is to use some form of predictive coding across a wide number of data sources and custodians. The key theme in both cases is that the parties receiving the ESI lack comfort about what they are receiving and want to receive what they are entitled to.
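To make the debate over sample size a little more concrete, here is a minimal sketch of the textbook calculation for how many documents you would need to sample to estimate a proportion within a chosen margin of error. This is not the protocol from either case. The function name `sample_size`, the 95% confidence level, the 2% margin, and the 1.1 million document population are all assumptions I picked for illustration.

```python
import math
from statistics import NormalDist


def sample_size(confidence, margin, population=None, p=0.5):
    """Documents to sample to estimate a proportion within +/- margin.

    confidence: e.g. 0.95 for a 95% confidence level
    margin:     e.g. 0.02 for plus or minus 2 percentage points
    population: total documents (optional finite population correction)
    p:          assumed proportion; 0.5 is the most conservative choice
    """
    # Two-sided z-score for the requested confidence level
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Standard sample size formula for a proportion
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite population correction shrinks the sample slightly
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only for illustration
    print(sample_size(0.95, 0.02))
    print(sample_size(0.95, 0.02, population=1_100_000))
```

Notice that the answer barely moves when the collection is large. The sample size is driven by the confidence level and the margin of error, not by how many documents are in the pile, which is why the argument in these cases is about those two numbers.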
The answer to this issue has to come from statistics. There is no way to measure a large data set and say what was done without sampling. Reviewing everything is too expensive, is prone to huge inconsistencies, and would take too long. Providing these assurances is what sampling is intended to do and has been doing since the time of the Greeks. So the easiest way for lawyers to understand these cases about predictive coding is to forget about predictive coding even being involved in the case. The analysis I am blogging about works the same with predictive coding, computer assisted review, key word guessing, linear review, producing every 5th document, or any method of document review or production going forward. The test is a simple two part test:
- How did you find and produce your relevant ESI and discharge your Rule 26(g)(1) certification obligation or other similar state court obligation that you have done a reasonable job?
- How are you going to show you satisfied this obligation to the other side and the court?
The issue of what you did really becomes less relevant to this test. It only becomes important if you can’t show that you met your burden in the second question. That is when picking the right approach really becomes more of an issue. This is the reason why, in my mind, predictive coding will eventually take off. Kleen cries out for this result as you hear the defense talk about the thousands of hours spent, over a million documents produced, and teams of experts used to find key words. But what the Kleen defendants have not shown is a sampling process across the population of what has been left behind or what is potentially there. This information is what any party on the other side needs to see to be comfortable they have received what they are entitled to. This is where the ugliness can occur. Going forward, these disputes need to be addressed earlier on and before resources are spent. No one wants to pay to redo work. But knowing that this redo risk exists once we start measuring will undoubtedly aid judges and parties in deciding whether a production is complete, and in choosing the most efficient review approach going forward, just in case they have to do more work! This is one of the strengths of predictive coding. It enables parties to do more work if they have to or if facts change. Linear review does not. But parties should be able to choose their own processes. They should also have to show results that prove they have done a reasonable job. Again, these cases are about statistical sampling more than predictive coding.
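For readers who want to see what sampling "what has been left behind" can look like, here is a rough sketch, not the method used in either case. It draws a random sample from the unproduced set, counts how many sampled documents a reviewer found responsive, and puts a confidence interval around that rate. I am assuming a Wilson score interval, which is one common choice among several. The function names and every number in the example (1.1 million documents left behind, 1,500 sampled, 12 found responsive) are hypothetical.

```python
import math
import random
from statistics import NormalDist


def draw_sample(doc_ids, n, seed=None):
    """Pick n documents at random, without replacement."""
    return random.Random(seed).sample(doc_ids, n)


def wilson_interval(hits, n, confidence=0.95):
    """Wilson score interval for the true rate, given hits out of n."""
    if n <= 0 or not 0 <= hits <= n:
        raise ValueError("need n > 0 and 0 <= hits <= n")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = hits / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def projected_left_behind(hits, n, null_set_size, confidence=0.95):
    """Scale the sampled rate up to the whole unproduced set.

    Returns (low, point estimate, high) as document counts.
    """
    low, high = wilson_interval(hits, n, confidence)
    return (round(low * null_set_size),
            round(hits / n * null_set_size),
            round(high * null_set_size))


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only
    null_set_size = 1_100_000
    sample_ids = draw_sample(range(null_set_size), 1500, seed=2012)
    hits = 12  # responsive documents a reviewer found in the sample
    print(wilson_interval(hits, len(sample_ids)))
    print(projected_left_behind(hits, len(sample_ids), null_set_size))
```

The output is a range, not a single number, and that range is what the other side and the court can actually evaluate. Whether the range is acceptable is a legal judgment about reasonableness, not something the code decides.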
So my final conclusion about both of these cases is that, regardless of what the decisions are, the issues are not going away. Large data sets, fear the other side is hiding something, unknown technology, mistakes being made, high costs… it’s a long list of fears parties have, which are compounded by the uncertainty of what one is receiving in a production. Understanding how to make reasonable statistical offerings and providing sampling results is, I believe, the answer to both of these cases and the many more like them that will follow, regardless of whether predictive coding, stratified key word selection, linguistic analysis, or key word guessing is used.