Originally posted on April 30, 2013
On April 19th, 60 invited delegates convened in Washington, DC with the Federal Rules Committee to discuss Technology Assisted Review (TAR). The object of the meeting was to have the delegates give their perspectives on whether the Rules currently being readied for public comment should incorporate changes that take into account the unique needs of TAR. My overall conclusion is that the Duke Conference was an outstanding event, and it went a long way to show that attorneys need more transparency when using TAR or it becomes very hard for parties to cooperate effectively.
My first observation of this Duke Conference was that, in spite of the clear sentiment that TAR is extremely useful technology, there is substantial disagreement among the experts on which types of tools offer the best approaches. The kickoff presentation defined the two primary forms of TAR as machine learning approaches and rule-based linguistic modeling.
Later discussions revealed a further split in the machine learning group between two camps. The first group, the “Random Samplers,” believes random sampling is the only way to validate TAR results transparently, because it provides measurements of recall and precision that give users some comfort about the efficacy of TAR on a given project. This group allows for different TAR search techniques, but the measurement of how the tool ultimately performed is based on a random sample of the collection.
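To make the Random Samplers' measurement concrete, here is a minimal sketch of how recall and precision might be estimated from a reviewed random sample. It is not taken from any presentation at the conference; the function name, the input format and the sample counts are all made up for illustration.

```python
def estimate_recall_precision(sample):
    """Estimate recall and precision from a reviewed random sample.

    `sample` is a list of (tar_says_responsive, reviewer_says_responsive)
    boolean pairs, one per sampled document. The format is hypothetical.
    """
    true_pos = sum(1 for tar, human in sample if tar and human)
    false_pos = sum(1 for tar, human in sample if tar and not human)
    false_neg = sum(1 for tar, human in sample if not tar and human)

    found = true_pos + false_pos     # documents TAR marked responsive
    relevant = true_pos + false_neg  # documents the reviewer marked responsive

    # Return None when a measure is undefined (nothing found / nothing relevant).
    precision = true_pos / found if found else None
    recall = true_pos / relevant if relevant else None
    return recall, precision


# Made-up sample of 500 documents, for illustration only.
sample = (
    [(True, True)] * 40        # TAR and reviewer agree: responsive
    + [(True, False)] * 10     # TAR false alarms
    + [(False, True)] * 10     # responsive documents TAR missed
    + [(False, False)] * 440   # TAR and reviewer agree: not responsive
)
recall, precision = estimate_recall_precision(sample)
print(f"recall ~ {recall:.2f}, precision ~ {precision:.2f}")
# prints: recall ~ 0.80, precision ~ 0.80
```

Note that the recall figure here rests on only 50 responsive documents in the sample, which is part of what the other camp objects to.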
The other camp, the “Multi-Modulars,” suggests that the low prevalence of responsive ESI in many collections makes it harder to gain comfort with random sampling. They believe that TAR users need to dig more in an iterative and experimental manner to find what they are looking for. This means that they use TAR tools by following their intuition to search out responsive ESI. An analogy for this process would be using a straw to drink a milkshake (the “collection”), moving the straw (“multi-modal search techniques”) around the collection and shaking it until a sufficient number of hot documents and responsive documents are identified. This approach is followed by a statistical test to see how it did, with the goal of catching as much responsive data as possible in collections that have a low prevalence of responsive data.
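The low-prevalence concern can be illustrated with some simple arithmetic. The sketch below is my own illustration, not something presented at the conference: it assumes a simple random sample and asks how many documents must be drawn before the sample is expected to contain a given number of responsive ones. The target of 100 responsive documents and the two prevalence figures are hypothetical.

```python
import math
from fractions import Fraction


def expected_responsive(sample_size, prevalence):
    """Expected number of responsive documents in a simple random sample."""
    return sample_size * prevalence


def sample_size_for(target_responsive, prevalence):
    """Sample size at which the expected responsive count reaches the target.

    Fraction(str(...)) keeps the division exact, so the round-up is not
    thrown off by floating-point noise.
    """
    return math.ceil(Fraction(target_responsive) / Fraction(str(prevalence)))


# Hypothetical rich (30%) and sparse (1%) collections.
for prevalence in (0.30, 0.01):
    print(prevalence, sample_size_for(100, prevalence))
# prints:
# 0.3 334
# 0.01 10000
```

In a collection where only 1% of documents are responsive, a 1,500-document sample is expected to contain about 15 responsive documents, which is a thin basis for a recall estimate.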
Both sides took shots at each other, supported by strong arguments and statistical techniques, but clearly disagreed on which approach is best. However, it is clear that both sides are extremely bullish on the use of TAR in general. It is also clear that techniques from both approaches can work and may in fact be preferred depending on how rich the data set is.
For me, the disagreement only serves to undermine the ultimate benefit of TAR, which represents a technological advance to help us get control of ever-growing data collections that human armies are unable to review. My advice is that TAR practitioners need to be flexible enough to use different workflows, even ones they might not completely endorse.
There were some in the audience who called for more research before we throw away keyword culling and linear review workflows. But even the most negative comments from this group acknowledged that TAR methods do some things very well and warrant being a part of any attorney’s toolkit. That happens to be a view that I can live with during this “figure things out” period.
Having said that, waiting for research is not an excuse to walk away from improving the speed, cost and accuracy of document review in eDiscovery. The tools currently available work reasonably well in my opinion. Based on the academic research already presented by TREC and the E-Discovery Institute, as well as cases like Global Aerospace and Actos, it can be shown that predictive coding done in a cooperative manner can speed up discovery and result in higher recall levels and lower costs than linear review typically produces.
One of the big fights (which I helped to start) was on whether there needs to be more transparency and production of samples, including null sets, to satisfy the Rule 26(g) “Reasonable Inquiry” standard. This standard requires a production to be made before a disagreement over the sufficiency of the producing party’s “reasonable inquiry” becomes ripe. As a result, there are no opinions in this area supporting more transparency in sharing actual documents, including the null set, between parties. These rules also provide plenty of ammunition to parties who seek to be less than transparent. The fights between parties may be civil, but when actual documents are not shared, it slows down the use of TAR tools.
This lack of transparency and the resulting diminished cooperation have caused some in the judiciary to call for Daubert hearings to evaluate TAR tools when there is a dispute. These calls are coming from judges who are fans of the increased use of technology.
Personally, I think that while the argument holds together technically, it loses sight of the fact that these tools operate differently with different operators and different data collections. There is no speedway on which to try these tools and measure performance. But there is always random sampling, which can provide snapshot comparisons with confidence levels and margins of error that can in many instances give users comfort in the TAR output. These statistical tools work best when there is cooperation and transparency. This is the topic of the Predictive Coding boot camps that we are presenting around the country.
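For readers who want to see what "confidence levels and margins of error" mean in practice, here is a small sketch using the textbook normal approximation for a sample proportion. This is one common way to do the calculation, offered as an assumption rather than as the method any particular TAR tool or conference speaker uses; the function names are hypothetical, and the formula ignores refinements such as finite-population corrections.

```python
import math
from statistics import NormalDist


def z_score(confidence):
    """Two-sided z value for a confidence level, e.g. 0.95 -> about 1.96."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def margin_of_error(sample_size, proportion=0.5, confidence=0.95):
    """Margin of error for a sample proportion (normal approximation).

    proportion=0.5 is the conservative worst case.
    """
    z = z_score(confidence)
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


def required_sample_size(margin, proportion=0.5, confidence=0.95):
    """Smallest sample size whose margin of error is at most `margin`."""
    z = z_score(confidence)
    return math.ceil(z * z * proportion * (1 - proportion) / margin ** 2)


print(required_sample_size(0.05))  # 95% confidence, +/-5%  -> 385
print(required_sample_size(0.02))  # 95% confidence, +/-2%  -> 2401
```

The takeaway is that tightening the margin of error from 5% to 2% multiplies the review effort for the sample roughly sixfold, which is why parties benefit from agreeing on these parameters up front.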
The best result of having this discussion among so many knowledgeable participants is that no expert was able to push their view from the podium as being the clear-cut best approach. The audience was filled with experts with different perspectives who were not bashful about participating in the discussion. I give kudos to Duke University for putting together such a fair and open forum.
All in all, the technical level of each of the presentations was appropriate for the type of audience at the conference. Some of the charts shown could have been considered a little scary if you were not caught up on statistics.
The key takeaway was that the proposed new rules for Civil Procedure dealing with E-Discovery will be ready for public comment on August 15th. The audience and readers of this blog are both encouraged to respond to the Rules Committee starting on that date.
I was able to ask a half dozen questions and am ready to post some commentary on the current version of Rule 26(g) and whether the “reasonable inquiry” requirement can truly be monitored in a cooperative and transparent manner without more disclosure between parties.
Let’s hope sanity prevails over time.