Racing with the Legal Computation Machine at the Inaugural Center for Computation, Mathematics, and the Law Workshop
I took a deep dive last week into the world of legal computation, to see just how far it has come, where it is going, and how transformative it will be as a force in legal thought and practice. I was provided this opportunity as a participant in the inaugural workshop of the University of San Diego Law School’s new Center for Computation, Mathematics, and the Law (CCML). (Before going into the details, let me add that if one is going to attend a workshop, USD is one heck of a nice place to do it! To emphasize the point, and to highlight the impact the CCML already is having, the International Conference on Artificial Intelligence and Law has selected USD as the site for its 2015 annual meeting.) Ted Sichelman and Tom Smith at USD Law are the founders and directors of the CCML, and the workshop will rotate annually between USD and the University of Illinois Law School, where patent law expert Jay Kesan will coordinate the program.
By way of disclaimer, I have to emphasize that I am not a Comp Sci guy. My math ended with Calculus II, my stats ended with multivariate regression, and my coding ended with SPSS and Fortran, and all are in the distant past. To say the least, therefore, the workshop was a humbling experience, as I was reminded at every turn that I was not the smartest guy in the room! So I approached the workshop through the eyes of Law 2050—I don’t need to know how to code to know how the end product works and to assess its potential to influence legal theory and practice. From that perspective, the workshop revealed an astounding and exciting array of developments. All of the presentations were tremendously well done; here is a taste of those that resonated most with the Law 2050 theme:
Paul Ohm (University of Colorado Law School) presented a fascinating study of how to parse the U.S. Code text to extract instances of defined terms. While at the workshop, he coded a software search engine that instantaneously returns links to all provisions in the Code defining a particular term. I tried it—it works!
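For readers who want a concrete picture of what defined-term extraction can look like, here is a minimal sketch in Python. It is not Paul's code. The sample provisions, the section labels, and the single phrasing pattern (the term "X" means or includes) are all assumptions made for illustration, and the real U.S. Code uses many more definitional forms than this one.

```python
import re
from collections import defaultdict

# Hypothetical sample provisions (invented text, not quoted from the U.S. Code).
SAMPLE_PROVISIONS = {
    "Sec. 101": 'The term "employer" means a person engaged in an industry affecting commerce.',
    "Sec. 202": 'In this chapter, the term "person" includes one or more individuals.',
    "Sec. 303": "No employer shall discriminate against any person.",
}

# Assumed pattern: the term "X" means / includes (straight or curly quotes).
DEFINITION_PATTERN = re.compile(
    r'\bterm\s+["\u201c]([^"\u201d]+)["\u201d]\s+(?:means|includes)\b',
    re.IGNORECASE,
)


def build_definition_index(provisions):
    """Map each defined term (lowercased) to the sections that define it."""
    index = defaultdict(list)
    for section, text in provisions.items():
        for match in DEFINITION_PATTERN.finditer(text):
            index[match.group(1).lower()].append(section)
    return dict(index)


index = build_definition_index(SAMPLE_PROVISIONS)
print(index.get("employer", []))
```

Looking up a term in the index returns the sections that define it, which is the core of a "where is this term defined?" search. A provision that merely uses a term, like the third sample, is not indexed.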
Dan Katz (Michigan State University Law School) presented his research team’s ongoing work on a classification algorithm for predicting affirm/reverse outcomes of U.S. Supreme Court decisions. Previous work on this front (Ruger et al., 2004) pitted expert lawyers against a classification tree algorithm applied to one year of Court decisions, with the algorithm outperforming the experts, 75% accuracy to 58%. Dan’s team applied a more advanced “random forests” classification approach to the last 50 years of Court decisions and maintained accuracy levels of 70%.
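The idea behind a random forest is that many small decision trees, each trained on a random resample of the cases and a random subset of the features, vote on the outcome. The sketch below shows that mechanism in plain Python. It is a deliberate simplification and not the model Dan's team built: every tree is a one-question "stump," the three binary features are invented, and the toy cases are made up.

```python
import random
from collections import Counter


def majority(labels, default):
    """Most common label, or the default when there are no labels."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def train_stump(rows, feature_ids):
    """Pick the single feature (among feature_ids) that best splits the rows."""
    overall = majority([label for _, label in rows], default=0)
    best = None
    for f in feature_ids:
        # Predicted label for each value (0 or 1) of feature f.
        leaf = {v: majority([lab for x, lab in rows if x[f] == v], overall)
                for v in (0, 1)}
        correct = sum(leaf[x[f]] == lab for x, lab in rows)
        if best is None or correct > best[0]:
            best = (correct, f, leaf)
    _, feature, leaf = best
    return feature, leaf


def train_forest(rows, n_trees, n_features_per_tree, seed=0):
    """Train stumps on bootstrap samples with random feature subsets."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # resample with replacement
        feature_ids = rng.sample(range(n_features), n_features_per_tree)
        forest.append(train_stump(sample, feature_ids))
    return forest


def predict(forest, x):
    """Majority vote across all stumps (1 = reverse, 0 = affirm, by assumption)."""
    return majority([leaf[x[feature]] for feature, leaf in forest], default=0)


# Made-up cases: (three invented binary features), outcome label.
TOY_CASES = [
    ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 0, 0), 1), ((1, 1, 1), 1),
    ((0, 0, 1), 0), ((0, 1, 0), 0), ((0, 0, 0), 0), ((0, 1, 1), 0),
]
forest = train_forest(TOY_CASES, n_trees=25, n_features_per_tree=2, seed=42)
print(predict(forest, (1, 0, 1)))
```

A production model would use deeper trees, far richer features, and a held-out test set to measure accuracy, but the resample-and-vote structure is the same.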
Kincho Law (Stanford Civil Engineering) presented a robust text parsing and retrieval project designed to allow the user to extract and compare regulations pertaining to specific topics. For example, if the user is interested in water toxicity regulations for a particular contaminant, the program identifies and compares federal and state regulations on point. His team also has embedded a plethora of information into many of the regulations (e.g., links to relevant regulatory documents) and has also embedded formal logic statements for many regulations, allowing the user to treat the regulations as a true body of code.
Jay Kesan (University of Illinois Law School) demonstrated another text parsing and retrieval project aimed at unifying the various databases relevant to patent lawyers, including all the patents, court litigation, scientific publications, and patent file wrappers in the biomedical technology domain.
Harry Surden (University of Colorado School of Law) delved into what he calls “computable contracts,” referring to the trend in finance to embody contractual terms entirely as computer code. These “contracts” allow computers to understand the terms and generate real-time compliance assessments. His project assesses the conditions under which a broader array of contracting practices might move to this computable contract format and the implications of doing so.
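To make the idea of a computable contract concrete, here is a minimal sketch in Python of a single payment obligation written as data, with a function that reports compliance as of any given date. The class names, the three status labels, and the sample figures are all hypothetical. This illustrates the general concept only, not any format Harry described.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PaymentTerm:
    """One contractual obligation expressed as data rather than prose."""
    amount_due: float
    due_date: date


@dataclass(frozen=True)
class Payment:
    amount: float
    paid_on: date


def assess_compliance(term, payments, as_of):
    """Return 'compliant', 'pending', or 'breach' as of the given date."""
    # Only payments made by the earlier of today and the due date count.
    cutoff = min(as_of, term.due_date)
    paid = sum(p.amount for p in payments if p.paid_on <= cutoff)
    if paid >= term.amount_due:
        return "compliant"
    if as_of <= term.due_date:
        return "pending"  # not yet due, not yet fully paid
    return "breach"


term = PaymentTerm(amount_due=1000.0, due_date=date(2013, 4, 1))
payments = [Payment(400.0, date(2013, 3, 15))]
print(assess_compliance(term, payments, as_of=date(2013, 3, 20)))
```

Because the term is structured data, the same function can be rerun every time a payment arrives or a day passes, which is what makes real-time compliance assessment possible.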
Seth Chandler (University of Houston) gave us a deep dive into the Affordable Care Act with a demonstration of software he has developed to extract and evaluate a variety of important analytics from the database available at healthcare.gov.
David Lewis (Independent Consultant) outlined the use of predictive coding in e-discovery and presented the preliminary results of a study comparing the accuracy of manual human document review with that of computer predictive coding, based on a large (500K documents) real-world discovery event. The results suggest that predictive coding, while presenting challenges, has substantial promise.
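Accuracy comparisons of this kind are commonly expressed as precision (what share of the documents flagged as relevant really are relevant) and recall (what share of the truly relevant documents were found). The sketch below shows the arithmetic in Python. The document IDs and review results are invented for illustration and have nothing to do with the study's data or findings.

```python
def precision_recall(predicted_relevant, truly_relevant):
    """Return (precision, recall) for a set of documents flagged as relevant."""
    predicted = set(predicted_relevant)
    truth = set(truly_relevant)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Invented example: document IDs judged relevant by a gold-standard review,
# by a hypothetical manual reviewer, and by a hypothetical predictive-coding run.
gold_standard = {1, 2, 3, 4}
manual_review = {1, 2, 5}
predictive_coding = {1, 2, 3, 6}

for name, flagged in [("manual", manual_review), ("predictive", predictive_coding)]:
    precision, recall = precision_recall(flagged, gold_standard)
    print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
```

Scoring both review methods against the same gold-standard set is what makes the comparison meaningful.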
Henry Smith (Harvard Law School) and Ted Sichelman presented work on legal entitlements illustrating the potential for legal computation to advance legal theory. Ted’s project carefully examines how legal entitlements can be represented in formal, computable logic models, and together they are developing a model for computing the “modularity” of real property entitlements using network analytics. By representing legal entitlements as networks of rights, duties, privileges, and powers, they propose a method for measuring the degree to which a property law regime has departed from a baseline of fully unrestricted rights to use and exclude.
Jack Conrad (Thomson Reuters R&D and President of the International Association for Artificial Intelligence and Law) explained the importance of the “use case” in developing applied uses of legal computation (i.e., what are you going to use this to do?) and also emphasized the importance of evaluating experimental efforts using standard test sets and metrics.
Last but by no means least, Roland Vogl of Stanford’s CodeX Center for Legal Informatics Skyped in an overview of what CodeX is doing to advance information retrieval technology, legal technology infrastructure, and computational law, as well as a review of some of the start-up incubation successes (Lex Machina, LawGives, Ravel Law, Judicata, etc.).
All in all, the workshop made two things abundantly clear for me: (1) legal computation has taken off and its horizons are boundless, and (2) San Diego in March is OK!