Artificial Intelligence (AI), chiefly in the form of machine learning, natural language processing, and computational topic modeling, is fueling the new generation of e-discovery and contract due diligence tools exploding on the legal market. But AI is also taking hold in my more wonky world of legal academia.
In Topic Modeling the President: Conventional and Computational Methods (or here), recently published in the George Washington Law Review with co-authors John Nay and Jonathan Gilligan, we demonstrate how these tools can tap into large bodies of legal text to help reveal patterns and categories that might not be easily apparent to the human researcher’s eye. The (rather long) article abstract explains our project and the potential for using AI in legal studies:
Law is generally embodied in text, and lawyers have for centuries classified large bodies of legal text into distinct topics—that is, they “topic model” the law. But large bodies of legal documents present challenges for conventional topic modeling methods. The task of gathering, reviewing, coding, sorting, and assessing a body of tens of thousands of legal documents is a daunting proposition. Yet recent advances in computational text analytics, a subset of the field of “artificial intelligence,” are already gaining traction in legal practice settings such as e-discovery by leveraging the speed and capacity of computers to process enormous bodies of documents, and there is good reason to believe legal researchers can take advantage of these new methods as well. Differences between conventional and computational methods, however, suggest that computational text modeling has its own limitations. The two methods used in unison, therefore, could be a powerful research tool for legal scholars.
To explore and critically evaluate that potential, we assembled a large corpus of presidential documents to assess how computational topic modeling compares to conventional methods and evaluate how legal scholars can best make use of the computational methods. We focused on presidential “direct actions,” such as Executive orders, presidential memoranda, proclamations, and other exercises of authority the President can take alone, without congressional concurrence or agency involvement. Presidents have been issuing direct actions throughout the history of the republic, and although these actions have often drawn criticism and controversy in the past, lately they have become a tinderbox of debate. Hence, although long understudied by political scientists and legal scholars, presidential direct actions have seen a surge of interest in their scope, content, and impact.
Legal and policy scholars who have modeled direct actions into substantive topic classifications have thus far not employed computational methods. To compare the results of their conventional modeling methods with the computational method, we generated computational topic models of all direct actions over the time periods other scholars have studied using conventional methods, and did the same for a case study of environmental-policy direct actions. Our computational model of all direct actions closely matched one of the two comprehensive empirical models developed using conventional methods. By contrast, our environmental-case-study model differed markedly from the only empirical topic model of environmental-policy direct actions built using conventional methods, revealing that the conventional model included trivial categories and omitted important alternative topics.
Provided a sufficiently large corpus of documents is used, our findings support the assessment that computational topic modeling can reveal important insights for legal scholars in designing and validating their topic models of legal text. To be sure, computational topic modeling used alone has its limitations, some of which are evident in our models, but when used along with conventional methods, it opens doors towards reaching more confident conclusions about how to conceptualize topics in law. Drawing from these results, we offer several use cases for computational topic modeling in legal research. At the front end, researchers can use the method to generate better and more complete topic-model hypotheses. At the back end, the method can effectively be used, as we did, to validate existing topic models. And at a meta-scale, the method opens windows to test and challenge conventional legal theory. Legal scholars can do all of these without “the machines,” but there is good reason to believe we can do it better with them in the toolkit.