August 14, 2015
Corpus Linguistics in the Courts (Again)
Posted by Gordon Smith

Here are the basic facts of State v. Rasabout from the majority opinion of the Utah Supreme Court, which was issued today: 

Andy Rasabout is a member of the street gang known as the Tiny Oriental Posse. On November 1, 2007, Mr. Rasabout, riding shotgun in a Honda Civic, fired twelve shots from a Glock 9 mm semiautomatic pistol at a house and a car parked in front. Lee Tran, whom Mr. Rasabout knew to be a rival in the Original Laotian Gangsters, owned the car and was inside the house at the time. But Mr. Tran was not the only person in danger. Two young girls and their mother were asleep upstairs. Several others were playing cards in the basement. And one man was standing in the carport, enjoying the crisp morning air and a cigarette.

Rasabout was convicted of 12 counts of "unlawful discharge of a firearm." But is each shot a separate "discharge"? Or should the 12 shots together be considered one "discharge"? For more insight on those questions, read below the fold.

This is a matter of statutory construction, but the relevant statute, Utah Code section 76-10-508, does not address these questions expressly. The Court's inquiry therefore turns on legislative intent: what did the Utah Legislature intend when it criminalized the "discharge of any kind of dangerous weapon or firearm ... from an automobile ... within 600 feet of ... a house, dwelling, or any other building"?

The Utah Supreme Court held "each discrete shot" is one "discharge." The majority opinion came to that conclusion via conventional textual analysis, including consideration of the dictionary definition of "discharge." According to the Court, "Under these dictionary entries, the clearest reading of the statute is that discharging a weapon or firearm means shooting a weapon or firearm." 

The majority opinion also relied on a close reading of the statute, which refers to a firearm as "any device . . . from which is expelled a projectile by action of an explosive." And, of course, there is the policy argument: "it was reasonable for the Legislature to criminalize each shot fired because each shot carries an independent harm."

None of this seems dispositive, but it is pretty standard analysis for a court.

Associate Chief Justice Tom Lee agreed with all of the majority's conclusions, but was uncomfortable resolving the statutory ambiguity by reference to the dictionary. You should read his opinion to see why, but the gist of the problem is that the dictionary defines "discharge" as either "to shoot" or "to unload," and it does not tell us which meaning is best in this context. To resolve this problem, Justice Lee turned to corpus linguistics:

In this age of information, we have ready access to means for testing our resolution of linguistic ambiguity. Instead of just relying on the limited capacities of the dictionary or our memory, we can access large bodies of real-world language to see how particular words or phrases are actually used in written or spoken English. Linguists have a name for this kind of analysis; it is known as corpus linguistics.

The fancy Latin name makes this enterprise seem esoteric and daunting. It is not. We all engage in it even if we don’t attach the technical label to it. A corpus is a body, and corpus linguistics analysis is no more than a study of language employing a body of language. When we communicate using words we naturally access a large corpus—the body of language we have been exposed to during our lifetimes—to decode the groups of letters or sounds we encounter. The most basic corpus linguistics analysis involves our split-second effort to access the body of language in our heads in our ongoing attempt to decode words or phrases we may be uncertain of. We all do that repeatedly every day.

It is a small step to utilize a tool to aid our linguistic memory. Judges do this with some frequency as well. Naturally. If judges are entitled to consult the corpus of language in our heads (and how could we not?), we must also be permitted to supplement and check our memory against publicly available sources of language.

Yes, yes, yes!

And the result in this case? A somewhat primitive search of Google News and a more sophisticated analysis using the Corpus of Contemporary American English both confirm that "discharge" of a firearm "is almost always used in the sense of a single shot."
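For readers curious about what such a search involves mechanically, here is a minimal sketch of a keyword-in-context (concordance) lookup. It is not Justice Lee's method or any COCA interface. The sample sentences, the list of weapon nouns, and the function names are all hypothetical, invented only for illustration. The sketch pulls out every use of the verb "discharge" that appears within a few words of a weapon noun, so that a human reader can then judge which sense each example carries.

```python
import re

# Hypothetical sample sentences, invented for illustration only.
# They are not drawn from COCA, Google News, or the opinion.
SAMPLE_CORPUS = [
    "The officer discharged his firearm once during the pursuit.",
    "Witnesses said the suspect discharged a handgun several times.",
    "The hospital discharged the patient on Monday.",
    "It is unlawful to discharge a weapon within city limits.",
    "The crew discharged the cargo at the dock.",
]

# Inflected forms of the verb "discharge" (an assumed, simplified pattern).
VERB = re.compile(r"^discharg(e|es|ed|ing)$")

# Assumed list of nouns standing in for "firearm or its synonyms".
WEAPON_NOUNS = {
    "firearm", "firearms", "gun", "guns", "handgun",
    "pistol", "rifle", "weapon", "weapons",
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(corpus, window=4):
    """Return (sentence_index, left_context, keyword, right_context)
    for each use of the verb with a weapon noun within `window` tokens."""
    hits = []
    for i, sentence in enumerate(corpus):
        tokens = tokenize(sentence)
        for pos, tok in enumerate(tokens):
            if not VERB.match(tok):
                continue
            left = tokens[max(0, pos - window):pos]
            right = tokens[pos + 1:pos + 1 + window]
            # Keep the hit only if a weapon noun appears nearby.
            if WEAPON_NOUNS & set(left + right):
                hits.append((i, " ".join(left), tok, " ".join(right)))
    return hits


if __name__ == "__main__":
    for idx, left, keyword, right in concordance(SAMPLE_CORPUS):
        print(f"[{idx}] {left:>30} | {keyword} | {right}")
```

The search only gathers the examples. Deciding whether each one refers to a single shot or to emptying the weapon is still a judgment a reader has to make, line by line.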

Justice Lee has authored other opinions using corpus linguistics (most notably here), but this latest contribution will stand as a landmark defense of the method, mainly because of the many excellent challenges raised by Justice Parrish in the opinion of the Court. In the following paragraphs, I interlace two of Justice Parrish's challenges with Justice Lee's responses:

Justice Parrish: Justice Lee argues that we should decide this case against Mr. Rasabout on the basis of the corpus linguistics research he has conducted sua sponte. But because his rationale is so different in kind from any argument made by the parties, Mr. Rasabout has never had a reasonable opportunity to present a different perspective. This violates the very notion of our adversary system, which "assures fairness by exempting a party from the inequity of [losing] on appeal on a ground that [he] had no opportunity to address." "[W]e should not dilute [the protections of our adversary system] by stretching their standards to justify our consideration of [an argument] we find interesting or important." Moreover, deciding this case on the basis of an argument not subjected to adversarial briefing is a recipe for making bad law.

Justice Lee: Independent investigation is foreclosed only as to "facts," not law. That is significant. Judicial analysis of the meaning of language, using corpus analysis or anything else, is aimed at interpreting the law. That is the judge’s role. In performing the core function of deciding what the law is or should be, we cannot properly be restricted from consulting sources that inform our understanding.

... To the extent the [corpus] analysis ... encompasses “facts” (e.g., as to the way the verb discharge is typically used in our language in connection with the noun firearm or its synonyms), moreover, our law makes clear that we are free to consider them as a matter of judicial notice. Otherwise my colleagues’ opinions also cross the line, as they also consider evidence not presented in the briefs regarding the ordinary meaning of discharge.

... A contrary conclusion would call into question a wide range of opinions of this court and many others. If we were foreclosed from considering outside material that informs our resolution of open questions of law, we would be barred from engaging in historical analysis relevant to a question of original meaning of a provision of the constitution, or from considering social science literature in resolving a difficult question under the common law. Linguistic analysis is no different; to the extent we charge our judges with resolving ambiguities in language, we cannot (and do not) reasonably restrict their ability to do so on a well-informed basis—even on grounds not presented by the parties, and not within the domain of judges’ professional training. Such a restriction, after all, would not just foreclose corpus analysis; it would also prevent us from consulting a dictionary or our own experience with language.

Justice Parrish: Additionally, it would be entirely inappropriate for this court to conduct the independent scientific research that serves as the basis for Justice Lee’s approach. Justice Lee admits that he is not an expert in this field and does not completely understand its methodologies, but asserts that we must "try." Linguistics is a scientific field of study that uses empirical research to draw findings. And just as with other fields of scientific study, simply trying harder will not lead us to a better answer. The knowledge and expertise required to conduct scientific research are "usually not within the common knowledge" of judges, so "testimony from relevant experts is generally required in order to ensure that [judges] have adequate knowledge upon which to base their decisions." We regularly refuse to conduct legal research, a field in which we are experts. We should similarly refuse our inclination to contrive of interesting research projects that require expertise in fields in which we have no training.

Moreover, as Justice Lee points out, "[m]ost judges are generalists." Indeed, we are aware of almost no one sitting on the bench or practicing law in this state who has the kind of scientific expertise required to reliably conduct the research Justice Lee requires. And issues of statutory construction where "both sides are able to marshal dictionary definitions in support of their view" permeate the majority of cases in this state. But in every such case, Justice Lee would require ad hoc linguistics research that could only be reliably conducted by dueling linguistics experts. Imposing such a significant financial burden on so many of the litigants coming through the doors of our courts would be tantamount to locking those doors for all but the most affluent. Moreover, it would place an unbearable burden upon our already thinly stretched district judges. That is simply unacceptable.

Justice Lee’s appeal to linguistics research would not be so disquieting if he were not conducting the research himself, but merely citing findings that have been advanced by the parties, published in a scholarly journal, or authored by a respected source. Such sources include the reports of expert witnesses, published academic articles, and widely available dictionaries. In those cases, findings are subject to review by the relevant field of study or the opposing party’s expert before we rely upon them. But if we conduct our own research, the parties are bound by our decision even if our methods or findings are subsequently found to be flawed. Accordingly, it is unfair, and indeed unwise, for us to decide a case on the basis of scientific research that is subject to neither prior review by the relevant field of study or adversarial briefing.

Justice Lee: We judges are experts on one thing—interpreting the law. And the fact that that enterprise may implicate disciplines or fields of study on which we lack expertise is no reason to raise the white flag. It is reason to summon all our faculties as best we can, and to overcome any weaknesses we may possess. This is not a matter of dreaming up “interesting research projects.” It is a matter of doing our job—of doing all we can to understand and implement the will of the legislature as expressed in the terms of its statutes, and to convey our grounds for doing so in a written opinion.

That job isn’t always easy. It involves not just linguistic analysis but also historical inquiry—e.g., in finding original meaning. Few of us have training in historical research. It may even be said that lawyers and judges “are for the most part extremely bad historians,” and may “make up an imaginary history and use curiously unhistorical methods.” Yet judges of all stripes engage in historical analysis, particularly in their interpretation of the constitution. So the response to our lack of historical training is not to back away from the enterprise; it is to arm ourselves with the tools necessary to do the best history we can.

We face a parallel problem when it comes to our analysis of the meaning of language. When it comes to training or experience in methods of linguistic analysis, most of us lack specialized training. So there is certainly a risk, to paraphrase Max Radin, of judges using curiously unscientific linguistic methods.

But the proper response to this risk is not the abandonment of the enterprise of linguistic analysis. That enterprise is an integral element of judging. Judges cannot do their job without assessing the ordinary meaning of words. So the question is not whether to engage in linguistic analysis; it is whether to do so with the aid of—instead of in open ignorance of or rebellion to— modern tools developed to facilitate that analysis.

We could continue to judge the ordinary meaning of words based on intuition, aided by the dictionary. But those tools are problematic, for reasons explained above. And the impact of a judge’s mere gut intuition is entirely opaque. So it is our current methodology and tools that involve bad linguistics produced by unscientific methods. If the concern is reliability, the proper response is to embrace—and not abandon—corpus-based analysis.

To do so well, we judges must seek to understand this field better. We are not experts. At least I am not; I do not possess a complete understanding of the methodologies at our disposal. But I am convinced that the approach employed above is essential to a more reliable, transparent fulfillment of this judicial task.

Corpus analysis, in all events, is not rocket science. At some level we all do it intuitively in our minds. It’s a small leap to check our intuition against examples of real-world language revealed by a Google- or COCA-based search of a body of written English. We don’t need much expertise to do that well.

Corpus analysis is like math—at one level it’s something basic that everyone does; at another level it’s something complicated that only "experts" can do. The type of corpus analysis I am doing—and advocating—is the former. I just think we should be using a calculator instead of doing it in our heads.

As to historical analysis, Justice Antonin Scalia and his co-author Bryan Garner have aptly repudiated the charge that “‘no one can reconstruct original understanding precisely’” with a powerful reminder of the judicial task: “Our charge is to try.” The same can be said of linguistic analysis. It may not be possible to resolve questions of ordinary meaning with absolute certainty. But we must try. And in so doing we must bring to bear the methods and tools developed in the 21st Century to better understand the meaning of language for this crucial element of judging.

Meanwhile, Chief Justice Durrant had words of praise for Justice Lee:

While I too would decline to use that method in this case, I applaud Justice Lee for his thoughtful exploration of corpus linguistics as a potential additional tool for our statutory interpretation tool box. I am open to the possibility that in certain cases it may well prove useful in our assessment of the ordinary meaning of statutory terms. But I do not consider it necessary in this case, because I find the majority’s analysis of the meaning of the term “discharge,” an analysis conducted using our long-established methods of statutory construction, to be compelling. And I would further note that although the dictionary alternatively defines "discharge" as "to shoot" or "to unload," the former definition applies specifically to firearms while the latter seems to encompass the more general concept of unloading cargo or emptying a container. Granted, the cartridge of a firearm is in some sense a "container" that can be "unloaded," but in the context of a statute that involves firearms, I am confident the legislature intended the term "discharge" to mean "to shoot," not "to unload." So unlike Justice Lee, I do not believe this is a case where the dictionary "fails to dictate the meaning that the statutory terms must bear."

Further, I am concerned about our use of corpus linguistics in a case where it has not been argued by the parties. Now, it is certainly true, that we may, and often have, employed dictionaries, canons of construction, or other tools for statutory interpretation that have not been argued by the parties. But primary linguistics research seems to me a step removed from these traditional tools. I fear that a sua sponte venture into such territory may be fraught with the potential for error. I think this is illustrated by Justice Lee’s critique of the flaws in Judge Posner’s sua sponte attempt at such research. The fact that a jurist of Judge Posner’s intellect and stature is capable of such missteps suggests to me that, even in those cases where corpus linguistics may be a useful addition to our traditional methods of statutory construction, it would be best employed by us, or by other judges, only after the parties have raised it and argued it. This would allow the respective sides in the dispute to challenge each other’s database, methodologies, and conclusions.

Finally, in our exploration of whether or when corpus linguistics should play a role in our interpretation of statutes, it is important that we weigh the potential usefulness of the approach against its potential cost. In some cases, the linguistic analysis may be sufficiently complicated that an expert would be required. This would be an unfortunate cost to add to the already-too-expensive litigation process. But I think that in other cases the lawyers could themselves conduct the linguistic analysis, outline their methods and conclusions in their briefs, and present argument as to why we should adopt their approach over their opponents’. Because so few judges and, perhaps, even fewer members of the bar are familiar with corpus linguistics, I believe it is simply too soon to know whether the benefits of using this new tool warrant the increased expense.

Finally, I wish to make clear that while I would not employ Justice Lee’s corpus linguistics analysis in this case for the reasons I have specified, and because of other concerns I have not discussed here nor resolved for myself, I look forward to our continued debate in this area. Should we elect to embark down this path, we should tread slowly and cautiously. And caution dictates that this potential method of statutory interpretation be fully tested in the crucible of the adversarial process, rather than simply applied sua sponte by our court.

This is a fascinating and important set of opinions. I favor Justice Lee's approach, and I think it will not be long until we see it applied more broadly.

