June 28, 2016
Michigan Supreme Court Embraces Corpus Linguistics
Posted by Gordon Smith

In the case of People v. Harris, the Michigan Supreme Court became the first state supreme court in the United States to embrace corpus linguistics. (I have written here about Justice Thomas Lee's concurrence in the Utah Supreme Court's Rasabout case, which is cited in this Michigan opinion.) The consolidated cases relate to the "Disclosures by Law Enforcement Officers Act" (DLEOA), which bars the use in a subsequent criminal proceeding of all "information" provided by a law enforcement officer under threat of any employment sanction. While the act does not distinguish between true and false statements, the court used corpus analysis to investigate whether "information" must be true. The majority concludes, "false or inaccurate information cannot be used against a law enforcement officer in subsequent criminal proceedings. To hold otherwise would defeat the Legislature's stated intent...."

Three police officers in Detroit were involved in the assault of Dajuan Hodges-Lamar. In a Garrity hearing, all three officers lied about the incident. As the majority noted, "A video recording of the incident [that] surfaced after defendants had made their statements ... is wholly at odds with the statements provided by defendants." The officers were subsequently charged with various crimes, including obstruction of justice, but they moved to dismiss these charges on the ground that the only evidence against them was their false statements, which were excludable under DLEOA. The district court excluded the statements, and a divided court of appeals reversed. The Michigan Supreme Court in turn reversed the court of appeals, holding that DLEOA prohibits the use of all officer statements, whether true or false.

In reaching this conclusion, the majority opinion by Justice Brian K. Zahra relied on corpus linguistics:

Keeping in mind that we must interpret the word "information" as used in the DLEOA "according to the common and approved usage of the language," we apply a tool that can aid in the discovery of "how particular words or phrases are actually used in written or spoken English." The Corpus of Contemporary American English (COCA) allows users to "analyze[] ordinary meaning through a method that is quantifiable and verifiable." [Citing Stephen Mouritsen, Hard Cases and Hard Data: Assessing Corpus Linguistics as an Empirical Path to Plain Meaning, 13 Colum Sci & Tech L Rev 156, 202 (2012).]

The dissent claims that, in ordinary usage, "we should not think of someone who provided inaccurate statements as having imparted 'knowledge' or 'information' ...." Empirical data from the COCA, however, demonstrates the opposite. In common usage, "information" is regularly used in conjunction with adjectives suggesting it may be both true and false.* This strongly suggests that the unmodified word "information," can describe either true or false statements. Moreover, by reading each identified use of the word "information" in its surrounding context, it is clear that "information" is often used to describe false statements. Quite simply, "information" in common parlance describes perceptions conveyed about the world around us, which may be true or false.

*The footnote states:

In conducting a COCA search, the word "accurate" is the most common adjective collocated with "information" to bear a meaning that refers to truth or falsity. The words "false" and "inaccurate" are also commonly collocated with "information."

Ok, that's interesting enough, but it gets much more interesting when Justice Stephen Markman in dissent engages the majority's corpus analysis:

The majority relies on the Corpus of Contemporary American English (COCA), a truly remarkable and comprehensive source of ordinary English language usage compiled by linguistic scholars at Brigham Young University, in particular Professor Mark Davies. The COCA, available at (accessed June 7, 2016), is an online "resource [that can be used by courts] for assessing the ordinary meaning of a statutory term." State v Rasabout, 2015 Utah 72, ¶ 72; 356 P3d 1258 (2015) (Lee, A.C.J., concurring in part) (assessing with an impressive thoroughness, in ¶¶ 40-134, the strengths and limitations of using a corpus to facilitate the interpretive processes of the judiciary). By using the COCA, "we can access large bodies of real-world language to see how particular words or phrases are actually used in written or spoken English." Id. at ¶ 57. However, notwithstanding the majority's invocation of the COCA, I believe that the COCA actually supports the proposition set forth in this dissent that the common and most reasonable understanding of the term "information" excludes false statements.

The term "information" is found within the COCA 168,187 times and yet it is only modified by the term "truthful" 28 times, "true" 18 times, "accurate" 508 times, "inaccurate" 112 times, and "false" 271 times. In other words, the term "information" is modified by one of these adjectives 937 times. The other 167,250 times that the word "information" is used it is unmodified by one of these adjectives. That is, 99.44% of the time "information" in the COCA is unmodified by any of these adjectives related to veracity. Therefore, I disagree with the majority's contention that the COCA affords support for the proposition that the term "information" is "regularly" or "commonly" modified by one of these adjectives. I find to the contrary. And where "information" is unmodified by one of these adjectives, I believe it is overwhelmingly used to refer to truthful information.... 

I do not believe that a judicial interpretation of "information" drawn from use of the term in ½ of 1% of all of its appearances in a corpus constitutes an ordinary, common, or reasonable interpretation of the term. There is no word that cannot be abused, misused, or employed in an exotic or puzzling way in everyday discourse, and a corpus will reflect this reality; it is not, however, the purpose of a corpus to transform every such use of a word into a reasonable construction of the words of the law....

Furthermore, the reader may wish to peruse at random any number of the 167,250 uses of "information" in the COCA and assess whether the term was reasonably used and understood as indistinguishably referring to true and false information. When, for example, the doctor is offered "information" from a patient concerning the latter's condition, would either party suppose that the latter was not intending in a reasonably accurate manner to describe his symptoms as he then believed them to be? Or, by further example, when a "contract" or "trade-off" of some kind is delineated by the elected representatives of the people in the Legislature, with an explicit quid pro quo defined in terms of the production of "information," and presumably with some measure of public benefit to be derived by the production of that "information," could that Legislature genuinely have been disinterested in whether such information was true or false?
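
For readers who want to check the dissent's arithmetic, here is a minimal Python sketch. The counts are the ones Justice Markman reports above; the variable names are my own, and nothing here queries the COCA itself.

```python
# Counts of "information" in the COCA, as reported in the dissent.
# Variable names are illustrative; the figures come from the quoted opinion.
total_uses = 168_187

# Times "information" is modified by an adjective relating to veracity.
veracity_modifiers = {
    "truthful": 28,
    "true": 18,
    "accurate": 508,
    "inaccurate": 112,
    "false": 271,
}

# Total uses modified by one of the five adjectives.
modified = sum(veracity_modifiers.values())

# Uses not modified by any of the five adjectives.
unmodified = total_uses - modified

# Shares of all uses, as fractions between 0 and 1.
unmodified_share = unmodified / total_uses
modified_share = modified / total_uses

print(f"modified by a veracity adjective: {modified}")
print(f"unmodified: {unmodified}")
print(f"unmodified share: {unmodified_share:.2%}")
print(f"modified share: {modified_share:.2%}")
```

The totals match the dissent's figures of 937 modified uses, 167,250 unmodified uses, and 99.44% unmodified. The remaining share, a little over half of one percent, is the "½ of 1%" the dissent refers to. The arithmetic is not what divides the justices; the dispute is over what those counts show about ordinary meaning.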

Here we see judges struggling with the rules of application under corpus analysis, an issue that was a central topic of conversation at the recent Conference on Corpus Linguistics held at BYU. We clearly have a long way to go before corpus analysis is regularized in judicial proceedings, but this case represents a nice step forward, with both the majority and the dissenting justices agreeing that corpus linguistics can inform their interpretation of the statute.

I am working with the BYU Law Review to organize a symposium on corpus linguistics to be held next winter semester. If you would like more information about the symposium, please do not hesitate to contact me.

