November 21, 2018
Corpus Linguistics in the US Supreme Court (again)
Posted by Gordon Smith

Corpus linguistics made its debut in federal court in February 2018 when Judge Dabney L. Friedrich cited COHA in an opinion to demonstrate that a relevant statutory term was a term of art at the time the statute was passed: "[T]he New York Times database in Lexis/Nexis, and the US News database in Westlaw ... contain virtually no record of 1934-era language usage, but a more robust database [COHA] indicates that the phrase rural district was used with some frequency in the first half of the twentieth century before mostly falling out of usage in the second half. This suggests that even if rural district does not carry meaning distinct from its individual words today, it did in 1934." American Bankers Association v. National Credit Union Administration, 306 F.Supp.3d 44 (D.D.C. 2018).

A few months later, James Heilpern -- a Law and Corpus Linguistics Fellow at BYU Law School -- filed an amicus brief in Lucia v. SEC. The brief was signed by 15 other corpus linguists. Although the Court did not cite the brief, it did embrace the reasoning (see here) and cited Jenn Mascott's Stanford Law Review article, which used a BYU Law Corpus. The following day, in Carpenter v. United States, Justice Thomas cited the Corpus of Historical American English (COHA) and the Corpus of Founding Era American English (COFEA), both created at BYU.

Carpenter contains the first citation to corpus linguistics in the US Supreme Court, but I have a feeling we will see another appearance soon, perhaps in Rimini Street, Inc. v. Oracle USA, Inc., a case involving the interpretation of "full costs" under the Copyright Act. Here is the amicus brief, again from James Heilpern, who is again accompanied by other corpus linguists, arguing:

the linguistic evidence shows that the "full" in Section 505 should be considered a "delexicalized" adjective, that is, an adjective whose purpose is to draw attention to and underline an attribute already fundamental to the nature of the noun that is already embedded in the meaning of the noun. "Full" often serves to emphasize the completeness of an object that is already presumed to be complete, like "full deck of cards," "full set of teeth," and "full costs."

Read the whole brief. This is really good stuff.

Permalink | Corpus Linguistics | Comments (0) | Bookmark

June 28, 2016
Michigan Supreme Court Embraces Corpus Linguistics
Posted by Gordon Smith

In the case of People v. Harris, the Michigan Supreme Court became the first state supreme court in the United States to embrace corpus linguistics. (I have written here about Justice Thomas Lee's concurrence in the Utah Supreme Court's Rasabout case, which is cited in this Michigan opinion.) The consolidated cases relate to the "Disclosures by Law Enforcement Officers Act" (DLEOA), which bars the use in a subsequent criminal proceeding of all "information" provided by a law enforcement officer under threat of any employment sanction. While the act does not distinguish between true and false statements, the court used corpus analysis to investigate whether "information" must be true. The majority concludes, "false or inaccurate information cannot be used against a law enforcement officer in subsequent criminal proceedings. To hold otherwise would defeat the Legislature's stated intent...."

Three police officers in Detroit were involved in the assault of Dajuan Hodges-Lamar. In a Garrity hearing, all three officers lied about the incident. As the majority noted, "A video recording of the incident [that] surfaced after defendants had made their statements ... is wholly at odds with the statements provided by defendants." The officers were subsequently charged with various crimes, including obstruction of justice, but they moved to dismiss these charges on the ground that the only evidence against them was their false statements, which were excludable under DLEOA. The district court excluded the statements, and a divided court of appeals reversed. The Michigan Supreme Court in turn reversed the court of appeals, holding that DLEOA prohibits the use of all officer statements, whether true or false.

In reaching this conclusion, the majority opinion by Justice Brian K. Zahra relied on corpus linguistics:

Keeping in mind that we must interpret the word "information" as used in the DLEOA "according to the common and approved usage of the language," we apply a tool that can aid in the discovery of "how particular words or phrases are actually used in written or spoken English." The Corpus of Contemporary American English (COCA) allows users to "analyze[] ordinary meaning through a method that is quantifiable and verifiable." [Citing Stephen Mouritsen, Hard Cases and Hard Data: Assessing Corpus Linguistics as an Empirical Path to Plain Meaning, 13 Colum Sci & Tech L Rev 156, 202 (2012).]

The dissent claims that, in ordinary usage, "we should not think of someone who provided inaccurate statements as having imparted `knowledge' or `information' ...." Empirical data from the COCA, however, demonstrates the opposite. In common usage, "information" is regularly used in conjunction with adjectives suggesting it may be both true and false.* This strongly suggests that the unmodified word "information," can describe either true or false statements. Moreover, by reading each identified use of the word "information" in its surrounding context, it is clear that "information" is often used to describe false statements. Quite simply, "information" in common parlance describes perceptions conveyed about the world around us, which may be true or false.

*The footnote states:

In conducting a COCA search, the word "accurate" is the most common adjective collocated with "information" to bear a meaning that refers to truth or falsity. The words "false" and "inaccurate" are also commonly collocated with "information."

OK, that's interesting enough, but it gets much more interesting when Justice Stephen Markman, in dissent, engages the majority's corpus analysis:

The majority relies on the Corpus of Contemporary American English (COCA), a truly remarkable and comprehensive source of ordinary English language usage compiled by linguistic scholars at Brigham Young University, in particular Professor Mark Davies. The COCA, available at (accessed June 7, 2016), is an online "resource [that can be used by courts] for assessing the ordinary meaning of a statutory term." State v Rasabout, 2015 Utah 72, ¶ 72; 356 P3d 1258 (2015) (Lee, A.C.J., concurring in part) (assessing with an impressive thoroughness, in ¶¶ 40-134, the strengths and limitations of using a corpus to facilitate the interpretive processes of the judiciary). By using the COCA, "we can access large bodies of real-world language to see how particular words or phrases are actually used in written or spoken English." Id. at ¶ 57. However, notwithstanding the majority's invocation of the COCA, I believe that the COCA actually supports the proposition set forth in this dissent that the common and most reasonable understanding of the term "information" excludes false statements.

The term "information" is found within the COCA 168,187 times and yet it is only modified by the term "truthful" 28 times, "true" 18 times, "accurate" 508 times, "inaccurate" 112 times, and "false" 271 times. In other words, the term "information" is modified by one of these adjectives 937 times. The other 167,250 times that the word "information" is used it is unmodified by one of these adjectives. That is, 99.44% of the time "information" in the COCA is unmodified by any of these adjectives related to veracity. Therefore, I disagree with the majority's contention that the COCA affords support for the proposition that the term "information" is "regularly" or "commonly" modified by one of these adjectives. I find to the contrary. And where "information" is unmodified by one of these adjectives, I believe it is overwhelmingly used to refer to truthful information.... 

I do not believe that a judicial interpretation of "information" drawn from use of the term in ½ of 1% of all of its appearances in a corpus constitutes an ordinary, common, or reasonable interpretation of the term. There is no word that cannot be abused, misused, or employed in an exotic or puzzling way in everyday discourse, and a corpus will reflect this reality; it is not, however, the purpose of a corpus to transform every such use of a word into a reasonable construction of the words of the law....

Furthermore, the reader may wish to peruse at random any number of the 167,250 uses of "information" in the COCA and assess whether the term was reasonably used and understood as indistinguishably referring to true and false information. When, for example, the doctor is offered "information" from a patient concerning the latter's condition, would either party suppose that the latter was not intending in a reasonably accurate manner to describe his symptoms as he then believed them to be? Or, by further example, when a "contract" or "trade-off" of some kind is delineated by the elected representatives of the people in the Legislature, with an explicit quid pro quo defined in terms of the production of "information," and presumably with some measure of public benefit to be derived by the production of that "information," could that Legislature genuinely have been disinterested in whether such information was true or false?
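The dissent's arithmetic is easy to check. Here is a minimal Python sketch that uses only the counts quoted above; the variable names and the truth-side/falsity-side grouping are my own labels, not anything taken from the opinion or from COCA.

```python
# Counts of adjective + "information" quoted in Justice Markman's dissent.
total_uses = 168_187
modified_by = {
    "truthful": 28,
    "true": 18,
    "accurate": 508,
    "inaccurate": 112,
    "false": 271,
}

# Uses modified by any of the five veracity adjectives, and the remainder.
modified = sum(modified_by.values())
unmodified = total_uses - modified
unmodified_share = unmodified / total_uses

# My own grouping: adjectives asserting truth vs. adjectives asserting falsity.
truth_side = sum(modified_by[w] for w in ("truthful", "true", "accurate"))
falsity_side = sum(modified_by[w] for w in ("inaccurate", "false"))

print(f"modified: {modified}")
print(f"unmodified: {unmodified} ({unmodified_share:.2%})")
print(f"truth-side: {truth_side}, falsity-side: {falsity_side}")
```

The totals match the dissent's figures: 937 modified uses, 167,250 unmodified uses, and an unmodified share of about 99.44%. What the counts cannot tell us is how the unmodified uses should be read, which is exactly where the two opinions part ways.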

Here we see judges struggling with the rules of application under corpus analysis, an issue that was a central topic of conversation in the recent Conference on Corpus Linguistics held at BYU. We clearly have a long way to go before corpus analysis is regularized in judicial proceedings, but this case represents a nice step forward, where both majority and dissenting justices agree that corpus linguistics could inform their interpretation of the statute.

I am working with the BYU Law Review to organize a symposium on corpus linguistics to be held next winter semester. If you would like more information about the symposium, please do not hesitate to contact me.


August 14, 2015
Corpus Linguistics in the Courts (Again)
Posted by Gordon Smith

Here are the basic facts of State v. Rasabout from the majority opinion of the Utah Supreme Court, which was issued today: 

Andy Rasabout is a member of the street gang known as the Tiny Oriental Posse. On November 1, 2007, Mr. Rasabout, riding shotgun in a Honda Civic, fired twelve shots from a Glock 9 mm semiautomatic pistol at a house and a car parked in front. Lee Tran, whom Mr. Rasabout knew to be a rival in the Original Laotian Gangsters, owned the car and was inside the house at the time. But Mr. Tran was not the only person in danger. Two young girls and their mother were asleep upstairs. Several others were playing cards in the basement. And one man was standing in the carport, enjoying the crisp morning air and a cigarette.

Rasabout was convicted of 12 counts of "unlawful discharge of a firearm." But is each shot a separate "discharge"? Or should the 12 shots together be considered one "discharge"? For more insight on those questions, read below the fold.

This is a matter of statutory construction, but the relevant statute, Utah Code section 76-10-508, does not address these questions expressly. The relevant focus of the Court's inquiry is legislative intent, but what did the Utah Legislature intend when it criminalized the "discharge of any kind of dangerous weapon or firearm ... from an automobile ... within 600 feet of ... a house, dwelling, or any other building"?

The Utah Supreme Court held "each discrete shot" is one "discharge." The majority opinion came to that conclusion via conventional textual analysis, including consideration of the dictionary definition of "discharge." According to the Court, "Under these dictionary entries, the clearest reading of the statute is that discharging a weapon or firearm means shooting a weapon or firearm." 

The majority opinion also relied on a close reading of the statute, which refers to a firearm as "any device . . . from which is expelled a projectile by action of an explosive." And, of course, there is the policy argument: "it was reasonable for the Legislature to criminalize each shot fired because each shot carries an independent harm."

None of this seems dispositive, but it's pretty standard analysis from a court.

Associate Chief Justice Tom Lee agreed with all of the majority's conclusions, but was uncomfortable resolving the statutory ambiguity by reference to the dictionary. You should read his reasons in full, but the gist of the problem is that the dictionary defines "discharge" as either "to shoot" or "to unload," and it does not tell us which meaning is the better one in this context. To resolve this problem, Justice Lee turns to corpus linguistics:

In this age of information, we have ready access to means for testing our resolution of linguistic ambiguity. Instead of just relying on the limited capacities of the dictionary or our memory, we can access large bodies of real-world language to see how particular words or phrases are actually used in written or spoken English. Linguists have a name for this kind of analysis; it is known as corpus linguistics.

The fancy Latin name makes this enterprise seem esoteric and daunting. It is not. We all engage in it even if we don’t attach the technical label to it. A corpus is a body, and corpus linguistics analysis is no more than a study of language employing a body of language. When we communicate using words we naturally access a large corpus—the body of language we have been exposed to during our lifetimes—to decode the groups of letters or sounds we encounter. The most basic corpus linguistics analysis involves our split-second effort to access the body of language in our heads in our ongoing attempt to decode words or phrases we may be uncertain of. We all do that repeatedly every day.


It is a small step to utilize a tool to aid our linguistic memory. Judges do this with some frequency as well. Naturally. If judges are entitled to consult the corpus of language in our heads (and how could we not?), we must also be permitted to supplement and check our memory against publicly available sources of language.

Yes, yes, yes!

And the result in this case? A somewhat primitive search of Google News and a more sophisticated analysis using the Corpus of Contemporary American English both confirm that "discharge" of a firearm "is almost always used in the sense of a single shot."

Justice Lee has authored other opinions using corpus linguistics (most notably here), but this latest contribution will stand as a landmark defense of the method, mainly because of the many excellent challenges raised by Justice Parrish in the opinion of the Court. In the following paragraphs, I interlace two of Justice Parrish's challenges with Justice Lee's responses:

Justice Parrish: Justice Lee argues that we should decide this case against Mr. Rasabout on the basis of the corpus linguistics research he has conducted sua sponte. But because his rationale is so different in kind from any argument made by the parties, Mr. Rasabout has never had a reasonable opportunity to present a different perspective. This violates the very notion of our adversary system, which "assures fairness by exempting a party from the inequity of [losing] on appeal on a ground that [he] had no opportunity to address." "[W]e should not dilute [the protections of our adversary system] by stretching their standards to justify our consideration of [an argument] we find interesting or important." Moreover, deciding this case on the basis of an argument not subjected to adversarial briefing is a recipe for making bad law.

Justice Lee: Independent investigation is foreclosed only as to "facts," not law. That is significant. Judicial analysis of the meaning of language, using corpus analysis or anything else, is aimed at interpreting the law. That is the judge’s role. In performing the core function of deciding what the law is or should be, we cannot properly be restricted from consulting sources that inform our understanding.

... To the extent the [corpus] analysis ... encompasses “facts” (e.g., as to the way the verb discharge is typically used in our language in connection with the noun firearm or its synonyms), moreover, our law makes clear that we are free to consider them as a matter of judicial notice. Otherwise my colleagues’ opinions also cross the line, as they also consider evidence not presented in the briefs regarding the ordinary meaning of discharge.

... A contrary conclusion would call into question a wide range of opinions of this court and many others. If we were foreclosed from considering outside material that informs our resolution of open questions of law, we would be barred from engaging in historical analysis relevant to a question of original meaning of a provision of the constitution, or from considering social science literature in resolving a difficult question under the common law. Linguistic analysis is no different; to the extent we charge our judges with resolving ambiguities in language, we cannot (and do not) reasonably restrict their ability to do so on a well-informed basis—even on grounds not presented by the parties, and not within the domain of judges’ professional training. Such a restriction, after all, would not just foreclose corpus analysis; it would also prevent us from consulting a dictionary or our own experience with language.

Justice Parrish: Additionally, it would be entirely inappropriate for this court to conduct the independent scientific research that serves as the basis for Justice Lee’s approach. Justice Lee admits that he is not an expert in this field and does not completely understand its methodologies, but asserts that we must "try." Linguistics is a scientific field of study that uses empirical research to draw findings. And just as with other fields of scientific study, simply trying harder will not lead us to a better answer. The knowledge and expertise required to conduct scientific research are "usually not within the common knowledge" of judges, so "testimony from relevant experts is generally required in order to ensure that [judges] have adequate knowledge upon which to base their decisions." We regularly refuse to conduct legal research, a field in which we are experts. We should similarly refuse our inclination to contrive of interesting research projects that require expertise in fields in which we have no training.

Moreover, as Justice Lee points out, "[m]ost judges are generalists." Indeed, we are aware of almost no one sitting on the bench or practicing law in this state who has the kind of scientific expertise required to reliably conduct the research Justice Lee requires. And issues of statutory construction where "both sides are able to marshal dictionary definitions in support of their view" permeate the majority of cases in this state. But in every such case, Justice Lee would require ad hoc linguistics research that could only be reliably conducted by dueling linguistics experts. Imposing such a significant financial burden on so many of the litigants coming through the doors of our courts would be tantamount to locking those doors for all but the most affluent. Moreover, it would place an unbearable burden upon our already thinly stretched district judges. That is simply unacceptable.

Justice Lee’s appeal to linguistics research would not be so disquieting if he were not conducting the research himself, but merely citing findings that have been advanced by the parties, published in a scholarly journal, or authored by a respected source. Such sources include the reports of expert witnesses, published academic articles, and widely available dictionaries. In those cases, findings are subject to review by the relevant field of study or the opposing party’s expert before we rely upon them. But if we conduct our own research, the parties are bound by our decision even if our methods or findings are subsequently found to be flawed. Accordingly, it is unfair, and indeed unwise, for us to decide a case on the basis of scientific research that is subject to neither prior review by the relevant field of study or adversarial briefing.

Justice Lee: We judges are experts on one thing—interpreting the law. And the fact that that enterprise may implicate disciplines or fields of study on which we lack expertise is no reason to raise the white flag. It is reason to summon all our faculties as best we can, and to overcome any weaknesses we may possess. This is not a matter of dreaming up “interesting research projects.” It is a matter of doing our job—of doing all we can to understand and implement the will of the legislature as expressed in the terms of its statutes, and to convey our grounds for doing so in a written opinion.

That job isn’t always easy. It involves not just linguistic analysis but also historical inquiry—e.g., in finding original meaning. Few of us have training in historical research. It may even be said that lawyers and judges “are for the most part extremely bad historians,” and may “make up an imaginary history and use curiously unhistorical methods.” Yet judges of all stripes engage in historical analysis, particularly in their interpretation of the constitution. So the response to our lack of historical training is not to back away from the enterprise; it is to arm ourselves with the tools necessary to do the best history we can.

We face a parallel problem when it comes to our analysis of the meaning of language. When it comes to training or experience in methods of linguistic analysis, most of us lack specialized training. So there is certainly a risk, to paraphrase Max Radin, of judges using curiously unscientific linguistic methods.

But the proper response to this risk is not the abandonment of the enterprise of linguistic analysis. That enterprise is an integral element of judging. Judges cannot do their job without assessing the ordinary meaning of words. So the question is not whether to engage in linguistic analysis; it is whether to do so with the aid of—instead of in open ignorance of or rebellion to— modern tools developed to facilitate that analysis.

We could continue to judge the ordinary meaning of words based on intuition, aided by the dictionary. But those tools are problematic, for reasons explained above. And the impact of a judge’s mere gut intuition is entirely opaque. So it is our current methodology and tools that involve bad linguistics produced by unscientific methods. If the concern is reliability, the proper response is to embrace—and not abandon—corpus-based analysis.

To do so well, we judges must seek to understand this field better. We are not experts. At least I am not; I do not possess a complete understanding of the methodologies at our disposal. But I am convinced that the approach employed above is essential to a more reliable, transparent fulfillment of this judicial task.

Corpus analysis, in all events, is not rocket science. At some level we all do it intuitively in our minds. It’s a small leap to check our intuition against examples of real-world language revealed by a Google- or COCA-based search of a body of written English. We don’t need much expertise to do that well.

Corpus analysis is like math—at one level it’s something basic that everyone does; at another level it’s something complicated that only "experts" can do. The type of corpus analysis I am doing—and advocating—is the former. I just think we should be using a calculator instead of doing it in our heads.

As to historical analysis, Justice Antonin Scalia and his co-author Bryan Garner have aptly repudiated the charge that “‘no one can reconstruct original understanding precisely’” with a powerful reminder of the judicial task: “Our charge is to try.” The same can be said of linguistic analysis. It may not be possible to resolve questions of ordinary meaning with absolute certainty. But we must try. And in so doing we must bring to bear the methods and tools developed in the 21st Century to better understand the meaning of language for this crucial element of judging.

Meanwhile, Chief Justice Durrant had words of praise for Justice Lee:

While I too would decline to use that method in this case, I applaud Justice Lee for his thoughtful exploration of corpus linguistics as a potential additional tool for our statutory interpretation tool box. I am open to the possibility that in certain cases it may well prove useful in our assessment of the ordinary meaning of statutory terms. But I do not consider it necessary in this case, because I find the majority’s analysis of the meaning of the term “discharge,” an analysis conducted using our long-established methods of statutory construction, to be compelling. And I would further note that although the dictionary alternatively defines "discharge" as "to shoot" or "to unload," the former definition applies specifically to firearms while the latter seems to encompass the more general concept of unloading cargo or emptying a container. Granted, the cartridge of a firearm is in some sense a "container" that can be "unloaded," but in the context of a statute that involves firearms, I am confident the legislature intended the term "discharge" to mean "to shoot," not "to unload." So unlike Justice Lee, I do not believe this is a case where the dictionary "fails to dictate the meaning that the statutory terms must bear."

Further, I am concerned about our use of corpus linguistics in a case where it has not been argued by the parties. Now, it is certainly true, that we may, and often have, employed dictionaries, canons of construction, or other tools for statutory interpretation that have not been argued by the parties. But primary linguistics research seems to me a step removed from these traditional tools. I fear that a sua sponte venture into such territory may be fraught with the potential for error. I think this is illustrated by Justice Lee’s critique of the flaws in Judge Posner’s sua sponte attempt at such research. The fact that a jurist of Judge Posner’s intellect and stature is capable of such missteps suggests to me that, even in those cases where corpus linguistics may be a useful addition to our traditional methods of statutory construction, it would be best employed by us, or by other judges, only after the parties have raised it and argued it. This would allow the respective sides in the dispute to challenge each other’s database, methodologies, and conclusions.

Finally, in our exploration of whether or when corpus linguistics should play a role in our interpretation of statutes, it is important that we weigh the potential usefulness of the approach against its potential cost. In some cases, the linguistic analysis may be sufficiently complicated that an expert would be required. This would be an unfortunate cost to add to the already-too-expensive litigation process. But I think that in other cases the lawyers could themselves conduct the linguistic analysis, outline their methods and conclusions in their briefs, and present argument as to why we should adopt their approach over their opponents’. Because so few judges and, perhaps, even fewer members of the bar are familiar with corpus linguistics, I believe it is simply too soon to know whether the benefits of using this new tool warrant the increased expense.

Finally, I wish to make clear that while I would not employ Justice Lee’s corpus linguistics analysis in this case for the reasons I have specified, and because of other concerns I have not discussed here nor resolved for myself, I look forward to our continued debate in this area. Should we elect to embark down this path, we should tread slowly and cautiously. And caution dictates that this potential method of statutory interpretation be fully tested in the crucible of the adversarial process, rather than simply applied sua sponte by our court.

This is a fascinating and important set of opinions. I favor Justice Lee's approach, and I think it will not be long until we see it applied more broadly.


June 05, 2014
Algorithm as Director
Posted by Gordon Smith

This looks like an April Fool's story, but the date is off:

A Hong Kong VC fund has just appointed an algorithm to its board.

Deep Knowledge Ventures, a firm that focuses on age-related disease drugs and regenerative medicine projects, says the program, called VITAL, can make investment recommendations about life sciences firms by poring over large amounts of data.

Just like other members of the board, the algorithm gets to vote on whether the firm makes an investment in a specific company or not. The program will be the sixth member of DKV's board.

I know what you are thinking ... could we see algorithms as board members in the US?

The Delaware General Corporation Law anticipates the question of non-human board members. Section 141(b) of the DGCL provides: "The board of directors of a corporation shall consist of 1 or more members, each of whom shall be a natural person."

The Model Business Corporation Act, on the other hand, provides: "A board of directors must consist of one or more individuals ..." As far as I know, "individual" is not a defined term in the MBCA or by any court interpreting this provision, but I assume it would be interpreted to mean "human being." That could be a nice question for a corpus linguist, if it came to that. Note that the statute does not provide that every director must be an individual.


March 02, 2012
Judge Posner and Corpus Linguistics
Posted by Gordon Smith

In a recent opinion, Judge Posner wanted to know more about the meaning of "harboring," and he didn't find what he wanted in a dictionary. "Dictionary definitions are acontextual," he wrote, "whereas the meaning of sentences depends critically on context, including all sorts of background understandings."

I wish he had cited The Best Student Comment Ever, aka The Dictionary Is Not a Fortress: Definitional Fallacies and a Corpus-Based Approach to Plain Meaning. If he had, he would have discovered corpus linguistics, and that might have led him to the Corpus of Contemporary American English (COCA).

As it was, he simply turned to Google. While it's true that Google is a corpus -- check out the Web as Corpus Community -- it's not a very good way to do what Posner was trying to do, which was to find the ordinary meaning of "harboring." 

Judge Posner should have just called Utah Supreme Court Justice Tom Lee for some tips. Justice Lee has used corpus linguistics in two opinions (see here for the first one), with the help of Stephen Mouritsen (author of the aforementioned comment).

Have I mentioned that I think corpus linguistics is a big deal that is going to transform legal scholarship? If not, I just wanted to get that on the record because I have been telling everyone I know.


February 14, 2012
Fun with Word Frequencies
Posted by Gordon Smith

We are talking about word frequencies today in Corpus Linguistics, and as my in-class experiment, I created a Wordle cloud from my most recent article, Private Ordering with Shareholder Bylaws:

Wordle: Private Ordering with Shareholder Bylaws

If you don't know what the article is about, you can read the abstract, but reading the word cloud gives you a pretty good idea at a glance. Could word frequencies help us find the scholarship that most interests us?
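For anyone who wants to try this without Wordle, word frequencies are simple to compute. Here is a minimal sketch in Python, assuming you have a plain-text copy of an article; the sample text and the tiny stopword list below are placeholders I made up for illustration, not the article itself.

```python
import re
from collections import Counter

# Hypothetical stand-in text; swap in the full text of any article.
sample_text = (
    "Shareholders may adopt bylaws. Bylaws constrain the board, "
    "and the board may amend bylaws unless shareholders object."
)

# A deliberately tiny, made-up stopword list; real lists are much longer.
STOPWORDS = {"the", "and", "may", "unless", "a", "an", "of", "to"}


def word_frequencies(text: str) -> Counter:
    """Count lowercase word tokens, skipping stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


freqs = word_frequencies(sample_text)

# Show the three most frequent remaining words, as a word cloud would.
for word, count in freqs.most_common(3):
    print(word, count)
```

A word cloud is essentially this table drawn with font size proportional to the count. Raw counts favor long documents and common vocabulary, so any serious attempt to match scholarship to readers would presumably need some normalization on top of this.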


January 12, 2012
Posted by Gordon Smith

If you are fascinated by words, you will enjoy Wordnik. Recently featured in the NYT, this site is fun and educational. Just try it.


January 09, 2012
Data and Intuition
Posted by Gordon Smith

Tomorrow is my first class in Corpus Linguistics, the data-driven study of language. I am lucky that Mark Davies teaches at BYU, and grateful that he has agreed to let me sit in on his course. I have blogged about corpus linguistics here, here, and here, but, until now, I have had no formal training in the field.

Today I was reading tomorrow's assignment out of Susan Hunston's Corpora in Applied Linguistics, and I found this sentence in chapter 1:

The main argument in favour of using a corpus is that it is a more reliable guide to language use than native speaker intuition is.

Just two pages later, however, Hunston offers this on the role of intuition:

[Intuition] is an essential tool for extrapolating important generalizations from a mass of specific information in a corpus.

Data + intuition. That's quite a nice description of empirical study generally, and it makes sense that corpus linguistics would follow the pattern.

In addition to reading the first chapter of Hunston, we were assigned to read the last chapter (no cliffhangers in this course), which tells us that corpus linguistics can make life simpler ... and more complex. The source of complexity?

New ideas about language emerge and the old ones need re-evaluation.

This is going to be fun.


July 20, 2011
More on Corpus Linguistics
Posted by Gordon Smith

I have blogged twice about corpus linguistics (here and here), and both posts have generated substantial interest, so in an effort to give additional credit where credit is due -- and to facilitate exploration of the application of corpus linguistics to law -- I am linking to Neal Goldfarb's briefs, including the AT&T brief that attracted so much attention recently. Goldfarb is not trained as a linguist, but he has elevated the profile of linguistics through his work as a practicing lawyer. You can read about him here. His blog is called LAWnLinguistics.

For other pioneers in this area, readers may be interested in Clark D. Cunningham, Judith N. Levi, Georgia M. Green & Jeffrey P. Kaplan, Plain Meaning and Hard Cases, 103 Yale L.J. 1561 (1994) and Clark D. Cunningham & Charles J. Fillmore, Using Common Sense: A Linguistic Perspective on Judicial Interpretations of “Use a Firearm,” 73 Wash. U. L.Q. 1159 (1995).

By the way, Mark Liberman, Director of the Linguistic Data Consortium at the University of Pennsylvania, is astonished that it has taken lawyers so long to get around to linguistics. Yes, I agree. I am astonished, too.


July 19, 2011
A Landmark Opinion: Corpus Linguistics in the Courts
Posted by Gordon Smith

Last month I blogged about the "best student comment ever," the first law review article to rely on corpus linguistics as the basis for analysis. [See below for update.] As I have worked with corpus linguistics (through the comment's author, Stephen Mouritsen) over the past few months, I have come to conclude that it will revolutionize the study of law, at least insofar as we are attempting to understand word usages.

Today, my former colleague and current Utah Supreme Court Justice Tom Lee used corpus linguistics in a lengthy concurring opinion (the relevant section starts at page 34). In this opinion, Justice Lee is interpreting the word "custody," and he brings corpus linguistics to the fight. Of course, it's no accident that Stephen Mouritsen is Justice Lee's law clerk, but the bigger point here is that Justice Lee was persuaded -- as I am -- of the value of corpus linguistics to shed light on this interpretive question. Justice Lee's colleagues are not enamored of the approach, but you can read the opinions for yourself and see who gets the better of the argument.

This seems to be the first judicial opinion anywhere using corpus linguistics, but it will surely not be the last. If you are as intrigued by corpus linguistics as I am, you might be interested in this paper by Mark Davies, a BYU Professor of Corpus Linguistics who is a leader in this field, on how one might use the Corpus of Contemporary American English. I am told that a similar paper on the Corpus of Historical American English is forthcoming.

UPDATE: As noted by Neal Goldfarb, the first law review article to use a linguistic corpus was written by Charles Fillmore and Clark Cunningham, Using Common Sense: A Linguistic Perspective on Judicial Interpretations of 'Use a Firearm', 73 Wash. U. L.Q. 1159 (1995). Indeed, Mouritsen cites the article in his comment.

Mouritsen’s comment differs from the Fillmore and Cunningham article both in its method and its claim. Fillmore and Cunningham use corpus linguistics to examine the word "use" in an attempt to understand what it might mean to "use a firearm." They use the British National Corpus to examine the range of possible meanings of that statutory term in much the same way that a lexicographer might rely on a citation file to find usage examples.

Rather than explore the range of possible uses of a statutory term, Mouritsen relies exclusively on corpus-based data to attempt to demonstrate the “ordinary meaning” of a statutory term in a particular context. His article is the first to do this. Thanks to Neal for raising the issue, causing me to make a more precise statement about the contribution of the Mouritsen piece.


June 13, 2011
The Best Student Comment Ever
Posted by Gordon Smith

The best student comment I have ever read was published by the BYU Law Review last year. The Dictionary Is Not a Fortress: Definitional Fallacies and a Corpus-Based Approach to Plain Meaning was written by Stephen Mouritsen, who is now clerking with Justice Tom Lee on the Utah Supreme Court and will start at Cravath this fall. Stephen's comment was cited by the NYT today, though that citation did not do the piece justice. By the way, Stephen and I are writing an article this summer using corpus linguistics. Very cool stuff.

