The surprising forces influencing the complexity of the language we speak and write.
Nautilus Julie Sedivy
“[[When in the course of human events it becomes necessary for one people [to dissolve the political bands [which have connected them with another]] and [to assume among the powers of the earth, the separate and equal station [to which the laws of Nature and of Nature’s God entitle them]]], a decent respect to the opinions of mankind requires [that they should declare the causes [which impel them to the separation]]].”
—Declaration of Independence, opening sentence
An iconic sentence, this. But how did it ever make its way into the world? At 71 words, it is composed of eight separate clauses, each anchored by its own verb, nested within one another in various arrangements. The main clause (a decent respect to the opinions of mankind requires …) hangs suspended above a 50-word subordinate clause that must first be unfurled. Like an intricate equation, the sentence exudes a mathematical sophistication, turning its face toward infinitude.
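The figures in this description can be checked mechanically. The short Python sketch below is only an illustration: it assumes the sentence is transcribed with one balanced bracket pair per clause (eight pairs in all), and the function name `clause_stats` is made up for this example. It counts the words, the clauses, and the deepest level of nesting.

```python
import re

# The opening sentence, transcribed with one bracket pair per clause.
# Assumption: eight balanced pairs; this markup is illustrative, not a formal parse.
SENTENCE = (
    "[[When in the course of human events it becomes necessary for one people "
    "[to dissolve the political bands [which have connected them with another]] "
    "and [to assume among the powers of the earth, the separate and equal station "
    "[to which the laws of Nature and of Nature's God entitle them]]], "
    "a decent respect to the opinions of mankind requires "
    "[that they should declare the causes [which impel them to the separation]]]."
)


def clause_stats(bracketed):
    """Return (word_count, clause_count, max_depth) for a bracketed sentence."""
    depth = max_depth = clauses = 0
    for ch in bracketed:
        if ch == "[":
            # Each opening bracket starts a new clause one level deeper.
            depth += 1
            clauses += 1
            max_depth = max(max_depth, depth)
        elif ch == "]":
            depth -= 1
            if depth < 0:
                raise ValueError("closing bracket without an opening one")
    if depth != 0:
        raise ValueError("unbalanced brackets")
    # Words are runs of letters and apostrophes; brackets and punctuation are skipped.
    words = re.findall(r"[A-Za-z']+", bracketed)
    return len(words), clauses, max_depth


print(clause_stats(SENTENCE))  # (71, 8, 4)
```

Under this bracketing, the innermost relative clauses sit four levels deep, which is the kind of stacking that the rest of the essay treats as characteristic of written rather than spoken language.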
To some linguists, Noam Chomsky among them, sentences like this one illustrate an essential property of human language. These scientists have argued that recursion, a technique that allows chunks of language such as sentences to be embedded inside each other (with no hard limit on the number of nestings), is a universal human ability, perhaps even the one uniquely human ability that supports language. It’s what allows us to create, literally, an infinite variety of novel sentences out of a limited inventory of words.
But that leads to a curious puzzle: Complex sentences are not ubiquitous among the world’s languages. Many languages have little use for them. They prefer to string together simple clauses. They may even lack certain words such as relative pronouns that and which or connectors like if, despite, and although—these words make it possible to link clauses together into larger sentences. Allegedly, the Pirahã language along the Maici River of Brazil lacks recursion altogether. According to linguist Dan Everett, Pirahã speakers avoid linguistic nesting of all kinds, even in structures such as John’s brother’s house. (Instead, they would say something like: Brother’s house. John has a brother. It is the same one.)
This can’t be pinned on biological evolution. All evidence suggests that humans around the world are born with more or less the same brains. Abundant childhood exposure to a language with layered sentences practically guarantees their mastery. Even adult Pirahã speakers, who have remained unusually isolated from European languages, pick up the trick of complex syntax, provided that they spend enough time interacting with speakers of Brazilian Portuguese, a language that offers an adequate diet of embedded structures.
More useful is the notion of linguistic evolution. It’s the languages themselves, rather than the brains, that have evolved along different paths. And just as different species are shaped by adaptations to specific ecological niches, certain linguistic features—like sentence complexity—survive and thrive under some circumstances, whereas other features take hold and spread within very different niches.
Languages with very simple sentence structure are, for the most part, oral languages. It’s the languages that have a culture of writing, developed over a long span of time, that display a fondness for stacking clauses onto one another to create towering sentences. This pattern raises the possibility that the invention of writing, a very recent innovation tagged on to the very last millennia of human evolution, can dramatically alter a language’s linguistic niche, spurring the development of elaborate sentence structure, and leading to the shedding of other features, on a timescale that cannot be achieved through biological evolution. If that’s so, then the languages that many of us have grown up with are very different from the languages that have been spoken throughout the vast majority of human existence.
Many of the world’s oral languages are quite unlike European languages. Their sentences contain few words. They rarely combine more than one clause. Linguist Marianne Mithun has noted some striking differences: In conversational American English, 34 percent of clauses are embedded clauses. In Mohawk (spoken in Quebec), only 7 percent are. Gunwinggu (an Australian language) has 6 percent and Kathlamet (formerly spoken in Washington state) has only 2 percent. An English speaker might say: Would you teach me to make bread? But a Mohawk speaker would break this down into several short sentences, saying something like this: It will be possible? You will teach me. I will make bread. In English, you might say: He came near boys who were throwing spears at something. A Kathlamet approximation would go like this: He came near those boys. They were throwing spears at something then.
Some oral languages do regularly embed clauses, suggesting that writing is not necessary for complex syntax. But, as can be seen in a number of indigenous languages, longer and more complicated sentences often emerge as a result of contact with a written language. Structurally useful words (if, and then, because, but) have spread from Spanish to various Mexican languages such as Nahuatl, Sierra Popoluca, and Otomi, with the last of these borrowing heavily from Spanish; in one sample, almost 80 percent of Otomi subordinate clauses began with borrowed connector words. (Only four such words are native to Otomi; in contrast, Spanish has 40.) Moreover, languages may borrow the concept of certain relationships between clauses, but not the exact words. This can be seen in various Iroquoian languages, each of which has a word for “and” that doesn’t derive from a common linguistic ancestor, but instead reflects a recent adaptation of that language’s own linguistic resources to express the notion of and-ness.
The development of intricate sentences in modern European languages has unfolded slowly. These languages now churn out relative clauses with boundless enthusiasm but their common ancestor, Proto-Indo-European, may have lacked the necessary grammatical tools to produce them at all. According to linguist Guy Deutscher, the earliest clay tablets (about 2500 B.C.) of the ancient language Akkadian reveal few embedded clauses. The same is evidently true of the earliest stages of other ancient written languages such as Sumerian, Hittite, or Greek. Although these languages boasted a profusion of grammatical features suitable for expressing subtle nuances of meaning, and included a variety of fancy word-building techniques, they avoided complicated sentence recursion. When they did combine clauses into larger structures, this technique looked less like Russian dolls, with one clause inside another, and more like beads on a necklace, with one clause added next to another, and resembled this sentence from an old Hittite text (14th century B.C.):
I drove in a chariot to Kunnu, and a thunderstorm came, then the Storm-God kept thundering terribly, and I feared, and the speech in my mouth became small, and the speech came up a little bit, and I forgot this matter completely, but afterwards the years came and went, and this matter came to appear repeatedly in my dreams, and God’s hand seized me in my dreams, and then, my mouth went sideways, and …
The invention of writing sparked certain innovations, such that by 1800 B.C., Akkadian texts already exhibited sentences that rival the prose of Henry James in their complexity. One such sentence (from Hammurabi’s Code of Law) proceeds like this:
[If, [after the sheep and goats come up from the common irrigated area [when the pennants announcing the termination of pasturing are wound around the main city gate], the shepherd releases the sheep and goats into a field], the shepherd shall guard the field].
The divergence between spoken and written language can be witnessed around the world, at all time scales. Compare, for example, the number of subordinate clauses in the old Finnish oral tradition with the number in modern written Finnish. There are embedded clauses in The Kalevala (a collection of folk poetry that constitutes the Finnish national epic). But there are not very many: a 1,300-word sample yields three fairly simple examples, whereas a 1,300-word stretch of current written Finnish would typically contain about 60, and these would be more varied and more complex. A more recent example: The Somali language had essentially no written tradition until 1972, when it became the official state language. Over a mere 20-year period, researchers have observed noticeable changes to the written language, such as the emergence of longer and more complex words and greater elaboration of sentence structure.
Modern languages with a long literary tradition show a stark split between their written and spoken styles across many contexts. In current English, writing uses more varied vocabulary than conversational speech, and it uses rarer and longer words much more often. Certain structures (such as passive sentences, prepositional phrases, and relative clauses) appear more often in written than spoken language. Writers generally elaborate their ideas more explicitly through syntax whereas speakers leave more material implicit. And written language stacks clauses inside each other to a greater depth than spoken language. This is one of the most striking differences between speech and text; sentences like the opening line of the Declaration of Independence simply do not occur in conversation.
Why and how did syntax explode like this? For one thing, writing creates an entirely different communicative environment from spoken interactions, with text intensifying certain pressures and relaxing others. When writing is first introduced into a society, it is typically used mainly as a record of spoken language, but as writers eventually write for readers, not hearers, the language of text diverges from speech. Some existing features come to be used more generously than in spoken language, and new grammatical tools may be introduced.
Speech also proceeds under the whips of two tyrants: time and memory. Our memories aren’t nearly capacious enough to allow us to compose and precompile each sentence before beginning to utter its first syllable. Instead, speaking is like driving with a general sense of the destination, but no clear route planned—we utter the first syllables of a sentence while taking a leap of faith that we’ll be able to choose the right words en route and formulate phrases adequately as the words tumble out of our mouths and bring us to an intersection in our thoughts that demands our next move. This puts an upper bound on complexity. But written text, which can be more deliberately planned out and revised, is able to transcend this.
Readers too, and not just writers, are sprung from the shackles of time and memory. If reading were like hearing language, we would view text through a two-character aperture moving inexorably forward, unable to slow down, pause, or dart back and re-read. But eye-tracking studies show that when we read, we break free of linear time and seize control over the flow of information, our eye movements lurching along at inconsistent speeds and frequently jumping back to earlier parts of a sentence which, during speech, would already be auditory vapor. Such freedoms invite the most glorious excesses of recursion in text.
The complex syntax fostered by writing seems to be an acquired skill much like mental arithmetic. More or less everyone is born with the potential to do it, but to be able to calculate truly spectacular equations in your head, you need heavy practice, just as to understand (and compose) elaborate sentences with ease, you need plenty of experience with such sentences. Reading transforms the experiential landscape, offering a range and complexity of sentence structure that is rarely found in speech.
When children are fed a steady linguistic diet that is rich in complex sentences, these become easier to compute, and in turn, more readily produced under the time pressures imposed by speech. For example, psychologist Jessica Montag and her colleagues targeted relative clauses in the passive voice (the dog that was hit by the car), which are exceedingly rare in speech but more abundant in text, even that written for children. They found that heavy readers in the 8-12-year-old range produced such structures more often than children who read less. Even among adults, the production of these sentences was highly correlated with how much text they consumed, suggesting that avid readers are far more likely to transmit complex sentences to future generations.
All of this suggests that exposure to literary language is essential for the health of complex recursive sentences in English. If certain structures are too rare in speech to be reliably mastered by learners and passed on, then they may fade out within a community of non-readers. Naturally, this raises the question: Could syntactic complexity in literate languages diminish over time, if new technologies (podcasts, video lectures, and audiobooks) tether language more tightly to speech and its inherent limitations?
In fact, heavily recursive sentences like those found in the Declaration of Independence have already been dwindling in written English (as well as in German) for some time. According to texts analyzed by Brock Haussamen, the average sentence length in written English has shrunk since the 17th century from between 40 and 70 words to a more modest 20, with a significant paring down of the number of subordinate and relative clauses, passive sentences, explicit connectors between clauses, and off-the-beaten-path sentence structures.
These changes may reflect shifts in readers’ experience with language: Where literacy used to be an elite skill commanded by a very few steeped in lives of scholarly study, now it’s a universal basic necessity. More people now read—because they have to—but many probably still consume the vast bulk of their linguistic diet in spoken form and may have little patience for writing that is mentally taxing and reeks of snobbery. The need to make text accessible to a broader population, with a wide range of linguistic experiences, has created some pressure to bring the structures of written English more in line with spoken English.
Still, the English language represents not a single ecosystem, but many. A bird’s eye view of the overall trajectory of English would miss some of the most dramatic changes occurring within its particular linguistic niches. That brings to the fore another key reason why language might gravitate toward streamlined syntax: the nature of the communities that use it.
Oral languages may avoid pushing the limits of syntax not just because they are bound to speech, but also because they have other ways to express complex meanings. Linguists take great pains to point out that languages with simple sentences erupt with complexity elsewhere: They typically pack many particles of meaning into a single word. For example, the Mohawk word sahonwanhotónkwahse conveys as much meaning as the English sentence “She opened the door for him again.” In English, you need two clauses (one embedded inside the other) to say “He says she’s leaving,” but in Yup’ik, a language spoken in Alaska, you can use a single word, “Ayagnia.” (Ayagniuq, in contrast, means “He says he himself is leaving”; Ayagtuq means, more simply, “He’s leaving.”)
The templates for creating such complex words can be unruly, making them seem, to the average English speaker, more like cryptographic problems than words. Linguist John McWhorter offers an astounding example from the Siberian tongue Ket, a language in which verbs take pronoun prefixes to mark who is performing an action. There are two different sets of prefixes that attach to different verbs, and you simply have to know which verb takes which set. Moreover, many verbs simultaneously take two pronoun prefixes that mean the same thing (but many don’t—you just have to know), which trigger subtle shifts in meaning. For instance, digdabatsaq means “I go to the river and come back a bit later,” but digdaddaq (which involves the double use of the same pronoun prefix d) means “I go to the river and stay for a season.” The same word with just one pronoun—digdaksak—means “I go to the river and stay some days or weeks.”
If such a language seems unlearnable, well, that is exactly the argument that linguists such as McWhorter have been making: that an adult venturing into Ket would inevitably mangle it, just as an adult learner of English may never quite grasp its irregular verbs or idiosyncratic prepositions (why do you say in a club but on a team?). The unpredictable aspects of language, the things you just have to know, may be especially slippery for the adult mind—and there are so many more of these in Ket than in English. This could be why languages that rely on dense words in all their irregular glory tend to be esoteric—that is, spoken within small and insular communities, where everyone who speaks the language has been marinating in it since birth. The inscrutability of such languages may have the effect of repelling outsiders, ultimately reinforcing their insularity.
Complex and quirky words tend to be pruned out by exoteric languages, which are spoken in more porous communities that include a diversity of learners of all ages. Instead, exoteric languages rely on systematic rules that, like rules of arithmetic, offer tidy procedures for creating complex meanings (elaborate sentences) out of easily-mastered elements (simple words). Once learned, these procedures can be applied to any new combination of words. In contrast, the intricate words of many esoteric—and typically oral—languages are amalgams of meanings that have likely become fused together in the minds of speakers over countless instances of use. In other words, where speakers of English or German have become adept at performing the linguistic equivalent of mental arithmetic, speakers of Ket or Yup’ik don’t have to: They have mastered a mind-boggling number of “math facts.”
The insularity of many oral languages also reduces the need for syntactic elaboration in a more direct way: Speakers within a small, tight-knit community have an immense store of shared knowledge. This allows shortcuts in communicating with each other, so that many aspects of meaning can be left implicit. But these shortcuts are not available to speakers of big, sprawling languages who communicate with each other across divides of culture, experience, and specialized knowledge. And writing stretches the distance between participants even wider, allowing writers to communicate with unseen readers across expanses of time and space, thus amplifying the pressures that drive exoteric languages to syntactic complexity.
The borders that enclose some of the world’s smallest and most intimate language communities give us another handle on understanding the waning of sentence complexity in English. Linguists Douglas Biber and Bethany Gray have documented some specific shifts away from syntactic elaboration beginning in the 20th century—but these are not taking place in the language’s most democratic public squares. Instead, they have been fomenting within one of its most esoteric corners, that of specialized scientific writing. And far from representing an impulse to embrace a diversity of readers, they reveal increasingly insular communities—communities that can dispense with detailed embedded clauses simply because they do not have to make meanings explicit to readers who share troves of background knowledge.
Over the course of the 20th century, scientific communities began to splinter apart into highly specialized sub-disciplines. This coincided with a new tendency for academic prose to compress elaborate sentence structure into tighter, smaller expressions; for example, “actions that the government initiated” might become the compound noun “government actions.” What’s striking about such compound nouns is that they are like math facts: They rely on memorized rather than computed meanings. A houseboat, for example, is a boat that functions like a house, but a housecoat is a coat you wear in a house, and a housewife fits neither pattern. There is no general algorithm that yields a predictable meaning for a compound. A reasonable guess is often possible, but sometimes you just have to know.
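The contrast between memorized and computed meanings can be made concrete with a toy sketch in Python. Everything in it is an assumption made for illustration: the table holds only the two glosses given above, and `guess_by_rule` is a hypothetical single template, not a claim about how speakers or linguists actually interpret compounds.

```python
# Memorized meanings ("math facts"), taken from the two examples in the text.
MEMORIZED = {
    ("house", "boat"): "a boat that functions like a house",
    ("house", "coat"): "a coat you wear in a house",
}


def guess_by_rule(modifier, head):
    """Apply one hypothetical template: 'a HEAD that functions like a MODIFIER'."""
    return f"a {head} that functions like a {modifier}"


def rule_matches_memory(modifier, head):
    """Return True if the single template reproduces the memorized meaning."""
    return guess_by_rule(modifier, head) == MEMORIZED[(modifier, head)]


for pair in MEMORIZED:
    # The template happens to work for "houseboat" but fails for "housecoat".
    print(pair, rule_matches_memory(*pair))
```

The same template that gets "houseboat" right gets "housecoat" wrong, so the correct reading has to come from the lookup table rather than from a rule.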
Biber and Gray note that noun compounds increased by more than 400 percent over the 20th century, and there was an explosion in the types of nouns that could be merged. As a result, even when they join everyday nouns, the precise meaning of compounds may be opaque without specialized knowledge: a storm surge is a surge in the level of water due to a storm; harvest effect refers to the extent to which the magnitude of a harvest varies. During the second half of the 20th century, three- and four-noun combinations proliferated, and now we have noun pile extravaganzas such as state hate crime victim numbers. A 19th-century writer might have written instead: the numbers of victims who have experienced crimes that were motivated by hatred directed at their ethnic or racial identity, and who have reported these crimes to the state.
These new complex, compressed, meaning-particle-mashing noun compounds bear more resemblance to the intricate words of Ket or Mohawk than to the airy periodic sentences of the Declaration of Independence. This is likely no accident. Biber and Gray point out that scientific prose, which used to be written for any reasonably educated person, has come to target smaller and smaller communities of close colleagues—like the small, close-knit communities that have nurtured many of the world’s oral languages and their meaning-dense linguistic units.
Biber and Gray’s comparisons suggest that the most insular scientific communities have led the march away from elaborated sentences in favor of complex, compressed nouns: Science articles in specialist publications such as the Journal of Cell Biology contain fewer relative clauses and more noun compounds than articles in publications like Science, which target a more diverse community of scientists. Both of these samples in turn have less syntactic elaboration and more compression than academic writing in the humanities, which presupposes even less specialized knowledge among its readers. Lagging behind all of these in the trend toward noun-heavy compression is the language of novels and plays. Moreover, as Biber and Gray have shown, university students learn the art of compression gradually, with those in the sciences coming to rely less on multiple clauses and more on complex nouns than their peers in the arts and humanities.
These findings highlight the extent to which languages are shaped by the structure of their communities—so much so that even a cosmopolitan globe-straddling language like English contains within it an esoteric register whose linguistic opacity has the effect of repelling outsiders and reinforcing the insularity of its community.
Julie Sedivy has taught linguistics and psychology at Brown University and the University of Calgary, and is the author of Language in Mind: An Introduction to Psycholinguistics. She is currently writing a book about losing and reclaiming a native tongue.