People around the world prefer to keep grammatically related words close together, according to a new study.
The study, conducted by researchers at the Massachusetts Institute of Technology (MIT), shows that most languages tend toward "dependency length minimization" (DLM) in practice.
That means language users have a global preference for placing dependent words close to the words they depend on, whenever possible. Richard Futrell, a PhD student in the Department of Brain and Cognitive Sciences at MIT, said that people want words that are related to each other in a sentence to be close together.
To conduct the study, the researchers used four large databases of grammatically parsed sentences: one from Charles University in Prague, one from Google, one from the Universal Dependencies Consortium (a new group of computational linguists), and a Chinese-language database from the Linguistic Data Consortium at the University of Pennsylvania.
The sentences were taken from published texts and thus reflect everyday language use. To quantify the effect of placing related words closer together, the researchers compared the dependency lengths of the sentences in each language against two baselines.
The first baseline randomizes the distance between each "head" word in a sentence and its "dependent" words.
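To make the measurement concrete, here is a minimal Python sketch; it is not the researchers' code. It assumes each parsed sentence is given as a list of head positions (`heads[i]` is the position of word i's head, or `None` for the root word), sums the distances between heads and their dependents, and estimates a random baseline by shuffling word positions while keeping the same head-dependent pairs. The function names and the toy parses are illustrative assumptions, and the shuffle ignores any constraints (such as keeping the tree projective) that the study's actual baselines may impose.

```python
import random

def total_dependency_length(heads):
    """Sum of |position(dependent) - position(head)| over all words.
    heads[i] is the index of word i's head, or None for the root (hypothetical format)."""
    return sum(abs(i - h) for i, h in enumerate(heads) if h is not None)

def random_baseline(heads, trials=1000, seed=0):
    """Average total dependency length when word positions are shuffled at random.
    The dependency tree is kept fixed; only the linear order of words changes.
    Assumption: no word-order or projectivity constraints are applied."""
    rng = random.Random(seed)
    n = len(heads)
    total = 0
    for _ in range(trials):
        order = list(range(n))
        rng.shuffle(order)  # order[i] is the new position of word i
        total += sum(abs(order[i] - order[h]) for i, h in enumerate(heads) if h is not None)
    return total / trials

# Toy parses (illustrative): "John threw out the old trash"
# John->threw, out->threw, the->trash, old->trash, trash->threw; "threw" is the root.
short_order = [1, None, 1, 5, 5, 1]
# "John threw the old trash out": same tree, "out" moved to the end.
long_order = [1, None, 4, 4, 1, 1]

print(total_dependency_length(short_order))  # 9
print(total_dependency_length(long_order))   # 11
print(random_baseline(short_order))          # random-order average for this tree
```

Both toy orders are grammatical English, but the first keeps "out" next to "threw" and so has the shorter total; a DLM tendency shows up when real sentences come out consistently shorter than the shuffled baseline.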
Because some languages, including English, have relatively strict word-order rules, the researchers also used a second baseline that accounted for the effects of those word-order constraints.
In both cases, Futrell, co-author Kyle Mahowald, and Edward Gibson, a professor of brain and cognitive sciences at MIT, found that the DLM tendency holds across languages, to varying degrees.
Italian appears to be highly optimized for short dependencies; German, which has some notoriously indirect sentence constructions, is far less optimized, according to the analysis.
The paper was published in the Proceedings of the National Academy of Sciences.