Researchers from UC Berkeley and the University of British Columbia built a computer program that can rapidly reconstruct "proto-languages" - the linguistic ancestors from which all modern languages have evolved.
These earliest-known languages include Proto-Indo-European, Proto-Afroasiatic and, in this case, Proto-Austronesian, which gave rise to languages spoken in Southeast Asia, parts of continental Asia, Australasia and the Pacific.
The research team's computational model uses probabilistic reasoning, which draws on logic and statistics to predict an outcome, to reconstruct more than 600 Proto-Austronesian languages from an existing database of more than 140,000 words, replicating with 85 percent accuracy what linguists had done manually.
According to the researchers, while manual reconstruction is a meticulous process that can take years, this system performs a large-scale reconstruction in a matter of days or even hours.
The program can also provide clues as to how languages might change years from now.
The computational model is based on the established linguistic theory that words evolve along the branches of a family tree - much like a genealogical tree - reflecting linguistic relationships that evolve over time, with the roots and nodes representing proto-languages and the leaves representing modern languages.
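The tree structure described above can be pictured with a small sketch. This is only an illustration, not the researchers' code: the class name `LanguageNode`, the helper functions, and the heavily simplified three-language tree are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class LanguageNode:
    """One node in a hypothetical language family tree."""
    name: str
    children: list["LanguageNode"] = field(default_factory=list)

    def is_modern(self) -> bool:
        # Leaves stand for modern languages; every other node is a proto-language.
        return not self.children


def modern_languages(node: LanguageNode) -> list[str]:
    """Collect the names of all leaves below (and including) this node."""
    if node.is_modern():
        return [node.name]
    names = []
    for child in node.children:
        names.extend(modern_languages(child))
    return names


def proto_languages(node: LanguageNode) -> list[str]:
    """Collect the names of the root and all internal nodes."""
    if node.is_modern():
        return []
    names = [node.name]
    for child in node.children:
        names.extend(proto_languages(child))
    return names


# A deliberately tiny, simplified tree chosen for illustration only.
tree = LanguageNode("Proto-Austronesian", [
    LanguageNode("Amis"),
    LanguageNode("Proto-Malayo-Polynesian", [
        LanguageNode("Malay"),
        LanguageNode("Hawaiian"),
    ]),
])

print(modern_languages(tree))
print(proto_languages(tree))
```

In this picture, reconstruction means filling in word forms for the root and internal nodes, given only the words observed at the leaves.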
Using an algorithm known as the Markov chain Monte Carlo sampler, the program sorted through sets of cognates - words in different languages that share a common sound, history and origin - to calculate the odds that a given set is derived from a given proto-language.
At each step, the program stored a hypothesized reconstruction for each cognate and each ancestral language.
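To give a rough feel for this kind of sampling, here is a toy Metropolis sampler, one simple member of the Markov chain Monte Carlo family. It is a sketch under strong assumptions and not the researchers' model: it handles a single ancestor and a single cognate set, assumes all words have the same length, and scores a candidate by how many letters differ from each descendant word. The function names, the `beta` parameter, and the example words are hypothetical.

```python
import math
import random
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def log_score(candidate, cognates, beta):
    """Toy log-probability: fewer mismatches with descendants scores higher."""
    return -beta * sum(hamming(candidate, word) for word in cognates)


def sample_ancestor(cognates, steps=5000, beta=2.0, seed=0):
    """Return the ancestral form visited most often by a Metropolis sampler."""
    if len({len(word) for word in cognates}) != 1:
        raise ValueError("this toy sketch assumes equal-length words")
    rng = random.Random(seed)
    alphabet = sorted(set("".join(cognates)))
    current = list(cognates[0])
    current_score = log_score(current, cognates, beta)
    visits = Counter()
    for _ in range(steps):
        # Symmetric proposal: overwrite one random position with a random letter.
        proposal = current.copy()
        proposal[rng.randrange(len(proposal))] = rng.choice(alphabet)
        proposal_score = log_score(proposal, cognates, beta)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        accept_prob = math.exp(min(0.0, proposal_score - current_score))
        if rng.random() < accept_prob:
            current, current_score = proposal, proposal_score
        # Store the hypothesized reconstruction held at this step.
        visits["".join(current)] += 1
    return visits.most_common(1)[0][0]


# Toy cognate set for "eye"; the forms are simplified for illustration.
best_guess = sample_ancestor(["mata", "mata", "maka", "mata"])
print(best_guess)
```

The sampler keeps one hypothesized reconstruction at every step, and the form it visits most often serves as the best guess. A realistic system would also need to handle insertions, deletions, many ancestors at once, and sound changes specific to each branch.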