Welsh and other smaller language movements on Wikimedia projects suggest there may be ways to train technology to allow for cultural differences.
An article in Slate about the role of Wikipedia in creating language tools. Excerpt:
Although Alexa still does not speak or understand Welsh, the Celtic language's presence in tech has increased dramatically within a short period. Google announced in February that it had expanded its offerings in Docs, Sheets, Slides, and Drive to include Welsh. And Google Translate (infamous since 2009 for its Scymraeg, or scummy Welsh) has, according to the BBC, recently taken a great leap forward in terms of the accuracy and quality of its Welsh translations. Morlais and others attribute this in part to the fact that there are now more than 100,000 articles on the Welsh version of Wikipedia, known as Wicipedia.
Like other language editions, Wicipedia is a separate website with its own content, not simply a translation of English Wikipedia, a distinction that matters for both users and big tech companies. Back in 2017, Morlais observed, "There appears to be an indication that there is a link between the languages with the most Wikipedia articles or pages and the languages that are supported by the digital giants." Google Translate and other technologies use artificial neural networks to learn from example, training themselves with language data from rich internet sources like Welsh Wikipedia.
The Welsh community is not alone in using wiki-technology to promote its language. This year's Celtic Knot conference in Cornwall, England, included several indigenous languages with their own Wikipedia editions. The original idea, as the name suggests, was to focus on Celtic languages, including Irish, Scots, Breton, Welsh, and Cornish, which was declared extinct merely a decade ago. But as word got out about a Wikipedia minority language conference, others began to join, representing, for example, the Sámi language spoken in parts of Norway, Finland, Sweden, and Russia; the Berber family of languages spoken in Northern Africa; and the Basque and Catalan communities. (In his 2017 presentation, Morlais noted that Catalan was one of the few minority languages supported by Google search, an accomplishment he linked to the fact that Catalan already had more than 500,000 articles on its language edition of Wikipedia.)
Read the whole thing.












