A series of intimate conversations could teach an AI to understand both language and culture.
An interesting article about the social process of creating machine translation datasets in Khoekhoegowab. Excerpt:
On the surface, Wilhelmina Ndapewa Onyothi Nekoto and Elfriede Gowases seem like a mismatched pair. Nekoto is a 26-year-old data scientist. Gowases is a retired English teacher in her late 60s. Nekoto, who used to play rugby in Namibia’s national league, stands about a head taller than Gowases, who is short and slight. Like nearly half of Namibians, Nekoto speaks Oshiwambo, while Gowases is one of the country’s roughly 200,000 native speakers of Khoekhoegowab.
But the women grew close over a series of working visits starting last October. At Gowases’s home, they translated sentences from Khoekhoegowab to English. Each sentence pair became another entry in a budding database of translations, which Nekoto hopes will one day power AI tools that can automatically translate between Namibia’s languages, bolstering communication and commerce within the country.
“If we can design applications that are able to translate what we’re saying in real time,” Nekoto says, “then that’s one step closer toward economic [development].” That’s one of the goals of the Masakhane project, which organizes natural language processing researchers like Nekoto to work on low-resource African languages.
Compiling a dataset to train an AI model is often a dry, technical task. But Nekoto’s self-driven project, rooted in hours of close conversation with Gowases, is anything but. Each datapoint contains fragments of cultural knowledge preserved in the stories, songs, and recipes that Gowases has translated. This information is as crucial for the success of a machine translation algorithm as the grammar and syntax embedded in the training data.
Read the whole thing.