
Research Dive: making sense of language data in Indonesia

Dikara Alkarisya and Elaine Hartono
Aug 15, 2016

“Doing a research dive is an entirely new way of approaching research. It allowed me to come together with other lecturers and researchers, and to find new ways to collaborate” - Research Dive Participant.

A new dictionary for development

Just a few months ago, Pulse Lab Jakarta launched Translator Gator, a people-powered language game that supports research initiatives in Indonesia by having players translate words from English into six languages widely spoken in Indonesia.

The game enabled the Lab to compile user-created dictionaries of words related to sustainable development. These dictionaries will help Global Pulse and its partners carry out automated analyses of social media, to better understand which issues matter to people and what they are saying about education, health, climate change, and other key development issues.

Gaming proved to be a powerful and efficient way to tap into the 'wisdom of the crowd'. In just a few months, Translator Gator gathered more than 109,000 user contributions from hundreds of players across Indonesia. Having cast the net wide to collect this valuable body of data, Global Pulse then invited a group of linguistic experts to dive deep into it.

A Research Dive into 109,000 pieces of new data

Last month's Research Dive (one of many different types of learning events the Lab organizes) took place at the UN Global Pulse Lab in Jakarta under the theme Computational Linguistics and Natural Language Processing. Over the two-day event, 19 computational linguistics experts and advisors from 18 universities and government research institutions were invited to explore and analyze the data collaboratively. Working in groups, participants were tasked with assessing the quality of the translations, visualizing the data to make better sense of it, and filling in important translation gaps.

Through Translator Gator, players had translated a predefined set of 2,000 English keywords and phrases related to the Sustainable Development Goals (SDGs) into six Indonesian languages (Bahasa Indonesia, Jawa, Sunda, Bugis, Minang, and Melayu). However, while the entire English word set was translated into Bahasa Indonesia, the national language, only about 80%, 25% and 24% of it was translated into Bahasa Jawa, Sunda and Melayu, respectively. Finding creative digital solutions to fill these gaps was therefore an important task for the computational linguists.
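The post does not say exactly how these coverage figures were computed. As a minimal sketch, assuming "coverage" means the share of English terms that received at least one non-empty translation in a given language, the calculation might look like this. The record layout, the language codes and the function name coverage_by_language are hypothetical, and the non-Indonesian strings are placeholders rather than real translations.

```python
from collections import defaultdict

# Hypothetical toy data: (language code, English term, submitted translation).
# The real Translator Gator data format is not described in the post; this layout is assumed.
contributions = [
    ("id", "school", "sekolah"),
    ("id", "health", "kesehatan"),
    ("id", "flood", "banjir"),
    ("id", "teacher", "guru"),
    ("jv", "school", "jv_school"),     # placeholder strings, not real Javanese
    ("jv", "flood", "jv_flood"),
    ("jv", "flood", "jv_flood_alt"),   # a second contribution for the same term
    ("su", "health", ""),              # empty submission, not counted
]

english_terms = {"school", "health", "flood", "teacher"}
languages = ["id", "jv", "su"]


def coverage_by_language(contributions, english_terms, languages):
    """Return, per language, the share of English terms with at least one non-empty translation."""
    covered = defaultdict(set)
    for lang, term, translation in contributions:
        # Count each English term once per language, ignoring blank submissions.
        if term in english_terms and translation.strip():
            covered[lang].add(term)
    return {lang: len(covered[lang]) / len(english_terms) for lang in languages}


result = coverage_by_language(contributions, english_terms, languages)
for lang in languages:
    print(f"{lang}: {result[lang]:.0%}")
```

On the toy data this prints 100% for id, 50% for jv and 0% for su. Applied to the real 2,000-term list, the reported 80%, 25% and 24% would correspond to roughly 1,600, 500 and 480 English terms with at least one translation in Jawa, Sunda and Melayu.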

A quick look into the days of the “Research Divers”

“It is a great experience to join a Research Dive. I love the work atmosphere and the ethic that allows people to make mistakes” - Research Dive Participant

On the first day, the participants familiarized themselves with the datasets provided to them. As experts in computational linguistics and natural language processing (the field of study concerned with the interactions between computers and human languages), they brainstormed about how these rich and entirely new datasets could feed into their own research.

Each team defined its own research question and worked on it through the second day. One group, for example, used the datasets to build a synonym dictionary for Bahasa Indonesia, while another used only metadata to draw inferences about translation accuracy. At the end of the Dive, the teams presented their results.

What's next?

“Having a solid team and great teamwork made the tasks assigned much easier. Not only did the Research Dive teach me to be more critical, it also gave me a lot of new research ideas” - Research Dive Participant

The Research Dive gave the computational linguists an opportunity to network and share their expertise, laying the foundation for new collaborations. Participants will take their initial explorations of the data back to their home institutions, where they will run mini-workshops and explore further research opportunities using the data. The teams will continue to collaborate and plan to submit technical reports for publication once they are finalized.

The Lab is excited that so many women with strong backgrounds in computer science took part, and is grateful to everyone who helped make this event a success. Pulse Lab Jakarta plans to keep using the Research Dive learning-event model across its other areas of work.
