London School of Economics Takes Agile Approach to Data Labeling with Appen
“Appen’s platform is really easy to use. What makes it great is you can reach so many different channels because of its global outreach.”
– Kenneth Benoit, Director of the Data Science Institute, LSE
The University
Founded in 1895, the London School of Economics and Political Science (LSE) has long been a global leader in the social sciences. One of its many research wings, the Data Science Institute (DSI), focuses on data science as it pertains to social, political, and economic issues. Its experiments cover a wide range of human-centered questions and frequently include data annotation projects that require human labels.
The Challenge
Researchers led by Kenneth Benoit at the Department of Methodology set out to study political texts, looking at both their content and their sophistication. With the first project, their goal was to capture the content of the messages that political actors send to others and, from those discoveries, to calculate political party positions. They found that relying on expert researchers to go through these messages was time-consuming, expensive, and nearly impossible to scale. Relying only on experts in a field would also provide a narrower perspective, making the data potentially more biased and less reliable.
The team needed a more agile, reproducible data labeling process to replace their existing approach.
With the second project, researchers set out to identify indicators that would measure the sophistication, or readability, of political texts. To do so, they needed a large and varied sample of texts and numerous human labelers to compare texts to one another. They also wanted to reproduce the experiment across several languages, which would require labelers fluent in each language. Again, experts in each of these languages were hard to find, expensive, and slow. At the time, the team was partnering with an organization that could support only a limited set of languages, making it impossible to translate these political texts into all the languages they had envisioned. That provider also lacked the reporting the research required, so the team had to calculate their own validity checks, which are critical for research papers.
The Solution
“Appen’s reporting features were very useful, as was knowing the completion times, the responses, and the reliability scores of the crowd.”
– Kenneth Benoit, Director of the Data Science Institute, LSE
The research team engaged us (at the time known as CrowdFlower) in 2015 after meeting at a conference. Our platform had several features that they needed:
- a dashboard that included important validation metrics, such as confidence checks
- a user-friendly interface that made setting up jobs a quick process
- access to an unrestricted global Crowd of contributors
With the first project, contributors were given sentences from political leaders and asked several questions, such as “Is this sentence about immigration? If so, is it pro, neutral, or negative toward immigration?” Aggregating the answers to these questions with statistical modeling produced an overall score indicating a political party’s position on that policy. These scores were then used as inputs to other models, so labeling accuracy was key. As long as contributors maintained an accuracy score of at least 70%, they could continue working on the project.
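To give a flavor of what that kind of aggregation can look like, here is a minimal Python sketch that filters contributors by a 70% test-question accuracy threshold and averages stance labels per sentence and per party. The data, field names, and simple averaging scheme are illustrative assumptions; the researchers used a more elaborate statistical model.

```python
# Illustrative sketch only: aggregate crowd judgments into a party position score.
# Field names and the simple averaging scheme are assumptions for this example.
from collections import defaultdict
from statistics import mean

# Each judgment: (contributor_id, party, sentence_id, stance), where stance is
# +1 (pro), 0 (neutral), or -1 (negative) toward the policy, e.g. immigration.
judgments = [
    ("c1", "Party A", "s1", +1), ("c2", "Party A", "s1", +1),
    ("c1", "Party A", "s2", 0),  ("c3", "Party A", "s2", -1),
    ("c2", "Party B", "s3", -1), ("c3", "Party B", "s3", -1),
]

# Contributors' running accuracy on test questions; only those at or above
# the 70% threshold keep contributing, so only their labels are counted here.
contributor_accuracy = {"c1": 0.92, "c2": 0.81, "c3": 0.67}
trusted = {c for c, acc in contributor_accuracy.items() if acc >= 0.70}

# Average the stance labels per sentence, then per party.
per_sentence = defaultdict(list)
for contributor, party, sentence, stance in judgments:
    if contributor in trusted:
        per_sentence[(party, sentence)].append(stance)

per_party = defaultdict(list)
for (party, _sentence), stances in per_sentence.items():
    per_party[party].append(mean(stances))

for party, sentence_scores in per_party.items():
    print(party, round(mean(sentence_scores), 2))  # e.g. Party A 0.5
```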
With the second project, our Crowd performed thousands of pairwise comparisons of short passages of political text, answering which of two texts was more difficult to read and understand. The research team then used the crowd’s comparisons to fit a statistical model that measured textual sophistication. Using 24 quantitative indicators constructed from the text (sentence length, the number of syllables per word, the number of dependent clauses, and so on), they identified the indicators that best predicted how difficult a particular political text was to understand.
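As a rough illustration of how pairwise “which is harder to read?” judgments can be turned into a per-text difficulty score, the sketch below fits a simple Bradley-Terry model, one standard approach for pairwise comparison data. The comparison data and the fitting routine are illustrative assumptions, not the team’s actual code; in the published study, difficulty scores of this kind were then related to the text-based indicators to find the strongest predictors.

```python
# Illustrative sketch: estimate a latent "difficulty" score per text from
# pairwise judgments using a Bradley-Terry model (assumed here for illustration).
from collections import defaultdict

# Each record: (winner, loser), where the "winner" was judged harder to read.
comparisons = [
    ("text_A", "text_B"), ("text_A", "text_C"),
    ("text_B", "text_C"), ("text_A", "text_B"),
    ("text_C", "text_B"),
]

texts = {t for pair in comparisons for t in pair}
wins = defaultdict(int)      # times each text was judged harder
n_pairs = defaultdict(int)   # number of comparisons per unordered pair
for winner, loser in comparisons:
    wins[winner] += 1
    n_pairs[frozenset((winner, loser))] += 1

# Simple minorization-maximization iterations for Bradley-Terry strengths.
strength = {t: 1.0 for t in texts}
for _ in range(100):
    new = {}
    for t in texts:
        denom = sum(
            n / (strength[t] + strength[u])
            for pair, n in n_pairs.items()
            if t in pair
            for u in pair if u != t
        )
        new[t] = wins[t] / denom if denom else strength[t]
    total = sum(new.values())
    strength = {t: s / total for t, s in new.items()}  # normalize each pass

for t, s in sorted(strength.items(), key=lambda kv: -kv[1]):
    print(t, round(s, 3))  # higher = judged harder to read more often
```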
The Result
Both projects represent the successful use of our technology platform and our global Crowd to accomplish data labeling in a way that’s fast, inexpensive, and scalable—without sacrificing data quality. With the first project, a labeling task that would have taken weeks for experts to complete could be accomplished in just four to five hours using our Crowd, and likely resulted in less biased output thanks to the diversity of perspectives. By the end of the experiment, the contributors had annotated a total of 20,000 sentences from six political parties, each between five and 20 times. Thanks to our global platform, the LSE researchers were also able to replicate their study in several other languages to further validate the data they produced. For more information about this study, view their article in a top political science journal, the American Political Science Review.
The second experiment was published in another top political science journal, the American Journal of Political Science. Leveraging our Crowd enabled the Department of Methodology research team to capture a large enough body of data for further analysis. They identified the four indicators that best predicted the readability of political texts, which were then used to build a machine learning model that can predict the sophistication of any given political text. This model enables more accurate comparison and analysis of political discourse going forward.
Learn more about expanding and amplifying your own AI initiatives with Appen's data annotation capabilities.
*Moving forward, Kenneth Benoit will continue to lead similar research projects, but as the Director of the Data Science Institute (DSI), a relatively new research wing at the London School of Economics.