The XKCD comic "Study" parodies the challenges of recruiting study participants.
As part of "Social Cards Probably Provide For Better Understanding Of Web Archive Collections" (recently accepted for publication at CIKM 2019), I had to learn how to conduct user studies. One of the most challenging problems in conducting user studies is recruiting participants. Amazon's Mechanical Turk (MT) solves this problem by providing a marketplace where participants can earn money by completing studies for researchers. This blog post summarizes the lessons I have learned from other studies that have successfully employed MT. I have found parts of this information scattered throughout different bodies of knowledge, but not gathered in one place; thus, I hope it serves as a useful starting point for future researchers.
MT is by far the largest source of study participants, with over 100,000 available participants. MT is an automated system that facilitates the interaction of two actors: the requester and the worker. A worker signs up for an Amazon account and must wait a few days to be approved. Once approved, MT provides the worker with a list of assignments to choose from. An MT assignment is called a Human Intelligence Task (HIT). Workers perform HITs for payments ranging from $0.01 to $5.00 or more, and can earn as much as $50 per week completing them. Workers are the equivalent of the subjects or participants found in traditional research studies.
Workers can browse HITs to complete via Amazon's Mechanical Turk.
Requesters can create HITs using the MT interface, which provides a variety of templates.
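Requesters who prefer scripting to the web interface can also create HITs programmatically through the MTurk API. Below is a minimal sketch using boto3 (Amazon's Python SDK) against the requester sandbox; the title, reward, and question content are placeholder values chosen for illustration, not parameters from any study described here.

```python
import boto3

# Connect to the MTurk requester sandbox so no real money changes hands
# while testing; drop endpoint_url to target the production marketplace.
mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# A minimal HTMLQuestion: MT renders the HTML in an iframe, and the form
# must POST back to MT's externalSubmit endpoint with the assignmentId.
# (The POST target below is for production; sandbox HITs submit to
# workersandbox.mturk.com instead.)
question_xml = """
<HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">
  <HTMLContent><![CDATA[
    <!DOCTYPE html>
    <form action="https://www.mturk.com/mturk/externalSubmit" method="POST">
      <input type="hidden" name="assignmentId" id="assignmentId" value="">
      <p>Which card best summarizes the web page? (placeholder question)</p>
      <input type="text" name="answer">
      <input type="submit">
    </form>
    <script>
      // MT passes the assignment ID as a query parameter; echo it back.
      document.getElementById("assignmentId").value =
        new URLSearchParams(window.location.search).get("assignmentId");
    </script>
  ]]></HTMLContent>
  <FrameHeight>600</FrameHeight>
</HTMLQuestion>
"""

response = mturk.create_hit(
    Title="Answer a question about a web page (placeholder title)",
    Description="Short research task",
    Keywords="survey, research",
    Reward="0.50",                     # dollars, as a string
    MaxAssignments=30,                 # number of distinct workers
    LifetimeInSeconds=7 * 24 * 3600,   # how long the HIT stays listed
    AssignmentDurationInSeconds=1800,  # time a worker has to finish
    Question=question_xml,
)
print("HITId:", response["HIT"]["HITId"])
```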
The MT environment is different from that used in traditional user studies. MT participants can use their own devices to complete the study wherever they have a connection to the Internet. Requesters are limited in the amount of data that they can collect on MT participants. For each completed HIT, the MT system supplies the completion time and the responses provided by the MT participant. A requester may also employ JavaScript in the HIT to record additional information.
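As an illustration of the data a requester gets back, here is a hedged sketch of retrieving submitted assignments with boto3; it assumes the `mturk` client from the previous sketch and a `hit_id` saved from the earlier `create_hit` call.

```python
import xml.etree.ElementTree as ET

# Fetch submitted assignments for a HIT created earlier.
resp = mturk.list_assignments_for_hit(
    HITId=hit_id,
    AssignmentStatuses=["Submitted"],
    MaxResults=100,
)

for assignment in resp["Assignments"]:
    # MT reports when each worker accepted and submitted the assignment,
    # which is how requesters obtain completion times.
    duration = assignment["SubmitTime"] - assignment["AcceptTime"]

    # Responses arrive as QuestionFormAnswers XML; the "{*}" namespace
    # wildcard requires Python 3.8 or later.
    answers = {}
    root = ET.fromstring(assignment["Answer"])
    for ans in root.iter("{*}Answer"):
        field = ans.find("{*}QuestionIdentifier").text
        answers[field] = ans.findtext("{*}FreeText")

    print(assignment["WorkerId"], duration, answers)
```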
In contrast, traditional user studies allow a researcher to completely control the environment and record the participant's physical behavior. Because of these differences, some scholars have questioned whether MT participants produce results as reliable as those of traditionally recruited participants. To assuage this doubt, Heer et al. reproduced the results of a classic visualization experiment whose original participants were recruited using traditional methods; the results from participants recruited via MT were consistent with the original study. Kosara and Ziemkiewicz reproduced one of their own earlier visualization studies and likewise found that the MT results were consistent with the earlier study. Bartneck et al. conducted the same experiment with both traditionally recruited participants and MT workers and also confirmed consistent results between the groups.
MT is not without its critics. Fort, Adda, and Cohen raise questions about the ethical use of MT, focusing on the potentially low wages offered by requesters. In their overview of MT as a research tool, Mason and Suri further discuss such ethical issues as informed consent, privacy, and compensation. Turkopticon, a system developed by Irani and Silberman, helps workers safely voice grievances about requesters, including issues with payment and overall treatment.
In traditional user studies, the presence of the researcher may engender some social motivation to complete a task accurately. MT participants, by contrast, are motivated to maximize their revenue over time by completing tasks quickly, leading some of them not to exercise the same level of care as a traditional participant would. Because of these differences in motivation and environment, MT studies require specialized design. Based on the work of multiple academic studies, we have the following advice for requesters developing meaningful tasks with Mechanical Turk:
- complex concepts, like understanding, can be broken into smaller tasks that collectively provide a proxy for the broader concept (Kittur 2008)
- successful studies ensure that each task has questions with verifiable answers (Kittur 2008)
- limiting participants by their approval rate (the percentage of their past HITs that requesters approved) has been successful for ensuring higher-quality responses (Micallef 2012, Borkin 2013) – see the sketch after this list
- participants can repeat a task – make sure each set of responses corresponds to a unique participant by using tools such as Unique Turker (Paolacci 2010)
- be fair to participants; MT is a competitive market, participants can refuse to complete a task, and a requester who treats workers poorly develops a reputation that causes participants to avoid their HITs (Paolacci 2010)
- better payment may improve results on tasks with factually correct answers (Paolacci 2010, Borkin 2013, PARC 2009) – and can address the ethical issue of proper compensation
- being up front with participants and explaining why they are completing a task can improve their responses (Paolacci 2010) – this can also help address the issue of informed consent
- attention questions can be useful for discouraging or weeding out malicious or lazy participants who may skew the results (Borkin 2013, PARC 2009)
- bonus payments may encourage better behavior from participants (Kosara 2010) – and may also address the ethical issue of proper compensation
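To make a few of these points concrete, here is a hedged sketch, again with boto3, of how a requester might restrict a HIT to workers above an approval-rate threshold, keep only one submission per worker, and pay bonuses on approval. The 95% threshold, bonus amount, and feedback strings are illustrative values I chose, not recommendations from the studies cited above; `resp` is the assignment listing from the earlier retrieval sketch.

```python
# Restrict the HIT to workers whose past work was approved at least 95%
# of the time, using MT's built-in "percent assignments approved"
# system qualification (ID per the MTurk documentation). Pass this list
# as the QualificationRequirements argument to create_hit, alongside
# the arguments shown earlier.
qualification_requirements = [
    {
        "QualificationTypeId": "000000000000000000L0",
        "Comparator": "GreaterThanOrEqualTo",
        "IntegerValues": [95],
        "ActionsGuarded": "Accept",  # unqualified workers cannot accept
    }
]

# Keep only the first submission from each worker, in case the same
# worker reached the task more than once.
seen_workers = set()
unique_assignments = []
for assignment in resp["Assignments"]:
    worker_id = assignment["WorkerId"]
    if worker_id not in seen_workers:
        seen_workers.add(worker_id)
        unique_assignments.append(assignment)

# Approve each unique assignment and grant a small bonus; both calls
# are standard MTurk requester operations.
for assignment in unique_assignments:
    mturk.approve_assignment(
        AssignmentId=assignment["AssignmentId"],
        RequesterFeedback="Thank you for your careful work.",
    )
    mturk.send_bonus(
        WorkerId=assignment["WorkerId"],
        AssignmentId=assignment["AssignmentId"],
        BonusAmount="0.25",  # dollars, as a string
        Reason="Bonus for a complete, attentive response.",
    )
```

Note that deduplicating after the fact, as above, is only a fallback; tools like Unique Turker block repeat participation before it happens.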
For researchers starting down the road of user studies, I recommend beginning with Kelly's work and then circling back to the other resources noted here when developing an experiment.
-- Shawn M. Jones