
Amazon Mechanical Turk Guide for Social Scientists  (updated 1-18-12)
By Michael Buhrmester (buhrmester@gmail.com)


For our evaluation of MTurk in PoPS (Perspectives on Psychological Science), see Buhrmester, Kwang, & Gosling and the accompanying supplement
Note: This page is meant to help the curious researcher successfully get studies up and running on Mechanical Turk with minimal fuss. I’ve answered a lot of mturk questions this past year and have tried to condense my answers into the FAQ below. I encourage anyone with any tips/comments/questions to please contact me! 


FAQ

Can I screen participants who fit my specific criteria?
            Yes (ish). Currently, the only in-house criteria screens are for physical location (which participants specify when they create an account), approval rate %, raw # of HITs approved, and willingness to see adult content. But what if you want to include only males 18-24, currently pregnant women, or middle-aged men who've recently purchased a convertible? One approach would be to simply ask that only people who fit your criteria participate in your study. The problem, of course, is that people who don't fit your criteria can ignore the request, potentially without consequence. How can this be prevented? One solution that I've found to work is to screen participants yourself. It'll cost a little money, because you'll be paying a small amount to potentially many people who don't fit your criteria, but I believe it's a safe way to collect solid data. Essentially, you'll want to embed your screening criteria within a number of other questions so the screening items don't look suspicious. For everyone who qualifies, you could 1) instantly give them instructions for how to proceed with the real study (e.g., within SurveyMonkey, use the logic commands to have them continue on to the real survey) or 2) let them know that if they qualify, they'll be contacted via email.
            Another issue to be aware of is that mturk workers come from all over the world. If you leave your HIT up overnight (from the US), expect that the vast majority of responses will come from people on the opposite side of the planet. Deciding who to limit your survey to is obviously important, as is when you have it posted and available.
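
            For those comfortable with a little code, these same built-in screens (location, approval rate) can also be set when posting a HIT through Amazon's API rather than the website. Below is a minimal sketch using the Python boto3 client (newer tooling than anything described in this guide); the title, payment, survey URL, and counts are made-up placeholders, not recommendations.

import boto3

# Sketch: posting a HIT with built-in worker screens via the MTurk API.
# boto3 is newer tooling than this guide; the title, reward, URL, and
# counts below are placeholders for illustration only.
mturk = boto3.client("mturk", region_name="us-east-1")

# ExternalQuestion XML that frames an outside survey site inside the HIT.
external_question = """
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.com/my-survey</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>
"""

response = mturk.create_hit(
    Title="5-10 minute survey (placeholder)",
    Description="Complete a short academic research survey.",
    Keywords="survey, psychology, questionnaire",
    Reward="0.10",                     # in dollars, passed as a string
    MaxAssignments=100,
    AssignmentDurationInSeconds=3600,  # generous, in case workers forget to submit
    LifetimeInSeconds=86400,           # how long the HIT stays visible
    Question=external_question,
    QualificationRequirements=[
        {   # built-in location screen: US-based workers only
            "QualificationTypeId": "00000000000000000071",
            "Comparator": "EqualTo",
            "LocaleValues": [{"Country": "US"}],
        },
        {   # built-in approval-rate screen: 95% or better
            "QualificationTypeId": "000000000000000000L0",
            "Comparator": "GreaterThanOrEqualTo",
            "IntegerValues": [95],
        },
    ],
)
print(response["HIT"]["HITId"])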

Can I run longitudinal studies on mturk?
            Yes. You can ask participants to provide their email in the first wave, then contact them via email with instructions for the second wave. Be aware that turkers are wary of giving out their email because of the potential to receive spam. You'll want to be very explicit that you won't give out their email for any purpose other than the study. Even then, expect that a significant % of turkers won't give it.
            A second option is to contact workers using the “bonus worker” feature. You’ll have to award workers at least a penny, but you’ll also be able to include a message that they’ll receive. You can get to the “bonus worker” feature by clicking “Manage” > “Workers” and finding the worker ID of each person who completed your wave 1. Alternatively, you can pull up the batch and see all the workers who completed your wave 1 batch at once.
            You can also email workers without giving a bonus through the mturk system’s more advanced “command line reference” toolbox. It requires some basic coding know-how to use.
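
            For illustration, here is roughly what contacting wave 1 workers looks like through the API's NotifyWorkers operation, which sends a message without requiring a bonus. This is a minimal sketch using the Python boto3 client (newer tooling than the command line toolbox above); the worker IDs and message text are placeholders.

import boto3

# Sketch: emailing prior participants via NotifyWorkers, shown with the
# Python boto3 client (newer tooling than the command line toolbox above).
# Worker IDs and message text are placeholders.
mturk = boto3.client("mturk", region_name="us-east-1")

wave1_workers = ["A1EXAMPLEWORKERID", "A2EXAMPLEWORKERID"]  # from your wave 1 batch file

response = mturk.notify_workers(
    Subject="Wave 2 of the study you completed is now available",
    MessageText=(
        "Thanks for completing wave 1. The follow-up HIT is now posted. "
        "Please use the password from this message when you accept it."
    ),
    WorkerIds=wave1_workers,  # up to 100 IDs per call
)
print(response["NotifyWorkersFailureStatuses"])  # IDs that could not be reached
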
            Whichever method you choose, you’ll want to do some things to ensure that only your wave 1 participants come back for wave 2. One route is to take advantage of the “bonus worker” feature. Give each worker a code to enter at the end of wave 2 so you know who completed it successfully. Then give the worker their agreed-upon payment in the form of the bonus. The downside of this is that the “contract” or promise to pay is agreed upon through email rather than through accepting an altogether different HIT. So the second way to go is to create a second HIT and invite workers from wave 1 to complete it as wave 2. You’ll want to be explicit in the HIT’s description that the HIT is only for participants you’ve contacted and that people who weren’t contacted (i.e., anyone who didn’t complete wave 1) won’t be paid. A password/codeword system that you send in the email invite for wave 2 is useful here.

How do I know that the participants are paying attention, or worse, are even real people and not survey-taking robots?
            Simple – test them! Provide an attention-check item somewhere in your study. I would argue that it’s better to check toward the middle of the study where participants may turn on “cruise-control”. Here’s a paper describing one method -- http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1003424. It’s been suggested that the same item (i.e., the one in the paper) not be copied again and again, potentially conditioning workers to identify the item. Get creative and make up your own.
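
            Once the data are in, screening on the attention check takes only a few lines. Here's a minimal post-hoc sketch in Python with pandas; the file name, the attention_check column, and the correct answer are all hypothetical placeholders for whatever your survey actually uses.

import pandas as pd

# Sketch: post-hoc screening on an attention-check item. The file name,
# column name, and correct answer are hypothetical placeholders.
df = pd.read_csv("survey_results.csv")

passed = df[df["attention_check"] == "strongly agree"]
failed = df[df["attention_check"] != "strongly agree"]

print(f"{len(failed)} of {len(df)} respondents failed the attention check")
passed.to_csv("survey_results_screened.csv", index=False)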

Why does it say I have to be from the US to make a requester account?
            It’s sort of a complicated answer, but there are some potential ways around it. A good discussion can be found here: http://www.behind-the-enemy-lines.com/2010/02/why-mechanical-turk-allows-only-us.html

Is mturk ethical to use?
            There are some ethical concerns about mturk regarding low payment, with some calling mturk an online sweatshop. In some respects, these concerns are warranted – getting paid pennies to identify objects in a picture or retrieve the location of some website ad infinitum can look a lot like a sweatshop. One response to these concerns has been to point out that mturk is a voluntary place to earn money. The investment on the part of workers to get started on the site is extremely low, and they are free to come and go as they please. Basically, or so the argument goes, mturk is not like a regular job and therefore the same ethical rules don’t apply. Multiple findings support this stance to some extent – a majority of workers use the site casually as an alternative to less-productive internet activities (e.g., surfing Facebook) and don’t rely on mturk as their #1 source of income.
            If this argument doesn’t feel satisfying, you’re not alone. There are a number of things a researcher can (and should) do to mitigate ethical concerns about payment:
            1. Simply pay more. When you enter how long you expect the study to take and how much you will pay, mturk calculates the hourly wage you’re paying. Our work and that of others have shown that workers are sensitive to how much they are getting paid – the more you pay, the quicker the data rolls in. Everybody wins.
            2. Be explicitly clear about how long the study will take. Requesters input, and workers see, the time allotted to complete the study; it’s up to the requester to describe how long the study will actually take in the description write-up. One reason workers may take on tasks that pay a low wage is that they believe they can do them faster than expected, pushing their hourly wage higher. Be clear about roughly how long the study will take everyone.
            3. Treat workers more like in-the-lab participants. Simply describing the study as an “academic research study” doesn’t mean a whole lot to most people. Describe in more detail how important their participation is to conducting psychological science -- that their responses will be used to make generalizations about how people think, feel, and act in general. Workers who may be accustomed to completing relatively mindless tasks will appreciate what they are doing and may come to see their participation as less about making money and more about the experience. Our data, and others’, suggest that a strongly held motivation of workers is to engage in interesting activities in productive ways. Home in on this motivation. Part of the supposed reward for participants in undergrad research pools is the knowledge and insight they gain by participating in research studies. Do the same online – give participants the opportunity to learn about cutting-edge research during the debriefing. Go even further than the traditional debriefing if you can – provide personalized feedback based on their responses in the study. Sites like http://www.outofservice.com/ have received millions of responses just by providing Big 5 personality feedback to people. Your feedback doesn’t even necessarily have to be related to the focus of your research – it just needs to be interesting and informative. If our goal as researchers is to spread psychological knowledge, mturk is a high-volume way to get this done.
            In short, I think there are many simple and creative ways to keep mturk from becoming an online sweatshop. I’m (hopefully) preaching to the choir here: it’s the responsibility of every researcher using the site to adopt an ethical approach to mturk – wield your power wisely. 

Should I trust the data I collect from mturk?
            There are many ways to answer this question. One route researchers have taken is to replicate reliable effects found in the lab on mturk. A fast-growing body of work is showing that turkers think and act a lot like other samples. A number of papers have made more explicit comparisons between turkers and various other types of samples, finding a lot of similarities. Before getting started, I suggest you review this growing literature to learn about the advantages and limitations of mturk compared to other methods. I hope to add a list of mturk-related papers here soon.

What’s the deal with taxes and the IRS on mturk?
            Because you are paying people and acting essentially as a part-time employer, taxes potentially become an issue if you pay any individual worker more than the minimum threshold for IRS reporting, which I believe is $400 a year. This $400 threshold is per worker in a tax year, not the total amount you’re paying out to all workers. I’ve never come remotely close to that threshold, and you probably won’t either. In short, tax reporting should not be an issue. Amazon has a FAQ from the worker perspective on these issues here: https://www.mturk.com/mturk/help?helpPage=worker#tax_why_tax_info

Why do you use surveymonkey? And why doesn’t it have a random assignment feature?
            The short answer to the first question is that surveymonkey is rather easy to use and changing is hard. It also allows unlimited responses for a reasonable price, and multiple researchers can use one account. Regarding random assignment, there isn’t an easy built-in solution, but here’s what I’ve developed. Surveymonkey has a “randomize answer order” feature for multiple choice items, meaning that the answers will appear in a random order below the question. In the question, I instruct participants to “Please choose the number that appears at the top of the list below:”. Then in the list of answer choices I put as many made-up numbers as there are conditions in the experiment. I then use the answer logic feature to link each answer to a different page in the survey corresponding to a condition. For example, say I have two conditions: in one I want people to write about the existential terror of dying, and in the other they write about going to the dentist. Right before the writing pages, I have my random assignment question, where people pick one of two numbers. When they click “next”, the logic takes them either to my existential terror writing page, or it skips that page and takes them to the dental pain page. You can then use the page logic function to link everyone back to your DVs or whatever else you’ve got going on.
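
            For what it’s worth, if you ever run a study from a survey tool or page where you control the code, random assignment stops being a workaround and becomes a one-liner. A minimal sketch, with made-up condition names mirroring the example above:

import random

# Sketch: direct random assignment when you control the survey code,
# instead of the "pick the top number" workaround. Condition names are
# made up to mirror the example above.
CONDITIONS = ["existential_terror_writing", "dental_pain_writing"]

def assign_condition():
    """Return one condition per participant with equal probability."""
    return random.choice(CONDITIONS)

# Record the assignment alongside each participant's responses, e.g.:
print(assign_condition())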

Does mturk keep demographic info on file or do I have to ask?
            You have to ask.

-----------------------------------------

Getting Started – the basics

Visit www.mturk.com and get the lay of the land. Peruse the introductory pages about being a worker and being a requester. Take a look at the “HITs” tab where you can see all the currently available tasks. Clicking on the “Get Started” button on the requester side takes you here -- https://requester.mturk.com/mturk/resources -- which has a lot of business-oriented stuff on it that you can ignore.

Read the Basic Getting Started Guide for Requesters.
From the Resource Center page, under the How To Guides box on the right, click on “Requester Website User Guide”. Here are some extra things to consider:

Making an account:
1.       When creating an account, you might consider making a new account that is separate from an existing personal Amazon.com account (with new email address).
2.       You will also be asked for a Requester name. This name is what Turkers will see, so choose wisely (such as using X Lab rather than your personal name).
3.       Before you are able to post a study (i.e., a HIT/batch), you need to pre-pay for the work the Turkers are about to complete. Once logged in, in the top right click on “Account Settings”. In the bottom left, click on “Prepay for Mechanical Turk HITs”. Punch in how much you want in your Turk account, then you’ll be taken to a billing screen where you enter in credit card information. Ask whoever is in charge of university/department participant payments before doing this – there’s likely a form for you to complete and a receipt to return. MTurk will send you an email confirmation of the pre-payment purchase and you can also find a log under “View Transaction History”.
Designing a HIT Template:
1.       Across the top of the page should be Design, Publish, and Manage tabs. Start with the Design tab. If you want to create a simple survey within MTurk (rather than linking to a different survey site like SurveyMonkey, described below), I would start with the “Survey Template” near the bottom.
2.       The guide contains some potentially confusing tips about how to create multiple HITs within a batch (e.g., if you want people to rate a bunch of images and get paid for each individual image or set of images, MTurk can do that). If you’re looking to do simple surveys or experiments, you can safely ignore those sections.
3.       On the “Enter Properties” page, I tend to make my titles short and include the estimated time it takes to complete (e.g., “5-10 minute survey on self-attitudes”). Then in the description I’ll explain a little more about what the study entails. Enter a bunch of keywords related to your study (survey, experiment, psychology, questionnaire, etc.). “Time allowed per assignment” refers to the amount of time a worker has from the time they click “accept HIT” to when they are allowed to “submit HIT”. I generally give plenty of time here in case people forget to submit after completing the survey (this is more common when you link them to a different site like SurveyMonkey and they forget to navigate back to MTurk and submit).
4.       The criteria functions are important. People from all over the world are on MTurk, so think about who you want to participate in your study. For example, if you put no “location” restrictions on your HIT and leave the HIT available for completion overnight, expect to wake up to a lot of submissions completed by people living in India or somewhere else on the opposite side of the world (if you are from the US, of course). MTurk seems to have caught on in some Asian countries, so as a researcher, ask yourself whether you would like to sample from these countries. In my experience, some foreign Turkers tend to complete surveys quicker and are more likely to skip questions that require typed short-answer responses. This may be because they are more motivated by money than are workers from other countries (although this is certainly an empirical question). If you do not wish to place a location requirement but want to avoid potentially sub-par work, the approval rating function is your best shot. A worker’s approval rating is the number of their submissions that requesters have approved divided by their total submissions. So if a worker has 3 approved submissions and 1 rejected submission, he/she would have a rating of 75%. MTurk recommends a 95% approval rating. I’m not sure how they decided on that number – perhaps it’s p-value inspired? I’ve personally moved the approval rating requirement around between 50 and 99, and at least in my experience, the higher rating requirement seems to slow down the flow of incoming submissions without affecting data quality, though I’ve done no formal test of this.
5.       Payment amount is up to you. There are significant effects of both the estimated time it takes to complete the HIT and the payment amount, so you’ll have to peruse the “market rate” based on what other Requesters are paying at that time. Generally, it’s going to be really cheap…I’ve collected loads of data paying around 10 cents per person for 10 minute studies, which works out to about $0.60/hour (see Buhrmester, Kwang, & Gosling, in press).
6.       On the Design Layout tab, here’s how I generally lay things out…
a.       Title
b.      Study description (can put IRB consent statements here)
c.       Statement about re-posting of the HIT (explained below)
d.      The survey / instructions and link to survey (e.g., SurveyMonkey)
e.      Completion code / comments box (optional)
7.       For c., I have found that data can be collected faster if you re-post your HIT after a day or even a few hours. When you first publish your HIT, it is loaded to the top of the long list of available HITs (if you click on the “HITs” tab from the main page, your freshly published HIT should appear shortly). As other Requesters publish their HITs, they get put on top and yours slides down the list. Apparently, most workers hunt for work from the top down, so fewer eyes will see your HIT after it’s been posted for a while. My solution has been to re-post the HIT, sending it to the top again. The potential issue with this is that there’s no way I know of to disallow people who’ve already completed the HIT from completing it again. To deter would-be duplicate responders, I include a statement that says “This HIT is periodically re-posted. If you’ve already completed this HIT previously, please do not complete it a second time. You will not be compensated a second time.” This statement has worked for me for the most part, though I recommend checking your actual data for duplicate responders (see the sketch just after this list).
8.       If you want to use an outside survey site like SurveyMonkey, I recommend using a completion code system to 1) deter people from accepting and submitting the HIT without having actually gone to SurveyMonkey and completed the study and 2) be able to link MTurk submissions to the SurveyMonkey data. The high-tech way to do completion codes (which I don’t think you can do with SurveyMonkey) is to assign each person a different code at the end of the study, and instruct him/her to enter the same code in a text box on the MTurk page before submitting. The low-tech way that I use is to instruct each participant to make up their own 4 or 5 digit completion code number, enter it on SurveyMonkey, and enter it again on MTurk. If more than one person makes up the same code, I can use the timestamp data from each to figure out who’s who.
9.       When linking to a different site, I’ve found that explicitly asking people to open the page in a new window/tab helps out. If you link them with a hyperlink and it opens in that same window, people have a hard time navigating back to the submission page and will likely e-mail you to complain.
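
As a follow-up to the note on re-posting in 7. above, here is a minimal sketch for catching duplicate responders in the batch results file you download from MTurk. It assumes the standard WorkerId column in the batch CSV; the file name is a placeholder.

import pandas as pd

# Sketch: flag workers who completed a re-posted HIT more than once.
# Assumes MTurk's standard batch results CSV; file name is a placeholder.
batch = pd.read_csv("Batch_results.csv")

dupes = batch[batch.duplicated(subset="WorkerId", keep=False)]
if dupes.empty:
    print("No duplicate responders found.")
else:
    print("Submissions per duplicate responder:")
    print(dupes["WorkerId"].value_counts())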

Publishing your HIT
1.       This should be pretty straightforward. Double-check that everything looks right and you’re ready to make your HIT available to turkers. Make sure you’ve got enough money in your account to cover the # of responses × the payment per response.

Managing your HIT
1.       I have to admit, the first time I collected data on MTurk, I was glued to the screen, watching the green bar tick up as the submissions came in in real time. If you’re linking to an outside survey site, you’ll want to keep tabs on how many people have submitted on MTurk versus how many have completed the survey on your site.
2.       As soon as submissions come in, you can individually review them and approve or reject payment. Deciding when to reject payment can potentially be tricky – you’ll definitely want to speak with your IRB about what circumstances are appropriate. You can also approve or reject payment en masse. Note that under the design tab, you enter a cutoff time after which payment is automatically awarded.
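
If you’d rather script this step than click through the web interface, approval and rejection are single API calls per assignment. A rough sketch with the Python boto3 client (newer tooling than this guide); the assignment ID is a placeholder, and real IDs appear in your batch results file.

import boto3

# Sketch: approving or rejecting submissions via the API rather than the
# web interface. The assignment ID below is a placeholder.
mturk = boto3.client("mturk", region_name="us-east-1")

# Approving pays the worker.
mturk.approve_assignment(
    AssignmentId="3EXAMPLEASSIGNMENTID",
    RequesterFeedback="Thanks for participating!",
)

# Reject only under circumstances your IRB has approved, with a clear reason:
# mturk.reject_assignment(
#     AssignmentId="3EXAMPLEASSIGNMENTID",
#     RequesterFeedback="Completion code did not match any survey record.",
# )
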
      Another good resource for starting up is this best practices guide: http://mturkpublic.s3.amazonaws.com/docs/MTURK_BP.pdf

Other Links:
Basic Resource Center for Requesters -- https://requester.mturk.com/mturk/welcome

Advanced “how-to” guide by Amazon (good if you have programming knowledge and want to do more complex stuff) -- http://docs.amazonwebservices.com/AWSMechTurk/latest/AWSMechanicalTurkGettingStartedGuide/

FAQs by Amazon -- https://requester.mturk.com/mturk/help?helpPage=main

Some meta-data on the mturk world -- http://mturk-tracker.com/about/

A recent newsfocus report in Science by John Bohannon about mturk in the social science world -- http://www.sciencemag.org/content/334/6054/307.short

An excellent in-depth guide by Winter Mason & Siddharth Suri can be found here -- http://www.mendeley.com/research/guide-conducting-behavioral-research-amazons-mechanical-turk/

Mturk blogs:

Panos Ipeirotis’ excellent blog on many mturk issues: http://www.behind-the-enemy-lines.com/

Gabriele Paolacci & Massimo Warglien’s excellent blog that also touches on many mturk issues:
http://experimentalturk.wordpress.com/

A post on pros & cons of mturk for science: http://blogs.scientificamerican.com/guilty-planet/2011/07/07/the-pros-cons-of-amazon-mechanical-turk-for-scientific-surveys/

Feel free to contact me with any mturk related advice, updates, news, etc. I’m happy to add and collect as many mturk related resources here as possible.  Happy turking.
