Thursday, 14 May 2015

Rules for writing software to run experiments

Most economic lab experiments are programmed in zTree, a language/server designed for the purpose by the formidable Urs Fischbacher. But zTree requires a special client, so it can't run over the web, and in other ways it is showing its age a bit. As a result, many people are trying to write new software for running experiments, both in the lab and on the web.

This list of rules comes from my experience writing betr, the behavioural economics toolkit for R. If you know R and want to try betr out, there is an introduction here. Some of these rules are things I got right... others I have learned by bitter experience.

Allow your users to write experiments in a natural idiom

Here is how I think about an ultimatum game.

1      When all subjects are ready in front of their computers,
2      the instructions are read out.
3      Then subjects are divided randomly into groups of 2.
4      In each group:
5          One person is randomly chosen to be the proposer. The other is the responder.
6          The proposer chooses an offer x between 0 and 10 pounds.
7          The responder sees this offer x and accepts or rejects it.
8      When all groups are done, profit is calculated (0 if offer was rejected; otherwise responder x, proposer 10 minus x)
9      All subjects are shown their profit.
10     After a questionnaire and payment, subjects leave.

This is a clear sequence of instructions. Some are performed in parallel for different groups of subjects. There are places where people have to wait (When all subjects are ready... When all groups are done...).

When I program an ultimatum game, I want to write something like this. If you ask me to write a set of web pages, or create an experiment by object-oriented inheritance (you wot?) then it will be hard for me to understand how my program relates to my design.

This is the most important point. If you haven't done this, you haven't helped me – after all, given enough expertise I could write my program in any language out there. Make it easy for the experimenter!

zTree gets this right: experiments are sets of stages which run programs before displaying some forms to the user. betr gets it fairly right too, I think.
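To make "natural idiom" concrete, here is a minimal sketch in Python of the ultimatum game above, written so that the code reads in the same order as the design. It is not betr or zTree code. The names `ultimatum_session`, `decide_offer` and `decide_response` are made up for illustration, and the two callables stand in for whatever the subjects type into their screens.

```python
import random


def ultimatum_session(subjects, decide_offer, decide_response, pie=10, rng=None):
    """Run one ultimatum game and return a dict of subject -> profit.

    decide_offer(subject) and decide_response(subject, offer) are
    hypothetical stand-ins for real subject input.
    """
    if len(subjects) % 2 != 0:
        raise ValueError("need an even number of subjects")
    rng = rng or random.Random()

    # Subjects are divided randomly into groups of 2.
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = [shuffled[i:i + 2] for i in range(0, len(shuffled), 2)]

    profit = {}
    for group in groups:
        # One person is randomly chosen to be the proposer.
        proposer, responder = rng.sample(group, 2)
        # The proposer chooses an offer x between 0 and the pie.
        x = decide_offer(proposer)
        # The responder sees x and accepts or rejects it.
        accepted = decide_response(responder, x)
        # Profit: 0 if rejected; otherwise responder x, proposer pie - x.
        profit[proposer] = pie - x if accepted else 0
        profit[responder] = x if accepted else 0
    return profit
```

A real toolkit also has to handle the waiting ("when all groups are done") and the screens, but the point is that the experimenter's code can keep this shape.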

Make the server persistent. Don't use a scripting language that exits after each request

Lots of people think "Oh, an experiment is just a website! I'll write it in PHP." This is a big mistake.

An experiment is not a collection of web pages which subjects can visit at will. It is a defined sequence of events, controlled by the experimenter, which the subjects must go through.

A typical setup for a website is: a web server such as Apache accepts HTTP requests from the clients. It responds to each request by starting a script, which gets info from the client - e.g. a form submission - passes some HTML back to the server, and exits. The server passes the HTML back to the client.

Doing things this way is fine for most websites, but it will cause you two problems.

First, all of your files will start like this (pseudocode):
// ok where are we?
s = get_session_from_cookie()
// have we done page 1 yet?
if (s.page_one_complete == FALSE) redirect_to("pageone")
// great! let's go... wait, what if they've used the back button from page 3?
if (s.page_three_complete == TRUE) {
    error = "You mustn't use the back button, you cheater!"
}
// ready now...
// wait, what if the other subject hasn't finished?
other_subject = s.other_subject()
if (other_subject.page_one_complete == FALSE) {
    come_back_to = "pagetwo"
}

Et cetera. When you have many stages in your experiment, this rapidly becomes an unreadable mess.

This problem can be mitigated by using a nice library. Unfortunately, the next problem is much nastier. Here's some pseudocode for a Dutch auction. The price goes down and the first person to click "buy" gets the object.
price = starting_price - (now() - start_time) * step
if (user_clicked_buy()) {
    if (object_bought == FALSE) {
        object_bought = TRUE;
        user.profit = object_value - price;
    }
}
Looks fine?

HOOONK. Race condition.

A race condition happens when two bits of code execute at the same time and the outcome depends on which one gets there first. Sometimes two users will click buy at almost the same time. These scripts will then execute in parallel. The first one will get to object_bought and check that it is FALSE. Shortly afterwards, the second one will get to object_bought and it will still be FALSE. Then, the first script will buy the object, setting object_bought to TRUE, but too late: the second script has bought it as well!

Now you have two users who bought the object; the rest of your code, which assumes only one user has bought the object, is broken; you're going to have to pay twice; and you've misled your participants – bye bye, publication.

The good news: this will happen only rarely. The bad news: rarely enough that you never catch it in testing.

You want a single server process that decides where your subjects are and what they can see, and that is running from the start to the end of the session.
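As a rough illustration of why a single process helps, here is a Python sketch of the Dutch auction where all clicks go into one queue and one server loop handles them strictly in turn. The class and function names are invented for this example, and the threads only stand in for simultaneous browser requests.

```python
import queue
import threading


class DutchAuction:
    """Auction state owned by the single server loop (hypothetical names)."""

    def __init__(self, object_value):
        self.object_value = object_value
        self.winner = None
        self.profit = None

    def handle_buy(self, user, price):
        # Check-then-set is safe here because only the server loop calls
        # this method, one message at a time.
        if self.winner is not None:
            return False
        self.winner = user
        self.profit = self.object_value - price
        return True


def run_server(auction, inbox):
    """Process queued clicks one after another until a None sentinel arrives."""
    results = {}
    while True:
        message = inbox.get()
        if message is None:
            return results
        user, price = message
        results[user] = auction.handle_buy(user, price)


def simulate(n_users=50):
    """Many client threads click at once; the server still sells only once."""
    inbox = queue.Queue()
    auction = DutchAuction(object_value=20)
    clickers = [
        threading.Thread(target=inbox.put, args=((f"user{i}", 12),))
        for i in range(n_users)
    ]
    for t in clickers:
        t.start()
    for t in clickers:
        t.join()
    inbox.put(None)  # sentinel: no more clicks
    return run_server(auction, inbox), auction
```

However the clicks interleave on their way in, exactly one call to `handle_buy` returns `True`, because the check and the update are never interrupted by another click.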

zTree and betr both get this right.

Use the web browser as a client

I think this is a no-brainer. If you are running online experiments, it's the obvious choice – every device has a web browser. Even if you are in the lab, you can still run web browsers on the subject computers.

Web browsers display HTML. HTML is incredibly powerful and versatile. Video chat. UI toolkits. Angry Birds. You get all these possibilities for free, often in easy-to-use components that can be copied and pasted. You can even build HTML pages with a GUI.

Most experiments are simple: some instructions, an input form, a result. But sometimes experimenters want to do more, and if you are designing for participants recruited from the web, you want your UI to be as easy as possible because, unlike in the lab, you do not have a captive audience. So, make the power of HTML available. Of course, it should also be easy to whip up a quick questionnaire or form without knowing HTML.

betr ✔ zTree ✘.

Don't use HTTP for communication

Web browsers display pages using HTML, the HyperText Markup Language. They communicate with web servers using HTTP, the HyperText Transfer Protocol. You should not do the same, however!

The reason is: HTTP is driven by requests from the client, to which the server responds. But often, experiments need to be pushed forward from the server. For example, think of a public goods game with a group size of four. Until there are four people, subjects see a waiting page. When the fourth subject joins, you want all four subjects to move forward into the first round. The server must push clients 1, 2 and 3 forward.

If you try to do this with HTTP, you will have some sort of polling where clients regularly send requests to the server, which responds "keep waiting" until it is ready. This is a horrible kludge. You are reliant on your clients to keep polling (but maybe some guy closed the browser window and you don't know that!). You have to manually work out when everyone has moved on, which means keeping track of state. Argh.

Another example: suppose you have a market. The server takes bids and offers and calculates prices. Whenever the price changes, you want all clients to update. With HTTP alone, this is going to be impossible without the same polling kludge.

Luckily, modern browsers have a new technology called websockets, which allows two-way communication between client and server. This is probably the way to go. Load a basic HTML page which connects to your server using a websocket. Then send new HTML pages and other updates via the websocket.
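Here is a small Python sketch of the server-push idea, using the public goods waiting room as the example. To keep it self-contained it uses no websocket library: each client's connection is represented by an `asyncio.Queue` that the server writes to, which is an assumption of this sketch. With real websockets, the `put` calls would become sends on each client's open socket. `WaitingRoom` and the message formats are invented for illustration.

```python
import asyncio


class WaitingRoom:
    """Holds clients until a full group has arrived, then pushes them on."""

    def __init__(self, group_size=4):
        self.group_size = group_size
        self.waiting = []  # (client_id, outbox) pairs not yet in a group

    async def join(self, client_id, outbox):
        self.waiting.append((client_id, outbox))
        if len(self.waiting) < self.group_size:
            # Not enough people yet: this client sees the waiting page.
            await outbox.put("waiting")
            return
        # The group is complete: the server pushes every member forward,
        # including those who have sent nothing since they joined.
        group, self.waiting = self.waiting, []
        members = [cid for cid, _ in group]
        for _, box in group:
            await box.put(("start", members))


async def demo():
    """Four clients join in turn; return the messages each one received."""
    room = WaitingRoom(group_size=4)
    outboxes = {cid: asyncio.Queue() for cid in range(1, 5)}
    for cid, box in outboxes.items():
        await room.join(cid, box)
    received = {}
    for cid, box in outboxes.items():
        messages = []
        while not box.empty():
            messages.append(box.get_nowait())
        received[cid] = messages
    return received
```

Clients 1, 2 and 3 never ask "are we there yet?": the start message simply arrives when client 4 joins.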

zTree obviously gets this right by not using HTTP. Sadly, betr gets it wrong. Oliver Kirchkamp pointed this out to me. I'm thinking about how to fix it in future versions.

Make testing and debugging easy

Testing and debugging zTree is not much fun. It involves manually opening one window for each client, and running through the experiment. Then doing it again. And again. And again.

But testing is really important. If you screw up in a real session, you instantly waste maybe £500 in experimental payouts. Worse still, if your experiment didn't do what you said it was going to, your subjects may not believe experimental instructions in future.

And, you typically need to test a lot, because you want to test distributions of outcomes. For example, suppose in your design, subjects are randomly rematched. You want to make sure that subjects are never rematched with the same partners. This requires testing not once, but many times!

Failure to test leads to problems. Here is a zTree code snippet that is around on the internet, e.g. here. It lets you pick a random number uniformly between 1 and max.
rand = round(random()*(max-1), 1) + 1
There is just one problem: this snippet is wrong. It does not create a uniform distribution. 1 and max will be half as likely as the numbers in between. If experimenters test that their random number is really uniformly distributed, they will catch this bug. Otherwise, they will have bad code, and perhaps mistaken results.
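The bias is easy to see by simulation. The Python sketch below mimics the snippet, assuming that `round(x, 1)` rounds to the nearest whole number as the description above implies, and compares it with the usual floor-based construction. The function names are made up for this example.

```python
import math
import random
from collections import Counter


def biased_draw(rng, max_value):
    # Mimics round(random()*(max-1), 1) + 1: round to the nearest integer,
    # then shift up by one. Only half an interval rounds to each endpoint.
    return math.floor(rng.random() * (max_value - 1) + 0.5) + 1


def uniform_draw(rng, max_value):
    # Cut [0, 1) into max_value equal slices, one per outcome.
    return math.floor(rng.random() * max_value) + 1


def frequencies(draw, max_value, n=100_000, seed=1):
    """Empirical frequency of each outcome 1..max_value over n draws."""
    rng = random.Random(seed)
    counts = Counter(draw(rng, max_value) for _ in range(n))
    return [counts[k] / n for k in range(1, max_value + 1)]
```

With `max_value=5`, `frequencies(biased_draw, 5)` comes out near 0.125 for 1 and 5 and near 0.25 for 2, 3 and 4, while `frequencies(uniform_draw, 5)` is near 0.2 everywhere.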

A good way to test is to create "robots" that can play your experiment automatically. This requires some clever design. Nice features would include
  • printing out the HTML pages sent to subjects
  • being able to mix robots and real users (so you can change one user's actions while keeping others the same)
  • creating robots easily from records of previous play
All of this should be provided by good modern experiment software.
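A minimal Python sketch of what such robots could look like, using a toy public goods game. This is not betr's API: `robot_from_record` and `run_public_goods` are hypothetical names, a "player" is just a callable that takes the page text and returns a contribution, and the pages are plain strings rather than HTML.

```python
def robot_from_record(actions):
    """Build a robot that replays a recorded list of actions in order."""
    remaining = iter(actions)

    def robot(page):
        return next(remaining)

    return robot


def run_public_goods(players, rounds=2, endowment=10, log=None):
    """Play a toy public goods game; players maps subject id -> callable.

    Each round the pot is doubled and shared equally. If log is a list,
    every page sent to a subject is appended to it for inspection.
    """
    payoffs = {sid: 0.0 for sid in players}
    for r in range(1, rounds + 1):
        contributions = {}
        for sid, act in players.items():
            page = f"Round {r}: choose a contribution between 0 and {endowment}"
            if log is not None:
                log.append((sid, page))
            contributions[sid] = act(page)
        share = 2 * sum(contributions.values()) / len(players)
        for sid, c in contributions.items():
            payoffs[sid] += endowment - c + share
    return payoffs
```

Because a robot and a real user have the same shape, you can rebuild all players from a record of previous play and then swap one of them for a different callable (a live user's input handler, say) while the others repeat exactly what they did before.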

betr gets a partial ✔. There is a nice replay() function which can do a lot of this stuff. zTree ✘.

Experiment sessions should be idempotent

This is related to the previous point. It also helps for crash recovery. Idempotency means that if it gets the same inputs, an experiment should produce the same results. Wait, don't we want to do a lot of randomization? Well, yes, but we also want to be able to recover from crashes, and to replay experiments exactly, by replaying commands from the clients. Among other things, this means that everything your clients do should be stored on disk so you can replay it. You should also take care to store the seeds used for random numbers.
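A Python sketch of the idea, under simple assumptions: the session writes its seed and every client command to a log file before acting on it, and all randomness comes from one seeded generator. Feeding the log back in then rebuilds exactly the same state. The `Session` class, the command format and the "bonus" rule are invented for this example.

```python
import json
import os
import random
import tempfile


class Session:
    """Toy session whose state depends only on the seed and the commands."""

    def __init__(self, seed, log_path=None):
        self.rng = random.Random(seed)
        self.log_path = log_path
        self.state = {"draws": [], "total": 0}
        if log_path is not None:
            with open(log_path, "w") as f:
                f.write(json.dumps({"seed": seed}) + "\n")

    def apply(self, command):
        # All randomness comes from self.rng, so it can be reproduced.
        bonus = self.rng.randint(1, 6)
        self.state["draws"].append(bonus)
        self.state["total"] += command["amount"] + bonus

    def handle(self, command):
        # Store the command on disk first, then act on it.
        with open(self.log_path, "a") as f:
            f.write(json.dumps(command) + "\n")
        self.apply(command)


def replay_log(log_path):
    """Rebuild a session from its log: first line is the seed, rest commands."""
    with open(log_path) as f:
        lines = [json.loads(line) for line in f]
    session = Session(lines[0]["seed"])
    for command in lines[1:]:
        session.apply(command)
    return session


def demo():
    """Run a short live session, then replay it from disk."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "session.log")
        live = Session(seed=2015, log_path=path)
        for amount in (3, 7, 5):
            live.handle({"subject": "s1", "amount": amount})
        replayed = replay_log(path)
        return live.state, replayed.state
```

The same mechanism covers crash recovery: restart the server, replay the log, and carry on from where the session stopped.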

Both betr ✔ and zTree encourage this. zTree gets a bigger tick because (I believe) it enforces it.

Let experiments be written in an existing language, don't create your own

This is the same principle as "use the web browser as a client". It's about giving your experimenters tools. zTree can do a lot, but you can't (for example) define your own functions. So, if you want to repeat a bit of code... you have to copy and paste it. A good general purpose language gives the user access to many many libraries that do useful things.

Here betr ✔ wins against zTree ✘. On the face of it, R is a strange choice of language to write a web platform in! But it has a lot of power, and many academics use it already. Python would be another good choice. (PHP would not.)

It must be easy to share data with clients

An awesome feature of zTree is that if you write a piece of user interface code which uses a variable X, and X changes on the server, that is automatically reflected on all clients' screens. Doing this is hard, but very useful: think about displaying prices in a market experiment.

zTree ✔ betr ✘.

There are many alternatives to zTree out there. (Some of the most interesting: oTree, moblab, sophie, boxs). I expect that soon, some of them will start to get traction and be used more widely. I look forward to a wider choice of powerful, easy software to run experiments!