Category Archives: Game Theory

The ways roads work as part of a user-interface

If you start thinking of the way roads work as part of the user-interface for driving, then it is straightforward to see why occasional one-way streets are problematic. Consider how clickable things worked in Windows 7: sometimes, to get something to ‘go’ in the relevant sense, you would click twice on the thing; other times, you had to click just once. This caused significant confusion, because it created an inconsistent interface. Similarly, if roads usually work one way but occasionally switch to another, it causes the same kind of confusion and frustration for the user (i.e., the driver).

Poorly designed stove tops

A couple months ago, I started using a new stove. Instead of the traditional electric coil-top, it features a glass top.

The basic interface isn’t that complicated. Corresponding to the four circular burners, there are four knobs that control the heat.

This is where the first major design error lies. The back-left burner is smaller than the front-left one, and it sits further to the left than the front-left one.

Naturally, therefore, one would expect the left-most dial to control the back-left burner, and the second-leftmost dial to control the front-left one.

This is not so – instead, these are reversed. The situation is similar on the right side of the stove.

The second obvious design error involves the indicator lights. There is one light on the left of the stove top that indicates that a heating area is hot. There is another light on the right of the stove top that indicates that a burner is on. The only things that distinguish these lights are their position (left or right side of the stove) and the small text next to each: the left one says “Hot Surface,” the right one “Element On.”

The first problem is that it’s intuitive to think that a given light indicates something about the burners on its side. The second problem is that the lights are the same colour. Since the text is small, the only way to keep them straight at a distance is to remember that one is on the left side and the other on the right, which is not easily remembered. An obvious solution to this design problem is to colour-code the lights: perhaps yellow and red could be used to distinguish them.

In a way, it’s baffling that these basic elements of the stove user interface could have been gotten wrong.

Economic arguments

The non-monetized economy involves more value than the monetized economy – substantially more value. Many arguments are made in terms of ‘economic’ – i.e., monetized economic – benefit or detriment. It follows that this sort of argument will be incomplete, and often will be significantly incomplete – i.e., most of the loss or gain following from such-and-such policy may be in the non-monetized economy.

The problem with the non-monetized economy is that it is difficult to quantify. The monetized economy is relatively easy to quantify (measure the money being exchanged). If there were a way to estimate and quantify the non-monetized economic impact in a comparable way, it seems that responding to monetized economic arguments would become easier.

Newcomb’s Paradox: A Solution Using Robots

Newcomb’s Paradox is a situation in decision theory where the principle of dominance conflicts with the principle of expected utility. This is how it works:

The player can choose to take both box A and box B, or just take box B. Box A contains $1,000. Box B contains nothing or $1,000,000. If the Predictor believes that the player will take both boxes, then the Predictor puts $0 in box B. If the Predictor believes that the player will take just B, then the Predictor puts $1,000,000 in box B. Then the player chooses. The player doesn’t know whether the Predictor has put the $1,000,000 in box B or not, but knows that the Predictor is 99% reliable in predicting what the player will do.

Dominance reasoning says for the player to take both boxes. Here’s why:

If the Predictor predicted that the player will choose just one box, then if the player picks just box B the player gets $1,000,000, but if the player picks both boxes the player gets $1,001,000. $1,001,000 > $1,000,000, so in this case the player should pick both boxes.

If the Predictor predicted that the player will choose both boxes, then if the player picks just box B the player gets $0, but if the player picks both boxes, the player gets $1,000. $1,000 > $0, so in this case the player should pick both boxes.

So, no matter what the Predictor did, the player is better off choosing both boxes. Therefore, says dominance reasoning, the player should pick both boxes.

Expected utility reasoning, however, says for the player to take just box B:

If the player picks both boxes, expected utility is 0.99*$1,000 + 0.01*$1,001,000 = $11,000. If the player picks just box B, expected utility is 0.99*$1,000,000 + 0.01*$0 = $990,000. Expected utility is (much) higher if the player picks just box B.
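To make these two calculations concrete, here is a minimal standalone sketch (plain C++, separate from the robot model code at the bottom of this post) that tabulates the four possible payoffs, checks the dominance margins, and computes the two expected utilities using the 99% reliability figure:

#include <cstdio>

int main()
{
    // The four possible payoffs, in dollars.
    const double boxA = 1000.0;
    const double boxBIfPredictedOneBox = 1000000.0;

    double bothIfPredictedOne  = boxA + boxBIfPredictedOneBox; // $1,001,000
    double oneIfPredictedOne   = boxBIfPredictedOneBox;        // $1,000,000
    double bothIfPredictedBoth = boxA;                         // $1,000
    double oneIfPredictedBoth  = 0.0;                          // $0

    // Dominance: whatever the Predictor did, taking both boxes is better
    // by exactly the $1,000 in box A.
    printf("Dominance margin if Predictor expected one box: $%.0f\n",
           bothIfPredictedOne - oneIfPredictedOne);
    printf("Dominance margin if Predictor expected both:    $%.0f\n",
           bothIfPredictedBoth - oneIfPredictedBoth);

    // Expected utility, using the Predictor's 99% reliability.
    const double reliability = 0.99;
    double euBoth  = reliability * bothIfPredictedBoth
                   + (1.0 - reliability) * bothIfPredictedOne;  // $11,000
    double euJustB = reliability * oneIfPredictedOne
                   + (1.0 - reliability) * oneIfPredictedBoth;  // $990,000
    printf("Expected utility, both boxes: $%.0f\n", euBoth);
    printf("Expected utility, just box B: $%.0f\n", euJustB);
    return 0;
}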

The problem is called a ‘paradox’ because two decision making processes that both sound intuitively logical give conflicting answers to the question of what choice the player should make.

This description of Newcomb’s Paradox is actually ambiguous in certain respects. First, how does the Predictor predict? If you don’t have any idea, it could be difficult to figure out what’s going on here. The second (and related) ambiguity is how the player can choose. Can they choose randomly, for example? (If they choose in a completely random way, it is difficult to understand how the Predictor predicts correctly most of the time.)

Instead of addressing the ambiguous problem above, I decided to create a model of the situation that clarifies the exact mechanics. This model, then, might not address certain issues others have dealt with in the original problem, but it adheres to the general parameters above. Any solutions derived from the model apply to at least a subset of the formulations of the problem.

It is difficult to create a model with humans, because humans are too complex. That is, it is very difficult to predict human behaviour on an individualized basis.

Instead, I created a model involving robot agents, both player and Predictor.

This is how the model works (code at bottom of post):

time = 1

Player is either Defiant Dominance (DD) or Defiant Expected Utilitarian (DE). What this means is that

if player is DD, then % chance player picks both boxes = 99%.

if player is DE, then % chance player picks just box B = 99%.

time = 2

The Predictor checks the player’s state:

if player is DD, then Predictor puts no money in box B

if player is DE, then Predictor puts $1,000,000 in box B

time = 3

Then the player plays, based on its state as either DD or DE, as described above.

It follows that the Predictor will get it right about 99% of the time in a large trial, and that the DE (the player that consistently picks the expected utility choice) will end up much wealthier in a large trial.

Here are some empirical results:

trials = 100, average DD yield = $1,000, average DE yield = $1,000,000

trials = 10,000, average DD yield = $990.40, average DE yield = $1,000,010.20

trials = 100,000, average DD yield = $990.21, average DE yield = $1,000,010.09

Yet, to show the tension here, you can also imagine that the player is able to magically switch to dominance reasoning before selecting a box. This is how much the players lost by not playing dominance (same set of trials):

trials = 100, total DD lost = $0, total DE lost = $100,000

trials = 10,000, total DD lost = $96,000, total DE lost = $9,898,000

trials = 100,000, total DD lost = $979,000, total DE lost = $98,991,000

What this shows is that dominance reasoning holds at the time the player chooses. Yet, the empirical results for yield for the kind of player (DD) that tends to choose dominance reasoning are abysmal (as shown in the average yield results earlier). This is the tension in this formulation of Newcomb’s Paradox.

What is clear, from looking at the code and considering the above, is that the problem isn’t with dominance reasoning at time = 3 (i.e., after the Predictor makes his prediction). A dominance choice always yields a better result than an expected utility choice, in a given environment.

The problem, rather, is with a player being a DD kind of player to begin with. If there is a DD player, the environment in which a player chooses becomes significantly impoverished. For example, here are the results for total rewards at stake (same trials):

trials = 100, total with DD = $100,000, total with DE = $100,100,000

trials = 10,000, total with DD = $10,000,000 ($10M), total with DE = $10,010,000,000 (> $10B)

trials = 100,000, total with DD = $100,000,000 ($100M), total with DE = $100,100,000,000 (> $100B)

DD is born into an environment of scarcity, while DE is born into an environment of abundance. DE can ‘afford’ to consistently make suboptimal choices and still do better than DD because DE is given so much in terms of its environment.

Understanding this, we can change how a robot becomes a DE or a DD. (Certainly, humans can make choices before time = 2, i.e., before the Predictor makes his prediction, that might be relevant to their later choice at time = 3.) Instead of simply being assigned to DD or DE, at time = 1 the robot can make a choice using the reasoning as follows:

if expected benefits of being a DE type > expected benefits of being a DD type, then type = DE, otherwise type = DD
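Here is a sketch of what that time = 1 choice could look like in code. This extension is not part of the code at the bottom of the post; the expected-benefit figures simply reuse the payoffs and the 99%/1% behaviour defined above:

// Hypothetical extension of the model: at time = 1 the robot chooses its
// own type by comparing the expected benefit of each type over one trial.
enum PlayerType { DefiantDominance, DefiantExpectedUtilitarian };

PlayerType ChooseTypeAtTimeOne()
{
    // A DE gets a filled box B; it takes just box B 99% of the time and
    // both boxes 1% of the time.
    double expectedIfDE = 0.99 * 1000000.0 + 0.01 * 1001000.0; // ~$1,000,010

    // A DD gets an empty box B; it takes both boxes 99% of the time and
    // just (empty) box B 1% of the time.
    double expectedIfDD = 0.99 * 1000.0 + 0.01 * 0.0;          // ~$990

    return expectedIfDE > expectedIfDD ? DefiantExpectedUtilitarian
                                       : DefiantDominance;
}

The comparison always comes out in favour of DE: it is the expected utility reasoning from earlier, applied to the type choice rather than the box choice.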

This does not speak directly to the rationality of dominance reasoning at the moment of choice at time = 3. That is, if a DE robot defied the odds and picked both boxes on every trial, it would do better (by $1,000 per trial) than a DE robot that picked only box B on every trial. (Ideally, of course, the player could choose to be a DE, then magically switch at the time of choice. This, however, contravenes the stipulation of the thought experiment, namely that the Predictor accurately predicts.)

By introducing a choice at time = 1, we now have space in which to say that dominance reasoning is right for the choice at time = 3, but something that agrees with expected utility reasoning is right for the choice at time = 1. So, we have taken a step towards resolving the paradox. We still, however, have a conflict at time = 3 between dominance theory and expected utility theory.

If we assume for the moment that dominance reasoning is the rational choice for the choice at time = 3, then we have to find a problem with expected utility theory at time = 3. A solution can be seen by noting that there is a difference between what a choice tells you (what we can call ‘observational’ probability) and what a choice will do (what we can call ‘causal’ probability).

We can then put the second piece of the puzzle into place by noting that observational probability is irrelevant for the player at the moment of a choice. Expected utility theory at time = 3 is not saying to the player “if you choose just box B then that causes a 99% chance of box B containing $1,000,000” but rather in this model “if you chose (past tense) to be a DE then that caused a 100% chance of box B containing $1,000,000 and also caused you to be highly likely to choose just box B.” I.e., expected utility theory at time = 3 is descriptive, not prescriptive.

That is, if you are the player, you must look at how your choice changes the probabilities compared to the other options. At t = 1, a choice to become a DE gives you a 100% chance of winning the $1,000,000, while a choice to become a DD gives you a 0% chance of winning the $1,000,000. At t = 3, the situation is quite different. Picking just box B does not cause the chances to be changed at all, as they were set at t = 2. To modify the chances, you must make a different choice at t = 1.

Observational probability, however, still holds at t = 3 in a different way. That is, someone looking at the situation can say “if the player chooses just box B, then that tells us that there is a very high chance there will be $1,000,000 in that box, but if they choose both boxes, then that tells us that there is a very low chance that there will be $1,000,000 in that box.”
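To put numbers on the observer’s position, suppose (purely for illustration) that the observer thinks the player is equally likely to be a DD or a DE. Bayes’ rule then gives the chance that box B contains the $1,000,000, given the observed choice:

#include <cstdio>

int main()
{
    // Assumed prior (an assumption for this illustration): the observer
    // considers the two player types equally likely.
    const double priorDE = 0.5, priorDD = 0.5;

    // In the model, a DE picks just box B 99% of the time, a DD 1% of the
    // time, and box B contains the million exactly when the player is a DE.
    const double pJustBGivenDE = 0.99, pJustBGivenDD = 0.01;

    double pJustB = pJustBGivenDE * priorDE + pJustBGivenDD * priorDD;
    double pMillionGivenJustB = pJustBGivenDE * priorDE / pJustB;       // 0.99

    double pBoth = (1.0 - pJustBGivenDE) * priorDE
                 + (1.0 - pJustBGivenDD) * priorDD;
    double pMillionGivenBoth = (1.0 - pJustBGivenDE) * priorDE / pBoth; // 0.01

    printf("P(million in B | picks just B) = %.2f\n", pMillionGivenJustB);
    printf("P(million in B | picks both)   = %.2f\n", pMillionGivenBoth);
    return 0;
}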

Conclusion:

So, what is the Paradox in Newcomb’s Paradox? At first, it seems like one method, dominance, contravenes another, expected utility. On closer inspection, however, we can see that dominance is correct and expected utility is also correct, each in its own domain.

First, there are two different decisions that can be made, in our model, at different times (time = 1 and time = 3).

player acts rationally at time = 1, choosing to become a DE

player thereby causes Predictor to create environmental abundance at time = 2

but player also thereby causes player to act irrationally at time = 3, choosing just box B

The benefits of acting rationally at time = 1 outweigh the benefits of acting rationally at time = 3, so “choosing just box B” is rational in so far as that phrase is understood as meaning to choose at time = 1 to be a DE, which in turn leads with high probability to choosing just box B.

Second, there are two different kinds of probability reasoning that are applicable: causal reasoning for agents (the player in this case), on the one hand, and expected utility for observers, on the other. Causal reasoning says at time = 1 to choose to be a DE, and at time = 3 to choose both boxes.

At neither time does causal reasoning conflict with dominance reasoning. Expected utility reasoning is applicable for observers of choices, while causal reasoning is applicable for agents making the choices.

Therefore, Newcomb’s Paradox is solved for the limited situation as described in this model.

Applying this to humans: For a human, there is probably no ‘choice’ to be a DD or DE at time = 1. Rather, there is a state of affairs at time = 1, which leads to their choice at time = 3. This state of affairs also causes the Predictor’s prediction at time = 2. The question of “free will” obscures the basic mechanics of the situation.

Since a human’s choice at t = 3 is statistically speaking preordained at time = 2 (per the stipulations of the thought experiment) when the Predictor makes his choice, all the human can do is make choices earlier than t = 2 to ensure that they in fact do pick just box B. How a human does this is not clear, because human reasoning is complex. This is a practical psychological question, however, and not a paradox.

Notes:

One lesson I took from solving Newcomb’s Paradox is that building a working model can help to ferret out ambiguities in thought experiments. Moving to a model using robots, instead of trying to think through the process first-person, helped significantly in this respect, as it forced me to decide how the players decide, and how the Predictor predicts.

Creating a model in this case also produced a simpler version of the problem, which could be solved first. Then, that solution could be applied to a more general context.

It only took a few hours to get to the solution. The first part, that there is a potential choice at time = 1 that agrees with expected utility reasoning, came first; then came the realization that what matters for the player is how their choice causally changes the situation. After thinking I had solved it, I checked the Stanford Encyclopedia of Philosophy, and indeed, something along these lines is the consensus solution to Newcomb’s Paradox. (Some people debate whether causality should be invoked, because in certain kinds of logic it is more parsimonious not to include it, and there are debates about the exact kind of causal probability reasoning that should be used.) The answer given here could be expanded upon in terms of developing an understanding of the agent and observer distinction, and in terms of just what kind of causal probability theory should be used.

Code:

UnicodeString NewcombProblem()
{
    int trialsCount = 100000;

    enum
    {
        DefiantDominance,           // i.e., likes to use dominance reasoning
        DefiantExpectedUtilitarian
    } playerType;

    // setup which type of player for this trial
    // time = 1;

    //playerType = DefiantExpectedUtilitarian;
    playerType = DefiantDominance;

    double totalPlayerAmount = 0.0;
    double totalPlayerAmountLost = 0.0;
    double totalAmountAtStake = 0.0;
    int timesPredictorCorrect = 0;

    for (int trialIdx = 0; trialIdx < trialsCount; trialIdx++)
    {
        // Predictor makes his decision
        // time = 2;

        bool millionInBoxB = false;
        if (playerType == DefiantExpectedUtilitarian)
            millionInBoxB = true;

        // player makes their decision
        // time = 3;

        double chancePicksBoth = playerType == DefiantDominance ? 99 : 1;

        // now results ...
        // time = 4;

        // THOccurs is a helper (defined elsewhere) that returns true with
        // the given percentage chance
        bool picksBoth = THOccurs(chancePicksBoth);

        // now tabulate return; if !millionInBoxB and !picksBoth, gets $0
        if (millionInBoxB)
            totalPlayerAmount += 1000000.0;
        if (picksBoth)
            totalPlayerAmount += 1000.0; // box A always has $1,000

        totalAmountAtStake += 1000.0;
        if (millionInBoxB)
            totalAmountAtStake += 1000000.0;

        if (!picksBoth)
            totalPlayerAmountLost += 1000.0;

        if (picksBoth && !millionInBoxB)
            timesPredictorCorrect++;
        if (!picksBoth && millionInBoxB)
            timesPredictorCorrect++;
    }

    double averageAmount = totalPlayerAmount / (double)trialsCount;
    double percentagePredictorCorrect = (double)timesPredictorCorrect / (double)trialsCount * 100.0;

    UnicodeString s = "Trials: ";
    s += trialsCount;
    s += ", ";
    s += playerType == DefiantDominance ? "DefiantDominance" : "DefiantExpectedUtilitarian";
    s += " - Average amount: ";
    s += averageAmount;
    s += ", Total amount lost because didn't use dominance reasoning at moment of choice: ";
    s += totalPlayerAmountLost;
    s += ", Total amount at stake (environmental richness): ";
    s += totalAmountAtStake;
    s += ", Percentage Predictor correct: ";
    s += percentagePredictorCorrect;
    s += "%";
    return s;
}
//---------------------------------------------------------------------------

Cause and compliance

If someone asks: “For whom are you going to vote?” and one answers “I’m not going to vote, as my vote won’t make a difference to who’s elected,” one might frequently hear the response “Yes, but if everyone thought that way …” This appears beside the point: one’s voting or not won’t cause everyone to think that way, and so it still won’t make a difference.

This conclusion, however, is itself a little beside the point. The situation is not really about whether one’s actions are the difference that makes the difference, although that’s often how it’s consciously couched. Rather, what is occurring is an example of a compliance mechanism for results that require coordinated behaviour.

This compliance mechanism cuts across a large class of actions: every ’cause’ where one’s actions aren’t going to make a relevant difference (most voting, recycling, foreign assistance, fighting in a war, and so on). In each of these situations, one can respond to requests for an action with the same response as the voting example above: my action won’t make a relevant difference.

This compliance mechanism is actually two major compliance mechanisms:

1. A social cost for non-compliance, or gain for compliance. So, people won’t invite you to their cocktail parties, will be angry with you, and so on, or will invite you to their cocktail parties, be nice to you, and so on. This mechanism operates through other people’s reactions to your (non-)compliance. Knowing this increases compliance.

2. A personal gain for compliance (one feels good about doing something to help a cause, even knowing the action isn’t the difference that makes the difference). Knowing one will feel good about taking action increases one’s chance of compliance.

I think this compliance mechanism is properly biological, i.e., wired into the human psyche, at least in some cases – we can posit that this psychological mechanism was selected for an evolutionary reason (groups with it tended to prosper). Regardless, the point here is that it is fundamental, not an arbitrary aspect of over-zealous people in one’s society. Therefore, the dismissive response to voting in the example given at the start is technically correct, but misses the thrust of what’s occurring – a biologically founded compliance mechanism for achieving results that require coordinated human action.

In terms of a straightforward hedonic theory of utility, game theory would have to include these ‘payouts’ (1. and 2. above) to arrive at a more accurate picture of the rationality of a given course of action in a social context.
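As a rough sketch of what including those payouts might look like, here is a simple expected-utility calculation for voting; the two extra terms correspond to mechanisms 1. and 2. above, and any numbers plugged in would be illustrative placeholders rather than measurements:

// Hedonic payoff of voting once the two compliance payoffs are included.
// All field values would be rough dollar-equivalent estimates.
struct VotePayoffs
{
    double chanceOfBeingDecisive; // effectively zero in a large electorate
    double valueIfDecisive;       // value of actually swinging the result
    double costOfVoting;          // time, travel, and so on
    double socialPayoff;          // mechanism 1: other people's reactions
    double personalPayoff;        // mechanism 2: feeling good about helping
};

double ExpectedUtilityOfVoting(const VotePayoffs& p)
{
    return p.chanceOfBeingDecisive * p.valueIfDecisive
         - p.costOfVoting
         + p.socialPayoff
         + p.personalPayoff;
}

// With the first term near zero, the sign of the result is driven almost
// entirely by the two compliance payoffs versus the cost of voting.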

Also see here.

Non-monetized economies

Non-monetized economies are more important than monetized economies. When economists or game theorists ignore this, they make mistakes in their analysis of rationality.

A monetized economy is the sum of the transactions where money is exchanged. The money is a simple, quantified representation of an estimate of value. Person r pays $y for thing x, where statistically there is some association between the price y and the value of thing x. This makes it convenient to track estimated value.

The non-monetized economy includes everything from parent-child interactions to having a dinner party to gazing at the starry sky. In all these cases, there is creation of value, but there isn’t as simple a way to quantify an estimate of it. Yet, these non-monetized transactions or activities form the majority of what is valuable in a person’s life.

Consider the Ultimatum Game (see here). Part of the tension in the analysis is resolved once one realizes that the non-monetized payoffs can be much more important than the monetized payoff, and therefore seemingly irrational behaviour can be rational (and, vice versa, behaviour that seems rational when looking only at the monetized payoffs becomes quite irrational when looking at both monetized and non-monetized payoffs).

The problem with monetized economics is that it is radically incomplete. When people worry about monetized economics, they are worrying about only a part of the overall economy.

Reclaiming Rationality and the Ultimatum Game

The Ultimatum Game (here) is as follows:

There are two players, and an amount of money they can divide. The first player proposes how to divide the sum between the two players, and the second player can either accept or reject this proposal. If the second player rejects, neither player receives anything. If the second player accepts, the money is split according to the proposal. The game is played only once so that reciprocation is not an issue.

Let’s say the amount is $100. If player-1 proposes a payout of $99 for player-1 and $1 for player-2, is it irrational for player-2 to reject the offer?

The reasoning for ‘yes’ usually goes as follows: if player-2 accepts, he gets $1. If he rejects, he gets $0. $1 > $0. Therefore, it is rational for him to take the $1.

The obvious problem is that the payout includes emotions, which are not included in the explicit analysis of the game’s payouts above. Rejecting an offer of 1% of the total sum, as in the case above, may lead to a feeling of retribution, for example. A feeling of retribution is a good, and it is not included in the calculus of the payouts above.

Rationality comes from ratio, which involves comparing two things. In this case, the relevant ratio is “utility if accept” : “utility if don’t accept”. Emotion can impede an accurate assessment of the ratio, but here the feeling of retribution is part of the value on one side of the ratio. It is no longer $1 : $0, but $1 + emotional payout : $0 + emotional payout. It is easy to imagine how the right side of the ratio might become a lot bigger than the left side.
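Here is a minimal sketch of that comparison, with the emotional payouts expressed in dollar-equivalent terms; the specific values are placeholders chosen only to illustrate how the right side of the ratio can end up much larger than the left:

#include <cstdio>

int main()
{
    double moneyIfAccept = 1.0; // the $1 offer
    double moneyIfReject = 0.0;

    // Hypothetical emotional components: resentment at accepting an unfair
    // split, and the feeling of retribution from rejecting it.
    double emotionIfAccept = -20.0;
    double emotionIfReject = 15.0;

    double utilityIfAccept = moneyIfAccept + emotionIfAccept;
    double utilityIfReject = moneyIfReject + emotionIfReject;

    printf("Utility if accept: %.1f, if reject: %.1f -> %s\n",
           utilityIfAccept, utilityIfReject,
           utilityIfReject > utilityIfAccept ? "reject" : "accept");
    return 0;
}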

The bigger question I have, then, is: is a player being irrational if he accepts the offer? Based on the various payouts not included in the explicit description, my guess is yes, he is being irrational in accepting the specific kind of offer above. Not only is it not irrational to reject the $1, but he will be much better off rejecting the offer.

The above is about emotions in the payout, but what about being emotional while assessing the payout? Being emotional when assessing payouts can interfere with one’s assessment, but emotion can also inform it: emotion is basically a mechanism for synthesizing large amounts of information, information that would be intractable for the ‘rational’ part of the brain to work through, or that we have gained (presumably) through an evolutionary process. A feeling of retribution is telling us something important, in this case about the appropriate response to such a proposed sharing of resources in a more natural setting. Even if the anger at such an offer is outmoded in this situation, the feeling of retribution one gains from spurning the offer is still a good in itself.

Why do we have a feeling of retribution, or other emotional aspects of a payout one way or the other? Humans are social animals, so fair sharing is an important part of surviving, and emotions related to it are part of how we survive. We can’t turn these emotions on or off at will. Therefore, they need to be included in an analysis of the payouts from games like these. Why is money important? Because it brings goods that are usually emotional, such as feeling good when drinking a coffee, or feeling good because one has higher financial status, and so on.

Reclaiming rationality means thinking more about emotion in payouts and assessment.

Thanks to Sacha for bringing the Ultimatum Game to my attention.