Prisoner's Dilemma

From EvoLudo


The Prisoner's Dilemma, now well established as the predominant metaphor for studying cooperative interactions, made its first appearance in an experimental bargaining setup designed by Melvin Dresher and Merill M. Flood at the time also working at the RAND Corporation. Their aim was to illustrate that the Nash equilibrium is not necessarily the outcome realized in experimental and real world settings. Only some time later the game got its name through an anecdotal story attributed to Albert W. Tucker, the PhD supervisor of John F. Nash:

Two suspects are captured near the scene of a burglary. Each has to choose whether or not to confess and implicate the other. If neither confesses, then both will serve one year on a charge of carrying a concealed weapon. If each confesses and implicates the other, both will go to prison for 10 years. However, if one burglar confesses and implicates the other, and the other burglar does not confess, the one who has collaborated with the police will go free, while the other burglar will go to prison for 20 years on the maximum charge.

What should the suspects do? According to rational reasoning a suspect should confess and implicate the other because by doing so he is better off no matter what the other suspect does. If the other does not confess our suspect goes free instead of a sentence of 10 years and if the other does confess our suspect goes to prison for 10 years as opposed to 20 years had he not implicated the other. Since the charges are symmetrical, the rational reasoning is identical for both suspects. Thus, both will confess and live behind bars for 10 years instead of just a single year had they refused to give evidence - hence the dilemma. Quite fortunately for animal and human societies this is not what happens in general. Cooperative behavior does have a chance but it delicately depends on the circumstances.

A subsequent milestone and the foundation of the game's current fame represent Robert Axelrod's famous computer tournaments where submitted strategies competed in iterated Prisoner’s Dilemma interactions. Axelrod rephrased the game in a more intuitive context to describe cooperative interactions and since has become the standard formulation. The outcomes of a game with two players each having two behavioral options - to cooperate or to defect - are easily summarized in a 2×2 payoff matrix. Since the game is symmetrical, only the payoffs for the column player are shown.

Column player
Row player [math]\displaystyle{ \begin{matrix}&C&D\\C&R&S\\D&T&P\end{matrix} }[/math]

(a) general payoff matrix

Column player
Row player [math]\displaystyle{ \begin{matrix}&C&D\\C&3&0\\D&5&1\end{matrix} }[/math]

(b) Axelrod's payoffs

Column player
Row player [math]\displaystyle{ \begin{matrix}&C&D\\C&b-c&-c\\D&b&0\end{matrix} }[/math]

(c) payoffs in terms of cost & benefits

Rational Strategy

Evolutionary Dynamics


In the Prisoner's Dilemma two players simultaneously decide cooperate or to defect. Cooperation results in a benefit [math]\displaystyle{ b }[/math] to the recipient but incurs costs [math]\displaystyle{ c }[/math] to the donor ([math]\displaystyle{ b \gt c \gt 0 }[/math]). Thus, mutual cooperation pays a net benefit of [math]\displaystyle{ R = b - c }[/math] whereas mutual defection results in [math]\displaystyle{ P = 0 }[/math]. However, unilateral defection yields the highest payoff [math]\displaystyle{ T = b }[/math] and the cooperator has to bear the costs [math]\displaystyle{ S = -c }[/math]. It immediately follows that it is best to defect regardless of the opponents decision. For this reason defection is the evolutionarily stable strategy even though all individuals would be better of if all would cooperate ([math]\displaystyle{ R \gt P }[/math]). The characteristics of the Prisoner's Dilemma is determined by the rank ordering of the four payoff values: [math]\displaystyle{ T \gt R \gt P \gt S }[/math].

Whenever the success of a strategy depends on its relative payoff, i.e. on the difference in performance as compared to other contestants, the payoffs [math]\displaystyle{ T, R, S }[/math] and [math]\displaystyle{ P }[/math] can be rescaled without loss of generality such that [math]\displaystyle{ R = 1 }[/math] and [math]\displaystyle{ P = 0 }[/math] holds. In terms of costs [math]\displaystyle{ c }[/math] and benefits [math]\displaystyle{ b }[/math] this yields [math]\displaystyle{ T = 1 + r }[/math] and [math]\displaystyle{ S = -r }[/math] where [math]\displaystyle{ r }[/math] denotes the cost-to-benefit ratio [math]\displaystyle{ r = c / (b - c) }[/math] of mutual cooperation. This convenient mapping reduces the number of parameters from four to one and ensures that the required payoff ranking is always preserved.