Abstract
This paper presents karma mechanisms, a novel approach to the repeated allocation of a scarce resource among competing agents over an infinite time horizon. Examples include deciding which ride-hailing trip requests to serve during peak demand, granting the right of way in intersections or lane mergers, or admitting internet content to a regulated fast channel. We study a simplified yet insightful formulation of these problems where at every instant two agents from a large population get randomly matched to compete over the resource. The intuitive interpretation of a karma mechanism is “If I give in now, I will be rewarded in the future.” Agents compete in an auction-like setting where they bid units of karma, which circulates directly among them and is self-contained in the system. We demonstrate that this allows a society of self-interested agents to achieve high levels of efficiency without resorting to a (possibly problematic) monetary pricing of the resource. We model karma mechanisms as dynamic population games and guarantee the existence of a stationary Nash equilibrium. We then analyze the performance at the stationary Nash equilibrium numerically. For the case of homogeneous agents, we compare different mechanism design choices, showing that it is possible to achieve an efficient and ex-post fair allocation when the agents are future aware. Finally, we test the robustness against agent heterogeneity and propose remedies to some of the observed phenomena via karma redistribution.
1 Introduction
The scarcity of resources is one of modern-day society’s most prominent challenges. With strong population growth and a shift to urbanization, our finite natural and infrastructure resources are seeing unprecedented levels of stress. The need to devise fair and efficient means of access to these resources is now more pressing than ever.
In this paper, we study a class of dynamic resource allocation problems in which an indivisible resource is repeatedly contested between two anonymous users who are randomly drawn from a large population. Figure 1 demonstrates three motivating examples for this class of resource competitions.
(a) Due to excessive demand, only one of two trip requests from ride-hailing riders can be served by the closest ride-hailing driver. Which rider should be served?
(b) Two autonomous vehicles (AVs) meet at an unsignalled intersection. Which AV should go first?
(c) To improve quality of service for critical content, an internet service provider (ISP) splits its bandwidth into a high capacity fast channel and a low capacity slow channel, and dedicates the fast channel to half of the total traffic. Two internet content providers (ICPs) simultaneously request service. Which ICP should be granted access to the fast channel?
These examples share in common that the resource (the ride-hailing driver, the intersection, or the fast channel) is repeatedly contested over a long time horizon by anonymous users in a large population (the ride-hailing riders, the AVs, or the ICPs), who could have time-varying private needs for accessing the resource. It is important to note that the reality of these examples is complex, and they each deserve a separate treatment. Nonetheless, the level of abstraction chosen in this paper serves to highlight the fundamental trade-offs that arise in this class of problems, as well as to ease the presentation of the novel analysis tools developed to study them. We adopt a level of generality in these tools that allows them to be readily specified to more complex settings.
One can draw inspiration from how small communities sometimes manage to self-organize the access to common resources [41]. Typically these communities devise systems that facilitate a form of fair ‘giving and taking,’ i.e., systems where the users take turns in accessing the resource (see, e.g., the system devised by local fishermen to manage the inshore fishery in Alanya, Turkey [3, 41]). This work is an attempt to systematize the ‘give and take’ such that it can be applied unambiguously in a large-scale system and builds upon the exploratory concept proposed in [13]. The main enabler is karma, a counter that encodes the history of ‘giving and taking’ of the users. Loosely inspired by the notion of karma in Indian tradition [39], each user is endowed with karma points which increase when the user yields the resource, and decrease when the user accesses the resource. A user with a high level of karma was likely yielding in the past and gets an advantage in receiving the resource now. In turn, the disfavored user that yields now gets compensated in karma which will give them an advantage in the future.
In its simplest form, such a karma mechanism is an effective means to facilitate turn-taking on a large scale. But a karma mechanism can do more than simple turn-taking. If users are given the option to choose how much karma to use now, e.g., through an auction-like karma bidding scheme, the karma also becomes a means to express private temporal preferences. For example, a user may decide to yield (and gain karma) when its urgency is low, in anticipation of a situation where accessing the resource is time-critical. A karma mechanism can hence facilitate the allocation of the resource to whoever needs it most, i.e., the maximization of resource allocation efficiency.
A classical device that is used to express preferences and facilitate access to resources is money. The ride-hailing platform can raise the trip price until only one of the contesting riders is willing to pay for it, a common practice referred to as surge pricing [10, 12]. This practice has seen some public criticism due to a lack of transparency and a tendency to raise prices in a manner that is deemed unfair [15, 51]. While in principle it is meant to allocate trips to the most needy riders, in practice the trips are allocated simply to those who can afford them. In the transportation domain, decades of research on the use of monetary road pricing policies have been met with little public enthusiasm due to concerns for equitable access to the roads [7, 59, 60]. A popular remedy is the use of tradable credits, which are periodically issued road access tokens that are allowed to be traded in a monetary market. While these schemes ensure that the yielding users can at least sell their credits and receive (monetary) compensation, they neglect the fact that wealthy users (i.e., those who have high ‘value of time’ [5]) continue to have a systematic advantage in accessing the resource [59]. Finally, the topic of net neutrality has seen widespread public debate in recent years, with strong concerns that the internet will lose its integrity as an open and free resource if ISPs charge ICPs differently [6, 30, 37, 42]. In all of the above debates, the potential existence of a simple and efficient non-monetary solution seems to be overlooked.
Karma shares similarities with money in how it acts as a token of exchange, but has the distinguishing feature of being acquired from fair exchanges that are relevant to the resource allocation problem at hand. One need not worry about matters of wealth inequality, or rely on the assumption that money is a universal, objective measure of value, since karma is only acquired from the process of yielding the resource to another user. Karma hence facilitates the design of a purpose-built, self-contained economy for the resource allocation task. Like in monetary economies, the karma economy can be tuned to achieve different fairness and efficiency objectives in a manner that is targeted to the specific resource allocation task, through the design of karma payment rules, the redistribution of karma, and other techniques that we explore here.
Karma mechanisms promise to be efficient and fair, but these unconventional mechanisms require a novel analysis. A difficulty in the analysis arises due to the lack of reliance on an extrinsic measure of value. Karma does not have value a-priori and is never used directly in the cost functions of the users. The value of karma instead arises from how it facilitates access to the resource, and how users need to ration its use in order to cover their future resource access demands. This makes the behavior of rational users under the karma mechanism, and the resulting social welfare, difficult to predict, and requires the formulation of non-trivial dynamic games played in large populations. In this work, and in comparison to [13], we develop a tractable and rigorous game-theoretic model to study karma mechanisms that is built on top of the class of dynamic population games [17], and prove the existence of a suitable notion of equilibrium, the stationary Nash equilibrium. We then utilize these technical tools to numerically investigate the strategic behaviors that emerge under the karma mechanisms, their consequences for the social welfare, as well as how the mechanisms can be tuned to achieve different resource allocation objectives.
1.1 Related Works
1.1.1 Repeated Games
The celebrated folk theorem [19, 21] asserts that any individually rational outcome of a finite-player single-stage game can be sustained in a Nash equilibrium of the infinitely repeated game, provided that the players are sufficiently future aware. The constructive proofs of this classical result and many of its extensions rely on the notion of switch strategies, in which the players initially agree on the socially desirable set of actions to play. In case a player deviates from the agreed action, all other players effectively punish the deviator by switching to a set of actions that make the deviator worse off. This requires the ability to both detect a deviation and identify the deviator.
Several extensions of the folk theorem consider when the actions of others are not perfectly observable, thereby posing a difficulty in identifying deviators. These include when the players observe a common public outcome [22] or when they only observe private outcomes [20]. These works impose identifiability conditions on the stage game which essentially guarantee that each player can identify the history of others’ actions from the observable outcomes. Another extension of the folk theorem is to stochastic games [16] where the stage games are time-varying and depend on the previous actions of the players. The difficulty in this setting is that deviations are not only immediately beneficial, but could also take subsequent games into a regime that is profitable to the deviator in the long run. The authors impose conditions on the game which essentially guarantee that the long run cost of punishment outweighs the long run benefit of deviation.
All of these works consider that every stage game is played by the same finite set of players. A more related setting is when two players in a large population are randomly matched in each stage, for which a folk theorem is shown in [40]. Each player is associated with a social status state that is observable by others. Deviators are punished by changing their status from good to bad, which all future matched players observe and punish for.
Our setting differs fundamentally from the above cited works. We consider that the players have time-varying private preferences, namely their urgency to acquire a contested resource. In contrast to folk theorem results with private information [20], the privacy is with respect to the payoffs of the opponents, rather than the actions they play. In contrast to folk theorem results for stochastic games [16], the time-varying nature is with respect to the private player preferences rather than a fully observable game state. In the context of a folk theorem, the socially desirable set of actions in our setting is for the players to report their urgency truthfully such that the resource is allocated to the highest urgency player. It is not obvious how deviation from truthfulness can be detected in the first place. In principle, if the time-varying urgency process is public and the game is played with the same finite set of players, non-truthfulness can be detected in the long run by correlating the history of each player’s reports to the expected history [32]. But in a large population setting where maintaining explicit histories is infeasible, we interpret karma as an extension of the social status state in [40] to handle private preferences, essentially placing a budget on how often players can declare high urgency.
1.1.2 Mechanism Design Without Money
The famous Gibbard–Satterthwaite impossibility theorem [24, 48] poses a fundamental challenge in the design of resource allocation mechanisms. It asserts that when there are three or more alternative allocations, it is impossible to design a strategy-proof mechanism that is non-dictatorial when the domain of preferences is unrestricted. One avenue to escape this impossibility is in the use of money, which imposes structure on the preferences of the users by measuring them against the objective monetary yardstick. The problem of designing monetary mechanisms is well studied, with positive results including the Vickrey–Clarke–Groves (VCG) mechanism [14, 28, 57], a general mechanism that is well known to be strategy-proof and lead to efficient allocations. On the other hand, the design of mechanisms without money [49] is in general more difficult due to the lack of a general instrument that can be used to align incentives. Some successes include the cases when preferences are single-peaked [36], when each user has one item to trade [50], and when matching users pairwise in a bipartite graph [23], which all leverage specific structures in the preferences of the users that are difficult to generalize. When users must express preferences over many alternatives, a general approach is the pseudo-market pioneered by [31] and famously adopted in the context of allocating course seats in business schools [8, 9, 52]. In a pseudo-market, users are given a finite budget of tokens to distribute over the alternatives, whose prices (in tokens) are set/discovered to clear the market (i.e., allocate the correct amount of resources to the correct amount of users). However, pseudo-markets only promise to be Pareto-efficient and are also not strategy-proof (although strategizing becomes difficult when there are many users and alternatives) [31]. It is noteworthy to mention that in our motivating examples, any allocation of the contested resource is Pareto-efficient.
The aforementioned difficulty stems from the fact that the classical mechanism design problem is concerned with a static or one-shot allocation of goods. On the other hand, when the allocation is dynamic or repeating over time, new opportunities for the design of strategy-proof and efficient mechanisms present themselves. On a conceptual level, just as money can be used to incentivize truthful behavior (or punish non-truthfulness), a similar incentive can be achieved through a promise of future service (or denial thereof). Despite this intuitive notion, the role that repetition could play in mechanism design has only been recently noticed, and the literature on mechanism design for dynamic resource allocation is sparse. A few recent works build upon the notion of “promised utilities,” pioneered in the context of contract design in repeated relationships [54]. These include [29], which develops an incentive compatible mechanism for the case when a single principal repeatedly allocates a single good to a single user, and [1], which extends this approach to the case when the single good is repeatedly allocated to one of the same contesting users. The working principle of these works is to find a set of future utilities that the principal promises to the user(s) as a function of their reported immediate utility, in a manner that incentivizes truthful reporting, and while ensuring that the principal can recursively keep these future promises. This is based on the assumption that the principal has the power to commit to the promised utilities, without specifying the exact mechanism to do so. Similarly, karma is a device that encodes future promises; the higher a user’s karma the more favorable its future position will be. But with karma, these promises need not be made explicit, and a single principal need not be held accountable for them. Instead, the future value of karma arises in a decentralized and natural manner as the users strategically ration its use.
The promise of future utility is made by the population as a whole by attributing the right of access to future resources to karma.
In other related works, [53] leverage the high likelihood of kidney transplant failures to incentivize participation in the kidney exchange by providing a priority for re-transplant to participants. [34] similarly incentivize participation in the kidney exchange by issuing participants a voucher for re-transplant that is also redeemable by their offspring. We consider our karma mechanisms to be complementary to these works, offering a simple and intuitive alternative that has the potential to scale to large systems and across multiple applications.
1.1.3 Artificial Currency Mechanisms
A special class of mechanisms without money, which is perhaps the most related to our karma mechanisms, is the so-called artificial currency or scrip mechanisms. These mechanisms have been proposed in multiple isolated application instances since the early 2000s. [25] propose a “point system” to address the problem of free-riding in peer-to-peer networks, where agents tend to download many more files than they upload. Similar works in the domain of peer-to-peer networks include [58], who specifically call their point system “karma” and focus on its cryptographic implementation rather than the design of the mechanism itself; and [18], who do incorporate elements of mechanism design but focus solely on the choice of a single parameter, which is the total amount of karma in their specific model. In the domain of transportation, [46] recently demonstrated how an artificial currency (also called “karma”) can be utilized instead of monetary tolls to achieve optimal routing in a two arc road network. To the best of our knowledge, the only concrete example of a real-life implementation of a karma-like concept is the “choice system” for the allocation of food donations to food banks in the USA [43]. There, food banks are allocated “shares” which they use to bid on the food donations they need. It is considered to be a major success as evidenced by the active participation of food banks in the system as well as the unprecedented fluidity of food donations it resulted in.
All of the above works share in common the need for a non-monetary medium of exchange to coordinate the use of shared resources. However, there is an apparent lack of unity in the approaches taken, with most works proposing a problem-tailored, heuristic mechanism with little rigorous justification and scope for generalization. In [43], a model is presented that makes many simplifying assumptions on the strategic behavior of the users. This model does not truly capture the dynamic nature of the optimization problem of the users, who must ration their use of shares now to secure their future needs. In [46], a game-theoretic equilibrium is considered in which individual users solve a finite horizon dynamic optimization, but importantly, the amount of karma saved at the end of the horizon is treated as an exogenous parameter. A few other works that attempt to systematically study artificial currency mechanisms include [26, 33]. [33] studies a setting where a pool of users alternate between requesting and providing services to each other (e.g., a pool of parents exchanging baby-sitting services). This differs from our work in the following fundamental aspects. First, it does not give the users the flexibility to express intensity of preferences through a bidding procedure. Second, the equilibrium notion considered relies on remembering if certain users denied providing service before and punishing those users by never granting them service again. As discussed in Sect. 1.1.1, retaliation schemes are crucially based on the capability of detecting defection, which in our setting cannot be done without knowing the private preference of the agents. [26] provides a general method to convert truthful monetary to non-monetary mechanisms, which relies on a central planner estimating how much money each user would spend in a finitely repeated monetary auction and giving the user a similar amount of artificial currency at the beginning of the horizon. 
This requires central knowledge of the players’ private preferences and does not capture the important dynamic feedback process of gaining currency through yielding. In contrast to these works, our approach shows that a robust and ultimately efficient behavior emerges only from the dynamic strategic problem faced by the users of the karma mechanism, without additional rules or coordination mechanisms. To the best of our knowledge, there are no other works that study the strategic behavior in artificial currency mechanisms at this level of generality. We believe that this is fundamental for the understanding of these mechanisms and serves as an important tool for the mechanism design.
1.2 Organization of the Paper
Section 2 introduces the setting of dynamic resource allocation and the concept of karma mechanisms. In Sect. 3, we model karma mechanisms as dynamic population games and show that a stationary Nash equilibrium is guaranteed to exist. Section 4 focuses specifically on how different karma payment and redistribution rules can be incorporated in the model. The model is utilized in a numerical investigation of karma mechanisms in Sect. 5, where we provide insights on the emerging strategic behavior as well as the consequences of the karma mechanism design on the achieved efficiency and fairness of the resource allocation.
1.3 Notation
Let \(D\) be a discrete set and \(C\) be a continuous set. Let \(a,d \in D\) and \(c \in C\). For a function \(f: D \times C \rightarrow {{\mathbb {R}}}\), we distinguish the discrete and continuous arguments through the notation \(f[d](c)\). Alternatively, we write \(f: C \rightarrow {{\mathbb {R}}}^{|D |}\) as the vector-valued function \(f(c)\), with \(f[d](c)\) denoting its \(d^{\text {th}}\) element. Similarly, \(g[a \mid d](c)\) denotes the conditional probability of \(a\) given \(d\) and \(c\). Specifically, \(g[d^+ \mid d](c)\) denotes one-step transition probabilities for \(d\). We denote by \(\Delta (D):=\left\{ p \in {{\mathbb {R}}}_+^{|D |} \mid \sum _{d \in D} p[d] = 1 \right\} \) the set of probability distributions over the elements of \(D\). For a probability distribution \(p \in \Delta (D)\), \(p[d]\) denotes the probability of element \(d\). When considering heterogeneous agent types, we denote by \(x_\tau \) a quantity associated to type \(\tau \).
2 Karma Mechanisms for Dynamic Resource Allocation
We consider a population of agents \({{\mathcal {N}}}= \{1,\dots ,N\}\), where the number of agents N is typically large. For example, \({{\mathcal {N}}}\) is the set of ride-hailing platform riders in the metropolitan area of interest.
At discrete global time instants \(t \in {{\mathbb {N}}}\), two random agents from the population (denoted by \({{\mathcal {C}}}[t] \subset {{\mathcal {N}}}\)) compete for a scarce, indivisible resource, such as the closest ride-hailing driver. We are concerned with designing a mechanism that, at each interaction time t, selects one of the two agents in \({{\mathcal {C}}}[t]\) to allocate the resource to (i.e., grant the trip request).
A karma mechanism works as follows. Each agent \(l \in {{\mathcal {N}}}\) in the population is endowed with a non-negative integer counter \(k^l[t] \in {{\mathbb {N}}}\), called karma, which is private to the agent. Moreover, an additional surplus karma counter \(k^s [t] \in {{\mathbb {N}}}\) exists in the system.
At each interaction time t, each agent \(i \in {{\mathcal {C}}}[t]\) involved in the resource competition submits a sealed non-negative integer bid \(b^i[t] \in \{0,\dots ,k^i[t]\}\), which is bounded by the agent’s karma. The outcome of the interaction is determined by a resource allocation rule and a payment rule. The resource allocation rule decides which of the two competing agents is selected to receive the contended resource.
It is natural to consider a resource allocation rule that allocates the resource to the agent with the highest bid. A tie-breaking rule is needed when both bids coincide, and we use a fair coin toss for this purpose.
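The highest-bid rule with a fair coin toss can be sketched in a few lines of Python. This is only an illustration: the function name `allocate` and the representation of the two sealed bids as a pair are assumptions made here, not part of the mechanism's specification.

```python
import random


def allocate(bids, rng=random):
    """Return the index (0 or 1) of the agent that receives the resource.

    bids: pair of non-negative integer karma bids (b0, b1).
    rng:  source of randomness, used only to break ties.
    """
    b0, b1 = bids
    if b0 > b1:
        return 0
    if b1 > b0:
        return 1
    # Equal bids: break the tie with a fair coin toss.
    return rng.randrange(2)


print(allocate((3, 1)))  # the first agent bid more, so index 0 is selected
```

Passing a seeded `random.Random` instance as `rng` makes the tie-breaking reproducible in simulations.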
The karma payment rule determines the karma payments of the two competing agents.
Note that the yielding agent makes a non-positive payment (i.e., it receives karma). As a consequence of this payment rule, at each interaction time t, the karma counters are updated as follows:
Examples of karma payment rules are presented in Sect. 2.1. The surplus karma \(k^s [t]\) is meant to keep track of any excess karma payment in the interaction, in case \(p^{i^*}[t]+ p^{-i^*}[t]\ne 0\), such that this excess gets redistributed to the population agents. This redistribution occurs at time instants \(t^r \) in accordance with a karma redistribution rule.
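The bookkeeping described above can be illustrated with a short Python sketch. It assumes a sign convention consistent with the text: each agent's payment is subtracted from its own counter (so a non-positive payment increases karma), and the sum of the two payments is added to the surplus. The function name `update_counters`, its arguments, and the dictionary representation of the counters are hypothetical choices made for illustration.

```python
def update_counters(karma, surplus, winner, loser, pay_winner, pay_loser):
    """Apply one interaction's payments and return the new (karma, surplus).

    karma:   dict mapping agent id -> non-negative integer karma.
    surplus: current value of the system-level surplus counter.
    """
    if pay_loser > 0:
        raise ValueError("the yielding agent must make a non-positive payment")
    new_karma = dict(karma)  # leave the caller's counters untouched
    new_karma[winner] -= pay_winner
    new_karma[loser] -= pay_loser  # subtracting a non-positive payment adds karma
    # Any excess payment (non-zero sum of the two payments) goes to the surplus.
    new_surplus = surplus + pay_winner + pay_loser
    if new_karma[winner] < 0 or new_surplus < 0:
        raise ValueError("karma counters must remain non-negative")
    return new_karma, new_surplus


before = {"a": 5, "b": 2}
after, surplus = update_counters(before, 0, "a", "b", pay_winner=3, pay_loser=-1)
# Total karma (agents plus surplus) is the same before and after the interaction.
assert sum(before.values()) == sum(after.values()) + surplus
```

Under this convention the total amount of karma in the system, counting the surplus, is unchanged by an interaction.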
As a consequence of this redistribution rule, at each redistribution time \(t^r \) the karma counters are updated as follows:
A considerable freedom in the design of the karma payment and redistribution rules is possible. Hereafter, we present some possible examples, and in Sect. 5 we demonstrate how the choice of these rules affects the strategic behavior of the agents and allows the system designer to achieve different resource allocation objectives.
2.1 Examples of Karma Payment Rules
\(\texttt {PBP}\) is an example of completely peer-to-peer karma payment rules. It has the advantage of not requiring the system-level surplus karma counter \(k^s [t]\) or any system-wide karma redistribution.
In contrast to \(\texttt {PBP}\), \(\texttt {PBS}\) is an example of a karma payment rule in which surplus karma is generated and needs to be redistributed. We will demonstrate in the numerical analysis in Sect. 5 that such a redistributive scheme can lead to higher levels of efficiency and fairness of the resource allocation.
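To make the contrast concrete, the following Python sketch encodes two payment rules with the properties described above. The exact payments are assumptions made here for illustration: in the rule labeled `pbp` the selected agent pays its bid directly to the yielding agent, and in the rule labeled `pbs` the selected agent pays its bid into the surplus while the yielding agent receives nothing directly. Only the qualitative properties (no surplus versus generated surplus) are taken from the text.

```python
def pbp(winning_bid):
    """Assumed peer-to-peer rule: the winner's bid goes to the yielding agent.

    Returns (payment of the selected agent, payment of the yielding agent).
    """
    return winning_bid, -winning_bid


def pbs(winning_bid):
    """Assumed surplus-generating rule: the winner's bid goes to the surplus."""
    return winning_bid, 0


for rule in (pbp, pbs):
    pay_winner, pay_loser = rule(4)
    # The excess payment is what would be added to the surplus karma counter.
    print(rule.__name__, "excess karma:", pay_winner + pay_loser)
```

In this sketch the peer-to-peer rule always has zero excess, so no surplus counter or redistribution is needed, whereas the second rule leaves the whole bid to be redistributed.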
2.2 Examples of Karma Redistribution Rules
There are a plethora of methods to redistribute the surplus karma to the agents in the population, as the designer has the freedom to decide both the redistribution times \(t^r \) and the redistribution rule. Nevertheless, we assume that redistribution rules are intended to completely redistribute the surplus karma so that most of the karma in the system is held by the agents, i.e., the total karma held by the agents is a preserved quantity. We will formalize this assumption in the analysis in Sect. 4, where we will effectively assume that the surplus is kept at zero (either by not generating surplus or by redistributing it immediately).
One possibility is for the time instants \(t^r \) to be periodic events in which the redistribution occurs (e.g., every day at midnight). If \(k^s [t^r ]\) is not an integer multiple of the number of agents N, then a remainder is left to be redistributed in the next period. Another possibility is for the redistribution to occur asynchronously whenever \(k^s [t]\) exceeds N, so that a unit of karma per agent can be transferred. Finally, a non-uniform (possibly randomized) redistribution is possible.
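A periodic redistribution with a carried-over remainder can be sketched as follows. The sketch assumes a uniform rule in which every agent receives the same integer share of the surplus; the function name `redistribute_uniform` and the list representation of the karma counters are hypothetical.

```python
def redistribute_uniform(karma, surplus):
    """Give every agent an equal integer share of the surplus.

    karma:   list of the agents' karma counters.
    surplus: surplus karma available at the redistribution time.
    Returns the updated counters and the remainder left in the surplus.
    """
    n_agents = len(karma)
    share, remainder = divmod(surplus, n_agents)
    return [k + share for k in karma], remainder


new_karma, left_over = redistribute_uniform([1, 0, 4], surplus=7)
print(new_karma, left_over)  # each agent gains 2; 1 unit stays in the surplus
```

The asynchronous variant mentioned above corresponds to calling such a function as soon as the surplus reaches the number of agents, in which case the share is one unit per agent.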
Beyond these examples, we will not detail here the specifics of the karma redistribution rule. However, it is interesting to notice that redistributions need not be limited to positive karma values. For example, it is possible to reduce the karma of each agent according to a non-decreasing “tax” function \(0 \le h[k] \le k\), in order to create surplus (which will then be redistributed). We will briefly discuss the consequences of such a design in the numerical analysis in Sect. 5.
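As an illustration of such a tax, the Python sketch below uses a hypothetical proportional tax \(h[k] = \lfloor k/10 \rfloor\), which is non-decreasing and satisfies \(0 \le h[k] \le k\). The specific tax function, the function names, and the list representation of the counters are assumptions made for this example only.

```python
def proportional_tax(k, divisor=10):
    """Hypothetical tax h[k] = floor(k / divisor): non-decreasing, 0 <= h[k] <= k."""
    return k // divisor


def apply_tax(karma, surplus, h=proportional_tax):
    """Collect h[k] from every agent and add the proceeds to the surplus."""
    taxes = [h(k) for k in karma]
    if any(t < 0 or t > k for t, k in zip(taxes, karma)):
        raise ValueError("tax must satisfy 0 <= h[k] <= k")
    return [k - t for k, t in zip(karma, taxes)], surplus + sum(taxes)


taxed, surplus = apply_tax([25, 9, 100], surplus=0)
print(taxed, surplus)  # agents with more karma contribute more to the surplus
```

The collected surplus would then be handed back to the population by whichever redistribution rule the designer chooses.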
3 A Game-Theoretic Model for Karma Mechanisms
In this section, we develop a game-theoretic model to facilitate the analysis of karma mechanisms. The goal of the model is to address the following points:
1. Karma mechanisms induce a strategic scenario in which the agents must strategically choose their karma bids. The model will serve to demonstrate that this strategic scenario is well-posed by showing the existence of a suitable notion of equilibrium (the stationary Nash equilibrium, see Sects. 3.2, 3.3).
2. The model enables a computational tool to compute stationary best-response behavior of the agents. This tool can be used by the agents to derive their optimal bidding strategies.
3. The specifics of the karma mechanism (e.g., the karma payment and redistribution rules) affect the strategic behavior of the agents, which in turn affects population-level design objectives, in non-trivial ways. The model will serve as an important tool to make mechanism design choices, as demonstrated in the numerical analysis in Sect. 5.
The game played under the karma mechanism has a number of complicating features: it is an infinite dynamic game involving a large number of anonymous agents who have private states which depend on their past actions and, in turn, affect their future available actions. For this purpose, we build our model on the class of dynamic population games, following the formalism of [17].
3.1 The Karma Dynamic Population Game
3.1.1 Population Model
We consider that the number of agents N is large such that they approximately form a continuum of mass. This is reasonable to consider for many of our envisioned applications, e.g., the number of ride-hailing riders in a metropolitan area is typically large. We take the point of view of an ego agent playing against the population from which a random anonymous opponent is uniformly drawn in every resource competition instance. Let i be the identity of the ego agent, and \(t^i\) be the time instants at which the ego agent is involved in the resource competition, i.e., \(i \in {{\mathcal {C}}}[t^i]\) for all \(t^i\). These are the only time instants of relevance to the ego agent, and therefore we model its state dynamics as a discrete-time Markov chain, with the discrete update events occurring at the global times \(t^i\). To simplify notation, we will drop the explicit time dependency since the only time instances of interest are a current time of the ego agent \(t^i\) and a next time of the ego agent \(t^{i+} = \min \{t > t^i \mid i \in {{\mathcal {C}}}[t]\}\). We will also drop the superscript i since all quantities belong to the ego agent, unless explicitly stated otherwise. For example, we write k instead of \(k^i[t^i]\) to denote the ego agent’s current karma, and \(k^+\) instead of \(k^i[t^{i+}]\) to denote its next karma.
3.1.2 Agents’ Type and State
Each ego agent has a private static type \(\tau \in {{\mathcal {T}}}= \{\tau _1,\dots ,\tau _{n_\tau }\}\). The distribution of the agents’ types in the population is specified by the parameter \(g \in \Delta ({{\mathcal {T}}})\), with \(g_\tau \) denoting the fraction of agents belonging to type \(\tau \).
Moreover, each ego agent has a private time-varying state \(x\) which consists of an urgency state \(u\) and the karma \(k\) of the agent, i.e., \(x = [u,k]\).
The urgency state represents a private valuation for the resource and takes one of the values in the discrete and finite set \({{\mathcal {U}}}\). It therefore corresponds to the cost incurred by the agent when they cannot procure the resource. For example, each value in \({{\mathcal {U}}}\) could correspond to different classes of trips and how important it is that the agent secures a ride-hail for those trips. The urgency at consecutive resource competition instances of an ego agent of type \(\tau \) follows an exogenous, irreducible Markov chain process with transition probabilities denoted by
This process allows us to model different assumptions on the temporal preferences of the agent. For example, a static urgency process models that the agent has an equal need for the resource at all times. Alternatively, the process can encode that the agent experiences exogenous events of high urgency where its inconvenience for failing to acquire the resource is elevated (e.g., the cost of failing to secure a ride-hailing trip as a function of the trip length or purpose). While it is possible to extend the analysis to the case where the agent’s next urgency is affected by the local outcome of the interaction (e.g., failing to secure a ride-hailing trip today leads to higher urgency tomorrow due to accumulated delay), we restrict attention to the case where the urgency is a function of exogenous events (e.g., whether the agent must catch a flight today). Moreover, we assume that the urgency processes of different agents are statistically independent.
3.1.3 Social State
The joint distribution of the agents’ types and states in the population is given by
where \(d_\tau [u,k]\) denotes the fraction of agents in type-state \([\tau , u,k]\). The type-state distribution d is a time-varying quantity whose dynamics evolve in terms of the global times rather than the specific time instants of the ego agent. However, we will be looking for conditions where this distribution is stationary and therefore this difference is inconsequential.
The action of the ego agent is a non-negative integer bid which is limited by its karma
The ego agent of type \(\tau \) chooses its bid according to the homogeneous policy of its type
which maps its state [u, k] to a probability distribution over the bids b. We denote by \(\pi _\tau [b \mid u,k]\) the probability of bidding b when the agent of type \(\tau \) is in state [u, k]. The concatenation of the policies of all types \(\pi =(\pi _{\tau _1},\dots ,\pi _{\tau _{n_\tau }})\) is simply referred to as the policy. The set of policies is denoted by \(\Pi \).
The pair \((d,\pi ) \in {{\mathcal {D}}}\times \Pi \) is referred to as the social state (Footnote 2), as it gives a macroscopic description of the distribution of the agents’ states, as well as how they behave.
In order to characterize the Markov decision process that the ego agent faces, we will now turn to define an immediate reward function \({\zeta }_\tau [u,k,b](d,\pi )\) and a state transition function \({\rho }_\tau [u^+,k^+ \mid u,k,b](d,\pi )\). When the ego agent of type \(\tau \) is in state [u, k] in the current resource competition instance and it bids b, \({\zeta }_\tau [u,k,b](d,\pi )\) gives its expected immediate reward, and \({\rho }_\tau [u^+,k^+ \mid u,k,b](d,\pi )\) gives the probability that, at its next resource competition, its state is \([u^+,k^+]\). Both the immediate reward and the state transition are functions of the social state \((d,\pi )\).
3.1.4 Immediate Reward Function
Since the ego agent gets matched with a random opponent from the population, an important quantity is the distribution of other agents’ bids, which can be readily derived from the social state as
where \(b'\) (similarly, \(\tau '\), \(u'\), \(k'\)) denotes that these quantities belong to agents other than the ego agent. On a fundamental level, the ego agent is playing a game against this distribution, since it determines the likelihood of being selected to receive the resource for a given bid b, as well as the likelihood of transitioning to the next karma \(k^+\). Let us denote the resource competition outcome to the ego agent by \(o \in {{\mathcal {O}}}= \{0,1\}\), where \(o=0\) means that it is selected and \(o=1\) that it is yielding. Conditional on its bid b and the opposing bid \(b'\), the ego agent has the following probability of being selected
which lets us compute the probability of its resource competition outcome given its bid as a function of the social state as
The ego agent incurs a cost equal to its urgency u when it yields the resource \((o=1)\), and zero cost otherwise \((o=0)\). This allows us to define the immediate reward function as
which is negated to denote a reward rather than a cost. Note that it only depends on the urgency u and the bid b and not on the type \(\tau \) or the karma k. Note also that it is continuous in the social state \((d,\pi )\).
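The bid distribution, outcome probability, and immediate reward described above can be sketched numerically. The following is a minimal illustration rather than the authors' implementation: it assumes a single type, a karma state truncated at a finite maximum, and an allocation rule in which the higher bid is selected and ties are broken by a fair coin toss (the selection rule is not restated in this section). All function and array names are hypothetical.

```python
import numpy as np

def bid_distribution(d, pi):
    """Distribution of opposing bids induced by the social state (d, pi).

    d[u, k]     : mass of agents at urgency index u with karma k (sums to 1).
    pi[u, k, b] : probability of bidding b in state [u, k] (zero for b > k).
    """
    return np.einsum("uk,ukb->b", d, pi)

def yield_probability(nu):
    """P(o = 1 | b) for each bid b.

    ASSUMPTION: the higher bid is selected and ties are split evenly.
    """
    mass_above = nu.sum() - np.cumsum(nu)  # opponents bidding strictly more than b
    return mass_above + 0.5 * nu

def immediate_reward(urgency_values, nu):
    """zeta[u, b] = -u * P(o = 1 | b): the negated expected cost of yielding."""
    return -np.outer(urgency_values, yield_probability(nu))

# Toy example: two urgency levels, karma in {0, 1, 2}, every agent bids all its karma.
d = np.array([[0.25, 0.25, 0.0],
              [0.0, 0.25, 0.25]])
pi = np.zeros((2, 3, 3))
for k in range(3):
    pi[:, k, k] = 1.0
nu = bid_distribution(d, pi)
zeta = immediate_reward(np.array([1.0, 10.0]), nu)
```

In this toy example the reward depends only on the urgency and the bid, in line with the remark above; the karma enters only through the constraint on feasible bids.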
3.1.5 State Transition Function
The urgency of the ego agent at its next resource competition instance follows the exogenous process \(\phi _\tau [u^+ \mid u]\). In contrast, the ego agent’s next karma depends on multiple factors, including its current bid, the resource competition outcome, the specifics of the karma payment rule, and whether a karma redistribution event occurs before its next resource competition instance (see Fig. 2). We abstract this dependency with the karma transition function
which lets us express the state transition function as
Section 4 details the specifics of how to derive the karma transition function (7) for the cases of pay bid to peer (\(\texttt {PBP}\)) and pay bid to society (\(\texttt {PBS}\)). Here, we highlight two general properties of (7), (8) that are necessary for the main technical results to follow.
Assumption 1
(Continuity of the state transition function) The state transition function \({\rho }_\tau [u^+,k^+ \mid u, k, b](d,\pi )\) defined in (8) is continuous in \((d,\pi )\).
The continuity assumption is only mildly restrictive, since the dependency on \((d,\pi )\) typically arises in the form of expectations, as is demonstrated in Sect. 4.
Assumption 2
(Karma preservation in expectation) Karma is preserved for all \((d,\pi )\), when taking the expectation over the entire population, i.e.,
which expands to
Intuitively, Assumption 2 requires that the karma held by the agents is preserved, either because no surplus karma is generated or because it is promptly redistributed. This can be guaranteed by appropriate design of the payment rules or it can be achieved under some assumptions on the karma redistribution scheme as is demonstrated in Sect. 4.
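In the notation of this section, karma preservation in expectation can be sketched as the requirement that the population-average next karma equals the population-average current karma. The form below is an assumption that is consistent with the state transition function (8) and with the later use of condition (KP); it is not a verbatim statement of the condition:

\[
\sum _{\tau ,u,k} d_\tau [u,k] \sum _{b} \pi _\tau [b \mid u,k] \sum _{u^+,k^+} {\rho }_\tau [u^+,k^+ \mid u,k,b](d,\pi ) \, k^+ \; = \; \sum _{\tau ,u,k} d_\tau [u,k] \, k.
\]

Under this reading, the left-hand side is the average karma after one interaction of every agent, and the right-hand side is the average karma before it.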
3.2 Solution Concept: Stationary Nash Equilibrium
3.2.1 Best Response
We assume that the ego agent of type \(\tau \) discounts its future rewards with the discount factor \(\alpha _\tau \in [0,1)\). Then, the expected immediate reward of the ego agent of type \(\tau \) when it follows the policy \(\pi _\tau \) is
and its state transition probabilities are
The expected infinite horizon reward is therefore recursively defined as
Equation (10) is the well-known Bellman recursion for the fixed policy \(\pi _\tau \). We next show that it has a unique solution that is continuous in \((d,\pi )\).
Lemma 1
Let Assumption 1 hold. Then the solution of (10) is unique and continuous in \((d,\pi )\).
Proof
First we show uniqueness. Let \(V_\tau (d,\pi )\) be the vector formed by stacking \(V_\tau [x](d,\pi )\) for all \(x \in {{\mathcal {X}}}\). It is straightforward to show that \(\Vert V_\tau (d,\pi )\Vert _\infty \le \frac{u_{\max }}{1 - \alpha _\tau }\), where \(u_{\max }\) is the maximal element of the finite set \({{\mathcal {U}}}\) (Footnote 3). Therefore, \(V_\tau (d,\pi )\) lies in the Banach space of bounded sequences \((\ell ^\infty , \Vert \cdot \Vert _\infty )\). For a fixed social state \((d,\pi )\), let \(T_\tau ^{(d,\pi )}: \ell ^\infty \rightarrow \ell ^\infty \) be the map defined by the right-hand side of (10) (in vector form), i.e., \(T_\tau ^{(d,\pi )}(v) = R_\tau (d,\pi ) + \alpha _\tau \, P_\tau (d,\pi ) \, v\). Observe that \(V_\tau (d,\pi )\) is a fixed point of \(T_\tau ^{(d,\pi )}\), which we show to be unique by showing that \(T_\tau ^{(d,\pi )}\) is a contraction mapping, i.e.,
Since \(\alpha _\tau \in [0,1)\), this proves that \(T_\tau ^{(d,\pi )}\) is contractive.
Consider next the normed space \(({{\mathcal {D}}}\times \Pi , \Vert \cdot \Vert )\) (with an arbitrary norm) and the function \(V_\tau : {{\mathcal {D}}}\times \Pi \rightarrow \ell ^\infty \) defined as the unique fixed point of \(T_\tau ^{(d,\pi )}\). We show that \(V_\tau \) is continuous at every \((d,\pi ) \in {{\mathcal {D}}}\times \Pi \). Fix \(\epsilon > 0\). Choose \(\epsilon ' = (1 - \alpha _\tau ) \, \epsilon \) and \(\delta > 0\) such that \(\Vert (d,\pi ) - (d',\pi ')\Vert< \delta \Rightarrow \Vert T_\tau ^{(d,\pi )}(V_\tau (d,\pi )) - T_\tau ^{(d',\pi ')}(V_\tau (d,\pi ))\Vert _\infty < \epsilon '\). Such a \(\delta \) is guaranteed to exist since \(T_\tau ^{(d,\pi )}(v)\) is continuous in \((d,\pi )\) for any fixed v (\(R_\tau (d,\pi )\) is continuous, and so is \(P_\tau (d,\pi )\) as a consequence of Assumption 1). Then, we have for any \((d',\pi ')\) such that \(\Vert (d,\pi ) - (d',\pi ')\Vert < \delta \),
where the last inequality follows from (11). Manipulating yields \(\Vert V_\tau (d,\pi ) - V_\tau (d',\pi ')\Vert _\infty <\epsilon \), showing continuity. \(\square \)
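The contraction argument in the proof of Lemma 1 also suggests how the value function can be computed in practice. The sketch below is illustrative only: it assumes that the karma state has been truncated so that the state space is finite, and the reward vector `R`, transition matrix `P`, and discount factor `alpha` are hypothetical stand-ins for \(R_\tau (d,\pi )\), \(P_\tau (d,\pi )\), and \(\alpha _\tau \) at a fixed social state.

```python
import numpy as np

def evaluate_policy_direct(R, P, alpha):
    """Solve V = R + alpha * P @ V exactly on a finite (truncated) state space."""
    n = R.shape[0]
    return np.linalg.solve(np.eye(n) - alpha * P, R)

def evaluate_policy_iterative(R, P, alpha, tol=1e-12, max_iter=100_000):
    """Iterate the map T(v) = R + alpha * P @ v until successive iterates agree."""
    v = np.zeros_like(R, dtype=float)
    for _ in range(max_iter):
        v_next = R + alpha * (P @ v)
        if np.max(np.abs(v_next - v)) < tol:
            return v_next
        v = v_next
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical 3-state example: rewards are negated costs, rows of P sum to 1.
R = np.array([-1.0, -10.0, 0.0])
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.1, 0.9]])
alpha = 0.9
V_direct = evaluate_policy_direct(R, P, alpha)
V_iter = evaluate_policy_iterative(R, P, alpha)
```

Both routines should agree up to the tolerance, and the sup-norm of the result is bounded by the largest absolute reward divided by \(1 - \alpha \), mirroring the bound used in the proof.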
The ego agent’s single-stage deviation reward (commonly known as the Q-function) is
which is the expected infinite horizon reward when the ego agent deviates from the policy \(\pi _\tau \) for a single resource competition instance by bidding b at the state [u, k], then follows \(\pi _\tau \) in the future resource competition instances. Consequently, the state-dependent best response correspondence of the ego agent is
This is the set of probability distributions over the bids maximizing the expected single-stage deviation reward of the ego agent of type \(\tau \) when its state is [u, k] and the social state is \((d,\pi )\).
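A minimal numerical sketch of the single-stage deviation reward and the best response follows. It assumes a single type and a truncated karma state, and the array layouts and function names are hypothetical; `zeta`, `rho`, and `V` stand for the immediate reward, the state transition function, and the value function at a fixed social state.

```python
import numpy as np

def q_function(zeta, rho, V, alpha):
    """Q[u, k, b] = zeta[u, k, b] + alpha * sum over next states of rho * V.

    zeta[u, k, b]          : immediate reward of bidding b in state [u, k].
    rho[u, k, b, u2, k2]   : probability of reaching [u2, k2] from [u, k] with bid b.
    V[u2, k2]              : expected infinite horizon reward under the policy.
    """
    return zeta + alpha * np.einsum("ukbvl,vl->ukb", rho, V)

def best_response_bids(Q, tol=1e-9):
    """For each state [u, k], list the feasible bids b <= k that maximize Q."""
    n_u, n_k, _ = Q.shape
    best = {}
    for u in range(n_u):
        for k in range(n_k):
            feasible = Q[u, k, : k + 1]  # bids above the karma budget are excluded
            best[(u, k)] = [b for b in range(k + 1)
                            if feasible[b] >= feasible.max() - tol]
    return best
```

Any probability distribution supported on the returned bids is a best response at that state, which is why the best response is a correspondence rather than a function.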
3.2.2 Stationary Nash Equilibrium
We are now ready to define the solution concept that we adopt for this game.
Definition 1
(Stationary Nash equilibrium) A stationary Nash equilibrium is a social state \(({{\varvec{d}}},{\varvec{\pi }}) \in {{\mathcal {D}}}\times \Pi \) which satisfies for all \([\tau ,u,k] \in {{\mathcal {T}}}\times {{\mathcal {U}}}\times {{\mathbb {N}}}\)
The stationary Nash equilibrium is similar to the classical notion of the Nash equilibrium in that it denotes a state of the game where agents have no incentive to unilaterally deviate from the equilibrium policies of their types \({\varvec{\pi }}_\tau \) (SNE.2), but additionally requires that the type-state distribution \({{\varvec{d}}}\) is stationary under the stochastic processes characterized by the transition probabilities \(P_\tau [u^+,k^+ \mid u,k]({{\varvec{d}}},{\varvec{\pi }})\) (SNE.1). This stationarity condition implies that the ego agent need not consider the dynamics of the type-state distribution \({{\varvec{d}}}\) in its strategic behavior. Moreover, since the number of agents is large, the ego agent cannot unilaterally alter \({{\varvec{d}}}\) to further improve its rewards. Therefore, the equilibrium policies \({\varvec{\pi }}_\tau \) are indeed present and future optimal.
3.3 Existence of Stationary Nash Equilibrium
In [17], it is shown that a stationary Nash equilibrium is guaranteed to exist in every dynamic population game when the state space \({{\mathcal {X}}}\) is finite. We now extend this result to the karma dynamic population game, where the state space is countably infinite due to the karma state \(k \in {{\mathbb {N}}}\). Observe that the set of type-state distributions \({{\mathcal {D}}}\), given in (2), is a convex subset of the Banach space of finitely summable infinite sequences \((\ell ^1,\Vert \cdot \Vert _1)\). This is because the elements \(d \in {{\mathcal {D}}}\) can be represented as the infinite sequence \(\{\sigma [n]\}_{n \in {{\mathbb {N}}}}\) with
Trivially, this sequence is finitely summable, with \(\sum _n |\sigma [n]|= \sum _{\tau ,u,k} d_\tau [u,k] = 1\).
Let us further restrict \({{\mathcal {D}}}\) to the subset of type-state distributions which respect a fixed average amount of karma \({{\bar{k}}}\in {{\mathbb {N}}}\), denoted by:
This is also a convex subset of \(\ell ^1\). Furthermore, it is compact in \(\ell ^1\), as we show next using the following auxiliary definition and lemma.
Definition 2
(Equismall at infinity, [55] p.451) A subset \(\Sigma \) of \(\ell ^1\) is said to be equismall at infinity if, for every \(\epsilon > 0\), there is an integer \(n_\epsilon \ge 0\) such that
Lemma 2
(Compactness in \(\ell ^1\), [55, Theorem 44.2]) The following properties of a subset \(\Sigma \) of \(\ell ^1\) are equivalent:
(a) \(\Sigma \) is compact;
(b) \(\Sigma \) is bounded, closed, and equismall at infinity.
Corollary 1
\({{{\mathcal {D}}}^{{\bar{k}}}}\) is a compact subset of \(\ell ^1\).
Proof
The set \({{{\mathcal {D}}}^{{\bar{k}}}}\) is trivially closed since it is an intersection of closed polytopes. It is also trivially bounded, since \(0 \le d_\tau [u,k] \le 1\) for all \(d \in {{{\mathcal {D}}}^{{\bar{k}}}}\), \([\tau ,u,k] \in {{\mathcal {T}}}\times {{\mathcal {U}}}\times {{\mathbb {N}}}\). It therefore suffices to show that it is equismall at infinity. For any \(\epsilon > 0\), choose \(k_\epsilon \in {{\mathbb {N}}}\) such that \(\frac{{{\bar{k}}}}{k_\epsilon } < \epsilon \), and \(n_\epsilon = n_\tau \, n_u \, k_\epsilon \). For an arbitrary \(d \in {{{\mathcal {D}}}^{{\bar{k}}}}\), let \(\{\sigma [n]\}_{n \in {{\mathbb {N}}}}\) be its sequence representation as given in (13). We have
\(\square \)
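The tail bound behind the proof of Corollary 1 can be sketched as follows. The sketch assumes that the sequence representation (13) enumerates the type-states in order of increasing karma, so that the first \(n_\epsilon = n_\tau \, n_u \, k_\epsilon \) entries are exactly those with \(k < k_\epsilon \); it then amounts to a Markov-type inequality on the karma distribution:

\[
\sum _{n \ge n_\epsilon } |\sigma [n]| = \sum _{\tau ,u} \sum _{k \ge k_\epsilon } d_\tau [u,k] \le \sum _{\tau ,u} \sum _{k \ge k_\epsilon } \frac{k}{k_\epsilon } \, d_\tau [u,k] \le \frac{1}{k_\epsilon } \sum _{\tau ,u,k} k \, d_\tau [u,k] = \frac{{{\bar{k}}}}{k_\epsilon } < \epsilon .
\]

The bound does not depend on the particular \(d\), which is what makes the set equismall at infinity.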
The compactness of \({{{\mathcal {D}}}^{{\bar{k}}}}\) will enable us to invoke an infinite dimensional version of Kakutani’s fixed point theorem to establish the existence of a stationary Nash equilibrium. Before we do, we need to ensure that the fixed point correspondence maps elements of \({{{\mathcal {D}}}^{{\bar{k}}}}\) into itself. For a fixed policy \(\pi \in \Pi \), define the map \(W^\pi : {{\mathcal {D}}}\rightarrow {{\mathcal {D}}}\) as the concatenation of the right-hand side of condition (SNE.1) for all \([\tau ,u,k] \in {{\mathcal {T}}}\times {{\mathcal {U}}}\times {{\mathbb {N}}}\), i.e.,
That \(W^\pi \) maps elements of \({{\mathcal {D}}}\) to itself follows trivially from the fact that \(P_\tau [u^+,k^+ \mid u,k](d,\pi )\) are transition probabilities. We further have the following lemma.
Lemma 3
Let Assumption 2 hold. Then for all \({{\bar{k}}}\in {{\mathbb {N}}}\) and \(\pi \in \Pi \), \(W^\pi \) maps \({{{\mathcal {D}}}^{{\bar{k}}}}\) into itself.
Proof
For a \(d \in {{{\mathcal {D}}}^{{\bar{k}}}}\), the average amount of karma of \(W_{\tau }^\pi (d)\) is
where we used the non-negativity of the summands to exchange the order of the infinite sums [45], and condition (KP). Therefore, \(W^\pi (d) \in {{{\mathcal {D}}}^{{\bar{k}}}}\). \(\square \)
We are now ready to apply the following infinite dimensional fixed point theorem to establish our main technical result: the existence of a stationary Nash equilibrium in karma dynamic population games (Theorem 1).
Lemma 4
(Kakutani–Glicksberg–Fan fixed point theorem, [27, Theorem 8.6]) Let C be a compact convex subset of a locally convex Hausdorff space E, and let \(S: C \rightarrow 2^C\) be a set-valued correspondence which is upper hemicontinuous, nonempty, compact and convex. Then S has a fixed point.
Theorem 1
(Existence of a stationary Nash equilibrium in karma dynamic population games) Let Assumptions 1 and 2 hold. Then for each \({{\bar{k}}}\in {{\mathbb {N}}}\), a stationary Nash equilibrium \(({{\varvec{d}}},{\varvec{\pi }})\) satisfying \({{\varvec{d}}}\in {{{\mathcal {D}}}^{{\bar{k}}}}\) is guaranteed to exist.
Proof
We can write the stationary Nash equilibrium conditions (SNE.1), (SNE.2) as the fixed points of the correspondence defined as
where \(\{B_\tau [u,k](d,\pi )\}_{[\tau ,u,k]}\) is the sequence of best responses at all type-states \([\tau ,u,k] \in {{\mathcal {T}}}\times {{\mathcal {U}}}\times {{\mathbb {N}}}\).
- The set \(C = {{{\mathcal {D}}}^{{\bar{k}}}}\times \prod \limits _{\tau ,u,k} \Delta ({{\mathcal {B}}}^k)\) is a compact subset of the locally convex Hausdorff space \(E = \ell ^1 \times \prod \limits _{\tau ,u,k} {{\mathbb {R}}}^{k+1}\), by Corollary 1 and Tychonoff’s theorem [56]. C is also trivially convex.
- S maps C into subsets of C, by Lemma 3 and the definition of the best response.
- S is upper hemicontinuous and nonempty, by the continuity of \(P_\tau (d,\pi )\) and \(Q_\tau [u,k,b] (d,\pi )\) in \((d,\pi )\), and Berge’s maximum theorem [2].
- S is compact and convex, since \(W^\pi (d)\) is a singleton and \(B_\tau [u,k](d,\pi )\) is the set of convex mixtures over the finite number of bids maximizing \(Q_\tau [u,k,b](d,\pi )\).
It follows from Lemma 4 that S is guaranteed to have a fixed point, which coincides with a stationary Nash equilibrium. Since \({{\bar{k}}}\) above was arbitrary, this holds for each \({{\bar{k}}}\in {{\mathbb {N}}}\). \(\square \)
The significance of Theorem 1 lies in establishing that karma mechanisms induce a well-posed game in which a rational behavior exists and is well defined. Consequently, one can rigorously study the long-term social welfare implications of karma mechanisms at the stationary Nash equilibrium. The uniqueness of the stationary Nash equilibrium, as well as whether different learning dynamics are guaranteed to converge to it, remain open research questions.
3.4 Discussion of Incentive Compatibility
We now turn to discuss the classical notion of incentive compatibility (also known as strategy-proofness or truthfulness) in the context of karma mechanisms, in particular with respect to the bidding and resource allocation that happens at every interaction between the agents. Following [35, 38], we say that an auction-like mechanism is incentive compatible if the optimal (selfish) action by each agent is to bid their own truthful evaluation of the contended resource, thus revealing their private preference. Notice that, unlike classical monetary instruments, karma does not possess any intrinsic value, as it has no use outside of the game. It does however acquire value as a means of exchange in the game, and one could attempt to define a notion of incentive compatibility with respect to the value of karma given by the expected infinite horizon reward in (10). However, we argue that such a notion of incentive compatibility is not critical for the efficiency of the resource allocation (i.e., the allocation of the resource to the highest urgency agent) at the stationary Nash equilibrium of the karma mechanism. First, this is supported by the numerical analysis in Sect. 5, where near-optimal efficiency is robustly observed for a wide range of settings under all of the karma mechanism designs considered. Second, unlike the classical monetary setting, incentive compatibility with respect to the value of karma does not guarantee efficiency. This is because the value of karma depends on the contingent private state of the agent (the immediate urgency, but also the current karma balance and how urgent they expect to be in the future). For this reason, even if the karma mechanism were incentive compatible, a truthful bid would not be a perfect revelation of the agent’s urgency. It is important to highlight that in this work, we develop the tools to predict the strategic behavior under general karma mechanisms. Therefore, we are able to robustly assess the resource allocation efficiency of these mechanisms when agents bid optimally according to their own self-interest, without the need for incentive compatibility as an intermediate step.
This is not to say that incentive compatibility is not a desirable property of the karma mechanism. It will assist in the process of learning optimal policies by providing optimal feedback when agents bid truthfully with respect to their value of karma (which also needs to be learnt; it corresponds to the value function that solves the Bellman equation given in (10)). Moreover, it will likely lead to robustness against uncertain information about the social state. The precise effect of the karma mechanism design on the learning process of the agents remains an exciting open research question.
4 Modeling of Karma Payment and Redistribution Rules
We now revisit the karma payment and redistribution rules introduced in Sect. 2, show how the karma transition function (7) in Sect. 3.1.5 can be specialized to model them, and verify that they satisfy Assumptions 1 and 2.
A key difference when it comes to deriving the karma transition function is whether the payment rule generates no surplus karma, as in pay bid to peer (\(\texttt {PBP}\)), or generates surplus karma, as in pay bid to society (\(\texttt {PBS}\)). In the case of no surplus karma, the ego agent’s karma at the next resource allocation instance is fully determined by the outcome of the current instance, making the karma transition probabilities easier to model. When surplus is generated, redistribution needs to occur, which in full generality might or might not happen between successive interactions of the ego agent (see Fig. 2). Extra care must be taken in order to guarantee that Assumption 2 holds, and we will do so by introducing additional modeling assumptions that guarantee that the surplus karma is entirely redistributed between successive interactions of the ego agent.
4.1 Payment Rules with no Surplus Karma
In the pay bid to peer (\(\texttt {PBP}\)) karma payment rule, the ego agent pays its bid if it is selected, and otherwise it gets paid the opposing bid \(b'\). Consequently, the conditional probability of its next karma is
which leads to the following karma transition function
It is straightforward to verify that (16) satisfies the continuity assumption (Assumption 1). Karma preservation in expectation (Assumption 2) is also satisfied for all \((d,\pi )\) (that we omit from the notation), as
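The pay bid to peer rule can be illustrated at the level of a single interaction. The sketch below is a simplified illustration with hypothetical names; it assumes that the higher bid is selected and that ties are broken by a fair coin toss, and it makes explicit that the karma paid by the selected agent is exactly the karma received by the yielding agent, so that no surplus arises.

```python
import random

def pbp_interaction(karma_i, karma_j, bid_i, bid_j, rng):
    """Resolve one competition under pay bid to peer.

    Returns (new_karma_i, new_karma_j, i_selected).
    """
    if not (0 <= bid_i <= karma_i and 0 <= bid_j <= karma_j):
        raise ValueError("bids must be non-negative and limited by karma")
    # ASSUMPTION: the higher bid is selected, ties are broken by a fair coin toss.
    i_selected = bid_i > bid_j or (bid_i == bid_j and rng.random() < 0.5)
    if i_selected:
        # Agent i pays its own bid, and agent j receives that same amount.
        return karma_i - bid_i, karma_j + bid_i, True
    # Agent j pays its own bid, and agent i receives that same amount.
    return karma_i + bid_j, karma_j - bid_j, False
```

For instance, `pbp_interaction(5, 3, 4, 1, random.Random(0))` selects the first agent, which pays 4 units of karma to its opponent; the total of 8 units is unchanged.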
4.2 Payment Rules with Surplus Karma
We make the following assumption to ease the modeling of payment rules which generate surplus karma to be redistributed to all the agents.
Assumption 3
(Synchronous matching and redistribution) At every time instant t, the whole population is randomly matched in simultaneous pairwise resource competition instances, and all surplus karma is redistributed immediately.
Under the pay bid to society (\(\texttt {PBS}\)) karma payment rule, the ego agent pays its bid if it is selected, and pays nothing otherwise. Its conditional payment is hence given by
Due to Assumption 3, the average generated surplus can be computed by letting the ego agent assume the role of all the agents in the population, whose type-states are distributed as per d and who follow the policies \(\pi _\tau \)
This gets redistributed to all the agents using the following integer-preserving redistribution rule (although other redistribution rules could be employed, as long as they redistribute the entire surplus):
- distribute \(\left\lfloor {\bar{p}}^{\texttt {PBS}}(d,\pi ) \right\rfloor \) to a fraction \(f^{\text{low}}(d,\pi ):= \left\lceil {\bar{p}}^{\texttt {PBS}}(d,\pi ) \right\rceil - {\bar{p}}^{\texttt {PBS}}(d,\pi )\) of agents, randomly selected;
- distribute \(\left\lceil {\bar{p}}^{\texttt {PBS}}(d,\pi ) \right\rceil \) to the remaining fraction \(f^{\text{high}}(d,\pi ):= 1 - f^{\text{low}}(d,\pi )\) of agents.
Consequently, the probability that the ego agent receives a surplus payment of \(\left\lfloor {\bar{p}}^{\texttt {PBS}}(d,\pi ) \right\rfloor \) (respectively, \(\left\lceil {\bar{p}}^{\texttt {PBS}}(d,\pi ) \right\rceil \)) is \(f^{\text{low}}(d,\pi )\) (respectively, \(f^{\text{high}}(d,\pi )\)), resulting in the following karma transition function
It is straightforward to verify that (17) satisfies the continuity assumption (Assumption 1). Karma preservation in expectation (Assumption 2) is also satisfied, as
where we use \(f^{\text{low}} \, \left\lfloor {\bar{p}}^{\texttt {PBS}} \right\rfloor + f^{\text{high}} \, \left\lceil {\bar{p}}^{\texttt {PBS}} \right\rceil = {\bar{p}}^{\texttt {PBS}}\).
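The integer-preserving redistribution rule can be sketched in a few lines. The function names are hypothetical, and the average surplus `p_bar` is taken as a given input rather than computed from the social state.

```python
import math
import random

def redistribution_shares(p_bar):
    """Return (low, high, f_low, f_high) for an average surplus p_bar >= 0.

    A fraction f_low of agents receives floor(p_bar) and the remaining
    fraction f_high receives ceil(p_bar), so the mean payment equals p_bar.
    """
    low, high = math.floor(p_bar), math.ceil(p_bar)
    f_low = high - p_bar
    return low, high, f_low, 1.0 - f_low

def sample_surplus_payment(p_bar, rng):
    """Draw the integer surplus payment received by one randomly chosen agent."""
    low, high, f_low, _ = redistribution_shares(p_bar)
    return low if rng.random() < f_low else high

low, high, f_low, f_high = redistribution_shares(2.25)
mean_payment = f_low * low + f_high * high
```

With an average surplus of 2.25, three quarters of the agents receive 2 units and one quarter receives 3 units, so the mean payment is again 2.25. When the average surplus is an integer, the low fraction vanishes and every agent receives exactly that amount.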
5 Numerical Analysis
In this section, we perform a numerical analysis of karma mechanisms, providing insights on the strategic behaviors that emerge at the stationary Nash equilibrium, and their consequences on the social welfare. We first define the social welfare measures in Sect. 5.1, then analyze the performance of the mechanisms in a demonstrative case study in Sect. 5.2. Finally, we test the robustness of the mechanisms to heterogeneity of the agents in Sects. 5.3 and 5.4.
As detailed in Appendix A, all the stationary Nash equilibria presented were computed using a dynamic equilibrium-seeking algorithm that is inspired by evolutionary dynamics in population games [17, 47].
5.1 Social Welfare Measures and Benchmark Resource Allocation Schemes
In order to quantitatively assess the performance of karma mechanisms, we introduce the following social welfare measures, along with benchmark resource allocation schemes that optimize them. As a baseline, we take a resource allocation scheme that simply allocates the resource in every competition instance based on a fair coin toss. We denote this scheme by \(\texttt {COIN}\).
5.1.1 Efficiency
We define efficiency as
which is the expected average reward of the two agents involved in the infinitely repeated resource competition instances. At the stationary Nash equilibrium \(({{\varvec{d}}},{\varvec{\pi }})\) of the continuous population model, (18) evaluates to
which is the expected average reward per resource competition instance of an ego agent assuming the role of all the agents (leveraging the stationarity of \({{\varvec{d}}}\)).
A benchmark resource allocation scheme which maximizes the efficiency is known as the omniscient benevolent dictator, who has access to the agents’ private urgency and allocates the resource to the agent with the highest urgency. We denote this scheme by \(\texttt {DICT}\).
5.1.2 Ex-post Access Fairness and Ex-post Reward Fairness
In line with the literature on randomized resource allocations (e.g., [11]), the following ex-post fairness measures are defined for finite time horizons T and particular realizations of the repeated resource allocations (Footnote 4).
Let \(w^i_T\) be the fraction of times agent i was selected to receive the resource (with respect to the times it was involved in resource competitions). The ex-post access fairness is defined via the standard deviation of \(w^i_T\) with respect to the different agents, i.e.,
A benchmark dynamic resource allocation scheme which aims to maximize the ex-post access fairness is one that ensures that the agents take turns accessing the resource, by selecting the agent who has received the resource the smallest fraction of times in the past. We denote this scheme by \(\texttt {TURN}\).
Let instead \({\bar{{\zeta }}}^i_T\) be an agent’s mean reward. Then the ex-post reward fairness is defined as the standard deviation of \({\bar{{\zeta }}}^i_T\) with respect to the different agents, i.e.,
Notice that, in contrast to the ex-post access fairness, ex-post reward fairness cannot be evaluated without knowing the private urgency of the agents.
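The two ex-post fairness measures can be evaluated from simulation logs as sketched below. This is an illustration under stated assumptions: the measures are taken to be the negated population standard deviation across agents, so that larger values indicate a fairer outcome, and the function and argument names are hypothetical.

```python
import numpy as np

def access_fairness(times_selected, times_involved):
    """Negated standard deviation of the per-agent fraction of times selected.

    ASSUMPTION: sign convention and population (not sample) standard deviation.
    """
    w = np.asarray(times_selected, dtype=float) / np.asarray(times_involved, dtype=float)
    return -np.std(w)

def reward_fairness(total_rewards, times_involved):
    """Negated standard deviation of the per-agent mean reward.

    Requires the agents' private urgency, since rewards are negated urgencies.
    """
    mean_reward = np.asarray(total_rewards, dtype=float) / np.asarray(times_involved, dtype=float)
    return -np.std(mean_reward)
```

For two agents that are each selected in half of their competitions, `access_fairness([5, 5], [10, 10])` evaluates to zero, the best attainable value under this convention.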
5.2 Case Study: Homogeneous Agents with Rare High-Urgency State
We showcase our results in a scenario where the agents are homogeneous, i.e., they all follow the same urgency process \(\phi \) and have the same future discount factor \(\alpha \) (and there is only one type \(\tau \), which we drop from the notation in this section). We will investigate the role of heterogeneity in karma mechanisms in Sects. 5.3, 5.4. The agents are typically lowly urgent (\(u=1\)), and have a rare occurrence of being highly urgent (\(u=10\)). The agents can anticipate when they will be highly urgent ahead of time. This is represented by the following urgency process:
Notice that there are two low urgency states; the first is the ‘default’ state in which the agents find themselves most of the time, and the second is an ‘intermediate’ state which has a high probability of transitioning to the high urgency state.
For example, \(u=1\) default represents a regular day, \(u=1\) intermediate a regular day where the agent anticipates it must go to the airport during rush hour tomorrow, and \(u=10\) the day of that important trip.
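The structure of such a three-state urgency process can be explored with a short sketch that computes the long-run fraction of time spent in each urgency state. The transition probabilities below are hypothetical placeholders chosen only to reproduce the qualitative structure described above (a common default state, an intermediate state that usually leads to high urgency, and a rare high urgency state); they are not the values used in the case study.

```python
import numpy as np

def stationary_distribution(phi):
    """Stationary distribution of an irreducible chain with phi[u, u_next]."""
    n = phi.shape[0]
    # Solve mu @ phi = mu together with sum(mu) = 1 as one linear system.
    A = np.vstack([phi.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mu

# HYPOTHETICAL transition probabilities, for illustration only.
# States: 0 = default low (u=1), 1 = intermediate low (u=1), 2 = high (u=10).
phi = np.array([[0.9, 0.1, 0.0],
                [0.1, 0.0, 0.9],
                [1.0, 0.0, 0.0]])
mu = stationary_distribution(phi)
```

With these placeholder values the agent spends most of its interactions in the default state and only a small fraction in the high urgency state, which is the qualitative regime the case study targets.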
Figure 3 shows the stationary Nash equilibrium computed for the case when the karma payment rule is pay bid to society (PBS) and the agents discount their future rewards with \(\alpha =0.98\). The average amount of karma per agent is \({{\bar{k}}}=10\). The top of the figure shows the equilibrium bidding policy \({\varvec{\pi }}\) at each urgency state, where for a given level of karma (x-axis) the intensity of the red color denotes the probabilistic weight placed on the bids (y-axis), and disallowed bids that exceed the karma budget are displayed in gray. The bottom of the figure shows the stationary joint urgency-karma distribution \({{\varvec{d}}}\). The stationary Nash equilibrium exhibits multiple intuitive behaviors. First, agents bid parsimoniously, in order to save karma for the future rather than maximize their immediate chances of success. Second, agents bid more in the high urgency state than in the low urgency states, thereby effectively signalling their urgency. Interestingly, the agents bid zero when they are low on karma in the intermediate low urgency state, in order to gather karma for the anticipated high urgency state. As a consequence, high urgency agents typically have more karma.
Figure 4 shows the performance of the karma mechanisms with respect to the social welfare measures of efficiency, ex-post access fairness and ex-post reward fairness, as a function of the agents’ future discount factor \(\alpha \). In generating this figure, for each value of \(\alpha \), we ran agent-based simulations with \(N=200\) agents who were randomly matched in a total of \(T=1000\) interactions per agent and bid according to the stationary Nash equilibrium policy for \(\alpha \). Each simulation was repeated 10 times in order to construct the displayed confidence intervals. Efficiency and ex-post access fairness are plotted jointly as a trade-off chart on the left side of the figure, and the ex-post reward fairness is plotted on the right. We compare the performance under the karma payment rules \(\texttt {PBP}\) and \(\texttt {PBS}\) and the benchmark resource allocation schemes introduced in Sect. 5.1. As expected, the best efficiency is achieved by \(\texttt {DICT}\), the best ex-post access fairness by \(\texttt {TURN}\), and the baseline \(\texttt {COIN}\) performs poorly in all measures. Interestingly, the performance of \(\texttt {PBP}\) coincides with \(\texttt {COIN}\) when the agents are fully myopic (\(\alpha =0\)). In this case, the equilibrium policy can be computed in closed form (Footnote 5); it is a dominant strategy for the agents to bid all their karma since there is no sense in saving it for the future. Under \(\texttt {PBP}\), this leads to all the karma in the system being in the possession of one single agent at a time, rendering an essentially random allocation among all the other agents who have no karma. In contrast, this does not occur under \(\texttt {PBS}\) due to the karma redistribution, and while the bid-all behavior is also inefficient under this payment rule, it preserves some of the turn-taking capability of the karma. In fact, the performance of \(\texttt {PBS}\) dominates that of \(\texttt {PBP}\) across all values of \(\alpha \) and for all of the social welfare measures considered, highlighting the advantage of incorporating a redistributive scheme rather than a strictly peer to peer scheme. This advantage comes at a price, since redistributive schemes such as \(\texttt {PBS}\) require some degree of centralization to keep track of and redistribute the surplus karma. In many cases, however, it is natural to consider that the agents have a reasonably high value of \(\alpha \), since they are expected to remain in the system for long. Interestingly, both \(\texttt {PBP}\) and \(\texttt {PBS}\) perform similarly well in these cases, exposing that the performance of the karma mechanisms is robust to the specifics of the mechanism design in many reasonable scenarios. Remarkably, both payment rules approach the optimal efficiency of \(\texttt {DICT}\), without ever having to access the agents’ private urgency. At the same time, they vastly outperform \(\texttt {DICT}\) both in terms of ex-post access fairness and ex-post reward fairness, demonstrating that the karma mechanisms are successful in both achieving fair turn-taking, as well as catering to the agents’ varying temporal needs. This occurs as long as the agents do have some future discounting. A severe degradation in the ex-post fairness occurs in the “pathological” case when the agents do not discount their future (\(\alpha =1\)). This interesting case is discussed separately in Appendix B as it requires different analysis tools from those presented in Sect. 3.
To provide insight into why \(\texttt {PBS}\) outperforms \(\texttt {PBP}\) for low values of the future discount factor \(\alpha \), as well as why the performance of the karma mechanisms improves with increasing \(\alpha \), we compare a number of stationary Nash equilibria in Fig. 5. Here, we compactly represent the equilibrium bidding policies through the mean bids, and only present the results for the default low urgency and the high urgency states (omitting the intermediate low-urgency state). The first two columns of the figure compare the stationary Nash equilibria under \(\texttt {PBS}\) and \(\texttt {PBP}\) for the relatively low value of \(\alpha =0.7\). Observe that under \(\texttt {PBP}\), a significant mass of agents is expected to be low on karma when they are highly urgent, and therefore fails to signal their high urgency against a low-urgency opponent, contributing to the loss of efficiency observed in Fig. 4. In contrast, under \(\texttt {PBS}\) the mass of the stationary karma distribution at the high urgency state is concentrated in a region where the agents will effectively outbid low-urgency opponents most of the time, explaining the superior performance of \(\texttt {PBS}\) in terms of efficiency. This occurs due to the redistribution of karma, which ensures that the agents are sufficiently far from having critically low karma.
A similar mechanism is responsible for the improved efficiency at higher values of the future discount factor \(\alpha \), as the rightmost three columns of Fig. 5 demonstrate by contrasting the stationary Nash equilibria under \(\texttt {PBP}\) for \(\alpha \in \{0.7,0.95,0.99\}\) (qualitatively similar results hold for \(\texttt {PBS}\)). Instead of relying on karma redistribution, highly future aware agents learn to be sparing in the use of karma, in order to avoid the situation of being highly urgent and low on karma, in which karma loses its effectiveness as a signaling device. This precisely exemplifies how repetition can be leveraged to align the agents’ incentives, and suggests that karma is an effective instrument for this purpose. Additionally, the mass of agents that have critically low karma values is generally much smaller for \(\alpha =0.99\) than for \(\alpha =0.7\) (at all urgency states), which contributes to the improved ex-post access fairness.
5.3 Robustness to Heterogeneous Future Discount Factors
In this section, we consider a mixed population where the agents can have one of two future awareness types: half of the agents heavily discount the future reward (\(\alpha _1 = 0.7\)) while the other half are strongly future-aware (\(\alpha _2 = 0.99\)). This heterogeneity can have one of two interpretations. In the first interpretation, the future discount factors are true representatives of the agents’ objectives, e.g., the \(\alpha _1\) agents are expecting to exit the system sooner than the \(\alpha _2\) agents. In the second interpretation, all the agents are expected to remain in the system for the same (infinite) time, and the heterogeneity represents differences in their strategic competence, i.e., the \(\alpha _1\) agents are less patient than the \(\alpha _2\) agents. This is the interpretation we focus on here. We would like to investigate the extent to which karma mechanisms are robust to this difference.
Figure 6 shows stationary Nash equilibrium results for \(\texttt {PBS}\) (top) and \(\texttt {PBS}+\texttt {TAX}\) (bottom), where in the latter we collect a small progressive karma tax of the form \(h[k] = 0.005 \, k^2\) from all the agents and redistribute it uniformly. The mathematical modeling of the karma tax follows similar principles as the redistributive payment rule \(\texttt {PBS}\) (see Sect. 4.2). The defining feature of Fig. 6 is that in the untaxed case, a slight ‘under-bidding’ behavior of the \(\alpha _2\) agents leads them to accumulate significantly more karma than the \(\alpha _1\) agents in the long run. As the ex-ante access fairness and ex-ante reward fairness plots demonstrate (see Footnote 6), this leads to some degree of unfairness between the two types, with the \(\alpha _2\) agents getting access to the resource a higher fraction of the time, as well as experiencing higher average rewards (although the disparity is reasonably small). Nonetheless, applying a karma tax is an effective measure to equalize this disparity, since it both disincentivizes the \(\alpha _2\) agents from holding on to too much karma and redistributes some of that karma to the \(\alpha _1\) agents. This serves as a demonstration of the freedom that karma mechanisms give to the system designer, who has a principled tool to achieve different resource allocation objectives.
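As a rough illustration of the tax \(h[k] = 0.005 \, k^2\) with uniform redistribution, the following sketch applies one taxation round to a list of karma holdings. It is a simplification and not the model of Sect. 4.2: karma is treated as divisible so that fractional taxes and shares are allowed, the tax is capped at the agent's holdings, and the function name and the `rate` parameter are hypothetical.

```python
def apply_karma_tax(karma, rate=0.005):
    """Collect a progressive tax rate * k**2 from each agent and
    redistribute the collected total uniformly among all agents.

    Assumption: karma is divisible, so fractional amounts are allowed.
    """
    # An agent can never be taxed more than it holds.
    taxes = [min(k, rate * k * k) for k in karma]
    share = sum(taxes) / len(karma)
    return [k - t + share for k, t in zip(karma, taxes)]


if __name__ == "__main__":
    before = [0, 10, 20]
    after = apply_karma_tax(before)
    print(before, "->", [round(k, 3) for k in after])
```

Because the tax grows quadratically in \(k\), an agent with twice the karma pays four times the tax, while every agent receives the same share back. The total amount of karma is unchanged and the gap between rich and poor agents shrinks.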
5.4 Robustness to Heterogeneous Urgency Processes
Thus far in our numerical analysis we have considered that all agents have the same urgency process. The homogeneity of the urgency process facilitates the interpersonal comparability [44] of utility between agents, and ultimately it allows us to define a simple notion of efficiency. It is important to notice, however, that each agent’s urgency process is completely private and its only purpose is to encode the user’s temporal preference for when they would prefer to acquire the resource. It serves to compare the value of the resource for the same agent at different times, more than to compare different agents.
We have looked at the effect of introducing some invaders (see Footnote 7) in the population. We assumed that these invaders have a different urgency process in that they are in a high-urgency state more often than the nominal population. We have considered the case of a small and a large subpopulation of invaders.
The numerical results are reported in Table 1. It is evident that agents that present a higher frequency of the high-urgency state are not granted additional resources under the \(\texttt {PBS}\) karma mechanism. On the contrary, the karma mechanism incentivizes agents to identify their most urgent instances parsimoniously. In contrast, the benchmark strategy \(\texttt {DICT}\) allocates additional resources to the high-urgency invading subpopulation. This behavior illustrates a key feature of the proposed “self-contained” karma mechanism that differentiates it from monetary schemes: fairness of the resource allocation emerges intrinsically from the mechanism and is not affected by exogenous factors of inequality between agents. Uneven allocation of the resource is possible if desired, but it requires deliberate design choices such as non-uniform karma redistribution rules.
6 Conclusion
We have demonstrated the effectiveness of karma mechanisms for the dynamic allocation of common resources. These mechanisms make it possible to achieve highly efficient and fair allocations when the resources are repeatedly disputed, without requiring access to the users’ private preferences, and without resorting to monetary pricing, which is problematic in many important domains. The efficiency and fairness of karma mechanisms are robustly observed in multiple numerical cases involving different mechanism designs, preference structures, and user heterogeneity.
We show that it is possible to rigorously study the strategic behavior of the users of a karma mechanism by modeling it as a dynamic population game, in which a stationary Nash equilibrium is guaranteed to exist. We numerically investigate the karma stationary Nash equilibrium, providing insights on the strategic behaviors that emerge and on their consequences for the social welfare. We also provide examples of how our model can be a versatile mechanism design tool for the system designer that wants to affect these behaviors and achieve different resource allocation objectives.
Future work includes applying karma mechanisms in the specific motivating use cases, which include the allocation of ride-hailing trips, autonomous intersection management, as well as traffic and/or internet congestion management. We believe that many more applications are possible. We would also like to investigate the surprisingly understudied notions of fairness in (infinitely) repeated resource allocations, and develop axiomatic principles and specifications to guide the design of karma mechanisms. Moreover, our analysis suggests that karma mechanisms are robust to some types of user heterogeneity, but a comprehensive analysis of the practical effects of more forms of heterogeneity is desirable for some applications. Finally, we remark that effective strategic play by the agents is how karma acquires value in a karma mechanism. An important open research question is how the users of a karma mechanism can learn their optimal bidding strategy from repeated play in a distributed fashion.
Notes
1. In an individually rational outcome, the payoff of each player weakly dominates the player’s security or minimax payoff. The set of individually rational outcomes includes socially efficient outcomes in many games, such as in the prisoner’s dilemma.
2. This terminology is adapted from [47], where the social state refers to the distribution of actions played in the static population. Notice that we must also account for the distribution of states in our dynamic setting.
3. This bound holds in the worst case where the ego agent yields with urgency \(u_{\max }\) all the time.
4. To the extent of our knowledge, ex-post fairness has not been defined in infinitely repeated settings thus far.
5. Except for this special case, it is numerically difficult to compute \({\varvec{\pi }}\) for \(\texttt {PBP}\) and low values of \(\alpha \), and we start at \(\alpha =0.15\).
6. When there are multiple agent types, we may have a systematic difference in the long-run probability of access to the resource as well as the mean rewards of the different types. This ex-ante unfairness can be quantified directly from the stationary Nash equilibrium policies and distributions of the different types.
7. We borrow the term invaders from the standard nomenclature in evolutionary games.
8. Numerical conditioning is required for high \(\lambda \) to ensure that the exponential is stable. This is achieved with \({{\tilde{\pi }}}_\tau [b \mid u,k](d,\pi ) = \frac{\exp {(\lambda \, Q_\tau [u,k,b](d,\pi ) - \max _{b^*} \lambda \, Q_\tau [u,k,b^*](d,\pi ))}}{\sum _{b'} \exp {(\lambda \, Q_\tau [u,k,b'](d,\pi ) - \max _{b^*} \lambda \, Q_\tau [u,k,b^*](d,\pi ))}}\).
References
Balseiro SR, Gurkan H, Sun P (2019) Multiagent mechanism design without money. Oper Res 67(5):1417–1436
Berge C (1997) Topological spaces: including a treatment of multi-valued functions, vector spaces, and convexity. Dover Publications, Mineola NY
Berkes F (1986) Local-level management and the commons problem: A comparative study of turkish coastal fisheries. Mar Policy 10(3):215–229
Bertsekas D (2007) Dynamic programming and optimal control, vol 2. Athena Scientific, Belmont MA
Börjesson M, Fosgerau M, Algers S (2012) On the income elasticity of the value of travel time. Transp Res Part A Policy Pract 46(2):368–377
Bourreau M, Kourandi F, Valletti T (2015) Net neutrality with competing internet platforms. J Ind Econ 63(1):30–73
Brands DK, Verhoef ET, Knockaert J, Koster PR (2020) Tradable permits to manage urban mobility: market design and experimental implementation. Transp Res Part A Policy Pract 137:34–46
Budish E (2011) The combinatorial assignment problem: approximate competitive equilibrium from equal incomes. J Polit Econ 119(6):1061–1103
Budish E, Cantillon E (2012) The multi-unit assignment problem: theory and evidence from course allocation at harvard. Am Econ Rev 102(5):2237–71
Cachon GP, Daniels KM, Lobel R (2017) The role of surge pricing on a service platform with self-scheduling capacity. Manuf Serv Oper Manag 19(3):368–384
Cappelen AW, Konow J, Sørensen EØ, Tungodden B (2013) Just luck: an experimental study of risk-taking and fairness. Am Econ Rev 103(4):1398–1413
Castillo JC, Knoepfle D, Weyl G (2017) Surge pricing solves the wild goose chase. In: ACM Conference on Economics and Computation, pp 241–242
Censi A, Bolognani S, Zilly JG, Mousavi SS, Frazzoli E (2019) Today me, tomorrow thee: efficient resource allocation in competitive settings using karma games. In: IEEE Intelligent Transportation Systems Conference (ITSC), pp 686–693
Clarke EH (1971) Multipart pricing of public goods. Public Choice 11(1):17–33
Dholakia UM (2015) Everyone hates Uber’s surge pricing - here’s how to fix it. https://hbr.org/2015/12/everyone-hates-ubers-surge-pricing-heres-how-to-fix-it
Dutta PK (1995) A folk theorem for stochastic games. J Econ Theory 66(1):1–32
Elokda E, Censi A, Bolognani S (2021) Dynamic population games. arXiv preprint arXiv:2104.14662
Friedman EJ, Halpern JY, Kash I (2006) Efficiency and Nash equilibria in a scrip system for P2P networks. In: ACM Conference on Electronic Commerce, pp 140–149
Friedman JW (1971) A non-cooperative equilibrium for supergames. Rev Econ Stud 38(1):1–12
Fudenberg D, Levine DK (1991) An approximate folk theorem with imperfect private information. J Econ Theory 54(1):26–47
Fudenberg D, Maskin E (1986) The folk theorem in repeated games with discounting or with incomplete information. Econometrica 54(3):533–554
Fudenberg D, Levine DK, Maskin E (1994) The folk theorem with imperfect public information. Econometrica 62(5):997–1039
Gale D, Shapley LS (1962) College admissions and the stability of marriage. Am Math Month 69(1):9–15
Gibbard A (1973) Manipulation of voting schemes: a general result. Econometrica 41(4):587–601
Golle P, Leyton-Brown K, Mironov I, Lillibridge M (2001) Incentives for sharing in peer-to-peer networks. In: International Workshop on Electronic Commerce, Springer, pp 75–87
Gorokh A, Banerjee S, Iyer K (2021) From monetary to nonmonetary mechanism design via artificial currencies. Math Oper Res 46(3):835–855
Granas A, James D (2003) Fixed point theory. Springer, New York NY
Groves T (1973) Incentives in teams. Econometrica 41(4):617–631
Guo Y, Hörner J (2020) Dynamic allocation without money. TSE Working Paper
Hahn RW, Wallsten S (2006) The economics of net neutrality. Econ Voice 3:6
Hylland A, Zeckhauser R (1979) The efficient allocation of individuals to positions. J Polit Econ 87(2):293–314
Jackson MO, Sonnenschein HF (2007) Overcoming incentive constraints by linking decisions. Econometrica 75(1):241–257
Johnson K, Simchi-Levi D, Sun P (2014) Analyzing scrip systems. Oper Res 62(3):524–534
Kim J, Li M, Xu M (2021) Organ donation with vouchers. J Econ Theory 191:105–159
Krishna V (2009) Auction theory. Academic Press, Burlington MA
Moulin H (1980) On strategy-proofness and single peakedness. Public Choice 35(4):437–455
Net Neutrality - President Obama’s plan for a free and open internet (2016). https://obamawhitehouse.archives.gov/net-neutrality
Nisan N, Roughgarden T, Tardos E, Vazirani VV (2007) Algorithmic game theory. Cambridge University Press, Cambridge
O’Flaherty WD (1980) Karma and Rebirth in classical indian traditions. University of California Press, Berkeley CA
Okuno-Fujiwara M, Postlewaite A (1995) Social norms and random matching games. Games Econ behav 9(1):79–109
Ostrom E (1990) Governing the commons: the evolution of institutions for collective action. Cambridge University Press, Cambridge
Pil Choi J, Kim B-C (2010) Net neutrality and investment incentives. RAND J Econ 41(3):446–471
Prendergast C (2022) The allocation of food to food banks. J Polit Econ 130(8):1993–2017
Roberts KW (1980) Interpersonal comparability and social choice theory. Rev Econ Stud 47(2):421–439
Rudin W (1976) Principles of mathematical analysis, vol 3. McGraw-Hill, New York NY
Salazar M, Paccagnan D, Agazzi A, Heemels WM (2021) Urgency-aware optimal routing in repeated games through artificial currencies. Eur J Control 62:22–32
Sandholm WH (2010) Population games and evolutionary dynamics. MIT Press, Cambridge MA
Satterthwaite MA (1975) Strategy-proofness and Arrow’s conditions: Existence and correspondence theorems for voting procedures and social welfare functions. J Econ Theory 10(2):187–217
Schummer J, Vohra RV (2007) Mechanism design without money. In: Algorithmic Game Theory, pp 243–299
Shapley L, Scarf H (1974) On cores and indivisibility. J Math Econ 1(1):23–37
Shontell A, Moss C (2014) Uber denies text message that suggests it withheld cars ‘to make earnings higher’ on Valentine’s Day. https://www.businessinsider.com/uber-surge-pricing-text-2014-2?r=US&IR=T
Sönmez T, Ünver MU (2010) Course bidding at business schools. Int Econ Rev 51(1):99–123
Sönmez T, Ünver MU, Yenmez MB (2020) Incentivized kidney exchange. Am Econ Rev 110(7):2198–2224
Spear SE, Srivastava S (1987) On repeated moral hazard with discounting. Rev Econ Stud 54(4):599–617
Treves F (1967) Topological Vector Spaces. Distributions and Kernels. Academic Press, San Diego CA
Tychonoff A (1930) Über die topologische Erweiterung von Räumen. Math Ann 102(1):544–561
Vickrey W (1961) Counterspeculation, auctions, and competitive sealed tenders. J Finance 16(1):8–37
Vishnumurthy V, Chandrakumar S, Sirer EG (2003) Karma: A secure economic framework for peer-to-peer resource sharing. In: Workshop on Economics of Peer-to-peer Systems
Xiao F, Qian ZS, Zhang HM (2013) Managing bottleneck congestion with tradable credits. Transp Res Part B Methodol 56:1–14
Yang H, Wang X (2011) Managing network mobility with tradable credits. Transp Res Part B Methodol 45(3):580–594
Acknowledgements
We would like to acknowledge anonymous reviewers whose comments and suggestions greatly improved the paper. We are also thankful to Heinrich Nax for many fruitful discussions. Research supported by NCCR Automation, a National Centre of Competence in Research, funded by the Swiss National Science Foundation (grant number 180545).
Funding
Open access funding provided by Swiss Federal Institute of Technology Zurich
Appendices
Appendix A Computation of a Stationary Nash Equilibrium
Our stationary Nash equilibrium computation algorithm is motivated by the notion of evolutionary dynamics in static population games. We write the equilibrium seeking problem as the problem of finding the rest points of the following continuous-time dynamical system:
See [17, Section 6] for an interpretation. We use \(\eta \) as a policy update rate parameter, which controls the rate of policy changes (EV.2) relative to the rate of state changes (EV.1), and
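is one of the familiar mean dynamics in evolutionary game theory [47, Part II]. We choose the perturbed best response dynamic as our mean dynamic, which allows modeling agents that are not perfectly rational. The perturbed best response policy is given by:
\[{{\tilde{\pi }}}_\tau [b \mid u,k](d,\pi ) = \frac{\exp {(\lambda \, Q_\tau [u,k,b](d,\pi ))}}{\sum _{b'} \exp {(\lambda \, Q_\tau [u,k,b'](d,\pi ))}},\]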
where the parameter \(\lambda \) controls the degree of rationality of the agents. For \(\lambda = 0\), the perturbed best response is a uniform random distribution over all the bids. At finite values of \(\lambda \), it assigns higher probabilities to bids with higher single-stage deviation rewards in a smooth manner. At the limit \(\lambda \rightarrow \infty \), we recover the perfect best response with a uniform random distribution over all the bids maximizing the single-stage deviation rewards. The policy update dynamics (EV.2) are then simply
Note that these dynamics lead to perturbed versions of the equilibrium policies \({\varvec{\pi }}_\tau \) at the rest points, rather than the exact policies [47, Section 6.2.4]. However, for a sufficiently large \(\lambda \), exact policies can be computed in practice, and we use \(\lambda =1000\) in our computationsFootnote 8.
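The numerical conditioning of Footnote 8 can be sketched as a max-shifted softmax over the bids. The snippet below is an illustration and not the authors' implementation: `q_values` stands in for the single-stage deviation rewards \(Q_\tau [u,k,b](d,\pi )\) of the feasible bids at one fixed state \([u,k]\), and the function name is hypothetical.

```python
import math


def perturbed_best_response(q_values, lam):
    """Logit (softmax) choice probabilities over bids, conditioned by
    subtracting max_b lam * Q before exponentiating.

    q_values: rewards Q[b] for each feasible bid b at a fixed state.
    lam:      rationality parameter (0 gives uniform, large gives best response).
    """
    scaled = [lam * q for q in q_values]
    shift = max(scaled)
    # Every exponent is <= 0, so exp() cannot overflow; very negative
    # exponents underflow harmlessly to 0.0.
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)  # at least 1.0, since the maximizer contributes exp(0)
    return [w / total for w in weights]


if __name__ == "__main__":
    print(perturbed_best_response([1.0, 2.0, 3.0], lam=0.0))
    print(perturbed_best_response([1.0, 3.0, 3.0], lam=1000.0))
```

Without the shift, evaluating `math.exp(1000.0 * 3.0)` directly would raise an `OverflowError`. With the shift, the probabilities are mathematically unchanged, since the common factor cancels between numerator and denominator.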
For computation purposes, we discretize the dynamics (EV.1)–(EV.2) using the discrete step size dt, yielding Algorithm 1.
Appendix B Stationary Nash Equilibrium in the Case of no Future Discounting
We have presented results for the case when agents do not discount their future rewards, i.e., when \(\alpha _\tau =1\), which does not fit the dynamic population model of Sect. 3. In particular, the expected infinite horizon reward \(V_\tau (d,\pi )\) (10) is not well defined for \(\alpha _\tau = 1\). Nevertheless, the case of no future discounting can be treated by considering that the agents face an average reward per time step problem rather than a discounted problem [4]. Analysis of the average reward per time step problem is intricate and we refer the reader to [4, Chapter 4] for the details. For our purposes, it suffices to use the following proposition.
Proposition 2
([4] Propositions 4.2.1, 4.2.2) For a fixed social state \((d,\pi )\), if a scalar \(\sigma _\tau (d,\pi )\) and a vector \(Y_\tau (d,\pi )\) satisfy
then \(\sigma _\tau (d,\pi )\) is the expected average reward per time step of the ego agent of type \(\tau \) starting from any state [u, k], supposing that \((d,\pi )\) is not time-varying.
Furthermore, if
then \(\sigma _\tau (d,\pi )\) is the optimal expected average reward per time step of the ego agent of type \(\tau \), and \(\pi _\tau \) is an optimal policy for \(\alpha _\tau =1\).
A stationary Nash equilibrium for \(\alpha _\tau =1\) is then a social state \(({{\varvec{d}}},{\varvec{\pi }})\) where \({{\varvec{d}}}\) is stationary, i.e., (SNE.1) holds, and \({\varvec{\pi }}_\tau \) satisfies (A4). Noting the similarity between (A3) and (10), Algorithm 1 can be readily modified to seek a stationary Nash equilibrium for \(\alpha _\tau =1\). Instead of computing \(V_\tau (d,\pi )\), we compute the pair \((\sigma _\tau (d,\pi ),Y_\tau (d,\pi ))\) as follows. Noting that if \(Y_\tau (d,\pi )\) satisfies (A3), so will \({\tilde{Y}}_\tau (d,\pi ) = Y_\tau (d,\pi ) + y \mathbb {1}\) for any constant offset y, we can eliminate this degree of freedom by fixing \(Y_\tau [u_1,0](d,\pi ) = 0\) for the arbitrary ‘default state’ \([u,k]=[u_1,0]\), which allows us to re-write (A3) as
This system of equations can be solved iteratively by using an initial guess \(Y_\tau ^0(d,\pi )\) and updating \(\sigma _\tau (d,\pi )\) and \(Y_\tau (d,\pi )\) until convergence, a procedure known as relative value iteration [4, Section 4.3.1]. The vector \(Y_\tau (d,\pi )\) can be interpreted as a relative rewards vector, with \(Y_\tau [u,k](d,\pi )\) representing transient reward advantages/disadvantages of state [u, k] with respect to the ‘default state’ \([u_1,0]\). With this, we can define the single-stage deviation rewards \(Q_\tau [u,k,b](d,\pi )\) (12) with respect to \(Y_\tau (d,\pi )\) instead of \(V_\tau (d,\pi )\), and proceed with the remainder of Algorithm 1 unmodified.
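The iteration described above can be sketched for a generic finite Markov reward chain under a fixed policy. This is a textbook-style illustration rather than the paper's Algorithm 1: the transition matrix `P`, the reward vector `r` and the function name are hypothetical stand-ins, state index `ref` plays the role of the ‘default state’ \([u_1,0]\), and the sketch assumes the standard average-reward evaluation equation \(\sigma + Y[i] = r[i] + \sum _j P[i][j] \, Y[j]\) on a chain for which the iteration converges (e.g., unichain and aperiodic).

```python
def relative_value_iteration(P, r, ref=0, tol=1e-10, max_iter=10_000):
    """Find (sigma, Y) with Y[ref] = 0 such that
    sigma + Y[i] = r[i] + sum_j P[i][j] * Y[j] for every state i.

    P: row-stochastic transition matrix (list of lists) under a fixed policy.
    r: expected immediate reward in each state.
    """
    n = len(r)
    Y = [0.0] * n
    sigma = 0.0
    for _ in range(max_iter):
        # One application of the evaluation operator.
        TY = [r[i] + sum(P[i][j] * Y[j] for j in range(n)) for i in range(n)]
        # The reference state absorbs the average reward, pinning Y[ref] = 0.
        sigma_new = TY[ref]
        Y_new = [v - sigma_new for v in TY]
        change = max(abs(a - b) for a, b in zip(Y_new, Y))
        change = max(change, abs(sigma_new - sigma))
        Y, sigma = Y_new, sigma_new
        if change < tol:
            break
    return sigma, Y


if __name__ == "__main__":
    P = [[0.9, 0.1], [0.2, 0.8]]
    r = [0.0, 1.0]
    sigma, Y = relative_value_iteration(P, r)
    print("average reward per step:", sigma)
    print("relative rewards:", Y)
```

For this two-state example the stationary distribution is \((2/3, 1/3)\), so the average reward per time step is \(1/3\), and the second entry of the relative rewards vector measures the transient advantage of starting in the rewarding state.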
Figure 7 shows the stationary Nash equilibrium under karma payment rule \(\texttt {PBS}\) when all agents have \(\alpha =1\) (we drop the subscript \(\tau \) here because there is only one type), computed using this method. The defining feature of the equilibrium bidding policy is that it is very sparing on karma, with the agents bidding 1 only in the high urgency state and zero otherwise, no matter how much karma they have (unless they have zero karma). While at first glance this behavior resembles a truthful declaration of the urgency, and indeed \(\alpha =1\) performs well in terms of efficiency, it has a few negative consequences. First, it leads to a high stationary mass of the high urgency, zero karma state, which means that a high fraction of the time the high urgency agents do not have one karma to bid. Second, the homogeneity of the bids leads to the karma losing its effectiveness as a turn-taking device, and results in the poor ex-post access fairness observed in Fig. 4 at \(\alpha =1\).
As we explored in Sect. 5.3, an effective measure against this karma hoarding behavior is to let the karma expire, e.g., by applying a karma tax.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Elokda, E., Bolognani, S., Censi, A. et al. A Self-Contained Karma Economy for the Dynamic Allocation of Common Resources. Dyn Games Appl 14, 578–610 (2024). https://doi.org/10.1007/s13235-023-00503-0
DOI: https://doi.org/10.1007/s13235-023-00503-0