Opportunities to show ads on the worldwide web are typically sold in real time through advertising auctions, held billions of times per day, every day. A product like Amazon’s Sponsored Display, which participates in auctions on our advertising customers’ behalf, must learn bidding strategies that help our advertisers achieve their goals. If we bid too much, we end up overpaying for the impression opportunity. But if we underbid, we might lose the opportunity. Theory offers little guidance here: in practice, learning to bid requires a lot of trial and error.
At this year’s AdKDD Workshop on computational advertising, at the ACM Conference on Knowledge Discovery and Data Mining (KDD), my colleagues and I won a best-paper award for our presentation of AuctionGym, a simulation environment that enables robust and reproducible evaluation of bandit and reinforcement learning methods for learning bidding strategies for online advertising auctions.
We have publicly released the code for AuctionGym, and it is our hope that it will allow researchers as well as practitioners (in computer science, machine learning, economics, and beyond) to gain insights into auction and bidding dynamics and to develop and evaluate novel machine learning approaches that tackle the open problems in the field.
In the paper, we formalize the problem of learning to bid as off-policy, counterfactual estimation and learning. Within that formalism, we show that most existing solutions adhere to the value-based paradigm, meaning they model the outcome of placing a bid in a specific auction and use that model to make real-time bidding decisions.
This is not the only option: in the paper, we propose alternative formulations of the problem, which leverage policy-based and doubly robust counterfactual estimators. These approaches have the advantage of ironing out biases that can arise when we try to model every single auction outcome: they provide unbiased estimates of the true value a bidding strategy will bring, in expectation.
Auctions are run by entities called ad exchanges, and the mechanisms they implement decide which bidder wins the impression opportunity and how much it costs.
Even for seemingly simple situations with two items and two bidders, the revenue-maximizing auction mechanism remains undiscovered.
Auction mechanisms are well studied in the economics literature and have a long history. The research on this topic typically cares about truthful or incentive-compatible auctions, where bidders are incentivized to report their true valuations for the items being sold. This simplifies the problem setting, as we no longer need to consider strategic misreporting by the bidders.
Famous mechanisms exist to either maximize welfare for auction participants who bid truthfully (the Vickrey-Clarke-Groves auction) or to maximize revenue for the auctioneer (the Myerson auction). Myerson auctions are an example of optimal auction design, a field originated by Roger Myerson, who was awarded a Nobel Prize for it.
Although these well-known mechanisms have been hugely influential and have contributed strongly to our understanding of auction theory, there are still many open problems in practice. Indeed, even for seemingly simple situations with two items and two bidders, the revenue-maximizing auction mechanism remains undiscovered.
The heterogeneity and complexity of present-day online advertising auctions make theoretical analysis difficult. Indeed, theoretical results typically rely on assumptions that rarely hold and are hard to validate in real-world settings. As a result, auction theory only partially informs practical implementations of these systems.
Moving away from pure theory, we can draw inferences and gain insights through empirical observations. The problem then becomes that experimental data is costly to acquire, and observational data addresses a limited range of questions.
To complicate the picture, the design of many present-day online advertising auctions may in practice encourage strategic bidding. This implies that we should take strategic-bidding behavior into account: when we submit bids to external auctions, as we do with Sponsored Display, we should ensure that we’re using strategies that will help achieve our advertising customers’ goals. But the types of datasets that could be used to evaluate alternative strategies are hard to come by for academic researchers — and offline evaluation methodologies are limited.
AuctionGym provides an alternative way to improve our understanding of auction dynamics and to provide reproducible validation of learning-to-bid methods. It simulates repeated auction rounds in an end-to-end manner, from impression opportunities to impressions and the resulting conversions.
In AuctionGym, bidders have private catalogues listing their advertisements, their goals, and their private valuations of those goals. These valuations can be drawn from a prespecified distribution, or they can be based on the bidder’s real data.
AuctionGym comes with implementations of several standard auction mechanisms, but our hope is that it can help identify new mechanisms, learned from data. AuctionGym tracks multiple outcome measures, such as the auctioneer’s revenue; the bidders’ welfare and surplus; return on ad spend (ROAS), an industry-standard efficiency measure; and various notions of bidders’ regret, or how much value bidders miss out on due to suboptimal allocation or bidding decisions.
If you’re interested in learning more, try it yourself at https://github.com/amzn/auction-gym.