
[Repost] Google's paper: Ad Click Prediction: a View from the Trenches


Ad Click Prediction: a View from the Trenches
H. Brendan McMahan, Gary Holt, D. Sculley, Michael Young,
Dietmar Ebner, Julian Grady, Lan Nie, Todd Phillips, Eugene Davydov,
Daniel Golovin, Sharat Chikkerur, Dan Liu, Martin Wattenberg,
Arnar Mar Hrafnkelsson, Tom Boulos, Jeremy Kubica
Google, Inc.
mcmahan@google.com, gholt@google.com, dsculley@google.com
Predicting ad click-through rates (CTR) is a massive-scale learning problem that is central to the multi-billion dollar online advertising industry. We present a selection of case studies and topics drawn from recent experiments in the setting of a deployed CTR prediction system. These include improvements in the context of traditional supervised learning based on an FTRL-Proximal online learning algorithm (which has excellent sparsity and convergence properties) and the use of per-coordinate learning rates.

We also explore some of the challenges that arise in a real-world system that may appear at first to be outside the domain of traditional machine learning research. These include useful tricks for memory savings, methods for assessing and visualizing performance, practical methods for providing confidence estimates for predicted probabilities, calibration methods, and methods for automated management of features. Finally, we also detail several directions that did not turn out to be beneficial for us, despite promising results elsewhere in the literature. The goal of this paper is to highlight the close relationship between theoretical advances and practical engineering in this industrial setting, and to show the depth of challenges that appear when applying traditional machine learning methods in a complex dynamic system.
Categories and Subject Descriptors
I.5.4 [Computing Methodologies]: Pattern Recognition - online advertising, data mining, large-scale learning
Online advertising is a multi-billion dollar industry that
has served as one of the great success stories for machine
KDD'13, August 11–14, 2013, Chicago, Illinois, USA.
Copyright 2013 ACM 978-1-4503-2174-7/13/08 …$15.00.
learning. Sponsored search advertising, contextual advertising, display advertising, and real-time bidding auctions have all relied heavily on the ability of learned models to predict ad click-through rates accurately, quickly, and reliably [28, 15, 33, 1, 16]. This problem setting has also pushed the field to address issues of scale that even a decade ago would have been almost inconceivable. A typical industrial model may provide predictions on billions of events per day, using a correspondingly large feature space, and then learn from the resulting mass of data.
In this paper, we present a series of case studies drawn from recent experiments in the setting of the deployed system used at Google to predict ad click-through rates for sponsored search advertising. Because this problem setting is now well studied, we choose to focus on a series of topics that have received less attention but are equally important in a working system. Thus, we explore issues of memory savings, performance analysis, confidence in predictions, calibration, and feature management with the same rigor that is traditionally given to the problem of designing an effective learning algorithm. The goal of this paper is to give the reader a sense of the depth of challenges that arise in real industrial settings, as well as to share tricks and insights that may be applied to other large-scale problem areas.
When a user does a search q, an initial set of candidate
ads is matched to the query q based on advertiser-chosen
keywords. An auction mechanism then determines whether
these ads are shown to the user, what order they are shown
in, and what prices the advertisers pay if their ad is clicked.
In addition to the advertiser bids, an important input to the
auction is, for each ad a, an estimate of P(click | q, a), the
probability that the ad will be clicked if it is shown.
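To make the role of this estimate concrete, here is a minimal sketch of ranking candidates by expected value, under the simplifying assumption that the auction scores each ad by bid times predicted click probability; the function name, tuple layout, and numbers are all illustrative, not the actual auction logic.

```python
# Illustrative only: a toy "rank by bid * pCTR" scoring rule.
# Real ad auctions incorporate many more signals and pricing rules.
def rank_ads(candidates):
    """candidates: list of (ad_id, bid, p_click) tuples.
    Returns the list sorted by expected value, highest first."""
    return sorted(candidates, key=lambda c: c[1] * c[2], reverse=True)

ads = [("a1", 2.00, 0.01),   # expected value 0.02
       ("a2", 0.50, 0.08),   # expected value 0.04
       ("a3", 1.00, 0.03)]   # expected value 0.03
ranking = rank_ads(ads)      # a2, then a3, then a1
```

Note how a low bid with a high predicted CTR (a2) can outrank a higher bid (a1), which is why the accuracy of P(click | q, a) matters so much.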
The features used in our system are drawn from a variety of sources, including the query, the text of the ad creative, and various ad-related metadata. Data tends to be
extremely sparse, with typically only a tiny fraction of nonzero feature values per example.
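A common way to handle such sparse, high-cardinality string features is to store only the active indices of each example, with hashing mapping arbitrary feature strings into a fixed index space. The sketch below assumes this representation; the hash function, dimension, and feature names are illustrative, not details from the paper.

```python
# Sketch of a sparse example representation: each example is just the set
# of active (binary, nonzero) feature indices. The md5-based hashing and
# 2**20 dimension are illustrative assumptions.
import hashlib

DIM = 2 ** 20  # illustrative feature-space size

def featurize(raw_features):
    """Map string features to a set of active indices in [0, DIM)."""
    active = set()
    for f in raw_features:
        h = int(hashlib.md5(f.encode("utf-8")).hexdigest(), 16)
        active.add(h % DIM)
    return active

x = featurize(["query=flowers", "ad_token=roses", "adgroup_id=123"])
```

Storing only active indices keeps per-example cost proportional to the number of nonzero features rather than the full dimensionality.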
Methods such as regularized logistic regression are a natural fit for this problem setting. It is necessary to make
predictions many billions of times per day and to quickly
update the model as new clicks and non-clicks are observed.
Of course, this data rate means that training data sets are
enormous. Data is provided by a streaming service based on
the Photon system; see [2] for a full discussion.
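The per-coordinate FTRL-Proximal update mentioned in the abstract can be sketched as a minimal online logistic-regression learner. This follows the general shape of the algorithm described in the paper, but the hyperparameter values and the dense-dictionary storage here are illustrative; a production system would use far more compact per-coordinate state.

```python
# Minimal sketch of per-coordinate FTRL-Proximal for logistic regression.
# Hyperparameters and dict-based storage are illustrative choices.
import math
from collections import defaultdict

class FTRLProximal:
    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)  # per-coordinate adjusted gradient sums
        self.n = defaultdict(float)  # per-coordinate sums of squared gradients

    def _weight(self, i):
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0  # L1 term induces exact zeros, hence a sparse model
        sign = -1.0 if z < 0.0 else 1.0
        return -(z - sign * self.l1) / (
            (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2)

    def predict(self, x):
        """x: iterable of active (binary) feature indices; returns pCTR."""
        wx = sum(self._weight(i) for i in x)
        wx = max(min(wx, 35.0), -35.0)  # guard against exp overflow
        return 1.0 / (1.0 + math.exp(-wx))

    def update(self, x, y):
        """One online step on an example; y in {0, 1} (click / no click)."""
        p = self.predict(x)
        g = p - y  # gradient of log loss w.r.t. wx for binary features
        for i in x:
            # sigma encodes the per-coordinate learning-rate schedule
            sigma = (math.sqrt(self.n[i] + g * g)
                     - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * self._weight(i)
            self.n[i] += g * g

# Toy usage: feature 1 always clicked, feature 2 never clicked.
model = FTRLProximal(alpha=0.5, l1=0.1)
for _ in range(200):
    model.update([1], 1)
    model.update([2], 0)
```

After training, the model predicts a high click probability for feature 1 and a low one for feature 2, and the L1 thresholding in `_weight` leaves any coordinate with small accumulated signal exactly zero.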
Because large-scale learning has been so well studied in
recent years (see [3], for example) we do not devote signif-