Criteo is continuing its AI leadership push with a new open-source dataset.
The company has launched CriteoPrivateAd, which it describes as “the largest real-world anonymized bidding dataset, in terms of number of features.” So, what the hell is this thing, you might ask?
Basically, the dataset is designed to help adtech companies and researchers figure out how to run ad bidding effectively in a world without identifiers like third-party cookies. Right now, ad bidding relies on user signals (obviously!): data collected across websites to predict who is most likely to engage with an ad. But as browsers restrict third-party cookies (in Chrome's case, through a more user-controlled approach rather than an outright phase-out), advertisers will lose access to many of these signals, making ad targeting much harder.
CriteoPrivateAd is meant to simulate those real-world challenges. It provides anonymized ad impressions, click data, and other contextual and campaign-level information from Criteo campaigns—essentially, a test environment where companies can experiment with different privacy-preserving bidding methods. The idea is that by testing their models with this dataset, researchers and adtech companies can develop stronger, privacy-first bidding tools and techniques that still drive performance.
The dataset includes a 100-million-record anonymized sample from 30 days of real ad bidding data collected from Chrome. Each record represents a single ad impression (a banner ad shown to a user), capturing details about the ad, its placement, and user interactions, without exposing any identifying information.
Why This Matters:
OK, here are some key implications of the launch:
The obvious (again!) challenge in adtech right now is how to keep advertising effective without IDs. CriteoPrivateAd lets companies experiment with different privacy approaches, like aggregation (grouping user data together so only totals are reported) or differential privacy (adding calibrated noise so individual users can't be singled out), to see what works best.
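To make those two approaches a bit more concrete, here is a minimal Python sketch of aggregation plus differential privacy. It is our own illustration, not Criteo's code or a method shipped with the dataset. Clicks are first summed per campaign (aggregation), then Laplace noise with scale 1/epsilon is added to each total (the classic Laplace mechanism). The function names, the `(campaign_id, clicked)` record format, the epsilon values, and the assumption that each user contributes at most one impression are all hypothetical simplifications.

```python
import random
from collections import Counter


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent exponentials with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_click_counts(impressions, epsilon: float, rng: random.Random) -> dict:
    """Aggregate clicks per campaign, then add Laplace noise to each total.

    Hypothetical assumptions: each user contributes at most one impression
    (so removing one user changes any count by at most 1, i.e. sensitivity 1),
    and the set of campaign IDs is public, so revealing the keys leaks nothing.
    """
    # Aggregation step: only per-campaign totals survive.
    counts = Counter()
    for campaign_id, clicked in impressions:
        counts[campaign_id] += int(clicked)

    # Differential privacy step: smaller epsilon means larger noise.
    scale = 1.0 / epsilon
    return {cid: n + laplace_noise(scale, rng) for cid, n in counts.items()}


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic (campaign_id, clicked) pairs, purely illustrative.
    impressions = [("camp_a", rng.random() < 0.05) for _ in range(1000)]
    impressions += [("camp_b", rng.random() < 0.02) for _ in range(1000)]
    print(noisy_click_counts(impressions, epsilon=1.0, rng=rng))
```

The knob that matters is epsilon: a smaller value buys stronger privacy at the cost of noisier totals. In principle, that is the trade-off a dataset like CriteoPrivateAd lets teams measure against bidding performance offline.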
Google and Microsoft are also developing new privacy-focused ad targeting frameworks (e.g., the Protected Audience API in Chrome and the Ad Selection API in Edge). This dataset can help evaluate how well these systems work by providing offline benchmarks for testing performance in a controlled environment before changes roll out live.
Since the dataset mimics how ad bidding works in the real world, companies can use it to build and test predictive models for clicks and conversions, helping them refine their bidding strategies under new privacy constraints.
(We think this is what it means—Criteo, let us know if not.)
Experts React:
We asked Alexandre Nderagakura, former tech director at IAB Europe, for his take on the rationale behind CriteoPrivateAd. Here’s what he had to say:
“The idea is to be seen as an innovative leader and push the market, especially considering Criteo has invested time and resources into these different initiatives.”
To his point, Criteo has already collaborated with Google and Microsoft on Privacy Sandbox testing, and this dataset builds on those efforts, so the move makes sense.
Our Take:
Thought leadership is key here, especially in AI. We’ve said before that Criteo is making a bigger push into AI, and this dataset—focused on ML-powered bid optimization—fits right into that strategy.
Here’s what we said earlier this month about Criteo’s earnings call.
“AI was a big focus. Criteo mentioned AI nine times in prepared remarks—up from seven in the last earnings call—and it felt even more prominent this time. For example, the company highlighted the Criteo AI Lab, which wasn’t mentioned previously (and is really cool). AI also made it into this quarter’s press release, whereas it was absent last time.”
It’s not enough to just be a retail or commerce media player—AI expertise and infrastructure will be a key differentiator for Criteo (and everyone else) moving forward.