Reverse Engineer Google Political Ads with Big Data 2024
Bottom Line Up Front: Reverse-engineering Google political ads with their data.
Abstract:
This study, the first of its kind, investigates Google Political Ads data to understand patterns in political ad engagement and spending. By examining ad duration, frequency by month, types of ads, and impressions per dollar, the analysis provides insights into the structure of political advertising on Google. The findings reveal that political ads typically peak in October and have short durations, rarely exceeding 100 days. Notably, video ads are more common than text ads except in April, and impressions follow a distinct tiered structure, with a limited correlation between ad spending and engagement beyond specific thresholds. Advertisers within the top impression tiers are predominantly large Super PACs, suggesting restricted ad visibility for non-affiliated advertisers. These results highlight systematic, non-random engagement in political ad distribution, rejecting the hypothesis of random viewer interaction. Data limitations due to Google’s row cap constrain the analysis, but the available data still support the findings about the dynamics and constraints within political ad placements on Google.
Research Questions:
1. How long do political ads run?
2. What months do the ads run the most?
3. How do the ad types compare month by month?
4. What day of the month gets the most ads?
5. How has the frequency of ads changed over the years?
6. Does more money equal more impressions?
7. Which advertisers make up the top impression tiers?
8. How many impressions per dollar do presidential candidates get?
Null Hypothesis H0:
The distribution of viewer engagement is random.
Alternative Hypothesis H1:
The distribution of viewer engagement is not random.
Context:
The contribution of this study to the field of data analytics and political intelligence is an investigation of Google political ad spending. With this information, a political campaign can maximize the return on its advertising investment. A Harvard University article titled Political Advertising and Election Results presents a study that uses political ad data to explore engagement strength with the same variables of ‘Advertiser Name’, ‘Impressions’, and ‘Amount Spent’ (Harvard, 2016). The authors found that these variables are key factors in overall content engagement and brand awareness.
Another article, How Algorithms Shape the Distribution of Political Advertising: Case Studies of Facebook, Google, and TikTok, explored the similar variables of ‘Advertiser Name’, ‘Impressions’, and ‘Amount Spent’. The authors noticed that impressions are binned into discrete groups rather than reported as exact numbers. They also noticed that Google rarely focused on audience micro-targeting, favoring bulk distribution instead, and they generated a cost-per-impression analysis comparing Joe Biden and Donald Trump (Oxford University, 2018).
Data:
The data is an open-source dataset of Google political ad data containing the necessary variables about video and text campaigns. The dataset comes from www.google.com; Google is the organization that hosts it. The dataset contains almost 300,000 rows (before any rows were removed) and 16 columns.
The dataset is limited to roughly 300,000 instances because of what Google BigQuery allows a user to pull: each query is subject to a 100k row limit, and there are only 3 possible query options in Google BigQuery for generating unique instances. The dataset is also limited to seven years of ads, from 2018 to the present (2024).
Another limitation of the study concerns the columns available for exploration. Of the 33 initial columns offered in the dataset, many had to be dropped due to extreme data sparsity, delimiting the study to only: 'advertiser_name', 'ad_type', 'date_range_start', 'num_of_days', 'Impressions', 'spend_range_max_usd'.
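The column delimitation described above can be sketched in pandas. This is a minimal illustration, not the author's actual notebook code: the 50% missing-value cutoff, the helper name drop_sparse_columns, and the toy rows (including the 'mostly_empty' column) are all assumptions made for the example; only the six retained column names come from the text.

```python
import pandas as pd

# Columns named in the text as the ones retained for the analysis.
KEEP = ["advertiser_name", "ad_type", "date_range_start",
        "num_of_days", "Impressions", "spend_range_max_usd"]


def drop_sparse_columns(df, max_missing=0.5):
    """Drop columns whose share of missing values exceeds max_missing.

    The 0.5 cutoff is an assumed value; the article does not state one.
    """
    missing_share = df.isna().mean()  # fraction of nulls per column
    return df.loc[:, missing_share <= max_missing]


# Hypothetical toy rows standing in for the BigQuery export.
raw = pd.DataFrame({
    "advertiser_name": ["A", "B", "C", "D"],
    "ad_type": ["VIDEO", "TEXT", "VIDEO", "TEXT"],
    "date_range_start": ["2020-10-01", "2020-09-15", "2022-10-01", "2018-04-03"],
    "num_of_days": [12, 40, 7, 95],
    "Impressions": [10_000, 100_000, 1_000_000, 10_000],
    "spend_range_max_usd": [100, 1_000, 50_000, 100],
    "mostly_empty": [None, None, None, "x"],  # hypothetical sparse column
})

dense = drop_sparse_columns(raw)   # removes 'mostly_empty' (75% missing)
clean = dense[KEEP].dropna()       # keep the six study columns, drop null rows
print(clean.shape)
```

Dropping by missing-value share first and then selecting the named columns keeps the two decisions (sparsity and relevance) separate, which makes the cutoff easy to change when replicating the analysis.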
The dataset is available to the public via Google.com, which means it may be limited in accuracy and completeness. A link to Google BigQuery can be found here:
The code for this analysis can be found on GitHub here:
Below is a list of the variables used for this analysis:
Data Gathering:
Data gathering was planned and directed toward open-source repositories (Google), searching for keywords such as “Political Ad”, “Videos”, “upload times”, “+ .csv”. Next, the 1st to 3rd ranked pieces of content (reachable csv files) were selected, and each csv file was inspected for quality, such as length (at least 7k rows), data cleanliness, massive gaps in data, and enough relevant variables to create an ‘X’ and ‘Y’ axis.
Fortunately, all dependent variables are continuous. The dataset is 0% sparse before attempting to drop all missing or null columns from the dataset. The ‘date_range_start’ will be separated from the date in Microsoft Excel into a numerical category of twenty-four separate hour categories.
Data Analytics Tools and Techniques: A KDE plot was used to visualize the distributions. Quality visualizations are germane to studying this data because they allow the reader to compare distributions. Overall, this is an exploratory quantitative technique built on descriptive statistics. The tools used are a Jupyter Notebook running Python, with the statsmodels API as a reliable open-source statistical library. Due to the data size, the data is loaded into a pandas DataFrame alongside NumPy, and Seaborn is used for visualizations. A presentation layer containing univariate and bivariate graphs is also included.
Justification of Tools/Techniques:
Python will be used for this analysis because of the NumPy and pandas packages, which can manipulate large datasets (IBM, 2021). The tools and techniques are common industry practice and are widely trusted.
The technique is justified by the integer variables that need to be plotted against a timeline; doing so may reveal different modes in the frequency distribution. Another reason basic exploratory data analysis is ideal is that the data is based on human political sentiment, which is notoriously skewed. Because of the size of the dataset, pandas and NumPy are used. Python was selected over SAS because Python has better visualizations (Pandey, 2022).
Project Outcomes: To understand the data and find statistically significant differences, the proposed end-state is an output of the names of advertisers as they are located on the distribution of impressions. Also included are a visualization of the frequency distribution of impressions against money spent; a cleaned dataset with correctly labeled columns and rows, for replication; and a better understanding of the previously stated groups through exploratory graphs, giving support as to how Google political ads are disseminated to the public. Lastly, a copy of the Jupyter Notebook with the Python code will be available, along with a video presentation aided by PowerPoint. According to the same study, data exploration was instrumental in supporting the alternative hypothesis against other categorical variables (Harvard, 2016).
Now exploring the findings:
Above is a graph of the number of days each ad lasted. Notice that the distribution is right-skewed, with most of its mass near the 0 mark. The graph plots the number of days on the X axis against the density of the frequency distribution on the Y axis. This frequency distribution represents the number of days a political ad lasted after being initially uploaded. It appears that Google political ads do not last longer than 200 days and the majority of ads last less than 100 days. Now exploring the frequency per month:
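The duration claims above (most ads under 100 days, none past roughly 200, mass piled up near zero) can be checked numerically alongside the KDE plot. The sketch below is one way to do that with pandas; the helper name duration_summary and the toy durations are hypothetical and do not come from the dataset.

```python
import pandas as pd


def duration_summary(days):
    """Summarize ad durations (in days) with a few descriptive statistics."""
    days = pd.Series(days, dtype="float64").dropna()
    return {
        "median_days": float(days.median()),
        "share_under_100": float((days < 100).mean()),
        "share_over_200": float((days > 200).mean()),
        # Positive skewness means a long tail to the right of the bulk.
        "skewness": float(days.skew()),
    }


# Hypothetical toy durations, standing in for the 'num_of_days' column.
toy_days = [1, 2, 3, 5, 8, 13, 21, 34, 55, 150]
summary = duration_summary(toy_days)
print(summary)
```

On the real data, the same function would be called with the 'num_of_days' column, and a positive skewness value would confirm that the bulk of ads sits near zero with a thin tail of long-running ads.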
From the above graph, it would appear that October is the highest month for ads with September being a close second.
Below is a histogram of monthly frequency for ad types, comparing video and text ads. It appears that there are generally more video ads than text ads. April is the only month that has more text ads than video ads. Video ads increase significantly in September and October.
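The month-by-month comparison of ad types described above amounts to a cross-tabulation of start month against 'ad_type'. A minimal pandas sketch follows; the toy rows and the "TEXT"/"VIDEO" labels are assumptions for illustration, and the month is derived from 'date_range_start' with pd.to_datetime rather than in Excel.

```python
import pandas as pd

# Hypothetical toy rows; only the column names come from the article.
ads = pd.DataFrame({
    "date_range_start": ["2020-04-02", "2020-04-10", "2020-04-20",
                         "2020-09-01", "2020-10-01", "2020-10-05",
                         "2020-10-09", "2020-10-12"],
    "ad_type": ["TEXT", "TEXT", "VIDEO", "VIDEO",
                "VIDEO", "VIDEO", "TEXT", "VIDEO"],
})

# Derive the calendar month (1-12) from the start date.
ads["month"] = pd.to_datetime(ads["date_range_start"]).dt.month

# Rows are months, columns are ad types, cells are ad counts.
by_month = pd.crosstab(ads["month"], ads["ad_type"])

# Months where text ads outnumber video ads.
text_heavy_months = by_month[by_month["TEXT"] > by_month["VIDEO"]].index.tolist()

# Month with the most ads overall.
busiest_month = int(by_month.sum(axis=1).idxmax())

print(by_month)
print(text_heavy_months, busiest_month)
```

The same table can feed a grouped bar chart directly, so the plotted histogram and the "which month is text-heavy" check come from one source.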
Above is a histogram indicating what months received the most political ads. Notice how September and October are the two highest months (October being the highest month), but it sharply drops off after November. Now exploring the number of impressions per month.
The above graph shows the month on the X axis and the total number of impressions on the Y axis, with the amount spent overlaid as well. Notice how the impressions quickly increase after July and drop off after November. Also take note of how spending slowly tapers upward between those months. The month of October yielded 7 billion views. Now exploring the political ad frequency per day:
The above histogram showcases the number of political ads uploaded per day. Notice how the 1st of each month has the highest spike in political ads, after which the counts appear rather random throughout the month. Now exploring the Google Political Ad spending per year:
The above graph showcases the frequency of political ads per year, starting from 2018 and ending in 2024. Notice that 2020 was the highest spike and, equally interesting, that 2024 has a record low number. Overall, the distribution appears to be random or left-skewed. However, when only factoring in the years from 2020 to 2024, there appears to be a decreasing trend. Now exploring the political ad types per year.
The above graph indicates that the years 2018, 2019, and 2021 have more text ads than video ads. The years 2020, 2022, 2023, and 2024 have more than double the number of video ads compared with text ads.
The above scatter plot shows the number of impressions plotted along the ‘X’ axis, going from zero to 1 billion impressions, and represents a discrete probability distribution. Along the ‘X’ axis there are distinct tier groups for impressions. On the ‘Y’ axis is the amount of U.S. dollars spent to get these impressions. Notice that before the 300k (0.3 1e7) impression mark, there appears to be a random scattering of advertisers paying high price points but not receiving more impressions. What's interesting is that the regression line slopes only slightly upward, indicating that spending more money doesn't necessarily equal more impressions. Google admits it has a “vetted list of advertisers” who are allowed to promote political advertisements on Google ads (Google, 2024). However, there appears to be a distinct cutoff between those in the big-impression groups and those who pay high amounts of money and will never receive those impressions. It appears like a Good-Old-Boys' Club: those who are not in the club will never get the number of impressions they want, no matter how much they pay. Moreover, the data seems to suggest that you do not get what you pay for unless you hold a certain membership.
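The two observations above, that impressions sit on a handful of discrete tiers and that the regression line is nearly flat, can both be quantified. The sketch below fits an ordinary least-squares line with NumPy and counts distinct impression levels. It is an illustration under assumptions: the helper name fit_line and the toy points are hypothetical, and the toy data is deliberately built so that spend is unrelated to tier.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares line y = slope * x + intercept, plus Pearson's r."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r


# Hypothetical toy points: impressions fall on three discrete tiers,
# while spend varies widely (and identically) within each tier.
impressions = [1e4, 1e4, 1e4, 1e5, 1e5, 1e5, 1e6, 1e6, 1e6]
spend_usd = [100, 50_000, 1_000, 100, 50_000, 1_000, 1_000, 50_000, 100]

slope, intercept, r = fit_line(impressions, spend_usd)
n_tiers = len(np.unique(impressions))  # number of distinct impression levels

print(f"slope={slope:.3g}, r={r:.3g}, tiers={n_tiers}")
```

A small number of unique impression values relative to the number of ads is what the "tiered" reading rests on, and a correlation coefficient near zero is the numeric counterpart of a nearly flat regression line.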
Now exploring who is in these membership tiers by generating outputs of the top 30 advertiser entities and how frequently they had political ads disseminated by Google, starting at the 500-600 million impression tier. Below is the output:
Now displaying the top 30 advertisers for the 600-700 million impression tier:
Now displaying the top 30 advertisers for the 700-800 million impression tier:
Now displaying the top 30 advertisers for the 800 - 999 million impression tier:
Advertiser networks associated with Donald Trump are among the most frequent in the given impression tiers.
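The per-tier outputs above could be produced with a filter followed by a frequency count. This is a hedged sketch, not the notebook's actual code: the helper name top_advertisers_in_tier, the half-open tier bounds (low inclusive, high exclusive), and the toy advertisers are assumptions, and 'Impressions' is assumed to be numeric.

```python
import pandas as pd


def top_advertisers_in_tier(df, low, high, n=30):
    """Count ads per advertiser with low <= Impressions < high.

    Returns the n most frequent advertisers in that tier.
    """
    in_tier = df[(df["Impressions"] >= low) & (df["Impressions"] < high)]
    return in_tier["advertiser_name"].value_counts().head(n)


# Hypothetical toy rows; advertiser names are placeholders.
ads = pd.DataFrame({
    "advertiser_name": ["PAC A", "PAC A", "PAC B", "Local C", "PAC A", "PAC B"],
    "Impressions": [550e6, 590e6, 510e6, 20e3, 650e6, 700e6],
})

tier_500_600 = top_advertisers_in_tier(ads, 500e6, 600e6)
print(tier_500_600)
```

Calling the same function with (600e6, 700e6), (700e6, 800e6), and (800e6, 999e6) would give the remaining tiers; using half-open intervals avoids counting an ad on a boundary in two tiers.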
Now exploring the impression rate for the 2024 presidential candidates: Trump, Biden, and Harris.
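An impressions-per-dollar rate is total impressions divided by total spend for each candidate's advertisers. The sketch below shows one way to compute it. It assumes that a candidate's ads can be identified by a substring of 'advertiser_name', which the article does not confirm; the helper name, the placeholder candidates, and the toy numbers are all hypothetical.

```python
import pandas as pd


def impressions_per_dollar(df, name_patterns):
    """Total impressions divided by total spend, per labeled name pattern."""
    rates = {}
    for label, pattern in name_patterns.items():
        # Case-insensitive literal substring match on the advertiser name.
        match = df["advertiser_name"].str.contains(pattern, case=False, regex=False)
        subset = df[match]
        spend = subset["spend_range_max_usd"].sum()
        rates[label] = subset["Impressions"].sum() / spend if spend else float("nan")
    return pd.Series(rates)


# Hypothetical toy rows with placeholder candidates.
ads = pd.DataFrame({
    "advertiser_name": ["Candidate X for President",
                        "Candidate X Victory Fund",
                        "Candidate Y Committee"],
    "Impressions": [3000, 1000, 5000],
    "spend_range_max_usd": [100, 100, 100],
})

rates = impressions_per_dollar(ads, {"X": "Candidate X", "Y": "Candidate Y"})
print(rates)
```

Because 'spend_range_max_usd' is the upper end of a reported spend range, dividing by it yields a conservative (lower-bound) impressions-per-dollar figure.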
During this research we answered the following research questions:
1. How long do political ads run?
Most political ads run for fewer than 100 days, with the tail ending by about 200 days.
2. What month do the ads run the most?
September and October, with October being the highest.
3. How do the ad types compare month by month?
April is the only month where text ads outnumber video ads.
4. What day of the month gets the most ads?
The first of the month.
5. How has the frequency of ads changed over the years?
Between 2018 and 2024 the distribution peaked in 2020 and has slowly declined since, with 2024 lower than 2018.
6. Does more money equal more impressions?
Absolutely not. The distribution is discrete. The regression line is almost flat, showcasing a very minimal rise in impressions per dollar spent.
7. Which advertisers make up the top impression tiers?
See the above outputs per impression tier.
8. How many impressions per dollar do presidential candidates get?
As of 2024 Trump received 29 impressions per dollar, Harris received 20 impressions per dollar, and Joe Biden received 54 impressions per dollar.
In final analysis, we reject the null hypothesis in favor of the alternative hypothesis: the distribution of impressions for Google Political ads is not random, as made evident by the discrete probability distribution consisting of deliberate tiers. Before the 300,000-impression level, the distribution appears to be a heteroscedastic shotgun blast of residuals, before manifesting into tight homoscedastic tiers past the 300k mark. The barely present regression slope indicates little hope that more money equals more audience reached. After exploring the advertisers in each tier, they appear to be big-money Super PACs; if you are a vetted advertiser and not in the big club, then you will not get more impressions, and you play in the casino for 300k impressions at best. After the height of the 2020 rise in ad spend, the frequency continues to decline year over year, with 2024 being a record low, which may indicate a lack of faith in Google political ads. The distribution of political ad frequency appeared to correlate with the election season, with October being the highest month and most ads being placed on the 1st of the month. As of 2024, Joe Biden has received more impressions per dollar than Donald Trump. Suspiciously, Google BigQuery claims to be transparent, but it only allows 300K rows given the querying options made available to the end user. Lastly, this analysis has effectively shown how we can reverse-engineer Google Political Ads with data.
Works cited:
Google. (n.d.). Google Political Ads Transparency Portal. Google Cloud console. https://console.cloud.google.com/marketplace/product/transparency-report/google-political-ads?project=youtube-api-pull-381414
Google. (n.d.). Google Political Advertiser Verification. Google. https://support.google.com/adspolicy/topic/9646742?hl=en&ref_topic=9646537%2C1308156%2C&sjid=16985924250476509333-NC
How algorithms shape the distribution of political advertising. (n.d.-a). https://arxiv.org/pdf/2206.04720
IBM products. IBM. (n.d.). https://www.ibm.com/cloud/blog/python-vs-r
Pandey, Y. (2022, May 25). SAS vs python. LinkedIn. https://www.linkedin.com/pulse/sas-vs-python-yuvaraj-pandey/
Political Advertising and election results. (n.d.-b). https://economics.harvard.edu/files/economics/files/ms25651.pdf