Exploring State Amazon Spending with Machine Learning.

Bottom Line Up Front: The Department of Children, Youth, and Families is grossly abusing the Washington State Amazon Emergency Procurement Policy.  

Orientation:

  

Significant Discoveries:

 

·      The Department of Children, Youth, and Families is buying in bulk and increasing “Emergency Procurement” spending 92% from FY21-23; most of the spending occurred in FY23 (post-COVID lockdown).

·      In second place is the Department of Social and Health Services (DSHS). DSHS bulk-buys “Apparel” and has the most variance in spending: over $25,000 per purchase and almost double the standard deviation of the average department.

·      When aggregated, the Department of Corrections blatantly buys books in bulk ($4,509,297) as “Emergency Procurement”.

·      When aggregated, the Department of Fish and Wildlife primarily purchases “personal laptops” as an “Emergency”.

·      Notably, the share of ‘repeat purchases’ fell from 88% in 2021-22 to 8% in 2023.

·      However, four times more was spent in FY23 than in FY22.

·      Washington State places over 5,000 orders a day.

·      The items purchased are random and not predictable for cross-purchasing or bundling.

·      The keywords: “Heavy”, “Duty”, “Black”, “Pack”, and “Amazon basic” are the most common words to entice Washington State procurement specialists on Amazon.  

·      The target variable, 'Amount', was passed into numerous predictive machine learning models, but none yielded an accurate score; more data needs to be compiled.


 Additionally, this report answers the following questions:

 

1.     Which Washington State agencies have purchased from Amazon?

2.    What are the top 20 highest spending agencies?

3.    What are the spending variances per top 20 agencies?

4.    What is the most purchased item category for each of the top 20 agencies?

5.    What basket of items did each of the top 10 agencies buy on its highest-spending day?

6.    What are the top 20 most purchased items?

7.     What are the most common words in the item 'description'?

8.    How many Amazon orders are made daily?

9.    What is the highest spending month?

10. Is the rate of Amazon spending increasing or decreasing?

11.  Can a descriptive Market Basket Analysis (MBA) model be developed from the data?

 

1.     Which Washington State agencies have purchased from Amazon?

2.    Which 20 agencies spent the most on Amazon?

 3.    What are the spending variances among these top 20 agencies?

4.    What item categories were most purchased by these top 20 agencies?

5.    What are the Top 20 product item categories overall?

Below is a visualization of the above list.

6.    For the highest spending day for each of the top 10 agencies, what items did they purchase?

7.    What are the most common words in the item 'description'?

8. How many Amazon orders are made daily by the State of Washington?

9.    Which month had the highest spending?

10. Is the rate of Amazon spending increasing or decreasing?

The above graph shows that the frequency of purchases is highest during March, April, May, and June; it then decreases in July and remains more consistent from August through November.

11.     Can a descriptive Market Basket Analysis (MBA) model be developed from the data?
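Several of the questions above (top-spending agencies, spending variance, orders per day, highest-spending month) reduce to group-by aggregations. The following is a minimal sketch of that approach, not the author's notebook code. The column names `Agency`, `Order Date`, and `Amount` and the toy rows are assumptions for illustration; the real DES column names may differ.

```python
import pandas as pd

# Toy stand-in for the cleaned DES data. Column names and values are
# assumptions for illustration, not taken from the published dataset.
orders = pd.DataFrame({
    "Agency": ["DCYF", "DCYF", "DSHS", "DSHS", "DOC"],
    "Order Date": pd.to_datetime(
        ["2023-03-01", "2023-03-01", "2023-04-15", "2023-04-16", "2023-03-20"]
    ),
    "Amount": [120.0, 80.0, 300.0, 100.0, 50.0],
})


def agency_summary(df: pd.DataFrame, top_n: int = 20) -> pd.DataFrame:
    """Total, mean, standard deviation, and order count for the top-N agencies by spend."""
    summary = df.groupby("Agency")["Amount"].agg(["sum", "mean", "std", "count"])
    return summary.sort_values("sum", ascending=False).head(top_n)


def monthly_spend(df: pd.DataFrame) -> pd.Series:
    """Total spend per calendar month."""
    return df.groupby(df["Order Date"].dt.to_period("M"))["Amount"].sum()


def orders_per_day(df: pd.DataFrame) -> pd.Series:
    """Number of order lines placed on each calendar day."""
    return df.groupby(df["Order Date"].dt.date).size()


print(agency_summary(orders))
print(monthly_spend(orders))
print(orders_per_day(orders))
```

An agency with a single order gets a standard deviation of NaN, which is worth keeping in mind when comparing variances across small and large buyers.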

 

Questions to be answered with the following advanced statistical data mining:

·      MBA Null Hypothesis (H0): There is NO significant association between the Washington State Agency purchases and product categories.

·      Alternative Hypothesis (H1): There IS A significant association between the Washington State Agency purchases and product categories.

·      PCA Null Hypothesis (H2): There is no significant reduction in dimensionality achieved by PCA. The variance explained by the first ‘k’ principal components is not significantly different from the total variance in the original dataset.

·      Alternative Hypothesis (H3): There is a significant reduction in dimensionality achieved by PCA. The variance explained by the first ‘k’ principal components is significantly different from the total variance in the original dataset.

·      KMeans Clustering Null Hypothesis (H4): There is no significant difference between the groups identified by K-Means clustering. The centroids of the clusters do not significantly differ from each other.

·      Alternative Hypothesis (H5): There is a significant difference between the groups identified by k-means clustering. At least one pair of cluster centroids is significantly different from each other.

 

 

Situation:

 

According to the dataset made publicly available, Washington State has allowed “Emergency Procurement” since 2017 via an Amazon direct buying policy regulated by the Department of Enterprise Services (DES 2023).

 

Agencies should ensure that products are not available through statewide contracts prior to utilizing Amazon. Agency spend with Amazon is not counted towards small business, veteran, or supplier diversity goals (DES).

 

Incumbents have a fiduciary duty to the taxpayer: if bulk orders are to be made, they should go through a cooperative purchase, which reduces costs the most. Cooperative purchasing and government vendor vetting, while not perfect, allow the more ethical practice of government spending with certified and sustainable small businesses of Washington State. The more Amazon spending increases, the less the small businesses of Washington are economically represented, skewing the mission statement of DES.

 

 Therefore, in attempts to inform the public and reduce spending, DES has published the Amazon spending for state agencies collected through the Washington State Amazon Business account. The data set only includes closed orders. Any orders that are still in process or have been cancelled are not included. This data is for Fiscal Year 22 (July 1, 2021 to June 30, 2022). Data is updated monthly. DES-125-03

 

Notably, a competitive solicitation process must be used for all purchases of goods and services unless there is an exception listed under RCW 39.26.125. Direct buy purchases are one of the exceptions, which do not require a competitive process. Certain public purchases do not justify the administrative time and expenses necessary to conduct a competitive process.

 

RCW 39.26.125

Competitive solicitation—Exceptions.

All contracts must be entered into pursuant to competitive solicitation, except for:

(1) Emergency contracts;

(2) Sole source contracts that comply with the provisions of RCW 39.26.140;

(3) Direct buy purchases, as designated by the director. The director shall establish policies to define criteria for direct buy purchases. These criteria may be adjusted to accommodate special market conditions and to promote market diversity for the benefit of the citizens of the state of Washington;

(4) Purchases involving special facilities, services, or market conditions, in which instances of direct negotiation is in the best interest of the state;

(5) Purchases from master contracts established by the department or an agency authorized by the department;

(6) Client services contracts;

(7) Other specific contracts or classes or groups of contracts exempted from the competitive solicitation process when the director determines that a competitive solicitation process is not appropriate or cost-effective;

(8) Off-contract purchases of Washington grown food when such food is not available from Washington sources through an existing contract. However, Washington grown food purchased under this subsection must be of an equivalent or better quality than similar food available through the contract and must be able to be paid from the agency's existing budget. This requirement also applies to purchases and contracts for purchases executed by state agencies, including institutions of higher education as defined in RCW 28B.10.016, under delegated authority granted in accordance with this chapter or under RCW 28B.10.029;

(9) Contracts awarded to companies that furnish a service where the tariff is established by the utilities and transportation commission or other public entity;

(10) Intergovernmental agreements awarded to any governmental entity, whether federal, state, or local and any department, division, or subdivision thereof;

(11) Contracts for services that are necessary to the conduct of collaborative research if the use of a specific contractor is mandated by the funding source as a condition of granting funds;

(12) Contracts for architectural and engineering services as defined in RCW 39.80.020, which shall be entered into under chapter 39.80 RCW;

(13) Contracts for the employment of expert witnesses for the purposes of litigation;

(14) Contracts for bank supervision authorized under RCW 30A.38.040;

(15) Contracts for the purchase of opioid overdose reversal medication authorized under RCW 70.14.170; and

(16) Contracts for investigators awarded by the office of independent investigations as authorized under RCW 43.102.050.

US Bank changes rebates for Amazon purchases

Effective July 1, 2023, all US Bank rebates from purchases made with Amazon will reduce by 19%. DES announced the impending change to purchasers on June 15. US Bank made this change in accordance with statewide contract #00719 in response to negotiations between Visa and Amazon. This change impacts all US Bank purchasers throughout the country (DES).

 

Mission:

 

Data mine the DES datasets and disseminate the findings. This study contributes to the fields of industry intelligence, taxpayer awareness, and small business recovery. Explore Washington State direct-buy Amazon purchasing behavior with machine learning; next, attempt to predict purchasing-pattern co-occurrences from the data.

 

An article titled Market Basket Analysis Evaluation and Scoring for Contract Award showcases a study using Market Basket Analysis (MBA) testing to explore identical variables for contract purchasing probability (CDE, 2020). The authors found that MBA helps to understand purchasing patterns in overall contract purchasing behavior. Market Basket Analysis is a data mining technique that analyzes patterns of co-occurrence and determines the strength of the link between products purchased together (Scipy, 2020). Understanding these variables can help describe the relationship between the independent variables (IVs) and dependent variables (DVs).
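The "strength of the link" between two items is usually summarized with three numbers: support, confidence, and lift. Here is a minimal pure-Python sketch of those definitions on an invented set of baskets. The basket contents are hypothetical and the function name `pair_metrics` is an illustration, not part of any library.

```python
# Hypothetical baskets: each set holds the categories bought in one order.
baskets = [
    {"Office Products", "Personal Computers"},
    {"Office Products", "Books"},
    {"Office Products", "Personal Computers", "Apparel"},
    {"Books"},
]


def pair_metrics(transactions, a, b):
    """Support, confidence, and lift for the rule 'a -> b'."""
    n = len(transactions)
    count_a = sum(1 for t in transactions if a in t)
    count_b = sum(1 for t in transactions if b in t)
    count_ab = sum(1 for t in transactions if a in t and b in t)

    # Support: share of all baskets that contain both items.
    support = count_ab / n
    # Confidence: share of baskets with 'a' that also contain 'b'.
    confidence = count_ab / count_a if count_a else 0.0
    # Lift: confidence relative to how common 'b' is overall.
    # A lift near 1 means the two items look unrelated.
    lift = confidence / (count_b / n) if count_b else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


metrics = pair_metrics(baskets, "Office Products", "Personal Computers")
print(metrics)
```

A minimum-support threshold such as 0.01 simply discards any pair whose `support` falls below that value before confidence and lift are considered.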

 

 

Data:

 

According to Amazon, the vanguard of MBA in action:

 

Market Basket Analysis (MBA) makes several assumptions about the variables involved. Here are some common assumptions:

§  Binary Data:

§  The data is binary, meaning each item is either present or absent in a transaction.

§  Independence:

§  The items in the basket are assumed to be independent of each other.

§  This implies that the purchase of one item does not influence the purchase of another.

§  Fixed Basket Size:

§  The number of items in a basket is assumed to be fixed.

§  Static Transactions:

§  MBA often assumes that transactions are static, meaning that the set of items in a transaction does not change over time.

§  No Quantity Information:

§  MBA typically doesn't consider the quantity of items bought, only whether an item is present or not.

§  It assumes that the occurrence of an item in a transaction is the relevant information.

§  Customer Homogeneity:

§  MBA assumes a degree of homogeneity among customers.

§  This means that the behavior of one customer is representative of the behavior of other customers.

§  No Time Sensitivity:

§  MBA often assumes that the order in which items are purchased doesn't matter.

§  In some cases, the temporal order might be important, and other methods like sequence analysis might be more suitable.
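The "Binary Data" and "No Quantity Information" assumptions above can be made concrete by turning order lines into a present/absent matrix. This is a rough sketch with pandas; the column names `Order ID`, `Category`, and `Quantity` and the rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical order lines; one row per item line on an order.
lines = pd.DataFrame({
    "Order ID": ["A1", "A1", "A1", "B2", "B2", "C3"],
    "Category": ["Books", "Apparel", "Books", "Books", "Office Products", "Apparel"],
    "Quantity": [3, 1, 2, 10, 5, 1],
})


def to_basket_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """One row per order, one column per category, True if the category appears."""
    counts = pd.crosstab(df["Order ID"], df["Category"])
    # Quantity and repeat lines are deliberately discarded: only presence matters.
    return counts > 0


basket = to_basket_matrix(lines)
print(basket)

# Column means of a boolean matrix give each category's support.
print(basket.mean())
```

Note that the quantity of 10 books on order B2 counts exactly the same as a single shirt on order C3, which is why MBA on its own says little about bulk buying.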

Two open-source datasets uploaded by the Department of Enterprise Services (DES) to www.data.gov, which include Washington State agencies’ Amazon purchasing behavior, were used. Data.gov is the open-source repository that hosts these datasets. DES is the government procurement organization concerned about the effect of direct-buy Amazon spending on small business contractors.

 

The 2021 - 2022 dataset contained 486,273 orders before any duplicates were removed. When testing for duplicates, the 2021-22 set contained 88% duplicate orders, which indicates 88% repeat purchases of line items. Additionally, there were 143,926 orders for the 2022-2023 data set before repeat orders were removed, yielding only 8% duplicate values. Both sets are cleaned and combined for larger exploration.
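A duplicate rate like the 88% and 8% figures above can be measured with pandas before the two fiscal years are combined. The sketch below assumes a "duplicate" means a row identical to an earlier row in every column; the article does not state which columns were compared, and the toy frames and column names are invented.

```python
import pandas as pd


def duplicate_rate(df: pd.DataFrame, subset=None) -> float:
    """Share of rows that repeat an earlier row (the first occurrence is not counted)."""
    if df.empty:
        return 0.0
    return float(df.duplicated(subset=subset).mean())


# Invented stand-ins for the two fiscal-year files.
fy22 = pd.DataFrame({
    "Agency": ["DOC", "DOC", "DOC", "DSHS", "DSHS"],
    "Item": ["Book", "Book", "Book", "Shirt", "Shirt"],
    "Amount": [10.0, 10.0, 10.0, 20.0, 20.0],
})
fy23 = pd.DataFrame({
    "Agency": ["DCYF", "DFW"],
    "Item": ["Car seat", "Laptop"],
    "Amount": [90.0, 900.0],
})

print(f"FY22 duplicate rate: {duplicate_rate(fy22):.0%}")
print(f"FY23 duplicate rate: {duplicate_rate(fy23):.0%}")

# Remove repeats within each year, then stack the years for joint exploration.
combined = pd.concat(
    [fy22.drop_duplicates(), fy23.drop_duplicates()], ignore_index=True
)
print(combined)
```

Passing `subset=[...]` narrows the comparison to chosen columns, which matters if an order ID or timestamp makes otherwise identical line items look unique.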

 

Of the 35 variables in the dataset, 27 variables were excluded from this exploratory analysis to minimize redundancies and reduce noise. The remaining eight variables were factored into the final MBA model. Of note, the data available for this exploratory analysis was limited to three years of data collection.

While the dataset contained instances of purchasing behavior from 2017 through January 1, 2021, the data for 2017-2020 are missing all associations with ‘Agency’ and other germane variables. Consequently, these rows were excluded from the analysis.
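Keeping a subset of variables and excluding rows with no ‘Agency’ could look like the sketch below. The column names, the `KEEP` list, and the sample rows are placeholders; the eight variables actually retained are not listed in the text.

```python
import numpy as np
import pandas as pd

# Invented sample: the 2019 row has no agency, mimicking the 2017-2020 records.
raw = pd.DataFrame({
    "Order Date": ["2019-05-01", "2021-08-10", "2022-01-03"],
    "Agency": [np.nan, "DOC", "DSHS"],
    "Category": ["Books", "Books", "Apparel"],
    "Amount": [15.0, 22.5, 40.0],
    "Internal Note": ["x", "y", "z"],
})

# Placeholder for the retained variables (an assumption, not the author's list).
KEEP = ["Order Date", "Agency", "Category", "Amount"]


def select_usable_rows(df, keep_columns, required=("Agency",)):
    """Keep the chosen columns, then drop rows missing any required field."""
    subset = df.loc[:, list(keep_columns)]
    return subset.dropna(subset=list(required)).reset_index(drop=True)


usable = select_usable_rows(raw, KEEP)
print(usable)
```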

Get the datasets here:

https://catalog.data.gov/dataset/state-agency-amazon-spend-fiscal-year-23

https://catalog.data.gov/dataset/state-agency-amazon-spend

 

The data is made available to the public by the government, so it may be limited in accuracy and completeness.

 

Below is the list of the eight variables that factor into this analysis:

Execution:

Data Gathering:

 

Plan and direct data gathering toward open-source repositories (Google), searching for keywords such as “Washington”, “Amazon”, “spending”, and “+ .csv”. Next, select the first- to third-ranked pieces of content (reachable CSV files) and inspect each CSV file for quality: “length” (at least 7k rows), data cleanliness, massive gaps in the data, and enough relevant variables to create an ‘X’ and ‘Y’ axis. Because the data is made available to the public by the government, it may be limited in accuracy and completeness. In MBA, the dependent variable must be at a binary level of measurement (Statology, 2019). The dependent variables in this dataset are continuous, so they must be converted to binary form before modeling. The dataset is 12% sparse, and all missing or null columns will be dropped when cleaning the dataset. The ‘Date’ will be separated into Day, Month, and Year in Microsoft Excel prior to Python exploration.
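The date split described above was done in Excel, but the same step, along with the sparsity measurement and the dropping of null columns, can be sketched in pandas. This sketch assumes "null columns" means columns that are entirely empty and that the date column is named `Order Date`. Both are assumptions, and the sample values are invented.

```python
import numpy as np
import pandas as pd


def sparsity(df: pd.DataFrame) -> float:
    """Fraction of all cells that are missing."""
    return float(df.isna().to_numpy().mean())


def split_date(df: pd.DataFrame, column: str = "Order Date") -> pd.DataFrame:
    """Add Year, Month, and Day columns derived from a date column."""
    out = df.copy()
    parsed = pd.to_datetime(out[column])
    out["Year"] = parsed.dt.year
    out["Month"] = parsed.dt.month
    out["Day"] = parsed.dt.day
    return out


# Invented sample with one completely empty column.
toy = pd.DataFrame({
    "Order Date": ["2021-07-01", "2022-06-30"],
    "Amount": [10.0, 20.0],
    "Empty Column": [np.nan, np.nan],
})

print(f"Sparsity before cleaning: {sparsity(toy):.0%}")

# Drop columns in which every value is missing, then split the date.
cleaned = split_date(toy.dropna(axis="columns", how="all"))
print(cleaned)
```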

 Data Analytics Tools and Techniques: A KDE plot was used to visualize the distribution for normality. The tools used were Jupyter Notebook running Python code, with the statsmodels API as a reliable open-source statistical library. Due to the data size, a pandas DataFrame was used. NumPy and Seaborn were used for visualizations. An MBA test with a statsmodels function was performed, and a presentation layer of univariate and bivariate graphs was created.

Justification of Tools/Techniques:

Python will be used for this analysis because its NumPy and pandas packages can manipulate large datasets (IBM, 2021). The tools and techniques are common industry practice and are widely trusted. An MBA test was selected because it can compare distributions of non-parametric data; the MBA test does not assume normality in the data (Statology, 2019). The technique is justified because the categorical variables can be converted into binary, which can then be plotted against one another to identify instances of cross-selling. This may also reveal different modes of frequency distribution. MBA is also ideal because the data is based on human purchasing behavior, which is notoriously skewed. Overall, this is an exploratory quantitative data analytic technique and a descriptive statistic. Because of the size of the dataset, pandas and NumPy will be used. Python is being selected over SAS because Python provides better visualizations (Pandey, 2022).

 

Project Outcomes: To find statistically significant differences, the proposed end state is a Market Basket Analysis descriptive statistical model that can compare the distribution shapes of the targeted groups (Statology, 2019). A visualization of the frequency distribution of ‘Amount’ against other variables will be presented for statistical baseline comparison. A cleaned dataset of all the correctly labeled columns and rows was compiled for replication. Exploratory graphs yielded a better understanding of the previously stated groups, giving support to possible cross-purchasing. Lastly, a copy of the Jupyter Notebook with the Python code will be available, along with a video presentation aided by PowerPoint. According to the previously mentioned 2020 study by CDE, MBA was instrumental in supporting the alternative hypothesis against other categorical variables (GATR, 2020).

 Get access to the code on GitHub here: WashingtonAmazonPurchasingBehaviorAnalysis21_23

The following Python code documents the discovery for scientific replication:

The amount variable appears to be left-skewed and non-parametric.
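The direction of skew can be checked numerically as well as visually: negative skewness means a longer left tail, positive skewness a longer right tail. The sketch below uses invented amounts and an arbitrary cutoff of 0.5, so it illustrates the check only and does not reproduce the finding above.

```python
import pandas as pd


def skew_direction(values: pd.Series, threshold: float = 0.5) -> str:
    """Label a distribution by the sign of its sample skewness.

    The 0.5 threshold is a rule-of-thumb assumption, not a formal test.
    """
    skew = values.skew()
    if skew > threshold:
        return "right-skewed"
    if skew < -threshold:
        return "left-skewed"
    return "approximately symmetric"


# Invented purchase amounts with one very large order.
amounts = pd.Series([5.0, 7.0, 8.0, 9.0, 10.0, 12.0, 15.0, 400.0])

print(amounts.skew())
print(skew_direction(amounts))
```

Running the same function on the real ‘Amount’ column would give a quick cross-check of what the KDE plot suggests.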

The above graphs indicate that the Principal Components are equal in variance. There is no distinct elbow or cutoff in the residuals.
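What "equal in variance" and "no distinct elbow" mean can be illustrated by computing the explained-variance ratio directly. This NumPy sketch uses synthetic data rather than the DES dataset, and the function name is illustrative. It contrasts a flat spectrum with one that has a clear elbow.

```python
import numpy as np


def explained_variance_ratio(X) -> np.ndarray:
    """Share of total variance carried by each principal component, largest first."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Squared singular values of the centered data are proportional to the
    # eigenvalues of the covariance matrix.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values**2 / (X.shape[0] - 1)
    return variances / variances.sum()


rng = np.random.default_rng(0)

# Four independent, equally noisy features: no component dominates.
X_flat = rng.normal(size=(500, 4))

# Same data with the first feature scaled up: one component dominates.
X_elbow = X_flat * np.array([10.0, 1.0, 1.0, 1.0])

print("flat: ", np.round(explained_variance_ratio(X_flat), 3))
print("elbow:", np.round(explained_variance_ratio(X_elbow), 3))
```

When every ratio sits near 1 divided by the number of features, as in the flat case, dropping components discards about as much information as it saves, which is the situation described above.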

In final analysis the following research questions are answered:

Additionally, the difference in agency spending did not allow the data to meet the MBA model assumption of “customer homogeneity”. According to the correlation matrix, all correlations between categories are low-value negative numbers, indicating an unlikely chance of creating an accurate predictive model. The variances of the eigenvectors were equal, and the KMeans cluster visualizations appear to be a dense random blob with scatterings of haphazard residuals. However, more time and data may yield machine learning models with accurate predictions.

 

§  We fail to reject the MBA Null Hypothesis (H0): There is NO significant association between Washington State agency purchases and product categories at a minimum association rate of 0.01.

§  We fail to reject the PCA Null Hypothesis (H2): There is no significant reduction in dimensionality achieved by PCA. The variance explained by the first ‘k’ principal components is not significantly different from the total variance in the original dataset.

§  We fail to reject the KMeans Clustering Null Hypothesis (H4): There is no significant difference between the groups identified by KMeans clustering. The centroids of the clusters do not significantly differ from each other.

The following questions were answered:

 

1.     Which Washington State agencies have purchased from Amazon?

2.    What are the top 20 highest spending agencies?

3.    What are the spending variances per top 20 agencies?

4.    What is the most purchased item category for each of the top 20 agencies?

5.    What basket of items did each of the top 10 agencies buy on its highest-spending day?

6.    What are the top 20 most purchased item categories?

7.     What are the most common words in the item 'description'?

8.    How many Amazon orders are made daily?

9.    What is the highest spending month?

10. Is the rate of Amazon spending increasing or decreasing?

11.  Can a descriptive Market Basket Analysis (MBA) model be developed from the data?

Small expenses add up over time; it is like reprimanding a child who got ahold of a US Bank credit card. Notably, the Washington Department of Children, Youth, and Families and DSHS don’t seem to be interpreting RCW 39.26.125 the way most departments do. Glaringly, children’s products like diapers, formula, car seats, infant toys, and clothing now scale to a cooperative-purchase level. This is costly, considering basic consumer taxes, shipping, and interest accrued from US Bank. “Apparel” is not an “Emergency” when our homeless have bags of clothing.

 

In fact, when an agency can scale emergency spending, it can “scale” and “reclassify” “Emergencies”; the big data accumulated over time is overwhelming.

This is only the first of 50 samples made publicly available. Imagine what the case is on a national level.

Moreover, while Amazon.com may be perceived as a leader in bookselling, there are many more cost-saving options for acquiring books for the Department of Corrections. Little purchases add up; considering the $4 million+ expense over the timeline, the Department of Corrections would be more responsible using a cooperative purchase like any other public library. There is no shortage of publicly donated books, and books are not an “Emergency”.

 The Department of Corrections is not alone in its fallacious thinking; the Secretary of State and the Department of Health appear to be statistically similar. Measurably, many agencies treat “Office Products” and “Personal Computers” as acceptable for Amazon direct purchases.

While the intent of the individual agencies is not to harm small business, these reckless behaviors need to change, because the spending is increasing. Hopefully this information will help inform Washington taxpayers of the spending habits of their incumbents.  These expenses hurt sustainable certified small businesses and take clothing, food, and laptops from the children of communities supported by small business. However, this is good news for Amazon shareholders.

 

 

Command and Signal Plan:

 

The findings will be disseminated to the directors of the trouble agencies via certified mail. The blog articles and follow-on video explanations will be disseminated via corradiated-clicking-campaigns over 50 different platforms. Further dissemination will be conducted by supporting agencies of the BloomingBiz Media Network and associated press. Sharing is Caring.

Works Cited:

Market basket analysis when procuring program goods and modifying contracted-for product lists. Food and Nutrition Service U.S. Department of Agriculture. (n.d.). https://www.fns.usda.gov/usda-fis/market-basket-analysis-when-procuring-program-goods-and-modifying-contracted-product-lists

Market basket analysis. (n.d.). https://rstudio-pubs-static.s3.amazonaws.com/463712_490513db5cc54f6b87da6cdc756eceb8.html

Pandey, Y. (2022, May 25). SAS vs python. LinkedIn. https://www.linkedin.com/pulse/sas-vs-python-yuvaraj-pandey/

RCW 39.26.125: Competitive solicitation-exceptions. (n.d.). https://app.leg.wa.gov/rcw/default.aspx?cite=39.26.125

Segaline, M. (2024, February 16). What is a cooperative purchase?. Bloomingbiz.marketing. https://www.bloomingbiz.marketing/blog/what-is-a-cooperative-purchase

Team, I. C. (2021, March 23). Python vs. R: What’s the difference? IBM Blog. https://www.ibm.com/blog/python-vs-r/

Use of Amazon Business. Use of Amazon Business | Department of Enterprise Services (DES). (n.d.). https://des.wa.gov/purchase/purchase-cards/use-amazon-business

Follow Data Mining Mike for great intelligence!

Michael Segaline

A Data Scientist and Search Engine Optimization Expert.

https://www.bloomingbiz.marketing