How to Forecast Drug Overdose Deaths in 2023

Abstract:

 

Bottom Line Up Front: Over the next 650 days, 120 days are predicted to have a high frequency of overdose deaths. The deaths are not seasonal.

 

In an effort to improve the human condition with data science, the BloomingBiz Media Network decided to explore accidental drug death data from the State of Connecticut. In the wake of the opioid crisis, the State of Connecticut has been in the vanguard of publishing daily time-series open-source datasets for the public to mine. The dataset was extracted, cleaned, and statistically tested. The analysis compares the following variables: Age, Sex, Race, Heroin, Fentanyl, Ethanol, Meth/Amphetamine, Cocaine, Any Opioid, and Location. These groups were explored, tested for stationarity with the Augmented Dickey-Fuller test (adfuller), and re-tested, then processed through the auto_arima() method to find the optimal parameters for the model. The result is an Autoregressive Integrated Moving Average (ARIMA) forecast model that predicts the number of people who will die each day, presented as a Death Calendar. The findings of the exploratory analysis are as follows. 55-year-old men are 20% more likely to die than any other age or sex group. 55-year-old women die of accidental drug overdose at a higher rate than 20-year-olds. The null hypothesis is not rejected: the number of accidental drug deaths is NOT seasonal, and there is NO particular “season” in which accidental drug deaths occur. Shockingly, if a victim's race is identified as “Black”, there is a 52% chance that cocaine was in their system.

 

Research Question:

 

Can an ARIMA forecast model be developed from the data?

 

Null Hypothesis H0:

There is no seasonality in the data.

 

Alternative Hypothesis H1:

There is seasonality in the data.

 

The contribution of this study to the field of data analytics and the human condition is an ARIMA time-series model that predicts the number of overdose (OD) deaths per day. With this information, a calendar can be created for situational awareness and possible death prevention. Part of good science and journalism is ending myths. Two myths in particular: 1. That younger people die of opioid overdose at a higher rate. 2. That overdose deaths increase around a particular time of year.
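As a rough illustration of how daily forecast values could be turned into such a calendar, here is a minimal Python sketch. The function name, the forecast numbers, and the threshold of 3.0 for a "high" day are all hypothetical and chosen only for the example; they are not the article's output.

```python
from datetime import date, timedelta

def build_calendar(start, daily_forecast, threshold):
    """Pair each forecast value with its calendar date and flag high days."""
    calendar = []
    for offset, predicted in enumerate(daily_forecast):
        day = start + timedelta(days=offset)
        calendar.append((day, predicted, predicted >= threshold))
    return calendar

# Hypothetical forecast values, for illustration only.
forecast = [2.1, 3.4, 1.8, 4.0, 2.9]
calendar = build_calendar(date(2023, 3, 23), forecast, threshold=3.0)

# Dates whose predicted count meets or exceeds the (assumed) threshold.
high_days = [day for day, _, is_high in calendar if is_high]

# A 650-day horizon counted from March 23, 2023 ends on this date.
horizon_end = date(2023, 3, 23) + timedelta(days=650)
```

How "high frequency" should be defined is a judgment call; a fixed threshold is only one option.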

 

An article titled Drug Overdose using Timeseries Models evaluated the variables ‘Age’ and daily death count, and found that these variables are key factors in an overall understanding of the opioid crisis. The Autoregressive Integrated Moving Average (ARIMA) model is used to forecast the daily number of deaths along a given time series. One of the assumptions of the ARIMA model is “stationarity”, meaning the statistical properties of the data (such as its mean and variance) stay consistent over time. “One of the assumptions of ARIMA model is that, for a good model, the residuals must follow a white noise process” (Penn State, 2021). That is, the residuals have zero mean and constant variance, and are uncorrelated. Understanding these variables can help describe the relationship between the independent variables and the dependent variable.
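To make the stationarity and white-noise ideas concrete, the sketch below differences a series and checks its sample autocorrelations against the common rule-of-thumb band of ±1.96/√n. This is a dependency-free illustration, not the Augmented Dickey-Fuller test and not the article's code; the function names and the choice of 10 lags are assumptions.

```python
import math

def difference(series, lag=1):
    """First-order differencing, the 'I' step in ARIMA."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        return 0.0  # a constant series has no variation to correlate
    num = sum((series[i] - mean) * (series[i - lag] - mean)
              for i in range(lag, n))
    return num / denom

def looks_like_white_noise(residuals, max_lag=10):
    """Rough check: every autocorrelation stays inside +/- 1.96 / sqrt(n)."""
    bound = 1.96 / math.sqrt(len(residuals))
    return all(abs(autocorrelation(residuals, k)) <= bound
               for k in range(1, max_lag + 1))

# A steadily rising series is strongly autocorrelated, so it is not white noise.
trend = list(range(50))
lag1 = autocorrelation(trend, 1)
trend_is_noise = looks_like_white_noise(trend)

# Differencing removes the linear trend and leaves a constant series.
flat = difference(trend)
```

Because each lag is checked at roughly the 95% level, genuine white noise will occasionally fall outside the band at one lag, so the result is a guide and not a formal test.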

 

Data Collection

The study uses an open-source dataset of accidental drug deaths in the State of Connecticut that contains the necessary variables. The dataset is from Data.gov, the open-source repository that hosts it. The dataset contains 9,203 rows (before any rows were removed) and 16 columns. The dataset is limited to deaths recorded from 2012 – 2021. The dataset has multiple columns for possible exploration. Accidental drug deaths can involve multiple drugs in some cases. As a delimitation of this analysis, only 11 columns (including drug types) of the dataset are explored, and only one column, ‘Any Opioid’, is factored into the forecast. The explored columns include ‘Age’, ‘Sex’, ‘Race’, ‘Heroin’, ‘Fentanyl’, ‘Ethanol’, ‘Meth/Amphetamine’, ‘Cocaine’, ‘Any Opioid’, and ‘Location’. The dataset is easy to work with and the features needed little engineering. A limitation of the study is that, of 100 independent variables, only 11 were studied.
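Before a daily forecast is possible, one-row-per-death records have to be collapsed into a count per calendar day, with zero-death days filled in so the series has no gaps. A minimal sketch follows; the MM/DD/YYYY date format and the function name are assumptions for illustration, and the real file's date column may differ.

```python
from collections import Counter
from datetime import date, datetime, timedelta

def daily_counts(date_strings, fmt="%m/%d/%Y"):
    """Turn one date string per death into a gap-free daily count series."""
    parsed = [datetime.strptime(s, fmt).date() for s in date_strings]
    counts = Counter(parsed)
    first, last = min(parsed), max(parsed)
    n_days = (last - first).days + 1
    series = []
    for i in range(n_days):
        day = first + timedelta(days=i)
        series.append((day, counts.get(day, 0)))  # 0 for days with no record
    return series

# Three hypothetical records: two deaths on one day, none the next day.
records = ["01/01/2020", "01/01/2020", "01/03/2020"]
series = daily_counts(records)
```

Filling the missing days with zeros matters because a time-series model expects evenly spaced observations.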

Data Extraction and Preparation

 

To find statistically significant predictions, the proposed end state is an ARIMA predictive statistical model that can forecast the number of accidental drug deaths per day. The deliverables are: a visualization of the frequency distribution of daily deaths against a scaled timeline, indexed at 3/23/2023; a cleaned dataset with all columns and rows correctly labeled, for replication; and a better understanding of the previously stated groups through exploratory graphs, giving support as to when deaths may be highest and how the death frequency distribution interacts with other variables. Lastly, a copy of the Jupyter Notebook with the Python code will be available, along with a video presentation. According to the same study, ARIMA was instrumental in supporting the alternative hypothesis against other categorical variables (nsc.gov, 2021).

Overdose Death Code for ARIMA Model
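The notebook code itself is not reproduced in this text. As a stand-in, the sketch below shows the mechanics of the simplest non-trivial case, ARIMA(1,1,0) with no constant: difference the series once, estimate one autoregressive coefficient by least squares, forecast the differences, and add them back up. It is an assumption-laden illustration in plain Python, not the model order or the fitting method that auto_arima() selected for the article.

```python
def fit_ar1(diffs):
    """Least-squares estimate of phi in d[t] = phi * d[t-1] + noise."""
    num = sum(diffs[t] * diffs[t - 1] for t in range(1, len(diffs)))
    den = sum(diffs[t - 1] ** 2 for t in range(1, len(diffs)))
    return num / den if den else 0.0

def forecast_arima_110(series, steps):
    """Forecast `steps` values ahead with a no-constant ARIMA(1,1,0)."""
    # I: difference once to remove the level.
    diffs = [series[t] - series[t - 1] for t in range(1, len(series))]
    # AR: estimate how much of the last change carries into the next one.
    phi = fit_ar1(diffs)
    last_level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff          # predicted next change
        last_level = last_level + last_diff  # integrate back to the level
        forecasts.append(last_level)
    return forecasts

# Hypothetical series whose day-to-day changes halve each step (8, 4, 2, 1).
history = [0, 8, 12, 14, 15]
phi = fit_ar1([8, 4, 2, 1])
next_two = forecast_arima_110(history, steps=2)
```

A production model would also estimate moving-average terms and report uncertainty intervals, which this sketch leaves out.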

Now to test the hypothesis: seasonality is tested using the seasonal_decompose() function.

The trend is that the overdose deaths are increasing.

There is no Seasonality. There appears to be no particular season when deaths occur.
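For readers who want to see what a seasonality check of this kind does, here is a dependency-free sketch of classical additive decomposition: estimate the trend with a centered moving average, subtract it, and average what is left by position in the cycle. The function names and the toy period of 4 are assumptions for illustration; this is not the article's notebook code.

```python
def centered_moving_average(series, period):
    """Trend estimate; None where the window does not fit."""
    half = period // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half:t + half + 1]
        if period % 2 == 0:
            # Even period: half-weight the two end points of the window.
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period
    return trend

def seasonal_component(series, period):
    """Average detrended value at each position in the cycle, centered on 0."""
    trend = centered_moving_average(series, period)
    buckets = [[] for _ in range(period)]
    for t, (value, level) in enumerate(zip(series, trend)):
        if level is not None:
            buckets[t % period].append(value - level)
    means = [sum(b) / len(b) for b in buckets]
    center = sum(means) / period
    return [m - center for m in means]

# Hypothetical series with a spike at every third position of a 4-step cycle.
spiky = [10, 10, 16, 10] * 6
spiky_season = seasonal_component(spiky, period=4)

# A flat series has no seasonal component at all.
flat_season = seasonal_component([5] * 24, period=4)
```

A seasonal component that stays close to zero at every position, as in the flat case, is what "no seasonality" looks like in this kind of decomposition.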

Final Analysis

 

In the final analysis, the residuals of the death frequency distribution exhibited no seasonality and tested negative for seasonality. Therefore, the assertion that “overdose deaths occur during particular times of the year” is categorically false. What is evident is the increasing variance of the residuals, indicating that the wave of opioid deaths is growing. The ARIMA model, even when adjusted for seasonality and processed via SARIMAX, produced a Mean Absolute Error (MAE) of 2.33, which is less than 3 deviations from the mean and keeps it within a Lean Six Sigma standard of 97% accuracy. This model is indexed at March 23, 2023 and predicts the number of daily deaths for the next 650 days (about 1.8 years). During the exploratory analysis, the distribution comparison of 20-year-olds to 55-year-olds appears skewed toward 55-year-olds; deaths among men in particular are almost double those among women. 55-year-old women appear to be at higher risk of opioid death than 20-year-old women. If a victim is Black, there is a 51% chance they have cocaine in their system. Most victims are likely to die in their homes, followed by hospitals.
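For reference, the Mean Absolute Error is the average size of the gap between the actual and the predicted daily counts. A minimal sketch, using made-up numbers that are not the article's data:

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical daily death counts and forecasts, for illustration only.
actual = [3, 5, 2, 4]
predicted = [2, 5, 4, 1]
mae = mean_absolute_error(actual, predicted)  # (1 + 0 + 2 + 3) / 4
```

An MAE of 2.33 therefore means the forecast misses the actual daily count by about 2.33 deaths on average.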

 

A Link to the Dataset is below:

https://catalog.data.gov/dataset/accidental-drug-related-deaths-2012-2018/resource/de6b0a0a-d48f-4920-ae5f-23f5454e321b

Click here for the Overdose Death Forecast Calendar

 

The dataset is available to the public via Data.gov, meaning that it may be limited in accuracy and completeness.

To watch the YouTube video on the code, click here: How to make a death calendar in 2023

Michael Segaline

A Data Scientist and Search Engine Optimization Expert.

https://www.bloomingbiz.marketing