Sleepless in Seattle

Problem statement

In a not-so-distant future, the Seattle Police Department is experiencing an increase in 911 calls and a shortage of operators to answer them in time. The department must prioritize calls of a violent nature (assault, robbery, etc.) over non-violent ones (illegal parking, noise complaints, etc.). It needs a way to categorize an incoming call as either violent or non-violent and to respond to violent events with top priority.

Strategy

1. Acquire a 911 call dataset and label each recorded call as either violent or non-violent

2. Apply machine learning and create a predictive model

3. Use the model to predict whether future incoming calls are violent

Dataset

The dataset was acquired from data.gov. A sample selection is shown below.

The "Event Clearance Code" column indicates whether an event is violent or not. Based on it, we added a binary column, "Violent", as the target variable.
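The labeling step can be sketched as follows. This is a minimal illustration in plain Python rather than the project's actual code: the clearance codes in `VIOLENT_CODES` are placeholders (the real list of codes treated as violent is not given here), and `add_violent_label` is a hypothetical helper name.

```python
# Placeholder clearance codes treated as violent (assumption: the real
# code list used in the project is not stated in this write-up).
VIOLENT_CODES = {"X1", "X2"}


def add_violent_label(records, violent_codes=VIOLENT_CODES):
    """Return copies of the records with a binary "Violent" field added."""
    labeled = []
    for record in records:
        row = dict(record)  # copy so the input records stay unchanged
        row["Violent"] = int(row["Event Clearance Code"] in violent_codes)
        labeled.append(row)
    return labeled


# Two made-up call records for illustration only.
calls = [
    {"Event Clearance Code": "X1", "Police Sector": "A"},
    {"Event Clearance Code": "Y9", "Police Sector": "B"},
]
labeled_calls = add_violent_label(calls)
```

With a pandas DataFrame, the same idea would typically be a single membership test on the clearance code column.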

This turns the task into a binary classification problem, which can be approached with various methods such as decision trees, nearest neighbors, random forests, logistic regression, and so on.

Feature selection and modeling

After data cleaning, the pandas DataFrame contains about 300,000 rows. We split it into a train set (2/3) and a test set (1/3).
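A 2/3 to 1/3 split of this kind can be sketched as below. This is only an illustration under assumptions: the write-up does not say whether the split was random or which tool was used, so the shuffling, the seed, and the helper name `train_test_split_rows` are all hypothetical.

```python
import random


def train_test_split_rows(rows, test_fraction=1 / 3, seed=0):
    """Shuffle the rows and split them into (train, test) lists."""
    shuffled = list(rows)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in for the cleaned call records (integers used for brevity).
rows = list(range(300))
train_rows, test_rows = train_test_split_rows(rows)
```

Fixing the seed keeps the split reproducible, so different models are compared on the same held-out calls.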

After trying out various models on the train set, we chose a logistic regression model (threshold* = 0.04). The following features were selected for use in the model:

Month
Day of week
Time of day
Police sector
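All four features are categorical, so they need a numeric encoding before a logistic regression can use them. The write-up does not say which encoding was used; the sketch below assumes one-hot encoding, and the helper name `one_hot_encode` and the sample values (including the sector labels) are made up for illustration.

```python
def one_hot_encode(rows, feature_names):
    """One-hot encode categorical features into a 0/1 matrix.

    Returns the (feature, value) pair for each column and the matrix.
    """
    # Sort the observed values of each feature for a stable column order.
    categories = {
        name: sorted({row[name] for row in rows}) for name in feature_names
    }
    columns = [
        (name, value) for name in feature_names for value in categories[name]
    ]
    matrix = [
        [1 if row[name] == value else 0 for name, value in columns]
        for row in rows
    ]
    return columns, matrix


FEATURES = ["Month", "Day of week", "Time of day", "Police sector"]

# Two made-up calls; the values are illustrative only.
sample_calls = [
    {"Month": 1, "Day of week": "Mon", "Time of day": "night", "Police sector": "A"},
    {"Month": 7, "Day of week": "Sat", "Time of day": "night", "Police sector": "B"},
]
columns, matrix = one_hot_encode(sample_calls, FEATURES)
```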

*threshold: for each event, the model outputs a probability that the event is violent. We label the event "violent" if that probability is greater than the threshold. We evaluated the balance between precision and recall before choosing 0.04 as the threshold. Precision (0.06 with our model) is the fraction of events predicted as violent that are truly violent; recall (0.57 with our model) is the fraction of truly violent events that the model predicts as violent.
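The definitions of precision and recall at a given threshold can be made concrete with a short sketch. The probabilities and labels below are invented for illustration and are not the project's data; `precision_recall` is a hypothetical helper name.

```python
def precision_recall(probabilities, labels, threshold):
    """Compute precision and recall when predicting violent if p > threshold."""
    predicted = [p > threshold for p in probabilities]
    pairs = list(zip(predicted, labels))
    true_pos = sum(1 for pred, y in pairs if pred and y == 1)
    false_pos = sum(1 for pred, y in pairs if pred and y == 0)
    false_neg = sum(1 for pred, y in pairs if not pred and y == 1)
    flagged = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Made-up model outputs and true labels (1 = violent, 0 = non-violent).
probs = [0.01, 0.03, 0.05, 0.08, 0.10, 0.02]
labels = [0, 0, 0, 1, 1, 1]
precision, recall = precision_recall(probs, labels, threshold=0.04)
```

Lowering the threshold flags more calls as violent, which generally raises recall at the cost of precision. Sweeping the threshold and recomputing both numbers is one way to examine that trade-off.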

Call prediction web app

Based on the predictive model, we built a web app that predicts whether a future incoming call reflects a violent crime. The user selects the feature values from the drop-down menus and hits "SUBMIT" to see the prediction for that event.

Call ranking system

We also created an aggregated version of the web app above in the form of a call ranking system.

Suppose the department is only able to pick up 50% of the total incoming calls. If the calls were picked up at random, the department would be expected to capture only 50% of the violent crimes on average. With our predictive model, however, the department can sort the calls by their predicted probability of being violent and act on the top 50% of the list. This helps capture more violent crimes with the same amount of resources.

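In fact, as shown in the comparison below, the model improved the capture rate by about 50% in relative terms. The left side shows the violent crimes captured using a random strategy (35 out of 70); the right side shows those captured using the predictive model strategy (52 out of 70), which raises the capture rate from 50% to about 74%.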
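The ranking strategy and the arithmetic behind the comparison can be sketched as follows. The helper `capture_rate` and the toy inputs are hypothetical; only the counts 35, 52, and 70 come from the comparison above.

```python
def capture_rate(probabilities, labels, capacity_fraction=0.5):
    """Share of violent calls captured when answering only the top-ranked calls."""
    n_answered = int(len(probabilities) * capacity_fraction)
    # Rank calls from most to least likely to be violent.
    ranked = sorted(
        zip(probabilities, labels), key=lambda pair: pair[0], reverse=True
    )
    captured = sum(label for _, label in ranked[:n_answered])
    total_violent = sum(labels)
    return captured / total_violent if total_violent else 0.0


# Toy example with made-up values: 4 calls, 2 of them violent.
toy_rate = capture_rate([0.9, 0.1, 0.8, 0.2], [1, 0, 0, 1])

# Arithmetic for the reported comparison (70 violent crimes in total).
random_rate = 35 / 70  # random pick-up strategy
model_rate = 52 / 70   # model-ranked strategy
relative_gain = (model_rate - random_rate) / random_rate
```

Here `relative_gain` works out to 17/35, or just under 49%, which is the "about 50%" improvement quoted above.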

Summary

This project made substantial use of the following skills:

feature selection, classification modeling, web design, data visualization.

To improve the model in the future, demographic information will need to be incorporated into the study. Moreover, the placement of police stations in the city has a significant bearing on how this model is used in real-life scenarios.