Evaluating the Performance of Explainable Machine Learning Models in Traffic Accidents Prediction in California

Parra, Camilo; Ponce, Carlos; Salas, Rodrigo

Keywords: Traffic Accidents, Geolocation data, Random Forest, Decision Trees, Gradient Boosted Trees

Abstract

Reducing and preventing road traffic accidents is a major public health problem and a priority for many nations. In this paper, we explore the performance of explainable machine learning models applied to the prediction of road traffic crashes, using a dataset containing nearly three million records of such events and the conditions under which they occurred. For this purpose, we use the dataset US Accidents - A Countrywide Traffic Accident Dataset. First, we clean, standardize and reduce the data. Next, we transform the time and location values using a geohashing library developed by Uber. We then augment the dataset with events classified as ‘not an accident’, obtained by applying web scraping techniques to the data sources used by the original authors of the dataset. Finally, we evaluate the performance of different implementations of Random Forest and decision trees; these models achieve an F1 score above 70%. We conclude that weather conditions are strongly related to the occurrence of car accidents.
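As a point of reference for the F1 score reported above, the following is a minimal sketch of how that metric is computed for a binary accident / not-an-accident classifier. The function name `f1_score` and the label values are illustrative assumptions, and the sample labels are made up; none of this is taken from the paper's own code or results.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1 score (harmonic mean of precision and recall)
    for the class given by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for illustration: 1 = accident, 0 = not an accident.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

Because F1 combines precision and recall, it only becomes meaningful once the dataset contains both classes, which is why the ‘not an accident’ events matter for the evaluation.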

More information

Publisher: IEEE
Publication date: 2020
Start/End dates: 16-20 November 2020
First page: 1
Last page: 8
Language: English