ILLNESS, An Alternative Way To Assess Wildfire Risk

UCSD Data Science Capstone Project
Group Members: Gloria Kao, Shentong Li, Neil Sharma

Introduction

Wildfires are a major environmental and safety issue in Southern California, and they are becoming more common and more threatening due to climate change and drought. One of the most common sources of ignition is faulty power lines, and utility companies like San Diego Gas & Electric (SDG&E) have been working to understand why these ignitions happen. Through data science, utility companies can assess the risk of a wildfire and reduce it by turning off parts of the power grid, lowering the probability that nearby high-voltage power lines set vegetation on fire. Such an event is called a Public Safety Power Shutoff (PSPS). SDG&E's current data analysis includes weather reports (wind speed), vegetation in the geographical area, and the conductor span impact on serviced customers. Our project aims to further analyze possible causes of wildfires and to create a more data-informed system that decides when an emergency shutoff is truly needed and when it might harm people more than it helps.

Objective

To create a multi-factor machine learning model designed to enhance the accuracy of PSPS decisions.
We call it the ILLNESS Model, which stands for Insights on...
Life: The lives of people, paying attention to those who need medical devices and critical services.
Living: Situational, major social factors such as elections or concerts.
Nature: Vegetation around the area, particularly dry plants and their fuel level.
Energy: Power grid data, such as the location and type of a conductor.
Service: The last time SDG&E serviced an area for maintenance.
Season: Environmental data such as wind gusts and rainfall.

Data

Based on the factors we have listed above, we obtained several datasets to train our models.
Weather: Forecasted and observed data from the SDG&E website.
Vegetation: Tree density, VRI (Vegetation Resource Inventory).
Geographic: Elevation, HFTD (High Fire Threat District).
Conductor: Material type, age, wire risk, historical maintenance.
Living: Population density and medically vulnerable customers.
All datasets were provided by SDG&E. They are collected by the company and kept private.

Methodology

We implemented several types of machine learning models in Python, with the help of data science packages such as pandas, NumPy, and SciPy.

We have four models in total, two fewer than the six factors in ILLNESS, because some of the factors are similar enough to be combined into one model. Each model uses the machine learning algorithm best suited to its data type. Each model takes its corresponding dataset as input and learns its parameters according to that algorithm. The trained model then predicts wildfire risk on a test dataset and outputs a score, which we collect for the final composite ILLNESS model.
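As a rough sketch of the collection step described above (not the project's actual code), each trained sub-model can be treated as a function from an area's features to a score, and the scores gathered per area. The names `SubModel`, `AreaScores`, and `collect_scores`, the feature keys, and the stand-in coefficients are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type: a trained sub-model is any function that maps one
# area's feature dictionary to a numeric risk score.
SubModel = Callable[[Dict[str, float]], float]


@dataclass
class AreaScores:
    """Intermediate scores for one area, one entry per sub-model."""
    area: str
    scores: Dict[str, float]


def collect_scores(
    sub_models: Dict[str, SubModel],
    test_data: Dict[str, Dict[str, Dict[str, float]]],
) -> List[AreaScores]:
    """Run every sub-model on its own test features for each area.

    test_data maps area name -> sub-model name -> feature dictionary.
    """
    results = []
    for area, features_by_model in test_data.items():
        scores = {
            name: model(features_by_model[name])
            for name, model in sub_models.items()
        }
        results.append(AreaScores(area=area, scores=scores))
    return results


# Stand-in "trained" models with made-up coefficients (illustration only).
sub_models = {
    "weather": lambda f: 0.5 * f["wind_speed"] + 0.5 * f["dryness"],
    "nature": lambda f: f["vri"],
    "energy": lambda f: 0.1 * f["days_since_work_order"],
    "life": lambda f: f["population_density"],
}

# Made-up test features for a single hypothetical area.
test_data = {
    "District A": {
        "weather": {"wind_speed": 4.0, "dryness": 6.0},
        "nature": {"vri": 7.0},
        "energy": {"days_since_work_order": 30.0},
        "life": {"population_density": 2.0},
    },
}

for row in collect_scores(sub_models, test_data):
    print(row.area, row.scores)
```

Keeping the sub-models behind one common interface means the composite step only needs the collected scores, regardless of which algorithm produced each one.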

Figure 1 - Flowchart of our risk assessment model. Sub-models for weather patterns, vegetation risk, energy infrastructure, and life-critical services generate intermediate embeddings, which are combined into a final composite model for wildfire risk prediction.

Results

Model Comparison Table

Model | Features | Output
Weather Model - MLP | Temperature, dryness, wind speed | A weather risk score, reflecting how weather conditions contribute to potential wildfire ignition or spread.
Nature Model - Linear Regression | Latitude, longitude, VRI, strike trees, elevation | A weighted Nature Index (scaled 1-10).
Energy/Service Model - Random Forest | Upstream HFTD, days since work order (upstream/downstream), miles | R²: 0.537, showing moderate predictive power.
Life Model - Custom Weighted Function | Population density, number of customers served, presence of critical facilities | A custom score out of 100 that takes into account critical customers, such as essential services and customers on life support, for each region of SDG&E's territory.

The "Features" column lists the most important features used in each intermediate model. They help us understand what each model looks at when assessing wildfire and/or PSPS risk.
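Two of the outputs in the table can be illustrated with short helpers. The first rescales raw predictions onto a 1-10 range using min-max scaling, which is an assumption here, since the page does not say how the Nature Index is scaled. The second computes R², the coefficient of determination, using its standard definition. The function names are hypothetical.

```python
from typing import List


def scale_to_index(
    values: List[float], low: float = 1.0, high: float = 10.0
) -> List[float]:
    """Min-max rescale raw predictions onto the [low, high] range."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # All predictions identical: place everything at the midpoint.
        return [(low + high) / 2 for _ in values]
    span = v_max - v_min
    return [low + (v - v_min) * (high - low) / span for v in values]


def r_squared(actual: List[float], predicted: List[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


print(scale_to_index([0.0, 5.0, 10.0]))            # [1.0, 5.5, 10.0]
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0
```

An R² of 1 means the predictions match the observed values exactly, while 0 means the model does no better than always predicting the mean, so a value such as 0.537 sits roughly in between.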

Predicted Wildfire Risk Using the Intermediate Models

Below is an interactive heatmap of our predicted wildfire risk. A redder area indicates a higher risk. The map is divided into the districts of San Diego County, making it easier for users to find their location. Hovering over an area shows a tooltip with the district name and the wildfire risk breakdown: energy, nature, weather, and overall. The red-green color scale reflects the overall score.
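A minimal sketch of how an overall score could be mapped to a red-green color and a tooltip string is shown here. It assumes scores on a 0-1 range and a simple linear blend. The heatmap's actual color scale and plotting library are not specified on this page, and `risk_to_hex` and `tooltip_text` are hypothetical names.

```python
from typing import Dict


def risk_to_hex(score: float, low: float = 0.0, high: float = 1.0) -> str:
    """Linearly blend from green (low risk) to red (high risk)."""
    if high <= low:
        raise ValueError("high must be greater than low")
    # Clamp to [0, 1] so out-of-range scores still get a valid color.
    t = min(1.0, max(0.0, (score - low) / (high - low)))
    red = round(255 * t)
    green = round(255 * (1.0 - t))
    return f"#{red:02x}{green:02x}00"


def tooltip_text(district: str, breakdown: Dict[str, float]) -> str:
    """Format the district name followed by the four risk components."""
    keys = ("energy", "nature", "weather", "overall")
    parts = [f"{key}: {breakdown[key]:.2f}" for key in keys]
    return f"{district} | " + ", ".join(parts)


print(risk_to_hex(0.0))  # #00ff00
print(risk_to_hex(1.0))  # #ff0000
print(tooltip_text(
    "District A",
    {"energy": 0.3, "nature": 0.7, "weather": 0.5, "overall": 0.5},
))
```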

Final Composite ILLNESS Model

We created a mathematical model to assess both wildfire risk and the impact of power shutoffs. It includes all the aforementioned factors: the weather, nature, and infrastructure wildfire risk within an area, plus the impact on customers if power is shut off. Each variable is weighted according to its importance, and together they produce a final score that indicates the PSPS risk.

The magnitude of the overall composite score, that is, its deviation from zero, indicates the strength of the recommendation: larger absolute values mean stronger recommendations. A negative score suggests that the area should not undergo a PSPS, meaning power should be maintained to minimize impact. Conversely, a positive score indicates that the area should undergo a PSPS, prioritizing wildfire risk mitigation over potential disruptions.

Please check our report (linked at the bottom of the page) if you would like to see more details about the composite function.
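To illustrate the sign convention only, here is a small sketch of a weighted score that subtracts shutoff impact from wildfire risk. This is an assumed functional form with hypothetical weights, not the composite function from the report, and it assumes all inputs are already scaled to the same 0-1 range.

```python
from typing import Dict

# Hypothetical weights; the project's actual values are in the report.
RISK_WEIGHTS: Dict[str, float] = {"weather": 0.4, "nature": 0.3, "energy": 0.3}
LIFE_WEIGHT = 1.0


def composite_score(risks: Dict[str, float], life_impact: float) -> float:
    """Weighted wildfire risk minus weighted shutoff impact.

    All inputs are assumed to be pre-scaled to the same 0-1 range.
    """
    wildfire = sum(RISK_WEIGHTS[name] * risks[name] for name in RISK_WEIGHTS)
    return wildfire - LIFE_WEIGHT * life_impact


def recommendation(score: float) -> str:
    """Positive: shut off power. Negative: keep power on."""
    if score > 0:
        return "PSPS recommended"
    if score < 0:
        return "maintain power"
    return "no recommendation"


# High wildfire risk, low community impact (made-up numbers).
rural = composite_score({"weather": 0.9, "nature": 0.8, "energy": 0.7}, 0.2)
# Low wildfire risk, high community impact (made-up numbers).
urban = composite_score({"weather": 0.2, "nature": 0.1, "energy": 0.3}, 0.9)

print(recommendation(rural))  # PSPS recommended
print(recommendation(urban))  # maintain power
```

With this form, the absolute value of the score grows as wildfire risk and community impact move further apart, which matches the idea that larger deviations from zero give stronger recommendations.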

Predicted Areas for PSPS Using Our ILLNESS Model

Below is an interactive heatmap that shows PSPS risk, that is, where a PSPS should or should not occur. Unlike the previous visualization, it also accounts for the population risk (the life/living factor), so we can assess the impact a PSPS would have on a community. A redder area means a higher PSPS risk, so its power should not be shut off. As you can see, the red areas are mostly downtown, where the dense population increases the life risk factor. A green area means that the PSPS impact is low, or that the risk of wildfire outweighs the risk to the community and a power shutoff is needed.
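The life factor could be scored with a simple weighted function along these lines. This is a sketch under assumptions: the point allocations, the normalization by territory-wide maxima, and the name `life_impact_score` are hypothetical, and the project's actual Life Model also accounts for customers on life support.

```python
from typing import Dict

# Hypothetical point allocations that sum to 100.
WEIGHTS: Dict[str, float] = {
    "population_density": 30.0,
    "customers_served": 30.0,
    "critical_facilities": 40.0,
}


def life_impact_score(values: Dict[str, float], maxima: Dict[str, float]) -> float:
    """Score from 0 to 100; higher means a shutoff would hurt more."""
    score = 0.0
    for name, points in WEIGHTS.items():
        if maxima[name] <= 0:
            continue  # No usable reference value for this feature.
        # Fraction of the largest value in the territory, capped at 1.
        fraction = min(1.0, values[name] / maxima[name])
        score += points * fraction
    return score


# Made-up territory-wide maxima and one made-up area.
maxima = {
    "population_density": 8000.0,
    "customers_served": 20000.0,
    "critical_facilities": 10.0,
}
area = {
    "population_density": 4000.0,
    "customers_served": 10000.0,
    "critical_facilities": 5.0,
}

print(life_impact_score(area, maxima))  # 50.0
```

A dense downtown area would score near the top of this range, which is why it would show up as red on the PSPS heatmap.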

Discussion

As the heatmaps show, the east side of San Diego County has higher wildfire risk but lower PSPS risk. Geographically, the east side has more valleys that channel strong, dry Santa Ana winds into the area, which makes wildfires more likely. Those districts are also less populated, so the impact of a PSPS is predicted to be low (as illustrated by the green color), and a PSPS can be safely issued without too much concern from residents.

On the flip side, the west side of the county has lower wildfire risk but higher PSPS risk. This is because most of the population lives in or close to downtown San Diego, which is on the coast. Shutting off power in this area could lead to many problems, from general inconvenience to larger safety issues on the road or in public areas. There are special cases like Ramona, where the heatmap color changes from red to green. Even though these districts are populated and the life factor should be high, the risk of wildfire still outweighs the risk to the community, and a PSPS is strongly advised to mitigate damage to houses and harm to people.

Future Improvements

This project could be used internally by SDG&E as a more data-informed tool for deciding when to issue a PSPS. The datasets can be expanded to include more and newer data, such as the meteorology data collected from SDG&E's weather stations every 10 minutes. We could also include data from other sources, such as public weather data, satellite images of vegetation, and public census data for the life factor.

Our project can also benefit the general public, as the ILLNESS model provides a single numerical value that is easy to interpret, even for those who may not fully understand the PSPS decision process. Although our current geographical heatmaps are static, we could improve them by creating a live dashboard that updates the score periodically. It would again include the tooltip details with score breakdowns, as we want to provide transparency to people who consume energy from SDG&E, so they are aware of possible PSPS events and the reasons behind them.

Acknowledgement

We would like to express our sincere gratitude to our mentors at San Diego Gas & Electric (SDG&E), Phi Nguyen, Kasra Mohammadi, Jacob Wigal, Moon, Yumin Park, Kelly H., and others, for their invaluable guidance and support throughout this project. Their expertise provided critical insights into SDG&E's original scoring system, shaping our understanding of risk assessment.

View More

Click here to visit our GitHub repository.
Click here to view our poster.
Click here to view our report.