Stepwise Regression Analysis on Probability of Forest Fire

Project Description: This project estimates the relationship between the probability of forest fire in Algeria and a set of weather components. Stepwise linear regression was used to fit the model, and K-fold cross-validation (K = 10) was used to evaluate its performance. The regression and validation were done in SAS; the data were cleaned beforehand in Python.
Data Introduction: The dataset contains 2,243 records, each with the following fields:

1. Date: (DD/MM/YYYY) Day, month, and year.
2. Temperature: Noon temperature (daily maximum) in degrees Celsius: 22 to 42.
3. Wind: Wind speed in km/h: 6 to 29.
4. RainAmount: Total rain on that day in mm: 0 to 16.8.
5. FineMoisture: Fine Fuel Moisture Code (FFMC) index: 28.6 to 92.5.
6. DuffMoisture: Duff Moisture Code (DMC) index: 1.1 to 65.9.
7. Drought: Drought Code (DC) index: 7 to 220.4.
8. InitialSpeed: Initial Spread Index (ISI) of the fire: 0 to 18.5.
9. BuildUp: Buildup Index (BUI): 1.1 to 68.
10. WeatherIndex: Fire Weather Index (FWI): 0 to 31.1.
11. FireProb: Probability of a forest fire occurring, in percent: 0 to 100.
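
The documented ranges above lend themselves to a simple sanity check during data cleaning. The following is a minimal Python sketch, not the cleaning script actually used for this project (which is not described here). The function name `validate_record` and the assumption that each record arrives as a dictionary of raw strings keyed by the field names above are hypothetical.

```python
from datetime import datetime

# Inclusive ranges for each numeric field, copied from the field list above.
RANGES = {
    "Temperature": (22, 42),
    "Wind": (6, 29),
    "RainAmount": (0, 16.8),
    "FineMoisture": (28.6, 92.5),
    "DuffMoisture": (1.1, 65.9),
    "Drought": (7, 220.4),
    "InitialSpeed": (0, 18.5),
    "BuildUp": (1.1, 68),
    "WeatherIndex": (0, 31.1),
    "FireProb": (0, 100),
}


def validate_record(record):
    """Return a list of problems found in one record (hypothetical helper).

    `record` is assumed to be a dict mapping field names to raw strings.
    """
    problems = []
    # The Date field is documented as DD/MM/YYYY.
    try:
        datetime.strptime(str(record.get("Date", "")).strip(), "%d/%m/%Y")
    except ValueError:
        problems.append("Date: not DD/MM/YYYY")
    for name, (low, high) in RANGES.items():
        try:
            value = float(record[name])
        except (KeyError, TypeError, ValueError):
            problems.append(f"{name}: missing or non-numeric")
            continue
        if not low <= value <= high:
            problems.append(f"{name}: {value} outside [{low}, {high}]")
    return problems
```

Records for which `validate_record` returns a non-empty list could be inspected by hand or dropped before the data are passed to SAS.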


There are 9 predictor variables: Temperature, Wind, RainAmount, FineMoisture, DuffMoisture, Drought, InitialSpeed, BuildUp, and WeatherIndex.
The response variable is FireProb.

Methods Used:
* A least-squares regression was fitted with all the predictor variables.
* Stepwise regression with K-fold cross-validation was then performed as follows:
Random_variable = random integer between 1 and K, assigned to each record;
For i = 1 to K do:
 Training_DataSet = the K-1 folds with Random_variable ≠ i;
 Perform stepwise regression on Training_DataSet with entry significance level = removal significance level = 0.05;
 Testing_DataSet = the one fold with Random_variable = i;
 Calculate the model fit statistics on Testing_DataSet;
End;
Calculate the average of the test results over all K folds.
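
The procedure above was run in SAS; the following pure-Python sketch only illustrates the same idea and is not the project's code. It rests on several assumptions that are not in the original: the function names are hypothetical, the test statistic is taken to be RMSE (the write-up does not say which fit statistics were used), and the p-value of each one-variable partial F-test is approximated with the normal distribution, which is reasonable only when the training set is large. SAS's stepwise selection may differ in detail.

```python
import math
import random


def fit_ols(rows, y, cols):
    """Least-squares coefficients (intercept first) via the normal equations."""
    X = [[1.0] + [r[c] for c in cols] for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]; assumes the chosen columns are not collinear.
    M = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def rss(rows, y, cols, beta):
    """Residual sum of squares of the model `beta` evaluated on (rows, y)."""
    return sum((yi - beta[0] - sum(b * r[c] for b, c in zip(beta[1:], cols))) ** 2
               for r, yi in zip(rows, y))


def fit_rss(rows, y, cols):
    """Fit on (rows, y) and return the in-sample residual sum of squares."""
    return rss(rows, y, cols, fit_ols(rows, y, cols))


def partial_p(rss_small, rss_big, n, n_params_big):
    """Approximate p-value for adding/removing one variable (normal approx.)."""
    f_stat = (rss_small - rss_big) / (max(rss_big, 1e-12) / (n - n_params_big))
    return math.erfc(math.sqrt(max(f_stat, 0.0) / 2.0))


def stepwise(rows, y, n_features, alpha=0.05):
    """Simplified stepwise selection with entry level = removal level = alpha."""
    selected, n = [], len(y)
    for _ in range(2 * n_features):  # hard cap as a guard against cycling
        changed = False
        # Forward step: add the candidate with the smallest p-value below alpha.
        current, best = fit_rss(rows, y, selected), None
        for c in [c for c in range(n_features) if c not in selected]:
            cols = selected + [c]
            p = partial_p(current, fit_rss(rows, y, cols), n, len(cols) + 1)
            if p < alpha and (best is None or p < best[0]):
                best = (p, c)
        if best:
            selected.append(best[1])
            changed = True
        # Backward step: drop the variable with the largest p-value above alpha.
        full, worst = fit_rss(rows, y, selected), None
        for c in selected:
            cols = [s for s in selected if s != c]
            p = partial_p(fit_rss(rows, y, cols), full, n, len(selected) + 1)
            if p > alpha and (worst is None or p > worst[0]):
                worst = (p, c)
        if worst:
            selected.remove(worst[1])
            changed = True
        if not changed:
            break
    return selected


def kfold_stepwise(rows, y, n_features, k=10, alpha=0.05, seed=0):
    """Return (average test RMSE, selected columns per fold)."""
    rng = random.Random(seed)
    fold = [rng.randint(1, k) for _ in y]  # Random_variable in the pseudocode
    rmses, chosen = [], []
    for i in range(1, k + 1):
        train = [j for j, f in enumerate(fold) if f != i]
        test = [j for j, f in enumerate(fold) if f == i]
        if not test:  # a fold can be empty when the dataset is very small
            continue
        tr_rows, tr_y = [rows[j] for j in train], [y[j] for j in train]
        te_rows, te_y = [rows[j] for j in test], [y[j] for j in test]
        cols = stepwise(tr_rows, tr_y, n_features, alpha)
        beta = fit_ols(tr_rows, tr_y, cols)  # refit on the training folds only
        rmses.append(math.sqrt(rss(te_rows, te_y, cols, beta) / len(test)))
        chosen.append(cols)
    return sum(rmses) / len(rmses), chosen
```

Here `rows` would hold the nine predictor values for each record and `y` the corresponding FireProb values, so a call might look like `kfold_stepwise(rows, y, n_features=9, k=10)`. Because each record receives an independent random fold number, the folds are only approximately equal in size, which matches the pseudocode above.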


Results:

Note: The original dataset was obtained from the UC Irvine Machine Learning Repository (https://archive.ics.uci.edu/ml/index.php).
The relevant research paper is: Abid, F., & Izeboudjen, N. (2020). Predicting forest fire in Algeria using data mining techniques: Case study of the decision tree algorithm. In International Conference on Advanced Intelligent Systems for Sustainable Development (pp. 363-370). Springer, Cham.


Link to the GitHub repository