";s:4:"text";s:5671:"Notice the red horizontal line in Fig-11 (B). For instance, the variable aboard and fatalities have a strong negative correlation.Fig-9: Correlation detection for continuous variablesTo treat the correlation, I have applied an unsupervised dimensionality reduction and feature selection approach called the Principal Component Analysis (PCA) for continuous variables, and the Multiple Correspondence Analysis (MCA) for the categorical variables. It is exercised in numerous fields like medicine, environment, education, crime, etc. The 7 independent variables are, crash year, crash month, crash date, crash minute, aboard, fatalities and crash survivor. The reasoning here is, balanced or imbalanced data is dependent on distribution of data points. While we avoid losing information with this approach, we also run the risk of over fitting our model as we are more likely to get the same samples in the training and in the test data, i.e. Some Clustered crash Points in Auckland, New Zealand. See Fig-13 and the confusion matrix results are shown below.Fig-13: Accuracy plot of predictive models on balanced dataAfter balancing the data and reapplying a logistic regression algorithm, the accuracy to predict the air crash survivor accuracy reduced to 98%, as shown in confusion matrix below.In answering the second objective of this analysis, it’s been found that the logistic regression model gives 98% accuracy in determining the accuracy of an air crash survival. For the year 2016, the USA alone had recorded 37, 461 motor vehicle crash-related deaths, averaging around 102 people per day. The aviation accident database includes: All civil and commercial aviation accidents of scheduled and non-scheduled passenger airliners worldwide, which resulted in a fatality (including all U.S. Part 121 and Part 135 fatal accidents) All cargo, positioning, ferry and test flight fatal accidents. And each such variable having more than 10 distinct levels. 
This study is an attempt to explore the possible causes of such air crashes and to determine whether air travel is a safe option. Rain, snow, and frost also account for a high percentage. The geographic data visualizations clearly indicate where crashes happen.
For instance, one civil aircraft crash induced by military action took the lives of all 290 people aboard. Below you will find information about how the research is done, the resulting data and statistics, and information on funding and grant data. For instance, in this analysis I have shown that imbalanced data gives 100% accuracy, whereas accuracy on the balanced data falls to 98%. It will basically be the same approach. Data visualization helps in determining possible relationships between variables.
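Why accuracy on imbalanced data can look better than it is can be shown with a small sketch. The labels below are hypothetical (95 records of one class, 5 of the other) and the "model" is an assumed baseline that always predicts the majority class; neither comes from the analysis itself, and the numbers are not meant to reproduce the 100% and 98% figures reported above.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of true `positive` cases that were predicted as `positive`."""
    actual = sum(1 for t in y_true if t == positive)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    return hits / actual if actual else 0.0


# Hypothetical imbalanced labels: 95 majority (0) and 5 minority (1) records.
imbalanced_true = [0] * 95 + [1] * 5
# Assumed baseline: always predict the majority class.
imbalanced_pred = [0] * len(imbalanced_true)

# Hypothetical balanced labels: 5 records of each class, same baseline.
balanced_true = [0] * 5 + [1] * 5
balanced_pred = [0] * len(balanced_true)

print("imbalanced accuracy:", accuracy(imbalanced_true, imbalanced_pred))
print("imbalanced minority recall:", recall(imbalanced_true, imbalanced_pred, 1))
print("balanced accuracy:", accuracy(balanced_true, balanced_pred))
```

The baseline never identifies a single minority record, yet its accuracy on the imbalanced labels is high simply because the majority class dominates. On the balanced labels the same baseline drops to chance level, which is why accuracy measured after balancing is the more informative figure.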