Abstract

Loan processing is an important aspect of the financial industry, where sound decisions must be made to approve or reject loan applications. Default by loan applicants has become a significant concern for financial institutions. This research therefore applies ensemble learning with the random forest and Extreme Gradient Boosting (XGBoost) algorithms, and handles imbalanced data using the Synthetic Minority Over-sampling Technique (SMOTE). The research aimed to improve the accuracy and precision of credit risk assessment and to reduce human workload. Both algorithms used a dataset of 4,296 records with 13 variables relevant to loan approval decisions. The research process involved data exploration, data preprocessing, data splitting, model training, model evaluation using accuracy, sensitivity, specificity, and F1-score, model selection with 10-fold cross-validation, and identification of important variables. The results showed that XGBoost with imbalanced data handling had the highest accuracy, 98.52%, with a good balance between sensitivity (98.83%), specificity (98.01%), and F1-score (98.81%). The most important variables in determining loan approval were credit score, loan term, loan amount, and annual income.
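
The four evaluation measures named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch in plain Python, not the authors' implementation; the function name `classification_metrics`, the label encoding (1 = approved, 0 = rejected), and the toy labels are assumptions made for illustration only.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity, specificity, and F1 for binary labels.

    Assumption: `positive` marks the class treated as positive
    (here 1 = approved); every other label is treated as negative.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1

    def ratio(num, den):
        # Return 0.0 when a denominator is empty instead of raising.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = ratio(tn, tn + fp)   # true negative rate
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical toy labels, not data from the study.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

Reporting sensitivity and specificity alongside accuracy matters on imbalanced data, because a model that always predicts the majority class can score a high accuracy while one of the two rates collapses to zero.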

Keywords

classification, ensemble learning, loan, random forest, XGBoost

Article Details

How to Cite
Anadra, R., Sadik, K., Soleh, A. M., & Astari, R. A. (2024). Loan Approval Classification Using Ensemble Learning on Imbalanced Data. Enthusiastic : International Journal of Applied Statistics and Data Science, 4(2), 85–95. https://doi.org/10.20885/enthusiastic.vol4.iss2.art1
