Abstract

Tuberculosis (TB) is the deadliest infectious disease in Indonesia, and South Sulawesi was among the provinces contributing the most TB cases in 2018, with 84 cases per 100,000 population. This study aims to identify the variables that explain the proportion of TB cases in South Sulawesi. The data contain many explanatory variables as well as outliers, a setting that sparse Least Trimmed Squares (LTS) regression is designed to handle. The resulting sparse LTS model retains only 14 explanatory variables. Based on R² and RMSE, the sparse LTS model also gives more satisfactory results than the classical LASSO. The government can focus on the selected factors to reduce the proportion of TB cases in South Sulawesi.
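To make the idea behind sparse LTS concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It combines a LASSO fit (coordinate descent) with concentration steps that repeatedly refit on the h observations with the smallest squared residuals. The function names, the toy data, and the values of h and lam are all hypothetical, and starting from the full sample is a simplification; production implementations typically use many initial subsets and robust standardization.

```python
# Illustrative sketch only: a simplified sparse-LTS-style fit in pure Python.
# All names (lasso_cd, sparse_lts, h, lam) and the toy data are hypothetical.

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the LASSO proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_cd(X, y, lam, sweeps=200):
    """LASSO by coordinate descent: minimizes RSS/(2m) + lam * sum(|beta_j|)."""
    m, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / m for j in range(p)]
    y_mean = sum(y) / m
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered X
    resid = [v - y_mean for v in y]  # residuals for beta = 0
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            z = sum(Xc[i][j] ** 2 for i in range(m)) / m
            if z == 0.0:
                continue  # constant column carries no information
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j])
                      for i in range(m)) / m
            new = soft_threshold(rho, lam) / z
            delta = new - beta[j]
            for i in range(m):
                resid[i] -= Xc[i][j] * delta
            beta[j] = new
    intercept = y_mean - sum(b * xm for b, xm in zip(beta, x_mean))
    return beta, intercept

def sparse_lts(X, y, h, lam, max_csteps=50):
    """Refit LASSO on the h best-fitting observations until the subset is stable."""
    n = len(X)
    subset = list(range(n))  # simplification: start from all observations
    for _ in range(max_csteps):
        beta, intercept = lasso_cd([X[i] for i in subset],
                                   [y[i] for i in subset], lam)
        sq_res = [(y[i] - intercept - sum(b * x for b, x in zip(beta, X[i]))) ** 2
                  for i in range(n)]
        new_subset = sorted(sorted(range(n), key=lambda i: sq_res[i])[:h])
        if new_subset == subset:
            break  # converged: the fit and the subset agree
        subset = new_subset
    else:
        # Step limit reached: refit so the returned fit matches the subset.
        beta, intercept = lasso_cd([X[i] for i in subset],
                                   [y[i] for i in subset], lam)
    return beta, intercept, subset

# Toy data: y depends only on the first variable; observation 4 is an outlier.
X = [[float(i), 1.0 if i % 2 == 0 else -1.0] for i in range(10)]
y = [2.0 * i + 1.0 for i in range(10)]
y[4] += 50.0

lasso_beta, lasso_intercept = lasso_cd(X, y, lam=0.1)
lts_beta, lts_intercept, kept = sparse_lts(X, y, h=8, lam=0.1)
print("LASSO coefficients:     ", lasso_beta)
print("Sparse LTS coefficients:", lts_beta, "kept observations:", kept)
```

On this toy data, the plain LASSO fit is pulled by the single outlier and assigns a large coefficient to the irrelevant second variable, whereas the trimmed fit excludes observation 4 and sets that coefficient to zero. This only illustrates why trimming helps; it says nothing about the TB data or the 14 variables reported in the article.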

Keywords

LASSO; Outliers; Penalized regression; Tuberculosis; Robust regression

Article Details

How to Cite
Randa, T. M., Tinungki, G. M., & Sunusi, N. (2022). Modeling the Proportion of Tuberculosis Cases in South Sulawesi using Sparse Least Trimmed Squares. EKSAKTA: Journal of Sciences and Data Analysis, 3(2). https://doi.org/10.20885/EKSAKTA.vol3.iss2.art6
