Abstract

Type 2 diabetes mellitus (T2DM) is a metabolic disorder driven primarily by insulin resistance and shaped by complex genetic regulation. Understanding the molecular mechanisms underlying insulin resistance is crucial for identifying therapeutic targets. This study compared the performance of two biclustering algorithms, factor analysis for bicluster acquisition (FABIA) and the Cheng and Church algorithm (CCA), in analyzing gene expression data associated with insulin resistance. Simulated missing values were introduced into the GSE19420 dataset to evaluate the robustness of both methods. CCA consistently achieved a lower mean squared error (MSE) than FABIA in reconstructing gene expression patterns, suggesting higher accuracy in capturing co-expression structures. FABIA, however, effectively detected sparse, biologically relevant clusters. Key genes such as MYO5B, DLG2, AXIN2, and PTK7 were identified within the biclusters, supporting their involvement in insulin signaling and metabolic regulation. These findings underscore the need to select a biclustering method that matches the specific analytical goal, and they offer insights into gene networks involved in insulin resistance.
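The robustness check described above (hide known values, fill them back in, score the result with MSE) can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy matrix, the 20% masking rate, k = 2, the row-wise distance, and the function names `mask_entries`, `knn_impute`, and `mse` are all assumptions chosen for illustration, and the MSE here scores the imputed cells rather than a bicluster reconstruction. Only the Python standard library is used.

```python
import math
import random


def mask_entries(matrix, fraction, seed=0):
    """Return a copy with `fraction` of the cells set to None, plus their positions."""
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(matrix) for j in range(len(row))]
    hidden = rng.sample(cells, int(round(fraction * len(cells))))
    masked = [list(row) for row in matrix]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def row_distance(a, b):
    """Root mean squared difference over the columns observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(masked, k=2):
    """Fill each missing cell with the column mean of the k nearest rows (genes)."""
    filled = [list(row) for row in masked]
    for i, row in enumerate(masked):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows that have column j observed.
            donors = sorted(
                (row_distance(row, other), other[j])
                for m, other in enumerate(masked)
                if m != i and other[j] is not None
            )[:k]
            # Assumed fallback: 0.0 when no other row has this column observed.
            filled[i][j] = sum(v for _, v in donors) / len(donors) if donors else 0.0
    return filled


def mse(original, estimate, cells):
    """Mean squared error between two matrices, restricted to `cells`."""
    if not cells:
        raise ValueError("no cells to score")
    return sum((original[i][j] - estimate[i][j]) ** 2 for i, j in cells) / len(cells)


# Toy genes-by-samples matrix (hypothetical values, not taken from GSE19420).
expression = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 3.1, 4.1],
    [5.0, 4.0, 3.0, 2.0],
    [5.2, 4.1, 3.1, 2.2],
    [0.9, 1.9, 2.9, 3.9],
]

masked, hidden = mask_entries(expression, fraction=0.2, seed=1)
imputed = knn_impute(masked, k=2)
error = mse(expression, imputed, hidden)
print(f"MSE on {len(hidden)} hidden cells: {error:.4f}")
```

Scoring only the hidden cells keeps the error from being diluted by values that were never removed. In a comparison like the one in the abstract, the same masked matrix would be passed to each method so that differences in MSE reflect the methods rather than the masking.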

Keywords

Biclustering, Gene Expression, Mean Squared Error, Insulin Resistance, Missing Value Imputation

Article Details

How to Cite
Soemarso, D. R., Siswantining, T., & Pramana, S. (2025). Genetic Cluster Analysis of Insulin Resistance Using KNN Imputation and FABIA-CCA Biclustering. Enthusiastic: International Journal of Applied Statistics and Data Science, 5(1), 101–106. https://doi.org/10.20885/enthusiastic.vol5.iss1.art10
