Enthusiastic : International Journal of Applied Statistics and Data Science

Implementation of Hotelling’s T² Method in Quality and Capability Control of Newlab Collagen Production Processes

Rahmadana Kadija Indrani — 2025-10-01

Every company sets quality standards for its production process. However, factors arising during production can cause product defects. This research therefore analyzed the quality control, causal factors, and performance of the production process for Newlab Collagen products. The methods used for production quality control were the Hotelling’s T² control chart, the fishbone diagram, and process capability analysis. For the Hotelling’s T² control chart, the multivariate observation data, comprising five quality indicators, were divided into two phases. The phase I Hotelling’s T² control chart showed that the quality indicators of Newlab Collagen production were out of control, which was caused by unstable machine factors. The phase II control chart showed that the quality indicators of the Newlab Collagen production process were still out of control. This condition was supported by process capability values of less than one in both phase I and phase II. These findings suggest that the company needs to improve, optimize, and tighten quality control of its production process.
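
As a rough illustration of the statistic behind a Hotelling's T² chart for individual observations, the sketch below computes T² = (x - mean)' S⁻¹ (x - mean) for each observation. It is not the study's implementation: it assumes only two quality indicators (the study used five) so that the covariance matrix can be inverted by hand, the data are hypothetical, and the control limit is not computed.

```python
from statistics import mean

def covariance_2x2(xs, ys):
    """Return sample covariance entries (s_xx, s_xy, s_yy), n - 1 denominator."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    s_xx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    s_yy = sum((y - my) ** 2 for y in ys) / (n - 1)
    s_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return s_xx, s_xy, s_yy

def hotelling_t2(xs, ys):
    """T^2 of every observation relative to the sample mean and covariance."""
    mx, my = mean(xs), mean(ys)
    s_xx, s_xy, s_yy = covariance_2x2(xs, ys)
    det = s_xx * s_yy - s_xy ** 2
    t2 = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # Quadratic form d' S^{-1} d, using the closed-form 2x2 inverse.
        t2.append((s_yy * dx * dx - 2 * s_xy * dx * dy + s_xx * dy * dy) / det)
    return t2

# Hypothetical measurements of two quality indicators; the last one is unusual.
indicator_a = [5.1, 4.9, 5.0, 5.2, 4.8, 6.5]
indicator_b = [10.2, 9.8, 10.1, 10.3, 9.9, 9.0]

t2_values = hotelling_t2(indicator_a, indicator_b)
most_extreme = max(range(len(t2_values)), key=t2_values.__getitem__)
print("T^2 values:", [round(v, 2) for v in t2_values])
print("Most extreme observation index:", most_extreme)
```

In practice each T² value would be compared with an upper control limit derived from a beta or F distribution, and points above the limit would be flagged as out of control.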

Analysis of Factors that Influence Maternal Mortality Rates Using Generalized Poisson Regression

Yuniar Ines pratiwi — 2025-10-20

Maternal Mortality Rate (MMR) is the number of deaths of women during pregnancy or within 42 days after childbirth. This study aims to identify factors affecting MMR in East Java and to compare the performance of the Generalized Poisson Regression (GPR) model with that of Poisson regression. GPR is a regression model for count data that extends Poisson regression to handle overdispersion or underdispersion. The data were obtained from the East Java Health Office and cover 38 districts/cities, with MMR as the dependent variable and five variables thought to affect it as predictors. The GPR model proved superior to Poisson regression, with an Akaike Information Criterion (AIC) value of 239.515. Factors such as deliveries handled by health workers, K6 visits by pregnant women, provision of diphtheria-tetanus immunization, and obstetric complications affected MMR in East Java in 2022.
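
The comparison between Poisson and generalized Poisson models can be sketched on a toy, intercept-only example. Everything here is an assumption made for illustration: the counts are hypothetical, the generalized Poisson likelihood uses Famoye's parameterization (the abstract does not say which form was used), the mean is held at the sample mean, and the dispersion parameter `alpha` is found by a crude grid search rather than a proper optimizer.

```python
import math
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance over sample mean; close to 1 under a Poisson model."""
    return variance(counts) / mean(counts)

def poisson_loglik(counts, mu):
    """Poisson log-likelihood with a common mean mu."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in counts)

def gen_poisson_loglik(counts, mu, alpha):
    """Generalized Poisson log-likelihood (Famoye's form, assumed here).

    alpha = 0 reduces to the Poisson model; alpha > 0 allows overdispersion.
    """
    total = 0.0
    for y in counts:
        total += (y * math.log(mu / (1 + alpha * mu))
                  + (y - 1) * math.log(1 + alpha * y)
                  - mu * (1 + alpha * y) / (1 + alpha * mu)
                  - math.lgamma(y + 1))
    return total

def aic(loglik, n_params):
    """Akaike Information Criterion; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * loglik

counts = [2, 0, 7, 1, 12, 3, 0, 9]  # hypothetical counts, not the study's data
mu_hat = mean(counts)

# Crude grid search for alpha with mu fixed at the sample mean (simplification).
alphas = [i / 100 for i in range(101)]
alpha_hat = max(alphas, key=lambda a: gen_poisson_loglik(counts, mu_hat, a))

ll_pois = poisson_loglik(counts, mu_hat)
ll_gp = gen_poisson_loglik(counts, mu_hat, alpha_hat)

print("Dispersion index:", round(dispersion_index(counts), 3))
print("Poisson AIC:", round(aic(ll_pois, 1), 3))
print("Generalized Poisson AIC:", round(aic(ll_gp, 2), 3))
```

The generalized Poisson model pays for one extra parameter in the AIC, so it is preferred only when the gain in log-likelihood outweighs that penalty.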

Pension Funding Calculation Using the Benefit Prorate Method of the Constant Percent Type and Vasicek Interest Rates

Laili Nur Khafifa — 2025-10-31

Pension funds are financial programs established by individuals or companies to secure the future of employees by providing benefits during their retirement years. Pension funds are built up through contributions made by both the participants (employees) and the employer. Various methods are used to calculate pension benefits, normal costs, and actuarial liabilities, including the constant percent benefit prorate method. A key factor influencing these calculations is the interest rate. This study employs the Vasicek model, a stochastic interest rate model, to analyze the Unfunded Actuarial Liability (UAL). The analysis reveals that the normal cost (annual contribution) varies, and that both the contributions and the actuarial liabilities calculated for each participant using the Vasicek interest rate adjust to the interest rate during their retirement period. The UAL is derived from the difference between the total actuarial liabilities of all participants in a given period and the accumulated funds. The UAL is sufficient to cover future pension fund payments when calculated using the Vasicek interest rate model.
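
The Vasicek model describes the short rate as dr = a(b - r)dt + σ dW, where a is the speed of mean reversion, b the long-run mean, and σ the volatility. The sketch below simulates one path with a simple Euler discretization and turns it into a discount factor. The parameter values, the monthly time step, and the function names are illustrative assumptions, not values taken from the study.

```python
import math
import random

def simulate_vasicek(r0, a, b, sigma, years, steps_per_year=12, seed=0):
    """Simulate one short-rate path under the Vasicek model (Euler scheme)."""
    rng = random.Random(seed)
    dt = 1.0 / steps_per_year
    rates = [r0]
    for _ in range(years * steps_per_year):
        shock = rng.gauss(0.0, 1.0)  # standard normal increment
        r = rates[-1]
        rates.append(r + a * (b - r) * dt + sigma * math.sqrt(dt) * shock)
    return rates

def discount_factor(rates, steps_per_year=12):
    """Approximate exp(-integral of r dt) along a simulated path."""
    dt = 1.0 / steps_per_year
    return math.exp(-sum(r * dt for r in rates[:-1]))

# Hypothetical parameters: start at 5%, revert toward 6%, 1% volatility.
path = simulate_vasicek(r0=0.05, a=0.5, b=0.06, sigma=0.01, years=10)
print("Rate after 10 years:", round(path[-1], 4))
print("10-year discount factor:", round(discount_factor(path), 4))
```

Because each simulated path gives a different discount factor, quantities such as the normal cost and actuarial liability vary with the interest-rate scenario rather than being fixed by a single constant rate.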

Spatio-Temporal Modeling of Crime Rates Using Geographically and Temporally Weighted Regression

Robiansyah Putra Putra — 2025-10-30

This study analyzes the spatio-temporal modeling of crime rates in 35 regencies and cities in Central Java using the geographically and temporally weighted regression (GTWR) method. The objective is to investigate how socio-economic factors, including the open unemployment rate, percentage of the poor population, population density, average years of schooling, job vacancies, labor force participation rate, and labor wage, influence crime rates across different regions and periods. The goodness-of-fit test results indicated that the GTWR model had an R-squared value of 93.51%, higher than the 88.64% of the geographically weighted regression (GWR) model, demonstrating GTWR’s ability to explain crime data variations that were heterogeneous both spatially and temporally. Partial significance tests and mapping results showed that the influence of variables differed across years and regions, with population density and labor-related factors consistently being the main predictors. These findings highlight the importance of designing crime prevention policies that are locally tailored and based on spatio-temporal evidence.
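
GTWR fits a separate weighted regression at each location and time, giving more weight to observations that are close in both space and time. The sketch below shows one common way to build such weights: a combined spatio-temporal distance passed through a Gaussian kernel. The kernel form, the scale factors `lam` and `mu`, the bandwidth, and the coordinates are all assumptions for illustration; the abstract does not state which choices the study made.

```python
import math

def st_distance_sq(p, q, lam=1.0, mu=1.0):
    """Squared spatio-temporal distance between two (u, v, t) points.

    lam scales the spatial part and mu the temporal part (both assumed).
    """
    (u1, v1, t1), (u2, v2, t2) = p, q
    return lam * ((u1 - u2) ** 2 + (v1 - v2) ** 2) + mu * (t1 - t2) ** 2

def gaussian_weight(p, q, bandwidth, lam=1.0, mu=1.0):
    """Gaussian kernel weight of observation q when fitting the model at p."""
    return math.exp(-st_distance_sq(p, q, lam, mu) / bandwidth ** 2)

# Hypothetical regression point and observations: (x, y, year).
target = (0.0, 0.0, 2022)
same_place_last_year = (0.0, 0.0, 2021)
same_place_three_years_ago = (0.0, 0.0, 2019)
far_away_same_year = (4.0, 3.0, 2022)

for label, obs in [("last year", same_place_last_year),
                   ("three years ago", same_place_three_years_ago),
                   ("far away", far_away_same_year)]:
    print(label, round(gaussian_weight(target, obs, bandwidth=2.0), 4))
```

Setting `mu` to zero removes the temporal term, so the weights depend on spatial distance alone, which corresponds to the purely spatial weighting used in GWR.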

Analyzing Sentiments on IISMA Discontinuation Rumors with SVM, Random Forest Classifier, and XGBoost Classifier

Michelle Intan Handa — 2025-10-30

The Indonesian International Student Mobility Award (IISMA) is a government-run student exchange program. Recently, rumors regarding its discontinuation have sparked various public opinions. This study aims to analyze these public sentiments and evaluate which machine learning model is most suitable for classifying sentiment labels in the dataset. The models tested included support vector machine (SVM), random forest classifier (RFC), and extreme gradient boosting (XGBoost) classifier. The dataset consisted of 630 tweets scraped from Twitter and was split in an 80:20 ratio, with 80% allocated for training and 20% for testing. The results indicated that SVM and RFC were the most effective models, both achieving the highest accuracy of 85.44%. Sentiment analysis revealed that the majority of public opinion was positive, suggesting that most people agree with the discontinuation of the IISMA program because the program is perceived as nonurgent and not a current national priority. These findings provide insights into public sentiment and highlight the utility of machine learning models in classifying such sentiment data effectively.

Log-Linear Analysis of the Association among Hematological Variables in Dengue Hemorrhagic Fever Cases

Miftahul Irfan — 2025-10-31

Health data are often analyzed in their continuous form through approaches such as linear, logistic, or survival models. In this study, hematological variables were dichotomized based on established clinical cut-offs to enable log-linear analysis of associations among categorical variables, acknowledging the potential loss of information from this transformation. A log-linear model was applied to evaluate independence, dependence, and interaction patterns among leukocyte, hemoglobin, and hematocrit categories in a dengue hemorrhagic fever (DHF) patient dataset. Previous analyses using survival models identified these variables as factors associated with recovery rates; however, these models did not capture their interaction structure. Log-linear analysis was therefore employed to examine these associations more comprehensively. The best-fitting model included two-factor interactions between leukocyte–hematocrit and hemoglobin–hematocrit. This model demonstrated a good fit on the Pearson chi-square test, whereas including a three-factor interaction resulted in a saturated model and did not improve model performance. These findings highlight significant interaction patterns among hematological variables in DHF patients, providing a more detailed understanding of their joint associations.
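
A log-linear model with the two interactions named above (leukocyte–hematocrit and hemoglobin–hematocrit) can be written out as follows. The notation is an assumption made for illustration, since the abstract's own symbols are not available: $\mu_{ijk}$ is the expected count in cell $(i, j, k)$, and the superscripts $L$, $Hb$, and $Ht$ denote leukocyte, hemoglobin, and hematocrit.

```latex
\log \mu_{ijk} = \lambda
  + \lambda^{L}_{i} + \lambda^{Hb}_{j} + \lambda^{Ht}_{k}
  + \lambda^{L \cdot Ht}_{ik} + \lambda^{Hb \cdot Ht}_{jk}
```

Under this form, leukocyte and hemoglobin categories are conditionally independent given the hematocrit category. With three dichotomized variables there are eight cells and six free parameters, leaving two residual degrees of freedom, whereas adding the three-factor term (together with the remaining two-factor term it implies) uses all eight and reproduces the observed counts exactly.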

Tourist Preference Analysis Based on Google Reviews Using the DBSCAN Method

Marita Qori'atunnadyah — 2025-10-30

Tourism is a strategic sector contributing to regional economic growth. Although Lumajang Regency offers prominent natural destinations, data-based insights into tourist preferences remain limited. This study analyzed tourist preferences using Google Reviews through a text mining approach that integrated the density-based spatial clustering of applications with noise (DBSCAN) algorithm and lexicon-based sentiment analysis. Data were collected via web scraping from six major destinations, yielding 16,904 reviews, of which 9,800 contained analyzable text. The text data were converted to numerical representations using term frequency-inverse document frequency (TF–IDF) weighting prior to clustering. Using DBSCAN with parameters ε = 0.8 and MinPts = 4, one main cluster of 9,353 reviews was identified, with the remaining 447 reviews classified as outliers. The main cluster was dominated by keywords such as waterfall, beautiful, and scenery, emphasizing the visual appeal of Tumpak Sewu as Lumajang’s tourism icon, while the outliers reflected reviews from international visitors and practical travel information. Sentiment analysis showed that most reviews were positive (68.0%), followed by neutral (24.1%) and negative (7.9%). These findings indicate a predominantly positive perception of Lumajang tourism, though accessibility and facilities require improvement. The study demonstrates the potential of digital review data for developing data-driven tourism management and promotion strategies.
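
To show how DBSCAN separates a dense cluster from outliers, here is a minimal pure-Python sketch of the algorithm. It reuses the parameter values ε = 0.8 and MinPts = 4 from the abstract, but everything else is an assumption: the points are hypothetical two-dimensional coordinates rather than TF-IDF vectors, Euclidean distance is used, and MinPts is taken to include the point itself.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
        cluster += 1
    return labels

# Hypothetical points: five close together, two isolated.
points = [(0.0, 0.0), (0.3, 0.0), (0.0, 0.3), (0.3, 0.3), (0.5, 0.5),
          (5.0, 5.0), (9.0, 1.0)]
labels = dbscan(points, eps=0.8, min_pts=4)
print(labels)
```

Points that cannot be reached from any sufficiently dense neighborhood keep the noise label, which is how the algorithm produces outliers instead of forcing every item into a cluster.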

Recommending E-Commerce Platforms for MSMEs: A Sentiment Analysis Approach

Imam Adiyana — 2025-12-25

The rapid growth of e-commerce in Indonesia presents significant opportunities for micro, small, and medium enterprises (MSMEs), yet the diversity of marketplace platforms complicates the selection of an optimal sales channel. This study addressed this challenge by developing a data-driven recommendation system based on sentiment analysis of user reviews. Utilizing a dataset of 80,000 reviews scraped from four major platforms on the Google Play Store (Shopee, Tokopedia, Lazada, and Blibli), two classification approaches were implemented and compared: support vector machine (SVM) and long short-term memory (LSTM). Both models demonstrated competitive performance, enabling effective sentiment categorization. Furthermore, multinomial logistic regression was employed to analyze the influence of three key variables (rating, number of likes, and marketplace brand) on sentiment outcomes. The analysis revealed that Shopee yielded the highest probability of receiving positive reviews (97.82%) and showed no significant association with negative sentiment. Consequently, this study recommends Shopee as the primary platform for MSMEs to enhance their digital presence and sales performance. The primary contribution lies in integrating machine learning-based sentiment analysis with statistical modelling to generate actionable, evidence-based marketplace recommendations for MSMEs.
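
Multinomial logistic regression turns a linear score for each sentiment class into class probabilities through the softmax function. The sketch below illustrates that step only. The three sentiment classes, the feature layout (intercept, rating, number of likes), and all coefficient values are hypothetical and are not estimates from the study.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_probabilities(features, coefficients):
    """Probability of each class given features and per-class coefficients."""
    scores = [sum(b * x for b, x in zip(beta, features))
              for beta in coefficients]
    return softmax(scores)

classes = ["negative", "neutral", "positive"]

# Hypothetical coefficients for (intercept, rating, likes).
# The first class is the reference category, so its coefficients are zero.
coefficients = [
    [0.0, 0.0, 0.0],
    [-1.0, 0.3, 0.0],
    [-3.0, 1.0, 0.05],
]

review = [1.0, 5.0, 3.0]  # intercept term, 5-star rating, 3 likes
probs = class_probabilities(review, coefficients)
for name, p in zip(classes, probs):
    print(name, round(p, 4))
```

A categorical predictor such as marketplace brand would enter the same way, as indicator features with their own coefficients for each non-reference class.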