Enthusiastic: International Journal of Applied Statistics and Data Science

Determination Premiums Motor Vehicle Insurance Using Bonus-Malus Optimal

2025-09-10T13:31:12+00:00

The increasing number of motor vehicles in Sumatera has heightened accident risks, emphasizing the need for motor vehicle insurance to distribute risk between policyholders and insurers. Determining fair, risk-based premiums requires considering each policyholder's claim history. This study aimed to determine motor vehicle insurance premiums using the optimal bonus-malus system based on 2022 claim data for the minibus category with comprehensive coverage in Sumatera. The proposed model extended the Bayesian bonus-malus framework by incorporating the trust region reflective (TRR) method for estimating claim severity and the Newton-Raphson method for estimating claim frequency, thereby improving parameter estimation accuracy and numerical stability. This approach offers a more equitable and precise premium adjustment mechanism aligned with individual risk levels, contributing to improved risk-based pricing, reduced underwriting losses, and greater transparency for policyholders. The results showed that claim frequency followed the Poisson-Lindley distribution, while claim severity followed the lognormal-gamma distribution. Based on these models, the premium was computed by multiplying the basic premium by the relative value for the subsequent year and dividing by the base relative value. Premiums decrease in the absence of claims and increase when claims occur.

Sharpe Ratio-Based Dynamic Crypto Asset Allocation with Trend Filtering Using SMA

2025-06-13T07:59:30+00:00

This paper proposes a dynamic cryptocurrency asset allocation strategy that combines Sharpe Ratio-based weighting with trend filtering using the Simple Moving Average (SMA) of Bitcoin (BTC). The model reallocates capital among a portfolio of seven major cryptocurrencies (BTC, ETH, BNB, SOL, TON, TRX, XRP) every three days, conditional on BTC trading above the selected SMA threshold (50-day, 100-day, or 200-day). When BTC trades below the SMA, the strategy shifts fully to USDT to minimize downside risk. Using historical data from January 1, 2024, to January 1, 2025, the study evaluates performance across the three SMA configurations and benchmarks them against a buy-and-hold baseline. Results show that the SMA-50 strategy achieved the highest cumulative return (+231.51%) and Sharpe Ratio (2.51), substantially outperforming both the longer SMA-based models and the baseline average return (+132.14%). Risk analysis indicates that shorter SMA windows allow more responsive exposure during market uptrends but increase short-term volatility. Overall, the findings support the use of hybrid strategies that combine trend-following filters with risk-adjusted allocation for managing crypto portfolios in volatile environments.
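
One rebalancing step of the strategy can be sketched as follows. This is a minimal illustration, not the paper's implementation: the 30-day Sharpe lookback, a zero risk-free rate, clipping negative Sharpe ratios to zero weight, and the function name `allocate` are all assumptions. The caller would invoke it every three days with daily closing prices.

```python
import statistics


def simple_returns(prices):
    # Daily simple returns from a list of closing prices.
    return [prices[i] / prices[i - 1] - 1.0 for i in range(1, len(prices))]


def sharpe_ratio(returns, risk_free=0.0):
    # Mean excess return divided by its sample standard deviation (not annualized;
    # a constant annualization factor would cancel out in the normalized weights).
    excess = [r - risk_free for r in returns]
    sd = statistics.stdev(excess)
    return statistics.mean(excess) / sd if sd > 0 else 0.0


def allocate(prices, sma_window=50, lookback=30, trend_asset="BTC"):
    # prices: dict mapping asset symbol -> list of daily closes (oldest first).
    trend = prices[trend_asset]
    sma = sum(trend[-sma_window:]) / sma_window
    # Trend filter: if BTC is not above its SMA, hold 100% USDT.
    if trend[-1] <= sma:
        return {"USDT": 1.0}
    # Assumed weighting rule: weights proportional to positive trailing Sharpe ratios.
    scores = {}
    for asset, series in prices.items():
        s = sharpe_ratio(simple_returns(series[-(lookback + 1):]))
        if s > 0:
            scores[asset] = s
    total = sum(scores.values())
    if total == 0:
        return {"USDT": 1.0}
    return {asset: s / total for asset, s in scores.items()}
```

Changing `sma_window` to 100 or 200 reproduces the other two configurations compared in the abstract; a shorter window flips the filter more often, which is consistent with the reported trade-off between responsiveness and short-term volatility.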

Utilizing Geographically Weighted Regression with a Gaussian Kernel to Analyze Unemployment

2026-01-08T14:29:43+00:00

Unemployment is a major challenge in economic development, reflecting an imbalance between labor supply and available job opportunities. This study aimed to examine the spatial variation of factors influencing the open unemployment rate (OUR) in Lampung Province, Indonesia, and to compare the performance of a global regression model with that of the geographically weighted regression (GWR) model in explaining these variations. The GWR method, using a fixed Gaussian kernel, was applied to capture spatial heterogeneity across regions. Secondary data for 2023 were obtained from Statistics Indonesia of Lampung Province, including economic growth (EG), the human development index (HDI), and the labor force participation rate (LFPR). In the global regression model, LFPR was the only variable that significantly reduced unemployment, while EG and HDI were not statistically significant. The Breusch–Pagan test confirmed spatial heterogeneity, supporting the use of GWR. The GWR model outperformed the global model, with an Akaike information criterion (AIC) of 40.8262 and an R² of 0.6059. Spatial analysis indicated that EG and HDI positively affected unemployment in several districts, suggesting limited job absorption and possible skill mismatches, whereas LFPR consistently showed a negative relationship with the OUR across regions.
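
The core of GWR with a fixed Gaussian kernel is a separate weighted least-squares fit at each location, with weights w_ij = exp(-0.5 (d_ij / b)^2) and local coefficients (XᵀWX)⁻¹XᵀWy. The sketch below illustrates that computation in plain Python. The bandwidth `b` is passed in as an assumption (in practice it is usually chosen by cross-validation or AICc), and the coordinates and data in the usage check are synthetic, not the Lampung data.

```python
import math


def solve_linear(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def gwr_local_coefficients(coords, X, y, bandwidth):
    # coords: (u, v) location per observation; X: predictor rows; y: responses.
    n = len(y)
    design = [[1.0] + list(row) for row in X]  # prepend an intercept column
    p = len(design[0])
    results = []
    for i in range(n):
        # Fixed Gaussian kernel weights relative to location i.
        w = [math.exp(-0.5 * (math.dist(coords[i], coords[j]) / bandwidth) ** 2) for j in range(n)]
        xtwx = [[sum(w[j] * design[j][a] * design[j][c] for j in range(n)) for c in range(p)] for a in range(p)]
        xtwy = [sum(w[j] * design[j][a] * y[j] for j in range(n)) for a in range(p)]
        results.append(solve_linear(xtwx, xtwy))  # [intercept, beta_1, ..., beta_k] at location i
    return results
```

Each row of the result holds one district's local coefficients, which is what allows the sign of a variable such as EG or HDI to differ between districts rather than being fixed by a single global estimate.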

Comparative Machine Learning Methods for ICD-10 Diagnosis Classification

2025-10-20T12:40:08+00:00

The classification of disease diagnoses using the International Classification of Diseases (ICD-10) standard is essential for supporting clinical decision-making and administrative processes in healthcare systems. This study evaluated the performance of three machine learning algorithms, namely decision tree, random forest, and support vector machine (SVM), for ICD-10 diagnosis classification using 3,730 textual medical record entries collected from Klinik Pratama UIN Sunan Kalijaga, Yogyakarta, Indonesia. The dataset exhibited significant class imbalance, which was addressed using the synthetic minority oversampling technique (SMOTE). Preprocessing included text normalization and term frequency-inverse document frequency (TF-IDF) vectorization, followed by model training with hyperparameter tuning via grid-search cross-validation. Model performance was assessed using accuracy, precision, recall, F1-score, the confusion matrix, and five-fold cross-validation. Random forest achieved the highest mean accuracy of 93.65%, followed by decision tree at 92.25% and SVM at 87.91%. These results suggest that ensemble-based approaches provide more reliable classification outcomes for imbalanced textual medical data. The findings are expected to support the development of semi-automated ICD-10 coding systems and improve the efficiency and accuracy of medical coding workflows.
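
The TF-IDF step can be illustrated with the textbook weighting tf(t, d) × ln(N / df(t)). This is a sketch only: the study's exact normalization rules and TF-IDF variant are not stated, and libraries such as scikit-learn's `TfidfVectorizer` use a smoothed IDF and L2 row normalization by default, so their numbers will differ. The regex tokenizer and the example entries are hypothetical.

```python
import math
import re
from collections import Counter


def normalize(text):
    # Hypothetical normalization: lowercase and keep alphanumeric tokens only.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(docs):
    tokenized = [normalize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency (relative count) times unsmoothed inverse document frequency.
        vectors.append({t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()})
    return vectors
```

A term that appears in every entry (for example a generic word such as "patient") receives weight zero, while a term confined to one entry, such as a specific symptom, is weighted up; this is what makes TF-IDF features informative for distinguishing ICD-10 codes.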

Forecasting Production Growth of Micro and Small Textile Industries Using SARIMA

2025-09-22T12:13:10+00:00

Micro and small industries (MSMEs) form an industrial sector of small-scale businesses with limited assets and turnover. These businesses are largely labor-intensive and play an important role in creating jobs and driving the local economy. The textile industry is one of the largest MSME industries. Textile production tends to fluctuate with market demand, raw material availability, and economic conditions, so understanding market demand dynamics is important for government and business decision-making. This study aimed to forecast the production growth of textile MSMEs using the seasonal autoregressive integrated moving average (SARIMA) method. Several candidate SARIMA models were fitted, and the model with the smallest Akaike information criterion (AIC) value was selected for forecasting. The forecasts indicate that textile industry production growth fluctuates from period to period.
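
For reference, the standard multiplicative SARIMA(p, d, q)(P, D, Q)_s model and the AIC criterion used to choose among candidates can be written as below, where B is the backshift operator, s is the seasonal period, \hat{L} is the maximized likelihood, and k is the number of estimated parameters. The specific orders and seasonal period selected in the study are not stated here, so this is only the general form.

```latex
\Phi_P\!\left(B^{s}\right)\,\phi_p(B)\,(1-B)^{d}\,\left(1-B^{s}\right)^{D} y_t
  = \Theta_Q\!\left(B^{s}\right)\,\theta_q(B)\,\varepsilon_t,
\qquad \varepsilon_t \sim \mathrm{WN}\!\left(0,\sigma^{2}\right)

\phi_p(B) = 1 - \phi_1 B - \cdots - \phi_p B^{p}, \qquad
\Phi_P\!\left(B^{s}\right) = 1 - \Phi_1 B^{s} - \cdots - \Phi_P B^{Ps}

\theta_q(B) = 1 + \theta_1 B + \cdots + \theta_q B^{q}, \qquad
\Theta_Q\!\left(B^{s}\right) = 1 + \Theta_1 B^{s} + \cdots + \Theta_Q B^{Qs}

\mathrm{AIC} = -2\ln\hat{L} + 2k
```

Among candidate models fitted to the same differenced series, the one with the smallest AIC balances goodness of fit (the likelihood term) against complexity (the 2k penalty).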

Household Electricity Demand Forecasting in Batam from 2023 to 2047 Using Multilayer Perceptron Neural Network

2026-01-29T12:45:16+00:00

The rapid growth of electricity demand in Batam, driven by increasing household and industrial consumption, necessitates accurate long-term energy forecasting. This study aimed to forecast household electricity demand in Batam from 2023 to 2047 using a multilayer perceptron (MLP) artificial neural network (ANN) model. Secondary data from PT PLN Batam (2013-2022), including customer numbers, electricity sales volume, and revenue, were analyzed. A total of 200 MLP models were trained with varying numbers of hidden layers and nodes, using the backpropagation (BACKPROP), resilient backpropagation with and without weight backtracking (RPROP+ and RPROP−), smallest absolute gradient (SAG), and smallest learning rate (SLR) algorithms. The partial autocorrelation function (PACF) was used to determine the number of input layer nodes. The optimal model, trained with the SLR algorithm and using four hidden layers and ten nodes, achieved the lowest mean squared error (MSE) of 35.93 and a mean absolute percentage error (MAPE) of 0.47%. The projections show a consistent increase in electricity demand, reaching a forecast peak of 2,114 GWh in 2047. These findings can inform long-term energy planning and policy-making aimed at ensuring adequate electricity supply and infrastructure development in Batam.
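
Using the PACF to size the input layer typically means keeping the lags whose partial autocorrelation lies outside the approximate 95% band ±1.96/√n. The sketch below computes the sample PACF with the Durbin-Levinson recursion and applies that rule. The cut-off rule is an assumption about how the study used the PACF, and `simulate_ar1` generates a synthetic series for illustration only; it is not the PLN data.

```python
import math
import random


def acf(x, max_lag):
    # Sample autocorrelations r_0..r_max_lag.
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom for k in range(max_lag + 1)]


def pacf(x, max_lag):
    # Durbin-Levinson recursion: phi_kk is the partial autocorrelation at lag k.
    r = acf(x, max_lag)
    result = [1.0]
    prev = []  # phi_{k-1, 1..k-1}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j - 1] * r[k - j] for j in range(1, k))
        den = 1.0 - sum(prev[j - 1] * r[j] for j in range(1, k))
        phi_kk = num / den
        prev = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)] + [phi_kk]
        result.append(phi_kk)
    return result


def significant_lags(x, max_lag):
    # Assumed selection rule: lags outside the approximate 95% band become input nodes.
    bound = 1.96 / math.sqrt(len(x))
    p = pacf(x, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(p[k]) > bound]


def simulate_ar1(phi, n, seed=0):
    # Synthetic AR(1) series for demonstration only.
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x
```

For an AR(1) series, only lag 1 should stand out clearly, which would suggest a single lagged input node; a series with longer memory would yield more significant lags and hence a wider input layer.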

Black-Scholes Method for Rainfall Index-Based Agricultural Insurance Premiums

2025-09-22T11:51:43+00:00

Agriculture plays an important role in national economic development. However, it carries the highest risk of loss because of its dependence on climatic conditions. Agricultural insurance is one way to reduce the risk of crop failure. This study aimed to determine a rainfall climate index and to calculate agricultural insurance premiums based on it. The rainfall index was determined using historical burn analysis, and premiums were calculated using the Black-Scholes method. The results showed significant spatial variation in rainfall index-based agricultural insurance premiums across Sumatra. Premiums rose at higher percentiles, with North Sumatra the highest (IDR 3.28–3.55 million) and Aceh the lowest (IDR 100–137 thousand). Including all rainfall stations revealed a more detailed spatial pattern. Overall, the premiums closely reflect local climatic conditions and can support risk assessment and insurance planning.
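
Because a rainfall index policy pays out when rainfall falls below a threshold, it behaves like a put option on the index, and the Black-Scholes put formula P = K e^{-rT} N(-d2) - S N(-d1) gives a premium per index unit. The sketch below assumes that framing: `index_level` is the current index value S, `strike` is the trigger level K (for example a historical rainfall percentile), and `tick` converts index units to IDR. These parameter names and the tick conversion are hypothetical; the study's exact inputs are not given here.

```python
import math


def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _d1_d2(index_level, strike, rate, maturity, volatility):
    d1 = (math.log(index_level / strike) + (rate + 0.5 * volatility ** 2) * maturity) / (
        volatility * math.sqrt(maturity)
    )
    return d1, d1 - volatility * math.sqrt(maturity)


def black_scholes_put(index_level, strike, rate, maturity, volatility):
    # Pays off when the index ends below the strike (low-rainfall protection).
    d1, d2 = _d1_d2(index_level, strike, rate, maturity, volatility)
    return strike * math.exp(-rate * maturity) * norm_cdf(-d2) - index_level * norm_cdf(-d1)


def black_scholes_call(index_level, strike, rate, maturity, volatility):
    # Included only to check put-call parity.
    d1, d2 = _d1_d2(index_level, strike, rate, maturity, volatility)
    return index_level * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


def rainfall_premium(index_level, strike, rate, maturity, volatility, tick):
    # Hypothetical conversion: premium in IDR = tick (IDR per index unit) * put value.
    return tick * black_scholes_put(index_level, strike, rate, maturity, volatility)
```

Raising the strike percentile raises K relative to S and therefore the put value, which is consistent with the abstract's observation that premiums rose at higher percentiles.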

Identification of Sexual Harassment Comment on TikTok Platform Using IndoBERT Embedding and Long Short-Term Memory

2026-04-05T02:22:22+00:00

Sexual harassment in online comment sections is a growing concern on social media platforms such as TikTok, where informal language makes manual moderation ineffective. This study developed an automated detection model using a hybrid Indonesian bidirectional encoder representations from transformers (IndoBERT)-long short-term memory (LSTM) architecture, employing IndoBERT as a static feature extractor and an LSTM network to model sequential dependencies. Given the severe class imbalance typical of social media data, this study specifically evaluated the impact of the synthetic minority over-sampling technique (SMOTE) on classification performance. The base IndoBERT-LSTM model achieved a high overall accuracy of 89.04% but struggled with low recall (0.40) for the minority harassment class. Applying SMOTE improved recall for the harassment class to 0.54, but it significantly reduced precision and lowered overall accuracy to 87.79%. These findings indicate that oversampling can modestly improve the detection of harassment instances at the cost of a substantial increase in false positives. The study concludes that standard oversampling techniques such as SMOTE may be less effective for highly informal and imbalanced TikTok data, and that future digital safety research should explore more advanced contextual augmentation or cost-sensitive learning approaches.
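
SMOTE creates synthetic minority samples by interpolating between a minority point and one of its k nearest minority neighbours: x_new = x + u (x_nn - x) with u drawn uniformly from [0, 1). The sketch below shows that mechanism on plain feature vectors. It is an illustration, not the study's pipeline: how SMOTE was applied to IndoBERT-derived sequence inputs is not described here, and production code would typically use a library implementation such as imbalanced-learn's `SMOTE`.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    # minority: list of feature vectors (lists of floats) from the minority class.
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x (excluding x itself), Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbors)
        gap = rng.random()  # uniform in [0, 1)
        # New point on the line segment between x and its chosen neighbour.
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic
```

Because every synthetic point lies between existing minority points, SMOTE cannot add new linguistic variation; in a region where harassment and benign comments overlap in feature space, the interpolated points can push the decision boundary into benign territory, which is one plausible reading of the precision loss reported above.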