Tourist Preference Analysis Based on Google Reviews Using the DBSCAN Method

Marita Qori'atunnadyah; Cahyasari Kartika Murni; Achmad Firman Choiri; Hadi Marianto; Muhammad Yazid

doi:10.20885/enthusiastic.vol5.iss2.art7

Submitted: September 3, 2025

Accepted: October 20, 2025

Published: October 30, 2025


Abstract

Tourism is a strategic sector that contributes to regional economic growth. Although Lumajang Regency offers prominent natural destinations, data-based insights into tourist preferences remain limited. This study analyzed tourist preferences using Google Reviews through a text mining approach that integrated the density-based spatial clustering of applications with noise (DBSCAN) algorithm and lexicon-based sentiment analysis. Data were collected via web scraping from six major destinations, yielding 16,904 reviews, of which 9,800 contained analyzable text. The text data were preprocessed and then weighted using term frequency–inverse document frequency (TF–IDF) to generate numerical representations prior to clustering. With parameters ε = 0.8 and MinPts = 4, DBSCAN identified one main cluster of 9,353 reviews and flagged the remaining 447 reviews as outliers. The main cluster was dominated by keywords such as waterfall, beautiful, and scenery, emphasizing the visual appeal of Tumpak Sewu as Lumajang’s tourism icon, while the outliers reflected reviews from international visitors and practical travel information. Sentiment analysis showed that most reviews were positive (68.0%), followed by neutral (24.1%) and negative (7.9%). These findings indicate a predominantly positive perception of Lumajang tourism, though accessibility and facilities require improvement. The study demonstrates the potential of digital review data for developing data-driven tourism management and promotion strategies.
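The pipeline summarized in the abstract (TF–IDF weighting, DBSCAN clustering, lexicon-based sentiment labeling) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the toy reviews, the smoothed IDF formula, the L2 normalization, the Euclidean distance, and the tiny `LEXICON` are all assumptions chosen for illustration. Only ε = 0.8 and MinPts = 4 come from the abstract.

```python
import math
from collections import Counter

# Hypothetical toy reviews (already lowercased and tokenised by whitespace).
reviews = [
    "beautiful waterfall scenery",
    "beautiful waterfall scenery",
    "waterfall scenery beautiful",
    "beautiful waterfall scenery amazing",
    "parking fee expensive",
    "ticket sold at entrance",
]
docs = [r.split() for r in reviews]

def tfidf_vectors(docs):
    """Return L2-normalised TF-IDF vectors as dicts (term -> weight)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed variant: relative term frequency times a smoothed IDF.
        vec = {t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def euclidean(a, b):
    """Euclidean distance between two sparse vectors."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))

def dbscan(vectors, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(vectors)
    # A point's neighbourhood includes the point itself.
    neighbours = [[j for j in range(n)
                   if euclidean(vectors[i], vectors[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:  # expand only from core points
                queue.extend(neighbours[j])
    return labels

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"beautiful": 1, "amazing": 1, "expensive": -1}

def sentiment(tokens):
    """Classify a review by the sign of its summed lexicon scores."""
    score = sum(LEXICON.get(t, 0) for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

labels = dbscan(tfidf_vectors(docs), eps=0.8, min_pts=4)
sentiments = [sentiment(doc) for doc in docs]
for review, label, mood in zip(reviews, labels, sentiments):
    print(label, mood, review)
```

On this toy input the four waterfall reviews form a single cluster and the two practical reviews are labeled as noise (-1), which mirrors the one-cluster-plus-outliers structure reported above on a much smaller scale. The pairwise neighbour search is quadratic in the number of reviews, so a corpus of 9,800 reviews would normally call for an indexed or library implementation.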

Keywords

Digital Tourism Analytics; Google Review; TF–IDF; DBSCAN; Lexicon-Based Sentiment Analysis

License

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).

How to Cite

Qori’atunnadyah, M., Murni, C. K., Choiri, A. F., Marianto, H., & Yazid, M. (2025). Tourist Preference Analysis Based on Google Reviews Using the DBSCAN Method. Enthusiastic: International Journal of Applied Statistics and Data Science, 5(2), 178–189. https://doi.org/10.20885/enthusiastic.vol5.iss2.art7



