Nowcasting the Growth of the Transportation and Accommodation Sectors Using the Google Trends Index

Abstract: This research assesses whether the daily and weekly Google Trends Index (GTI) can predict quarterly GDP growth. The U-MIDAS approach is used because it allows daily and weekly data to forecast quarterly indicators without aggregating them to a quarterly basis, so useful information in the daily and weekly data is not lost. This research uses quarterly GDP for the transportation sector and the accommodation and restaurants sector, which are considered potential industries for the future of Indonesia's economy. The results show that, based on the RSE scores, the daily GTI predicts quarterly GDP growth better than the weekly GTI.


Introduction
The evident characteristic of economic development in many countries is structural transformation [1]. A structural transformation is a shift in an economy from agricultural sectors, which are labor-intensive, to manufacturing and services sectors, which are skill-intensive [2]-[4]. Evidence that a country is experiencing a structural transformation appears in the shifting contribution of sectors to GDP or in the movement of labor between sectors [2], [5]. Indonesia is most likely experiencing a structural transformation. This is evidenced by the contribution of the service sectors to total GDP, which has continued to increase over the past 10 years: it was around 46.2 percent of total GDP in 2021, up from 41.76 percent in 2010. Two crucial service sectors for Indonesia's economy are the transportation sector and the accommodation and restaurants sector. The average growth of the transportation sector over the past 10 years is 7.3 percent and that of the accommodation and restaurants sector is 5.7 percent, both higher than the average growth of agriculture (3.5 percent) and manufacturing (4.4 percent), the two sectors with the largest contribution to GDP. Labor absorption in 2022 also shows an increase from 2010 of 16.6 percent for transportation and 54.1 percent for accommodation and restaurants, strengthening the evidence of structural transformation in Indonesia. These sectors also play an important role in the development of tourism in Indonesia, which is considered a potential industry for the future of Indonesia's economy [6]. Accordingly, forecasting the growth of the transportation sector and the accommodation and restaurants sector provides insight into the development of the tourism sector in Indonesia, which is essential for both the government and the private sector.
Most forecasting methods require data with the same frequency. However, datasets available at higher frequencies may carry useful information for forecasting that those methods cannot exploit [7], [8]. This concern leads to the use of mixed-frequency data for forecasting, known as Mixed Data Sampling (MIDAS) regression [7]. Most studies implementing MIDAS in Indonesia and around the world use monthly socioeconomic data to forecast quarterly GDP [8]-[11]. However, in the era of internet proliferation, data come from various sources, at various frequencies, and in various forms, and are known as big data or alternative data. Some alternative data may be strongly associated with GDP. Nakazawa [12] showed that MIDAS can be implemented with monthly alternative data, namely the Google Trends Index and METIPOS, to predict Japan's GDP. Nevertheless, based on the literature reviewed so far, this research has not found a study that uses daily and weekly alternative data to predict GDP, particularly in Indonesia. Therefore, this research contributes by exploring the use of the daily and weekly Google Trends Index to predict Indonesia's quarterly GDP.
The rest of this paper is arranged as follows: the next section describes the current literature on MIDAS regression and Google Trends, followed by the data and methodology. Next, the results of this research are presented. Finally, the last two sections provide the discussion and the conclusion.

Mixed Data Sampling (MIDAS) Regression and Google Trends
MIDAS was first introduced by Ghysels et al. [13]. The objective is to use high-frequency data to forecast lower-frequency data without aggregating them to the lower frequency. The rationale is that researchers often face explanatory variables observed at a high frequency while the response variable is observed at a low frequency. Transforming high-frequency data to lower-frequency data may cause a loss of information [8]. Some recent studies have applied MIDAS to forecast economic indicators, particularly GDP. Barsoum and Stankiewicz [8] used MIDAS to predict U.S. GDP growth with monthly macroeconomic and financial variables. Subsequently, Bilgin et al. [9] used MIDAS to forecast U.S. quarterly GDP using the monthly CPI and unemployment rate. Another study, in Thailand [10], performed MIDAS regression using monthly indicators, namely export growth, the unemployment rate, and a stock index. There have also been studies implementing MIDAS to forecast GDP in Indonesia. Utari [14] used monthly agricultural export data and the JCI to forecast Indonesia's quarterly GDP.
Google Trends is a Google feature that provides information on search requests for specific topics made to the Google search engine [15]. It provides real-time and archived daily, weekly, and monthly data by geographic location and category from 2004 onward [16], [17]. Because of these characteristics, it is considered a powerful source of alternative data for prediction, forecasting, and especially nowcasting. Google Trends has been used for forecasting in many sectors. In business, Wijnhoven and Plant [18] showed that Google Trends gave a better prediction of car sales than social media sentiment did. Ahmed et al. [19] applied machine learning to analyze whether human behavior captured by Google Trends affects stock markets in Pakistan. In economics, [20] constructed a machine-learning algorithm to estimate real-time GDP growth in 46 countries using Google Trends. The last study provides evidence of the performance of Google Trends for nowcasting.
There are also studies that use Google Trends to forecast tourism indicators. [21] applied Random Forest to predict the number of inbound tourists in China using Google Trends. [22] used time series regression and an ARIMAX model to forecast room occupancy rates using Google Trends; the ARIMAX model gave the best performance based on the RMSE score. Besides the ARIMAX model, a MIDAS approach has also been applied to predict tourism indicators from Google Trends. The study in [23] showed that including weekly Google Trends in the MIDAS model predicted monthly tourist arrivals better than a model without Google Trends.

Data
This research uses two datasets. The first is quarterly GDP growth: the year-on-year (y-on-y) growth rates of Indonesia's quarterly GDP for the transportation sector and the accommodation and restaurants sector, from the second quarter of 2012 to the first quarter of 2022. These data portray the change in value added in each sector in a quarter compared with the same quarter of the previous year. They are regularly produced and published by Badan Pusat Statistik, the National Statistical Office of Indonesia. Henceforth, this research uses the term "GDP sector H" for the quarterly GDP of the transportation sector and the term "GDP sector I" for the quarterly GDP of the accommodation and restaurants sector.
The second dataset is the Google Trends Index (GTI). Google Trends gives the public open access to GTI data, which can be downloaded from the official website or retrieved through queries in a programming language. This research uses the R package "gtrendsR" [24] to query Google Trends. To obtain better information on the transportation sector and the accommodation and restaurants sector, this research queries categories instead of specific keywords [25]. The category "Transportation & Logistics" is used for the transportation sector, while the category "Hotels & Accommodations" is used for the accommodation and restaurants sector. This research retrieves the GTI data from 01 January 2012 to 30 June 2022 and calculates the year-on-year growth rate of the index.
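The year-on-year growth rate used for both datasets can be illustrated with a minimal Python sketch. The function name `yoy_growth` and the sample values are hypothetical and are not taken from this research; the sketch only assumes that each observation is compared with the observation one year (a fixed number of periods) earlier.

```python
def yoy_growth(values, periods_per_year):
    """Return year-on-year growth rates in percent.

    values: observations ordered in time (for example quarterly levels).
    periods_per_year: 4 for quarterly data, 12 for monthly data.
    The first `periods_per_year` observations have no comparison base,
    so the result is shorter than the input by that many entries.
    """
    growth = []
    for t in range(periods_per_year, len(values)):
        base = values[t - periods_per_year]
        growth.append((values[t] - base) * 100.0 / base)
    return growth


# Hypothetical quarterly levels for two consecutive years (illustrative only).
levels = [100.0, 110.0, 120.0, 130.0, 110.0, 121.0, 126.0, 143.0]
print(yoy_growth(levels, periods_per_year=4))
```

For daily or weekly series the choice of the comparison point one year earlier (same calendar date, or a fixed number of days or weeks back) is a design decision that this sketch leaves to the caller through `periods_per_year`.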

Disaggregating Monthly GTI to daily GTI
GTI is an index obtained by normalizing the number of specific search requests made to the Google search engine over the selected period [16]. The normalized index ranges between 0 and 100. A value of 100 indicates the highest number of searches for a certain keyword in the period. A value of 50 indicates that the number of searches for the keyword is half of the highest number of searches in the period. A value of 0 indicates that not enough data are available or that there were no searches for the keyword. Mathematically, the normalized index can be written as follows:

$$GTI_{i,p} = \frac{S_{i,p}}{\max(S)_{p}} \times 100 \quad (1)$$

where $GTI_{i,p}$ is the Google Trends Index at time point i in period p, $S_{i,p}$ is the number of searches for a specific keyword at time point i in period p, and $\max(S)_{p}$ is the highest number of searches for the keyword in period p. According to formula 1, GTI is a relative index, so different periods return a different index for the same time point and the same keyword.
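The relative nature of the index can be seen in a small Python sketch of the normalization described above. The function name `normalize_gti` and the raw counts are hypothetical; Google does not publish raw search counts, so the numbers serve only to illustrate the arithmetic.

```python
def normalize_gti(search_counts):
    """Scale raw counts so that the peak of the selected period equals 100."""
    peak = max(search_counts)
    if peak == 0:
        # No searches at all in the period: every index value is 0.
        return [0.0 for _ in search_counts]
    return [count * 100.0 / peak for count in search_counts]


# Hypothetical raw search counts for four consecutive time points.
counts = [20, 50, 40, 80]

short_period = normalize_gti(counts[:3])  # period covering the first three points
long_period = normalize_gti(counts)       # period covering all four points

print(short_period)
print(long_period)
```

In this example the same raw count of 20 at the first time point maps to an index of 40 when the period ends at the third point and to 25 when the period includes the fourth point, because the peak of the period changes from 50 to 80.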
Google Trends provides the GTI at different frequencies depending on the selected period. Table 1 shows the GTI frequency for each selected period. First, this research retrieves the GTI from January 2012 to June 2022, which returns a monthly GTI relative to the whole period. Then, this research retrieves the GTI for every month in the period, which returns a daily GTI relative to the selected month. To obtain a daily GTI that is comparable across months in the period, this research applies the following formula:

$$GTI^{c}_{i,j} = GTI^{d}_{i,j} \times \frac{GTI^{m}_{i}}{100} \quad (2)$$

where $GTI^{c}_{i,j}$ is the daily GTI for month i and day j that is comparable across months in the selected period, $GTI^{d}_{i,j}$ is the daily GTI relative to month i for day j, and $GTI^{m}_{i}$ is the monthly GTI for month i in the selected period. For the weekly GTI, this research aggregates the daily GTI to a weekly GTI using the median in each week. The median is used so that the aggregation results are not affected by extreme values.
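The two steps described above, rescaling and weekly aggregation, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it assumes that the rescaling multiplies each within-month daily index by the monthly index of that month and divides by 100, which is one common way to chain the two indices and may differ in detail from the method in [26]. It also treats a week as a block of seven consecutive days. The function names and sample values are hypothetical.

```python
from statistics import median


def comparable_daily_gti(daily_by_month, monthly_gti):
    """Rescale within-month daily indices by the monthly index.

    daily_by_month: one list per month, daily GTI relative to that month.
    monthly_gti: monthly GTI relative to the whole period, one per month.
    Assumption: comparable value = daily * monthly / 100.
    """
    adjusted = []
    for daily, monthly in zip(daily_by_month, monthly_gti):
        adjusted.append([d * monthly / 100.0 for d in daily])
    return adjusted


def weekly_median(daily_values, days_per_week=7):
    """Aggregate a daily series to weekly values using the median.

    Assumption: weeks are consecutive blocks of `days_per_week` days;
    a shorter final block is still aggregated.
    """
    return [
        median(daily_values[start:start + days_per_week])
        for start in range(0, len(daily_values), days_per_week)
    ]


# Two hypothetical (shortened) months: the second month is the period peak.
daily = [[100, 50], [100, 80]]
monthly = [40, 100]
print(comparable_daily_gti(daily, monthly))

# One extreme day (100) does not move the weekly median.
print(weekly_median([1, 2, 3, 4, 5, 6, 100, 8, 9]))
```

The second call shows why the median is attractive here: the outlier of 100 in the first block leaves the weekly value at the middle observation, whereas a mean would be pulled upward.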

Unit Root Test
This research performs the Augmented Dickey-Fuller (ADF) and Phillips-Perron (PP) unit root tests to check the stationarity of the datasets [27]. The ADF test checks for the existence of a unit root in the data using the OLS estimator of an autoregressive model [28] as follows:

$$Y_t = \rho Y_{t-1} + \varepsilon_t \quad (3)$$

where $\varepsilon_t$ is the error term. The existence of a unit root indicates that the dataset is not stationary. From equation 3, $Y_t$ is considered a stationary time series if $|\rho| < 1$ and non-stationary if $|\rho| = 1$. Accordingly, a time series is judged stationary if the null hypothesis of a unit root is rejected in favor of the alternative hypothesis of the ADF test.
The PP test, proposed by Phillips and Perron, checks for the presence of a unit root in dependent and heterogeneously distributed data using a nonparametric approach [29]. The PP test is a modification of the ADF test that allows for a more general error distribution. A time series is considered stationary if it satisfies the alternative hypothesis of the PP test, which is $|\rho| < 1$.
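The idea behind the autoregressive check can be illustrated with a minimal Python sketch of the simplest Dickey-Fuller regression, $Y_t = \rho Y_{t-1} + \varepsilon_t$, estimated by OLS without a constant and without lagged differences. This is a simplification for illustration, not the augmented test or the PP test used in this research, and the function name `dickey_fuller_stat` and the simulated series are hypothetical. The statistic must be compared with Dickey-Fuller critical values rather than the usual t distribution, so the sketch reports the estimate and the statistic only.

```python
import math
import random


def dickey_fuller_stat(y):
    """OLS fit of y[t] = rho * y[t-1] + e[t]; returns (rho, statistic).

    The statistic is (rho - 1) / se(rho), which is strongly negative
    when the series reverts to its mean and close to zero for a unit root.
    """
    lagged = y[:-1]
    current = y[1:]
    sum_sq_lagged = sum(v * v for v in lagged)
    rho = sum(a * b for a, b in zip(current, lagged)) / sum_sq_lagged
    residuals = [a - rho * b for a, b in zip(current, lagged)]
    dof = len(current) - 1  # one estimated coefficient
    sigma2 = sum(r * r for r in residuals) / dof
    std_err = math.sqrt(sigma2 / sum_sq_lagged)
    return rho, (rho - 1.0) / std_err


# Simulated examples: a mean-reverting AR(1) series and a random walk.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(200)]

stationary = [0.0]
random_walk = [0.0]
for shock in noise:
    stationary.append(0.5 * stationary[-1] + shock)
    random_walk.append(random_walk[-1] + shock)

rho_s, stat_s = dickey_fuller_stat(stationary)
rho_rw, stat_rw = dickey_fuller_stat(random_walk)
print("stationary series:", rho_s, stat_s)
print("random walk:", rho_rw, stat_rw)
```

In practice a statistical library would be used for the full tests, since the augmented version adds lagged differences and deterministic terms and supplies the appropriate critical values.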

Cross Correlation Function (CCF)
The cross-correlation function (CCF) is used to evaluate the association between two time series [30]. The concept of cross-correlation is similar to cross-covariance in statistics. Cross-correlation measures the similarity of two signals, extracting features by comparing the less informative signal with the more informative one [31]. The cross-correlation function at lag k can be written as follows:

$$\rho_{XY}(k) = \frac{E\left[(X_t - \mu_X)(Y_{t+k} - \mu_Y)\right]}{\sigma_X \sigma_Y} \quad (4)$$

where $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X_t$ and $Y_t$, while $\mu_X$ and $\mu_Y$ are the means of $X_t$ and $Y_t$.
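A sample version of the cross-correlation function can be sketched in Python as follows. The function name `cross_correlation` and the toy series are hypothetical; the sketch assumes two series of equal length and uses the common sample estimator that divides the lagged cross-product sum by the full sample size.

```python
import math


def cross_correlation(x, y, lag):
    """Sample cross-correlation between x[t] and y[t + lag].

    Both series must have the same length. Means and standard deviations
    are computed over the full series, as in the usual sample CCF.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sd_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / n)
    sd_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    if lag >= 0:
        pairs = zip(x[:n - lag], y[lag:])
    else:
        pairs = zip(x[-lag:], y[:n + lag])
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in pairs) / n
    return covariance / (sd_x * sd_y)


# Toy series: y moves together with x in the same period.
x = [1, 3, 2, 5, 4, 6, 5, 8]
y = [2 * value + 1 for value in x]

by_lag = {lag: cross_correlation(x, y, lag) for lag in range(-3, 4)}
best_lag = max(by_lag, key=by_lag.get)
print(by_lag)
print("lag with the highest correlation:", best_lag)
```

A peak at lag 0, as in this toy example, is the pattern that identifies a coincident indicator; a peak at a nonzero lag would instead point to a leading or lagging relationship, with the direction depending on which series is shifted.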

Mixed Data Sampling (MIDAS) Regression
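This research performs an unrestricted MIDAS regression, or U-MIDAS. U-MIDAS is a type of MIDAS regression that does not impose a distributed lag polynomial function; in other words, it places no constraint on the parameters [32], [33]. U-MIDAS parameters are estimated using Ordinary Least Squares. The U-MIDAS regression can be written as follows:

$$\alpha(B)\, y_t = \beta(L)^{\prime} x_{t,0} + \varepsilon_t \quad (5)$$

where $\alpha(B)$ is a lag polynomial in the low-frequency lag operator $B$, $\beta(L)$ is a lag polynomial in the high-frequency lag operator $L$, $x_{t,0}$ is the high-frequency explanatory variable aligned with low-frequency period t, and $\varepsilon_t$ is the error term. The MIDAS estimation in this research is performed using the R package "midasr".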
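The mechanics of an unrestricted MIDAS regression can be sketched in plain Python: the high-frequency series is aligned so that each low-frequency period gets one row of high-frequency lags, and every lag then receives its own OLS coefficient. This is a minimal sketch under stated assumptions, not the midasr implementation: the function names `align_high_frequency` and `ols_with_intercept` are hypothetical, the toy data are invented, each low-frequency period is assumed to contain exactly `m` high-frequency observations, and all lags are assumed to fall inside the same period.

```python
def align_high_frequency(x, m, lags):
    """Build one row of high-frequency lags per low-frequency period.

    x: high-frequency series with m observations per low-frequency period.
    lags: offsets counted back from the last high-frequency observation
          of each period (0 = last observation of the period).
    """
    if min(lags) < 0 or max(lags) >= m:
        raise ValueError("lags must lie between 0 and m - 1 in this sketch")
    rows = []
    for t in range(len(x) // m):
        last = (t + 1) * m - 1
        rows.append([x[last - lag] for lag in lags])
    return rows


def ols_with_intercept(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])
    # Augmented matrix [X'X | X'y].
    aug = []
    for i in range(k):
        row = [sum(r[i] * r[j] for r in design) for j in range(k)]
        row.append(sum(r[i] * target for r, target in zip(design, y)))
        aug.append(row)
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Toy data: 8 low-frequency periods with m = 3 high-frequency observations each.
x = [0, 10, 1, 6, 3, 3, 6, 1, 10, 0, 4, 0,
     10, 1, 6, 3, 3, 6, 1, 10, 0, 4, 0, 10]
rows = align_high_frequency(x, m=3, lags=[0, 1, 2])

# The response is built from known coefficients so the fit can be checked.
y = [1.0 + 0.5 * r[0] + 0.2 * r[1] - 0.1 * r[2] for r in rows]
coefficients = ols_with_intercept(rows, y)
print(coefficients)  # intercept followed by one coefficient per lag
```

Because no weighting function ties the lag coefficients together, the number of estimated parameters equals the number of lags plus the intercept, which is the defining feature of the unrestricted variant.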

Model Evaluation
This research uses the residual standard error (RSE) to evaluate the MIDAS model. RSE is a goodness-of-fit measure used to appraise how well a model fits a dataset [34]. The RSE can be computed as follows:

$$RSE = \sqrt{\frac{\sum (Y - Y_{est})^{2}}{df}} \quad (6)$$

where $Y - Y_{est}$ are the residuals of the model and $df$ is the degrees of freedom.
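The RSE computation can be sketched in Python as follows. The function name `residual_standard_error` and the sample values are hypothetical, and the sketch assumes the usual convention that the degrees of freedom equal the number of observations minus the number of estimated coefficients.

```python
import math


def residual_standard_error(actual, fitted, n_params):
    """Square root of the residual sum of squares divided by df.

    Assumption: df = number of observations - number of estimated
    coefficients (including the intercept).
    """
    dof = len(actual) - n_params
    if dof <= 0:
        raise ValueError("not enough observations for the number of parameters")
    residual_ss = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    return math.sqrt(residual_ss / dof)


# Hypothetical actual and fitted values from a one-parameter model.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
fitted = [1.5, 1.5, 3.5, 3.5, 5.0]
print(residual_standard_error(actual, fitted, n_params=1))
```

Under this convention, adding lags to an unrestricted model lowers the residual sum of squares but also lowers df, so the RSE penalizes additional coefficients to some extent.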

Results and Discussions
Overview of longitudinal patterns of the GDP sector H, the GDP sector I, and Google Trends Index
Figure 2 shows a similar situation for the GDP sector I as for the GDP sector H. The daily and weekly search requests made to the Google search engine regarding hotels and accommodations appear to describe the activities in the accommodation sector effectively.
Based on Figure 1 and Figure 2, this research expects that the daily and weekly GTI for transportation and logistics and for hotels and accommodations can be useful to nowcast the GDP Sector H and the GDP Sector I, respectively.
Figure 2. The daily and weekly GTI time series for the "Hotels and Accommodations" category from 2012 to 2022 show an identical pattern to the GDP Sector I for both levels and growth.
To confirm that all the datasets are suitable for MIDAS regression, they should meet the stationarity assumption [35]. Table 2 provides the results of the unit root tests for all datasets used in this research. The unit root tests show that all datasets are stationary at the 0.05 significance level, which is a satisfactory result. Accordingly, all datasets are appropriate for the models. This research then computes the cross-correlation function to determine whether the growth rates of the GTI are a lagging, coincident, or leading indicator of the growth rates of the GDP. Because the two datasets have different frequencies, the growth rates of the GTI are first aggregated to a quarterly index so that the cross-correlation function can be applied. Figure 3 provides the cross-correlation function of the GDP and the GTI.
Figure 3. The cross-correlation of the growth rates of the quarterly GDP Sector H and Sector I with the growth rates of the quarterly GTI for "transportation and logistics" and "hotels and accommodations", respectively. The growth rates of the quarterly GTI for both sectors have the highest correlation with the growth rates of the quarterly GDP at lag 0, indicating that the GTI coincides with the GDP.
From Figure 3, this research finds that the growth rates of the quarterly GDP from 2012 to 2022 coincide with the growth rates of the quarterly GTI for the same period. This means that the growth rates of the daily and weekly GTI in a quarter will have the highest correlation with the growth rate of the GDP in that quarter if the quarterly GDP growth is released on the first day of the next quarter. This is in accordance with Varian [17], who claims that Google Trends may be more useful for predicting the present than the future.

MIDAS Regression to nowcast GDP
Based on the cross-correlation result (see Figure 3), this research focuses on the daily and weekly GTI growth within a quarter to determine how many lags effectively describe the GDP growth rate of the corresponding quarter. Therefore, this study performs a U-MIDAS regression for all possible combinations of daily and weekly GTI lag counts to check which combination gives the best result. The model with the lowest RSE score is considered the best model. Table 3 shows the result of the model selection. Considering the change in the GDP trends caused by the Covid-19 outbreak (see Figure 1 and Figure 2), this research also runs the models with Covid dummy variables to account for the shift in the trends. This research finds that the daily GTI growth rate for "transportation and logistics" effectively predicts the GDP growth rate for Sector H when it is taken from the last month of the corresponding quarter. For instance, in the MIDAS model without Covid dummy variables, the daily GTI for "transportation and logistics" from day 7 to day 36 prior to the last day of a quarter gives the best prediction of the GDP growth for Sector H in that quarter. When the Covid dummy variable is included, the MIDAS model using the daily GTI from day 6 to day 35 prior to the last day of a quarter gives the best prediction of the GDP growth for that quarter.
On the other hand, according to the RSE score, the best MIDAS model for predicting the GDP growth for Sector I uses the daily GTI growth in the first month of the corresponding quarter. For instance, in the MIDAS model without Covid dummy variables, the daily GTI for "hotels and accommodations" from day 57 to day 85 prior to the last day of a quarter gives the best prediction of the GDP growth for Sector I in that quarter. When the Covid dummy variable is included, the MIDAS model using the daily GTI from day 57 to day 86 prior to the last day of a quarter gives the best prediction of the GDP growth for that quarter.
For the MIDAS model with weekly GTI growth rates, this research finds that most weeks of a quarter effectively predict the GDP growth rate for that quarter. The weekly GTI growth rates for "transportation and logistics" from week 11 or 12 to the last week of a quarter effectively predict the GDP growth rates for Sector H, while the weekly GTI growth rates for "hotels and accommodations" from week 11 to week 3 prior to the last day of a quarter effectively predict the GDP growth rates for Sector I (see Table 3).

Discussions
This research finds that the MIDAS model using the daily Google Trends Index gives a better prediction of the quarterly GDP growth. This is evident from the RSE scores of the daily MIDAS model, which are lower than those of the weekly MIDAS model. It is supported by the plots of the fitted and actual values (see Figure 4 and Figure 5). Apparently, the statement by Foroni [32] that the U-MIDAS model is more useful when applied to datasets with small frequency differences does not hold under the conditions of this research. The provisional presumption is that this research gains an advantage from using an explanatory dataset whose pattern is identical to that of the response dataset (see Figure 1 and Figure 2). However, further exploration, by comparing the results with the restricted MIDAS model and performing other model evaluations such as Monte Carlo simulations and AIC, is necessary to obtain a well-founded explanation.

Conclusion
This research has demonstrated the possibility of using daily and weekly alternative data to nowcast GDP. The nowcasting is performed using the U-MIDAS model, with the daily and weekly Google Trends Index (GTI) as the explanatory variables. The models provide satisfactory results based on the RSE score. The evidence in this research shows that the U-MIDAS model using the daily GTI predicts current GDP better than the U-MIDAS model using the weekly GTI.

Figure 1 provides a comparison of the daily and weekly GTI time series for the category "Transportation and Logistics" with the GDP sector H. The daily and weekly GTI time series show similarities to the pattern of the GDP time series.

Figure 1. The daily and weekly GTI time series for the "Transportation and Logistics" category from 2012 to 2022 show an identical pattern to the GDP Sector H for both levels and growth.

Figure 4. Plots of the GDP growth for Sector H and Sector I and the fitted values of the MIDAS models using the daily GTI, with and without the Covid dummy variable. All models fit the actual values well.

Figure 5. Plots of the GDP growth for Sector H and Sector I and the fitted values of the MIDAS models using the weekly GTI, with and without the Covid dummy variable. None of the models predicts the GDP growth rates very well.

Figure 4 and Figure 5 present a visual comparison of the fitted values of the models with the actual values. The Covid dummy variables show an insignificant impact on the prediction for all models (see Table 3).

Table 1. Period and Frequency of GTI [26]
Because this research uses growth rates of quarterly GDP data from 2012 to 2022, the GTI used in this research should cover the same period. According to Table 1, Google Trends can only provide a monthly GTI for a period of this length. Accordingly, this research applies the method introduced in [26] to disaggregate the monthly GTI over a long period into daily and weekly GTI.

Table 2. Result of Stationarity Test

Table 3. Best Model Based on RSE Score