Risk Analysis on the Growth Rate of Covid-19 Cases in Indonesia Using Statistical Distribution Model

The coronavirus (Covid-19) outbreak has been declared a pandemic, and many countries were not ready to deal with such an eventuality. The very rapid rate of transmission is one reason mitigation measures are needed, since the healthcare system has limited capacity. Indonesia is one of the countries that has lost medical resources to the pandemic. To provide more comprehensive information about the characteristics of Covid-19 in Indonesia, a risk analysis of the occurrence of new cases is needed. This study provides an overview of the risk of new Covid-19 cases occurring on a daily basis by applying a distribution fitting technique to form a statistical distribution model. Among the alternative models considered, the Geometric distribution is the most suitable for describing the growth of new cases in Indonesia.
Received February 12, 2021; Revised March 25, 2021; Accepted April 15, 2021


Introduction
In mid-March 2020, the World Health Organization (WHO) officially declared the coronavirus disease (Covid-19) a pandemic [2]. The disease passed through the outbreak and epidemic phases, as H1N1 (swine flu) did in 2009. Starting from its outbreak in the city of Wuhan, Hubei Province, China in December 2019, the virus spread to many regions and 114 countries across the globe, and many people were shocked to find themselves unprepared to mitigate the impacts of the pandemic. By September 20, 2020, the total number of recorded cases worldwide had reached 30.9 million, with a death rate of 3% [9]. Based on statistical data, although the cure rate for this disease reaches 72%, the high transmission rate leads to a surge of new confirmed cases that overwhelms the limited capacity of medical resources.
The skyrocketing number of confirmed Covid-19 cases, which exceeds the capacity of medical resources, leads to longer case handling. As of August 28, 2020, the occupancy rates of isolation rooms and ICU beds in 67 referral hospitals in Jakarta had reached 69 and 77 percent, respectively [7]. Moreover, medical resources are limited not only in terms of medical equipment and isolation or inpatient rooms, but also in terms of medical personnel and doctors. About 92 health workers died for every 100,000 Covid-19 cases identified in Indonesia, according to a Jakarta Globe calculation based on data as of 12 September 2020. This is the fourth-highest fatality rate for health workers in the world, after Mexico, Egypt, and the United Kingdom [4].

Various efforts have been made by the Indonesian Government through the National Disaster Mitigation Agency (BNPB) to tackle the Covid-19 problem, one of which is importing medical equipment, in-vitro diagnostic medical devices, and household health supplies for mitigation [5]. From a risk analysis perspective, the Covid-19 pandemic is a risk with a large impact, so it requires proper mitigation to minimize losses.
The above description underlines that the root of the problem is the massive transmission rate and the rapidly escalating number of new cases in the community. Therefore, minimizing the occurrence of new cases can be considered a preventive solution that reduces the impact of potential losses. To describe the characteristics of the emergence of Covid-19 cases in Indonesia, a statistical distribution model can be used, since almost all events or phenomena can be converted into a model. Although no model can perfectly replicate actual events, a case or phenomenon can be better understood by simplifying the problem into a statistical model. Through a statistical distribution model, this study provides a risk analysis, based on the characteristics of the growth of Covid-19 cases in Indonesia, that the public can take into consideration in their activities during the pandemic.

Random Variable
A random variable, say X, is a function defined over a sample space, S, that associates a real number, X(e)=x, with each possible outcome e in S [1].

1) Discrete Random Variable
If the set of all possible values of a random variable, X, is a countable set, $x_1, x_2, \ldots, x_n$, or $x_1, x_2, \ldots$, then X is called a discrete random variable. The function

$$f(x) = P[X = x], \quad x = x_1, x_2, \ldots \qquad (1)$$

that assigns the probability to each possible value x will be called the discrete Probability Density Function (PDF). The Cumulative Distribution Function (CDF) of a random variable X is defined for any real x by

$$F(x) = P[X \le x] \qquad (2)$$

2) Continuous Random Variable
A random variable X is called a continuous random variable if there is a function f(x), called the PDF of X, such that the CDF can be represented as

$$F(x) = \int_{-\infty}^{x} f(t)\,dt \qquad (3)$$

Cumulative Distribution Function (CDF)
The CDF of a random variable X is $F(x) := P[X \le x]$. When an independent and identically distributed (iid) sample $X_1, \ldots, X_n$ is given, the CDF can be estimated by the Empirical Distribution Function (ECDF)

$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} 1_{\{X_i \le x\}} \qquad (4)$$

where $1_{\{X_i \le x\}}$ equals 1 if $X_i \le x$ and 0 otherwise.
Continuous random variables are characterized either by the CDF F or by the PDF $f = F'$, which represents the infinitesimal relative probability of X per unit of length. We write $X \sim F$ (or $X \sim f$) to denote that X has CDF F (or PDF f). If two random variables, X and Y, have the same distribution, we write $X \stackrel{d}{=} Y$.
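The empirical distribution function described above can be sketched in a few lines. This is a minimal illustration, not the study's implementation; the function name `ecdf` and the toy sample are assumptions introduced here.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF F_n of the sample as a callable."""
    ordered = sorted(sample)
    n = len(ordered)

    def f_n(x):
        # Proportion of observations that are less than or equal to x.
        return bisect_right(ordered, x) / n

    return f_n


# Hypothetical toy sample, for illustration only (not the study's data).
f_n = ecdf([3, 1, 4, 1, 5])
print(f_n(0), f_n(1), f_n(4.5), f_n(5))
```

Sorting once and using a binary search keeps each evaluation of F_n at logarithmic cost, which matters when the function is evaluated at many points.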

Fitting Distribution
The purpose of fitting a distribution is to predict the probability or forecast the frequency of occurrence of events in a certain interval. By ordering the goodness of fit of various distributions, a decision can be made about which distribution is acceptable and matches the data used.
Assume that an iid sample $X_1, \ldots, X_n$ from the distribution F is given. The tests considered here are for the null hypothesis $H_0: F = F_0$ against the most general alternative $H_1: F \neq F_0$, where $F_0$ is a pre-specified distribution model that does not depend on the data. If some parameters of $F_0$ are estimated from the sample, the presented tests will not respect the significance level (α) for which they are constructed and, as a consequence, they will be highly conservative. Two well-known goodness-of-fit tests are described next [6].
1) Kolmogorov-Smirnov Test
a. Test purpose. Given $X_1, \ldots, X_n \sim F$, test $H_0: F = F_0$ vs. $H_1: F \neq F_0$ consistently against all alternatives to $F_0$.
b. Statistic definition. The test statistic uses the supremum distance between $F_n$ and $F_0$:

$$D_n := \sqrt{n} \, \sup_{x \in \mathbb{R}} \left| F_n(x) - F_0(x) \right| \qquad (5)$$

If $H_0: F = F_0$ holds, $D_n$ tends to be small. Conversely, when $F \neq F_0$, larger values of $D_n$ are expected, and $H_0$ is rejected when $D_n$ is large.
c. Statistic computation. The computation of $D_n$ can be efficiently achieved by realizing that the maximum difference between $F_n$ and $F_0$ happens at $x = X_i$, for a certain $X_i$. From here, sorting the sample and applying the probability transformation $F_0$ gives

$$D_n = \max(D_n^+, D_n^-), \quad D_n^+ := \sqrt{n} \max_{1 \le i \le n} \left\{ \frac{i}{n} - U_{(i)} \right\}, \quad D_n^- := \sqrt{n} \max_{1 \le i \le n} \left\{ U_{(i)} - \frac{i-1}{n} \right\} \qquad (6)$$

where $U_{(i)} := F_0(X_{(i)})$ and $X_{(i)}$ is the i-th value of the sorted sample.
d. Distribution under $H_0$. If $H_0$ holds and $F_0$ is continuous, $D_n$ converges in distribution to the Kolmogorov distribution, whose CDF is

$$K(x) = 1 - 2 \sum_{m=1}^{\infty} (-1)^{m-1} e^{-2 m^2 x^2}, \quad x > 0 \qquad (7)$$

e. Highlights and caveats. The Kolmogorov-Smirnov test is a distribution-free test because its distribution under $H_0$ does not depend on $F_0$, but only if $F_0$ is continuous and the sample $X_1, \ldots, X_n$ is also continuous, i.e., the sample has no ties. If these assumptions are met, the iid sample $X_1, \ldots, X_n \sim F_0$ generates the iid sample $U_1, \ldots, U_n \sim U(0, 1)$. As a consequence, the distribution of (6) does not depend on $F_0$. If $F_0$ is not continuous or there are ties in the sample, the K function is not the true asymptotic distribution. One possibility when there are ties in the sample is to perturb the sample slightly in order to remove them.
2) Anderson-Darling Test
a. Test purpose. Given $X_1, \ldots, X_n \sim F$, test $H_0: F = F_0$ vs. $H_1: F \neq F_0$ consistently against all alternatives to $F_0$.
b. Statistic definition. The test statistic uses a quadratic distance between $F_n$ and $F_0$ weighted by $w(x) = (F_0(x)(1 - F_0(x)))^{-1}$:

$$A_n^2 := n \int \frac{(F_n(x) - F_0(x))^2}{F_0(x)(1 - F_0(x))} \, dF_0(x) \qquad (8)$$

If $H_0$ holds, $A_n^2$ tends to be small, so rejection happens for large values of $A_n^2$. Note that, compared with the Cramér-von Mises statistic $W_n^2$, which uses the same quadratic distance without the weight, $A_n^2$ places more weight (because of the denominator) on the deviations that happen on the tails, that is, when $F_0(x) \approx 0$ or $F_0(x) \approx 1$.
c. Statistic computation. The computation of $A_n^2$ can be significantly simplified as:

$$A_n^2 = -n - \frac{1}{n} \sum_{i=1}^{n} \left\{ (2i - 1) \log U_{(i)} + (2n + 1 - 2i) \log\left(1 - U_{(i)}\right) \right\} \qquad (9)$$

d. Distribution under $H_0$. If $H_0$ holds and $F_0$ is continuous, $A_n^2$ converges in distribution to

$$\sum_{j=1}^{\infty} \frac{Y_j}{j(j+1)}, \quad Y_j \sim \chi^2_1 \ \text{iid} \qquad (10)$$

e. Highlights and caveats. As with the previous test, the Anderson-Darling test is distribution-free if $F_0$ is continuous and there are no ties in the sample. Otherwise, the null asymptotic distribution is different from the one of (10). There is also empirical evidence that the Anderson-Darling test is more powerful than the Kolmogorov-Smirnov test for a broad class of alternative hypotheses. In addition, due to its construction, the Anderson-Darling test can better detect situations in which $F_0$ and F differ on the tails.
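The computational forms of the two statistics can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the Uniform(0, 1) null model, and the toy sample are hypothetical, and the Kolmogorov-Smirnov statistic is taken with the sqrt(n) scaling under which the Kolmogorov distribution K is its asymptotic law. Only the statistics are computed; p-values are not.

```python
import math


def ks_statistic(sample, cdf0):
    """Kolmogorov-Smirnov statistic: sqrt(n) * sup_x |F_n(x) - F_0(x)|."""
    # F_0 is non-decreasing, so sorting the transformed values gives U_(i).
    u = sorted(cdf0(x) for x in sample)
    n = len(u)
    d_plus = max((i + 1) / n - u[i] for i in range(n))
    d_minus = max(u[i] - i / n for i in range(n))
    return math.sqrt(n) * max(d_plus, d_minus)


def ad_statistic(sample, cdf0):
    """Anderson-Darling statistic A_n^2 in its simplified sum form."""
    # Requires 0 < F_0(x) < 1 for every observation, otherwise log fails.
    u = sorted(cdf0(x) for x in sample)
    n = len(u)
    total = sum(
        (2 * i - 1) * math.log(u[i - 1])
        + (2 * n + 1 - 2 * i) * math.log(1.0 - u[i - 1])
        for i in range(1, n + 1)
    )
    return -n - total / n


def uniform_cdf(x):
    """Hypothetical null model F_0: Uniform(0, 1)."""
    return min(max(x, 0.0), 1.0)


# Hypothetical toy sample without ties, for illustration only.
toy = [0.7, 0.1, 0.4]
print(ks_statistic(toy, uniform_cdf), ad_statistic(toy, uniform_cdf))
```

Both functions share the probability transformation step, which is why the null distribution of each statistic does not depend on a continuous F_0.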

Results
The number of Covid-19 cases in Indonesia according to the Indonesia Covid-19 Task Force (2020) [3], from the first confirmed case until 10 September 2020, is shown in Figure 1 below. The graph shows the number of daily new cases over the last 6 months on the X-axis and the occurrence probability of each daily new-case count on the Y-axis. Each probability value was obtained by counting how many days recorded a given number of daily new cases and dividing that count by the total number of data records. The number of new cases may vary from one day to another. Therefore, the number of daily new cases can be treated as an event, and Figure 1 visualizes the event (X-axis) together with the probability of the event (Y-axis).
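The probability calculation described above (relative frequency of each distinct daily count) can be sketched as follows. The function name `empirical_pmf` and the toy records are assumptions made for illustration; they are not the Task Force data.

```python
from collections import Counter


def empirical_pmf(daily_new_cases):
    """Map each distinct daily count to its relative frequency."""
    counts = Counter(daily_new_cases)
    total = len(daily_new_cases)
    return {value: counts[value] / total for value in sorted(counts)}


# Hypothetical toy records, for illustration only.
records = [0, 2, 2, 5, 0, 2]
pmf = empirical_pmf(records)
for value, prob in pmf.items():
    print(value, round(prob, 4))
```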

Fig.1. Histogram of daily new Covid-19 cases
The histogram trend shown in Fig.1 suggests that the data follow a recognizable distribution pattern. To determine whether the data follow a statistical distribution, a distribution fitting process is required. Since the data are discrete, many discrete distributions could be considered as the null hypothesis in the distribution fitting process. To limit the scope of the fitting process, three statistical distribution models were chosen as initial hypotheses in the goodness-of-fit test: Geometric, Discrete Uniform, and Negative Binomial. The comparison of goodness-of-fit test results for these distributions is shown in Table 1 below. According to the goodness-of-fit test results using the Kolmogorov-Smirnov and Anderson-Darling methods at several α values, the null hypothesis of the Geometric distribution failed to be rejected in almost all cases. Therefore, the Geometric distribution was chosen as the best-fitting statistical model to represent the Covid-19 cases in Indonesia.
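The source does not state how the Geometric parameter was estimated. As one possible approach, the sketch below uses the maximum likelihood estimate for the "failures before the first success" form, f(x) = p(1-p)^x, which is 1 / (1 + sample mean). The function names and the toy sample are assumptions for illustration only.

```python
import math


def fit_geometric_p(sample):
    """Maximum likelihood estimate of p for f(x) = p * (1 - p)**x, x = 0, 1, ..."""
    if not sample:
        raise ValueError("sample must not be empty")
    mean = sum(sample) / len(sample)
    return 1.0 / (1.0 + mean)


def log_likelihood(sample, p):
    """Log-likelihood of the sample under the Geometric model with parameter p."""
    return sum(math.log(p) + x * math.log(1.0 - p) for x in sample)


# Hypothetical toy counts, for illustration only (not the study's data).
toy_sample = [0, 3, 1, 4, 2]
p_hat = fit_geometric_p(toy_sample)
print(p_hat, log_likelihood(toy_sample, p_hat))
```

A fitted parameter obtained this way would then serve as the null model F_0 in the goodness-of-fit tests, with the caveat noted earlier that estimating parameters from the same sample affects the significance level.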

Discussion
The Geometric distribution has two definitions: the number of trials until the first success in a sequence of independent Bernoulli trials, and the number of failures before the first success in a sequence of independent Bernoulli trials. A Bernoulli trial is an experiment that has two results, usually referred to as a "failure" or a "success." The success occurs with probability p and the failure occurs with probability 1-p. "Success" means that a specific event occurred, whereas "failure" indicates that the event did not occur. The event itself can be negative (death, recurrence of cancer, etc.) [8]. The probability density function (PDF) of the Geometric distribution is given by the following formula:

$$f(x) = p(1-p)^{x}, \quad x = 0, 1, 2, \ldots$$

Meanwhile, the cumulative distribution function (CDF) of the Geometric distribution is expressed as the following formula:

$$F(x) = 1 - (1-p)^{x+1}$$

The PDF represents the probability of getting x failures before the first success, while the CDF represents the probability of getting at most x failures before the first success. To analyze the risk of the growth of Covid-19 cases, the Geometric CDF was plotted to give a projection of the probability of an event occurring on or before the n-th trial. In this case, the cumulative probability value can be interpreted as the potential risk of a Covid-19 occurrence, and the number of trials (n) quantifies the number of activities that are potentially exposed to virus infection. The graph (with additional 5% and 95% potential risk indicators) is shown in Fig.2. The graph shows that the fewer the activities, the less likely a case is to occur. Conversely, the more activities that are potentially exposed to virus transmission or infection, the more likely a case is to occur. It also shows that doing 49 trials/activities can lead to a 5% risk of infection, while a 95% risk potential is reached by doing 3099 potentially exposed activities.
The result of this analysis can be considered a risk analysis of Covid-19 occurrence during the pandemic only if the parameter (p) does not change significantly. If the parameter (p) increases, as indicated by an increasing number of new cases, the Geometric distribution curve becomes narrower, so that both the 5% and the 95% risk potential are reached with a smaller number of trials.
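The relationship between the number of trials and the cumulative risk can be explored with a short sketch. It assumes the parameter value p = 0.0009354 reported in the conclusion and treats the risk after n trials as 1 - (1 - p)^n; the function names are hypothetical. The thresholds it prints depend on the CDF convention and on rounding, so they are not guaranteed to coincide with the values read from Fig.2.

```python
import math

P_HAT = 0.0009354  # Geometric parameter reported in the study's conclusion


def geometric_cdf(x, p):
    """P[X <= x] when X counts failures before the first success."""
    return 1.0 - (1.0 - p) ** (x + 1)


def risk_after_trials(n, p):
    """Probability that the event has occurred on or before the n-th trial."""
    return 1.0 - (1.0 - p) ** n


def trials_to_reach(risk, p):
    """Smallest number of trials n with risk_after_trials(n, p) >= risk."""
    n = max(1, math.ceil(math.log(1.0 - risk) / math.log(1.0 - p)))
    # Guard against floating-point rounding at the boundary.
    while risk_after_trials(n, p) < risk:
        n += 1
    while n > 1 and risk_after_trials(n - 1, p) >= risk:
        n -= 1
    return n


for target in (0.05, 0.95):
    n = trials_to_reach(target, P_HAT)
    print(target, n, round(risk_after_trials(n, P_HAT), 4))
```

The two conventions are linked by risk_after_trials(n, p) = geometric_cdf(n - 1, p), since n trials with the first success on the last one correspond to n - 1 failures.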
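The dependence of the risk thresholds on the parameter can be made explicit with a short derivation. It is a sketch that assumes the risk after n trials is 1 - (1 - p)^n; the symbol r for the target risk level is introduced here for illustration.

```latex
\[
1 - (1-p)^{n} \ge r
\iff (1-p)^{n} \le 1 - r
\iff n \ge \frac{\ln(1-r)}{\ln(1-p)}, \qquad 0 < p < 1,\ 0 < r < 1 .
\]
% The inequality flips in the last step because \ln(1-p) < 0.
\[
\text{For small } p:\quad \ln(1-p) \approx -p
\quad\Longrightarrow\quad
n \gtrsim \frac{-\ln(1-r)}{p} .
\]
```

Since the threshold is roughly inversely proportional to p, a larger p lowers the number of trials needed to reach any fixed risk level, which is consistent with the narrowing of the curve described above.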

Conclusion
As the Covid-19 pandemic evolves around the world, there has been an alarming surge of new confirmed cases, especially in Indonesia. The rapid spread is mainly attributed to the highly contagious nature of the Covid-19 virus, which is easily transmitted from one person to another. This extraordinary situation requires an understanding of the risk and the ability to mitigate it. This study describes the potential risk of Covid-19 transmission in Indonesia from the initial cases until early September 2020, in order to inform related parties of proper ways to mitigate the impact of the pandemic. The results of the distribution fitting process revealed that the characteristics of the Covid-19 pandemic in Indonesia follow the Geometric distribution with parameter (p) of 0.0009354. An event that follows the Geometric distribution has a probability of occurrence that depends on the number of trials carried out. Thus, in the case of Covid-19, the number of trials indicates the amount of activity that potentially leads to the outbreak of new cases. These potentially high-risk activities can be interpreted as activities that are carried out alone or those that involve social interactions. Based on the concept of the Geometric distribution (which represents Covid-19 cases in Indonesia), the risk mitigation measures to minimize new Covid-19 occurrences are: (1) reducing the probability (p) of occurrence by implementing strict health protocols (e.g., washing hands, wearing face masks, and maintaining body immunity); and (2) reducing the number of trials (n) by limiting each individual's potentially high-risk activities that may lead to infection (e.g., avoiding crowds and physical contact, and avoiding touching the face).