SARIMA-LSTM COMBINATION FOR COVID-19 CASE MODELING

ABSTRACT: Combining the SARIMA method with LSTM is a promising approach for numerical data recorded over time, because the combined method can handle both the linear and the non-linear components of a dataset. Previous studies indicate that SARIMA performs well on linear datasets, while LSTM excels on non-linear datasets, and both have shown good accuracy compared with several other methods. This study combines the two in stages: the SARIMA method is first fitted to the dataset (the linear component), and the residuals (the non-linear component) are then analysed using the LSTM method. The accuracy of the combined method is evaluated and compared with that of SARIMA and LSTM applied separately. The dataset used for the trial is COVID-19 patient data from the United States. The results showed that the combined SARIMA-LSTM method is better than either SARIMA or LSTM alone, with an RMSE of 0.33905765 and an MAE of 0.29077017.


INTRODUCTION
The term SARIMA designates the seasonal autoregressive integrated moving average model, which is one of the time series methods. Time series methods are frequently used to handle problems involving results recorded over time. The time-series methodology is used in a variety of domains, including economics and financial turnaround results at hospitals [1][2][3].

https://doi.org/10.31436/iiumej.v23i2.2134
Time series methods have been applied to the modelling of potential bioelectric plant data, for example using autoregressive (AR), moving average (MA), and ARIMA models. However, some of these models have not produced the best values: the mean square error (MSE) and the mean absolute error (MAE) remain high, and the average prediction accuracy is still only about 75% [4][5][6]. Therefore, this study aims to increase accuracy by using another time series model, SARIMA, which is combined with the LSTM method.
Combining SARIMA with other methods has been shown to give better accuracy. One example is the combination of SARIMA with SVM to predict the production value of the machinery industry in Taiwan [7]; the results showed that the SARIMA-SVM hybrid is more accurate than either method alone. Other studies used hybrid ARIMA with ANN to forecast the pollution index in cities throughout Southeast Asia, and further research used the same approach to predict tourist arrivals at Minangkabau International Airport [8]; again, the hybrid methods were more accurate. Subsequent research compared the SARIMA and ANN methods for predicting power consumption by electricity users in Turkey [9]. After 12 weeks, the ANN method had a MAPE of 1.8%, better than the 2.6% MAPE of SARIMA. However, in certain conditions, such as the period after a holiday, the result is the opposite. Another study combined SARIMA with SVM and then applied clustering in the analysis [10] to predict passenger numbers at stations in northern Iran; the integrated approach outperformed the individual methods. Additional research combined SVR and SARIMA models for tourist forecasting [11], with the best model determined using the PROMETHEE II decision support system; the combined method was again better.
Thus, this study combined the SARIMA method with LSTM. The SARIMA model predicts a person's position for linear datasets better than the deep learning method [12], and it has been tested with a high accuracy of about 80% [13]. Research applying the Long Short-Term Memory (LSTM) recurrent neural network to workload forecasting models for cloud datacentres has also produced empirical results: the proposed method achieves high prediction accuracy by reducing the mean squared error by up to $3.17 \times 10^{-3}$ [14]. Therefore, we use both methods, because together they can handle linear and non-linear datasets. In addition, this study compares the SARIMA-LSTM combination with each method applied separately.

METHOD

ARIMA Model
ARIMA is a term derived from its parts: autoregressive (AR), integrated (I), and moving average (MA). In general, ARIMA models are classified into two types, namely non-seasonal ARIMA and seasonal ARIMA models [15][16][17]. The non-seasonal model is written ARIMA(p,d,q), where p is the order of the AR part, d is the degree of integration (I), and q is the order of the MA part. In general, the ARIMA(p,d,q) model can be written as follows:

$$(1 - \phi_1 B - \dots - \phi_p B^p)(1 - B)^d Y_t = c + (1 + \theta_1 B + \dots + \theta_q B^q) e_t \quad (1)$$

There are three main components in the model. The first is AR(p):

$$(1 - \phi_1 B - \dots - \phi_p B^p) Y_t \quad (2)$$

The second is differencing through I(d):

$$(1 - B)^d Y_t \quad (3)$$

The third is MA(q):

$$(1 + \theta_1 B + \dots + \theta_q B^q) e_t \quad (4)$$

Here $B$ is the backshift operator, $e_t$ is the error term, and $c$ is a constant value.
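The differencing component I(d) can be made concrete with a few lines of code. The following is a minimal sketch, assuming the usual definition of differencing as the series minus its previous value, applied d times; the function name `difference` and the toy series are illustrative and are not taken from the study.

```python
def difference(series, d=1):
    """Difference the series d times, i.e. apply the (1 - B) operator d times."""
    out = list(series)
    for _ in range(d):
        # (1 - B) Y_t = Y_t - Y_{t-1}; each pass shortens the series by one value.
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out


# Hypothetical toy series with a quadratic trend (t squared), not the study's data.
y = [t * t for t in range(6)]   # [0, 1, 4, 9, 16, 25]
first = difference(y, d=1)      # [1, 3, 5, 7, 9]: a linear trend remains
second = difference(y, d=2)     # [2, 2, 2, 2]: the trend is removed
print(first, second)
```

One round of differencing removes a linear trend and two rounds remove a quadratic one, which is why d is usually kept small in practice.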

Seasonal ARIMA Model
The seasonal ARIMA (SARIMA) model describes a series whose pattern repeats itself at regular intervals. For stationary datasets, seasonality can be detected from the ACF plot. If the ACF plot shows a seasonal pattern, the series requires a different treatment [18][19][20]. In general, the seasonal ARIMA model is written as shown in eqn. 5.
$$\text{ARIMA}(p,d,q)(P,D,Q)_s \quad (5)$$

where (p,d,q) are the orders of the non-seasonal part of the model, (P,D,Q) are the orders of the seasonal part, and s is the number of periods in the seasonal model.
For example, an ARIMA(1,0,0) model follows eqns. 6 and 7:

$$(1 - \phi_1 B) Y_t = c + e_t \quad (6)$$

where $B Y_t = Y_{t-1}$, so

$$Y_t = c + \phi_1 Y_{t-1} + e_t \quad (7)$$

To detect seasonality in a dataset, several graphical techniques are available, including sequence plots, seasonal subseries plots, multiple box plots, and autocorrelation plots. This study uses autocorrelation plots to detect seasonality. When the autocorrelation plot shows seasonality, one solution is to apply the seasonal differencing operator.
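Detecting seasonality from autocorrelations and then applying seasonal differencing can be sketched as follows. This is an illustrative example only: the helper names `acf` and `seasonal_difference`, the period of 4, and the toy series are assumptions, and the study itself produced its ACF plots with statistical software.

```python
def acf(series, max_lag):
    """Sample autocorrelation r_k for lags k = 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(v * v for v in dev)  # lag-0 sum of squares, used as the normaliser
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


def seasonal_difference(series, s):
    """Seasonal differencing: Y_t - Y_{t-s}."""
    return [series[t] - series[t - s] for t in range(s, len(series))]


# Hypothetical series that repeats every 4 observations (not the study's data).
y = [1.0, 3.0, 2.0, 5.0] * 6
r = acf(y, max_lag=8)
peak_lag = max(range(1, 8), key=lambda k: r[k])  # lag with the largest autocorrelation
dy = seasonal_difference(y, s=peak_lag)
print(peak_lag, dy[:4])
```

For this toy series the largest autocorrelation among lags 1 to 7 occurs at lag 4, and differencing at that lag removes the repeating pattern entirely.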

LSTM
Long short-term memory (LSTM) is a form of recurrent neural network (RNN) that consists of a collection of cells with features that allow them to memorize data sequences. Data streams are captured and stored in the cells. Each cell connects one module from the past to the next, allowing data to be transmitted from several previous time steps to the present. Through the gates in every cell, the data in the cell can be discarded, filtered, or added to before it is passed on to the cells that follow [19,21].
The gates are built from neural network layers with sigmoid activation, and they decide whether a cell passes data on or discards it. Each sigmoid layer outputs a number between 0 and 1, indicating how much of each data segment should be let through. More precisely, a value of zero means "let nothing pass", while a value of one means "let everything pass". Every LSTM has three types of gates that regulate the state of each cell:
1) The Forget Gate produces a value between 0 and 1, where 1 means "fully keep this" and 0 means "ignore this".
2) The Memory Gate determines which of the cell's most recent data must be kept, using a sigmoid layer followed by a tanh layer. The sigmoid layer, known as the "input gate layer", selects which values to update. The tanh layer then generates a vector of new candidate values, which can be added to the state.
3) The Output Gate determines what each cell outputs. The output value is determined by the cell state together with the newly added, filtered data.
If the gap between a piece of information and the point where it is needed is large, an RNN will be unable to predict the next result. Consider the following text: "I go to work every day" and "I work hard at the office." From the current context, the next word is probably the name of a location, but deciding what kind of location is difficult without the earlier context. Because the relevant information appeared much earlier, the RNN cannot learn to connect it. LSTM is a solution for overcoming this weakness.
LSTMs can learn long-term dependencies. Remembering information for a long time is their default behavior. The equations that describe this module are given in [22], [23].
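To make the gate mechanics concrete, the following is a minimal sketch of a single LSTM step in plain Python, assuming the commonly used formulation in which each gate is an affine function of the concatenated previous hidden state and current input. The parameter names (`Wf`, `Wi`, `Wc`, `Wo` and the matching biases) are illustrative and do not necessarily match the notation of [22], [23].

```python
import math


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(W, b, v):
    """Compute W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden state and cell state."""
    v = h_prev + x  # concatenation [h_{t-1}, x_t]
    f = [sigmoid(z) for z in affine(params["Wf"], params["bf"], v)]    # forget gate
    i = [sigmoid(z) for z in affine(params["Wi"], params["bi"], v)]    # input gate
    g = [math.tanh(z) for z in affine(params["Wc"], params["bc"], v)]  # candidate values
    o = [sigmoid(z) for z in affine(params["Wo"], params["bo"], v)]    # output gate
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


# Hypothetical 1-unit cell with all-zero parameters: every gate outputs 0.5
# and the candidate value is 0, so the cell state is simply halved.
zero = {"Wf": [[0.0, 0.0]], "bf": [0.0], "Wi": [[0.0, 0.0]], "bi": [0.0],
        "Wc": [[0.0, 0.0]], "bc": [0.0], "Wo": [[0.0, 0.0]], "bo": [0.0]}
h, c = lstm_step(x=[1.0], h_prev=[0.0], c_prev=[2.0], params=zero)
print(h, c)
```

The cell state `c` is changed only by elementwise scaling and addition, which is what lets information persist across many time steps.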

Combination of SARIMA Method with LSTM
The steps of this combination are described in Fig. 1.
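The staged idea of the combination (fit the linear model, pass its residuals to a second model, then merge the two forecasts) can be sketched as below. This is a hedged illustration only: the additive recombination is an assumption, the function names are hypothetical, and a trivial mean-residual predictor stands in for the LSTM so that the example stays self-contained.

```python
def hybrid_forecast(actual, linear_fitted, linear_forecast, fit_residual_model):
    """Additive hybrid: linear forecast plus a forecast of the linear model's residuals."""
    if len(actual) != len(linear_fitted):
        raise ValueError("actual and linear_fitted must have the same length")
    # Residuals are what the linear (SARIMA-like) stage failed to explain.
    residuals = [a - f for a, f in zip(actual, linear_fitted)]
    predict_residuals = fit_residual_model(residuals)
    residual_forecast = predict_residuals(len(linear_forecast))
    return [l + r for l, r in zip(linear_forecast, residual_forecast)]


def mean_residual_model(residuals):
    """Hypothetical stand-in for the LSTM stage: always predicts the mean residual."""
    mean = sum(residuals) / len(residuals)
    return lambda steps: [mean] * steps


# Toy numbers, not the study's data.
actual = [10.0, 12.0, 11.0, 13.0]
linear_fitted = [9.0, 11.0, 12.0, 12.0]  # residuals: 1, 1, -1, 1
linear_forecast = [13.0, 14.0]
combined = hybrid_forecast(actual, linear_fitted, linear_forecast, mean_residual_model)
print(combined)  # [13.5, 14.5]
```

In a real pipeline, `linear_fitted` and `linear_forecast` would come from the fitted SARIMA model, and `fit_residual_model` would train a network on windows of the residual series.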

SARIMA Model Implementation
In implementing the SARIMA method, data on COVID-19 patients who died in the United States, broken down by gender, were used. The original data comprised 1008 data points, but because many values were empty (missing values), the empty data points were deleted, leaving 711 data points. The analysis was carried out using R and Minitab software.
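Deleting records with missing values can be sketched as follows. The field names `week` and `male_deaths` and the sample records are hypothetical, and the study's own cleaning was done in its statistical software rather than with this code.

```python
import math


def is_missing(value):
    """Treat None, blank strings, and NaN as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    return isinstance(value, float) and math.isnan(value)


def drop_missing(records, field):
    """Keep only the records whose `field` holds a usable value."""
    return [r for r in records if not is_missing(r.get(field))]


# Hypothetical records, not the study's data.
records = [
    {"week": 1, "male_deaths": 120},
    {"week": 2, "male_deaths": None},
    {"week": 3, "male_deaths": ""},
    {"week": 4, "male_deaths": float("nan")},
    {"week": 5, "male_deaths": 95},
]
clean = drop_missing(records, "male_deaths")
print([r["week"] for r in clean])  # [1, 5]
```

Dropping rows leaves gaps in the time index, so the spacing of the remaining observations should be checked before a seasonal period is chosen.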

Identification
For the trial, the dataset of male COVID-19 patients was used. A plot of male COVID-19 patients who died in the United States is shown in Fig. 2.

Box-Cox Plot of Male
Because the lambda value is still 0.11, whereas it should be 1, a transformation is carried out. After the transformation, the lambda value is 1, so the data are stationary in variance. Stationarity in the mean is then checked by examining the ACF and PACF values.
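The Box-Cox transformation behind this check can be sketched as follows. The example uses the textbook form of the transform and assumes strictly positive data; the software used in the study may apply a differently scaled variant, and the sample values are hypothetical.

```python
import math


def box_cox(series, lam):
    """Textbook Box-Cox transform: (y**lam - 1) / lam, or log(y) when lam is 0."""
    if any(y <= 0 for y in series):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(y) for y in series]
    return [(y ** lam - 1.0) / lam for y in series]


# Hypothetical positive series, not the study's data.
y = [1.0, 2.0, 4.0]
unchanged_shape = box_cox(y, 1.0)   # lambda = 1 only shifts the data: [0.0, 1.0, 3.0]
near_log = box_cox(y, 0.11)         # a lambda near 0 behaves much like a log transform
print(unchanged_shape, near_log)
```

A lambda of 1 leaves the shape of the series unchanged, which is why an estimated lambda of 1 is read as "no further variance-stabilising transformation is needed".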

b) Stationarity check against the mean using ACF and PACF values
Based on the ACF and PACF plots, lags 1-3 are still inside the significance interval, so the series is declared stationary in the mean. However, Fig. 12 suggests that there is a trend: the number of COVID-19 patients who die increases every 10 data points. Therefore, the identification process is continued by differencing.

Estimation
In this step, the analysis is carried out on the male dataset. The ACF chart dies down over five lags, while the PACF chart shows a cut-off pattern. Thus, the non-seasonal ARIMA model formed is ARIMA(5,1,0). The non-seasonal ARIMA model is then checked further.
The seasonal ARIMA model is determined in the same way as the best non-seasonal ARIMA model, by examining the ACF and PACF charts. The difference lies in the seasonal period. In this case, it is 10, because every 10 data points there is a significant increase in COVID-19 patient deaths.

Model Evaluation
At this stage, the error values and other evaluation values are checked. Based on the output of auto_arima, the selected model is as follows.
$$\text{SARIMA}(2,1,2)(0,0,2)_{12}$$
Next, the error and accuracy values are checked. The evaluation values obtained for this model are the best when compared with other models.

Forecasting
Based on Fig. 10, the forecast number of male COVID-19 patients who died in the USA generally decreases. The average difference is about 500 people.

Implementation of LSTM and Combination of SARIMA-LSTM method
This analysis used a batch size of 100, a look_back of seven previous data points, and 100 training epochs. For forecasting, the dataset was split into 80% training data and 20% testing data from the same data: male COVID-19 patients in the USA. The LSTM analysis gave an RMSE of 0.35847506 and an MAE of 0.29837463. Because these RMSE and MAE values are smaller than the SARIMA results, LSTM can be said to perform better than SARIMA. Fig. 11 shows that the number of COVID-19 patients who died tends to decrease on average. The SARIMA method combined with LSTM was then tested to determine its accuracy, measured by RMSE and MAE, and its prediction results. The calculation gave an RMSE of 0.33905765 and an MAE of 0.29077017.
These results show that the RMSE and MAE of the combined SARIMA-LSTM method are better than those of the SARIMA and LSTM methods applied separately (Table 1). The results of the combined method are shown in Fig. 12, where the predicted data are almost identical to the real data, making this the best result of the three. The number of suspected COVID-19 deaths decreased to near zero. These results indicate that the combination of SARIMA and LSTM is the best of the three methods for predicting the deaths of COVID-19 patients in the USA.
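The look_back window and the 80/20 split can be illustrated with a short sketch. It assumes a chronological split (training data first, test data last) and one-step-ahead targets; the function names and the toy series are hypothetical, and the study does not state exactly how its windows were built.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time series without shuffling: earliest part for training."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def make_windows(series, look_back):
    """Turn a series into (input window, next value) pairs."""
    inputs, targets = [], []
    for t in range(look_back, len(series)):
        inputs.append(series[t - look_back:t])  # the look_back previous values
        targets.append(series[t])               # the value to predict
    return inputs, targets


# Hypothetical series of 20 points, not the study's data.
series = [float(t) for t in range(20)]
train, test = chronological_split(series, train_fraction=0.8)
X_train, y_train = make_windows(train, look_back=7)
print(len(train), len(test), len(X_train))  # 16 4 9
print(X_train[0], y_train[0])               # first window and its target
```

Each training sample then holds seven consecutive observations, and the model learns to predict the observation that follows them.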
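For reference, the two error measures used in the comparison can be computed as in the sketch below. The sample values are hypothetical and are unrelated to the figures reported in the study.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    if len(actual) != len(predicted):
        raise ValueError("inputs must have the same length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    if len(actual) != len(predicted):
        raise ValueError("inputs must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical values: errors are 1, 0, -2, -1.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]
print(rmse(actual, predicted))  # sqrt(6 / 4) = sqrt(1.5)
print(mae(actual, predicted))   # (1 + 0 + 2 + 1) / 4 = 1.0
```

RMSE penalises large errors more heavily than MAE, so reporting both gives a fuller picture of forecast quality; for both measures, lower is better.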

CONCLUSION
Based on the results of the study, the combined SARIMA-LSTM method is the best of the three, performing better than the SARIMA or LSTM methods applied separately. The general predictions from all three methods show a decrease, on average, in the number of male COVID-19 patients who died in the USA. A limitation of this research is that it does not describe patient mortality in each state within the USA. In future work, the analysis will be carried out using a combination of SARIMA-PARCD methods.
[Figure: Partial Autocorrelation Function for Male_lag1 (with 5% significance limits for the partial autocorrelations)]

Table 1: Comparison of RMSE and MAE for SARIMA, LSTM, and the combined SARIMA-LSTM method