Performance improvement via bagging in probabilistic prediction of chaotic time series using similarity of attractors and LOOCV predictable horizon

Neural Computing and Applications, Jul 2017

Recently, we have presented a method of probabilistic prediction of chaotic time series. The method employs learning machines involving strong learners capable of making predictions with desirably long predictable horizons; however, the usual ensemble mean for making a representative prediction is not effective when there are predictions with shorter predictable horizons. Thus, the method selects a representative prediction from the predictions generated by a number of learning machines involving strong learners as follows: first, it obtains plausible predictions holding large similarity of attractors with the training time series, and then it selects the representative prediction with the largest predictable horizon estimated via LOOCV (leave-one-out cross-validation). The method is also capable of providing an average and/or safe estimation of the predictable horizon of the representative prediction. In our previous study, we used CAN2s (competitive associative nets) for learning piecewise linear approximations of nonlinear functions as strong learners; this paper employs bagging (bootstrap aggregating) to improve the performance, which enables us to analyze the validity and effectiveness of the method.


Shuichi Kurogi, Mitsuki Toidani, Ryosuke Shigematsu, Kazuya Matsuo

Department of Control Engineering, Kyushu Institute of Technology, Tobata, Kitakyushu, Fukuoka 804-8550, Japan

Keywords: Probabilistic prediction of chaotic time series; Long-term unpredictability; Attractors of chaotic time series; Leave-one-out cross-validation; Estimation of predictable horizon

1 Introduction

So far, a number of methods for time series prediction have been studied (cf. [1, 2]), and our methods were awarded third and second places in the time series prediction competitions held at IJCNN'04 [3] and ESTSP'07 [4], respectively. Those methods used model selection based on the MSE (mean square prediction error) for holdout and/or cross-validation datasets. Recently, we have developed several model selection methods for chaotic time series prediction [5, 6]. The method in [5] utilizes moments of predictive deviation as ensemble diversity measures for model selection in time series prediction and achieves better performance, from the point of view of the MSE, than the conventional holdout method. The method in [6] uses direct multistep-ahead (DMS) prediction to apply the out-of-bag (OOB) estimate of the MSE. Although both methods select models that generate good predictions on average, they have not always provided good predictions, especially when the horizon to be predicted is large. This is owing mainly to the fact that the MSE of a set of predictions is strongly affected by a small number of predictions with short predictable horizons, even if most of the predictions have long predictable horizons, because the prediction error of chaotic time series increases exponentially with time after the predictable horizon (see [6] for the analysis and [1] for properties of chaotic time series). Instead of using model selection methods employing the estimation of the MSE, we have developed a method of probabilistic prediction of chaotic time series [7].
As noted in [8], probabilistic prediction has come to dominate the science of weather and climate forecasting, mainly because the theory of chaos at the heart of meteorology shows that, for a simple set of nonlinear equations (such as Lorenz's equations shown below) with initial conditions changed by minute perturbations, there is no longer a single deterministic solution, and hence all forecasts must be treated as probabilistic. Although most of the methods shown in [8] use the ensemble mean for the representative forecast, our method in [7] (see below for details) uses an individual prediction selected from a set of plausible predictions as the representative, because our method employs learning machines involving strong learners capable of making predictions with small error for a desirably long duration, and the ensemble mean does not work when the set of predictions involves a prediction with a short predictable horizon. This is owing mainly to the exponential increase in the prediction error of chaotic time series after the predictable horizon (see Sect. 3.2 for details). Thus, instead of using the ensemble mean, our method in [7] first selects plausible predictions by evaluating the similarity of attractors between the training and predicted time series, and then obtains the representative prediction by means of LOOCV (leave-one-out cross-validation), selecting the prediction with the longest estimated predictable horizon. Compared with our previous methods using the MSE for model selection [5, 6], the method in [7] has the advantage that it can select the representative prediction from the plausible predictions for each start time of prediction and provide an estimate of the predictable horizon. Furthermore, it has achieved long predictable horizons on average. However, there are several cases where the method selects a representative prediction with a short predictable horizon even though plausible predictions with longer predictable horizons exist. To overcome this problem, this paper tries to improve the performance of the learning machines by means of the bagging (bootstrap aggregating) method, and shows an analysis of the LOOCV predictable horizon. Bagging is known to use the ensemble mean to reduce the variance of predictions by single learning machines, and we can therefore expect the performance in time series prediction to become more stable and higher. Note that, in this paper, the bagging ensemble is employed for iterated one-step-ahead (IOS) prediction of time series, and we deal with probabilistic prediction as an ensemble of longer-term predictions. Furthermore, we use the CAN2 (competitive associative net 2) as a learning machine (see [3] for details of the CAN2). The CAN2 was introduced for learning piecewise linear approximation of nonlinear functions, and its performance was shown in the Evaluating Predictive Uncertainty Challenge [9], where our method was awarded first place in the regression problems. The CAN2 has also been used in our methods [3, 4] for the time series prediction competitions mentioned above. We show the present method of probabilistic prediction of chaotic time series in Sect. 2, experimental results and analysis in Sect. 3, and the conclusion in Sect. 4.
2 Probabilistic prediction of chaotic time series

2.1 IOS prediction of chaotic time series

Let $y_t\,(\in \mathbb{R})$ denote a chaotic time series for discrete time $t = 0, 1, 2, \ldots$ satisfying

$$y_t = r(\boldsymbol{x}_t) + e(\boldsymbol{x}_t), \quad (1)$$

where $r(\boldsymbol{x}_t)$ is a nonlinear target function of the vector $\boldsymbol{x}_t = (y_{t-1}, y_{t-2}, \ldots, y_{t-k})^{\mathrm T}$ generated by $k$-dimensional delay embedding from a chaotic differential dynamical system (see [1] for the theory of chaotic time series). Here, $y_t$ is obtained not analytically but numerically, so $y_t$ involves an error $e(\boldsymbol{x}_t)$ owing to the finite precision of executable calculations. This indicates that there are a number of plausible target functions $r(\boldsymbol{x}_t)$ with allowable error $e(\boldsymbol{x}_t)$. Furthermore, in general, a time series generated with higher precision has small prediction error for a longer duration from the initial time of prediction. Thus, we let a time series generated with high precision (128-bit precision; see Sect. 3 for details) be the ground truth time series $y^{[\mathrm{gt}]}_t$, while we examine predictions generated with standard 64-bit precision.

Let $y_{t:h} = y_t y_{t+1} \cdots y_{t+h-1}$ denote a time series with initial time $t$ and horizon $h$. For a given training time series $y_{t_g:h_g}\,(= y^{[\mathrm{train}]}_{t_g:h_g})$, we are supposed to predict the succeeding time series $y_{t_p:h_p}$ for $t_p \ge t_g + h_g$. We make the training dataset $D^{[\mathrm{train}]} = \{(\boldsymbol{x}_t, y_t) \mid t \in I^{[\mathrm{train}]}\}$ for $I^{[\mathrm{train}]} = \{t \mid t_g + k \le t < t_g + h_g\}$ to train a learning machine. After the learning, the machine executes IOS prediction by

$$\hat{y}_t = f(\boldsymbol{x}_t) \quad (2)$$

for $t = t_p, t_p + 1, \ldots$, recursively, where $f(\boldsymbol{x}_t)$ denotes the prediction function of $\boldsymbol{x}_t = (x_{t1}, x_{t2}, \ldots, x_{tk})^{\mathrm T}$ whose elements are given by $x_{tj} = y_{t-j}$ for $t - j < t_p$ and $x_{tj} = \hat{y}_{t-j}$ for $t - j \ge t_p$. Here, we suppose that $y_t$ for $t < t_p$ is known as the initial state for making the prediction $\hat{y}_{t_p:h_p}$. As explained above, we execute the prediction with standard 64-bit precision, and we may say that there are a number of plausible prediction functions $f(\boldsymbol{x}_t)$ with small error for a duration from the initial time of prediction, obtained by means of strong learning machines.

2.2 Single CAN2 and the bagging for IOS prediction

We use the CAN2 as a learning machine. A single CAN2 has $N$ units. The $j$th unit has a weight vector $\boldsymbol{w}_j = (w_{j1}, \ldots, w_{jk})^{\mathrm T} \in \mathbb{R}^{k \times 1}$ and an associative matrix (a row vector) $\boldsymbol{M}_j = (M_{j0}, M_{j1}, \ldots, M_{jk}) \in \mathbb{R}^{1 \times (k+1)}$ for $j \in I_N = \{1, 2, \ldots, N\}$. After learning the training dataset $D^{[\mathrm{train}]} = \{(\boldsymbol{x}_t, y_t) \mid t \in I^{[\mathrm{train}]}\}$, the CAN2 approximates the target function $r(\boldsymbol{x}_t)$ by

$$\tilde{y}_t = \tilde{y}_{c(t)} = \boldsymbol{M}_{c(t)} \tilde{\boldsymbol{x}}_t, \quad (3)$$

where $\tilde{\boldsymbol{x}}_t = (1, \boldsymbol{x}_t^{\mathrm T})^{\mathrm T} \in \mathbb{R}^{(k+1) \times 1}$ denotes the (extended) input vector to the CAN2, and $\tilde{y}_{c(t)} = \boldsymbol{M}_{c(t)} \tilde{\boldsymbol{x}}_t$ is the output value of the $c(t)$th unit of the CAN2. The index $c(t)$ indicates the unit whose weight vector $\boldsymbol{w}_{c(t)}$ is closest to the input vector $\boldsymbol{x}_t$, or $c(t) = \operatorname{argmin}_{j \in I_N} \|\boldsymbol{x}_t - \boldsymbol{w}_j\|$. Note that the above prediction performs piecewise linear approximation of $y = r(\boldsymbol{x})$, and $N$ indicates the number of piecewise linear regions. We use the learning algorithm shown in [10], whose high performance in regression problems was shown in the Evaluating Predictive Uncertainty Challenge [9].

We obtain the bagging prediction by using a number of single CAN2s as follows (see [11, 12] for details). Let $D^{[n\alpha],j} = \{(\boldsymbol{x}_t, y_t) \mid t \in I^{[n\alpha],j}\}$ be the $j$th bag (multiset, or bootstrap sample set) involving $n\alpha$ elements, where the elements in $D^{[n\alpha],j}$ are resampled randomly with replacement from the training dataset $D^{[\mathrm{train}]}$ involving $n = |D^{[\mathrm{train}]}|$ elements. Here, $\alpha\,(>0)$ indicates the bag size ratio to the given dataset, and $j \in J^{[\mathrm{bag}]} = \{1, 2, \ldots, b\}$. Note that $\alpha = 1$ is used in many applications (see [12, 13]), and we use it in the experiments shown below after the tuning of $\alpha$ (see [12] for the validity and effectiveness of using variable $\alpha$). Using multiple CAN2s employing $N$ units after learning $D^{[n\alpha],j}$, which we denote $\theta^{[j]}_N\,(\in \Theta_N = \{\theta^{[j]}_N \mid j \in J^{[\mathrm{bag}]}\})$, the bagging for predicting the target value $r_t = r(\boldsymbol{x}_t)$ is done by

$$\hat{y}^{[\mathrm{bag}]}_t = \frac{1}{b} \sum_{j \in J^{[\mathrm{bag}]}} \hat{y}^{[j]}_t = \left\langle \hat{y}^{[j]}_t \right\rangle_{j \in J^{[\mathrm{bag}]}}, \quad (4)$$

where $\hat{y}^{[j]}_t = \hat{y}^{[j]}(\boldsymbol{x}_t)$ denotes the prediction by the $j$th machine $\theta^{[j]}_N$. The angle brackets $\langle \cdot \rangle$ indicate the mean, and the subscript $j \in J^{[\mathrm{bag}]}$ indicates the range of the mean. For simple expression, we sometimes use $\langle \cdot \rangle_j$ instead of $\langle \cdot \rangle_{j \in J^{[\mathrm{bag}]}}$ in the following.
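To make the mechanics concrete, the following is a minimal Python sketch of the piecewise linear prediction of Eq. (3), the bagging mean of Eq. (4), and the IOS recursion of Eq. (2). It assumes each trained CAN2 can be represented by a pair (W, M) of weight vectors and associative matrices; the helper names are ours, and the actual CAN2 learning algorithm of [10] is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def nearest_unit_predict(W, M, x):
    """Piecewise linear output (Eq. 3): select the unit c whose weight
    vector is closest to x, then apply its associative row vector M_c."""
    c = np.argmin(np.linalg.norm(W - x, axis=1))      # c(t) = argmin_j ||x_t - w_j||
    return float(M[c] @ np.concatenate(([1.0], x)))   # M_c (1, x^T)^T

def ios_predict(models, y_init, horizon):
    """IOS prediction (Eq. 2) with a bagging ensemble (Eq. 4): each new
    prediction is fed back into the next input vector."""
    buf = list(y_init)                       # last k values, oldest first
    out = []
    for _ in range(horizon):
        x = np.array(buf[::-1])              # x_t = (y_{t-1}, ..., y_{t-k})^T
        y_hat = np.mean([nearest_unit_predict(W, M, x) for (W, M) in models])
        out.append(y_hat)
        buf = buf[1:] + [y_hat]              # slide the window forward
    return np.array(out)

def make_bags(X, y, b, alpha=1.0):
    """Bootstrap bags D^{[n alpha], j}: resample n*alpha pairs with
    replacement from the training set (Sect. 2.2)."""
    n = len(y)
    return [(X[idx], y[idx])
            for idx in (rng.integers(0, n, int(alpha * n)) for _ in range(b))]
```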
2.3 Probabilistic prediction and estimation of predictable horizon

2.3.1 Similarity of attractors to select plausible predictions

First, we make a number of IOS predictions $\hat{y}_{t_p:h_p} = y^{[\theta_N]}_{t_p:h_p}$ by means of learning machines (CAN2s) $\theta_N \in \Theta$ with different numbers $N$ of units, where $\Theta$ indicates the set of all learning machines. We employ single and bagging CAN2s, which we denote $\theta^{[\mathrm{single}]}_N$ and $\theta^{[\mathrm{bag}]}_N$, respectively, if necessary. We suppose that there are a number of plausible prediction functions $f(\cdot) = f^{[\theta_N]}(\cdot)$, and we have to remove the implausible ones. To do this, we select the following set of plausible predictions:

$$Y^{[S_{\mathrm{th}}]}_{t_p:h_p} = \left\{ y^{[\theta_N]}_{t_p:h_p} \;\middle|\; S\!\left(y^{[\theta_N]}_{t_p:h_p}, y^{[\mathrm{train}]}_{t_g:h_g}\right) \ge S_{\mathrm{th}},\; \theta_N \in \Theta \right\}, \quad (5)$$

where

$$S\!\left(y^{[\theta_N]}_{t_p:h_p}, y^{[\mathrm{train}]}_{t_g:h_g}\right) = \frac{\sum_i \sum_j a^{[\theta_N]}_{ij} a^{[\mathrm{train}]}_{ij}}{\sqrt{\sum_i \sum_j \left(a^{[\theta_N]}_{ij}\right)^2}\,\sqrt{\sum_i \sum_j \left(a^{[\mathrm{train}]}_{ij}\right)^2}} \quad (6)$$

denotes the similarity of the two-dimensional attractor (trajectory) distributions $a^{[\theta_N]}_{ij}$ and $a^{[\mathrm{train}]}_{ij}$ of the time series $y^{[\theta_N]}_{t_p:h_p}$ and $y^{[\mathrm{train}]}_{t_g:h_g}$, respectively, and $S_{\mathrm{th}}$ is a threshold. Here, the two-dimensional attractor distribution $a_{ij}$ of a time series $y_{t:h}$ is given by

$$a_{ij} = \sum_{s=t}^{t+h-1} 1\!\left\{ \left\lfloor \frac{y_s - v_0}{\Delta a} \right\rfloor = i \;\wedge\; \left\lfloor \frac{y_{s+1} - v_0}{\Delta a} \right\rfloor = j \right\}, \quad (7)$$

where $v_0$ is a constant less than the minimum value of $y_t$ for all time series and $\Delta a$ indicates the resolution of the distribution. Furthermore, $1\{z\}$ is an indicator function equal to 1 if $z$ is true and 0 if $z$ is false, and $\lfloor \cdot \rfloor$ indicates the floor function.
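The attractor distribution of Eq. (7) is a 2-D histogram of consecutive pairs $(y_s, y_{s+1})$, and Eq. (6) is the cosine similarity of two such histograms. A minimal sketch in Python follows, assuming the time series are 1-D numpy arrays; the clipping of out-of-range bins is our own guard, not a detail specified in the paper.

```python
import numpy as np

def attractor_hist(y, v0, da, nbins):
    """2-D attractor distribution a_ij (Eq. 7): counts of consecutive
    pairs (y_s, y_{s+1}) falling into grid cell (i, j) of resolution da."""
    a = np.zeros((nbins, nbins))
    i = np.floor((y[:-1] - v0) / da).astype(int)
    j = np.floor((y[1:] - v0) / da).astype(int)
    np.add.at(a, (np.clip(i, 0, nbins - 1), np.clip(j, 0, nbins - 1)), 1.0)
    return a

def attractor_similarity(y_pred, y_train, v0, da, nbins):
    """Cosine similarity of attractor distributions (Eq. 6)."""
    a = attractor_hist(y_pred, v0, da, nbins)
    b = attractor_hist(y_train, v0, da, nbins)
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b)))

def plausible_set(preds, y_train, v0, da, nbins, s_th=0.8):
    """Plausible predictions (Eq. 5): keep those with similarity >= S_th."""
    return [p for p in preds
            if attractor_similarity(p, y_train, v0, da, nbins) >= s_th]
```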
2.3.2 LOOCV measure to estimate predictable horizons

Let us define the predictable horizon between two predictions $y^{[\theta_N]}_{t_p:h_p}$ and $y^{[\theta_{N'}]}_{t_p:h_p}$ in $Y^{[S_{\mathrm{th}}]}_{t_p:h_p}$ as

$$h\!\left(y^{[\theta_N]}_{t_p:h_p}, y^{[\theta_{N'}]}_{t_p:h_p}\right) = \max\left\{ h \le h_p \;\middle|\; \forall s < h : \left|y^{[\theta_N]}_{t_p+s} - y^{[\theta_{N'}]}_{t_p+s}\right| \le e_y \right\}, \quad (8)$$

where $e_y$ indicates the threshold of prediction error used to determine the horizon. Then, we employ the LOOCV method to estimate the predictable horizon of $y^{[\theta_N]}_{t_p:h_p}$ in $Y^{[S_{\mathrm{th}}]}_{t_p:h_p}$. Namely, we use

$$\tilde{h}^{[\theta_N]}_{t_p:h_p} = h\!\left(y^{[\theta_N]}_{t_p:h_p},\, Y^{[S_{\mathrm{th}}]}_{t_p:h_p} \setminus \{y^{[\theta_N]}_{t_p:h_p}\}\right) = \left\langle h\!\left(y^{[\theta_N]}_{t_p:h_p}, y^{[\theta_{N'}]}_{t_p:h_p}\right) \right\rangle_{y^{[\theta_{N'}]}_{t_p:h_p} \in Y^{[S_{\mathrm{th}}]}_{t_p:h_p} \setminus \{y^{[\theta_N]}_{t_p:h_p}\}}, \quad (9)$$

which we call the LOOCV measure of predictable horizon, or the LOOCV predictable horizon. Here, we expect that $\tilde{h}^{[\theta_N]}_{t_p:h_p}$ and $h\!\left(y^{[\theta_N]}_{t_p:h_p}, y^{[\mathrm{gt}]}\right)$ have positive correlation, under the assumption that $Y^{[S_{\mathrm{th}}]}_{t_p:h_p}$ involves a number of predictions neighboring $y^{[\mathrm{gt}]}$.

2.3.3 Probabilistic prediction involving longer LOOCV predictable horizons

Let a subset of plausible predictions involving longer LOOCV predictable horizons be

$$Y^{[H_{\mathrm{th}}, S_{\mathrm{th}}]}_{t_p:h_p} = \left\{ y^{[\theta_{\sigma(i)}]}_{t_p:h_p} \;\middle|\; i \le H_{\mathrm{th}} \left|Y^{[S_{\mathrm{th}}]}_{t_p:h_p}\right| \right\}, \quad (10)$$

where $\sigma(i)$ denotes the order of LOOCV predictable horizons satisfying $\tilde{h}^{[\theta_{\sigma(i)}]}_{t_p:h_p} \ge \tilde{h}^{[\theta_{\sigma(i+1)}]}_{t_p:h_p}$ for $i = 1, 2, \ldots, |Y^{[S_{\mathrm{th}}]}_{t_p:h_p}| - 1$. The threshold $H_{\mathrm{th}}\,(0 < H_{\mathrm{th}} \le 1)$ indicates the ratio of the number of elements of $Y^{[H_{\mathrm{th}},S_{\mathrm{th}}]}_{t_p:h_p}$ to that of $Y^{[S_{\mathrm{th}}]}_{t_p:h_p}$, or $|Y^{[H_{\mathrm{th}},S_{\mathrm{th}}]}_{t_p:h_p}| = H_{\mathrm{th}} |Y^{[S_{\mathrm{th}}]}_{t_p:h_p}|$. Now, we derive the probability of the prediction $y_t$ for $t_p \le t < t_p + h_p$ as

$$p(v_i \le y_t < v_{i+1}) = \left\langle 1\!\left\{ \left\lfloor \frac{y^{[\theta]}_t - v_0}{\Delta v} \right\rfloor = i \right\} \right\rangle_{\theta \in \Theta^{[H_{\mathrm{th}},S_{\mathrm{th}}]}}, \quad (11)$$

where $\Theta^{[H_{\mathrm{th}},S_{\mathrm{th}}]}$ is the set of parameters $\theta$ of the learning machines which have generated $y^{[\theta]}_{t_p:h_p} \in Y^{[H_{\mathrm{th}},S_{\mathrm{th}}]}_{t_p:h_p}$, $\Delta v$ denotes the resolution of $y_t$, and $v_i = i\Delta v + v_0$ for $i = 0, 1, 2, \ldots$. Note that the probability $p(v_i \le y_t < v_{i+1})$ indicates how much the plausible predictions in $Y^{[H_{\mathrm{th}},S_{\mathrm{th}}]}_{t_p:h_p}$ take values between $v_i$ and $v_{i+1}$.

2.3.4 Representative prediction and estimation of predictable horizon

Now, we provide $y^{[\theta_{\sigma(1)}]}_{t_p:h_p}$ as the representative prediction, and an estimation of the predictable horizon $h^{[\theta_{\sigma(1)}]}_{t_p:h_p} = h\!\left(y^{[\theta_{\sigma(1)}]}_{t_p:h_p}, y^{[\mathrm{gt}]}_{t_p:h_p}\right)$ as

$$\hat{h}^{[\theta_{\sigma(1)}]}_{t_p:h_p} = \min\left\{ h\!\left(y^{[\theta_{\sigma(1)}]}_{t_p:h_p}, y_{t_p:h_p}\right) \;\middle|\; \forall y_{t_p:h_p} \in Y^{[H_{\mathrm{th}},S_{\mathrm{th}}]}_{t_p:h_p} \setminus \{y^{[\theta_{\sigma(1)}]}_{t_p:h_p}\} \right\}, \quad (12)$$

where we have to tune $H_{\mathrm{th}}$ from the point of view of accuracy and safeness. Here, a safe estimation means that $\hat{h}^{[\theta_{\sigma(1)}]}_{t_p:h_p}$ is smaller than or equal to the actual predictable horizon $h^{[\theta_{\sigma(1)}]}_{t_p:h_p}$, and we can see that $\hat{h}^{[\theta_{\sigma(1)}]}_{t_p:h_p}$ becomes safer with the increase in $H_{\mathrm{th}}$.
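A sketch of Eqs. (8)-(12) in Python may clarify the selection procedure. It assumes `preds` is a list of equal-length numpy arrays (the plausible predictions); the function names are hypothetical, and the rounding of the subset size is our own choice where the paper leaves it implicit.

```python
import numpy as np

def pred_horizon(ya, yb, e_y=10.0):
    """Predictable horizon between two series (Eq. 8): number of steps
    before their pointwise difference first exceeds e_y."""
    over = np.nonzero(np.abs(np.asarray(ya) - np.asarray(yb)) > e_y)[0]
    return int(over[0]) if over.size else len(ya)

def loocv_horizons(preds, e_y=10.0):
    """LOOCV predictable horizon (Eq. 9): for each plausible prediction,
    the mean horizon against all the other plausible predictions."""
    n = len(preds)
    return [np.mean([pred_horizon(preds[i], preds[j], e_y)
                     for j in range(n) if j != i]) for i in range(n)]

def select_representative(preds, e_y=10.0, h_th=0.5):
    """Eqs. (10)-(12): rank by LOOCV horizon, keep the top H_th fraction,
    return the top-ranked prediction and its safe horizon estimate (the
    minimum horizon against the rest of the retained subset)."""
    h_tilde = loocv_horizons(preds, e_y)
    order = np.argsort(h_tilde)[::-1]                 # sigma(i): descending
    keep = order[:max(1, int(round(h_th * len(preds))))]
    rep = preds[keep[0]]
    h_hat = min((pred_horizon(rep, preds[i], e_y) for i in keep[1:]),
                default=len(rep))
    return rep, h_hat
```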
3 Numerical experiments and analysis

3.1 Experimental settings

We use the Lorenz time series, shown in Fig. 1 and used in [6], obtained from the original differential dynamical system

$$\frac{dx_c}{dt_c} = \sigma(y_c - x_c), \qquad \frac{dy_c}{dt_c} = -x_c z_c + r x_c - y_c, \qquad \frac{dz_c}{dt_c} = x_c y_c - b z_c \quad (13)$$

for $\sigma = 10$, $r = 28$ and $b = 8/3$. Here, we use $t_c$ for continuous time and $t\,(= 0, 1, 2, \ldots)$ for discrete time, related by $t_c = tT$ with the sampling time (or embedding delay) $T = 25$ ms. We have generated the time series $y^{[\mathrm{gt}]}_t = x_c(tT)$ for $t = 1, 2, \ldots, 5000$ from the initial state $(x_c(0), y_c(0), z_c(0)) = (-8, 8, 27)$ via the fourth-order Runge-Kutta method with step size $\Delta t = 10^{-4}$ and 128-bit precision of GMP (GNU multiprecision library). Using $y^{[\mathrm{train}]}_{t_g:h_g} = y^{[\mathrm{gt}]}_{0:2000}$, we make the training dataset $D^{[\mathrm{train}]} = \{(\boldsymbol{x}^{[\mathrm{gt}]}_t, y^{[\mathrm{gt}]}_t) \mid t \in I^{[\mathrm{train}]}\}$ for $I^{[\mathrm{train}]} = \{10\,(= k), 11, \ldots, 1999\}$ and $\boldsymbol{x}^{[\mathrm{gt}]}_t = (y^{[\mathrm{gt}]}_{t-1}, \ldots, y^{[\mathrm{gt}]}_{t-k})^{\mathrm T}$. For learning machines $\theta_N$, we have employed single CAN2s $\theta^{[\mathrm{single}]}_N$ and bagging CAN2s $\theta^{[\mathrm{bag}]}_N$ with the number of units $N = 5 + 20i$ $(i = 0, 1, 2, \ldots, 14)$. After the training, we execute IOS prediction $\hat{y}_t = f^{[\theta_N]}(\boldsymbol{x}_t)$ for $t = t_p, t_p + 1, \ldots$ with the initial input vector $\boldsymbol{x}_{t_p} = (y^{[\mathrm{gt}]}_{t_p-1}, \ldots, y^{[\mathrm{gt}]}_{t_p-k})^{\mathrm T}$ for prediction start times $t_p \in T_p = \{2000 + 100i \mid i = 0, 1, 2, \ldots, 29\}$ and prediction horizon $h_p = 500$. We show experimental results for the embedding dimension $k = 10$ and the threshold in (8) being $e_y = 10$ (see [7] for the result with $k = 8$, which is slightly but not significantly different).

Fig. 1 Lorenz time series $y_t$ for $t = 0, 1, 2, \ldots, 4999$, i.e., the ground truth time series $y^{[\mathrm{gt}]}_{0:5000}$

In order to estimate the accuracy of $y^{[\mathrm{gt}]}_t$, we have obtained the average predictable horizon with respect to the time series generated with $\Delta t = 10^{-5}$ and 128-bit precision via the Runge-Kutta method. This indicates that $y^{[\mathrm{gt}]}_t$ with $\Delta t = 10^{-4}$ and 128-bit precision can be considered accurate for 230 steps on average, because we have observed that the predictable horizon between two time series generated by the Runge-Kutta method with step sizes $\Delta t = 10^{-n}$ and $10^{-n-1}$ for $n = 3, 4, 5, 6, 7$ increases monotonically with the decrease in step size, or the increase in $n$. Note also that we have executed several experiments using the parameters $\theta = (N, k)$ for $k = 6$, 8, 10, 12, and so on, and we have not found any critically different results, although we would like to execute and show the results of a comparative study in our future research.
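For reference, the ground truth series of Sect. 3.1 can be reproduced, up to precision, along the following lines. This sketch uses standard 64-bit floats rather than the paper's 128-bit GMP arithmetic, so its trajectory will diverge from the published one after the predictable horizon; the function names are ours.

```python
import numpy as np

def lorenz_rhs(s, sigma=10.0, r=28.0, b=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (Eq. 13)."""
    x, y, z = s
    return np.array([sigma * (y - x), -x * z + r * x - y, x * y - b * z])

def lorenz_series(n=5000, T=0.025, dt=1e-4, s0=(-8.0, 8.0, 27.0)):
    """Sample y_t = x_c(tT) for t = 1..n via classical 4th-order
    Runge-Kutta with step dt, recording every T/dt integration steps."""
    s = np.array(s0, dtype=float)
    out = np.empty(n)
    per_sample = int(round(T / dt))        # 250 integration steps per sample
    for t in range(n):
        for _ in range(per_sample):
            k1 = lorenz_rhs(s)
            k2 = lorenz_rhs(s + 0.5 * dt * k1)
            k3 = lorenz_rhs(s + 0.5 * dt * k2)
            k4 = lorenz_rhs(s + dt * k3)
            s = s + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        out[t] = s[0]                       # x-coordinate at t_c = tT
    return out
```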
3.2 Results and analysis

First, we show an example of all predictions $y^{[\theta_N]}_{t_p:h_p}$ for $t_p = 2300$ in Fig. 2a. Note that $t_p = 2300$ is a start time at which the representative prediction $y^{[\theta_{\sigma(1)}]}_{t_p:h_p}$ by the single CAN2 has an actual predictable horizon smaller than 100 (actually $h^{[\theta_{\sigma(1)}]}_{t_p:h_p} = 72$), which is improved by the bagging CAN2 to $h^{[\theta_{\sigma(1)}]}_{t_p:h_p} = 183$ (see Fig. 3a). In Fig. 2b, we can see that single CAN2s have a larger number of predictions with similarity $S$ smaller than $S_{\mathrm{th}} = 0.8$ at $t = 2799$ than bagging CAN2s, and those predictions are not selected as plausible predictions. A detailed analysis of the similarity is shown below. The representative prediction $y^{[\theta_{\sigma(1)}]}_{t_p:h_p}$ (green) shown in (c) is chosen by selecting the largest LOOCV predictable horizon $\tilde{h}^{[\theta_{\sigma(1)}]}_{t_p:h_p}$ shown in (d). From (d), we can see that the single CAN2 (left) has a prediction with actual predictable horizon $h^{[\theta_N]}_{t_p:h_p}$ larger than 200 but LOOCV predictable horizon $\tilde{h}^{[\theta_N]}_{t_p:h_p}$ smaller than 100, actually $(h^{[\theta_N]}_{t_p:h_p}, \tilde{h}^{[\theta_N]}_{t_p:h_p}) = (209, 72.1)$. Since the present method selects the prediction with the largest $\tilde{h}^{[\theta_N]}_{t_p:h_p}$, the prediction with $h^{[\theta_N]}_{t_p:h_p} = 209$ could not be selected. On the other hand, the bagging CAN2 (right in (d)) successfully selects a prediction with $h^{[\theta_N]}_{t_p:h_p}$ larger than 100, actually $(h^{[\theta_N]}_{t_p:h_p}, \tilde{h}^{[\theta_N]}_{t_p:h_p}) = (183, 191)$. Precisely, bagging CAN2s have successfully provided the large $\tilde{h}^{[\theta_N]}_{t_p:h_p} = 191$ because there are a number of predictions with long predictable horizons around $h^{[\theta_N]}_{t_p:h_p} = 200$, shown as the group of points neighboring $h^{[\theta_N]}_{t_p:h_p} = 200$ in (d) on the right-hand side. Incidentally, from (c), we can see that the ensemble mean does not seem appropriate for producing a representative prediction in long-term prediction of chaotic time series.

Fig. 2 Experimental results obtained by single CAN2s (left) and bagging CAN2s (right) for the prediction start time $t_p = 2300$ and horizon $h_p = 500$: a superimposed original predictions $y^{[\theta_N]}_{t_p:h_p}$; b time evolution of the similarity $S$ of attractors, where the predictions with $S \ge S_{\mathrm{th}} = 0.8$ at $t = t_p + h_p - 1 = 2799$ are selected as plausible predictions; c selected plausible predictions, together with the ground truth time series $y^{[\mathrm{gt}]}_t$ (red) and the representative prediction $y^{[\theta_{\sigma(1)}]}_{t_p:h_p}$ (green), where $\theta_{\sigma_S(1)}$ indicates the learning machine with the maximum similarity; d relationship between actual predictable horizons $h^{[\theta_N]}_{t_p:h_p}$ and LOOCV predictable horizons $\tilde{h}^{[\theta_N]}_{t_p:h_p}$ of plausible predictions (colour figure online)

In Fig. 3, we show the results of actual and estimated predictable horizons. Note that we have obtained the mean predictable horizon, over $t_p \in T_p$, of the time series generated via the Runge-Kutta method with step size $\Delta t = 5 \times 10^{-4}$ (with sampling time 25 ms), and it is almost the same as the mean of the predictable horizons achieved by single and bagging CAN2s, 170 and 175 steps, respectively. This indicates that single and bagging CAN2s, after learning the training data generated via the Runge-Kutta method with step size $\Delta t = 10^{-4}$, have almost the same prediction performance as the Runge-Kutta method with $\Delta t = 5 \times 10^{-4}$. Although we have no general measure to evaluate time series prediction so far, the above method using the step size of the Runge-Kutta method and the mean predictable horizon seems reasonable.

In Fig. 3a, we can see that the stability of prediction by the single CAN2 is improved by the bagging CAN2, in the sense that the former has four actual predictable horizons $h^{[\theta_{\sigma(1)}]}_{t_p:h_p}$ smaller than 100 among all predictions for $t_p \in T_p$, while the bagging CAN2 has achieved $h^{[\theta_{\sigma(1)}]}_{t_p:h_p}$ larger than 100 for all $t_p$. From (b), we can see that the estimated predictable horizon $\hat{h}^{[\theta_{\sigma(1)}]}_{t_p:h_p}$ with $H_{\mathrm{th}} = 0.5$ is almost the same as the actual predictable horizon $h^{[\theta_{\sigma(1)}]}_{t_p:h_p}$, while $H_{\mathrm{th}} = 0.9$ has achieved safe estimation, or $\hat{h}^{[\theta_{\sigma(1)}]}_{t_p:h_p} \le h^{[\theta_{\sigma(1)}]}_{t_p:h_p}$.

In order to analyze the properties of the method, we show the attractor distributions of training and representative time series in Fig. 4. We can see that the similarity of attractors $S(y^{[\theta_{\sigma(1)}]}_{t_p:h_p}, y^{[\mathrm{train}]}_{t_g:h_g}) = 0.859$ obtained by the single CAN2 is smaller than the value $0.939$ obtained by the bagging CAN2. From the result on the left in Fig. 2b, we can see that there is a prediction with similarity larger than 0.859 for the single CAN2; actually, the maximum similarity of the single CAN2s is 0.931. The prediction $y^{[\theta_{\sigma_S(1)}]}_{t_p:h_p}$ with the maximum similarity of attractors among the plausible predictions has a possibility to be used for selecting a representative prediction. The comparison between $h^{[\theta_{\sigma(1)}]}_{t_p:h_p}$ and $h^{[\theta_{\sigma_S(1)}]}_{t_p:h_p}$ is shown in Fig. 5a, where $h^{[\theta_{\sigma_S(1)}]}_{t_p:h_p}$ seems competitive with $h^{[\theta_{\sigma(1)}]}_{t_p:h_p}$ for the single CAN2, but worse for the bagging CAN2.

Fig. 4 Experimental result of attractor distributions at $t = 2799$: a $a^{[\mathrm{train}]}_{ij}$ of the training time series $y^{[\mathrm{train}]}_{t_g:h_g}$; b $a^{[\theta_{\sigma(1)}]}_{ij}$ of the representative prediction obtained by the single CAN2 with $\sigma(1) = N = 145$; c $a^{[\theta_{\sigma(1)}]}_{ij}$ of the representative prediction obtained by the bagging CAN2 with $\sigma(1) = N = 225$. The resolution of the distributions is $\Delta a = (v_{\max} - v_0)/40 = (18.5 - (-18.5))/40 = 0.925$, and the similarities are 0.859 (single) and 0.939 (bagging)

To analyze further, we have examined the correlation $r(S^{[\theta_N]}_{t_p:h_p}, h^{[\theta_N]}_{t_p:h_p})$ between the similarity $S^{[\theta_N]}_{t_p:h_p} = S(y^{[\theta_N]}_{t_p:h_p}, y^{[\mathrm{train}]}_{t_g:h_g})$ and the predictable horizon $h^{[\theta_N]}_{t_p:h_p} = h(y^{[\theta_N]}_{t_p:h_p}, y^{[\mathrm{gt}]}_{t_p:h_p})$, as well as the correlation $r(\tilde{h}^{[\theta_N]}_{t_p:h_p}, h^{[\theta_N]}_{t_p:h_p})$, as shown in Fig. 5b. From this result, there are a number of cases with low positive or negative correlations. In particular, the correlation of similarity, $r(S^{[\theta_N]}_{t_p:h_p}, h^{[\theta_N]}_{t_p:h_p})$, has few cases with values larger than 0.5 for both single and bagging CAN2s. This suggests that selecting the representative prediction by using the similarity measure is not so reliable. On the other hand, the bagging CAN2 has a larger number of cases with correlations larger than 0.5, as we can see from the thick line of $r(\tilde{h}^{[\theta_N]}_{t_p:h_p}, h^{[\theta_N]}_{t_p:h_p})$ on the right-hand side in Fig. 5b. Furthermore, we can see that there are several cases of $t_p$ with negative correlations $r(\tilde{h}^{[\theta_N]}_{t_p:h_p}, h^{[\theta_N]}_{t_p:h_p})$ in (b), and the corresponding predictable horizons $h^{[\theta_{\sigma(1)}]}_{t_p:h_p}$ in (a) are shorter than the neighboring (w.r.t. $t_p$) horizons. This correspondence seems reasonable because negative correlation does not contribute to the selection of the prediction with a large predictable horizon. Thus, we have to remove the cases of negative correlations.
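For clarity, the correlations of Fig. 5b are sample (Pearson) correlation coefficients computed, at each start time $t_p$, over the plausible predictions. A minimal sketch, assuming the per-prediction similarities, LOOCV horizons, and actual horizons have been collected into numpy arrays:

```python
import numpy as np

def selection_correlations(S, h_tilde, h_actual):
    """Correlations of Fig. 5b at one start time t_p:
    r(S, h) between attractor similarity and actual predictable horizon,
    and r(h_tilde, h) between LOOCV and actual predictable horizons."""
    r_S = float(np.corrcoef(S, h_actual)[0, 1])
    r_h = float(np.corrcoef(h_tilde, h_actual)[0, 1])
    return r_S, r_h
```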
So far, we have two approaches for this: one is to improve the performance of the learning machine further, as we have done with the bagging method in this paper; the other is to refine the selection method by modifying the LOOCV predictable horizon or by developing new methods. Actually, we have predictions with much longer predictable horizons that are not shown in this paper, but so far we cannot select such predictions without knowing the ground truth time series.

Fig. 5 Experimental results of a the predictable horizons $h^{[\theta_{\sigma(1)}]}_{t_p:h_p}$ and $h^{[\theta_{\sigma_S(1)}]}_{t_p:h_p}$, and b the correlations $r(\tilde{h}^{[\theta_N]}_{t_p:h_p}, h^{[\theta_N]}_{t_p:h_p})$ and $r(S^{[\theta_N]}_{t_p:h_p}, h^{[\theta_N]}_{t_p:h_p})$, for single CAN2s (left) and bagging CAN2s (right)

4 Conclusion

We have presented a performance improvement in the method for probabilistic prediction of chaotic time series by means of using bagging learning machines. The method obtains a set of plausible predictions by using the similarity of attractors between training and predicted time series, and then provides the representative prediction which has the longest LOOCV predictable horizon. By executing numerical experiments using single and bagging CAN2s, we have shown that the bagging CAN2 improves the performance of the single CAN2, and we have analyzed the relationship between LOOCV and actual predictable horizons. In our future research, we would like to overcome the problem of negative correlation between the achieved predictable horizon and the LOOCV predictable horizon, i.e., the measure for selecting the representative prediction.

Compliance with ethical standards

Conflict of interest: The authors declare no conflicts of interest associated with this article.

Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

1. Aihara K (2000) Theories and applications of chaotic time series analysis. Sangyo Tosho, Tokyo
2. Lendasse A, Oja E (2004) Time series prediction competition: the CATS benchmark. Proc IJCNN 2004:1615-1620
3. Kurogi S, Ueno T, Sawa M (2007) Time series prediction of the CATS benchmark using Fourier bandpass filters and competitive associative nets. Neurocomputing 70(13-15):2354-2362
4. Kurogi S, Tanaka S, Koyama R (2007) Combining the predictions of a time series and the first-order difference using bagging of competitive associative nets. In: Proceedings of the European symposium on time series prediction (ESTSP) 2007, pp 123-131
5. Kurogi S, Ono K, Nishida T (2013) Experimental analysis of moments of predictive deviations as ensemble diversity measures for model selection in time series prediction. In: Proceedings of ICONIP 2013, Part III, LNCS 8228. Springer, Heidelberg
6. Kurogi S, Shigematsu R, Ono K (2014) Properties of direct multistep ahead prediction of chaos time series and out-of-bag estimate for model selection. In: Proceedings of ICONIP 2014, Part II, LNCS 8835. Springer, Heidelberg
7. Kurogi S, Toidani M, Shigematsu R, Matsuo K (2015) Prediction of chaotic time series using similarity of attractors and LOOCV predictable horizons for obtaining plausible predictions.
In: Proceedings of ICONIP 2015, LNCS 9491, pp 72-81
8. Slingo J, Palmer T (2011) Uncertainty in weather and climate prediction. Phil Trans R Soc A 369:4751-4767
9. Quiñonero-Candela J, Rasmussen CE, Sinz FH, Bousquet O, Schölkopf B (2006) Evaluating predictive uncertainty challenge. In: Quiñonero-Candela J et al (eds) MLCW 2005, LNAI 3944. Springer, Heidelberg, pp 1-27
10. Kurogi S, Sawa M, Tanaka S (2006) Competitive associative nets and cross-validation for estimating predictive uncertainty on regression problems. Lecture Notes in Artificial Intelligence (LNAI) 3944:78-94
11. Breiman L (1996) Bagging predictors. Mach Learn 24:123-140
12. Kurogi S (2009) Improving generalization performance via out-of-bag estimate using variable size of bags. J Jpn Neural Netw Soc 16(2):81-92
13. Efron B, Tibshirani R (1997) Improvements on cross-validation: the .632+ bootstrap method. J Am Stat Assoc 92:548-560



Shuichi Kurogi, Mitsuki Toidani, Ryosuke Shigematsu, Kazuya Matsuo. Performance improvement via bagging in probabilistic prediction of chaotic time series using similarity of attractors and LOOCV predictable horizon, Neural Computing and Applications, 2017, 1-9, DOI: 10.1007/s00521-017-3149-7