The GitHub repository is not yet available; it will be published in the next update.
Methodology
In summary, the search engine is built on a ridge regression model with a predefined penalty; the regressor and regressand are defined by the user. The search engine has two main functions:
1) Find potential features that can predict the target series
2) Explore what the target series can predict
Once the search options are defined, the search engine performs the following steps:
1) Obtain the relevant X based on the start and end dates of the target series (y). A series is selected only if it has complete data between the start and end dates. Selection also depends on the frequency of y: e.g., if y is monthly, then “Monthly”, “Weekly” and “Daily” series are examined. These are first resampled to monthly data, using “average” and “last” as the resampling methods, and the resulting X is obtained from the database.
2) Get the forecast of X for the horizon chosen by the user; the forecast is cached in the database. This step aims to remove the trend, seasonality and autoregressive components of X, so that the remaining residuals may better capture the relationship between X and y.
3) Get the forecast of y using benchmark models, in the same way as step 2.
4) Run a ridge regression with a penalty K times on
\(\varepsilon \sim L_1(\varepsilon_X) + \text{Avg}(L(\varepsilon_X))\), or another user-defined formula,
where \(\varepsilon_X\) is the residual between \(X\) and \(\hat{X}\), \(\varepsilon\) is the residual between \(y\) and \(\hat{y}\),
and K is the number of series in the database that meet the data analysis criteria.
5) Rank the series by the RMSE of a 5-fold time-series cross-validation (TSCV), with each fold using a test set of 10% of the data.
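Steps 4 and 5 above can be sketched roughly as follows. This is a minimal illustration, not the engine's actual code. It makes several assumptions: \(L_1\) is read as the lag-1 operator and \(\text{Avg}(L(\cdot))\) as the mean of the first few lags (3 here); the ridge fit uses the closed-form solution without an intercept; the penalty value of 1.0 is a placeholder; and the five test folds are taken as consecutive 10% blocks at the end of the sample with an expanding training window. All function names are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, penalty):
    """Closed-form ridge: solve (X'X + penalty * I) b = X'y (no intercept)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + penalty * np.eye(p), X.T @ y)

def lag_features(eps_x, n_lags=3):
    """Build two columns: lag-1 of eps_x and the mean of lags 1..n_lags.

    Row i corresponds to target time t = n_lags + i.
    Assumed reading of L_1(.) and Avg(L(.)) in the formula.
    """
    n = len(eps_x)
    lags = np.column_stack(
        [eps_x[n_lags - j : n - j] for j in range(1, n_lags + 1)]
    )
    return np.column_stack([lags[:, 0], lags.mean(axis=1)])

def tscv_rmse(X, y, penalty=1.0, n_folds=5, test_frac=0.10):
    """Average RMSE over n_folds expanding-window folds of size test_frac * n."""
    n = len(y)
    test_size = int(n * test_frac)
    if test_size < 1 or n - n_folds * test_size < 1:
        raise ValueError("series too short for the requested folds")
    fold_rmse = []
    for k in range(n_folds):
        test_end = n - (n_folds - 1 - k) * test_size
        test_start = test_end - test_size
        # Train only on observations that precede the test block.
        beta = ridge_fit(X[:test_start], y[:test_start], penalty)
        pred = X[test_start:test_end] @ beta
        fold_rmse.append(np.sqrt(np.mean((y[test_start:test_end] - pred) ** 2)))
    return float(np.mean(fold_rmse))

def rank_candidates(eps_y, candidates, penalty=1.0, n_lags=3):
    """Score each of the K candidate residual series; lowest RMSE first."""
    y = np.asarray(eps_y, dtype=float)[n_lags:]
    scores = {}
    for name, eps_x in candidates.items():
        X = lag_features(np.asarray(eps_x, dtype=float), n_lags)
        scores[name] = tscv_rmse(X, y, penalty)
    return sorted(scores.items(), key=lambda item: item[1])
```

Calling `rank_candidates(eps_y, {"series_a": eps_a, "series_b": eps_b})` returns a list of `(name, rmse)` pairs sorted from best to worst, one regression set per candidate series, which mirrors the "K times" loop described in step 4.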
Note: the forecasts are computed after a warm-up period, which is a minimum of 200 points and 40% of the original data length, followed by a rolling 1-step forecast. The rolling forecast is intended to prevent data leakage.
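The rolling scheme in the note can be illustrated with a short sketch. It is written under stated assumptions rather than taken from the engine: the warm-up rule is read here as "at least 200 points and at least 40% of the series" (so the larger of the two is used; the other reading would replace `max` with `min`), residuals are taken as actual minus forecast, and the naive last-value forecaster stands in for whatever benchmark models the engine actually uses. All names are hypothetical.

```python
def warm_up_length(n, min_points=200, min_percent=40):
    """Warm-up size for a series of length n.

    Assumption: both thresholds must hold, so the larger one wins.
    Integer arithmetic is used for the ceiling of n * min_percent / 100.
    """
    return max(min_points, (n * min_percent + 99) // 100)

def last_value_forecast(history):
    """Hypothetical benchmark model: predict the most recent observation."""
    return history[-1]

def rolling_one_step_residuals(series, forecast_fn, warm_up):
    """Residuals (actual - forecast) for every point after the warm-up.

    Each forecast for time t sees only series[:t], so no future value
    leaks into the prediction.
    """
    residuals = []
    for t in range(warm_up, len(series)):
        history = series[:t]
        residuals.append(series[t] - forecast_fn(history))
    return residuals
```

Because each forecast is built only from observations strictly before the point being predicted, the resulting residuals can be passed to the regression step without the forecaster having seen the values it is evaluated on.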