Modeling of holidays and special events
If you have holidays or other recurring events that you would like to model, you must create a dataframe for them. It has two columns (holiday and ds) and a row for each occurrence of the holiday. It must include all occurrences of the holiday, both in the past (back as far as the historical data go) and in the future (out as far as the forecast is being made). If a holiday does not repeat in the future, Prophet will model it and then not include it in the forecast.
You can also include the columns lower_window and upper_window, which extend the holiday out to [lower_window, upper_window] days around the date. For instance, to include Christmas Eve in addition to Christmas you would set lower_window=-1, upper_window=0. To include Black Friday in addition to Thanksgiving, you would set lower_window=0, upper_window=1. You can also include a column prior_scale to set the prior scale separately for each holiday, as described below.
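To make the window columns concrete, here is a minimal sketch of which calendar dates one holiday row covers. It assumes the window is inclusive at both ends and that lower_window is zero or negative; the helper name expand_holiday is hypothetical and is not part of Prophet.

```python
from datetime import date, timedelta

def expand_holiday(ds, lower_window=0, upper_window=0):
    """Return every date covered by one holiday row.

    Assumption for this sketch: lower_window <= 0 <= upper_window and
    both ends of the window are inclusive.
    """
    return [ds + timedelta(days=offset)
            for offset in range(lower_window, upper_window + 1)]

christmas = date(2015, 12, 25)
thanksgiving = date(2015, 11, 26)

# Christmas Eve plus Christmas Day
christmas_days = expand_holiday(christmas, lower_window=-1, upper_window=0)
# Thanksgiving plus Black Friday
thanksgiving_days = expand_holiday(thanksgiving, lower_window=0, upper_window=1)

print(christmas_days)
print(thanksgiving_days)
```

Each covered day gets its own effect in the fitted model, which is why the forecast tables in this section show different values for the holiday date and the day after it.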
Here we create a dataframe that contains the dates of all of Peyton Manning's playoff appearances:
# R
library(dplyr)
playoffs <- tibble(
  holiday = 'playoff',
  ds = as.Date(c('2008-01-13', '2009-01-03', '2010-01-16',
                 '2010-01-24', '2010-02-07', '2011-01-08',
                 '2013-01-12', '2014-01-12', '2014-01-19',
                 '2014-02-02', '2015-01-11', '2016-01-17',
                 '2016-01-24', '2016-02-07')),
  lower_window = 0,
  upper_window = 1
)
superbowls <- tibble(
  holiday = 'superbowl',
  ds = as.Date(c('2010-02-07', '2014-02-02', '2016-02-07')),
  lower_window = 0,
  upper_window = 1
)
# One table with a row per holiday occurrence
holidays <- bind_rows(playoffs, superbowls)

# Python
import pandas as pd

playoffs = pd.DataFrame({
    'holiday': 'playoff',
    'ds': pd.to_datetime(['2008-01-13', '2009-01-03', '2010-01-16',
                          '2010-01-24', '2010-02-07', '2011-01-08',
                          '2013-01-12', '2014-01-12', '2014-01-19',
                          '2014-02-02', '2015-01-11', '2016-01-17',
                          '2016-01-24', '2016-02-07']),
    'lower_window': 0,
    'upper_window': 1,
})
superbowls = pd.DataFrame({
    'holiday': 'superbowl',
    'ds': pd.to_datetime(['2010-02-07', '2014-02-02', '2016-02-07']),
    'lower_window': 0,
    'upper_window': 1,
})
# One table with a row per holiday occurrence
holidays = pd.concat((playoffs, superbowls))
Above we have included the superbowl days as both playoff games and superbowl games. This means that the superbowl effect is an additional additive bonus on top of the playoff effect.
Once the table is created, holiday effects are included in the forecast by passing it in with the holidays argument. Here we do it with the Peyton Manning data from the Quick Start:
# R
m <- prophet(df, holidays = holidays)
forecast <- predict(m, future)

# Python
from prophet import Prophet

m = Prophet(holidays=holidays)
forecast = m.fit(df).predict(future)
The holiday effect can be seen in the forecast dataframe:
# R
forecast %>%
  select(ds, playoff, superbowl) %>%
  filter(abs(playoff + superbowl) > 0) %>%
  tail(10)

# Python
# Keep only rows where at least one holiday effect is non-zero
forecast[(forecast['playoff'] + forecast['superbowl']).abs() > 0][
        ['ds', 'playoff', 'superbowl']][-10:]
index | ds | playoff | superbowl |
---|---|---|---|
2190 | 2014-02-02 | 1.223965 | 1.201517 |
2191 | 2014-02-03 | 1.901742 | 1.460471 |
2532 | 2015-01-11 | 1.223965 | 0.000000 |
2533 | 2015-01-12 | 1.901742 | 0.000000 |
2901 | 2016-01-17 | 1.223965 | 0.000000 |
2902 | 2016-01-18 | 1.901742 | 0.000000 |
2908 | 2016-01-24 | 1.223965 | 0.000000 |
2909 | 2016-01-25 | 1.901742 | 0.000000 |
2922 | 2016-02-07 | 1.223965 | 1.201517 |
2923 | 2016-02-08 | 1.901742 | 1.460471 |
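Because superbowl dates were also listed as playoff dates, the two effects stack on those days. As a small worked example, using the values from the 2014-02-02 row of the table (the variable names are only illustrative):

```python
# Values copied from the 2014-02-02 row of the table above.
playoff_effect = 1.223965
superbowl_effect = 1.201517

# Both holiday rows cover this date, so their effects add up.
total_holiday_effect = playoff_effect + superbowl_effect
print(round(total_holiday_effect, 6))
```

On an ordinary playoff date such as 2015-01-11, the superbowl term is zero and only the playoff effect remains.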
The holiday effects will also show up in the component chart, where we see that there is a spike in the days surrounding playoff appearances, with a particularly large spike for the Superbowl:
# R
prophet_plot_components(m, forecast)

# Python
fig = m.plot_components(forecast)
Individual holidays can be plotted using the plot_forecast_component function (imported from prophet.plot in Python), for example plot_forecast_component(m, forecast, 'superbowl') to plot just the superbowl holiday component.
Built-in country holidays
You can use a built-in collection of country-specific holidays with the add_country_holidays method (Python) or function (R). You specify the name of the country, and the major holidays for that country are then included in addition to any holidays passed via the holidays argument described above:
# R
m <- prophet(holidays = holidays)
m <- add_country_holidays(m, country_name = 'US')
m <- fit.prophet(m, df)

# Python
m = Prophet(holidays=holidays)
m.add_country_holidays(country_name='US')
m.fit(df)
You can see which holidays were included by looking at the train_holiday_names (Python) or train.holiday.names (R) attribute of the model:
# R
m$train.holiday.names

 [1] "playoff"                     "superbowl"
 [3] "New Year's Day"              "Martin Luther King Jr. Day"
 [5] "Washington's Birthday"       "Memorial Day"
 [7] "Independence Day"            "Labor Day"
 [9] "Columbus Day"                "Veterans Day"
[11] "Veterans Day (Observed)"     "Thanksgiving"
[13] "Christmas Day"               "Independence Day (Observed)"
[15] "Christmas Day (Observed)"    "New Year's Day (Observed)"

# Python
m.train_holiday_names

0                         playoff
1                       superbowl
2                  New Year's Day
3      Martin Luther King Jr. Day
4           Washington's Birthday
5                    Memorial Day
6                Independence Day
7                       Labor Day
8                    Columbus Day
9                    Veterans Day
10                   Thanksgiving
11                  Christmas Day
12       Christmas Day (Observed)
13        Veterans Day (Observed)
14    Independence Day (Observed)
15      New Year's Day (Observed)
dtype: object
The holidays for each country are provided by the holidays package in Python. For a list of available countries and the country name to use, see its page: https://github.com/dr-prodigy/python-holidays. In addition to those countries, Prophet includes holidays for these countries: Brazil (BR), Indonesia (ID), India (IN), Malaysia (MY), Vietnam (VN), Thailand (TH), Philippines (PH), Pakistan (PK), Bangladesh (BD), Egypt (EG), China (CN), Russia (RU), Korea (KR), Belarus (BY), and United Arab Emirates (AE).
In Python, most holidays are computed deterministically and so are available for any date range. A warning is raised if dates fall outside the range supported for that country. In R, holiday dates are computed for 1995 through 2044 and stored in the package as data-raw/generated_holidays.csv. If a wider date range is needed, this script can be used to replace that file with a different date range: https://github.com/facebook/prophet/blob/main/python/scripts/generate_holidays_file.py.
As above, the country-level holidays are then displayed in the component chart:
# R
forecast <- predict(m, future)
prophet_plot_components(m, forecast)

# Python
forecast = m.predict(future)
fig = m.plot_components(forecast)
Fourier order for seasonality
Seasonalities are estimated using a partial Fourier sum. See the paper for complete details, and this figure on Wikipedia for an illustration of how a partial Fourier sum can approximate an arbitrary periodic signal. The number of terms in the partial sum (the order) is a parameter that determines how quickly the seasonality can change. To illustrate this, consider the Peyton Manning data from the Quick Start. The default Fourier order for yearly seasonality is 10, which produces this fit:
# R
m <- prophet(df)
prophet:::plot_yearly(m)

# Python
from prophet.plot import plot_yearly
m = Prophet().fit(df)
a = plot_yearly(m)
The default values are often appropriate, but they can be increased when the seasonality needs to fit higher-frequency changes, at the cost of a generally less smooth curve. The Fourier order can be specified for each built-in seasonality when instantiating the model; here it is increased to 20:
# R
m <- prophet(df, yearly.seasonality = 20)
prophet:::plot_yearly(m)

# Python
from prophet.plot import plot_yearly
m = Prophet(yearly_seasonality=20).fit(df)
a = plot_yearly(m)
Increasing the number of Fourier terms allows the seasonality to fit faster-changing cycles, but can also lead to overfitting: N Fourier terms correspond to 2N variables used for modeling the cycle.
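The count of 2N variables follows from each Fourier term contributing one sine and one cosine column. A minimal sketch, assuming time is measured in days and a yearly period of 365.25 days (the function name fourier_features is hypothetical and is not Prophet's API):

```python
import math

def fourier_features(t_days, period, order):
    """Return the 2 * order Fourier terms for one time point (in days)."""
    features = []
    for n in range(1, order + 1):
        angle = 2.0 * math.pi * n * t_days / period
        features.append(math.sin(angle))  # sine column for term n
        features.append(math.cos(angle))  # cosine column for term n
    return features

# Assumed yearly period of 365.25 days
yearly_default = fourier_features(t_days=100.0, period=365.25, order=10)
yearly_flexible = fourier_features(t_days=100.0, period=365.25, order=20)

print(len(yearly_default))   # number of columns at order 10
print(len(yearly_flexible))  # number of columns at order 20
```

Doubling the order doubles the number of coefficients the model has to estimate, which is where the overfitting risk comes from.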
Setting custom seasonalities
By default, Prophet fits weekly and yearly seasonalities if the time series is more than two cycles long. It also fits daily seasonality for a sub-daily time series. You can add other seasonalities (monthly, quarterly, hourly) using the add_seasonality method (Python) or function (R).
The inputs to this function are a name, the period of the seasonality in days, and the Fourier order for the seasonality. For reference, by default Prophet uses a Fourier order of 3 for weekly seasonality and 10 for yearly seasonality. An optional input to add_seasonality is the prior scale for that seasonal component, which is discussed below.
As an example, here we fit the Peyton Manning data from the Quick Start, but replace the weekly seasonality with monthly seasonality. The monthly seasonality then appears in the components plot:
# R
m <- prophet(weekly.seasonality=FALSE)
m <- add_seasonality(m, name='monthly', period=30.5, fourier.order=5)
m <- fit.prophet(m, df)
forecast <- predict(m, future)
prophet_plot_components(m, forecast)

# Python
m = Prophet(weekly_seasonality=False)
m.add_seasonality(name='monthly', period=30.5, fourier_order=5)
forecast = m.fit(df).predict(future)
fig = m.plot_components(forecast)
Seasonalities that depend on other factors
In some cases the seasonality may depend on other factors, such as a weekly seasonal pattern that is different during the summer than during the rest of the year, or a daily seasonal pattern that is different on weekends than on weekdays. These types of seasonalities can be modeled using conditional seasonalities.
Consider the Peyton Manning example from the Quick Start. The default weekly seasonality assumes that the weekly pattern is the same throughout the year, but we would expect it to differ between the on-season (when games are played every Sunday) and the off-season. We can use conditional seasonalities to construct separate on-season and off-season weekly seasonalities.
First, let's add a boolean column to the dataframe that indicates whether each date is in season or in the off-season:
# R
is_nfl_season <- function(ds) {
  dates <- as.Date(ds)
  month <- as.numeric(format(dates, '%m'))
  # September through January counts as on-season
  return(month > 8 | month < 2)
}
df$on_season <- is_nfl_season(df$ds)
df$off_season <- !is_nfl_season(df$ds)

# Python
def is_nfl_season(ds):
    date = pd.to_datetime(ds)
    # September through January counts as on-season
    return (date.month > 8 or date.month < 2)

df['on_season'] = df['ds'].apply(is_nfl_season)
df['off_season'] = ~df['ds'].apply(is_nfl_season)
Then we disable the built-in weekly seasonality and replace it with two weekly seasonalities that have these columns specified as a condition. This means that the seasonality is only applied to dates where the condition_name column is True. We must also add the columns to the future dataframe for which we are making predictions.
# R
m <- prophet(weekly.seasonality=FALSE)
m <- add_seasonality(m, name='weekly_on_season', period=7, fourier.order=3, condition.name='on_season')
m <- add_seasonality(m, name='weekly_off_season', period=7, fourier.order=3, condition.name='off_season')
m <- fit.prophet(m, df)

future$on_season <- is_nfl_season(future$ds)
future$off_season <- !is_nfl_season(future$ds)
forecast <- predict(m, future)
prophet_plot_components(m, forecast)

# Python
m = Prophet(weekly_seasonality=False)
m.add_seasonality(name='weekly_on_season', period=7, fourier_order=3, condition_name='on_season')
m.add_seasonality(name='weekly_off_season', period=7, fourier_order=3, condition_name='off_season')

future['on_season'] = future['ds'].apply(is_nfl_season)
future['off_season'] = ~future['ds'].apply(is_nfl_season)
forecast = m.fit(df).predict(future)
fig = m.plot_components(forecast)
Both seasonalities are now shown in the component charts above. We can see that during the on-season when games are played every Sunday, there are big increases on Sunday and Monday that are completely absent during the off-season.
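One way to picture a conditional seasonality is as an ordinary set of weekly Fourier terms that is switched off (set to zero) on dates where the condition is False. The sketch below illustrates that idea with the standard library only; it is an assumption about the mechanism for illustration, not a copy of Prophet's internals, and the helper names are hypothetical.

```python
import math
from datetime import date

def is_nfl_season(ds):
    """Same rule as in the text: September through January."""
    return ds.month > 8 or ds.month < 2

def weekly_fourier(ds, order=3):
    """Weekly Fourier terms (period 7 days) for a single date."""
    t = ds.toordinal()
    terms = []
    for n in range(1, order + 1):
        angle = 2.0 * math.pi * n * t / 7.0
        terms.extend([math.sin(angle), math.cos(angle)])
    return terms

def conditional_terms(ds, condition, order=3):
    """Keep the weekly terms when the condition holds, else zero them out."""
    terms = weekly_fourier(ds, order)
    return terms if condition else [0.0] * len(terms)

in_season_sunday = date(2015, 10, 4)
off_season_sunday = date(2015, 6, 7)

for day in (in_season_sunday, off_season_sunday):
    on_terms = conditional_terms(day, is_nfl_season(day))
    off_terms = conditional_terms(day, not is_nfl_season(day))
    print(day, "on-season active:", any(on_terms),
          "off-season active:", any(off_terms))
```

For every date exactly one of the two groups is active, so each weekly pattern is estimated only from the dates in its own part of the year.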
Prior scale for holidays and seasonality
If you find that the holidays are overfitting, you can adjust their prior scale to smooth them using the parameter holidays_prior_scale. By default this parameter is 10, which provides very little regularization. Reducing this parameter dampens holiday effects:
# R
m <- prophet(df, holidays = holidays, holidays.prior.scale = 0.05)
forecast <- predict(m, future)
forecast %>%
  select(ds, playoff, superbowl) %>%
  filter(abs(playoff + superbowl) > 0) %>%
  tail(10)

# Python
m = Prophet(holidays=holidays, holidays_prior_scale=0.05).fit(df)
forecast = m.predict(future)
forecast[(forecast['playoff'] + forecast['superbowl']).abs() > 0][
    ['ds', 'playoff', 'superbowl']][-10:]
index | ds | playoff | superbowl |
---|---|---|---|
2190 | 2014-02-02 | 1.206086 | 0.964914 |
2191 | 2014-02-03 | 1.852077 | 0.992634 |
2532 | 2015-01-11 | 1.206086 | 0.000000 |
2533 | 2015-01-12 | 1.852077 | 0.000000 |
2901 | 2016-01-17 | 1.206086 | 0.000000 |
2902 | 2016-01-18 | 1.852077 | 0.000000 |
2908 | 2016-01-24 | 1.206086 | 0.000000 |
2909 | 2016-01-25 | 1.852077 | 0.000000 |
2922 | 2016-02-07 | 1.206086 | 0.964914 |
2923 | 2016-02-08 | 1.852077 | 0.992634 |
The magnitude of the holiday effect has been reduced compared to before, especially for superbowls, which had the fewest observations. There is a parameter seasonality_prior_scale which similarly adjusts the extent to which the seasonality model will fit the data.
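To see why a holiday with fewer observations is damped more, consider a deliberately simplified model: a single effect observed n times with Gaussian noise and a zero-mean Normal prior whose standard deviation is the prior scale. This is only a sketch of the general shrinkage behaviour, not Prophet's actual fitting procedure, and the numbers (noise level, prior scales) are made up for illustration.

```python
def shrunk_effect(observed_mean, n_obs, prior_scale, noise_sd=1.0):
    """Posterior mean of one effect under a Normal(0, prior_scale) prior.

    Simplifying assumptions: n_obs independent observations of the
    effect, Gaussian noise with standard deviation noise_sd.
    """
    ratio = (noise_sd / prior_scale) ** 2
    return observed_mean * n_obs / (n_obs + ratio)

# Hypothetical effect of size 1.0 observed 14 times versus 3 times
for prior_scale in (10.0, 0.5):
    many = shrunk_effect(1.0, n_obs=14, prior_scale=prior_scale)
    few = shrunk_effect(1.0, n_obs=3, prior_scale=prior_scale)
    print(prior_scale, round(many, 4), round(few, 4))
```

With the wide prior both estimates stay close to the observed value; with the tighter prior both shrink toward zero, and the effect backed by only three observations shrinks the most.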
Prior scales can be set separately for individual holidays by including a column prior_scale in the holidays dataframe. Prior scales for individual seasonalities can be passed as an argument to add_seasonality. For instance, the prior scale for just the weekly seasonality can be set using:
# R
m <- prophet()
m <- add_seasonality(
  m, name='weekly', period=7, fourier.order=3, prior.scale=0.1)

# Python
m = Prophet()
m.add_seasonality(
    name='weekly', period=7, fourier_order=3, prior_scale=0.1)
Additional regressors
Additional regressors can be added to the linear part of the model using the add_regressor method or function. A column with the regressor value must be present in both the fitting and prediction dataframes. For example, we can add an additional effect on Sundays during the NFL season. On the components plot, this effect shows up in the extra_regressors plot:
# R
nfl_sunday <- function(ds) {
  dates <- as.Date(ds)
  month <- as.numeric(format(dates, '%m'))
  # 1 on Sundays from September through January, 0 otherwise
  as.numeric((weekdays(dates) == "Sunday") & (month > 8 | month < 2))
}
df$nfl_sunday <- nfl_sunday(df$ds)

m <- prophet()
m <- add_regressor(m, 'nfl_sunday')
m <- fit.prophet(m, df)

future$nfl_sunday <- nfl_sunday(future$ds)

forecast <- predict(m, future)
prophet_plot_components(m, forecast)

# Python
def nfl_sunday(ds):
    date = pd.to_datetime(ds)
    # 1 on Sundays from September through January, 0 otherwise
    if date.weekday() == 6 and (date.month > 8 or date.month < 2):
        return 1
    else:
        return 0
df['nfl_sunday'] = df['ds'].apply(nfl_sunday)

m = Prophet()
m.add_regressor('nfl_sunday')
m.fit(df)

future['nfl_sunday'] = future['ds'].apply(nfl_sunday)

forecast = m.predict(future)
fig = m.plot_components(forecast)
NFL Sundays could also have been handled using the "holidays" interface described above, by creating a list of past and future NFL Sundays. The add_regressor function provides a more general interface for defining extra linear regressors, and in particular does not require that the regressor be a binary indicator. Another time series could be used as a regressor, although its future values would have to be known.
This notebook shows an example of using weather factors as extra regressors in a forecast of bicycle usage, and provides a good illustration of how other time series can be included as extra regressors.
The add_regressor function has optional arguments for specifying the prior scale (the holiday prior scale is used by default) and whether or not the regressor is standardized; see the docstring with help(Prophet.add_regressor) in Python and ?add_regressor in R. Note that regressors must be added prior to model fitting. Prophet will also raise an error if the regressor is constant throughout the history, since there is nothing to fit from it.
The extra regressor must be known for both the history and for future dates. It thus must either be something that has known future values (such as nfl_sunday), or something that has separately been forecasted elsewhere. The weather regressors used in the notebook linked above are a good example of an extra regressor that has forecasts that can be used for future values. One can also use as a regressor another time series that has been forecasted with a time series model, such as Prophet. For instance, if r(t) is included as a regressor for y(t), Prophet can be used to forecast r(t) and then that forecast can be plugged in as the future values when forecasting y(t). A note of caution around this approach: it will probably not be useful unless r(t) is somehow easier to forecast than y(t). This is because error in the forecast of r(t) will produce error in the forecast of y(t). One setting where this can be useful is in hierarchical time series, where the top-level forecast has a higher signal-to-noise ratio and is thus easier to forecast. Its forecast can be included in the forecast for each lower-level series.
Extra regressors are put in the linear component of the model, so the underlying model is that the time series depends on the extra regressor as either an additive or a multiplicative factor (see the next section for multiplicativity).
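The difference between the two forms can be sketched with a trend value, one regressor value x, and a coefficient beta. The function names and numbers below are hypothetical; the sketch only shows the shape of the dependence, with seasonality and holidays left out.

```python
def predict_additive(trend, beta, x):
    """Additive regressor: shifts the prediction by beta * x."""
    return trend + beta * x

def predict_multiplicative(trend, beta, x):
    """Multiplicative regressor: scales the trend by (1 + beta * x)."""
    return trend * (1.0 + beta * x)

for trend in (100.0, 200.0):
    additive = predict_additive(trend, beta=10.0, x=1.0)
    multiplicative = predict_multiplicative(trend, beta=0.1, x=1.0)
    print(trend, additive - trend, multiplicative - trend)
```

The additive effect is the same size at both trend levels, while the multiplicative effect grows in proportion to the trend.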
Coefficients of additional regressors
To extract the beta coefficients of the extra regressors, use the utility function regressor_coefficients (from prophet.utilities import regressor_coefficients in Python, prophet::regressor_coefficients in R) on the fitted model. The estimated beta coefficient for each regressor roughly represents the increase in the predicted value for a unit increase in the regressor value (note that the coefficients returned are always on the scale of the original data). If mcmc_samples is specified, a credible interval for each coefficient is also returned, which can help identify whether each regressor is "statistically significant".
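As a rough illustration of how a credible interval is read, the sketch below computes a central 95% interval from a list of posterior draws and checks whether it excludes zero. The draws here are simulated stand-ins, not output from Prophet, and the helper credible_interval is hypothetical; it does not describe how regressor_coefficients computes its interval.

```python
import random

def credible_interval(samples, level=0.95):
    """Central interval from posterior draws using nearest-rank quantiles."""
    ordered = sorted(samples)
    tail = (1.0 - level) / 2.0
    lower_index = int(tail * (len(ordered) - 1))
    upper_index = int(round((1.0 - tail) * (len(ordered) - 1)))
    return ordered[lower_index], ordered[upper_index]

# Simulated stand-in for posterior draws of one coefficient
rng = random.Random(0)
draws = [rng.gauss(0.5, 0.1) for _ in range(1000)]

lower, upper = credible_interval(draws)
excludes_zero = lower > 0.0 or upper < 0.0
print(round(lower, 3), round(upper, 3), excludes_zero)
```

An interval that lies entirely on one side of zero suggests the regressor has a consistent effect; an interval that straddles zero suggests the data do not pin down even its sign.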