使用Prophet预言家进行时间序列预测 - 文章 - 开发者社区

prophet是facebook在2017年开源的强大的时间序列预测工具。

prophet(读作 ˈprɒfɪt)这个英文单词的意思是 先知，预言家 (没错，就是天黑请睁眼的那位😋)。顾名思义，它能够预测未来。

picture.image

Prophet是一个设计精妙的单层的回归模型， 特别适合对具有明显季节周期性(如气温，商品销量，交通流量等)的时间序列进行预测 ，并具有强大的解释性。

我们将简要介绍Prophet框架的算法原理，并以一个开源的能源消耗时间序列数据预测为例，展示prophet的使用方法和强大能力。

公众号 算法美食屋 后台回复关键词： prophet ，获取本教程notebook代码和数据集下载链接~

预测效果展示：

picture.image

〇，Prophet原理概述

1，prophet的优点：

1，拟合能力强。可以拟合时间序列数据中的趋势特性，周期特性，以及节假日时间/特殊事件影响等，可以返回置信区间作为预测结果。

2，对噪声鲁棒。趋势项中引入了changepoints，模型的参数量远远小于lstm等深度方案，不容易过拟合，收敛迅速。

3，模型解释性好。提供了强大的可视化分析辅助工具，便于分析趋势，不同周期，不同节假日/特殊事件各自的贡献。

2，prophet的缺点：

1，不适用协变多维序列。prophet仅仅能够对单个时间序列建模(例如某地气温)，不能够对协变的多个序列同时建模(例如沪深300支股票走势)。

2，无法进行自动化复杂特征抽取。受prophet模型相对简单的假设空间的限制，它无法对输入特征进行交叉组合变换等自动化抽取操作。

3，prophet的原理：

prophet是一个加法模型，将时间序列分解成趋势项，周期项，节假日项/特殊事件影响，以及残差项的组合。

【注：根据需要，也可以将周期项和节假日项/特殊事件影响设置为乘数，而非加数】

1，其中趋势项被拟合成分段线性函数(默认) 或者分段logistic函数(存在上下限的情形，如虫口模型，病毒传播等存在容量上限)。

2，周期项使用有限阶(通常对一个是3到8阶)的傅里叶级数进行拟合，大大减少了参数量，避免对噪声数据过拟合。

3，节假日项/特殊事件项可以作为点特征或者区间特征引入，并支持自定义不同类型的节假日或事件，还可通过add_regressor引入其它已知序列作为特征，非常灵活。


        
          
import numpy as np  
import pandas as pd  
import seaborn as sns  
import matplotlib.pyplot as plt  
plt.style.use('fivethirtyeight') # For plots

一，准备数据

我们使用的数据集是美国能源消耗数据集。该数据集包含了美国一家能源公司的长达数十年的能源消耗数据，数据分辨率为小时。

1，读取数据


        
          
dfdata = pd.read_csv('PJME\_hourly.csv',  
                   index_col=[0], parse_dates=[0])  
dfdata.columns = ['y']  #将预测目标列名称改为y   
dfdata.index.name = 'ds'   #将时间序列名称改为ds


        
          
# Color pallete for plotting  
color_pal = ["#F8766D", "#D39200", "#93AA00",  
             "#00BA38", "#00C19F", "#00B9E3",  
             "#619CFF", "#DB72FB"]  
dfdata.plot(style='.', figsize=(15,5), color=color_pal[1], title='PJM East')  
plt.show()

picture.image

2, 数据eda

我们设计一些时间日期特征来查看数据的趋势


        
          
def create\_features(df, label=None):  
    """  
    Creates time series features from datetime index.  
    """  
    df = df.copy()  
    df['date'] = df.index  
    df['hour'] = df['date'].dt.hour  
    df['dayofweek'] = df['date'].dt.dayofweek  
    df['quarter'] = df['date'].dt.quarter  
    df['month'] = df['date'].dt.month  
    df['year'] = df['date'].dt.year  
    df['dayofyear'] = df['date'].dt.dayofyear  
    df['dayofmonth'] = df['date'].dt.day  
    df['weekofyear'] = df['date'].dt.isocalendar().week #df['date'].dt.weekofyear  
      
    X = df[['hour','dayofweek','quarter','month','year',             'dayofyear','dayofmonth','weekofyear']]  
    if label:  
        y = df[label]  
        return X, y  
    return X  
  
X, y = create_features(dfdata, label='y')  
  
dfXy = pd.concat([X, y], axis=1)


        
          
# See our features and target  
dfXy.head()


        
          
sns.pairplot(dfXy.dropna(),  
             hue='hour',  
             x_vars=['hour','dayofweek',  
                     'year','weekofyear'],  
             y_vars='y',  
             kind='scatter',  
             height=5,  
             plot_kws={'alpha':0.05, 'linewidth':0},  
             palette='husl'  
            )  
plt.suptitle('Power Use MW by Hour, Day of Week, Year and Week of Year')  
plt.show()

picture.image

3, 数据分割


        
          
split_date = '2015-01-01'  
dftrain = dfdata.loc[dfdata.index <= split_date].copy()  
dftest = dfdata.loc[dfdata.index > split_date].copy()


        
          
dftest \  
    .rename(columns={'y': 'TEST SET'}) \  
    .join(dftrain.rename(columns={'y': 'TRAINING SET'}),  
          how='outer') \  
    .plot(figsize=(15,5), title='PJM East', style='.')  
plt.show()

picture.image

二，定义模型


        
          
from prophet import Prophet  
# model = Prophet()  #使用默认参数  
  
#1，趋势项相关设置  
  
# model = Prophet(growth = 'logistic') #默认是linear  
# model = Prophet(changepoints=['2020-01-01','2023-01-01']) #手工设置changepoints  
# model = Prophet(n\_changepoints=25)   
# model = Prophet(changepoint\_range=0.8)  
# model = Prophet(changepoint\_prior\_scale=0.05) #越大越容易转变趋势  
  
  
#2，周期项相关设置  
  
# model = Prophet(yearly\_seasonality='auto')  
# model = Prophet(weekly\_seasonality=True)  
# model = Prophet(daily\_seasonality=False)  
# model = Prophet(seasonality\_mode='multiplicative') #默认是additive  
# model.add\_seasonality(name='monthly', period=30.5, fourier\_order=5, prior\_scale=0.1) #手工添加周期项  
  
#3，节假日项相关设置  
  
# model.add\_country\_holidays(country\_name='AU')  
# model = Prophet(holidays=dfholidays,holidays\_mode='multiplicative')   
# model.add\_regressor('temperature') #使用温度特征作为一个回归项因子，需要在训练集和测试集中都知道


        
          
from pandas.tseries.holiday import USFederalHolidayCalendar as calendar  
  
cal = calendar()  
dfdata['date'] = dfdata.index.date  
dfdata['is\_holiday'] = dfdata.date.isin([d.date() for d in cal.holidays()])  
dfholidays = dfdata.loc[dfdata['is\_holiday']] \  
    .reset_index() \  
    .rename(columns={'Ddate':'ds'})  
dfholidays['holiday'] = 'USFederalHoliday'  
dfholidays = dfholidays.drop(['y','date','is\_holiday'], axis=1)  
dfholidays['ds'] = pd.to_datetime(dfholidays['ds'])  
  
dfholidays.head()


        
          
model = Prophet(holidays=dfholidays)

三，训练模型


        
          
model.fit(dftrain.reset_index(), iter=10000, show_console=True)

picture.image

四，使用模型


        
          
# dftest = model.make\_future\_dataframe(periods=180)  
dftest_fcst = model.predict(df=dftest.reset_index())


        
          
# Plot the forecast  
f, ax = plt.subplots(1)  
f.set_figheight(5)  
f.set_figwidth(15)  
ax.scatter(dftest.index, dftest['y'], color='r') #真实值  
fig = model.plot(dftest_fcst,  
                 ax=ax) #预测值以及范围  
plt.show()

picture.image


        
          
# 加法模型结构  
display(dftest_fcst.tail())  
cols = [x for x in dftest_fcst.columns if 'lower' not in x and 'upper' not in x]   
print(cols)


        
          
# yhat = trend+additive\_terms  
print((dftest_fcst['yhat']-dftest_fcst.eval('trend+additive\_terms')).sum())  
  
# additive\_terms = daily+weekly+yearly+multiplicative\_terms+USFederalHoliday  
print((dftest_fcst['additive\_terms']-dftest_fcst.eval('daily+weekly+yearly+multiplicative\_terms+USFederalHoliday')).sum())


          
0.0  
-9.681855317467125e-11


        
          
# model explaination  
fig = model.plot_components(dftest_fcst)

picture.image

五，评估模型


        
          
from sklearn.metrics import mean_squared_error, mean_absolute_error  
  
def mean\_absolute\_percentage\_error(y\_true, y\_pred):   
    """Calculates MAPE given y\_true and y\_pred"""  
    y_true, y_pred = np.array(y_true), np.array(y_pred)  
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100  
  
  
  
mse = mean_squared_error(y_true=dftest['y'],  
                   y_pred=dftest_fcst['yhat'])  
  
mae = mean_absolute_error(y_true=dftest['y'],  
                   y_pred=dftest_fcst['yhat'])  
mape = mean_absolute_percentage_error(y_true=dftest['y'],  
                   y_pred=dftest_fcst['yhat'])  
  
print('mse = ',mse)  
print('mae = ',mae)  
print('mape = ',mape)


          
mse =  43889420.377070874mae =  5190.580798417535mape =  16.5407058132873


        
          
#真实值和预测值差异比较  
ax = dftest_fcst.set_index('ds')['yhat'].plot(figsize=(15, 5),  
                                                 lw=0,  
                                                 style='.')  
dftest['y'].plot(ax=ax,  
                  style='.',  
                  lw=0,  
                  alpha=0.2)  
plt.legend(['Forecast','Actual'])  
plt.title('Forecast vs Actuals')  
plt.show()

picture.image

六，保存模型


        
          
import json   
from prophet.serialize import model_to_json, model_from_json  
# save  
with open('prophet\_model.json', 'w') as md:      
    json.dump(model_to_json(model), md)  
      
      
# load  
with open('prophet\_model.json', 'r') as md:     
    model_loaded = model_from_json(json.load(md))

万水千山总是情，点个在看行不行！

公众号 算法美食屋 后台回复关键词： prophet ，获取本教程notebook代码和数据集下载链接~