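The author's earlier posts in this series:
cs109-energy + Harvard University Energy Exploration Project Part-1 (Project Background)
cs109-energy + Harvard University Energy Exploration Project Part-2.1 (Data Wrangling)
cs109-energy + Harvard University Energy Exploration Project Part-2.2 (Data Wrangling)
This post covers exploratory analysis of the data.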

Exploratory Analysis

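from __future__ import division  # must be the first statement; a no-op on Python 3
%matplotlib inline
from io import StringIO           # Python 2 used: from StringIO import StringIO
import requests
import numpy as np
import pandas as pd               # pandas
import matplotlib
import matplotlib.pyplot as plt   # module for plotting
import datetime as dt             # module for manipulating dates and times
import numpy.linalg as lin        # module for performing linear algebra operations

# pd.options.display.mpl_style = 'default' no longer exists in pandas;
# a matplotlib style sheet gives a similar look
plt.style.use('ggplot')

The requests library sends HTTP requests and handles the responses.
StringIO creates an in-memory text stream (io.StringIO in Python 3).
The numpy.linalg module provides linear algebra functions.
from __future__ import division makes the division operator / in Python 2.x behave as in Python 3.x, where it always returns a float. It has to be the first statement, and it has no effect on Python 3.
The original notebook ended with pd.options.display.mpl_style = 'default', which changed matplotlib's settings to give pandas plots a nicer default look. That option was deprecated in pandas 0.18 and later removed, so plt.style.use('ggplot') is used here as a close substitute.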

Monthly energy consumption

# Read the monthly data; the kBtu columns hold strings with thousands separators such as "1,234"
consumption = pd.read_csv('Data/Monthly_Energy_Gund.csv')
for col in ['CW-kBtu', 'EL-kBtu', 'ST-kBtu']:
    # Vectorized version of the original per-cell loop: strip commas, convert to float
    consumption[col] = consumption[col].astype(str).str.replace(',', '', regex=False).astype(float)
time_index = np.arange(len(consumption))

# Stacked bars: electricity at the bottom, steam on top of it, chilled water on top of both
plt.figure(figsize=(15, 7))
b1 = plt.bar(time_index, consumption['EL-kBtu'], width=0.6, color='g')
b2 = plt.bar(time_index, consumption['ST-kBtu'], bottom=consumption['EL-kBtu'], width=0.6, color='r')
b3 = plt.bar(time_index, consumption['CW-kBtu'],
             bottom=consumption['EL-kBtu'] + consumption['ST-kBtu'], width=0.6, color='b')
plt.xticks(time_index, consumption['Time'], rotation=90)  # bars are centered on time_index
plt.title('Monthly Energy consumption')
plt.xlabel('Month')
plt.ylabel('Consumption (kBtu)')
plt.legend((b1, b2, b3), ('Electricity', 'Steam', 'Chilled Water'))
plt.show()

The code above reads Monthly_Energy_Gund.csv, strips the thousands separators from every value in the three consumption columns and converts them to floats so they can be plotted.
It then builds a time index with numpy.arange and draws a stacked bar chart with matplotlib, using a different color for each month's electricity, steam and chilled water consumption. Finally, axis labels, a title and a legend complete the chart.
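Instead of stripping the commas value by value, pandas can parse them while reading. The sketch below is a minimal illustration under assumptions: it uses a tiny made-up table (not real project data) in place of Monthly_Energy_Gund.csv and assumes the numbers are quoted like "1,200". `thousands=','` tells `pd.read_csv` to treat commas as thousands separators, and a cumulative sum across the columns gives the `bottom` offset of each stacked layer.

```python
from io import StringIO

import pandas as pd

# Made-up rows in the same layout as Monthly_Energy_Gund.csv (not real data)
csv_text = '''Time,EL-kBtu,ST-kBtu,CW-kBtu
Jan-12,"1,200","2,500",300
Feb-12,"1,350","1,980",410
'''

# thousands=',' makes read_csv parse "1,200" as the number 1200
consumption = pd.read_csv(StringIO(csv_text), thousands=',')

layers = ['EL-kBtu', 'ST-kBtu', 'CW-kBtu']            # bottom-to-top stacking order
stacked_top = consumption[layers].cumsum(axis=1)      # top edge of each layer
bottoms = stacked_top.shift(1, axis=1).fillna(0)      # bottom edge = top of the layer below

print(bottoms['CW-kBtu'].tolist())      # where the chilled-water bars start
print(stacked_top['CW-kBtu'].tolist())  # total monthly consumption
```

Passing `bottom=bottoms[col]` to each `plt.bar` call would reproduce the stacking used in the chart.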

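Electricity energy consumption pattern

"Pattern" here means a regularity that repeats or shows up in some phenomenon, behavior or trend over a period of time. In the energy domain, an "electricity energy consumption pattern" describes how electricity is consumed over a period: how much is consumed, how consumption is distributed in time, and how it trends.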

First, let’s see what we can find in hourly and daily electricity energy consumption.

hourlyElectricity = pd.read_excel('Data/hourlyElectricity.xlsx')
# Keep 2011-07-03 (a Sunday) up to, but not including, 2014-10-26, i.e. whole weeks only
index = (hourlyElectricity['startTime'] >= np.datetime64('2011-07-03')) & (hourlyElectricity['startTime'] < np.datetime64('2014-10-26'))
hourlyElectricityForVisualization = hourlyElectricity.loc[index, 'electricity-kWh']
print("Data length: ", len(hourlyElectricityForVisualization) / 24 / 7, " weeks")

Select the hourly data within a specific period for analysis.
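The selection uses a half-open interval [start, end) whose bounds are both Sundays, so the result contains only whole weeks and can later be cut into rows of 168 hours. A minimal sketch on a hypothetical 20-day hourly frame shaped like hourlyElectricity (the values are placeholders):

```python
import numpy as np
import pandas as pd

# Hypothetical hourly frame shaped like hourlyElectricity: 20 days starting 2011-07-01
times = pd.date_range('2011-07-01', periods=24 * 20, freq=pd.Timedelta(hours=1))
hourly = pd.DataFrame({'startTime': times, 'electricity-kWh': np.ones(len(times))})

# Half-open interval [start, end): both bounds are Sundays, so only whole weeks are kept
start, end = pd.Timestamp('2011-07-03'), pd.Timestamp('2011-07-17')
mask = (hourly['startTime'] >= start) & (hourly['startTime'] < end)
selected = hourly.loc[mask, 'electricity-kWh']

weeks = len(selected) / (24 * 7)
print(weeks)
```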

from mpl_toolkits.axes_grid1 import make_axes_locatable

# One row per week, one column per hour of the week (24 * 7 = 168)
data = hourlyElectricityForVisualization.values
data = data.reshape((len(data) // (24 * 7), 24 * 7))  # reshape needs integer sizes

# One y tick every 4 rows (4 weeks), labeled with the date that starts that row
yTickLabels = pd.DataFrame(data=pd.date_range(start='2011-07-03', periods=len(range(0, data.shape[0], 4)), freq='4W'), columns=['datetime'])
yTickLabels['date'] = yTickLabels['datetime'].apply(lambda x: x.strftime('%Y-%m-%d'))
# x tick labels every 6 hours: 'Sun 12AM ', 'Sun 6 AM', ...
s1 = ['Sun ', 'Mon ', 'Tue ', 'Wed ', 'Thu ', 'Fri ', 'Sat ']
s2 = ['12AM ', '6 AM', '12PM', '6 PM']
s1 = np.repeat(s1, 4)
s2 = np.tile(s2, 7)
xTickLabels = np.char.add(s1, s2)

fig = plt.figure(figsize=(20, 30))
ax = plt.gca()
im = ax.imshow(data, vmin=0, vmax=500, interpolation='nearest', origin='upper')
# Create an axes on the right side of ax for the colorbar. Its width is 3%
# of ax and the padding between cax and ax is 0.2 inch.
divider = make_axes_locatable(ax)
cax = divider.append_axes("right", size="3%", pad=0.2)
ax.set_yticks(range(0, data.shape[0], 4))
ax.set_yticklabels(labels=yTickLabels['date'], fontsize=14)
ax.set_xticks(range(0, 168, 6))
ax.set_xticklabels(labels=xTickLabels, fontsize=14, rotation=90)
plt.colorbar(im, cax=cax)
plt.show()

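Plot the hourly data week by week

In the figure above, each row is one week of the selected data and the horizontal axis is the hour of the week; blank cells are missing data.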
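The heatmap relies on reshaping the 1-D hourly series into a (weeks × 168) matrix. A minimal sketch with made-up numbers shows the two details that matter: the number of rows must be computed with integer division, and a missing hour stays NaN, which `imshow` leaves blank.

```python
import numpy as np

HOURS_PER_WEEK = 24 * 7

# Two weeks of made-up hourly readings; one hour is missing (NaN)
hourly = np.arange(2 * HOURS_PER_WEEK, dtype=float)
hourly[5] = np.nan

# Integer division: reshape needs whole-number dimensions
n_weeks = len(hourly) // HOURS_PER_WEEK
week_matrix = hourly[: n_weeks * HOURS_PER_WEEK].reshape(n_weeks, HOURS_PER_WEEK)

print(week_matrix.shape)
```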

dailyElectricity = pd.read_excel('Data/dailyElectricity.xlsx')
index = (dailyElectricity['startDay'] >= np.datetime64('2011-07-03')) & (dailyElectricity['startDay'] < np.datetime64('2014-10-19'))
dailyElectricityForVisualization = dailyElectricity.loc[index, 'electricity-kWh']
print("Data length: ", len(dailyElectricityForVisualization) / 7, " weeks")

# One row per 4 weeks, one column per day (7 * 4 = 28)
data = dailyElectricityForVisualization.values
data = data.reshape((len(data) // (7 * 4), 7 * 4))

from mpl_toolkits.axes_grid1 import make_axes_locatable
# One y label per row, so the number of labels matches the number of ticks
yTickLabels = pd.DataFrame(data=pd.date_range(start='2011-07-03', periods=data.shape[0], freq='4W'), columns=['datetime'])
yTickLabels['date'] = yTickLabels['datetime'].apply(lambda x: x.strftime('%Y-%m-%d'))
s = ['Sun ', 'Mon ', 'Tue ', 'Wed ', 'Thu ', 'Fri ', 'Sat ']
xTickLabels = np.tile(s, 4)

fig = plt.figure(figsize=(14, 15))
ax = plt.gca()
im = ax.imshow(data, interpolation='nearest', origin='upper')
# Colorbar axes on the right: 3% of the width of ax, 0.2 inch padding
divider = make_axes_locatable(ax)
cax = divider.append_axes("right", size="3%", pad=0.2)
ax.set_yticks(range(data.shape[0]))
ax.set_yticklabels(labels=yTickLabels['date'], fontsize=14)
ax.set_xticks(range(28))
ax.set_xticklabels(labels=xTickLabels, fontsize=14, rotation=90)
plt.colorbar(im, cax=cax)
plt.show()

# Line plot of the whole daily series
ax = dailyElectricity.set_index('startDay')['electricity-kWh'].plot(figsize=(15, 6))
ax.set_facecolor('w')  # set_axis_bgcolor() was removed from matplotlib
plt.title('All the daily electricity data', fontsize=16)
plt.ylabel('kWh')
plt.show()

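These plots show the daily data.
In the daily heatmap each row places four weeks side by side, so it has fewer rows than the hourly heatmap, where each row holds a single week.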
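For the daily heatmap, deriving the number of y labels from `periods=` rather than from an end date guarantees one label per row. With an end date of 2014-10-25 and `freq='4W'`, `date_range` yields 44 dates for the 43 rows of daily data, and recent matplotlib versions reject a label count that differs from the tick count. A sketch with made-up values of the same length as the selected period (2011-07-03 up to 2014-10-19):

```python
import numpy as np
import pandas as pd

DAYS_PER_ROW = 7 * 4  # each heatmap row covers four weeks

# 172 weeks of made-up daily values, the same length as 2011-07-03 .. 2014-10-18
daily = np.arange(172 * 7, dtype=float)
rows = daily.reshape(len(daily) // DAYS_PER_ROW, DAYS_PER_ROW)

# Generate exactly one label per row
labels = pd.date_range(start='2011-07-03', periods=rows.shape[0], freq='28D').strftime('%Y-%m-%d')
print(rows.shape, labels[0], labels[-1])
```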

dailyElectricity = pd.read_excel('Data/dailyElectricity.xlsx')
# Weekly totals (weeks end on Sunday). asfreq('W', how='sume') only sampled one day per week;
# resample(...).sum() adds up the seven days. min_count=1 keeps weeks without any data as NaN.
weeklyElectricity = dailyElectricity.set_index('startDay')['electricity-kWh'].resample('W').sum(min_count=1)
ax = weeklyElectricity.loc['2012-01':'2014-01'].plot(figsize=(15, 6), fontsize=15, marker='o', linestyle='--')
ax.set_facecolor('w')
plt.title('Weekly electricity data', fontsize=16)
plt.ylabel('kWh')
plt.show()

This plot shows the weekly electricity data.
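The difference between sampling and aggregating matters for weekly data: `asfreq('W')` only picks the value that falls on each Sunday, while `resample('W').sum()` adds up every day of the week. A minimal sketch on two made-up weeks of constant 10 kWh per day:

```python
import numpy as np
import pandas as pd

# Two made-up weeks of daily data, Monday 2012-01-02 to Sunday 2012-01-15, 10 kWh per day
days = pd.date_range('2012-01-02', periods=14, freq='D')
daily = pd.Series(np.full(14, 10.0), index=days)

weekly_total = daily.resample('W').sum()   # adds up each Monday-to-Sunday week
sunday_only = daily.asfreq('W')            # only picks the value on each Sunday

print(weekly_total.tolist())
print(sunday_only.tolist())
```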

Findings

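  • Electricity consumption shows a strong periodic pattern: you can clearly see the difference between day and night and between weekdays and weekends.
  • Electricity use appears to rise gradually toward the end of each semester and peak there, which may reflect study patterns as students work harder to prepare for finals. A trough follows once the semester ends, including the Christmas break. In January, during the summer session and over spring break, the campus is probably relatively empty and electricity consumption is relatively low. (Part of this text was contributed by Steven.)
  • My own idea: the rise over each semester may also be related to temperature (cold weather requires heating); weather was also examined in the earlier analysis.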

Relationship between energy consumption and features

In this section, we plot the hourly and daily consumption of electricity, chilled water and steam against the main features we are considering.

# Read in data from the preprocessing results
hourlyElectricityWithFeatures = pd.read_excel('Data/hourlyElectricityWithFeatures.xlsx')
hourlyChilledWaterWithFeatures = pd.read_excel('Data/hourlyChilledWaterWithFeatures.xlsx')
hourlySteamWithFeatures = pd.read_excel('Data/hourlySteamWithFeatures.xlsx')
dailyElectricityWithFeatures = pd.read_excel('Data/dailyElectricityWithFeatures.xlsx')
dailyChilledWaterWithFeatures = pd.read_excel('Data/dailyChilledWaterWithFeatures.xlsx')
dailySteamWithFeatures = pd.read_excel('Data/dailySteamWithFeatures.xlsx')

# An example of a DataFrame
dailyChilledWaterWithFeatures.head()

A note for features

Nomenclature (alphabetically)

  • coolingDegrees:

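If T-C - 12 > 0, then = T-C - 12, else = 0. This assumes no cooling is needed when the outdoor temperature is below 12°C, which holds for many buildings. It is useful for daily prediction, because the daily average of hourly cooling degrees is a better predictor than the daily average of hourly temperature.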

  • cosHour:

$\cos\left(\text{hourOfDay} \cdot \frac{2\pi}{24}\right)$

  • dehumidification

If humidityRatio - 0.00886 > 0, then = humidityRatio - 0.00886, else = 0. This is particularly useful for chilled water prediction, especially daily chilled water prediction.

  • heatingDegrees

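If 15 - T-C > 0, then = 15 - T-C, else = 0. This assumes no heating is needed when the outdoor temperature is above 15°C. It is useful for daily prediction, because the daily average of hourly heating degrees is a better predictor than the daily average of hourly temperature.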

  • occupancy

A number between 0 and 1: 0 means unoccupied and 1 means normal occupancy. It is estimated from holidays, weekends and the school's academic calendar.

  • pressure-mbar

atmospheric pressure

  • RH-%

Relative humidity

  • Tdew-C

Dew-point temperature

  • Humidity ratio

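Humidity ratio is an important factor for predicting chilled water, because chilled water is also used to dry (dehumidify) the air supplied to rooms. Using humidity ratio is more effective and efficient than using relative humidity or dew-point temperature.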
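The degree-style features above are simple hinge functions. The sketch below implements them with the thresholds stated in the definitions (12°C, 15°C, 0.00886 kg/kg) and uses a made-up day to show why averaging hourly cooling degrees beats averaging hourly temperature: a warm afternoon still produces cooling degrees even when the daily mean temperature sits exactly at the 12°C base. The function names are illustrative, not the project's own code.

```python
import math

COOLING_BASE_C = 12.0    # no cooling below 12 degC
HEATING_BASE_C = 15.0    # no heating above 15 degC
DEHUMID_BASE = 0.00886   # humidity ratio threshold, kg/kg


def cooling_degrees(t_c):
    return max(t_c - COOLING_BASE_C, 0.0)


def heating_degrees(t_c):
    return max(HEATING_BASE_C - t_c, 0.0)


def dehumidification(humidity_ratio):
    return max(humidity_ratio - DEHUMID_BASE, 0.0)


def cos_hour(hour_of_day):
    return math.cos(hour_of_day * 2 * math.pi / 24)


# A made-up day: cool night, warm afternoon
hourly_t = [6.0, 8.0, 16.0, 18.0]
daily_mean_t = sum(hourly_t) / len(hourly_t)                                   # 12.0
mean_of_hourly_cd = sum(cooling_degrees(t) for t in hourly_t) / len(hourly_t)  # (0 + 0 + 4 + 6) / 4
cd_of_daily_mean = cooling_degrees(daily_mean_t)                               # the afternoon demand vanishes

print(mean_of_hourly_cd, cd_of_daily_mean)
```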

holidays = pd.read_excel('Data/holidays.xlsx')
holidays

The holiday table used to estimate occupancy; a fully occupied day is set to 1.

Energy Consumption versus Features

Temperature & cooling/heating degrees

fig, ax = plt.subplots(3, 2, sharey='row', figsize=(15, 12))
fig.subplots_adjust(hspace=0.1, wspace=0.1)
hourlyElectricityWithFeatures.plot(kind='scatter', x='T-C', y='electricity-kWh', ax=ax[0, 0])
hourlyElectricityWithFeatures.plot(kind='scatter', x='coolingDegrees', y='electricity-kWh', ax=ax[0, 1])
hourlyChilledWaterWithFeatures.plot(kind='scatter', x='T-C', y='chilledWater-TonDays', ax=ax[1, 0])
hourlyChilledWaterWithFeatures.plot(kind='scatter', x='coolingDegrees', y='chilledWater-TonDays', ax=ax[1, 1])
hourlySteamWithFeatures.plot(kind='scatter', x='T-C', y='steam-LBS', ax=ax[2, 0])
hourlySteamWithFeatures.plot(kind='scatter', x='heatingDegrees', y='steam-LBS', ax=ax[2, 1])
# Larger tick labels on the left column (y) and the bottom row (x)
for i in range(3):
    ax[i, 0].tick_params(which='major', reset=False, axis='y', labelsize=13)
for i in range(2):
    ax[2, i].tick_params(which='major', reset=False, axis='x', labelsize=13)
ax[2, 0].set_xlabel(r'Temperature ($^\circ$C)', fontsize=13)
ax[2, 0].set_xlim([-20, 40])
ax[0, 0].set_title('Hourly energy use versus outdoor temperature', fontsize=15)
ax[2, 1].set_xlabel(r'Cooling/Heating degrees ($^\circ$C)', fontsize=13)
ax[0, 1].set_title('Hourly energy use versus cooling/heating degrees', fontsize=15)
plt.show()

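Chilled water and steam consumption are strongly correlated with temperature (second and third rows). However, outdoor temperature or cooling/heating degrees alone are not enough to predict hourly chilled water and steam consumption.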

fig, ax = plt.subplots(3, 2, sharey='row', figsize=(15, 12))
fig.subplots_adjust(hspace=0.1, wspace=0.1)
dailyElectricityWithFeatures.plot(kind='scatter', x='T-C', y='electricity-kWh', ax=ax[0, 0])
dailyElectricityWithFeatures.plot(kind='scatter', x='coolingDegrees', y='electricity-kWh', ax=ax[0, 1])
dailyChilledWaterWithFeatures.plot(kind='scatter', x='T-C', y='chilledWater-TonDays', ax=ax[1, 0])
dailyChilledWaterWithFeatures.plot(kind='scatter', x='coolingDegrees', y='chilledWater-TonDays', ax=ax[1, 1])
dailySteamWithFeatures.plot(kind='scatter', x='T-C', y='steam-LBS', ax=ax[2, 0])
dailySteamWithFeatures.plot(kind='scatter', x='heatingDegrees', y='steam-LBS', ax=ax[2, 1])
for i in range(3):
    ax[i, 0].tick_params(which='major', reset=False, axis='y', labelsize=13)
for i in range(2):
    ax[2, i].tick_params(which='major', reset=False, axis='x', labelsize=13)
ax[2, 0].set_xlabel(r'Temperature ($^\circ$C)', fontsize=13)
ax[2, 0].set_xlim([-20, 40])
ax[0, 0].set_title('Daily energy use versus outdoor temperature', fontsize=15)
ax[2, 1].set_xlabel(r'Cooling/Heating degrees ($^\circ$C)', fontsize=13)
ax[0, 1].set_title('Daily energy use versus cooling/heating degrees', fontsize=15)
plt.show()

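Daily chilled water and steam consumption have a strong linear relationship with outdoor temperature. If we use cooling/heating degrees instead of temperature, we may be able to avoid piecewise linear regression.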
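The reasoning behind that claim can be checked on synthetic data. Assuming a hinge-shaped response (flat below 12°C, linear above), ordinary least squares on cooling degrees recovers it exactly, while a straight line in temperature cannot: a linear model in cooling degrees is a piecewise linear model in temperature with the break fixed at 12°C. The data below are made up and noise-free; `np.linalg.lstsq` is the same `numpy.linalg` module imported as `lin` above.

```python
import numpy as np

# Made-up, noise-free daily data: flat below 12 degC, rising 20 units per cooling degree above it
temps = np.arange(-10.0, 31.0)            # daily mean temperature, degC
cooling = np.maximum(temps - 12.0, 0.0)   # cooling degrees
chilled = 50.0 + 20.0 * cooling           # hypothetical daily chilled-water use


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns ([a, b], RMSE)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rmse = float(np.sqrt(np.mean((X @ coef - y) ** 2)))
    return coef, rmse


coef_cd, rmse_cd = fit_line(cooling, chilled)  # linear in cooling degrees
coef_t, rmse_t = fit_line(temps, chilled)      # straight line in temperature
print(coef_cd, rmse_cd, rmse_t)
```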

Humidity ratio & dehumidification

fig, ax = plt.subplots(3, 2, sharex='col', sharey='row', figsize=(15, 12))
fig.subplots_adjust(hspace=0.1, wspace=0.1)
hourlyElectricityWithFeatures.plot(kind='scatter', x='humidityRatio-kg/kg', y='electricity-kWh', ax=ax[0, 0])
hourlyElectricityWithFeatures.plot(kind='scatter', x='dehumidification', y='electricity-kWh', ax=ax[0, 1])
hourlyChilledWaterWithFeatures.plot(kind='scatter', x='humidityRatio-kg/kg', y='chilledWater-TonDays', ax=ax[1, 0])
hourlyChilledWaterWithFeatures.plot(kind='scatter', x='dehumidification', y='chilledWater-TonDays', ax=ax[1, 1])
hourlySteamWithFeatures.plot(kind='scatter', x='humidityRatio-kg/kg', y='steam-LBS', ax=ax[2, 0])
hourlySteamWithFeatures.plot(kind='scatter', x='dehumidification', y='steam-LBS', ax=ax[2, 1])
for i in range(3):
    ax[i, 0].tick_params(which='major', reset=False, axis='y', labelsize=13)
for i in range(2):
    ax[2, i].tick_params(which='major', reset=False, axis='x', labelsize=13)
ax[2, 0].set_xlabel(r'Humidity ratio (kg/kg)', fontsize=13)
ax[2, 0].set_xlim([0, 0.02])
ax[0, 0].set_title('Hourly energy use versus humidity ratio', fontsize=15)
ax[2, 1].set_xlabel(r'Dehumidification', fontsize=13)
ax[2, 1].set_xlim([0, 0.01])
ax[0, 1].set_title('Hourly energy use versus dehumidification', fontsize=15)
plt.show()

Humidity ratio definitely helps predict chilled water consumption, and works better than dehumidification.
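For reference, humidity ratio can be computed from two other listed features, dew-point temperature (Tdew-C) and pressure (pressure-mbar). The sketch below assumes the Magnus approximation for vapor pressure (6.112 hPa, 17.62, 243.12°C) and the standard relation W = 0.622·e/(p − e); it is not necessarily how the preprocessing step produced humidityRatio-kg/kg.

```python
import math


def vapor_pressure_hpa(t_dew_c):
    """Actual vapor pressure (hPa) from dew point, Magnus approximation."""
    return 6.112 * math.exp(17.62 * t_dew_c / (243.12 + t_dew_c))


def humidity_ratio(t_dew_c, pressure_mbar):
    """Mass of water vapor per mass of dry air (kg/kg); 1 mbar equals 1 hPa."""
    e = vapor_pressure_hpa(t_dew_c)
    return 0.622 * e / (pressure_mbar - e)


w_humid = humidity_ratio(15.0, 1013.25)  # a muggy summer day
w_dry = humidity_ratio(0.0, 1013.25)     # a dry winter day
print(round(w_humid, 5), round(w_dry, 5))
```

With these assumptions a 15°C dew point lands above the 0.00886 kg/kg dehumidification threshold, while a 0°C dew point is far below it.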

fig, ax = plt.subplots(3, 2, sharex='col', sharey='row', figsize=(15, 12))
fig.subplots_adjust(hspace=0.1, wspace=0.1)
dailyElectricityWithFeatures.plot(kind='scatter', x='humidityRatio-kg/kg', y='electricity-kWh', ax=ax[0, 0])
dailyElectricityWithFeatures.plot(kind='scatter', x='dehumidification', y='electricity-kWh', ax=ax[0, 1])
dailyChilledWaterWithFeatures.plot(kind='scatter', x='humidityRatio-kg/kg', y='chilledWater-TonDays', ax=ax[1, 0])
dailyChilledWaterWithFeatures.plot(kind='scatter', x='dehumidification', y='chilledWater-TonDays', ax=ax[1, 1])
dailySteamWithFeatures.plot(kind='scatter', x='humidityRatio-kg/kg', y='steam-LBS', ax=ax[2, 0])
dailySteamWithFeatures.plot(kind='scatter', x='dehumidification', y='steam-LBS', ax=ax[2, 1])
for i in range(3):
    ax[i, 0].tick_params(which='major', reset=False, axis='y', labelsize=13)
for i in range(2):
    ax[2, i].tick_params(which='major', reset=False, axis='x', labelsize=13)
ax[2, 0].set_xlabel(r'Humidity ratio (kg/kg)', fontsize=13)
ax[2, 0].set_xlim([0, 0.02])
ax[0, 0].set_title('Daily energy use versus humidity ratio', fontsize=15)
ax[2, 1].set_xlabel(r'Dehumidification', fontsize=13)
ax[2, 1].set_xlim([0, 0.01])
ax[0, 1].set_title('Daily energy use versus dehumidification', fontsize=15)
plt.show()

Dehumidification is designed for chilled water prediction, not steam.
Compare the hourly plots with the daily ones.

Occupancy & cosHour

fig, ax = plt.subplots(3, 2, sharex='col', figsize=(15, 12))
fig.subplots_adjust(hspace=0.1, wspace=0.15)
hourlyElectricityWithFeatures.plot(kind='scatter', x='occupancy', y='electricity-kWh', ax=ax[0, 0])
dailyElectricityWithFeatures.plot(kind='scatter', x='occupancy', y='electricity-kWh', ax=ax[0, 1])
hourlyChilledWaterWithFeatures.plot(kind='scatter', x='occupancy', y='chilledWater-TonDays', ax=ax[1, 0])
dailyChilledWaterWithFeatures.plot(kind='scatter', x='occupancy', y='chilledWater-TonDays', ax=ax[1, 1])
hourlySteamWithFeatures.plot(kind='scatter', x='occupancy', y='steam-LBS', ax=ax[2, 0])
dailySteamWithFeatures.plot(kind='scatter', x='occupancy', y='steam-LBS', ax=ax[2, 1])
for i in range(3):
    ax[i, 0].tick_params(which='major', reset=False, axis='y', labelsize=13)
for i in range(2):
    ax[2, i].tick_params(which='major', reset=False, axis='x', labelsize=13)
ax[2, 0].set_xlabel(r'Occupancy', fontsize=13)
ax[0, 0].set_title('Hourly energy use versus occupancy', fontsize=15)
ax[2, 1].set_xlabel(r'Occupancy', fontsize=13)
ax[0, 1].set_title('Daily energy use versus occupancy', fontsize=15)
plt.show()

Occupancy is derived from the academic calendar, holidays and weekends. Basically, we just assign lower values to holidays, weekends and summer. cosHour and occupancy may or may not help, since they are only estimates of occupancy.
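The rule described above can be sketched as a small function. The occupancy levels below are hypothetical placeholders chosen only for illustration (the actual values used in the project are not given here); the structure follows the text: start from full occupancy and lower it for summer, weekends and holidays.

```python
import datetime as dt

# Hypothetical occupancy levels, chosen only for illustration
FULL, SUMMER, WEEKEND, HOLIDAY = 1.0, 0.6, 0.4, 0.2


def estimate_occupancy(day, holidays, summer_months=(6, 7, 8)):
    """Return a value in [0, 1]; 1 means normal occupancy, lower means fewer people."""
    level = FULL
    if day.month in summer_months:
        level = min(level, SUMMER)
    if day.weekday() >= 5:  # Saturday (5) or Sunday (6)
        level = min(level, WEEKEND)
    if day in holidays:
        level = min(level, HOLIDAY)
    return level


holidays = {dt.date(2013, 12, 25)}  # e.g. dates read from holidays.xlsx
for day in [dt.date(2013, 10, 16), dt.date(2013, 10, 19), dt.date(2013, 7, 10), dt.date(2013, 12, 25)]:
    print(day, estimate_occupancy(day, holidays))
```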

fig, ax = plt.subplots(3, 1, sharex='col', figsize=(8, 12))
fig.subplots_adjust(hspace=0.1, wspace=0.15)
hourlyElectricityWithFeatures.plot(kind='scatter', x='cosHour', y='electricity-kWh', ax=ax[0])
hourlyChilledWaterWithFeatures.plot(kind='scatter', x='cosHour', y='chilledWater-TonDays', ax=ax[1])
hourlySteamWithFeatures.plot(kind='scatter', x='cosHour', y='steam-LBS', ax=ax[2])
for i in range(3):
    ax[i].tick_params(which='major', reset=False, axis='y', labelsize=13)
ax[2].tick_params(which='major', reset=False, axis='x', labelsize=13)
ax[2].set_xlabel(r'cosHour', fontsize=13)
ax[0].set_title('Hourly energy use versus cosHourOfDay', fontsize=15)
plt.show()
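One property worth keeping in mind when reading the cosHour scatter: the cosine is symmetric around midnight and noon, so 6 AM and 6 PM get the same cosHour value. Pairing it with a sine term is a common way to tell them apart; `sin_hour` below is a hypothetical addition, not a feature of this project.

```python
import math


def cos_hour(hour_of_day):
    """cosHour feature: cos(hourOfDay * 2*pi / 24)."""
    return math.cos(hour_of_day * 2 * math.pi / 24)


def sin_hour(hour_of_day):
    """Hypothetical companion feature, not part of the original feature set."""
    return math.sin(hour_of_day * 2 * math.pi / 24)


morning, evening = 6, 18
same_cos = math.isclose(cos_hour(morning), cos_hour(evening), abs_tol=1e-12)
different_sin = not math.isclose(sin_hour(morning), sin_hour(evening), abs_tol=1e-12)
print(same_cos, different_sin)
```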

Solar radiation & wind speed

fig, ax = plt.subplots(3, 2, sharex='col', sharey='row', figsize=(15, 12))
fig.subplots_adjust(hspace=0.1, wspace=0.15)
hourlyElectricityWithFeatures.plot(kind='scatter', x='solarRadiation-W/m2', y='electricity-kWh', ax=ax[0, 0])
hourlyElectricityWithFeatures.plot(kind='scatter', x='windSpeed-m/s', y='electricity-kWh', ax=ax[0, 1])
hourlyChilledWaterWithFeatures.plot(kind='scatter', x='solarRadiation-W/m2', y='chilledWater-TonDays', ax=ax[1, 0])
hourlyChilledWaterWithFeatures.plot(kind='scatter', x='windSpeed-m/s', y='chilledWater-TonDays', ax=ax[1, 1])
hourlySteamWithFeatures.plot(kind='scatter', x='solarRadiation-W/m2', y='steam-LBS', ax=ax[2, 0])
hourlySteamWithFeatures.plot(kind='scatter', x='windSpeed-m/s', y='steam-LBS', ax=ax[2, 1])
for i in range(3):
    ax[i, 0].tick_params(which='major', reset=False, axis='y', labelsize=13)
for i in range(2):
    ax[2, i].tick_params(which='major', reset=False, axis='x', labelsize=13)
ax[2, 0].set_xlabel(r'Solar radiation (W/m$^2$)', fontsize=13)
ax[0, 0].set_title('Hourly energy use versus solar radiation', fontsize=15)
ax[2, 1].set_xlabel(r'Wind speed (m/s)', fontsize=13)
ax[0, 1].set_title('Hourly energy use versus wind speed', fontsize=15)
plt.show()

Hourly energy use versus solar radiation and wind speed

The three rows (y axes) are hourly electricity, chilled water and steam consumption.

fig, ax = plt.subplots(3, 2, sharex='col', sharey='row', figsize=(15, 12))
fig.subplots_adjust(hspace=0.1, wspace=0.15)
dailyElectricityWithFeatures.plot(kind='scatter', x='solarRadiation-W/m2', y='electricity-kWh', ax=ax[0, 0])
dailyElectricityWithFeatures.plot(kind='scatter', x='windSpeed-m/s', y='electricity-kWh', ax=ax[0, 1])
dailyChilledWaterWithFeatures.plot(kind='scatter', x='solarRadiation-W/m2', y='chilledWater-TonDays', ax=ax[1, 0])
dailyChilledWaterWithFeatures.plot(kind='scatter', x='windSpeed-m/s', y='chilledWater-TonDays', ax=ax[1, 1])
dailySteamWithFeatures.plot(kind='scatter', x='solarRadiation-W/m2', y='steam-LBS', ax=ax[2, 0])
dailySteamWithFeatures.plot(kind='scatter', x='windSpeed-m/s', y='steam-LBS', ax=ax[2, 1])
for i in range(3):
    ax[i, 0].tick_params(which='major', reset=False, axis='y', labelsize=13)
for i in range(2):
    ax[2, i].tick_params(which='major', reset=False, axis='x', labelsize=13)
ax[2, 0].set_xlabel(r'Solar radiation (W/m$^2$)', fontsize=13)
ax[0, 0].set_title('Daily energy use versus solar radiation', fontsize=15)
ax[2, 1].set_xlabel(r'Wind speed (m/s)', fontsize=13)
ax[0, 1].set_title('Daily energy use versus wind speed', fontsize=15)
plt.show()

Compare the hourly plots with the daily ones.
Solar radiation and wind speed are not that important, and they are correlated with temperature.
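A quick way to check whether a candidate feature carries information beyond temperature is its Pearson correlation with temperature: a value close to 1 means it mostly duplicates temperature. The sketch uses a handful of made-up daily values; with the real data the same call would be made on the `T-C` and `solarRadiation-W/m2` columns of the daily feature tables.

```python
import pandas as pd

# Made-up daily values: sunnier days tend to be warmer
weather = pd.DataFrame({
    'T-C': [0.0, 5.0, 10.0, 15.0, 20.0],
    'solarRadiation-W/m2': [100.0, 180.0, 300.0, 390.0, 500.0],
})

r = weather['T-C'].corr(weather['solarRadiation-W/m2'])  # Pearson by default
print(round(r, 3))
```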

Findings

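  • Electricity is not related to weather data (temperature), so using weather information to predict electricity will not work. I think it depends mainly on time and occupancy. We can still explore patterns, though, to find how electricity use differs between day and night, weekdays and weekends, and school days and holidays. In fact, we should already have noticed this from the monthly data.

  • Chilled water and steam consumption are strongly correlated with temperature and humidity. Daily chilled water and steam consumption have a good linear relationship with cooling and heating degrees, so a simple linear regression may already be accurate enough.

  • Although chilled water and steam consumption are strongly correlated with weather, the plots above show that weather information alone is not enough to predict hourly chilled water and steam, because operation schedules affect hourly energy consumption. Hourly chilled water and steam prediction must include occupancy and operation schedules.

  • Humidity ratio definitely helps predict chilled water consumption, and works better than relative humidity and dew-point temperature.

  • Cooling and heating degrees will help predict daily chilled water and steam. If we use cooling/heating degrees instead of temperature, we may be able to avoid piecewise linear regression.

  • Occupancy is derived from the academic calendar, holidays and weekends. Basically, we just assign lower values to holidays, weekends and summer. cosHour and occupancy may or may not help, since they are only estimates of occupancy.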

Reference

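cs109-energy + Harvard University Energy Exploration Project Part-1 (Project Background)
cs109-energy + Harvard University Energy Exploration Project Part-2.1 (Data Wrangling)
cs109-energy + Harvard University Energy Exploration Project Part-2.2 (Data Wrangling)
A complete machine learning project with code and data analysis: the Harvard University energy consumption prediction project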
Part 1-3 Project Overview, Data Wrangling and Exploratory Analysis-DEC10
Prediction of Buildings Energy Consumption
