The earlier posts covered how to use these algorithms; this one digs into the details of the algorithm code!

1.CO

(1) About train

As the code below shows, CO first trains on each appliance's power data. To understand the native implementation, let's step into the code:

classifiers = {'CO':CombinatorialOptimisation()}
predictions = {}
sample_period = 120  # the sampling period is two minutes
for clf_name, clf in classifiers.items():
    print("*" * 20)
    print(clf_name)
    print("*" * 20)
    clf.train(top_5_train_elec, sample_period=sample_period)  # the training step

Stepping into `train`:

def train(self, metergroup, num_states_dict=None, **load_kwargs):
    """Train using 1D CO. Places the learnt model in the `model` attribute.

    Parameters
    ----------
    metergroup : a nilmtk.MeterGroup object
    num_states_dict : dict
    **load_kwargs : keyword arguments passed to `meter.power_series()`

    Notes
    -----
    * only uses first chunk for each meter (TODO: handle all chunks).
    """
    if num_states_dict is None:
        num_states_dict = {}

    if self.model:
        raise RuntimeError(
            "This implementation of Combinatorial Optimisation"
            " does not support multiple calls to `train`.")

    num_meters = len(metergroup.meters)
    if num_meters > 12:
        max_num_clusters = 2
    else:
        max_num_clusters = 3

    for i, meter in enumerate(metergroup.submeters().meters):
        print("Training model for submeter '{}'".format(meter))
        power_series = meter.power_series(**load_kwargs)
        chunk = next(power_series)
        num_total_states = num_states_dict.get(meter)
        if num_total_states is not None:
            num_on_states = num_total_states - 1
        else:
            num_on_states = None
        self.train_on_chunk(chunk, meter, max_num_clusters, num_on_states)

        # Check to see if there are any more chunks.
        # TODO handle multiple chunks per appliance.
        try:
            next(power_series)
        except StopIteration:
            pass
        else:
            warn("The current implementation of CombinatorialOptimisation"
                 " can only handle a single chunk.  But there are multiple"
                 " chunks available.  So have only trained on the"
                 " first chunk!")

    print("Done training!")

At a glance, the arguments are: metergroup → top_5_train_elec (the five appliances with the highest energy consumption);

num_states_dict=None (an optional dict of state counts; nothing in this walkthrough ever supplies it);

**load_kwargs → sample_period=sample_period (the sampling period is 120 seconds, i.e. two minutes).

Parameters set inside: if more than 12 appliances are being trained, the number of clusters is set to 2, otherwise 3. We have 5 appliances here, so max_num_clusters = 3.

Overall flow: iterate over the 5 appliances and cluster each one's power data separately; the result for each appliance is the set of power values it draws in its different states.

Training process for each appliance:

If the meter is already in `model`, a RuntimeError says it has already been trained and cannot be trained twice; otherwise the appliance data is clustered, with inputs chunk (the appliance's power series) and max_num_clusters (the cluster count), and each appliance's cluster result is saved into `model` as `states` alongside `training_metadata`.

def train_on_chunk(self, chunk, meter, max_num_clusters, num_on_states):
    # Check if we've already trained on this meter
    meters_in_model = [d['training_metadata'] for d in self.model]
    if meter in meters_in_model:
        raise RuntimeError(
            "Meter {} is already in model!"
            "  Can't train twice on the same meter!".format(meter))

    states = cluster(chunk, max_num_clusters, num_on_states)
    self.model.append({
        'states': states,
        'training_metadata': meter})

The result is the populated `model` list: one entry per appliance, each holding that appliance's state centroids.

Clustering in detail:

First `_transform_data(X)` converts the data, turning a pd.Series or single-column pd.DataFrame into an ndarray. The data is then clustered to obtain each cluster's centroid, after which a power value for the appliance's 'off' state is appended. In theory, with a clustering parameter of 3 plus one added state there should be four state values; in practice there are only three. Let's dig further to see why!

def cluster(X, max_num_clusters=3, exact_num_clusters=None):
    '''Applies clustering on reduced data,
    i.e. data where power is greater than threshold.

    Parameters
    ----------
    X : pd.Series or single-column pd.DataFrame
    max_num_clusters : int

    Returns
    -------
    centroids : ndarray of int32s
        Power in different states of an appliance, sorted
    '''
    # Find where power consumption is greater than 10
    data = _transform_data(X)

    # Find clusters
    centroids = _apply_clustering(data, max_num_clusters, exact_num_clusters)
    centroids = np.append(centroids, 0)  # add 'off' state
    centroids = np.round(centroids).astype(np.int32)
    centroids = np.unique(centroids)  # np.unique also sorts
    # TODO: Merge similar clusters
    return centroids
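A minimal sketch of that post-processing, with invented centroid values: the 'off' state is appended, values are rounded to int32, and np.unique both de-duplicates and sorts.

import numpy as np

# Hypothetical centroids as returned by _apply_clustering for one appliance
centroids = np.array([150.7, 2103.2])

centroids = np.append(centroids, 0)              # add the 'off' state
centroids = np.round(centroids).astype(np.int32)
print(np.unique(centroids))                      # np.unique also sorts: [   0  151 2103]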

Why a state value goes missing:

In `_apply_clustering`, the loop `for n_clusters in range(1, max_num_clusters)` means that with max_num_clusters = 3, n_clusters only takes the values 1 and 2 — hence one state fewer than expected. Additionally, each appliance's data is clustered into 1 and 2 clusters, and the silhouette coefficient (sklearn.metrics.silhouette_score) selects the best n_clusters, which here is n_clusters = 2 (see the sketch after the function listing).

def _apply_clustering(X, max_num_clusters, exact_num_clusters=None):
    '''
    Parameters
    ----------
    X : ndarray
    max_num_clusters : int

    Returns
    -------
    centroids : list of numbers
        List of power in different states of an appliance
    '''
    # If we import sklearn at the top of the file then it makes autodoc fail
    from sklearn import metrics

    # Finds whether 2 or 3 gives better Silhouette coefficient
    # Whichever is higher serves as the number of clusters for that
    # appliance
    num_clus = -1
    sh = -1
    k_means_labels = {}
    k_means_cluster_centers = {}
    k_means_labels_unique = {}

    # If the exact number of clusters is specified, then use that
    if exact_num_clusters is not None:
        labels, centers = _apply_clustering_n_clusters(X, exact_num_clusters)
        return centers.flatten()

    # Exact number of clusters is not specified; use the cluster validity
    # measures to find the optimal number
    for n_clusters in range(1, max_num_clusters):
        try:
            labels, centers = _apply_clustering_n_clusters(X, n_clusters)
            k_means_labels[n_clusters] = labels
            k_means_cluster_centers[n_clusters] = centers
            k_means_labels_unique[n_clusters] = np.unique(labels)
            try:
                sh_n = metrics.silhouette_score(
                    X, k_means_labels[n_clusters], metric='euclidean')
                if sh_n > sh:
                    sh = sh_n
                    num_clus = n_clusters
            except Exception:
                num_clus = n_clusters
        except Exception:
            if num_clus > -1:
                return k_means_cluster_centers[num_clus]
            else:
                return np.array([0])

    return k_means_cluster_centers[num_clus].flatten()
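A standalone sketch of this selection logic on synthetic data (the values are made up, not from nilmtk): silhouette_score raises for k = 1, so with max_num_clusters = 3 only k = 2 ever scores, which is why each appliance ends up with two 'on' centroids.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic one-appliance power readings (invented values)
X = np.array([[3.], [5.], [7.], [200.], [210.], [205.]])

best_sh, best_k = -1, -1
for n_clusters in range(1, 3):       # range(1, max_num_clusters): tries k = 1, 2 only
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    try:
        sh = silhouette_score(X, labels, metric='euclidean')  # raises for k = 1
    except ValueError:
        continue
    if sh > best_sh:
        best_sh, best_k = sh, n_clusters
print(best_k)   # 2 -> two 'on' centroids, plus the appended 'off' state = 3 states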

The clustering algorithm chosen is KMeans!

def _apply_clustering_n_clusters(X, n_clusters):
    """
    :param X: ndarray
    :param n_clusters: exact number of clusters to use
    :return:
    """
    from sklearn.cluster import KMeans
    k_means = KMeans(init='k-means++', n_clusters=n_clusters)
    k_means.fit(X)
    return k_means.labels_, k_means.cluster_centers_

That is the whole training process!

(2) disaggregate_chunk

Disaggregation uses the disaggregate_chunk function: it takes the house's total power curve and splits it among the 5 appliances.

First the centroids from the train() step are collected and enumerated. The cartesian function performs the enumeration: with 5 models of 3 states each, we get a 3*3*3*3*3 = 243-row, 5-column matrix of state combinations.

def _set_state_combinations_if_necessary(self):
    """Get centroids"""
    # If we import sklearn at the top of the file then auto doc fails.
    # Enumerate every possible combination of appliance states
    if (self.state_combinations is None or
            self.state_combinations.shape[1] != len(self.model)):
        from sklearn.utils.extmath import cartesian
        centroids = [model['states'] for model in self.model]
        self.state_combinations = cartesian(centroids)
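A small sketch of the enumeration, using itertools.product as a stand-in for sklearn's cartesian (the centroid values are invented):

import numpy as np
from itertools import product

# Two hypothetical appliances, 3 states each after the 'off' state was added
centroids = [np.array([0, 150, 2000]),
             np.array([0, 80, 400])]

state_combinations = np.array(list(product(*centroids)))
print(state_combinations.shape)   # (9, 2) -- with 5 appliances: (3**5, 5) = (243, 5)
print(state_combinations[:3])     # [[0 0] [0 80] [0 400]] ...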

Next, the state combinations are summed across each row (adding the five appliances' states to get each combination's total power), and find_nearest is called to obtain, for each mains sample, the index of the nearest combination and the residual. Its inputs are the summed combinations and the household's total power data.

def find_nearest(known_array, test_array):
    """Find closest value in `known_array` for each element in `test_array`.

    Parameters
    ----------
    known_array : numpy array
        consisting of scalar values only; shape: (m, 1)
    test_array : numpy array
        consisting of scalar values only; shape: (n, 1)

    Returns
    -------
    indices : numpy array; shape: (n, 1)
        For each value in `test_array` finds the index of the closest value
        in `known_array`.
    residuals : numpy array; shape: (n, 1)
        For each value in `test_array` finds the difference from the closest
        value in `known_array`.
    """
    # from http://stackoverflow.com/a/20785149/732596
    # Sort known_array ascending, keeping the permutation of original indices
    index_sorted = np.argsort(known_array)
    known_array_sorted = known_array[index_sorted]
    # Find where each mains value would be inserted into the sorted array
    idx1 = np.searchsorted(known_array_sorted, test_array)
    idx2 = np.clip(idx1 - 1, 0, len(known_array_sorted) - 1)
    idx3 = np.clip(idx1,     0, len(known_array_sorted) - 1)
    # upper neighbour - actual value
    diff1 = known_array_sorted[idx3] - test_array
    # actual value - lower neighbour
    diff2 = test_array - known_array_sorted[idx2]
    # choose whichever neighbour is closer
    indices = index_sorted[np.where(diff1 <= diff2, idx3, idx2)]
    # residuals
    residuals = test_array - known_array[indices]
    return indices, residuals
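A quick usage sketch with invented numbers, assuming the find_nearest function above is in scope; known_array plays the role of the per-combination power sums and test_array the mains samples:

import numpy as np

known_array = np.array([0, 150, 400, 550, 2000])   # summed power of each combination
test_array = np.array([10, 160, 1900])             # mains power samples

indices, residuals = find_nearest(known_array, test_array)
print(indices)     # [0 1 4] -- index of the closest combination per sample
print(residuals)   # [  10   10 -100]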

With the indices and residuals in hand, each model's share can be computed; here indices_of_state_combinations is identical to indices.

appliance_powers_dict = {}
for i, model in enumerate(self.model):
    print("Estimating power demand for '{}'"
          .format(model['training_metadata']))
    predicted_power = state_combinations[
        indices_of_state_combinations, i].flatten()
    column = pd.Series(predicted_power, index=mains.index, name=i)
    appliance_powers_dict[self.model[i]['training_metadata']] = column
appliance_powers = pd.DataFrame(appliance_powers_dict, dtype='float32')

The core of the disaggregation is:

state_combinations[
                indices_of_state_combinations, i].flatten()

Here is a small example (sketched below with invented values):
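import numpy as np

# Hypothetical state combinations for 2 appliances (one column each) and the
# per-sample combination indices as returned by find_nearest
state_combinations = np.array([[  0,  0],
                               [  0, 80],
                               [150,  0],
                               [150, 80]])
indices_of_state_combinations = np.array([3, 0, 2])

for i in range(state_combinations.shape[1]):
    print(i, state_combinations[indices_of_state_combinations, i].flatten())
# 0 [150   0 150]   <- appliance 0's estimated power at each timestamp
# 1 [80  0  0]      <- appliance 1's estimated power at each timestamp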

As it shows, we go through the matrix column by column and take, at the positions given by indices_of_state_combinations, the values of the current column as that appliance's disaggregation result.

2.FHMM

(1) About train

The train function calls GaussianHMM several times, which is puzzling at first. Let's read the code: the meters are passed in, and first a check is made that memory can hold the combined transition-probability matrix.

def _check_memory(num_appliances):
    """
    Checks if the maximum resident memory is enough to handle the
    combined matrix of transition probabilities
    """
    # Each transmat is small (usually 2x2 or 3x3) but the combined
    # matrix is dense, using much more memory

    # Get the approximate memory in MB
    try:
        # If psutil is installed, we can get the correct total
        # physical memory of the system
        import psutil
        available_memory = psutil.virtual_memory().total >> 20
    except ImportError:
        # Otherwise use a crude approximation
        available_memory = 16 << 10

    # We use (num_appliances + 1) here to get a pessimistic approximation:
    # 8 bytes * (2 ** (num_appliances + 1)) ** 2
    required_memory = ((1 << (2 * (num_appliances + 1))) << 3) >> 20

    if required_memory >= available_memory:
        warn("The required memory for the model may be more than the total"
             " system memory! Try using fewer appliances if the training fails.")
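Working the formula through for this post's five appliances shows how small the pessimistic estimate actually is:

# Worked example of the estimate above for num_appliances = 5:
# 8 bytes * (2 ** (5 + 1)) ** 2 = 8 * 64 ** 2 = 32768 bytes
num_appliances = 5
required_memory = ((1 << (2 * (num_appliances + 1))) << 3) >> 20
print(required_memory)   # 0 MB after the >> 20 shift -- no warning is triggered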

Then each appliance is iterated over and its state data fetched; after a series of conditionals it turns out that clustering is used to determine each appliance's states.

if num_total_states is None:
    states = cluster(meter_data, max_num_clusters)
    num_total_states = len(states)

Then hmmlearn's GaussianHMM is called to train the model.

print("Training model for submeter '{}' with {} states".format(meter, num_total_states))
learnt_model[meter] = hmm.GaussianHMM(num_total_states, "full")
# Fit
learnt_model[meter].fit(X)
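The two positional arguments map onto hmmlearn's n_components and covariance_type. A standalone sketch with synthetic data (the values are invented):

import numpy as np
from hmmlearn import hmm

# Equivalent keyword form of the call above
model = hmm.GaussianHMM(n_components=3, covariance_type="full")

# X must be a column vector of observations, like the appliance power series
rng = np.random.RandomState(0)
X = np.concatenate([rng.normal(0, 1, 50),        # 'off'
                    rng.normal(150, 5, 50),      # low-power state
                    rng.normal(2000, 20, 50)])   # high-power state
model.fit(X.reshape(-1, 1))
print(np.sort(model.means_.flatten()))   # roughly [0, 150, 2000]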

The means_ learnt by GaussianHMM are sorted; the corresponding startprob, covars, transmat and so on are rearranged according to the sorted means_ indices, and then a fresh GaussianHMM is built with those parameters assigned.

self.meters = []
new_learnt_models = OrderedDict()
for meter in learnt_model:
    startprob, means, covars, transmat = sort_learnt_parameters(
        learnt_model[meter].startprob_, learnt_model[meter].means_,
        learnt_model[meter].covars_, learnt_model[meter].transmat_)
    new_learnt_models[meter] = hmm.GaussianHMM(startprob.size, "full")
    new_learnt_models[meter].startprob_ = startprob
    new_learnt_models[meter].transmat_ = transmat
    new_learnt_models[meter].means_ = means
    new_learnt_models[meter].covars_ = covars
    # UGLY! But works.
    self.meters.append(meter)

The mean-sorting code is as follows:

def return_sorting_mapping(means):
    means_copy = deepcopy(means)
    means_copy = np.sort(means_copy, axis=0)

    # Finding mapping
    mapping = {}
    for i, val in enumerate(means_copy):
        mapping[i] = np.where(val == means)[0][0]
    return mapping

Example: with a = [1, 5, 3, 4, 2], return_sorting_mapping returns the original indices of the means in ascending order.
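Running that example (wrapped as a column vector, since hmmlearn's means_ has shape (k, 1)), assuming the function above and its numpy/deepcopy imports are in scope:

import numpy as np

a = np.array([[1], [5], [3], [4], [2]])
print(return_sorting_mapping(a))
# {0: 0, 1: 4, 2: 2, 3: 3, 4: 1} -- sorted position -> original state index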

All the remaining parameters are reordered according to the sorted means:

def sort_startprob(mapping, startprob):
    """ Sort the startprob according to power means; as returned by mapping
    """
    num_elements = len(startprob)
    new_startprob = np.zeros(num_elements)
    for i in range(len(startprob)):
        new_startprob[i] = startprob[mapping[i]]
    return new_startprob


def sort_covars(mapping, covars):
    new_covars = np.zeros_like(covars)
    for i in range(len(covars)):
        new_covars[i] = covars[mapping[i]]
    return new_covars


def sort_transition_matrix(mapping, A):
    """Sorts the transition matrix according to increasing order of
    power means; as returned by mapping

    Parameters
    ----------
    mapping :
    A : numpy.array of shape (k, k)
        transition matrix
    """
    num_elements = len(A)
    A_new = np.zeros((num_elements, num_elements))
    for i in range(num_elements):
        for j in range(num_elements):
            A_new[i, j] = A[mapping[i], mapping[j]]
    return A_new

Then yet another GaussianHMM is built from the combined results:

def create_combined_hmm(model):
    list_pi = [model[appliance].startprob_ for appliance in model]
    list_A = [model[appliance].transmat_ for appliance in model]
    list_means = [model[appliance].means_.flatten().tolist()
                  for appliance in model]

    pi_combined = compute_pi_fhmm(list_pi)
    A_combined = compute_A_fhmm(list_A)
    [mean_combined, cov_combined] = compute_means_fhmm(list_means)

    combined_model = hmm.GaussianHMM(n_components=len(pi_combined),
                                     covariance_type='full')
    combined_model.startprob_ = pi_combined
    combined_model.transmat_ = A_combined
    combined_model.covars_ = cov_combined
    combined_model.means_ = mean_combined
    return combined_model

The means, transmat and startprob of the individual models are combined as follows:

def compute_A_fhmm(list_A):
    """
    Parameters
    ----------
    list_A : List of transition matrices of individual learnt HMMs

    Returns
    -------
    result : Combined transition matrix for the FHMM
    """
    result = list_A[0]
    for i in range(len(list_A) - 1):
        result = np.kron(result, list_A[i + 1])
    return result


def compute_means_fhmm(list_means):
    """
    Returns
    -------
    [mu, cov]
    """
    states_combination = list(itertools.product(*list_means))
    num_combinations = len(states_combination)
    means_stacked = np.array([sum(x) for x in states_combination])
    means = np.reshape(means_stacked, (num_combinations, 1))
    cov = np.tile(5 * np.identity(1), (num_combinations, 1, 1))
    return [means, cov]


def compute_pi_fhmm(list_pi):
    """
    Parameters
    ----------
    list_pi : List of PI's of individual learnt HMMs

    Returns
    -------
    result : Combined Pi for the FHMM
    """
    result = list_pi[0]
    for i in range(len(list_pi) - 1):
        result = np.kron(result, list_pi[i + 1])
    return result
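The Kronecker product is what enumerates the joint states. A tiny sketch with two invented 2-state appliances:

import numpy as np

pi_a = np.array([0.9, 0.1])   # hypothetical start probabilities, appliance A
pi_b = np.array([0.8, 0.2])   # hypothetical start probabilities, appliance B

print(np.kron(pi_a, pi_b))
# [0.72 0.18 0.08 0.02] -- joint states (off,off), (off,on), (on,off), (on,on)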

And with that we have the combined model.

(2)disaggregate_chunk

First the mains power data is obtained, then the combined GaussianHMM's predict function infers the joint state sequence, and decode_hmm decodes it — a dizzying flurry of operations that is hard to follow at first sight. Let's unpack it.

def disaggregate_chunk(self, test_mains):
    """Disaggregate the test data according to the model learnt previously

    Performs 1D FHMM disaggregation.

    For now assuming there is no missing data at this stage.
    """
    # See v0.1 code
    # for ideas of how to handle missing data in this code if needs be.

    # Array of learnt states
    learnt_states_array = []
    test_mains = test_mains.dropna()
    length = len(test_mains.index)
    temp = test_mains.values.reshape(length, 1)
    learnt_states_array.append(self.model.predict(temp))

    # Model
    means = OrderedDict()
    for elec_meter, model in self.individual.items():
        means[elec_meter] = (
            model.means_.round().astype(int).flatten().tolist())
        means[elec_meter].sort()

    decoded_power_array = []
    decoded_states_array = []
    for learnt_states in learnt_states_array:
        [decoded_states, decoded_power] = decode_hmm(
            len(learnt_states), means, means.keys(), learnt_states)
        decoded_states_array.append(decoded_states)
        decoded_power_array.append(decoded_power)

    prediction = pd.DataFrame(decoded_power_array[0],
                              index=test_mains.index)
    return prediction

The decoding function:

def decode_hmm(length_sequence, centroids, appliance_list, states):
    """Decodes the HMM state sequence"""
    hmm_states = {}
    hmm_power = {}
    total_num_combinations = 1

    for appliance in appliance_list:
        total_num_combinations *= len(centroids[appliance])

    for appliance in appliance_list:
        hmm_states[appliance] = np.zeros(length_sequence, dtype=int)
        hmm_power[appliance] = np.zeros(length_sequence)

    for i in range(length_sequence):
        factor = total_num_combinations
        for appliance in appliance_list:
            # floor division so `temp` stays an integer under Python 3 as well
            factor = factor // len(centroids[appliance])
            temp = int(states[i]) // factor
            hmm_states[appliance][i] = temp % len(centroids[appliance])
            hmm_power[appliance][i] = centroids[
                appliance][hmm_states[appliance][i]]
    return [hmm_states, hmm_power]

Each appliance has 3 state means, giving 243 combinations in total; the division and modulo operations then split the joint state back into per-appliance data, as the sketch below shows.
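In other words, the joint state index is a base-3 number whose digits are the per-appliance states. A worked sketch with an invented joint state:

# 5 appliances with 3 states each -> 3 ** 5 = 243 joint states.
state = 157                       # hypothetical joint state from model.predict()
num_states = [3, 3, 3, 3, 3]

factor = 243
for n in num_states:
    factor //= n                  # 81, 27, 9, 3, 1
    print((state // factor) % n, end=' ')   # prints: 1 2 2 1 1
# Each digit indexes into that appliance's sorted `means` to recover its power.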

A rough walkthrough can only go this far; making full sense of those operations requires the underlying HMM theory!

The code comes from the nilmtk package's source files.
