
by ChenjingDing

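Use the K-nearest-neighbor method and the kernel method, respectively, to estimate the probability density function at the input points $\widehat{x}$, where $\widehat{x}$ = linspace(-5, 5, 100).
Assume the training samples $x_{train}$ = random('norm', 0, 1, 1, 100) are drawn from a Gaussian distribution; the estimated probability density should then also be approximately Gaussian.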
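For reference, the two one-dimensional estimators that the `kde` and `knn` functions in this post compute can be written as below. This is a sketch read off from the code, not a general derivation: the symbols $N$ (number of samples), $x_n$ (the n-th sample), $h$ (kernel width), $K$ (number of neighbors), and $r_K(x)$ (distance from $x$ to its K-th nearest sample) are notation introduced here for illustration.

```latex
% Kernel density estimate with a Gaussian kernel of width h
\hat{p}_{\mathrm{KDE}}(x) = \frac{1}{N} \sum_{n=1}^{N}
    \frac{1}{\sqrt{2\pi}\,h} \exp\!\left( -\frac{(x - x_n)^2}{2h^2} \right)

% K-nearest-neighbor estimate in 1D: V(x) is the length of the smallest
% interval centered at x that contains K samples
\hat{p}_{\mathrm{KNN}}(x) = \frac{K}{N\,V(x)}, \qquad V(x) = 2\,r_K(x)
```

In the kernel method the width $h$ is fixed and the number of samples that contribute varies with $x$; in the K-nearest-neighbor method $K$ is fixed and the volume $V(x)$ adapts to the local sample density.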


close all;
clc;

% Get the fixed K of KNN and the fixed h of KDE
p = parameters();

disp('Question: Kernel/K-Nearest Neighborhood Density Estimators');

% Produce the random samples
samples = random('norm', 0, 1, 1, 100);

% Compute the original normal distribution
realDensity = gauss1D(0, 1, 100, 5);

% Estimate the probability density using the KDE
estDensity = kde(samples, p.h);

% Plot the results in a new figure
figure;
plot(estDensity(1, :), estDensity(2, :), 'r', 'LineWidth', 1.5);
hold on;
plot(realDensity(1, :), realDensity(2, :), 'b', 'LineWidth', 1.5);
legend('KDE Estimated Distribution', 'Real Distribution');
hold off;

% Estimate the probability density using KNN
estDensity = knn(samples, p.k);

% Plot the distributions in a second figure so the KDE plot is not overwritten
figure;
plot(estDensity(1, :), estDensity(2, :), 'r', 'LineWidth', 1.5);
hold on;
plot(realDensity(1, :), realDensity(2, :), 'b', 'LineWidth', 1.5);
legend('KNN Estimated Distribution', 'Real Distribution');
hold off;


function estDensity = knn(samples, k)
% compute density estimation from samples with KNN
% Input
%  samples    : DxN matrix of data points
%  k          : number of neighbors
% Output
%  estDensity : estimated density in the range of [-5, 5]

% Compute the number of the samples created
N = length(samples);

% Create a linearly spaced vector
pos = linspace(-5, 5, 100);

% Create two big matrices to avoid for loops
x = repmat(pos, N, 1);
samples = repmat(samples', 1, length(pos));

% Sort the distances so that we can choose the k-th point
dists = sort(abs(x-samples), 1);

% Estimate the probability density using the k-NN density estimation
% dists(k, :) = V/2;
res = (k/(2*N)) ./ dists(k, :);

% Form the output variable
estDensity = [pos; res];
end


function estDensity = kde(samples, h)
% compute density estimation from samples with KDE
% Input
%  samples    : DxN matrix of data points
%  h          : (half) window size/radius of kernel
% Output
%  estDensity : estimated density in the range of [-5,5]

% Compute the number of samples created
N = length(samples);

% Create a linearly spaced vector
pos = linspace(-5, 5, 100);

% Create two big matrices to avoid for loops
x = repmat(pos, N, 1);
samples = repmat(samples', 1, length(pos));

% Estimate the density from the samples using a kernel density estimator
% See "Machine Learning (2): Nonparametric Estimation", kernel method, Gaussian kernel example
res = sum(exp(-(x-samples).^2./(2*h^2)), 1) ./ (sqrt(2*pi)*h*N);

% Form the output variable
estDensity = [pos; res];
end


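function [realDensity] = gauss1D(m, v, N, w)
% compute the reference Gaussian density on a regular grid
% Input
%  m : mean
%  v : spread parameter; the code squares it, so it acts as the standard deviation
%  N : number of grid points
%  w : half-width of the range; the grid starts at -w with step 2*w/N
% Output
%  realDensity : 2xN matrix, positions in row 1 and density values in row 2
pos = (-w:(2*w/N):w-w/N);
meanV = repmat(m,N,1)';
aux = pos - meanV;
insE = (aux.*aux)./(v^2)*(-0.5);
% Named normConst to avoid shadowing the built-in norm function
normConst = 1/(v*sqrt(2*pi));
res = normConst.*exp(insE);
realDensity = [pos;res];
end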


function p = parameters()
p.k = 30;  % knn neighbors
p.h = 0.3; % kde window size/radius
end


Figure 8: Probability density estimated with the kernel method (red curve: estimate; blue curve: ideal value)

Figure 9: Probability density estimated with the K-nearest-neighbor method (red curve: estimate; blue curve: ideal value)


