Kaggle's online judge needs a VPN/proxy to access from mainland China; the link is below:

ML2020spring - hw2 | Kaggle

Also attaching two good articles I came across while working on this assignment:

机器学习之逻辑斯蒂回归_梅菜扣肉-CSDN博客

2020李宏毅机器学习课程作业——Homework2:classification(Logistic Regression)_梅菜扣肉-CSDN博客

The articles make the following point:

Discriminative classifiers are usually the first choice. As the training set grows, both kinds of algorithm perform better, but they improve in different ways: repeated experiments show that with more training data, the discriminative method converges to a lower asymptotic error, while the generative method reaches its asymptotic error faster, although that error is higher than the discriminative method's.

And as the derivation above shows, the generative model relies on many assumptions. These assumptions can cancel out the influence of noise, so the generative model is less sensitive to the sample data, while the discriminative model is affected more strongly by the data and the noise.
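This asymptotic-error behavior can be checked with a quick experiment. The sketch below is my own illustration (not from the homework), assuming scikit-learn is available; the synthetic dataset and slice sizes are made up. It trains a generative classifier (Gaussian naive Bayes) and a discriminative one (logistic regression) on growing slices of the same training data and compares held-out accuracy:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (illustrative only)
X, y = make_classification(n_samples=20000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for n in [50, 200, 1000, 5000, 10000]:
    nb = GaussianNB().fit(X_tr[:n], y_tr[:n])                        # generative
    lr = LogisticRegression(max_iter=1000).fit(X_tr[:n], y_tr[:n])   # discriminative
    print(n, 'NB acc = %.3f' % nb.score(X_te, y_te),
          'LR acc = %.3f' % lr.score(X_te, y_te))

Typically the naive Bayes curve flattens out early, while logistic regression keeps improving and ends up with the lower error, matching the quoted claim.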

Scores of the TA's original sample code:

Modifying the code

HW2

1. Switch the optimizer to Adagrad

Deep Learning 最优化方法之AdaGrad_BVL的博客-CSDN博客_adagrad

Below is the optimization approach from the TA's sample code:

# Calculate the number of parameter updates
step = 1

# Iterative training
for epoch in range(max_iter):
    # Random shuffle at the beginning of each epoch
    X_train, Y_train = _shuffle(X_train, Y_train)

    # Mini-batch training
    for idx in range(int(np.floor(train_size / batch_size))):
        X = X_train[idx*batch_size:(idx+1)*batch_size]
        Y = Y_train[idx*batch_size:(idx+1)*batch_size]

        # Compute the gradient
        w_grad, b_grad = _gradient(X, Y, w, b)

        # Gradient descent update, with a learning rate that decays over time
        w = w - learning_rate/np.sqrt(step) * w_grad
        b = b - learning_rate/np.sqrt(step) * b_grad
        step = step + 1

The code using Adagrad:

# Accumulators needed by Adagrad
adagrad_w = 0
adagrad_b = 0
# Small constant to avoid division by zero in Adagrad
eps = 1e-8

# Iterative training
for epoch in range(max_iter):
    # Random shuffle at the beginning of each epoch
    X_train, Y_train = _shuffle(X_train, Y_train)

    # Mini-batch training
    for idx in range(int(np.floor(train_size / batch_size))):
        X = X_train[idx * batch_size:(idx + 1) * batch_size]
        Y = Y_train[idx * batch_size:(idx + 1) * batch_size]

        # Compute the gradient
        w_grad, b_grad = _gradient(X, Y, w, b)
        adagrad_w += w_grad**2
        adagrad_b += b_grad**2

        # Adagrad update for w and b
        w = w - learning_rate / np.sqrt(adagrad_w + eps) * w_grad
        b = b - learning_rate / np.sqrt(adagrad_b + eps) * b_grad

The Adagrad update rule, for a parameter θ with gradient g_t at step t:

θ_{t+1} = θ_t − η / √(Σ_{i=1}^{t} g_i² + ε) · g_t

where η is the learning rate and ε is a small constant that prevents division by zero. This is exactly what the code above does, with adagrad_w and adagrad_b holding the running sums of squared gradients.
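To see the per-parameter adaptation concretely, here is a tiny standalone sketch with made-up gradient values, showing how the effective step η/√(Σg² + ε) shrinks faster for a parameter that keeps receiving large gradients:

import numpy as np

eta, eps = 0.1, 1e-8
grads = np.array([[1.0, 0.1]] * 5)  # parameter 0 gets 10x larger gradients (made-up values)
accum = np.zeros(2)
for t, g in enumerate(grads, 1):
    accum += g**2
    step = eta / np.sqrt(accum + eps)
    print('t=%d  effective step per parameter:' % t, step)
# Parameter 0's effective step decays quickly while parameter 1's stays larger,
# so frequently-updated directions are automatically damped.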

2. Add quadratic feature terms to improve the fit

(For example: sellers on Taobao offer cardboard boxes in all kinds of sizes at different prices.

You are given training data (a = length, b = width, c = height) with the corresponding prices, and asked to train a model that predicts a box's price. (Suppose the seller prices boxes by their volume.)

If you run a linear regression directly on the first-order terms a, b, c, the fit will certainly be poor. But if you add a new variable d = a*b*c, i.e. the box's volume, and regress again, the fit becomes very good. When every box is a cube (a = b = c), this is equivalent to adding a variable d = a^3 for the volume (the cubic term mentioned above; both are essentially polynomial features).

As for when to fit with polynomial features, that depends on the specific problem and your prior knowledge; sometimes you simply have to experiment.
Link: https://www.zhihu.com/question/264245010/answer/278623526
Source: Zhihu)
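Here is a minimal sketch of the box example, with made-up dimensions and prices, showing how adding the product feature d = a*b*c turns a poor linear fit into a near-perfect one:

import numpy as np

rng = np.random.default_rng(0)
a, b, c = rng.uniform(10, 50, (3, 200))            # made-up box dimensions
price = 0.002 * a * b * c + rng.normal(0, 1, 200)  # price ~ volume + noise

def r2(X, y):
    # Least-squares fit with a bias column, then the R^2 score
    X = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1 - resid.var() / y.var()

print('linear terms only:  R^2 = %.3f' % r2(np.column_stack([a, b, c]), price))
print('with volume d=abc:  R^2 = %.3f' % r2(np.column_stack([a, b, c, a*b*c]), price))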

I changed the added squared feature terms to cubic terms. The gain was marginal: the private score improved slightly while the public score dropped slightly.

The resulting scores are shown below.
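For reference, the squared and cubic variants differ only in the exponent passed to np.power; a parameterized version of the _add_feature helper (my generalization, not the original code) could look like this:

def _add_feature(X, power=3):
    # Append element-wise powers of the original features as extra columns.
    # power=2 gives the squared terms; power=3 is the cubic variant used here.
    return np.concatenate([X, np.power(X, power)], axis=1)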

Finally, tuning the hyperparameters

# Some parameters for training
max_iter = 272
batch_size = 128
learning_rate = 0.1

The improvement is clear.

This also reaches the public strong baseline (0.89052).
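These values came from trial and error; a simple way to search them more systematically is a small grid evaluated on the development set. A hedged sketch, assuming the Adagrad training loop above is wrapped into a hypothetical train(...) helper that returns the final dev accuracy:

import itertools

best = (0.0, None)
# train() is a hypothetical wrapper around the Adagrad loop above that
# returns the final development-set accuracy for the given settings.
for mi, bs, lr in itertools.product([100, 272, 500], [64, 128, 256], [0.05, 0.1, 0.2]):
    acc = train(max_iter=mi, batch_size=bs, learning_rate=lr)
    if acc > best[0]:
        best = (acc, (mi, bs, lr))
print('best dev accuracy %.4f with (max_iter, batch_size, lr) =' % best[0], best[1])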

Finally, the full source code:

For easy navigation and tweaking, the code keeps the cell-style structure of a Jupyter notebook.

import numpy as np

np.random.seed(0)
X_train_fpath = '../input/ml2020spring-hw2/data/X_train'
Y_train_fpath = '../input/ml2020spring-hw2/data/Y_train'
X_test_fpath = '../input/ml2020spring-hw2/data/X_test'
output_fpath = './output_{}.csv'

# Parse csv files to numpy array
with open(X_train_fpath) as f:
    next(f)
    X_train = np.array([line.strip('\n').split(',')[1:] for line in f], dtype=float)
with open(Y_train_fpath) as f:
    next(f)
    Y_train = np.array([line.strip('\n').split(',')[1] for line in f], dtype=float)
with open(X_test_fpath) as f:
    next(f)
    X_test = np.array([line.strip('\n').split(',')[1:] for line in f], dtype=float)

def _normalize(X, train=True, specified_column=None, X_mean=None, X_std=None):
    # This function normalizes specific columns of X.
    # The mean and standard deviation of the training data are reused when
    # processing the testing data.
    #
    # Arguments:
    #     X: data to be processed
    #     train: 'True' when processing training data, 'False' for testing data
    #     specified_column: indexes of the columns that will be normalized.
    #         If 'None', all columns will be normalized.
    #     X_mean: mean value of training data, used when train = 'False'
    #     X_std: standard deviation of training data, used when train = 'False'
    # Outputs:
    #     X: normalized data
    #     X_mean: computed mean value of training data
    #     X_std: computed standard deviation of training data
    if specified_column is None:
        specified_column = np.arange(X.shape[1])
    if train:
        X_mean = np.mean(X[:, specified_column], 0).reshape(1, -1)
        X_std = np.std(X[:, specified_column], 0).reshape(1, -1)

    X[:, specified_column] = (X[:, specified_column] - X_mean) / (X_std + 1e-8)

    return X, X_mean, X_std

def _add_feature(X):
    # Append the cubic terms of all features as extra columns.
    X_3 = np.power(X, 3)
    X = np.concatenate([X, X_3], axis=1)
    return X

# Add the cubic feature terms
X_train = _add_feature(X_train)
X_test = _add_feature(X_test)

def _train_dev_split(X, Y, dev_ratio=0.25):
    # This function splits the data into a training set and a development set.
    train_size = int(len(X) * (1 - dev_ratio))
    return X[:train_size], Y[:train_size], X[train_size:], Y[train_size:]

# Normalize training and testing data
X_train, X_mean, X_std = _normalize(X_train, train = True)
X_test, _, _ = _normalize(X_test, train=False, specified_column=None, X_mean=X_mean, X_std=X_std)

# Split data into a training set and a development set
dev_ratio = 0.1
X_train, Y_train, X_dev, Y_dev = _train_dev_split(X_train, Y_train, dev_ratio=dev_ratio)

train_size = X_train.shape[0]
dev_size = X_dev.shape[0]
test_size = X_test.shape[0]
data_dim = X_train.shape[1]
print('Size of training set: {}'.format(train_size))
print('Size of development set: {}'.format(dev_size))
print('Size of testing set: {}'.format(test_size))
print('Dimension of data: {}'.format(data_dim))
def _shuffle(X, Y):
    # This function shuffles two equal-length arrays, X and Y, together.
    randomize = np.arange(len(X))
    np.random.shuffle(randomize)
    return (X[randomize], Y[randomize])

def _sigmoid(z):
    # Sigmoid function, used to calculate probabilities.
    # To avoid overflow, minimum/maximum output values are clipped.
    return np.clip(1 / (1.0 + np.exp(-z)), 1e-8, 1 - (1e-8))

def _f(X, w, b):
    # This is the logistic regression function, parameterized by w and b.
    #
    # Arguments:
    #     X: input data, shape = [batch_size, data_dimension]
    #     w: weight vector, shape = [data_dimension, ]
    #     b: bias, scalar
    # Output:
    #     predicted probability of each row of X being positively labeled,
    #     shape = [batch_size, ]
    return _sigmoid(np.matmul(X, w) + b)

def _predict(X, w, b):
    # This function returns a truth value prediction for each row of X
    # by rounding the result of the logistic regression function.
    return np.round(_f(X, w, b)).astype(int)

def _accuracy(Y_pred, Y_label):
    # This function calculates prediction accuracy.
    acc = 1 - np.mean(np.abs(Y_pred - Y_label))
    return acc

def _cross_entropy_loss(y_pred, Y_label):
    # This function computes the cross entropy.
    #
    # Arguments:
    #     y_pred: probabilistic predictions, float vector
    #     Y_label: ground truth labels, bool vector
    # Output:
    #     cross entropy, scalar
    cross_entropy = -np.dot(Y_label, np.log(y_pred)) - np.dot((1 - Y_label), np.log(1 - y_pred))
    return cross_entropy

def _gradient(X, Y_label, w, b):
    # This function computes the gradient of the cross entropy loss
    # with respect to the weight w and bias b.
    y_pred = _f(X, w, b)
    pred_error = Y_label - y_pred
    w_grad = -np.sum(pred_error * X.T, 1)
    b_grad = -np.sum(pred_error)
    return w_grad, b_grad
# Zero initialization for weights and bias
w = np.zeros((data_dim,))
b = np.zeros((1,))

# Some parameters for training
max_iter = 272
batch_size = 128
learning_rate = 0.1

# Keep the loss and accuracy at every iteration for plotting
train_loss = []
dev_loss = []
train_acc = []
dev_acc = []

# Accumulators needed by Adagrad
adagrad_w = 0
adagrad_b = 0
# Small constant to avoid division by zero in Adagrad
eps = 1e-8

# Iterative training
for epoch in range(max_iter):
    # Random shuffle at the beginning of each epoch
    X_train, Y_train = _shuffle(X_train, Y_train)

    # Mini-batch training
    for idx in range(int(np.floor(train_size / batch_size))):
        X = X_train[idx * batch_size:(idx + 1) * batch_size]
        Y = Y_train[idx * batch_size:(idx + 1) * batch_size]

        # Compute the gradient
        w_grad, b_grad = _gradient(X, Y, w, b)
        adagrad_w += w_grad**2
        adagrad_b += b_grad**2

        # Adagrad update for w and b
        w = w - learning_rate / np.sqrt(adagrad_w + eps) * w_grad
        b = b - learning_rate / np.sqrt(adagrad_b + eps) * b_grad

    # Compute the loss and accuracy on the training set and development set
    y_train_pred = _f(X_train, w, b)
    Y_train_pred = np.round(y_train_pred)
    train_acc.append(_accuracy(Y_train_pred, Y_train))
    train_loss.append(_cross_entropy_loss(y_train_pred, Y_train) / train_size)

    y_dev_pred = _f(X_dev, w, b)
    Y_dev_pred = np.round(y_dev_pred)
    dev_acc.append(_accuracy(Y_dev_pred, Y_dev))
    dev_loss.append(_cross_entropy_loss(y_dev_pred, Y_dev) / dev_size)

print('Training loss: {}'.format(train_loss[-1]))
print('Development loss: {}'.format(dev_loss[-1]))
print('Training accuracy: {}'.format(train_acc[-1]))
print('Development accuracy: {}'.format(dev_acc[-1]))
import matplotlib.pyplot as plt

# Loss curve
plt.plot(train_loss)
plt.plot(dev_loss)
plt.title('Loss')
plt.legend(['train', 'dev'])
plt.savefig('loss.png')
plt.show()

# Accuracy curve
plt.plot(train_acc)
plt.plot(dev_acc)
plt.title('Accuracy')
plt.legend(['train', 'dev'])
plt.savefig('acc.png')
plt.show()
# Predict testing labels
predictions = _predict(X_test, w, b)
with open(output_fpath.format('logistic'), 'w') as f:
    f.write('id,label\n')
    for i, label in enumerate(predictions):
        f.write('{},{}\n'.format(i, label))

I'm still fairly new to machine learning. I write these posts to record my learning process and to help others who run into the same difficulties along the way. If you spot any mistakes, please point them out.
