DaGuan Cup: Building the Model (3), LightGBM

Features: countvector(a)+doc(a)+hash(a)

"""
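1. Features: countvector(a)+doc(a)+hash(a)
2. Model: lgb
"""
import numpy as np
import pandas as pd
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
import pickle
import lightgbm as lgb


def f1_score_vali(preds, data_vali):
    # NOTE: the original post calls f1_score_vali without defining it.
    # This macro-F1 version is an assumed implementation, not the author's.
    labels = data_vali.get_label()
    preds = np.asarray(preds)
    if preds.ndim == 1:
        # Older LightGBM passes multiclass scores as a flat, class-major array.
        preds = preds.reshape(-1, len(labels)).T
    y_pred = np.argmax(preds, axis=1)
    # (metric name, value, is_higher_better)
    return 'f1_score', f1_score(labels, y_pred, average='macro'), True


"""=====================================================================
1 Load the data and convert it to LightGBM's standard Dataset format
"""
with open('countvector(a)+doc(a)+hash(a).pkl', 'rb') as f:
    x_train, y_train, x_test = pickle.load(f)

"""Split into training and validation sets; test_size is the validation fraction"""
x_train, x_vali, y_train, y_vali = train_test_split(x_train, y_train, test_size=0.1, random_state=0)
d_train = lgb.Dataset(data=x_train, label=y_train)
d_vali = lgb.Dataset(data=x_vali, label=y_vali)

"""=====================================================================
2 Train the lgb classifier
"""
params = {
    'boosting': 'gbdt', 'application': 'multiclassova', 'num_class': 20,
    'learning_rate': 0.1, 'num_leaves': 31, 'max_depth': -1,
    'lambda_l1': 0, 'lambda_l2': 0.5,
    'bagging_fraction': 1.0, 'feature_fraction': 1.0,
}
# early_stopping_rounds / verbose_eval belong to the older lgb.train API used here;
# LightGBM 4.x drops them in favour of the early_stopping / log_evaluation callbacks.
bst = lgb.train(params, d_train, num_boost_round=800, valid_sets=d_vali,
                feval=f1_score_vali, early_stopping_rounds=None, verbose_eval=True)

"""=====================================================================
3 Predict on the test set, convert the predictions to the official format, and save them locally
"""
y_proba = bst.predict(x_test)
# argmax gives a 0-based class index; + 1 shifts it to 1-based class labels
y_test = np.argmax(y_proba, axis=1) + 1

# 102277 is the test-set size hard-coded in the original script
df_result = pd.DataFrame(data={'id': range(102277), 'class': y_test.tolist()})
df_proba = pd.DataFrame(data={'id': range(102277), 'proba': y_proba.tolist()})
df_result.to_csv('lgb_countvector(a)+doc(a)+hash(a).csv', index=False)
df_proba.to_csv('lgb_countvector(a)+doc(a)+hash(a)_proba.csv', index=False)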
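
The script passes feval=f1_score_vali to lgb.train so that an F1 score is reported on the validation set during training. As a rough illustration of what a macro-averaged F1 measures (assuming that is the averaging used; the post does not spell it out), the NumPy sketch below computes F1 per class in a one-vs-rest fashion and then takes the unweighted mean. The helper name macro_f1 and the toy labels are made up for this example and are not part of the original code.

```python
import numpy as np


def macro_f1(y_true, y_pred, num_class):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for c in range(num_class):
        tp = np.sum((y_pred == c) & (y_true == c))  # true positives for class c
        fp = np.sum((y_pred == c) & (y_true != c))  # false positives
        fn = np.sum((y_pred != c) & (y_true == c))  # false negatives
        denom = 2 * tp + fp + fn
        # In this sketch, a class with no true or predicted samples counts as F1 = 0.
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))


# Toy data: 3 classes, 6 samples (invented for illustration).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score = macro_f1(y_true, y_pred, num_class=3)
print(round(score, 4))  # per-class F1 is 0.5, 0.8 and 2/3, so the mean is 0.6556
```

Because every class carries the same weight, a rare class that is predicted badly pulls the score down as much as a frequent one, which is why this metric is a common choice for imbalanced multi-class text classification.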

Features: countvector(w)+doc(w)+hash(w)

"""
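1. Features: countvector(w)+doc(w)+hash(w)
2. Model: lgb
"""
import numpy as np
import pandas as pd
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
import pickle
import lightgbm as lgb


def f1_score_vali(preds, data_vali):
    # NOTE: the original post calls f1_score_vali without defining it.
    # This macro-F1 version is an assumed implementation, not the author's.
    labels = data_vali.get_label()
    preds = np.asarray(preds)
    if preds.ndim == 1:
        # Older LightGBM passes multiclass scores as a flat, class-major array.
        preds = preds.reshape(-1, len(labels)).T
    y_pred = np.argmax(preds, axis=1)
    # (metric name, value, is_higher_better)
    return 'f1_score', f1_score(labels, y_pred, average='macro'), True


"""=====================================================================
1 Load the data and convert it to LightGBM's standard Dataset format
"""
with open('countvector(w)+doc(w)+hash(w).pkl', 'rb') as f:
    x_train, y_train, x_test = pickle.load(f)

"""Split into training and validation sets; test_size is the validation fraction"""
x_train, x_vali, y_train, y_vali = train_test_split(x_train, y_train, test_size=0.1, random_state=0)
d_train = lgb.Dataset(data=x_train, label=y_train)
d_vali = lgb.Dataset(data=x_vali, label=y_vali)

"""=====================================================================
2 Train the lgb classifier
"""
params = {
    'boosting': 'gbdt', 'application': 'multiclassova', 'num_class': 20,
    'learning_rate': 0.1, 'num_leaves': 31, 'max_depth': -1,
    'lambda_l1': 0, 'lambda_l2': 0.5,
    'bagging_fraction': 1.0, 'feature_fraction': 1.0,
}
# early_stopping_rounds / verbose_eval belong to the older lgb.train API used here;
# LightGBM 4.x drops them in favour of the early_stopping / log_evaluation callbacks.
bst = lgb.train(params, d_train, num_boost_round=800, valid_sets=d_vali,
                feval=f1_score_vali, early_stopping_rounds=None, verbose_eval=True)

"""=====================================================================
3 Predict on the test set, convert the predictions to the official format, and save them locally
"""
y_proba = bst.predict(x_test)
# argmax gives a 0-based class index; + 1 shifts it to 1-based class labels
y_test = np.argmax(y_proba, axis=1) + 1

# 102277 is the test-set size hard-coded in the original script
df_result = pd.DataFrame(data={'id': range(102277), 'class': y_test.tolist()})
df_proba = pd.DataFrame(data={'id': range(102277), 'proba': y_proba.tolist()})
df_result.to_csv('lgb_countvector(w)+doc(w)+hash(w).csv', index=False)
df_proba.to_csv('lgb_countvector(w)+doc(w)+hash(w)_proba.csv', index=False)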
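
Step 3 in both scripts turns the predicted probability matrix into labels with np.argmax(y_proba, axis=1) + 1. The small sketch below, using invented probabilities for three samples and four classes, shows what that expression does: argmax picks the 0-based column with the largest score in each row, and the + 1 shifts it to a 1-based class id, which is presumably what the submission format expects. The array values are assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical probabilities: one row per sample, one column per class.
y_proba = np.array([
    [0.1, 0.7, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.2, 0.2, 0.1, 0.5],
])

class_index = np.argmax(y_proba, axis=1)  # 0-based column of the best score per row
y_test = class_index + 1                  # shift to 1-based class labels

# Pair each sample id with its label, mirroring the 'id' / 'class' columns.
rows = list(zip(range(len(y_test)), y_test.tolist()))
print(rows)  # [(0, 2), (1, 1), (2, 4)]
```

Using len(y_test) for the id range, as in this sketch, avoids hard-coding the number of test rows.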
