Sample records from the KDD99 dataset:

0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,snmpgetattack.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.01,0.00,0.00,0.00,0.00,0.00,snmpgetattack.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,0.01,0.00,0.00,0.00,0.00,0.00,snmpgetattack.
0,udp,domain_u,SF,29,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,0.00,0.00,0.00,0.00,0.50,1.00,0.00,10,3,0.30,0.30,0.30,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,253,0.99,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,snmpgetattack.
0,tcp,http,SF,223,185,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,4,4,0.00,0.00,0.00,0.00,1.00,0.00,0.00,71,255,1.00,0.00,0.01,0.01,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,snmpgetattack.
0,tcp,http,SF,230,260,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,19,0.00,0.00,0.00,0.00,1.00,0.00,0.11,3,255,1.00,0.00,0.33,0.07,0.33,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.01,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,252,0.99,0.01,0.00,0.00,0.00,0.00,0.00,0.00,snmpgetattack.
1,tcp,smtp,SF,3170,329,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,2,0.00,0.00,0.00,0.00,1.00,0.00,1.00,54,39,0.72,0.11,0.02,0.00,0.02,0.00,0.09,0.13,normal.
0,tcp,http,SF,297,13787,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,177,255,1.00,0.00,0.01,0.01,0.00,0.00,0.00,0.00,normal.
0,tcp,http,SF,291,3542,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,12,12,0.00,0.00,0.00,0.00,1.00,0.00,0.00,187,255,1.00,0.00,0.01,0.01,0.00,0.00,0.00,0.00,normal.
0,tcp,http,SF,295,753,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,21,22,0.00,0.00,0.00,0.00,1.00,0.00,0.09,196,255,1.00,0.00,0.01,0.01,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.01,0.00,0.00,0.00,0.00,0.00,snmpgetattack.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,snmpgetattack.
0,tcp,http,SF,268,9235,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,5,5,0.00,0.00,0.00,0.00,1.00,0.00,0.00,58,255,1.00,0.00,0.02,0.05,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,253,0.99,0.01,0.00,0.00,0.00,0.00,0.00,0.00,snmpgetattack.
0,tcp,http,SF,223,185,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,3,3,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,tcp,http,SF,227,8841,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,13,13,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,tcp,http,SF,222,19564,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,22,23,0.00,0.00,0.00,0.00,1.00,0.00,0.09,255,255,1.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,tcp,ftp_data,SF,740,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,77,33,0.34,0.08,0.34,0.06,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,tcp,ftp_data,SF,35195,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,10,0.00,0.00,0.00,0.00,1.00,0.00,0.00,92,44,0.43,0.07,0.43,0.05,0.00,0.00,0.00,0.00,normal.
0,tcp,ftp_data,SF,8325,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,20,20,0.00,0.00,0.00,0.00,1.00,0.00,0.00,103,54,0.49,0.06,0.49,0.04,0.00,0.00,0.00,0.00,normal.
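Each record above is a headerless CSV row with 41 feature fields followed by one label field. As a minimal sketch (the `SAMPLE` string and the in-memory buffer are illustrative stand-ins for the real file), three of the rows can be loaded with pandas like this:

```python
import io

import pandas as pd

# Three rows copied verbatim from the sample above.
SAMPLE = """\
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,snmpgetattack.
0,tcp,http,SF,223,185,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,4,4,0.00,0.00,0.00,0.00,1.00,0.00,0.00,71,255,1.00,0.00,0.01,0.01,0.00,0.00,0.00,0.00,normal.
"""

# header=None makes pandas name the columns 0..41 instead of using the first row.
df = pd.read_csv(io.StringIO(SAMPLE), sep=",", header=None)

print(df.shape)                 # (rows, columns)
print(df.iloc[0, 1])            # protocol of the first record
print(df.iloc[:, -1].tolist())  # the label column
```

Columns 1, 2 and 3 (protocol, service, flag) and the label are strings; the rest are numeric.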

Code:

# -*- coding:utf-8 -*-
# Python 2 with scikit-learn < 0.20 (print statements, sklearn.cross_validation).
import re
import matplotlib.pyplot as plt
import os
from sklearn.feature_extraction.text import CountVectorizer
from sklearn import preprocessing
from sklearn import cross_validation
from sklearn.datasets import load_iris
from sklearn import tree
import pydotplus
from sklearn.preprocessing import LabelEncoder
import numpy as np
import pandas as pd
from sklearn_pandas import DataFrameMapper


def label(x):
    # Binary target: 0 for normal traffic, 1 for any attack.
    if x == "normal.":
        return 0
    else:
        return 1


if __name__ == '__main__':
    data = pd.read_csv('../data/kddcup99/corrected', sep=",", header=None)
    print data.columns
    print data.iloc[0,0], data.iloc[0,1]
    print len(data)

    # Keep only the "normal." and "guess_passwd." records.
    col_cnt = len(data.columns)
    normal = data.loc[data.loc[:, col_cnt-1] == "normal.", :]
    print "normal len:", len(normal)
    guess = data.loc[data.loc[:, col_cnt-1] == "guess_passwd.", :]
    # Counts guess_passwd rows (the printed label still says "normal len:").
    print "normal len:", len(guess)
    data = pd.concat([normal, guess])
    print len(data)

    # Encode the string feature columns as integers.
    le = preprocessing.LabelEncoder()
    for i in range(col_cnt-1):
        if isinstance(data.iloc[0,i], str):
            print "tranform string column only:", i
            data.loc[:,i] = le.fit_transform(data.loc[:,i])
    data.loc[:,col_cnt-1] = data.loc[:,col_cnt-1].apply(label)
    print data.iloc[0,0], data.iloc[0,1]

    # Features are every column except the last; the last column is the target.
    x = data.iloc[:, range(col_cnt-1)]
    #x = data.iloc[:, [0,4,5,6,7,8,22,23,24,25,26,27,28,29,30]]
    y = data.iloc[:, col_cnt-1]
    ''' also OK
    data = data.as_matrix()
    x = data[:, range(col_cnt-1)]
    y = data[:, col_cnt-1]
    '''
    print "x=>"
    print x.iloc[0:3, :]
    print "y=>"
    print y[-3:]

    #v=load_kdd99("../data/kddcup99/corrected")
    #x,y=get_guess_passwdandNormal(v)
    clf = tree.DecisionTreeClassifier()
    clf = clf.fit(x, y)
    print clf
    print  cross_validation.cross_val_score(clf, x, y, n_jobs=-1, cv=10)

    # Refit on all data and export the tree as a PDF.
    clf = clf.fit(x, y)
    dot_data = tree.export_graphviz(clf, out_file=None)
    graph = pydotplus.graph_from_dot_data(dot_data)
    graph.write_pdf("../photo/6/iris-dt.pdf")
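The listing above targets Python 2 and an older scikit-learn. What follows is a rough Python 3 sketch of the same idea, not the author's code. It assumes a recent pandas and scikit-learn, where `cross_val_score` lives in `sklearn.model_selection` and `DataFrame.to_numpy()` replaces the removed `as_matrix()`. The `toy` frame is a hypothetical stand-in for the KDD99 file so that the snippet runs without the data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: two string columns, two numeric columns, one label.
rng = np.random.RandomState(0)
n = 200
toy = pd.DataFrame({
    0: rng.choice(["tcp", "udp"], size=n),
    1: rng.choice(["http", "private", "smtp"], size=n),
    2: rng.randint(0, 500, size=n),
    3: rng.rand(n),
})
# The label depends on column 2 so the tree has something to learn.
toy[4] = np.where(toy[2] > 250, "guess_passwd.", "normal.")

col_cnt = len(toy.columns)

# Encode string feature columns as integers, one fresh encoder per column.
for i in range(col_cnt - 1):
    if isinstance(toy.iloc[0, i], str):
        toy[i] = LabelEncoder().fit_transform(toy[i])

# Binary target: 0 for normal traffic, 1 for anything else.
toy[col_cnt - 1] = (toy[col_cnt - 1] != "normal.").astype(int)

# Option 1: slice the DataFrame directly with iloc.
x = toy.iloc[:, :col_cnt - 1]
y = toy.iloc[:, col_cnt - 1]

# Option 2: convert to a NumPy array first, then slice the array.
arr = toy.to_numpy()
x_arr, y_arr = arr[:, :col_cnt - 1], arr[:, col_cnt - 1]

clf = DecisionTreeClassifier(random_state=0)
scores = cross_val_score(clf, x, y, cv=5)
print(scores)
```

Either `(x, y)` or `(x_arr, y_arr)` can be passed to `fit` and `cross_val_score`; the `iloc` form keeps the column labels, while the array form drops them.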

Output:

Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
            17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
            34, 35, 36, 37, 38, 39, 40, 41],
           dtype='int64')
0 udp
311029
normal len: 60593
normal len: 4367
64960
tranform string column only: 1
tranform string column only: 2
tranform string column only: 3
0 2
x=>
   0   1   2   3    4    5   6   7   8   9  ...    31   32   33    34   35  \
0   0   2  15   7  105  146   0   0   0   0 ...   255  254  1.0  0.01  0.0
1   0   2  15   7  105  146   0   0   0   0 ...   255  254  1.0  0.01  0.0
2   0   2  15   7  105  146   0   0   0   0 ...   255  254  1.0  0.01  0.0

    36   37   38   39   40
0  0.0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0  0.0

[3 rows x 41 columns]
y=>
142098    1
142099    1
142101    1
Name: 41, dtype: int64
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')
[ 0.9561336   0.99892258  0.99938433  0.99984606  0.99984606  0.99969212
  1.          0.99984604  0.99969207  1.        ]
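The last array holds the accuracy of each of the ten cross-validation folds. A small sketch of how those numbers might be summarised, with the values transcribed by hand from the output above:

```python
import numpy as np

# Fold accuracies transcribed from the cross_val_score output above.
scores = np.array([0.9561336, 0.99892258, 0.99938433, 0.99984606, 0.99984606,
                   0.99969212, 1.0, 0.99984604, 0.99969207, 1.0])

print("mean=%.4f std=%.4f" % (scores.mean(), scores.std()))
print("worst fold: #%d with %.4f" % (scores.argmin(), scores.min()))
```

The first fold is noticeably weaker than the other nine, which is easier to see in a summary than in the raw array.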

Reposted from: https://www.cnblogs.com/bonelee/p/7808478.html
