import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Data import and exploration

Data import

df = pd.read_csv('train_data.csv', header=None)
col_name = ['duration','protocol_type','service','flag','src_bytes','dst_bytes','land','wrong_fragment','urgent','hot','num_failed_logins','logged_in','num_compromised','root_shell','su_attempted','num_root','num_file_creations','num_shells','num_access_files','num_outbound_cmds','is_hot_login','is_guest_login','count','srv_count','serror_rate','srv_serror_rate','rerror_rate','srv_rerror_rate','same_srv_rate','diff_srv_rate','srv_diff_host_rate','dst_host_count','dst_host_srv_count','dst_host_same_srv_rate','dst_host_diff_srv_rate','dst_host_same_src_port_rate','dst_host_srv_diff_host_rate','dst_host_serror_rate','dst_host_srv_serror_rate','dst_host_rerror_rate','dst_host_srv_rerror_rate','target' ]
# assign the column names to df
df.columns = col_name
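With roughly 4.9 million rows, the raw int64/float64 load takes about 1.5 GB of memory (see df.info() below). If memory is tight, one possible optimization is to downcast the numeric columns and store the three string features as pandas categoricals. This is only a sketch and is not applied here, so the outputs below keep the default dtypes:

# Optional memory reduction (sketch only; the rest of the analysis uses the default dtypes)
for col in df.select_dtypes(include='int64').columns:
    df[col] = pd.to_numeric(df[col], downcast='integer')   # int64 -> smallest fitting int
for col in df.select_dtypes(include='float64').columns:
    df[col] = pd.to_numeric(df[col], downcast='float')     # float64 -> float32
for col in ['protocol_type', 'service', 'flag']:
    df[col] = df[col].astype('category')                   # low-cardinality string columns
print(df.memory_usage(deep=True).sum() / 1024 ** 3, 'GB')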

Data inspection

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4898431 entries, 0 to 4898430
Data columns (total 42 columns):
duration                       int64
protocol_type                  object
service                        object
flag                           object
src_bytes                      int64
dst_bytes                      int64
land                           int64
wrong_fragment                 int64
urgent                         int64
hot                            int64
num_failed_logins              int64
logged_in                      int64
num_compromised                int64
root_shell                     int64
su_attempted                   int64
num_root                       int64
num_file_creations             int64
num_shells                     int64
num_access_files               int64
num_outbound_cmds              int64
is_hot_login                   int64
is_guest_login                 int64
count                          int64
srv_count                      int64
serror_rate                    float64
srv_serror_rate                float64
rerror_rate                    float64
srv_rerror_rate                float64
same_srv_rate                  float64
diff_srv_rate                  float64
srv_diff_host_rate             float64
dst_host_count                 int64
dst_host_srv_count             int64
dst_host_same_srv_rate         float64
dst_host_diff_srv_rate         float64
dst_host_same_src_port_rate    float64
dst_host_srv_diff_host_rate    float64
dst_host_serror_rate           float64
dst_host_srv_serror_rate       float64
dst_host_rerror_rate           float64
dst_host_srv_rerror_rate       float64
target                         object
dtypes: float64(15), int64(23), object(4)
memory usage: 1.5+ GB

The training set contains 4,898,431 records and 42 fields: 15 float, 23 integer, and 4 string (object) columns.
Of the 42 fields, 41 are features and 1 is the label (discrete, string-valued). Among the 41 features, 32 are continuous and 9 are discrete: 3 string-typed (protocol_type, service, flag) and 6 binary 0/1 (land, logged_in, root_shell, su_attempted, is_hot_login, is_guest_login).
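For later processing it helps to keep explicit lists of these feature groups. A small sketch; the list names below are my own, not from the original:

label_col = 'target'
cat_features = ['protocol_type', 'service', 'flag']      # string-typed discrete features
bin_features = ['land', 'logged_in', 'root_shell', 'su_attempted', 'is_hot_login', 'is_guest_login']  # 0/1 features
num_features = [c for c in df.columns if c not in cat_features + bin_features + [label_col]]
len(num_features)   # 32 continuous features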

df.describe().T
count mean std min 25% 50% 75% max
duration 4898431.0 4.834243e+01 723.329811 0.0 0.00 0.0 0.00 5.832900e+04
src_bytes 4898431.0 1.834621e+03 941431.074484 0.0 45.00 520.0 1032.00 1.379964e+09
dst_bytes 4898431.0 1.093623e+03 645012.333754 0.0 0.00 0.0 0.00 1.309937e+09
land 4898431.0 5.716116e-06 0.002391 0.0 0.00 0.0 0.00 1.000000e+00
wrong_fragment 4898431.0 6.487792e-04 0.042854 0.0 0.00 0.0 0.00 3.000000e+00
urgent 4898431.0 7.961733e-06 0.007215 0.0 0.00 0.0 0.00 1.400000e+01
hot 4898431.0 1.243766e-02 0.468978 0.0 0.00 0.0 0.00 7.700000e+01
num_failed_logins 4898431.0 3.205108e-05 0.007299 0.0 0.00 0.0 0.00 5.000000e+00
logged_in 4898431.0 1.435290e-01 0.350612 0.0 0.00 0.0 0.00 1.000000e+00
num_compromised 4898431.0 8.088304e-03 3.856481 0.0 0.00 0.0 0.00 7.479000e+03
root_shell 4898431.0 6.818510e-05 0.008257 0.0 0.00 0.0 0.00 1.000000e+00
su_attempted 4898431.0 3.674646e-05 0.008082 0.0 0.00 0.0 0.00 2.000000e+00
num_root 4898431.0 1.293496e-02 3.938075 0.0 0.00 0.0 0.00 7.468000e+03
num_file_creations 4898431.0 1.188748e-03 0.124186 0.0 0.00 0.0 0.00 4.300000e+01
num_shells 4898431.0 7.430951e-05 0.008738 0.0 0.00 0.0 0.00 2.000000e+00
num_access_files 4898431.0 1.021143e-03 0.035510 0.0 0.00 0.0 0.00 9.000000e+00
num_outbound_cmds 4898431.0 0.000000e+00 0.000000 0.0 0.00 0.0 0.00 0.000000e+00
is_hot_login 4898431.0 4.082940e-07 0.000639 0.0 0.00 0.0 0.00 1.000000e+00
is_guest_login 4898431.0 8.351654e-04 0.028887 0.0 0.00 0.0 0.00 1.000000e+00
count 4898431.0 3.349734e+02 211.990782 0.0 121.00 510.0 511.00 5.110000e+02
srv_count 4898431.0 2.952671e+02 245.992710 0.0 10.00 510.0 511.00 5.110000e+02
serror_rate 4898431.0 1.779703e-01 0.381876 0.0 0.00 0.0 0.00 1.000000e+00
srv_serror_rate 4898431.0 1.780370e-01 0.382254 0.0 0.00 0.0 0.00 1.000000e+00
rerror_rate 4898431.0 5.766509e-02 0.232253 0.0 0.00 0.0 0.00 1.000000e+00
srv_rerror_rate 4898431.0 5.773010e-02 0.232660 0.0 0.00 0.0 0.00 1.000000e+00
same_srv_rate 4898431.0 7.898842e-01 0.389296 0.0 1.00 1.0 1.00 1.000000e+00
diff_srv_rate 4898431.0 2.117961e-02 0.082715 0.0 0.00 0.0 0.00 1.000000e+00
srv_diff_host_rate 4898431.0 2.826080e-02 0.140560 0.0 0.00 0.0 0.00 1.000000e+00
dst_host_count 4898431.0 2.329811e+02 64.020937 0.0 255.00 255.0 255.00 2.550000e+02
dst_host_srv_count 4898431.0 1.892142e+02 105.912767 0.0 49.00 255.0 255.00 2.550000e+02
dst_host_same_srv_rate 4898431.0 7.537132e-01 0.411186 0.0 0.41 1.0 1.00 1.000000e+00
dst_host_diff_srv_rate 4898431.0 3.071111e-02 0.108543 0.0 0.00 0.0 0.04 1.000000e+00
dst_host_same_src_port_rate 4898431.0 6.050520e-01 0.480988 0.0 0.00 1.0 1.00 1.000000e+00
dst_host_srv_diff_host_rate 4898431.0 6.464107e-03 0.041260 0.0 0.00 0.0 0.00 1.000000e+00
dst_host_serror_rate 4898431.0 1.780911e-01 0.381838 0.0 0.00 0.0 0.00 1.000000e+00
dst_host_srv_serror_rate 4898431.0 1.778859e-01 0.382177 0.0 0.00 0.0 0.00 1.000000e+00
dst_host_rerror_rate 4898431.0 5.792780e-02 0.230943 0.0 0.00 0.0 0.00 1.000000e+00
dst_host_srv_rerror_rate 4898431.0 5.765941e-02 0.230978 0.0 0.00 0.0 0.00 1.000000e+00
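The describe output already exposes one degenerate column: num_outbound_cmds has a standard deviation of 0, i.e. a single constant value. It is removed in the feature-selection step later; a quick check could look like this (a sketch):

# Columns with zero standard deviation are constant and carry no information.
desc = df.describe().T
desc[desc['std'] == 0].index.tolist()   # expected: ['num_outbound_cmds']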

Feature exploration (feature statistics)

Adding an attack-type field

df.target.value_counts() 
smurf.              2807886
neptune.            1072017
normal.              972781
satan.                15892
ipsweep.              12481
portsweep.            10413
nmap.                  2316
back.                  2203
warezclient.           1020
teardrop.               979
pod.                    264
guess_passwd.            53
buffer_overflow.         30
land.                    21
warezmaster.             20
imap.                    12
rootkit.                 10
loadmodule.               9
ftp_write.                8
multihop.                 7
phf.                      4
perl.                     3
spy.                      2
Name: target, dtype: int64
# Known attack categories
# Map each raw label to its attack category
def attack_classify(tag):
    dic_attack_type = {'DOS': ['land.', 'pod.', 'teardrop.', 'back.', 'neptune.', 'smurf.'],
                       'R2L': ['spy.', 'phf.', 'multihop.', 'ftp_write.', 'imap.', 'warezmaster.', 'guess_passwd.', 'warezclient.'],
                       'U2R': ['buffer_overflow.', 'rootkit.', 'loadmodule.', 'perl.'],
                       'PROBING': ['nmap.', 'portsweep.', 'ipsweep.', 'satan.']}
    for attack_type, labels in dic_attack_type.items():
        if tag in labels:
            return attack_type
    return tag  # labels not listed above (e.g. 'normal.') are kept unchanged
df['target_type']=df.target.apply(attack_classify)
df.target_type.value_counts()
DOS        3883370
normal.     972781
PROBING      41102
R2L           1126
U2R             52
Name: target_type, dtype: int64
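The aggregated counts show a heavily imbalanced label distribution: DOS dominates while U2R has only 52 samples. Viewing proportions makes this explicit (a sketch):

df.target_type.value_counts(normalize=True).round(4)
# DOS accounts for roughly 79% of all records, normal. for about 20%, and U2R for only ~0.001%.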
gp = df.groupby(['target_type','target']).count()
gp.iloc[:,1]
target_type  target
DOS          back.                  2203
             land.                    21
             neptune.            1072017
             pod.                    264
             smurf.              2807886
             teardrop.               979
PROBING      ipsweep.              12481
             nmap.                  2316
             portsweep.            10413
             satan.                15892
R2L          ftp_write.                8
             guess_passwd.            53
             imap.                    12
             multihop.                 7
             phf.                      4
             spy.                      2
             warezclient.           1020
             warezmaster.             20
U2R          buffer_overflow.         30
             loadmodule.               9
             perl.                     3
             rootkit.                 10
normal.      normal.              972781
Name: protocol_type, dtype: int64

Discrete feature inspection

protocol_type, service, flag

df.protocol_type.value_counts()
icmp    2833545
tcp     1870598
udp      194288
Name: protocol_type, dtype: int64
df.service.value_counts()
ecr_i        2811660
private      1100831
http          623091
smtp           96554
other          72653
...
tftp_u             3
http_8001          2
aol                2
harvest            2
http_2784          1
Name: service, Length: 70, dtype: int64
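service has 70 distinct values with a long tail of rare categories; the top three (ecr_i, private, http) alone cover more than 90% of the traffic. A quick way to see how concentrated the distribution is (a sketch):

svc_share = df.service.value_counts(normalize=True).cumsum()
(svc_share < 0.99).sum() + 1   # number of services needed to cover 99% of all records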
df.flag.value_counts()
SF        3744328
S0         869829
REJ        268874
RSTR         8094
RSTO         5344
SH           1040
S1            532
S2            161
RSTOS0        122
OTH            57
S3             50
Name: flag, dtype: int64

Binary (0/1) field exploration (land, logged_in, root_shell, su_attempted, is_hot_login, is_guest_login)

# land, logged_in, root_shell, su_attempted, is_hot_login, is_guest_login
df.land.value_counts()
0    4898403
1         28
Name: land, dtype: int64
df.logged_in.value_counts()
0    4195364
1     703067
Name: logged_in, dtype: int64
df.root_shell.value_counts()
0    4898097
1        334
Name: root_shell, dtype: int64
df.su_attempted.value_counts()
0    4898321
2         70
1         40
Name: su_attempted, dtype: int64
df.is_hot_login.value_counts()
0    4898429
1          2
Name: is_hot_login, dtype: int64
df.is_guest_login.value_counts()
0    4894340
1       4091
Name: is_guest_login, dtype: int64
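All six 0/1 fields above are strongly skewed toward 0. Instead of calling value_counts one field at a time, the positive rate of each can be computed in a single pass (a sketch; note that su_attempted also contains the value 2, which is counted as positive here):

bin_features = ['land', 'logged_in', 'root_shell', 'su_attempted', 'is_hot_login', 'is_guest_login']
df[bin_features].apply(lambda s: (s > 0).mean()).sort_values()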

Cross-tabulation analysis

Exploring the relationship between the discrete fields (protocol_type, service, flag) and the label fields (target / target_type)

pd.crosstab(df.protocol_type,df.target).T
protocol_type icmp tcp udp
target
back. 0 2203 0
buffer_overflow. 0 30 0
ftp_write. 0 8 0
guess_passwd. 0 53 0
imap. 0 12 0
ipsweep. 11557 924 0
land. 0 21 0
loadmodule. 0 9 0
multihop. 0 7 0
neptune. 0 1072017 0
nmap. 1032 1034 250
normal. 12763 768670 191348
perl. 0 3 0
phf. 0 4 0
pod. 264 0 0
portsweep. 6 10407 0
rootkit. 0 7 3
satan. 37 14147 1708
smurf. 2807886 0 0
spy. 0 2 0
teardrop. 0 0 979
warezclient. 0 1020 0
warezmaster. 0 20 0
pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 100)
pd.crosstab(df.service,df.target)
target back. buffer_overflow. ftp_write. guess_passwd. imap. ipsweep. land. loadmodule. multihop. neptune. nmap. normal. perl. phf. pod. portsweep. rootkit. satan. smurf. spy. teardrop. warezclient. warezmaster.
service
IRC 0 0 0 0 0 0 0 0 0 0 0 520 0 0 0 0 0 1 0 0 0 0 0
X11 0 0 0 0 0 0 0 0 0 0 0 129 0 0 0 0 0 6 0 0 0 0 0
Z39_50 0 0 0 0 0 0 0 0 0 1066 1 0 0 0 0 9 0 2 0 0 0 0 0
aol 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0
auth 0 0 0 0 0 0 0 0 0 1037 1 2328 0 0 0 9 0 7 0 0 0 0 0
bgp 0 0 0 0 0 0 0 0 0 1035 1 0 0 0 0 9 0 2 0 0 0 0 0
courier 0 0 0 0 0 0 0 0 0 1011 1 0 0 0 0 7 0 2 0 0 0 0 0
csnet_ns 0 0 0 0 0 0 0 0 0 1038 1 0 0 0 0 9 0 3 0 0 0 0 0
ctf 0 0 0 0 0 13 0 0 0 1037 1 0 0 0 0 14 0 3 0 0 0 0 0
daytime 0 0 0 0 0 0 0 0 0 1037 1 0 0 0 0 15 0 3 0 0 0 0 0
discard 0 0 0 0 0 0 0 0 0 1040 1 0 0 0 0 15 0 3 0 0 0 0 0
domain 0 0 0 0 0 13 0 0 0 1046 1 38 0 0 0 12 0 3 0 0 0 0 0
domain_u 0 0 0 0 0 0 0 0 0 0 0 57773 0 0 0 0 0 9 0 0 0 0 0
echo 0 0 0 0 0 0 0 0 0 1040 1 0 0 0 0 15 0 3 0 0 0 0 0
eco_i 0 0 0 0 0 11517 0 0 0 0 1026 3768 0 0 0 4 0 23 0 0 0 0 0
ecr_i 0 0 0 0 0 40 0 0 0 0 6 3456 0 0 259 2 0 11 2807886 0 0 0 0
efs 0 0 0 0 0 0 0 0 0 1032 1 0 0 0 0 7 0 2 0 0 0 0 0
exec 0 0 0 0 0 0 0 0 0 1035 1 0 0 0 0 7 0 2 0 0 0 0 0
finger 0 0 0 0 0 13 20 0 0 1800 1 5017 0 0 0 13 0 27 0 0 0 0 0
ftp 0 1 2 0 0 13 0 1 2 1042 1 3821 0 0 0 13 1 8 0 0 0 307 2
ftp_data 0 8 4 0 0 13 0 3 3 1805 1 38093 0 0 0 14 1 26 0 0 0 708 18
gopher 0 0 0 0 0 13 0 0 0 1039 1 0 0 0 0 13 0 11 0 0 0 0 0
harvest 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0
hostnames 0 0 0 0 0 0 0 0 0 1037 1 0 0 0 0 9 0 3 0 0 0 0 0
http 2203 0 0 0 0 13 0 0 0 1801 1 619046 0 4 0 16 0 7 0 0 0 0 0
http_2784 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
http_443 0 0 0 0 0 0 0 0 0 1036 1 0 0 0 0 6 0 1 0 0 0 0 0
http_8001 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0
imap4 0 0 0 0 12 0 0 0 0 1043 1 3 0 0 0 8 0 2 0 0 0 0 0
iso_tsap 0 0 0 0 0 0 0 0 0 1038 1 0 0 0 0 10 0 3 0 0 0 0 0
klogin 0 0 0 0 0 0 0 0 0 1040 1 0 0 0 0 7 0 2 0 0 0 0 0
kshell 0 0 0 0 0 0 0 0 0 1030 1 0 0 0 0 7 0 2 0 0 0 0 0
ldap 0 0 0 0 0 0 0 0 0 1033 1 0 0 0 0 6 0 1 0 0 0 0 0
link 0 0 0 0 0 13 0 0 0 1038 1 0 0 0 0 14 0 3 0 0 0 0 0
login 0 0 2 0 0 0 0 0 0 1032 1 0 0 0 0 7 0 3 0 0 0 0 0
mtp 0 0 0 0 0 13 0 0 0 1044 1 0 0 0 0 15 0 3 0 0 0 0 0
name 0 0 0 0 0 13 0 0 0 1036 1 0 0 0 0 14 0 3 0 0 0 0 0
netbios_dgm 0 0 0 0 0 0 0 0 0 1039 1 0 0 0 0 10 0 2 0 0 0 0 0
netbios_ns 0 0 0 0 0 0 0 0 0 1041 1 0 0 0 0 10 0 2 0 0 0 0 0
netbios_ssn 0 0 0 0 0 0 0 0 0 1042 1 0 0 0 0 8 0 4 0 0 0 0 0
netstat 0 0 0 0 0 0 0 0 0 1038 1 0 0 0 0 14 0 3 0 0 0 0 0
nnsp 0 0 0 0 0 0 0 0 0 1030 1 0 0 0 0 6 0 1 0 0 0 0 0
nntp 0 0 0 0 0 0 0 0 0 1043 1 0 0 0 0 9 0 6 0 0 0 0 0
ntp_u 0 0 0 0 0 0 0 0 0 0 0 3833 0 0 0 0 0 0 0 0 0 0 0
other 0 0 0 0 0 0 0 0 0 1022 1 56520 0 0 0 2649 3 12453 0 0 0 5 0
pm_dump 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0
pop_2 0 0 0 0 0 0 0 0 0 1043 1 0 0 0 0 8 0 3 0 0 0 0 0
pop_3 0 0 0 0 0 0 0 0 0 1046 1 922 0 0 0 9 0 3 0 0 0 0 0
printer 0 0 0 0 0 0 0 0 0 1034 1 0 0 0 0 7 0 3 0 0 0 0 0
private 0 0 0 0 0 702 0 0 0 1013720 1231 73853 0 0 0 7200 0 3146 0 0 979 0 0
red_i 0 0 0 0 0 0 0 0 0 0 0 9 0 0 0 0 0 0 0 0 0 0 0
remote_job 0 0 0 0 0 13 0 0 0 1041 1 0 0 0 0 15 0 3 0 0 0 0 0
rje 0 0 0 0 0 13 0 0 0 1038 1 0 0 0 0 15 0 3 0 0 0 0 0
shell 0 0 0 0 0 0 0 0 0 1034 1 5 0 0 0 7 0 4 0 0 0 0 0
smtp 0 0 0 0 0 13 0 0 0 1140 1 95371 0 0 0 15 0 14 0 0 0 0 0
sql_net 0 0 0 0 0 0 0 0 0 1039 1 0 0 0 0 10 0 2 0 0 0 0 0
ssh 0 0 0 0 0 13 0 0 0 1039 1 7 0 0 0 12 0 3 0 0 0 0 0
sunrpc 0 0 0 0 0 0 0 0 0 1043 1 0 0 0 0 9 0 3 0 0 0 0 0
supdup 0 0 0 0 0 0 0 0 0 1042 1 0 0 0 0 14 0 3 0 0 0 0 0
systat 0 0 0 0 0 0 0 0 0 1038 1 0 0 0 0 14 0 3 0 0 0 0 0
telnet 0 21 0 53 0 14 1 5 2 1923 1 2227 3 0 0 13 5 7 0 2 0 0 0
tftp_u 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0
tim_i 0 0 0 0 0 0 0 0 0 0 0 7 0 0 5 0 0 0 0 0 0 0 0
time 0 0 0 0 0 13 0 0 0 1040 1 509 0 0 0 13 0 3 0 0 0 0 0
urh_i 0 0 0 0 0 0 0 0 0 0 0 148 0 0 0 0 0 0 0 0 0 0 0
urp_i 0 0 0 0 0 0 0 0 0 0 0 5375 0 0 0 0 0 3 0 0 0 0 0
uucp 0 0 0 0 0 0 0 0 0 1027 1 0 0 0 0 7 0 6 0 0 0 0 0
uucp_path 0 0 0 0 0 0 0 0 0 1044 1 0 0 0 0 10 0 2 0 0 0 0 0
vmnet 0 0 0 0 0 0 0 0 0 1041 1 0 0 0 0 9 0 2 0 0 0 0 0
whois 0 0 0 0 0 13 0 0 0 1042 1 0 0 0 0 14 0 3 0 0 0 0 0
# crosstab of the service feature against target_type
pd.crosstab(df.service,df.target_type)
target_type DOS PROBING R2L U2R normal.
service
IRC 0 1 0 0 520
X11 0 6 0 0 129
Z39_50 1066 12 0 0 0
aol 0 2 0 0 0
auth 1037 17 0 0 2328
bgp 1035 12 0 0 0
courier 1011 10 0 0 0
csnet_ns 1038 13 0 0 0
ctf 1037 31 0 0 0
daytime 1037 19 0 0 0
discard 1040 19 0 0 0
domain 1046 29 0 0 38
domain_u 0 9 0 0 57773
echo 1040 19 0 0 0
eco_i 0 12570 0 0 3768
ecr_i 2808145 59 0 0 3456
efs 1032 10 0 0 0
exec 1035 10 0 0 0
finger 1820 54 0 0 5017
ftp 1042 35 313 3 3821
ftp_data 1805 54 733 12 38093
gopher 1039 38 0 0 0
harvest 0 2 0 0 0
hostnames 1037 13 0 0 0
http 4004 37 4 0 619046
http_2784 0 1 0 0 0
http_443 1036 8 0 0 0
http_8001 0 2 0 0 0
imap4 1043 11 12 0 3
iso_tsap 1038 14 0 0 0
klogin 1040 10 0 0 0
kshell 1030 10 0 0 0
ldap 1033 8 0 0 0
link 1038 31 0 0 0
login 1032 11 2 0 0
mtp 1044 32 0 0 0
name 1036 31 0 0 0
netbios_dgm 1039 13 0 0 0
netbios_ns 1041 13 0 0 0
netbios_ssn 1042 13 0 0 0
netstat 1038 18 0 0 0
nnsp 1030 8 0 0 0
nntp 1043 16 0 0 0
ntp_u 0 0 0 0 3833
other 1022 15103 5 3 56520
pm_dump 0 5 0 0 0
pop_2 1043 12 0 0 0
pop_3 1046 13 0 0 922
printer 1034 11 0 0 0
private 1014699 12279 0 0 73853
red_i 0 0 0 0 9
remote_job 1041 32 0 0 0
rje 1038 32 0 0 0
shell 1034 12 0 0 5
smtp 1140 43 0 0 95371
sql_net 1039 13 0 0 0
ssh 1039 29 0 0 7
sunrpc 1043 13 0 0 0
supdup 1042 18 0 0 0
systat 1038 18 0 0 0
telnet 1924 35 57 34 2227
tftp_u 0 0 0 0 3
tim_i 5 0 0 0 7
time 1040 30 0 0 509
urh_i 0 0 0 0 148
urp_i 0 3 0 0 5375
uucp 1027 14 0 0 0
uucp_path 1044 13 0 0 0
vmnet 1041 12 0 0 0
whois 1042 31 0 0 0
pd.crosstab(df.flag,df.target).T
flag OTH REJ RSTO RSTOS0 RSTR S0 S1 S2 S3 SF SH
target
back. 0 0 0 0 91 0 2 5 0 2105 0
buffer_overflow. 0 0 1 0 0 0 0 0 0 29 0
ftp_write. 0 0 0 0 0 0 0 0 0 8 0
guess_passwd. 0 0 45 0 4 0 0 0 2 2 0
imap. 0 0 0 0 0 1 1 0 0 6 4
ipsweep. 0 823 36 0 0 0 0 0 0 11622 0
land. 0 0 0 0 0 21 0 0 0 0 0
loadmodule. 0 0 0 0 0 0 0 0 0 9 0
multihop. 0 0 0 0 0 0 0 0 0 7 0
neptune. 0 199970 4600 0 0 867446 0 0 0 1 0
nmap. 0 0 0 0 0 0 0 0 0 1282 1034
normal. 13 53473 600 0 334 424 528 153 46 917208 2
perl. 0 0 0 0 0 0 0 0 0 3 0
phf. 0 0 0 0 0 0 0 0 0 4 0
pod. 0 0 0 0 0 0 0 0 0 264 0
portsweep. 44 2330 55 122 7663 191 0 0 0 8 0
rootkit. 0 0 0 0 0 0 0 0 0 10 0
satan. 0 12278 6 0 1 1746 1 2 1 1857 0
smurf. 0 0 0 0 0 0 0 0 0 2807886 0
spy. 0 0 0 0 0 0 0 0 0 2 0
teardrop. 0 0 0 0 0 0 0 0 0 979 0
warezclient. 0 0 1 0 1 0 0 1 1 1016 0
warezmaster. 0 0 0 0 0 0 0 0 0 20 0
pd.crosstab(df.flag,df.target_type)
target_type DOS PROBING R2L U2R normal.
flag
OTH 0 44 0 0 13
REJ 199970 15431 0 0 53473
RSTO 4600 97 46 1 600
RSTOS0 0 122 0 0 0
RSTR 91 7664 5 0 334
S0 867467 1937 1 0 424
S1 2 1 1 0 528
S2 5 2 1 0 153
S3 0 1 3 0 46
SF 2811235 14769 1065 51 917208
SH 0 1034 4 0 2
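Because the DOS classes are so large, raw counts in these crosstabs are hard to compare across rows. Normalizing each row shows, for example, that S0 connections are almost entirely DOS, while REJ mixes DOS, PROBING and normal traffic (a sketch):

# Row-normalized crosstab: share of each target_type within every flag value.
pd.crosstab(df.flag, df.target_type, normalize='index').round(3)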
df_copy = df.copy()
# df_t_copy = df_test.copy()

Data processing

Feature selection

Dropping features with only one value

only_1_field = []
for i in df.columns:
    if len(df[i].value_counts()) <= 1:
        only_1_field.append(i)
only_1_field
['num_outbound_cmds']
for i in only_1_field:
    del df[i]
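An equivalent, slightly more compact way to find and drop constant columns uses nunique(); shown for comparison only, since the loop above has already removed num_outbound_cmds:

constant_cols = [c for c in df.columns if df[c].nunique() <= 1]
df = df.drop(columns=constant_cols)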

Filtering out low-information features

# Count the occurrences of each value for every feature and store them in a dictionary
# features_dic = {}
# for i in df.columns.values.tolist():
#     features_dic[i]=dict(df.loc[:,i].value_counts())
# features_dic
{'duration': {0: 4779492, 1: 23886, 2: 8139, ...},
 'protocol_type': {'icmp': 2833545, 'tcp': 1870598, 'udp': 194288},
 'service': {'ecr_i': 2811660, 'private': 1100831, 'http': 623091, ...},
 'flag': {'SF': 3744328, 'S0': 869829, 'REJ': 268874, ...},
 'land': {0: 4898403, 1: 28},
 ...}
1003,113: 998,178: 997,175: 996,89: 993,158: 990,85: 987,180: 987,83: 982,104: 979,106: 975,181: 971,112: 969,102: 965,185: 964,124: 963,118: 959,128: 959,179: 952,172: 952,92: 950,97: 947,127: 945,177: 945,183: 942,101: 940,125: 939,107: 938,176: 938,110: 934,160: 930,220: 929,114: 928,159: 923,111: 921,203: 915,115: 913,96: 909,123: 907,214: 895,184: 889,192: 886,217: 882,187: 874,213: 868,186: 866,208: 863,191: 862,109: 862,193: 858,108: 832,212: 828,196: 824,201: 815,205: 802,188: 796,204: 775,189: 769,200: 760,194: 750,190: 725,210: 710,211: 701,195: 677,209: 672,218: 672,198: 651,197: 650,199: 610,202: 607,207: 547,206: 523,0: 33},'dst_host_same_srv_rate': {1.0: 3450149,0.02: 161618,0.04: 157446,0.05: 155521,0.07: 149794,0.01: 126495,0.0: 113651,0.03: 106632,0.06: 102579,0.08: 59469,0.09: 29796,0.98: 17350,0.96: 14685,0.95: 13834,0.93: 11040,0.1: 11022,0.94: 10896,0.99: 10511,0.91: 9460,0.92: 8726,0.97: 8416,0.89: 6155,0.9: 4918,0.87: 4420,0.88: 4399,0.67: 2985,0.8: 2860,0.86: 2822,0.62: 2654,0.84: 2595,0.71: 2593,0.64: 2590,0.75: 2573,0.6: 2572,0.83: 2545,0.85: 2524,0.65: 2494,0.5: 2484,0.73: 2436,0.69: 2435,0.68: 2351,0.55: 2335,0.25: 2335,0.7: 2334,0.63: 2261,0.72: 2260,0.61: 2234,0.56: 2233,0.82: 2213,0.57: 2204,0.59: 2198,0.74: 2194,0.58: 2142,0.11: 2128,0.66: 2078,0.54: 2075,0.76: 2067,0.53: 1961,0.38: 1867,0.12: 1860,0.24: 1846,0.81: 1840,0.33: 1838,0.47: 1825,0.52: 1815,0.44: 1770,0.79: 1762,0.4: 1761,0.51: 1761,0.77: 1757,0.78: 1748,0.49: 1733,0.45: 1727,0.27: 1724,0.2: 1705,0.29: 1698,0.22: 1684,0.39: 1659,0.13: 1651,0.46: 1648,0.15: 1643,0.31: 1615,0.26: 1612,0.14: 1609,0.35: 1592,0.48: 1592,0.36: 1590,0.18: 1555,0.42: 1538,0.32: 1537,0.16: 1533,0.43: 1532,0.17: 1527,0.41: 1524,0.23: 1508,0.28: 1452,0.21: 1432,0.19: 1422,0.37: 1417,0.3: 1414,0.34: 1361},'dst_host_diff_srv_rate': {0.0: 3442197,0.07: 458302,0.06: 281262,0.05: 185221,0.08: 145560,0.01: 130698,0.02: 40670,0.09: 39347,0.03: 34462,0.04: 26969,1.0: 20001,0.85: 7441,0.86: 5647,0.84: 5228,0.1: 4769,0.87: 4215,0.11: 3118,0.12: 2899,0.82: 2036,0.83: 1953,0.62: 1826,0.14: 1771,0.64: 1734,0.65: 1688,0.67: 1621,0.13: 1538,0.6: 1468,0.15: 1443,0.17: 1291,0.69: 1215,0.5: 1180,0.89: 1144,0.66: 1143,0.51: 1130,0.53: 1125,0.73: 1097,0.18: 1075,0.2: 1044,0.63: 998,0.88: 963,0.71: 954,0.25: 929,0.74: 918,0.68: 915,0.16: 862,0.58: 860,0.61: 843,0.56: 841,0.55: 838,0.33: 836,0.59: 817,0.52: 789,0.22: 778,0.4: 765,0.7: 760,0.8: 736,0.78: 731,0.54: 711,0.75: 680,0.19: 674,0.29: 650,0.76: 643,0.72: 627,0.47: 620,0.57: 611,0.21: 605,0.91: 602,0.81: 588,0.77: 556,0.49: 514,0.44: 510,0.43: 500,0.48: 490,0.23: 465,0.27: 459,0.42: 444,0.79: 442,0.9: 418,0.38: 393,0.24: 326,0.45: 322,0.3: 301,0.46: 295,0.31: 259,0.41: 254,0.36: 225,0.39: 174,0.37: 146,0.26: 146,0.35: 145,0.28: 124,0.95: 112,0.34: 111,0.97: 110,0.32: 102,0.96: 94,0.93: 82,0.92: 64,0.94: 61,0.98: 58,0.99: 57},'dst_host_same_src_port_rate': {1.0: 2881921,0.0: 1411146,0.01: 215839,0.02: 71077,0.03: 43141,0.04: 27020,0.05: 21019,0.06: 17541,0.5: 15976,0.08: 14104,0.07: 13581,0.33: 13564,0.25: 12076,0.2: 10772,0.17: 10050,0.14: 9352,0.12: 9237,0.11: 9022,0.1: 8459,0.09: 8411,0.99: 6231,0.98: 4630,0.96: 2882,0.95: 2518,0.97: 2166,0.93: 1528,0.89: 1337,0.15: 1329,0.29: 1274,0.88: 1262,0.13: 1239,0.31: 1210,0.92: 1161,0.32: 1160,0.27: 1155,0.22: 1141,0.91: 1139,0.35: 1131,0.94: 1129,0.16: 1123,0.38: 1098,0.24: 1082,0.28: 1066,0.21: 1058,0.18: 1053,0.36: 1053,0.3: 1040,0.26: 1020,0.4: 990,0.19: 969,0.87: 961,0.34: 957,0.23: 948,0.39: 933,0.85: 880,0.37: 843,0.9: 826,0.82: 
815,0.44: 807,0.42: 796,0.41: 794,0.86: 775,0.43: 720,0.45: 662,0.84: 645,0.47: 642,0.46: 625,0.83: 610,0.8: 590,0.67: 551,0.49: 508,0.48: 506,0.75: 494,0.53: 480,0.6: 467,0.62: 464,0.55: 460,0.73: 454,0.52: 453,0.51: 449,0.74: 444,0.81: 442,0.56: 439,0.58: 429,0.57: 425,0.79: 420,0.54: 409,0.71: 398,0.69: 395,0.59: 369,0.68: 369,0.61: 361,0.76: 358,0.78: 349,0.64: 348,0.72: 334,0.65: 327,0.7: 320,0.63: 315,0.77: 303,0.66: 280},'dst_host_srv_diff_host_rate': {0.0: 4384482,0.02: 117501,0.01: 104684,0.04: 66844,0.03: 65742,0.05: 46108,0.06: 19440,0.07: 15156,0.5: 7380,0.08: 6568,0.09: 6077,0.15: 3923,0.11: 3895,0.16: 3873,0.13: 3562,0.1: 3507,1.0: 3463,0.2: 3262,0.18: 3250,0.14: 3046,0.17: 3019,0.12: 2919,0.25: 2729,0.22: 2423,0.19: 2245,0.21: 2213,0.24: 1225,0.23: 1161,0.26: 1070,0.29: 974,0.33: 915,0.27: 831,0.51: 806,0.4: 550,0.28: 478,0.3: 443,0.67: 385,0.31: 337,0.52: 297,0.32: 277,0.38: 167,0.34: 126,0.35: 116,0.43: 112,0.36: 106,0.53: 92,0.6: 88,0.54: 64,0.37: 55,0.44: 54,0.57: 49,0.56: 49,0.75: 44,0.39: 37,0.55: 36,0.42: 35,0.45: 23,0.41: 22,0.46: 20,0.47: 14,0.62: 13,0.8: 11,0.48: 9,0.58: 5,0.71: 4,0.49: 4,0.86: 2,0.64: 2,0.83: 2,0.7: 2,0.73: 2,0.97: 2,0.88: 1,0.59: 1,0.93: 1,0.78: 1},'dst_host_serror_rate': {0.0: 3966023,1.0: 867360,0.01: 32574,0.02: 8190,0.03: 3868,0.04: 1973,0.05: 1702,0.08: 1258,0.07: 1201,0.09: 979,0.06: 937,0.14: 908,0.13: 887,0.15: 811,0.11: 756,0.1: 754,0.16: 748,0.12: 611,0.18: 555,0.25: 371,0.17: 364,0.99: 332,0.27: 303,0.2: 297,0.19: 294,0.31: 282,0.33: 219,0.98: 191,0.35: 169,0.24: 145,0.28: 134,0.5: 131,0.22: 117,0.96: 116,0.32: 113,0.97: 112,0.53: 106,0.29: 104,0.94: 96,0.3: 89,0.42: 89,0.23: 89,0.93: 84,0.95: 82,0.26: 79,0.21: 77,0.92: 59,0.36: 57,0.91: 57,0.55: 48,0.89: 48,0.51: 45,0.47: 45,0.9: 45,0.44: 43,0.49: 42,0.88: 40,0.45: 40,0.87: 39,0.43: 39,0.85: 37,0.4: 36,0.56: 35,0.84: 34,0.82: 34,0.78: 34,0.8: 33,0.58: 32,0.67: 32,0.75: 32,0.34: 31,0.38: 31,0.52: 31,0.71: 31,0.48: 30,0.6: 30,0.86: 30,0.83: 28,0.46: 28,0.76: 28,0.62: 28,0.73: 27,0.54: 27,0.69: 27,0.65: 27,0.64: 26,0.39: 25,0.41: 24,0.81: 23,0.79: 23,0.57: 22,0.37: 22,0.68: 20,0.77: 20,0.74: 19,0.7: 19,0.59: 18,0.61: 18,0.72: 18,0.66: 17,0.63: 17},'dst_host_srv_serror_rate': {0.0: 3973284,1.0: 869899,0.01: 45103,0.02: 6065,0.03: 1140,0.04: 731,0.05: 281,0.06: 183,0.5: 112,0.08: 106,0.07: 106,0.09: 80,0.12: 60,0.11: 59,0.97: 57,0.98: 54,0.67: 53,0.1: 51,0.96: 48,0.33: 46,0.17: 41,0.14: 35,0.25: 32,0.2: 28,0.95: 28,0.92: 27,0.91: 23,0.94: 23,0.93: 22,0.75: 21,0.88: 20,0.78: 19,0.16: 18,0.9: 16,0.62: 15,0.13: 14,0.73: 14,0.15: 14,0.79: 14,0.18: 13,0.6: 13,0.27: 13,0.43: 12,0.4: 12,0.31: 12,0.65: 12,0.56: 12,0.85: 11,0.22: 11,0.89: 11,0.29: 11,0.83: 10,0.8: 10,0.7: 10,0.69: 10,0.86: 10,0.87: 10,0.48: 10,0.36: 9,0.76: 9,0.51: 9,0.81: 9,0.3: 8,0.49: 8,0.71: 8,0.66: 8,0.57: 8,0.82: 8,0.24: 8,0.34: 8,0.21: 8,0.41: 8,0.63: 7,0.39: 7,0.64: 7,0.52: 7,0.38: 7,0.47: 7,0.84: 7,0.26: 7,0.19: 7,0.32: 7,0.61: 7,0.68: 6,0.54: 6,0.77: 6,0.74: 6,0.58: 6,0.45: 6,0.55: 5,0.72: 5,0.23: 5,0.28: 5,0.42: 5,0.53: 4,0.35: 4,0.37: 4,0.44: 4,0.59: 3,0.46: 3},'dst_host_rerror_rate': {0.0: 4571036,1.0: 260470,0.01: 13143,0.02: 6679,0.03: 3445,0.04: 2791,0.05: 2143,0.85: 1189,0.87: 1158,0.93: 1113,0.06: 1096,0.95: 999,0.92: 995,0.84: 975,0.86: 933,0.89: 863,0.91: 821,0.07: 816,0.08: 810,0.82: 806,0.96: 743,0.9: 720,0.88: 689,0.09: 651,0.99: 651,0.98: 626,0.97: 618,0.83: 601,0.94: 554,0.11: 530,0.1: 529,0.12: 514,0.69: 505,0.14: 480,0.81: 472,0.73: 468,0.75: 468,0.33: 463,0.5: 460,0.17: 448,0.8: 426,0.13: 416,0.2: 
413,0.25: 397,0.15: 370,0.65: 326,0.18: 321,0.67: 309,0.19: 305,0.16: 298,0.53: 293,0.52: 292,0.76: 286,0.4: 285,0.47: 281,0.78: 275,0.58: 272,0.22: 270,0.71: 268,0.29: 252,0.7: 251,0.21: 248,0.56: 245,0.36: 243,0.38: 241,0.68: 241,0.35: 235,0.31: 235,0.57: 232,0.43: 229,0.27: 228,0.23: 227,0.24: 226,0.32: 225,0.55: 222,0.62: 220,0.42: 219,0.64: 217,0.77: 214,0.45: 214,0.48: 213,0.3: 213,0.6: 213,0.41: 212,0.49: 209,0.44: 203,0.74: 200,0.79: 200,0.26: 195,0.34: 193,0.46: 191,0.54: 189,0.37: 183,0.72: 182,0.51: 182,0.59: 180,0.39: 178,0.28: 176,0.63: 175,0.61: 162,0.66: 144},'dst_host_srv_rerror_rate': {0.0: 4575957,1.0: 257113,0.01: 17057,0.02: 5104,0.98: 3182,0.04: 3103,0.99: 2958,0.03: 2885,0.05: 2427,0.97: 1451,0.06: 1315,0.95: 995,0.96: 983,0.94: 953,0.07: 901,0.93: 892,0.89: 809,0.8: 724,0.78: 680,0.91: 651,0.74: 648,0.87: 560,0.85: 554,0.84: 522,0.92: 518,0.75: 517,0.82: 502,0.67: 502,0.9: 464,0.68: 409,0.79: 404,0.08: 397,0.83: 377,0.86: 369,0.53: 369,0.66: 341,0.88: 339,0.56: 331,0.72: 330,0.69: 324,0.76: 320,0.55: 319,0.6: 308,0.33: 308,0.62: 291,0.77: 286,0.58: 284,0.71: 281,0.65: 273,0.57: 267,0.54: 261,0.5: 259,0.81: 250,0.64: 249,0.7: 246,0.4: 237,0.09: 236,0.29: 232,0.12: 229,0.73: 226,0.59: 206,0.61: 189,0.45: 182,0.51: 180,0.52: 171,0.35: 164,0.13: 158,0.63: 154,0.42: 150,0.49: 143,0.38: 138,0.34: 138,0.41: 128,0.39: 127,0.36: 124,0.44: 122,0.47: 120,0.11: 119,0.43: 112,0.48: 111,0.1: 107,0.31: 94,0.25: 92,0.46: 91,0.37: 85,0.2: 80,0.32: 76,0.14: 69,0.27: 69,0.17: 67,0.24: 48,0.15: 46,0.3: 43,0.28: 42,0.18: 41,0.22: 36,0.16: 31,0.26: 29,0.19: 26,0.23: 25,0.21: 19},'target': {'smurf.': 2807886,'neptune.': 1072017,'normal.': 972781,'satan.': 15892,'ipsweep.': 12481,'portsweep.': 10413,'nmap.': 2316,'back.': 2203,'warezclient.': 1020,'teardrop.': 979,'pod.': 264,'guess_passwd.': 53,'buffer_overflow.': 30,'land.': 21,'warezmaster.': 20,'imap.': 12,'rootkit.': 10,'loadmodule.': 9,'ftp_write.': 8,'multihop.': 7,'phf.': 4,'perl.': 3,'spy.': 2},'target_type': {'DOS': 3883370,'normal.': 972781,'PROBING': 41102,'R2L': 1126,'U2R': 52}}
# # Filter out the features where a single value accounts for >= 99.9% of all records
# low_e_list = []
# for i in features_dic.keys():
#     for j in features_dic[i].values():
#         if j/df.shape[0] >= 0.999:
#             low_e_list.append(i)
# low_e_list
['land','wrong_fragment','urgent','num_failed_logins','num_compromised','root_shell','su_attempted','num_file_creations','num_shells','num_access_files','is_hot_login','is_guest_login']
pd.crosstab(df.land,df.target_type)
target_type DOS PROBING R2L U2R normal.
land
0 3883349 41102 1126 52 972774
1 21 0 0 0 7
pd.crosstab(df.land,df.target)
target back. buffer_overflow. ftp_write. guess_passwd. imap. ipsweep. land. loadmodule. multihop. neptune. nmap. normal. perl. phf. pod. portsweep. rootkit. satan. smurf. spy. teardrop. warezclient. warezmaster.
land
0 2203 30 8 53 12 12481 0 9 7 1072017 2316 972774 3 4 264 10413 10 15892 2807886 2 979 1020 20
1 0 0 0 0 0 0 21 0 0 0 0 7 0 0 0 0 0 0 0 0 0 0 0

Convert the categorical string variables to numeric values

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
le.fit(df['target_type'])
df['target_type']=le.transform(df['target_type'])
# Convert the protocol_type, service and flag fields to numeric values
le_protocol_type = LabelEncoder()
le_protocol_type.fit(df.protocol_type)
df.protocol_type=le_protocol_type.transform(df.protocol_type)
service_tag = ['aol','auth','bgp','courier','csnet_ns','ctf','daytime', 'discard','domain','domain_u','echo','eco_i', 'ecr_i','efs', 'exec', 'finger', 'ftp', 'ftp_data', 'gopher', 'harvest', 'hostnames', 'http', 'http_2784', 'http_443','http_8001', 'imap4', 'IRC', 'iso_tsap', 'klogin', 'kshell', 'ldap', 'link', 'login', 'mtp', 'name', 'netbios_dgm','netbios_ns', 'netbios_ssn', 'netstat', 'nnsp', 'nntp', 'ntp_u', 'other', 'pm_dump', 'pop_2', 'pop_3', 'printer','private', 'red_i', 'remote_job', 'rje', 'shell', 'smtp', 'sql_net', 'ssh', 'sunrpc', 'supdup', 'systat', 'telnet', 'tftp_u', 'tim_i', 'time', 'urh_i', 'urp_i','uucp', 'uucp_path', 'vmnet', 'whois', 'X11', 'Z39_50','icmp']
le_service = LabelEncoder()
le_service.fit(service_tag)
df.service=le_service.transform(df.service)
le_flag = LabelEncoder()
le_flag.fit(df.flag)
df.flag = le_flag.transform(df.flag)
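LabelEncoder assigns each distinct string an integer according to the sorted order of the fitted classes. Fitting le_service on the predefined service_tag list, rather than only on the values present in df.service, is what later allows the same encoder to transform the unlabeled test set, which may contain service values that never occur in the training data. A toy illustration of the mapping (the toy_le name and the values below are for illustration only):

from sklearn.preprocessing import LabelEncoder

# classes_ is stored in sorted order, so the integer codes follow that order
toy_le = LabelEncoder().fit(['tcp', 'udp', 'icmp'])
print(list(toy_le.classes_))              # ['icmp', 'tcp', 'udp'] -> encoded as 0, 1, 2
print(toy_le.transform(['udp', 'icmp']))  # [2 0]
print(toy_le.inverse_transform([2, 0]))   # ['udp' 'icmp']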
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4898431 entries, 0 to 4898430
Data columns (total 42 columns):
duration                       int64
protocol_type                  int32
service                        int32
flag                           int32
src_bytes                      int64
dst_bytes                      int64
land                           int64
wrong_fragment                 int64
urgent                         int64
hot                            int64
num_failed_logins              int64
logged_in                      int64
num_compromised                int64
root_shell                     int64
su_attempted                   int64
num_root                       int64
num_file_creations             int64
num_shells                     int64
num_access_files               int64
is_hot_login                   int64
is_guest_login                 int64
count                          int64
srv_count                      int64
serror_rate                    float64
srv_serror_rate                float64
rerror_rate                    float64
srv_rerror_rate                float64
same_srv_rate                  float64
diff_srv_rate                  float64
srv_diff_host_rate             float64
dst_host_count                 int64
dst_host_srv_count             int64
dst_host_same_srv_rate         float64
dst_host_diff_srv_rate         float64
dst_host_same_src_port_rate    float64
dst_host_srv_diff_host_rate    float64
dst_host_serror_rate           float64
dst_host_srv_serror_rate       float64
dst_host_rerror_rate           float64
dst_host_srv_rerror_rate       float64
target                         object
target_type                    int32
dtypes: float64(15), int32(4), int64(22), object(1)
memory usage: 1.5+ GB

Extract the feature matrix and labels

X= df.iloc[:,:-2]
Y= df.iloc[:,-1]

Split into training and test sets

from sklearn.model_selection import train_test_split
Xtrain,Xtest,Ytrain,Ytest = train_test_split(X,Y,test_size = 0.3,random_state = 7)
X.shape
(4898431, 40)
Xtrain.shape
(3428901, 40)
Xtest.shape
(1469530, 40)

Feature selection with a decision tree model

from sklearn.tree import DecisionTreeClassifier  # import the decision tree classifier
dtclf=DecisionTreeClassifier(random_state=7)
dtclf = dtclf.fit(Xtrain,Ytrain)  # train the model
dt_score = dtclf.score(Xtest,Ytest)  # check the model's accuracy on the test set
dt_score
0.9999394364184467
from sklearn.metrics import confusion_matrix
pd.DataFrame(confusion_matrix(Ytest,dtclf.predict(Xtest),labels=[0,1,2,3,4]))
0 1 2 3 4
0 1165139 2 0 0 2
1 2 12242 0 0 18
2 1 2 307 2 18
3 0 0 2 10 10
4 12 7 7 4 291743
dtclf.feature_importances_.shape
(40,)
t1 = sorted([*zip(X.columns,dtclf.feature_importances_)],key=(lambda x:x[1]),reverse=True)
t1
[('count', 0.9068626046206024),('dst_host_serror_rate', 0.02241779830887273),('diff_srv_rate', 0.01812624171027743),('dst_bytes', 0.016390409180257894),('dst_host_srv_diff_host_rate', 0.011906366453233608),('dst_host_diff_srv_rate', 0.007023148205235081),('service', 0.0058209874567988875),('src_bytes', 0.0035919929941658356),('protocol_type', 0.00254284591630643),('num_compromised', 0.001959164583740053),('flag', 0.0011409009135562502),('wrong_fragment', 0.0004906336598947696),('dst_host_rerror_rate', 0.00035468112211381426),('dst_host_same_src_port_rate', 0.0003331056598445668),('duration', 0.00023222416376659689),('hot', 0.00017104489484580593),('dst_host_same_srv_rate', 0.00013519235138785438),('dst_host_srv_rerror_rate', 0.00010031054379468164),('dst_host_count', 7.133048115728888e-05),('logged_in', 6.534276798013103e-05),('dst_host_srv_count', 4.67301350246183e-05),('dst_host_srv_serror_rate', 4.582179977679342e-05),('srv_serror_rate', 4.0364479179341566e-05),('is_guest_login', 3.7796381156556284e-05),('num_failed_logins', 3.090826218805775e-05),('same_srv_rate', 1.251117515084309e-05),('srv_count', 1.0850180806941767e-05),('srv_rerror_rate', 7.684478087600836e-06),('srv_diff_host_rate', 6.686957867928859e-06),('rerror_rate', 6.392373872147657e-06),('num_root', 5.673540544638304e-06),('serror_rate', 4.3915423435587785e-06),('urgent', 3.797352024726769e-06),('num_file_creations', 2.101587954030111e-06),('num_shells', 1.7559380825881046e-06),('root_shell', 2.0782810759976382e-07),('land', 0.0),('su_attempted', 0.0),('num_access_files', 0.0),('is_hot_login', 0.0)]
low_importance_fields =[]
for i in t1:
    if i[1] < 0.0001:
        low_importance_fields.append(i[0])
low_importance_fields
['dst_host_count','logged_in','dst_host_srv_count','dst_host_srv_serror_rate','srv_serror_rate','is_guest_login','num_failed_logins','same_srv_rate','srv_count','srv_rerror_rate','srv_diff_host_rate','rerror_rate','num_root','serror_rate','urgent','num_file_creations','num_shells','root_shell','land','su_attempted','num_access_files','is_hot_login']
for i in low_importance_fields:
    del X[i]
X.shape
(4898431, 18)
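The loop above drops every column whose importance in the fitted tree is below 0.0001. As a side note, sklearn's SelectFromModel expresses the same threshold-based selection more compactly; the snippet below is only a sketch of that alternative, reusing the dtclf fitted above, and is not what the rest of this post uses:

from sklearn.feature_selection import SelectFromModel

# prefit=True reuses the already-fitted dtclf; features with importance >= 0.0001 are kept
selector = SelectFromModel(dtclf, threshold=0.0001, prefit=True)
X_selected = selector.transform(df.iloc[:, :-2])  # NumPy array containing the same 18 kept columns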

Modeling

Build a tree model with cross-validation

Xtrain,Xtest,Ytrain,Ytest = train_test_split(X,Y,test_size = 0.3,random_state = 7)
Xtrain.shape
(3428901, 18)
from sklearn.model_selection import cross_val_score  # import the cross-validation helper
clf = DecisionTreeClassifier(random_state=7)
dt_score_cvs= cross_val_score(clf,X,Y,cv=10).mean()
dt_score_cvs
0.9569433478963363

Learning curve

plt.style.use('ggplot')
# set rc parameters
plt.rcParams['font.sans-serif']=['Simhei']  # use a font that can render Chinese characters
plt.rcParams['axes.unicode_minus']=False    # keep the minus sign displayed correctly with a non-default font
tr=[]
te=[]
for i in range(15):
    clf = DecisionTreeClassifier(random_state=7,max_depth=i+1)
    clf = clf.fit(Xtrain, Ytrain)
    score_tr = clf.score(Xtrain,Ytrain)               # score on the training set
    score_te = cross_val_score(clf,X,Y,cv=10).mean()  # mean 10-fold cross-validation score
    print("max_depth = {}: training score = {}, cross-validation score = {}".format(i+1,score_tr,score_te))
    tr.append(score_tr)  # training scores over the 15 depths
    te.append(score_te)  # mean 10-fold CV scores over the 15 depths
print(max(te))  # best cross-validation score: about 0.9965, reached at max_depth = 4
plt.figure(figsize=(10,6))
plt.plot(range(1,16),tr,color="red",label="train")  # plot the scores on the training set and under cross-validation;
plt.plot(range(1,16),te,color="blue",label="test")  # comparing the two curves shows whether the model over- or under-fits, which guides tuning
plt.xticks(range(1,16))
plt.xlabel('max_depth')
plt.ylabel('score')
plt.title('Training and K-fold cross-validation scores at different tree depths')
plt.legend();
max_depth = 1: training score = 0.9840625319891125, cross-validation score = 0.9835259490974406
max_depth = 2: training score = 0.9910102391407626, cross-validation score = 0.9863184763261226
max_depth = 3: training score = 0.9957683234365763, cross-validation score = 0.9914009198666749
max_depth = 4: training score = 0.9970279106920847, cross-validation score = 0.9965480794272645
max_depth = 5: training score = 0.9975254461998174, cross-validation score = 0.9549427078998493
max_depth = 6: training score = 0.997691971859205, cross-validation score = 0.9547009977767272
max_depth = 7: training score = 0.9987675351373516, cross-validation score = 0.9551670654213739
max_depth = 8: training score = 0.9993041502218932, cross-validation score = 0.956091034920217
max_depth = 9: training score = 0.9995115052898873, cross-validation score = 0.9562059697287826
max_depth = 10: training score = 0.9995774156209234, cross-validation score = 0.9567502246148385
max_depth = 11: training score = 0.9996159119204666, cross-validation score = 0.9564190987861902
max_depth = 12: training score = 0.9998060603091194, cross-validation score = 0.9568882282351435
max_depth = 13: training score = 0.9998430984154981, cross-validation score = 0.9569900976183685
max_depth = 14: training score = 0.9998626382039026, cross-validation score = 0.9568143273564772
max_depth = 15: training score = 0.9998655545902316, cross-validation score = 0.9566977596749681
0.9965480794272645

[Figure: learning curve of training and 10-fold cross-validation scores for max_depth = 1 to 15 (output_86_1.png)]

The learning curve shows that once the maximum tree depth exceeds 4, the cross-validation score falls well below the training score and the model starts to overfit, so the maximum depth is set to max_depth=4.
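Instead of reading the best depth off the plot, it can also be pulled directly from the te list collected in the loop above (a minimal sketch, assuming tr and te are still in scope):

import numpy as np

# depths were 1..15, so list index i corresponds to max_depth = i + 1
best_depth = int(np.argmax(te)) + 1
print(best_depth, te[best_depth - 1])  # expected from the run above: 4 and about 0.99655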

Hyperparameter tuning with grid search

from sklearn.model_selection import GridSearchCV
params ={#     'splitter':('best','random'),
#          'criterion':('gini','entropy'),
         'max_depth':[4],
         'min_samples_split':[*range(2,20,2)],
#        'min_samples_leaf':[*range(1,10)]
}
clf_gcv = DecisionTreeClassifier(random_state=7)
GS = GridSearchCV(clf_gcv,params,cv =10,iid=True)
GS.fit(Xtrain,Ytrain)
D:\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py:823: FutureWarning: The parameter 'iid' is deprecated in 0.22 and will be removed in 0.24.
  "removed in 0.24.", FutureWarning
GridSearchCV(cv=10, error_score=nan,
             estimator=DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None,
                                              criterion='gini', max_depth=None,
                                              max_features=None, max_leaf_nodes=None,
                                              min_impurity_decrease=0.0,
                                              min_impurity_split=None,
                                              min_samples_leaf=1, min_samples_split=2,
                                              min_weight_fraction_leaf=0.0,
                                              presort='deprecated', random_state=7,
                                              splitter='best'),
             iid=True, n_jobs=None,
             param_grid={'max_depth': [4],
                         'min_samples_split': [2, 4, 6, 8, 10, 12, 14, 16, 18]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring=None, verbose=0)
#'splitter': 'best'
# 'criterion': 'gini'
# 'min_samples_split': 2
#  'min_samples_leaf': 1
GS.best_params_
{'max_depth': 4, 'min_samples_split': 2}
GS.best_score_
0.997027035776186
params ={'min_samples_leaf':range(5,50,10)}
clf_gcv = DecisionTreeClassifier(random_state=7)
GS = GridSearchCV(clf_gcv,params,cv =10,iid=True)
GS.fit(Xtrain,Ytrain)
D:\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py:823: FutureWarning: The parameter 'iid' is deprecated in 0.22 and will be removed in 0.24.
  "removed in 0.24.", FutureWarning
GridSearchCV(cv=10, error_score=nan,
             estimator=DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None,
                                              criterion='gini', max_depth=None,
                                              max_features=None, max_leaf_nodes=None,
                                              min_impurity_decrease=0.0,
                                              min_impurity_split=None,
                                              min_samples_leaf=1, min_samples_split=2,
                                              min_weight_fraction_leaf=0.0,
                                              presort='deprecated', random_state=7,
                                              splitter='best'),
             iid=True, n_jobs=None,
             param_grid={'min_samples_leaf': range(5, 50, 10)},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring=None, verbose=0)
GS.best_params_
{'min_samples_leaf': 5}
GS.best_score_
0.9999160080737239
params ={'splitter':('best','random'),'criterion':('gini','entropy'),'max_depth':[*range(1,11)],'min_samples_leaf':[*range(5,50,10)],'min_impurity_decrease':[*np.linspace(0,0.5,10)]}
params2 ={'splitter':('best','random'),'criterion':('gini','entropy'),'max_depth':[4],'min_samples_leaf':[*range(1,10)],'min_impurity_decrease':[*np.linspace(0,0.5,10)]}
clf2 = DecisionTreeClassifier(random_state=7)
GS2 = GridSearchCV(clf2,params2,cv=10,iid=True,n_jobs=-1)
GS2.fit(Xtrain,Ytrain)

GS = GridSearchCV(clf,params,cv=10,iid=True)
GS.fit(Xtrain,Ytrain)

D:\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py:823: FutureWarning: The parameter 'iid' is deprecated in 0.22 and will be removed in 0.24.
  "removed in 0.24.", FutureWarning
GridSearchCV(cv=10, error_score=nan,
             estimator=DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None,
                                              criterion='gini', max_depth=None,
                                              max_features=None, max_leaf_nodes=None,
                                              min_impurity_decrease=0.0,
                                              min_impurity_split=None,
                                              min_samples_leaf=1, min_samples_split=2,
                                              min_weight_fraction_leaf=0.0,
                                              presort='deprecated', random_state=7,
                                              splitter='best'),
             iid=True,
             ...
             'max_depth': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
             'min_impurity_decrease': [0.0, 0.05555555555555555, 0.1111111111111111,
                                       0.16666666666666666, 0.2222222222222222,
                                       0.2777777777777778, 0.3333333333333333,
                                       0.38888888888888884, 0.4444444444444444, 0.5],
             'min_samples_leaf': [5, 15, 25, 35, 45],
             'splitter': ('best', 'random')},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring=None, verbose=0)
GS.best_params_
{'criterion': 'entropy','max_depth': 10,'min_impurity_decrease': 0.0,'min_samples_leaf': 5,'splitter': 'best'}
GS.best_score_
0.9998906355126613
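best_params_ and best_score_ only report the winning combination; GridSearchCV also stores every evaluated combination in cv_results_, which is useful for checking how close the runner-up settings were (a sketch reusing the fitted GS from above):

# rank all parameter combinations by their mean cross-validation score
cv_results = pd.DataFrame(GS.cv_results_)
print(cv_results[['params', 'mean_test_score', 'rank_test_score']]
      .sort_values('rank_test_score')
      .head(10))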

Build a tree model with the optimal parameters

best_dtclf = DecisionTreeClassifier(random_state=7, criterion='gini', max_depth=4, min_samples_leaf=1,
                                    min_impurity_decrease=0.0, splitter='best')
dt_score_cvs= cross_val_score(best_dtclf,X,Y,cv=10)
dt_score_cvs
array([0.9912911 , 0.99853422, 0.99625798, 0.99941614, 0.99853014,0.99884249, 0.99311616, 0.99711744, 0.99695004, 0.99542506])
dt_score_cvs.mean()
0.9965480794272645
dt_score_cvs.var()
6.260831911813343e-06
last_clf = DecisionTreeClassifier(random_state=7, criterion='gini', max_depth=4, min_samples_leaf=1,
                                  min_impurity_decrease=0.0, splitter='best')
last_clf = last_clf.fit(Xtrain,Ytrain)
score = last_clf.score(Xtest,Ytest)
score

from sklearn import tree
import graphviz

Xtrain.columns.values
array(['duration', 'protocol_type', 'service', 'flag', 'src_bytes','dst_bytes', 'wrong_fragment', 'hot', 'num_compromised', 'count','diff_srv_rate', 'dst_host_same_srv_rate','dst_host_diff_srv_rate', 'dst_host_same_src_port_rate','dst_host_srv_diff_host_rate', 'dst_host_serror_rate','dst_host_rerror_rate', 'dst_host_srv_rerror_rate'], dtype=object)
dot_data =tree.export_graphviz(last_clf,feature_names=Xtrain.columns.values,class_names=le.classes_,filled=True,rounded=True)
graph=graphviz.Source(dot_data)
# graph.render('tree')
graph

Decision tree evaluation

y_pred = last_clf.predict(Xtest)
from sklearn.metrics import confusion_matrix
from collections import Counter
Counter(Ytest)
Counter({0: 1165143, 4: 291773, 1: 12262, 2: 330, 3: 22})
Counter(y_pred)
Counter({0: 1164089, 4: 295475, 1: 9966})
confusion_matrix(Ytest,y_pred,labels=[0,1,2,3,4])
array([[1163999,      24,       0,       0,    1120],
       [     70,    9701,       0,       0,    2491],
       [      0,       8,       0,       0,     322],
       [      0,       0,       0,       0,      22],
       [     20,     233,       0,       0,  291520]], dtype=int64)
res_con_mat = pd.DataFrame(confusion_matrix(Ytest,y_pred,labels=[0,1,2,3,4]))
res_con_mat.set_index(le.classes_,inplace= True)
res_con_mat.columns=le.classes_
res_con_mat
DOS PROBING R2L U2R normal.
DOS 1163999 24 0 0 1120
PROBING 70 9701 0 0 2491
R2L 0 8 0 0 322
U2R 0 0 0 0 22
normal. 20 233 0 0 291520
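The table above already shows the class-imbalance problem: the R2L and U2R columns are all zeros, so the max_depth=4 tree never predicts those two classes at all (the Counter(y_pred) output above only contains labels 0, 1 and 4). A quick per-class recall check makes this explicit (a sketch reusing res_con_mat and le from above):

import numpy as np

# recall per class = diagonal element / row sum of the confusion matrix
recall_per_class = np.diag(res_con_mat.values) / res_con_mat.sum(axis=1).values
print({c: round(float(r), 3) for c, r in zip(le.classes_, recall_per_class)})
# R2L and U2R come out as 0.0, which motivates the oversampling below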

Handling class imbalance with oversampling

Random oversampling

from sklearn.model_selection import train_test_split
Xtrain,Xtest,Ytrain,Ytest = train_test_split(X,Y,test_size = 0.3,random_state = 7)
X.shape
(4898431, 18)
from imblearn.over_sampling import RandomOverSampler
ros = RandomOverSampler(sampling_strategy='auto',random_state=7)
X_resampled, y_resampled = ros.fit_sample(Xtrain,Ytrain)
from collections import Counter
Counter(Ytrain)
Counter({0: 2718227, 4: 681008, 1: 28840, 2: 796, 3: 30})
Counter(y_resampled)
Counter({0: 2718227, 4: 2718227, 1: 2718227, 2: 2718227, 3: 2718227})
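RandomOverSampler simply duplicates minority-class rows at random until every class matches the majority count, as the two Counter outputs show. A lighter-weight alternative (shown only as a sketch, not what this post does) is to keep the original training data and let the tree reweight the classes instead:

# class_weight='balanced' scales each class inversely to its frequency,
# which has a similar effect to oversampling without enlarging the training set
clf_weighted = DecisionTreeClassifier(random_state=7, class_weight='balanced')
clf_weighted.fit(Xtrain, Ytrain)
print(clf_weighted.score(Xtest, Ytest))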

Learning curve after oversampling

plt.style.use('ggplot')
# set rc parameters
plt.rcParams['font.sans-serif']=['Simhei']  # use a font that can render Chinese characters
plt.rcParams['axes.unicode_minus']=False    # keep the minus sign displayed correctly with a non-default font
tr=[]
te=[]
for i in range(15):
    clf = DecisionTreeClassifier(random_state=7,max_depth=i+1)
    clf = clf.fit(X_resampled, y_resampled)
    score_tr = clf.score(X_resampled,y_resampled)     # score on the oversampled training set
    score_te = cross_val_score(clf,X,Y,cv=10).mean()  # mean 10-fold cross-validation score on the original data
    print("max_depth = {}: training score = {}, cross-validation score = {}".format(i+1,score_tr,score_te))
    tr.append(score_tr)  # training scores over the 15 depths
    te.append(score_te)  # mean 10-fold CV scores over the 15 depths
print(max(te))  # best cross-validation score
max_depth = 1: training score = 0.3982541561098466, cross-validation score = 0.9835259490974406
max_depth = 2: training score = 0.5927148836355463, cross-validation score = 0.9863184763261226
max_depth = 3: training score = 0.7472467163338455, cross-validation score = 0.9914009198666749
max_depth = 4: training score = 0.8898918302260996, cross-validation score = 0.9965480794272645
max_depth = 5: training score = 0.9471810853177457, cross-validation score = 0.9549427078998493
max_depth = 6: training score = 0.9571636953058005, cross-validation score = 0.9547009977767272
max_depth = 7: training score = 0.9761342963630337, cross-validation score = 0.9551670654213739
max_depth = 8: training score = 0.9889636148857325, cross-validation score = 0.956091034920217
max_depth = 9: training score = 0.9930135341897495, cross-validation score = 0.9562059697287826
max_depth = 10: training score = 0.9973434889727752, cross-validation score = 0.9567502246148385
max_depth = 11: training score = 0.9989269476022422, cross-validation score = 0.9564190987861902
max_depth = 12: training score = 0.9993786390908486, cross-validation score = 0.9568882282351435
max_depth = 13: training score = 0.9996491095114572, cross-validation score = 0.9569900976183685
max_depth = 14: training score = 0.9997410812268438, cross-validation score = 0.9568143273564772
max_depth = 15: training score = 0.999755649546561, cross-validation score = 0.9566977596749681
0.9965480794272645
fig1 =plt.figure(figsize=(10,6))
plt.plot(range(1,16),tr,color="red",label="train")  # plot the scores on the training set and under cross-validation;
plt.plot(range(1,16),te,color="blue",label="test")  # comparing the two curves shows whether the model over- or under-fits, which guides tuning
plt.xticks(range(1,16))
plt.xlabel('max_depth')
plt.ylabel('score')
plt.title('Training and K-fold cross-validation scores at different tree depths (after oversampling)')
plt.legend()
plt.savefig('out_file/过采样后学习曲线.png',dpi=300)

Parameter tuning

params ={#      'splitter':('best','random'),
#         'criterion':('gini','entropy'),
         'max_depth':[6],
#          'min_samples_split':[*range(2,20,2)],
        'min_samples_leaf':[*range(1,10)]
}
clf_gcv = DecisionTreeClassifier(random_state=7)
GS = GridSearchCV(clf_gcv,params,cv =10,iid=True)
GS.fit(X_resampled,y_resampled)
print(GS.best_score_)
GS.best_params_
# 'criterion': 'entropy'
# 'max_depth': 6
# 'splitter': 'best'
# 'min_samples_split': 2
#  'min_samples_leaf': 1
print(GS.best_score_)
GS.best_params_
0.9571614879846312
{'max_depth': 6, 'min_samples_leaf': 1}

Tree with the optimal parameters (after oversampling)

last_clf_os = DecisionTreeClassifier(random_state=7, criterion='entropy', max_depth=6, min_samples_leaf=1,
                                     min_impurity_decrease=0.0, splitter='best')
last_clf_os = last_clf_os.fit(X_resampled,y_resampled)
score_os = last_clf_os.score(Xtest,Ytest)
score_os
0.9900607677284574
from sklearn import tree
import graphviz
dot_data_os =tree.export_graphviz(last_clf_os,feature_names=X_resampled.columns.values,class_names=le.classes_,filled=True,rounded=True)
graph_os=graphviz.Source(dot_data_os)
# graph_os.render('out_file/tree_os2')
graph_os
y_pred_os = last_clf_os.predict(Xtest)

Building evaluation metrics

TP (true positives): samples that are actually positive and predicted positive;

FP (false positives): samples that are actually negative but predicted positive;

FN (false negatives): samples that are actually positive but predicted negative;

TN (true negatives): samples that are actually negative and predicted negative.

Precision: Precision = TP / (TP + FP);

Recall: Recall = TP / (TP + FN);

Accuracy: Accuracy = (TP + TN) / (TP + FP + TN + FN)

F1-score: the harmonic mean of Precision and Recall, weighting both equally.

F1-score = (2 × Precision × Recall) / (Precision + Recall)
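sklearn can report these per-class metrics directly from the predictions; the sketch below (assuming Ytest, y_pred_os and le from the cells above are in scope) prints precision, recall and F1-score for each of the five classes:

from sklearn.metrics import classification_report

# per-class precision / recall / F1-score plus overall accuracy
print(classification_report(Ytest, y_pred_os, labels=[0, 1, 2, 3, 4],
                            target_names=le.classes_, digits=4))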

res_con_mat_os = pd.DataFrame(confusion_matrix(Ytest,y_pred_os,labels=[0,1,2,3,4]))
res_con_mat_os.set_index(le.classes_,inplace= True)
res_con_mat_os.columns=le.classes_
res_con_mat_os
DOS PROBING R2L U2R normal.
DOS 1161582 67 74 48 3372
PROBING 2 12192 2 8 58
R2L 0 3 319 2 6
U2R 0 0 2 17 3
normal. 56 3036 1990 5877 280814
res_con_mat_os['T_sum']=res_con_mat_os.apply(lambda x:x.sum(),axis=1)
res_con_mat_os.loc['P_sum']=res_con_mat_os.apply(lambda x:x.sum())
res_con_mat_os
DOS PROBING R2L U2R normal. T_sum
DOS 1161582 67 74 48 3372 1165143
PROBING 2 12192 2 8 58 12262
R2L 0 3 319 2 6 330
U2R 0 0 2 17 3 22
normal. 56 3036 1990 5877 280814 291773
P_sum 1161640 15298 2387 5952 284253 1469530
res_con_mat_os.loc['P_sum','DOS']
1161640
DOS_detection_rate=res_con_mat_os.loc['DOS','DOS']/res_con_mat_os.loc['DOS','T_sum']
DOS_detection_rate
0.996943722787675
## Detection rate (recall): correctly predicted samples of a class divided by all actual samples of that class
detection_rate = []
for i in list(res_con_mat_os.columns[:-1]):
    # print(i)
    # print(res_con_mat_os.loc[i,i])
    detection_rate.append(res_con_mat_os.loc[i,i]/res_con_mat_os.loc[i,'T_sum'])
detection_rate
[0.996943722787675,0.9942913064752895,0.9666666666666667,0.7727272727272727,0.9624399790247898]
DOS_fpr = 1-res_con_mat_os.loc['DOS','DOS']/res_con_mat_os.loc['P_sum','DOS']
DOS_fpr
4.9929410144256003e-05
## "False positive rate" here: the fraction of samples predicted as a class that do not actually belong to it
## (i.e. FP / (TP + FP) = 1 - precision, not the conventional FPR = FP / (FP + TN))
fpr=[]
for i in list(res_con_mat_os.columns[:-1]):
    # print(i)
    # print(res_con_mat_os.loc[i,i])
    fpr.append(1-res_con_mat_os.loc[i,i]/res_con_mat_os.loc['P_sum',i])
fpr
[4.9929410144256003e-05,0.2030330762191136,0.8663594470046083,0.9971438172043011,0.012098377149933337]
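Note that the values above are 1 - precision. The conventional one-vs-rest false positive rate, FPR = FP / (FP + TN), can be computed from the same confusion matrix for comparison (a sketch, assuming Ytest, y_pred_os and le are in scope); it is typically much smaller for the rare classes because TN dominates:

import numpy as np
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(Ytest, y_pred_os, labels=[0, 1, 2, 3, 4])
TP = np.diag(cm)
FP = cm.sum(axis=0) - TP
FN = cm.sum(axis=1) - TP
TN = cm.sum() - TP - FP - FN
print({c: round(float(v), 4) for c, v in zip(le.classes_, FP / (FP + TN))})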
res = pd.DataFrame({'类别':le.classes_,'检测率':detection_rate,'误报率':fpr})
# res.to_excel('out_file/检测率2.xlsx')
res
类别 检测率 误报率
0 DOS 0.996944 0.000050
1 PROBING 0.994291 0.203033
2 R2L 0.966667 0.866359
3 U2R 0.772727 0.997144
4 normal. 0.962440 0.012098
colors = ['k','g','r','orange','blue']
label_list= res.类别
fig =plt.figure(figsize=(10,6))
for i in range(5):
    x = res.loc[res.index==i,'误报率']
    y = res.loc[res.index==i,'检测率']
    plt.scatter(x,y,c=colors[i],cmap='brg',s=40,alpha=0.8,marker='8')
plt.xticks(np.arange(0.0,1.1,0.1))
plt.yticks(np.arange(0.4,1.1,0.1))
plt.xlabel('false positive rate')
plt.ylabel('detection rate')
plt.title('Per-class detection rate vs. false positive rate after oversampling')
ax = fig.gca()
handles,labels = ax.get_legend_handles_labels()
ax.legend(handles, labels = label_list, loc='lower left')
# plt.savefig('out_file/评估参数散点图2.png',dpi=300)
plt.show();
D:\Anaconda3\lib\site-packages\ipykernel_launcher.py:16: UserWarning: You have mixed positional and keyword arguments, some input may be discarded.
  app.launch_new_instance()

  • For a given class, the higher the detection rate and the lower the false positive rate, the better the model classifies that class.

  • In terms of detection rate, DOS, PROBING, R2L and normal. are all above 0.96, so the model recognizes these classes well; U2R is the weakest at about 0.77.

  • In terms of false positive rate, DOS and normal. are below 0.02, so the model performs well on these two classes; PROBING is around 0.20, while R2L and U2R are above 0.86 and perform worst.

  • Overall (weighting detection rate and false positive rate equally), the model's per-class performance ranks as: DOS, normal., PROBING, R2L, U2R.

Prediction analysis

df_test = pd.read_csv('test_data_unlabeled.csv',header=None)
# add the column names
df_test.columns = col_name[:-1]
df_to_out_file=df_test.copy()
df_test.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2984154 entries, 0 to 2984153
Data columns (total 41 columns):
duration                       int64
protocol_type                  object
service                        object
flag                           object
src_bytes                      int64
dst_bytes                      int64
land                           int64
wrong_fragment                 int64
urgent                         int64
hot                            int64
num_failed_logins              int64
logged_in                      int64
num_compromised                int64
root_shell                     int64
su_attempted                   int64
num_root                       int64
num_file_creations             int64
num_shells                     int64
num_access_files               int64
num_outbound_cmds              int64
is_hot_login                   int64
is_guest_login                 int64
count                          int64
srv_count                      int64
serror_rate                    float64
srv_serror_rate                float64
rerror_rate                    float64
srv_rerror_rate                float64
same_srv_rate                  float64
diff_srv_rate                  float64
srv_diff_host_rate             float64
dst_host_count                 int64
dst_host_srv_count             int64
dst_host_same_srv_rate         float64
dst_host_diff_srv_rate         float64
dst_host_same_src_port_rate    float64
dst_host_srv_diff_host_rate    float64
dst_host_serror_rate           float64
dst_host_srv_serror_rate       float64
dst_host_rerror_rate           float64
dst_host_srv_rerror_rate       float64
dtypes: float64(15), int64(23), object(3)
memory usage: 933.5+ MB

The test set contains 2,984,154 records in total.

df_test.describe().T
count mean std min 25% 50% 75% max
duration 2984154.0 3.720818 243.953368 0.0 0.00 0.0 0.00 66366.0
src_bytes 2984154.0 768.880216 45551.303068 0.0 105.00 520.0 1032.00 62825648.0
dst_bytes 2984154.0 730.889662 39914.221849 0.0 0.00 0.0 0.00 32317698.0
land 2984154.0 0.000003 0.001737 0.0 0.00 0.0 0.00 1.0
wrong_fragment 2984154.0 0.000477 0.035681 0.0 0.00 0.0 0.00 3.0
urgent 2984154.0 0.000014 0.006099 0.0 0.00 0.0 0.00 6.0
hot 2984154.0 0.005270 0.303767 0.0 0.00 0.0 0.00 233.0
num_failed_logins 2984154.0 0.000260 0.017102 0.0 0.00 0.0 0.00 5.0
logged_in 2984154.0 0.147073 0.354178 0.0 0.00 0.0 0.00 1.0
num_compromised 2984154.0 0.004904 1.480858 0.0 0.00 0.0 0.00 942.0
root_shell 2984154.0 0.000080 0.008930 0.0 0.00 0.0 0.00 1.0
su_attempted 2984154.0 0.000017 0.005399 0.0 0.00 0.0 0.00 2.0
num_root 2984154.0 0.004576 1.616186 0.0 0.00 0.0 0.00 1013.0
num_file_creations 2984154.0 0.000568 0.104977 0.0 0.00 0.0 0.00 100.0
num_shells 2984154.0 0.000009 0.004254 0.0 0.00 0.0 0.00 5.0
num_access_files 2984154.0 0.000689 0.027856 0.0 0.00 0.0 0.00 7.0
num_outbound_cmds 2984154.0 0.000000 0.000000 0.0 0.00 0.0 0.00 0.0
is_hot_login 2984154.0 0.000006 0.002456 0.0 0.00 0.0 0.00 1.0
is_guest_login 2984154.0 0.000644 0.025377 0.0 0.00 0.0 0.00 1.0
count 2984154.0 280.423016 217.434739 0.0 95.00 236.0 511.00 511.0
srv_count 2984154.0 245.349697 239.442417 0.0 8.00 136.0 511.00 511.0
serror_rate 2984154.0 0.060077 0.235591 0.0 0.00 0.0 0.00 1.0
srv_serror_rate 2984154.0 0.060017 0.236513 0.0 0.00 0.0 0.00 1.0
rerror_rate 2984154.0 0.145965 0.351468 0.0 0.00 0.0 0.00 1.0
srv_rerror_rate 2984154.0 0.145926 0.352154 0.0 0.00 0.0 0.00 1.0
same_srv_rate 2984154.0 0.808191 0.377635 0.0 1.00 1.0 1.00 1.0
diff_srv_rate 2984154.0 0.024612 0.106544 0.0 0.00 0.0 0.00 1.0
srv_diff_host_rate 2984154.0 0.025781 0.125967 0.0 0.00 0.0 0.00 1.0
dst_host_count 2984154.0 235.507335 60.337888 0.0 255.00 255.0 255.00 255.0
dst_host_srv_count 2984154.0 199.141008 100.880118 0.0 248.00 255.0 255.00 255.0
dst_host_same_srv_rate 2984154.0 0.790242 0.391747 0.0 0.99 1.0 1.00 1.0
dst_host_diff_srv_rate 2984154.0 0.024176 0.095417 0.0 0.00 0.0 0.01 1.0
dst_host_same_src_port_rate 2984154.0 0.567627 0.489481 0.0 0.00 1.0 1.00 1.0
dst_host_srv_diff_host_rate 2984154.0 0.004283 0.034120 0.0 0.00 0.0 0.00 1.0
dst_host_serror_rate 2984154.0 0.060003 0.234881 0.0 0.00 0.0 0.00 1.0
dst_host_srv_serror_rate 2984154.0 0.059886 0.236330 0.0 0.00 0.0 0.00 1.0
dst_host_rerror_rate 2984154.0 0.145625 0.349760 0.0 0.00 0.0 0.00 1.0
dst_host_srv_rerror_rate 2984154.0 0.145907 0.351974 0.0 0.00 0.0 0.00 1.0
# Drop the fields that had only a single value in the training set
for i in only_1_field:
    del df_test[i]
df_test.head()
duration protocol_type service flag src_bytes dst_bytes land wrong_fragment urgent hot ... dst_host_count dst_host_srv_count dst_host_same_srv_rate dst_host_diff_srv_rate dst_host_same_src_port_rate dst_host_srv_diff_host_rate dst_host_serror_rate dst_host_srv_serror_rate dst_host_rerror_rate dst_host_srv_rerror_rate
0 0 udp private SF 105 146 0 0 0 0 ... 1 1 1.0 0.00 1.00 0.0 0.0 0.0 0.0 0.0
1 0 udp private SF 105 146 0 0 0 0 ... 255 254 1.0 0.01 0.00 0.0 0.0 0.0 0.0 0.0
2 0 udp private SF 105 146 0 0 0 0 ... 255 254 1.0 0.01 0.00 0.0 0.0 0.0 0.0 0.0
3 0 udp private SF 105 146 0 0 0 0 ... 255 254 1.0 0.01 0.00 0.0 0.0 0.0 0.0 0.0
4 0 udp private SF 105 146 0 0 0 0 ... 255 254 1.0 0.01 0.01 0.0 0.0 0.0 0.0 0.0

5 rows × 40 columns

# Convert the categorical string fields to numeric values
df_test.protocol_type = le_protocol_type.transform(df_test.protocol_type)
df_test.service = le_service.transform(df_test.service)
df_test.flag = le_flag.transform(df_test.flag)
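LabelEncoder.transform raises a ValueError on labels it has never seen, which is why le_service was fitted on the full service_tag list earlier rather than on the training values alone. A quick sanity check against the untouched copy df_to_out_file (a sketch) confirms that the test file contains no service value outside that list:

import numpy as np

# any value returned here would have made le_service.transform fail above
unseen_services = np.setdiff1d(df_to_out_file['service'].unique(), le_service.classes_)
print(unseen_services)  # expected: an empty array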
X_resampled.columns
Index(['duration', 'protocol_type', 'service', 'flag', 'src_bytes','dst_bytes', 'wrong_fragment', 'hot', 'num_compromised', 'count','diff_srv_rate', 'dst_host_same_srv_rate', 'dst_host_diff_srv_rate','dst_host_same_src_port_rate', 'dst_host_srv_diff_host_rate','dst_host_serror_rate', 'dst_host_rerror_rate','dst_host_srv_rerror_rate'],dtype='object')
X_test_to_predict = df_test[X_resampled.columns.values]
X_test_to_predict
duration protocol_type service flag src_bytes dst_bytes wrong_fragment hot num_compromised count diff_srv_rate dst_host_same_srv_rate dst_host_diff_srv_rate dst_host_same_src_port_rate dst_host_srv_diff_host_rate dst_host_serror_rate dst_host_rerror_rate dst_host_srv_rerror_rate
0 0 2 50 9 105 146 0 0 0 1 0.0 1.0 0.00 1.00 0.0 0.0 0.0 0.0
1 0 2 50 9 105 146 0 0 0 1 0.0 1.0 0.01 0.00 0.0 0.0 0.0 0.0
2 0 2 50 9 105 146 0 0 0 1 0.0 1.0 0.01 0.00 0.0 0.0 0.0 0.0
3 0 2 50 9 105 146 0 0 0 1 0.0 1.0 0.01 0.00 0.0 0.0 0.0 0.0
4 0 2 50 9 105 146 0 0 0 1 0.0 1.0 0.01 0.01 0.0 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2984149 0 2 50 9 105 147 0 0 0 2 0.0 1.0 0.00 0.01 0.0 0.0 0.0 0.0
2984150 0 2 50 9 105 105 0 0 0 3 0.0 1.0 0.00 0.00 0.0 0.0 0.0 0.0
2984151 0 2 50 9 105 147 0 0 0 4 0.0 1.0 0.00 0.01 0.0 0.0 0.0 0.0
2984152 0 2 50 9 105 105 0 0 0 1 0.0 1.0 0.00 0.00 0.0 0.0 0.0 0.0
2984153 0 2 50 9 105 147 0 0 0 2 0.0 1.0 0.00 0.01 0.0 0.0 0.0 0.0

2984154 rows × 18 columns

X_test_to_predict.shape
(2984154, 18)
## Predict the unlabeled data set with the trained model
y_to_predict = last_clf_os.predict(X_test_to_predict)
y_to_predict
array([4, 4, 4, ..., 4, 4, 4])
y_p_LABEL =le.inverse_transform(y_to_predict)
df_to_out_file['pre_result'] = y_p_LABEL
df_to_out_file.to_csv('out_file/test_result_by_dtc.csv',header=False,index=False)
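As a final sanity check (a sketch; the exact counts depend on the fitted model and are not reproduced here), the distribution of predicted labels can be inspected before the file is handed over:

# count how many test connections were assigned to each predicted class
print(pd.Series(y_p_LABEL).value_counts())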
