Pitfalls in Python decision tree modeling

Code

#coding=utf-8
from sklearn.feature_extraction import DictVectorizer
import csv
from sklearn import tree
from sklearn import preprocessing

allElectronicsData = open(r"D:\workspace\python\files\AllElectronics.csv")
reader = csv.reader(allElectronicsData)
headers = next(reader)
print(headers)

featureList = []
labelList = []
for row in reader:
    # The last column is the class label
    labelList.append(row[len(row)-1])
    # Columns 1 .. n-2 are the features (column 0 is the RID)
    rowDict = {}
    for i in range(1, len(row)-1):
        rowDict[headers[i]] = row[i]
    featureList.append(rowDict)
print(featureList)

# Vectorize features (one-hot encode the categorical columns)
vec = DictVectorizer()
dummyX = vec.fit_transform(featureList).toarray()
print("dummyx:" + str(dummyX))
print(vec.get_feature_names())  # scikit-learn >= 1.0: vec.get_feature_names_out()
print("labelList:" + str(labelList))

# Vectorize class labels
lb = preprocessing.LabelBinarizer()
dummyY = lb.fit_transform(labelList)
print("dummyY:" + str(dummyY))

# Use a decision tree for classification
clf = tree.DecisionTreeClassifier(criterion='entropy')
clf = clf.fit(dummyX, dummyY)
print("clf:" + str(clf))

# Visualize the model
with open("allElectornicinformationGainOri.dot", 'w') as f:
    tree.export_graphviz(clf, feature_names=vec.get_feature_names(), out_file=f)
# Convert the .dot file to a PDF tree: dot -Tpdf <file.dot> -o output.pdf

oneRowx = dummyX[0, :]
print("oneRowx" + str(oneRowx))

# Test the model
newRowX = oneRowx
# Pitfall here: pay close attention to the numpy array dimensions!!!
newRowX[0] = 0
newRowX[2] = 1
newRowX.reshape(1, -1)  # returns a new array that is discarded; newRowX is still 1D
print("newRowx:" + str(newRowX))
predictedY = clf.predict(oneRowx)
print("predictedY" + str(predictedY))
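The line `newRowX.reshape(1, -1)` looks like a fix, but NumPy's `reshape` returns a new array instead of changing the array in place, so its result is thrown away and `oneRowx` is still 1D when it reaches `predict`. A minimal NumPy-only sketch (the values are copied from the first row of `dummyX` in the output; the variable names are illustrative) also shows what the two options in the error message mean:

```python
import numpy as np

# A single row sliced out of a 2D matrix, like dummyX[0, :] above
row = np.array([0., 0., 1., 0., 1., 1., 0., 0., 1., 0.])
print(row.shape)              # (10,)  -> 1D, rejected by predict()

# reshape() returns a NEW array; calling it without assigning the
# result (as in `newRowX.reshape(1, -1)`) leaves `row` unchanged
row.reshape(1, -1)
print(row.shape)              # still (10,)

# Assign the result to actually get a 2D array: one sample, ten features
sample = row.reshape(1, -1)
print(sample.shape)           # (1, 10)

# The other option in the error message means the opposite:
# ten samples, each with a single feature
column = row.reshape(-1, 1)
print(column.shape)           # (10, 1)
```

For this data set each row is one sample with ten one-hot features, so `reshape(1, -1)` is the right choice.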

The error is as follows:

Traceback (most recent call last):
  File "D:/workspace/python/.idea/decision_tree.py", line 55, in <module>
    predictedY = clf.predict(oneRowx)
  File "C:\Python27\lib\site-packages\sklearn\tree\tree.py", line 412, in predict
    X = self._validate_X_predict(X, check_input)
  File "C:\Python27\lib\site-packages\sklearn\tree\tree.py", line 373, in _validate_X_predict
    X = check_array(X, dtype=DTYPE, accept_sparse="csr")
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 441, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[0. 0. 1. 0. 1. 1. 0. 0. 1. 0.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
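The error can be reproduced in isolation. This sketch assumes a recent scikit-learn and trains on only the first four rows of `dummyX` from the run output (labels `no, no, yes, yes` encoded as `0, 0, 1, 1`), which is enough to show that `predict` rejects a 1D row and accepts the same row reshaped to `(1, 10)`:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# First four rows of dummyX from the run output; labels no/no/yes/yes -> 0/0/1/1
X = np.array([
    [0., 0., 1., 0., 1., 1., 0., 0., 1., 0.],
    [0., 0., 1., 1., 0., 1., 0., 0., 1., 0.],
    [1., 0., 0., 0., 1., 1., 0., 0., 1., 0.],
    [0., 1., 0., 0., 1., 0., 0., 1., 1., 0.],
])
y = np.array([0, 0, 1, 1])

clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

one_row = X[0, :]          # shape (10,): a 1D vector
rejected = False
try:
    clf.predict(one_row)   # predict() expects shape (n_samples, n_features)
except ValueError as err:
    rejected = True
    print("1D input rejected:", err)

# The same row as a one-sample matrix of shape (1, 10) is accepted
pred = clf.predict(one_row.reshape(1, -1))
print(pred)                # [0]
```

`predict` always takes a matrix of samples, even when there is only one sample to classify.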

Corrected code:

#coding=utf-8
from sklearn.feature_extraction import DictVectorizer
import csv
from sklearn import tree
from sklearn import preprocessing

allElectronicsData = open(r"D:\workspace\python\files\AllElectronics.csv")
reader = csv.reader(allElectronicsData)
headers = next(reader)
print(headers)

featureList = []
labelList = []
for row in reader:
    # The last column is the class label
    labelList.append(row[len(row)-1])
    # Columns 1 .. n-2 are the features (column 0 is the RID)
    rowDict = {}
    for i in range(1, len(row)-1):
        rowDict[headers[i]] = row[i]
    featureList.append(rowDict)
print(featureList)

# Vectorize features (one-hot encode the categorical columns)
vec = DictVectorizer()
dummyX = vec.fit_transform(featureList).toarray()
print("dummyx:" + str(dummyX))
print(vec.get_feature_names())  # scikit-learn >= 1.0: vec.get_feature_names_out()
print("labelList:" + str(labelList))

# Vectorize class labels
lb = preprocessing.LabelBinarizer()
dummyY = lb.fit_transform(labelList)
print("dummyY:" + str(dummyY))

# Use a decision tree for classification
clf = tree.DecisionTreeClassifier(criterion='entropy')
clf = clf.fit(dummyX, dummyY)
print("clf:" + str(clf))

# Visualize the model
with open("allElectornicinformationGainOri.dot", 'w') as f:
    tree.export_graphviz(clf, feature_names=vec.get_feature_names(), out_file=f)
# Convert the .dot file to a PDF tree: dot -Tpdf <file.dot> -o output.pdf

# Keep the row 2D: shape (1, 10) = one sample with ten features
oneRowx = dummyX[0, :].reshape(1, -1)
print("oneRowx" + str(oneRowx))

# Test the model
newRowX = oneRowx
# Pitfall here: pay close attention to the numpy array dimensions!!!
# newRowX is now 2D, so an element is addressed as [row, column]
newRowX[0, 0] = 0
newRowX[0, 2] = 1
print("newRowx:" + str(newRowX))
predictedY = clf.predict(newRowX)
print("predictedY" + str(predictedY))
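A related trap in the corrected code: `newRowX = oneRowx` does not copy anything, and `dummyX[0,:].reshape(1, -1)` is a view into `dummyX`, so assignments to `newRowX` write straight into the training matrix. In this post it happens to be harmless, because the assigned values (0 at index 0, 1 at index 2) equal the values already in the first row. A small NumPy sketch with a two-row stand-in for `dummyX`:

```python
import numpy as np

# Two-row stand-in for dummyX (first two rows of the run output)
dummyX = np.array([
    [0., 0., 1., 0., 1., 1., 0., 0., 1., 0.],
    [0., 0., 1., 1., 0., 1., 0., 0., 1., 0.],
])

# Slicing and reshaping give a view of dummyX, and plain assignment
# only binds a second name to the same array
oneRowx = dummyX[0, :].reshape(1, -1)
newRowX = oneRowx
newRowX[0, 0] = 1
print(dummyX[0, 0])          # 1.0 -> the training matrix was modified

# Copy first if the test row must be independent of the training data
dummyX[0, 0] = 0.            # restore the original value
newRowX = dummyX[0, :].reshape(1, -1).copy()
newRowX[0, 0] = 1
print(dummyX[0, 0])          # 0.0 -> training matrix untouched

# Slicing with a range or an index list keeps the 2D shape directly
print(dummyX[0:1].shape, dummyX[[0]].shape)   # (1, 10) (1, 10)
```

`dummyX[0:1]` is still a view, while `dummyX[[0]]` (fancy indexing) already returns a copy.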

Output:

C:\Python27\python.exe D:/workspace/python/.idea/decision_tree.py
['RID', 'age', 'income', 'student', 'credit_rating', 'class_buys_computer']
[{'credit_rating': 'fair', 'age': 'youth', 'student': 'no', 'income': 'high'}, {'credit_rating': 'excellent', 'age': 'youth', 'student': 'no', 'income': 'high'}, {'credit_rating': 'fair', 'age': 'middle_aged', 'student': 'no', 'income': 'high'}, {'credit_rating': 'fair', 'age': 'senior', 'student': 'no', 'income': 'medium'}, {'credit_rating': 'fair', 'age': 'senior', 'student': 'yes', 'income': 'low'}, {'credit_rating': 'excellent', 'age': 'senior', 'student': 'yes', 'income': 'low'}, {'credit_rating': 'excellent', 'age': 'middle_aged', 'student': 'yes', 'income': 'low'}, {'credit_rating': 'fair', 'age': 'youth', 'student': 'no', 'income': 'medium'}, {'credit_rating': 'fair', 'age': 'youth', 'student': 'yes', 'income': 'low'}, {'credit_rating': 'fair', 'age': 'senior', 'student': 'yes', 'income': 'medium'}, {'credit_rating': 'excellent', 'age': 'youth', 'student': 'yes', 'income': 'medium'}, {'credit_rating': 'excellent', 'age': 'middle_aged', 'student': 'no', 'income': 'medium'}, {'credit_rating': 'fair', 'age': 'middle_aged', 'student': 'yes', 'income': 'high'}, {'credit_rating': 'excellent', 'age': 'senior', 'student': 'no', 'income': 'medium'}]
dummyx:[[0. 0. 1. 0. 1. 1. 0. 0. 1. 0.]
 [0. 0. 1. 1. 0. 1. 0. 0. 1. 0.]
 [1. 0. 0. 0. 1. 1. 0. 0. 1. 0.]
 [0. 1. 0. 0. 1. 0. 0. 1. 1. 0.]
 [0. 1. 0. 0. 1. 0. 1. 0. 0. 1.]
 [0. 1. 0. 1. 0. 0. 1. 0. 0. 1.]
 [1. 0. 0. 1. 0. 0. 1. 0. 0. 1.]
 [0. 0. 1. 0. 1. 0. 0. 1. 1. 0.]
 [0. 0. 1. 0. 1. 0. 1. 0. 0. 1.]
 [0. 1. 0. 0. 1. 0. 0. 1. 0. 1.]
 [0. 0. 1. 1. 0. 0. 0. 1. 0. 1.]
 [1. 0. 0. 1. 0. 0. 0. 1. 1. 0.]
 [1. 0. 0. 0. 1. 1. 0. 0. 0. 1.]
 [0. 1. 0. 1. 0. 0. 0. 1. 1. 0.]]
['age=middle_aged', 'age=senior', 'age=youth', 'credit_rating=excellent', 'credit_rating=fair', 'income=high', 'income=low', 'income=medium', 'student=no', 'student=yes']
labelList:['no', 'no', 'yes', 'yes', 'yes', 'no', 'yes', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no']
dummyY:[[0]
 [0]
 [1]
 [1]
 [1]
 [0]
 [1]
 [0]
 [1]
 [1]
 [1]
 [1]
 [1]
 [0]]
clf:DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')
oneRowx[[0. 0. 1. 0. 1. 1. 0. 0. 1. 0.]]
newRowx:[[0. 0. 1. 0. 1. 1. 0. 0. 1. 0.]]
predictedY[0]
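`predictedY[0]` is the encoded label, not the class name. `LabelBinarizer` sorts the class names, so `0` corresponds to `'no'` here, and `inverse_transform` maps it back. A sketch that rebuilds `lb` from the `labelList` in the output (the hard-coded `predictedY` stands in for the value printed above):

```python
import numpy as np
from sklearn import preprocessing

labelList = ['no', 'no', 'yes', 'yes', 'yes', 'no', 'yes',
             'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no']

lb = preprocessing.LabelBinarizer()
dummyY = lb.fit_transform(labelList)    # shape (14, 1) for two classes
print(lb.classes_)                      # ['no' 'yes']

# predict() returned the encoded label [0]; map it back to the class name
predictedY = np.array([0])
print(lb.inverse_transform(predictedY)) # ['no']
```

As a design alternative, `DecisionTreeClassifier` also accepts the string labels directly, in which case `predict` returns `'no'`/`'yes'` without a decoding step.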

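Summary: watch the array dimensions. The lines that changed (highlighted in red in the original post) turn the single test sample into a 2D array of shape (1, n_features) with `.reshape(1, -1)` and index it as [row][column] before it is passed to `clf.predict`.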
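To avoid the dimension mistake altogether, a prediction call can be routed through a small guard that turns a single 1D sample into a `(1, n_features)` matrix and passes 2D input through unchanged. `as_2d_samples` is a hypothetical helper, not part of scikit-learn; for 1D input it behaves like `np.atleast_2d`:

```python
import numpy as np

def as_2d_samples(x):
    """Hypothetical helper: return x as an (n_samples, n_features) array.

    A 1D input is treated as ONE sample with len(x) features,
    which is what reshape(1, -1) does in the corrected code.
    """
    arr = np.asarray(x, dtype=float)
    if arr.ndim == 1:
        return arr.reshape(1, -1)
    if arr.ndim == 2:
        return arr
    raise ValueError("expected a 1D or 2D array, got %dD" % arr.ndim)

single = as_2d_samples([0, 0, 1, 0, 1, 1, 0, 0, 1, 0])
batch = as_2d_samples(np.zeros((3, 10)))
print(single.shape, batch.shape)   # (1, 10) (3, 10)
```

With such a guard, `clf.predict(as_2d_samples(row))` works whether `row` is a single sample or a batch.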
