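Hands-on Project: Multi-class Text Classification with PySpark
Original article: https://cloud.tencent.com/developer/article/1096712
Building on the original author's work, I learned a few new things and added my own annotations.
TARGET: classify San Francisco crime records (San Francisco Crime Description) into 33 categories
Source code and dataset: to be uploaded later.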
I. Loading the Dataset
import time
from pyspark.sql import SQLContext
from pyspark import SparkContext

# Load the CSV data directly with Spark's CSV reader
sc = SparkContext()
sqlContext = SQLContext(sc)
data = sqlContext.read.format('com.databricks.spark.csv').options(header='true',
                                                                  inferSchema='true').load('train.csv')
# Take a ~1% random sample (without replacement, seed 100) to reduce run time
data = data.sample(False, 0.01, 100)
print(data.count())
Result:
8703

1.1 Drop columns unrelated to the task
# Drop the columns we don't need and show the first five rows
drop_list = ['Dates', 'DayOfWeek', 'PdDistrict', 'Resolution', 'Address', 'X', 'Y']
data = data.select([column for column in data.columns if column not in drop_list])
data.show(5)
1.2 Show the data structure
# printSchema() prints the structure (schema) of the DataFrame
data.printSchema()
Result:
root
 |-- Category: string (nullable = true)
 |-- Descript: string (nullable = true)

1.3 The 20 most common crime categories
# The 20 crime categories with the most records (show() prints 20 rows by default)
from pyspark.sql.functions import col
data.groupBy('Category').count().orderBy(col('count').desc()).show()
Result:
+--------------------+-----+
|            Category|count|
+--------------------+-----+
|       LARCENY/THEFT| 1725|
|      OTHER OFFENSES| 1230|
|        NON-CRIMINAL|  962|
|             ASSAULT|  763|
|       VEHICLE THEFT|  541|
|       DRUG/NARCOTIC|  494|
|           VANDALISM|  447|
|            WARRANTS|  406|
|            BURGLARY|  347|
|      SUSPICIOUS OCC|  295|
|      MISSING PERSON|  284|
|             ROBBERY|  225|
|               FRAUD|  159|
|     SECONDARY CODES|  124|
|FORGERY/COUNTERFE...|  109|
|         WEAPON LAWS|   86|
|            TRESPASS|   63|
|        PROSTITUTION|   59|
|  DISORDERLY CONDUCT|   54|
|         DRUNKENNESS|   52|
+--------------------+-----+
only showing top 20 rows
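The groupBy(...).count().orderBy(col('count').desc()) chain is a distributed frequency count. As a rough single-machine analogue, `collections.Counter.most_common` gives the same ranking. This is plain Python, not PySpark, and the helper names are hypothetical. The counts above also show how imbalanced the classes are: always predicting LARCENY/THEFT would already be right for about 19.8% of the 8703 sampled rows.

```python
from collections import Counter

def top_k(values, k=20):
    """Return the k most frequent values with their counts, most frequent first
    (a plain-Python analogue of groupBy().count().orderBy(desc))."""
    return Counter(values).most_common(k)

def majority_share(top_count, total):
    """Fraction of rows covered by always predicting the most frequent class."""
    return top_count / total

print(top_k(["THEFT", "ASSAULT", "THEFT", "FRAUD", "THEFT", "ASSAULT"], k=2))
# [('THEFT', 3), ('ASSAULT', 2)]
print(round(majority_share(1725, 8703), 4))  # 0.1982
```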
1.4 The 20 most common crime descriptions
# The 20 descriptions with the most records
data.groupBy('Descript').count().orderBy(col('count').desc()).show()
Result:
+--------------------+-----+
|            Descript|count|
+--------------------+-----+
|GRAND THEFT FROM ...|  569|
|       LOST PROPERTY|  323|
|             BATTERY|  301|
|   STOLEN AUTOMOBILE|  262|
|DRIVERS LICENSE, ...|  244|
|AIDED CASE, MENTA...|  223|
|      WARRANT ARREST|  222|
|PETTY THEFT FROM ...|  216|
|SUSPICIOUS OCCURR...|  211|
|MALICIOUS MISCHIE...|  184|
|   TRAFFIC VIOLATION|  168|
|THREATS AGAINST LIFE|  154|
|PETTY THEFT OF PR...|  152|
|      FOUND PROPERTY|  138|
|MALICIOUS MISCHIE...|  138|
|ENROUTE TO OUTSID...|  121|
|GRAND THEFT OF PR...|  115|
|MISCELLANEOUS INV...|  101|
|   DOMESTIC VIOLENCE|   99|
|        FOUND PERSON|   98|
+--------------------+-----+
only showing top 20 rows

II. Tokenizing the Crime Descriptions
2.1 Tokenize Descript: split it into words, then remove stop words
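The workflow closely mirrors the scikit-learn version and has three steps:
1. regexTokenizer: split the text into words with a regular expression
2. stopwordsRemover: remove stop words
3. countVectors: build term-frequency (count) vectors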
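RegexTokenizer: splits each document into a list of words using a regular expression
inputCol: input column
outputCol: output column
pattern: the regular expression; with the default gaps=True the text is split at every match (here \W, i.e. any non-word character), and tokens are lowercased by default (toLowercase=True)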
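To make these parameters concrete, here is a plain-Python approximation of what RegexTokenizer (defaults gaps=True, toLowercase=True, minTokenLength=1) and StopWordsRemover (case-insensitive by default) do to a description. It is a sketch, not Spark's implementation, and the function names are hypothetical. It uses the custom stop-word list from this project. Because that list replaces Spark's default English list instead of extending it, words such as "of" and "a" survive. The second input string is made up to show "the" being removed.

```python
import re

# The custom stop-word list used in this project
STOPWORDS = ["http", "https", "amp", "rt", "t", "c", "the"]

def regex_tokenize(text, pattern=r"\W", min_token_length=1):
    """Approximates RegexTokenizer(gaps=True, toLowercase=True): lowercase,
    split at every regex match, drop tokens shorter than min_token_length."""
    return [tok for tok in re.split(pattern, text.lower()) if len(tok) >= min_token_length]

def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Approximates StopWordsRemover with caseSensitive=False."""
    stop = {w.lower() for w in stopwords}
    return [tok for tok in tokens if tok.lower() not in stop]

print(regex_tokenize("CREDIT CARD, THEFT OF"))                     # ['credit', 'card', 'theft', 'of']
print(remove_stopwords(regex_tokenize("THE ARSON OF A VEHICLE")))  # ['arson', 'of', 'a', 'vehicle']
```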
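CountVectorizer: builds term-frequency vectors
vocabSize: maximum vocabulary size; only the vocabSize terms with the highest term frequency across the corpus are kept
minDF: if the value is 1 or greater, it is a document count: terms that appear in fewer than minDF documents are not included in the vocabulary
if the value is a fraction below 1, it is the minimum share of documents a term must appear in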
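The sketch below is a simplified plain-Python model of how CountVectorizer builds its vocabulary from vocabSize and minDF. It then turns a token list into a sparse (size, [(index, count)]) vector, similar to a SparseVector. It makes three assumptions. First, minTF and maxDF are ignored. Second, ties in corpus frequency are broken alphabetically, which Spark does not necessarily do. Third, the function names and toy documents are hypothetical.

```python
from collections import Counter

def fit_vocabulary(docs, vocab_size=10000, min_df=5):
    """Keep terms whose document frequency passes min_df, then the vocab_size
    terms with the highest total count across the corpus (most frequent first)."""
    threshold = min_df if min_df >= 1 else min_df * len(docs)
    term_count, doc_freq = Counter(), Counter()
    for doc in docs:
        term_count.update(doc)       # total occurrences in the corpus
        doc_freq.update(set(doc))    # number of documents containing the term
    kept = [t for t in term_count if doc_freq[t] >= threshold]
    kept.sort(key=lambda t: (-term_count[t], t))  # alphabetical tie-break is an assumption
    return kept[:vocab_size]

def to_count_vector(doc, vocab):
    """Return (vector size, sorted (index, count) pairs) for one token list."""
    index = {term: i for i, term in enumerate(vocab)}
    counts = Counter(index[t] for t in doc if t in index)
    return len(vocab), sorted((i, float(c)) for i, c in counts.items())

docs = [["grand", "theft", "auto"], ["petty", "theft", "auto"],
        ["stolen", "automobile"], ["grand", "theft"]]
vocab = fit_vocabulary(docs, min_df=2)
print(vocab)                                                # ['theft', 'auto', 'grand']
print(to_count_vector(["grand", "theft", "theft"], vocab))  # (3, [(0, 2.0), (2, 1.0)])
```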
from pyspark.ml.feature import RegexTokenizer, StopWordsRemover, CountVectorizer
from pyspark.ml.classification import LogisticRegression

# Split words with a regular expression
# inputCol: input column name
# outputCol: output column name
regexTokenizer = RegexTokenizer(inputCol='Descript', outputCol='words', pattern='\\W')
# Stop words; note that setStopWords() replaces Spark's default English list rather than adding to it
add_stopwords = ['http', 'https', 'amp', 'rt', 't', 'c', 'the']
stopwords_remover = StopWordsRemover(inputCol='words', outputCol='filtered').setStopWords(add_stopwords)
# Build term-frequency vectors
count_vectors = CountVectorizer(inputCol='filtered', outputCol='features', vocabSize=10000, minDF=5)
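2.2 Encode the labels (the most frequent label gets index 0)
StringIndexer
StringIndexer encodes a column of string labels as a column of label indices. Labels are ordered by frequency, so the most frequent label gets index 0.
In this example the labels are encoded as integers from 0 to 32 (stored as doubles), with the most frequent label encoded as 0.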
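To check the ordering, the plain-Python sketch below applies frequency-descending label indexing to the category counts from section 1.3. The alphabetical tie-break and the helper name are assumptions for illustration. The resulting indices agree with the labels that appear in the outputs later in this article: VEHICLE THEFT is 4.0, FRAUD is 12.0 and SECONDARY CODES is 13.0.

```python
def index_labels(counts):
    """Map each label to a float index, most frequent label first (index 0.0)."""
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}

# The 14 largest category counts from the 1% sample (section 1.3)
category_counts = {
    "LARCENY/THEFT": 1725, "OTHER OFFENSES": 1230, "NON-CRIMINAL": 962,
    "ASSAULT": 763, "VEHICLE THEFT": 541, "DRUG/NARCOTIC": 494,
    "VANDALISM": 447, "WARRANTS": 406, "BURGLARY": 347,
    "SUSPICIOUS OCC": 295, "MISSING PERSON": 284, "ROBBERY": 225,
    "FRAUD": 159, "SECONDARY CODES": 124,
}
mapping = index_labels(category_counts)
print(mapping["LARCENY/THEFT"], mapping["VEHICLE THEFT"], mapping["SECONDARY CODES"])  # 0.0 4.0 13.0
```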
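Pipeline is a high-level, DataFrame-based API that makes it easier to build and debug machine learning workflows. It chains several stages (transformers and estimators) so that they run in sequence, which keeps the data processing efficient.
fit(): runs the stages over a DataFrame in order, fitting each estimator (here CountVectorizer and StringIndexer) along the way, and returns a PipelineModel, which is itself a Transformer
transform(): applies the fitted model to a DataFrame, appending each stage's output (words, filtered, features, label) as new columns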
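The fit/transform contract can be illustrated with a toy, in-memory re-implementation. These are hypothetical classes, not the pyspark.ml API, and rows are plain dicts. Pipeline.fit walks the stages in order, fits each estimator into a transformer, and passes the transformed rows on to the next stage. The resulting model then replays all the fitted transformers.

```python
from collections import Counter

class Transformer:
    def transform(self, rows):
        raise NotImplementedError

class Estimator:
    def fit(self, rows):
        raise NotImplementedError

class PipelineModel(Transformer):
    def __init__(self, stages):
        self.stages = stages
    def transform(self, rows):
        for stage in self.stages:          # replay every fitted stage in order
            rows = stage.transform(rows)
        return rows

class Pipeline(Estimator):
    def __init__(self, stages):
        self.stages = stages
    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if isinstance(stage, Estimator):
                stage = stage.fit(rows)    # an Estimator produces a Transformer
            rows = stage.transform(rows)   # its output feeds the next stage
            fitted.append(stage)
        return PipelineModel(fitted)

class SimpleTokenizer(Transformer):
    def transform(self, rows):
        return [dict(r, words=r["Descript"].lower().split()) for r in rows]

class LabelIndexerModel(Transformer):
    def __init__(self, mapping):
        self.mapping = mapping
    def transform(self, rows):
        return [dict(r, label=self.mapping[r["Category"]]) for r in rows]

class LabelIndexer(Estimator):
    def fit(self, rows):
        counts = Counter(r["Category"] for r in rows)
        order = sorted(counts, key=lambda c: (-counts[c], c))
        return LabelIndexerModel({c: float(i) for i, c in enumerate(order)})

rows = [{"Descript": "STOLEN AUTOMOBILE", "Category": "VEHICLE THEFT"},
        {"Descript": "PETTY THEFT BICYCLE", "Category": "LARCENY/THEFT"},
        {"Descript": "GRAND THEFT PICKPOCKET", "Category": "LARCENY/THEFT"}]
model = Pipeline([SimpleTokenizer(), LabelIndexer()]).fit(rows)
print(model.transform(rows)[0])
```

As in Spark, the same fitted model can be applied to new rows, for example a test set.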
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer

label_stringIdx = StringIndexer(inputCol='Category', outputCol='label')
pipeline = Pipeline(stages=[regexTokenizer, stopwords_remover, count_vectors, label_stringIdx])
# Fit the pipeline to the training documents
pipeline_fit = pipeline.fit(data)
dataset = pipeline_fit.transform(data)
dataset.show(5)
Result:
+---------------+--------------------+--------------------+--------------------+--------------------+-----+
|       Category|            Descript|               words|            filtered|            features|label|
+---------------+--------------------+--------------------+--------------------+--------------------+-----+
|  LARCENY/THEFT|GRAND THEFT FROM ...|[grand, theft, fr...|[grand, theft, fr...|(309,[0,2,3,4,6],...|  0.0|
|  VEHICLE THEFT|   STOLEN AUTOMOBILE|[stolen, automobile]|[stolen, automobile]|(309,[9,27],[1.0,...|  4.0|
|   NON-CRIMINAL|      FOUND PROPERTY|   [found, property]|   [found, property]|(309,[5,32],[1.0,...|  2.0|
|SECONDARY CODES|   JUVENILE INVOLVED|[juvenile, involved]|[juvenile, involved]|(309,[67,218],[1....| 13.0|
| OTHER OFFENSES|DRIVERS LICENSE, ...|[drivers, license...|[drivers, license...|(309,[14,23,28,30...|  1.0|
+---------------+--------------------+--------------------+--------------------+--------------------+-----+
only showing top 5 rows

III. Train/Test Split
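randomSplit does not cut the data at exactly 70/30. Each row is assigned to a split by a random draw, which is why the counts below (6117 and 2586) come out close to 7:3 but not exactly. The plain-Python sketch below shows that idea. Spark's own implementation samples per partition and will not reproduce these exact rows, and `random_split` is a hypothetical name.

```python
import random

def random_split(rows, weights, seed):
    """Assign each row independently to one split with probability proportional to its weight."""
    total = sum(weights)
    bounds, acc = [], 0.0
    for w in weights:
        acc += w / total
        bounds.append(acc)            # cumulative upper bound of each split in [0, 1)
    rng = random.Random(seed)
    splits = [[] for _ in weights]
    for row in rows:
        x = rng.random()
        for i, upper in enumerate(bounds):
            if x < upper:
                splits[i].append(row)
                break
        else:                         # guards against float rounding in the last bound
            splits[-1].append(row)
    return splits

train, test = random_split(list(range(8703)), [0.7, 0.3], seed=100)
print(len(train), len(test), round(len(train) / 8703, 3))
```

With a fixed seed the split is reproducible, which is the reason the article passes seed=100.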
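# Split into training and test sets at a 7:3 ratio; fix the random seed (100) for reproducibility
(trainingData, testData) = dataset.randomSplit([0.7, 0.3], seed=100)
print('Training Dataset Count:{}'.format(trainingData.count()))
print('Test Dataset Count:{}'.format(testData.count()))
Result:
Training Dataset Count:6117
Test Dataset Count:2586

IV. Model Training and Evaluation
4.1 Logistic regression with term-frequency features
Train on the training set, predict and score on the test set, and show 10 test rows predicted as class 0 (LARCENY/THEFT), sorted by probability: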
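LogisticRegression: the logistic regression model
maxIter: maximum number of iterations
regParam: regularization parameter (the strength of the penalty)
elasticNetParam: elastic-net mixing parameter in [0, 1]; 0 means pure L2 regularization, 1 means pure L1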
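For reference, Spark's documentation writes the elastic-net penalty as λ[α‖w‖₁ + (1−α)/2·‖w‖₂²], with λ = regParam and α = elasticNetParam. The elasticNetParam=0 used below therefore means pure L2 (ridge) regularization. The small sketch below computes only the penalty term, not the full training objective. `elastic_net_penalty` is a hypothetical helper and the weights are made up.

```python
def elastic_net_penalty(weights, reg_param, elastic_net_param):
    """reg_param * (alpha * L1 norm + (1 - alpha) / 2 * squared L2 norm)."""
    l1 = sum(abs(w) for w in weights)
    l2_sq = sum(w * w for w in weights)
    alpha = elastic_net_param
    return reg_param * (alpha * l1 + (1 - alpha) / 2 * l2_sq)

w = [1.0, -2.0]
print(elastic_net_penalty(w, 0.3, 0.0))  # pure L2: 0.3 * 0.5 * 5
print(elastic_net_penalty(w, 0.3, 1.0))  # pure L1: 0.3 * 3
```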
start_time = time.time()
lr = LogisticRegression(maxIter=20, regParam=0.3, elasticNetParam=0)
lrModel = lr.fit(trainingData)
predictions = lrModel.transform(testData)
# Keep rows predicted as class 0, sorted by probability in descending order
predictions.filter(predictions['prediction'] == 0) \
    .select('Descript', 'Category', 'probability', 'label', 'prediction') \
    .orderBy('probability', ascending=False) \
    .show(n=10, truncate=30)
Result (this output was produced with the keyword misspelled as accending, which orderBy() silently ignores, so these rows are sorted in ascending order):
+--------------------------+--------+------------------------------+-----+----------+
|                  Descript|Category|                   probability|label|prediction|
+--------------------------+--------+------------------------------+-----+----------+
|        ARSON OF A VEHICLE|   ARSON|[0.1194196587417514,0.10724...| 26.0|       0.0|
|        ARSON OF A VEHICLE|   ARSON|[0.1194196587417514,0.10724...| 26.0|       0.0|
|        ARSON OF A VEHICLE|   ARSON|[0.1194196587417514,0.10724...| 26.0|       0.0|
|           ATTEMPTED ARSON|   ARSON|[0.12978385966276762,0.1084...| 26.0|       0.0|
|     CREDIT CARD, THEFT OF|   FRAUD|[0.21637136655265077,0.0836...| 12.0|       0.0|
|     CREDIT CARD, THEFT OF|   FRAUD|[0.21637136655265077,0.0836...| 12.0|       0.0|
|     CREDIT CARD, THEFT OF|   FRAUD|[0.21637136655265077,0.0836...| 12.0|       0.0|
|     CREDIT CARD, THEFT OF|   FRAUD|[0.21637136655265077,0.0836...| 12.0|       0.0|
|     CREDIT CARD, THEFT OF|   FRAUD|[0.21637136655265077,0.0836...| 12.0|       0.0|
|ARSON OF A VACANT BUILDING|   ARSON|[0.22897903829071928,0.0980...| 26.0|       0.0|
+--------------------------+--------+------------------------------+-----+----------+
only showing top 10 rows
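With more than two classes, Spark's LogisticRegression fits a multinomial model. The probability column is the softmax of the per-class margins, and prediction is the index of the largest probability. That is why a row can be predicted as class 0 with a probability of only about 0.12: it only has to beat every other class. Below is a plain-Python sketch with made-up margins for four classes; the helper names are hypothetical.

```python
import math

def softmax(margins):
    """Turn raw per-class margins into probabilities that sum to 1."""
    top = max(margins)                        # subtract the max for numerical stability
    exps = [math.exp(m - top) for m in margins]
    total = sum(exps)
    return [e / total for e in exps]

def predict(margins):
    """Return (index of the most probable class, probability vector)."""
    probs = softmax(margins)
    return max(range(len(probs)), key=probs.__getitem__), probs

# Hypothetical margins: class 0 wins, but with a modest probability
label, probs = predict([0.4, 0.3, 0.2, 0.1])
print(label, [round(p, 3) for p in probs])
```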
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
# predictionCol: name of the prediction column
# The default metricName is 'f1' (weighted F1); pass metricName='accuracy' to get accuracy
evaluator = MulticlassClassificationEvaluator(predictionCol='prediction')
# Weighted F1 on the test set
print(evaluator.evaluate(predictions))
end_time = time.time()
print(end_time - start_time)
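MulticlassClassificationEvaluator reports weighted F1 unless metricName is set, for example to 'accuracy', so the scores in this section are F1 values. Weighted F1 averages the per-class F1 scores, weighting each by that class's share of the true labels. The sketch below computes both metrics in plain Python on toy labels to show they can differ. The helper names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_f1(y_true, y_pred):
    """Per-class F1, weighted by how often each class occurs among the true labels."""
    n = len(y_true)
    total = 0.0
    for c in set(y_true) | set(y_pred):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted = sum(p == c for p in y_pred)
        actual = sum(t == c for t in y_true)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += actual / n * f1
    return total

y_true = [0, 0, 1, 1]
y_pred = [0, 0, 0, 1]
print(accuracy(y_true, y_pred), round(weighted_f1(y_true, y_pred), 4))  # 0.75 0.7333
```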
Result (weighted F1, then elapsed time in seconds):
0.9641817609126011
8.245999813079834

4.2 Logistic regression with TF-IDF features
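In Spark, HashingTF maps each term to a bucket index by hashing it modulo numFeatures, so different words can share a bucket. IDF then weights each bucket j by log((m + 1) / (df(j) + 1)), where m is the number of documents and df(j) the number of documents containing the bucket. Buckets with df(j) < minDocFreq get weight 0. The plain-Python sketch below follows those formulas. As an assumption, zlib.crc32 stands in for Spark's MurmurHash3, so the indices will not match Spark's. The helper names are hypothetical.

```python
import math
import zlib
from collections import Counter

def hashing_tf(tokens, num_features=10000):
    """Term counts keyed by hashed bucket index (crc32 is a stand-in hash)."""
    return dict(Counter(zlib.crc32(t.encode("utf-8")) % num_features for t in tokens))

def fit_idf(tf_vectors, min_doc_freq=0):
    """IDF per bucket: log((m + 1) / (df + 1)), or 0.0 if df < min_doc_freq."""
    m = len(tf_vectors)
    df = Counter()
    for vec in tf_vectors:
        df.update(j for j, count in vec.items() if count > 0)
    return {j: math.log((m + 1) / (d + 1)) if d >= min_doc_freq else 0.0
            for j, d in df.items()}

def tf_idf(vec, idf):
    """Multiply each term count by its bucket's IDF weight."""
    return {j: count * idf.get(j, 0.0) for j, count in vec.items()}

docs = [["petty", "theft", "bicycle"], ["grand", "theft", "pickpocket"], ["stolen", "automobile"]]
tfs = [hashing_tf(d) for d in docs]
idf = fit_idf(tfs, min_doc_freq=1)
print(tf_idf(tfs[0], idf))
```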
from pyspark.ml.feature import HashingTF, IDF
start_time = time.time()
# numFeatures: number of hash buckets, i.e. the length of the feature vector
hashingTF = HashingTF(inputCol='filtered', outputCol='rawFeatures', numFeatures=10000)
# minDocFreq: terms that appear in fewer documents than this get an IDF weight of 0
idf = IDF(inputCol='rawFeatures', outputCol='features', minDocFreq=5)
pipeline = Pipeline(stages=[regexTokenizer, stopwords_remover, hashingTF, idf, label_stringIdx])
pipeline_fit = pipeline.fit(data)
dataset = pipeline_fit.transform(data)
(trainingData, testData) = dataset.randomSplit([0.7, 0.3], seed=100)

lr = LogisticRegression(maxIter=20, regParam=0.3, elasticNetParam=0)
lr_model = lr.fit(trainingData)
predictions = lr_model.transform(testData)
predictions.filter(predictions['prediction'] == 0) \
    .select('Descript', 'Category', 'probability', 'label', 'prediction') \
    .orderBy('probability', ascending=False) \
    .show(n=10, truncate=30)
Result:
+----------------------------+-------------+------------------------------+-----+----------+
|                    Descript|     Category|                   probability|label|prediction|
+----------------------------+-------------+------------------------------+-----+----------+
|PETTY THEFT FROM LOCKED AUTO|LARCENY/THEFT|[0.865376337558355,0.018892...|  0.0|       0.0|
|PETTY THEFT FROM LOCKED AUTO|LARCENY/THEFT|[0.865376337558355,0.018892...|  0.0|       0.0|
|PETTY THEFT FROM LOCKED AUTO|LARCENY/THEFT|[0.865376337558355,0.018892...|  0.0|       0.0|
|PETTY THEFT FROM LOCKED AUTO|LARCENY/THEFT|[0.865376337558355,0.018892...|  0.0|       0.0|
|PETTY THEFT FROM LOCKED AUTO|LARCENY/THEFT|[0.865376337558355,0.018892...|  0.0|       0.0|
|PETTY THEFT FROM LOCKED AUTO|LARCENY/THEFT|[0.865376337558355,0.018892...|  0.0|       0.0|
|PETTY THEFT FROM LOCKED AUTO|LARCENY/THEFT|[0.865376337558355,0.018892...|  0.0|       0.0|
|PETTY THEFT FROM LOCKED AUTO|LARCENY/THEFT|[0.865376337558355,0.018892...|  0.0|       0.0|
|PETTY THEFT FROM LOCKED AUTO|LARCENY/THEFT|[0.865376337558355,0.018892...|  0.0|       0.0|
|PETTY THEFT FROM LOCKED AUTO|LARCENY/THEFT|[0.865376337558355,0.018892...|  0.0|       0.0|
+----------------------------+-------------+------------------------------+-----+----------+
only showing top 10 rows
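evaluator = MulticlassClassificationEvaluator(predictionCol='prediction')
print(evaluator.evaluate(predictions))
end_time = time.time()
print(end_time - start_time)
Result (weighted F1, then elapsed time in seconds):
0.9653361434618551
12.998999834060669

4.3 Cross-validation
Use cross-validation to tune the hyperparameters. Here we tune the logistic regression model built on term-frequency (count) features.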
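The cost of this search is easy to estimate. The grid has 3 × 3 × 3 = 27 parameter combinations, and each one is trained once per fold. 5-fold cross-validation therefore fits 27 × 5 = 135 models, plus one final refit of the best combination on the whole training set, which largely explains why this step takes minutes rather than seconds. The plain-Python sketch below shows the grid expansion and the k-fold loop with toy train and score functions. It is not Spark's CrossValidator: Spark assigns folds randomly, whereas the sketch uses interleaved folds. All names are hypothetical.

```python
from itertools import product

def expand_grid(grid):
    """Cartesian product of candidate values, like ParamGridBuilder.build()."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

def cross_validate(data, grid, train_fn, score_fn, num_folds=5):
    """Average each parameter map's validation score over k folds, then refit
    the best one on all the data. Returns (best_params, final_model, total_fits)."""
    fits = 0
    best_params, best_score = None, float("-inf")
    for params in grid:
        scores = []
        for fold in range(num_folds):
            val = data[fold::num_folds]                      # interleaved folds (simplification)
            train = [x for i, x in enumerate(data) if i % num_folds != fold]
            model = train_fn(train, **params)
            fits += 1
            scores.append(score_fn(model, val))
        mean = sum(scores) / num_folds
        if mean > best_score:                                # larger is better, as for F1
            best_params, best_score = params, mean
    final_model = train_fn(data, **best_params)
    return best_params, final_model, fits + 1

def toy_train(rows, **params):
    return params                                            # the "model" is just its parameters

def toy_score(model, rows):
    return -abs(model["regParam"] - 0.3) - model["elasticNetParam"]

grid = expand_grid({"regParam": [0.1, 0.3, 0.5],
                    "elasticNetParam": [0.0, 0.1, 0.2],
                    "maxIter": [10, 20, 50]})
best, _, fits = cross_validate(list(range(100)), grid, toy_train, toy_score)
print(len(grid), fits, best)
```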
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator
start_time = time.time()
pipeline = Pipeline(stages=[regexTokenizer, stopwords_remover, count_vectors, label_stringIdx])
pipeline_fit = pipeline.fit(data)
# Re-create dataset with count-vector features (otherwise it still holds the TF-IDF output of 4.2)
dataset = pipeline_fit.transform(data)
(trainingData, testData) = dataset.randomSplit([0.7, 0.3], seed=100)
lr = LogisticRegression(maxIter=20, regParam=0.3, elasticNetParam=0)
# Build the parameter grid for cross-validation
# ParamGridBuilder: builder for a parameter grid used in grid-search model selection
# addGrid: adds a parameter together with the list of candidate values to try
# regParam: regularization strength
# elasticNetParam: L1/L2 mixing
# maxIter: number of iterations
paramGrid = (ParamGridBuilder()
             .addGrid(lr.regParam, [0.1, 0.3, 0.5])
             .addGrid(lr.elasticNetParam, [0.0, 0.1, 0.2])
             .addGrid(lr.maxIter, [10, 20, 50])
             # .addGrid(hashingTF.numFeatures, [10, 100, 1000])  # needs a pipeline containing hashingTF as the estimator
             .build())

# Create 5-fold cross-validation
# estimator: the estimator to cross-validate
# estimatorParamMaps: the parameter grid to search
# evaluator: the evaluator used to compare the candidate models (weighted F1 by default)
# numFolds: number of folds
evaluator = MulticlassClassificationEvaluator(predictionCol='prediction')
cv = CrossValidator(estimator=lr,
                    estimatorParamMaps=paramGrid,
                    evaluator=evaluator,
                    numFolds=5)
cv_model = cv.fit(trainingData)
predictions = cv_model.transform(testData)

# Evaluate the model
print(evaluator.evaluate(predictions))
end_time = time.time()
print(end_time - start_time)
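Result (weighted F1, then elapsed time in seconds). In the original run, dataset at this point still held the TF-IDF features from 4.2, so these figures, and those in 4.4 and 4.5, which reuse this split, reflect TF-IDF features:
0.9807684755923513
368.97300004959106

4.4 Naive Bayes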
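Spark's NaiveBayes defaults to the multinomial model. With smoothing λ, class c scores log P(c) + Σ_j x_j · log θ_cj, where θ_cj = (N_cj + λ) / (N_c + λ·V). Here N_cj is the total count of feature j in class c, N_c is the total count of all features in class c, and V is the number of features. The plain-Python sketch below follows that formulation. Smoothing the class prior with λ as well is an assumption here, and the helper names and toy data are hypothetical. Because per-feature log-likelihoods are summed, posteriors quickly become extreme, which is consistent with the near-1.0 probabilities in the output below.

```python
import math

def fit_multinomial_nb(X, y, num_features, smoothing=1.0):
    """X: list of {feature_index: count}; y: class labels. Returns log priors and log thetas."""
    classes = sorted(set(y))
    n_docs = len(y)
    log_prior, log_theta = {}, {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        log_prior[c] = math.log((len(rows) + smoothing) / (n_docs + len(classes) * smoothing))
        feature_totals = [0.0] * num_features
        for x in rows:
            for j, count in x.items():
                feature_totals[j] += count
        denom = sum(feature_totals) + num_features * smoothing
        log_theta[c] = [math.log((t + smoothing) / denom) for t in feature_totals]
    return log_prior, log_theta

def predict_proba(x, log_prior, log_theta):
    """Posterior probability of each class for one sparse count vector."""
    scores = {c: log_prior[c] + sum(count * log_theta[c][j] for j, count in x.items())
              for c in log_prior}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

X = [{0: 2}, {0: 1, 1: 1}, {2: 2}]
y = [0, 0, 1]
prior, theta = fit_multinomial_nb(X, y, num_features=3)
print(predict_proba({0: 3}, prior, theta))
```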
from pyspark.ml.classification import NaiveBayes
start_time = time.time()
# smoothing: additive (Laplace) smoothing parameter
nb = NaiveBayes(smoothing=1)
model = nb.fit(trainingData)
predictions = model.transform(testData)
predictions.filter(predictions['prediction'] == 0) \
    .select('Descript', 'Category', 'probability', 'label', 'prediction') \
    .orderBy('probability', ascending=False) \
    .show(n=10, truncate=30)
Result:
+----------------------+-------------+------------------------------+-----+----------+
|              Descript|     Category|                   probability|label|prediction|
+----------------------+-------------+------------------------------+-----+----------+
|   PETTY THEFT BICYCLE|LARCENY/THEFT|[1.0,1.236977662838925E-20,...|  0.0|       0.0|
|   PETTY THEFT BICYCLE|LARCENY/THEFT|[1.0,1.236977662838925E-20,...|  0.0|       0.0|
|   PETTY THEFT BICYCLE|LARCENY/THEFT|[1.0,1.236977662838925E-20,...|  0.0|       0.0|
|GRAND THEFT PICKPOCKET|LARCENY/THEFT|[1.0,7.699728277574397E-24,...|  0.0|       0.0|
|GRAND THEFT PICKPOCKET|LARCENY/THEFT|[1.0,7.699728277574397E-24,...|  0.0|       0.0|
|GRAND THEFT PICKPOCKET|LARCENY/THEFT|[1.0,7.699728277574397E-24,...|  0.0|       0.0|
|GRAND THEFT PICKPOCKET|LARCENY/THEFT|[1.0,7.699728277574397E-24,...|  0.0|       0.0|
|GRAND THEFT PICKPOCKET|LARCENY/THEFT|[1.0,7.699728277574397E-24,...|  0.0|       0.0|
|GRAND THEFT PICKPOCKET|LARCENY/THEFT|[1.0,7.699728277574397E-24,...|  0.0|       0.0|
|GRAND THEFT PICKPOCKET|LARCENY/THEFT|[1.0,7.699728277574397E-24,...|  0.0|       0.0|
+----------------------+-------------+------------------------------+-----+----------+
only showing top 10 rows
evaluator = MulticlassClassificationEvaluator(predictionCol='prediction')
print(evaluator.evaluate(predictions))
end_time = time.time()
print(end_time - start_time)
Result (weighted F1, then elapsed time in seconds):
0.977432832447723
5.371000051498413

4.5 Random forest
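One likely reason for the weak result below is the shallow trees. A binary tree of depth 4 has at most 2⁴ = 16 leaves and 15 split nodes, so each tree can test at most 15 distinct vocabulary features and has fewer leaves than there are crime classes. The forest's probability is then the average of the per-tree class distributions. The plain-Python sketch below illustrates both points. The helpers are hypothetical, and the averaging mirrors how a forest combines trees but is not Spark's code.

```python
def max_leaves(depth):
    """A binary tree of the given depth has at most 2**depth leaves."""
    return 2 ** depth

def max_split_nodes(depth):
    """...and at most 2**depth - 1 internal nodes, each testing one feature."""
    return 2 ** depth - 1

def forest_probability(tree_distributions):
    """Sum the per-tree class distributions, then renormalize to sum to 1."""
    num_classes = len(tree_distributions[0])
    sums = [sum(dist[k] for dist in tree_distributions) for k in range(num_classes)]
    total = sum(sums)
    return [s / total for s in sums]

print(max_leaves(4), max_split_nodes(4))  # 16 15
# Three hypothetical trees voting over three classes
print(forest_probability([[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.4, 0.4, 0.2]]))
```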
from pyspark.ml.classification import RandomForestClassifier
start_time = time.time()
# numTrees: number of trees to train
# maxDepth: maximum depth of each tree
# maxBins: maximum number of bins used to discretize continuous features
rf = RandomForestClassifier(labelCol='label',
                            featuresCol='features',
                            numTrees=100,
                            maxDepth=4,
                            maxBins=32)
# Train the model on the training data
rfModel = rf.fit(trainingData)
predictions = rfModel.transform(testData)
predictions.filter(predictions['prediction'] == 0) \
    .select('Descript', 'Category', 'probability', 'label', 'prediction') \
    .orderBy('probability', ascending=False) \
    .show(n=10, truncate=30)
Result:
+----------------------------+-------------+------------------------------+-----+----------+
|                    Descript|     Category|                   probability|label|prediction|
+----------------------------+-------------+------------------------------+-----+----------+
|PETTY THEFT FROM LOCKED AUTO|LARCENY/THEFT|[0.33206188381818563,0.1168...|  0.0|       0.0|
|PETTY THEFT FROM LOCKED AUTO|LARCENY/THEFT|[0.33206188381818563,0.1168...|  0.0|       0.0|
|PETTY THEFT FROM LOCKED AUTO|LARCENY/THEFT|[0.33206188381818563,0.1168...|  0.0|       0.0|
|PETTY THEFT FROM LOCKED AUTO|LARCENY/THEFT|[0.33206188381818563,0.1168...|  0.0|       0.0|
|PETTY THEFT FROM LOCKED AUTO|LARCENY/THEFT|[0.33206188381818563,0.1168...|  0.0|       0.0|
|PETTY THEFT FROM LOCKED AUTO|LARCENY/THEFT|[0.33206188381818563,0.1168...|  0.0|       0.0|
|PETTY THEFT FROM LOCKED AUTO|LARCENY/THEFT|[0.33206188381818563,0.1168...|  0.0|       0.0|
|PETTY THEFT FROM LOCKED AUTO|LARCENY/THEFT|[0.33206188381818563,0.1168...|  0.0|       0.0|
|PETTY THEFT FROM LOCKED AUTO|LARCENY/THEFT|[0.33206188381818563,0.1168...|  0.0|       0.0|
|PETTY THEFT FROM LOCKED AUTO|LARCENY/THEFT|[0.33206188381818563,0.1168...|  0.0|       0.0|
+----------------------------+-------------+------------------------------+-----+----------+
only showing top 10 rows
evaluator = MulticlassClassificationEvaluator(predictionCol='prediction')
print(evaluator.evaluate(predictions))
end_time = time.time()
print(end_time - start_time)
Result (weighted F1, then elapsed time in seconds):
0.27929770811242954
36.63699984550476
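The results above show that random forest, although an excellent and robust general-purpose model, is not a good choice for high-dimensional sparse data such as these text vectors. With the shallow trees used here (maxDepth=4) it reaches a weighted F1 of only about 0.28.
The cross-validated logistic regression scores best (about 0.981), so it is the model we choose.
One caveat when choosing it: cross-validation makes training far slower (about 369 s here, versus about 8 s for a single logistic regression run). In real applications, choose the model that best fits the business requirements.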
Reprinted from: https://www.cnblogs.com/cymx66688/p/10699018.html