Spark LogisticRegression: Building a Logistic Regression Model
Importing packages
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.Row
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.Column
import org.apache.spark.sql.DataFrameReader
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.Encoder
import org.apache.spark.sql.DataFrameStatFunctions
import org.apache.spark.sql.functions._
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.classification.{ BinaryLogisticRegressionSummary, LogisticRegression }
import org.apache.spark.ml.tuning.{ ParamGridBuilder, TrainValidationSplit }
Loading the source data
val spark = SparkSession
  .builder()
  .appName("Spark Logistic Regression")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()

// For implicit conversions like converting RDDs to DataFrames
import spark.implicits._

val dataList: List[(Double, String, Double, Double, String, Double, Double, Double, Double)] = List(
(0, "male", 37, 10, "no", 3, 18, 7, 4), (0, "female", 27, 4, "no", 4, 14, 6, 4), (0, "female", 32, 15, "yes", 1, 12, 1, 4), (0, "male", 57, 15, "yes", 5, 18, 6, 5), (0, "male", 22, 0.75, "no", 2, 17, 6, 3), (0, "female", 32, 1.5, "no", 2, 17, 5, 5), (0, "female", 22, 0.75, "no", 2, 12, 1, 3), (0, "male", 57, 15, "yes", 2, 14, 4, 4), (0, "female", 32, 15, "yes", 4, 16, 1, 2), (0, "male", 22, 1.5, "no", 4, 14, 4, 5), (0, "male", 37, 15, "yes", 2, 20, 7, 2), (0, "male", 27, 4, "yes", 4, 18, 6, 4), (0, "male", 47, 15, "yes", 5, 17, 6, 4), (0, "female", 22, 1.5, "no", 2, 17, 5, 4), (0, "female", 27, 4, "no", 4, 14, 5, 4), (0, "female", 37, 15, "yes", 1, 17, 5, 5), (0, "female", 37, 15, "yes", 2, 18, 4, 3), (0, "female", 22, 0.75, "no", 3, 16, 5, 4), (0, "female", 22, 1.5, "no", 2, 16, 5, 5), (0, "female", 27, 10, "yes", 2, 14, 1, 5), (0, "female", 22, 1.5, "no", 2, 16, 5, 5), (0, "female", 22, 1.5, "no", 2, 16, 5, 5), (0, "female", 27, 10, "yes", 4, 16, 5, 4), (0, "female", 32, 10, "yes", 3, 14, 1, 5), (0, "male", 37, 4, "yes", 2, 20, 6, 4), (0, "female", 22, 1.5, "no", 2, 18, 5, 5), (0, "female", 27, 7, "no", 4, 16, 1, 5), (0, "male", 42, 15, "yes", 5, 20, 6, 4), (0, "male", 27, 4, "yes", 3, 16, 5, 5), (0, "female", 27, 4, "yes", 3, 17, 5, 4), (0, "male", 42, 15, "yes", 4, 20, 6, 3), (0, "female", 22, 1.5, "no", 3, 16, 5, 5), (0, "male", 27, 0.417, "no", 4, 17, 6, 4), (0, "female", 42, 15, "yes", 5, 14, 5, 4), (0, "male", 32, 4, "yes", 1, 18, 6, 4), (0, "female", 22, 1.5, "no", 4, 16, 5, 3), (0, "female", 42, 15, "yes", 3, 12, 1, 4), (0, "female", 22, 4, "no", 4, 17, 5, 5), (0, "male", 22, 1.5, "yes", 1, 14, 3, 5), (0, "female", 22, 0.75, "no", 3, 16, 
1, 5), (0, "male", 32, 10, "yes", 5, 20, 6, 5), (0, "male", 52, 15, "yes", 5, 18, 6, 3), (0, "female", 22, 0.417, "no", 5, 14, 1, 4), (0, "female", 27, 4, "yes", 2, 18, 6, 1), (0, "female", 32, 7, "yes", 5, 17, 5, 3), (0, "male", 22, 4, "no", 3, 16, 5, 5), (0, "female", 27, 7, "yes", 4, 18, 6, 5), (0, "female", 42, 15, "yes", 2, 18, 5, 4), (0, "male", 27, 1.5, "yes", 4, 16, 3, 5), (0, "male", 42, 15, "yes", 2, 20, 6, 4), (0, "female", 22, 0.75, "no", 5, 14, 3, 5), (0, "male", 32, 7, "yes", 2, 20, 6, 4), (0, "male", 27, 4, "yes", 5, 20, 6, 5), (0, "male", 27, 10, "yes", 4, 20, 6, 4), (0, "male", 22, 4, "no", 1, 18, 5, 5), (0, "female", 37, 15, "yes", 4, 14, 3, 1), (0, "male", 22, 1.5, "yes", 5, 16, 4, 4), (0, "female", 37, 15, "yes", 4, 17, 1, 5), (0, "female", 27, 0.75, "no", 4, 17, 5, 4), (0, "male", 32, 10, "yes", 4, 20, 6, 4), (0, "female", 47, 15, "yes", 5, 14, 7, 2), (0, "male", 37, 10, "yes", 3, 20, 6, 4), (0, "female", 22, 0.75, "no", 2, 16, 5, 5), (0, "male", 27, 4, "no", 2, 18, 4, 5), (0, "male", 32, 7, "no", 4, 20, 6, 4), (0, "male", 42, 15, "yes", 2, 17, 3, 5), (0, "male", 37, 10, "yes", 4, 20, 6, 4), (0, "female", 47, 15, "yes", 3, 17, 6, 5), (0, "female", 22, 1.5, "no", 5, 16, 5, 5), (0, "female", 27, 1.5, "no", 2, 16, 6, 4), (0, "female", 27, 4, "no", 3, 17, 5, 5), (0, "female", 32, 10, "yes", 5, 14, 4, 5), (0, "female", 22, 0.125, "no", 2, 12, 5, 5), (0, "male", 47, 15, "yes", 4, 14, 4, 3), (0, "male", 32, 15, "yes", 1, 14, 5, 5), (0, "male", 27, 7, "yes", 4, 16, 5, 5), (0, "female", 22, 1.5, "yes", 3, 16, 5, 5), (0, "male", 27, 4, "yes", 3, 17, 6, 5), (0, "female", 22, 1.5, "no", 3, 16, 5, 5), (0, "male", 57, 15, "yes", 2, 14, 7, 2), (0, "male", 17.5, 1.5, "yes", 3, 18, 6, 5), (0, "male", 57, 15, "yes", 4, 20, 6, 5), (0, "female", 22, 0.75, "no", 2, 16, 3, 4), (0, "male", 42, 4, "no", 4, 17, 3, 3), (0, "female", 22, 1.5, "yes", 4, 12, 1, 5), (0, "female", 22, 0.417, "no", 1, 17, 6, 4), (0, "female", 32, 15, "yes", 4, 17, 5, 5), (0, "female", 27, 
1.5, "no", 3, 18, 5, 2), (0, "female", 22, 1.5, "yes", 3, 14, 1, 5), (0, "female", 37, 15, "yes", 3, 14, 1, 4), (0, "female", 32, 15, "yes", 4, 14, 3, 4), (0, "male", 37, 10, "yes", 2, 14, 5, 3), (0, "male", 37, 10, "yes", 4, 16, 5, 4), (0, "male", 57, 15, "yes", 5, 20, 5, 3), (0, "male", 27, 0.417, "no", 1, 16, 3, 4), (0, "female", 42, 15, "yes", 5, 14, 1, 5), (0, "male", 57, 15, "yes", 3, 16, 6, 1), (0, "male", 37, 10, "yes", 1, 16, 6, 4), (0, "male", 37, 15, "yes", 3, 17, 5, 5), (0, "male", 37, 15, "yes", 4, 20, 6, 5), (0, "female", 27, 10, "yes", 5, 14, 1, 5), (0, "male", 37, 10, "yes", 2, 18, 6, 4), (0, "female", 22, 0.125, "no", 4, 12, 4, 5), (0, "male", 57, 15, "yes", 5, 20, 6, 5), (0, "female", 37, 15, "yes", 4, 18, 6, 4), (0, "male", 22, 4, "yes", 4, 14, 6, 4), (0, "male", 27, 7, "yes", 4, 18, 5, 4), (0, "male", 57, 15, "yes", 4, 20, 5, 4), (0, "male", 32, 15, "yes", 3, 14, 6, 3), (0, "female", 22, 1.5, "no", 2, 14, 5, 4), (0, "female", 32, 7, "yes", 4, 17, 1, 5), (0, "female", 37, 15, "yes", 4, 17, 6, 5), (0, "female", 32, 1.5, "no", 5, 18, 5, 5), (0, "male", 42, 10, "yes", 5, 20, 7, 4), (0, "female", 27, 7, "no", 3, 16, 5, 4), (0, "male", 37, 15, "no", 4, 20, 6, 5), (0, "male", 37, 15, "yes", 4, 14, 3, 2), (0, "male", 32, 10, "no", 5, 18, 6, 4), (0, "female", 22, 0.75, "no", 4, 16, 1, 5), (0, "female", 27, 7, "yes", 4, 12, 2, 4), (0, "female", 27, 7, "yes", 2, 16, 2, 5), (0, "female", 42, 15, "yes", 5, 18, 5, 4), (0, "male", 42, 15, "yes", 4, 17, 5, 3), (0, "female", 27, 7, "yes", 2, 16, 1, 2), (0, "female", 22, 1.5, "no", 3, 16, 5, 5), (0, "male", 37, 15, "yes", 5, 20, 6, 5), (0, "female", 22, 0.125, "no", 2, 14, 4, 5), (0, "male", 27, 1.5, "no", 4, 16, 5, 5), (0, "male", 32, 1.5, "no", 2, 18, 6, 5), (0, "male", 27, 1.5, "no", 2, 17, 6, 5), (0, "female", 27, 10, "yes", 4, 16, 1, 3), (0, "male", 42, 15, "yes", 4, 18, 6, 5), (0, "female", 27, 1.5, "no", 2, 16, 6, 5), (0, "male", 27, 4, "no", 2, 18, 6, 3), (0, "female", 32, 10, "yes", 3, 14, 5, 3), (0, 
"female", 32, 15, "yes", 3, 18, 5, 4), (0, "female", 22, 0.75, "no", 2, 18, 6, 5), (0, "female", 37, 15, "yes", 2, 16, 1, 4), (0, "male", 27, 4, "yes", 4, 20, 5, 5), (0, "male", 27, 4, "no", 1, 20, 5, 4), (0, "female", 27, 10, "yes", 2, 12, 1, 4), (0, "female", 32, 15, "yes", 5, 18, 6, 4), (0, "male", 27, 7, "yes", 5, 12, 5, 3), (0, "male", 52, 15, "yes", 2, 18, 5, 4), (0, "male", 27, 4, "no", 3, 20, 6, 3), (0, "male", 37, 4, "yes", 1, 18, 5, 4), (0, "male", 27, 4, "yes", 4, 14, 5, 4), (0, "female", 52, 15, "yes", 5, 12, 1, 3), (0, "female", 57, 15, "yes", 4, 16, 6, 4), (0, "male", 27, 7, "yes", 1, 16, 5, 4), (0, "male", 37, 7, "yes", 4, 20, 6, 3), (0, "male", 22, 0.75, "no", 2, 14, 4, 3), (0, "male", 32, 4, "yes", 2, 18, 5, 3), (0, "male", 37, 15, "yes", 4, 20, 6, 3), (0, "male", 22, 0.75, "yes", 2, 14, 4, 3), (0, "male", 42, 15, "yes", 4, 20, 6, 3), (0, "female", 52, 15, "yes", 5, 17, 1, 1), (0, "female", 37, 15, "yes", 4, 14, 1, 2), (0, "male", 27, 7, "yes", 4, 14, 5, 3), (0, "male", 32, 4, "yes", 2, 16, 5, 5), (0, "female", 27, 4, "yes", 2, 18, 6, 5), (0, "female", 27, 4, "yes", 2, 18, 5, 5), (0, "male", 37, 15, "yes", 5, 18, 6, 5), (0, "female", 47, 15, "yes", 5, 12, 5, 4), (0, "female", 32, 10, "yes", 3, 17, 1, 4), (0, "female", 27, 1.5, "yes", 4, 17, 1, 2), (0, "female", 57, 15, "yes", 2, 18, 5, 2), (0, "female", 22, 1.5, "no", 4, 14, 5, 4), (0, "male", 42, 15, "yes", 3, 14, 3, 4), (0, "male", 57, 15, "yes", 4, 9, 2, 2), (0, "male", 57, 15, "yes", 4, 20, 6, 5), (0, "female", 22, 0.125, "no", 4, 14, 4, 5), (0, "female", 32, 10, "yes", 4, 14, 1, 5), (0, "female", 42, 15, "yes", 3, 18, 5, 4), (0, "female", 27, 1.5, "no", 2, 18, 6, 5), (0, "male", 32, 0.125, "yes", 2, 18, 5, 2), (0, "female", 27, 4, "no", 3, 16, 5, 4), (0, "female", 27, 10, "yes", 2, 16, 1, 4), (0, "female", 32, 7, "yes", 4, 16, 1, 3), (0, "female", 37, 15, "yes", 4, 14, 5, 4), (0, "female", 42, 15, "yes", 5, 17, 6, 2), (0, "male", 32, 1.5, "yes", 4, 14, 6, 5), (0, "female", 32, 4, "yes", 3, 17, 
5, 3), (0, "female", 37, 7, "no", 4, 18, 5, 5), (0, "female", 22, 0.417, "yes", 3, 14, 3, 5), (0, "female", 27, 7, "yes", 4, 14, 1, 5), (0, "male", 27, 0.75, "no", 3, 16, 5, 5), (0, "male", 27, 4, "yes", 2, 20, 5, 5), (0, "male", 32, 10, "yes", 4, 16, 4, 5), (0, "male", 32, 15, "yes", 1, 14, 5, 5), (0, "male", 22, 0.75, "no", 3, 17, 4, 5), (0, "female", 27, 7, "yes", 4, 17, 1, 4), (0, "male", 27, 0.417, "yes", 4, 20, 5, 4), (0, "male", 37, 15, "yes", 4, 20, 5, 4), (0, "female", 37, 15, "yes", 2, 14, 1, 3), (0, "male", 22, 4, "yes", 1, 18, 5, 4), (0, "male", 37, 15, "yes", 4, 17, 5, 3), (0, "female", 22, 1.5, "no", 2, 14, 4, 5), (0, "male", 52, 15, "yes", 4, 14, 6, 2), (0, "female", 22, 1.5, "no", 4, 17, 5, 5), (0, "male", 32, 4, "yes", 5, 14, 3, 5), (0, "male", 32, 4, "yes", 2, 14, 3, 5), (0, "female", 22, 1.5, "no", 3, 16, 6, 5), (0, "male", 27, 0.75, "no", 2, 18, 3, 3), (0, "female", 22, 7, "yes", 2, 14, 5, 2), (0, "female", 27, 0.75, "no", 2, 17, 5, 3), (0, "female", 37, 15, "yes", 4, 12, 1, 2), (0, "female", 22, 1.5, "no", 1, 14, 1, 5), (0, "female", 37, 10, "no", 2, 12, 4, 4), (0, "female", 37, 15, "yes", 4, 18, 5, 3), (0, "female", 42, 15, "yes", 3, 12, 3, 3), (0, "male", 22, 4, "no", 2, 18, 5, 5), (0, "male", 52, 7, "yes", 2, 20, 6, 2), (0, "male", 27, 0.75, "no", 2, 17, 5, 5), (0, "female", 27, 4, "no", 2, 17, 4, 5), (0, "male", 42, 1.5, "no", 5, 20, 6, 5), (0, "male", 22, 1.5, "no", 4, 17, 6, 5), (0, "male", 22, 4, "no", 4, 17, 5, 3), (0, "female", 22, 4, "yes", 1, 14, 5, 4), (0, "male", 37, 15, "yes", 5, 20, 4, 5), (0, "female", 37, 10, "yes", 3, 16, 6, 3), (0, "male", 42, 15, "yes", 4, 17, 6, 5), (0, "female", 47, 15, "yes", 4, 17, 5, 5), (0, "male", 22, 1.5, "no", 4, 16, 5, 4), (0, "female", 32, 10, "yes", 3, 12, 1, 4), (0, "female", 22, 7, "yes", 1, 14, 3, 5), (0, "female", 32, 10, "yes", 4, 17, 5, 4), (0, "male", 27, 1.5, "yes", 2, 16, 2, 4), (0, "male", 37, 15, "yes", 4, 14, 5, 5), (0, "male", 42, 4, "yes", 3, 14, 4, 5), (0, "female", 37, 15, "yes", 
5, 14, 5, 4), (0, "female", 32, 7, "yes", 4, 17, 5, 5), (0, "female", 42, 15, "yes", 4, 18, 6, 5), (0, "male", 27, 4, "no", 4, 18, 6, 4), (0, "male", 22, 0.75, "no", 4, 18, 6, 5), (0, "male", 27, 4, "yes", 4, 14, 5, 3), (0, "female", 22, 0.75, "no", 5, 18, 1, 5), (0, "female", 52, 15, "yes", 5, 9, 5, 5), (0, "male", 32, 10, "yes", 3, 14, 5, 5), (0, "female", 37, 15, "yes", 4, 16, 4, 4), (0, "male", 32, 7, "yes", 2, 20, 5, 4), (0, "female", 42, 15, "yes", 3, 18, 1, 4), (0, "male", 32, 15, "yes", 1, 16, 5, 5), (0, "male", 27, 4, "yes", 3, 18, 5, 5), (0, "female", 32, 15, "yes", 4, 12, 3, 4), (0, "male", 22, 0.75, "yes", 3, 14, 2, 4), (0, "female", 22, 1.5, "no", 3, 16, 5, 3), (0, "female", 42, 15, "yes", 4, 14, 3, 5), (0, "female", 52, 15, "yes", 3, 16, 5, 4), (0, "male", 37, 15, "yes", 5, 20, 6, 4), (0, "female", 47, 15, "yes", 4, 12, 2, 3), (0, "male", 57, 15, "yes", 2, 20, 6, 4), (0, "male", 32, 7, "yes", 4, 17, 5, 5), (0, "female", 27, 7, "yes", 4, 17, 1, 4), (0, "male", 22, 1.5, "no", 1, 18, 6, 5), (0, "female", 22, 4, "yes", 3, 9, 1, 4), (0, "female", 22, 1.5, "no", 2, 14, 1, 5), (0, "male", 42, 15, "yes", 2, 20, 6, 4), (0, "male", 57, 15, "yes", 4, 9, 2, 4), (0, "female", 27, 7, "yes", 2, 18, 1, 5), (0, "female", 22, 4, "yes", 3, 14, 1, 5), (0, "male", 37, 15, "yes", 4, 14, 5, 3), (0, "male", 32, 7, "yes", 1, 18, 6, 4), (0, "female", 22, 1.5, "no", 2, 14, 5, 5), (0, "female", 22, 1.5, "yes", 3, 12, 1, 3), (0, "male", 52, 15, "yes", 2, 14, 5, 5), (0, "female", 37, 15, "yes", 2, 14, 1, 1), (0, "female", 32, 10, "yes", 2, 14, 5, 5), (0, "male", 42, 15, "yes", 4, 20, 4, 5), (0, "female", 27, 4, "yes", 3, 18, 4, 5), (0, "male", 37, 15, "yes", 4, 20, 6, 5), (0, "male", 27, 1.5, "no", 3, 18, 5, 5), (0, "female", 22, 0.125, "no", 2, 16, 6, 3), (0, "male", 32, 10, "yes", 2, 20, 6, 3), (0, "female", 27, 4, "no", 4, 18, 5, 4), (0, "female", 27, 7, "yes", 2, 12, 5, 1), (0, "male", 32, 4, "yes", 5, 18, 6, 3), (0, "female", 37, 15, "yes", 2, 17, 5, 5), (0, "male", 47, 15, 
"no", 4, 20, 6, 4), (0, "male", 27, 1.5, "no", 1, 18, 5, 5), (0, "male", 37, 15, "yes", 4, 20, 6, 4), (0, "female", 32, 15, "yes", 4, 18, 1, 4), (0, "female", 32, 7, "yes", 4, 17, 5, 4), (0, "female", 42, 15, "yes", 3, 14, 1, 3), (0, "female", 27, 7, "yes", 3, 16, 1, 4), (0, "male", 27, 1.5, "no", 3, 16, 4, 2), (0, "male", 22, 1.5, "no", 3, 16, 3, 5), (0, "male", 27, 4, "yes", 3, 16, 4, 2), (0, "female", 27, 7, "yes", 3, 12, 1, 2), (0, "female", 37, 15, "yes", 2, 18, 5, 4), (0, "female", 37, 7, "yes", 3, 14, 4, 4), (0, "male", 22, 1.5, "no", 2, 16, 5, 5), (0, "male", 37, 15, "yes", 5, 20, 5, 4), (0, "female", 22, 1.5, "no", 4, 16, 5, 3), (0, "female", 32, 10, "yes", 4, 16, 1, 5), (0, "male", 27, 4, "no", 2, 17, 5, 3), (0, "female", 22, 0.417, "no", 4, 14, 5, 5), (0, "female", 27, 4, "no", 2, 18, 5, 5), (0, "male", 37, 15, "yes", 4, 18, 5, 3), (0, "male", 37, 10, "yes", 5, 20, 7, 4), (0, "female", 27, 7, "yes", 2, 14, 4, 2), (0, "male", 32, 4, "yes", 2, 16, 5, 5), (0, "male", 32, 4, "yes", 2, 16, 6, 4), (0, "male", 22, 1.5, "no", 3, 18, 4, 5), (0, "female", 22, 4, "yes", 4, 14, 3, 4), (0, "female", 17.5, 0.75, "no", 2, 18, 5, 4), (0, "male", 32, 10, "yes", 4, 20, 4, 5), (0, "female", 32, 0.75, "no", 5, 14, 3, 3), (0, "male", 37, 15, "yes", 4, 17, 5, 3), (0, "male", 32, 4, "no", 3, 14, 4, 5), (0, "female", 27, 1.5, "no", 2, 17, 3, 2), (0, "female", 22, 7, "yes", 4, 14, 1, 5), (0, "male", 47, 15, "yes", 5, 14, 6, 5), (0, "male", 27, 4, "yes", 1, 16, 4, 4), (0, "female", 37, 15, "yes", 5, 14, 1, 3), (0, "male", 42, 4, "yes", 4, 18, 5, 5), (0, "female", 32, 4, "yes", 2, 14, 1, 5), (0, "male", 52, 15, "yes", 2, 14, 7, 4), (0, "female", 22, 1.5, "no", 2, 16, 1, 4), (0, "male", 52, 15, "yes", 4, 12, 2, 4), (0, "female", 22, 0.417, "no", 3, 17, 1, 5), (0, "female", 22, 1.5, "no", 2, 16, 5, 5), (0, "male", 27, 4, "yes", 4, 20, 6, 4), (0, "female", 32, 15, "yes", 4, 14, 1, 5), (0, "female", 27, 1.5, "no", 2, 16, 3, 5), (0, "male", 32, 4, "no", 1, 20, 6, 5), (0, "male", 37, 
15, "yes", 3, 20, 6, 4), (0, "female", 32, 10, "no", 2, 16, 6, 5), (0, "female", 32, 10, "yes", 5, 14, 5, 5), (0, "male", 37, 1.5, "yes", 4, 18, 5, 3), (0, "male", 32, 1.5, "no", 2, 18, 4, 4), (0, "female", 32, 10, "yes", 4, 14, 1, 4), (0, "female", 47, 15, "yes", 4, 18, 5, 4), (0, "female", 27, 10, "yes", 5, 12, 1, 5), (0, "male", 27, 4, "yes", 3, 16, 4, 5), (0, "female", 37, 15, "yes", 4, 12, 4, 2), (0, "female", 27, 0.75, "no", 4, 16, 5, 5), (0, "female", 37, 15, "yes", 4, 16, 1, 5), (0, "female", 32, 15, "yes", 3, 16, 1, 5), (0, "female", 27, 10, "yes", 2, 16, 1, 5), (0, "male", 27, 7, "no", 2, 20, 6, 5), (0, "female", 37, 15, "yes", 2, 14, 1, 3), (0, "male", 27, 1.5, "yes", 2, 17, 4, 4), (0, "female", 22, 0.75, "yes", 2, 14, 1, 5), (0, "male", 22, 4, "yes", 4, 14, 2, 4), (0, "male", 42, 0.125, "no", 4, 17, 6, 4), (0, "male", 27, 1.5, "yes", 4, 18, 6, 5), (0, "male", 27, 7, "yes", 3, 16, 6, 3), (0, "female", 52, 15, "yes", 4, 14, 1, 3), (0, "male", 27, 1.5, "no", 5, 20, 5, 2), (0, "female", 27, 1.5, "no", 2, 16, 5, 5), (0, "female", 27, 1.5, "no", 3, 17, 5, 5), (0, "male", 22, 0.125, "no", 5, 16, 4, 4), (0, "female", 27, 4, "yes", 4, 16, 1, 5), (0, "female", 27, 4, "yes", 4, 12, 1, 5), (0, "female", 47, 15, "yes", 2, 14, 5, 5), (0, "female", 32, 15, "yes", 3, 14, 5, 3), (0, "male", 42, 7, "yes", 2, 16, 5, 5), (0, "male", 22, 0.75, "no", 4, 16, 6, 4), (0, "male", 27, 0.125, "no", 3, 20, 6, 5), (0, "male", 32, 10, "yes", 3, 20, 6, 5), (0, "female", 22, 0.417, "no", 5, 14, 4, 5), (0, "female", 47, 15, "yes", 5, 14, 1, 4), (0, "female", 32, 10, "yes", 3, 14, 1, 5), (0, "male", 57, 15, "yes", 4, 17, 5, 5), (0, "male", 27, 4, "yes", 3, 20, 6, 5), (0, "female", 32, 7, "yes", 4, 17, 1, 5), (0, "female", 37, 10, "yes", 4, 16, 1, 5), (0, "female", 32, 10, "yes", 1, 18, 1, 4), (0, "female", 22, 4, "no", 3, 14, 1, 4), (0, "female", 27, 7, "yes", 4, 14, 3, 2), (0, "male", 57, 15, "yes", 5, 18, 5, 2), (0, "male", 32, 7, "yes", 2, 18, 5, 5), (0, "female", 27, 1.5, "no", 4, 
17, 1, 3), (0, "male", 22, 1.5, "no", 4, 14, 5, 5), (0, "female", 22, 1.5, "yes", 4, 14, 5, 4), (0, "female", 32, 7, "yes", 3, 16, 1, 5), (0, "female", 47, 15, "yes", 3, 16, 5, 4), (0, "female", 22, 0.75, "no", 3, 16, 1, 5), (0, "female", 22, 1.5, "yes", 2, 14, 5, 5), (0, "female", 27, 4, "yes", 1, 16, 5, 5), (0, "male", 52, 15, "yes", 4, 16, 5, 5), (0, "male", 32, 10, "yes", 4, 20, 6, 5), (0, "male", 47, 15, "yes", 4, 16, 6, 4), (0, "female", 27, 7, "yes", 2, 14, 1, 2), (0, "female", 22, 1.5, "no", 4, 14, 4, 5), (0, "female", 32, 10, "yes", 2, 16, 5, 4), (0, "female", 22, 0.75, "no", 2, 16, 5, 4), (0, "female", 22, 1.5, "no", 2, 16, 5, 5), (0, "female", 42, 15, "yes", 3, 18, 6, 4), (0, "female", 27, 7, "yes", 5, 14, 4, 5), (0, "male", 42, 15, "yes", 4, 16, 4, 4), (0, "female", 57, 15, "yes", 3, 18, 5, 2), (0, "male", 42, 15, "yes", 3, 18, 6, 2), (0, "female", 32, 7, "yes", 2, 14, 1, 2), (0, "male", 22, 4, "no", 5, 12, 4, 5), (0, "female", 22, 1.5, "no", 1, 16, 6, 5), (0, "female", 22, 0.75, "no", 1, 14, 4, 5), (0, "female", 32, 15, "yes", 4, 12, 1, 5), (0, "male", 22, 1.5, "no", 2, 18, 5, 3), (0, "male", 27, 4, "yes", 5, 17, 2, 5), (0, "female", 27, 4, "yes", 4, 12, 1, 5), (0, "male", 42, 15, "yes", 5, 18, 5, 4), (0, "male", 32, 1.5, "no", 2, 20, 7, 3), (0, "male", 57, 15, "no", 4, 9, 3, 1), (0, "male", 37, 7, "no", 4, 18, 5, 5), (0, "male", 52, 15, "yes", 2, 17, 5, 4), (0, "male", 47, 15, "yes", 4, 17, 6, 5), (0, "female", 27, 7, "no", 2, 17, 5, 4), (0, "female", 27, 7, "yes", 4, 14, 5, 5), (0, "female", 22, 4, "no", 2, 14, 3, 3), (0, "male", 37, 7, "yes", 2, 20, 6, 5), (0, "male", 27, 7, "no", 4, 12, 4, 3), (0, "male", 42, 10, "yes", 4, 18, 6, 4), (0, "female", 22, 1.5, "no", 3, 14, 1, 5), (0, "female", 22, 4, "yes", 2, 14, 1, 3), (0, "female", 57, 15, "no", 4, 20, 6, 5), (0, "male", 37, 15, "yes", 4, 14, 4, 3), (0, "female", 27, 7, "yes", 3, 18, 5, 5), (0, "female", 17.5, 10, "no", 4, 14, 4, 5), (0, "male", 22, 4, "yes", 4, 16, 5, 5), (0, "female", 27, 4, 
"yes", 2, 16, 1, 4), (0, "female", 37, 15, "yes", 2, 14, 5, 1), (0, "female", 22, 1.5, "no", 5, 14, 1, 4), (0, "male", 27, 7, "yes", 2, 20, 5, 4), (0, "male", 27, 4, "yes", 4, 14, 5, 5), (0, "male", 22, 0.125, "no", 1, 16, 3, 5), (0, "female", 27, 7, "yes", 4, 14, 1, 4), (0, "female", 32, 15, "yes", 5, 16, 5, 3), (0, "male", 32, 10, "yes", 4, 18, 5, 4), (0, "female", 32, 15, "yes", 2, 14, 3, 4), (0, "female", 22, 1.5, "no", 3, 17, 5, 5), (0, "male", 27, 4, "yes", 4, 17, 4, 4), (0, "female", 52, 15, "yes", 5, 14, 1, 5), (0, "female", 27, 7, "yes", 2, 12, 1, 2), (0, "female", 27, 7, "yes", 3, 12, 1, 4), (0, "female", 42, 15, "yes", 2, 14, 1, 4), (0, "female", 42, 15, "yes", 4, 14, 5, 4), (0, "male", 27, 7, "yes", 4, 14, 3, 3), (0, "male", 27, 7, "yes", 2, 20, 6, 2), (0, "female", 42, 15, "yes", 3, 12, 3, 3), (0, "male", 27, 4, "yes", 3, 16, 3, 5), (0, "female", 27, 7, "yes", 3, 14, 1, 4), (0, "female", 22, 1.5, "no", 2, 14, 4, 5), (0, "female", 27, 4, "yes", 4, 14, 1, 4), (0, "female", 22, 4, "no", 4, 14, 5, 5), (0, "female", 22, 1.5, "no", 2, 16, 4, 5), (0, "male", 47, 15, "no", 4, 14, 5, 4), (0, "male", 37, 10, "yes", 2, 18, 6, 2), (0, "male", 37, 15, "yes", 3, 17, 5, 4), (0, "female", 27, 4, "yes", 2, 16, 1, 4), (3, "male", 27, 1.5, "no", 3, 18, 4, 4), (3, "female", 27, 4, "yes", 3, 17, 1, 5), (7, "male", 37, 15, "yes", 5, 18, 6, 2), (12, "female", 32, 10, "yes", 3, 17, 5, 2), (1, "male", 22, 0.125, "no", 4, 16, 5, 5), (1, "female", 22, 1.5, "yes", 2, 14, 1, 5), (12, "male", 37, 15, "yes", 4, 14, 5, 2), (7, "female", 22, 1.5, "no", 2, 14, 3, 4), (2, "male", 37, 15, "yes", 2, 18, 6, 4), (3, "female", 32, 15, "yes", 4, 12, 3, 2), (1, "female", 37, 15, "yes", 4, 14, 4, 2), (7, "female", 42, 15, "yes", 3, 17, 1, 4), (12, "female", 42, 15, "yes", 5, 9, 4, 1), (12, "male", 37, 10, "yes", 2, 20, 6, 2), (12, "female", 32, 15, "yes", 3, 14, 1, 2), (3, "male", 27, 4, "no", 1, 18, 6, 5), (7, "male", 37, 10, "yes", 2, 18, 7, 3), (7, "female", 27, 4, "no", 3, 17, 5, 5), (1, 
"male", 42, 15, "yes", 4, 16, 5, 5), (1, "female", 47, 15, "yes", 5, 14, 4, 5), (7, "female", 27, 4, "yes", 3, 18, 5, 4), (1, "female", 27, 7, "yes", 5, 14, 1, 4), (12, "male", 27, 1.5, "yes", 3, 17, 5, 4), (12, "female", 27, 7, "yes", 4, 14, 6, 2), (3, "female", 42, 15, "yes", 4, 16, 5, 4), (7, "female", 27, 10, "yes", 4, 12, 7, 3), (1, "male", 27, 1.5, "no", 2, 18, 5, 2), (1, "male", 32, 4, "no", 4, 20, 6, 4), (1, "female", 27, 7, "yes", 3, 14, 1, 3), (3, "female", 32, 10, "yes", 4, 14, 1, 4), (3, "male", 27, 4, "yes", 2, 18, 7, 2), (1, "female", 17.5, 0.75, "no", 5, 14, 4, 5), (1, "female", 32, 10, "yes", 4, 18, 1, 5), (7, "female", 32, 7, "yes", 2, 17, 6, 4), (7, "male", 37, 15, "yes", 2, 20, 6, 4), (7, "female", 37, 10, "no", 1, 20, 5, 3), (12, "female", 32, 10, "yes", 2, 16, 5, 5), (7, "male", 52, 15, "yes", 2, 20, 6, 4), (7, "female", 42, 15, "yes", 1, 12, 1, 3), (1, "male", 52, 15, "yes", 2, 20, 6, 3), (2, "male", 37, 15, "yes", 3, 18, 6, 5), (12, "female", 22, 4, "no", 3, 12, 3, 4), (12, "male", 27, 7, "yes", 1, 18, 6, 2), (1, "male", 27, 4, "yes", 3, 18, 5, 5), (12, "male", 47, 15, "yes", 4, 17, 6, 5), (12, "female", 42, 15, "yes", 4, 12, 1, 1), (7, "male", 27, 4, "no", 3, 14, 3, 4), (7, "female", 32, 7, "yes", 4, 18, 4, 5), (1, "male", 32, 0.417, "yes", 3, 12, 3, 4), (3, "male", 47, 15, "yes", 5, 16, 5, 4), (12, "male", 37, 15, "yes", 2, 20, 5, 4), (7, "male", 22, 4, "yes", 2, 17, 6, 4), (1, "male", 27, 4, "no", 2, 14, 4, 5), (7, "female", 52, 15, "yes", 5, 16, 1, 3), (1, "male", 27, 4, "no", 3, 14, 3, 3), (1, "female", 27, 10, "yes", 4, 16, 1, 4), (1, "male", 32, 7, "yes", 3, 14, 7, 4), (7, "male", 32, 7, "yes", 2, 18, 4, 1), (3, "male", 22, 1.5, "no", 1, 14, 3, 2), (7, "male", 22, 4, "yes", 3, 18, 6, 4), (7, "male", 42, 15, "yes", 4, 20, 6, 4), (2, "female", 57, 15, "yes", 1, 18, 5, 4), (7, "female", 32, 4, "yes", 3, 18, 5, 2), (1, "male", 27, 4, "yes", 1, 16, 4, 4), (7, "male", 32, 7, "yes", 4, 16, 1, 4), (2, "male", 57, 15, "yes", 1, 17, 4, 4), (7, 
"female", 42, 15, "yes", 4, 14, 5, 2), (7, "male", 37, 10, "yes", 1, 18, 5, 3), (3, "male", 42, 15, "yes", 3, 17, 6, 1), (1, "female", 52, 15, "yes", 3, 14, 4, 4), (2, "female", 27, 7, "yes", 3, 17, 5, 3), (12, "male", 32, 7, "yes", 2, 12, 4, 2), (1, "male", 22, 4, "no", 4, 14, 2, 5), (3, "male", 27, 7, "yes", 3, 18, 6, 4), (12, "female", 37, 15, "yes", 1, 18, 5, 5), (7, "female", 32, 15, "yes", 3, 17, 1, 3), (7, "female", 27, 7, "no", 2, 17, 5, 5), (1, "female", 32, 7, "yes", 3, 17, 5, 3), (1, "male", 32, 1.5, "yes", 2, 14, 2, 4), (12, "female", 42, 15, "yes", 4, 14, 1, 2), (7, "male", 32, 10, "yes", 3, 14, 5, 4), (7, "male", 37, 4, "yes", 1, 20, 6, 3), (1, "female", 27, 4, "yes", 2, 16, 5, 3), (12, "female", 42, 15, "yes", 3, 14, 4, 3), (1, "male", 27, 10, "yes", 5, 20, 6, 5), (12, "male", 37, 10, "yes", 2, 20, 6, 2), (12, "female", 27, 7, "yes", 1, 14, 3, 3), (3, "female", 27, 7, "yes", 4, 12, 1, 2), (3, "male", 32, 10, "yes", 2, 14, 4, 4), (12, "female", 17.5, 0.75, "yes", 2, 12, 1, 3), (12, "female", 32, 15, "yes", 3, 18, 5, 4), (2, "female", 22, 7, "no", 4, 14, 4, 3), (1, "male", 32, 7, "yes", 4, 20, 6, 5), (7, "male", 27, 4, "yes", 2, 18, 6, 2), (1, "female", 22, 1.5, "yes", 5, 14, 5, 3), (12, "female", 32, 15, "no", 3, 17, 5, 1), (12, "female", 42, 15, "yes", 2, 12, 1, 2), (7, "male", 42, 15, "yes", 3, 20, 5, 4), (12, "male", 32, 10, "no", 2, 18, 4, 2), (12, "female", 32, 15, "yes", 3, 9, 1, 1), (7, "male", 57, 15, "yes", 5, 20, 4, 5), (12, "male", 47, 15, "yes", 4, 20, 6, 4), (2, "female", 42, 15, "yes", 2, 17, 6, 3), (12, "male", 37, 15, "yes", 3, 17, 6, 3), (12, "male", 37, 15, "yes", 5, 17, 5, 2), (7, "male", 27, 10, "yes", 2, 20, 6, 4), (2, "male", 37, 15, "yes", 2, 16, 5, 4), (12, "female", 32, 15, "yes", 1, 14, 5, 2), (7, "male", 32, 10, "yes", 3, 17, 6, 3), (2, "male", 37, 15, "yes", 4, 18, 5, 1), (7, "female", 27, 1.5, "no", 2, 17, 5, 5), (3, "female", 47, 15, "yes", 2, 17, 5, 2), (12, "male", 37, 15, "yes", 2, 17, 5, 4), (12, "female", 27, 4, 
"no", 2, 14, 5, 5), (2, "female", 27, 10, "yes", 4, 14, 1, 5), (1, "female", 22, 4, "yes", 3, 16, 1, 3), (12, "male", 52, 7, "no", 4, 16, 5, 5), (2, "female", 27, 4, "yes", 1, 16, 3, 5), (7, "female", 37, 15, "yes", 2, 17, 6, 4), (2, "female", 27, 4, "no", 1, 17, 3, 1), (12, "female", 17.5, 0.75, "yes", 2, 12, 3, 5), (7, "female", 32, 15, "yes", 5, 18, 5, 4), (7, "female", 22, 4, "no", 1, 16, 3, 5), (2, "male", 32, 4, "yes", 4, 18, 6, 4), (1, "female", 22, 1.5, "yes", 3, 18, 5, 2), (3, "female", 42, 15, "yes", 2, 17, 5, 4), (1, "male", 32, 7, "yes", 4, 16, 4, 4), (12, "male", 37, 15, "no", 3, 14, 6, 2), (1, "male", 42, 15, "yes", 3, 16, 6, 3), (1, "male", 27, 4, "yes", 1, 18, 5, 4), (2, "male", 37, 15, "yes", 4, 20, 7, 3), (7, "male", 37, 15, "yes", 3, 20, 6, 4), (3, "male", 22, 1.5, "no", 2, 12, 3, 3), (3, "male", 32, 4, "yes", 3, 20, 6, 2), (2, "male", 32, 15, "yes", 5, 20, 6, 5), (12, "female", 52, 15, "yes", 1, 18, 5, 5), (12, "male", 47, 15, "no", 1, 18, 6, 5), (3, "female", 32, 15, "yes", 4, 16, 4, 4), (7, "female", 32, 15, "yes", 3, 14, 3, 2), (7, "female", 27, 7, "yes", 4, 16, 1, 2), (12, "male", 42, 15, "yes", 3, 18, 6, 2), (7, "female", 42, 15, "yes", 2, 14, 3, 2), (12, "male", 27, 7, "yes", 2, 17, 5, 4), (3, "male", 32, 10, "yes", 4, 14, 4, 3), (7, "male", 47, 15, "yes", 3, 16, 4, 2), (1, "male", 22, 1.5, "yes", 1, 12, 2, 5), (7, "female", 32, 10, "yes", 2, 18, 5, 4), (2, "male", 32, 10, "yes", 2, 17, 6, 5), (2, "male", 22, 7, "yes", 3, 18, 6, 2), (1, "female", 32, 15, "yes", 3, 14, 1, 5))

val colArray1: Array[String] = Array("affairs", "gender", "age", "yearsmarried", "children", "religiousness", "education", "occupation", "rating")

val data = dataList.toDF(colArray1: _*)
Building the logistic regression model
data.createOrReplaceTempView("df")

// Binarize the label (any affair counts as 1) and encode the two string columns as 0/1
val affairs = "case when affairs>0 then 1 else 0 end as affairs,"
val gender = "case when gender='female' then 0 else 1 end as gender,"
val children = "case when children='yes' then 1 else 0 end as children,"

val sqlDF = spark.sql("select " +
  affairs +
  gender +
  "age,yearsmarried," +
  children +
  "religiousness,education,occupation,rating" +
  " from df ")
sqlDF.show()

val colArray2 = Array("gender", "age", "yearsmarried", "children", "religiousness", "education", "occupation", "rating")

val vecDF: DataFrame = new VectorAssembler().setInputCols(colArray2).setOutputCol("features").transform(sqlDF)

val Array(trainingDF, testDF) = vecDF.randomSplit(Array(0.9, 0.1), seed = 12345)

val lrModel = new LogisticRegression().setLabelCol("affairs").setFeaturesCol("features").fit(trainingDF)

// Print the coefficients and intercept of the logistic regression
println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")

// The ElasticNet mixing parameter, in the range [0, 1].
// For alpha = 0 the penalty is an L2 penalty. For alpha = 1 it is an L1 penalty.
// For alpha strictly between 0 and 1 the penalty is a combination of L1 and L2.
// The default is 0.0, which is an L2 penalty.
lrModel.getElasticNetParam
lrModel.getRegParam // regularization parameter, >= 0
lrModel.getStandardization // whether to standardize the features before fitting the model

// The threshold for binary classification, in the range [0, 1]. If the estimated
// probability of class label 1 is greater than the threshold, predict 1, otherwise 0.
// A high threshold encourages the model to predict 0 more often; a low threshold
// encourages it to predict 1 more often. The default is 0.5.
lrModel.getThreshold

// The convergence tolerance of the iterations. Smaller values give higher accuracy
// at the cost of more iterations. The default is 1E-6.
lrModel.getTol
lrModel.getMaxIter

lrModel.transform(testDF).select("features","rawPrediction","probability","prediction").show(30,false)

// Extract the summary from the returned LogisticRegressionModel instance trained in the earlier
// example
val trainingSummary = lrModel.summary

// Obtain the objective per iteration.
val objectiveHistory = trainingSummary.objectiveHistory
objectiveHistory.foreach(loss => println(loss))
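To make the rawPrediction, probability and prediction columns less opaque, here is a minimal sketch in plain Scala (no Spark required) that recomputes them by hand for one row. It assumes the coefficients and intercept that the println call reported in the run shown in the results section, and the feature order of colArray2. The names LogisticSketch, margin, sigmoid and predict are made up for this illustration and are not part of Spark's API; the thresholding rule follows the comment on getThreshold above rather than Spark's internal code.

```scala
// Hypothetical helper object: recomputes a binary logistic regression
// prediction by hand from fitted coefficients and an intercept.
object LogisticSketch {
  // Values copied from the "Coefficients: ... Intercept: ..." line of the run.
  // Order follows colArray2: gender, age, yearsmarried, children,
  // religiousness, education, occupation, rating.
  val coefficients: Array[Double] = Array(
    0.308688148697453, -0.04150802586369178, 0.08771801000466706,
    0.6896853841812993, -0.3425440049065515, 0.008629892776596084,
    0.0458687806620022, -0.46268114569065383)
  val intercept: Double = 1.263200227888706

  /** Linear margin: intercept + dot(coefficients, features). */
  def margin(features: Array[Double]): Double =
    intercept + coefficients.zip(features).map { case (w, x) => w * x }.sum

  /** Logistic (sigmoid) function, mapping a margin to a probability. */
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  /** Predict 1.0 when P(label = 1) exceeds the threshold, otherwise 0.0. */
  def predict(features: Array[Double], threshold: Double = 0.5): Double =
    if (sigmoid(margin(features)) > threshold) 1.0 else 0.0

  def main(args: Array[String]): Unit = {
    // First row of the test-set output: female, 22 years old, married
    // 0.125 years, no children, religiousness 4, education 14,
    // occupation 4, rating 5.
    val row = Array(0.0, 22.0, 0.125, 0.0, 4.0, 14.0, 4.0, 5.0)
    val m = margin(row)
    println(s"margin       = $m")
    println(s"P(label = 1) = ${sigmoid(m)}")
    println(s"P(label = 0) = ${sigmoid(-m)}")
    println(s"prediction   = ${predict(row)}")
  }
}
```

For this row the margin comes out at about -3.0183, so the probability of class 0 is about 0.9534 and the prediction is 0.0. That agrees with the first row of the lrModel.transform(testDF) output, where rawPrediction lists the class-0 value first, that is, the negated margin.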
Execution results
sqlDF.show()
+-------+------+----+------------+--------+-------------+---------+----------+------+
|affairs|gender| age|yearsmarried|children|religiousness|education|occupation|rating|
+-------+------+----+------------+--------+-------------+---------+----------+------+
|      0|     1|37.0|        10.0|       0|          3.0|     18.0|       7.0|   4.0|
|      0|     0|27.0|         4.0|       0|          4.0|     14.0|       6.0|   4.0|
|      0|     0|32.0|        15.0|       1|          1.0|     12.0|       1.0|   4.0|
|      0|     1|57.0|        15.0|       1|          5.0|     18.0|       6.0|   5.0|
|      0|     1|22.0|        0.75|       0|          2.0|     17.0|       6.0|   3.0|
|      0|     0|32.0|         1.5|       0|          2.0|     17.0|       5.0|   5.0|
|      0|     0|22.0|        0.75|       0|          2.0|     12.0|       1.0|   3.0|
|      0|     1|57.0|        15.0|       1|          2.0|     14.0|       4.0|   4.0|
|      0|     0|32.0|        15.0|       1|          4.0|     16.0|       1.0|   2.0|
|      0|     1|22.0|         1.5|       0|          4.0|     14.0|       4.0|   5.0|
|      0|     1|37.0|        15.0|       1|          2.0|     20.0|       7.0|   2.0|
|      0|     1|27.0|         4.0|       1|          4.0|     18.0|       6.0|   4.0|
|      0|     1|47.0|        15.0|       1|          5.0|     17.0|       6.0|   4.0|
|      0|     0|22.0|         1.5|       0|          2.0|     17.0|       5.0|   4.0|
|      0|     0|27.0|         4.0|       0|          4.0|     14.0|       5.0|   4.0|
|      0|     0|37.0|        15.0|       1|          1.0|     17.0|       5.0|   5.0|
|      0|     0|37.0|        15.0|       1|          2.0|     18.0|       4.0|   3.0|
|      0|     0|22.0|        0.75|       0|          3.0|     16.0|       5.0|   4.0|
|      0|     0|22.0|         1.5|       0|          2.0|     16.0|       5.0|   5.0|
|      0|     0|27.0|        10.0|       1|          2.0|     14.0|       1.0|   5.0|
+-------+------+----+------------+--------+-------------+---------+----------+------+
only showing top 20 rows

val colArray2 = Array("gender", "age", "yearsmarried", "children", "religiousness", "education", "occupation", "rating")
colArray2: Array[String] = Array(gender, age, yearsmarried, children, religiousness, education, occupation, rating)

val vecDF: DataFrame = new VectorAssembler().setInputCols(colArray2).setOutputCol("features").transform(sqlDF)
vecDF: org.apache.spark.sql.DataFrame = [affairs: int, gender: int ... 8 more fields]

val Array(trainingDF, testDF) = vecDF.randomSplit(Array(0.9, 0.1), seed = 12345)
trainingDF: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [affairs: int, gender: int ... 8 more fields]
testDF: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [affairs: int, gender: int ... 8 more fields]

val lrModel = new LogisticRegression().setLabelCol("affairs").setFeaturesCol("features").fit(trainingDF)
lrModel: org.apache.spark.ml.classification.LogisticRegressionModel = logreg_9d8a91cb1a0b

// Print the coefficients and intercept of the logistic regression
println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")
Coefficients: [0.308688148697453,-0.04150802586369178,0.08771801000466706,0.6896853841812993,-0.3425440049065515,0.008629892776596084,0.0458687806620022,-0.46268114569065383] Intercept: 1.263200227888706

// The ElasticNet mixing parameter, in the range [0, 1].
// For alpha = 0 the penalty is an L2 penalty. For alpha = 1 it is an L1 penalty.
// For alpha strictly between 0 and 1 the penalty is a combination of L1 and L2.
// The default is 0.0, which is an L2 penalty.
lrModel.getElasticNetParam
res5: Double = 0.0

lrModel.getRegParam // regularization parameter, >= 0
res6: Double = 0.0

lrModel.getStandardization // whether to standardize the features before fitting the model
res7: Boolean = true

// The threshold for binary classification, in the range [0, 1]. If the estimated
// probability of class label 1 is greater than the threshold, predict 1, otherwise 0.
// A high threshold encourages the model to predict 0 more often; a low threshold
// encourages it to predict 1 more often. The default is 0.5.
lrModel.getThreshold
res8: Double = 0.5

// The convergence tolerance of the iterations. Smaller values give higher accuracy
// at the cost of more iterations. The default is 1E-6.
lrModel.getTol
res9: Double = 1.0E-6

lrModel.transform(testDF).show
+-------+------+----+------------+--------+-------------+---------+----------+------+--------------------+--------------------+--------------------+----------+
|affairs|gender| age|yearsmarried|children|religiousness|education|occupation|rating|            features|       rawPrediction|         probability|prediction|
+-------+------+----+------------+--------+-------------+---------+----------+------+--------------------+--------------------+--------------------+----------+
|      0|     0|22.0|       0.125|       0|          4.0|     14.0|       4.0|   5.0|[0.0,22.0,0.125,0...|[3.01829971642105...|[0.95339403355398...|       0.0|
|      0|     0|22.0|       0.417|       1|          3.0|     14.0|       3.0|   5.0|[0.0,22.0,0.417,1...|[2.00632544907384...|[0.88145961149358...|       0.0|
|      0|     0|27.0|         1.5|       0|          2.0|     16.0|       6.0|   5.0|[0.0,27.0,1.5,0.0...|[2.31114222529279...|[0.90979563879849...|       0.0|
|      0|     0|27.0|         4.0|       1|          3.0|     18.0|       4.0|   5.0|[0.0,27.0,4.0,1.0...|[1.81918359677719...|[0.86046813628746...|       0.0|
|      0|     0|27.0|         7.0|       1|          2.0|     18.0|       1.0|   5.0|[0.0,27.0,7.0,1.0...|[1.35109190384264...|[0.79430808378365...|       0.0|
|      0|     0|27.0|         7.0|       1|          3.0|     16.0|       1.0|   4.0|[0.0,27.0,7.0,1.0...|[1.24821454861173...|[0.77699063797650...|       0.0|
|      0|     0|27.0|        10.0|       1|          2.0|     12.0|       1.0|   4.0|[0.0,27.0,10.0,1....|[0.67703608479756...|[0.66307686153089...|       0.0|
|      0|     0|32.0|        10.0|       1|          4.0|     17.0|       5.0|   4.0|[0.0,32.0,10.0,1....|[1.34303963739813...|[0.79298936429536...|       0.0|
|      0|     0|32.0|        10.0|       1|          5.0|     14.0|       4.0|   5.0|[0.0,32.0,10.0,1....|[2.22002324698713...|[0.90203325004083...|       0.0|
|      0|     0|32.0|        15.0|       1|          3.0|     18.0|       5.0|   4.0|[0.0,32.0,15.0,1....|[0.55327568969165...|[0.63489524159656...|       0.0|
|      0|     0|37.0|        15.0|       1|          4.0|     17.0|       1.0|   5.0|[0.0,37.0,15.0,1....|[1.75814598503192...|[0.85297730582863...|       0.0|
|      0|     0|52.0|        15.0|       1|          5.0|      9.0|       5.0|   5.0|[0.0,52.0,15.0,1....|[2.60887439745861...|[0.93143054154558...|       0.0|
|      0|     0|52.0|        15.0|       1|          5.0|     12.0|       1.0|   3.0|[0.0,52.0,15.0,1....|[1.84109755039552...|[0.86307846107252...|       0.0|
|      0|     0|57.0|        15.0|       1|          4.0|     16.0|       6.0|   4.0|[0.0,57.0,15.0,1....|[1.90491134608169...|[0.87044638395268...|       0.0|
|      0|     1|22.0|         4.0|       0|          1.0|     18.0|       5.0|   5.0|[1.0,22.0,4.0,0.0...|[1.26168391246747...|[0.77931584772929...|       0.0|
|      0|     1|22.0|         4.0|       0|          2.0|     18.0|       5.0|   5.0|[1.0,22.0,4.0,0.0...|[1.60422791737402...|[0.83260846569570...|       0.0|
|      0|     1|27.0|         4.0|       1|          3.0|     16.0|       5.0|   5.0|[1.0,27.0,4.0,1.0...|[1.48188645297092...|[0.81485734920851...|       0.0|
|      0|     1|27.0|         4.0|       1|          4.0|     14.0|       5.0|   4.0|[1.0,27.0,4.0,1.0...|[1.37900909774001...|[0.79883180985416...|       0.0|
|      0|     1|32.0|       0.125|       1|          2.0|     18.0|       5.0|   2.0|[1.0,32.0,0.125,1...|[0.28148664352576...|[0.56991065665974...|       0.0|
|      0|     1|32.0|        10.0|       1|          2.0|     20.0|       6.0|   3.0|[1.0,32.0,10.0,1....|[-0.1851761257948...|[0.45383780246566...|       1.0|
+-------+------+----+------------+--------+-------------+---------+----------+------+--------------------+--------------------+--------------------+----------+
only showing top 20 rows

// Extract the summary from the returned LogisticRegressionModel instance trained 
in the earlier // example val trainingSummary = lrModel.summary trainingSummary: org.apache.spark.ml.classification.LogisticRegressionTrainingSummary = org.apache.spark.ml.classification.BinaryLogisticRegressionTrainingSummary@4cde233d// Obtain the objective per iteration. val objectiveHistory = trainingSummary.objectiveHistory objectiveHistory: Array[Double] = Array(0.5613118243072733, 0.5564125149222438, 0.5365395467216898, 0.5160918427628939, 0.51304621799159, 0.5105231964507352, 0.5079869547558363, 0.50728888730 31864, 0.5067113660796532, 0.506520677080951, 0.5059147658563949, 0.5053652033316485, 0.5047266888422277, 0.5045473900598205, 0.5041496504941453, 0.5034630545828777, 0.5025745763542784, 0.5019910559468922, 0.5012033102192196, 0.5009489760675826, 0.5008431925740259, 0.5008297629370251, 0.5008258245513862, 0.5008137617093257, 0.5008136785235711, 0.5008130045533166, 0.5008129888367148, 0.5008129675120628, 0.5008129469652479, 0.5008129168191972, 0.5008129132692991, 0.5008129124596163, 0.5008129124081014, 0.500812912251931, 0.5008129121356268) objectiveHistory.foreach(loss => println(loss)) 0.5613118243072733 0.5564125149222438 0.5365395467216898 0.5160918427628939 0.51304621799159 0.5105231964507352 0.5079869547558363 0.5072888873031864 0.5067113660796532 0.506520677080951 0.5059147658563949 0.5053652033316485 0.5047266888422277 0.5045473900598205 0.5041496504941453 0.5034630545828777 0.5025745763542784 0.5019910559468922 0.5012033102192196 0.5009489760675826 0.5008431925740259 0.5008297629370251 0.5008258245513862 0.5008137617093257 0.5008136785235711 0.5008130045533166 0.5008129888367148 0.5008129675120628 0.5008129469652479 0.5008129168191972 0.5008129132692991 0.5008129124596163 0.5008129124081014 0.500812912251931 0.5008129121356268lrModel.transform(testDF).select("features","rawPrediction","probability","prediction").show(30,false) 
+-------------------------------------+--------------------------------------------+----------------------------------------+----------+ |features |rawPrediction |probability |prediction| +-------------------------------------+--------------------------------------------+----------------------------------------+----------+ |[0.0,22.0,0.125,0.0,4.0,14.0,4.0,5.0]|[3.0182997164210517,-3.0182997164210517] |[0.9533940335539883,0.04660596644601167]|0.0 | |[0.0,22.0,0.417,1.0,3.0,14.0,3.0,5.0]|[2.00632544907384,-2.00632544907384] |[0.8814596114935873,0.11854038850641263]|0.0 | |[0.0,27.0,1.5,0.0,2.0,16.0,6.0,5.0] |[2.311142225292793,-2.311142225292793] |[0.9097956387984996,0.09020436120150035]|0.0 | |[0.0,27.0,4.0,1.0,3.0,18.0,4.0,5.0] |[1.81918359677719,-1.81918359677719] |[0.8604681362874618,0.13953186371253828]|0.0 | |[0.0,27.0,7.0,1.0,2.0,18.0,1.0,5.0] |[1.351091903842644,-1.351091903842644] |[0.7943080837836515,0.20569191621634847]|0.0 | |[0.0,27.0,7.0,1.0,3.0,16.0,1.0,4.0] |[1.2482145486117338,-1.2482145486117338] |[0.7769906379765039,0.2230093620234961] |0.0 | |[0.0,27.0,10.0,1.0,2.0,12.0,1.0,4.0] |[0.6770360847975654,-0.6770360847975654] |[0.6630768615308953,0.33692313846910465]|0.0 | |[0.0,32.0,10.0,1.0,4.0,17.0,5.0,4.0] |[1.343039637398138,-1.343039637398138] |[0.7929893642953615,0.20701063570463848]|0.0 | |[0.0,32.0,10.0,1.0,5.0,14.0,4.0,5.0] |[2.220023246987134,-2.220023246987134] |[0.9020332500408325,0.09796674995916752]|0.0 | |[0.0,32.0,15.0,1.0,3.0,18.0,5.0,4.0] |[0.5532756896916551,-0.5532756896916551] |[0.6348952415965647,0.3651047584034352] |0.0 | |[0.0,37.0,15.0,1.0,4.0,17.0,1.0,5.0] |[1.7581459850319243,-1.7581459850319243] |[0.8529773058286395,0.14702269417136052]|0.0 | |[0.0,52.0,15.0,1.0,5.0,9.0,5.0,5.0] |[2.6088743974586124,-2.6088743974586124] |[0.9314305415455806,0.06856945845441945]|0.0 | |[0.0,52.0,15.0,1.0,5.0,12.0,1.0,3.0] |[1.8410975503955256,-1.8410975503955256] |[0.8630784610725231,0.13692153892747697]|0.0 | 
|[0.0,57.0,15.0,1.0,4.0,16.0,6.0,4.0] |[1.904911346081691,-1.904911346081691] |[0.8704463839526814,0.1295536160473186] |0.0 | |[1.0,22.0,4.0,0.0,1.0,18.0,5.0,5.0] |[1.2616839124674724,-1.2616839124674724] |[0.7793158477292919,0.22068415227070803]|0.0 | |[1.0,22.0,4.0,0.0,2.0,18.0,5.0,5.0] |[1.6042279173740237,-1.6042279173740237] |[0.832608465695705,0.16739153430429493] |0.0 | |[1.0,27.0,4.0,1.0,3.0,16.0,5.0,5.0] |[1.4818864529709268,-1.4818864529709268] |[0.8148573492085158,0.1851426507914842] |0.0 | |[1.0,27.0,4.0,1.0,4.0,14.0,5.0,4.0] |[1.379009097740017,-1.379009097740017] |[0.7988318098541624,0.2011681901458377] |0.0 | |[1.0,32.0,0.125,1.0,2.0,18.0,5.0,2.0]|[0.28148664352576547,-0.28148664352576547] |[0.569910656659749,0.430089343340251] |0.0 | |[1.0,32.0,10.0,1.0,2.0,20.0,6.0,3.0] |[-0.1851761257948623,0.1851761257948623] |[0.45383780246566996,0.5461621975343299]|1.0 | |[1.0,32.0,10.0,1.0,4.0,20.0,6.0,4.0] |[0.9625930297088949,-0.9625930297088949] |[0.7236406723848533,0.2763593276151468] |0.0 | |[1.0,32.0,15.0,1.0,1.0,16.0,5.0,5.0] |[0.039440462424945366,-0.039440462424945366]|[0.5098588376463971,0.4901411623536029] |0.0 | |[1.0,37.0,4.0,1.0,1.0,18.0,5.0,4.0] |[0.7319377705508958,-0.7319377705508958] |[0.6752303588678488,0.3247696411321513] |0.0 | |[1.0,37.0,15.0,1.0,5.0,20.0,5.0,4.0] |[1.119955894572572,-1.119955894572572] |[0.7539805352533917,0.24601946474660835]|0.0 | |[1.0,42.0,15.0,1.0,4.0,17.0,6.0,5.0] |[1.4276540623429193,-1.4276540623429193] |[0.8065355283195409,0.19346447168045908]|0.0 | |[1.0,42.0,15.0,1.0,4.0,20.0,4.0,5.0] |[1.4935019453371354,-1.4935019453371354] |[0.8166033137058254,0.1833966862941747] |0.0 | |[1.0,42.0,15.0,1.0,4.0,20.0,6.0,3.0] |[0.4764020926318233,-0.4764020926318233] |[0.6168979221749373,0.38310207782506256]|0.0 | |[1.0,57.0,15.0,1.0,2.0,14.0,4.0,4.0] |[1.0201325344483316,-1.0201325344483316] |[0.734998414766428,0.265001585233572] |0.0 | |[1.0,57.0,15.0,1.0,2.0,14.0,7.0,2.0] |[-0.04283609891898266,0.04283609891898266] 
|[0.48929261249695394,0.5107073875030461]|1.0 | |[1.0,57.0,15.0,1.0,5.0,20.0,5.0,3.0] |[1.4874352661557535,-1.4874352661557535] |[0.8156930079647114,0.18430699203528864]|0.0 | +-------------------------------------+--------------------------------------------+----------------------------------------+----------+ only showing top 30 rows
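
The three output columns above can be tied back to the printed coefficients with a short worked calculation. The sketch below is not part of the original post. It uses the first test row, x = (0, 22, 0.125, 0, 4, 14, 4, 5), together with the coefficients w and intercept b printed earlier. It assumes that rawPrediction is laid out as [-z, z] and probability as [P(y=0), P(y=1)]. That layout is inferred from the symmetric values in the table and is not stated in the source. The symbols z, w, b and σ are introduced here only for illustration.

```latex
\begin{aligned}
z &= w^{\top}x + b \approx -4.2815 + 1.2632 = -3.0183 \\
\text{rawPrediction} &= [-z,\; z] \approx [3.0183,\; -3.0183] \\
P(y=1 \mid x) &= \sigma(z) = \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{3.0183}} \approx 0.0466 \\
P(y=0 \mid x) &= 1 - \sigma(z) \approx 0.9534 \\
\hat{y} &= \begin{cases} 1 & \text{if } P(y=1 \mid x) > 0.5 \\ 0 & \text{otherwise} \end{cases} \;=\; 0
\end{aligned}
```

These values agree with the first row of the table: rawPrediction [3.0182997164210517,-3.0182997164210517], probability [0.9533940335539883,0.04660596644601167], prediction 0.0. The same rule explains the two rows predicted as 1.0: their second probability entries (0.5461621975343299 and 0.5107073875030461) exceed the default threshold of 0.5.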
Reposted from: https://www.cnblogs.com/wwxbi/p/6224670.html