In this part, you will learn how to create a Neural Network (NN) and use it to decide whether a student has alcohol consumption problems.

Do students drink too much? How can you predict that? What predicts it best? And how much is too much, exactly?

Those questions might be difficult to answer, yet we can start somewhere. We can use a very limited dataset to get a sense of what the answers might look like. Something like this one.

The dataset contains 1044 instances and 33 variables (most of them binary or categorical). It actually combines two datasets: the first provides data for students enrolled in a Portuguese class, the second describes students enrolled in a math course. There is overlap (yep, I know) between the datasets, that is, some students attend both classes.

Let’s build an NN model for classifying whether a student has an alcohol consumption problem. For that, we will use our trusty old friend - TensorFlow.

Before getting there, we have a bit of dirty work to do. Our dataset is not clean enough to simply feed to our NN model. A bit of wrangling is required. But first, let’s do some setting up:

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from math import floor, ceil
from pylab import rcParams

%matplotlib inline

Some styling and making our experiments reproducible:

sns.set(style='ticks', palette='Spectral', font_scale=1.5)

material_palette = ["#4CAF50", "#2196F3", "#9E9E9E", "#FF9800", "#607D8B", "#9C27B0"]
sns.set_palette(material_palette)
rcParams['figure.figsize'] = 16, 8
plt.xkcd();
random_state = 42
np.random.seed(random_state)
tf.set_random_seed(random_state)

1. Preparing the data

Remember, our data is stored in two separate files. Let’s load them, assign proper course attendance to each student and merge them into one:

math_df = pd.read_csv("data/student/student-mat.csv", sep=";")
port_df = pd.read_csv("data/student/student-por.csv", sep=";")

math_df["course"] = "math"
port_df["course"] = "portuguese"

merged_df = math_df.append(port_df)
merged_df.shape

(1044, 34)

Exactly as promised - 1044 rows. But we have duplicates: the dataset archive contains instructions on how to find them, and it turns out that 382 students appear in both datasets. We will update the course column for those students, too:

merge_vector = ["school", "sex", "age", "address", "famsize", "Pstatus", "Medu",
                "Fedu", "Mjob", "Fjob", "reason", "nursery", "internet"]

duplicated_mask = merged_df.duplicated(keep=False, subset=merge_vector)
duplicated_df = merged_df[duplicated_mask]
unique_df = merged_df[~duplicated_mask]
both_courses_mask = duplicated_df.duplicated(subset=merge_vector)
both_courses_df = duplicated_df[~both_courses_mask].copy()
both_courses_df["course"] = "both"
students_df = unique_df.append(both_courses_df)
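
A quick sanity check: 382 students appear in both datasets, so after the merge we expect 1044 - 382 = 662 unique students (the expected shape below follows from that arithmetic):

print(students_df.shape)                  # expected: (662, 34)
print(students_df.course.value_counts())  # math / portuguese / both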

We will use the following formula to quantify the amount of alcohol taken during the week per student:

$$Alcohol = \frac{Walc \times 2 + Dalc \times 5}{7}$$

The new value lies in the interval [1, 5]. For example, a student with Walc = 4 and Dalc = 2 gets (4 × 2 + 2 × 5) / 7 = 18/7 ≈ 2.57, which we round up to 3. Furthermore, we will classify a student as a drinker if that value is greater than 2.

students_df = students_df.sample(frac=1)
students_df['alcohol'] = (students_df.Walc * 2 + students_df.Dalc * 5) / 7
students_df['alcohol'] = students_df.alcohol.map(lambda x: ceil(x))
students_df['drinker'] = students_df.alcohol.map(lambda x: "yes" if x > 2 else "no")

2. Exploration

Finally, we can get a feel for our data. Let’s take a look at the course distribution:

students_df.course.value_counts().plot(kind="bar", rot=0);

And the alcohol consumption from the formula:

students_df.alcohol.value_counts().plot(kind="bar", rot=0);

The actual variable that we are going to predict:

students_df.drinker.value_counts().plot(kind="bar", rot=0);

A somewhat more comprehensive overview:

sns.pairplot(students_df[['age', 'absences', 'G3', 'goout', 'freetime', 'studytime', 'drinker']], hue='drinker');

Let’s have a look at a general correlation matrix:

corr_mat = students_df.corr()
fig, ax = plt.subplots(figsize=(20, 12))
sns.heatmap(corr_mat, vmax=1.0, square=True, ax=ax);

3. Building our model

It is time for the fun part. Well, not just yet.

3.1 Encoding the data

Most of our variables are categorical, and we must one-hot encode them for our NN to work properly. First, let’s define a little helper function:

def encode(series):
    return pd.get_dummies(series.astype(str))
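
For example (an illustrative check, not required for the rest of the code), encoding a column produces one 0/1 indicator column per distinct value - two for sex, five for Medu (which takes values 0-4 per the dataset description):

print(encode(students_df.sex).columns.tolist())   # ['F', 'M']
print(encode(students_df.Medu).columns.tolist())  # ['0', '1', '2', '3', '4']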

Let’s build our features and target variable using our little helper function:

train_x = pd.get_dummies(students_df.school)
train_x['age'] = students_df.age
train_x['absences'] = students_df.absences
train_x['g1'] = students_df.G1
train_x['g2'] = students_df.G2
train_x['g3'] = students_df.G3
train_x = pd.concat([train_x, encode(students_df.sex), encode(students_df.Pstatus),
                     encode(students_df.Medu), encode(students_df.Fedu),
                     encode(students_df.guardian), encode(students_df.studytime),
                     encode(students_df.failures), encode(students_df.activities),
                     encode(students_df.higher), encode(students_df.romantic),
                     encode(students_df.reason), encode(students_df.paid),
                     encode(students_df.goout), encode(students_df.health),
                     encode(students_df.famsize), encode(students_df.course)], axis=1)

train_y = encode(students_df.drinker)
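
A quick shape check (the exact number of feature columns depends on which category values survive the encoding, so treat the width as indicative):

print(train_x.shape)  # (662, <number of encoded feature columns>)
print(train_y.shape)  # (662, 2)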

3.2 Splitting the data

Let’s allocate 90% of the data for training and use 10% for testing:

train_size = 0.9

train_cnt = floor(train_x.shape[0] * train_size)
x_train = train_x.iloc[0:train_cnt].values
y_train = train_y.iloc[0:train_cnt].values
x_test = train_x.iloc[train_cnt:].values
y_test = train_y.iloc[train_cnt:].values
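
Note that this works because we shuffled the rows earlier with sample(frac=1). If you prefer a split that preserves the drinker/non-drinker ratio in both sets, scikit-learn offers a stratified alternative - a sketch, not used in the rest of this post:

from sklearn.model_selection import train_test_split

# Stratify on the class index so both splits keep the same class balance.
x_train, x_test, y_train, y_test = train_test_split(
    train_x.values, train_y.values, test_size=0.1,
    stratify=train_y.values.argmax(axis=1), random_state=random_state)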

3.3 Building our Neural Network

Our NN consists of an input layer, an output layer and 1 hidden layer. We are using ReLU as the activation function of the hidden layer and softmax for our output layer. As an additional bonus, we will use Dropout - a simple way to reduce overfitting during the training of our network. Let’s wrap our model in a little helper function:

def multilayer_perceptron(x, weights, biases, keep_prob):
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    layer_1 = tf.nn.dropout(layer_1, keep_prob)
    out_layer = tf.matmul(layer_1, weights['out']) + biases['out']
    return out_layer

Let’s set the number of neurons in the hidden layer to 38 and randomly initialize the weights and biases considering their proper dimensions:

n_hidden_1 = 38
n_input = train_x.shape[1]
n_classes = train_y.shape[1]

weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'out': tf.Variable(tf.random_normal([n_hidden_1, n_classes]))
}

biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

keep_prob = tf.placeholder("float")

We will train our model for 5,000 epochs (training steps) with a batch size of 32. That is, at each step, we will train our NN using 32 rows of our data. Granted, in our case you could just train on the whole dataset. However, when the data is huge and you can’t fit it in memory, you would love to split it and feed it to the model in batches (chunks):

training_epochs = 5000
display_step = 1000
batch_size = 32

x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])
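
To make the batching concrete, here is how np.array_split (used in the training loop below) chops an array into roughly equal chunks - a toy illustration:

toy = np.arange(10)
for batch in np.array_split(toy, 3):
    print(batch)  # [0 1 2 3], then [4 5 6], then [7 8 9]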

3.4 Training

In order for our model to learn, we need to define what is good. Actually, we will define what is bad and try to minimize it. We will call the “badness” error or cost (hence, the cost function). It represents how far off the true result our model is at some point during training. We would love that error to be 0 for all possible inputs. Currently, that happens only in Sci-Fi novels (not that I discourage dreaming about it).

The cost function that we are going to use is called “Cross-Entropy”. It is defined as:

$$H_{y'}(y) = -\sum_i y'_i \log(y_i)$$

Where $y$ is the predicted distribution for our alcohol consumption and $y'$ is the ground truth. This guide might be helpful for better understanding Cross-Entropy. TensorFlow has a little helper function with the sweet little name softmax_cross_entropy_with_logits. It applies softmax to the output layer internally and uses Cross-Entropy as the error function, which is why our model returns raw logits:

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=predictions, labels=y))
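
To make the formula concrete, here is the same computation done by hand on made-up numbers - a toy NumPy sketch, separate from the model:

y_true = np.array([0.0, 1.0])            # ground truth: the student is a drinker
y_pred = np.array([0.3, 0.7])            # a made-up predicted softmax distribution
print(-np.sum(y_true * np.log(y_pred)))  # -log(0.7) ≈ 0.357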

Now, for the actual workhorse - Adam (nope, not the one from the Bible - although that would’ve been fun). Adam is a type of gradient descent optimization algorithm which essentially tries as hard as it can to find proper weights and biases for our network by minimizing the cost function that we specified above. It is well beyond the scope of this post to describe Adam in detail, but you can find all the necessary information over here - with tons of nice pictures!

Using Adam in TensorFlow is quite easy; we just have to specify a learning rate (you can fiddle with that one) and pass the cost function we defined above:

optimizer = tf.train.AdamOptimizer(learning_rate=0.0001).minimize(cost)

Our model is created by just calling our helper function with the proper arguments:

predictions = multilayer_perceptron(x, weights, biases, keep_prob)
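
One caveat: the snippets appear in tutorial order, but cost above references predictions, so in an actual script the graph must be built top-down. A minimal recap of the pieces already shown, in runnable order:

# Build the graph in dependency order: model, then cost, then optimizer.
predictions = multilayer_perceptron(x, weights, biases, keep_prob)
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=predictions, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=0.0001).minimize(cost)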

Our finished NN looks something like this:

[Figure: diagram of the finished network, with much reduced input and hidden layer sizes]

4. Evaluation

Time to see how well our model can predict. During training, we will set the keep probability of the Dropout to 0.8 and reset it to 1.0 at test time:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for epoch in range(training_epochs):
        avg_cost = 0.0
        total_batch = int(len(x_train) / batch_size)
        x_batches = np.array_split(x_train, total_batch)
        y_batches = np.array_split(y_train, total_batch)
        for i in range(total_batch):
            batch_x, batch_y = x_batches[i], y_batches[i]
            _, c = sess.run([optimizer, cost],
                            feed_dict={x: batch_x, y: batch_y, keep_prob: 0.8})
            avg_cost += c / total_batch
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(avg_cost))

    print("Optimization Finished!")

    correct_prediction = tf.equal(tf.argmax(predictions, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Accuracy:", accuracy.eval({x: x_test, y: y_test, keep_prob: 1.0}))

Epoch: 0001 cost= 103.346587711
Epoch: 1001 cost= 2.053295698
Epoch: 2001 cost= 0.464109008
Epoch: 3001 cost= 0.304592287
Epoch: 4001 cost= 0.284183074
Optimization Finished!
Accuracy: 0.731343

5. Conclusion(s)

Yes, you did it! You survived another part of this tutorial. But what did you achieve? Our model got roughly 73% accuracy on the test set. Is this good? Well… no, it is not!

How is that possible? The authors of the paper linked from the dataset attained 92% accuracy, which is (as they state) acceptable. So why does our model perform so badly?

For one thing, we excluded overlapping student data, which made our dataset considerably smaller - from 1044 to just 662 instances (I haven’t found any mention of a duplicate reduction technique used by the authors; please write me a comment if I am wrong about that one). Combined with the high prevalence of non-drinkers, this might have a detrimental effect on our model’s performance. A quick baseline check is sketched below.
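
The majority-class baseline - the accuracy of always predicting “no” - puts the 73% into context. A minimal sketch, assuming the one-hot y_test from the split above:

majority_class = y_test.sum(axis=0).argmax()                     # index of the most common class
baseline_acc = (y_test.argmax(axis=1) == majority_class).mean()  # fraction of test rows matching it
print("Majority-class baseline accuracy:", baseline_acc)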

Of course, you can try different parameters, architectures, training epochs etc. Feel free to do so! Till next time!

References

Student Alcohol Consumption - Description of the used dataset
Using Data Mining to Predict Secondary School Student Alcohol Consumption - A paper using this dataset and comparing 3 different models on it (including NN)
Student Alcohol Consumption Prediction - Possibly the source code used in the paper above
MNIST classification using TensorFlow - Use Deep Neural Network to classify handwritten digits
How to choose the number of hidden layers and neurons in NN?
How to handle ordinal data in NN models - Lots of the variables are ordinal. This paper presents an approach to handling that kind of data in NN models
Simpler way to handle ordinal data in NN models

Original article: http://curiousily.com/data-science/2017/02/07/tensorflow-for-hackers-part-2.html
