CNN-Based Audio Classification of Dog Barks and Cat Meows

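I recently started life as an intern in Beijing, and my first internship is going well. As an aside, life in Beijing is not as scary or as exhausting as I had imagined; people here are actually quite warm.
The company's main business is voiceprint recognition. As far as I can tell, the project team is split into two tracks: traditional machine learning and deep learning. The first thing I worked with was the voice wake-up feature of a smart AI product, so to get up to speed on this area quickly I picked a small practice project.
The task is simple: have a model tell whether a clip is a dog barking or a cat meowing. Extract the audio features with an off-the-shelf Python library, then train a convolutional network on them. The painful part was getting the data. I could not find a suitable dataset, so I wrote a script to scrape one myself (I'm awesome, haha)!!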

import os
import urllib.request

from bs4 import BeautifulSoup


def download(url, save_path):
    # Fetch one audio file and write it to save_path.
    urllib.request.urlretrieve(url, save_path)


dog_links = [
    "http://sc.chinaz.com/tag_yinxiao/GouJiao.html",
    "http://sc.chinaz.com/tag_yinxiao/GouJiao_2.html",
    "http://sc.chinaz.com/tag_yinxiao/GouJiao_3.html",
    "http://sc.chinaz.com/tag_yinxiao/GouJiao_4.html",
]
cat_links = [
    "http://sc.chinaz.com/tag_yinxiao/MaoJiao.html",
    "http://sc.chinaz.com/tag_yinxiao/MaoJiao_2.html",
    "http://sc.chinaz.com/tag_yinxiao/MaoJiao_3.html",
]

count = 10
os.makedirs("./data/cat", exist_ok=True)
# This run scrapes cat sounds; switch to dog_links and the dog paths for dog barks.
for link in cat_links:
    content = urllib.request.urlopen(link).read().decode("utf-8")
    # print(content)
    soup = BeautifulSoup(content, "html.parser")
    # Each sound effect on the listing page sits in a div.music_block.
    for div in soup.find_all("div", class_="music_block"):
        detail_url = div.find_all("a")[1]["href"]
        detail = urllib.request.urlopen(detail_url).read().decode("utf-8")
        # The second div.dian on the detail page holds the download link.
        dian = BeautifulSoup(detail, "html.parser").find_all("div", class_="dian")[1]
        audio = dian.find("a")["href"]
        count += 1
        download(audio, "./data/cat/cat" + str(count) + ".wav")

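That gave me a bit over 100 clips. Since that is not much material, the longer recordings need to be cut into short segments to increase the amount of data (this step matters a lot: the amount of data directly affects the final result).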
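The cutting rule can be sketched on a plain NumPy array. This is only an illustration that follows the same rule as the `create_data` function in the code further down: keep whole segments, and zero-pad a recording that is shorter than one segment. The helper name `split_into_clips` and the 1 kHz toy signal are made up for this example.

```python
import numpy as np


def split_into_clips(samples, clip_len):
    """Cut a 1-D signal into clips of exactly clip_len samples.

    Only whole clips are kept; a signal shorter than one clip is
    zero-padded (silence) up to clip_len.
    """
    n_clips = max(1, len(samples) // clip_len)
    clips = []
    for i in range(n_clips):
        clip = samples[i * clip_len:(i + 1) * clip_len]
        padded = np.zeros(clip_len, dtype=samples.dtype)
        padded[:len(clip)] = clip
        clips.append(padded)
    return np.stack(clips)


# Hypothetical 5-second recording at 1 kHz, cut into 2-second clips.
recording = np.arange(5000, dtype=np.float32)
clips = split_into_clips(recording, clip_len=2000)
print(clips.shape)  # the trailing 1 second does not fill a clip and is dropped

# A recording shorter than one clip still yields one padded clip.
short = split_into_clips(np.ones(500, dtype=np.float32), clip_len=2000)
print(short.shape)
```

Note that with this rule the leftover tail of a long recording is discarded, so a 5-second file contributes two 2-second clips rather than three.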

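Next comes cutting up the dataset, extracting features (MFCC), and finally training a convolutional network. Note that the convolution is not applied to the raw audio but to the extracted MFCC features. Audio seems to offer many kinds of features, more than images do, and I still need to study this further; fortunately the Python libraries already implement them for us. Here is the code: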
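To see what the network actually receives, here is a minimal sketch of the feature shape. It assumes a 16 kHz sample rate, a 25 ms analysis window with a 10 ms step, and a ceil-based frame count; all of these are assumptions for illustration, and the random matrix only stands in for real MFCC output. `AUDIO_LEN` and `MFCC_LEN` are the constants used in the article's code.

```python
import math

import numpy as np

AUDIO_LEN = 400  # maximum number of frames per clip, as in the article
MFCC_LEN = 13    # coefficients per frame, as in the article


def num_frames(n_samples, rate, winlen=0.025, winstep=0.01):
    # Assumed framing rule: one frame, plus one more per step (rounded up).
    frame_len = int(round(winlen * rate))
    frame_step = int(round(winstep * rate))
    if n_samples <= frame_len:
        return 1
    return 1 + math.ceil((n_samples - frame_len) / frame_step)


def pad_features(features):
    # Zero-pad a (frames, MFCC_LEN) matrix to the fixed (AUDIO_LEN, MFCC_LEN) size.
    padded = np.zeros((AUDIO_LEN, MFCC_LEN))
    padded[:len(features), :] = features
    return padded


rate = 16000                                   # assumed sample rate
frames = num_frames(2 * rate, rate)            # a 2-second clip
fake_mfcc = np.random.randn(frames, MFCC_LEN)  # stand-in for real MFCC output
x = pad_features(fake_mfcc)
print(frames, x.shape)
```

Under these assumptions a 2-second clip gives roughly 200 frames, so about half of each 400 x 13 input is zero padding. The network then treats that matrix like a one-channel image.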

import os
import pickle
import random

import numpy as np
import scipy.io.wavfile as wav
from pydub import AudioSegment
from python_speech_features import mfcc

AUDIO_LEN = 400  # maximum number of MFCC frames kept per clip
MFCC_LEN = 13    # MFCC coefficients per frame


def create_data():
    # Cut every recording in ./all/ into 2000 ms clips saved under ./data/.
    item_len = 2000
    base_path = "./all/"
    for file in os.listdir(base_path):
        real_path = base_path + file
        audio = AudioSegment.from_file(real_path)
        audio_len = len(audio)
        steps = audio_len // item_len
        for step in range(max(1, steps)):
            item_audio = audio[step * item_len: (step + 1) * item_len]
            # Pad a short clip with silence so every clip lasts exactly item_len ms.
            save_audio = item_audio + AudioSegment.silent(item_len - len(item_audio))
            save_audio.export("./data/" + str(step) + file, format="wav")


# create_data()
# exit()


def split_train_test():
    # Hold out 10 dog clips and 10 cat clips for testing; the rest is for training.
    base_dir = "./data/"
    dogs = []
    cats = []
    for file in os.listdir(base_dir):
        real_file = base_dir + file
        if "cat" in file:
            cats.append(real_file)
        else:
            dogs.append(real_file)
    test = dogs[:10] + cats[:10]
    train = dogs[10:] + cats[10:]
    random.shuffle(train)
    random.shuffle(test)
    with open("./train", "wb") as f:
        pickle.dump(train, f)
    with open("./test", "wb") as f:
        pickle.dump(test, f)


# split_train_test() must have been run once so that these two files exist.
with open("train", "rb") as f:
    train_data = pickle.load(f)
with open("test", "rb") as f:
    test_data = pickle.load(f)


def get_train_or_test(type, batch_size=30):
    # Return a class-balanced batch: x is (n, AUDIO_LEN, MFCC_LEN), y is one-hot [cat, dog].
    x = []
    y = []
    data = train_data if type == "train" else test_data
    if type == "test":
        batch_size = 10
    all_dogs = [item for item in data if "dog" in item]
    all_cats = [item for item in data if "cat" in item]
    sample_dogs = random.sample(all_dogs, int(batch_size / 2))
    sample_cats = random.sample(all_cats, batch_size - int(batch_size / 2))
    sample = sample_dogs + sample_cats
    random.shuffle(sample)
    for item in sample:
        try:
            fs, audio = wav.read(item)
            processed_audio = mfcc(audio, samplerate=fs)
            x_hold = np.zeros(shape=(AUDIO_LEN, MFCC_LEN))
            x_hold[:len(processed_audio), :] = processed_audio
            label = [1, 0] if "cat" in item else [0, 1]
            x.append(x_hold)
            y.append(label)
            if type == "train":
                # Augmentation: add a copy with each frame's coefficients reversed.
                # (The original reversed a throwaway list, so it only duplicated
                # the sample; it also added two labels per test sample.)
                x.append(x_hold[:, ::-1])
                y.append(label)
        except Exception as e:
            print("error", item, e)
    x = np.array(x) / 100
    y = np.array(y)
    return x, y

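This part is the preprocessing done before training: cutting the audio into clips, splitting it into a training set and a test set, and extracting the features.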

import tensorflow as tf  # written against the TensorFlow 1.x API

import update_audio  # the preprocessing script above, assumed saved as update_audio.py

AUDIO_LEN = update_audio.AUDIO_LEN
MFCC_LEN = update_audio.MFCC_LEN

x = tf.placeholder(shape=(None, AUDIO_LEN, MFCC_LEN), dtype=tf.float32)
x_change = tf.expand_dims(x, -1)  # add a channel axis for conv2d
y = tf.placeholder(shape=(None, 2), dtype=tf.int32)
# Dropout keep probability: 1.0 (no dropout) by default, fed as 0.5 while training.
keep_prob = tf.placeholder_with_default(1.0, shape=())


def create_model():
    print("input shape", x_change.shape)
    filter1 = tf.Variable(tf.random_normal([10, 3, 1, 64]))
    bias1 = tf.Variable(tf.random_normal([64]))
    conv_1 = tf.nn.conv2d(x_change, filter1, strides=[1, 1, 1, 1], padding="SAME") + bias1
    print("conv_1 shape", conv_1.shape)
    relu1 = tf.nn.relu(conv_1)
    dropout1 = tf.nn.dropout(relu1, keep_prob)
    max_pool1 = tf.nn.max_pool(dropout1, [1, 2, 2, 1], [1, 2, 2, 1], 'SAME')
    print("max pool shape", max_pool1.shape)
    flatten = tf.layers.flatten(max_pool1)
    print("flatten shape", flatten.shape)
    net_work = tf.layers.dense(flatten, units=128, activation=tf.nn.relu)
    # Output raw logits: the loss below applies softmax itself, so a softmax
    # activation here would apply it twice.
    logit = tf.layers.dense(net_work, units=2)
    print("logit shape", logit.shape)
    return logit


logit = create_model()


def build_loss():
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logit))
    return loss


loss = build_loss()


def create_opt():
    opt = tf.train.GradientDescentOptimizer(0.0001).minimize(loss)
    return opt


opt = create_opt()

correct_prediction = tf.equal(tf.argmax(logit, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Resume from an earlier run only if a checkpoint is already there.
    if tf.train.checkpoint_exists("./model/model.model"):
        saver.restore(sess, "./model/model.model")
    feed_test_x, feed_test_y = update_audio.get_train_or_test("test")
    feed_test = {x: feed_test_x, y: feed_test_y}
    for index in range(10000):
        feed_x, feed_y = update_audio.get_train_or_test("train")
        feed = {x: feed_x, y: feed_y, keep_prob: 0.5}
        get_loss, _ = sess.run([loss, opt], feed_dict=feed)
        if index % 100 == 0:
            get_acc = sess.run(accuracy, feed_dict=feed_test)
            print(get_loss, get_acc)
            saver.save(sess, "./model/model.model")

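This is the main body of the model. It runs a convolutional network over the features; the features are not especially complex and the dataset is not large, so a single convolutional layer is enough before the final classification output.
In the end I got a loss of 0.31 and an accuracy of nearly 0.8 (3 to 4 wrong out of 20). To improve on this, one could extract further features from the data or enlarge the training set; the results should get better!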
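As a rough check on how small this network is, the tensor shapes and parameter counts can be worked out by hand. The sketch below takes the layer sizes from the model code (a 10 x 3 filter with 64 output channels, 2 x 2 max pooling with stride 2, a 128-unit dense layer, 2 outputs) and assumes that SAME padding gives an output size of ceil(size / stride). The helper `same_out` is made up for this example.

```python
import math


def same_out(size, stride):
    # Output length along one axis under SAME padding.
    return math.ceil(size / stride)


# Input: 400 frames x 13 MFCC coefficients x 1 channel.
h, w, c = 400, 13, 1

# Convolution: 10x3 filter, stride 1, 64 output channels.
conv_shape = (same_out(h, 1), same_out(w, 1), 64)
conv_params = 10 * 3 * c * 64 + 64

# Max pooling: 2x2 window, stride 2; the channel count is unchanged.
pool_shape = (same_out(conv_shape[0], 2), same_out(conv_shape[1], 2), 64)

# Flatten, then dense 128 and dense 2 (weights plus biases).
flat = pool_shape[0] * pool_shape[1] * pool_shape[2]
dense_params = flat * 128 + 128
out_params = 128 * 2 + 2

print("conv", conv_shape, "pool", pool_shape, "flatten", flat)
print("parameters:", conv_params, dense_params, out_params)
```

Almost all of the weights sit in the first dense layer rather than in the convolution. With so few training clips, that layer is the first place to look if the model overfits.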
