Urban Sound Classification with Librosa: Nuanced Cross-Validation

Outline

The goal of this post is two-fold:

  1. I’ll show an example of implementing the results of an interesting research paper on classifying audio clips based on their sonic content. This will include applications of the librosa library, which is a Python package for music and audio analysis. The clips are short audio excerpts of city sounds, and the classification task is to predict the appropriate category label.

  2. I’ll show the importance of a valid cross-validation scheme. Given the nuances of the audio source dataset I’ll be using, it is very easy to accidentally leak information from the recording that will overfit your model and prevent it from generalizing. The solution is somewhat subtle so it seemed like a nice opportunity for a blog post.

Original research paper

http://www.justinsalamon.com/uploads/4/3/9/4/4394963/salamon_urbansound_acmmm14.pdf

Source dataset, by paper authors

https://urbansounddataset.weebly.com/urbansound8k.html

Summary of their dataset

“This dataset contains 8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, and street_music. The classes are drawn from the urban sound taxonomy.”

I’ll extract features from these sound excerpts and fit a classifier to predict one of the 10 classes. Let’s get started!

Note on my Code

I’ve created a repo that allows you to re-create my example in full:

  1. Script runner: https://github.com/marcmuon/urban_sound_classification/blob/master/main.py

  2. Feature extraction module: https://github.com/marcmuon/urban_sound_classification/blob/master/audio.py

  3. Model module: https://github.com/marcmuon/urban_sound_classification/blob/master/model.py

The script runner handles loading the source audio from disk, parsing the metadata about the source audio, and passing this information to the feature extractor and the model.

Downloading the data

You can download the data, which extracts to 7.09GB, using this form from the research paper authors: https://urbansounddataset.weebly.com/download-urbansound8k.html

Directory structure [optional section — if you want to run this yourself]

Obviously you can fork the code and re-map it to whatever directory structure you want, but if you want to follow mine:

  • In your home directory: create a folder called datasets, and in there place the unzipped UrbanSound8K folder [from link in ‘Downloading the Data’]

  • Also in your home directory: create a projects folder and put the cloned repo there ending up with ~/projects/urban_sound_classification/…

Within the code, I use some methods to automatically write the extracted feature vectors for each audio file into ~/projects/urban_sound_classification/data

I do this because the feature extraction takes a long time and you won’t want to do it twice. There’s also code that checks to see if these feature vectors exist.

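As a rough sketch of that check (a hypothetical helper, not the repo’s exact code; it assumes the pickled AudioFeature objects described later in this post):

import pickle
from pathlib import Path

def load_cached_feature(pkl_path):
    # Hypothetical helper: reuse a previously extracted feature object
    # if its pickle exists on disk, else signal the caller to re-extract
    p = Path(pkl_path)
    if p.exists():
        with open(p, "rb") as f:
            return pickle.load(f)
    return None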

tl;dr — if you follow my directory structure, you can simply run the main.py script and everything should work!

Why this problem requires careful cross-validation

Note that the source data is split up into 10 sub-folders, labeled ‘Fold1’, ‘Fold2’, etc.

We have 8,732 four-second audio clips of various city sounds. These clips were manually created by the research paper authors, who labeled them into groups such as ‘car horn’, ‘jackhammer’, ‘children playing’, and so on. In addition to the 10 folds, there are 10 classes.

The fold numbers do not have anything to do with the class labels; rather, the folds refer to the uncut audio file(s) that these 4-second training examples were spliced from.

What we don’t want is for the model to be able to learn how to classify things based on aspects of the particular underlying recording.

We want a generalizable classifier that will work with a wide array of recording types, but that still classifies the sounds correctly.

Guidance from the paper authors on proper CV

That’s why the authors have pre-built folds for us, and offered the following guidance, which is worth quoting:

Don’t reshuffle the data! Use the predefined 10 folds and perform 10-fold (not 5-fold) cross validation…

If you reshuffle the data (e.g. combine the data from all folds and generate a random train/test split) you will be incorrectly placing related samples in both the train and test sets, leading to inflated scores that don’t represent your model’s performance on unseen data. Put simply, your results will be wrong.

Summary of the proper approach

  • Train on folds 1–9, then test on fold 10 and record the score. Then train on folds 2–10, test on fold 1, and record the score.
  • Repeat this until each fold has served as the holdout set one time.
  • The overall score will be the average of the 10 accuracy scores from the 10 different holdout sets (see the sketch after this list).
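
Here’s that loop as a minimal sketch, assuming a feature matrix X, a label array y, a per-clip fold-id array folds, and any sklearn-style classifier clf (these names are placeholders, not the repo’s code):

import numpy as np

# Each predefined fold serves as the holdout set exactly once;
# the 10 holdout accuracies are then averaged
scores = []
for held_out in np.unique(folds):
    train_mask = folds != held_out
    test_mask = folds == held_out
    clf.fit(X[train_mask], y[train_mask])
    scores.append(clf.score(X[test_mask], y[test_mask]))

mean_acc = np.mean(scores)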

Re-creating the paper results

Note that the research paper does not have any code examples. What I want to do is first see if I can re-create (more or less) the results from the paper with my own implementation.

Then if that looks in line, I’ll work on some model improvements to see if I can beat it.

Here’s a snapshot of their model accuracy across folds from the paper [their image, not mine]:

Image from Research Paper Authors — Justin Salamon, Christopher Jacoby, and Juan Pablo Bello

Thus we’d like to get up to the high-60%/low-70% accuracy range across the folds, as shown in 3a.

Audio Feature Extraction

Image by Author [Marc Kelechava]

Librosa is an excellent and easy to use Python library that implements music information retrieval techniques. I recently wrote another blog post on a model using the librosa library here. The goal of that exercise was to train an audio genre classifier on labeled audio files (label=music genre) from my personal library. Then I use that trained model to predict the genre for other untagged files in my music library.

I will use some of the music information retrieval techniques I learned from that exercise and apply them to audio feature extraction for the city sound classification problem. In particular I’ll use:

  • Mel-Frequency Cepstral Coefficients (MFCC)

  • Spectral Contrast

  • Chromagram

A quick detour on audio transformations [optional]

[My other blog post expands on some of this section in a bit more detail if any of this is of particular interest]

Note that it is technically possible to convert a raw audio source to a numerical vector and train on that directly. However, a (downsampled) 7-minute audio file will yield a time series vector roughly 9,000,000 floating point numbers in length!

Even for our 4-second clips, the raw time series representation is a vector of roughly 88,000 samples (4 seconds at librosa’s default 22,050 Hz sample rate). Given we only have 8,732 training examples, this is likely too high-dimensional to be workable.

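If you want to sanity-check the raw dimensionality yourself, here’s a quick sketch (the filename is a placeholder for any clip from the dataset):

import librosa

# Placeholder path: point this at any UrbanSound8K clip
y, sr = librosa.load("dog_bark_example.wav", mono=True)
print(f"{len(y)} samples at {sr} Hz")  # a 4-second clip at 22,050 Hz gives ~88,000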

The various music informational retrieval techniques reduce the dimensionality of the raw audio vector representation and make this more tractable for modeling.

The techniques that we’ll be using to extract features seek to capture different qualities about the audio over time. For instance, the MFCCs describe the spectral envelope [amplitude spectrum] of a sound. Using librosa we get this information over time — i.e., we get a matrix!

The MFCC matrix for a particular audio file will have coefficients on the y-axis and time on the x-axis. Thus we want to summarize these coefficients over time (across the x-axis, or axis=1 in numpy land). Say we take an average over time — then we get the average value for each MFCC coefficient across time, i.e., a feature vector of numbers for that particular audio file!

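Concretely, the aggregation for one file looks something like this minimal sketch (the clip path is a placeholder):

import librosa
import numpy as np

# Rows are the 25 coefficients, columns are time frames
y, sr = librosa.load("some_clip.wav", mono=True)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=25)  # shape: (25, n_frames)

# Averaging across time (axis=1) collapses the matrix into a
# 25-dim vector: one mean value per coefficient
mfcc_mean = mfcc.mean(axis=1)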

What we can do is repeat this process for different music informational retrieval techniques, or different summary statistics. For instance, the spectral contrast technique will also yield a matrix of different spectral characteristics for different frequency ranges over time. Again we can repeat the aggregation process over time and pack it into our feature vector.

What the authors used

The paper authors call out MFCC explicitly. They mention pulling the first 25 MFCC coefficients and

“The per-frame values for each coefficient are summarized across time using the following summary statistics: minimum, maximum, median, mean, variance, skewness, kurtosis and the mean and variance of the first and second derivatives, resulting in a feature vector of dimension 225 per slice.”

Thus in their case they kept aggregating the 25 MFCCs over different summary statistics and packed them into a feature vector.

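For illustration, here is one plausible reading of that paper-style aggregation (my sketch, not their code; the exact bookkeeping behind their 225 dimensions isn’t spelled out in the paper). It assumes y and sr are a loaded clip as in the earlier snippet:

import librosa
import numpy as np
from scipy.stats import kurtosis, skew

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=25)
d1 = librosa.feature.delta(mfcc)           # first derivative over time
d2 = librosa.feature.delta(mfcc, order=2)  # second derivative over time

# Summarize each coefficient across time with the stats the authors list
paper_style = np.hstack([
    mfcc.min(axis=1), mfcc.max(axis=1), np.median(mfcc, axis=1),
    mfcc.mean(axis=1), mfcc.var(axis=1),
    skew(mfcc, axis=1), kurtosis(mfcc, axis=1),
    d1.mean(axis=1), d1.var(axis=1),
    d2.mean(axis=1), d2.var(axis=1),
])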

I’m going to implement something slightly different here, since it worked quite well for me in the genre classifier problem mentioned previously.

I take (for each snippet):

  • Mean of the MFCC matrix over time
  • Std. dev. of the MFCC matrix over time
  • Mean of the spectral contrast matrix over time
  • Std. dev. of the spectral contrast matrix over time
  • Mean of the chromagram matrix over time
  • Std. dev. of the chromagram matrix over time

My output (for each audio clip) will only be 82-dimensional, as opposed to the paper’s 225: (25 MFCC coefficients + 4 spectral contrast values + 12 chroma bins) × 2 statistics each = 82. Modeling should therefore be quite a bit faster.

Finally some code! Audio feature extraction in action.

[Note that I’ll be posting code snippets both within the blog post and with GitHub Gist links. Sometimes Medium does not render GitHub Gists correctly, which is why I’m doing this. Also, all the in-document code is copy-and-pasteable into an ipython terminal, but GitHub Gists are not.]

Referring to my script runner here:

I parse through the metadata (given with the dataset) and grab the filename, fold, and class label for each audio file. Then this gets sent to an audio feature extractor class.

from pathlib import Path

# parse_metadata lives in the script runner; AudioFeature is the
# feature-extraction class shown below
metadata = parse_metadata("metadata/UrbanSound8K.csv")

audio_features = []
for row in metadata:
    path, fold, label = row
    src_path = f"{Path.home()}/datasets/UrbanSound8K/audio/fold{fold}/{path}"
    audio = AudioFeature(src_path, fold, label)
    audio.extract_features("mfcc", "spectral", "chroma")
    audio_features.append(audio)

The AudioFeature class wraps around librosa, and extracts the features you feed in as strings as shown above. It also then saves the AudioFeature object to disk for every audio clip. The process takes a while, so I save the class label and fold number in the AudioFeature object along with the feature vector. This way you can come back and play around with the model later on the extracted features.

import os
import pickle
from pathlib import Path

import librosa
import numpy as np


class AudioFeature:
    def __init__(self, src_path, fold, label):
        self.src_path = src_path
        self.fold = fold
        self.label = label
        self.y, self.sr = librosa.load(self.src_path, mono=True)
        self.features = None

    def _concat_features(self, feature):
        # Grow the single feature vector as each extraction method runs
        self.features = np.hstack(
            [self.features, feature] if self.features is not None else feature
        )

    def _extract_mfcc(self, n_mfcc=25):
        mfcc = librosa.feature.mfcc(y=self.y, sr=self.sr, n_mfcc=n_mfcc)
        mfcc_mean = mfcc.mean(axis=1).T
        mfcc_std = mfcc.std(axis=1).T
        mfcc_feature = np.hstack([mfcc_mean, mfcc_std])
        self._concat_features(mfcc_feature)

    def _extract_spectral_contrast(self, n_bands=3):
        spec_con = librosa.feature.spectral_contrast(
            y=self.y, sr=self.sr, n_bands=n_bands
        )
        spec_con_mean = spec_con.mean(axis=1).T
        spec_con_std = spec_con.std(axis=1).T
        spec_con_feature = np.hstack([spec_con_mean, spec_con_std])
        self._concat_features(spec_con_feature)

    def _extract_chroma_stft(self):
        stft = np.abs(librosa.stft(self.y))
        chroma_stft = librosa.feature.chroma_stft(S=stft, sr=self.sr)
        chroma_mean = chroma_stft.mean(axis=1).T
        chroma_std = chroma_stft.std(axis=1).T
        chroma_feature = np.hstack([chroma_mean, chroma_std])
        self._concat_features(chroma_feature)

    def extract_features(self, *feature_list, save_local=True):
        # Map the string names passed in by the caller to extraction methods
        extract_fn = dict(
            mfcc=self._extract_mfcc,
            spectral=self._extract_spectral_contrast,
            chroma=self._extract_chroma_stft,
        )
        for feature in feature_list:
            extract_fn[feature]()
        if save_local:
            self._save_local()

    def _save_local(self, clean_source=True):
        out_name = self.src_path.split("/")[-1]
        out_name = out_name.replace(".wav", "")
        filename = f"{Path.home()}/projects/urban_sound_classification/data/fold{self.fold}/{out_name}.pkl"
        os.makedirs(os.path.dirname(filename), exist_ok=True)
        with open(filename, "wb") as f:
            pickle.dump(self, f)
        if clean_source:
            # Drop the raw waveform after saving to keep the in-memory list small
            self.y = None

This class implements what I described earlier — which is aggregating the various music information retrieval techniques over time, and then packing everything into a single feature vector for each audio clip.

Modeling

Image by Author [Marc Kelechava]

Since we put all the AudioFeature objects in a list above, we can do some quick comprehensions to get what we need for modeling:

from sklearn.ensemble import RandomForestClassifier

feature_matrix = np.vstack([audio.features for audio in audio_features])
labels = np.array([audio.label for audio in audio_features])
folds = np.array([audio.fold for audio in audio_features])

model_cfg = dict(
    model=RandomForestClassifier(
        random_state=42,
        n_jobs=10,
        class_weight="balanced",
        n_estimators=500,
        bootstrap=True,
    ),
)

model = Model(feature_matrix, labels, folds, model_cfg)
fold_acc = model.train_kfold()

The Model class will implement the cross-validation loop as described by the authors (keeping the relevant pitfalls in mind!).

As a reminder, here’s a second warning from the authors:

Don’t evaluate just on one split! Use 10-fold (not 5-fold) cross validation and average the scores.

We have seen reports that only provide results for a single train/test split, e.g. train on folds 1–9, test on fold 10 and report a single accuracy score. We strongly advise against this. Instead, perform 10-fold cross validation using the provided folds and report the average score.

Why?

Not all the splits are as “easy”. That is, models tend to obtain much higher scores when trained on folds 1–9 and tested on fold 10, compared to (e.g.) training on folds 2–10 and testing on fold 1. For this reason, it is important to evaluate your model on each of the 10 splits and report the average accuracy.

Again, your results will NOT be comparable to previous results in the literature.”

On their latter point (this is from the paper), it’s worth noting that different recordings/folds have different distributions of when the snippets appear in either the foreground or the background — this is why some folds are easy and some are hard.

tl;dr CV

  • We need to train on folds 1–9, predict & score on fold 10
  • Then train on folds 2–10, predict & score on fold 1
  • …etc…
  • Averaging the scores on the test folds with this process will match the existing research AND ensure that we aren’t accidentally leaking data about the source recording to our holdout set.

Leave One Group Out

Initially, I coded the split process described above by hand using numpy with respect to the given folds. While it wasn’t too bad, I realized that scikit-learn provides a perfect solution in the form of LeaveOneGroupOut KFold splitting.

To prove to myself it is what we want, I ran a slightly altered version of the test code for the splitter from the sklearn docs:

import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])
y = np.array([1, 2, 1, 2, 1, 2])
groups = np.array([1, 2, 3, 1, 2, 3])

logo = LeaveOneGroupOut()
logo.get_n_splits(X, y, groups)
logo.get_n_splits(groups=groups)  # 'groups' is always required

for train_index, test_index in logo.split(X, y, groups):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    print("TRAIN:", X_train, "TEST:", X_test)

"""TRAIN: [[ 3  4] [ 5  6] [ 9 10] [11 12]]  TEST:  [[1 2] [7 8]]

TRAIN: [[ 1  2] [ 5  6] [ 7  8] [11 12]]  TEST:  [[ 3  4] [ 9 10]]TRAIN: [[ 1  2] [ 3  4] [ 7  8] [ 9 10]]  TEST:  [[ 5  6] [11 12]] """

Note that in this toy example there are 3 groups, ‘1’, ‘2’, and ‘3’.

When I feed the group membership list for each training example to the splitter, it correctly ensures that the same group examples never appear in both train and test.

The Model Class

Thanks to sklearn this ends up being pretty easy to implement!

from sklearn.model_selection import LeaveOneGroupOut, StratifiedKFold, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score, classification_report, f1_score
from sklearn.preprocessing import LabelEncoder, StandardScaler
import numpy as np
import random


class Model:
    def __init__(self, feature_matrix, labels, folds, cfg):
        self.X = feature_matrix
        self.encoder = LabelEncoder()
        self.y = self.encoder.fit_transform(labels)
        self.folds = folds
        self.cfg = cfg
        self.val_fold_scores_ = []

    def train_kfold(self):
        logo = LeaveOneGroupOut()
        for train_index, test_index in logo.split(self.X, self.y, self.folds):
            X_train, X_test = self.X[train_index], self.X[test_index]
            y_train, y_test = self.y[train_index], self.y[test_index]

            # Fit the scaler on the 9 training folds only, then apply it to
            # the holdout fold, so no test information leaks into the scaling
            ss = StandardScaler(copy=True)
            X_train = ss.fit_transform(X_train)
            X_test = ss.transform(X_test)

            clf = self.cfg["model"]
            clf.fit(X_train, y_train)
            y_pred = clf.predict(X_test)

            fold_acc = accuracy_score(y_test, y_pred)
            self.val_fold_scores_.append(fold_acc)

        return self.val_fold_scores_

Here I add in some scaling, but in essence the splitter gives us the desired CV. On each iteration of the splitter I train on 9 folds and predict on the holdout fold. This happens 10 times, and then we can average over the returned list of 10 holdout-fold scores.

Results

"""In: fold_acc                                                                                                                                              Out: [0.6632302405498282, 0.7083333333333334, 0.6518918918918919, 0.6404040404040404, 0.7585470085470085, 0.6573511543134872, 0.6778042959427207, 0.6910669975186104, 0.7230392156862745, 0.7825567502986858]In: np.mean(fold_acc)                                                                                                                                     Out: 0.6954224928485881"""

69.5% is right in line with what the authors have in their paper for the top models! Thus I’m feeling good that this was implemented as they envisioned. Note that they also show fold 10 was the easiest to score on (we see that too), so we’re in line there as well.

Why not run a hyperparameter search for this task? [very optional]

Here’s where things get a little tricky.

A ‘Normal’ CV Process:

If we could train/test/split arbitrarily, we could do something like:

  1. Split off a holdout test set.
  2. From the larger train portion, split off a validation set.
  3. Run some type of parameter search algo (say, GridSearchCV) on the train set (non-val, non-test).
  4. The GridSearch will run k-fold cross-validation on the train set, splitting it into folds. At the end, an estimator is refit on the train portion with the best params found in GridSearchCV’s inner k-fold cross-validation.
  5. Then we take that fitted best estimator and score it on the validation set.

Because we have the validation set in step 5, we can repeat steps 3 and 4 a bunch of times on different model families or different parameter search ranges.

Then when we are done, we’d take our final model and see if it generalizes using the holdout test set, which we hadn’t touched up to that point.

But how is this going to work within our fold-based LeaveOneGroupOut approach? Imagine we tried to set up a GridSearchCV as follows:

def train_kfold(self):
    logo = LeaveOneGroupOut()
    for train_index, test_index in logo.split(self.X, self.y, self.folds):
        X_train, X_test = self.X[train_index], self.X[test_index]
        y_train, y_test = self.y[train_index], self.y[test_index]

        ss = StandardScaler(copy=True)
        X_train = ss.fit_transform(X_train)
        X_test = ss.transform(X_test)

        clf = self.cfg["model"]

        kf = StratifiedKFold(n_splits=3, random_state=42, shuffle=True)

        # This isn't valid. In this inner CV, clips from the same recording
        # fold could end up in both the train and val sets of GridSearchCV
        grid_search = GridSearchCV(
            estimator=clf,
            param_grid=self.cfg["param_grid"],
            cv=kf,
            return_train_score=True,
            verbose=3,
            refit=True,
        )
        grid_search.fit(X_train, y_train)
        self.trained_models_.append(grid_search.best_estimator_)

        y_pred = grid_search.predict(X_test)
        fold_acc = accuracy_score(y_test, y_pred)
        self.val_fold_scores_.append(fold_acc)

    return self.val_fold_scores_

But now when GridSearchCV runs the inner split, we’ll run into the same problem that we had solved by using LeaveOneGroupOut!

That is, imagine the first run of this loop where the test set is fold 1 and the train set is on folds 2–10. If we then pass the train set (of folds 2–10) into the inner GridSearchCV loop, we’ll end up with inner KFold cases where the same fold is used in the inner GridSearchCV train and the inner GridSearchCV test.

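A toy illustration of that leak, with hypothetical fold ids (the inner StratifiedKFold stratifies on the label only and is blind to the recording folds):

import numpy as np
from sklearn.model_selection import StratifiedKFold

# Six clips drawn from recording folds 2, 3, and 4, with binary labels
folds = np.array([2, 2, 3, 3, 4, 4])
y = np.array([0, 1, 0, 1, 0, 1])
X = np.zeros((6, 1))

# Clips from the same recording fold can land on both sides of the split
kf = StratifiedKFold(n_splits=2, shuffle=True, random_state=42)
for train_idx, val_idx in kf.split(X, y):
    print("inner train folds:", folds[train_idx], "inner val folds:", folds[val_idx])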

Thus it’s going to end up (very likely) overfitting the choice of best params within the inner GridSearchCV loop.

And hence, I’m not going to run a hyperparameter search within the LeaveOneGroupOut loop.

Next Steps

I’m pretty pleased this correctly implemented the research paper — at least in terms of very closely matching their results.

  1. I’d like to try extracting larger feature vectors per example, and then running these through a few different Keras-based NN architectures following the same CV process used here.
  2. In terms of feature extraction, I’d also like to consider the nuances of misclassifications between classes and see if I can think up better features for the hard examples. For instance, the model is definitely getting confused on the air conditioner vs. engine idling classes. To check this, I have some code in my prior audio blog post that you can use to look at the False Positive Rate and False Negative Rate per class: https://github.com/marcmuon/audio_genre_classification/blob/master/model.py#L84-L128

Thanks for reading this far! I intend to do a 2nd part of this post addressing the Next Steps soon. Some other work that might be of interest can be found here:

https://github.com/marcmuon
https://medium.com/@marckelechava

Citations

J. Salamon, C. Jacoby and J. P. Bello, “A Dataset and Taxonomy for Urban Sound Research”, 22nd ACM International Conference on Multimedia, Orlando USA, Nov. 2014. [ACM] [PDF] [BibTeX]

Translated from: https://towardsdatascience.com/urban-sound-classification-with-librosa-nuanced-cross-validation-5b5eb3d9ee30
