
Beginner’s Guide

This article introduces classification algorithms for Hyperspectral Images (HSI) and provides a detailed implementation of HSI classification.


Table of Contents

  • Introduction to Hyperspectral Images (HSI)
  • Dimensionality Reduction (DR)
  • Classification Algorithms
  • Implementation: Classification on HSI


Introduction to Hyperspectral Images (HSI)

In remote sensing, hyperspectral sensors are widely used to monitor the Earth’s surface at high spectral resolution. Whereas a conventional RGB image has three bands, an HSI generally contains many more. Hyperspectral Images (HSI) are used to address a variety of problems in diverse areas such as crop analysis, geological mapping, mineral exploration, defence research, urban investigation, and military surveillance.


Refer to the article below for information on data collection, data preprocessing, and exploratory data analysis on HSI.


Various sites provide open hyperspectral data for learning purposes. Here are two popular ones:


In this article, we use the Indian Pines (IP) hyperspectral image dataset. The IP data was gathered by the AVIRIS sensor over the Indian Pines test site in northwestern Indiana and consists of 145 × 145 pixels, 16 classes, and 200 spectral bands. The ground truth details of the IP dataset are given below:


[Figure: Ground truth details of the Indian Pines (IP) dataset]

The code to read the dataset:


```python
from scipy.io import loadmat

def read_HSI():
    # Hyperspectral cube (rows x cols x bands) and ground-truth label map (rows x cols)
    X = loadmat('Indian_pines_corrected.mat')['indian_pines_corrected']
    y = loadmat('Indian_pines_gt.mat')['indian_pines_gt']
    print(f"X shape: {X.shape}\ny shape: {y.shape}")
    return X, y

X, y = read_HSI()
```

The ground truth of the Indian Pines dataset is visualized below:


[Figure: Ground truth visualization of the Indian Pines dataset]

Six bands, randomly selected from the 200 available, are visualized below:


[Figure: Visualization of bands of the Indian Pines (IP) dataset]

Dimensionality Reduction (DR)

Dimensionality Reduction (DR) reduces the number of dimensions of the data (for HSI, the number of spectral bands), allowing classifiers to build comprehensive models at a lower computational cost. For this reason, DR has become an increasingly prominent step for improving the accuracy of pixel classification in Hyperspectral Images (HSI).


Dimensionality Reduction techniques fall into two types:


  • Feature Selection
  • Feature Extraction

Feature Selection is the process of selecting the subset of features (dimensions) of the dataset that contribute most to machine learning tasks such as classification and clustering. This can be achieved with methods such as correlation analysis and univariate analysis.
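As a minimal sketch of univariate feature selection, the snippet below scores each band independently with scikit-learn's ANOVA F-test (`f_classif`) and keeps the top `k` bands with `SelectKBest`. The synthetic data and `k=2` are assumptions for illustration; on Indian Pines, `X` would be the pixels-by-bands matrix of labeled pixels.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)

# Synthetic stand-in for labeled pixels: 300 pixels x 10 bands, 3 classes.
# Only band 3 carries class information; the other bands are pure noise.
y = rng.integers(0, 3, size=300)
X = rng.normal(size=(300, 10))
X[:, 3] += 2.0 * y

# Score every band independently with the ANOVA F-statistic and keep the best 2
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
X_selected = selector.transform(X)

print(selector.get_support(indices=True))  # indices of the kept bands (band 3 among them)
print(X_selected.shape)                    # (300, 2)
```

Because each band is scored on its own, univariate selection is fast but ignores redundancy between neighbouring bands, which are often highly correlated in HSI.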


Feature Extraction is the process of deriving new features by transforming and/or combining the existing ones into a reduced feature space that still describes the dataset accurately, with as little loss of information as possible.


Based on the criterion function and the convergence process, dimensionality reduction techniques are also classified as convex or non-convex. Popular techniques include PCA, ICA, LDA, GDA, Kernel PCA, Isomap, Locally Linear Embedding (LLE), and Hessian LLE.


For a deeper treatment, see the article “Dimensionality Reduction in Hyperspectral Images using Python”.


In this article, we use Principal Component Analysis (PCA) to reduce the dimensionality of the data.


Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is one of the standard algorithms for reducing the dimensionality of data. PCA is a non-parametric technique that increases interpretability while minimizing information loss (reconstruction error).
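To make the reconstruction-error view concrete, here is a minimal NumPy sketch of PCA via singular value decomposition (SVD). The helper names `pca_svd` and `reconstruction_error` are hypothetical, and the synthetic data stands in for pixel spectra. Keeping more components raises the explained variance and lowers the reconstruction error; scikit-learn's `PCA`, used later, computes the same projection up to the sign of each component.

```python
import numpy as np

def pca_svd(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components.

    Returns (scores, components, explained_variance_ratio, mean).
    """
    mean = X.mean(axis=0)
    Xc = X - mean  # PCA operates on mean-centered data
    # Rows of Vt are the principal directions, sorted by decreasing singular value
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    scores = Xc @ components.T               # the new k-dimensional features
    ratio = (S[:k] ** 2) / np.sum(S ** 2)    # share of total variance per component
    return scores, components, ratio, mean

def reconstruction_error(X, scores, components, mean):
    """Mean squared error between X and its reconstruction from k components."""
    X_hat = scores @ components + mean
    return float(np.mean((X - X_hat) ** 2))

rng = np.random.default_rng(0)
# Synthetic stand-in for pixel spectra: 500 pixels x 20 bands driven by 3 latent factors
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 20)) + 0.01 * rng.normal(size=(500, 20))

errors = {}
for k in (1, 3, 20):
    scores, components, ratio, mean = pca_svd(X, k)
    errors[k] = reconstruction_error(X, scores, components, mean)
    print(f"k={k}: explained variance {ratio.sum():.4f}, reconstruction MSE {errors[k]:.2e}")
```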


For the mathematics behind PCA, refer to the two papers below.


Based on the explained variance ratio, the number of components is set to 40, as in the code below:


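```python
import pandas as pd
from sklearn.decomposition import PCA

# df: one row per pixel, one column per band, plus a final 'class' column (built from X, y)
df = pd.DataFrame(X.reshape(-1, X.shape[2]))
df['class'] = y.ravel()

pca = PCA(n_components=40)
dt = pca.fit_transform(df.iloc[:, :-1].values)
q = pd.concat([pd.DataFrame(data=dt), pd.DataFrame(data=y.ravel())], axis=1)
q.columns = [f'PC-{i}' for i in range(1, 41)] + ['class']
```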
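The source does not state which explained-variance threshold led to 40 components. One common rule, sketched below under that assumption, keeps the smallest number of components whose cumulative explained variance ratio reaches a chosen threshold. The helper `choose_n_components`, the 0.99 threshold, and the synthetic data are all hypothetical; on Indian Pines one would pass `df.iloc[:, :-1].values`.

```python
import numpy as np
from sklearn.decomposition import PCA

def choose_n_components(X, threshold=0.99):
    """Smallest number of principal components whose cumulative explained
    variance ratio reaches `threshold` (0.99 is a hypothetical choice)."""
    cumulative = np.cumsum(PCA().fit(X).explained_variance_ratio_)
    # First index where the cumulative ratio reaches the threshold, converted to a count
    n = int(np.searchsorted(cumulative, threshold)) + 1
    return min(n, len(cumulative)), cumulative

rng = np.random.default_rng(1)
# Synthetic stand-in: 400 pixels x 30 bands generated from 5 latent factors plus small noise
X = rng.normal(size=(400, 5)) @ rng.normal(size=(5, 30)) + 0.01 * rng.normal(size=(400, 30))

n, cumulative = choose_n_components(X)
print(n, cumulative[n - 1])
```

Plotting `cumulative` against the component index is the usual way to inspect this curve before fixing `n_components`.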

The first eight principal components, displayed as eight image bands, are shown below:


[Figure: First eight bands after PCA]

Classification Algorithms

Classification refers to a predictive modeling problem in which a class label is predicted for the given input data. Classification tasks can be divided into:


  • Binary Classification
  • Multi-Class Classification
  • Multi-Label Classification
  • Imbalanced Classification

Here, we are dealing with a multi-class classification problem. Algorithms used for classifying Hyperspectral Images (HSI) include:


  • K-Nearest Neighbors
  • Support Vector Machine
  • Spectral Angle Mapper
  • Convolutional Neural Networks
  • Decision Trees, etc.
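Of the algorithms listed, the Spectral Angle Mapper (SAM) is designed specifically for spectral data: it treats each pixel spectrum t as a vector and assigns the class whose reference spectrum r forms the smallest angle θ = arccos(t·r / (‖t‖‖r‖)) with it, which makes it insensitive to overall brightness (vector length). Below is a minimal NumPy sketch; the helper names, the use of one reference spectrum per class (e.g. each class's mean spectrum), and the toy values are assumptions.

```python
import numpy as np

def spectral_angle(pixels, reference):
    """Angle in radians between each pixel spectrum and one reference spectrum."""
    pixels = np.atleast_2d(pixels)
    cos = pixels @ reference / (np.linalg.norm(pixels, axis=1) * np.linalg.norm(reference))
    return np.arccos(np.clip(cos, -1.0, 1.0))  # clip guards against rounding slightly past +/-1

def sam_classify(pixels, references):
    """Assign each pixel to the reference spectrum with the smallest spectral angle.

    references: (n_classes x n_bands) array, e.g. the mean spectrum of each class.
    Returns class indices 0..n_classes-1.
    """
    angles = np.stack([spectral_angle(pixels, r) for r in references], axis=1)
    return np.argmin(angles, axis=1)

refs = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 1.0]])
px = np.array([[2.0, 0.1, 0.0],
               [0.0, 3.0, 2.5],
               [5.0, 0.0, 0.0]])  # same direction as refs[0], only brighter
print(sam_classify(px, refs))
```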

In this article, we use a Support Vector Machine (SVM) to classify the Hyperspectral Image (HSI).


Support Vector Machine (SVM)

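Support Vector Machine is a supervised classification algorithm that finds the hyperplane maximizing the margin, i.e. the distance between the hyperplane and the nearest training points of each class. Different kernel functions can be used, such as linear, polynomial, and Radial Basis Function (RBF); the non-linear kernels implicitly project the data into a higher-dimensional space where the classes may become linearly separable.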
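As a small sketch of what the RBF kernel computes, the snippet below builds the kernel matrix K(x, z) = exp(-γ‖x - z‖²) by hand (the helper `rbf` is hypothetical) and checks it against scikit-learn's `rbf_kernel`. The random data is illustrative; the γ formula mirrors the default `gamma='scale'` of scikit-learn's `SVC`, which the SVM code in this article does not override.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def rbf(A, B, gamma):
    """RBF kernel matrix: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    # Expand ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b for all pairs at once
    sq_dists = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clamp tiny negative rounding errors

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 40))  # e.g. 5 pixels described by 40 principal components
B = rng.normal(size=(3, 40))

# SVC's default gamma='scale' uses 1 / (n_features * X.var()) on the training data
gamma = 1.0 / (A.shape[1] * A.var())
K = rbf(A, B, gamma)
print(K.shape)  # (5, 3)
```

Each entry lies in (0, 1] and equals 1 only for identical points, so γ controls how quickly similarity decays with spectral distance.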


For a better understanding of the concepts behind SVM, refer to the lectures below:


Implementation: Classification on HSI

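The code below removes the unlabeled background pixels (class 0), splits the labeled pixels into stratified training (80%) and test (20%) sets, and trains an SVM with an RBF kernel to classify the Hyperspectral Image.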


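```python
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Drop unlabeled background pixels (class 0); they belong to none of the 16 classes
x = q[q['class'] != 0]
X = x.iloc[:, :-1].values
y = x.loc[:, 'class'].values

# Class names in label order 1..16
names = ['Alfalfa', 'Corn-notill', 'Corn-mintill', 'Corn', 'Grass-pasture', 'Grass-trees',
         'Grass-pasture-mowed', 'Hay-windrowed', 'Oats', 'Soybean-notill', 'Soybean-mintill',
         'Soybean-clean', 'Wheat', 'Woods', 'Buildings Grass Trees Drives', 'Stone Steel Towers']

# Stratified split keeps each class's proportion in both sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=11, stratify=y)

svm = SVC(C=100, kernel='rbf', cache_size=10 * 1024)  # cache_size is in MB
svm.fit(X_train, y_train)
ypred = svm.predict(X_test)
```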

The confusion matrix is generated with the following code:


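```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sn
from sklearn.metrics import confusion_matrix

data = confusion_matrix(y_test, ypred)
# Rows and columns follow the sorted labels 1..16, i.e. the order of `names`
# (np.unique(names) would sort the names alphabetically and mislabel the axes)
df_cm = pd.DataFrame(data, columns=names, index=names)
df_cm.index.name = 'Actual'
df_cm.columns.name = 'Predicted'
plt.figure(figsize=(10, 8))
sn.set(font_scale=1.4)  # label size
sn.heatmap(df_cm, cmap="Reds", annot=True, annot_kws={"size": 16}, fmt='d')
plt.savefig('cmap.png', dpi=300)
```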

[Figure: Confusion matrix]

The generated classification report, consisting of class-wise accuracy, overall accuracy, precision, recall, F1 score, and support, is shown below:


[Figure: Classification report]
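The report itself appears only as an image. As a hedged sketch of how such numbers can be computed with scikit-learn: `classification_report` gives per-class precision, recall, F1 score, and support, and class-wise accuracy can be taken from the confusion matrix as the diagonal divided by the row sums (which coincides with per-class recall). The helper `report` and the toy labels are hypothetical stand-ins for `y_test` and `ypred`; with the real data one would pass `target_names=names`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

def report(y_true, y_pred, target_names):
    """Return (class-wise accuracy, overall accuracy, scikit-learn text report)."""
    cm = confusion_matrix(y_true, y_pred)
    # Class-wise accuracy: correctly classified samples / all samples of that class (row sums)
    classwise_acc = np.diag(cm) / cm.sum(axis=1)
    overall_acc = accuracy_score(y_true, y_pred)
    text = classification_report(y_true, y_pred, target_names=target_names)
    return classwise_acc, overall_acc, text

# Toy labels standing in for y_test / ypred
y_true = np.array([1, 1, 2, 2, 3, 3])
y_pred = np.array([1, 2, 2, 2, 3, 1])
cw, oa, text = report(y_true, y_pred, ['A', 'B', 'C'])
print(cw)  # class-wise accuracy: 0.5, 1.0, 0.5
print(oa)  # overall accuracy: 4 of 6 correct
print(text)
```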

Finally, the classification map is shown below:


[Figure: Classification map of the Indian Pines (IP) dataset]
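A classification map is obtained by predicting a label for every pixel and reshaping the predictions back onto the image grid. The sketch below assumes that `features` holds all pixels in row-major order and that background pixels (ground-truth label 0) are blanked out; the blanking is a display choice not stated in the source. `classification_map` is a hypothetical helper and `ThresholdModel` a hypothetical stand-in for the trained `svm`.

```python
import numpy as np

def classification_map(model, features, gt):
    """Predict a label for every pixel and reshape the result to the image grid.

    features: (H*W, n_features) array in row-major pixel order
    gt: (H, W) ground-truth array in which 0 marks unlabeled background
    """
    pred = np.asarray(model.predict(features)).reshape(gt.shape)
    pred[gt == 0] = 0  # assumption: hide background pixels, which were never trained on
    return pred

class ThresholdModel:
    """Hypothetical stand-in for the trained SVM: class 1 if feature 0 > 0, else class 2."""
    def predict(self, X):
        return np.where(X[:, 0] > 0, 1, 2)

gt = np.array([[1, 0],
               [2, 2]])
features = np.array([[1.0], [5.0], [-1.0], [2.0]])
class_map = classification_map(ThresholdModel(), features, gt)
print(class_map)
```

With the article's objects this might be called as `classification_map(svm, q.iloc[:, :-1].values, gt)`, where `gt` is the 145 × 145 ground-truth array returned by `read_HSI()` as `y` and saved before the SVM snippet rebinds `y`; the result can then be displayed with `plt.imshow(class_map, cmap='nipy_spectral')`.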

All the code written for this article is available in the notebook below, on GitHub and Colab.


Translated from: https://towardsdatascience.com/hyperspectral-image-analysis-classification-c41f69ac447f




