Using GeoPandas for Spatial Visualization

Recently, I was working on a project where I was trying to build a model that could predict housing prices in King County, Washington — the area that surrounds Seattle. After looking at the features, I wanted a way to determine the houses’ worth based on location.

The dataset included latitude and longitude, and it was easy to google them to take a look at the houses, their neighborhoods, their distance from the water, etc. But with over 17,000 observations, that was a fool’s errand. I had to find an easier way.

I had used Geographic Information Systems (GIS) only once before, and not in Python. So I did what I do best: I googled, and ran into this amazing package called GeoPandas. I am going to let the GeoPandas team sum up what they do because they can say it much better than I can.

GeoPandas is an open source project to make working with geospatial data in python easier. GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types. Geometric operations are performed by shapely. GeoPandas further depends on fiona for file access and descartes and matplotlib for plotting. — Description from GeoPandas Website (2020)

This blew my mind, and what I wanted was really just the most basic of the features. I am going to show you how to run this code and do what I did — plotting accurate points on a map.

You are going to need several packages and some files in addition to the basic pandas and matplotlib. They include:

geopandas: the package that makes all of this possible
shapely: package for manipulation and analysis of planar geometric objects
descartes: provides a nicer integration of Shapely geometry objects with Matplotlib. It’s not needed every time, but I import it just to be safe
Any .shp file: this is going to be the backdrop of the plot. Mine is going to have King County, but you should be able to find one from any city’s data department. Don’t delete any files from the .zip file it comes in. Something always breaks.

More information about shapefiles can be found here, but the long and short of it is that these aren’t normal images. They are a vector data storage format that has information linking to locations — coordinates and the rest.
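
One reason deleting files from the .zip tends to break things is that a shapefile is really a small family of files sharing one base name. As a rough sketch (the helper name `missing_sidecars` is made up for illustration and is not part of GeoPandas), you could check that the usual companions are still sitting next to the .shp before trying to load it:

```python
from pathlib import Path

# Companion files a shapefile is normally shipped with.
REQUIRED_SUFFIXES = (".shp", ".shx", ".dbf")   # geometry, index, attribute table
RECOMMENDED_SUFFIXES = (".prj",)               # coordinate reference system


def missing_sidecars(shp_path):
    """Return the suffixes of companion files that are not on disk.

    Hypothetical helper for illustration; it only checks file names.
    """
    shp = Path(shp_path)
    wanted = REQUIRED_SUFFIXES + RECOMMENDED_SUFFIXES
    return [suffix for suffix in wanted if not shp.with_suffix(suffix).exists()]
```

An empty list means the set looks complete; anything else names the pieces that went missing when the archive was unpacked.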

First I imported the basic packages that I needed and then the new packages:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from shapely.geometry import Point, Polygon
import geopandas as gpd
import descartes

The Point and Polygon features are what help me match my data to the map I make.
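
To get a feel for what these two classes do, here is a minimal sketch with made-up coordinates (the square "district" and the two house locations are invented for illustration, not taken from the King County data). A Point is a single (x, y) location, a Polygon is a closed boundary, and shapely can tell you whether one falls inside the other:

```python
from shapely.geometry import Point, Polygon

# Hypothetical square "district", with coordinates in (longitude, latitude) order.
district = Polygon([
    (-122.4, 47.5),
    (-122.2, 47.5),
    (-122.2, 47.7),
    (-122.4, 47.7),
])

house_inside = Point(-122.3, 47.6)    # falls within the square
house_outside = Point(-121.9, 47.6)   # lies east of the square

print(district.contains(house_inside))   # True
print(house_outside.within(district))    # False
```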

Next, I load in my data. This is basic pandas, but for those who are new: the text in quotation marks is the name of the file that holds the housing records.

df = pd.read_csv('kc_house_data_train.csv')
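
Before turning rows into map points, it can be worth a quick check that the coordinate columns look sane. The sketch below uses a tiny hand-made frame with invented values instead of the real CSV; the column names `lat`, `long`, and `price` match the ones used later in this post, while the helper `coords_look_valid` is a hypothetical name:

```python
import pandas as pd

# Tiny stand-in for the housing CSV; the values are illustrative only.
sample = pd.DataFrame({
    "lat": [47.51, 47.72, 47.37],
    "long": [-122.26, -122.32, -121.98],
    "price": [221900, 538000, 1225000],
})


def coords_look_valid(frame):
    """Return True when every row has a plausible latitude and longitude."""
    lat_ok = frame["lat"].between(-90, 90)
    lon_ok = frame["long"].between(-180, 180)
    return bool((lat_ok & lon_ok).all())


print(coords_look_valid(sample))
```

A swapped latitude/longitude pair is a common slip, and in this area it shows up immediately because a longitude near -122 is not a valid latitude.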

With all of the packages imported and the data ready to go, I wanted to take a look at the map I was going to be plotting. I did this by finding a shape file published on the King County government website. They have done all the hard work of surveying and cataloging the land, and it would be rude not to use their freely offered services. Loading the shape file is easy and comparable to loading a csv file with pandas.

kings_county = gpd.read_file('*file_path_here*/School_Districts_in_King_County___schdst_area.shp')

You can open this up if you want to take a look at the data. The King County shape file was just a dataframe of locations matched with their school districts, geometry coordinates, and area. But the best part is when we plot it and yes, we have to plot it. This isn’t an image you can just call — it will have the coordinates built in so our data can be placed down like a point on a 5th grade (x,y) graph.
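
If you do open it up, a few attributes are handy for getting oriented. The sketch below builds a two-row GeoDataFrame by hand (the district names and unit squares are invented) so it runs without the King County download; the same calls should work on the frame returned by `gpd.read_file`:

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Hand-made stand-in for a shape file: two adjacent unit squares.
districts = gpd.GeoDataFrame(
    {"NAME": ["District A", "District B"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
    ],
    crs="EPSG:4326",
)

print(districts.head())              # attribute columns plus the geometry column
print(districts.crs)                 # coordinate reference system of the layer
print(districts.geom_type.unique())  # kinds of shapes stored in the layer
print(districts.total_bounds)        # [minx, miny, maxx, maxy] of the whole layer
```

The bounds are a quick way to confirm that the layer and your point data live in the same coordinate range before you plot one over the other.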

Using the below code (notice how I edited it the same way I would edit a graph):

fig, ax = plt.subplots(figsize = (15,15))
kings_county.plot(ax=ax)
ax.set_title('King County',fontdict = {'fontsize': 30})
ax.set_ylabel('Latitude',fontdict = {'fontsize': 20})
ax.set_xlabel('Longitude',fontdict = {'fontsize': 20})

My output looked like this:

Before we start adding our housing data we should look at utilizing the shape file to the fullest. Let’s take a look at the file.

   OID  D#   NAME           geometry
0  1    1    Seattle        MULTIPOLYGON (((-122.40324 47.66637...
1  2    210  Federal Way    POLYGON ((-122.29057 47.39374...
2  3    216  Enumclaw       POLYGON ((-121.84898 47.34708...
3  4    400  Mercer Island  POLYGON ((-122.24475 47.59601...
4  5    401  Highline       POLYGON ((-122.35853 47.51553...
(Truncated for clarity)

As you can see, the county is divided into school districts, each with a shape that serves as its boundary. We will now plot the shape file and annotate the districts using the data provided, like so:

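# District names must match the NAME column exactly, including spaces
left = ['Riverview', 'Snoqualmie Valley']
center = ['Skykomish', 'Kent', 'Auburn', 'Tahoma', 'Vashon Island', 'Northshore', 'Shoreline', 'Renton', 'Highline', 'Issaquah', 'Enumclaw', 'Seattle', 'Federal Way', 'Bellevue', 'Mercer Island', 'Lake Washington', 'Tukwila']
right = ['Fife']
# Assumption: 'coords' is not a column in the shape file, so it is derived here
# as a point guaranteed to lie inside each district's polygon
kings_county['coords'] = kings_county['geometry'].apply(lambda geom: geom.representative_point().coords[0])
kings_county.plot(figsize = (15,15), cmap = 'gist_earth')
for idx, row in kings_county.iterrows():
    # The label is passed positionally because its keyword name differs between matplotlib versions
    if row['NAME'] in left:
        plt.annotate(row['NAME'], xy=row['coords'], ha='left', color = 'red')
    elif row['NAME'] in center:
        plt.annotate(row['NAME'], xy=row['coords'], ha='center', color = 'red')
    elif row['NAME'] in right:
        plt.annotate(row['NAME'], xy=row['coords'], ha='right', color = 'red')
plt.title('School Districts in King County, WA', fontdict = {'fontsize': 20})
plt.ylabel('Latitude', fontdict = {'fontsize': 20})
plt.xlabel('Longitude', fontdict = {'fontsize': 20})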

The lists — left, right, center — are from trial and error with the placement of the district names. Some overlapped or needed to be manipulated so that they did not stray too far from their actual district.

I’ve changed the color map to gist_earth for clarity. Next, I iterated through each row, using the entry in the NAME series as the label and placing it at a point that was definitely inside the polygon. I aligned the names based on the lists I had made earlier. And this was our output:

School Districts of King County. Graphic by Author
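
Why insist on a point that is definitely inside the polygon rather than just using the center? For oddly shaped regions the geometric center can land outside the shape altogether. Here is a small sketch with an invented U-shaped polygon (not a real district) comparing shapely's `centroid` with `representative_point()`:

```python
from shapely.geometry import Polygon

# Invented U-shaped region: a bottom bar with two arms and a notch between them.
u_shape = Polygon([
    (0, 0), (3, 0), (3, 3), (2, 3),
    (2, 1), (1, 1), (1, 3), (0, 3),
])

center = u_shape.centroid                  # geometric center of mass
anchor = u_shape.representative_point()    # guaranteed to lie within the shape

print(u_shape.contains(center))   # False: the centroid falls in the notch
print(u_shape.contains(anchor))   # True
```

A label anchored at the centroid of a shape like this would float over a neighbor, which is exactly the kind of placement problem the alignment lists were compensating for.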

Each of the regions signifies a school district in King County. This matches the data I found about the twenty school districts in the county. I never really thought about the size and shape of a county, so I googled it just to be sure.

It seemed like the Google Maps image was the perfect hole for my puzzle piece. From here, it was just a matter of formatting my data to fit the shape file. I did that by initiating my coordinate system and creating applicable points using the latitude and longitude of my houses.

crs = 'EPSG:4326' # coordinate system (WGS 84 latitude/longitude); the older {'init': 'epsg:4326'} dict form is deprecated in recent pyproj releases
geometry = [Point(x,y) for x,y in zip(df.long,df.lat)] # creating points

If you were to look at an entry in geometry, you would only see that it is a shapely object. The points need to be attached to our original dataframe. Below, I make a brand new GeoDataFrame that combines the old dataframe, the coordinate system, and the points created from the latitude and longitude of each house.

geo_df = gpd.GeoDataFrame(df,                   # the dataframe
                          crs = crs,            # coordinate system
                          geometry = geometry)  # geometric points
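
As an alternative sketch, recent GeoPandas versions can build the points directly from two columns with `gpd.points_from_xy`, which skips the list comprehension. The frame below is a small invented stand-in for the housing data, and this is a different route from the one I took above, not a replacement for it. Note the argument order: x is longitude and y is latitude.

```python
import geopandas as gpd
import pandas as pd

# Small invented stand-in for the housing data.
houses = pd.DataFrame({
    "lat": [47.51, 47.72, 47.37],
    "long": [-122.26, -122.32, -121.98],
    "price": [221900, 538000, 1225000],
})

geo_houses = gpd.GeoDataFrame(
    houses,
    geometry=gpd.points_from_xy(houses["long"], houses["lat"]),  # x = long, y = lat
    crs="EPSG:4326",  # plain latitude/longitude coordinates
)

print(geo_houses.head())
print(geo_houses.crs)
```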

That was the last step before we can plot the houses. Now, we put it all together.

fig, ax = plt.subplots(figsize = (15,16))
kings_county.plot(ax=ax, alpha = 0.8, color = 'black')
geo_df.plot(ax = ax , markersize = 2, color = 'blue',marker ='o',label = 'House', aspect = 1)
plt.legend(prop = {'size':10} )
ax.set_title('Houses in King County, WA', fontdict = {'fontsize':20})
ax.set_ylabel('Latitude',fontdict = {'fontsize': 20})
ax.set_xlabel('Longitude',fontdict = {'fontsize': 20})

In the code above, the steps include:

1. Calling an object to plot.
2. Plotting the King County shape file.
3. Plotting the data I made that includes the geometry point. This includes making markers, choosing the aspect, and adding the label for the legend.
4. Adding a legend, title, and axis labels.

These steps were done for each of the graphs.
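
Since the same steps repeat for every graph, one option is to wrap them in a small function. This is only a sketch: the name `plot_points_on_base` and its parameters are made up for illustration, and the base map and points below are invented stand-ins for the county shape file and the housing data.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import Point, Polygon


def plot_points_on_base(base, layers, title, figsize=(8, 8)):
    """Draw the base map, then each (GeoDataFrame, style dict) layer on top."""
    fig, ax = plt.subplots(figsize=figsize)          # the object to plot on
    base.plot(ax=ax, alpha=0.8, color="black")       # the shape file backdrop
    for points, style in layers:                     # each set of points
        points.plot(ax=ax, **style)
    ax.legend(prop={"size": 12})                     # legend, title, axis labels
    ax.set_title(title, fontdict={"fontsize": 20})
    ax.set_ylabel("Latitude", fontdict={"fontsize": 20})
    ax.set_xlabel("Longitude", fontdict={"fontsize": 20})
    return ax


# Invented stand-ins for the county outline and the house locations.
base = gpd.GeoDataFrame(
    geometry=[Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])], crs="EPSG:4326"
)
points = gpd.GeoDataFrame(
    geometry=[Point(0.5, 0.5), Point(1.5, 1.0)], crs="EPSG:4326"
)
ax = plot_points_on_base(
    base,
    [(points, {"markersize": 2, "color": "blue", "marker": "o", "label": "House"})],
    "Houses (synthetic example)",
)
```

Passing the layers as a list keeps the drawing order explicit, which matters once two sets of points are stacked on the same map.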

Our output:

This is a great product but our goal is to learn something from this visualization. While this gives some information, like the outliers far to the eastern part of the county, it doesn’t give much else. We have to play with parameters. Let’s try splitting the data by price. These are the houses that are listed for less than $750,000.

fig, ax = plt.subplots(figsize = (15,25))
kings_county.plot(ax=ax, alpha = 0.8, color = 'black')
geo_df[geo_df['price'] < 750000].plot(ax = ax , markersize = 2,color = 'red',marker = 's',label = 'Price < 750k',aspect = 1.5)
plt.legend(prop = {'size':15} )
ax.set_title('Houses by Price in King County, WA', fontdict ={'fontsize': 20})
ax.set_ylabel('Latitude',fontdict = {'fontsize': 20})
ax.set_xlabel('Longitude',fontdict = {'fontsize': 20})

Houses priced below $750,000. Graphic by Author

Now we graph the houses priced at or above $750,000.

fig, ax = plt.subplots(figsize = (15,25))
kings_county.plot(ax=ax, alpha = 0.8, color = 'black')
geo_df[geo_df['price'] >= 750000].plot(ax = ax , markersize = 2,color = 'yellow',marker = 'v',label = 'Price >=750k', aspect = 1.5)
plt.legend(prop = {'size':15})
ax.set_title('Houses by Price in King County, WA', fontdict ={'fontsize': 20})
ax.set_ylabel('Latitude',fontdict = {'fontsize': 20})
ax.set_xlabel('Longitude',fontdict = {'fontsize': 20})

Houses priced above $750,000. Graphic by Author
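
The two maps use the same boolean filter on the price column, so the same mask can also put a number on how lopsided the split is. A minimal sketch with invented prices (the real counts come from running this against the full housing data, which is not reproduced here):

```python
import pandas as pd

THRESHOLD = 750_000  # same cut-off used for the two maps

# Invented prices for illustration only.
prices = pd.DataFrame({"price": [221900, 538000, 1225000, 749999, 750000]})

is_expensive = prices["price"] >= THRESHOLD   # True for houses at or above the bar
n_expensive = int(is_expensive.sum())         # number of True values
share_expensive = is_expensive.mean()         # fraction of True values

print(n_expensive, len(prices) - n_expensive)
print(f"{share_expensive:.0%} of houses are at or above the threshold")
```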

There is a big difference in terms of both location and quantity. But that is not the end: we can also layer them one on top of the other. We will plot the expensive houses on top of the cheap ones because they are scarcer.

fig, ax = plt.subplots(figsize = (15,25))
kings_county.plot(ax=ax, alpha = 0.8, color = 'black')
geo_df[geo_df['price'] < 750000].plot(ax = ax , markersize = 1,color = 'red',marker = 's',label = 'Price <750k = Red', aspect = 1.5)
geo_df[geo_df['price'] >= 750000].plot(ax = ax , markersize = 1,color = 'yellow',marker = 'v',label = 'Price>= 750k = Yellow',aspect = 1.5)
plt.legend(prop = {'size':12})
ax.set_title('Houses by Price in King County, WA', fontdict ={'fontsize': 20})
ax.set_ylabel('Latitude',fontdict = {'fontsize': 20})
ax.set_xlabel('Longitude',fontdict = {'fontsize': 20})

Side by side comparison. Graphic by Author

The picture painted by this map is interesting. There is a plethora of housing in King County that falls below the bar we’ve set. Most of the houses on the lower end of the price scale fall farther inland than the more expensive ones.

If you zoom in, the more expensive houses dot the waterside. They also are more centrally located around the Seattle city center. There are several physical outliers but the trend is clear.

Overall, the visualization has done its job. We have made several observations from the houses on the map. Pricier houses are clustered around the downtown area and spread along Puget Sound. They are also a minority in the data, which could be telling for predicting housing prices. The houses on the cheaper side are much more numerous and more widely scattered. This will be useful for further EDA.

If you want to connect to talk more about this technique, you can find me on LinkedIn. If you would like to check out the code, take a look at my Github.

Sources

King County Dataset
King County Shape File
GeoPandas
Shapely
Descartes
Fiona

Translated from: https://towardsdatascience.com/using-geopandas-for-spatial-visualization-21e78984dc37
