Visualizing the Philippines' Population Density Using GeoPandas

GeoVisualization / Philippines

Population density is a crucial concept in urban planning. Theories on how it affects economic growth are divided. Some claim, as Rappaport does, that an economy is a form of “spatial equilibrium”: that net flows of residents and employment gradually move to be balanced with one another.

That density is related to economic growth has long been established by multiple studies. But whether the same theory holds for the Philippines, and which comes first (whether density follows urban development or urban development follows density), is a classic data science problem.

Before we can test out any models, however, let’s do a fun exercise and visualize our dataset.

The 2015 Philippines’ Population Dataset

The Philippine Statistics Authority (PSA) publishes population data every five years. At the time of writing, only the 2015 dataset has been published, so that is the one we will use.

Importing Packages

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors as colors  # to customize our colormap for the legend
import numpy as np
import seaborn as sns
sns.set(style="ticks", color_codes=True)
import geopandas as gpd
import descartes  # important for integrating Shapely geometry with the Matplotlib library
import mapclassify  # you will need this to implement a choropleth
import geoplot  # you will need this to implement a choropleth
%matplotlib inline

Many of the packages we will be using need to be installed first. For those having trouble installing GeoPandas, check out my article about this. Note that geoplot requires the cartopy package, which can be installed like any of the dependencies discussed in that article.

Loading Shapefiles

Shapefiles are needed to give “shape” to your geographical or political boundaries.

Download the shapefile and load it using GeoPandas.

An important note when extracting the zip package: all of its contents should stay in one folder, even though you will only be loading the “.shp” file; otherwise it won’t work. (This means that the “.cpg”, “.dbf”, “.prj”, and other files should be in the same location as your “.shp” file.)
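
The folder requirement above can be checked before loading anything. The following is a minimal sketch, not part of the original workflow: `missing_sidecars` is a hypothetical helper, and the “.shx” entry is an assumption added here because it is part of the standard shapefile format even though the note above does not name it.

```python
from pathlib import Path

# Sidecar extensions named in the note above; ".shx" is an assumption added
# here because it is part of the standard shapefile format.
REQUIRED_SIDECARS = (".shx", ".dbf", ".prj", ".cpg")


def missing_sidecars(shp_path, required=REQUIRED_SIDECARS):
    """Return the sidecar extensions that are absent next to a .shp file."""
    shp = Path(shp_path)
    # with_suffix swaps ".shp" for each sidecar extension in the same folder.
    return [ext for ext in required if not shp.with_suffix(ext).exists()]
```

Calling `missing_sidecars("Shapefiles/gadm36_PHL_shp/gadm36_PHL_0.shp")` should return an empty list when the zip package was extracted into a single folder.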

You can download the shapefile of the Philippines from gadm.org (https://gadm.org/).

Note: You can likewise download the shapefiles from PhilGIS (http://philgis.org/). It will probably be better for Philippine data, though some of it is sourced from GADM, but let’s go with GADM as I have more experience with it.

# The levels of administrative boundaries run from 0 to 3;
# the boundaries get more detailed as the level increases
country = gpd.GeoDataFrame.from_file("Shapefiles/gadm36_PHL_shp/gadm36_PHL_0.shp")
provinces = gpd.GeoDataFrame.from_file("Shapefiles/gadm36_PHL_shp/gadm36_PHL_1.shp")
cities = gpd.GeoDataFrame.from_file("Shapefiles/gadm36_PHL_shp/gadm36_PHL_2.shp")
barangay = gpd.GeoDataFrame.from_file("Shapefiles/gadm36_PHL_shp/gadm36_PHL_3.shp")

At this point, you can view the shapefiles and examine the boundaries. You can do this by plotting the shapefiles.

# The GeoDataFrame has a built-in plot method which we can use to view the shapefile
fig, axes = plt.subplots(2, 2, figsize=(10, 10))

# color here refers to the fill color of the graph, while
# edgecolor refers to the line colors (you can use words, hex values but not rgb and rgba)
country.plot(ax=axes[0][0], color='white', edgecolor='#2e3131')
provinces.plot(ax=axes[0][1], color='white', edgecolor='#2e3131')
cities.plot(ax=axes[1][0], color='white', edgecolor='#2e3131')
barangay.plot(ax=axes[1][1], color='white', edgecolor='#555555')

# Adm means administrative boundaries level - others refer to this as "political boundaries"
adm_lvl = ["Country Level", "Provincial Level", "City Level", "Barangay Level"]
i = 0
for ax in axes:
    for axx in ax:
        axx.set_title(adm_lvl[i])
        i = i + 1
        axx.spines['top'].set_visible(False)
        axx.spines['right'].set_visible(False)
        axx.spines['bottom'].set_visible(False)
        axx.spines['left'].set_visible(False)

Darker fills imply more boundaries

Load Population Density Data

Population data and density per square kilometer are usually collected by the Philippine Statistics Authority (PSA).

You can do this with other demographic or macroeconomic data, as the Philippines has been making progress in providing these. (Good job, Philippines!)

Because we want to amp up the challenge, let’s go with the most detailed one: the city and municipality level.

We first load the data and examine it:

df = pd.read_excel(r'data\2015 Population Density.xlsx',
                   header=1,
                   skipfooter=25,
                   usecols='A,B,D,E',
                   names=["City", 'Population', "landArea_sqkms", "Density_sqkms"])

Cleaning the Data

Before we can proceed, we have to clean our data. We should:

  • drop rows with empty values
  • remove non-alphabet characters after the names (* denoting footnotes)
  • remove the words “(capital)” and “excluding” after each city name
  • remove leading and trailing spaces
  • and many more…
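
The first four steps in the list can be sketched with the standard `re` module. This is only an illustration of the idea, not the code from the repository: `clean_city_name` is a hypothetical helper, and the exact patterns (for example, how an “excluding …” qualifier is written in the PSA sheet) are assumptions.

```python
import re


def clean_city_name(raw):
    """Normalize a raw place name; return None for empty values."""
    if raw is None or not str(raw).strip():
        return None
    name = str(raw)
    # Drop the "(Capital)" marker, in any letter case.
    name = re.sub(r"\(\s*capital\s*\)", "", name, flags=re.IGNORECASE)
    # Drop an "excluding ..." qualifier, with or without an opening parenthesis.
    name = re.sub(r"\(?\s*excluding\b.*$", "", name, flags=re.IGNORECASE)
    # Drop trailing footnote markers such as "*" (anything that is not a
    # letter, digit, closing parenthesis, or period).
    name = re.sub(r"[^\w).]+$", "", name)
    # Collapse repeated whitespace and trim both ends.
    return re.sub(r"\s+", " ", name).strip()
```

With pandas, one way to apply it would be `df = df.dropna(subset=["City"])` followed by `df["City"] = df["City"].map(clean_city_name)`; dropping missing values first matters because the helper does not treat pandas' `NaN` as empty.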

Cleaning really does take up the bulk of the work when merging data with shapefiles.

This is especially true for the Philippines, which has many cities and municipalities with similar names (e.g., San Isidro, San Juan, San Pedro).
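
Because the same name can appear in more than one province, matching on the city name alone can attach the wrong density to a boundary. A minimal sketch of one way around this, assuming a province name is available on both sides (the `df` loaded above only has a `City` column, so this is an assumption), is to match on a normalized (province, city) pair. `make_key` and `match_density` are hypothetical helpers, and the numbers in the usage note are placeholders, not PSA figures.

```python
def make_key(province, city):
    """Build a case- and spacing-insensitive (province, city) lookup key."""
    def norm(text):
        return " ".join(str(text).lower().split())
    return (norm(province), norm(city))


def match_density(boundary_rows, density_rows):
    """Attach a density to each boundary row; unmatched rows get None.

    boundary_rows: iterable of (province, city) pairs from the shapefile.
    density_rows: iterable of (province, city, density) from the data table.
    """
    lookup = {make_key(p, c): d for p, c, d in density_rows}
    return [(p, c, lookup.get(make_key(p, c))) for p, c in boundary_rows]
```

For example, `match_density([("Leyte", "San Isidro")], [("LEYTE", "San Isidro", 200.0)])` pairs the two rows despite the different letter case. Rows that come back with `None` are the ones that still need manual cleaning. In GeoPandas the same idea would be a `merge` on two key columns instead of one.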

Let’s skip this part in the article, but for those who would like to know how I did it, visit my GitHub repository. The code will apply to any PSA data at the municipality/city level.

Exploratory Data Analysis

One of my favorite ways to implement EDA is through a scatter plot. Let’s do it just to see, in chart form, which cities have high densities.

Matplotlib is workable, but I like the style of seaborn plots, so I prefer to use it more often.

# Since text annotations overlap for something like this,
# let's import a library that adjusts them automatically
from adjustText import adjust_text

fig, ax = plt.subplots(figsize=(10, 15))
scatter = sns.scatterplot(x=df.index, y=df.Density_sqkms, ax=ax)

# Labeling the top 20 data points (limiting so it won't get too cluttered)
# First sort the dataframe according to density from highest to lowest;
# the original index is kept so that labels line up with the plotted points
sorted_df = df.sort_values("Density_sqkms", ascending=False)[:20]

texts = [ax.text(p[0], p[1], "{},{}".format(sorted_df.City.loc[p[0]], round(p[1])),
                 size='large')
         for p in zip(sorted_df.index, sorted_df.Density_sqkms)]
adjust_text(texts, arrowprops=dict(arrowstyle="->", color='r', lw=1), precision=0.01)

Scatter plot of densities for the Philippines’ cities and municipalities. You can see that some cities are outliers in terms of density. Note that I used the adjustText library to make sure that the labels are legible.

With this chart, we can already see which cities are above the average of the “National Capital Region”, namely, Mandaluyong, Pasay, Caloocan, Navotas, Makati, Malabon, and Marikina.

Within the top 20 as well, we can see that most of these cities are located in the “National Capital Region” and nearby provinces such as Laguna. Notice as well how the city of Manila is an outlier in this dataset.

GeoPandas Visualization

The First Law of Geography, according to Waldo Tobler, is “everything is related to everything else, but near things are more related than distant things.”

This is why, in real estate, it is important to examine and visualize how proximity affects values. Ultimately, GeoVisualization is one of the ways we can do this.

We can already visualize our data using the built-in plot method of GeoPandas.

# merged_df is the boundary GeoDataFrame merged with the cleaned population data
# (the cleaning and merging steps are in the repository)
k = 1600  # I find that the more colors, the smoother the viz becomes as data points are spread across gradients
cmap = 'Blues'
figsize = (12, 12)
scheme = 'Quantiles'

ax = merged_df.plot(column='Density_sqkms', cmap=cmap, figsize=figsize,
                    scheme=scheme, k=k, legend=False)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)

# Adding a colorbar for legibility
# normalize color
vmin, vmax, vcenter = merged_df.Density_sqkms.min(), merged_df.Density_sqkms.max(), merged_df.Density_sqkms.mean()
divnorm = colors.TwoSlopeNorm(vmin=vmin, vcenter=vcenter, vmax=vmax)

# create a normalized colorbar
cbar = plt.cm.ScalarMappable(norm=divnorm, cmap=cmap)
# use the figure that owns the map axes, not the figure from an earlier plot
ax.figure.colorbar(cbar, ax=ax)
# plt.show()

Visualization using the built-in plot method of GeoPandas
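
The `scheme='Quantiles'` setting used in the plot code asks for classes that each hold roughly the same number of areas. The following is a rough standard-library sketch of that idea, not mapclassify's implementation: `quantile_classes` is a hypothetical helper, and its cut points may be interpolated slightly differently from the library's.

```python
from bisect import bisect_left
from statistics import quantiles


def quantile_classes(values, k):
    """Assign each value a class index from 0 to k-1 using k-quantile cut points."""
    cuts = quantiles(values, n=k)  # returns k - 1 cut points
    # A value falls into the first class whose upper cut point is >= the value.
    return [bisect_left(cuts, v) for v in values]
```

With a skewed variable such as density, equal-count classes keep a handful of extreme cities from using up most of the color range, which is why the map still shows variation outside the capital region.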

Some analysts prefer monotonic colormaps such as Blues or Greens, but when data is highly skewed (having many outliers), I find it is better to use diverging colormaps.

Diverging Colormaps to Visualize Data Dispersion

Using diverging colormaps, we can visualize the dispersion of density values. Even the colorbar legend alone shows that density values in the Philippines contain outliers on the high side.
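
To see why a centered normalization helps with skewed data, here is a simplified pure-Python sketch of the two-slope idea: values below the center are stretched over the lower half of the color scale and values above it over the upper half. It is an illustration only, not Matplotlib's `TwoSlopeNorm` code; `two_slope_normalize` is a hypothetical helper, and clamping out-of-range values is a choice made here.

```python
def two_slope_normalize(value, vmin, vcenter, vmax):
    """Map value to [0, 1] so that vcenter always lands on 0.5."""
    if not vmin < vcenter < vmax:
        raise ValueError("expected vmin < vcenter < vmax")
    # Clamp out-of-range values to the ends of the scale.
    value = min(max(value, vmin), vmax)
    if value <= vcenter:
        # Lower half: vmin..vcenter is spread over 0.0..0.5.
        return 0.5 * (value - vmin) / (vcenter - vmin)
    # Upper half: vcenter..vmax is spread over 0.5..1.0.
    return 0.5 + 0.5 * (value - vcenter) / (vmax - vcenter)
```

With placeholder bounds of 0, 100, and 10,000, a value of 50 maps to 0.25, even though it sits at only 0.5% of the full range. A plain linear scale would leave almost every area in nearly the same shade, with only the outliers standing out.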

Plotting using Geoplot

In addition to the built-in plot function of GeoPandas, you can plot this using geoplot.

k = 1600
cmap = 'Greens'
figsize = (12, 12)
scheme = 'Quantiles'

geoplot.choropleth(
    merged_df, hue=merged_df.Density_sqkms, scheme=scheme,
    cmap=cmap, figsize=figsize)

In the next part of this series, let’s try to plot this more interactively or use some machine learning algorithms to extract more insights.

For the full code, check out my GitHub repository.

The code to preprocess data at the municipality and city level applies to other PSA-reported statistics as well.

Let me know what dataset you would like us to try to visualize in the future.

Translated from: https://towardsdatascience.com/psvisualizing-the-philippines-population-density-using-geopandas-ab8190f52ed1
