
by Zhen Liu


How a Bubble Plot Reveals the Best Cities to Live in the US

In this article, I’ll show you some exciting facts about American cities, the value of bubble plots in deciding which city to live in, and how to create those plots.


Are you thinking about investing in real estate in 2018? Moving to a new city? When making these decisions, you need to weigh different factors such as the unemployment rate, housing prices, the size of the city, safety, and so on. Even with all that data and four corresponding bar charts, you'll still be clueless staring at the table. You'll try to find the best candidates, but the factors tell different stories… Sounds like a complex problem.

So, is there a way to visualize all these factors in one chart and compare them ALL? Yes, we can use a bubble plot!

What's a bubble plot?

A bubble plot is a type of chart that displays more than two dimensions of data (compared to traditional scatter plots). In addition to plotting a dot on an X-Y plane, it uses the size, color, or shape of the point to display more dimensions.


Here, the unemployment rate is on the X-axis, the median home price is on the Y-axis, and the population of each city sets the size of its dot, which gives us a third dimension. Color is randomly assigned to each city.
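As a rough illustration of that mapping, here is a minimal ggplot2 sketch. The data frame, its column names, and every value in it are made up for this example and are not taken from the dataset used in this article.

```r
library(ggplot2)

# Hypothetical toy values for illustration only; not the article's data.
cities <- data.frame(
  city = c("A", "B", "C", "D"),
  unemployment_rate = c(3.0, 4.5, 6.0, 3.5),        # percent
  home_price = c(200000, 150000, 600000, 900000),   # US dollars
  population = c(650000, 450000, 2000000, 880000)
)

# x and y place each dot; size carries the third dimension (population).
p <- ggplot(cities, aes(x = unemployment_rate, y = home_price / 1000)) +
  geom_point(aes(size = population, color = city), alpha = 0.6) +
  # scale_size_area makes the bubble area proportional to the value
  scale_size_area(max_size = 20) +
  geom_text(aes(label = city)) +
  labs(x = "Unemployment rate (%)", y = "Median home price (thousand USD)",
       size = "Population", color = "City")

print(p)
```

Using scale_size_area rather than the default size scale keeps a city with twice the population at twice the bubble area, which is usually easier to read.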


The best city in the US to live in is… (wait for it)

Winner: Nashville!


Other recommendations: Austin, Omaha, Milwaukee, Dallas, Minneapolis, Denver and Aurora.


These cities sit in the lower-left corner of the plot, which means they have low unemployment (and therefore a higher chance of finding a job) and low home prices. What does that mean?

It means you can make your choices based on this plot.


For example, if you consider the unemployment rate to be more important and don't mind higher home prices, then Honolulu, Oakland, Boston, and San Diego are strong candidates.


What about adding safety as another factor?

Sure. Let's add safety as a fourth factor (the other three factors are still home price, unemployment rate, and population). Instead of randomly assigning a color to each city, we map color to a crime scale (crime rate per 100,000 people). Red means more crime and blue means less.
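A minimal sketch of this fourth dimension, again on made-up values: the crime column and the blue-to-red endpoints are assumptions chosen to match the description here, not the exact colors of the original chart.

```r
library(ggplot2)

# Hypothetical toy values for illustration only; not the article's data.
cities <- data.frame(
  city = c("A", "B", "C", "D"),
  unemployment_rate = c(3.0, 4.5, 6.0, 3.5),        # percent
  home_price = c(200000, 150000, 600000, 900000),   # US dollars
  population = c(650000, 450000, 2000000, 880000),
  crime = c(400, 900, 500, 300)                     # crimes per 100,000 people
)

# shape = 21 is a filled circle with a border, so `fill` can carry crime
# while `size` still carries population.
p_crime <- ggplot(cities, aes(x = unemployment_rate, y = home_price / 1000)) +
  geom_point(aes(size = population, fill = crime), shape = 21) +
  scale_fill_gradient(low = "blue", high = "red") +
  scale_size_area(max_size = 20) +
  labs(x = "Unemployment rate (%)", y = "Median home price (thousand USD)",
       size = "Population", fill = "Crime per 100,000")

print(p_crime)
```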


Does the result change?

It does! If safety is very important to you, then Milwaukee might not be such a great choice among the previous recommendations (even though it's in the lower-left corner of the graph).

Now you see the power of a bubble plot: the ability to show multiple factors in one 2-D plot. If you only have bar charts for those factors, it's hard to identify the cities with an ideal combination of factors. The bubble plot basically creates a "visual objective function" that lets you optimize a multi-variable decision-making problem.
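To make the "objective function" idea concrete, one possible way to write it down is a weighted score over rescaled factors. This is only a sketch of the intuition, not something the charts in this article compute: the weights, the rescaling, and the toy values are all assumptions.

```r
# Hypothetical toy values for illustration only; not the article's data.
cities <- data.frame(
  city = c("A", "B", "C", "D"),
  unemployment_rate = c(3.0, 4.5, 6.0, 3.5),        # percent
  home_price = c(200000, 150000, 600000, 900000),   # US dollars
  crime = c(400, 900, 500, 300)                     # crimes per 100,000 people
)

# Rescale a numeric vector to the 0-1 range (0 = lowest value, 1 = highest).
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Assumed weights: how much each factor matters to you (they sum to 1).
weights <- c(unemployment_rate = 0.4, home_price = 0.4, crime = 0.2)

# Lower is better for all three factors, so a lower score means a better fit.
cities$score <-
  weights[["unemployment_rate"]] * rescale01(cities$unemployment_rate) +
  weights[["home_price"]] * rescale01(cities$home_price) +
  weights[["crime"]] * rescale01(cities$crime)

# Rank the cities from best (lowest score) to worst.
ranked <- cities[order(cities$score), c("city", "score")]
print(ranked)
```

Changing the weights plays the same role as deciding which axis of the plot you care about more: a reader who cares mostly about jobs would raise the unemployment weight and watch the ranking shift.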


How do unemployment rate and home price change over time?

We can create an interactive motion chart to add time as a dimension (2013 to 2017) to see how the factors change for these cities over time.


To avoid visual overload, I left out the crime data and used different colors to highlight a few selected cities.
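The motion chart relies on the Flash player, which current browsers no longer run, so a static view can be a useful fallback. The sketch below swaps in a different technique: it draws each city's path through the unemployment/price plane with ggplot2 instead of animating it. The data frame and its values are made up for illustration.

```r
library(ggplot2)

# Hypothetical toy values for illustration only; not the article's data.
trend <- data.frame(
  city = rep(c("A", "B"), each = 3),
  year = rep(c(2013, 2015, 2017), times = 2),
  unemployment_rate = c(7.0, 5.5, 4.0, 6.0, 4.5, 3.5),            # percent
  home_price = c(300000, 380000, 450000, 150000, 170000, 200000)  # US dollars
)

# geom_path connects points in row order, so sort by city and then year.
trend <- trend[order(trend$city, trend$year), ]

p_trend <- ggplot(trend, aes(x = unemployment_rate, y = home_price / 1000,
                             color = city, group = city)) +
  geom_path() +
  geom_point(size = 3) +
  # Label each point with its year so the direction of travel is readable
  geom_text(aes(label = year), vjust = -1, show.legend = FALSE) +
  labs(x = "Unemployment rate (%)", y = "Median home price (thousand USD)",
       color = "City")

print(p_trend)
```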


The good news is that the unemployment rate decreased significantly for almost all cities (moving from right to left). The bad news is that housing prices are going up pretty fast (especially in San Francisco, San Jose, Los Angeles, New York, and Seattle).

Want to create the charts yourself? Here is my code for the bubble plots and the motion chart in R. Have fun playing with the plots :)


################ Bubble Plot ################
library(data.table)
library(ggplot2)
library(ggrepel)

bubble_data <- fread("https://raw.githubusercontent.com/zhendata/Medium_Posts/c007346db1575aca391a6623c87bb5a31a60b365/bubble_plot_merged_city_data.csv", sep = ",")

bubble_plot <- ggplot(bubble_data,
                      aes(x = Unemployment_Rate, y = Home_Price / 1000)) +
  # Create the bubbles by mapping size to a variable
  geom_point(aes(size = Population, fill = Total_Crime), shape = 21) +
  # Select the bubble color scale and the maximum bubble size
  scale_fill_continuous(low = "#33FFFF", high = "#FF6699") +
  scale_size_area(max_size = 20) +
  # Use geom_text_repel to repel the labels away from each other
  geom_text_repel(aes(label = City), nudge_x = 0, nudge_y = 0.75, size = 6) +
  # Use a white background instead of the default grey one
  theme_bw() +
  # Style the title and axes
  ggtitle("Best Cities in US to Live in") +
  labs(x = "Unemployment Rate%", y = "Home Price",
       size = "Population", fill = "Crime") +
  theme(plot.title = element_text(size = 25, hjust = 0.5),
        axis.title = element_text(size = 20, face = "bold"),
        axis.text = element_text(size = 15)) +
  # Make the y-axis more readable by replacing scientific notation with "K"
  scale_y_continuous(name = "Home Price", breaks = seq(0, 1500, by = 250),
                     labels = c("0", "250K", "500K", "750K", "1000K",
                                "1250K", "1500K"))

# Draw the plot
print(bubble_plot)

################# Motion Chart #################
library(data.table)
library(googleVis)

motion_data <- fread("https://raw.githubusercontent.com/zhendata/Medium_Posts/c007346db1575aca391a6623c87bb5a31a60b365/motion_chart_merged_city_data.csv", sep = ",")

motion_chart <- gvisMotionChart(motion_data, idvar = "City", timevar = "Year",
                                xvar = "Unemployment Rate", yvar = "Home Price",
                                sizevar = "Population")

# R automatically opens a tab in the browser for you
# The Flash player needs to be enabled in the browser
plot(motion_chart)

######### Data #########
# The datasets I used are from Zillow (median home prices), the FBI's UCR
# program (crime), census.gov (population), and the Bureau of Labor Statistics
# (unemployment). I did some data cleaning and joining to get the format I
# needed for this article. The merged files are:
# bubble_plot_merged_city_data.csv, motion_chart_merged_city_data.csv

