Introduction

This blog post summarizes the results of the Capstone Project in the IBM Data Science Specialization on Coursera. In this project, the districts of Frankfurt am Main, Germany, are clustered according to their venue data using the K-means clustering algorithm. The first section describes the business problem we are addressing. We then look at the data that can be used to solve the problem and the methodology for finding a solution.

Business Problem

A client is interested in opening a franchise of their Asian restaurant chain in the city of Frankfurt am Main, preferably close to the city center. It will be their first restaurant in the city, and they want us to find out which neighborhood or district would be the best place to open it. Additionally, the results of the clustering algorithm can be used by anyone who is interested in moving to Frankfurt and wants to know which cuisines are available in the various districts.

Data

The following datasets have been used in this project:

  1. Street Directory of the city of Frankfurt am Main: https://offenedaten.frankfurt.de/dataset/strassenverzeichnis-der-stadt-frankfurt-am-main
  2. Foursquare API, used to get the most common venues in the Frankfurt districts.
  3. Demographics of Frankfurt am Main neighborhoods: https://offenedaten.frankfurt.de/dataset/stadtteilprofile-bevoelkerung
  4. Election Atlas 2015, GeoJSON of Frankfurt neighborhoods: https://offenedaten.frankfurt.de/dataset/wahlatlas-2015-geodaten/resource/84dff094-ab75-431f-8c64-39606672f1da

Data Gathering and Cleaning

We will analyze the districts of the city of Frankfurt am Main in this project. The datasets are available as CSV files, which can be loaded into a pandas dataframe using the pd.read_csv function.
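As a minimal sketch of this loading step, the snippet below reads a small in-memory CSV with pd.read_csv. The sample rows, the column names, and the semicolon delimiter are assumptions made for illustration; the actual files from the Frankfurt open data portal may use a different layout and encoding.

```python
import io

import pandas as pd

# Hypothetical sample standing in for a downloaded CSV file; the real
# column names and delimiter may differ.
sample_csv = io.StringIO(
    "street;district;postcode\n"
    "Zeil;Innenstadt;60313\n"
    "Leipziger Straße;Bockenheim;60487\n"
)

# dtype=str keeps postcodes as text so they are not parsed as numbers.
streets = pd.read_csv(sample_csv, sep=";", dtype=str)
print(streets.shape)
```

For a real file, the StringIO object would be replaced by the path or URL of the CSV.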

Data 1: Street directory of Frankfurt am Main:


This dataset is used to extract the district names and postcodes in Frankfurt. It is available as a CSV file and can be accessed via the link given above. Frankfurt has 46 city districts. Because it is a street directory, the raw dataset is much larger than needed, with 4540 rows and 15 columns. It was therefore cleaned and reduced to only the district names and postcodes. The resulting dataset contains 46 rows (one for each district) and 3 columns.
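The reduction from street-level rows to one row per district could look roughly like the sketch below. The column names and sample rows are hypothetical, and keeping the first postcode seen for each district is an assumption of this sketch, not necessarily what was done in the project.

```python
import pandas as pd

# Hypothetical street-level rows; the real column names may differ.
streets = pd.DataFrame({
    "street": ["Zeil", "Hasengasse", "Leipziger Straße", "Adalbertstraße"],
    "district": ["Innenstadt", "Innenstadt", "Bockenheim", "Bockenheim"],
    "postcode": ["60313", "60311", "60487", "60486"],
})

# Keep only the needed columns and one row per district. The first
# postcode encountered is used as the representative one (assumption).
districts = (
    streets[["district", "postcode"]]
    .drop_duplicates(subset="district")
    .sort_values("district")
    .reset_index(drop=True)
)
print(districts)
```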

Data 2: Foursquare API:

The geographical coordinates of the districts are used as input for the Foursquare API, which returns the most common venues for each district. Foursquare returns a JSON response, from which the required data needs to be extracted. We only extract the venue name, category, and geographical coordinates for each venue. These are then stored in a separate dataframe for use in clustering.
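A sketch of this extraction step is shown below. It assumes the nested layout of the legacy Foursquare "explore" response (response, groups, items, venue); that layout, the function name extract_venues, and the hand-written sample payload are assumptions for illustration, not output from the real API.

```python
def extract_venues(district, payload):
    """Return one dict per venue with name, category, and coordinates."""
    rows = []
    groups = payload.get("response", {}).get("groups", [])
    for group in groups:
        for item in group.get("items", []):
            venue = item["venue"]
            categories = venue.get("categories", [])
            rows.append({
                "district": district,
                "venue": venue["name"],
                # Use the first listed category, if there is one.
                "category": categories[0]["name"] if categories else None,
                "lat": venue["location"]["lat"],
                "lng": venue["location"]["lng"],
            })
    return rows


# Hand-written payload for illustration only, not real API output.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Noodle Bar",
               "categories": [{"name": "Asian Restaurant"}],
               "location": {"lat": 50.12, "lng": 8.65}}},
]}]}}

venues = extract_venues("Bockenheim", sample)
print(venues)
```

The resulting list of dicts can be passed directly to pd.DataFrame to build the venue dataframe.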

Data 3: Frankfurt Demographics:


This dataset contains the district-wise distribution of population for the city of Frankfurt. It also contains useful data about the percentage of foreigners and, specifically, the population of various ethnicities in the districts. It contains 46 rows (one for each district) and 164 columns, so it needs to be reduced before analysis. Only the required columns were kept, such as the total population of each district and the population of foreigners. Moreover, the column names are in German, so they were translated into English for easier understanding.
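Selecting and renaming columns might be sketched as follows. The German column names, the English names, and the placeholder numbers are all hypothetical; the real dataset's headers are not reproduced here.

```python
import pandas as pd

# Placeholder rows with hypothetical German column names.
raw = pd.DataFrame({
    "Stadtteil": ["Bockenheim", "Gallus"],
    "Einwohner insgesamt": [100, 200],
    "Ausländer": [30, 80],
    "Fläche": [1.0, 2.0],
})

# Map each column to keep onto an English name; all others are dropped.
column_map = {
    "Stadtteil": "district",
    "Einwohner insgesamt": "population_total",
    "Ausländer": "population_foreign",
}
demographics = raw[list(column_map)].rename(columns=column_map)
print(demographics.columns.tolist())
```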

Data 4: Frankfurt neighborhoods GeoJSON:


The GeoJSON file is required for plotting the Choropleth maps used to analyze the demographics of the Frankfurt districts. The district names in this file must match the district names in the dataset that is to be plotted. After checking, it was found that the districts of Bahnhofsviertel and Gutleutviertel are combined into a single district in the GeoJSON file. Thus, the two corresponding rows were merged in the demographics dataset. There was also an issue with German letters containing umlauts, i.e. ü, ä, ö. Districts whose names contain these letters were therefore renamed to match the spelling used in the GeoJSON file.
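One way to merge the two rows and align the spellings is sketched below. The combined district name, the umlaut example, and the numbers are assumptions for illustration; the actual names have to be read from the GeoJSON file. Note that summing is only appropriate for count columns, not for percentages.

```python
import pandas as pd

# Placeholder demographics rows with hypothetical spellings.
demographics = pd.DataFrame({
    "district": ["Bahnhofsviertel", "Gutleutviertel", "Roedelheim"],
    "population_total": [10, 20, 30],
})

# Hypothetical target names as they might appear in the GeoJSON file.
rename_map = {
    "Bahnhofsviertel": "Gutleut-/Bahnhofsviertel",
    "Gutleutviertel": "Gutleut-/Bahnhofsviertel",
    "Roedelheim": "Rödelheim",
}
demographics["district"] = demographics["district"].replace(rename_map)

# Rows that now share a name are merged by summing their counts.
merged = demographics.groupby("district", as_index=False).sum()

# Any name left in this set would not be drawn on the map.
geojson_names = {"Gutleut-/Bahnhofsviertel", "Rödelheim"}
unmatched = set(merged["district"]) - geojson_names
print(merged, unmatched)
```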

Methodology

Analytical Approach


We first use k-means clustering to cluster the districts of Frankfurt. Frankfurt has 46 districts. We use a geocoder to get the geographical coordinates of each district, then use the Foursquare API to explore the districts by their coordinates and get the most common venues in each one. Based on this information, we cluster the districts using k-means and take a look at each cluster. We look for clusters with a greater number of Asian and similar cuisine restaurants, as that suggests there is demand for Asian cuisine in that cluster.

Then we use the demographics data to find the districts with a larger population and compare that with the cluster data. We look for districts that have more Asian restaurants as well as a sizeable Asian population, as these will be ideal for opening a new Asian restaurant. Additionally, we look at nearby districts with fewer Asian restaurants but a sizeable Asian population, as these are also good prospects due to less competition in the area.


The street directory dataset is cleaned and sliced to ultimately obtain just a list of the districts in Frankfurt am Main along with their postal codes.

We require the geographical coordinates of the districts in order to plot them on a map using Folium. These are not readily available in the dataset. We obtain the latitude and longitude for each district using geopy, a Python client for several popular geocoding web services.

Geopy makes it easy for Python developers to locate the coordinates of addresses, cities, countries, and landmarks across the globe using third-party geocoders and other data sources.
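A possible shape for this geocoding step is sketched below. The post does not say which geocoding service was used, so the choice of Nominatim, the user agent string, and both function names are assumptions. The lookup function takes the geocoder as a parameter so that it can be tried without network access.

```python
def add_coordinates(districts, geocode, city="Frankfurt am Main, Germany"):
    """Look up each district and return {district: (lat, lon) or None}."""
    coordinates = {}
    for name in districts:
        location = geocode(f"{name}, {city}")
        # geopy geocoders return None when nothing is found.
        coordinates[name] = (
            (location.latitude, location.longitude) if location else None
        )
    return coordinates


def make_nominatim_geocoder(user_agent="frankfurt-districts-example"):
    """Build a rate-limited Nominatim geocoder (needs geopy and network)."""
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent=user_agent)
    # Wait at least one second between requests to be polite to the service.
    return RateLimiter(geolocator.geocode, min_delay_seconds=1)
```

With geopy installed, a call such as add_coordinates(["Bockenheim", "Gallus"], make_nominatim_geocoder()) would return the coordinates for those districts.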

Map of districts in Frankfurt am Main plotted using Folium

Next, the top 100 venues are fetched for each postal code by calling the Foursquare API. The Foursquare API offers location data from all over the world for business purposes as well as for developers; only a free developer account is needed. The required format of the URL for an API call to the Foursquare API is displayed below.

Python code for making a call to the Foursquare API
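Since the original code is not reproduced in this text, the following is only a sketch of how such a request URL could be assembled. It assumes the legacy v2 "venues/explore" endpoint and its client_id, client_secret, v, ll, radius, and limit parameters; the function name, the version date, the radius, and the credentials are placeholders, and Foursquare's current API may differ.

```python
from urllib.parse import urlencode


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Assemble a request URL for the (assumed) legacy explore endpoint."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # API version date (placeholder)
        "ll": f"{lat},{lng}",    # latitude,longitude of the district
        "radius": radius,        # search radius in meters (placeholder)
        "limit": limit,          # at most 100 venues per call
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


url = build_explore_url(50.1109, 8.6821, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(url)
```

The response could then be fetched with an HTTP client, for example requests.get(url).json().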

The received venues are stored in a new dataframe. We then check the number of unique venue categories present in the data returned by Foursquare. It turns out there are 188 unique venue categories in Frankfurt.

Next, we need to prepare the data for the K-means clustering algorithm. It cannot work directly with textual data, more commonly known as categorical data, so we encode the venue categories using one-hot encoding. The encoded data is then grouped by district name in order to have one row per district. During grouping, we take the mean of each one-hot encoded column, which gives the frequency of occurrence of each venue category within a district. This keeps all values on the same scale, between 0 and 1.
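The encoding and grouping described above can be sketched with pandas as follows. The sample venues and column names are made up for illustration.

```python
import pandas as pd

# Made-up venue rows: one row per venue returned for a district.
venues = pd.DataFrame({
    "district": ["Bockenheim", "Bockenheim", "Gallus", "Gallus"],
    "category": ["Asian Restaurant", "Café", "Hotel", "Hotel"],
})

# One-hot encode the venue category: one 0/1 column per category.
onehot = pd.get_dummies(venues["category"], dtype=float)
onehot.insert(0, "district", venues["district"])

# The mean of each 0/1 column per district is the share of that
# district's venues that fall into the category.
grouped = onehot.groupby("district").mean().reset_index()
print(grouped)
```

Each row of the grouped dataframe sums to 1 across the category columns, so districts with many venues and districts with few are directly comparable.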

In order to get more insights into the data, the top 10 most common venues for each district are obtained and a separate dataframe is created to store these.


Dataframe containing top 10 most common venues for each district
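A compact way to build such a table is sketched below. The frequencies are made up, the helper name top_categories is hypothetical, and only the top 2 categories are taken here because the toy data has three categories; the post uses the top 10.

```python
import pandas as pd

# Made-up category frequencies per district.
grouped = pd.DataFrame({
    "district": ["Bockenheim", "Gallus"],
    "Asian Restaurant": [0.5, 0.1],
    "Café": [0.3, 0.2],
    "Hotel": [0.2, 0.7],
})


def top_categories(row, n):
    """Return the n category names with the highest frequency."""
    return row.sort_values(ascending=False).index[:n].tolist()


top_n = 2  # the post uses 10
freq = grouped.set_index("district")
rows = {d: top_categories(freq.loc[d], top_n) for d in freq.index}
columns = [f"most_common_{i + 1}" for i in range(top_n)]

top_venues = pd.DataFrame.from_dict(rows, orient="index", columns=columns)
top_venues.index.name = "district"
top_venues = top_venues.reset_index()
print(top_venues)
```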

Clustering using K-means

The one-hot encoded and grouped data is the input to the K-means algorithm, and the number of clusters is set to five. We use the scikit-learn library for the K-means algorithm. The district column is dropped because it is textual data and we need to cluster using only the encoded values. The resulting cluster labels are then added to the dataframe containing the ten most common venues for each district.

Python code for K-means clustering

Dataframe containing the cluster labels along with the top 10 venues for each district
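The clustering step can be sketched with scikit-learn's KMeans as shown below. The four toy districts and their frequencies are made up, and two clusters are used here only because the toy data is so small; the post sets the number of clusters to five. The random_state and n_init values are assumptions chosen for reproducibility.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Made-up category frequencies for four toy districts.
grouped = pd.DataFrame({
    "district": ["A", "B", "C", "D"],
    "Asian Restaurant": [0.6, 0.5, 0.0, 0.1],
    "Hotel": [0.1, 0.0, 0.8, 0.7],
    "Café": [0.3, 0.5, 0.2, 0.2],
})

# Drop the textual district column; cluster on the numeric columns only.
features = grouped.drop(columns="district")
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)

# Attach the cluster label of each row back to its district name.
result = grouped[["district"]].copy()
result["cluster"] = kmeans.labels_
print(result)
```

The label numbers themselves are arbitrary; only which districts share a label is meaningful.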

The dataframe containing the cluster labels and top venues, as seen in the image above, is then merged with the dataframe containing the latitude and longitude of each district. This data is then used to visualize the clusters on a map using Folium.
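A sketch of how the clustered districts might be drawn with Folium follows. The color palette, the map center (approximate coordinates of Frankfurt), the zoom level, and both function names are assumptions; the mapping of cluster labels to the purple and light green colors mentioned in the post is not known. Folium is imported inside the map function so that the color helper can be used on its own.

```python
def cluster_color(label,
                  palette=("purple", "lightgreen", "red", "blue", "orange")):
    """Pick a marker color for a cluster label (palette is an assumption)."""
    return palette[label % len(palette)]


def build_cluster_map(rows, center=(50.1109, 8.6821), zoom_start=11):
    """Draw one circle marker per (district, lat, lon, cluster) tuple."""
    import folium  # imported here so cluster_color works without folium

    fmap = folium.Map(location=list(center), zoom_start=zoom_start)
    for district, lat, lon, label in rows:
        folium.CircleMarker(
            location=[lat, lon],
            radius=6,
            popup=f"{district} (cluster {label})",
            color=cluster_color(label),
            fill=True,
            fill_opacity=0.7,
        ).add_to(fmap)
    return fmap
```

The returned map object can be displayed in a notebook or written to an HTML file with its save method.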

Map of clustered districts — Frankfurt am Main

We then look at each cluster and, based on its most common venues, name it and decide which cluster is suitable for opening a new Asian restaurant.

Observations

We observe that the purple and light green clusters contain the most districts and the largest number of venues. While the light green cluster contains more restaurants, the purple cluster contains more hotels, which indicates the presence of tourists. A variety of cuisines is offered in the light green cluster, indicating that its restaurants cater to a variety of customers. Most of its districts are located close to the city center. These factors make this cluster the most suitable for opening a new Asian restaurant.

The purple cluster, on the other hand, does not contain many restaurants but has a lot of hotels and is fairly close to the city center. The presence of hotels indicates an influx of tourists, some of them Asian, which means more prospective customers. If one finds a location not too far from the city center, an Asian restaurant here could flourish.

To know which district specifically would be perfect for opening an Asian restaurant, we look at the district-wise demographics of Frankfurt am Main, and then explore districts from both the light green and purple clusters.


Data Exploration: Frankfurt Demographics

The demographics dataset contains the district-wise distribution of population for the city of Frankfurt. It also contains useful data about the percentage of foreigners and, specifically, the population of various ethnicities in the districts. Only the required columns were kept, such as the total population of each district and the population of foreigners. This dataset was then merged with the dataset containing the latitude and longitude of the districts. The resulting dataset is shown below.

Frankfurt demographics data overview

Data Visualization using Choropleth Maps

The data from the demographics dataset is then plotted on a Choropleth map to visualize the population distribution across the city of Frankfurt. Together with the earlier clustering results, this data is used to select districts to explore further.
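A Choropleth layer of this kind could be built with Folium roughly as sketched below. The GeoJSON property that holds the district name ("name"), the dataframe column names, the color scale, and both function names are assumptions; they have to be adapted to the actual GeoJSON file and demographics dataframe. The small helper lists the names found in the GeoJSON so they can be compared with the names in the data before plotting.

```python
def geojson_district_names(geojson, name_property="name"):
    """Collect the district names stored in a GeoJSON FeatureCollection."""
    return {f["properties"][name_property] for f in geojson["features"]}


def build_population_choropleth(geojson, demographics, name_property="name"):
    """Shade each district by total population (needs folium)."""
    import folium  # imported here so the helper above works without folium

    fmap = folium.Map(location=[50.1109, 8.6821], zoom_start=11)
    folium.Choropleth(
        geo_data=geojson,
        data=demographics,  # dataframe with the two columns named below
        columns=["district", "population_total"],
        key_on=f"feature.properties.{name_property}",
        fill_color="YlOrRd",
        fill_opacity=0.7,
        line_opacity=0.3,
        legend_name="Total population",
    ).add_to(fmap)
    return fmap
```

Districts whose names do not appear in both the GeoJSON and the dataframe are not shaded, which is why the names have to match exactly.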

District-wise population distribution — Frankfurt am Main

From this map, we observe that the central districts have the highest populations in Frankfurt, along with the district of Flughafen on the outskirts.


Next, we take a look at the distribution of Asian and Australian population in Frankfurt.


District-wise distribution of Asian and Australian population — Frankfurt am Main

We can see from the above maps that the districts of Bockenheim and Gallus have the highest populations of Asians and Australians. Of these, Bockenheim falls in the light green cluster and Gallus in the purple cluster. These two districts, together with Niederrad, are then explored to find out the number of Asian or similar cuisine restaurants in each.
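Counting such restaurants per district could be sketched as a keyword filter on the venue category. The keyword list below is an assumption about what counts as "Asian or similar cuisine", and the venue rows are made up.

```python
import pandas as pd

# Made-up venue rows for two districts.
venues = pd.DataFrame({
    "district": ["Bockenheim", "Bockenheim", "Gallus", "Gallus"],
    "venue": ["Example A", "Example B", "Example C", "Example D"],
    "category": ["Thai Restaurant", "Café", "Sushi Restaurant", "Hotel"],
})

# Assumed keywords for Asian or similar cuisine categories.
keywords = ["Asian", "Chinese", "Japanese", "Sushi", "Thai",
            "Vietnamese", "Korean", "Indian"]
pattern = "|".join(keywords)

# Keep venues whose category mentions any keyword, then count per district.
is_asian = venues["category"].str.contains(pattern, case=False, regex=True)
counts = venues[is_asian].groupby("district").size()
print(counts)
```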

1. Bockenheim

Asian or similar cuisine restaurants in Bockenheim

2. Gallus

Asian or similar cuisine restaurants in Gallus

3. Niederrad

Asian or similar cuisine restaurants in Niederrad

Results and Discussion

By clustering the districts of Frankfurt, analyzing the district-wise demographics of the city, and then combining the two findings, we arrive at 3 prospective neighborhoods that would be ideal for opening an Asian restaurant in the city.

1. Bockenheim:

Bockenheim falls in the light green cluster and is very close to the city center. It has 7 Asian restaurants, which suggests that there is a lot of demand for Asian cuisine in the area. It also has the highest Asian population in the city, at 1586.

2. Gallus:

Gallus is in the purple cluster, which contains a greater number of hotels. It is not far from the city center and has 5 Asian restaurants, indicating that there is demand here as well. It has the second-highest Asian population in the city, at 1512. Hence, it seems like a better option than Bockenheim for opening an Asian restaurant, owing to less competition, a similar Asian population, and more prospective customers in the form of tourists.

3. Niederrad:

Niederrad is also in the purple cluster, which has more hotels. It is also not far from the city center but has only 1 Asian restaurant, far fewer than both Bockenheim and Gallus. Niederrad also has a sizeable Asian population of 929, although somewhat smaller than in the other 2 districts in contention. Since it is in the purple cluster, we can expect more tourists in this district; there are 3 hotels in the area, which translates to more prospective customers. Hence, it seems like a good alternative to Gallus, owing to much less competition, proximity to the city center, and more tourists.

Conclusion

The neighborhoods in Frankfurt am Main were clustered and the results displayed on a map. The demographics were then studied and, based on the findings, 3 districts were found to be well suited to the business problem of opening an Asian restaurant. The client can choose any of the 3 neighborhoods based on their preferences, confidence, and appetite for risk.

Translated from: https://medium.com/swlh/clustering-neighborhoods-in-frankfurt-am-main-using-k-means-bb805545fd00

