Similar Neighbourhood Algorithm

IBM Data Science Capstone Project

Analyzing and visualizing the structure of New York City in relation to the requirements of a clothing store investor

Introduction

I used my semester break to build more data science skills by completing the IBM Data Science Professional Certificate on Coursera. The last module is a capstone project and the highlight of the course: applying the learned skills to an individual real-life problem gives the greatest learning effect. As you will see in the business problem section, I decided to add some complexity to the standard course assignment.

This article summarizes my project and briefly explains the methods used. If you are interested in the extended code, check out my Jupyter notebook.

Business Problem

The project is based on a hypothetical business case. A Canadian investor who recently made a fortune with an investment in a clothing store in Toronto wants to repeat the idea in New York City.

  1. As his brand is exclusive and expensive, the location should be one of the most crowded districts, with a high employment rate and above-average income. He does not only want tourists to buy in the store; he would also like to gain many regular customers.
  2. Because his brand has a touch of Italian design, he prefers a location close to Italian restaurants: people pass by and window-shop, and the chance that people who go out for Italian food also have a liking for Italian fashion is fairly high.
  3. Tourists and business travelers are well known for spending money generously, so being as close as possible to hotels is highly important: guests of the city hotels are more likely to buy clothes nearby, which means more walk-in customers.
  4. The store should be as close as possible to the city center or other tourist hotspots to benefit from walk-in customers, approximately 20 minutes' walking distance from the center of the district and, if possible, far away from other clothing stores.
  5. The investor wishes to invest in a flat in New York City near the store. Since he will live close to the store, he has the following criteria for his place of residence: low crime rate, high community trust, and proximity to parks, theatres and art galleries.

The investor first wants a macro overview of New York City, so we explore the community districts.

1. Business Problem Understanding

The task seems clear: find the perfect district for an Italian-brand clothing store, taking into account that the location should also match the investor's idea of the perfect place of residence, where he feels safe.

2. Analytical Approach

The core of the project is the socio-economic data frame. To complement it, we build a venues data frame fetched from Foursquare and explore these venues. The final venues frame contains the most common venues of each district, which we obtain through one-hot encoding. This data frame is the basis for the k-means algorithm, which clusters the districts by their features so that we can compare how similar the districts are.

[Figure: feature weight matrix]

For the best result, the analytical solution to the business problem is to quantify and evaluate the client's ideas so that his requirements are fulfilled completely. To evaluate his criteria, we create a feature weight matrix that expresses the investor's wishes numerically. We multiply it with the normalized final data frame and add an extra column with the weighted results, which indicates the best districts.

3. Data Requirements and Collection

To find the best location for the store, I decided to add some complexity to the standard course problem. As you can see from the criteria given by the investor, we need some more data.

At the beginning of the project I found data in many different sources, but I decided to take the data mainly from cccnewyork.org, because their source is the U.S. Census Bureau and the data was collected through the American Community Survey (https://data.census.gov/). We can therefore expect the data to be up to date, consistent and reliable.

  • the socio-economic data will be obtained from various CSV files from cccnewyork.org

  • the venues will be fetched from Foursquare through an API

  • the geo-coordinates will be obtained with Nominatim and geopy

4. Data Understanding and Preparation

First of all, we build a clean socio-economic data frame with all the information that is relevant to the business problem. To do so, we load all the files and drop all unnecessary columns and rows.

[Figure: final socio-economic data frame]

Getting latitudes and longitudes with a geocoder

As mentioned before, the Nominatim geocoder from geopy is a convenient tool for getting latitudes and longitudes. If you have many queries, you can use the rate limiter to fetch them successfully:

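from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

# user_agent is a placeholder; Nominatim requires an identifying string
geolocator = Nominatim(user_agent="nyc-districts")
# wait at least one second between requests
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

D_lat = []
D_long = []
for district in socio["Area"]:
    location = geocode(district)  # use the rate-limited wrapper
    if location:
        D_lat.append(location.latitude)
        D_long.append(location.longitude)
    else:
        # keep the lists aligned with the data frame and report the miss
        D_lat.append(None)
        D_long.append(None)
        print(district)

# append to df
socio["D_lat"] = D_lat
socio["D_long"] = D_long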

Mapping with Folium

Folium is a great package for making beautiful maps. We use it for a general overview of the districts of New York City, to get familiar with the structure of the city, and for interactive choropleth maps.

import folium

map_NY_Nsimple = folium.Map(location=[40.730610, -73.935242], zoom_start=10)

# for each community district add a marker to the map
for lat, long, district in zip(socio['D_lat'], socio['D_long'], socio['Area']):
    label = folium.Popup('{}'.format(district), parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=15,
        popup=label,
        color='#3186cc',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_NY_Nsimple)

map_NY_Nsimple

[Figure: simple map of the 59 community districts of New York City]

By adding a choropleth layer for each column, the map becomes more interactive and informative. Simply add one layer per column, similar to the code below.

# Map.choropleth is deprecated in newer folium releases;
# folium.Choropleth takes the same arguments
income = folium.Choropleth(
    geo_data=nyc_geo,
    data=socio,
    columns=['boro_cd', 'Income'],
    key_on='feature.properties.boro_cd',
    fill_color='OrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='income',
    smooth_factor=0,
    name='income',
    highlight=True).add_to(map_NY_N)

[Figure: choropleth heatmap with checkbox for visualizing different columns]

With LayerControl you get checkboxes for selecting the different data layers.

folium.LayerControl(collapsed=False).add_to(map_NY_N)

Getting the venue data with Foursquare

With Foursquare we can get up to 100 venues for each district, which is great for a free service. We fetch the data and create a venues data frame; the pandas built-in function get_dummies lets us apply one-hot encoding to quantify the venues. After grouping the frame by district and calculating the mean, we can compare the districts directly. The Jupyter notebook covers the venue exploration in detail. But what is one-hot encoding again?

One-hot encoding is a data processing step applied to categorical data that converts it into a binary vector representation for use in machine learning algorithms.

One-hot encoding simply creates one column for every possible value and puts a 1 or 0 in the appropriate column.

[Figure: one-hot encoded heatmap data frame with mean values]
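The encoding and the grouping step can be sketched on a tiny example. The sample rows and column names below are assumptions made up for illustration, not the actual Foursquare result:

```python
import pandas as pd

# Hypothetical sample: one row per venue, as it might come back from Foursquare.
venues = pd.DataFrame({
    "District": ["A", "A", "A", "B", "B"],
    "Venue Category": ["Hotel", "Italian Restaurant", "Hotel", "Park", "Hotel"],
})

# One column per category: 1 where the venue belongs to it, 0 elsewhere.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "District", venues["District"])

# The mean per district is the share of each category among its venues.
district_profile = onehot.groupby("District").mean()
print(district_profile)
```

Each row of district_profile sums to 1, so districts with different numbers of venues remain comparable.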

The most common venues

To compare the districts, we create a table with a function that returns the most common venues of each district. We can use this function later to explore the different clusters by their venues.

[Figure: data frame of the most common venues by community district]
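A minimal sketch of such a function, assuming a data frame with one row per district and one column per venue category (mean frequencies). The function name, column labels and sample values are hypothetical and not taken from the notebook:

```python
import pandas as pd

def most_common_venues(profile: pd.DataFrame, top_n: int = 3) -> pd.DataFrame:
    """Return the top_n venue categories per district, most frequent first."""
    rows = {}
    for district, freqs in profile.iterrows():
        top = freqs.sort_values(ascending=False).head(top_n)
        rows[district] = list(top.index)
    columns = [f"Most common venue {i + 1}" for i in range(top_n)]
    return pd.DataFrame.from_dict(rows, orient="index", columns=columns)

# Hypothetical mean frequencies for two districts.
profile = pd.DataFrame(
    {"Hotel": [0.5, 0.1], "Italian Restaurant": [0.3, 0.2], "Park": [0.2, 0.7]},
    index=["District A", "District B"],
)
top_venues = most_common_venues(profile, top_n=2)
print(top_venues)
```

The sketch expects top_n to be no larger than the number of venue categories.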

Heat map of the target venues

In line with the customer's requirements, we take a closer look at the distribution of clothing stores, boutiques and Italian restaurants in the city.

[Figure: distribution of clothing stores, boutiques and Italian restaurants in the community districts of New York City]

5. Analysis and Modelling

The detailed code is in the Jupyter notebook.

This project is mainly about analysing data through exploration. We use only a simple clustering algorithm, and the main part is not about a machine learning model. We apply k-means clustering, followed by more data exploration and visualisation, to develop a better feel for the data and a better understanding of the city.

K-means is a method that partitions n data points into k clusters, where each data point is assigned to the cluster with the nearest mean. The goal is to minimize the sum of squared distances between the points and the mean of their cluster.
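To make the definition concrete, here is a minimal NumPy sketch of the standard iteration (Lloyd's algorithm). It is an illustration on toy data, not the implementation used in the project; the function name and parameters are assumptions:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: returns labels, centroids and the inertia."""
    rng = np.random.default_rng(seed)
    # start from k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # squared distance of every point to every centroid, shape (n, k)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # final assignment and sum of squared distances to the assigned centroid
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(X)), labels].sum()
    return labels, centroids, inertia

# two well separated toy groups
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels, centroids, inertia = kmeans(X, k=2)
print(labels, inertia)
```

The returned inertia is the quantity k-means minimizes: the sum of squared distances between each point and the mean of its cluster.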

To find a suitable number of clusters, the most common approach is the elbow method: we run the algorithm several times with different numbers of clusters and plot the resulting score for each run.

[Figure: the elbow method for determining the number of clusters]
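A sketch of the elbow method, assuming scikit-learn is available. The toy data stands in for the district feature matrix, and the values are printed rather than plotted to keep the example short:

```python
import numpy as np
from sklearn.cluster import KMeans

# toy stand-in for the district feature matrix: three blobs in 2D
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=center, scale=0.3, size=(30, 2))
    for center in ([0, 0], [5, 5], [0, 5])
])

inertias = {}
for k in range(1, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ is the within-cluster sum of squared distances
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 1))
```

On this toy data the inertia falls sharply up to k = 3 and only slowly afterwards; that bend is the "elbow". On real data the bend is often much less pronounced.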

As you can see, the elbow method is sometimes not very conclusive. There are, however, numerous other methods for determining the best number of clusters. The second method I used is the silhouette coefficient.

The silhouette coefficient is calculated from the mean intra-cluster distance and the mean nearest-cluster distance of each sample. For each point p, first find the average distance between p and all other points in the same cluster; this is a measure of cohesion (A). Then find the average distance between p and all points in the nearest other cluster; this is a measure of separation (B). The silhouette coefficient of p is the difference between B and A divided by the greater of the two: (B - A) / max(A, B).

[Figure: the silhouette coefficient method for determining the number of clusters]
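The formula can be written out directly in NumPy. This is a sketch for illustration on four toy points; the function name is an assumption, and it expects at least two clusters:

```python
import numpy as np

def silhouette_samples(X, labels):
    """Silhouette value (B - A) / max(A, B) for every point."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # pairwise Euclidean distances, shape (n, n)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        if not same.any():
            continue  # convention: a single-point cluster scores 0
        a = dist[i, same].mean()  # cohesion A
        # separation B: smallest mean distance to any other cluster
        b = min(dist[i, labels == other].mean()
                for other in np.unique(labels) if other != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores

X = [[0, 0], [0, 1], [10, 0], [10, 1]]
labels = [0, 0, 1, 1]
scores = silhouette_samples(X, labels)
print(scores.mean())
```

Values close to 1 indicate compact, well separated clusters; values near 0 or below indicate overlapping or wrongly assigned points. To choose the number of clusters, compare the mean score across different values of k.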

There are numerous quantitative methods for evaluating clustering results. Used as tools, with a full understanding of their limitations, a combination of contrasting methods improves the quality of your choice. You still need to examine the results yourself: a kind of human inspection that makes the decision based on an understanding of what the data represents, what a cluster represents, and what the clustering is intended to achieve. That way you will find a suitable number of clusters.

This is the map of the community districts, clustered by venue structure and similarity.

[Figure: map of the community districts clustered by venue structure and similarity]

Analyzing the Investor Requirements

Where solving the business problem begins

The clustered map above is based on all venues fetched from Foursquare, including irrelevant ones, but not on the socio-economic data. For the quality of the result it is important to work only with relevant features, which have an impact on the investor's decision. From this part on, we use the previously introduced feature weight matrix.

First we prepare and merge the data frames so that they include only the necessary columns.

[Figure: data frame of all customer requirements before feature scaling]

For the next part, feature scaling is very important.

Feature scaling is a technique that brings the values of the columns in a dataset to a common scale, without losing information or distorting the differences in the ranges of the values. This can be achieved through normalization or standardization.

Normalization is a scaling technique that rescales the features so that the data fall in the range [0, 1], which makes them comparable.

Standardization is a scaling technique that rescales each feature to the properties of a standard normal distribution, with mean μ = 0 and standard deviation σ = 1. Unlike normalization, the result is not confined to a fixed interval.
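Both techniques take one line each in pandas. The column names and values below are hypothetical and only illustrate the two formulas:

```python
import pandas as pd

# hypothetical values for three districts
df = pd.DataFrame(
    {"Income": [40000.0, 60000.0, 100000.0], "Hotels": [0.0, 5.0, 10.0]},
    index=["District A", "District B", "District C"],
)

# min-max normalization: every column ends up in [0, 1]
normalized = (df - df.min()) / (df.max() - df.min())

# standardization (z-score): every column gets mean 0 and standard deviation 1
standardized = (df - df.mean()) / df.std(ddof=0)

print(normalized)
print(standardized)
```

Both formulas divide by the spread of a column, so a column with a constant value would have to be dropped or handled separately.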

After normalizing the data and setting the index to the Area column, our data frame looks like this:

[Figure: data frame of all customer requirements after feature scaling]

Now we can multiply the data frame by the feature weight matrix and calculate the total score column. With some simple visualisation the data frame becomes quite informative.

[Figure: heat map data frame based on the feature-scaled frame multiplied by the weight matrix]
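The weighting step can be sketched as follows. The features, weights and values are made up for illustration; the weights actually used in the project are not reproduced here, and the negative weight for an unwanted feature is an assumption:

```python
import pandas as pd

# normalized features (hypothetical values), one row per district
scaled = pd.DataFrame(
    {"Income": [1.0, 0.4, 0.0], "Hotels": [0.2, 1.0, 0.0], "Crime": [0.5, 0.0, 1.0]},
    index=["District A", "District B", "District C"],
)

# hypothetical weights; a negative weight penalizes an unwanted feature
weights = pd.Series({"Income": 1.0, "Hotels": 0.8, "Crime": -0.5})

# multiply every column by its weight, then sum per district
weighted = scaled.mul(weights, axis="columns")
weighted["Total Score"] = weighted.sum(axis=1)

print(weighted.sort_values("Total Score", ascending=False))
```

The weights express how much each criterion matters to the investor, so changing them changes the ranking.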

Applying k-means to this data frame (after dropping the total score column) does not rank the districts numerically, but it shows which districts are similar with respect to the investor's requirements. We repeat the same process as before and find the number of clusters with the two methods explained above.

[Figure: NYC map showing the clusters based on the scaled and weighted data frame]

Red cluster 0 is the medium-level cluster: the overall mean of its features is mediocre. Its mean total score is 1.39, but notably it includes 5 of the top-scoring districts, especially South Beach and Tottenville, which are located on Staten Island. It also includes 3 high-ranked districts from Manhattan. The rest of the cluster is moderate.

The purple cluster 1 is the high-ranked cluster. It consists of only 2 districts, with a median total score of 1.97. Its districts, Battery Park and Midtown Business District, score with a high number of hotels but have a low population.

The blue cluster 2 is the substandard group, with a median total score of 0.67. Except for population, trust and parks, its mean values are very low.
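Per-cluster figures like these can be computed with a group-by. The scores and labels below are hypothetical and only show the mechanics:

```python
import pandas as pd

# hypothetical scores and cluster labels for six districts
result = pd.DataFrame({
    "Cluster": [0, 0, 0, 1, 1, 2],
    "Total Score": [1.2, 1.5, 1.8, 1.9, 2.1, 0.6],
})

# mean, median and size of every cluster
summary = result.groupby("Cluster")["Total Score"].agg(["mean", "median", "count"])
print(summary)
```

Reporting mean and median side by side helps with small clusters, where a single district can shift the mean considerably.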

6. Evaluation

As you can see, a clustered map alone is not a good foundation for finding the perfect location for the store. The weighted heatmap, however, is great to work with, so we explore this data frame further. A map of the total scores gives the customer much more information for his decision, and combining it with the choropleth map of the socio-economic data is a good way to visualize the data frames interactively.

[Figure: heat map data frame of the top 15 community districts]

Visualize the total score

[Figure: choropleth map of the total scores of the community districts]
[Figure: choropleth heatmap with checkbox for visualizing different columns]

Bar plots of the top 15 districts

The following bar plots of the sorted top 15 results complement the maps and give a contrasting view of the data.

[Figures: bar plots of each column of the top 15 districts]
[Figure: bar plot of the top 15 community districts sorted by total score]

7. Discussion of the Result

Our analysis shows that there are several promising districts for the store. South Beach, the Upper West Side and Battery Park in particular are highly rated. The density of clothing stores was highest in St. George, and that of boutiques in the Midtown Business District.

As you can see from the map, cluster 0 (red) is the medium cluster with respect to the customer's requirements. It is a fairly big cluster and includes some of the best-scoring districts. The purple cluster is located mostly in Manhattan and consists of only 2 high-ranked districts. The blue cluster should be ignored.

South Beach, located on Staten Island, achieved the highest score. It has a high frequency of Italian restaurants, and the fact that it is a good place to live, with a low crime rate, compensates for its medium socio-economic data. Choosing this location could mean that the store will profit from regular customers, but there won't be as many tourists and walk-in customers as in Manhattan.

The Upper West Side scores with high income, population and parks, but there are no hotels directly in the district, which could mean fewer tourist customers. On the other hand, Central Park, a tourist hotspot, is close by, so the district is probably still more touristic than South Beach, and this proximity might compensate for the lack of hotels. A lot of wealthy people live here, so the store could benefit from regular customers. It could also be a great place to live if the customer prefers to live right in the city center.

Battery Park is a tourist hotspot in New York. Despite its low population it is among the top 3 districts and has the highest income score. The few people who can afford to live at the tip of Manhattan have a high income. Furthermore, there are lots of hotels in and around the area, which promises a good mix of tourists and regular customers.

Tottenville, in 4th place, is also on Staten Island and has the lowest crime rate and the lowest unemployment rate of the top 15. Its trust score is also one of the highest. It has a high overall score and is quite similar to South Beach.

There is one main decision to make:

Manhattan or Staten Island

8. Conclusion

The purpose of this project was to identify the districts that best fit the diverse requirements of the customer. By evaluating and quantifying his ideas with the weight matrix, it was possible to identify several districts that combine his requirements for the store location with his personal wishes for a place to live.

To find the perfect location, we now have to go deeper and analyse the top 10 to 15 districts in more detail. We could compare specific neighbourhoods and add more detailed data, such as tourist numbers, to finally find the perfect neighbourhood or even the best street for the store.

Translated from: https://medium.com/@patrich94/new-york-city-the-battle-of-the-neighbourhoods-73d10987cb
