Manchester United are one of the hottest football teams in the world, and among the top three when it comes to popularity. Coming out of last season, the club probably needs reinforcements in every department.

Manchester United are pursuing Jadon Sancho. But considering the pace of the talks, it is highly unlikely the club will sign Sancho this summer.

Perhaps it is time for Manchester United to move on to other targets. The new player must be fast, a reliable shooter, and able to find a teammate in front of goal.

I have looked at the players who have been linked with Manchester United in the past. The question is, which one is the best replacement for Jadon Sancho? I have done a comparative analysis of the following players to try to answer that question.

Players: Adama Traore, Dybala, Kingsley Coman, Jadon Sancho

My goal with this case study is to understand the importance of descriptive statistics.


We need to compare the players based on their performance last season. Here are quick stats on their goals and assists; I have considered only league and European cup matches.


Adama Traore: G4, A9, M37, PS% 75.60
Dybala: G14, A8, M41, PS% 88.09
Kingsley Coman: G7, A4, M33, PS% 84
Jadon Sancho: G19, A18, M40, PS% 83.96

*G: goals, A: assists, M: matches played, PS%: pass success percentage

Each of the three players (Adama Traore, Dybala, Kingsley Coman) has a mean, median, and mode rating of 7. Averages offer you only a one-dimensional view of your data. They tell you what the center of your data is, but that’s it. While this can be useful, it’s often not enough.

Each player has the same average rating, but there are clear differences between each data set. We need some other way of summarizing the data in addition to the average.
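
To see how two players can share every average and still differ, here is a minimal Python sketch. The two rating lists are hypothetical, invented purely for illustration; they are not the real match ratings of any player in this comparison.

```python
from statistics import mean, median, mode

# Hypothetical match ratings, invented for illustration only
# (not the real ratings of any player discussed here).
ratings_a = [5, 6, 7, 7, 7, 8, 9]
ratings_b = [6, 6, 7, 7, 7, 8, 8]


def averages(ratings):
    """Return (mean, median, mode) for a list of ratings."""
    return mean(ratings), median(ratings), mode(ratings)


summary_a = averages(ratings_a)
summary_b = averages(ratings_b)

# Both summaries are identical even though the underlying lists differ.
print(summary_a)
print(summary_b)
```

Both lists are centred on 7, yet the first one strays further from it. The averages alone cannot show that.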

Here, frequency tells us the number of matches where the player got each rating.

Each player’s ratings are distributed differently, and if we can measure how the ratings are dispersed, we will be able to make a more informed decision. We can easily do this by calculating the range. The range tells us how many numbers the data extends over. To find it, we take the largest number in the data set and subtract the smallest.
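
As a rough sketch, the range calculation looks like this in Python. The rating lists and the helper name `value_range` are assumptions made up for this example, not the actual data behind the figures in this article.

```python
# Hypothetical ratings, invented for illustration (not real player data).
ratings_a = [5, 6, 7, 7, 7, 8, 9]
ratings_b = [6, 6, 7, 7, 7, 8, 8]


def value_range(values):
    """Range: the largest value minus the smallest value."""
    return max(values) - min(values)


range_a = value_range(ratings_a)  # 9 - 5
range_b = value_range(ratings_b)  # 8 - 6
print(range_a, range_b)
```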

Range: Adama Traore 4, Dybala 3, Kingsley Coman 3, Jadon Sancho 4

The range uses only the smallest and largest numbers in a set; the rest of the values are ignored. That can give a misleading result. The range measures how far the values are spread out, but it gives no real picture of how the data is distributed between those extremes.

The main problem with the range is that, by definition, it includes outliers. If the data has outliers, the range will include them, even though there may be only one or two extreme values. The interquartile range (IQR), the difference between the upper and lower quartiles, covers only the middle 50% of the data. Excluding the outliers this way means we can compare different sets of data without our results being distorted by outliers.
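
The sketch below illustrates the point with made-up ratings: one extreme value changes the range a lot, while the interquartile range stays put. It assumes Python 3.8+ for `statistics.quantiles` and uses its `"inclusive"` method; quartile conventions differ between tools, so the figures quoted in this article may have been computed with a different rule.

```python
from statistics import quantiles


def value_range(values):
    """Range: the largest value minus the smallest value."""
    return max(values) - min(values)


def iqr(values):
    """Interquartile range: upper quartile minus lower quartile."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Hypothetical ratings, invented for illustration (not real player data).
ratings = [6, 6, 7, 7, 7, 8, 8]
with_outlier = [2, 6, 7, 7, 7, 8, 8]  # same season, but one match rated 2

print(value_range(ratings), iqr(ratings))
print(value_range(with_outlier), iqr(with_outlier))
```

The single poor match triples the range but leaves the IQR unchanged, because the IQR looks only at the middle half of the data.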

IQR: Adama Traore 1, Dybala 1, Kingsley Coman 1, Jadon Sancho 1.25

Our original problem with the range was that it’s extremely sensitive to outliers. To get around this, we divided the data into quarters, and we used the interquartile range to provide us with a cut-down range of the data.

We need to find the player whose ratings vary the least.


How can we measure variability more accurately? One way is to look at how far each value is from the mean, which is what the variance does. The variance is a way of measuring spread: it is the average of the squared distances of the values from the mean.
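
Here is a minimal sketch of that definition in Python, checked against the standard library. It assumes the population form of the variance (dividing by the number of values), since that matches "the average of the squared distances"; the ratings and the function name `population_variance` are hypothetical.

```python
from math import isclose
from statistics import pvariance


def population_variance(values):
    """Average of the squared distances of the values from their mean."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


# Hypothetical ratings, invented for illustration (not real player data).
ratings_a = [5, 6, 7, 7, 7, 8, 9]
ratings_b = [6, 6, 7, 7, 7, 8, 8]

var_a = population_variance(ratings_a)
var_b = population_variance(ratings_b)

# The hand-written version should agree with statistics.pvariance.
print(var_a, isclose(var_a, pvariance(ratings_a)))
print(var_b, isclose(var_b, pvariance(ratings_b)))
```

Unlike the range, every rating contributes to the result, so the list that strays further from 7 ends up with the larger variance.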

Statisticians use the variance a lot as a means of measuring the spread of data. The problem with the variance is that it can be quite difficult to think about spread in terms of squared distances. The standard deviation, the square root of the variance, fixes this: it is expressed in the same units as the mean, whereas the variance is expressed in squared units.
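
A short sketch of how the standard deviation could be used to pick the most consistent player. The player names and ratings are placeholders invented for this example, and it assumes the population standard deviation (`statistics.pstdev`); the figures reported in this article may have been calculated with the sample version instead.

```python
from statistics import pstdev

# Hypothetical players and ratings, invented for illustration only.
ratings_by_player = {
    "Player A": [5, 6, 7, 7, 7, 8, 9],
    "Player B": [6, 6, 7, 7, 7, 8, 8],
}

# Standard deviation per player, in the same units as the ratings.
spread = {name: pstdev(ratings) for name, ratings in ratings_by_player.items()}

# The player whose ratings vary the least has the smallest standard deviation.
most_consistent = min(spread, key=spread.get)

for name, sd in spread.items():
    print(f"{name}: {sd:.2f}")
print("Most consistent:", most_consistent)
```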

Standard deviation: Adama Traore 0.93, Dybala 0.96, Kingsley Coman 0.82, Jadon Sancho 1.15



