How to Use Data Science Principles to Improve Your Search Engine Optimisation Efforts

Search Engine Optimisation (SEO) is the discipline of applying knowledge of how search engines work to build websites and publish content that can be found on search engines by the right people at the right time.

Some people say that you don’t really need SEO, taking a Field of Dreams ‘build it and they shall come’ approach. Yet the size of the SEO industry is predicted to reach $80 billion by the end of 2020, so at least some people like to hedge their bets.

An often-quoted statistic is that Google’s ranking algorithm contains more than 200 factors for ranking web pages, and SEO is often seen as an ‘arms race’ between its practitioners and the search engines, with people looking for the next ‘big thing’ and sorting themselves into tribes (white hat, black hat and grey hat).

There is a huge amount of data generated by SEO activity and its plethora of tools. For context, the industry-standard crawling tool Screaming Frog has 26 different reports filled with web page metrics on things you wouldn’t even think are important (but are). That is a lot of data to munge and find interesting insights from.

The SEO mindset also lends itself well to the data science ideal of munging data and using statistics and algorithms to derive insights and tell stories. SEO practitioners have been poring over all of this data for two decades, trying to figure out the next best thing to do and to demonstrate value to clients.

Despite access to all of this data, there is still a lot of guesswork in SEO, and while some people and agencies test different ideas to see what performs well, a lot of the time it comes down to the opinion of the person on the team with the best track record and overall experience.

I’ve found myself in this position a lot in my career, and it is something I would like to address now that I have acquired some data science skills of my own. In this article, I will point you to some resources that will allow you to take a more data-led approach to your SEO efforts.

SEO Testing

One of the most often asked questions in SEO is ‘We’ve implemented these changes on a client’s website, but did they have an effect?’. This often leads to the idea that if the website traffic went up ‘it worked’, and if the traffic went down it was ‘seasonality’. That is hardly a rigorous approach.

A better approach is to put some maths and statistics behind it and analyse it with a data science approach. A lot of the maths and statistics behind data science concepts can be difficult, but luckily there are plenty of tools out there that can help, and I would like to introduce one made by Google called Causal Impact.

The Causal Impact package was originally an R package; however, there is a Python version if that is your poison, and that is what I will be going through in this post. To install it in your Python environment using Pipenv, use the command:

pipenv install pycausalimpact

If you want to learn more about Pipenv, see a post I wrote on it here; otherwise, Pip will work just fine too:

pip install pycausalimpact

What is Causal Impact?

Causal Impact is a library that is used to make predictions on time-series data (such as web traffic) in the event of an ‘intervention’, which could be something like campaign activity, a new product launch or an SEO optimisation that has been put in place.

You supply two time series as data to the tool. One time series could be clicks over time for the part of a website that experienced the intervention. The other time series acts as a control; in this example, that would be clicks over time for a part of the website that didn’t experience the intervention.

You also tell the tool the date when the intervention took place, and it then trains a model on the data called a Bayesian structural time series model. This model uses the control group as a baseline to try and build a prediction of what the intervention group would have looked like if the intervention hadn’t taken place.

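To make that concrete, here is a minimal sketch using the pycausalimpact package installed above, with synthetic data standing in for real click counts (the series, the intervention day and the size of the uplift are all invented purely for illustration):

import numpy as np
import pandas as pd
from causalimpact import CausalImpact

# 100 days of synthetic clicks, with a simulated intervention on day 70
np.random.seed(1)
control = 100 + np.random.normal(0, 5, 100)         # control pages' clicks
test = 0.8 * control + np.random.normal(0, 2, 100)  # intervention pages' clicks
test[70:] += 10                                     # the uplift we hope to detect

data = pd.DataFrame({"y": test, "X": control})      # first column: response; rest: controls
pre_period = [0, 69]
post_period = [70, 99]

ci = CausalImpact(data, pre_period, post_period)
print(ci.summary())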

The original paper on the maths behind it is here; however, I recommend watching the video below by a guy at Google, which is far more accessible:

Implementing Causal Impact in Python

After installing the library into your environment as outlined above, using Causal Impact with Python is pretty straightforward, as can be seen in the notebook below by Paul Shapiro:

Causal Impact with Python

After pulling in a CSV with the control group data and the intervention group data, and defining the pre/post periods, you can train the model by calling:

ci = CausalImpact(data[data.columns[1:3]], pre_period, post_period)
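
For context, a minimal sketch of the steps leading up to that call might look like this (the file name, column layout and period boundaries are placeholder assumptions):

import pandas as pd
from causalimpact import CausalImpact

# Hypothetical CSV export: a date column, then the intervention and control series
data = pd.read_csv("seo_test_data.csv")

pre_period = [0, 69]    # row positions covering the window before the change
post_period = [70, 99]  # row positions covering the window after the change

# data.columns[1:3] picks out the test and control columns, skipping the date column
ci = CausalImpact(data[data.columns[1:3]], pre_period, post_period)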

This will train the model and run the predictions. If you run the command:

ci.plot()

You will get a chart that looks like this:

Output after training the Causal Impact Model

You have three panels here; the first panel shows the intervention group and the prediction of what would have happened without the intervention.

The second panel shows the pointwise effect, which means the difference between what happened and the prediction made by the model.

The final panel shows the cumulative effect of the intervention as predicted by the model.

Another useful command to know is:

print(ci.summary('report'))

This prints out a full report that is human readable and ideal for summarising and dropping into client slides:

Report output for Causal Impact
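
If you just want the headline numbers rather than the full narrative, calling summary() with no arguments should print the plain statistical summary table instead (this is the default output in the Python package):

print(ci.summary())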

Selecting a control group

The best way to build your control group is to pick, at random, pages which aren’t affected by the intervention, using a method called stratified random sampling.

Etsy has done a post on how they’ve used Causal Impact for SEO split testing, and they recommend using this method. Stratified random sampling is, as the name implies, picking from the population at random to build the sample; however, if the population is segmented in some way, we try to maintain the same proportions for these segments in the sample as in the population:

Image credit: Etsy

An ideal way to segment web pages for stratified sampling is to use sessions as a metric. If you load your page data into Pandas as a data frame, you can apply a labelling function to bucket each page:

def label_page(sessions):
    # Bucket each page by its session count so we can stratify on traffic level
    if sessions <= 50:
        return "Less than 50"
    elif sessions <= 100:
        return "Less than 100"
    elif sessions <= 500:
        return "Less than 500"
    elif sessions <= 1000:
        return "Less than 1000"
    elif sessions <= 5000:
        return "Less than 5000"
    return "Greater than 5000"

df["label"] = df["Sessions"].apply(label_page)
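
Before splitting, it is worth eyeballing how many pages fall into each bucket; a one-line check:

df["label"].value_counts()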

From there, you can use train_test_split in sklearn to build your control and test groups:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    selectedPages["URL"],
    selectedPages["label"],
    test_size=0.01,
    stratify=selectedPages["label"],
)

Note that stratify is set. If you already have a list of pages you want to test, then your sample size should equal the number of pages you want to test. Also, the more pages you have in your sample, the better the model will be; if you use too few pages, the model will be less accurate.

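As a quick sanity check that the stratification has preserved the segment proportions, you can compare the label distributions of the full population and the held-out sample:

# These two distributions should be roughly identical if stratification worked
print(selectedPages["label"].value_counts(normalize=True))
print(y_test.value_counts(normalize=True))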

It is worth noting that JC Chouinard gives a good background on how to do all of this in Python, using a method similar to Etsy’s:

Conclusion

There are a couple of different use cases for this type of testing. The first would be to test ongoing improvements using split testing, and this is similar to the approach that Etsy uses above.

The second would be to test an improvement that was made on-site as part of ongoing work. This is similar to an approach outlined in this post; however, with this approach you need to ensure your sample size is sufficiently large, otherwise your predictions will be very inaccurate. So please do bear that in mind.

Both are valid ways of doing SEO testing, with the former being a type of A/B split test for ongoing optimisation and the latter being a test of something that has already been implemented.

I hope this has given you some insight into how to apply data science principles to your SEO efforts. Do read around these interesting topics and try and come up with other ways to use this library to validate your efforts. If you need background on the Python used in this post I recommend this course.

Translated from: https://towardsdatascience.com/how-to-use-data-science-principles-to-improve-your-search-engine-optimisation-efforts-927712ed0b12
