by Flavio H. Freitas

How to predict likes and shares based on your article’s title using Machine Learning

Choosing a good title for an article is an important step in the writing process. The more interesting the title seems, the higher the chance a reader will interact with the whole thing. Furthermore, showing the user content they prefer (to interact with) increases the user’s satisfaction.

This is how my final project from the Machine Learning Engineer Nanodegree specialization started. I just finished it, and I feel so proud and happy that I wanted to share with you some insights I’ve had about the whole flow. Also, I promised Quincy Larson this article when I finished the project.

If you want to see the final technical document click here. If you want the implementation of the code, check it out here or fork my project on GitHub. If you just want an overview using layperson’s terms, this is the right place — continue reading this article.

Some of the most used platforms to spread ideas nowadays are Twitter and Medium (you are here!). On Twitter, an article is normally posted as a tweet containing its title and an external URL. Users can follow the link to the article and show their satisfaction with a like or a retweet of the original post.

Medium shows the full text with tags (to classify the article) and claps (similar to Twitter’s likes) to show how much the users appreciate the content. A correlation between these two platforms can bring us valuable information.

The project

The problem that I defined was a classification task using supervised learning: Predict the number of likes and retweets an article receives based on the title.

Correlating the number of likes and retweets from Twitter with a Medium article is an attempt to separate the effect of the number of readers reached from the number of Medium claps. The more an article is shared on different platforms, the more readers it will reach and the more Medium claps it will (likely) receive.

Using only the Twitter statistics, we’d expect the articles to have initially reached almost the same number of readers (those readers being the followers of the freeCodeCamp account on Twitter). Their performance and interactions, therefore, would depend only on the characteristics of the tweet, for example the title of the article. And that is exactly what we want to measure.

I chose the freeCodeCamp account for this project because the idea was to limit the scope of the subject of the articles and better predict the response on a specific field. The same title can perform well in one category (e.g. Technology), but not necessarily in a different one (e.g. Culinary). Also, this account posts the title of the original article and the URL on Medium as the tweet content.

How does the data look?

The first step of this project was to get the information from Twitter and Medium and then correlate it. The dataset can be found here and it has 711 data points. This is what the dataset looks like:

Analyzing and learning with the data

After analyzing the dataset and plotting some graphics, I found interesting information about it. For these analyses, the outliers were removed, and I considered only the top 25% of performers for each feature (retweets, likes, and claps).
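
One way to read "the top 25% of performers" is to keep the articles at or above the 75th percentile of a feature. The sketch below illustrates that reading only; it is not the project's code. The `top_quartile` helper, the row layout, and the sample numbers are all assumptions, and the outlier removal is left out because the article does not say how it was done.

```python
from statistics import quantiles


def top_quartile(rows, feature):
    """Keep the rows whose `feature` value is at or above the 75th percentile."""
    values = [row[feature] for row in rows]
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is Q3.
    q3 = quantiles(values, n=4, method="inclusive")[2]
    return [row for row in rows if row[feature] >= q3]


# Made-up rows for illustration only; these are not values from the dataset.
articles = [
    {"title": "A", "retweets": 5},
    {"title": "B", "retweets": 40},
    {"title": "C", "retweets": 2},
    {"title": "D", "retweets": 12},
    {"title": "E", "retweets": 8},
]

best_by_retweets = top_quartile(articles, "retweets")
```

The same helper could be called once per feature ("retweets", "likes", "claps"), which matches the article's description of analyzing each feature separately.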

So let’s take a look at what the numbers say for freeCodeCamp articles written on Medium and shared on Twitter.

What is a good title length?

Writing titles that have a length greater than 50 and less than 110 characters helps to increase the chances of a successful article.

What is a good number of words in the title?

The most effective number of words in the title is 9 to 17. To optimize the number of retweets and likes, try something from 9 to 18 words, and for claps from 7 to 17.
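
The length and word-count ranges above are easy to check for a draft title. The snippet below is a minimal sketch, assuming that words are separated by whitespace and that the 9 to 17 word range is inclusive; the function names are hypothetical and not part of the project.

```python
def title_stats(title):
    """Return the character count and whitespace-separated word count of a title."""
    return {"characters": len(title), "words": len(title.split())}


def within_suggested_ranges(title):
    """Check a title against the ranges reported in this article."""
    stats = title_stats(title)
    length_ok = 50 < stats["characters"] < 110  # more than 50, fewer than 110 characters
    words_ok = 9 <= stats["words"] <= 17        # 9 to 17 words, assumed inclusive
    return length_ok and words_ok


this_title = ("How to predict likes and shares based on your article's "
              "title using Machine Learning")
stats = title_stats(this_title)  # 84 characters, 14 words
fits = within_suggested_ranges(this_title)
```

By this check, the title of this article falls inside both ranges.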

Which are the best categories to tag?

Programming, Tech, Technology, JavaScript and Web Development are categories you should consider when tagging your next article. They appear as good indicators for all three features.

Which are the best words to use?

In this lexical analysis, you’ll notice that some words get much more attention in the freeCodeCamp community than others. If the intention is to make an article reach more readers, writing about JavaScript, React or CSS will increase how much it’s appreciated. Describing the article with the words “learn” or “guide” also raises the probability.
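
A lexical analysis of this kind can be approximated by counting how often each word appears in the titles of the top performers. The following is a rough sketch rather than the project's code: the titles are invented for illustration, and the small stop-word list is an assumption.

```python
import re
from collections import Counter

# Hypothetical stop words; a real analysis would likely use a fuller list.
STOP_WORDS = {"a", "the", "to", "in", "and", "for", "of", "with"}


def word_counts(titles):
    """Count lowercase word occurrences across titles, skipping stop words."""
    counts = Counter()
    for title in titles:
        words = re.findall(r"[a-z0-9']+", title.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts


# Invented titles, not taken from the dataset.
top_titles = [
    "Learn JavaScript closures in ten minutes",
    "A beginner's guide to React",
    "Learn CSS grid with JavaScript examples",
]

counts = word_counts(top_titles)
most_common = counts.most_common(2)
```

Comparing such counts between the top performers and the rest of the dataset is one way to see which words stand out.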

Using Machine Learning

OK! After taking a look at the data and extracting some information from it, the goal was to create a Machine Learning model that makes predictions of the number of retweets, likes, and claps based on the title of the article.

Predicting the number of retweets, likes, and claps of an article can be treated as a classification problem, which is a common machine learning (ML) task. For this, we need to express the output as discrete values (ranges of numbers). The input will be the title of the article with each word as a token (t1, t2, t3, … tn), the title length, and the number of words in the title.
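
To make the input description concrete, here is a minimal sketch of turning a title into a numeric feature vector: one count per known token, followed by the title length and the number of words. The helper names and the whitespace tokenization are assumptions, and the project's actual preprocessing may differ.

```python
def build_vocabulary(titles):
    """Map each distinct lowercase token to a column index, in order of first use."""
    vocab = {}
    for title in titles:
        for token in title.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def featurize(title, vocab):
    """Return token counts (t1..tn), then title length, then word count."""
    tokens = title.lower().split()
    counts = [0] * len(vocab)
    for token in tokens:
        if token in vocab:  # tokens outside the vocabulary are ignored
            counts[vocab[token]] += 1
    return counts + [len(title), len(tokens)]


# Invented titles for illustration only.
vocab = build_vocabulary(["Learn CSS", "Learn React hooks"])
features = featurize("Learn learn CSS today", vocab)
```

Here `vocab` has four tokens, so `features` holds four counts plus the two extra numbers.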

The ranges for our features are:

  • Retweets: 0–10, 10–30, 30+
  • Likes: 0–25, 25–60, 60+
  • Claps: 0–50, 50–400, 400+
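
Turning a raw count into one of these ranges is a simple binning step. The sketch below is an illustration under one assumption: the listed ranges share their boundary values (10, 30, and so on), and here a boundary value is placed in the higher range. The article does not say which side the project used, and the names are hypothetical.

```python
from bisect import bisect_right

# Upper boundaries between ranges, taken from the list above.
BIN_EDGES = {"retweets": [10, 30], "likes": [25, 60], "claps": [50, 400]}
BIN_LABELS = {
    "retweets": ["0-10", "10-30", "30+"],
    "likes": ["0-25", "25-60", "60+"],
    "claps": ["0-50", "50-400", "400+"],
}


def to_class(feature, count):
    """Map a raw count to its range label; boundary values go to the higher range."""
    index = bisect_right(BIN_EDGES[feature], count)
    return BIN_LABELS[feature][index]


low = to_class("retweets", 9)        # "0-10"
boundary = to_class("retweets", 10)  # "10-30" under the assumption above
high = to_class("claps", 400)        # "400+"
```

With labels like these, each article gets one class per feature, which is what a classifier needs as its target.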

And finally, after preprocessing our dataset and evaluating some models (everything fully described here), we reached the conclusion that the MultinomialNB model performed best for retweets, reaching an accuracy of 60.6%. Logistic regression performed best for likes and claps, reaching 55.3% and 49% respectively.
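
MultinomialNB is the name of scikit-learn's multinomial Naive Bayes classifier. To show the idea behind it without extra dependencies, here is a small from-scratch sketch using only the standard library. It is not the project's code and not scikit-learn's implementation: the class name, the toy titles, and the labels are invented, and the accuracy it produces says nothing about the figures above.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Multinomial Naive Bayes over whitespace tokens with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, titles, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for title, label in zip(titles, labels):
            self.word_counts[label].update(title.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, title):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior of the class
            for token in title.lower().split():
                if token in self.vocab:  # unseen tokens are skipped
                    score += math.log((counts[token] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, titles, labels):
    """Fraction of titles whose predicted class matches the true class."""
    hits = sum(model.predict(t) == y for t, y in zip(titles, labels))
    return hits / len(labels)


# Toy training data, invented for illustration only.
train_titles = [
    "learn javascript the easy way",
    "a guide to react hooks",
    "my weekend trip notes",
    "thoughts on office chairs",
]
train_labels = ["30+", "30+", "0-10", "0-10"]

model = TinyMultinomialNB().fit(train_titles, train_labels)
prediction = model.predict("learn react")
train_accuracy = accuracy(model, train_titles, train_labels)
```

Accuracy here is measured on the training titles only to keep the example short; a real evaluation would use held-out data.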

As an experiment for this article, I ran the model on the title of this article, and it predicted that:

It will have 10–30 retweets and 25–60 favorites on Twitter and 400+ claps on Medium.

How is this prediction?

Follow me if you want to read more of my articles. And if you enjoyed this article, be sure to like it and give me a lot of claps; it means the world to the writer.

Flávio H. de Freitas is an Entrepreneur, Engineer, Tech lover, Dreamer and Traveler. He has worked as CTO in Brazil, Silicon Valley and Europe.

Translated from: https://www.freecodecamp.org/news/how-to-predict-likes-and-shares-based-on-your-articles-title-using-machine-learning-47f98f0612ea/
