by Radu Raicea

How I used Python to find interesting people to follow on Medium

Medium has a large amount of content, a large number of users, and an almost overwhelming number of posts. When you try to find interesting users to interact with, you’re flooded with visual noise.

I define an interesting user as someone who is from your network, who is active, and who writes responses that are generally appreciated by the Medium community.

I was looking through the latest posts from users I follow to see who had responded to those users. I figured that if they responded to someone I’m following, they must have similar interests to mine.

The process was tedious. And that’s when I remembered the most valuable lesson I learned during my last internship:

Any tedious task can and should be automated.

I wanted my automation to do the following things:

  1. Get all the users from my “Followings” list

  2. Get the latest posts of each user

  3. Get all the responses to each post

  4. Filter out responses that are older than 30 days

  5. Filter out responses that have fewer than a minimum number of recommendations

  6. Get the username of the author of each response

Let’s start pokin’

I initially looked at Medium’s API, but found it limiting. It didn’t give me much to work with. I could only get information about my own account, not about other users.

On top of that, the last change to Medium’s API was over a year ago. There was no sign of recent development.

I realized that I would have to rely on HTTP requests to get my data, so I started to poke around using my Chrome DevTools.

The first goal was to get my list of Followings.

I opened up my DevTools and went to the Network tab. I filtered out everything but XHR to see where Medium gets my list of Followings from. I hit the reload button on my profile page and got nothing interesting.

What if I clicked the Followings button on my profile? Bingo.

Inside the link, I found a very big JSON response. It was well-formatted JSON, except for a string of characters at the beginning of the response: ])}while(1);</x>

I wrote a function to clean that up and turn the JSON into a Python dictionary.

import json

def clean_json_response(response):
    return json.loads(response.text.split('])}while(1);</x>')[1])
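
To see the cleanup in isolation, here is a minimal sketch that works on a plain string instead of a `requests` response. The helper name `strip_medium_prefix` and the sample payload are made up for illustration. It uses `str.partition` rather than `split`, on the assumption that it is convenient for text without the prefix to still parse.

```python
import json

# Prefix that shows up at the start of Medium's JSON responses.
PREFIX = '])}while(1);</x>'


def strip_medium_prefix(text):
    """Parse JSON text, dropping the leading prefix if it is present."""
    # partition returns (before, separator, after); when the prefix is absent,
    # the separator is empty and the whole text sits in `before`.
    before, separator, after = text.partition(PREFIX)
    return json.loads(after if separator else before)


# Hypothetical sample, shaped like the profile response.
sample = PREFIX + '{"success": true, "payload": {"user": {"userId": "abc123"}}}'
parsed = strip_medium_prefix(sample)
print(parsed['payload']['user']['userId'])
```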

I had found an entry point. Let the coding begin.

Getting all the users from my Followings list

To query that endpoint, I needed my User ID (I know that I already had it, but this is for educational purposes).

While looking for a way to get a user’s ID, I found out that you can add ?format=json to most Medium URLs to get a JSON response from that page. I tried that out on my profile page.

Oh look, there’s the user ID.

])}while(1);</x>{"success":true,"payload":{"user":{"userId":"d540942266d0","name":"Radu Raicea","username":"Radu_Raicea",...

I wrote a function to pull the user ID from a given username. Again, I had to use clean_json_response to remove the unwanted characters at the beginning of the response.

I also made a constant called MEDIUM that contains the base for all the Medium URLs.

import requests

MEDIUM = 'https://medium.com'

def get_user_id(username):
    print('Retrieving user ID...')

    url = MEDIUM + '/@' + username + '?format=json'
    response = requests.get(url)
    response_dict = clean_json_response(response)
    return response_dict['payload']['user']['userId']

With the User ID, I queried the /_/api/users/<user_id>/following endpoint and got the list of usernames from my Followings list.

When I did it in DevTools, I noticed that the JSON response only had eight usernames. Weird.

After I clicked on “Show more people,” I saw what was missing. Medium uses pagination for the list of Followings.

Pagination works by specifying a limit (elements per page) and to (first element of the next page). I had to find a way to get the ID of that next element.

At the end of the JSON response from /_/api/users/<user_id>/following, I saw an interesting key.

..."paging":{"path":"/_/api/users/d540942266d0/followers","next":{"limit":8,"to":"49260b62a26c"}}},"v":3,"b":"31039-15ed0e5"}
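
As a rough illustration of how that key drives pagination, the sketch below builds the URL of the next page from a parsed payload. The function name `next_page_url` and the hand-written payloads are hypothetical, and it assumes the last page simply has no `next` entry under `paging`.

```python
MEDIUM = 'https://medium.com'


def next_page_url(user_id, payload):
    """Return the URL of the next Followings page, or None on the last page."""
    # Assumption: the last page has no "next" entry (or no "to") under "paging".
    next_page = payload.get('paging', {}).get('next')
    if not next_page or 'to' not in next_page:
        return None
    return '{}/_/api/users/{}/following?limit={}&to={}'.format(
        MEDIUM, user_id, next_page.get('limit', 8), next_page['to'])


# Hand-written payloads that mimic the shape of the "paging" key.
middle_page = {'paging': {'next': {'limit': 8, 'to': '49260b62a26c'}}}
last_page = {'paging': {}}

print(next_page_url('d540942266d0', middle_page))
print(next_page_url('d540942266d0', last_page))
```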

From here, writing a loop to get all the usernames from my Followings list was easy.

def get_list_of_followings(user_id):
    print('Retrieving users from Followings...')

    next_id = False
    followings = []

    while True:
        if next_id:
            # If this is not the first page of the followings list
            url = (MEDIUM + '/_/api/users/' + user_id
                   + '/following?limit=8&to=' + next_id)
        else:
            # If this is the first page of the followings list
            url = MEDIUM + '/_/api/users/' + user_id + '/following'

        response = requests.get(url)
        response_dict = clean_json_response(response)
        payload = response_dict['payload']

        for user in payload['value']:
            followings.append(user['username'])

        try:
            # If the "to" key is missing, we've reached the end
            # of the list and a KeyError is raised
            next_id = payload['paging']['next']['to']
        except KeyError:
            break

    return followings

Getting the latest posts from each user

Once I had the list of users I follow, I wanted to get their latest posts. I could do that with a request to https://medium.com/@<username>/latest?format=json

I wrote a function that takes a list of usernames and returns a list of post IDs for the latest posts from all the usernames on the input list.

def get_list_of_latest_posts_ids(usernames):
    print('Retrieving the latest posts...')

    post_ids = []

    for username in usernames:
        url = MEDIUM + '/@' + username + '/latest?format=json'
        response = requests.get(url)
        response_dict = clean_json_response(response)

        try:
            posts = response_dict['payload']['references']['Post']
        except KeyError:
            # No "Post" key in the response, so there is nothing to collect
            posts = []

        if posts:
            for key in posts.keys():
                post_ids.append(posts[key]['id'])

    return post_ids
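
The extraction step can be tried without any network access. The sketch below is only an illustration: `extract_post_ids` is a hypothetical helper, and the sample dictionaries are hand-written to mimic the `payload -> references -> Post` shape that the function above reads, where a missing `Post` key is assumed to mean there are no posts.

```python
def extract_post_ids(response_dict):
    """Collect post IDs from a parsed /latest?format=json response."""
    # Assumption: a response without a "Post" entry means no posts,
    # so every lookup falls back to an empty dict.
    references = response_dict.get('payload', {}).get('references', {})
    posts = references.get('Post', {})
    return [post['id'] for post in posts.values()]


# Hand-written samples; real responses carry many more fields.
with_posts = {'payload': {'references': {'Post': {
    'aaa111': {'id': 'aaa111'},
    'bbb222': {'id': 'bbb222'},
}}}}
without_posts = {'payload': {'references': {}}}

print(extract_post_ids(with_posts))
print(extract_post_ids(without_posts))
```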

Getting all the responses from each post

With the list of posts, I extracted all the responses using https://medium.com/_/api/posts/<post_id>/responses

This function takes a list of post IDs and returns a list of responses.

def get_post_responses(posts):
    print('Retrieving the post responses...')

    responses = []

    for post in posts:
        url = MEDIUM + '/_/api/posts/' + post + '/responses'
        response = requests.get(url)
        response_dict = clean_json_response(response)
        responses += response_dict['payload']['value']

    return responses

Filtering the responses

At first, I wanted responses that had gotten a minimum number of claps. But I realized that this might not be a good representation of the community’s appreciation of the response: a user can give more than one clap for the same article.

Instead, I filtered by the number of recommendations. It measures the same thing as claps, but it doesn’t take duplicates into account.

I wanted the minimum to be dynamic, so I passed a variable named recommend_min around.

The following function takes a response and the recommend_min variable. It checks if the response meets that minimum.

def check_if_high_recommends(response, recommend_min):
    # Return an explicit True/False instead of True/None
    return response['virtuals']['recommends'] >= recommend_min

I also wanted recent responses. I filtered out responses that were older than 30 days using this function.

from datetime import datetime, timedelta

def check_if_recent(response):
    limit_date = datetime.now() - timedelta(days=30)
    # createdAt is in milliseconds, so convert it to seconds first
    creation_epoch_time = response['createdAt'] / 1000
    creation_date = datetime.fromtimestamp(creation_epoch_time)

    return creation_date >= limit_date
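
To check both filters together, here is a small self-contained sketch that runs them over hand-made responses. The sample data and the `to_epoch_ms` helper are invented for the example; only the `createdAt`, `creatorId`, and `virtuals.recommends` fields come from the responses described in this post.

```python
from datetime import datetime, timedelta


def check_if_high_recommends(response, recommend_min):
    return response['virtuals']['recommends'] >= recommend_min


def check_if_recent(response):
    limit_date = datetime.now() - timedelta(days=30)
    creation_date = datetime.fromtimestamp(response['createdAt'] / 1000)
    return creation_date >= limit_date


def to_epoch_ms(moment):
    """Convert a datetime to a millisecond epoch value like createdAt."""
    return int(moment.timestamp() * 1000)


now = datetime.now()

# Hypothetical responses: only the first one is both recent and popular.
sample_responses = [
    {'creatorId': 'recent_popular',
     'createdAt': to_epoch_ms(now - timedelta(days=2)),
     'virtuals': {'recommends': 25}},
    {'creatorId': 'recent_quiet',
     'createdAt': to_epoch_ms(now - timedelta(days=2)),
     'virtuals': {'recommends': 1}},
    {'creatorId': 'old_popular',
     'createdAt': to_epoch_ms(now - timedelta(days=90)),
     'virtuals': {'recommends': 40}},
]

kept = [r['creatorId'] for r in sample_responses
        if check_if_recent(r) and check_if_high_recommends(r, 10)]
print(kept)
```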

Getting the username of the author of each response

Once I had all the filtered responses, I grabbed all the authors’ user IDs using the following function.

def get_user_ids_from_responses(responses, recommend_min):
    print('Retrieving user IDs from the responses...')

    user_ids = []

    for response in responses:
        recent = check_if_recent(response)
        high = check_if_high_recommends(response, recommend_min)

        if recent and high:
            user_ids.append(response['creatorId'])

    return user_ids

User IDs are useless when you’re trying to access someone’s profile. I made this next function query the /_/api/users/<user_id> endpoint to get the usernames.

def get_usernames(user_ids):
    print('Retrieving usernames of interesting users...')

    usernames = []

    for user_id in user_ids:
        url = MEDIUM + '/_/api/users/' + user_id
        response = requests.get(url)
        response_dict = clean_json_response(response)
        payload = response_dict['payload']

        usernames.append(payload['value']['username'])

    return usernames

Putting it all together

After I finished all the functions, I created a pipeline to get my list of recommended users.

def get_interesting_users(username, recommend_min):
    print('Looking for interesting users for %s...' % username)

    user_id = get_user_id(username)
    usernames = get_list_of_followings(user_id)
    posts = get_list_of_latest_posts_ids(usernames)
    responses = get_post_responses(posts)
    users = get_user_ids_from_responses(responses, recommend_min)

    return get_usernames(users)

The script was finally ready! To run it, you have to call the pipeline.

interesting_users = get_interesting_users('Radu_Raicea', 10)
print(interesting_users)

Finally, I added an option to append the results to a CSV with a timestamp.

import csv

def list_to_csv(interesting_users_list):
    # newline='' lets the csv module handle line endings itself
    with open('recommended_users.csv', 'a', newline='') as file:
        writer = csv.writer(file)

        now = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        # Put the timestamp in the first column of the row
        interesting_users_list.insert(0, now)

        writer.writerow(interesting_users_list)

interesting_users = get_interesting_users('Radu_Raicea', 10)
list_to_csv(interesting_users)
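
The append behavior can be tried without calling Medium at all. This is a minimal sketch under my own assumptions: `append_row_with_timestamp` is a hypothetical variant that takes the file path as a parameter and builds a new row instead of modifying the list it receives, and the usernames are made up. It writes to a temporary directory so it leaves recommended_users.csv alone.

```python
import csv
import os
import tempfile
from datetime import datetime


def append_row_with_timestamp(path, usernames):
    """Append one row: a timestamp followed by the usernames."""
    now = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    # newline='' lets the csv module control line endings itself.
    with open(path, 'a', newline='') as file:
        # Build a new list so the caller's list is left untouched.
        csv.writer(file).writerow([now] + list(usernames))


# Write two runs to a throwaway file, then read them back.
path = os.path.join(tempfile.mkdtemp(), 'recommended_users.csv')
append_row_with_timestamp(path, ['alice', 'bob'])
append_row_with_timestamp(path, ['carol'])

with open(path, newline='') as file:
    rows = list(csv.reader(file))

print(len(rows))
print(rows[0][1:])
```

Each run adds one row, so the file doubles as a simple history of past recommendations.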

The project’s source code is on GitHub.

If you don’t know Python, go read TK’s Learning Python: From Zero to Hero.

If you have suggestions on other criteria that make users interesting, please write them below!

In summary…

  • I made a Python script for Medium.

  • The script returns a list of interesting users that are active and post interesting responses on the latest posts of people you are following.

  • You can take users from the list and run the script with their username instead of yours.

Check out my primer on open source licenses and how to add them to your projects!

For more updates, follow me on Twitter.

Translated from: https://www.freecodecamp.org/news/how-i-used-python-to-find-interesting-people-on-medium-be9261b924b0/
