抓取spotify歌单_如何使用python抓取Spotify上最受欢迎的歌曲

抓取spotify歌单

Spotify is one of the most popular music streaming services in the world, with nearly 300 million users as of 2020. It is also a data analyst’s dream, with publicly available metrics on everything from broad topics such as the top songs in the past month, to minutiae like the “speechiness” (how many words are in a track) of your favorite song. Today, we’ll be looking at how to scrape the Top 200 songs on Spotify for every day for the past year using Python’s Beautiful Soup and Pandas. This is the first part of my “Know the Vibes Animation” series, in which we explore how the mood of top songs on Spotify has shifted over the past year.

Spotify是全球最受欢迎的音乐流媒体服务之一，截至2020年拥有近3亿用户。这也是数据分析师的梦想，它提供了有关广泛主题的公开可用指标，例如上个月的热门歌曲，就像您喜欢的歌曲的“语音”(音轨中有多少个单词)一样。今天，我们将研究如何使用Python的Beautiful Soup和Pandas在过去一年中每天在Spotify上刮取前200首歌曲。这是我的“了解共鸣动画”系列的第一部分，在其中，我们探讨了Spotify上顶级歌曲的情绪在过去一年中如何变化。

To start, we’ll be using Anaconda, a platform consisting of popular data science tools. To install the free version of Anaconda, click here.

首先，我们将使用Anaconda，该平台由流行的数据科学工具组成。要安装Anaconda的免费版本，请单击此处。

Great! Now that Anaconda’s up and running, we’ll be using its JupyterLab interface. JupyterLab will allow us to get immediate feedback on our Python script.

大！现在Anaconda已启动并运行，我们将使用其JupyterLab界面。 JupyterLab将使我们能够立即获得有关Python脚本的反馈。

Now, we’re going to import some helpful packages and libraries.

现在，我们将导入一些有用的软件包和库。

from bs4 import BeautifulSoupimport pandas as pdimport requestsfrom time import sleepfrom datetime import date, timedelta

(Note for beginners: To run the script, press “shift” and “enter.”)

(初学者注意：要运行脚本，请按“ Shift”和“ Enter”。)

To review what these do:

要查看这些功能：

Beautiful Soup helps us to extract data from HTML files.

Beautiful Soup帮助我们从HTML文件提取数据。
Pandas will help us transform our array of data (one-dimensional, hard to read, disorganized) into a nice data-frame (two-dimensional, easy-to read).

熊猫将帮助我们将数据数组(一维，难于阅读，杂乱无章)转换成一个不错的数据框(二维，易读)。
Requests allows us to send HTTP requests, allowing us to get relevant info from the webpages, such as content, encoding, etc.

请求使我们能够发送HTTP请求，从而使我们能够从网页获取相关信息，例如内容，编码等。
Sleep makes sure that we don’t overwhelm the server by sending too many requests at once. It puts a pause in our code. This also helps to make sure our page is fully loaded before proceeding.

睡眠可确保我们不会一次发送太多请求而使服务器不知所措。它使我们的代码暂停了。这也有助于确保在继续操作之前，我们的页面已完全加载。
Date will help us tack on dates to our base URL so that we can get top daily song data. Timedelta will help us to get data for each day in the duration which we’re studying.

日期将帮助我们将日期附加到我们的基本URL，以便我们获取每日的热门歌曲数据。 Timedelta将帮助我们在学习期间每天获取数据。

We’ll then define some empty arrays so that we can store the data which we’ll collect.

然后，我们将定义一些空数组，以便我们可以存储要收集的数据。

dates=[]url_list=[]final = []

The next part of scraping multiple pages is mapping the pages which we want to scrape. We do this by identifying a base URL. Since we’ll be looking at daily data from Spotify Charts for the US, our base URL would be “https://spotifycharts.com/regional/us/daily/” . Next, we want to define the range of dates which we’ll be studying. I decided to study the past year in music, but you can adjust the dates accordingly. We also define delta as the difference between these two dates, which will help with iteration later.

抓取多个页面的下一部分是映射我们要抓取的页面。我们通过识别基本URL来实现。由于我们将查看来自美国Spotify图表的每日数据，因此我们的基本URL为“ https://spotifycharts.com/regional/us/daily/”。接下来，我们要定义将要研究的日期范围。我决定学习音乐的过去一年，但是您可以相应地调整日期。我们还将增量定义为这两个日期之间的差，这将有助于以后的迭代。

We then write a for loop which will cycle through each date, convert it into a string, and then append it to our dates array.

然后，我们编写一个for循环，该循环将遍历每个日期，将其转换为字符串，然后将其附加到我们的dates数组中。

We’ll then write a function to cycle through these date-strings, and append each of them to our base URL, and then put the new URLs in an array. After running the function, we’ve now mapped what we’re going to scrape!

然后，我们将编写一个函数来循环这些日期字符串，并将它们分别附加到我们的基本URL，然后将新的URL放入数组中。运行该函数后，我们现在已经映射了要抓取的内容！

#map site url = "https://spotifycharts.com/regional/us/daily/"start_date= date(2019, 9, 1)end_date= date(2020, 9, 1) delta= end_date-start_date for i in range(delta.days+1):     day = start_date+timedelta(days=i)     day_string= day.strftime("%Y-%m-%d")     dates.append(day_string)def add_url():     for date in dates:           c_string= url+date           url_list.append(c_string)add_url()

Now what do we do with the URLs in our list? Well, we want to go through each of them, pull out the relevant text, and then store it. In order to do that, we first have to use Requests to get the content of the page. So that we don’t overwhelm the server on each loop, I put a 2 second pause in. Afterwards, we use Beautiful Soup to parse the text of our content. Then, we go down to the chart on the page which lists all the song data.

现在，我们如何处理列表中的URL？好吧，我们想遍历每一个，取出相关文本，然后将其存储。为了做到这一点，我们首先必须使用请求来获取页面的内容。为避免每次循环时服务器不堪重负，我暂停了2秒。然后，我们使用Beautiful Soup解析内容的文本。然后，我们转到页面上的图表，其中列出了所有歌曲数据。

for u in url_list:          read_pg= requests.get(u)          sleep(2)          soup= BeautifulSoup(read_pg.text, "html.parser")          songs= soup.find("table", {"class":"chart-table"})       song_scrape(u)

Let’s go and get the data which we’ll be using for our animation.

我们去获取将用于动画的数据。

def song_scrape(x):       pg = x          for tr in songs.find("tbody").findAll("tr"):                     artist= tr.find("td", {"class": "chart-table-               track"}).find("span").text             artist= artist.replace("by ","").strip()             title= tr.find("td",{"class": "chart-table-              track"}).find("strong").text             songid= tr.find("td", {"class": "chart-table-             image"}).find("a").get("href")             songid= songid.split("track/")[1]             url_date= x.split("daily/")[1]             final.append([title, artist, songid, url_date])

In our song_scrape function, we include the argument x so that the program knows what page we’re scraping. Then, we use search through the text which we got from Beautiful Soup to find each table row. Then, we analyze the HTML to see what elements we need to select to get artist, title, and song ID information.

在我们的song_scrape函数中，我们包含参数x，以便程序知道我们要抓取的页面。然后，我们使用“ 美丽汤”中的文本进行搜索，以找到每个表格行。然后，我们分析HTML，以了解需要选择哪些元素来获取艺术家，标题和歌曲ID信息。

The HTML for *Mood, №2 on the charts at the moments*

Based on the song above, Mood, we see that the artist is inside the <span> element. Thus, we find the class that the span is in (“chart-table-track”), find the span, and get the text of that. We replicate this process for the song title as well.

根据上面的歌曲Mood ，我们可以看到艺术家位于<span>元素内。因此，我们找到跨度所在的类(“图表表轨道”)，找到跨度，并获取其文本。我们也为歌曲标题重复此过程。

Song ID is a bit trickier. When we make calls to the Spotify API in the second part of our tutorial, we’ll need the Spotify track ID’s (ex: 7FIWs0pqAYbP91WWM0vlTQ, Eminem’s “Godzilla).

歌曲ID有点棘手。当在本教程的第二部分中调用Spotify API时，我们将需要Spotify轨道ID(例如：7FIWs0pqAYbP91WWM0vlTQ，Eminem的“ Godzilla”)。

The problem is that the Charts don’t have the Spotify ID included. They do have links to listen to the song in your browser however (ex: https://open.spotify.com/track/22LAwLoDA5b4AaGSkg6bKW, Lil Mosey’s “Blueberry Faygo”). As it happens, everything after “track/” in our URL matches the form of our track ID’s. They are our track ID’s.

问题在于图表中没有包含Spotify ID。它们确实具有在您的浏览器中收听歌曲的链接(例如：Lil Mosey的“蓝莓Faygo”(例如https://open.spotify.com/track/22LAwLoDA5b4AaGSkg6bKW ))。碰巧的是，我们网址中“ track /”之后的所有内容都与我们的曲目ID的形式匹配。它们是我们的曲目ID。

All we have to do then is to split the URL into two parts, based on the strings before and after “track/” (including “track/”), and then take the second substring. Since Python’s indexing starts at 0, we take the second substring with [1]. Similarly, we note the date that the song charted by splitting the URL into the base URL and the date. We add all of our newfound data into a previously created empty array.

然后，我们要做的就是根据“ track /”之前和之后的字符串(包括“ track /”)将URL分为两部分，然后获取第二个子字符串。由于Python的索引从0开始，所以我们将第二个子字符串作为[1]。同样，我们通过将URL分为基本URL和日期来记录歌曲绘制的日期。我们将所有新发现的数据添加到以前创建的空数组中。

The for loop which we precedes this code chunk calls this function at the end of it, running it.

我们在此代码块之前的for循环在其末尾调用此函数并运行它。

Now, all we have to do is convert the array into the appropriate form and save it. Here, we use Pandas to put the data into a data frame with appropriately named columns. We then write to a new CSV file, with header= True to acknowledge our column titles.

现在，我们要做的就是将数组转换为适当的形式并保存。在这里，我们使用Pandas将数据放入具有适当命名列的数据帧中。然后，我们使用header = True写入新的CSV文件，以确认我们的列标题。

final_df = pd.DataFrame(final, columns= ["Title", "Artist", "Song ID", "Chart Date"]) with open('spmooddata.csv', 'w') as f:                final_df.to_csv(f, header= True, index=False)

After opening the file in Excel and deleting some blank rows, we get our finished product:

在Excel中打开文件并删除一些空白行后，我们得到了成品：

And there we have it! The full script can be viewed on my GitHub here. Join me soon for ~Part 2~!

我们终于得到它了！完整的脚本可以在我的GitHub上查看。快加入我吧〜 第2部分〜 ！

翻译自: https://medium.com/the-innovation/how-to-scrape-the-most-popular-songs-on-spotify-using-python-8a8979fa6b06

抓取spotify歌单

查看全文

http://www.taodudu.cc/news/show-3167388.html

spotify 缓存_如何在Spotify中获得最佳音质
spotify使用教程_使用Spotify oauth对您的应用程序进行身份验证
ios怎么用spotify_在iOS中通过Spotify进行身份验证
深度linux安装spotify,在Ubuntu/Mint中安装 Spotify
spotify电脑下载歌曲_Spotify for Windows
深度linux安装spotify,如何在Kali Linux上安装Spotify
spotify 缓存_如何组织您的Spotify库
spotify使用教程_如何在Google Home中使用Spotify
Spotify的牛逼是如何炼成的？
Spotify for Mac(流行音乐播放器)
Spotify 模型
手机连接fiddler上网
android 热点连接电脑上网,电脑没有网络，用手机数据线，就能让电脑快速上网...
电脑usb android上网,让你的Android手机通过USB共享电脑上网
教你如何用手机连接电脑上网
互联网中的视频(HTML5) — （1）
论文翻译：DeepFaceLab：一个简单，灵活的可扩展换脸框架
2008年将是视频中国年
转这个作为第一篇，换种模式记录即将开始的新生活
HTML5定稿了，终于有一种编程语言开发的程序可以在Android和IOS两种设备上运行了
iPhone到底好在哪里
iPad 使用感想－初体验
浅谈虚拟内存与项目开发中的OOM问题
太阳人人都恨布鲁斯小斯愤怒：马刺是支肮脏球队
苹果iPhone到底好在哪里
一个ABC眼中的中国创业环境
Mac Mini2018 开箱（视频）全球首映！Kindle 开箱一并奉上（文字）
逃出你的肖申克（二）：你的灯亮着吗？
本周AI热点回顾：无人车的眼睛、UP主的生产力工具，都藏在百度CVPR的这十个冠军里；IJCAI 2021心电全景图及其生成网络...
如何将孩子的视频保存在YouTube历史记录之外

抓取spotify歌单_如何使用python抓取Spotify上最受欢迎的歌曲相关推荐

python 抓取解析接口数据_[干货]用python抓取摩拜单车API数据并做可视化分析(源码)...
原标题:[干货]用python抓取摩拜单车API数据并做可视化分析(源码) 在APP中能看到很多单车,但走到那里的时候,才发现车并不在那里.有些车不知道藏到了哪里:有些车或许是在高楼的后面,由于有GP ...
python抓取京东联盟优惠券_[爬虫]使用python抓取京东全站数据（商品，店铺，分类，评论）...
网上抓取京东数据的文章,现在要么无法抓取数据,要么只能抓取部分数据,本文将介绍如何抓取京东全站数据,包括商品信息.店铺信息,评论信息,分类信息等. -------------------------- ...
python能爬取网站后台数据_如何利用Python爬取网站数据？
1.基本方法其实用python爬取网页很简单,只有简单的几句话这样就可以获得到页面的内容.接下来再用正则匹配去匹配所需要的内容就行了.但是,真正要做起来,就会有各种各样的细节问题. 2.登录这是 ...
如何用python爬取公众号文章_如何使用 Python 爬取微信公众号文章
我比较喜欢看公众号,有时遇到一个感兴趣的公众号时,都会感觉相逢恨晚,想一口气看完所有历史文章.但是微信的阅读体验挺不好的,看历史文章得一页页的往后翻,下一次再看时还得重复操作,很是麻烦. 于是便想着能 ...
python爬取网页公开数据_如何用Python爬取网页数据
使用Python爬取网页数据的方法: 一.利用webbrowser.open()打开一个网站:>>> import webbrowser >>> webbrowse ...
编程python爬取网页数据教程_实例讲解Python爬取网页数据
一.利用webbrowser.open()打开一个网站: >>> import webbrowser >>> webbrowser.open('http://i.f ...
python爬取b站数据_如果利用Python爬取B站上千万数据？B站直播都是大屌萌妹吗？...
粉丝独白说起热门的B站相信很多喜欢玩动漫的,看最有创意的Up主的同学一定非常熟悉.我突发奇想学Python这么久了,为啥不用Python爬取B站中我关注的人,已经关注的人他们关注的人,看看全站里面热 ...
python爬取网易云音乐_我用Python爬取了网易云音乐
原标题:我用Python爬取了网易云音乐来源:别动我的猫尾巴 headers需要进行修改,headers设置不对会被屏蔽导致爬取不成功.一个headers用久了也会爬取不成功代码如下: impor ...
python爬取网站教学视频_零基础Python爬取网页文章和图片详细教学（内附源码、教学视频）...
Python爬虫,这个我相信对于很多人来说都不陌生! 今天小编给大家提供一份零基础Python爬取网页图片以及小说的代码并且详细的和大家讲解一下! 开发环境:版本Python3.6.2!(2版本会在2 ...

抓取spotify歌单_如何使用python抓取Spotify上最受欢迎的歌曲

相关文章：

抓取spotify歌单_如何使用python抓取Spotify上最受欢迎的歌曲相关推荐

最新文章

热门文章