Scraping BTC correlation data from Feixiaohao (非小号) with Python

Download chromedriver and add it to PATH
Set up the XPath and selenium environment
Locate the elements
Save the data
Complete code

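Download chromedriver and add it to PATH

We use chromedriver, the driver that lets test tools control Chrome, to fetch the page source. A plain requests.get() returns only the initial HTML response and does not run the page's JavaScript, so content that is loaded afterwards is missing and the HTML we get back is incomplete. The driver can be downloaded from: chromedriver.
After downloading, remember to place the executable in a folder that is listed in the PATH environment variable.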
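
A quick way to confirm the PATH setup before going further is to ask Python where it finds the executable. The sketch below uses the standard library's `shutil.which`; the helper name `find_driver` is made up for illustration, and the executable name `chromedriver` is assumed to match the downloaded file.

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of the driver executable found on PATH, or None."""
    # shutil.which searches the directories listed in PATH, like a shell would.
    return shutil.which(name)


if __name__ == "__main__":
    path = find_driver()
    if path is None:
        print("chromedriver was not found on PATH")
    else:
        print("chromedriver found at", path)
```

If this reports that nothing was found, the folder holding the executable is probably not on PATH yet, or the terminal needs to be reopened to pick up the change.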

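Set up the XPath and selenium environment

XPath (through the lxml package, which the code below imports) is used to locate elements; selenium is used to drive the browser through chromedriver.

conda install lxml
conda install selenium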

Then test that chromedriver works:

from selenium import webdriver
browser = webdriver.Chrome()

If a Chrome browser window opens automatically at this point, the test has succeeded.

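Locate the elements

First, open the Feixiaohao website and press F12 to open the inspector.

Using the element picker, we find that each coin's URL sits inside a div with class="ivu-table-cell", so we use XPath to parse the page and collect those divs.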
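
To see what such an XPath query returns, here is a small sketch run against a hand-written HTML fragment. The fragment and its hrefs are invented for illustration and only mimic the structure described above; the real page's markup may differ.

```python
from lxml import etree

# Hypothetical fragment imitating the table cells described above.
SAMPLE_HTML = """
<html><body>
  <div class="ivu-table-cell"><a href="/currencies/bitcoin/">Bitcoin</a></div>
  <div class="ivu-table-cell"><span>no link in this cell</span></div>
  <div class="ivu-table-cell"><a href="/currencies/ethereum/">Ethereum</a></div>
</body></html>
"""

tree = etree.HTML(SAMPLE_HTML)

# Every div whose class attribute is exactly "ivu-table-cell".
cells = tree.xpath("//div[@class='ivu-table-cell']")

# Keep only the cells that directly contain a link, and read its href.
hrefs = [cell.xpath("./a/@href")[0] for cell in cells if cell.xpath("./a/@href")]

print(len(cells), hrefs)
```

Note that `@class='ivu-table-cell'` matches the attribute value exactly, so a div carrying additional classes would not be selected by this query.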

1. Drive the browser to open the Feixiaohao website, wait 5 seconds so the page can finish loading, then grab and parse the page source.

    browser.get(url)
    time.sleep(5)
    page_text = browser.page_source
    tree = etree.HTML(page_text)
    li_list = tree.xpath("//div[@class='ivu-table-cell']")

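2. Save the URLs of all coins
Each URL is stored in the a tag inside a cell, so we use XPath to read that tag's href.

    url_list = []
    for coin_url in li_list:
        # Skip cells that do not contain a link
        if len(coin_url.xpath('./a/@href')) == 0:
            continue
        url_list.append(str(coin_url.xpath('./a/@href')[0]))
    # The hrefs are relative, so prepend the site root (as in the complete code)
    url_list = [url + x for x in url_list]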
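
The hrefs collected this way are site-relative paths, which is why the complete code prepends the site root to each one. A sketch of the same step using the standard library's `urljoin`, which also copes with hrefs that are already absolute, might look like this. The helper `to_absolute` and the sample paths are made up, and dropping duplicates is an extra that the original code does not do.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.feixiaohao.com"


def to_absolute(hrefs, base=BASE_URL):
    """Turn relative hrefs into absolute URLs, dropping duplicates in order."""
    seen = set()
    result = []
    for href in hrefs:
        full = urljoin(base, href)
        if full not in seen:
            seen.add(full)
            result.append(full)
    return result


# Hypothetical hrefs, shaped like the ones the XPath query returns.
sample = ["/currencies/bitcoin/", "/currencies/ethereum/", "/currencies/bitcoin/"]
print(to_absolute(sample))
```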

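3. Request every coin's URL to get its BTC correlation
First, open the page of any coin.

Then use the element picker again to select the correlation figure, right-click the corresponding node in the source, and choose Copy -> XPath. That path is what we use to extract the correlation value.

    coef_list = []
    for temp_url in url_list:
        try:
            temp_text = requests.get(temp_url).text
            temp_text = etree.HTML(temp_text)
            coef = temp_text.xpath("//body/div[@id='__nuxt']/div[@id='__layout']/section[1]/div[1]/div[1]/div[1]/div[1]/div[3]/div[2]/div[2]/div[8]/span[2]")
            # The number is on the second line of the span's text
            coef = float(coef[0].text.split('\n')[1])
            coef_list.append(coef)
            print(temp_url + ' is finished!!!')
        except Exception as e:
            # Keep the lists aligned: record None when a coin fails
            coef_list.append(None)
            print(e)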
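
The expression `coef[0].text.split('\n')[1]` assumes the span's text has the number on its second line. A more tolerant sketch, assuming only that the text contains a plain decimal number somewhere, pulls out the first number with a regular expression and returns `None` otherwise. The helper `parse_coef` and the sample strings are hypothetical.

```python
import re

# An optional sign, digits, and an optional fractional part.
_NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def parse_coef(raw_text):
    """Return the first number found in raw_text as a float, or None."""
    if raw_text is None:
        return None
    match = _NUMBER.search(raw_text)
    return float(match.group()) if match else None


# Made-up examples of what the span's text might look like.
for sample in ["\n0.85\n", "\n-0.12\n", "--", None]:
    print(repr(sample), parse_coef(sample))
```

Inside the loop this would replace the `float(...)` line with `coef = parse_coef(coef[0].text)`, so a page that shows a placeholder instead of a number yields `None` rather than an exception.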

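Save the data

Finally, save the data to a CSV file.

    # The coin name is the fifth '/'-separated part of each URL
    coin_name = [x.split('/')[4] for x in url_list]
    coef_df = pd.DataFrame([coin_name, coef_list]).T
    coef_df.columns = ['coin', 'coef']
    # os.path.join avoids the invalid '\c' escape and works on any OS
    coef_df.to_csv(os.path.join(os.getcwd(), 'coef_df.csv'), index=False)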

Open the file and you will find the correlation coefficient with BTC for every coin.
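
Building the frame from a list of lists and transposing it typically leaves both columns with dtype `object`. As an alternative sketch, a dict of columns lets pandas infer a numeric dtype for `coef`, with failed requests stored as NaN. The coin names and values below are made up, and a temporary directory stands in for the working directory.

```python
import os
import tempfile

import pandas as pd

# Made-up sample data; None marks a coin whose request failed.
coin_name = ["bitcoin", "ethereum", "dogecoin"]
coef_list = [1.0, 0.87, None]

# One dict key per column, so each column keeps its own dtype.
coef_df = pd.DataFrame({"coin": coin_name, "coef": coef_list})

out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "coef_df.csv")
coef_df.to_csv(out_path, index=False)

# Reading the file back shows the missing value as NaN.
loaded = pd.read_csv(out_path)
print(loaded)
```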

Complete code

import requests
from lxml import etree
from selenium import webdriver
import time
import pandas as pd
import os

browser = webdriver.Chrome()
url = 'https://www.feixiaohao.com'
datapath = os.getcwd()

if __name__ == "__main__":
    browser.get(url)
    time.sleep(5)
    page_text = browser.page_source
    tree = etree.HTML(page_text)
    li_list = tree.xpath("//div[@class='ivu-table-cell']")

    # Collect the URLs of all coins
    url_list = []
    for coin_url in li_list:
        if len(coin_url.xpath('./a/@href')) == 0:
            continue
        url_list.append(str(coin_url.xpath('./a/@href')[0]))
    url_list = [url + x for x in url_list]

    # Scrape the BTC correlation from each coin's URL
    coef_list = []
    for temp_url in url_list:
        try:
            temp_text = requests.get(temp_url).text
            temp_text = etree.HTML(temp_text)
            coef = temp_text.xpath("//body/div[@id='__nuxt']/div[@id='__layout']/section[1]/div[1]/div[1]/div[1]/div[1]/div[3]/div[2]/div[2]/div[8]/span[2]")
            coef = float(coef[0].text.split('\n')[1])
            coef_list.append(coef)
            print(temp_url + ' is finished!!!')
        except Exception as e:
            coef_list.append(None)
            print(e)

    # Save the data
    coin_name = [x.split('/')[4] for x in url_list]
    coef_df = pd.DataFrame([coin_name, coef_list]).T
    coef_df.columns = ['coin', 'coef']
    coef_df.to_csv(os.path.join(datapath, 'coef_df.csv'), index=False)