Python crawler: scraping company profiles from CNINFO (巨潮资讯) with requests + BeautifulSoup, a code example

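This is the first more-or-less complete crawler I've written, and I'm really not happy with it: the code is clumsy, it's slow, it doesn't save anything to a local file or database, and forcing multithreading into it scrambled the order of the data...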
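Since the script only prints each record, here is a minimal sketch of persisting the records instead. It assumes the records are the `details` dictionaries built in `getDetails` in the code below (keys `code`, `Entire_Name`, `English_Name`, `Address`, `Legal_Representative`); the helper name `save_details` and the choice of CSV are my own, not the author's.

```python
import csv

# Column order follows the keys of the details dict built in getDetails
FIELDS = ["code", "Entire_Name", "English_Name", "Address", "Legal_Representative"]

def save_details(records, path):
    """Hypothetical helper: write a list of detail dicts to a CSV file."""
    # newline="" is what the csv module expects; "utf-8-sig" writes a BOM
    # so spreadsheet programs such as Excel detect the encoding of Chinese text
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(records)
```

To use it, `getDetails` would have to `return details` instead of printing it, so the caller can collect the dicts in a list and call `save_details(records, "companies.csv")` once at the end.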

I'm posting it here as a cautionary example.

# -*- coding: utf-8 -*-
"""
Created on Wed Jul 18 21:41:34 2018
@author: brave-man
blog: http://www.cnblogs.com/zrmw/
"""
import requests
from bs4 import BeautifulSoup
import json
from threading import Thread

# Get a listed company's full name, English name, address and legal representative
# (any other company information on the page can be extracted the same way)
def getDetails(url):
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0) Gecko/20100101 Firefox/6.0"}
    res = requests.get(url, headers = headers)
    res.encoding = "GBK"
    soup = BeautifulSoup(res.text, "html.parser")
    # lstrip removes any of these characters from the left, not the exact prefix;
    # that is harmless as long as what follows is the numeric stock code
    details = {"code": soup.select(".table")[0].td.text.lstrip("股票代码：")[:6],
               "Entire_Name": soup.select(".zx_data2")[0].text.strip("\r\n "),
               "English_Name": soup.select(".zx_data2")[1].text.strip("\r\n "),
               "Address": soup.select(".zx_data2")[2].text.strip("\r\n "),
               "Legal_Representative": soup.select(".zx_data2")[4].text.strip("\r\n ")}
    # Convert details to a JSON string here, for storage later on
    jd = json.dumps(details)
    jd1 = json.loads(jd)
    print(jd1)

# Get the stock codes of the listed companies
def getCode():
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0) Gecko/20100101 Firefox/6.0"}
    res = requests.get("http://www.cninfo.com.cn/cninfo-new/information/companylist", headers = headers)
    res.encoding = "gb2312"
    soup = BeautifulSoup(res.text, "html.parser")
#    print(soup.select(".company-list"))
    L = []
    l1 = []
    l2 = []
    l3 = []
    l4 = []
    # One .company-list block per board; the first 6 characters of each link text are the stock code
    for i in soup.select(".company-list")[0].find_all("a"):
        code = i.text[:6]
        l1.append(code)
    for i in soup.select(".company-list")[1].find_all("a"):
        code = i.text[:6]
        l2.append(code)
    for i in soup.select(".company-list")[2].find_all("a"):
        code = i.text[:6]
        l3.append(code)
    for i in soup.select(".company-list")[3].find_all("a"):
        code = i.text[:6]
        l4.append(code)
    L = [l1, l2, l3, l4]
    print(L[0])
    return getAll(L)

# Visit every company's profile page; only the URL prefix differs between boards
def getAll(L):
    def t1(L):
        for i in L[0]:
            url_sszb = "http://www.cninfo.com.cn/information/brief/szmb{}.html".format(i)
            getDetails(url_sszb)

    def t2(L):
        for i in L[1]:
            url_zxqyb = "http://www.cninfo.com.cn/information/brief/szsme{}.html".format(i)
            getDetails(url_zxqyb)

    def t3(L):
        for i in L[2]:
            url_cyb = "http://www.cninfo.com.cn/information/brief/szcn{}.html".format(i)
            getDetails(url_cyb)

    def t4(L):
        for i in L[3]:
            url_hszb = "http://www.cninfo.com.cn/information/brief/shmb{}.html".format(i)
            getDetails(url_hszb)

#    tt1 = Thread(target = t1, args = (L, ))
#    tt2 = Thread(target = t2, args = (L, ))
#    tt3 = Thread(target = t3, args = (L, ))
#    tt4 = Thread(target = t4, args = (L, ))
#
#    tt1.start()
#    tt2.start()
#    tt3.start()
#    tt4.start()
#
#    tt1.join()
#    tt2.join()
#    tt3.join()
#    tt4.join()
    t1(L)
    t2(L)
    t3(L)
    t4(L)

if __name__ == "__main__":
    getCode()
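The four lists in `getCode` and the four near-identical functions `t1` to `t4` in `getAll` differ only in the board prefix of the URL. A table-driven sketch of the same URL construction follows; it assumes the `.company-list` blocks appear in the same order as the prefixes used above (szmb, szsme, szcn, shmb), and `BOARD_PREFIXES`, `extract_code_lists` and `build_urls` are hypothetical names.

```python
# URL prefixes in the same order as the four .company-list blocks read above.
# Judging by the original variable names (url_sszb, url_zxqyb, url_cyb, url_hszb)
# these are the Shenzhen main board, SME board, ChiNext and Shanghai main board.
BOARD_PREFIXES = ["szmb", "szsme", "szcn", "shmb"]
BRIEF_URL = "http://www.cninfo.com.cn/information/brief/{}{}.html"

def extract_code_lists(soup):
    """Same logic as the four loops in getCode: one list of 6-char codes per block."""
    return [[a.text[:6] for a in block.find_all("a")]
            for block in soup.select(".company-list")[:4]]

def build_urls(code_lists):
    """Flatten [[codes of board 0], [codes of board 1], ...] into an ordered URL list."""
    urls = []
    # zip pairs each board prefix with the code list scraped for that board
    for prefix, codes in zip(BOARD_PREFIXES, code_lists):
        for code in codes:
            urls.append(BRIEF_URL.format(prefix, code))
    return urls
```

With these, the body of `getAll` shrinks to `for url in build_urls(L): getDetails(url)`, and adding a fifth board means adding one prefix rather than another copy of the loop.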

It also doesn't handle problems that crop up in real-world use, such as network latency or stalled connections.
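The script calls `requests.get` without a timeout, so a single stalled connection can hang it indefinitely. Below is a minimal sketch of a shared session with a timeout and automatic retries, built from requests' `HTTPAdapter` and urllib3's `Retry`. The retry count, backoff factor, status codes and 10-second timeout are arbitrary assumptions, and `make_session`/`fetch` are hypothetical helpers.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0) Gecko/20100101 Firefox/6.0"}

def make_session(total_retries=3):
    """Build a Session that retries failed connections and 429/5xx responses."""
    retry = Retry(
        total=total_retries,
        backoff_factor=1,  # exponential backoff between attempts
        status_forcelist=[429, 500, 502, 503, 504],
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.headers.update(HEADERS)
    # Replace the default adapters so every request goes through the retry policy
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session

def fetch(session, url, encoding="GBK", timeout=10):
    """GET a page; the timeout (seconds) keeps a stalled connection from blocking forever."""
    res = session.get(url, timeout=timeout)
    res.raise_for_status()  # surface HTTP errors instead of parsing an error page
    res.encoding = encoding
    return res.text
```

Reusing one `Session` for all pages also lets requests keep TCP connections alive instead of reconnecting for every company.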

It really is slow. When I have time, I'll share code that crawls CNINFO with selenium driving a browser. Good night~
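The threaded attempt commented out in the code presumably scrambled the output because each thread printed its records as soon as they arrived. A sketch that still fetches concurrently but keeps the input order uses `concurrent.futures.ThreadPoolExecutor.map`, which returns results in the order of its inputs regardless of which call finishes first. `crawl_in_order`, `max_workers=4` and the stand-in `fake_fetch` are assumptions for illustration.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

def crawl_in_order(urls, fetch_one, max_workers=4):
    """Run fetch_one over urls concurrently; results come back in the order of urls."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map runs the calls concurrently but yields results in input order
        return list(pool.map(fetch_one, urls))

def fake_fetch(url):
    # Stand-in for getDetails: sleeps a random time so calls finish out of order
    time.sleep(random.uniform(0, 0.05))
    return url.upper()

if __name__ == "__main__":
    pages = ["page{}".format(i) for i in range(8)]
    print(crawl_in_order(pages, fake_fetch))
```

With `getDetails` changed to return its `details` dict instead of printing it, `crawl_in_order(urls, getDetails)` would give the records in the same order as `urls`. A small worker count keeps the load on the site modest.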

Reposted from: https://www.cnblogs.com/zrmw/p/9333385.html
