Installing pyspider for Python

  • I. Platform
  • II. Python version
  • III. Installation
    • 1. Installing pyspider
    • 2. Installing PhantomJS
  • IV. Running and troubleshooting

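I. Platform

This walkthrough was done on Windows 10.

II. Python version

The Python version used here is 3.6.2. Newer versions are not recommended, because some of the libraries pyspider depends on are not compatible with them. Download the 3.6.2 Windows installer and run it.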
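Before installing anything, it can help to confirm which interpreter `pip` will actually use. The snippet below is a minimal sketch; the `TARGET` constant and the `matches_target` helper are illustrative names, and 3.6 is assumed as the target only because it is the version used in this article.

```python
import sys

# Assumption: 3.6 is the target because this article was written against 3.6.2.
TARGET = (3, 6)


def matches_target(version_info=sys.version_info, target=TARGET):
    """Return True when the interpreter's major.minor equals the target."""
    return tuple(version_info[:2]) == tuple(target)


if __name__ == "__main__":
    print("Running Python %d.%d.%d" % tuple(sys.version_info[:3]))
    if not matches_target():
        print("Note: this article used Python 3.6; other versions "
              "may pull in incompatible dependencies.")
```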

III. Installation

1. Installing pyspider

Open cmd and run pip install pyspider:

C:\Users\Donnie>pip install pyspider
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting pyspider
  Using cached https://pypi.tuna.tsinghua.edu.cn/packages/d0/97/d6062c928f53d899ff2a8538fed11d4d425ba3d27c96248a2c601c1c9fef/pyspider-0.3.10.tar.gz (110 kB)
Collecting Flask>=0.10
  Using cached https://pypi.tuna.tsinghua.edu.cn/packages/bf/73/9180d22a40da68382e9cb6edb66a74bf09cb72ac825c130dce9c5a44198d/Flask-2.0.0-py3-none-any.whl (93 kB)
Collecting Jinja2>=2.7
  Using cached https://pypi.tuna.tsinghua.edu.cn/packages/48/9b/dc3bbfc44d851632df958acf9d47e4de662c6bbd238e46798d555d427b27/Jinja2-3.0.0-py3-none-any.whl (133 kB)
Collecting chardet>=2.2
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/19/c7/fa589626997dd07bd87d9269342ccb74b1720384a4d739a1872bd84fbe68/chardet-4.0.0-py2.py3-none-any.whl (178 kB)
     |████████████████████████████████| 178 kB 2.2 MB/s
Requirement already satisfied: cssselect>=0.9 in c:\users\donnie\appdata\local\programs\python\python36\lib\site-packages (from pyspider) (1.1.0)
Requirement already satisfied: lxml in c:\users\donnie\appdata\local\programs\python\python36\lib\site-packages (from pyspider) (4.6.3)
Collecting pycurl
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/50/1a/35b1d8b8e4e23a234f1b17a8a40299fd550940b16866c9a1f2d47a04b969/pycurl-7.43.0.6.tar.gz (222 kB)
     |████████████████████████████████| 222 kB ...
    ERROR: Command errored out with exit status 10:
     command: 'c:\users\donnie\appdata\local\programs\python\python36\python.exe' -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\Donnie\\AppData\\Local\\Temp\\pip-install-93yw3wr8\\pycurl_7f2651bdbd8a4eeeac657372eeff673f\\setup.py'"'"'; __file__='"'"'C:\\Users\\Donnie\\AppData\\Local\\Temp\\pip-install-93yw3wr8\\pycurl_7f2651bdbd8a4eeeac657372eeff673f\\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\Donnie\AppData\Local\Temp\pip-pip-egg-info-7m2njos2'
         cwd: C:\Users\Donnie\AppData\Local\Temp\pip-install-93yw3wr8\pycurl_7f2651bdbd8a4eeeac657372eeff673f\
    Complete output (1 lines):
    Please specify --curl-dir=/path/to/built/libcurl
    ----------------------------------------
WARNING: Discarding https://pypi.tuna.tsinghua.edu.cn/packages/50/1a/35b1d8b8e4e23a234f1b17a8a40299fd550940b16866c9a1f2d47a04b969/pycurl-7.43.0.6.tar.gz#sha256=8301518689daefa53726b59ded6b48f33751c383cf987b0ccfbbc4ed40281325 (from https://pypi.tuna.tsinghua.edu.cn/simple/pycurl/) (requires-python:>=3.5). Command errored out with exit status 10: python setup.py egg_info Check the logs for full command output.
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/af/78/ce614dabbbcdc649174e5f06ae63d27473b97a815ae04c8f509b68d6472d/pycurl-7.43.0.5-cp36-cp36m-win_amd64.whl (1.7 MB)
     |████████████████████████████████| 1.7 MB ...
Collecting requests>=2.2
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/29/c1/24814557f1d22c56d50280771a17307e6bf87b70727d975fd6b2ce6b014a/requests-2.25.1-py2.py3-none-any.whl (61 kB)
     |████████████████████████████████| 61 kB 3.8 MB/s
Collecting Flask-Login>=0.2.11
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/2b/83/ac5bf3279f969704fc1e63f050c50e10985e50fd340e6069ec7e09df5442/Flask_Login-0.5.0-py2.py3-none-any.whl (16 kB)
Collecting u-msgpack-python>=1.6
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/a3/54/0400a3a22ff133633d343371821bf81010455fa3a981a93d7ff3e27a554e/u_msgpack_python-2.7.1-py2.py3-none-any.whl (10.0 kB)
Collecting click>=3.3
  Using cached https://pypi.tuna.tsinghua.edu.cn/packages/80/11/597f9102867dc0972b698972f05f50925f586639e57beba4db352029e8f9/click-8.0.0-py3-none-any.whl (96 kB)
Collecting six>=1.5.0
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/d9/5a/e7c31adbe875f2abbb91bd84cf2dc52d792b5a01506781dbcf25c91daf11/six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting tblib>=1.3.0
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/f8/cd/2fad4add11c8837e72f50a30e2bda30e67a10d70462f826b291443a55c7d/tblib-1.7.0-py2.py3-none-any.whl (12 kB)
Collecting wsgidav>=2.0.0
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/b2/fa/5d917861f496a06926061ea48bfe739ece36f73d5a41d237ab0c4258a659/WsgiDAV-3.1.0-py2.py3-none-any.whl (174 kB)
     |████████████████████████████████| 174 kB ...
Collecting tornado<=4.5.3,>=3.2
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/0a/29/01057551db50f718fda2afa0e42abdfccca4f8b18fa6163c59588ae8e991/tornado-4.5.3-cp36-cp36m-win_amd64.whl (423 kB)
     |████████████████████████████████| 423 kB ...
Requirement already satisfied: pyquery in c:\users\donnie\appdata\local\programs\python\python36\lib\site-packages (from pyspider) (1.4.3)
Collecting colorama
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/44/98/5b86278fbbf250d239ae0ecb724f8572af1c91f4a11edf4d36a206189440/colorama-0.4.4-py2.py3-none-any.whl (16 kB)
Collecting Werkzeug>=2.0
  Using cached https://pypi.tuna.tsinghua.edu.cn/packages/ff/1d/960bb4017c68674a1cb099534840f18d3def3ce44aed12b5ed8b78e0153e/Werkzeug-2.0.0-py3-none-any.whl (288 kB)
Collecting itsdangerous>=2.0
  Using cached https://pypi.tuna.tsinghua.edu.cn/packages/c4/af/93fb95dc8bc90a5580084f3dc4be049d243c6f687ff2e253ed3a6df30db4/itsdangerous-2.0.0-py3-none-any.whl (18 kB)
Collecting MarkupSafe>=2.0.0rc2
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ec/93/ecfbd40bd9392a054af0a5fa429cdd5309582afa57128f4026fdeefb5475/MarkupSafe-2.0.0-cp36-cp36m-win_amd64.whl (14 kB)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\users\donnie\appdata\local\programs\python\python36\lib\site-packages (from requests>=2.2->pyspider) (1.26.4)
Collecting certifi>=2017.4.17
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/5e/a0/5f06e1e1d463903cf0c0eebeb751791119ed7a4b3737fdc9a77f1cdfb51f/certifi-2020.12.5-py2.py3-none-any.whl (147 kB)
     |████████████████████████████████| 147 kB ...
Collecting idna<3,>=2.5
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/a2/38/928ddce2273eaa564f6f50de919327bf3a00f091b5baba8dfa9460f3a8a8/idna-2.10-py2.py3-none-any.whl (58 kB)
     |████████████████████████████████| 58 kB ...
Collecting dataclasses
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/fe/ca/75fac5856ab5cfa51bbbcefa250182e50441074fdc3f803f6e76451fab43/dataclasses-0.8-py3-none-any.whl (19 kB)
Collecting PyYAML
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/30/d0/8699372d1c22202e80b160527f8412d98a5edfefeefac056df3997e84801/PyYAML-5.4.1-cp36-cp36m-win_amd64.whl (209 kB)
     |████████████████████████████████| 209 kB ...
Collecting json5
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/2b/81/22bf51a5bc60dde18bb6164fd597f18ee683de8670e141364d9c432dd3cf/json5-0.9.5-py2.py3-none-any.whl (17 kB)
Collecting defusedxml
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/07/6c/aa3f2f849e01cb6a001cd8554a88d4c77c5c1a31c95bdf1cf9301e6d9ef4/defusedxml-0.7.1-py2.py3-none-any.whl (25 kB)
Using legacy 'setup.py install' for pyspider, since package 'wheel' is not installed.
Installing collected packages: MarkupSafe, dataclasses, colorama, Werkzeug, Jinja2, itsdangerous, click, six, PyYAML, json5, idna, Flask, defusedxml, chardet, certifi, wsgidav, u-msgpack-python, tornado, tblib, requests, pycurl, Flask-Login, pyspider
    Running setup.py install for pyspider ... done
Successfully installed Flask-2.0.0 Flask-Login-0.5.0 Jinja2-3.0.0 MarkupSafe-2.0.0 PyYAML-5.4.1 Werkzeug-2.0.0 certifi-2020.12.5 chardet-4.0.0 click-8.0.0 colorama-0.4.4 dataclasses-0.8 defusedxml-0.7.1 idna-2.10 itsdangerous-2.0.0 json5-0.9.5 pycurl-7.43.0.5 pyspider-0.3.10 requests-2.25.1 six-1.16.0 tblib-1.7.0 tornado-4.5.3 u-msgpack-python-2.7.1 wsgidav-3.1.0
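The versions pip resolved matter for the troubleshooting later on, so it is worth recording them. The following is a small sketch, not part of pyspider; `installed_version` is a hypothetical helper name. It tries `importlib.metadata` first and falls back to `pkg_resources` (from setuptools) on Python 3.6, where `importlib.metadata` is not available.

```python
def installed_version(name):
    """Return the installed version string of a distribution, or None."""
    try:
        from importlib import metadata  # available from Python 3.8
    except ImportError:
        metadata = None

    if metadata is not None:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            return None

    # Fallback for Python 3.6/3.7: pkg_resources ships with setuptools.
    import pkg_resources
    try:
        return pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        return None


if __name__ == "__main__":
    # Package names taken from the pip log above.
    for pkg in ("pyspider", "pycurl", "tornado", "wsgidav", "Werkzeug", "Flask"):
        print(pkg, installed_version(pkg))
```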

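2. Installing PhantomJS

PhantomJS is also required. Go to the PhantomJS website, download the Windows build, unzip it, and add the bin directory (the one containing phantomjs.exe) to the system PATH environment variable.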
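To confirm that the PATH change took effect, open a new cmd window and check whether the executable can be found. A minimal sketch using the standard library's `shutil.which`; the `find_executable` wrapper is just an illustrative name.

```python
import shutil


def find_executable(name="phantomjs"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    # On Windows, shutil.which also tries the PATHEXT extensions such as .exe.
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path:
        print("phantomjs found at", path)
    else:
        print("phantomjs is not on PATH; re-check the environment variable")
```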

IV. Running and troubleshooting

Once everything is installed, start pyspider with pyspider all. It immediately fails with an error:

C:\Users\Donnie>pyspider all
c:\users\donnie\appdata\local\programs\python\python36\lib\site-packages\pyspider\libs\utils.py:196: FutureWarning: timeout is not supported on your platform.
  warnings.warn("timeout is not supported on your platform.", FutureWarning)
phantomjs fetcher running on port 25555
[I 210513 09:22:04 result_worker:49] result_worker starting...
[I 210513 09:22:05 processor:211] processor starting...
[I 210513 09:22:05 scheduler:647] scheduler starting...
[I 210513 09:22:05 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 210513 09:22:05 tornado_fetcher:638] fetcher starting...
[I 210513 09:22:05 scheduler:782] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 210513 09:22:06 app:84] webui exiting...
[I 210513 09:22:06 tornado_fetcher:671] fetcher exiting...
[I 210513 09:22:06 scheduler:663] scheduler exiting...
[I 210513 09:22:06 result_worker:66] result_worker exiting...
[I 210513 09:22:07 processor:229] processor exiting...
Traceback (most recent call last):
  File "C:\Users\Donnie\AppData\Local\Programs\Python\Python36\Scripts\pyspider-script.py", line 11, in <module>
    load_entry_point('pyspider==0.3.10', 'console_scripts', 'pyspider')()
  File "c:\users\donnie\appdata\local\programs\python\python36\lib\site-packages\pyspider\run.py", line 754, in main
    cli()
  File "c:\users\donnie\appdata\local\programs\python\python36\lib\site-packages\click\core.py", line 1134, in __call__
    return self.main(*args, **kwargs)
  File "c:\users\donnie\appdata\local\programs\python\python36\lib\site-packages\click\core.py", line 1059, in main
    rv = self.invoke(ctx)
  File "c:\users\donnie\appdata\local\programs\python\python36\lib\site-packages\click\core.py", line 1665, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\donnie\appdata\local\programs\python\python36\lib\site-packages\click\core.py", line 1401, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\users\donnie\appdata\local\programs\python\python36\lib\site-packages\click\core.py", line 767, in invoke
    return __callback(*args, **kwargs)
  File "c:\users\donnie\appdata\local\programs\python\python36\lib\site-packages\click\decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "c:\users\donnie\appdata\local\programs\python\python36\lib\site-packages\pyspider\run.py", line 497, in all
    ctx.invoke(webui, **webui_config)
  File "c:\users\donnie\appdata\local\programs\python\python36\lib\site-packages\click\core.py", line 767, in invoke
    return __callback(*args, **kwargs)
  File "c:\users\donnie\appdata\local\programs\python\python36\lib\site-packages\click\decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "c:\users\donnie\appdata\local\programs\python\python36\lib\site-packages\pyspider\run.py", line 384, in webui
    app.run(host=host, port=port)
  File "c:\users\donnie\appdata\local\programs\python\python36\lib\site-packages\pyspider\webui\app.py", line 59, in run
    from .webdav import dav_app
  File "c:\users\donnie\appdata\local\programs\python\python36\lib\site-packages\pyspider\webui\webdav.py", line 216, in <module>
    dav_app = WsgiDAVApp(config)
  File "c:\users\donnie\appdata\local\programs\python\python36\lib\site-packages\wsgidav\wsgidav_app.py", line 133, in __init__
    _check_config(config)
  File "c:\users\donnie\appdata\local\programs\python\python36\lib\site-packages\wsgidav\wsgidav_app.py", line 117, in _check_config
    raise ValueError("Invalid configuration:\n  - " + "\n  - ".join(errors))
ValueError: Invalid configuration:
  - Deprecated option 'domaincontroller': use 'http_authenticator.domain_controller' instead.

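The cause is that WsgiDAV 3.x (3.1.0 was installed above) deprecated the 'domaincontroller' option that pyspider 0.3.10 still passes, so a small part of the pyspider source has to be changed.

First locate the pyspider installation directory, open pyspider/webui/webdav.py, and change line 209 from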

'domaincontroller': NeedAuthController(app),

to:

'http_authenticator':{'HTTPAuthenticator':NeedAuthController(app),},
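If you prefer not to edit the file by hand, the same one-line substitution can be scripted. This is only a sketch of the edit described above: `patch_webdav_source` and `patch_file` are hypothetical helper names, and the `OLD` string assumes the line in your copy of webdav.py is spelled exactly as quoted in this article (whitespace may differ between pyspider versions).

```python
import shutil

# Assumption: these match the line quoted in the article exactly.
OLD = "'domaincontroller': NeedAuthController(app),"
NEW = "'http_authenticator': {'HTTPAuthenticator': NeedAuthController(app)},"


def patch_webdav_source(source):
    """Return (patched_source, changed) with the deprecated key rewritten."""
    if OLD not in source:
        return source, False
    return source.replace(OLD, NEW, 1), True


def patch_file(path):
    """Apply the substitution to a file, keeping a .bak copy of the original."""
    with open(path, encoding="utf-8") as handle:
        source = handle.read()
    patched, changed = patch_webdav_source(source)
    if changed:
        shutil.copyfile(path, path + ".bak")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(patched)
    return changed
```

Call `patch_file` with the full path to pyspider/webui/webdav.py inside your site-packages directory. Running it a second time is harmless, because the old line is no longer present.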

Run it again, and it still fails:

    from werkzeug.wsgi import DispatcherMiddleware
ImportError: cannot import name 'DispatcherMiddleware'

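The installed Werkzeug is 2.0.0, and in that version werkzeug.wsgi no longer exports the DispatcherMiddleware class (it now lives in werkzeug.middleware.dispatcher). The fix used here is to downgrade to 0.16.1 by running the following two commands:

1. python -m pip uninstall werkzeug
2. python -m pip install werkzeug==0.16.1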
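To see which import path your installed Werkzeug actually provides, you can probe both locations. A minimal sketch; `locate_dispatcher_middleware` is an illustrative helper, and the two module names are the old and new homes of the class.

```python
import importlib

# Newer location first, then the legacy one that pyspider 0.3.10 imports from.
CANDIDATES = ("werkzeug.middleware.dispatcher", "werkzeug.wsgi")


def locate_dispatcher_middleware(candidates=CANDIDATES):
    """Return the first module name exposing DispatcherMiddleware, or None."""
    for module_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module (or Werkzeug itself) is not installed
        if hasattr(module, "DispatcherMiddleware"):
            return module_name
    return None


if __name__ == "__main__":
    print("DispatcherMiddleware found in:", locate_dispatcher_middleware())
```

pyspider only starts if the class is importable from `werkzeug.wsgi`, which is why the article pins an older Werkzeug rather than changing the import.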

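After the installation, pip prints this notice:

flask 2.0.0 requires Werkzeug>=2.0, but you have werkzeug 0.16.1 which is incompatible.

In other words, Flask 2.0.0 requires Werkzeug >= 2.0, so Flask has to be downgraded as well. Installing version 0.11 is enough; run these two commands:

1. python -m pip uninstall flask
2. python -m pip install flask==0.11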
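The pip notice boils down to a version comparison: 0.16.1 does not meet the minimum of 2.0. The sketch below illustrates that check for plain dotted numeric versions only; it is not how pip itself resolves requirements, and both helper names are made up for this example.

```python
def version_tuple(version):
    """Convert a dotted numeric version such as '0.16.1' to a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies_minimum(installed, minimum):
    """Return True when `installed` is at least `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    # Values taken from the pip message above.
    print(satisfies_minimum("0.16.1", "2.0"))  # Werkzeug after the downgrade
    print(satisfies_minimum("2.0.0", "2.0"))   # Werkzeug before the downgrade
```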

Run pyspider all again, and this time it starts successfully:

C:\Users\Donnie>pyspider all
c:\users\donnie\appdata\local\programs\python\python36\lib\site-packages\pyspider\libs\utils.py:196: FutureWarning: timeout is not supported on your platform.
  warnings.warn("timeout is not supported on your platform.", FutureWarning)
phantomjs fetcher running on port 25555
[I 210513 12:03:52 result_worker:49] result_worker starting...
[I 210513 12:03:52 processor:211] processor starting...
[I 210513 12:03:52 scheduler:647] scheduler starting...
[I 210513 12:03:52 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 210513 12:03:53 tornado_fetcher:638] fetcher starting...
[I 210513 12:03:53 scheduler:782] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 210513 12:03:53 app:76] webui running on 0.0.0.0:5000

Open http://localhost:5000/ in Chrome, and the pyspider dashboard loads correctly.
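The same check can be done without a browser. This is a rough sketch using only the standard library; `webui_is_up` is a hypothetical helper, and the default URL is the one from the log line "webui running on 0.0.0.0:5000".

```python
import urllib.error
import urllib.request


def webui_is_up(url="http://localhost:5000/", timeout=3.0):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("web UI reachable:", webui_is_up())
```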
