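Scrapy is an open-source, single-machine web crawler written in Python and built on the Twisted framework. It bundles most of the tools needed for web scraping, covering both the download side and the extraction side of a crawler.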

Environment:

CentOS 5.4
Python 2.7.3

Installation steps:

1. Download Python 2.7: http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tgz

[root@zxy-websgs ~]# wget http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tgz -P /opt
[root@zxy-websgs opt]# tar xvf Python-2.7.3.tgz
[root@zxy-websgs Python-2.7.3]# ./configure
[root@zxy-websgs Python-2.7.3]# make && make install

Verify the Python 2.7 installation:

[root@zxy-websgs Python-2.7.3]# python2.7
Python 2.7.3 (default, Feb 28 2013, 03:08:43)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-50)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()
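
The same check can be scripted rather than read off the interactive banner. The following is a minimal sketch, assuming the only requirement is that the interpreter on the path is version 2.7 or newer; the helper name `meets_minimum` is hypothetical and is not part of any tool used in this guide.

```python
import sys


def meets_minimum(version_info, minimum=(2, 7)):
    """Return True if (major, minor) of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # Report the running interpreter and whether it satisfies the guide's assumption.
    print("Python %d.%d.%d" % tuple(sys.version_info[:3]))
    if meets_minimum(sys.version_info):
        print("interpreter is new enough")
    else:
        print("interpreter is older than 2.7")
```

Run it with the interpreter under test, for example `python2.7 check_version.py` (the file name is arbitrary).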

2. Install setuptools: http://pypi.python.org/packages/source/s/setuptools/setuptools-0.6c11.tar.gz

[root@zxy-websgs ~]# wget http://pypi.python.org/packages/source/s/setuptools/setuptools-0.6c11.tar.gz -P /opt/
[root@zxy-websgs opt]# tar zxvf setuptools-0.6c11.tar.gz
[root@zxy-websgs setuptools-0.6c11]# python2.7 setup.py  install

3. Install Twisted

[root@zxy-websgs setuptools-0.6c11]# easy_install Twisted
......
Installed /usr/local/lib/python2.7/site-packages/Twisted-12.3.0-py2.7-linux-x86_64.egg
......
Installed /usr/local/lib/python2.7/site-packages/zope.interface-4.0.4-py2.7-linux-x86_64.egg

Twisted depends on zope.interface. Both packages can also be downloaded manually from the addresses below:

zope.interface:http://pypi.python.org/packages/source/z/zope.interface/zope.interface-4.0.1.tar.gz

twisted:http://twistedmatrix.com/Releases/Twisted/12.1/Twisted-12.1.0.tar.bz2

5. Install w3lib

[root@zxy-websgs setuptools-0.6c11]# easy_install -U w3lib
Searching for w3lib
Reading http://pypi.python.org/simple/w3lib/
Reading http://github.com/scrapy/w3lib
Best match: w3lib 1.2
Downloading http://pypi.python.org/packages/source/w/w3lib/w3lib-1.2.tar.gz#md5=f929d5973a9fda59587b09a72f185a9e
Processing w3lib-1.2.tar.gz
Running w3lib-1.2/setup.py -q bdist_egg --dist-dir /tmp/easy_install-wm_1BB/w3lib-1.2/egg-dist-tmp-2DQHY_
zip_safe flag not set; analyzing archive contents...
Adding w3lib 1.2 to easy-install.pth file
Installed /usr/local/lib/python2.7/site-packages/w3lib-1.2-py2.7.egg
Processing dependencies for w3lib
Finished processing dependencies for w3lib

w3lib:http://pypi.python.org/packages/source/w/w3lib/w3lib-1.2.tar.gz

6. Install libxml2, or install lxml with easy_install

  If the installation fails, see: http://www.coder4.com/archives/3660

[root@zxy-websgs lxml-3.1.0]# easy_install lxml

Verify the lxml installation:

[root@zxy-websgs lxml-3.1.0]# python2.7
Python 2.7.3 (default, Feb 28 2013, 03:08:43)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-50)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxml
>>> exit()
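
Instead of importing each package by hand, the dependencies installed so far can be checked in one pass. This is a minimal sketch, assuming the import names `twisted`, `zope.interface`, `w3lib` and `lxml` for the packages from the earlier steps; `check_modules` is a hypothetical helper, not something shipped with setuptools or Scrapy.

```python
import importlib


def check_modules(names):
    """Map each module name to True if it imports cleanly, else False."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results


if __name__ == "__main__":
    # Import names assumed for the packages installed up to this step.
    wanted = ["twisted", "zope.interface", "w3lib", "lxml"]
    for name, ok in sorted(check_modules(wanted).items()):
        print("%-16s %s" % (name, "ok" if ok else "MISSING"))
```

A module reported as MISSING only means the import failed under the interpreter that ran the script, so run it with python2.7 rather than the system default.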

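Alternatively, install libxml2. The official site recommends version 2.6.28 or later, but I could not find that version there. I first installed version 2.6.9, and running scrapy then failed with the following error: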

Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 5, in <module>
    pkg_resources.run_script('Scrapy==0.14.4', 'scrapy')
  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 489, in run_script
  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 1207, in run_script
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/EGG-INFO/scripts/scrapy", line 4, in <module>
    execute()
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 112, in execute
    cmds = _get_commands_dict(inproject)
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 37, in _get_commands_dict
    cmds = _get_commands_from_module('scrapy.commands', inproject)
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 30, in _get_commands_from_module
    for cmd in _iter_command_classes(module):
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 21, in _iter_command_classes
    for module in walk_modules(module_name):
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/utils/misc.py", line 65, in walk_modules
    submod = __import__(fullpath, {}, {}, [''])
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/commands/shell.py", line 8, in <module>
    from scrapy.shell import Shell
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/shell.py", line 14, in <module>
    from scrapy.selector import XPathSelector, XmlXPathSelector, HtmlXPathSelector
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/selector/__init__.py", line 30, in <module>
    from scrapy.selector.libxml2sel import *
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/selector/libxml2sel.py", line 12, in <module>
    from .factories import xmlDoc_from_html, xmlDoc_from_xml
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/selector/factories.py", line 14, in <module>
    libxml2.HTML_PARSE_NOERROR + \
AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'

Upgrading to version 2.6.21 fixed the problem.

libxml2 2.6.21 (Python bindings): ftp://xmlsoft.org/libxml2/python/libxml2-python-2.6.21.tar.gz
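
The traceback above fails because the installed libxml2 bindings lack a parser constant that Scrapy references. The following is a minimal sketch of a pre-flight check, assuming the two constants named in the traceback (`HTML_PARSE_RECOVER` and `HTML_PARSE_NOERROR`) are a reasonable proxy for a new enough binding; `missing_parse_flags` is a hypothetical helper, and other constants may also be required.

```python
def missing_parse_flags(module, flags=("HTML_PARSE_RECOVER", "HTML_PARSE_NOERROR")):
    """Return the names in `flags` that `module` does not define."""
    return [flag for flag in flags if not hasattr(module, flag)]


if __name__ == "__main__":
    try:
        import libxml2  # Python bindings for libxml2, if installed
    except ImportError:
        print("libxml2 Python bindings are not installed")
    else:
        missing = missing_parse_flags(libxml2)
        if missing:
            print("bindings too old, missing: %s" % ", ".join(missing))
        else:
            print("bindings expose the constants named in the traceback")
```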

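7. Install pyOpenSSL (optional; it mainly lets Scrapy support HTTPS)

Running easy_install pyOpenSSL fetches pyOpenSSL 0.13, which failed to install, so I downloaded version 0.11 manually and installed that instead.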

[root@zxy-websgs opt]# wget http://launchpadlibrarian.net/58498441/pyOpenSSL-0.11.tar.gz -P /opt
[root@zxy-websgs opt]# tar zxvf pyOpenSSL-0.11.tar.gz
[root@zxy-websgs pyOpenSSL-0.11]# python2.7 setup.py install

pyOpenSSL:http://launchpadlibrarian.net/58498441/pyOpenSSL-0.11.tar.gz

8. Install Scrapy

[root@zxy-websgs pyOpenSSL-0.11]# easy_install -U Scrapy

Verify the installation:

[root@zxy-websgs pyOpenSSL-0.11]# scrapy
Scrapy 0.16.4 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  fetch         Fetch a URL using the Scrapy downloader
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command
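
The first line of that output carries the installed version, which is handy when scripting the verification. This is a minimal sketch, assuming the banner keeps the form "Scrapy X.Y.Z - ..." shown above; `parse_scrapy_banner` is a hypothetical helper and not part of Scrapy.

```python
import re


def parse_scrapy_banner(line):
    """Extract a version tuple from a banner line, or return None if it does not match."""
    match = re.match(r"Scrapy\s+(\d+(?:\.\d+)*)", line.strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


if __name__ == "__main__":
    # Sample banner copied from the output above.
    print(parse_scrapy_banner("Scrapy 0.16.4 - no active project"))
```

Returning None for a non-matching line lets a calling script tell "scrapy is not installed" apart from "scrapy is installed but old".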

scrapy:http://pypi.python.org/packages/source/S/Scrapy/Scrapy-0.14.4.tar.gz

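Summary:

If installing pyOpenSSL on its own with easy_install fails, download and install pyOpenSSL 0.11 manually first, then run easy_install -U Scrapy to install everything else in one go.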

Original article: http://www.cnblogs.com/xiaoruoen/archive/2013/02/27/2933854.html
