1. Querying the Nutch crawl results through Solr

{"responseHeader": {"status": 0,"QTime": 2,"params": {"indent": "true","q": "title:幻想","_": "1418266706916","wt": "json"}},"response": {"numFound": 7,"start": 0,"docs": [{"content": "幻想江湖-2.2资料片,巅峰对决,震撼来袭! 跳转官网 装备凝练 巅峰擂台 万圣之夜 新版时装","id": "http://hxjh.zqgame.com/","title": "幻想江湖-2.2资料片,巅峰对决,震撼来袭!","segment": "20141211104005","boost": 0,"digest": "c61521c1861b1a7574c8920fd27d0155","tstamp": "2014-12-11T02:40:14.477Z","url": "http://hxjh.zqgame.com/","anchor": ["幻想江湖","幻想江湖"],"_version_": 1487159323035435000},{"content": "幻想江湖-鬼灵精怪万圣节 开启时间 : 10 月 30 日 万圣节礼包领取> 万圣节前夕,为了避免恶灵干扰,大侠们纷纷挂起了南瓜灯,驱逐鬼怪。江湖有一传闻,一群糖果商人行经龙脉岭时,因为身上的糖果、饼干、宝石而找来鬼魂附身,如果帮助他们驱逐了附身邪灵,将会获得他们道谢的礼物哦~! 1 万圣节天天有礼 2 练级打宝两不误 3 节日消费奖励翻倍 4 奖励兑换惊喜不断 5 洗炼折扣大放送 温馨提示: 活动期间,大侠们请每天提着南瓜灯,穿上蝙蝠衫,去龙脉去收集糖果饼干,要不停地说:“trick or treat.”(意思是给不给,不给就捣蛋)。要是不肯给的话,就用各种方法去惩罚他,例如:一招一个怪,“唰唰唰————”把龙脉挂个三小时! 关闭 恭喜你获得幻想江湖万圣节礼包! IOS用户领取: 安卓和越狱用户领取: 有效时间: 即日-2014.11.30 兑换次数: 只限兑换一次 兑换范围: 全服 礼包使用方法: 登录游戏后,点击游戏右上方【领奖】-【福利】-【礼包】后输入正确礼品卡号领取礼包奖励!","id": "http://hd.zqgame.com/zqgame/hxjh/dfdj/active_03.html","title": "幻想江湖-鬼灵精怪万圣节","segment": "20141211104057","boost": 0,"digest": "5ae39251ad06017e4e1854aae9129126","tstamp": "2014-12-11T02:41:37.669Z","url": "http://hd.zqgame.com/zqgame/hxjh/dfdj/active_03.html","anchor": ["万圣之夜"],"_version_": 1487159633802952700},{"content": "幻想江湖-优雅转身华丽时装首曝 夜魔游龙 西式时装 全新时装新品上架啦,这批时装看上去是不是和以前大有不同呢,此次大胆革新,看到下面的时装,不禁令人想到后面可能真的会有结婚系统咯,新版本新换装,不走平凡路~我们就是这样的与众不同! 进入官网 返回活动首页","id": "http://hd.zqgame.com/zqgame/hxjh/dfdj/active_04.html","title": "幻想江湖-优雅转身华丽时装首曝","segment": "20141211104057","boost": 0,"digest": "e086540bf0f721f39560440c85d2161f","tstamp": "2014-12-11T02:41:47.879Z","url": "http://hd.zqgame.com/zqgame/hxjh/dfdj/active_04.html","anchor": ["新版时装"],"_version_": 1487159633805049900},{"content": "《幻想江湖》官网-首部超萌动作武侠片!今天开始,做武侠片主人公 首页 新闻中心 游戏资料 游戏论坛 分享到: 安卓下载 ios越狱下载 ios正版下载 礼包领取 1 2 3 4 幻想江湖绝尚发布会精彩视频 最新 新闻 公告 活动 《幻想江湖》IOS18区“美人天下”12月10日火爆开启 2014史上最萌武侠手游来袭!不用吃药,放弃治疗,12月10日上午11:00新区“美人天下”火爆开启!快来没日没夜一起萌萌哒!... 
查看详情 > 2014-12-10 • [新闻] 菜鸟进阶强力党 《幻想江湖》装备属性轻松堆 2014-12-10 • [新闻] 全新资料片即将来袭《幻想江湖》四大活动任你玩 2014-12-09 • [活动] 双12 玩幻想送福利 2014-12-09 • [新闻] 细节决定成败 《幻想江湖》人物属性全掌握 2014-12-09 • [活动] 《幻想江湖》IOS18区”美人天下”十六大活动 2014-12-08 • [新闻] 刀尖上的武侠 挑战《幻想江湖》秦陵副本 2014-12-10 • [新闻] 菜鸟进阶强力党 《幻想江湖》装备属性轻松堆 2014-12-10 • [新闻] 全新资料片即将来袭《幻想江湖》四大活动任你玩 2014-12-09 • [新闻] 细节决定成败 《幻想江湖》人物属性全掌握 2014-12-08 • [新闻] 刀尖上的武侠 挑战《幻想江湖》秦陵副本 2014-12-08 • [新闻] 新版“姑姑”遭吐槽 《幻想江湖》还你女神梦 2014-12-05 • [新闻] 《幻想江湖》我们结婚吧!——订婚篇 2014-12-03 • [公告] 幻想江湖-公测9~14区 数据互通公告 2014-12-01 • [公告] 《幻想江湖》12月2日临时维护公告 2014-11-26 • [公告] 《幻想江湖》2.4版本更新 2014-11-25 • [公告] 幻想江湖临时维护公告 2014-11-25 • [公告] 《幻想江湖》appstore1~8服数据互通完毕 2014-11-25 • [公告] 幻想江湖-appstore数据互通延长公告 2014-12-09 • [活动] 双12 玩幻想送福利 2014-12-09 • [活动] 《幻想江湖》IOS18区”美人天下”十六大活动 2014-12-08 • [活动] 周末齐消费 欢乐享不停 2014-12-08 • [活动] 《幻想江湖》美女主播齐聚乐———回顾 2014-12-08 • [活动] 《幻想江湖》25区”独步江湖”十六大活动 2014-12-05 • [活动] 《幻想江湖》玩家体验指南——做好产品,专注体验 联系人:施若熙 联系QQ:744415486 手机:13510624817 邮箱:ruoxi.shi@zqgame.com 联系人:方彦琼 联系QQ:611535985 手机:13603061895 邮箱:yanqiong.fang@zqgame.com 玩家群② 264103428 企业客服QQ:800056019 客服热线:0755-86160520 特色玩法 玩家攻略 职业介绍 明教 唐门 天山 逍遥 18183 766 91手游网 合作媒体 ———————————————————— 微信公众号 新浪微博 腾讯微博 扫描二维码下载 快速注册 通行证: 密 码: 确认密码: 验证码: 立即注册 用户名 恭喜你已经注册成功! 关闭 恭喜您获得幻想江湖公测新手礼包! 你的礼包卡号是: 礼包使用方法: 登陆游戏后,点击游戏右上方【领奖】-【福利】-【礼包】后输入8位的礼包卡号领取礼包奖励!内容包含:止血丹*2、白色强化石*20、成长丹*5、易功丹*10、进阶丹*5。 关闭 微信公众号","id": "http://hxjh.zqgame.com/index.html","title": "《幻想江湖》官网-首部超萌动作武侠片!今天开始,做武侠片主人公","segment": "20141211104057","boost": 0,"digest": "3f9a2060e12f95316ee0201ce8a21da0","tstamp": "2014-12-11T02:41:01.462Z","url": "http://hxjh.zqgame.com/index.html","anchor": ["进入官网"],"_version_": 1487159633828118500},{"content": "【仙幻奇缘】官网 12.6首次开放公测!无商城,真正免费! 进入官网 论坛中心 游戏下载 购卡充值 1 2 3 4 5 媒体友链 通行证账号: 通行证密码: 确认密码: 验证码: 同意 《中青宝》协议 恭喜你!注册成功! 用户名是: 客户端 立即下载 获取特权礼包 版权所有:深圳中青宝互动网络股份有限公司 客服传真:0755-86368269 中华人民共和国增值电信业务经营许可证:粤B2-20030216 粤ICP备:09057836 网络文化经营许可证:文网文[2008]088号 中华人民共和国互联网出版许可证:新出网证(粤)字017号 每个IP只能参加一次抽奖, 谢谢您的参与!  
","id": "http://xh.zqgame.com/","title": "【仙幻奇缘】官网 12.6首次开放公测!无商城,真正免费!","segment": "20141211104057","boost": 0,"digest": "471def081683b7c5f94a39382e4c00a1","tstamp": "2014-12-11T02:41:02.165Z","url": "http://xh.zqgame.com/","anchor": ["仙幻奇缘","仙幻奇缘"],"_version_": 1487159634570510300},{"content": "《诸神世界》官方网站—3D魔幻战争网游 诸神世界 首页 新闻动态 游戏资料 下载微端 快速充值 官方论坛 下载微端 快速充值 VIP介绍 领取新手卡 选择大区 请选择服务器 风暴荒漠 战争血径 无尽沙海 燃烧平原 双线1-16服 领取中,请稍候…… 您的礼包号为: 更多服务器 《诸神世界》是一款MMORPG的3D国战网页游戏,采用魔幻风格,3D旋转俯瞰视角,以国家战争、团队冒险等玩法为特色,以大范围多维度强PVP玩法为核心的超激情游戏,体验游戏国战pk激情就来诸神世界。 0755-26635899 客服邮箱:kefu@zqgame.com 客服传真:0755-86368269 游戏QQ群:219759659 259942575 用户名: *   以字母开头由大小写字母、数字、下划线组成,长度为4-32位 密码: *   6-20字母、数字、符号组成,不含空格键、「\"」及「'」 确认密码: *   请再一次输入密码 1 2 3 4 最新 新闻 活动 公告 攻略 诸神世界混服部分区服数据互通公告 公告 06-04 诸神世界混服部分区服数据互通公告 公告 05-23 5月29日12点诸神新区-风暴荒漠火爆开启 新闻 05-14 5月15日12点诸神新区-亡魂峡谷火爆开启 公告 04-18 诸神世界混服合服活动精彩上线 公告 04-18 诸神世界混服部分区服数据互通公告 【新闻】 05-14 5月15日12点诸神新区-亡魂峡谷火爆开启 【新闻】 04-16 4月17日12点诸神新区-呼啸沙漠火爆开启 【新闻】 03-24 3月27日12点诸神新区-巨龙之吼火爆开启 【新闻】 03-17 3月20日12点诸神新区-尘风峡谷火爆开启 【新闻】 03-11 3月13日12点诸神新区-耳语海岸火爆开启 【活动】 04-02 《诸神世界》十大开服活动 【活动】 02-13 《诸神世界》元宵&情人节活动 【活动】 01-26 《诸神世界》春节活动 【活动】 11-21 诸神世界周末限时活动火爆上线 【活动】 11-08 双十一《诸神世界》劲爆大酬宾 【公告】 06-04 诸神世界混服部分区服数据互通公告 【公告】 05-23 5月29日12点诸神新区-风暴荒漠火爆开启 【公告】 04-18 诸神世界混服合服活动精彩上线 【公告】 04-18 诸神世界混服部分区服数据互通公告 【公告】 03-25 3月28日平台网络升级公告 魔 牧 枪 炮 术 战 魔 刃 狩猎灵魂 攻击方式:近程魔法攻击 核心属性:智力 敏捷 职业特质:隐匿暗杀能力 职业说明:刀锋舞者,狩猎着生者的灵魂。隐没于黑暗,游走于光明。不被历史描述,却是历史的主宰! 点击查看详情 牧 师 神的宠儿 攻击方式:中程魔法攻击 核心属性:精神 智力 职业特质:恢复治愈能力 职业说明:神之使徒,捍卫生者,拯救死者。信者永生,不信者也救赎。虔诚的信徒,是神的宠儿! 点击查看详情 枪 手 一击必杀 攻击方式:远程物理攻击 核心属性:力量 精神 职业特质:伤害输出 职业说明:猎命王者,半边恶魔半边天使。沉着冷静,是他们的特质;一击必杀,是他们的实力! 点击查看详情 魔 炮 焚天怒焰 攻击方式:远程魔法攻击 核心属性:智力 精神 职业特质:群体伤害 职业说明:焚天烈焰,吞噬罪孽与苍生。沉稳步伐,吼出战歌嘹亮;怒放炮火,点亮生命奇迹! 点击查看详情 术 士 破碎虚空 攻击方式:中程魔法攻击 核心属性:智力 精神 职业特质:战斗节奏控制能力 职业说明:掌握法则,智慧象征。探索真理,识古通今,洞悉未来。以世间威能,抑恶扬善,改天逆命,破碎虚空! 点击查看详情 战 士 金刚不坏 攻击方式:近程物理攻击 核心属性:体质 力量 职业特质:生存能力 职业说明:移动城墙,金刚不坏。战,则掠地千里;守,则万夫莫开。英勇的灵魂铸造不灭传奇! 
点击查看详情 系统介绍 进阶指导 特色系统 活动玩法 结婚系统 | 职业介绍 | FAQ | VIP如何获得 | 坐骑强化 | 转职重修 | 战友系统 | 升级送祝福 | 日常任务 | 拍卖寄售 | 技能遗忘重生 | 道具商城 | 财产保护 炼金系统 | 星耀石 | 装备镶嵌 | 装备升阶 | 装备打孔 | 要塞守卫站 | 神器合成 | 宠物潜力修改 | 宝石摘除 斗气系统 | 羽翼系统 | 1V1模拟战 | 移民系统 | 击鼓传花 | 情缘任务 | 神圣血脉 | 军衔系统 | 钓鱼系统 | 称号系统 | 封印进度 | 离线经验 巴比伦塔 | 跨区国战 | 跨区巡游 | 跨区极速狂飙 | 跨区组队争夺战 | 超级血战到底 | 血战到底 | 小丑的梦境 | 王者试炼 | 探险者地宫 | 前线速递 | 骑魂谷 | 冒险岛 | 极速狂飙 | 毁灭神迹 | 国家正式战争 | 国家远征 | 国家情报 | 国家BOSS | 藏宝峡谷 游戏壁纸 游戏截图 玩家相册 MORE                         265G百科 073专区 新浪爱问 抵制不良游戏 拒绝盗版游戏 注意自我保护 谨防上当受骗 适度游戏益脑 沉迷游戏伤身 合理安排时间 享受健康生活 增值电信许可证:粤B2-20120680 网络文化经营许可证: 粤网文[2014]0615-215号 粤ICP备09057836号 深圳市卓页互动网络科技有限公司 Copyright © 2012-2014 All Rights Reserved 本游戏适合18岁以上用户,不含暴力、恐怖、残酷、色情等妨害未成年人身心健康的内容,属于绿色健康产品 yy","id": "http://zs.ucjoy.com/","title": "《诸神世界》官方网站—3D魔幻战争网游","cache": "content","segment": "20141211104057","boost": 0,"digest": "8d00af8aaa03c2cf68a69dc68892b764","tstamp": "2014-12-11T02:41:18.686Z","url": "http://zs.ucjoy.com/","anchor": ["官网","诸神世界"],"_version_": 1487159634641813500},{"content": "《诸神世界》官方网站—3D魔幻战争网游 诸神世界 首页 新闻动态 游戏资料 下载微端 快速充值 官方论坛 下载微端 快速充值 VIP介绍 领取新手卡 选择大区 请选择服务器 风暴荒漠 战争血径 无尽沙海 燃烧平原 双线1-16服 领取中,请稍候…… 您的礼包号为: 更多服务器 《诸神世界》是一款MMORPG的3D国战网页游戏,采用魔幻风格,3D旋转俯瞰视角,以国家战争、团队冒险等玩法为特色,以大范围多维度强PVP玩法为核心的超激情游戏,体验游戏国战pk激情就来诸神世界。 0755-26635899 客服邮箱:kefu@zqgame.com 客服传真:0755-86368269 游戏QQ群:219759659 259942575 用户名: *   以字母开头由大小写字母、数字、下划线组成,长度为4-32位 密码: *   6-20字母、数字、符号组成,不含空格键、「\"」及「'」 确认密码: *   请再一次输入密码 您所在的位置: 首页 > 服务器列表 推荐服务器列表 风暴荒漠 火爆 战争血径 火爆 我的服务器列表 你还未进入过游戏,请先登录游戏! 
所有服务器 1-10 11-20 诸神混服 双线1-16服 火爆 风暴荒漠 火爆 战争血径 火爆 无尽沙海 火爆 燃烧平原 火爆 抵制不良游戏 拒绝盗版游戏 注意自我保护 谨防上当受骗 适度游戏益脑 沉迷游戏伤身 合理安排时间 享受健康生活 增值电信许可证:粤B2-20120680 网络文化经营许可证: 粤网文[2014]0615-215号 粤ICP备09057836号 深圳市卓页互动网络科技有限公司 Copyright © 2012-2014 All Rights Reserved 本游戏适合18岁以上用户,不含暴力、恐怖、残酷、色情等妨害未成年人身心健康的内容,属于绿色健康产品 yy","id": "http://zs.ucjoy.com/serverlist.app","title": "《诸神世界》官方网站—3D魔幻战争网游","cache": "content","segment": "20141211104057","boost": 0,"digest": "30a836aae5886924d1a87d3ab1ad42c8","tstamp": "2014-12-11T02:41:13.476Z","url": "http://zs.ucjoy.com/serverlist.app","anchor": ["进入新服","开始游戏"],"_version_": 1487159634643910700}]}
}
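The same query can be sent to Solr without the admin UI. Below is a minimal Python sketch using only the standard library. It assumes a single default core that answers at `<solr base>/select`, which matches the core-less Solr URL used in the crawl command. The helper names `build_select_url`, `summarize` and `query_solr` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(solr_base: str, query: str, rows: int = 10) -> str:
    """Build a select-handler URL for a Solr query.

    Assumption: the default core answers at <solr_base>/select; use
    <solr_base>/<core>/select instead if the index lives in a named core.
    """
    params = {"q": query, "wt": "json", "indent": "true", "rows": rows}
    return f"{solr_base.rstrip('/')}/select?{urlencode(params)}"


def summarize(response: dict) -> list[tuple[str, str]]:
    """Return (url, title) pairs from a parsed Solr JSON response."""
    docs = response.get("response", {}).get("docs", [])
    return [(doc.get("url", ""), doc.get("title", "")) for doc in docs]


def query_solr(solr_base: str, query: str) -> list[tuple[str, str]]:
    """Send the query and summarize the hits (needs a reachable Solr server)."""
    with urlopen(build_select_url(solr_base, query)) as resp:
        return summarize(json.load(resp))
```

For example, after you replace the placeholder host with a real one, `query_solr("http://xx.xx.xx.xx:8983/solr", "title:幻想")` should return the `(url, title)` pairs of the documents shown above.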

2. Screenshot of the results as displayed in Solr

bin/crawl urls  crawl  http://xx.xx.xx.xx:8983/solr  5
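In Nutch 1.9, `bin/crawl` takes four positional arguments:

- the seed directory (`urls`)
- the crawl directory (`crawl`)
- the Solr URL
- the number of rounds (`5`)

As the log below shows, each round runs generate, fetch, parse, CrawlDb update, LinkDb inversion, deduplication, indexing and cleaning. The Python sketch below assembles and runs the same command through `subprocess`. The Nutch install path (`/opt/nutch` in the test) and the helper names `crawl_command` and `run_crawl` are assumptions.

```python
import subprocess
from pathlib import Path


def crawl_command(nutch_home: str, seed_dir: str, crawl_dir: str,
                  solr_url: str, rounds: int) -> list[str]:
    """Build argv for bin/crawl: <seedDir> <crawlDir> <solrURL> <numberOfRounds>."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    script = Path(nutch_home) / "bin" / "crawl"
    return [str(script), seed_dir, crawl_dir, solr_url, str(rounds)]


def run_crawl(nutch_home: str, solr_url: str, rounds: int = 5) -> int:
    """Run the crawl and return the script's exit code."""
    argv = crawl_command(nutch_home, "urls", "crawl", solr_url, rounds)
    # Run from nutch_home so the relative urls/ and crawl/ directories
    # resolve the same way as in the original command line.
    return subprocess.run(argv, cwd=nutch_home).returncode
```

Calling `run_crawl("/opt/nutch", "http://xx.xx.xx.xx:8983/solr")` would be equivalent to running the original command from `/opt/nutch`.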

3. Nutch log output from the crawl:

<pre name="code" class="plain">2014-12-11 10:23:02,927 INFO  crawl.Injector - Injector: starting at 2014-12-11 10:23:02
2014-12-11 10:23:02,928 INFO  crawl.Injector - Injector: crawlDb: crawl/crawldb
2014-12-11 10:23:02,928 INFO  crawl.Injector - Injector: urlDir: urls
2014-12-11 10:23:02,929 INFO  crawl.Injector - Injector: Converting injected urls to crawl db entries.
2014-12-11 10:23:03,210 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:23:03,266 WARN  snappy.LoadSnappy - Snappy native library not loaded
2014-12-11 10:23:03,748 INFO  regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
2014-12-11 10:23:04,496 INFO  crawl.Injector - Injector: Total number of urls rejected by filters: 0
2014-12-11 10:23:04,496 INFO  crawl.Injector - Injector: Total number of urls after normalization: 1
2014-12-11 10:23:04,496 INFO  crawl.Injector - Injector: Merging injected urls into crawl db.
2014-12-11 10:23:04,779 INFO  crawl.Injector - Injector: overwrite: false
2014-12-11 10:23:04,779 INFO  crawl.Injector - Injector: update: false
2014-12-11 10:23:05,606 INFO  crawl.Injector - Injector: URLs merged: 1
2014-12-11 10:23:05,611 INFO  crawl.Injector - Injector: Total new urls injected: 0
2014-12-11 10:23:05,612 INFO  crawl.Injector - Injector: finished at 2014-12-11 10:23:05, elapsed: 00:00:02
2014-12-11 10:23:06,551 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:23:06,552 INFO  crawl.Generator - Generator: starting at 2014-12-11 10:23:06
2014-12-11 10:23:06,552 INFO  crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
2014-12-11 10:23:06,552 INFO  crawl.Generator - Generator: filtering: false
2014-12-11 10:23:06,552 INFO  crawl.Generator - Generator: normalizing: true
2014-12-11 10:23:06,552 INFO  crawl.Generator - Generator: topN: 50000
2014-12-11 10:23:07,201 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:23:07,202 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:23:07,202 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:23:07,211 INFO  regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:23:07,267 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:23:07,267 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:23:07,267 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:23:07,272 INFO  regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
2014-12-11 10:23:07,875 INFO  crawl.Generator - Generator: Partitioning selected urls for politeness.
2014-12-11 10:23:08,875 INFO  crawl.Generator - Generator: segment: crawl/segments/20141211102308
2014-12-11 10:23:09,051 INFO  regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:23:09,993 INFO  crawl.Generator - Generator: finished at 2014-12-11 10:23:09, elapsed: 00:00:03
2014-12-11 10:23:10,681 INFO  fetcher.Fetcher - Fetcher: starting at 2014-12-11 10:23:10
2014-12-11 10:23:10,681 INFO  fetcher.Fetcher - Fetcher: segment: crawl/segments/20141211102308
2014-12-11 10:23:10,681 INFO  fetcher.Fetcher - Fetcher Timelimit set for : 1418275390681
2014-12-11 10:23:10,956 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:23:11,415 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:23:11,415 INFO  fetcher.Fetcher - Fetcher: threads: 50
2014-12-11 10:23:11,415 INFO  fetcher.Fetcher - Fetcher: time-out divisor: 2
2014-12-11 10:23:11,435 INFO  fetcher.Fetcher - QueueFeeder finished: total 18 records + hit by time limit :0
2014-12-11 10:23:11,585 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:23:11,586 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:23:11,586 INFO  fetcher.Fetcher - fetching http://v.zqgame.com/moviePlay/goMoviePlay/5/001 (queue crawl delay=5000ms)
2014-12-11 10:23:11,587 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:23:11,597 INFO  http.Http - http.proxy.host = null
2014-12-11 10:23:11,597 INFO  http.Http - http.proxy.port = 8080
2014-12-11 10:23:11,597 INFO  http.Http - http.timeout = 10000
2014-12-11 10:23:11,597 INFO  http.Http - http.content.limit = 65536
2014-12-11 10:23:11,597 INFO  http.Http - http.agent = My Nutch Spider/Nutch-1.9
2014-12-11 10:23:11,597 INFO  http.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
2014-12-11 10:23:11,597 INFO  http.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
2014-12-11 10:23:11,597 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:23:11,620 INFO  fetcher.Fetcher - Fetcher: throughput threshold: -1
2014-12-11 10:23:11,620 INFO  fetcher.Fetcher - Fetcher: throughput threshold retries: 5
2014-12-11 10:23:11,620 INFO  fetcher.Fetcher - fetcher.maxNum.threads can't be < than 50 : using 50 instead
2014-12-11 10:23:12,622 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17, fetchQueues.getQueueCount=1
2014-12-11 10:23:13,622 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17, fetchQueues.getQueueCount=1
2014-12-11 10:23:14,623 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17, fetchQueues.getQueueCount=1
2014-12-11 10:23:15,623 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17, fetchQueues.getQueueCount=1
2014-12-11 10:23:16,624 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17, fetchQueues.getQueueCount=1
2014-12-11 10:23:16,891 INFO  fetcher.Fetcher - fetching http://v.zqgame.com/moviePlay/goMoviePlay/3/3 (queue crawl delay=5000ms)
2014-12-11 10:23:17,624 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16, fetchQueues.getQueueCount=1
2014-12-11 10:23:18,625 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16, fetchQueues.getQueueCount=1
2014-12-11 10:23:19,625 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16, fetchQueues.getQueueCount=1
2014-12-11 10:23:20,626 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16, fetchQueues.getQueueCount=1
2014-12-11 10:23:21,626 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16, fetchQueues.getQueueCount=1
2014-12-11 10:23:21,935 INFO  fetcher.Fetcher - fetching http://v.zqgame.com/view/index (queue crawl delay=5000ms)
2014-12-11 10:23:22,627 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15, fetchQueues.getQueueCount=1
2014-12-11 10:23:23,627 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15, fetchQueues.getQueueCount=1
2014-12-11 10:23:24,627 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15, fetchQueues.getQueueCount=1
2014-12-11 10:23:25,628 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15, fetchQueues.getQueueCount=1
2014-12-11 10:27:15,997 INFO  fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:27:15,997 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=0
2014-12-11 10:27:16,004 INFO  fetcher.Fetcher - -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=0
2014-12-11 10:27:16,005 INFO  fetcher.Fetcher - -activeThreads=0
2014-12-11 10:27:16,629 INFO  fetcher.Fetcher - Fetcher: finished at 2014-12-11 10:27:16, elapsed: 00:00:07
2014-12-11 10:27:17,320 INFO  parse.ParseSegment - ParseSegment: starting at 2014-12-11 10:27:17
2014-12-11 10:27:17,320 INFO  parse.ParseSegment - ParseSegment: segment: crawl/segments/20141211102707
2014-12-11 10:27:17,591 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:27:18,518 INFO  crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
2014-12-11 10:27:18,528 INFO  parse.ParseSegment - Parsed (12ms):http://v.zqgame.com/indexmain
2014-12-11 10:27:18,571 INFO  parse.ParseSegment - Parsed (1ms):http://v.zqgame.com/moviePlay/goMoviePlay/4/4
2014-12-11 10:27:18,659 INFO  regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default
2014-12-11 10:27:18,871 INFO  parse.ParseSegment - ParseSegment: finished at 2014-12-11 10:27:18, elapsed: 00:00:01
2014-12-11 10:27:19,794 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:27:19,810 INFO  crawl.CrawlDb - CrawlDb update: starting at 2014-12-11 10:27:19
2014-12-11 10:27:19,810 INFO  crawl.CrawlDb - CrawlDb update: db: crawl/crawldb
2014-12-11 10:27:19,811 INFO  crawl.CrawlDb - CrawlDb update: segments: [crawl/segments/20141211102707]
2014-12-11 10:27:19,811 INFO  crawl.CrawlDb - CrawlDb update: additions allowed: true
2014-12-11 10:27:19,811 INFO  crawl.CrawlDb - CrawlDb update: URL normalizing: false
2014-12-11 10:27:19,811 INFO  crawl.CrawlDb - CrawlDb update: URL filtering: false
2014-12-11 10:27:19,811 INFO  crawl.CrawlDb - CrawlDb update: 404 purging: false
2014-12-11 10:27:19,812 INFO  crawl.CrawlDb - CrawlDb update: Merging segment data into db.
2014-12-11 10:27:20,639 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:27:20,639 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:27:20,639 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:27:21,120 INFO  crawl.CrawlDb - CrawlDb update: finished at 2014-12-11 10:27:21, elapsed: 00:00:01
2014-12-11 10:27:22,066 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:27:22,067 INFO  crawl.LinkDb - LinkDb: starting at 2014-12-11 10:27:22
2014-12-11 10:27:22,067 INFO  crawl.LinkDb - LinkDb: linkdb: crawl/linkdb
2014-12-11 10:27:22,067 INFO  crawl.LinkDb - LinkDb: URL normalize: true
2014-12-11 10:27:22,067 INFO  crawl.LinkDb - LinkDb: URL filter: true
2014-12-11 10:27:22,068 INFO  crawl.LinkDb - LinkDb: internal links will be ignored.
2014-12-11 10:27:22,068 INFO  crawl.LinkDb - LinkDb: adding segment: crawl/segments/20141211102707
2014-12-11 10:27:23,376 INFO  crawl.LinkDb - LinkDb: merging with existing linkdb: crawl/linkdb
2014-12-11 10:27:23,688 INFO  regex.RegexURLNormalizer - can't find rules for scope 'linkdb', using default
2014-12-11 10:27:24,510 INFO  crawl.LinkDb - LinkDb: finished at 2014-12-11 10:27:24, elapsed: 00:00:02
2014-12-11 10:27:25,209 INFO  crawl.DeduplicationJob - DeduplicationJob: starting at 2014-12-11 10:27:25
2014-12-11 10:27:25,483 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:27:26,760 INFO  crawl.DeduplicationJob - Deduplication: 2 documents marked as duplicates
2014-12-11 10:27:26,760 INFO  crawl.DeduplicationJob - Deduplication: Updating status of duplicate urls into crawl db.
2014-12-11 10:27:27,931 INFO  crawl.DeduplicationJob - Deduplication finished at 2014-12-11 10:27:27, elapsed: 00:00:02
2014-12-11 10:27:28,623 INFO  indexer.IndexingJob - Indexer: starting at 2014-12-11 10:27:28
2014-12-11 10:27:28,711 INFO  indexer.IndexingJob - Indexer: deleting gone documents: false
2014-12-11 10:27:28,711 INFO  indexer.IndexingJob - Indexer: URL filtering: false
2014-12-11 10:27:28,718 INFO  indexer.IndexingJob - Indexer: URL normalizing: false
2014-12-11 10:27:28,933 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2014-12-11 10:27:28,933 INFO  indexer.IndexingJob - Active IndexWriters :
SOLRIndexWriter
    solr.server.url : URL of the SOLR instance (mandatory)
    solr.commit.size : buffer size when sending to SOLR (default 1000)
    solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
    solr.auth : use authentication (default false)
    solr.auth.username : use authentication (default false)
    solr.auth : username for authentication
    solr.auth.password : password for authentication


2014-12-11 10:27:28,937 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
2014-12-11 10:27:28,937 INFO  indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb
2014-12-11 10:27:28,937 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20141211102707
2014-12-11 10:27:29,087 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:27:29,585 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
2014-12-11 10:27:29,995 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2014-12-11 10:27:30,022 INFO  solr.SolrMappingReader - source: content dest: content
2014-12-11 10:27:30,023 INFO  solr.SolrMappingReader - source: title dest: title
2014-12-11 10:27:30,023 INFO  solr.SolrMappingReader - source: host dest: host
2014-12-11 10:27:30,023 INFO  solr.SolrMappingReader - source: segment dest: segment
2014-12-11 10:27:30,023 INFO  solr.SolrMappingReader - source: boost dest: boost
2014-12-11 10:27:30,023 INFO  solr.SolrMappingReader - source: digest dest: digest
2014-12-11 10:27:30,023 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
2014-12-11 10:27:30,054 INFO  solr.SolrIndexWriter - Indexing 2 documents
2014-12-11 10:27:30,175 INFO  solr.SolrIndexWriter - Indexing 2 documents
2014-12-11 10:39:34,707 INFO  crawl.Injector - Injector: starting at 2014-12-11 10:39:34
2014-12-11 10:39:34,707 INFO  crawl.Injector - Injector: crawlDb: crawl/crawldb
2014-12-11 10:39:34,707 INFO  crawl.Injector - Injector: urlDir: urls
2014-12-11 10:39:34,708 INFO  crawl.Injector - Injector: Converting injected urls to crawl db entries.
2014-12-11 10:39:34,989 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:39:35,046 WARN  snappy.LoadSnappy - Snappy native library not loaded
2014-12-11 10:39:35,528 INFO  regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
2014-12-11 10:39:36,273 INFO  crawl.Injector - Injector: Total number of urls rejected by filters: 0
2014-12-11 10:39:36,273 INFO  crawl.Injector - Injector: Total number of urls after normalization: 1
2014-12-11 10:39:36,273 INFO  crawl.Injector - Injector: Merging injected urls into crawl db.
2014-12-11 10:39:36,577 INFO  crawl.Injector - Injector: overwrite: false
2014-12-11 10:39:36,577 INFO  crawl.Injector - Injector: update: false
2014-12-11 10:39:37,387 INFO  crawl.Injector - Injector: URLs merged: 1
2014-12-11 10:39:37,392 INFO  crawl.Injector - Injector: Total new urls injected: 0
2014-12-11 10:39:37,392 INFO  crawl.Injector - Injector: finished at 2014-12-11 10:39:37, elapsed: 00:00:02
2014-12-11 10:39:38,327 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:39:38,328 INFO  crawl.Generator - Generator: starting at 2014-12-11 10:39:38
2014-12-11 10:39:38,328 INFO  crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
2014-12-11 10:39:38,328 INFO  crawl.Generator - Generator: filtering: false
2014-12-11 10:39:38,328 INFO  crawl.Generator - Generator: normalizing: true
2014-12-11 10:39:38,328 INFO  crawl.Generator - Generator: topN: 50000
2014-12-11 10:39:38,978 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:39:38,978 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:39:38,978 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:39:38,987 INFO  regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:39:39,040 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:39:39,040 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:39:39,040 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:39:39,045 INFO  regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
2014-12-11 10:39:39,649 INFO  crawl.Generator - Generator: Partitioning selected urls for politeness.
2014-12-11 10:39:40,649 INFO  crawl.Generator - Generator: segment: crawl/segments/20141211103940
2014-12-11 10:39:40,814 INFO  regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:39:41,755 INFO  crawl.Generator - Generator: finished at 2014-12-11 10:39:41, elapsed: 00:00:03
2014-12-11 10:39:42,447 INFO  fetcher.Fetcher - Fetcher: starting at 2014-12-11 10:39:42
2014-12-11 10:39:42,447 INFO  fetcher.Fetcher - Fetcher: segment: crawl/segments/20141211103940
2014-12-11 10:39:42,447 INFO  fetcher.Fetcher - Fetcher Timelimit set for : 1418276382447
2014-12-11 10:39:42,720 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:39:43,171 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,171 INFO  fetcher.Fetcher - Fetcher: threads: 50
2014-12-11 10:39:43,171 INFO  fetcher.Fetcher - Fetcher: time-out divisor: 2
2014-12-11 10:39:43,182 INFO  fetcher.Fetcher - QueueFeeder finished: total 1 records + hit by time limit :0
2014-12-11 10:39:43,336 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,337 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,337 INFO  fetcher.Fetcher - fetching http://passport.zqgame.com/common/agreement.jsp (queue crawl delay=5000ms)
2014-12-11 10:39:43,337 INFO  fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,337 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,337 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,338 INFO  fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,338 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,338 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,338 INFO  fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,338 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,338 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,339 INFO  fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,339 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,339 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,340 INFO  fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,340 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,340 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,340 INFO  fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,340 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,340 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,341 INFO  fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:57,352 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
2014-12-11 10:39:57,352 INFO  indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb
2014-12-11 10:39:57,353 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20141211103940
2014-12-11 10:39:57,501 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:39:57,970 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
2014-12-11 10:39:58,376 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2014-12-11 10:39:58,403 INFO  solr.SolrMappingReader - source: content dest: content
2014-12-11 10:39:58,403 INFO  solr.SolrMappingReader - source: title dest: title
2014-12-11 10:39:58,403 INFO  solr.SolrMappingReader - source: host dest: host
2014-12-11 10:39:58,403 INFO  solr.SolrMappingReader - source: segment dest: segment
2014-12-11 10:39:58,403 INFO  solr.SolrMappingReader - source: boost dest: boost
2014-12-11 10:39:58,403 INFO  solr.SolrMappingReader - source: digest dest: digest
2014-12-11 10:39:58,403 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
2014-12-11 10:39:58,434 INFO  solr.SolrIndexWriter - Indexing 1 documents
2014-12-11 10:39:59,776 INFO  solr.SolrMappingReader - source: content dest: content
2014-12-11 10:39:59,776 INFO  solr.SolrMappingReader - source: title dest: title
2014-12-11 10:39:59,776 INFO  solr.SolrMappingReader - source: host dest: host
2014-12-11 10:39:59,776 INFO  solr.SolrMappingReader - source: segment dest: segment
2014-12-11 10:39:59,776 INFO  solr.SolrMappingReader - source: boost dest: boost
2014-12-11 10:39:59,776 INFO  solr.SolrMappingReader - source: digest dest: digest
2014-12-11 10:39:59,776 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
2014-12-11 10:40:00,130 INFO  indexer.IndexingJob - Indexer: finished at 2014-12-11 10:40:00, elapsed: 00:00:03
2014-12-11 10:40:00,830 INFO  indexer.CleaningJob - CleaningJob: starting at 2014-12-11 10:40:00
2014-12-11 10:40:01,101 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:40:01,748 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2014-12-11 10:40:01,775 INFO  solr.SolrMappingReader - source: content dest: content
2014-12-11 10:40:01,775 INFO  solr.SolrMappingReader - source: title dest: title
2014-12-11 10:40:01,775 INFO  solr.SolrMappingReader - source: host dest: host
2014-12-11 10:40:01,775 INFO  solr.SolrMappingReader - source: segment dest: segment
2014-12-11 10:40:01,775 INFO  solr.SolrMappingReader - source: boost dest: boost
2014-12-11 10:40:01,775 INFO  solr.SolrMappingReader - source: digest dest: digest
2014-12-11 10:40:01,775 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
2014-12-11 10:40:01,963 INFO  indexer.CleaningJob - CleaningJob: deleted a total of 10 documents
2014-12-11 10:40:01,967 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
2014-12-11 10:40:02,382 INFO  indexer.CleaningJob - CleaningJob: finished at 2014-12-11 10:40:02, elapsed: 00:00:01
2014-12-11 10:40:03,313 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:40:03,314 INFO  crawl.Generator - Generator: starting at 2014-12-11 10:40:03
2014-12-11 10:40:03,314 INFO  crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
2014-12-11 10:40:03,314 INFO  crawl.Generator - Generator: filtering: false
2014-12-11 10:40:03,314 INFO  crawl.Generator - Generator: normalizing: true
2014-12-11 10:40:03,315 INFO  crawl.Generator - Generator: topN: 50000
2014-12-11 10:40:03,963 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:40:03,964 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:40:03,964 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:40:03,972 INFO  regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:40:04,062 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:40:04,062 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:40:04,062 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:40:04,067 INFO  regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
2014-12-11 10:40:04,635 INFO  crawl.Generator - Generator: Partitioning selected urls for politeness.
2014-12-11 10:40:05,636 INFO  crawl.Generator - Generator: segment: crawl/segments/20141211104005
2014-12-11 10:40:05,803 INFO  regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:40:06,747 INFO  crawl.Generator - Generator: finished at 2014-12-11 10:40:06, elapsed: 00:00:03
2014-12-11 10:40:07,435 INFO  fetcher.Fetcher - Fetcher: starting at 2014-12-11 10:40:07
2014-12-11 10:40:07,435 INFO  fetcher.Fetcher - Fetcher: segment: crawl/segments/20141211104005
2014-12-11 10:40:07,435 INFO  fetcher.Fetcher - Fetcher Timelimit set for : 1418276407435
2014-12-11 10:40:07,707 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:40:08,157 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:40:08,158 INFO  fetcher.Fetcher - Fetcher: threads: 50
2014-12-11 10:40:08,158 INFO  fetcher.Fetcher - Fetcher: time-out divisor: 2
2014-12-11 10:40:08,187 INFO  fetcher.Fetcher - QueueFeeder finished: total 40 records + hit by time limit :0
2014-12-11 10:40:08,326 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:40:08,327 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:40:08,327 INFO  fetcher.Fetcher - fetching http://hxjh.zqgame.com/ (queue crawl delay=5000ms)
2014-12-11 10:40:08,328 INFO  fetcher.Fetcher - fetching http://lt.zqgame.com/ (queue crawl delay=5000ms)
2014-12-11 10:40:08,328 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:40:08,328 INFO  fetcher.Fetcher - fetching http://zscq.zqgame.com/ (queue crawl delay=5000ms)
2014-12-11 10:40:08,328 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:40:08,329 INFO  fetcher.Fetcher - fetching http://lj2.zqgame.com/ (queue crawl delay=5000ms)
tstamp3531 2014-12-11 10:40:00,130 INFO  indexer.IndexingJob - Indexer: finished at 2014-12-11 10:40:00, elapsed: 00:00:033532 2014-12-11 10:40:00,830 INFO  indexer.CleaningJob - CleaningJob: starting at 2014-12-11 10:40:003533 2014-12-11 10:40:01,101 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable3534 2014-12-11 10:40:01,748 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter3535 2014-12-11 10:40:01,775 INFO  solr.SolrMappingReader - source: content dest: content
14550 2014-12-11 10:59:29,551 INFO  fetcher.Fetcher - fetching http://pay.zqgame.com/pay/toPayPage/dxpc/107 (queue crawl delay=5000ms)
14551 2014-12-11 10:59:29,703 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=49, fetchQueues.getQueueCount=1
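
The `defaultInterval` and `maxInterval` values printed by `crawl.AbstractFetchSchedule` are in seconds; in Nutch they come from the `db.fetch.interval.default` and `db.fetch.interval.max` properties. The short Python sketch below only converts the two logged numbers to days. It shows that a fetched page becomes due for re-fetching after 30 days by default, and that no page waits longer than 90 days before it is retried.

```python
# Values copied from the crawl.AbstractFetchSchedule lines in the log above.
DEFAULT_INTERVAL_S = 2592000  # defaultInterval, in seconds
MAX_INTERVAL_S = 7776000      # maxInterval, in seconds
SECONDS_PER_DAY = 24 * 60 * 60

def seconds_to_days(seconds):
    """Convert a fetch-schedule interval from seconds to days."""
    return seconds / SECONDS_PER_DAY

default_days = seconds_to_days(DEFAULT_INTERVAL_S)
max_days = seconds_to_days(MAX_INTERVAL_S)
print(f"defaultInterval = {default_days:g} days")  # defaultInterval = 30 days
print(f"maxInterval = {max_days:g} days")          # maxInterval = 90 days
```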
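
The `Fetcher Timelimit set for` line does not print a duration. It prints an absolute deadline in epoch milliseconds. The sketch below converts that deadline, assuming the log's wall-clock times are UTC+8. This assumption matches the Solr query result earlier in this post, where `tstamp` values such as `2014-12-11T02:40:14.477Z` sit eight hours behind the corresponding log times. Under that assumption, the deadline is exactly 180 minutes after the fetcher started. That is consistent with a `fetcher.timelimit.mins` of 180, which appears to be the value the Nutch 1.x `bin/crawl` script passes by default; check your own copy of the script to confirm.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the log's wall-clock times are UTC+8.
CST = timezone(timedelta(hours=8))
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# "Fetcher Timelimit set for : 1418276407435" -> epoch milliseconds.
timelimit_ms = 1418276407435
# Integer timedelta arithmetic avoids float rounding of the milliseconds.
timelimit = (EPOCH + timedelta(milliseconds=timelimit_ms)).astimezone(CST)

# The Timelimit line and "Fetcher: starting at ..." share the prefix 10:40:07,435.
fetch_start = datetime(2014, 12, 11, 10, 40, 7, 435000, tzinfo=CST)

budget_minutes = (timelimit - fetch_start).total_seconds() / 60
print(timelimit.isoformat())  # 2014-12-11T13:40:07.435000+08:00
print(budget_minutes)         # 180.0
```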
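
The final status line shows the fetcher's politeness rules at work. With `queue mode : byHost`, URLs are grouped into one queue per host. Each queue is served by one thread at a time, assuming the default `fetcher.threads.per.queue` of 1, and consecutive requests to a queue are separated by the logged 5000 ms crawl delay. Once a single queue holding 49 URLs is all that remains, one thread fetches while the other 49 spin-wait. Draining that queue then takes at least about 49 x 5 s = 245 s, roughly four minutes, before any download time is counted. The sketch below is a deliberately simplified model of this arithmetic, not Nutch's own queue implementation. The helper functions are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

def queue_sizes_by_host(urls):
    """Simplified 'queue mode : byHost': one queue per host name, value = URLs queued."""
    return Counter(urlsplit(u).hostname for u in urls)

def idle_threads(total_threads, queue_count, threads_per_queue=1):
    """Threads left spin-waiting because every queue already has its allowed fetcher(s)."""
    busy = min(total_threads, queue_count * threads_per_queue)
    return total_threads - busy

def min_drain_seconds(queue_sizes, crawl_delay_s):
    """Lower bound on the time to empty the queues, ignoring download time.

    Assumes every queued URL waits one crawl delay after the previous request
    to the same host, and that different queues drain in parallel.
    """
    return max(queue_sizes, default=0) * crawl_delay_s

# The four URLs fetched at 10:40:08 are on different hosts, so they land in four
# independent queues; that is why they could start within the same few milliseconds.
first_urls = [
    "http://hxjh.zqgame.com/",
    "http://lt.zqgame.com/",
    "http://zscq.zqgame.com/",
    "http://lj2.zqgame.com/",
]
print(len(queue_sizes_by_host(first_urls)))  # 4

# Final status line: activeThreads=50, fetchQueues.getQueueCount=1,
# fetchQueues.totalSize=49, queue crawl delay=5000ms.
print(idle_threads(total_threads=50, queue_count=1))  # 49, matching spinWaiting=49
print(min_drain_seconds([49], crawl_delay_s=5.0))     # 245.0 seconds
```

The model also suggests why one slow host can stretch the tail of a fetch round. Extra threads do not help once every remaining URL belongs to the same queue.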
