Nutch 1.9 and Solr 4.5 integration: output
1. Query the results crawled by Nutch through Solr
{"responseHeader": {"status": 0,"QTime": 2,"params": {"indent": "true","q": "title:幻想","_": "1418266706916","wt": "json"}},"response": {"numFound": 7,"start": 0,"docs": [{"content": "幻想江湖-2.2资料片,巅峰对决,震撼来袭! 跳转官网 装备凝练 巅峰擂台 万圣之夜 新版时装","id": "http://hxjh.zqgame.com/","title": "幻想江湖-2.2资料片,巅峰对决,震撼来袭!","segment": "20141211104005","boost": 0,"digest": "c61521c1861b1a7574c8920fd27d0155","tstamp": "2014-12-11T02:40:14.477Z","url": "http://hxjh.zqgame.com/","anchor": ["幻想江湖","幻想江湖"],"_version_": 1487159323035435000},{"content": "幻想江湖-鬼灵精怪万圣节 开启时间 : 10 月 30 日 万圣节礼包领取> 万圣节前夕,为了避免恶灵干扰,大侠们纷纷挂起了南瓜灯,驱逐鬼怪。江湖有一传闻,一群糖果商人行经龙脉岭时,因为身上的糖果、饼干、宝石而找来鬼魂附身,如果帮助他们驱逐了附身邪灵,将会获得他们道谢的礼物哦~! 1 万圣节天天有礼 2 练级打宝两不误 3 节日消费奖励翻倍 4 奖励兑换惊喜不断 5 洗炼折扣大放送 温馨提示: 活动期间,大侠们请每天提着南瓜灯,穿上蝙蝠衫,去龙脉去收集糖果饼干,要不停地说:“trick or treat.”(意思是给不给,不给就捣蛋)。要是不肯给的话,就用各种方法去惩罚他,例如:一招一个怪,“唰唰唰————”把龙脉挂个三小时! 关闭 恭喜你获得幻想江湖万圣节礼包! IOS用户领取: 安卓和越狱用户领取: 有效时间: 即日-2014.11.30 兑换次数: 只限兑换一次 兑换范围: 全服 礼包使用方法: 登录游戏后,点击游戏右上方【领奖】-【福利】-【礼包】后输入正确礼品卡号领取礼包奖励!","id": "http://hd.zqgame.com/zqgame/hxjh/dfdj/active_03.html","title": "幻想江湖-鬼灵精怪万圣节","segment": "20141211104057","boost": 0,"digest": "5ae39251ad06017e4e1854aae9129126","tstamp": "2014-12-11T02:41:37.669Z","url": "http://hd.zqgame.com/zqgame/hxjh/dfdj/active_03.html","anchor": ["万圣之夜"],"_version_": 1487159633802952700},{"content": "幻想江湖-优雅转身华丽时装首曝 夜魔游龙 西式时装 全新时装新品上架啦,这批时装看上去是不是和以前大有不同呢,此次大胆革新,看到下面的时装,不禁令人想到后面可能真的会有结婚系统咯,新版本新换装,不走平凡路~我们就是这样的与众不同! 进入官网 返回活动首页","id": "http://hd.zqgame.com/zqgame/hxjh/dfdj/active_04.html","title": "幻想江湖-优雅转身华丽时装首曝","segment": "20141211104057","boost": 0,"digest": "e086540bf0f721f39560440c85d2161f","tstamp": "2014-12-11T02:41:47.879Z","url": "http://hd.zqgame.com/zqgame/hxjh/dfdj/active_04.html","anchor": ["新版时装"],"_version_": 1487159633805049900},{"content": "《幻想江湖》官网-首部超萌动作武侠片!今天开始,做武侠片主人公 首页 新闻中心 游戏资料 游戏论坛 分享到: 安卓下载 ios越狱下载 ios正版下载 礼包领取 1 2 3 4 幻想江湖绝尚发布会精彩视频 最新 新闻 公告 活动 《幻想江湖》IOS18区“美人天下”12月10日火爆开启 2014史上最萌武侠手游来袭!不用吃药,放弃治疗,12月10日上午11:00新区“美人天下”火爆开启!快来没日没夜一起萌萌哒!... 
查看详情 > 2014-12-10 • [新闻] 菜鸟进阶强力党 《幻想江湖》装备属性轻松堆 2014-12-10 • [新闻] 全新资料片即将来袭《幻想江湖》四大活动任你玩 2014-12-09 • [活动] 双12 玩幻想送福利 2014-12-09 • [新闻] 细节决定成败 《幻想江湖》人物属性全掌握 2014-12-09 • [活动] 《幻想江湖》IOS18区”美人天下”十六大活动 2014-12-08 • [新闻] 刀尖上的武侠 挑战《幻想江湖》秦陵副本 2014-12-10 • [新闻] 菜鸟进阶强力党 《幻想江湖》装备属性轻松堆 2014-12-10 • [新闻] 全新资料片即将来袭《幻想江湖》四大活动任你玩 2014-12-09 • [新闻] 细节决定成败 《幻想江湖》人物属性全掌握 2014-12-08 • [新闻] 刀尖上的武侠 挑战《幻想江湖》秦陵副本 2014-12-08 • [新闻] 新版“姑姑”遭吐槽 《幻想江湖》还你女神梦 2014-12-05 • [新闻] 《幻想江湖》我们结婚吧!——订婚篇 2014-12-03 • [公告] 幻想江湖-公测9~14区 数据互通公告 2014-12-01 • [公告] 《幻想江湖》12月2日临时维护公告 2014-11-26 • [公告] 《幻想江湖》2.4版本更新 2014-11-25 • [公告] 幻想江湖临时维护公告 2014-11-25 • [公告] 《幻想江湖》appstore1~8服数据互通完毕 2014-11-25 • [公告] 幻想江湖-appstore数据互通延长公告 2014-12-09 • [活动] 双12 玩幻想送福利 2014-12-09 • [活动] 《幻想江湖》IOS18区”美人天下”十六大活动 2014-12-08 • [活动] 周末齐消费 欢乐享不停 2014-12-08 • [活动] 《幻想江湖》美女主播齐聚乐———回顾 2014-12-08 • [活动] 《幻想江湖》25区”独步江湖”十六大活动 2014-12-05 • [活动] 《幻想江湖》玩家体验指南——做好产品,专注体验 联系人:施若熙 联系QQ:744415486 手机:13510624817 邮箱:ruoxi.shi@zqgame.com 联系人:方彦琼 联系QQ:611535985 手机:13603061895 邮箱:yanqiong.fang@zqgame.com 玩家群② 264103428 企业客服QQ:800056019 客服热线:0755-86160520 特色玩法 玩家攻略 职业介绍 明教 唐门 天山 逍遥 18183 766 91手游网 合作媒体 ———————————————————— 微信公众号 新浪微博 腾讯微博 扫描二维码下载 快速注册 通行证: 密 码: 确认密码: 验证码: 立即注册 用户名 恭喜你已经注册成功! 关闭 恭喜您获得幻想江湖公测新手礼包! 你的礼包卡号是: 礼包使用方法: 登陆游戏后,点击游戏右上方【领奖】-【福利】-【礼包】后输入8位的礼包卡号领取礼包奖励!内容包含:止血丹*2、白色强化石*20、成长丹*5、易功丹*10、进阶丹*5。 关闭 微信公众号","id": "http://hxjh.zqgame.com/index.html","title": "《幻想江湖》官网-首部超萌动作武侠片!今天开始,做武侠片主人公","segment": "20141211104057","boost": 0,"digest": "3f9a2060e12f95316ee0201ce8a21da0","tstamp": "2014-12-11T02:41:01.462Z","url": "http://hxjh.zqgame.com/index.html","anchor": ["进入官网"],"_version_": 1487159633828118500},{"content": "【仙幻奇缘】官网 12.6首次开放公测!无商城,真正免费! 进入官网 论坛中心 游戏下载 购卡充值 1 2 3 4 5 媒体友链 通行证账号: 通行证密码: 确认密码: 验证码: 同意 《中青宝》协议 恭喜你!注册成功! 用户名是: 客户端 立即下载 获取特权礼包 版权所有:深圳中青宝互动网络股份有限公司 客服传真:0755-86368269 中华人民共和国增值电信业务经营许可证:粤B2-20030216 粤ICP备:09057836 网络文化经营许可证:文网文[2008]088号 中华人民共和国互联网出版许可证:新出网证(粤)字017号 每个IP只能参加一次抽奖, 谢谢您的参与! 
","id": "http://xh.zqgame.com/","title": "【仙幻奇缘】官网 12.6首次开放公测!无商城,真正免费!","segment": "20141211104057","boost": 0,"digest": "471def081683b7c5f94a39382e4c00a1","tstamp": "2014-12-11T02:41:02.165Z","url": "http://xh.zqgame.com/","anchor": ["仙幻奇缘","仙幻奇缘"],"_version_": 1487159634570510300},{"content": "《诸神世界》官方网站—3D魔幻战争网游 诸神世界 首页 新闻动态 游戏资料 下载微端 快速充值 官方论坛 下载微端 快速充值 VIP介绍 领取新手卡 选择大区 请选择服务器 风暴荒漠 战争血径 无尽沙海 燃烧平原 双线1-16服 领取中,请稍候…… 您的礼包号为: 更多服务器 《诸神世界》是一款MMORPG的3D国战网页游戏,采用魔幻风格,3D旋转俯瞰视角,以国家战争、团队冒险等玩法为特色,以大范围多维度强PVP玩法为核心的超激情游戏,体验游戏国战pk激情就来诸神世界。 0755-26635899 客服邮箱:kefu@zqgame.com 客服传真:0755-86368269 游戏QQ群:219759659 259942575 用户名: * 以字母开头由大小写字母、数字、下划线组成,长度为4-32位 密码: * 6-20字母、数字、符号组成,不含空格键、「\"」及「'」 确认密码: * 请再一次输入密码 1 2 3 4 最新 新闻 活动 公告 攻略 诸神世界混服部分区服数据互通公告 公告 06-04 诸神世界混服部分区服数据互通公告 公告 05-23 5月29日12点诸神新区-风暴荒漠火爆开启 新闻 05-14 5月15日12点诸神新区-亡魂峡谷火爆开启 公告 04-18 诸神世界混服合服活动精彩上线 公告 04-18 诸神世界混服部分区服数据互通公告 【新闻】 05-14 5月15日12点诸神新区-亡魂峡谷火爆开启 【新闻】 04-16 4月17日12点诸神新区-呼啸沙漠火爆开启 【新闻】 03-24 3月27日12点诸神新区-巨龙之吼火爆开启 【新闻】 03-17 3月20日12点诸神新区-尘风峡谷火爆开启 【新闻】 03-11 3月13日12点诸神新区-耳语海岸火爆开启 【活动】 04-02 《诸神世界》十大开服活动 【活动】 02-13 《诸神世界》元宵&情人节活动 【活动】 01-26 《诸神世界》春节活动 【活动】 11-21 诸神世界周末限时活动火爆上线 【活动】 11-08 双十一《诸神世界》劲爆大酬宾 【公告】 06-04 诸神世界混服部分区服数据互通公告 【公告】 05-23 5月29日12点诸神新区-风暴荒漠火爆开启 【公告】 04-18 诸神世界混服合服活动精彩上线 【公告】 04-18 诸神世界混服部分区服数据互通公告 【公告】 03-25 3月28日平台网络升级公告 魔 牧 枪 炮 术 战 魔 刃 狩猎灵魂 攻击方式:近程魔法攻击 核心属性:智力 敏捷 职业特质:隐匿暗杀能力 职业说明:刀锋舞者,狩猎着生者的灵魂。隐没于黑暗,游走于光明。不被历史描述,却是历史的主宰! 点击查看详情 牧 师 神的宠儿 攻击方式:中程魔法攻击 核心属性:精神 智力 职业特质:恢复治愈能力 职业说明:神之使徒,捍卫生者,拯救死者。信者永生,不信者也救赎。虔诚的信徒,是神的宠儿! 点击查看详情 枪 手 一击必杀 攻击方式:远程物理攻击 核心属性:力量 精神 职业特质:伤害输出 职业说明:猎命王者,半边恶魔半边天使。沉着冷静,是他们的特质;一击必杀,是他们的实力! 点击查看详情 魔 炮 焚天怒焰 攻击方式:远程魔法攻击 核心属性:智力 精神 职业特质:群体伤害 职业说明:焚天烈焰,吞噬罪孽与苍生。沉稳步伐,吼出战歌嘹亮;怒放炮火,点亮生命奇迹! 点击查看详情 术 士 破碎虚空 攻击方式:中程魔法攻击 核心属性:智力 精神 职业特质:战斗节奏控制能力 职业说明:掌握法则,智慧象征。探索真理,识古通今,洞悉未来。以世间威能,抑恶扬善,改天逆命,破碎虚空! 点击查看详情 战 士 金刚不坏 攻击方式:近程物理攻击 核心属性:体质 力量 职业特质:生存能力 职业说明:移动城墙,金刚不坏。战,则掠地千里;守,则万夫莫开。英勇的灵魂铸造不灭传奇! 
点击查看详情 系统介绍 进阶指导 特色系统 活动玩法 结婚系统 | 职业介绍 | FAQ | VIP如何获得 | 坐骑强化 | 转职重修 | 战友系统 | 升级送祝福 | 日常任务 | 拍卖寄售 | 技能遗忘重生 | 道具商城 | 财产保护 炼金系统 | 星耀石 | 装备镶嵌 | 装备升阶 | 装备打孔 | 要塞守卫站 | 神器合成 | 宠物潜力修改 | 宝石摘除 斗气系统 | 羽翼系统 | 1V1模拟战 | 移民系统 | 击鼓传花 | 情缘任务 | 神圣血脉 | 军衔系统 | 钓鱼系统 | 称号系统 | 封印进度 | 离线经验 巴比伦塔 | 跨区国战 | 跨区巡游 | 跨区极速狂飙 | 跨区组队争夺战 | 超级血战到底 | 血战到底 | 小丑的梦境 | 王者试炼 | 探险者地宫 | 前线速递 | 骑魂谷 | 冒险岛 | 极速狂飙 | 毁灭神迹 | 国家正式战争 | 国家远征 | 国家情报 | 国家BOSS | 藏宝峡谷 游戏壁纸 游戏截图 玩家相册 MORE 265G百科 073专区 新浪爱问 抵制不良游戏 拒绝盗版游戏 注意自我保护 谨防上当受骗 适度游戏益脑 沉迷游戏伤身 合理安排时间 享受健康生活 增值电信许可证:粤B2-20120680 网络文化经营许可证: 粤网文[2014]0615-215号 粤ICP备09057836号 深圳市卓页互动网络科技有限公司 Copyright © 2012-2014 All Rights Reserved 本游戏适合18岁以上用户,不含暴力、恐怖、残酷、色情等妨害未成年人身心健康的内容,属于绿色健康产品 yy","id": "http://zs.ucjoy.com/","title": "《诸神世界》官方网站—3D魔幻战争网游","cache": "content","segment": "20141211104057","boost": 0,"digest": "8d00af8aaa03c2cf68a69dc68892b764","tstamp": "2014-12-11T02:41:18.686Z","url": "http://zs.ucjoy.com/","anchor": ["官网","诸神世界"],"_version_": 1487159634641813500},{"content": "《诸神世界》官方网站—3D魔幻战争网游 诸神世界 首页 新闻动态 游戏资料 下载微端 快速充值 官方论坛 下载微端 快速充值 VIP介绍 领取新手卡 选择大区 请选择服务器 风暴荒漠 战争血径 无尽沙海 燃烧平原 双线1-16服 领取中,请稍候…… 您的礼包号为: 更多服务器 《诸神世界》是一款MMORPG的3D国战网页游戏,采用魔幻风格,3D旋转俯瞰视角,以国家战争、团队冒险等玩法为特色,以大范围多维度强PVP玩法为核心的超激情游戏,体验游戏国战pk激情就来诸神世界。 0755-26635899 客服邮箱:kefu@zqgame.com 客服传真:0755-86368269 游戏QQ群:219759659 259942575 用户名: * 以字母开头由大小写字母、数字、下划线组成,长度为4-32位 密码: * 6-20字母、数字、符号组成,不含空格键、「\"」及「'」 确认密码: * 请再一次输入密码 您所在的位置: 首页 > 服务器列表 推荐服务器列表 风暴荒漠 火爆 战争血径 火爆 我的服务器列表 你还未进入过游戏,请先登录游戏! 
所有服务器 1-10 11-20 诸神混服 双线1-16服 火爆 风暴荒漠 火爆 战争血径 火爆 无尽沙海 火爆 燃烧平原 火爆 抵制不良游戏 拒绝盗版游戏 注意自我保护 谨防上当受骗 适度游戏益脑 沉迷游戏伤身 合理安排时间 享受健康生活 增值电信许可证:粤B2-20120680 网络文化经营许可证: 粤网文[2014]0615-215号 粤ICP备09057836号 深圳市卓页互动网络科技有限公司 Copyright © 2012-2014 All Rights Reserved 本游戏适合18岁以上用户,不含暴力、恐怖、残酷、色情等妨害未成年人身心健康的内容,属于绿色健康产品 yy","id": "http://zs.ucjoy.com/serverlist.app","title": "《诸神世界》官方网站—3D魔幻战争网游","cache": "content","segment": "20141211104057","boost": 0,"digest": "30a836aae5886924d1a87d3ab1ad42c8","tstamp": "2014-12-11T02:41:13.476Z","url": "http://zs.ucjoy.com/serverlist.app","anchor": ["进入新服","开始游戏"],"_version_": 1487159634643910700}]}
}
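The response above is what Solr returns for `q=title:幻想` with `wt=json`. As a minimal sketch, the same query could be issued from a script as shown below. The base URL passed in and the helper names (`build_select_url`, `summarize`, `search`) are assumptions made for illustration; only the `/select` handler, the `q`, `wt`, `indent`, and `rows` parameters, and the `response.numFound` / `response.docs` layout are taken from Solr itself.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(solr_base, query, rows=10):
    """Build a /select URL that asks Solr for JSON output."""
    params = {"q": query, "wt": "json", "indent": "true", "rows": rows}
    return solr_base.rstrip("/") + "/select?" + urlencode(params)


def summarize(response):
    """Return (numFound, [(id, title), ...]) from a parsed Solr response."""
    body = response["response"]
    docs = [(doc.get("id"), doc.get("title")) for doc in body["docs"]]
    return body["numFound"], docs


def search(solr_base, query, rows=10):
    """Run the query against a live Solr instance (needs network access)."""
    with urlopen(build_select_url(solr_base, query, rows)) as resp:
        return summarize(json.load(resp))
```

Calling `search("http://localhost:8983/solr", "title:幻想")` against a server holding the index above (the host name is hypothetical) would be expected to report the same `numFound` and the same `id`/`title` pairs as the raw JSON.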
2. Screenshot of the results displayed in Solr
bin/crawl urls crawl http://xx.xx.xx.xx:8983/solr 5
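The command above appears to use the positional form of the `bin/crawl` script in Nutch 1.9: seed directory, crawl directory, Solr URL, and number of rounds. The log agrees with that reading (`urlDir: urls`, `crawlDb: crawl/crawldb`). The sketch below only assembles the same command line from named values; `crawl_command` is a hypothetical helper, not part of Nutch.

```python
import shlex


def crawl_command(seed_dir, crawl_dir, solr_url, rounds):
    """Return the argv list for the bin/crawl invocation shown above.

    Assumes the positional order <seedDir> <crawlDir> <solrURL> <numberOfRounds>.
    """
    return ["bin/crawl", seed_dir, crawl_dir, solr_url, str(rounds)]


argv = crawl_command("urls", "crawl", "http://xx.xx.xx.xx:8983/solr", 5)
# Print a shell-safe version of the command for copy and paste.
print(" ".join(shlex.quote(arg) for arg in argv))
```

The resulting list can be handed to `subprocess.run(argv, cwd=...)` from the Nutch runtime directory if the crawl should be started from Python.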
3. Log output while Nutch is crawling:
<pre name="code" class="plain">2014-12-11 10:23:02,927 INFO crawl.Injector - Injector: starting at 2014-12-11 10:23:02
2014-12-11 10:23:02,928 INFO crawl.Injector - Injector: crawlDb: crawl/crawldb
2014-12-11 10:23:02,928 INFO crawl.Injector - Injector: urlDir: urls
2014-12-11 10:23:02,929 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries.
2014-12-11 10:23:03,210 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:23:03,266 WARN snappy.LoadSnappy - Snappy native library not loaded
2014-12-11 10:23:03,748 INFO regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
2014-12-11 10:23:04,496 INFO crawl.Injector - Injector: Total number of urls rejected by filters: 0
2014-12-11 10:23:04,496 INFO crawl.Injector - Injector: Total number of urls after normalization: 1
2014-12-11 10:23:04,496 INFO crawl.Injector - Injector: Merging injected urls into crawl db.
2014-12-11 10:23:04,779 INFO crawl.Injector - Injector: overwrite: false
2014-12-11 10:23:04,779 INFO crawl.Injector - Injector: update: false
2014-12-11 10:23:05,606 INFO crawl.Injector - Injector: URLs merged: 1
2014-12-11 10:23:05,611 INFO crawl.Injector - Injector: Total new urls injected: 0
2014-12-11 10:23:05,612 INFO crawl.Injector - Injector: finished at 2014-12-11 10:23:05, elapsed: 00:00:02
2014-12-11 10:23:06,551 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:23:06,552 INFO crawl.Generator - Generator: starting at 2014-12-11 10:23:06
2014-12-11 10:23:06,552 INFO crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
2014-12-11 10:23:06,552 INFO crawl.Generator - Generator: filtering: false
2014-12-11 10:23:06,552 INFO crawl.Generator - Generator: normalizing: true
2014-12-11 10:23:06,552 INFO crawl.Generator - Generator: topN: 50000
2014-12-11 10:23:07,201 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:23:07,202 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:23:07,202 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:23:07,211 INFO regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:23:07,267 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:23:07,267 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:23:07,267 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:23:07,272 INFO regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
2014-12-11 10:23:07,875 INFO crawl.Generator - Generator: Partitioning selected urls for politeness.
2014-12-11 10:23:08,875 INFO crawl.Generator - Generator: segment: crawl/segments/20141211102308
2014-12-11 10:23:09,051 INFO regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:23:09,993 INFO crawl.Generator - Generator: finished at 2014-12-11 10:23:09, elapsed: 00:00:03
2014-12-11 10:23:10,681 INFO fetcher.Fetcher - Fetcher: starting at 2014-12-11 10:23:10
2014-12-11 10:23:10,681 INFO fetcher.Fetcher - Fetcher: segment: crawl/segments/20141211102308
2014-12-11 10:23:10,681 INFO fetcher.Fetcher - Fetcher Timelimit set for : 1418275390681
2014-12-11 10:23:10,956 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:23:11,415 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:23:11,415 INFO fetcher.Fetcher - Fetcher: threads: 50
2014-12-11 10:23:11,415 INFO fetcher.Fetcher - Fetcher: time-out divisor: 2
2014-12-11 10:23:11,435 INFO fetcher.Fetcher - QueueFeeder finished: total 18 records + hit by time limit :0
2014-12-11 10:23:11,585 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:23:11,586 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:23:11,586 INFO fetcher.Fetcher - fetching http://v.zqgame.com/moviePlay/goMoviePlay/5/001 (queue crawl delay=5000ms)
2014-12-11 10:23:11,587 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:23:11,597 INFO http.Http - http.proxy.host = null
2014-12-11 10:23:11,597 INFO http.Http - http.proxy.port = 8080
2014-12-11 10:23:11,597 INFO http.Http - http.timeout = 10000
2014-12-11 10:23:11,597 INFO http.Http - http.content.limit = 65536
2014-12-11 10:23:11,597 INFO http.Http - http.agent = My Nutch Spider/Nutch-1.9
2014-12-11 10:23:11,597 INFO http.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
2014-12-11 10:23:11,597 INFO http.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
2014-12-11 10:23:11,597 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:23:11,620 INFO fetcher.Fetcher - Fetcher: throughput threshold: -1
2014-12-11 10:23:11,620 INFO fetcher.Fetcher - Fetcher: throughput threshold retries: 5
2014-12-11 10:23:11,620 INFO fetcher.Fetcher - fetcher.maxNum.threads can't be < than 50 : using 50 instead
2014-12-11 10:23:12,622 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17, fetchQueues.getQueueCount=1
2014-12-11 10:23:13,622 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17, fetchQueues.getQueueCount=1
2014-12-11 10:23:14,623 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17, fetchQueues.getQueueCount=1
2014-12-11 10:23:15,623 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17, fetchQueues.getQueueCount=1
2014-12-11 10:23:16,624 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17, fetchQueues.getQueueCount=1
2014-12-11 10:23:16,891 INFO fetcher.Fetcher - fetching http://v.zqgame.com/moviePlay/goMoviePlay/3/3 (queue crawl delay=5000ms)
2014-12-11 10:23:17,624 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16, fetchQueues.getQueueCount=1
2014-12-11 10:23:18,625 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16, fetchQueues.getQueueCount=1
2014-12-11 10:23:19,625 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16, fetchQueues.getQueueCount=1
2014-12-11 10:23:20,626 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16, fetchQueues.getQueueCount=1
2014-12-11 10:23:21,626 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16, fetchQueues.getQueueCount=1
2014-12-11 10:23:21,935 INFO fetcher.Fetcher - fetching http://v.zqgame.com/view/index (queue crawl delay=5000ms)
2014-12-11 10:23:22,627 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15, fetchQueues.getQueueCount=1
2014-12-11 10:23:23,627 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15, fetchQueues.getQueueCount=1
2014-12-11 10:23:24,627 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15, fetchQueues.getQueueCount=1
2014-12-11 10:23:25,628 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15, fetchQueues.getQueueCount=1
2014-12-11 10:27:15,997 INFO fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:27:15,997 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=0
2014-12-11 10:27:16,004 INFO fetcher.Fetcher - -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=0
2014-12-11 10:27:16,005 INFO fetcher.Fetcher - -activeThreads=0
2014-12-11 10:27:16,629 INFO fetcher.Fetcher - Fetcher: finished at 2014-12-11 10:27:16, elapsed: 00:00:07
2014-12-11 10:27:17,320 INFO parse.ParseSegment - ParseSegment: starting at 2014-12-11 10:27:17
2014-12-11 10:27:17,320 INFO parse.ParseSegment - ParseSegment: segment: crawl/segments/20141211102707
2014-12-11 10:27:17,591 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:27:18,518 INFO crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
2014-12-11 10:27:18,528 INFO parse.ParseSegment - Parsed (12ms):http://v.zqgame.com/indexmain
2014-12-11 10:27:18,571 INFO parse.ParseSegment - Parsed (1ms):http://v.zqgame.com/moviePlay/goMoviePlay/4/4
2014-12-11 10:27:18,659 INFO regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default
2014-12-11 10:27:18,871 INFO parse.ParseSegment - ParseSegment: finished at 2014-12-11 10:27:18, elapsed: 00:00:01
2014-12-11 10:27:19,794 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:27:19,810 INFO crawl.CrawlDb - CrawlDb update: starting at 2014-12-11 10:27:19
2014-12-11 10:27:19,810 INFO crawl.CrawlDb - CrawlDb update: db: crawl/crawldb
2014-12-11 10:27:19,811 INFO crawl.CrawlDb - CrawlDb update: segments: [crawl/segments/20141211102707]
2014-12-11 10:27:19,811 INFO crawl.CrawlDb - CrawlDb update: additions allowed: true
2014-12-11 10:27:19,811 INFO crawl.CrawlDb - CrawlDb update: URL normalizing: false
2014-12-11 10:27:19,811 INFO crawl.CrawlDb - CrawlDb update: URL filtering: false
2014-12-11 10:27:19,811 INFO crawl.CrawlDb - CrawlDb update: 404 purging: false
2014-12-11 10:27:19,812 INFO crawl.CrawlDb - CrawlDb update: Merging segment data into db.
2014-12-11 10:27:20,639 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:27:20,639 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:27:20,639 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:27:21,120 INFO crawl.CrawlDb - CrawlDb update: finished at 2014-12-11 10:27:21, elapsed: 00:00:01
2014-12-11 10:27:22,066 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:27:22,067 INFO crawl.LinkDb - LinkDb: starting at 2014-12-11 10:27:22
2014-12-11 10:27:22,067 INFO crawl.LinkDb - LinkDb: linkdb: crawl/linkdb
2014-12-11 10:27:22,067 INFO crawl.LinkDb - LinkDb: URL normalize: true
2014-12-11 10:27:22,067 INFO crawl.LinkDb - LinkDb: URL filter: true
2014-12-11 10:27:22,068 INFO crawl.LinkDb - LinkDb: internal links will be ignored.
2014-12-11 10:27:22,068 INFO crawl.LinkDb - LinkDb: adding segment: crawl/segments/20141211102707
2014-12-11 10:27:23,376 INFO crawl.LinkDb - LinkDb: merging with existing linkdb: crawl/linkdb
2014-12-11 10:27:23,688 INFO regex.RegexURLNormalizer - can't find rules for scope 'linkdb', using default
2014-12-11 10:27:24,510 INFO crawl.LinkDb - LinkDb: finished at 2014-12-11 10:27:24, elapsed: 00:00:02
2014-12-11 10:27:25,209 INFO crawl.DeduplicationJob - DeduplicationJob: starting at 2014-12-11 10:27:25
2014-12-11 10:27:25,483 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:27:26,760 INFO crawl.DeduplicationJob - Deduplication: 2 documents marked as duplicates
2014-12-11 10:27:26,760 INFO crawl.DeduplicationJob - Deduplication: Updating status of duplicate urls into crawl db.
2014-12-11 10:27:27,931 INFO crawl.DeduplicationJob - Deduplication finished at 2014-12-11 10:27:27, elapsed: 00:00:02
2014-12-11 10:27:28,623 INFO indexer.IndexingJob - Indexer: starting at 2014-12-11 10:27:28
2014-12-11 10:27:28,711 INFO indexer.IndexingJob - Indexer: deleting gone documents: false
2014-12-11 10:27:28,711 INFO indexer.IndexingJob - Indexer: URL filtering: false
2014-12-11 10:27:28,718 INFO indexer.IndexingJob - Indexer: URL normalizing: false
2014-12-11 10:27:28,933 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2014-12-11 10:27:28,933 INFO indexer.IndexingJob - Active IndexWriters :
SOLRIndexWriter
solr.server.url : URL of the SOLR instance (mandatory)
solr.commit.size : buffer size when sending to SOLR (default 1000)
solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
solr.auth : use authentication (default false)
solr.auth.username : use authentication (default false)
solr.auth : username for authentication
solr.auth.password : password for authentication


2014-12-11 10:27:28,937 INFO indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
2014-12-11 10:27:28,937 INFO indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb
2014-12-11 10:27:28,937 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20141211102707
2014-12-11 10:27:29,087 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:27:29,585 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2014-12-11 10:27:29,995 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2014-12-11 10:27:30,022 INFO solr.SolrMappingReader - source: content dest: content
2014-12-11 10:27:30,023 INFO solr.SolrMappingReader - source: title dest: title
2014-12-11 10:27:30,023 INFO solr.SolrMappingReader - source: host dest: host
2014-12-11 10:27:30,023 INFO solr.SolrMappingReader - source: segment dest: segment
2014-12-11 10:27:30,023 INFO solr.SolrMappingReader - source: boost dest: boost
2014-12-11 10:27:30,023 INFO solr.SolrMappingReader - source: digest dest: digest
2014-12-11 10:27:30,023 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2014-12-11 10:27:30,054 INFO solr.SolrIndexWriter - Indexing 2 documents
2014-12-11 10:27:30,175 INFO solr.SolrIndexWriter - Indexing 2 documents
2014-12-11 10:39:34,707 INFO crawl.Injector - Injector: starting at 2014-12-11 10:39:34
2014-12-11 10:39:34,707 INFO crawl.Injector - Injector: crawlDb: crawl/crawldb
2014-12-11 10:39:34,707 INFO crawl.Injector - Injector: urlDir: urls
2014-12-11 10:39:34,708 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries.
2014-12-11 10:39:34,989 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:39:35,046 WARN snappy.LoadSnappy - Snappy native library not loaded
2014-12-11 10:39:35,528 INFO regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
2014-12-11 10:39:36,273 INFO crawl.Injector - Injector: Total number of urls rejected by filters: 0
2014-12-11 10:39:36,273 INFO crawl.Injector - Injector: Total number of urls after normalization: 1
2014-12-11 10:39:36,273 INFO crawl.Injector - Injector: Merging injected urls into crawl db.
2014-12-11 10:39:36,577 INFO crawl.Injector - Injector: overwrite: false
2014-12-11 10:39:36,577 INFO crawl.Injector - Injector: update: false
2014-12-11 10:39:37,387 INFO crawl.Injector - Injector: URLs merged: 1
2014-12-11 10:39:37,392 INFO crawl.Injector - Injector: Total new urls injected: 0
2014-12-11 10:39:37,392 INFO crawl.Injector - Injector: finished at 2014-12-11 10:39:37, elapsed: 00:00:02
2014-12-11 10:39:38,327 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:39:38,328 INFO crawl.Generator - Generator: starting at 2014-12-11 10:39:38
2014-12-11 10:39:38,328 INFO crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
2014-12-11 10:39:38,328 INFO crawl.Generator - Generator: filtering: false
2014-12-11 10:39:38,328 INFO crawl.Generator - Generator: normalizing: true
2014-12-11 10:39:38,328 INFO crawl.Generator - Generator: topN: 50000
2014-12-11 10:39:38,978 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:39:38,978 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:39:38,978 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:39:38,987 INFO regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:39:39,040 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:39:39,040 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:39:39,040 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:39:39,045 INFO regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
2014-12-11 10:39:39,649 INFO crawl.Generator - Generator: Partitioning selected urls for politeness.
2014-12-11 10:39:40,649 INFO crawl.Generator - Generator: segment: crawl/segments/20141211103940
2014-12-11 10:39:40,814 INFO regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:39:41,755 INFO crawl.Generator - Generator: finished at 2014-12-11 10:39:41, elapsed: 00:00:03
2014-12-11 10:39:42,447 INFO fetcher.Fetcher - Fetcher: starting at 2014-12-11 10:39:42
2014-12-11 10:39:42,447 INFO fetcher.Fetcher - Fetcher: segment: crawl/segments/20141211103940
2014-12-11 10:39:42,447 INFO fetcher.Fetcher - Fetcher Timelimit set for : 1418276382447
2014-12-11 10:39:42,720 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:39:43,171 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,171 INFO fetcher.Fetcher - Fetcher: threads: 50
2014-12-11 10:39:43,171 INFO fetcher.Fetcher - Fetcher: time-out divisor: 2
2014-12-11 10:39:43,182 INFO fetcher.Fetcher - QueueFeeder finished: total 1 records + hit by time limit :0
2014-12-11 10:39:43,336 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,337 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,337 INFO fetcher.Fetcher - fetching http://passport.zqgame.com/common/agreement.jsp (queue crawl delay=5000ms)
2014-12-11 10:39:43,337 INFO fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,337 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,337 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,338 INFO fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,338 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,338 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,338 INFO fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,338 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,338 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,339 INFO fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,339 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,339 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,340 INFO fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,340 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,340 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,340 INFO fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,340 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,340 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,341 INFO fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:57,352 INFO indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
2014-12-11 10:39:57,352 INFO indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb
2014-12-11 10:39:57,353 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20141211103940
2014-12-11 10:39:57,501 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:39:57,970 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2014-12-11 10:39:58,376 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2014-12-11 10:39:58,403 INFO solr.SolrMappingReader - source: content dest: content
2014-12-11 10:39:58,403 INFO solr.SolrMappingReader - source: title dest: title
2014-12-11 10:39:58,403 INFO solr.SolrMappingReader - source: host dest: host
2014-12-11 10:39:58,403 INFO solr.SolrMappingReader - source: segment dest: segment
2014-12-11 10:39:58,403 INFO solr.SolrMappingReader - source: boost dest: boost
2014-12-11 10:39:58,403 INFO solr.SolrMappingReader - source: digest dest: digest
2014-12-11 10:39:58,403 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2014-12-11 10:39:58,434 INFO solr.SolrIndexWriter - Indexing 1 documents
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: content dest: content
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: title dest: title
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: host dest: host
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: segment dest: segment
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: boost dest: boost
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: digest dest: digest
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2014-12-11 10:40:00,130 INFO indexer.IndexingJob - Indexer: finished at 2014-12-11 10:40:00, elapsed: 00:00:03
2014-12-11 10:40:00,830 INFO indexer.CleaningJob - CleaningJob: starting at 2014-12-11 10:40:00
2014-12-11 10:40:01,101 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:40:01,748 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2014-12-11 10:40:01,775 INFO solr.SolrMappingReader - source: content dest: content
2014-12-11 10:40:01,775 INFO solr.SolrMappingReader - source: title dest: title
2014-12-11 10:40:01,775 INFO solr.SolrMappingReader - source: host dest: host
2014-12-11 10:40:01,775 INFO solr.SolrMappingReader - source: segment dest: segment
2014-12-11 10:40:01,775 INFO solr.SolrMappingReader - source: boost dest: boost
2014-12-11 10:40:01,775 INFO solr.SolrMappingReader - source: digest dest: digest
2014-12-11 10:40:01,775 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2014-12-11 10:40:01,963 INFO indexer.CleaningJob - CleaningJob: deleted a total of 10 documents
2014-12-11 10:40:01,967 WARN mapred.FileOutputCommitter - Output path is null in cleanup
2014-12-11 10:40:02,382 INFO indexer.CleaningJob - CleaningJob: finished at 2014-12-11 10:40:02, elapsed: 00:00:01
2014-12-11 10:40:03,313 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:40:03,314 INFO crawl.Generator - Generator: starting at 2014-12-11 10:40:03
2014-12-11 10:40:03,314 INFO crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
2014-12-11 10:40:03,314 INFO crawl.Generator - Generator: filtering: false
2014-12-11 10:40:03,314 INFO crawl.Generator - Generator: normalizing: true
2014-12-11 10:40:03,315 INFO crawl.Generator - Generator: topN: 50000
2014-12-11 10:40:03,963 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:40:03,964 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:40:03,964 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:40:03,972 INFO regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:40:04,062 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:40:04,062 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:40:04,062 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:40:04,067 INFO regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
2014-12-11 10:40:04,635 INFO crawl.Generator - Generator: Partitioning selected urls for politeness.
2014-12-11 10:40:05,636 INFO crawl.Generator - Generator: segment: crawl/segments/20141211104005
2014-12-11 10:40:05,803 INFO regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:40:06,747 INFO crawl.Generator - Generator: finished at 2014-12-11 10:40:06, elapsed: 00:00:03
2014-12-11 10:40:07,435 INFO fetcher.Fetcher - Fetcher: starting at 2014-12-11 10:40:07
2014-12-11 10:40:07,435 INFO fetcher.Fetcher - Fetcher: segment: crawl/segments/20141211104005
2014-12-11 10:40:07,435 INFO fetcher.Fetcher - Fetcher Timelimit set for : 1418276407435
2014-12-11 10:40:07,707 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:40:08,157 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:40:08,158 INFO fetcher.Fetcher - Fetcher: threads: 50
2014-12-11 10:40:08,158 INFO fetcher.Fetcher - Fetcher: time-out divisor: 2
2014-12-11 10:40:08,187 INFO fetcher.Fetcher - QueueFeeder finished: total 40 records + hit by time limit :0
2014-12-11 10:40:08,326 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:40:08,327 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:40:08,327 INFO fetcher.Fetcher - fetching http://hxjh.zqgame.com/ (queue crawl delay=5000ms)
2014-12-11 10:40:08,328 INFO fetcher.Fetcher - fetching http://lt.zqgame.com/ (queue crawl delay=5000ms)
2014-12-11 10:40:08,328 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:40:08,328 INFO fetcher.Fetcher - fetching http://zscq.zqgame.com/ (queue crawl delay=5000ms)
2014-12-11 10:40:08,328 INFO fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:40:08,329 INFO fetcher.Fetcher - fetching http://lj2.zqgame.com/ (queue crawl delay=5000ms)
2014-12-11 10:39:58,434 INFO solr.SolrIndexWriter - Indexing 1 documents
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: content dest: content
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: title dest: title
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: host dest: host
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: segment dest: segment
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: boost dest: boost
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: digest dest: digest
2014-12-11 10:39:59,776 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2014-12-11 10:40:00,130 INFO 
indexer.IndexingJob - Indexer: finished at 2014-12-11 10:40:00, elapsed: 00:00:033532 2014-12-11 10:40:00,830 INFO indexer.CleaningJob - CleaningJob: starting at 2014-12-11 10:40:003533 2014-12-11 10:40:01,101 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable3534 2014-12-11 10:40:01,748 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter3535 2014-12-11 10:40:01,775 INFO solr.SolrMappingReader - source: content dest: content
...
2014-12-11 10:59:29,551 INFO fetcher.Fetcher - fetching http://pay.zqgame.com/pay/toPayPage/dxpc/107 (queue crawl delay=5000ms)
2014-12-11 10:59:29,703 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=49, fetchQueues.getQueueCount=1
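
A log this long is easier to read once it is summarized. The following is a minimal Python sketch, not part of Nutch itself, that parses lines in the layout seen above (timestamp, level, logger, message), counts entries per level, and lists the URLs the fetcher reported. The names `LOG_LINE`, `SAMPLE`, `parse`, and `summarize` are made up for this illustration, and the regular expression is an assumption based only on the lines shown here.

```python
import re
from collections import Counter

# Assumed layout, inferred from the log above:
# "<date> <time>,<millis> <LEVEL> <logger> - <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) +(?P<logger>\S+) - (?P<message>.*)$"
)

# A few lines copied from the log above, used as sample input.
SAMPLE = """\
2014-12-11 10:39:58,403 INFO solr.SolrMappingReader - source: content dest: content
2014-12-11 10:39:58,434 INFO solr.SolrIndexWriter - Indexing 1 documents
2014-12-11 10:40:01,963 INFO indexer.CleaningJob - CleaningJob: deleted a total of 10 documents
2014-12-11 10:40:01,967 WARN mapred.FileOutputCommitter - Output path is null in cleanup
2014-12-11 10:40:08,327 INFO fetcher.Fetcher - fetching http://hxjh.zqgame.com/ (queue crawl delay=5000ms)
"""


def parse(text):
    """Return one dict per line that matches LOG_LINE; other lines are skipped."""
    entries = []
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


def summarize(entries):
    """Count entries per log level and collect the URLs the fetcher reported."""
    levels = Counter(entry["level"] for entry in entries)
    fetched = [
        entry["message"].split()[1]
        for entry in entries
        if entry["logger"] == "fetcher.Fetcher"
        and entry["message"].startswith("fetching ")
    ]
    return levels, fetched


entries = parse(SAMPLE)
levels, fetched = summarize(entries)
print(dict(levels))
print(fetched)
```

To run it against a real log, read the file contents and pass them to `parse` in place of `SAMPLE`. Lines that do not match the assumed layout, such as wrapped continuation lines, are silently skipped.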