[Flume] An example of using Flume to ship web logs to HDFS
Create a directory on HDFS to store the logs:
$ hdfs dfs -mkdir -p /test001/weblogsflume
Create the local directory that will receive the incoming logs (Flume's spooling directory):
$ sudo mkdir -p /flume/weblogsmiddle
Open up permissions so the logs can be written by any user:
$ sudo chmod a+w -R /flume
The contents of the configuration file:
$ cat /mytraining/exercises/flume/spooldir.conf
# Components
agent1.sources = webserver-log-source
agent1.sinks = hdfs-sink
agent1.channels = memory-channel
# Source
agent1.sources.webserver-log-source.type = spooldir
agent1.sources.webserver-log-source.spoolDir = /flume/weblogsmiddle
agent1.sources.webserver-log-source.channels = memory-channel
# Sink
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.hdfs.path = /test001/weblogsflume/
agent1.sinks.hdfs-sink.channel = memory-channel
agent1.sinks.hdfs-sink.hdfs.rollInterval = 0
agent1.sinks.hdfs-sink.hdfs.rollSize = 524288
agent1.sinks.hdfs-sink.hdfs.rollCount = 0
agent1.sinks.hdfs-sink.hdfs.fileType = DataStream
# Channel
agent1.channels.memory-channel.type = memory
agent1.channels.memory-channel.capacity = 100000
agent1.channels.memory-channel.transactionCapacity = 1000
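With these settings, file size is the only roll trigger: `rollInterval = 0` and `rollCount = 0` disable time-based and event-count-based rolling, so each HDFS output file is closed once it reaches roughly `rollSize` bytes. A quick sanity check of that size (a minimal sketch, nothing Flume-specific):

```shell
# rollSize is specified in bytes: 524288 bytes = 512 KiB.
# This matches the ~528 KB FlumeData files that appear on HDFS later.
echo "$((524288 / 1024)) KiB"
```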
$ cd /mytraining/exercises/flume
Start Flume:
$ flume-ng agent --conf /etc/flume-ng/conf \
> --conf-file spooldir.conf \
> --name agent1 -Dflume.root.logger=INFO,console
Info: Sourcing environment configuration script /etc/flume-ng/conf/flume-env.sh
Info: Including Hadoop libraries found via (/usr/bin/hadoop) for HDFS access
Info: Excluding /usr/lib/hadoop/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-log4j12.jar from classpath
Info: Including HBASE libraries found via (/usr/bin/hbase) for HBASE access
Info: Excluding /usr/lib/hbase/bin/../lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/lib/hbase/bin/../lib/slf4j-log4j12.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-log4j12.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-log4j12.jar from classpath
Info: Excluding /usr/lib/zookeeper/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar from classpath
Info: Excluding /usr/lib/zookeeper/lib/slf4j-log4j12.jar from classpath
Info: Including Hive libraries found via () for Hive access
...
-Djava.library.path=:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hbase/bin/../lib/native/Linux-amd64-64 org.apache.flume.node.Application --conf-file spooldir.conf --name agent1
2017-10-20 21:07:08,929 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:61)] Configuration provider starting
2017-10-20 21:07:09,057 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:133)] Reloading configuration file:spooldir.conf
2017-10-20 21:07:09,300 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:hdfs-sink
2017-10-20 21:07:09,302 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:hdfs-sink
2017-10-20 21:07:09,302 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:931)] Added sinks: hdfs-sink Agent: agent1
...
2017-10-20 21:07:09,304 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:hdfs-sink
2017-10-20 21:07:09,306 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:hdfs-sink
2017-10-20 21:07:09,310 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:hdfs-sink
...
2017-10-20 21:07:10,398 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:138)] Starting new configuration:{ sourceRunners:{webserver-log-source=EventDrivenSourceRunner: { source:Spool Directory source webserver-log-source: { spoolDir: /flume/weblogsmiddle } }} sinkRunners:{hdfs-sink=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@12c67180 counterGroup:{ name:null counters:{} } }} channels:{memory-channel=org.apache.flume.channel.MemoryChannel{name: memory-channel}} }
...
2017-10-20 21:10:25,268 (pool-6-thread-1) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents(ReliableSpoolingFileEventReader.java:238)] Last read was never committed - resetting mark position.
Feed logs into /flume/weblogsmiddle:
$ cp -r /mytest/weblogs /tmp/tmpweblogs
$ mv /tmp/tmpweblogs/* /flume/weblogsmiddle
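The copy-then-move pattern above is deliberate: the spooldir source expects files to be complete and immutable once they appear in the spool directory, and a `mv` within one filesystem is an atomic rename, so Flume never sees a half-written file. A minimal sketch of the pattern, using temporary stand-in directories (the paths and log line here are hypothetical):

```shell
# Stand-ins for /tmp/tmpweblogs (staging) and /flume/weblogsmiddle (spool)
STAGE=$(mktemp -d)
SPOOL=$(mktemp -d)

# Write the file in the staging directory first...
echo '192.168.0.1 - - [20/Oct/2017:21:10:00] "GET /index.html" 200 512' > "$STAGE/access.log"

# ...then move it into the spool directory; on the same filesystem,
# mv is an atomic rename, so a watcher such as Flume's spooldir
# source never reads a partially written file.
mv "$STAGE/access.log" "$SPOOL/access.log"
ls "$SPOOL"
```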
Wait a few minutes, then check the results on HDFS:
$ hdfs dfs -ls /test001/weblogsflume
-rw-rw-rw-   1 training supergroup     527909 2017-10-20 21:10 /test001/weblogsflume/FlumeData.1508558917884
-rw-rw-rw-   1 training supergroup     527776 2017-10-20 21:10 /test001/weblogsflume/FlumeData.1508558917885
...
In the terminal where flume-ng is running, press Ctrl+C and then Ctrl+Z to stop Flume:
^C
^Z
[1]+  Stopped                 flume-ng agent --conf /etc/flume-ng/conf --conf-file spooldir.conf --name agent1 -Dflume.root.logger=INFO,console
[training@localhost flume]$
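Note that Ctrl+Z only suspends the agent (the shell reports it as `Stopped`); the process is still alive in the background. To actually terminate the stopped job, kill it, for example with `kill %1`. A minimal sketch of the idea, using a `sleep` process as a stand-in for the agent:

```shell
# A long-running stand-in for the flume-ng agent process
sleep 100 &
AGENT_PID=$!

# Equivalent to `kill %1` on the stopped job in an interactive shell
kill "$AGENT_PID"
wait "$AGENT_PID" 2>/dev/null || true
echo "agent stopped"
```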