[Flume] An example of using Flume to ship web logs to HDFS:

Create a directory on HDFS to store the logs:
$ hdfs dfs -mkdir -p /test001/weblogsflume
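
You can optionally confirm the directory was created:

$ hdfs dfs -ls /test001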

Create the local directory that the logs will be dropped into:
$ sudo mkdir -p /flume/weblogsmiddle

Make the directory tree writable by all users so the logs can be delivered by anyone:
$ sudo chmod a+w -R /flume
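
A quick optional check that the permissions took effect:

$ ls -ld /flume /flume/weblogsmiddle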

The configuration file wires a spooldir source to an HDFS sink through a memory channel:

$ cat /mytraining/exercises/flume/spooldir.conf

# Name the components of the agent
agent1.sources = webserver-log-source
agent1.sinks = hdfs-sink
agent1.channels = memory-channel

# Configure the source
agent1.sources.webserver-log-source.type = spooldir
agent1.sources.webserver-log-source.spoolDir = /flume/weblogsmiddle
agent1.sources.webserver-log-source.channels = memory-channel

# Configure the sink
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.hdfs.path = /test001/weblogsflume/
agent1.sinks.hdfs-sink.channel = memory-channel
agent1.sinks.hdfs-sink.hdfs.rollInterval = 0
agent1.sinks.hdfs-sink.hdfs.rollSize = 524288
agent1.sinks.hdfs-sink.hdfs.rollCount = 0
agent1.sinks.hdfs-sink.hdfs.fileType = DataStream

# Configure the channel
agent1.channels.memory-channel.type = memory
agent1.channels.memory-channel.capacity = 100000
agent1.channels.memory-channel.transactionCapacity = 1000
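
For reference: with hdfs.rollInterval and hdfs.rollCount both set to 0, time-based and event-count-based rolling are disabled, so output files roll only on size, at hdfs.rollSize = 524288 bytes (512 KB); fileType = DataStream writes the events as plain text rather than a SequenceFile. If you preferred time-based rolling instead, the sink could be configured along these lines (a sketch, not part of this exercise):

# Roll a new HDFS file every 5 minutes, regardless of size or event count
agent1.sinks.hdfs-sink.hdfs.rollInterval = 300
agent1.sinks.hdfs-sink.hdfs.rollSize = 0
agent1.sinks.hdfs-sink.hdfs.rollCount = 0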

 

Change to the directory containing the configuration file (flume-ng is given a relative path below):

$ cd /mytraining/exercises/flume

Start Flume (the -Dflume.root.logger=INFO,console option directs the agent's log output to the terminal, producing the messages below):

$ flume-ng agent --conf /etc/flume-ng/conf \
> --conf-file spooldir.conf \
> --name agent1 -Dflume.root.logger=INFO,console

Info: Sourcing environment configuration script /etc/flume-ng/conf/flume-env.sh
Info: Including Hadoop libraries found via (/usr/bin/hadoop) for HDFS access
Info: Excluding /usr/lib/hadoop/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-log4j12.jar from classpath
Info: Including HBASE libraries found via (/usr/bin/hbase) for HBASE access
Info: Excluding /usr/lib/hbase/bin/../lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/lib/hbase/bin/../lib/slf4j-log4j12.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-log4j12.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-log4j12.jar from classpath
Info: Excluding /usr/lib/zookeeper/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar from classpath
Info: Excluding /usr/lib/zookeeper/lib/slf4j-log4j12.jar from classpath
Info: Including Hive libraries found via () for Hive access

...

-Djava.library.path=:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hbase/bin/../lib/native/Linux-amd64-64 org.apache.flume.node.Application --conf-file spooldir.conf --name agent1
2017-10-20 21:07:08,929 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:61)] Configuration provider starting
2017-10-20 21:07:09,057 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:133)] Reloading configuration file:spooldir.conf
2017-10-20 21:07:09,300 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:hdfs-sink
2017-10-20 21:07:09,302 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:hdfs-sink
2017-10-20 21:07:09,302 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:931)] Added sinks: hdfs-sink Agent: agent1

...

2017-10-20 21:07:09,304 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:hdfs-sink
2017-10-20 21:07:09,306 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:hdfs-sink
2017-10-20 21:07:09,310 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:hdfs-sink
...

2017-10-20 21:07:10,398 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:138)] Starting new configuration:{ sourceRunners:{webserver-log-source=EventDrivenSourceRunner: { source:Spool Directory source webserver-log-source: { spoolDir: /flume/weblogsmiddle } }} sinkRunners:{hdfs-sink=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@12c67180 counterGroup:{ name:null counters:{} } }} channels:{memory-channel=org.apache.flume.channel.MemoryChannel{name: memory-channel}} }

...

2017-10-20 21:10:25,268 (pool-6-thread-1) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents(ReliableSpoolingFileEventReader.java:238)] Last read was never committed - resetting mark position.

Feed logs into /flume/weblogsmiddle. Copy to a temporary directory first and then mv into place: the spooldir source expects files to be complete and unchanging once they appear, and mv within the same filesystem makes each file appear atomically:

$ cp -r /mytest/weblogs /tmp/tmpweblogs
$ mv /tmp/tmpweblogs/* /flume/weblogsmiddle
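
After the agent consumes a file, the spooldir source renames it in place with a .COMPLETED suffix (the default fileSuffix), so listing the directory shows ingestion progress:

$ ls /flume/weblogsmiddle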

After a few minutes, check what has arrived on HDFS:

$ hdfs dfs -ls /test001/weblogsflume

-rw-rw-rw- 1 training supergroup 527909 2017-10-20 21:10 /test001/weblogsflume/FlumeData.1508558917884
-rw-rw-rw- 1 training supergroup 527776 2017-10-20 21:10 /test001/weblogsflume/FlumeData.1508558917885
...
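
To spot-check what was written, you can cat one of the rolled files; they are plain text because of fileType = DataStream (the file name below is taken from the listing above; yours will differ):

$ hdfs dfs -cat /test001/weblogsflume/FlumeData.1508558917884 | head -5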

In the terminal where flume-ng is running, press Ctrl+C and then Ctrl+Z to stop the agent:

^C
^Z
[1]+ Stopped                 flume-ng agent --conf /etc/flume-ng/conf --conf-file spooldir.conf --name agent1 -Dflume.root.logger=INFO,console
[training@localhost flume]$
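
Note that Ctrl+Z suspends the agent rather than terminating it (hence the Stopped job status above). To finish stopping it, kill the suspended job:

$ kill %1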
