I. Key Points

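The source, channel, and sink interfaces differ between Flume versions, so use the configuration properties that match the version you run.
This article uses Flume 1.6.0 as its example; see http://archive.cloudera.com/cdh5/cdh/5/flume-ng-1.6.0-cdh5.7.0/FlumeUserGuide.html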

II. Source

1.avro source

(1) Function
Listens on an Avro port and receives events from external Avro client streams. Suitable for tiered data collection.
(2) Required parameters

Property Name	Default	Description
channels	–
type	–	The component type name, needs to be avro
bind	–	hostname or IP address to listen on
port	–	Port # to bind to

(3) Example

a1.sources = r1
a1.channels = c1
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141

2.exec source

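(1) Function
Runs a given Unix command and turns each line the command writes to standard output into an event. Combined with tail -F, it monitors a file. Typical use: following a log file.
(2) Required parameters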

Property Name	Default	Description
channels	–
type	–	The component type name, needs to be exec
command	–	The command to execute

(3) Example

a1.sources = r1
a1.channels = c1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/secure
a1.sources.r1.channels = c1
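
Because the exec source only reads whatever the command prints, collection stops if the command exits. The following is a minimal sketch of the same example, assuming the optional restart, restartThrottle, and logStdErr properties described in the Flume 1.6.0 user guide; the values are illustrative, not recommendations.

```properties
# Sketch only: same exec source as above, restarted if the command dies.
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/secure
# Re-run the command when it exits (the default is false)
a1.sources.r1.restart = true
# Wait 10000 ms before each restart attempt
a1.sources.r1.restartThrottle = 10000
# Also log what the command writes to stderr
a1.sources.r1.logStdErr = true
a1.sources.r1.channels = c1
```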

3.Spooling Directory Source

(1) Function
Watches a directory and ingests the files placed in it.
(2) Required parameters

Property Name	Default	Description
channels	–
type	–	The component type name, needs to be spooldir.
spoolDir	–	The directory from which to read files from.

(3) Example

a1.channels = ch-1
a1.sources = src-1
a1.sources.src-1.type = spooldir
a1.sources.src-1.channels = ch-1
a1.sources.src-1.spoolDir = /var/log/apache/flumeSpool
a1.sources.src-1.fileHeader = true
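
A hedged variant of the example above, assuming the optional fileSuffix, deletePolicy, and ignorePattern properties listed in the Flume 1.6.0 user guide. The .tmp pattern is a hypothetical naming convention for files that are still being written and is not part of the original example.

```properties
# Sketch only: spooling directory source with a few optional settings.
a1.channels = ch-1
a1.sources = src-1
a1.sources.src-1.type = spooldir
a1.sources.src-1.channels = ch-1
a1.sources.src-1.spoolDir = /var/log/apache/flumeSpool
a1.sources.src-1.fileHeader = true
# Suffix appended to a file once it has been fully ingested
a1.sources.src-1.fileSuffix = .COMPLETED
# Keep ingested files on disk ("immediate" would delete them)
a1.sources.src-1.deletePolicy = never
# Skip files matching this regex (hypothetical: unfinished *.tmp files)
a1.sources.src-1.ignorePattern = ^.*\\.tmp$
```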

III. Channel

1.Memory Channel

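(1) Function
Events are stored in an in-memory queue with a configurable maximum size. Suitable for flows that need higher throughput and can tolerate losing the staged data if the agent fails.
Drawbacks: the Memory Channel is not durable because it keeps every event in memory. If the process stops abnormally, the data in memory cannot be recovered. Its capacity is also limited by the available memory.
(2) Required parameters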

Property Name	Default	Description
type	–	The component type name, needs to be memory

(3) Example

a1.channels = c1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 10000
a1.channels.c1.byteCapacityBufferPercentage = 20
a1.channels.c1.byteCapacity = 800000

2.File Channel

(1) Function
A durable channel. The data is safe, and it can keep storing events on disk as long as there is enough disk space.
(2) Required parameters

Property Name	Default	Description
type	–	The component type name, needs to be file.

(3) Example

a1.channels = c1
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /mnt/flume/checkpoint
a1.channels.c1.dataDirs = /mnt/flume/data

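Parameter notes:

checkpointDir: the directory where the checkpoint file is stored. The checkpoint is used to verify data integrity and records which events have already been taken from the channel and which have not.
dataDirs: the directories where the data files are stored. Multiple directories can be given, separated by commas.
Using multiple directories on separate disks can improve File Channel performance.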
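
As a minimal sketch of the multi-disk layout described above: the /mnt/disk1, /mnt/disk2, and /mnt/disk3 mount points are hypothetical and stand for three separate physical disks.

```properties
# Sketch only: file channel with data spread over separate disks.
a1.channels = c1
a1.channels.c1.type = file
# Checkpoint on its own disk (hypothetical mount point)
a1.channels.c1.checkpointDir = /mnt/disk1/flume/checkpoint
# Comma-separated data directories, one per disk (hypothetical mount points)
a1.channels.c1.dataDirs = /mnt/disk2/flume/data,/mnt/disk3/flume/data
```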

IV. Sink

1.HDFS sink

(1) Function
This sink writes events to the Hadoop Distributed File System (HDFS).
(2) Required parameters

Name	Default	Description
channel	–
type	–	The component type name, needs to be hdfs
hdfs.path	–	HDFS directory path (eg hdfs://namenode/flume/webdata/)

(3) Example

a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
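
The time escapes in hdfs.path (%y, %m, %d, and so on) are resolved from a timestamp header on each event unless hdfs.useLocalTimeStamp is set to true. The sketch below assumes events carry no such header and therefore uses the agent's local time. The roll settings and fileType are illustrative values, not recommendations.

```properties
# Sketch only: HDFS sink that resolves time escapes from local time.
a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M
a1.sinks.k1.hdfs.filePrefix = events-
# Use the agent's clock instead of the event's timestamp header
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Write plain event bodies rather than the default SequenceFile
a1.sinks.k1.hdfs.fileType = DataStream
# Roll a new file every 300 seconds; 0 disables size- and count-based rolling
a1.sinks.k1.hdfs.rollInterval = 300
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
```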

2.hive sink

(1) Function
This sink streams events containing delimited text or JSON data directly into a Hive table or partition.
(2) Required parameters

Name	Default	Description
channel	–
type	–	The component type name, needs to be hive
hive.metastore	–	Hive metastore URI (eg thrift://a.b.com:9083 )
hive.database	–	Hive database name
hive.table	–	Hive table name

(3) Example

a1.channels = c1
a1.channels.c1.type = memory
a1.sinks = k1
a1.sinks.k1.type = hive
a1.sinks.k1.channel = c1
a1.sinks.k1.hive.metastore = thrift://127.0.0.1:9083
a1.sinks.k1.hive.database = logsdb
a1.sinks.k1.hive.table = weblogs
a1.sinks.k1.hive.partition = asia,%{country},%y-%m-%d-%H-%M
a1.sinks.k1.useLocalTimeStamp = false
a1.sinks.k1.round = true
a1.sinks.k1.roundValue = 10
a1.sinks.k1.roundUnit = minute
a1.sinks.k1.serializer = DELIMITED
a1.sinks.k1.serializer.delimiter = "\t"
a1.sinks.k1.serializer.serdeSeparator = '\t'
a1.sinks.k1.serializer.fieldnames =id,,msg

3.hbase sink

(1) Function
Writes data to HBase.
(2) Required parameters

Property Name	Default	Description
channel	–
type	–	The component type name, needs to be hbase
table	–	The name of the table in Hbase to write to.
columnFamily	–	The column family in Hbase to write to.

(3) Example

a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = hbase
a1.sinks.k1.table = foo_table
a1.sinks.k1.columnFamily = bar_cf
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
a1.sinks.k1.channel = c1

4.avro sink

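(1) Function
The Avro sink forms one half of Flume's tiered collection support. Flume events sent to this sink are converted to Avro events and sent to the configured hostname/port pair. Events are taken from the configured channel in batches of the configured batch size.
(2) Required parameters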

Property Name	Default	Description
channel	–
type	–	The component type name, needs to be avro.
hostname	–	The hostname or IP address to bind to.
port	–	The port # to listen on.

(3) Example

a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 10.10.10.10
a1.sinks.k1.port = 4545
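
To illustrate the tiered collection described above, here is a minimal sketch of two agents: an edge agent a1 whose Avro sink sends to 10.10.10.10:4545, and a collector agent a2 (a hypothetical name) whose Avro source listens on that port. It assumes the collector runs on the host 10.10.10.10 and that each block is saved in its own agent's configuration file. The exec source, logger sink, and batch-size value are illustrative choices.

```properties
# Sketch only: edge agent a1 (runs on the machine producing the log)
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/secure
a1.sources.r1.channels = c1
a1.channels.c1.type = memory
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 10.10.10.10
a1.sinks.k1.port = 4545
# Number of events sent per batch
a1.sinks.k1.batch-size = 100

# Sketch only: collector agent a2 (assumed to run on 10.10.10.10)
a2.sources = r1
a2.channels = c1
a2.sinks = k1
a2.sources.r1.type = avro
a2.sources.r1.bind = 0.0.0.0
a2.sources.r1.port = 4545
a2.sources.r1.channels = c1
a2.channels.c1.type = memory
# Logger sink just prints events; swap in an HDFS or Kafka sink as needed
a2.sinks.k1.type = logger
a2.sinks.k1.channel = c1
```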

5.kafka sink

(1) Function
Writes data to the corresponding Kafka topic.
(2) Required parameters

Property Name	Default	Description
type	–	Must be set to org.apache.flume.sink.kafka.KafkaSink
brokerList	–	List of brokers Kafka-Sink will connect to, to get the list of topic partitions. This can be a partial list of brokers, but we recommend at least two for HA. The format is a comma-separated list of hostname:port

(3) Example

a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = mytopic
a1.sinks.k1.brokerList = localhost:9092
a1.sinks.k1.requiredAcks = 1
a1.sinks.k1.batchSize = 20
a1.sinks.k1.channel = c1
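
The fragment above omits the component declarations, so here is a minimal sketch of a complete agent that feeds the Kafka sink. It reuses the exec source and memory channel from earlier sections. The second broker address (localhost:9093) is a hypothetical addition to show the comma-separated format.

```properties
# Sketch only: exec source -> memory channel -> Kafka sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/secure
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 10000

a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = mytopic
# Comma-separated hostname:port list (second broker is hypothetical)
a1.sinks.k1.brokerList = localhost:9092,localhost:9093
a1.sinks.k1.requiredAcks = 1
a1.sinks.k1.batchSize = 20
a1.sinks.k1.channel = c1
```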
