Putting the Pieces Together

This combination of messaging, storage, and stream processing may seem unusual but it is essential to Kafka’s role as a streaming platform.

A distributed file system like HDFS allows storing static files for batch processing. Effectively a system like this allows storing and processing historical data from the past.

A traditional enterprise messaging system allows processing future messages that will arrive after you subscribe. Applications built in this way process future data as it arrives.

Kafka combines both of these capabilities, and the combination is critical both for Kafka usage as a platform for streaming applications as well as for streaming data pipelines.

By combining storage and low-latency subscriptions, streaming applications can treat both past and future data the same way. That is, a single application can process historical, stored data, but rather than ending when it reaches the last record it can keep processing as future data arrives. This is a generalized notion of stream processing that subsumes batch processing as well as message-driven applications.
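For example, the console consumer that ships with Kafka illustrates this model: started with the --from-beginning flag it first replays the stored history of a topic and then keeps running, printing new records as they arrive (the topic name and address below are just placeholders):

> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my-topic --from-beginning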

Likewise for streaming data pipelines, the combination of subscription to real-time events makes it possible to use Kafka for very low-latency pipelines; but the ability to store data reliably makes it possible to use it for critical data where the delivery of data must be guaranteed, or for integration with offline systems that load data only periodically or may go down for extended periods of time for maintenance. The stream processing facilities make it possible to transform data as it arrives.

For more information on the guarantees, APIs, and capabilities Kafka provides, see the rest of the documentation.

1.2 Use Cases

Here is a description of a few of the popular use cases for Apache Kafka™. For an overview of a number of these areas in action, see this blog post.

Messaging

Kafka works well as a replacement for a more traditional message broker. Message brokers are used for a variety of reasons (to decouple processing from data producers, to buffer unprocessed messages, etc). In comparison to most messaging systems Kafka has better throughput, built-in partitioning, replication, and fault-tolerance which makes it a good solution for large scale message processing applications.

In our experience messaging uses are often comparatively low-throughput, but may require low end-to-end latency and often depend on the strong durability guarantees Kafka provides.

In this domain Kafka is comparable to traditional messaging systems such as ActiveMQ or RabbitMQ.

Website Activity Tracking

The original use case for Kafka was to be able to rebuild a user activity tracking pipeline as a set of real-time publish-subscribe feeds. This means site activity (page views, searches, or other actions users may take) is published to central topics with one topic per activity type. These feeds are available for subscription for a range of use cases including real-time processing, real-time monitoring, and loading into Hadoop or offline data warehousing systems for offline processing and reporting.
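As a rough sketch of the "one topic per activity type" layout, separate topics could be created up front with the console tooling (the topic names here are made up for illustration):

> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic page_views
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic searches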

Activity tracking is often very high volume as many activity messages are generated for each user page view.

Metrics

Kafka is often used for operational monitoring data. This involves aggregating statistics from distributed applications to produce centralized feeds of operational data.

Log Aggregation

Many people use Kafka as a replacement for a log aggregation solution. Log aggregation typically collects physical log files off servers and puts them in a central place (a file server or HDFS perhaps) for processing. Kafka abstracts away the details of files and gives a cleaner abstraction of log or event data as a stream of messages. This allows for lower-latency processing and easier support for multiple data sources and distributed data consumption. In comparison to log-centric systems like Scribe or Flume, Kafka offers equally good performance, stronger durability guarantees due to replication, and much lower end-to-end latency.

Stream Processing

Many users of Kafka process data in processing pipelines consisting of multiple stages, where raw input data is consumed from Kafka topics and then aggregated, enriched, or otherwise transformed into new topics for further consumption or follow-up processing. For example, a processing pipeline for recommending news articles might crawl article content from RSS feeds and publish it to an “articles” topic; further processing might normalize or deduplicate this content and publish the cleansed article content to a new topic; a final processing stage might attempt to recommend this content to users. Such processing pipelines create graphs of real-time data flows based on the individual topics. Starting in 0.10.0.0, a light-weight but powerful stream processing library called Kafka Streams is available in Apache Kafka to perform such data processing as described above. Apart from Kafka Streams, alternative open source stream processing tools include Apache Storm and Apache Samza.
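A crude sketch of one stage of such a pipeline, using only the console tools plus a hypothetical dedupe_and_normalize.sh script standing in for a real Kafka Streams application (the topic names are likewise illustrative):

> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic articles \
    | ./dedupe_and_normalize.sh \
    | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic articles_cleaned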

Event Sourcing

Event sourcing is a style of application design where state changes are logged as a time-ordered sequence of records. Kafka’s support for very large stored log data makes it an excellent backend for an application built in this style.

Commit Log

Kafka can serve as a kind of external commit-log for a distributed system. The log helps replicate data between nodes and acts as a re-syncing mechanism for failed nodes to restore their data. The log compaction feature in Kafka helps support this usage. In this usage Kafka is similar to the Apache BookKeeper project.
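For example, the log compaction feature mentioned above can be enabled per topic at creation time, so that at least the latest record for each key is retained (the topic name here is just an example):

> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic account-state --config cleanup.policy=compact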

1.3 Quick Start

This tutorial assumes you are starting fresh and have no existing Kafka™ or ZooKeeper data. Since Kafka console scripts are different for Unix-based and Windows platforms, on Windows platforms use bin\windows\ instead of bin/, and change the script extension to .bat.

Step 1: Download the code

Download the 0.10.1.0 release and un-tar it.

> tar -xzf kafka_2.11-0.10.1.0.tgz
> cd kafka_2.11-0.10.1.0

Step 2: Start the server

Kafka uses ZooKeeper so you need to first start a ZooKeeper server if you don’t already have one. You can use the convenience script packaged with Kafka to get a quick-and-dirty single-node ZooKeeper instance.

> bin/zookeeper-server-start.sh config/zookeeper.properties
[2013-04-22 15:01:37,495] INFO Reading configuration from: config/zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
...

Now start the Kafka server:

> bin/kafka-server-start.sh config/server.properties
[2013-04-22 15:01:47,028] INFO Verifying properties (kafka.utils.VerifiableProperties)
[2013-04-22 15:01:47,051] INFO Property socket.send.buffer.bytes is overridden to 1048576 (kafka.utils.VerifiableProperties)
...

Step 3: Create a topic

Let’s create a topic named “test” with a single partition and only one replica:

> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

We can now see that topic if we run the list topic command:

> bin/kafka-topics.sh --list --zookeeper localhost:2181
test

Alternatively, instead of manually creating topics you can also configure your brokers to auto-create topics when a non-existent topic is published to.
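For example, the relevant broker settings live in config/server.properties; this is a minimal sketch (auto.create.topics.enable defaults to true, and the other two settings control how auto-created topics are sized):

# config/server.properties
# Create a topic automatically the first time something is published to it
auto.create.topics.enable=true
# Partition count and replication factor used for auto-created topics
num.partitions=1
default.replication.factor=1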

Step 4: Send some messages

Kafka comes with a command line client that will take input from a file or from standard input and send it out as messages to the Kafka cluster. By default, each line will be sent as a separate message.

Run the producer and then type a few messages into the console to send to the server.

> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
This is a message
This is another message

Step 5: Start a consumer

Kafka also has a command line consumer that will dump out messages to standard output.

> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
This is a message
This is another message

If you have each of the above commands running in a different terminal then you should now be able to type messages into the producer terminal and see them appear in the consumer terminal.

All of the command line tools have additional options; running the command with no arguments will display usage information documenting them in more detail.
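For example, invoking any of the tools with no arguments prints its full option list (the usage text itself is omitted here):

> bin/kafka-topics.sh
> bin/kafka-console-producer.sh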

Step 6: Setting up a multi-broker cluster

So far we have been running against a single broker, but that’s no fun. For Kafka, a single broker is just a cluster of size one, so nothing much changes other than starting a few more broker instances. But just to get feel for it, let’s expand our cluster to three nodes (still all on our local machine).

First we make a config file for each of the brokers (on Windows use the copy command instead):

> cp config/server.properties config/server-1.properties
> cp config/server.properties config/server-2.properties

Now edit these new files and set the following properties:

config/server-1.properties:
    broker.id=1
    listeners=PLAINTEXT://:9093
    log.dir=/tmp/kafka-logs-1

config/server-2.properties:
    broker.id=2
    listeners=PLAINTEXT://:9094
    log.dir=/tmp/kafka-logs-2

The broker.id property is the unique and permanent name of each node in the cluster. We have to override the port and log directory only because we are running these all on the same machine and we want to keep the brokers from all trying to register on the same port or overwrite each other’s data.

We already have Zookeeper and our single node started, so we just need to start the two new nodes:

> bin/kafka-server-start.sh config/server-1.properties &
...
> bin/kafka-server-start.sh config/server-2.properties &
...

Now create a new topic with a replication factor of three:

> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic my-replicated-topic

Okay but now that we have a cluster how can we know which broker is doing what? To see that run the “describe topics” command:

> bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
Topic:my-replicated-topic   PartitionCount:1    ReplicationFactor:3 Configs:
    Topic: my-replicated-topic  Partition: 0    Leader: 1   Replicas: 1,2,0 Isr: 1,2,0

Here is an explanation of the output. The first line gives a summary of all the partitions, each additional line gives information about one partition. Since we have only one partition for this topic there is only one line.

  • “leader” is the node responsible for all reads and writes for the given partition. Each node will be the leader for a randomly selected portion of the partitions.
  • “replicas” is the list of nodes that replicate the log for this partition regardless of whether they are the leader or even if they are currently alive.
  • “isr” is the set of “in-sync” replicas. This is the subset of the replicas list that is currently alive and caught-up to the leader.

Note that in my example node 1 is the leader for the only partition of the topic.

We can run the same command on the original topic we created to see where it is:

> bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test
Topic:test  PartitionCount:1    ReplicationFactor:1 Configs:
    Topic: test Partition: 0    Leader: 0   Replicas: 0 Isr: 0

So there is no surprise there—the original topic has no replicas and is on server 0, the only server in our cluster when we created it.

Let’s publish a few messages to our new topic:

> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-replicated-topic
...
my test message 1
my test message 2
^C

Now let’s consume these messages:

> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic
...
my test message 1
my test message 2
^C

Now let’s test out fault-tolerance. Broker 1 was acting as the leader so let’s kill it:

> ps aux | grep server-1.properties
7564 ttys002    0:15.91 /System/Library/Frameworks/JavaVM.framework/Versions/1.8/Home/bin/java...
> kill -9 7564

On Windows use:

> wmic process get processid,caption,commandline | find "java.exe" | find "server-1.properties"
java.exe    java  -Xmx1G -Xms1G -server -XX:+UseG1GC ... build\libs\kafka_2.10-0.10.1.0.jar"  kafka.Kafka config\server-1.properties    644
> taskkill /pid 644 /f

Leadership has switched to one of the slaves and node 1 is no longer in the in-sync replica set:

> bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
Topic:my-replicated-topic   PartitionCount:1    ReplicationFactor:3 Configs:
    Topic: my-replicated-topic  Partition: 0    Leader: 2   Replicas: 1,2,0 Isr: 2,0

But the messages are still available for consumption even though the leader that took the writes originally is down:

> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic
...
my test message 1
my test message 2
^C

Reposted from 并发编程网 - ifeve.com
