The previous articles were mostly theory; now let's get hands-on. This article walks through the Kafka producer API. Follow the column 《破茧成蝶——大数据篇》 for more related content~


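Contents

1. Kafka's Message Sending Flow

2. Asynchronous Sending in Kafka

2.1 Asynchronous Send Without a Callback

2.1.1 Implementation

2.1.2 Testing

2.2 Asynchronous Send With a Callback

2.2.1 Implementation

2.2.2 Testing

3. Synchronous Sending in Kafka

3.1 Implementation

3.2 Testing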


1. Kafka's Message Sending Flow

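The Kafka producer sends messages asynchronously. Two threads are involved, the main thread and the Sender thread, along with one variable they share, the RecordAccumulator. The main thread appends messages to the RecordAccumulator, where they are grouped into per-partition batches, and the Sender thread continuously pulls messages from the RecordAccumulator and sends them to the Kafka brokers.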
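To make this division of labor concrete, here is a toy model of the pattern using only java.util.concurrent. It is a sketch, not Kafka's code: the class and method names (ToyAccumulatorModel, runDemo) are hypothetical, and the real RecordAccumulator batches records per partition and the real Sender does the network I/O. What it does show is the main thread appending to a shared buffer and moving on, while a separate sender thread drains that buffer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Toy model (hypothetical, not Kafka code): main thread -> shared buffer -> sender thread.
public class ToyAccumulatorModel {
    private static final String POISON = "__STOP__"; // sentinel telling the sender thread to exit

    public static List<String> runDemo(int count) {
        BlockingQueue<String> accumulator = new LinkedBlockingQueue<>(); // the shared variable
        List<String> delivered = new ArrayList<>(); // written only by the sender thread

        Thread sender = new Thread(() -> {
            try {
                while (true) {
                    String record = accumulator.take(); // blocks until a record is available
                    if (POISON.equals(record)) {
                        return;
                    }
                    delivered.add(record); // stand-in for sending the record to a broker
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "sender");
        sender.setDaemon(true);
        sender.start();

        try {
            // "main thread": appending returns immediately and does not wait for delivery
            for (int i = 0; i < count; i++) {
                accumulator.put("data-" + i);
            }
            accumulator.put(POISON); // loosely like close(): let pending records drain, then stop
            sender.join();           // join() also makes the sender's writes to 'delivered' visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the sender", e);
        }
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println(runDemo(3)); // prints [data-0, data-1, data-2]
    }
}
```

A BlockingQueue is used here only because it is the simplest thread-safe hand-off in the JDK; it models the "shared variable" role, not the accumulator's batching.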

Below are the producer configuration parameters as defined in the ProducerConfig class of the Kafka client source:

    public static final String BOOTSTRAP_SERVERS_CONFIG = "bootstrap.servers";
    public static final String METADATA_MAX_AGE_CONFIG = "metadata.max.age.ms";
    private static final String METADATA_MAX_AGE_DOC = "The period of time in milliseconds after which we force a refresh of metadata even if we haven't seen any partition leadership changes to proactively discover any new brokers or partitions.";
    public static final String BATCH_SIZE_CONFIG = "batch.size"; // once a partition's batch reaches batch.size bytes, the Sender sends it
    private static final String BATCH_SIZE_DOC = "The producer will attempt to batch records together into fewer requests whenever multiple records are being sent to the same partition. This helps performance on both the client and the server. This configuration controls the default batch size in bytes. <p>No attempt will be made to batch records larger than this size. <p>Requests sent to brokers will contain multiple batches, one for each partition with data available to be sent. <p>A small batch size will make batching less common and may reduce throughput (a batch size of zero will disable batching entirely). A very large batch size may use memory a bit more wastefully as we will always allocate a buffer of the specified batch size in anticipation of additional records.";
    public static final String ACKS_CONFIG = "acks";
    private static final String ACKS_DOC = "The number of acknowledgments the producer requires the leader to have received before considering a request complete. This controls the  durability of records that are sent. The following settings are allowed:  <ul> <li><code>acks=0</code> If set to zero then the producer will not wait for any acknowledgment from the server at all. The record will be immediately added to the socket buffer and considered sent. No guarantee can be made that the server has received the record in this case, and the <code>retries</code> configuration will not take effect (as the client won't generally know of any failures). The offset given back for each record will always be set to -1. <li><code>acks=1</code> This will mean the leader will write the record to its local log but will respond without awaiting full acknowledgement from all followers. In this case should the leader fail immediately after acknowledging the record but before the followers have replicated it then the record will be lost. <li><code>acks=all</code> This means the leader will wait for the full set of in-sync replicas to acknowledge the record. This guarantees that the record will not be lost as long as at least one in-sync replica remains alive. This is the strongest available guarantee. This is equivalent to the acks=-1 setting.";
    public static final String LINGER_MS_CONFIG = "linger.ms"; // if a batch never reaches batch.size, the Sender sends it anyway after waiting linger.ms
    private static final String LINGER_MS_DOC = "The producer groups together any records that arrive in between request transmissions into a single batched request. Normally this occurs only under load when records arrive faster than they can be sent out. However in some circumstances the client may want to reduce the number of requests even under moderate load. This setting accomplishes this by adding a small amount of artificial delay&mdash;that is, rather than immediately sending out a record the producer will wait for up to the given delay to allow other records to be sent so that the sends can be batched together. This can be thought of as analogous to Nagle's algorithm in TCP. This setting gives the upper bound on the delay for batching: once we get <code>batch.size</code> worth of records for a partition it will be sent immediately regardless of this setting, however if we have fewer than this many bytes accumulated for this partition we will 'linger' for the specified time waiting for more records to show up. This setting defaults to 0 (i.e. no delay). Setting <code>linger.ms=5</code>, for example, would have the effect of reducing the number of requests sent but would add up to 5ms of latency to records sent in the absense of load.";
    public static final String CLIENT_ID_CONFIG = "client.id";
    public static final String SEND_BUFFER_CONFIG = "send.buffer.bytes";
    public static final String RECEIVE_BUFFER_CONFIG = "receive.buffer.bytes";
    public static final String MAX_REQUEST_SIZE_CONFIG = "max.request.size";
    private static final String MAX_REQUEST_SIZE_DOC = "The maximum size of a request in bytes. This is also effectively a cap on the maximum record size. Note that the server has its own cap on record size which may be different from this. This setting will limit the number of record batches the producer will send in a single request to avoid sending huge requests.";
    public static final String RECONNECT_BACKOFF_MS_CONFIG = "reconnect.backoff.ms";
    public static final String RECONNECT_BACKOFF_MAX_MS_CONFIG = "reconnect.backoff.max.ms";
    public static final String MAX_BLOCK_MS_CONFIG = "max.block.ms";
    private static final String MAX_BLOCK_MS_DOC = "The configuration controls how long <code>KafkaProducer.send()</code> and <code>KafkaProducer.partitionsFor()</code> will block.These methods can be blocked either because the buffer is full or metadata unavailable.Blocking in the user-supplied serializers or partitioner will not be counted against this timeout.";
    public static final String BUFFER_MEMORY_CONFIG = "buffer.memory";
    private static final String BUFFER_MEMORY_DOC = "The total bytes of memory the producer can use to buffer records waiting to be sent to the server. If records are sent faster than they can be delivered to the server the producer will block for <code>max.block.ms</code> after which it will throw an exception.<p>This setting should correspond roughly to the total memory the producer will use, but is not a hard bound since not all memory the producer uses is used for buffering. Some additional memory will be used for compression (if compression is enabled) as well as for maintaining in-flight requests.";
    public static final String RETRY_BACKOFF_MS_CONFIG = "retry.backoff.ms";
    public static final String COMPRESSION_TYPE_CONFIG = "compression.type";
    private static final String COMPRESSION_TYPE_DOC = "The compression type for all data generated by the producer. The default is none (i.e. no compression). Valid  values are <code>none</code>, <code>gzip</code>, <code>snappy</code>, or <code>lz4</code>. Compression is of full batches of data, so the efficacy of batching will also impact the compression ratio (more batching means better compression).";
    public static final String METRICS_SAMPLE_WINDOW_MS_CONFIG = "metrics.sample.window.ms";
    public static final String METRICS_NUM_SAMPLES_CONFIG = "metrics.num.samples";
    public static final String METRICS_RECORDING_LEVEL_CONFIG = "metrics.recording.level";
    public static final String METRIC_REPORTER_CLASSES_CONFIG = "metric.reporters";
    public static final String MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION = "max.in.flight.requests.per.connection";
    private static final String MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION_DOC = "The maximum number of unacknowledged requests the client will send on a single connection before blocking. Note that if this setting is set to be greater than 1 and there are failed sends, there is a risk of message re-ordering due to retries (i.e., if retries are enabled).";
    public static final String RETRIES_CONFIG = "retries";
    private static final String RETRIES_DOC = "Setting a value greater than zero will cause the client to resend any record whose send fails with a potentially transient error. Note that this retry is no different than if the client resent the record upon receiving the error. Allowing retries without setting <code>max.in.flight.requests.per.connection</code> to 1 will potentially change the ordering of records because if two batches are sent to a single partition, and the first fails and is retried but the second succeeds, then the records in the second batch may appear first.";
    public static final String KEY_SERIALIZER_CLASS_CONFIG = "key.serializer";
    public static final String KEY_SERIALIZER_CLASS_DOC = "Serializer class for key that implements the <code>Serializer</code> interface.";
    public static final String VALUE_SERIALIZER_CLASS_CONFIG = "value.serializer";
    public static final String VALUE_SERIALIZER_CLASS_DOC = "Serializer class for value that implements the <code>Serializer</code> interface.";
    public static final String CONNECTIONS_MAX_IDLE_MS_CONFIG = "connections.max.idle.ms";
    public static final String PARTITIONER_CLASS_CONFIG = "partitioner.class";
    private static final String PARTITIONER_CLASS_DOC = "Partitioner class that implements the <code>Partitioner</code> interface.";
    public static final String REQUEST_TIMEOUT_MS_CONFIG = "request.timeout.ms";
    private static final String REQUEST_TIMEOUT_MS_DOC = "The configuration controls the maximum amount of time the client will wait for the response of a request. If the response is not received before the timeout elapses the client will resend the request if necessary or fail the request if retries are exhausted. This should be larger than replica.lag.time.max.ms (a broker configuration) to reduce the possibility of message duplication due to unnecessary producer retries.";
    public static final String INTERCEPTOR_CLASSES_CONFIG = "interceptor.classes";
    public static final String INTERCEPTOR_CLASSES_DOC = "A list of classes to use as interceptors. Implementing the <code>ProducerInterceptor</code> interface allows you to intercept (and possibly mutate) the records received by the producer before they are published to the Kafka cluster. By default, there are no interceptors.";
    public static final String ENABLE_IDEMPOTENCE_CONFIG = "enable.idempotence";
    public static final String ENABLE_IDEMPOTENCE_DOC = "When set to 'true', the producer will ensure that exactly one copy of each message is written in the stream. If 'false', producer retries due to broker failures, etc., may write duplicates of the retried message in the stream. This is set to 'false' by default. Note that enabling idempotence requires <code>max.in.flight.requests.per.connection</code> to be set to 1 and <code>retries</code> cannot be zero. Additionally acks must be set to 'all'. If these values are left at their defaults, we will override the default to be suitable. If the values are set to something incompatible with the idempotent producer, a ConfigException will be thrown.";
    public static final String TRANSACTION_TIMEOUT_CONFIG = "transaction.timeout.ms";
    public static final String TRANSACTION_TIMEOUT_DOC = "The maximum amount of time in ms that the transaction coordinator will wait for a transaction status update from the producer before proactively aborting the ongoing transaction.If this value is larger than the max.transaction.timeout.ms setting in the broker, the request will fail with a `InvalidTransactionTimeout` error.";
    public static final String TRANSACTIONAL_ID_CONFIG = "transactional.id";
    public static final String TRANSACTIONAL_ID_DOC = "The TransactionalId to use for transactional delivery. This enables reliability semantics which span multiple producer sessions since it allows the client to guarantee that transactions using the same TransactionalId have been completed prior to starting any new transactions. If no TransactionalId is provided, then the producer is limited to idempotent delivery. Note that enable.idempotence must be enabled if a TransactionalId is configured. The default is empty, which means transactions cannot be used.";
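The batch.size and linger.ms comments above describe a simple "either/or" rule for when a partition's batch may be sent. The sketch below models just that rule as a pure function. BatchReadyRule and isReady are hypothetical names, not Kafka APIs, and the real RecordAccumulator also treats a batch as sendable in other situations (for example while flush() or close() is in progress, or when buffer memory is exhausted).

```java
// Hypothetical helper (not a Kafka API): models the batch.size / linger.ms send rule.
public class BatchReadyRule {

    /**
     * @param batchBytes bytes currently accumulated in one partition's batch
     * @param waitedMs   how long that batch has been waiting, in milliseconds
     * @param batchSize  the batch.size setting, in bytes
     * @param lingerMs   the linger.ms setting, in milliseconds
     * @return true if the Sender may send this batch now
     */
    public static boolean isReady(int batchBytes, long waitedMs, int batchSize, long lingerMs) {
        boolean full = batchBytes >= batchSize; // batch.size reached: send regardless of linger.ms
        boolean lingered = waitedMs >= lingerMs; // linger.ms elapsed: send whatever has accumulated
        return full || lingered;
    }

    public static void main(String[] args) {
        int batchSize = 16384; // the batch.size used in this article's producer code
        long lingerMs = 1;     // the linger.ms used in the asynchronous example
        System.out.println(isReady(16384, 0, batchSize, lingerMs)); // true: batch is full
        System.out.println(isReady(100, 0, batchSize, lingerMs));   // false: small and fresh, keep waiting
        System.out.println(isReady(100, 1, batchSize, lingerMs));   // true: small but linger.ms has passed
    }
}
```

With the default linger.ms of 0, the lingered condition is always true, which matches the "no delay" behavior the LINGER_MS_DOC string describes.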

2. Asynchronous Sending in Kafka

2.1 Asynchronous Send Without a Callback

2.1.1 Implementation

1. First, add the dependency:

    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>0.11.0.0</version>
    </dependency>

2. Write the code

First create a KafkaProducer object to send data, configure the producer's parameters through ProducerConfig, and wrap each piece of data in a ProducerRecord object.
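Since each ProducerConfig constant is just a string (for example, BOOTSTRAP_SERVERS_CONFIG is "bootstrap.servers" in the source excerpt above), the same settings can be written with literal keys. The sketch below builds such a Properties object using only the JDK; ProducerProps and build are hypothetical names, and no broker is contacted.

```java
import java.util.Properties;

// Sketch (hypothetical helper): the literal keys equal the ProducerConfig constants' values.
public class ProducerProps {

    public static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers); // ProducerConfig.BOOTSTRAP_SERVERS_CONFIG
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");   // KEY_SERIALIZER_CLASS_CONFIG
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");   // VALUE_SERIALIZER_CLASS_CONFIG
        props.put("acks", "all");       // ACKS_CONFIG: wait for the full set of in-sync replicas
        props.put("batch.size", 16384); // BATCH_SIZE_CONFIG, in bytes
        props.put("linger.ms", 1);      // LINGER_MS_CONFIG, in milliseconds
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("master:9092");
        // Iteration order of Properties is not guaranteed.
        props.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

The result could be passed to new KafkaProducer<String, String>(props) just like the props in the code that follows. Note that batch.size and linger.ms are stored as Integer values here, so read them back with get() rather than getProperty(), which only returns String values.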

    package com.xzw.kafka.producer;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    import java.util.Properties;

    /**
     * @author: xzw
     * @create_date: 2021/3/2 8:43
     * @desc: Asynchronous send
     * @modifier:
     * @modified_date:
     * @desc:
     */
    public class AsyncProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "master:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.ACKS_CONFIG, "all");
            props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
            props.put(ProducerConfig.LINGER_MS_CONFIG, 1);

            // 1. Create a producer object
            KafkaProducer<String, String> producer = new KafkaProducer<String, String>(props);

            // 2. Call the producer's send method; send() returns without waiting for the ack
            for (int i = 0; i < 10; i++) {
                producer.send(new ProducerRecord<String, String>("test", i + "", "data-" + i));
            }

            // 3. Close the producer; close() blocks until previously sent records complete
            producer.close();
        }
    }

2.1.2 Testing

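Open a new consumer window (for example, a console consumer subscribed to the test topic) and run the producer program; the consumer receives the data.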

2.2 Asynchronous Send With a Callback

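The callback is invoked asynchronously when the producer receives the ack for a record, or when the send finally fails. It takes two parameters, a RecordMetadata and an Exception: if the Exception is null, the message was sent successfully; if it is not null, the send failed. Note that the producer retries failed sends by itself according to the retries setting, so there is no need to retry manually in the callback. In kafka-clients 0.11, however, retries defaults to 0, so set it explicitly if you want automatic retries.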

2.2.1 Implementation

Implementing the callback is very simple: just pass a callback expression to send as its second argument.
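With kafka-clients on the classpath, that call has the shape producer.send(record, (metadata, exception) -> { ... }), where the lambda implements Callback.onCompletion(RecordMetadata, Exception). The JDK-only model below illustrates the contract described in 2.2 so it can run without a broker. The names CallbackModel, FakeMetadata, send and describe are hypothetical stand-ins, not Kafka classes. Exactly one of the two callback arguments is non-null, and the caller is not blocked while the "send" completes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.BiConsumer;

// Toy model (hypothetical names) of Kafka's callback contract:
// success -> (metadata, null), failure -> (null, exception).
public class CallbackModel {

    // Hypothetical stand-in for RecordMetadata: only topic and offset.
    public static final class FakeMetadata {
        public final String topic;
        public final long offset;

        public FakeMetadata(String topic, long offset) {
            this.topic = topic;
            this.offset = offset;
        }
    }

    // Completes on a background thread; the caller continues immediately.
    public static CompletableFuture<Void> send(String topic, long offset, boolean fail,
                                               BiConsumer<FakeMetadata, Exception> callback) {
        return CompletableFuture.runAsync(() -> {
            if (fail) {
                callback.accept(null, new RuntimeException("simulated send failure"));
            } else {
                callback.accept(new FakeMetadata(topic, offset), null);
            }
        });
    }

    // The null check a typical onCompletion body performs.
    public static String describe(FakeMetadata metadata, Exception exception) {
        if (exception == null) {
            return "success: topic=" + metadata.topic + ", offset=" + metadata.offset;
        }
        return "failed: " + exception.getMessage();
    }

    public static void main(String[] args) {
        // join() is only here so the demo prints before main exits
        send("test", 0, false, (m, e) -> System.out.println(describe(m, e))).join(); // success: topic=test, offset=0
        send("test", 1, true, (m, e) -> System.out.println(describe(m, e))).join();  // failed: simulated send failure
    }
}
```

In the real client the callback runs on the producer's I/O thread, so it should stay short and avoid blocking work.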

2.2.2 Testing

Open a new consumer window and run the producer program; the consumer receives the data.

The data printed by the callback also appears in the local console.

3. Synchronous Sending in Kafka

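Synchronous sending means that after a message is sent, the current thread blocks until the ack comes back. The implementation is simple: send returns a Future, and you only need to call that Future's get method.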
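The JDK-only model below illustrates why calling get() turns an asynchronous send into a synchronous one. It is a sketch under stated assumptions: SyncSendModel and sendAllSync are hypothetical names, a single-thread executor stands in for the Sender thread, and the 10 ms sleep stands in for the network round trip and ack. send() returns a Future immediately, just as KafkaProducer.send does, and calling get() on it right away blocks the caller until that one record is "acknowledged".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model (hypothetical names): Future.get() right after send() = one-at-a-time sending.
public class SyncSendModel {

    private final ExecutorService io = Executors.newSingleThreadExecutor(); // stand-in for the Sender thread
    private long nextOffset = 0; // only accessed on the single io thread

    // Returns immediately; the "offset" becomes available once the simulated ack arrives.
    public Future<Long> send(String value) {
        return io.submit(() -> {
            Thread.sleep(10);    // pretend network round trip plus ack
            return nextOffset++; // offset assigned when the record is "written"
        });
    }

    public List<Long> sendAllSync(int count) {
        List<Long> offsets = new ArrayList<>();
        try {
            for (int i = 0; i < count; i++) {
                offsets.add(send("data-" + i).get()); // blocks until this record is acknowledged
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return offsets;
    }

    public void close() {
        io.shutdown();
    }

    public static void main(String[] args) {
        SyncSendModel producer = new SyncSendModel();
        for (long offset : producer.sendAllSync(3)) {
            System.out.println("offset=" + offset); // offset=0, offset=1, offset=2
        }
        producer.close();
    }
}
```

Dropping the .get() call and collecting the Futures instead would let all records be queued at once, which is the asynchronous behavior from section 2.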

3.1 Implementation

    package com.xzw.kafka.producer;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;
    import org.apache.kafka.common.serialization.StringSerializer;

    import java.util.Properties;
    import java.util.concurrent.ExecutionException;

    /**
     * @author: xzw
     * @create_date: 2021/3/2 10:08
     * @desc: Synchronous send
     * @modifier:
     * @modified_date:
     * @desc:
     */
    public class SyncProducer {
        public static void main(String[] args) throws ExecutionException, InterruptedException {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "master:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.ACKS_CONFIG, "all");
            props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
            // With one record per batch, each send may wait up to linger.ms (1 s) before going out
            props.put(ProducerConfig.LINGER_MS_CONFIG, 1000);

            // 1. Create a producer object
            KafkaProducer<String, String> producer = new KafkaProducer<String, String>(props);

            // 2. Call send, then get() on the returned Future: blocks until this record is acknowledged
            for (int i = 0; i < 10; i++) {
                RecordMetadata metadata =
                        producer.send(new ProducerRecord<String, String>("test", i + "", "data-" + i)).get();
                System.out.println("offset=" + metadata.offset());
            }

            // 3. Close the producer
            producer.close();
        }
    }

3.2 Testing

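The test shows that synchronous sending clearly delivers messages one at a time, because after each message is sent the current thread blocks until the ack is returned.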

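That brings this article to a close. It covered the Kafka producer API, and the material is fairly simple. If you ran into any problems along the way, feel free to leave a comment and let me know~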
