The QuickStart on the official Kafka site ------ it includes this video

Hi, I’m Tim Berglund with Confluent. I’d like to tell you what Apache Kafka is. But first, I want to start with some background.

For a long time now, we have written programs that store information in databases. Now, what databases encouraged us to do is to think of the world in terms of things: things like, I don’t know, users, and maybe a thermostat (that’s a thermometer, but you get the idea), maybe a physical thing like a train. There are things in the world, the database encourages thinking in those terms, and those things have some state. We take that state and we store it in the database. This has worked well for decades. But now some people are finding that it is better, rather than thinking of things first, to think of events first. Now, events have some state too, right? An event has a description of what happened with it. But the primary idea is that the event is an indication in time that the thing took place.

Now, it’s a little bit cumbersome to store events in databases. Instead, we use a structure called a log. A log is just an ordered sequence of these events: an event happens and we write it into the log, a little bit of state, a little bit of description of what happened, and that says, hey, that event happened at that time. As you can see, logs are really easy to think about. They’re also easy to build at scale, which historically has not quite been true of databases, which have been a little cumbersome in one way or another to build at scale.

Apache Kafka is a system for managing these logs. Using a fairly standard historical term, it calls them topics. A topic is just an ordered collection of events that are stored in a durable way, durable meaning that they’re written to disk and they’re replicated, so they’re stored on more than one disk, on more than one server, wherever that infrastructure runs, so that no one hardware failure can make that data go away.
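The "topic as an ordered, offset-addressed log" idea can be sketched in a few lines of plain Python. This is a toy in-memory model for intuition only, not the Kafka API: real topics are partitioned, replicated across servers, and persisted to disk.

```python
import time

class Topic:
    """A toy model of a Kafka topic: an append-only, ordered log of events."""

    def __init__(self, name):
        self.name = name
        self.events = []  # ordered; each entry is (offset, timestamp, payload)

    def append(self, payload):
        offset = len(self.events)  # an offset is just a position in the log
        self.events.append((offset, time.time(), payload))
        return offset

    def read_from(self, offset):
        """Consumers read forward from an offset; the log itself never changes."""
        return self.events[offset:]

log = Topic("thermostat-readings")
log.append({"device": "thermostat-1", "reading": "comfy"})
log.append({"device": "thermostat-1", "reading": "is it getting hot in here"})
print([payload["reading"] for (_, _, payload) in log.read_from(0)])
```

Note that reading never removes anything: two different consumers can each keep their own offset into the same log, which is what makes one topic reusable by many services.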
Topics can store data for a short period of time, like a few hours or days, or for years, hundreds of years, or indefinitely. Topics can also be relatively small, or they can be enormous. There’s nothing about the economics of Kafka that says topics have to be large in order for it to make sense, and there’s nothing about the architecture of Kafka that says they have to stay small. So they can be small, they can be big, they can remember data forever, they can remember data just for a little while. But they are a persistent record of events. Each one of those events represents a thing happening in the business: remember, a user, maybe a user updates her shipping address; or a train unloads cargo; or a thermostat reports that the temperature has gone from "comfy" to "is it getting hot in here?". Each one of those things can be an event stored in a topic, and Kafka encourages you to think of events first and things second.

Now, back when databases ruled the world, it was kind of a trend to build one large program: we’ll just build this gigantic program that uses one big database all by itself. It was customary for a number of reasons to do this, but these things grew to a point where they were difficult to change, and also difficult to think about. They got too big for any one developer to fit the whole program in his or her head at the same time. And if you’ve lived like this, you know that that’s true. Now the trend is to write lots and lots of small programs, each one of which is small enough to fit in your head and to think about and version and change and evolve all on its own. And these things can talk to each other through Kafka topics. So each one of these services can consume messages from a Kafka topic, do whatever its computation is, and then produce messages to another Kafka topic that lives over here.
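That consume-compute-produce loop is the whole shape of one of these small services. A minimal sketch, with topics modeled as plain lists and a made-up "add tax to an order" computation standing in for whatever the service actually does; a real service would use a Kafka consumer and producer client instead:

```python
# Input topic: raw order events this service consumes.
orders = [
    {"user": "alice", "amount": 30},
    {"user": "bob", "amount": 45},
]

# Output topic: the results this service produces for downstream services.
invoices = []

def handle(event):
    """This service's own computation: turn an order into an invoice event."""
    return {"user": event["user"], "total": round(event["amount"] * 1.08, 2)}

for event in orders:                # the consume loop
    invoices.append(handle(event))  # produce the result to the next topic

print(invoices)
```

Because the output lands in another durable topic, any other service can pick it up later without this service knowing or caring who its consumers are.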
So that output is now durably, and maybe even permanently, recorded for other services and other concerns in the system to process. With all this data living in these persistent real-time streams (I’ve drawn two of them now, but imagine there are dozens or hundreds more in a large system), it’s now possible to build new services that perform real-time analysis of that data. So I can stand up some other service over here that draws some kind of gauge, some sort of real-time analytics dashboard, and it is just consuming messages from this topic. That’s in contrast to the way it used to be, where you ran a batch process overnight. For some businesses now, yesterday is a long time ago; you might want that insight to be instant, or as close to instant as it could possibly be. And with data in these topics as events that get processed as soon as they happen, it’s now fairly straightforward to build services that do that analysis in real time.

So you’ve got events, you’ve got topics, you’ve got all these little services talking to each other through topics, and you’ve got real-time analytics. I think if you have those four things in your head, you’ve got a decent minimum viable understanding not only of what Kafka is, which is this distributed log thing, but also of the kinds of software architectures that Kafka tends to give rise to. When people start building systems on it, this is what happens. Once a company starts using Kafka, it tends to have this viral effect, right? We’ve got these persistent distributed logs that are records of the things that have happened, and we’ve got things talking through them, but there are other systems. I mean, there’s this database, and there’s probably going to be another database out there that was built before Kafka came along, and you want to integrate these systems.
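The difference between the overnight batch job and the real-time dashboard is just when the aggregation runs: per event, as it arrives, instead of once over yesterday's pile. A sketch with hypothetical page-view events, where the dashboard's state is a running counter updated inside the consume loop:

```python
from collections import Counter

# Events arriving on a topic, processed one at a time as they happen,
# rather than collected for an overnight batch job.
events = [
    {"page": "/home"}, {"page": "/checkout"}, {"page": "/home"},
    {"page": "/home"}, {"page": "/checkout"},
]

views = Counter()  # the dashboard's state: a running count per page

for event in events:           # in a real service this loop never ends
    views[event["page"]] += 1  # the count is current after every event
    # ...a dashboard UI would re-render here...

print(dict(views))
```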
There could be other systems entirely: maybe there’s a search cluster, maybe you use some SaaS product to help your salespeople organize their efforts. There are all these systems in the business, and their data isn’t in Kafka. Well, Kafka Connect is a tool that helps get that data in, and back out. When there are all these other systems in the world, you want to collect data: changes happen in a database, and you want to collect that data and get it written into a topic. And now I can stand up some new service that consumes that data and does whatever its computation is, now that the data is in a Kafka topic; that’s the whole point. Connect gets that data in, then that service produces some result, which goes to a new topic over here, and Connect is the piece that moves it out to whatever that external legacy system is.

So Kafka Connect is this process that does this inputting and outputting, and it’s also an ecosystem of connectors. There are dozens, even hundreds of connectors out there in the world. Some of them are open source, some of them are commercial, some of them are in between, but they’re these little pluggable modules that you can deploy to get this integration done in a declarative way. You deploy them, you configure them, and you don’t write code to do this reading from the database or this writing to whatever that external system is. Those modules already exist, the code’s already written; you just deploy them and Connect does that integration with those external systems.

Now let’s think about the work that these things do, these services, these boxes I’m drawing. They have some life of their own, they’re programs, right? But they’re going to process messages from topics, and they’re going to have some computation that they want to do over those messages.
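"Declarative, no code" for Connect concretely means a small JSON configuration, deployed by POSTing it to the Connect worker's REST API (`POST /connectors`). As a sketch, here is a config for the FileStream source connector that ships with Kafka, which streams lines of a file into a topic; the connector name, file path, and topic name are just example values:

```python
import json

# The JSON body you would POST to a Connect worker to deploy the connector.
# Nothing here is code the integration author writes; the connector class
# already contains the logic for reading the source and producing to Kafka.
connector = {
    "name": "demo-file-source",
    "config": {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "tasks.max": "1",
        "file": "/var/log/app/events.log",
        "topic": "app-events",
    },
}

print(json.dumps(connector, indent=2))
```

Swapping the source is a matter of swapping the `connector.class` and its settings, not rewriting any integration code, which is the declarative point being made above.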
And it’s amazing: there are really just a few things that people end up doing. Say you have these green messages, and you want to group them all up and add some field, like coming up with the total weight of all the train cars that passed a certain point, but only a certain kind of car, only the green kind. And then you’ve got these other, say, orange ones here. So right away we see that we’re going to have to go through those messages, group by some key, and then take each group and run some aggregation over it, or maybe count the messages, or something like that. Maybe you want to filter. Or maybe I’ve got this topic, and, let’s see, I make some room for some other topic over here that’s got some other kind of data, and I want to take all the messages here and somehow link them with messages in this other topic: when I see this message happen here, I want to go enrich it with the data that’s in that other topic. These are common things. The first time you think about it, that might seem unusual, but it’s these things: grouping, aggregating, filtering, enrichment. Enrichment, by the way, goes by another name in the database world: that’s a join, right?

These are the things that these services are going to do. They’re simple in principle to think about and to sketch, but to actually write the code to make all that happen takes some work, and that’s not work you want to do. So Kafka, in the box, just like it has Connect for doing data integration, has an API called Kafka Streams. That’s a Java API that handles all of the framework and infrastructure and kind of undifferentiated stuff you’d have to build to get that work done. So you can use that Java API in your services and get all of that done in a scalable and fault-tolerant way, just like we expect modern applications to be able to do. And that’s not framework code you have to write; you just get to use it because you’re using Kafka.
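The four operations named above (filter, aggregate, group by key, enrich/join) can be sketched over plain Python lists using the train-car example. Kafka Streams is the Java API that runs these same logical operations continuously over topics, with partitioning, scaling, and fault tolerance handled for you; this sketch, with made-up car and owner data, shows only the logical shape:

```python
# One topic: train-car events passing a checkpoint.
cars = [
    {"car": "c1", "color": "green",  "weight": 40},
    {"car": "c2", "color": "orange", "weight": 55},
    {"car": "c3", "color": "green",  "weight": 35},
]

# Another topic's data, keyed by car id (a lookup table of owners).
owners = {"c1": "acme", "c2": "globex", "c3": "acme"}

# Filter: only the green cars.
green = [c for c in cars if c["color"] == "green"]

# Aggregate: total weight of the filtered cars.
total_green_weight = sum(c["weight"] for c in green)

# Group by key: car ids bucketed by owner.
by_owner = {}
for c in cars:
    by_owner.setdefault(owners[c["car"]], []).append(c["car"])

# Enrich (a join): attach the owner from the other topic to each event.
enriched = [dict(c, owner=owners[c["car"]]) for c in cars]

print(total_green_weight, by_owner)
```

Writing these four lines of logic is easy; making them run continuously, partitioned across machines, surviving crashes without losing or double-counting events, is the undifferentiated work Kafka Streams takes off your hands.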
Now, if you’re a developer and you want to learn more, you know the thing to do is to start writing code. Check those things out, let us know if you have any questions, and I hope we hear from you soon.
