Kinesis: Streams and Firehose

Amazon Kinesis Data Firehose is the easiest way to reliably load streaming data into data lakes, data stores, and analytics tools. It can capture, transform, and load streaming data into Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and Splunk, enabling near-real-time analytics with the business intelligence tools and dashboards you already use. It is a fully managed service that scales automatically to match your data throughput and requires no ongoing administration. It can also batch, compress, and encrypt data before loading it, minimizing the amount of storage used at the destination while improving security.

You can easily create a Firehose delivery stream from the AWS Management Console, configure it with a few clicks, and start sending data from tens of thousands of data sources to be continuously loaded into AWS, all in just a few minutes. You can also configure a delivery stream to automatically convert incoming data into columnar formats such as Apache Parquet and Apache ORC before delivery to Amazon S3, for cost-effective storage and analysis.

With Kinesis Data Firehose, you pay only for the volume of data transmitted through the service and, where applicable, for the volume of data whose format is converted. There are no minimum fees or setup costs.


Amazon Kinesis makes it easy to collect, process, and analyze video and data streams in real time.

  1. Use Kinesis Video Streams to capture, process, and store video streams for analytics and machine learning.
  2. Use Kinesis Data Streams to build custom applications that analyze data streams using popular stream processing frameworks.
  3. Use Kinesis Data Firehose to load data streams into AWS data stores.
  4. Use Kinesis Data Analytics to analyze data streams with SQL.

Clearly, Kinesis is made up of exactly four components:

Data Streams:

With Amazon Kinesis Data Streams, you can build custom applications that process or analyze streaming data to meet specialized needs. You can configure tens of thousands of data producers to continuously put data into a Kinesis data stream, for example, data from website clickstreams, application logs, and social media feeds. Within less than a second, your Amazon Kinesis applications can read and process data from the stream.
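For a sense of what a producer looks like, here is a minimal sketch using boto3; the stream name is hypothetical, and AWS credentials are assumed to be configured:

```python
import boto3

kinesis = boto3.client("kinesis")

# A producer putting a clickstream event into a (hypothetical) data
# stream; consumers can read it within seconds of this call returning.
resp = kinesis.put_record(
    StreamName="example-clickstream",
    Data=b'{"user": "u-123", "page": "/home"}',
    PartitionKey="u-123",  # hashed to pick the target shard
)
print(resp["ShardId"], resp["SequenceNumber"])
```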

Data Firehose:

Amazon Kinesis Data Firehose is the easiest way to load streaming data into data stores and analytics tools. Kinesis Data Firehose is a fully managed service that makes it easy to capture and transform large volumes of streaming data from hundreds of thousands of sources and load it into Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, Kinesis Data Analytics, and Splunk, enabling near-real-time analytics and insights.

Data Analytics:

Amazon Kinesis Data Analytics is the easiest way to process and analyze streaming data in real time with ANSI-standard SQL. It lets you read data from Amazon Kinesis Data Streams and Amazon Kinesis Data Firehose and use SQL to build stream-processing queries or entire applications that continuously filter, transform, and aggregate data as it arrives. Amazon Kinesis Data Analytics automatically recognizes standard data formats, parses the data, and suggests a schema, which you can refine with an interactive schema editor. It provides an interactive SQL editor and stream-processing templates so that you can build sophisticated stream-processing applications in minutes. Amazon Kinesis Data Analytics runs your queries continuously against your streaming application and writes the results to output destinations such as Amazon Kinesis Data Streams and Amazon Kinesis Data Firehose, which can in turn deliver the data to Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service. Amazon Kinesis Data Analytics automatically provisions, deploys, and scales the resources needed to run your streaming applications.
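As a rough sketch of how such an application might be created programmatically: the application name, column names, and SQL below are all illustrative, and the source stream would still need to be attached separately (via the console or the add-application-input API):

```python
import boto3

analytics = boto3.client("kinesisanalytics")

# A tumbling one-minute window counting events per page; the in-application
# stream and column names are illustrative, not from a real application.
sql = """
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" ("page" VARCHAR(64), "views" INTEGER);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
  INSERT INTO "DESTINATION_SQL_STREAM"
  SELECT STREAM "page", COUNT(*)
  FROM "SOURCE_SQL_STREAM_001"
  GROUP BY "page", STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '1' MINUTE);
"""

# The input (a Kinesis data stream or Firehose delivery stream) and any
# outputs are configured separately.
analytics.create_application(
    ApplicationName="example-analytics-app",
    ApplicationCode=sql,
)
```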

Video Streams:

Amazon Kinesis Video Streams is a fully managed video ingestion and storage service. It lets you securely ingest, process, and store video at any scale for applications that power robotics, smart cities, industrial automation, security monitoring, machine learning (ML), and more. Kinesis Video Streams can also ingest other kinds of time-encoded data, such as audio, RADAR, and LIDAR signals. Kinesis Video Streams provides SDKs that you can install on your devices to stream video to AWS easily and securely. Kinesis Video Streams automatically provisions and elastically scales all the infrastructure needed to ingest video streams from millions of devices. It also durably stores, encrypts, and indexes the video streams, and provides easy-to-use APIs so that applications can access and retrieve indexed video segments by tag and timestamp. Kinesis Video Streams provides a library for integrating ML frameworks such as Apache MXNet, TensorFlow, and OpenCV with video streams to build machine learning applications. Kinesis Video Streams integrates with Amazon Rekognition Video, enabling you to build computer vision applications that detect and recognize faces in streaming video.
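As a small illustration of the retrieval APIs, fetching an HLS playback URL for a stream takes two calls; the stream name is hypothetical, and boto3 with AWS credentials is assumed:

```python
import boto3

kvs = boto3.client("kinesisvideo")

# Look up the endpoint for HLS playback of a (hypothetical) video stream,
# then ask the archived-media API for a viewable URL.
endpoint = kvs.get_data_endpoint(
    StreamName="example-camera-stream",
    APIName="GET_HLS_STREAMING_SESSION_URL",
)["DataEndpoint"]

media = boto3.client("kinesis-video-archived-media", endpoint_url=endpoint)
url = media.get_hls_streaming_session_url(
    StreamName="example-camera-stream",
    PlaybackMode="LIVE",  # LIVE requires the stream to be receiving media
)["HLSStreamingSessionURL"]
print(url)  # open in any HLS-capable player
```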

Those are the four Kinesis services. Readers can pick the service that matches their actual scenario. The service I will focus on today is Firehose.

What is Amazon Kinesis Data Firehose?

Amazon Kinesis Data Firehose is a fully managed service for delivering real-time streaming data to destinations such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon Elasticsearch Service (Amazon ES), and Splunk. Kinesis Data Firehose, together with Kinesis Data Streams, Kinesis Video Streams, and Amazon Kinesis Data Analytics, is part of the Kinesis streaming data platform. With Kinesis Data Firehose, you don't need to write applications or manage resources. You configure your data producers to send data to Kinesis Data Firehose, which automatically delivers it to the destination you specify. You can also configure Kinesis Data Firehose to transform the data before delivery.
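For example, a producer pushing records directly into a delivery stream needs only one call. A minimal sketch with boto3, with a hypothetical delivery stream name:

```python
import boto3

firehose = boto3.client("firehose")

# Send one record to a (hypothetical) delivery stream; Firehose buffers,
# batches, and delivers it to the configured destination on its own.
firehose.put_record(
    DeliveryStreamName="example-delivery-stream",
    # Firehose does not add delimiters, so producers commonly append "\n".
    Record={"Data": b'{"event": "page_view"}\n'},
)
```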


Kafka and Kinesis are message brokers that have been designed as distributed logs. With them, you can only append to the end of the log or read entries sequentially; you cannot remove or update entries, nor insert new ones in the middle of the log.

This simple design allows distributed logs to have a really interesting set of characteristics. Because the reads and writes to the log are sequential, they have much better performance than other message brokers. And because the log is persistent, you can reprocess it as many times as needed.
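Reading a Kinesis stream really does feel like replaying a log. As a minimal sketch (hypothetical stream name; boto3 and AWS credentials assumed), a consumer walks one shard sequentially with an iterator:

```python
import boto3

kinesis = boto3.client("kinesis")

# Walk one shard of a (hypothetical) stream from its oldest record,
# exactly like replaying a persistent log from the beginning.
shard_id = kinesis.list_shards(StreamName="example-stream")["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName="example-stream",
    ShardId=shard_id,
    ShardIteratorType="TRIM_HORIZON",  # start at the oldest available record
)["ShardIterator"]

while iterator:
    batch = kinesis.get_records(ShardIterator=iterator, Limit=100)
    for record in batch["Records"]:
        print(record["SequenceNumber"], record["Data"])
    if batch["MillisBehindLatest"] == 0:
        break  # caught up with the tip of the log; a real consumer keeps polling
    iterator = batch.get("NextShardIterator")
```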

But even though Kafka and Kinesis are very similar (Kinesis was, to say the least, inspired by Kafka), they differ in many aspects.


Why does Amazon AWS have three types of messaging queues (Kinesis Firehose, Kinesis Streams, SQS) while Google Cloud only has Cloud Pub/Sub?

The premise of the question is not entirely correct. While SQS is definitely a messaging queue, Kinesis Firehose and Streams are not exactly messaging queues. The Kinesis suite (if I may call it that) is built for ingesting and integrating streaming data into downstream applications, while SQS is the good old-fashioned messaging solution. In terms of differences:

Kinesis Firehose: AWS' data-ingestion product for streaming data. Use it to load and process streaming data into S3, Redshift, RDS, etc. AWS takes care of scaling based on incoming stream data volumes.

Kinesis Stream: Captures streaming data from producers and sends it to custom applications downstream. It can scale massively, but you must provision that scale yourself.

SQS: A run-of-the-mill messaging queue that holds, sends, and receives messages between different applications.

I have not really worked with Cloud Pub/Sub, so I will let others comment on how it fares against the AWS offerings.

HOW DO YOU LOAD HUGE AMOUNTS OF DATA INTO THE PIPELINE?

This problem is solved by Firehose. It is the entry point of data into the AWS ecosystem. Kafka does not have an equivalent to Firehose; this product is specific to Amazon's other offerings. Firehose can take incoming data from various sources, buffer it into larger chunks, and forward it to other AWS services like S3, Redshift, and Lambda. Firehose solves the problem of backpressure, which arises when the input buffer of a service cannot keep up with the output buffer of another service feeding data into it. Firehose automatically scales up or down to match its throughput to the data it has to work with. It can also batch, compress, and encrypt the data before feeding it into other AWS services. Amazon recently added several other types of data transformation to the pipeline that run before the data is loaded.
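A sketch of such a configuration: creating a delivery stream into S3 with explicit buffering and compression settings using boto3. All names and ARNs below are placeholders:

```python
import boto3

firehose = boto3.client("firehose")

# Create a delivery stream into S3 with explicit buffering and compression;
# the stream name, role ARN, and bucket ARN are placeholders.
firehose.create_delivery_stream(
    DeliveryStreamName="example-delivery-stream",
    DeliveryStreamType="DirectPut",  # producers call PutRecord directly
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/example-firehose-role",
        "BucketARN": "arn:aws:s3:::example-bucket",
        # Buffer incoming records into chunks of up to 5 MB or 300 seconds,
        # whichever comes first, before writing a batch to S3.
        "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 300},
        "CompressionFormat": "GZIP",  # compress each batch before delivery
    },
)
```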

What is Amazon Firehose?

Amazon Kinesis Data Firehose is a fully managed service for delivering real-time streaming data to destinations such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon Elasticsearch Service (Amazon ES), and Splunk. ... With Kinesis Data Firehose, you don't need to write applications or manage resources.

What is a Data Firehose API?

This blog post gives an overview of a data firehose API: how it works, its use cases, and how to build one yourself from any data source.

A firehose, if you will. And that's exactly what a data firehose API is: a steady stream of all available data from a source in real time, a giant spigot that delivers data to any number of subscribers at a time. The stream is constant, delivering new, updated data as it happens.


Turn Any Data Source into a Firehose API

Weather, Twitter, or bus times: turn any source of data into a data firehose with PubNub. All you need is the API, and you'll be on your way!


Clarifying and using your Kinesis data

Are you mystified by Firehose and Streams? Read on and check out our infographic to learn about their key differences.

Within the AWS ecosystem, Amazon Kinesis offers real-time data processing over large data streams, making it an essential tool for developers working with real-time apps that pull data from several sources. Kinesis offers two options for data stream processing, each designed for users with different needs: Streams and Firehose.

  1. Kinesis Streams. The more customizable option, Streams is best suited for developers building custom applications or streaming data for specialized needs. The customizability of the approach, however, requires manual scaling and provisioning. Data is typically available in a stream for 24 hours, but for an additional cost, users can extend availability to up to seven days.
  2. Kinesis Firehose. The simpler approach, Firehose handles loading data streams directly into AWS products for processing. Scaling is handled automatically, up to gigabytes per second, and it allows for batching, encrypting, and compressing. Firehose also allows for streaming to S3, Elasticsearch Service, or Redshift, where data can be copied for processing through additional services.

Kinesis Streams and Kinesis Firehose both allow data to be loaded using HTTPS, the Kinesis Producer Library, the Kinesis Client Library, and the Kinesis Agent. Both services also allow for monitoring through Amazon CloudWatch and through Kinesis Analytics, a service that allows users to create and run SQL queries on streaming data and send the results to third-party analytics tools.

Kinesis Streams vs Firehose

One frequent question I get is "What is Amazon Kinesis, and what can it do for me?" I also get a lot of questions about Kinesis Streams vs. Firehose. Let's go through the key concepts and show how to get started with logging using the Kinesis Connector.

In December 2013, Amazon Web Services released Kinesis, a managed, dynamically scalable service for the processing of streaming big data in real-time. Since that time, Amazon has been steadily expanding the regions in which Kinesis is available, and as of this writing, it is possible to integrate Amazon’s Kinesis producer and client libraries into a variety of custom applications to enable real-time processing of streaming data from a variety of sources.

Kinesis acts as a highly available conduit to stream messages between data producers and data consumers. Data producers can be almost any source of data: system or web log data, social network data, financial trading information, geospatial data, mobile app data, or telemetry from connected IoT devices. Data consumers typically fall into the category of data processing and storage applications such as Apache Hadoop, Apache Storm, Amazon Simple Storage Service (S3), and Elasticsearch.

Key Concepts

It’s helpful to understand some key concepts when working with Kinesis Streams. The basic unit of scale when working with streams is a shard. A single shard can ingest up to 1MB of data or 1,000 PUT records per second, and emit data at a rate of up to 2MB per second.

Shards scale linearly, so each shard added to a stream adds 1MB per second of ingestion capacity and 2MB per second of emission capacity. Ten shards will scale a stream to handle 10MB per second (10,000 PUTs per second) of ingress and 20MB per second of egress. You choose the number of shards when creating a stream, and it is not possible to change this via the AWS Console once you’ve created a stream.
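This shard arithmetic is easy to script. Below is a minimal Python sketch (not an official AWS tool) of the same estimate the shard calculator mentioned later performs, using the per-shard limits just quoted:

```python
import math

def shards_needed(ingress_mb_s: float, puts_per_s: float, egress_mb_s: float) -> int:
    """Estimate a stream's shard count from the per-shard limits:
    1MB/s or 1,000 PUTs/s of ingress, and 2MB/s of egress."""
    return max(1, math.ceil(max(
        ingress_mb_s / 1.0,   # ingress bandwidth limit per shard
        puts_per_s / 1000.0,  # ingress record-rate limit per shard
        egress_mb_s / 2.0,    # egress bandwidth limit per shard
    )))

# The ten-shard example from the text: 10MB/s in, 10,000 PUTs/s, 20MB/s out.
print(shards_needed(10, 10_000, 20))  # -> 10
```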

It is possible to dynamically add or remove shards from a stream using the AWS Streams API. This is called resharding. Resharding cannot be done via the AWS Console, and is considered an advanced strategy when working with Kinesis. A solid understanding of the subject is required prior to attempting these operations.

Adding shards essentially splits shards in order to scale the stream, and removing shards merges them. Data is not discarded when adding (splitting) or removing (merging) shards. It is not possible to split a single shard into more than two, nor to merge more than two shards into a single shard at a time.
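As a rough illustration, resharding is driven through the Streams API. The sketch below (hypothetical stream name; boto3 and the necessary IAM permissions assumed) splits one shard at the midpoint of its hash-key range; merging is the inverse operation on two adjacent shards:

```python
import boto3

kinesis = boto3.client("kinesis")

# Pick an open shard of a (hypothetical) stream to split.
shard = kinesis.list_shards(StreamName="example-stream")["Shards"][0]

# Split the shard into two at the midpoint of its hash-key range.
lo = int(shard["HashKeyRange"]["StartingHashKey"])
hi = int(shard["HashKeyRange"]["EndingHashKey"])
kinesis.split_shard(
    StreamName="example-stream",
    ShardToSplit=shard["ShardId"],
    NewStartingHashKey=str((lo + hi) // 2),
)

# Merging is the inverse: combine two shards whose hash-key ranges are
# adjacent (shard IDs below are placeholders).
# kinesis.merge_shards(
#     StreamName="example-stream",
#     ShardToMerge="shardId-000000000001",
#     AdjacentShardToMerge="shardId-000000000002",
# )
```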

Adding and removing shards will increase or decrease the cost of your stream accordingly. Per the Amazon Kinesis Streams FAQ, there is a default limit of 10 shards per region. This limit can be increased by contacting Amazon Support and requesting a limit increase; beyond that default, there is no upper limit on the number of shards or streams in an account.

Records are units of data stored in a stream and are made up of a sequence number, partition key, and a data blob. Data blobs are the payload of data contained within a record. The maximum size of a data blob before Base64 encoding is 1MB, which is the upper limit of data that can be placed into a stream in a single record. Larger data blobs must be broken into smaller chunks before being put into a Kinesis stream.
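For example, a producer can split an oversized payload under a single partition key so the chunks stay ordered on one shard. A minimal sketch (hypothetical stream name and helper; boto3 assumed):

```python
import boto3

MAX_BLOB = 1024 * 1024  # 1MB per-record limit, before Base64 encoding

def put_large_blob(stream_name: str, partition_key: str, blob: bytes) -> None:
    """Break a blob larger than 1MB into record-sized chunks and put each
    chunk under the same partition key, so they land on one shard in order."""
    kinesis = boto3.client("kinesis")
    for offset in range(0, len(blob), MAX_BLOB):
        kinesis.put_record(
            StreamName=stream_name,
            Data=blob[offset:offset + MAX_BLOB],
            PartitionKey=partition_key,
        )
```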

Partition keys determine which shard in a stream a record is routed to, and allow a data producer to distribute data across shards.

Sequence numbers are unique identifiers for records inserted into a shard. They increase monotonically, and are specific to individual shards.

Amazon Kinesis Offerings

Amazon Kinesis is currently broken into three separate service offerings.

Kinesis Streams

Kinesis Streams is capable of capturing large amounts of data (terabytes per hour) from data producers, and streaming it into custom applications for data processing and analysis. Streaming data is replicated by Kinesis across three separate availability zones within AWS to ensure reliability and availability of your data.

Kinesis Streams is capable of scaling from a single megabyte up to terabytes per hour of streaming data. You must manually provision the appropriate number of shards for your stream to handle the volume of data you expect to process. Amazon helpfully provides a shard calculator when creating a stream to correctly determine this number. Once created, it is possible to dynamically scale up or down the number of shards to meet demand, but only with the AWS Streams API at this time.

It is possible to load data into Streams using a number of methods, including HTTPS, the Kinesis Producer Library, the Kinesis Client Library, and the Kinesis Agent.

By default, data is available in a stream for 24 hours, but can be made available for up to 168 hours (7 days) for an additional charge.
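Extending retention is a one-line API call. A sketch with boto3 and a hypothetical stream name:

```python
import boto3

kinesis = boto3.client("kinesis")

# Extend retention of a (hypothetical) stream from the 24-hour default
# to the 168-hour (7-day) maximum described above.
kinesis.increase_stream_retention_period(
    StreamName="example-stream",
    RetentionPeriodHours=168,
)
```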

Monitoring is available through Amazon Cloudwatch.

Kinesis Firehose

Kinesis Firehose is Amazon’s data-ingestion product offering for Kinesis. It is used to capture and load streaming data into other Amazon services such as S3 and Redshift. From there, you can load the data into data processing and analysis tools like Amazon Elastic MapReduce and Amazon Elasticsearch Service. It is also possible to load the same data into S3 and Redshift at the same time using Firehose.

Firehose can scale to gigabytes of streaming data per second, and allows for batching, encrypting and compressing of data. It should be noted that Firehose will automatically scale to meet demand, which is in contrast to Kinesis Streams, for which you must manually provision enough capacity to meet anticipated needs.

As with Kinesis Streams, it is possible to load data into Firehose using a number of methods, including HTTPS, the Kinesis Producer Library, the Kinesis Client Library, and the Kinesis Agent. Firehose streams data directly to destinations such as S3, Redshift, Amazon Elasticsearch Service, and Splunk; once stored in one of these services, the data can be copied to other services for further processing and analysis.

Monitoring is available through Amazon Cloudwatch.

Kinesis Analytics

Kinesis Analytics is Amazon’s product offering for running standard SQL queries against data streams and sending the results to analytics tools for monitoring and alerting. It is now available as Amazon Kinesis Data Analytics, described in more detail earlier in this post.

Kinesis vs SQS

Amazon Kinesis is differentiated from Amazon’s Simple Queue Service (SQS) in that Kinesis is used to enable real-time processing of streaming big data. SQS, on the other hand, is used as a message queue to store messages transmitted between distributed application components.

Kinesis provides routing of records using a given key, ordering of records, the ability for multiple clients to read messages from the same stream concurrently, replay of messages from up to seven days in the past, and the ability for a client to consume records at a later time. Kinesis Streams will not dynamically scale in response to increased demand, so you must provision enough shards ahead of time to meet the anticipated demand of both your data producers and data consumers.

SQS provides for messaging semantics so that your application can track the successful completion of work items in a queue, and you can schedule a delay in messages of up to 15 minutes. Unlike Kinesis Streams, SQS will scale automatically to meet application demand. SQS has lower limits to the number of messages that can be read or written at one time compared to Kinesis, so applications using Kinesis can work with messages in larger batches than when using SQS.
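To make the contrast concrete, here is a minimal SQS sketch (hypothetical queue URL; boto3 assumed) showing the delay and the per-call batch limit just described, plus the explicit delete that marks a work item complete:

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"  # placeholder

# Delay delivery by the 15-minute maximum mentioned above.
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody='{"job": "resize-image"}',
    DelaySeconds=900,
)

# Consume and acknowledge: a message must be explicitly deleted to mark
# the work item complete, or it reappears after the visibility timeout.
received = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=10,  # SQS per-call maximum, far below Kinesis batches
    WaitTimeSeconds=10,      # long polling
)
for msg in received.get("Messages", []):
    print(msg["Body"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```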

Getting Started with AWS Kinesis

Amazon has published an excellent tutorial on getting started with Kinesis in their blog post Building a Near Real-Time Discovery Platform with AWS. It is recommended that you give this a try first to see how Kinesis can integrate with other AWS services, especially S3, Lambda, Elasticsearch, and Kibana.

Once you’ve taken Kinesis for a test spin, you might consider integrating with an external service such as Sumo Logic to analyze log files from your EC2 instances using their Amazon Kinesis Connector. Information about the Amazon Kinesis Connector can be found in the Sumo Logic GitHub repository. You may also want to check out “Sumo Logic App for Amazon VPC Flow Logs using Kinesis” for additional insights.

Reference: Demystifying AWS Kinesis: Streams vs Firehose

Reference: Kinesis Streams vs Firehose

Reference: Kinesis Firehose vs. Kinesis Data Streams: what's the difference?

Reference: AWS services from beginner to mastery | Amazon Kinesis: Firehose operations

Reference: https://aws.amazon.com/cn/

Reference: Streaming Twitter feed using Kinesis Data Firehose and Redshift

Reference: Apache Kafka and Amazon Kinesis

Reference: Streaming Platforms: Apache Kafka vs. AWS Kinesis

Reference: Amazon Kinesis vs. Apache Kafka for Big Data Analysis

Reference: Amazon Kinesis Data Firehose FAQs
