A shard holds a sequence of data records in a stream.
A producer puts data records into shards and a consumer gets data records from shards. Consumers use shards for parallel data processing and for consuming data in the exact order in which they are stored. If writes and reads exceed the shard limits, the producer and consumer applications will receive throttles, which can be handled through retries.
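The retry handling mentioned above can be sketched as exponential backoff with jitter. This is a minimal illustration, not an AWS-provided helper: `ThrottledError` and `call_with_backoff` are hypothetical names, and in a real client the throttle would arrive as the SDK's own error type (with boto3, typically a `ClientError` whose code is `ProvisionedThroughputExceededException`).

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error returned by the service."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying on ThrottledError with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttle to the caller
            # The delay ceiling doubles on each attempt; the random jitter
            # keeps many throttled clients from retrying in lockstep.
            delay = base_delay * (2 ** attempt)
            sleep(random.uniform(0, delay))
```

The `sleep` parameter exists only so the waiting can be replaced in tests; production code would leave it at its default.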
What is Amazon Kinesis Data Streams? With Amazon Kinesis Data Streams, you can build custom applications that process or analyze streaming data for specialized needs. You can add various types of data such as clickstreams, application logs, and social media to a Kinesis data stream from hundreds of thousands of sources. Within seconds, the data will be available for your applications to read and process from the stream.
Amazon Kinesis Data Streams manages the infrastructure, storage, networking, and configuration needed to stream your data at the level of your data throughput.
In addition, Kinesis Data Streams synchronously replicates data across three Availability Zones, providing high availability and data durability.
By default, Kinesis Data Streams scales capacity automatically, freeing you from provisioning and managing capacity.
You can choose provisioned mode if you want to provision and manage throughput on your own.
Kinesis Data Streams is useful for rapidly moving data off data producers and then continuously processing the data, whether that means transforming it before emitting to a data store, running real-time metrics and analytics, or deriving more complex data streams for further processing.
The following are typical scenarios for using Kinesis Data Streams:
Accelerated log and data feed intake: Instead of waiting to batch the data, you can have your data producers push data to a Kinesis data stream as soon as the data is produced, preventing data loss in case of producer failure. For example, system and application logs can be continuously added to a data stream and be available for processing within seconds.
Real-time metrics and reporting: You can extract metrics and generate reports from Kinesis data stream data in real time. For example, your Amazon Kinesis application can work on metrics and reporting for system and application logs as the data is streaming in, rather than waiting to receive data batches.
Real-time data analytics: With Kinesis Data Streams, you can run real-time streaming data analytics. For example, you can add clickstreams to your Kinesis data stream and have your Kinesis application run analytics in real time, allowing you to gain insights from your data in minutes instead of hours or days.
Log and event data collection: Collect log and event data from sources such as servers, desktops, and mobile devices. You can then build applications using AWS Lambda or Kinesis Data Analytics to continuously process the data, generate metrics, power live dashboards, and emit aggregated data into stores such as Amazon Simple Storage Service (S3).
Power event-driven applications: Quickly pair with AWS Lambda to respond or adjust to immediate occurrences within the event-driven applications in your environment, at any scale.
You can start using Kinesis Data Streams by creating a Kinesis data stream through either the AWS Management Console or the CreateStream operation.
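As a rough sketch of the CreateStream route, the snippet below builds the request parameters and passes them to boto3's `create_stream`. The helper names `build_create_stream_request` and `create_stream` are illustrative assumptions, and the stream name in the usage note is a placeholder; running `create_stream` requires boto3 and valid AWS credentials.

```python
def build_create_stream_request(stream_name, shard_count=None):
    """Build keyword arguments for the CreateStream operation.

    With shard_count set, the stream is requested in provisioned mode;
    without it, the stream is requested in on-demand mode.
    """
    if shard_count is None:
        return {
            "StreamName": stream_name,
            "StreamModeDetails": {"StreamMode": "ON_DEMAND"},
        }
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    return {
        "StreamName": stream_name,
        "ShardCount": shard_count,
        "StreamModeDetails": {"StreamMode": "PROVISIONED"},
    }


def create_stream(stream_name, shard_count=None):
    """Send the request with boto3 (needs boto3 installed and AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("kinesis")
    client.create_stream(**build_create_stream_request(stream_name, shard_count))
```

For example, `create_stream("example-stream")` would request an on-demand stream, while `create_stream("example-stream", shard_count=2)` would request a provisioned stream with two shards.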
You can optionally send data from existing resources in AWS services such as Amazon DynamoDB, Amazon Aurora, and Amazon CloudWatch.
You can then use AWS Lambda, Amazon Kinesis Data Analytics, or AWS Glue Streaming to quickly process data stored in Kinesis Data Streams.
You can also build custom applications that run on Amazon Elastic Compute Cloud (EC2), Amazon Elastic Container Service (ECS), and Amazon Elastic Kubernetes Service (EKS) using either Amazon Kinesis API or Amazon Kinesis Client Library (KCL).
What is a record? A record is the unit of data stored in an Amazon Kinesis data stream.
A record is composed of a sequence number, partition key, and data blob. The data blob is the data of interest your data producer adds to a data stream. The maximum size of a data blob (the data payload before Base64-encoding) is 1 megabyte (MB).
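Because the limit applies to the payload before Base64-encoding, a producer can check the raw byte length up front. The sketch below assumes "1 MB" means 1,048,576 bytes; `MAX_BLOB_BYTES` and `check_blob` are illustrative names, not part of any AWS SDK.

```python
import base64

MAX_BLOB_BYTES = 1024 * 1024  # assumption: "1 MB" read as 1,048,576 bytes


def check_blob(data: bytes) -> int:
    """Return the Base64-encoded length of data, or raise if the raw blob is too large."""
    # The limit is checked against the raw payload, not the encoded form.
    if len(data) > MAX_BLOB_BYTES:
        raise ValueError(f"data blob is {len(data)} bytes; limit is {MAX_BLOB_BYTES}")
    return len(base64.b64encode(data))
```

Base64 expands data by roughly a third, so a blob at the limit travels as a noticeably larger encoded string even though it passes the check.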
What is a partition key? A partition key is used to segregate and route records to different shards of a data stream.
A partition key is specified by your data producer while adding data to a Kinesis data stream. For example, let’s say you have a data stream with two shards (shard 1 and shard 2). You can configure your data producer to use two partition keys (key A and key B) so that all records with key A are added to shard 1 and all records with key B are added to shard 2.
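In practice the producer does not assign a key to a shard directly: Kinesis hashes each partition key with MD5 into a 128-bit integer, and each shard owns a contiguous range of that hash space. The sketch below mimics the routing under the assumption that the hash space is split evenly across shards; `hash_key` and `shard_for_key` are illustrative names, and real shard ranges should be read from the stream description rather than computed.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit value


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer via MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the 0-based index of the shard whose range contains the key's hash.

    Assumes the hash space is split into shard_count equal, contiguous ranges.
    """
    return hash_key(partition_key) * shard_count // HASH_SPACE
```

Every record with the same partition key lands on the same shard, which is what preserves per-key ordering. Two distinct keys may or may not land on different shards, so a producer that wants an even spread generally needs many more distinct keys than shards.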
What is a sequence number? A sequence number is a unique identifier for each record.
The sequence number is assigned by Amazon Kinesis when a data producer calls the PutRecord or PutRecords operation to add data to an Amazon Kinesis data stream. Sequence numbers for the same partition key generally increase over time; the longer the time period between PutRecord or PutRecords requests, the larger the sequence numbers become.
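Sequence numbers are returned as strings of decimal digits, so ordering them as text can give the wrong result once their lengths differ. A small sketch, assuming each record is a dict with a `SequenceNumber` field as in a GetRecords response; `sort_by_sequence` is an illustrative helper name.

```python
def sort_by_sequence(records):
    """Sort records by sequence number, compared as integers rather than strings."""
    return sorted(records, key=lambda r: int(r["SequenceNumber"]))
```

Python integers have arbitrary precision, so the conversion is safe even for very long sequence numbers.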
In provisioned mode, you specify the number of shards for the data stream. The total capacity of a data stream is the sum of the capacities of its shards. You can increase or decrease the number of shards in a data stream as needed, and you pay for the number of shards at an hourly rate.
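Since stream capacity is the sum of shard capacities, a shard count can be estimated from expected throughput. The per-shard figures below (1 MB/s or 1,000 records/s for writes, 2 MB/s for reads) are assumptions not stated in the text above and should be checked against the current AWS quotas; `shards_needed` is an illustrative helper.

```python
import math

# Assumed per-shard limits (not stated in the text above; check current AWS quotas).
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def shards_needed(write_mb_per_s, write_records_per_s, read_mb_per_s):
    """Return the smallest shard count that covers all three throughput figures."""
    return max(
        1,  # a stream always has at least one shard
        math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD),
        math.ceil(write_records_per_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_per_s / READ_MB_PER_SHARD),
    )
```

For instance, a workload writing 3.5 MB/s as 2,000 records/s and reading 10 MB/s is bounded by reads and would need five shards under these assumed limits.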