Before learning about Kinesis, you should understand streaming data.
What is streaming data?
Streaming data is data that is generated continuously by thousands of data sources, which typically send their data records simultaneously and in small sizes.
Following are the examples of streaming data:
Purchases from online stores
People buying items on amazon.com generate streaming data, such as transaction and product records.
Stock prices
Stock prices are also an example of streaming data.
Game data
Suppose a user is playing Angry Birds and the application streams data back to a central server. This data could describe what the user is doing or what the current score is.
Social network data
Social network data is another example of streaming data. Suppose you visit Facebook, update your status, and post on a friend's wall. All of this data would be streamed.
Geospatial data
When you use Uber, your device is connected to the internet. The Uber application constantly reports where the driver is and where you are, and it queries the map to give you the best possible route to your destination. This is also a good example of streaming data.
IoT sensor data
Sensors all around the world monitor conditions such as temperature and stream their readings.
What is Kinesis?
Kinesis is a platform on AWS to which you send your streaming data. It makes it easy to load and analyze streaming data, and it also lets you build custom applications based on your business needs.
Core Services of Kinesis
Kinesis has three core services: Kinesis Streams, Kinesis Firehose, and Kinesis Analytics.
A Kinesis stream consists of shards.
Each shard supports up to 5 read transactions per second, up to a maximum total read rate of 2 MB per second, and up to 1,000 records per second for writes, up to a maximum total write rate of 1 MB per second.
The data capacity of your stream is a function of the number of shards that you specify for the data stream. The total capacity of the Kinesis stream is the sum of the capacities of all shards.
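The capacity arithmetic above can be sketched in a few lines of Python. This is only an illustration: the per-shard limits are the figures quoted in this section, and the names `stream_capacity` and `shards_needed` are hypothetical helpers, not part of any AWS SDK.

```python
import math
from dataclasses import dataclass

# Per-shard limits as quoted in this section.
READ_TRANSACTIONS_PER_SHARD = 5    # read transactions per second
READ_MB_PER_SHARD = 2              # MB per second
WRITE_RECORDS_PER_SHARD = 1000     # records per second
WRITE_MB_PER_SHARD = 1             # MB per second


@dataclass
class StreamCapacity:
    read_transactions_per_sec: int
    read_mb_per_sec: int
    write_records_per_sec: int
    write_mb_per_sec: int


def stream_capacity(shard_count: int) -> StreamCapacity:
    """Total stream capacity is the sum of the capacities of all shards."""
    if shard_count < 1:
        raise ValueError("a stream needs at least one shard")
    return StreamCapacity(
        read_transactions_per_sec=shard_count * READ_TRANSACTIONS_PER_SHARD,
        read_mb_per_sec=shard_count * READ_MB_PER_SHARD,
        write_records_per_sec=shard_count * WRITE_RECORDS_PER_SHARD,
        write_mb_per_sec=shard_count * WRITE_MB_PER_SHARD,
    )


def shards_needed(write_mb_per_sec: float,
                  write_records_per_sec: float,
                  read_mb_per_sec: float) -> int:
    """Smallest shard count that satisfies every per-shard limit (hypothetical helper)."""
    return max(
        1,
        math.ceil(write_mb_per_sec / WRITE_MB_PER_SHARD),
        math.ceil(write_records_per_sec / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_per_sec / READ_MB_PER_SHARD),
    )
```

For example, a workload that writes 3.5 MB and 2,000 records per second and reads 5 MB per second is limited by its write throughput, so `shards_needed(3.5, 2000, 5)` asks for 4 shards.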
Architecture of Kinesis Stream
Suppose EC2 instances, mobile phones, laptops, and IoT devices are producing data. They are known as producers. The data is sent to a Kinesis stream and stored in shards. By default, data is retained in the shards for 24 hours; you can increase the retention to 7 days. Consumers, such as EC2 instances, then read the data from the shards and turn it into useful data. Once the consumers have performed their calculations, the results are sent on to other AWS services such as DynamoDB, S3, EMR, or Redshift.
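The producer and consumer roles can be sketched roughly as follows. This is a minimal sketch, not a production client: it assumes `client` is a boto3 Kinesis client (created elsewhere with `boto3.client("kinesis")`), and the function names, the stream name, and the JSON encoding of events are illustrative choices. The calls used (`put_record`, `get_shard_iterator`, `get_records`) are the boto3 Kinesis operations of those names.

```python
import json


def send_event(client, stream_name, event, partition_key):
    """Producer side: put one JSON-encoded event onto a Kinesis stream.

    `client` is assumed to be a boto3 Kinesis client. The partition key
    is what Kinesis uses to decide which shard stores the record.
    """
    return client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=partition_key,
    )


def read_batch(client, stream_name, shard_id, limit=100):
    """Consumer side: read one batch of events from the start of a shard.

    TRIM_HORIZON starts at the oldest record still inside the
    retention window.
    """
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]
    response = client.get_records(ShardIterator=iterator, Limit=limit)
    # Each record carries the raw bytes the producer sent.
    return [json.loads(record["Data"]) for record in response["Records"]]
```

A real consumer would keep polling with the `NextShardIterator` value returned by `get_records` and would read every shard in the stream; the single call here only shows the shape of the exchange.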
Kinesis Firehose is a service used for delivering streaming data to destinations such as Amazon S3, Amazon Redshift, Amazon Elasticsearch.
With Kinesis Firehose, you do not have to manage the resources.
Architecture of Kinesis Firehose
Suppose EC2 instances, mobile phones, laptops, and IoT devices are producing data. They are again known as producers. Producers send the data to Kinesis Firehose. With Kinesis Firehose, you do not have to manage resources such as shards, you do not have to worry about streams, and you do not have to manually edit shards to keep up with the data. It's completely automated. You do not even have to worry about consumers. Data can optionally be analyzed by a Lambda function; once that is done, the data is sent directly to S3. One important point about Kinesis Firehose is that it has no automatic retention window, whereas a Kinesis stream has one, with a default of 24 hours that can be extended up to 7 days. Kinesis Firehose essentially either analyzes the data or sends it directly to S3 or another location.
The other location can be Redshift. In that case, the data is first written to S3 and then copied to Redshift.
If the location is an Elasticsearch cluster, the data is sent directly to the cluster.
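The optional Lambda step might look something like the sketch below. It assumes the Firehose data-transformation contract, in which the function receives a batch of base64-encoded records and returns each one with its `recordId`, a `result` status, and the re-encoded `data`. The JSON payload format and the added `processed` field are purely illustrative assumptions.

```python
import base64
import json


def lambda_handler(event, context):
    """Transform a batch of Firehose records before delivery."""
    output = []
    for record in event["records"]:
        try:
            # Firehose hands each record over base64-encoded.
            payload = json.loads(base64.b64decode(record["data"]))
        except ValueError:
            # Not valid base64/JSON: report the record as failed, unchanged.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue

        payload["processed"] = True  # hypothetical enrichment step
        # Newline-terminate so records stay separable at the destination.
        encoded = base64.b64encode(
            (json.dumps(payload) + "\n").encode("utf-8")
        ).decode("utf-8")
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": encoded,
        })
    return {"records": output}
```

Every incoming `recordId` is returned exactly once, so the service can match each transformed record to its original.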
Kinesis Analytics is a service of Kinesis in which streaming data is processed and analyzed using standard SQL.
Architecture of Kinesis Analytics
Suppose we have Kinesis Firehose and Kinesis Streams. Kinesis Analytics lets you run SQL queries on the data that exists within Kinesis Firehose or a Kinesis stream. You can then use those queries to store the data in S3, Redshift, or an Elasticsearch cluster. Essentially, the data is analyzed inside Kinesis using a SQL-like query language.
Differences between Kinesis Streams and Kinesis Firehose
Kinesis Streams is managed manually, while Kinesis Firehose is fully automated.
Kinesis Streams can send data to many services, while Kinesis Firehose delivers data only to S3, Redshift, or Elasticsearch.
Kinesis Streams has an automatic retention window, with a default of 24 hours that can be extended to 7 days, while Kinesis Firehose has no automatic retention window.
Kinesis Streams sends data to consumers for analysis and processing, while Kinesis Firehose has no consumers to manage, since it can analyze the data itself using a Lambda function.