Introduction
This is an end-to-end data engineering project that processes real-time stock market data with Kafka.
It uses Python, Amazon Web Services (AWS), Apache Kafka, Glue, Athena, and SQL.
Architecture
Technology Used
- Programming Language - Python
- Amazon Web Services (AWS)
  - S3 (Simple Storage Service)
  - Athena
  - Glue Crawler
  - Glue Catalog
  - EC2
- Apache Kafka
Kafka commands
wget https://downloads.apache.org/kafka/3.3.1/kafka_2.12-3.3.1.tgz
tar -xvf kafka_2.12-3.3.1.tgz
Install Java
sudo yum install java-1.8.0-openjdk
Check Java version
java -version
Start ZooKeeper:
cd kafka_2.12-3.3.1
bin/zookeeper-server-start.sh config/zookeeper.properties
Open another terminal window to start Kafka.
First SSH into your EC2 machine, as done above.
Start the Kafka server:
Duplicate the session and run the following in the new console, from the kafka_2.12-3.3.1 directory:
export KAFKA_HEAP_OPTS="-Xmx256M -Xms128M"
bin/kafka-server-start.sh config/server.properties
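The server advertises the instance's private address, so clients outside the EC2 network cannot reach it. Change server.properties so that it advertises the public IP.
To do this, you can follow the approach below:
Run sudo nano config/server.properties
- change ADVERTISED_LISTENERS (the advertised.listeners property) to the public IP of the EC2 instance, then restart the Kafka server so the change takes effect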
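The same edit can be scripted instead of done by hand in nano. The following is a minimal sketch, assuming the stock config/server.properties, where the property is spelled advertised.listeners and ships commented out. The helper name set_advertised_listener and the address 203.0.113.10 are hypothetical placeholders, not part of the project.

```python
import re

# Matches an advertised.listeners line, whether or not it is commented out.
_LISTENER_LINE = re.compile(
    r"^[ \t]*#?[ \t]*advertised\.listeners[ \t]*=.*$", re.MULTILINE
)


def set_advertised_listener(properties_text: str, public_ip: str, port: int = 9092) -> str:
    """Return properties_text with advertised.listeners set to public_ip:port.

    The first existing advertised.listeners line is replaced; if there is
    none, the setting is appended at the end of the text.
    """
    new_line = f"advertised.listeners=PLAINTEXT://{public_ip}:{port}"
    if _LISTENER_LINE.search(properties_text):
        # A function replacement avoids any backslash interpretation.
        return _LISTENER_LINE.sub(lambda _m: new_line, properties_text, count=1)
    needs_newline = bool(properties_text) and not properties_text.endswith("\n")
    return properties_text + ("\n" if needs_newline else "") + new_line + "\n"
```

To apply it, read config/server.properties into a string, pass it through set_advertised_listener with your instance's public IP, write the result back, and restart the Kafka server.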
Create the topic:
Duplicate the session and run the following in a new console:
bin/kafka-topics.sh --create --topic demo_test --bootstrap-server {Public IP of your EC2 Instance:9092} --replication-factor 1 --partitions 1
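If the topic has to be created from a script rather than typed by hand, the same CLI call can be assembled in Python. This is a minimal sketch, assuming it runs from the kafka_2.12-3.3.1 directory. build_create_topic_cmd and create_topic are hypothetical helpers, and the bootstrap address is whatever host:port you would pass on the command line.

```python
import subprocess


def build_create_topic_cmd(bootstrap_server, topic="demo_test",
                           partitions=1, replication_factor=1):
    """Build the argument list for bin/kafka-topics.sh --create."""
    return [
        "bin/kafka-topics.sh",
        "--create",
        "--topic", topic,
        "--bootstrap-server", bootstrap_server,
        "--replication-factor", str(replication_factor),
        "--partitions", str(partitions),
    ]


def create_topic(bootstrap_server, **kwargs):
    """Run the command; raises CalledProcessError if kafka-topics.sh fails."""
    cmd = build_create_topic_cmd(bootstrap_server, **kwargs)
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Passing the arguments as a list avoids shell quoting problems. With a single broker, as in this setup, the replication factor has to stay at 1.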
Start Producer:
bin/kafka-console-producer.sh --topic demo_test --bootstrap-server {Public IP of your EC2 Instance:9092}
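The console producer sends whatever you type. Since the project's programming language is Python, the same step can also be sketched with a Python client. This is a minimal sketch, assuming the kafka-python package (pip install kafka-python), which this README does not name, and JSON-encoded records. The helper names serialize and send_records are hypothetical.

```python
import json


def serialize(record: dict) -> bytes:
    """Encode a record as UTF-8 JSON bytes, the form sent to Kafka."""
    return json.dumps(record).encode("utf-8")


def send_records(bootstrap_server: str, topic: str, records) -> None:
    """Send each record to the topic, then flush so nothing stays buffered."""
    # Imported here so this module still loads where kafka-python is absent.
    from kafka import KafkaProducer  # assumption: kafka-python is installed

    producer = KafkaProducer(
        bootstrap_servers=[bootstrap_server],
        value_serializer=serialize,
    )
    try:
        for record in records:
            producer.send(topic, value=record)
        producer.flush()
    finally:
        producer.close()
```

A call would look like send_records("<public-ip>:9092", "demo_test", [{"symbol": "EXAMPLE", "price": 101.5}]), where the record is made-up sample data and the address is the public IP of your EC2 instance.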
Start Consumer:
Duplicate the session and run the following in a new console:
bin/kafka-console-consumer.sh --topic demo_test --bootstrap-server {Public IP of your EC2 Instance:9092}
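The console consumer prints each message as it arrives. To read the topic from Python instead, a minimal sketch could look like the following, assuming the kafka-python package (pip install kafka-python), which this README does not name, and JSON-encoded message values. The helper names deserialize and consume are hypothetical.

```python
import json


def deserialize(raw: bytes) -> dict:
    """Decode UTF-8 JSON bytes received from Kafka back into a dict."""
    return json.loads(raw.decode("utf-8"))


def consume(bootstrap_server: str, topic: str, max_messages=None):
    """Yield decoded message values, stopping after max_messages if given."""
    # Imported here so this module still loads where kafka-python is absent.
    from kafka import KafkaConsumer  # assumption: kafka-python is installed

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=[bootstrap_server],
        value_deserializer=deserialize,
        # "latest" reads only new messages, like the console consumer
        # when it is started without --from-beginning.
        auto_offset_reset="latest",
    )
    try:
        for count, message in enumerate(consumer, start=1):
            yield message.value
            if max_messages is not None and count >= max_messages:
                break
    finally:
        consumer.close()
```

Iterating with for value in consume("<public-ip>:9092", "demo_test"): print(value) blocks and prints records as they arrive, so start it before sending messages from the producer.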