Ingesting Websocket Data for Live Stock Streams with Cloudera Flow Management Powered by Apache NiFi
Ingesting Websocket Data for Live Stock Streams with Cloudera Flow Management Powered by Apache NiFi
We will read websockets from wss://ws.finnhub.io?token=YOURTOKEN. You will need to sign up for a finnhub.io account to get this data. The API is well documented and very easy to use with Apache NiFi.
As updates happen we receive websocket calls and send them to Kafka for use in Flink SQL, Kafka Connect, Spark Streaming, Kafka Streams, Python, .Java Spring Boot Apps, NET Apps and NIFi.
Definition of Fields
Symbol.
Last price.
UNIX milliseconds timestamp.
Volume.
Incoming Websocket Text Message Processing
We parse out the fields we want, then rename them for something readable. Then we build a new JSON field that matches our trades schema then we push to Kafka.
First step we need to setup a controller pool to connect to finnhub's web socket API.
We can see data in flight via NiFi Provenance.
Schema
https://github.com/tspannhw/ApacheConAtHome2020/blob/main/schemas/trades.avsc
Happy Holidays from Tim and the Streaming Felines!