Did the user really ask for Exactly Once? Fault Tolerance
Exactly Once Requirements
It is very tricky and can cause performance degradation, if your user could just use at least once, then always go with that. Having data sinks like Kudu where you can do an upsert makes exactly once less needed.
Apache Flink, Apache NiFi Stateless and Apache Kafka can participate in that.
For CDF Stream Processing and Analytics with Apache Flink 1.10 Streaming:
Both Kafka sources and sinks can be used with exactly once processing guarantees when checkpointing is enabled.
End-to-End Guaranteed Exactly-Once Record Delivery
The Data Source and Data Sink to need to support exactly-once state semantics and take part in checkpointing.
Data Sources
- Apache Kafka - must have Exactly-Once selected, transactions enabled and correct driver.
Select: Semantic.EXACTLY_ONCE
Data Sinks
- HDFS BucketingSink
- Apache Kafka
For Kafka, please check the timeouts sync up to checkpoints. https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/connectors/kafka.html#kafka-producers-and-fault-tolerance