Processing Data from MQ with Spark Streaming: Part 3 - Reliable Delivery & Recovery Techniques with Spark Streaming

Stacey Ronaghan
5 min readApr 25, 2018

This is a multi-part series that provides information on messaging, including fault-tolerance techniques, and provide instruction and code to implement a connection between IBM MQ (formerly WebSphere MQ) and Spark Streaming.

Part 1 — Introduction to Messaging, JMS & MQ

Part 2 — Brief Discussion on Apache Spark Streaming and Use-cases

Part 3 — Reliable Delivery & Recovery Techniques with Spark Streaming

Part 4 — Implementation details for Spark MQ Connector

Part 3 — Reliable Delivery & Recovery Techniques with Spark Streaming

Messaging Delivery Semantics

Reliability of delivery is a very important topic for messaging as for some applications it is critical that messages are received and in some scenarios very important that there are no duplicates. Here’s an overview of the three different messaging delivery semantics:

At-Most-Once

At-most-once delivery means that there will not be duplicate messages; the client will either receive a message once or, if there is a failure, it will be lost. This method has the best performance and with the least implementation overhead.

This scenario may be reasonable for IoT applications where the data is non-critical or has a time lag. In these cases, it is cheaper to accept these loses than create an infrastructure to ensure they are received. E.g. A dashboard is displaying temperature readings which are being received in one minute intervals — if a reading is lost, there is little consequence to the end user.

At-Least-Once

At-least-once delivery means that there may be duplicate messages; the client will receive a message once or, if there is a failure, it may be received multiple times. As acknowledgements are required for this method, there is increased overhead and reduced performance in comparison to the at-most-once method.

This method would be used when it is necessary that the message is received but it is not going to cause disruption downstream of the application if there are duplicates E.g. Alarm systems — it is critical that an alert is received so an action can be taken but it is unlikely to matter if the message comes through multiple times.

Exactly-Once

Exactly-once delivery means that it is guaranteed a message will be received by the client but also that there will be no duplication. This is the most expensive method to implement with the worst performance but does not require development client-side for deduplication.

There are many businesses that require messages once and only once and this method is particularly useful when it is not possible to process the deduplication at the application side. E.g. Credit card transactions — for both the credit card company and customer, it is important that the charges are processed once and only once.

Messaging Acknowledgements

Acknowledgement is a way for a consumer to inform the broker that it has successfully received a message. This indicates that the broker must not deliver the same message to the consumer again. Once the broker has received acknowledgement from all consumers, it can remove the message from the queue. The following text discusses the three acknowledgement modes for JMS messaging:

Duplicates allowed

The method uses lazy acknowledgement; the acknowledgement is sent at a convenient time, when the runtime thinks it’s suitable, rather than immediately after a message is delivered. It allows for faster message processing but can result in the message being delivered multiple times, at least once messaging.

Messaging Acknowledgement: Duplicates Allowed Mode

Auto-acknowledgement

An automatic acknowledgement message from the client to the broker can be sent at two different times, based on whether the client has chosen to receive messages synchronously or asynchronously.

Synchronous: The client asks the broker for a message and an acknowledgement is received once the request has been processed, before the client is sent the message. If the system goes down before the message is received, it will be lost, at most once messaging.

Messaging Acknowledgements: Auto-acknowledgement Mode, Synchronous

Asynchronous: The acknowledgement happens immediately after the message has been processed by the client. If the system goes down before the acknowledgement is recorded, a duplicate will be delivered, at least once messaging.

Messaging Acknowledgements: Auto-acknowledgement Mode, Asynchronous

Client acknowledgement

This method requires that the client logs the messages it has received. When a system goes down before a message is received, it won’t be stored in the log. When reconnected, as the message wasn’t acknowledged, it will re-sent to the client.

Messaging Acknowledgement: Client Acknowledgement Mode (Scenario 1)

In the event that the system goes down before an acknowledgement is received, the acknowledgements will be resent once the system is restarted.

Messaging Acknowledgement: Client Acknowledgement Mode (Scenario 2)

This ensures exactly once messaging.

I found this JMS Message Delivery Reliability and Acknowledgement Pattern post incredibly valuable when trying to understand these concepts!

Spark Streaming Checkpointing

Messaging acknowledgements ensure reliability for when there is a failure within the messaging system pipeline. However, we also need to account for the Spark streaming application failing whilst processing the messages it has received (and already acknowledged, as required). In order to do this, we have used Spark checkpointing.

Spark offers two different type of checkpointing capabilities, metadata checkpointing and data checkpointing. For the purpose of messaging, we are interested in Metadata checkpointing as this is what is used to recover from failure of a streaming application.

Metadata checkpointing involves writing records at set intervals to HDFS, or an alternative fault-tolerant storage system, providing enough information to restart in the event of a failure. This includes the streaming application’s configuration, the set of operations to be run on each stream of data, as well as information on the incomplete batches. Therefore, if the system goes down, it will restart where it left off ensuring that all messages have been processed.

Next Up: Part 4 — Implementation details for Spark MQ Connector

--

--

Stacey Ronaghan

Data Scientist keen to share experiences & learnings from work & studies