Processing Data from MQ with Spark Streaming: Part 1 - Introduction to Messaging, JMS & MQ
This multi-part series covers messaging, including fault-tolerance techniques, and provides instructions and code for connecting IBM MQ (formerly WebSphere MQ) to Spark Streaming.
Part 1 — Introduction to Messaging, JMS & MQ
Part 2 — Brief Discussion on Apache Spark Streaming and Use-cases
Part 3 — Reliable Delivery & Recovery Techniques with Spark Streaming
Part 4 — Implementation details for Spark MQ Connector
Part 1 — Introduction to Messaging, JMS & MQ
Introduction to messaging
Messaging is a method of communication between applications or software components. In a messaging system, a client sends a message to a message broker. The recipient of the message can then connect to the same broker and read the message. The sender (also known as the producer) and the recipient (also known as the consumer) do not need to be available at the same time to communicate. Each only needs to know the name and address of the message broker, and they have little or no knowledge of each other. This design is referred to as loosely coupled.
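To make the "not available at the same time" point concrete, here is a minimal sketch in Java. The QueueBroker class and its send/receive methods are hypothetical names made up for this illustration; this is an in-memory toy, not the JMS or IBM MQ API.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical in-memory broker for illustration only; not a JMS or IBM MQ API.
class QueueBroker {
    // Each named queue holds messages until a consumer retrieves them.
    private final Map<String, Queue<String>> queues = new HashMap<>();

    // Producer side: put a message on a named queue and return immediately.
    void send(String queueName, String message) {
        queues.computeIfAbsent(queueName, name -> new ArrayDeque<>()).add(message);
    }

    // Consumer side: take the oldest waiting message, or null if none is waiting.
    String receive(String queueName) {
        Queue<String> queue = queues.get(queueName);
        return queue == null ? null : queue.poll();
    }

    public static void main(String[] args) {
        QueueBroker broker = new QueueBroker();

        // The producer runs first; no consumer is connected yet.
        broker.send("orders", "table 4: margherita");
        broker.send("orders", "table 7: cheeseburger");

        // Later, a consumer connects to the same broker and reads in arrival order.
        System.out.println(broker.receive("orders"));
        System.out.println(broker.receive("orders"));
    }
}
```

The producer and consumer never reference each other here; the only things they share are the broker and the queue name, which is the loose coupling described above.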
Terminology
Message — A collection of data sent by one program intended for another program
Client — The program that sends and/or receives messages to/from another client
Message Broker — A client connects to a message broker to send a message and another client can then connect to the same message broker to receive it
Queue — The destination to which messages can be sent; messages accumulate on a queue until they are retrieved by a client consuming from that queue
Queue Manager — A queue manager is a system program that provides additional queuing services (e.g. allowing administrators to alter the properties of existing queues)
Channels — (important for IBM MQ) A configuration/protocol that defines the communication path between the queue manager and another queue manager or a client application
A great analogy I heard for messaging systems relates to restaurants. The waiter must get an order to the chef to be prepared but they don’t directly interact. The waiter and chef are both clients in this scenario and the terminal, or point-of-sale system, that takes the order from the waiter and delivers it to the chef is the message broker. The orders sit on a queue for the chef who then takes them in the order they were placed.
Topic — A topic describes the subject of the information that is published. Multiple clients can subscribe to a topic and all will receive the messages
With a queue, only one client will receive the message. Topics allow for many clients to receive the same message.
Within the restaurant analogy, if more than one chef needs to know about certain orders then topics could be used. Take two topics as an example, one for pizza and the other for burgers. Only one chef is in charge of burgers, so they have subscribed to the “burgers” topic. For pizza, however, there are two chefs, one who prepares the base and one in charge of toppings. They both subscribe to the “pizza” topic and receive the same order.
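The difference between queue and topic delivery can be sketched in a few lines of Java. The WorkQueue and Topic classes below are hypothetical and purely illustrative (real brokers also deal with persistence, acknowledgements and concurrency, which this toy ignores).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch contrasting queue and topic delivery; not a JMS or IBM MQ API.
class QueueVsTopic {
    // A queue hands each message to exactly one consumer.
    static class WorkQueue {
        private final Queue<String> messages = new ArrayDeque<>();

        void send(String message) {
            messages.add(message);
        }

        // Returns null when no message is waiting.
        String receive() {
            return messages.poll();
        }
    }

    // A topic copies each message to every subscriber registered at publish time.
    static class Topic {
        private final List<List<String>> inboxes = new ArrayList<>();

        List<String> subscribe() {
            List<String> inbox = new ArrayList<>();
            inboxes.add(inbox);
            return inbox;
        }

        void publish(String message) {
            for (List<String> inbox : inboxes) {
                inbox.add(message);
            }
        }
    }

    public static void main(String[] args) {
        // Queue: the first consumer to ask gets the order, the next one finds nothing.
        WorkQueue orders = new WorkQueue();
        orders.send("cheeseburger");
        System.out.println("first chef takes: " + orders.receive());
        System.out.println("second chef takes: " + orders.receive());

        // Topic: both pizza chefs receive their own copy of the same order.
        Topic pizza = new Topic();
        List<String> baseChef = pizza.subscribe();
        List<String> toppingsChef = pizza.subscribe();
        pizza.publish("margherita");
        System.out.println("base chef sees: " + baseChef);
        System.out.println("toppings chef sees: " + toppingsChef);
    }
}
```

In this sketch a subscriber only sees messages published after it subscribed, which is one possible design choice and not a statement about how any particular broker behaves.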
When I was trying to understand the terminology, I found the IBM Knowledge Center “Message queuing terminology” page useful.
JMS & IBM MQ
Java Message Service (JMS) is a specification for implementing a messaging system. It provides an API for creating, sending and receiving messages. Aside from being loosely coupled, it is also asynchronous, meaning that messages are delivered to the consumer as they arrive and the consumer does not have to ask for them. It is also reliable, meaning there are mechanisms in place to ensure that a message can be delivered once and only once.
IBM MQ provides IBM’s enterprise-grade implementation of JMS.
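The asynchronous delivery mentioned above can be sketched with the Java standard library alone. In JMS itself this role is played by the MessageListener interface and its onMessage method; the AsyncDeliverySketch class below is a hypothetical stand-in I made up to show the idea, not the JMS API, and it makes no attempt at the once-and-only-once guarantee.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch of listener-style (push) delivery; not the JMS API itself.
class AsyncDeliverySketch {
    // Sentinel object used to stop the dispatcher; compared by identity, never delivered.
    private static final String STOP = new String("stop");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Thread dispatcher;

    // The listener is called on a background thread whenever a message arrives.
    AsyncDeliverySketch(Consumer<String> listener) {
        dispatcher = new Thread(() -> {
            try {
                while (true) {
                    String message = queue.take(); // blocks until a message arrives
                    if (message == STOP) {
                        return;
                    }
                    listener.accept(message);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        dispatcher.start();
    }

    // Producer side: enqueue and return without waiting for the consumer.
    void send(String message) {
        queue.add(message);
    }

    // Delivers everything already sent, then stops the dispatcher thread.
    void close() {
        queue.add(STOP);
        try {
            dispatcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AsyncDeliverySketch broker =
                new AsyncDeliverySketch(message -> System.out.println("received: " + message));
        broker.send("first");
        broker.send("second");
        broker.close();
    }
}
```

The consumer code never polls for messages; it only supplies a callback, and the dispatcher pushes each message to it in arrival order.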
Next Up: Part 2: Brief Discussion on Apache Spark Streaming and Use-cases