Unlocking the Power of Multi-Protocol Messaging with Apache Pulsar
David Kjerrumgaard, Developer Advocate, streamnative.io. Apache Pulsar Committer. Former Principal Software Engineer on Splunk’s messaging team, which is responsible for Splunk’s internal Pulsar-as-a-Service platform. Former Director of Solution Architecture at Streamlio. Global Practice Director of Professional Services at Hortonworks.
David Kjerrumgaard, Author, streamnative.io. Author of Pulsar in Action. Co-author of Practical Hive.
Agenda: Messaging Systems. Use Cases and Applications. Apache Pulsar’s Protocol Handlers. Demonstration. Best Practices and Tips. Q&A.
Messaging Protocols
The problem with messaging protocols is that every messaging system has its own.
Messaging Protocols At their core, messaging systems are client/server applications. The clients communicate with the broker using a predefined set of commands, a.k.a. a protocol.
Messaging Protocols Are Used for Data Flow Protocols are used to send low-level data flow commands between clients and brokers. Each protocol is tightly coupled to its messaging system, meaning that a given messaging system can support only one protocol.
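To make the idea of a protocol as a predefined set of commands concrete, here is a minimal sketch of a made-up, line-based protocol. The command names (`PUBLISH`, `SUBSCRIBE`), the frame layout, and the helpers `encode_command` and `parse_command` are all hypothetical; they do not correspond to the Pulsar, Kafka, AMQP, or MQTT wire formats, which are binary and far richer.

```python
from dataclasses import dataclass

# Hypothetical command set for an imaginary broker; not a real wire protocol.
KNOWN_COMMANDS = {"PUBLISH", "SUBSCRIBE"}


@dataclass(frozen=True)
class Command:
    name: str
    topic: str
    payload: str = ""


def encode_command(cmd: Command) -> str:
    """Client side: turn a command into a single text frame."""
    return f"{cmd.name} {cmd.topic} {cmd.payload}".rstrip()


def parse_command(frame: str) -> Command:
    """Broker side: decode a frame, rejecting anything outside the protocol."""
    parts = frame.split(" ", 2)
    if len(parts) < 2 or parts[0] not in KNOWN_COMMANDS:
        raise ValueError(f"unsupported frame: {frame!r}")
    payload = parts[2] if len(parts) == 3 else ""
    return Command(parts[0], parts[1], payload)
```

A frame such as `PRODUCE orders x`, sent by a client that speaks a different command set, is rejected by `parse_command`. That is the tight coupling in miniature: the broker understands only the commands its own protocol defines.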
Challenges of Single-Protocol Systems Over the past 30 years, multiple messaging systems have been developed to address different challenges. Because each uses its own protocol, they cannot communicate with one another. Application interoperability is limited to applications that use the same protocol.
Single-Protocol Message Systems Data Silos
No Good Solution Multiple application dependencies. Multiple messaging systems to maintain. Data duplication. Message consumption tracking pushed to application.
Automated Replication Has Issues Dependency on a protocol bridge process. Additional latency from the replication/bridge step. Multiple messaging systems to maintain. Data duplication.
Desired State Single copy of the data. Single system to maintain. Publish with one protocol, consume with any.
Benefits of a Multi-Protocol System Interoperability: between systems that use various messaging standards. Diverse use cases: support a broad spectrum of use cases within a single messaging platform. Legacy integration: integrate legacy systems that rely on older messaging protocols without requiring complete rewrites or replacements. Simplified infrastructure: consolidate all messaging onto a single platform to save money.
What is Apache Pulsar?
Pulsar is infinite event stream storage.
Storage Model - Topic
Storage Model - Event Stream
The Pulsar messaging protocol is a thin layer on top of the stream storage.
How Pulsar Implements Its Protocol
Apache Pulsar Support for Multi-Protocol
Let’s treat every messaging protocol as a thin layer on top of Pulsar’s stream storage
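The "thin layer" idea can be sketched with a toy model: one shared append-only log per topic, plus two small handlers that translate differently shaped requests into the same storage calls. This is only an illustration under stated assumptions. `StreamStorage`, `KafkaStyleHandler`, and `MqttStyleHandler` are hypothetical names, the request shapes are invented, and none of this reflects Pulsar's actual protocol handler interface.

```python
from collections import defaultdict


class StreamStorage:
    """Toy stand-in for shared stream storage: one append-only log per topic."""

    def __init__(self):
        self._logs = defaultdict(list)

    def append(self, topic: str, payload: bytes) -> int:
        self._logs[topic].append(payload)
        return len(self._logs[topic]) - 1  # position of the new entry

    def read(self, topic: str, position: int) -> list:
        return self._logs[topic][position:]


class KafkaStyleHandler:
    """Hypothetical handler: produce/fetch requests with byte values and offsets."""

    def __init__(self, storage: StreamStorage):
        self.storage = storage

    def produce(self, record: dict) -> int:
        return self.storage.append(record["topic"], record["value"])

    def fetch(self, topic: str, offset: int) -> list:
        return self.storage.read(topic, offset)


class MqttStyleHandler:
    """Hypothetical handler: publish/receive calls with text messages."""

    def __init__(self, storage: StreamStorage):
        self.storage = storage

    def publish(self, topic: str, message: str) -> int:
        return self.storage.append(topic, message.encode("utf-8"))

    def receive(self, topic: str, start: int = 0) -> list:
        return [p.decode("utf-8") for p in self.storage.read(topic, start)]
```

If both handlers are constructed over the same `StreamStorage` instance, a record written with `produce` can be read back with `receive`, and the reverse holds as well. The data is stored once, and each handler only translates between its own request shape and the shared log.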
Pulsar Protocol Handlers
The *oP Kafka-on-Pulsar (StreamNative, OVHcloud): open-sourced in March 2020. AMQP-on-Pulsar (StreamNative, China Mobile): announced in June 2020. MQTT-on-Pulsar (StreamNative): announced in September 2020.
The AMQP Protocol The Advanced Message Queuing Protocol (AMQP) is an open standard application layer protocol for message-oriented middleware. The defining features of AMQP are message orientation, queuing, routing (including point-to-point and publish-and-subscribe), reliability, and security. Initially developed at JP Morgan Chase, AMQP is widely adopted across multiple industries to exchange critical business information reliably.
AoP Basics
AMQP-on-Pulsar Both the AMQP Proxy and protocol handler run alongside the Pulsar brokers. Allows AMQP clients to publish and consume from an Apache Pulsar cluster.
AMQP Proxy An optional component that extends AMQP support horizontally across Pulsar Brokers. Forwards messages between the AMQP clients and the Brokers. Therefore, clients only need to connect to the proxy regardless of topic assignment.
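The forwarding behavior described here can be sketched as a lookup followed by a hand-off. In this toy model the `Broker` and `Proxy` classes, the static `ownership` map, and the `reassign` method are hypothetical simplifications; a real cluster resolves topic ownership dynamically, and that mechanism is not shown.

```python
class Broker:
    """Toy broker that records the messages it receives for topics it owns."""

    def __init__(self, name: str):
        self.name = name
        self.received = []

    def handle(self, topic: str, message: str) -> str:
        self.received.append((topic, message))
        return self.name  # report which broker served the request


class Proxy:
    """Hypothetical proxy: find the owning broker, then forward the message."""

    def __init__(self, brokers, ownership):
        self._brokers = {b.name: b for b in brokers}
        self._ownership = dict(ownership)  # topic -> name of the owning broker

    def lookup(self, topic: str) -> Broker:
        return self._brokers[self._ownership[topic]]

    def publish(self, topic: str, message: str) -> str:
        return self.lookup(topic).handle(topic, message)

    def reassign(self, topic: str, broker_name: str) -> None:
        # Stand-in for the cluster moving a topic to another broker.
        self._ownership[topic] = broker_name
```

The client calls `proxy.publish(...)` in exactly the same way before and after `reassign`, which is the point of the slide: clients connect to the proxy and never need to know which broker currently owns a topic.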
The MQTT Protocol Message Queuing Telemetry Transport (MQTT) is a lightweight publish-subscribe messaging transport protocol. It is ideal for connecting remote devices with a small code footprint and minimal network bandwidth to provide real-time and reliable messaging services. As a low-overhead and low-bandwidth real-time communication protocol, MQTT today is adopted widely across multiple industries, such as Industrial IoT, smart devices, etc.
MQTT on Pulsar Both the MQTT Proxy and MQTT protocol handler run alongside the Pulsar brokers. Allows MQTT clients to publish and consume from an Apache Pulsar cluster.
MoP Proxy An optional component that extends MQTT support horizontally across Pulsar Brokers. Forwards messages between the MQTT clients and the Brokers. Therefore, clients only need to connect to the proxy regardless of topic assignment.
The Kafka Protocol The Kafka messaging system has its own transport protocol, which is the only way for clients to send data to and receive data from Kafka. Given Kafka’s popularity, this protocol is adopted widely across multiple industries. A significant number of custom applications have been developed around the Kafka protocol and its open-source frameworks, such as KStreams, KTables, etc. KoP makes it easier to port these apps over with minimal code changes.
Kafka on Pulsar Just the Kafka protocol handler runs alongside the Pulsar brokers. Allows Kafka clients to publish and consume from an Apache Pulsar cluster.
The Group Coordinator/Offsets Problem In Kafka, the group coordinator is an elected actor within the cluster responsible for: assigning partitions to the consumers of a consumer group; managing offsets for each consumer group. In Pulsar, partition assignment is managed by the broker on a per-partition basis. Offset management is done by the broker that owns the partition, which stores the acknowledgments in cursors. Simulating this at the proxy level is hard because the proxy lacks the low-level information.
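The difference between the two bookkeeping models can be sketched as follows. Both classes are hypothetical toys: `KafkaStyleOffsets` keeps a single committed number per group and partition, while `CursorSketch` keeps a mark-delete position plus the set of entries acknowledged beyond it. Real cursors are persisted by the owning broker and are considerably more involved; this only shows the shape of the state that has to be reconciled.

```python
class KafkaStyleOffsets:
    """One committed offset per (group, partition)."""

    def __init__(self):
        self._committed = {}

    def commit(self, group: str, partition: int, offset: int) -> None:
        self._committed[(group, partition)] = offset

    def committed(self, group: str, partition: int) -> int:
        return self._committed.get((group, partition), 0)


class CursorSketch:
    """Toy cursor: a mark-delete position plus individual acks ahead of it."""

    def __init__(self):
        self.mark_delete = -1      # every position <= mark_delete is acknowledged
        self._acked_ahead = set()  # acknowledged positions beyond mark_delete

    def ack(self, position: int) -> None:
        if position <= self.mark_delete:
            return  # already covered by the mark-delete position
        self._acked_ahead.add(position)
        # Advance mark_delete across any now-contiguous run of acks.
        while self.mark_delete + 1 in self._acked_ahead:
            self.mark_delete += 1
            self._acked_ahead.remove(self.mark_delete)

    def is_acked(self, position: int) -> bool:
        return position <= self.mark_delete or position in self._acked_ahead
```

Acknowledging positions 0 and 2 leaves a gap at position 1, so `mark_delete` stays at 0 until position 1 is acknowledged as well. A single committed offset cannot express that gap, which hints at why the translation needs state held by the broker that owns the partition.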
The JMS Client Library Starlight for JMS implements the JMS 2.0 (Java Message Service) API over the Apache Pulsar Java Client. It is a drop-in client library, so no protocol handler is required.
Demonstration
Overview of Demo Start Pulsar using docker-compose. Start microservices for Pulsar, Kafka, and JMS that publish to and consume from the same topic. Observe that the messages are exchanged regardless of the protocol used to send them. Code: [email protected]:david-streamlio/multi-protocol-pulsar.git
Summary
Summary Traditionally, messaging systems have used different protocols to produce and consume messages. Different messaging protocols are suited to different use cases; for example, MQTT is excellent for IoT data streaming, while AMQP is widely used in enterprise messaging. Many enterprises have legacy systems that rely on older messaging protocols such as JMS. Pulsar’s protocol handlers provide an abstraction over its infinite stream storage that allows it to communicate over multiple protocols.
Publish and consume messages at scale from anywhere using any protocol and language.
Let’s Keep in Touch! David Kjerrumgaard, Developer Advocate. @davidkjerrumga1 https://www.linkedin.com/in/davidkj/ https://github.com/davidkj
Q&A