MQTT over TCP

Introduction

In this article I will try to understand the flow of a single MQTT connection. For those who don’t know what MQTT is, its main page, mqtt.org, has reliable information about the protocol along with several use cases. I personally recommend checking it out.

Prerequisites

  1. Docker
  2. emqx/emqx Docker image
  3. Python 3
  4. paho-mqtt Python library
  5. Wireshark

Setup

With Docker and Python 3 installed, run the following commands:

  • Download the emqx/emqx Docker image
  docker pull emqx/emqx
  • Run the image as a container, publishing port 1883 (MQTT over TCP) and port 18083 (EMQX dashboard)
  docker run -d --name emqx -p 18083:18083 -p 1883:1883 emqx/emqx:latest
  • Make sure the container is up without problems
  docker ps -a

Check the STATUS column of the emqx container; it should say Up.

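  • Install the paho-mqtt Python library (the script below uses the 1.x API; paho-mqtt 2.0 changed the Client constructor, so pin the older release)
  pip install "paho-mqtt<2.0"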
  • Run Wireshark
  sudo wireshark

After all this, we should be good to go.

Preparing python scripts

The Python scripts are the same ones you can find in How to use MQTT in Python (Paho), with one modification: instead of sending many messages in a loop, the publisher sends only one. All the analysis is done with the publisher script, but you can perform the same task with the subscriber script.

# pub.py modified from original

import random
import time
from paho.mqtt import client as mqtt_client

# Configuration
broker = '127.0.0.1'  # the EMQX container publishes port 1883 on this host
port = 1883
topic = "/python/mqtt"
client_id = f"python-mqtt-{random.randint(0, 1000)}"

def connect_mqtt():
    """
    connect_mqtt: handle the connection.
    """
    def on_connect(client, userdata, flags, rc):
        if rc == 0:
            print("Connected to MQTT Broker!")
        else:
            print(f"Failed to connect, return code {rc}")

    client = mqtt_client.Client(client_id)  # paho-mqtt 1.x constructor signature
    client.on_connect = on_connect
    client.connect(broker, port)
    return client

def publish(client):
    """
    publish: publish data to the broker.
    """
    msg_count = 0    
    time.sleep(1)
    msg = f"messages: {msg_count}"
    result = client.publish(topic, msg)
    # result is an MQTTMessageInfo: result[0] is the return code (0 = success), result[1] the message id
    status = result[0]
    if status == 0:
        print(f"Send `{msg}` to topic: `{topic}`")
    else:
        print(f"Failed to send message to topic `{topic}`")
    msg_count += 1

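def run():
    client = connect_mqtt()
    client.loop_start()  # run the network loop in a background thread
    publish(client)
    time.sleep(1)  # give the background thread time to send the PUBLISH before the script exits
    client.loop_stop()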

if __name__ == '__main__':
    run()

General overview of this setup

If you are new to MQTT, this whole setup may look pretty strange. I’ll do my best to explain why all of it is needed; a sketch made with Excalidraw will help you understand.

Pub, EMQX, Sub

According to this sketch, both the publisher and the subscriber are connected to the broker, emqx/emqx. Here I’ll look at only one part of this communication: the interaction between the publisher and the broker.

Looking inside the packets

It’s time to look at the packets of this communication, to get at least an idea of how it works. If you followed all the steps above, EMQX should be running and waiting for connections on port 1883. Make sure you also select the appropriate network interface in Wireshark, as I explained in the tutorial about TCP. The display filter in this case is tcp.port == 1883 || mqtt. With all this in place, it is time to run the pub.py script and see what happens.

python pub.py

Output:

Connected to MQTT Broker!
Send `messages: 0` to topic: `/python/mqtt`

Reading the output, it seems the publisher just connected and sent the message to the topic /python/mqtt. Mmm… 🤔 but that’s not all: there is a lot of back and forth between the publisher and EMQX. Let’s explore the packets we captured with Wireshark.

After running the script you should be seeing something like this.

Traffic mqtt and tcp.port 1883

As I said, the message is not simply sent.

TCP handshake

The first three packets you can see in Wireshark are the TCP three-way handshake (SYN, SYN/ACK, ACK) between the publisher and EMQX.

TCP handshake
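The handshake is handled by the operating system’s TCP stack, not by paho-mqtt or by our script: a blocking connect() returns only after SYN, SYN/ACK and ACK have been exchanged. A minimal sketch with Python’s standard socket module illustrates this. To run without EMQX, it assumes a throwaway listener on a free loopback port as a stand-in for the broker; open_tcp_connection is a hypothetical helper name.

```python
import socket

def open_tcp_connection(host: str, port: int) -> socket.socket:
    """Blocking connect: returns once SYN, SYN/ACK and ACK have been exchanged."""
    return socket.create_connection((host, port), timeout=5)

# Hypothetical stand-in for the broker: a listener on a free loopback port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
listener.listen(1)
port = listener.getsockname()[1]

client = open_tcp_connection("127.0.0.1", port)
server_side, client_addr = listener.accept()

# The connection is established on both ends, yet no application data was sent.
client_local = client.getsockname()
peer = client.getpeername()
print(f"established {client_local} -> {peer}")

for sock in (server_side, client, listener):
    sock.close()
```

Pointing open_tcp_connection at ("127.0.0.1", 1883) while capturing should produce the same three handshake packets with no MQTT traffic after them, only the connection teardown once the socket is closed.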

CONNECT command

connect pub emqx

Once the handshake is complete, the publisher sends EMQX a CONNECT command. This is something new: CONNECT is part of the MQTT protocol itself, and it must be the first packet a client sends after opening the network connection. Let’s look inside the packet.

CONNECT command mqtt

Mmm… 🤔 it is not easy to get an idea from this picture alone, right? My personal recommendation is to read it side by side with the MQTT specification. If you are too lazy to read a specification, I understand; stay with me, but question everything I say, because that will help you think more critically about this subject. The specification of any protocol is a very dense and extremely technical document, so don’t worry if you don’t understand everything. For our simple case, look at section 2, MQTT Control Packet format, and skim section 1, Introduction.

According to the specification, an MQTT Control Packet is structured in the following way:

  • Fixed Header, present in all MQTT Control Packets
  • Variable Header, present in some MQTT Control Packets
  • Payload, present in some MQTT Control Packets
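To make this structure concrete, here is a minimal sketch (standard library only; parse_fixed_header and PACKET_TYPES are my own names) that decodes the Fixed Header of a raw MQTT 3.1.1 packet. The first byte holds the packet type in its upper four bits and type-specific flags in the lower four. It is followed by the Remaining Length, a variable-length integer of one to four bytes that stores seven bits per byte and uses the high bit to signal that another byte follows.

```python
# Packet type numbers from the MQTT 3.1.1 specification (0 and 15 are reserved).
PACKET_TYPES = {
    1: "CONNECT", 2: "CONNACK", 3: "PUBLISH", 4: "PUBACK",
    5: "PUBREC", 6: "PUBREL", 7: "PUBCOMP", 8: "SUBSCRIBE",
    9: "SUBACK", 10: "UNSUBSCRIBE", 11: "UNSUBACK",
    12: "PINGREQ", 13: "PINGRESP", 14: "DISCONNECT",
}

def parse_fixed_header(data: bytes) -> tuple[str, int, int, int]:
    """Return (packet type name, flags, remaining length, fixed header size)."""
    packet_type = data[0] >> 4   # upper four bits
    flags = data[0] & 0x0F       # lower four bits
    remaining = 0
    multiplier = 1
    index = 1
    while True:
        if index > 4:  # the Remaining Length uses at most four bytes
            raise ValueError("malformed Remaining Length")
        byte = data[index]
        remaining += (byte & 0x7F) * multiplier  # low seven bits carry the value
        multiplier *= 128
        index += 1
        if byte & 0x80 == 0:  # high bit clear: this was the last length byte
            break
    return PACKET_TYPES.get(packet_type, "RESERVED"), flags, remaining, index

print(parse_fixed_header(bytes([0x20, 0x02, 0x00, 0x00])))
```

For instance, the bytes 20 02 at the start of a CONNACK decode to ("CONNACK", 0, 2, 2): packet type 2, no flags, two more bytes to read, and a two-byte fixed header.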

Now back to the picture. Can we identify some of these parts? I’ll do my best. Comparing the screenshot of the traffic captured by Wireshark with the specification, the CONNECT packet breaks down like this:

  • Fixed Header: the Header Flags byte, whose upper four bits give the Message Type (in our case a Connect Command, type 1), followed by Msg Len, the Remaining Length, that is, the number of bytes in the variable header and payload.
  • Variable Header: Protocol Name, Version, Connect Flags and Keep Alive. The Connect Flags are specific to our Message Type (Clean Session, User Name, Password and the Will settings). Wireshark also shows a QoS level among them; in CONNECT this is the QoS of the Will message, but QoS, which stands for Quality of Service, is a really important concept in MQTT. It can take three values: 0, 1 or 2. Section 4.3 of the specification, Quality of Service levels and protocol flows, is dedicated to it; check it out to get an idea of its importance.
  • Payload: the Client ID.

The Protocol Name and Version identify the protocol, like saying “hey, I’m speaking MQTT v3.1.1”, so the other side can return an appropriate response. The Client ID, as the name suggests, is how the broker identifies the client on the other end of the connection.
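To see how those fields turn into bytes on the wire, here is a minimal sketch that assembles an MQTT 3.1.1 CONNECT packet by hand with the standard library. It assumes values similar to paho-mqtt’s defaults (clean session on, keep alive of 60 seconds, no user name, password or Will); the helper names and the client ID python-mqtt-42 are hypothetical, since pub.py picks a random ID.

```python
import struct

def encode_remaining_length(length: int) -> bytes:
    """Encode the Remaining Length as MQTT's variable-length integer."""
    out = bytearray()
    while True:
        byte = length % 128
        length //= 128
        if length > 0:
            byte |= 0x80  # high bit set: more length bytes follow
        out.append(byte)
        if length == 0:
            return bytes(out)

def encode_string(text: str) -> bytes:
    """MQTT UTF-8 string: 2-byte big-endian length prefix, then the bytes."""
    raw = text.encode("utf-8")
    return struct.pack("!H", len(raw)) + raw

def build_connect(client_id: str, keep_alive: int = 60) -> bytes:
    # Variable header: Protocol Name, Protocol Level, Connect Flags, Keep Alive.
    variable_header = (
        encode_string("MQTT")            # Protocol Name
        + bytes([4])                     # Protocol Level 4 = MQTT 3.1.1
        + bytes([0x02])                  # Connect Flags: only Clean Session set
        + struct.pack("!H", keep_alive)  # Keep Alive in seconds
    )
    payload = encode_string(client_id)   # Client ID
    body = variable_header + payload
    # Fixed header: type 1 (CONNECT) in the upper nibble, flags 0, then length.
    return bytes([0x10]) + encode_remaining_length(len(body)) + body

packet = build_connect("python-mqtt-42")
print(packet.hex(" "))
```

The printed packet starts with 10 1a: Header Flags 0x10 (CONNECT) and a Remaining Length of 26, which is the Msg Len Wireshark reports for a 14-character Client ID. Next come 00 04 and the bytes of MQTT, the protocol level 04, the Connect Flags 02 and the keep alive 00 3c.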

EMQX TCP ACK

After the CONNECT command, EMQX sends a TCP ACK acknowledging the bytes it received. It is an ordinary TCP segment that carries no MQTT data.

emqx ack tcp

EMQX CONNECT ACK command

emqx conn ack mqtt

Now, at the MQTT level, EMQX sends the publisher a CONNACK (Connect Acknowledgment) packet. It is the equivalent of saying “OK, I acknowledge your connection attempt”, in the MQTT sense only, because, as we saw before, both sides are already connected over TCP. They are just agreeing to “talk MQTT” from now on. Here’s a screenshot of how Wireshark shows it.

Conn ack command emqx, packet

Note the Return Code: its value is 0, and thanks to Wireshark we can read it in human-readable form as “Connection accepted”.
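The CONNACK itself is tiny: the fixed header 0x20 with a Remaining Length of 2, a byte whose lowest bit is the Session Present flag, and the Return Code. A minimal sketch of how a client could interpret it, using the return codes listed for CONNACK in the MQTT 3.1.1 specification (parse_connack is a hypothetical helper):

```python
# Connect Return codes defined for CONNACK in MQTT 3.1.1 (6 to 255 are reserved).
RETURN_CODES = {
    0: "Connection accepted",
    1: "Connection refused: unacceptable protocol version",
    2: "Connection refused: identifier rejected",
    3: "Connection refused: server unavailable",
    4: "Connection refused: bad user name or password",
    5: "Connection refused: not authorized",
}

def parse_connack(data: bytes) -> tuple[bool, int, str]:
    """Return (session present, return code, description) for a CONNACK."""
    if len(data) != 4 or data[0] != 0x20 or data[1] != 0x02:
        raise ValueError("not an MQTT 3.1.1 CONNACK packet")
    session_present = bool(data[2] & 0x01)  # only bit 0 is defined
    return_code = data[3]
    return session_present, return_code, RETURN_CODES.get(return_code, "Reserved")

print(parse_connack(bytes([0x20, 0x02, 0x00, 0x00])))
```

This return code is the rc value that the on_connect callback in pub.py compares with 0.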

Publisher sends a TCP ACK and EMQX responds with a TCP ACK

publisher ack tcp

emqx ack tcp again

More TCP chatter between the two endpoints before the actual message is sent.

Publisher sends the Publish Message command

publisher pub msg

Now the publisher actually sends the data to EMQX.

pub msg packet

This is where all the important information goes; take a look at the picture. We have Msg Len, Topic Length, Topic and the actual Message. It is in this packet that the publisher really sends its data: it specifies the Topic and the message, along with header flags such as the QoS level. In this way, EMQX can deliver the message to any subscriber that is connected to it and “subscribed” to this specific topic.
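The same exercise for the PUBLISH packet, as a minimal sketch. It assumes QoS 0, which is what pub.py uses since paho’s publish() defaults to qos=0: with QoS 0 the DUP, QoS and RETAIN bits of the first byte are all zero and there is no Packet Identifier, so the variable header is just the length-prefixed topic name. encode_remaining_length and build_publish are hypothetical helpers.

```python
import struct

def encode_remaining_length(length: int) -> bytes:
    """Variable-length integer: 7 bits per byte, high bit means 'more bytes follow'."""
    out = bytearray()
    while True:
        byte = length % 128
        length //= 128
        if length > 0:
            byte |= 0x80
        out.append(byte)
        if length == 0:
            return bytes(out)

def build_publish(topic: str, message: bytes) -> bytes:
    """Build a QoS 0 PUBLISH packet: fixed header, topic name, payload."""
    topic_raw = topic.encode("utf-8")
    # Variable header: Topic Length (2 bytes, big-endian) + Topic.
    # With QoS 0 there is no Packet Identifier after the topic.
    variable_header = struct.pack("!H", len(topic_raw)) + topic_raw
    body = variable_header + message  # the payload is sent as-is
    # 0x30: packet type 3 (PUBLISH); DUP, QoS and RETAIN bits all zero.
    return bytes([0x30]) + encode_remaining_length(len(body)) + body

packet = build_publish("/python/mqtt", "messages: 0".encode("utf-8"))
print(packet.hex(" "))
```

For the topic /python/mqtt and the message messages: 0, the packet is 27 bytes long: 30 (PUBLISH, QoS 0), 19 (Msg Len 25), 00 0c (Topic Length 12), then the topic and the message. These should line up with the fields Wireshark shows in the screenshot above.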

Conclusion

We saw how chatty protocols are, especially this one. Here we analyzed MQTT over TCP, but another case deserves our attention: MQTT over WebSocket. We didn’t cover the subscriber side here, but you should try it yourself.

That’s all 👋

Bibliography

  1. EMQX Docker image
  2. MQTT specification
  3. HiveMQ
  4. How to use MQTT in Python (Paho)