RabbitMQ Stream PerfTest is a Java tool to test the RabbitMQ Stream Plugin.

Getting Started

Stream PerfTest is based on the stream Java client and uses the stream protocol to communicate with a RabbitMQ cluster. Use PerfTest if you want to test streams or queues with AMQP 0.9.1.

Stream PerfTest is usable as an uber JAR downloadable from GitHub Release or as a Docker image. It can be built separately as well.

Snapshots are on GitHub release as well. Use the pivotalrabbitmq/stream-perf-test:dev image to use the latest snapshot in Docker.

Pre-requisites

Stream PerfTest requires Java 11 or later to run.

With Docker

The performance tool is available as a Docker image. You can use the Docker image to list the available options:

Listing the available options of the performance tool
docker run -it --rm pivotalrabbitmq/stream-perf-test --help

Many options are available. If none is provided, the tool will start publishing to and consuming from a stream created only for the test.

When using Docker, the container running the performance tool must be able to connect to the broker, so you have to figure out the appropriate Docker configuration to make this possible. You can have a look at the Docker network documentation to find out more.

Docker on macOS

Docker runs on a virtual machine when using macOS, so do not expect high performance when using RabbitMQ Stream and the performance tool inside Docker on a Mac.

We show next a couple of options to easily use the Docker image.

With Docker Host Network Driver

This is the simplest way to run the image locally, with a local broker running in Docker as well. The containers use the host network, which is perfect for experimenting locally.

Running the broker and performance tool with the host network driver
# run the broker
docker run -it --rm --name rabbitmq --network host rabbitmq:3.12
# open another terminal and enable the stream plugin
docker exec rabbitmq rabbitmq-plugins enable rabbitmq_stream
# run the performance tool
docker run -it --rm --network host pivotalrabbitmq/stream-perf-test

Docker Host Network Driver Support

According to Docker’s documentation, the host networking driver only works on Linux hosts. Nevertheless, the commands above work on some Mac hosts.

With Docker Bridge Network Driver

With the bridge network driver, containers need to be able to communicate with each other. This can be done by defining a network and running the containers in this network.

Running the broker and performance tool with the bridge network driver
# create a network
docker network create stream-perf-test
# run the broker
docker run -it --rm --network stream-perf-test --name rabbitmq rabbitmq:3.12
# open another terminal and enable the stream plugin
docker exec rabbitmq rabbitmq-plugins enable rabbitmq_stream
# run the performance tool
docker run -it --rm --network stream-perf-test pivotalrabbitmq/stream-perf-test \
    --uris rabbitmq-stream://rabbitmq:5552

With the Java Binary

The Java binary is available on GitHub Release. Snapshots are available as well. To use the latest snapshot:

wget https://github.com/rabbitmq/rabbitmq-java-tools-binaries-dev/releases/download/v-stream-perf-test-latest/stream-perf-test-latest.jar

To launch a run:

$ java -jar stream-perf-test-latest.jar
17:51:26.207 [main] INFO  c.r.stream.perf.StreamPerfTest - Starting producer
1, published 560277 msg/s, confirmed 554088 msg/s, consumed 556983 msg/s, latency min/median/75th/95th/99th 2663/9799/13940/52304/57995 µs, chunk size 1125
2, published 770722 msg/s, confirmed 768209 msg/s, consumed 768585 msg/s, latency min/median/75th/95th/99th 2454/9599/12206/23940/55519 µs, chunk size 1755
3, published 915895 msg/s, confirmed 914079 msg/s, consumed 916103 msg/s, latency min/median/75th/95th/99th 2338/8820/11311/16750/52985 µs, chunk size 2121
4, published 1004257 msg/s, confirmed 1003307 msg/s, consumed 1004981 msg/s, latency min/median/75th/95th/99th 2131/8322/10639/14368/45094 µs, chunk size 2228
5, published 1061380 msg/s, confirmed 1060131 msg/s, consumed 1061610 msg/s, latency min/median/75th/95th/99th 2131/8247/10420/13905/37202 µs, chunk size 2379
6, published 1096345 msg/s, confirmed 1095947 msg/s, consumed 1097447 msg/s, latency min/median/75th/95th/99th 2131/8225/10334/13722/33109 µs, chunk size 2454
7, published 1127791 msg/s, confirmed 1127032 msg/s, consumed 1128039 msg/s, latency min/median/75th/95th/99th 1966/8150/10172/13500/23940 µs, chunk size 2513
8, published 1148846 msg/s, confirmed 1148086 msg/s, consumed 1149121 msg/s, latency min/median/75th/95th/99th 1966/8079/10135/13248/16771 µs, chunk size 2558
9, published 1167067 msg/s, confirmed 1166369 msg/s, consumed 1167311 msg/s, latency min/median/75th/95th/99th 1966/8063/9986/12977/16757 µs, chunk size 2631
10, published 1182554 msg/s, confirmed 1181938 msg/s, consumed 1182804 msg/s, latency min/median/75th/95th/99th 1966/7963/9949/12632/16619 µs, chunk size 2664
11, published 1197069 msg/s, confirmed 1196495 msg/s, consumed 1197291 msg/s, latency min/median/75th/95th/99th 1966/7917/9955/12503/15386 µs, chunk size 2761
12, published 1206687 msg/s, confirmed 1206176 msg/s, consumed 1206917 msg/s, latency min/median/75th/95th/99th 1966/7893/9975/12503/15280 µs, chunk size 2771
...
^C
Summary: published 1279444 msg/s, confirmed 1279019 msg/s, consumed 1279019 msg/s, latency 95th 12161 µs, chunk size 2910

The previous command will start publishing to and consuming from a stream named stream, which will be created. The tool outputs live metrics on the console and writes more detailed metrics to a stream-perf-test-current.txt file that gets renamed to stream-perf-test-yyyy-MM-dd-HHmmss.txt when the run ends.

To see the options:

java -jar stream-perf-test-latest.jar --help

The performance tool also comes with a completion script. You can download it and enable it in your ~/.zshrc file:

alias stream-perf-test='java -jar target/stream-perf-test.jar'
source ~/.zsh/stream-perf-test_completion

Note the activation requires an alias which must be stream-perf-test. The command can be anything though.

Common Usage

Connection

The performance tool connects by default to localhost, on port 5552, with default credentials (guest/guest), on the default / virtual host. This can be changed with the --uris option:

java -jar stream-perf-test.jar --uris rabbitmq-stream://rabbitmq-1:5552

The URI follows the same rules as the AMQP 0.9.1 URI, except the protocol must be rabbitmq-stream. The next command shows how to set up the different elements of the URI:

java -jar stream-perf-test.jar \
  --uris rabbitmq-stream://guest:guest@localhost:5552/%2f

The option accepts several values, separated by commas. By doing so, the tool will be able to pick another URI for its "locator" connection, in case a node crashes:

java -jar stream-perf-test.jar \
  --uris rabbitmq-stream://rabbitmq-1:5552,rabbitmq-stream://rabbitmq-2:5552

Note the tool uses those URIs only for management purposes. It does not use them to distribute publishers and consumers across a cluster.

It is also possible to enable TLS by using the rabbitmq-stream+tls scheme:

java -jar stream-perf-test.jar \
  --uris rabbitmq-stream+tls://guest:guest@localhost:5551/%2f

Note the performance tool will automatically configure the client to trust all server certificates and to not use a private key (for client authentication).

Have a look at the connection logic section of the stream Java client documentation if you run into connection problems.

Publishing Rate

It is possible to limit the publishing rate with the --rate option:

java -jar stream-perf-test.jar --rate 10000

RabbitMQ Stream can easily saturate the resources of the hardware; in particular, it can max out the storage IO. Reasoning about a system under severe constraints can be difficult, so setting a low publishing rate can be a good idea to get familiar with the performance tool and the semantics of streams.

Number of Producers and Consumers

You can set the number of producers and consumers with the --producers and --consumers options, respectively:

java -jar stream-perf-test.jar --producers 5 --consumers 5

With the previous command, you should see a higher consuming rate than publishing rate. This is because the 5 producers publish as fast as they can and each consumer consumes the messages from all 5 producers. In theory the consuming rate should be 5 times the publishing rate, but as stated previously, the performance tool may put the broker under severe constraints, so the numbers may not add up.

You can set a low publishing rate to verify this theory:

java -jar stream-perf-test.jar --producers 5 --consumers 5 --rate 10000

With the previous command, each publisher should publish 10,000 messages per second, that is 50,000 messages per second overall. As each consumer consumes every published message, the consuming rate should be 5 times the publishing rate, that is 250,000 messages per second. Using a low publishing rate should leave plenty of resources to the system, so the rates should tend towards those values.
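
The arithmetic above can be written down as a short Java sketch. It only illustrates the theoretical figures for a run where all producers and consumers work on the same stream; the class and method names are made up for this example and are not part of the performance tool.

```java
// Hypothetical helper, not part of Stream PerfTest: theoretical rates for a
// run where every consumer receives every published message.
final class ExpectedRates {

  // Overall publishing rate: each producer publishes at the --rate value.
  static long publishRate(int producers, long ratePerProducer) {
    return producers * ratePerProducer;
  }

  // Overall consuming rate: each consumer receives the whole published flow.
  static long consumeRate(int producers, int consumers, long ratePerProducer) {
    return consumers * publishRate(producers, ratePerProducer);
  }

  public static void main(String[] args) {
    // Equivalent of --producers 5 --consumers 5 --rate 10000
    System.out.println(publishRate(5, 10_000));    // 50000
    System.out.println(consumeRate(5, 5, 10_000)); // 250000
  }
}
```

Measured rates will only approach these values when the system has resources to spare.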

Streams

The performance tool uses a stream named stream by default. The --streams option allows specifying streams that the tool will try to create. Note producer and consumer counts must be set accordingly, as they are not spread across the streams automatically. The following command will run a test with 3 streams, with a producer and a consumer on each of them:

java -jar stream-perf-test.jar --streams stream1,stream2,stream3 \
                               --producers 3 --consumers 3

The stream creation process has the following semantics:

  • the tool always tries to create streams.

  • if the target streams already exist and have the exact same properties as the ones the tool uses (see retention below), the run will start normally as stream creation is idempotent.

  • if the target streams already exist but do not have the exact same properties as the ones the tool uses, the run will start normally as well, but the tool will output a warning.

  • for any other errors during creation, the run will stop.

  • the streams are not deleted after the run.

  • if you want the tool to delete the streams after a run, use the --delete-streams flag.

Specifying streams one by one can become tedious as their number grows, so the --stream-count option can be combined with the --streams option to specify a number or a range and a stream name pattern, respectively. The following table shows the usage of these 2 options and the resulting exercised streams. Do not forget to also specify the appropriate number of producers and consumers if you want all the declared streams to be used.

Options                                     | Computed Streams                              | Details
--stream-count 5 --streams stream           | stream-1,stream-2,stream-3,stream-4,stream-5  | Stream count starts at 1.
--stream-count 5 --streams stream-%d        | stream-1,stream-2,stream-3,stream-4,stream-5  | Possible to specify a Java printf-style format string.
--stream-count 10 --streams stream-%d       | stream-1,stream-2,stream-3,…,stream-10        | Not bad, but not correctly sorted alphabetically.
--stream-count 10 --streams stream-%02d     | stream-01,stream-02,stream-03,…,stream-10     | Better for sorting.
--stream-count 10 --streams stream          | stream-01,stream-02,stream-03,…,stream-10     | The default format string handles the sorting issue.
--stream-count 50-500 --streams stream-%03d | stream-050,stream-051,stream-052,…,stream-500 | Ranges are accepted.
--stream-count 50-500                       | stream-050,stream-051,stream-052,…,stream-500 | Default format string.
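
The way a printf-style pattern expands into stream names can be reproduced with String.format. The sketch below is only an illustration and not the tool's source code: the class and method names are hypothetical, and defaultPattern is a guess inferred from the table (zero-padding to the number of digits of the upper bound).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the tool's source code.
final class StreamNames {

  // Expands a printf-style pattern over an inclusive range of stream numbers.
  static List<String> expand(String pattern, int from, int to) {
    List<String> names = new ArrayList<>();
    for (int i = from; i <= to; i++) {
      names.add(String.format(pattern, i));
    }
    return names;
  }

  // Assumed default: zero-pad to the number of digits of the upper bound.
  static String defaultPattern(String prefix, int to) {
    int width = String.valueOf(to).length();
    return prefix + "-%0" + width + "d";
  }

  public static void main(String[] args) {
    // Like --stream-count 5 --streams stream-%d
    System.out.println(expand("stream-%d", 1, 5));
    // Like --stream-count 10 --streams stream
    System.out.println(expand(defaultPattern("stream", 10), 1, 10));
  }
}
```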

Publishing Batch Size

The default publishing batch size is 100, that is, a publishing frame is sent every 100 messages. The following command sets the batch size to 50 with the --batch-size option:

java -jar stream-perf-test.jar --batch-size 50

There is no ideal batch size; it is a tradeoff between throughput and latency. High batch size values should increase throughput (usually good) and latency (usually not so good), whereas low batch size values should decrease throughput (usually not good) and latency (usually good).

Unconfirmed Messages

A publisher can have at most 10,000 unconfirmed messages at any given time. If it reaches this value, it has to wait until the broker confirms some messages. This avoids fast publishers overwhelming the broker. The --confirms option allows changing the default value:

java -jar stream-perf-test.jar --confirms 20000

High values should increase throughput at the cost of consuming more memory, whereas low values should decrease throughput and memory consumption.
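
A cap on unconfirmed messages is commonly implemented with a counting semaphore: a permit is taken before each publish and given back on each confirm. The sketch below illustrates that general technique only. It is not the client's or the tool's implementation, and the class and method names are hypothetical.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch of a cap on unconfirmed messages.
final class UnconfirmedLimiter {

  private final Semaphore permits;

  UnconfirmedLimiter(int maxUnconfirmed) {
    this.permits = new Semaphore(maxUnconfirmed);
  }

  // Call before sending a message; blocks while the cap is reached.
  void beforePublish() throws InterruptedException {
    permits.acquire();
  }

  // Non-blocking variant: returns false when the cap is reached.
  boolean tryBeforePublish() {
    return permits.tryAcquire();
  }

  // Call when the broker confirms a message; frees one slot.
  void onConfirm() {
    permits.release();
  }

  // Number of messages that can still be sent before blocking.
  int available() {
    return permits.availablePermits();
  }
}
```

A larger cap means more messages held in memory while waiting for confirms, which matches the throughput and memory tradeoff described above.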

Message Size

The default size of a message is 10 bytes, which is rather small. The --size option lets you specify a different size, usually higher, to have a value close to your use case. The next command sets a size of 1 KB:

java -jar stream-perf-test.jar --size 1024

Note the message body size cannot be smaller than 8 bytes, as the performance tool stores a long in each message to calculate the latency. Note also the actual size of a message will be slightly higher, as the body is wrapped in an AMQP 1.0 message.
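
The 8-byte minimum follows from the size of a Java long. The sketch below shows one way a timestamp could be stored in a message body and read back to compute latency. The byte layout, the use of nanoseconds, and all names are assumptions made for this example, not the tool's actual code.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: a body whose first 8 bytes carry a timestamp.
final class LatencyPayload {

  // Builds a body of the requested size with the timestamp at the start.
  static byte[] body(int size, long timestampNanos) {
    if (size < Long.BYTES) {
      throw new IllegalArgumentException("Body must be at least 8 bytes");
    }
    byte[] body = new byte[size];
    ByteBuffer.wrap(body).putLong(timestampNanos);
    return body;
  }

  // Reads the timestamp back and returns the elapsed time.
  static long latencyNanos(byte[] body, long nowNanos) {
    return nowNanos - ByteBuffer.wrap(body).getLong();
  }

  public static void main(String[] args) {
    byte[] body = body(1024, System.nanoTime());
    System.out.println("Latency (ns): " + latencyNanos(body, System.nanoTime()));
  }
}
```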

Connection Pooling

The performance tool does not use connection pooling by default: each producer and consumer has its own connection. This can be appropriate to reach maximum throughput in performance test runs, as producers and consumers do not share connections. But it may not always reflect what applications do, as they may have slow producers and not-so-busy consumers, so sharing connections becomes interesting to save some resources.

It is possible to configure connection pooling with the --producers-by-connection and --consumers-by-connection options. They accept a value between 1 and 255, the default being 1 (no connection pooling).

In the following example we use 10 streams with 1 producer and 1 consumer on each of them. As the rate is low, we can re-use connections:

java -jar stream-perf-test.jar --producers 10 --consumers 10 --stream-count 10 \
                               --rate 1000 \
                               --producers-by-connection 50 --consumers-by-connection 50

We end up using 2 connections for the producers and consumers with connection pooling, instead of 20.
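
The connection count is a ceiling division. The sketch below reproduces the figures of this example under the simplifying assumption that any producer (or consumer) can share a connection with any other, which may not hold in a multi-node cluster. The names are hypothetical.

```java
// Hypothetical sketch of the connection count with pooling.
final class ConnectionCount {

  // Connections needed for `clients` when one connection serves up to
  // `perConnection` of them (ceiling division).
  static int connections(int clients, int perConnection) {
    return (clients + perConnection - 1) / perConnection;
  }

  public static void main(String[] args) {
    // --producers 10 --consumers 10 with 50 per connection
    System.out.println(connections(10, 50) + connections(10, 50)); // 2
    // Same run without pooling (1 per connection)
    System.out.println(connections(10, 1) + connections(10, 1));   // 20
  }
}
```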

Advanced Usage

Retention

If you run performance tests for a long time, you might be interested in setting a retention strategy for the streams the performance tool creates for a run. This would typically avoid saturating the storage devices of your servers. The default values are 20 GB for the maximum size of a stream and 500 MB for each segment file that makes up a stream. You can change these values with the --max-length-bytes and --stream-max-segment-size-bytes options:

java -jar stream-perf-test.jar --max-length-bytes 10gb \
                               --stream-max-segment-size-bytes 250mb

Both options accept units (kb, mb, gb, tb), as well as no unit to specify a number of bytes.

It is also possible to use the time-based retention strategy with the --max-age option. This can be less predictable than --max-length-bytes in the context of performance tests though. The following command shows how to set the maximum age of segments to 5 minutes with a maximum segment size of 250 MB:

java -jar stream-perf-test.jar --max-age PT5M \
                               --stream-max-segment-size-bytes 250mb

The --max-age option uses the ISO 8601 duration format.
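
For readers unfamiliar with the notation, java.time.Duration understands ISO 8601 durations made of days, hours, minutes, and seconds. The snippet below only shows what a few values mean; whether the tool parses the option exactly this way is an assumption, and the class name is hypothetical.

```java
import java.time.Duration;

// Hypothetical sketch: reading ISO 8601 durations with java.time.
final class MaxAgeExample {

  // Converts an ISO 8601 duration such as PT5M into seconds.
  static long seconds(String iso8601) {
    return Duration.parse(iso8601).getSeconds();
  }

  public static void main(String[] args) {
    System.out.println(seconds("PT5M"));    // 300 (5 minutes)
    System.out.println(seconds("PT1H30M")); // 5400 (1 hour 30 minutes)
    System.out.println(seconds("P1D"));     // 86400 (1 day)
  }
}
```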

Offset (Consumer)

Consumers start by default at the very end of a stream (offset next). It is possible to specify an offset to start from with the --offset option, if you have existing streams, and you want to consume from them at a specific offset. The following command sets the consumer to start consuming at the beginning of a stream:

java -jar stream-perf-test.jar --offset first

The accepted values for --offset are first, last, next (the default), an unsigned long for a given offset, and an ISO 8601 formatted timestamp (e.g. 2020-06-03T07:45:54Z).
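
To make the accepted forms concrete, the sketch below classifies a value into one of the five kinds listed above. It is an illustration written for this page, not the tool's argument parser; the class, enum, and method names are hypothetical.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;

// Hypothetical sketch, not the tool's actual argument parser.
final class OffsetArgument {

  enum Kind { FIRST, LAST, NEXT, ABSOLUTE, TIMESTAMP }

  static Kind classify(String value) {
    switch (value) {
      case "first": return Kind.FIRST;
      case "last": return Kind.LAST;
      case "next": return Kind.NEXT;
      default: break;
    }
    try {
      // An unsigned long is an absolute offset in the stream.
      Long.parseUnsignedLong(value);
      return Kind.ABSOLUTE;
    } catch (NumberFormatException e) {
      // Not a number, try a timestamp next.
    }
    try {
      // ISO 8601 date-time with an offset, e.g. 2020-06-03T07:45:54Z
      OffsetDateTime.parse(value);
      return Kind.TIMESTAMP;
    } catch (DateTimeParseException e) {
      throw new IllegalArgumentException("Unsupported offset value: " + value, e);
    }
  }

  public static void main(String[] args) {
    System.out.println(classify("first"));                // FIRST
    System.out.println(classify("42"));                   // ABSOLUTE
    System.out.println(classify("2020-06-03T07:45:54Z")); // TIMESTAMP
  }
}
```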

Offset Tracking (Consumer)

A consumer can track the point it has reached in a stream to be able to restart where it left off in a new incarnation. The performance tool has the --store-every option to tell consumers to store the offset every x messages to be able to measure the impact of offset tracking in terms of throughput and storage. This feature is disabled by default. The following command shows how to store the offset every 100,000 messages:

java -jar stream-perf-test.jar --store-every 100000

Consumer Names

When using --store-every (see above) for offset tracking, the performance tool uses a default name using the pattern {stream-name}-{consumer-number}. So the default name of a single tracking consumer consuming from stream will be stream-1.

The consumer names pattern can be set with the --consumer-names option, which uses the Java printf-style format string. The stream name and the consumer number are injected as arguments, in this order.

The following table illustrates some examples for the --consumer-names option for a s1 stream and a second consumer:

Option                       | Computed Name                        | Details
%s-%d                        | s1-2                                 | Default pattern.
stream-%s-consumer-%d        | stream-s1-consumer-2                 |
consumer-%2$d-on-stream-%1$s | consumer-2-on-stream-s1              | The argument indexes (1$ for the stream, 2$ for the consumer number) must be used as the pattern uses the consumer number first, which is not the pre-defined order of arguments.
uuid                         | 7cc75659-ea67-4874-96ef-151a505e1a55 | Random UUID that changes for every run.

Note you can use --consumer-names uuid to change the consumer names for every run. This can be useful when you want to use tracking consumers in different runs but you want to force the offset they start consuming from. With consumer names that do not change between runs, tracking consumers would ignore the specified offset and would start where they left off (this is the purpose of offset tracking).
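
The computed names in the table can be reproduced with String.format, passing the stream name and the consumer number in that order. The sketch below is an illustration with hypothetical names, not the tool's code. It also shows why a pattern without placeholders yields the same name for every consumer: extra arguments are ignored by the formatter.

```java
import java.util.UUID;

// Hypothetical sketch of consumer name computation.
final class ConsumerNames {

  // Arguments are injected in this order: stream name, consumer number.
  static String name(String pattern, String stream, int consumerNumber) {
    if ("uuid".equals(pattern)) {
      return UUID.randomUUID().toString();
    }
    return String.format(pattern, stream, consumerNumber);
  }

  public static void main(String[] args) {
    System.out.println(name("%s-%d", "s1", 2));                        // s1-2
    System.out.println(name("stream-%s-consumer-%d", "s1", 2));        // stream-s1-consumer-2
    System.out.println(name("consumer-%2$d-on-stream-%1$s", "s1", 2)); // consumer-2-on-stream-s1
    System.out.println(name("my-app", "s1", 2));                       // my-app
    System.out.println(name("uuid", "s1", 2));                         // random, changes every call
  }
}
```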

Producer Names

You can use the --producer-names option to set the producer names pattern and therefore enable message deduplication (using the default publishing sequence starting at 0 and incremented for each message). The same naming options apply as above in consumer names with the only difference that the default pattern is empty (i.e. no deduplication).

Here is an example of the usage of the --producer-names option:

java -jar stream-perf-test.jar --producer-names %s-%d

The run will start one producer and will use the stream-1 producer reference (the default stream is stream and the number of the producer is 1).

Load Balancer in Front of the Cluster

A load balancer can misguide the performance tool when it tries to connect to nodes that host stream leaders and replicas. The Connecting to Streams blog post covers why client applications must connect to the appropriate nodes in a cluster.

Use the --load-balancer flag to make sure the performance tool always goes through the load balancer that sits in front of your cluster:

java -jar stream-perf-test.jar --uris rabbitmq-stream://my-load-balancer:5552 \
                               --load-balancer

The same blog post covers why a load balancer can make things more complicated for client applications like the performance tool and how they can mitigate these issues.

Single Active Consumer

If the --single-active-consumer flag is set, the performance tool will create single active consumer instances. This means that if there are more consumers than streams, there will be only one active consumer at a time on a stream, if they share the same name. Note that --single-active-consumer automatically enables offset tracking if it is not already enabled (using 10,000 for --store-every). Let’s see a couple of examples.

In the following command we have 1 producer publishing to 1 stream and 3 consumers on this stream. As --single-active-consumer is used, only one of these consumers will be active at a time.

java -jar stream-perf-test.jar --producers 1 --consumers 3 --single-active-consumer \
                               --consumer-names my-app

Note we use a fixed value for the consumer names: if they don’t have the same name, the broker will not consider them as a group of consumers, so they will all get messages, like regular consumers.

In the following example we have 2 producers for 2 streams and 6 consumers overall (3 for each stream). Note the consumers have the same name on their streams with the use of --consumer-names my-app-%s, as %s is a placeholder for the stream name.

java -jar stream-perf-test.jar --producers 2 --consumers 6 --stream-count 2 \
                               --single-active-consumer --consumer-names my-app-%s

Super Streams

The performance tool has a --super-streams flag to enable super streams on the publisher and consumer sides. This support is meant to be used with the --single-active-consumer flag, to benefit from both features. We recommend reading the appropriate sections of the documentation to understand the semantics of the flags before using them. Let’s see some examples.

The example below creates 1 producer and 3 consumers on the default stream, which is now a super stream because of the --super-streams flag:

java -jar stream-perf-test.jar --producers 1 --consumers 3 --single-active-consumer \
                               --super-streams --consumer-names my-app

The performance tool creates 3 individual streams by default; they are the partitions of the super stream. They are named stream-0, stream-1, and stream-2, after the name of the super stream, stream. The producer will publish to each of them, using a hash-based routing strategy.

A consumer is composite with --super-streams: it creates a consumer instance for each partition. This is 9 consumer instances overall – 3 composite consumers and 3 partitions – spread evenly across the partitions, but with only one active at a time on a given stream.

Note we use a fixed consumer name so that the broker considers that the consumers belong to the same group and enforces the single active consumer behavior.

The next example is more convoluted. We are going to work with 2 super streams (--stream-count 2 and --super-streams). Each super stream will have 5 partitions (--super-stream-partitions 5), so this is 10 streams overall (stream-1-0 to stream-1-4 and stream-2-0 to stream-2-4). Here is the command line:

java -jar stream-perf-test.jar --producers 2 --consumers 6 --stream-count 2 \
                               --super-streams --super-stream-partitions 5 \
                               --single-active-consumer \
                               --consumer-names my-app-%s

We also see that each super stream has 1 producer (--producers 2) and 3 consumers (--consumers 6). The composite consumers will spread their consumer instances across the partitions. Each partition will have 3 consumers but only 1 active at a time with --single-active-consumer and --consumer-names my-app-%s (the consumers on a given stream have the same name, so the broker makes sure only one consumes at a time).

Note the performance tool does not use connection pooling by default. The command above opens a significant number of connections – 30 just for consumers – and may not reflect exactly how applications are deployed in the real world. Don’t hesitate to use the --producers-by-connection and --consumers-by-connection options to make the runs as close to your workloads as possible.
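
The counts quoted in this section follow from simple multiplication. The sketch below recomputes them and lists the partition names using the naming scheme described above (super stream name, a dash, and a zero-based index). The class and method names are hypothetical and the code is not taken from the tool.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the super stream figures used in this section.
final class SuperStreamTopology {

  // Partition names: super stream name, a dash, and a zero-based index.
  static List<String> partitions(String superStream, int partitionCount) {
    List<String> names = new ArrayList<>();
    for (int i = 0; i < partitionCount; i++) {
      names.add(superStream + "-" + i);
    }
    return names;
  }

  // Each composite consumer creates one consumer instance per partition.
  static int consumerInstances(int compositeConsumers, int partitionsPerSuperStream) {
    return compositeConsumers * partitionsPerSuperStream;
  }

  public static void main(String[] args) {
    System.out.println(partitions("stream", 3));  // [stream-0, stream-1, stream-2]
    System.out.println(consumerInstances(3, 3));  // 9, first example
    System.out.println(consumerInstances(6, 5));  // 30, second example
  }
}
```

Without connection pooling, each of these consumer instances uses its own connection, which is where the figure of 30 connections comes from.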

Monitoring

The tool can expose some runtime information on HTTP. The default port is 8080. The following options are available:

  • --monitoring: add a threaddump endpoint to display a thread dump of the process. This can be useful to inspect threads if the tool seems blocked.

  • --prometheus: add a metrics endpoint to expose metrics using the Prometheus format. The endpoint can then be declared in a Prometheus instance to scrape the metrics.

  • --monitoring-port: set the port to use for the web server.

Synchronizing Several Instances

This feature is available only on Java 11 or later.

Instances of the performance tool can synchronize to start at the same time. This can prove useful when you apply different workloads and want to compare them on the same monitoring graphics. The --id flag identifies the group of instances that need to synchronize and the --expected-instances flag sets the size of the group.

Let’s start a couple of instances to compare the impact of message size. The first instance uses 100-byte messages:

java -jar stream-perf-test.jar --id msg-size-comparison --expected-instances 2 \
                               --size 100

The instance will wait until the second one is ready:

java -jar stream-perf-test.jar --id msg-size-comparison --expected-instances 2 \
                               --size 200

Both instances must share the same --id if they want to communicate to synchronize.

The default synchronization timeout is 10 minutes. This can be changed with the --instance-sync-timeout flag, using a value in seconds.

Instance synchronization is compatible with PerfTest, the AMQP 0.9.1 performance tool for RabbitMQ: instances of both tools can synchronize with each other. The 2 tools use the same flags for this feature.

Instance synchronization requires IP multicast to be available. IP multicast is not necessary when the performance tool runs on Kubernetes pods. In this case, the tool asks Kubernetes for a list of pod IPs. The performance tool instances are expected to run in the same namespace, and the namespace must be available in the MY_POD_NAMESPACE environment variable or provided with the --instance-sync-namespace flag. As soon as the namespace information is available, the tool will prefer listing pod IPs over using IP multicast. Here is an example of using instance synchronization on Kubernetes by providing the namespace explicitly:

java -jar stream-perf-test.jar --id workload-1 --expected-instances 2 \
                               --instance-sync-namespace qa

The performance tool needs permission to ask Kubernetes for a list of pod IPs. This is done by creating various policies, e.g. with YAML. See the Kubernetes discovery protocol for JGroups page for more information.

Using Environment Variables as Options

Environment variables can sometimes be easier to work with than command line options. This is especially true when using a manifest file for configuration (with Docker Compose or Kubernetes) and the number of options used grows.

The performance tool automatically uses environment variables that match the snake case version of its long options. E.g. it automatically picks up the value of the BATCH_SIZE environment variable for the --batch-size option, but only if the environment variable is defined.

You can list the environment variables that the tool picks up with the following command:

java -jar stream-perf-test.jar --environment-variables

The short version of the option is -env.

To avoid collisions with environment variables that already exist, it is possible to specify a prefix for the environment variables that the tool looks up. This prefix is defined with the RABBITMQ_STREAM_PERF_TEST_ENV_PREFIX environment variable, e.g.:

RABBITMQ_STREAM_PERF_TEST_ENV_PREFIX="STREAM_PERF_TEST_"

With RABBITMQ_STREAM_PERF_TEST_ENV_PREFIX="STREAM_PERF_TEST_" defined, the tool looks for the STREAM_PERF_TEST_BATCH_SIZE environment variable, not BATCH_SIZE.
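
The mapping from a long option to an environment variable name can be sketched as follows. This illustrates the naming rule described above with hypothetical class and method names; it is not the tool's own lookup code.

```java
import java.util.Locale;

// Hypothetical sketch of the option-to-environment-variable naming rule.
final class EnvNames {

  // Strips the leading dashes, replaces dashes with underscores,
  // upper-cases the result, and prepends the optional prefix.
  static String envName(String longOption, String prefix) {
    String stripped = longOption.startsWith("--") ? longOption.substring(2) : longOption;
    return prefix + stripped.replace('-', '_').toUpperCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    System.out.println(envName("--batch-size", ""));                  // BATCH_SIZE
    System.out.println(envName("--batch-size", "STREAM_PERF_TEST_")); // STREAM_PERF_TEST_BATCH_SIZE
  }
}
```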

Logging

The performance tool binary uses Logback with an internal configuration file. The default log level is warn with a console appender.

It is possible to define loggers directly from the command line, which is useful for quick debugging. Use the rabbitmq.streamperftest.loggers system property with name=level pairs, e.g.:

java -Drabbitmq.streamperftest.loggers=com.rabbitmq.stream=debug -jar stream-perf-test.jar

It is possible to define several loggers by separating them with commas, e.g. -Drabbitmq.streamperftest.loggers=com.rabbitmq.stream=debug,com.rabbitmq.stream.perf=info.
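
The structure of the property value (comma-separated name=level pairs) can be illustrated with a small parser. This is a sketch written for this page with hypothetical names; it does not show how the tool itself reads the property or configures Logback.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: splitting a loggers specification into pairs.
final class LoggerSpec {

  // Parses "name=level,name=level" into an ordered map of logger to level.
  static Map<String, String> parse(String spec) {
    Map<String, String> loggers = new LinkedHashMap<>();
    for (String pair : spec.split(",")) {
      String[] nameAndLevel = pair.trim().split("=", 2);
      if (nameAndLevel.length != 2 || nameAndLevel[0].isEmpty()) {
        throw new IllegalArgumentException("Expected name=level but got: " + pair);
      }
      loggers.put(nameAndLevel[0], nameAndLevel[1]);
    }
    return loggers;
  }

  public static void main(String[] args) {
    System.out.println(parse("com.rabbitmq.stream=debug,com.rabbitmq.stream.perf=info"));
  }
}
```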

It is also possible to use an environment variable:

export RABBITMQ_STREAM_PERF_TEST_LOGGERS=com.rabbitmq.stream=debug

The system property takes precedence over the environment variable.

Use the environment variable with the Docker image:

docker run -it --rm --network host \
    --env RABBITMQ_STREAM_PERF_TEST_LOGGERS=com.rabbitmq.stream=debug \
    pivotalrabbitmq/stream-perf-test

Building the Performance Tool

To build the uber JAR:

./mvnw clean package -Dmaven.test.skip

Then run the tool:

java -jar target/stream-perf-test.jar