Configure the Broker
The default configuration provided with the Kafka distribution is sufficient to run a standalone server as a proof of concept, but most likely will not be sufficient for large installations. There are numerous configuration options for Kafka that control all aspects of setup and tuning. Most of the options can be left at the default settings, though, as they deal with tuning aspects of the Kafka broker that will not be applicable until you have a specific use case that requires adjusting these settings.
Configuration can be set via a properties file or via command-line arguments. In a typical installation, the broker configuration file is located at /usr/local/kafka/config/server.properties.
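As a quick sketch of what that file contains, entries in server.properties are simple key=value pairs; the values below are illustrative only, not necessarily what ships with your distribution:

    # server.properties (illustrative excerpt)
    broker.id=0
    log.dirs=/tmp/kafka-logs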
There are several broker configuration parameters that should be reviewed when deploying Kafka for any environment other than a standalone broker on a single server.
broker.id
Every Kafka broker must have an integer identifier, which is set using the broker.id configuration. By default, this integer is set to 0, but it can be any value. The selection of this number is technically arbitrary; however, it is highly recommended to set this value to something intrinsic to the host so that when performing maintenance it is not onerous to map broker ID numbers to hosts.
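For example, if hosts are named with a numeric suffix (say, a hypothetical kafka1.example.com), reusing that number keeps the mapping obvious:

    # broker.id matches the host number of kafka1.example.com (hypothetical naming scheme)
    broker.id=1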
listeners
Older versions of Kafka used a simple port configuration, which started Kafka with a listener on TCP port 9092. The newer listeners config is a comma-separated list of URIs that the broker listens on, along with their listener names. If a listener name is not a common security protocol, then another config, listener.security.protocol.map, must also be configured. A listener is defined as <protocol>://<hostname>:<port>. Specifying the hostname as 0.0.0.0 will bind to all interfaces; leaving the hostname empty will bind it to the default interface. Keep in mind that if a port lower than 1024 is chosen, Kafka must be started as root, and running Kafka as root is not a recommended configuration.
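A sketch of a two-listener setup; the hostnames, ports, and listener names here are assumptions, not defaults:

    # Plaintext listener on all interfaces, SSL listener on the default interface
    listeners=PLAINTEXT://0.0.0.0:9092,SSL://:9093
    # If custom listener names are used, map them to security protocols, e.g.:
    # listeners=INTERNAL://:9092,EXTERNAL://:9093
    # listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:SSL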
zookeeper.connect
The location of the ZooKeeper ensemble used for storing the broker metadata is set using the zookeeper.connect configuration parameter. The format for this parameter is a comma-separated list of hostname:port strings, optionally followed by a /path, for example localhost:2181. The /path is an optional path to use as a chroot environment for the Kafka cluster; if it is omitted, the root path is used. If a chroot path (a path designated to act as the root directory for a given application) is specified and does not exist, it will be created by the broker when it starts up.
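For example, a three-node ZooKeeper ensemble with a chroot path (the hostnames and path are assumptions):

    zookeeper.connect=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/kafka-cluster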
log.dirs
Kafka persists all messages to disk, and these log segments are stored in the directory specified in the log.dir configuration. For multiple directories, the config log.dirs is preferable; if this value is not set, it will default back to log.dir. log.dirs is a comma-separated list of paths on the local system. If more than one path is specified, the broker will store partitions on them in a “least-used” fashion, with one partition’s log segments stored within the same path. Note that the broker will place a new partition in the path that has the least number of partitions currently stored in it, not the least amount of disk space used, so an even distribution of data across multiple directories is not guaranteed.
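A sketch with two data directories, presumably on separate disks (the paths are assumptions):

    log.dirs=/data/kafka-logs-1,/data/kafka-logs-2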
num.recovery.threads.per.data.dir
Kafka uses a configurable pool of threads for handling log segments. Currently, this thread pool is used:
- When starting normally, to open each partition’s log segments
- When starting after a failure, to check and truncate each partition’s log segments
- When shutting down, to cleanly close log segments
By default, only one thread per log directory is used. As these threads are only used during startup and shutdown, it is reasonable to set a larger number of threads in order to parallelize operations. Specifically, when recovering from an unclean shutdown, this can mean the difference of several hours when restarting a broker with a large number of partitions!
When setting this parameter, remember that the number configured is per log directory specified with log.dirs. This means that if num.recovery.threads.per.data.dir is set to 8, and there are 3 paths specified in log.dirs, this is a total of 24 threads.
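Continuing the arithmetic above, these hypothetical settings would yield 8 threads per directory × 3 directories = 24 recovery threads:

    num.recovery.threads.per.data.dir=8
    log.dirs=/data/kafka-logs-1,/data/kafka-logs-2,/data/kafka-logs-3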
auto.create.topics.enable
The default Kafka configuration specifies that the broker should automatically create a topic under the following circumstances:
- When a producer starts writing messages to the topic
- When a consumer starts reading messages from the topic
- When any client requests metadata for the topic
In many situations, this can be undesirable behavior, especially as there is no way to validate the existence of a topic through the Kafka protocol without causing it to be created. If you are managing topic creation explicitly, whether manually or through a provisioning system, you can set the auto.create.topics.enable configuration to false.
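When topics are provisioned explicitly, automatic creation can be turned off:

    auto.create.topics.enable=false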
auto.leader.rebalance.enable
In order to ensure a Kafka cluster doesn’t become unbalanced by having all topic leadership on one broker, this config can be specified to ensure leadership is balanced as much as possible. It enables a background thread that checks the distribution of partitions at regular intervals (this interval is configurable via leader.imbalance.check.interval.seconds). If the leadership imbalance exceeds another config, leader.imbalance.per.broker.percentage, then a rebalance of preferred leaders for partitions is started.
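A sketch of the related settings; the interval and percentage shown are illustrative values, not necessarily the defaults for your Kafka version:

    auto.leader.rebalance.enable=true
    # Check leadership distribution every 300 seconds (illustrative value)
    leader.imbalance.check.interval.seconds=300
    # Rebalance preferred leaders when imbalance exceeds 10% (illustrative value)
    leader.imbalance.per.broker.percentage=10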
delete.topic.enable
Depending on your environment and data retention guidelines, you may wish to lock down a cluster to prevent arbitrary deletions of topics. Topic deletion can be disabled by setting this flag to false.
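In server.properties, this looks like:

    delete.topic.enable=false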