
Closely Watched Kafka. Monitoring Apache Kafka in Business Process Management

"Somebody must have made a false accusation against Joseph K., for he was arrested one morning without having done anything wrong." [1] This, one of the better-known opening lines in literature, comes from Franz Kafka's "The Trial" and radiates absurdity and dystopia. While Kafka's sense of threat arises from vague danger and the lack of clear rules, in another classic of the genre, Orwell's "1984", the dystopian character of the world comes primarily from the permanent, all-knowing surveillance of its citizens.
As is often the case, what would be an unbearable world for people is, in systems built from code, necessary for efficient, effective and secure operation. In today's article we will therefore show you how the team working on Flowee, our in-house BPMN platform that makes it easy to create, manage and run broadly understood business processes, uses one of the platform's technical components working "in the background": Apache Kafka. And, of course, how to monitor Kafka 😉 [2] If you want to try out the monitoring methods yourself, head over to GitHub: we have a repository with examples.

Flowee is a platform designed to minimise the need to write code when deploying new processes for the client. It achieves this through advanced graphical editors for processes and forms. If you want to learn more about it, visit THE WEBSITE.

1. What is Apache Kafka?

Apache Kafka is a messaging system designed to handle large volumes of real-time data. This means that the primary goal of the platform is to enable the creation of systems for which the ability to scale quickly and to operate asynchronously in what the user perceives as real time is crucial. Kafka was initially created at LinkedIn and is currently an open-source project managed by the Apache Software Foundation. It is widely used for building scalable data stream processing applications, log aggregation, telemetry systems and data integration, in architectures based on microservices, and much more. As evidence of Kafka's reliability and scalability: in 2019, LinkedIn reported that its Kafka-based infrastructure handles 7 trillion messages per day! For those who get lost in the number of zeros: a trillion is a million million, or a thousand billion. Impressive!
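To make the core idea concrete, here is a toy model of a single topic partition in pure Python: an append-only log in which every message receives a sequential offset, and every consumer reads from its own position at its own pace. This is only an illustration of the concept, not the real Kafka client API.

```python
class PartitionLog:
    """Toy model of one Kafka topic partition: an append-only log."""

    def __init__(self):
        self._log = []

    def append(self, message: str) -> int:
        """Producer side: append a message, return the offset it received."""
        self._log.append(message)
        return len(self._log) - 1

    def read(self, offset: int, max_messages: int = 10) -> list:
        """Consumer side: read messages starting from a given offset."""
        return self._log[offset:offset + max_messages]

    def end_offset(self) -> int:
        """The offset that the next appended message will receive."""
        return len(self._log)


log = PartitionLog()
for event in ["order-created", "order-paid", "order-shipped"]:
    log.append(event)

# Two consumers read the same log independently, each at its own position.
print(log.read(0))  # ['order-created', 'order-paid', 'order-shipped']
print(log.read(2))  # ['order-shipped']
```

Messages are never removed on consumption; a consumer's progress is just its offset, which is what makes the lag monitoring described later in this article possible.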

2. Apache Kafka in Flowee

Flowee is a distributed system consisting of many microservices. Each of them is responsible for a small, clearly defined fragment of the logic of the entire system. As in any such architecture, ensuring reliable and efficient communication is crucial. To achieve this goal, we chose Apache Kafka when creating Flowee.

3. Why is monitoring the Kafka cluster crucial?

Monitoring Apache Kafka in a production environment is crucial for many reasons, both in terms of ensuring the reliability and performance of the system and minimising downtime in case of problems. Among them, we can distinguish the following issues:

Monitoring helps understand how Kafka resources are utilised and where bottlenecks may occur. This allows for more efficient performance optimisation and resource management.

Early detection of problems, such as delays in processing, Kafka broker overload, or network issues, is crucial for maintaining the continuity of the system's operation. Rapid identification and resolution of problems minimises negative impact on business operations. 

Monitoring allows an understanding of usage patterns and system throughput, which is essential for scaling planning. This information helps make decisions about expanding or modifying infrastructure to handle increased load.

Monitoring can help detect unusual patterns of behaviour that may indicate security issues (e.g., DDoS attacks or attempts at unauthorised access).

Many organisations must meet specific requirements regarding the availability and performance of their services. Monitoring helps ensure that Kafka services and applications using them meet these requirements. 

In production environments processing large volumes of data, it is essential for the data flow through Kafka to be as efficient as possible. Monitoring allows for identifying and eliminating bottlenecks in data flow.

Monitoring Kafka in a production environment is therefore not only a matter of maintaining infrastructure efficiency, but also a strategic element of managing the performance, security, and operational costs of systems based on this technology.

4. How do we monitor Kafka in Flowee?

Monitoring in Flowee is realised using two tools: Prometheus and Grafana. The former collects current measurements from defined sources, such as individual microservices or Kafka brokers. Grafana, on the other hand, allows visualisation of Prometheus's measurements in the form of graphs.

Here is a repository with a ready-to-run docker-compose configuration demonstrating the setup described in this article.

After cloning the repository, you can start the sample environment with the command:

    docker compose -f compose.yaml up -d

Once the containers are up and running, two panels are available in the browser:

Grafana: http://localhost:3000/dashboards 

Kafka UI: http://localhost:8080 (login and password: admin/admin)

4.1. Monitoring Kafka brokers

Individual components of the Kafka cluster, i.e., brokers, are monitored by installing the jmx_exporter library (https://github.com/prometheus/jmx_exporter) on them. This allows individual processes to expose current measurements of monitored parameters via an endpoint on port 8888. The measured values are then periodically retrieved by the Prometheus server. 

Sample parameters that can be monitored using jmx_exporter library include: the number of active brokers, the number of active partitions, under-replicated partitions, the number of messages per second, CPU load, the amount of data sent to and received from each broker, the amount of data stored on disk by individual brokers, statistics of message producers and consumers, statistics regarding internal data replication between Kafka brokers, and many more. 
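The exporter serves these measurements as plain text in the Prometheus exposition format. As a sketch of what a scrape of the :8888 endpoint returns, the following snippet parses a hand-written sample of that format; the metric names and values in the sample are illustrative, not copied from a live broker.

```python
import re

# Hand-written sample in the Prometheus text exposition format, i.e. the
# kind of output jmx_exporter serves: a metric name, optional labels in
# braces, and a value. Names/values here are illustrative assumptions.
sample = """\
# HELP kafka_server_replicamanager_underreplicatedpartitions Under-replicated partitions
# TYPE kafka_server_replicamanager_underreplicatedpartitions gauge
kafka_server_replicamanager_underreplicatedpartitions 0.0
kafka_server_brokertopicmetrics_messagesin_total{topic="orders"} 1234.0
"""

LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][\w:]*)(?P<labels>\{[^}]*\})?\s+(?P<value>\S+)$'
)

def parse_metrics(text: str) -> dict:
    """Parse exposition-format lines into {(name, labels): float}."""
    metrics = {}
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue  # skip HELP/TYPE comments and blank lines
        m = LINE_RE.match(line)
        if m:
            metrics[(m["name"], m["labels"] or "")] = float(m["value"])
    return metrics

parsed = parse_metrics(sample)
print(parsed[("kafka_server_replicamanager_underreplicatedpartitions", "")])  # 0.0
```

In practice Prometheus does this parsing itself on every scrape; a snippet like this is mainly useful for eyeballing an exporter's output during debugging.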

The configuration of jmx_exporter on the Kafka brokers involves running the Java process with a so-called Java agent (the -javaagent switch):

    KAFKA_OPTS: '-javaagent:/bitnami/kafka/libs/jmx_prometheus_javaagent-0.20.0.jar=8888:/bitnami/kafka/libs/kafka_broker.yml'

While the Prometheus configuration that reads the values provided by jmx_exporter looks like this:

    - job_name: "kafka-broker"
      static_configs:
        - targets:
            - kafka-1:8888
            - kafka-2:8888
            - kafka-3:8888
          labels:
            env: "dev"
      relabel_configs:
        - source_labels: [__address__]
          target_label: hostname
          regex: '([^:]+)(:[0-9]+)?'
          replacement: '${1}'
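The relabel_configs entry above derives a hostname label from the scrape target's address by capturing everything before the optional ":port" suffix. You can check the behaviour of that regex with a few lines of Python (Prometheus uses RE2 and fully anchors the pattern, while Python's match() anchors only at the start, but for this expression the captured group comes out the same):

```python
import re

# The same pattern as in the relabel_configs entry: capture everything
# before the optional ":port" suffix of the target address.
pattern = re.compile(r'([^:]+)(:[0-9]+)?')

for address in ["kafka-1:8888", "kafka-2:8888", "kafka-3:8888"]:
    hostname = pattern.match(address).group(1)  # Prometheus '${1}'
    print(address, "->", hostname)  # e.g. kafka-1:8888 -> kafka-1
```

The result is that every time series scraped from kafka-1:8888 carries the label hostname="kafka-1", which is convenient for per-broker Grafana panels.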

4.2. Monitoring message consumption delays by consumers

One of the basic parameters to monitor in systems built on Kafka is the delay in message consumption (consumer lag). High lag indicates that consumers are not keeping up with the messages produced to Kafka topics. This can be a sign of performance problems in the consuming applications and calls for corrective action, such as increasing the number of microservice instances (scale-out) or optimising the implementation of the components responsible for data processing. To collect statistics about consumption delays, we use the kafka-lag-exporter application: https://github.com/seglo/kafka-lag-exporter. It is defined in the compose.yaml file:

    kafka-lag-exporter:
        image: seglo/kafka-lag-exporter:0.8.2
        volumes:
            - ${PWD}/kafka-lag-exporter:/opt/docker/conf

The configuration for this application is in the file kafka-lag-exporter/application.conf:

    kafka-lag-exporter {
        port = 9999
        client-group-id = "kafkaLagExporter"
        lookup-table-size = 120
        poll-interval = 60

        clusters = [
            {
                name = "dev-cluster"
                bootstrap-brokers = "kafka-1:29092,kafka-2:29092,kafka-3:29092"

                admin-client-properties = {
                    client.id = "admin-client-id"
                    security.protocol = "PLAINTEXT"
                }

                consumer-properties = {
                    client.id = "consumer-client-id"
                    security.protocol = "PLAINTEXT"
                }
            }
        ]
    }

The Grafana panel for kafka-lag-exporter is in the Kafka Lag Exporter.yaml file. It is automatically loaded at the start of the sample environment and is available at: http://localhost:3000/dashboards 
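The headline metric reported by kafka-lag-exporter can also be reasoned about by hand: for every partition, lag is the difference between the log-end offset (the offset the next produced message will get) and the consumer group's last committed offset. A minimal sketch with made-up offset numbers:

```python
# Consumer lag per partition = log-end offset minus the consumer group's
# committed offset. The offsets below are made-up sample data, not read
# from a real cluster.
log_end_offsets = {("orders", 0): 1500, ("orders", 1): 1480, ("orders", 2): 1510}
committed_offsets = {("orders", 0): 1500, ("orders", 1): 1200, ("orders", 2): 1505}

def consumer_lag(end_offsets: dict, committed: dict):
    """Return lag per (topic, partition) and the total lag for the group."""
    lag = {tp: end_offsets[tp] - committed.get(tp, 0) for tp in end_offsets}
    return lag, sum(lag.values())

per_partition, total = consumer_lag(log_end_offsets, committed_offsets)
print(per_partition)        # {('orders', 0): 0, ('orders', 1): 280, ('orders', 2): 5}
print("total lag:", total)  # total lag: 285
```

A steadily growing total is the signal described above: the consumers are falling behind the producers and scaling out or optimising the processing code is in order.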

4.3. Kafka UI

The last demonstrated tool is Kafka UI: https://github.com/provectus/kafka-ui. It allows checking the cluster's state and configuration, as well as browsing and searching the messages stored on Kafka brokers. This is essential during development and during incident analysis in the production environment.

In our demonstration environment, Kafka UI is available at: http://localhost:8080 (login and password: admin/admin)

And that's it: you now know how to monitor Kafka closely, like Big Brother! 😉

  • [1] Franz Kafka: The Trial, Penguin Random House (German title: Der Prozess, 1925). English translations: Willa and Edwin Muir / Breon Mitchell / David Wyllie / Mike Mitchell / Idris Parry / Susanne Lück and Maureen Fitzgibbons.
  • [2] The name Apache Kafka truly comes from Franz Kafka. Jay Kreps liked the writer's work and believed that, since the system is optimised for writing, it should be named after someone who writes in natural language.

Interesting? Feel free to share!