Monitoring of Kubernetes platforms

Liquid Reply develops solutions for dynamic environments

Auto-scaling requires auto-monitoring

Dynamic IT infrastructures are characterised by their ability to scale IT resources for applications up or down quickly, according to the needs of the business departments. However many resources are provisioned at any given moment, they must be properly monitored and logged.

A company in the automotive industry was faced with the challenge of keeping a dynamically scaling system transparent. The dynamics resulted from multiple autoscaling infrastructures and a complex, shifting microservice application. The manual effort for long-term storage of observability data therefore had to be reduced to a minimum, and the storage itself had to be cost-efficient.

The requirements

A legacy monitoring and logging stack that did not meet the requirements, together with a lack of standards across the entire system, posed the central hurdles:

  • The monitoring and logging system had to recognise scaling events and must not trigger false-positive alarms when they occur
  • Traditional monitoring systems identify targets by host names or IP addresses, which they treat as static. In dynamic environments these values change frequently, so they could not be used
  • The monitoring and logging systems had to scale across multiple AWS accounts and therefore required autoscaling functions of their own
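
The difference between address-based and label-based monitoring can be illustrated with a minimal Python sketch. It is not taken from the project: the Target type, the two alert functions, the service name "checkout" and the addresses are all hypothetical and serve only to show why a scale-down looks like an outage to an address-based check, while a check based on service labels stays quiet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """One scraped endpoint (hypothetical model, not a real Prometheus type)."""
    address: str   # changes whenever a replica is rescheduled or replaced
    service: str   # stable label that identifies what the replica belongs to
    healthy: bool


def host_based_alerts(previous, current):
    """Naive check: alert for every address that was seen before and is gone now."""
    current_addresses = {t.address for t in current}
    return sorted(t.address for t in previous if t.address not in current_addresses)


def label_based_alerts(current, expected_services, min_healthy=1):
    """Alert only for services with fewer than min_healthy healthy replicas."""
    healthy = {service: 0 for service in expected_services}
    for t in current:
        if t.healthy and t.service in healthy:
            healthy[t.service] += 1
    return sorted(s for s, count in healthy.items() if count < min_healthy)


# Assumed scenario: "checkout" scales from three replicas down to two,
# and one of the remaining replicas comes back with a new address.
before = [
    Target("10.0.1.5:8080", "checkout", True),
    Target("10.0.1.6:8080", "checkout", True),
    Target("10.0.1.7:8080", "checkout", True),
]
after = [
    Target("10.0.1.5:8080", "checkout", True),
    Target("10.0.2.9:8080", "checkout", True),
]

print(host_based_alerts(before, after))           # ['10.0.1.6:8080', '10.0.1.7:8080']
print(label_based_alerts(after, {"checkout"}))    # []
```

In this scenario the address-based check reports two missing hosts although the service is fully available, whereas the label-based check only fires once a service drops below its minimum number of healthy replicas.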

Adjusted open source projects

Liquid Reply set up a monitoring environment that meets these requirements and monitors and logs both the infrastructure and the applications.

The solution is built on open source projects: Prometheus and Thanos, both projects of the Cloud Native Computing Foundation (CNCF), together with Loki and Grafana. The work exposed the lack of standards throughout the system, especially in the monitoring of the various AWS services. Liquid Reply therefore set up a cross-account network using native AWS services.

To replace the outdated monitoring and logging stack, Liquid Reply relied on Helm charts developed by the open source community. These gave the team a range of ready-made functions that met the client's requirements.

Automated monitoring of new clusters

Within a short period of time, Liquid Reply established a new, stable and secure monitoring and logging approach that is versatile and closely adapted to the system being monitored. With Liquid Reply's solution in place, the client can now create new Kubernetes clusters that are automatically monitored and logged by the central monitoring cluster. This reduces the potential for manual errors and indirectly reduces costs.