Grafana Loki on Azure: A Comprehensive Installation and Configuration Guide


To get started with Grafana Loki on Azure, you'll need to create a new Azure Kubernetes Service (AKS) cluster. This will serve as the foundation for your Loki deployment.

First, you'll need to create a new resource group in Azure, which will hold all the resources for your Loki setup. You can do this by navigating to the Azure portal and clicking on the "Create a resource group" button.

Next, you'll need to create a new AKS cluster within that resource group. This will involve specifying the desired node count and VM size for your cluster.

The recommended starting point for a Loki deployment is a 3-node cluster, which gives Kubernetes room to spread Loki's components across nodes for resilience.
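The steps above can be sketched with the Azure CLI. The resource group name, cluster name, region, and VM size below are all illustrative placeholders, not values from this guide:

```shell
# Create a resource group to hold the Loki resources (name and region are examples)
az group create --name loki-rg --location eastus

# Create a 3-node AKS cluster in that resource group (VM size is an example)
az aks create \
  --resource-group loki-rg \
  --name loki-aks \
  --node-count 3 \
  --node-vm-size Standard_D4s_v3 \
  --generate-ssh-keys

# Fetch credentials so kubectl can talk to the new cluster
az aks get-credentials --resource-group loki-rg --name loki-aks
```

The same setup can of course be done through the Azure portal as described above; the CLI version is just easier to repeat and script.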

Why Grafana Loki on Azure?

Grafana Loki on Azure offers a cost-effective solution for log management.

Storing the same set of logs in Loki requires far less storage than with other solutions, making it a cheaper option to run.


Loki's minimal indexing approach means that it can handle fast writes and tiny indexes, which is particularly useful for log data that's constantly being generated.

One of the key benefits of Grafana Loki on Azure is its ability to ingest logs in any and all formats, giving you the flexibility to work with different types of log data.

Here are some of the key advantages of using Grafana Loki on Azure:

  • Cheaper to run
  • Simpler to operate
  • Fast queries

Configuration and Setup

To configure Azure Monitor in Grafana, you need to create an app registration and service principal in Azure AD. This will require assigning the Reader role to the app registration on the subscription.

You can choose between Managed Identity and Workload Identity for secure authentication. Managed Identity is suitable for Azure-hosted Grafana, while Workload Identity is ideal for Kubernetes environments like AKS.

The key configuration variables for Managed Identity and Workload Identity are the `managed_identity_enabled` and `workload_identity_enabled` flags, which are covered in the Configuration section below.

Configuration

To configure Azure Active Directory (AD) authentication, you need to create an app registration and service principal in Azure AD. This will allow you to authenticate the data source and access Azure resources securely.


You can configure the Azure Monitor data source to use Managed Identity for secure authentication without entering credentials into Grafana. This is especially useful if you host Grafana in Azure, such as in App Service or Azure Virtual Machines.

To enable Managed Identity, set the `managed_identity_enabled` flag to `true` in the `[azure]` section of the Grafana server configuration file (grafana.ini). Then, in the Azure Monitor data source configuration, set Authentication to Managed Identity.

Alternatively, you can configure the Azure Monitor data source to use Workload Identity for secure authentication without manually configuring credentials via Azure AD App Registrations. This is useful if you host Grafana in a Kubernetes environment, such as AKS.

To enable Workload Identity, set the `workload_identity_enabled` flag to `true` in the `[azure]` section of the Grafana server configuration file (grafana.ini). Then, in the Azure Monitor data source configuration, set Authentication to Workload Identity.

The authentication method is controlled by the `managed_identity_enabled` and `workload_identity_enabled` flags in the `[azure]` section of the Grafana server configuration file.
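Concretely, the two flags described above live in the `[azure]` section of grafana.ini. You would normally enable only the one that matches your hosting environment; both are shown here purely for illustration:

```ini
[azure]
# Enable when Grafana runs on Azure compute with a managed identity
managed_identity_enabled = true

# Enable when Grafana runs in AKS with Workload Identity federation
workload_identity_enabled = true
```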

Installation


To get Loki up and running on Azure, you'll need to create an Azure Storage Account and add a container called loki. This will be the foundation of your Loki setup.

This container will be the storage space for your Loki chunk and index data.
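As a sketch with the Azure CLI — the account name, resource group, region, and SKU below are placeholders (storage account names must be globally unique):

```shell
# Create a storage account for Loki data
az storage account create \
  --name lokistorageacct \
  --resource-group loki-rg \
  --location eastus \
  --sku Standard_LRS

# Add the "loki" container inside it
az storage container create \
  --name loki \
  --account-name lokistorageacct
```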

Next, you'll need to add the repo into Helm and get the latest version. This will ensure you have the most up-to-date version of Loki.
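Adding the Grafana chart repository and refreshing it looks like this:

```shell
# Register the official Grafana Helm repository and pull the latest chart index
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
```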

To configure the storage values for your chart, create a custom values file called az-values.yaml. In this file, set `auth_enabled` to false, which disables multi-tenancy authentication; be aware that this is not suitable for a production environment.

You'll also need to set the following attributes to 'azure': storage.type, compactor.shared_store, schema_config.configs.object_store, and storage_config.boltdb_shipper.shared_store.

Here's a quick reference guide to help you remember the attributes you need to set:

  • storage.type
  • compactor.shared_store
  • schema_config.configs.object_store
  • storage_config.boltdb_shipper.shared_store
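Pulled together, those attributes map onto an az-values.yaml roughly like the following sketch. The exact nesting depends on the chart version you install, and the schema dates, account name, and key here are placeholders:

```yaml
loki:
  config:
    auth_enabled: false   # disables multi-tenancy auth; not for production
    compactor:
      shared_store: azure
    schema_config:
      configs:
        - from: 2022-01-01
          store: boltdb-shipper
          object_store: azure
          schema: v12
          index:
            prefix: index_
            period: 24h
    storage_config:
      boltdb_shipper:
        shared_store: azure
      azure:
        account_name: lokistorageacct
        account_key: <storage-account-key>
        container_name: loki
storage:
  type: azure
```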

Once you've configured the storage values, run your Helm install. If everything goes smoothly, you should see a response indicating that the installation was successful.
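The install itself might look like the following; the release name and namespace are examples:

```shell
# Install Loki with the custom Azure values file
helm install loki grafana/loki -n loki --create-namespace -f az-values.yaml

# Verify that the Loki pods came up
kubectl get pods -n loki
```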

Loki Architecture and Storage


Loki is a sophisticated log aggregation and query platform carefully architected for large scale ingestion and economical storage.

Loki is made up of multiple components that work together to achieve this goal.

Loki stores all data in a single object storage backend, such as Amazon S3, Google Cloud Storage, or Azure Blob Storage.

Loki's single-store mode uses an adapter called the index shipper to store index files the same way chunk files are stored in object storage.

Each of the tools used, including Thanos and Tempo, needs an object store to store the collected data.

Loki Architecture

Loki is a sophisticated log aggregation and query platform carefully architected for large scale ingestion and economical storage.

Its multiple components work together seamlessly to handle high volumes of data. The Loki Architecture is designed to be highly scalable and flexible.

Loki has multiple components; the main ones we're interested in — the distributor, ingester, querier, and compactor — form the core of Loki's functionality.

Loki's architecture is built with large-scale ingestion and economical storage in mind. This allows it to handle massive amounts of data with ease.

Storage


Loki stores all data in a single object storage backend, such as Amazon Simple Storage Service (S3), Google Cloud Storage (GCS), Azure Blob Storage, among others.

This mode uses an adapter called the index shipper (or shipper for short) to store index (TSDB or BoltDB) files the same way we store chunk files in object storage.

Each of the tools we used - Thanos, Loki, Tempo - needs an object store to store the collected data.

We utilized Azure Storage Accounts and their storage containers for this purpose, with each application getting its own StorageAccount and StorageContainer.

Creating these resources in Terraform is very simple, requiring only two resources to be added.
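Those two Terraform resources might look roughly like this sketch, using the `azurerm` provider; the names, resource group, and region are illustrative:

```hcl
resource "azurerm_storage_account" "loki" {
  name                     = "lokistorageacct" # must be globally unique
  resource_group_name      = "loki-rg"
  location                 = "eastus"
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

resource "azurerm_storage_container" "loki" {
  name                  = "loki"
  storage_account_name  = azurerm_storage_account.loki.name
  container_access_type = "private"
}
```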

The storage account access key can be stored in Azure Key Vault, making it easy to retrieve the key that Thanos needs to connect to the storage account.

The same storage account can be used for chunk and index storage, with BoltDB being used with a shared Azure store.
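A sketch of the corresponding Loki `storage_config` stanza, with a placeholder account name and the key injected from a secret (for example, one synced from Key Vault):

```yaml
storage_config:
  azure:
    account_name: lokistorageacct      # placeholder
    account_key: ${AZURE_STORAGE_KEY}  # e.g. injected from Key Vault
    container_name: loki
  boltdb_shipper:
    shared_store: azure
```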

Labels, Streams, Chunks


In Loki, vast volumes of data need to be broken down into manageable units.

The organising principle in Loki is a stream.

Data is organised into streams to make it easier to handle.

A stream is a collection of related data, such as logs from a single application.

This approach allows Loki to efficiently process and store large amounts of data.

Loki also divides data into smaller chunks, making it even more manageable.

Chunks are the smallest units of data in Loki; each chunk holds the compressed log entries of a single stream over a window of time.
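As a hypothetical example, every log line carrying the label set below belongs to the same stream, and Loki packs that stream's entries into compressed chunks:

```
{app="checkout", namespace="prod", cluster="aks-westeurope"}
```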

Planning and Configuration

Planning and configuration are crucial steps in implementing Grafana Loki on Azure. You need to consider the huge scale of log ingestion, economic storage, log parsing and indexing, and querying. This means you'll have to plan a scalable and performant system, taking into account your logging flows and business needs.

To achieve this, you'll need to understand your logging volumes and peak ingestion rates, specify a sufficient number of replicas, and have benchmarks for CPU and memory requirements. You should also calculate potential storage costs and define policies for archiving in cold storage and retention.


In terms of configuration, you have two main architectural options for sending application logs to Loki: using a tool like Promtail or forwarding logs via the OpenTelemetry Collector. For this example, we'll be using the second approach, sending logs via the OTel Collector, which is well suited to running on an Azure Kubernetes Service (AKS) cluster.

The specific configuration variables for the collector are covered in the Sending Telemetry section below.

By considering these planning and configuration factors, you'll be well on your way to implementing a scalable and performant Grafana Loki system on Azure.

Planning and Configuration

The first step in planning and configuration is to assess the components you'll be using. At the time of writing this article, all of the mentioned components support Azure, but they are at a very early stage in doing so.

You'll need to research and gather resources to set everything up properly. Most of the articles you'll find will be focused on other cloud vendors such as AWS or GCP.

Keep in mind that the amount of resources and documentation available for Azure is limited. Let's see how we deployed everything to be Azure compatible.

The deployment process will require some trial and error, but with patience and persistence, you'll be able to get everything up and running.

Labels


In Loki, labels are a crucial concept that define streams. Each unique combination of labels and values constitutes a stream.

Labels can be inserted into logs in two primary ways: through the OpenTelemetry Configuration in our code or through the OpenTelemetry Collector. We can use one or both of these methods.

You have the flexibility to choose which method suits your needs best.

Planning

Planning a log aggregation system requires careful consideration of its architecture and functional demands.

Logs need to be ingested at a huge scale, which poses a significant challenge.

To address this, log aggregation systems have architectures consisting of multiple layers and components.

These components must be able to handle parsing and indexing of logs, as well as storing them economically.

In addition, logs need to be queried economically, which requires efficient data retrieval mechanisms.

To plan a scalable and performant system, you need to have a certain amount of familiarity with these architectures and an understanding of your own logging flows and business needs.


Before installing a log aggregation system like Loki, you need to have a clear understanding of your logging volumes and peak ingestion rates.

You should also specify a sufficient number of replicas and have benchmarks for CPU and memory requirements.

Calculating potential storage costs and defining policies for archiving in cold storage and retention are also crucial steps.

It's essential to place a budget on the Resource Group for your containers and configure alerts to avoid unexpected costs.

Here are some key considerations to keep in mind:

  • Specify a sufficient number of replicas
  • Have benchmarks for CPU and memory requirements
  • Calculate potential storage costs
  • Define policies for archiving in cold storage and retention
  • Place a budget on the Resource Group for your containers
  • Configure alerts to avoid unexpected costs

Loki on Azure

Loki on Azure can be initially problematic due to a lack of documentation specific to Azure in critical areas such as storage configuration.

The Grafana documentation is open source, so it's largely up to the Azure community to contribute the missing pieces and fill in the gaps.

The documentation tends to focus mostly on AWS and GCP, leaving Azure users to figure things out on their own.

Grafana and Visualization

Grafana is the tool that ties everything together, allowing us to visualize the data we've gathered. We use it to create dashboards and run ad-hoc queries.


Grafana is deployed through Helmfile, just like the other components. It's recommended to deploy Grafana with a MySQL/PostgreSQL database for backend storage, but for simplicity, we're using Kubernetes Persistent Volumes to store its data.

Using Grafana, we can bundle a set of system dashboards that show us data about cluster resources, such as CPU and memory utilization, network and filesystem metrics, and metrics from ingresses.

Sending Telemetry

Sending telemetry is a crucial step in getting your logs to Loki. We've configured oTel logging in our service to send logs to the loki-write service.

You'll need to configure values for your OpenTelemetry Helm Chart to send telemetry to Loki. This can be done using the `otlphttp` exporter, which has replaced the dedicated Loki exporter.
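A minimal sketch of that exporter wiring in the collector configuration might look like the following, assuming Loki's OTLP HTTP endpoint is exposed by a `loki-write` service in a `loki` namespace (both names are assumptions here). The receivers section is elided for brevity:

```yaml
exporters:
  otlphttp:
    endpoint: http://loki-write.loki.svc.cluster.local:3100/otlp

service:
  pipelines:
    logs:
      exporters: [otlphttp]
```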

To make your telemetry streams more manageable, you can switch off collection of Kubernetes metrics and logs. However, this may not be suitable for a production environment.

We've created a Kubernetes service for our Collector, which runs in LoadBalancer mode. This service is specific to Azure AKS and may require different annotations for other cloud providers.
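On AKS, a collector Service of type LoadBalancer can additionally be kept off the public internet with an Azure-specific annotation. The service name and selector below are illustrative; port 4318 is the standard OTLP/HTTP port:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: otel-collector
  annotations:
    # AKS-specific: provision an internal rather than public load balancer
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  selector:
    app: otel-collector
  ports:
    - name: otlp-http
      port: 4318
      targetPort: 4318
```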

In the config/processors/resource section, we've defined a number of attributes that will be surfaced in Loki as labels. These labels can be used for organizing and querying our logs.

We've used the from_attribute keyword for renaming existing attributes and the value keyword for creating new attributes in this section.
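That resource processor section might look roughly like this sketch; the attribute names and values are examples, not values from this deployment:

```yaml
processors:
  resource:
    attributes:
      # Rename an existing attribute into a Loki-friendly label
      - key: service_name
        from_attribute: service.name
        action: insert
      # Create a brand-new attribute
      - key: cluster
        value: loki-aks
        action: insert
```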

Visualization


Grafana is the tool that ties all of this together, allowing us to visualize the data we gathered and create dashboards.

We use it to run ad-hoc queries and get insights into our data. Grafana is deployed the same way as the other components, through Helmfile.

It's recommended to deploy Grafana with a MySQL/PostgreSQL database for the backend storage, but for simplicity, we're using Kubernetes Persistent Volumes to store its data.

We bundle a set of system dashboards that show data about cluster resources, such as CPU and memory utilization, network and filesystem metrics, and metrics from ingresses.

Apart from the standard visualization of data using dashboards, Grafana provides a very useful “Explore” feature, which enables us to run ad-hoc queries for all of our datasources.

Using this feature, we can easily search for logs and run Loki’s LogQL queries or display traces in Tempo.
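For instance, a hypothetical ad-hoc query in Explore that finds error lines in one application's streams could look like:

```
{app="checkout", namespace="prod"} |= "error"
```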

Frequently Asked Questions

What is Loki Azure?

Loki is a log aggregation system, not a product specific to Azure, but it can be integrated with Azure services. For Loki on Azure, consider pairing it with Azure Blob Storage and Azure Kubernetes Service (AKS) for a scalable logging solution.

What database does Grafana Loki use?

For Loki 2.8 and newer, Grafana Loki recommends using TSDB as its index store. TSDB is the preferred choice for optimal performance and functionality.

Does Grafana Cloud include Loki?

Yes, Grafana Cloud Logs is powered by Grafana Loki, enabling fast and scalable log analysis. With Loki, you can run complex queries at scale without worrying about lost logs.

Calvin Connelly

Senior Writer

Calvin Connelly is a seasoned writer with a passion for crafting engaging content on a wide range of topics. With a keen eye for detail and a knack for storytelling, Calvin has established himself as a versatile and reliable voice in the world of writing. In addition to his general writing expertise, Calvin has developed a particular interest in covering important and timely subjects that impact society.
