Automate monitoring with the Terraform Datadog provider
Datadog is a cloud monitoring platform that integrates with your infrastructure and gives you real-time visibility into your operations. With the Datadog Terraform provider, you can create custom monitors and dashboards for resources you already manage, with or without Terraform, and automatically set up monitoring for new infrastructure.
In this tutorial, you will deploy a demo Nginx application to a Kubernetes cluster with Helm and install the Datadog agent across the cluster. The Datadog agent reports the cluster health back to your Datadog dashboard. You will then create a monitor for this cluster in Terraform.
Prerequisites
This tutorial assumes you are familiar with the standard Terraform workflow. If you are unfamiliar with Terraform, complete the Get Started tutorials first.
For this tutorial, you will need:
- a Datadog trial account
- Terraform 1.1+
- an EKS cluster
Provision Kubernetes
Complete the "Provision an EKS cluster" tutorial, and do not destroy your cluster afterward.
Get Datadog API credentials
Once you have signed up for your Datadog trial, you need to retrieve your API and Application keys.
Log into your Datadog account and navigate to the API Keys section on the Organization Settings page.
Your API key is automatically generated and is obscured for security. Click on the API key to show more information, then click Copy. Save this somewhere safe.
To generate an application key, click Application Keys on the Organization Settings page.
Click New Key, type in "Terraform" as your new application key name, and click Create Key. Click Copy and save the key somewhere safe.
These keys are the credentials Terraform will use to create monitors and dashboards on your behalf. Together, they give full access to your Datadog account, so treat them like a password and do not share or check them into version control.
Clone the example repository
Ensure that you are not inside the learn-terraform-provision-eks-cluster directory you created in the EKS cluster tutorial. Then clone the configuration for this tutorial.
$ git clone https://github.com/hashicorp/learn-terraform-datadog-local.git
Change into the repository directory.
$ cd learn-terraform-datadog-local
Deploy your Kubernetes application
Open the terraform.tf configuration. This file lists the minimum versions of your Datadog, Helm, AWS, and Kubernetes providers, and the minimum version of Terraform.
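For orientation, a terraform.tf along these lines would declare those requirements. This is a sketch, not the repository's actual file: the provider source addresses are the public registry ones, but every provider version constraint shown is an assumption, so defer to the file in the cloned repository.

```hcl
terraform {
  required_providers {
    datadog = {
      source  = "DataDog/datadog"
      version = "~> 3.0" # assumed constraint
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.0" # assumed; the block-style "set" syntax used later matches the 2.x series
    }
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.0" # assumed constraint
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.0" # assumed constraint
    }
  }

  # Minimum Terraform version, taken from the prerequisites above.
  required_version = ">= 1.1"
}
```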
Open kubernetes.tf in your file editor. This tutorial will walk you through each block.
The kubernetes_namespace block declares your new namespace, which is named after the beacon image that the rest of the tutorial will use.
resource "kubernetes_namespace" "beacon" {
  metadata {
    name = "beacon"
  }
}
Update the configuration to read your Terraform state from your EKS deployment. How you do this depends on whether you deployed your cluster using HCP Terraform or Terraform Community Edition. The following steps apply to HCP Terraform.
Add the variables for your HCP Terraform organization and workspace to the variables.tf file.
variable "tfc_org" {
  type        = string
  description = "TFC Organization"
}

variable "tfc_workspace" {
  type        = string
  description = "TFC Workspace"
  default     = "learn-terraform-eks"
}
Run the following command, replacing <YOUR_TFC_ORG> with your HCP Terraform organization name.
$ export TF_VAR_tfc_org="<YOUR_TFC_ORG>"
If you did not deploy your EKS cluster using the default workspace name, run the following command. Replace <YOUR_TFC_WORKSPACE> with your HCP Terraform workspace name.
$ export TF_VAR_tfc_workspace="<YOUR_TFC_WORKSPACE>"
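With those variables set, reading the cluster's state might look like the sketch below. The terraform_remote_state data source and its remote backend settings are standard Terraform, and aws_eks_cluster is a data source in the AWS provider. The output name cluster_name is an assumption about what the EKS tutorial's configuration exports, so check the outputs of your own workspace.

```hcl
# Read the outputs of the HCP Terraform workspace that manages the EKS cluster.
data "terraform_remote_state" "eks" {
  backend = "remote"

  config = {
    organization = var.tfc_org
    workspaces = {
      name = var.tfc_workspace
    }
  }
}

# Look up the cluster's connection details by name.
# "cluster_name" is an assumed output name, not confirmed by this tutorial.
data "aws_eks_cluster" "cluster" {
  name = data.terraform_remote_state.eks.outputs.cluster_name
}
```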
The kubernetes_deployment block defines the number of pod replicas, assigns metadata, and defines the container image. This configuration deploys a beacon:datadog image. This container image is custom-built by HashiCorp employees for this tutorial.
resource "kubernetes_deployment" "beacon" {
  metadata {
    name      = var.application_name
    namespace = kubernetes_namespace.beacon.id
    labels = {
      app = var.application_name
    }
  }

  spec {
    replicas = 3

    selector {
      match_labels = {
        app = var.application_name
      }
    }

    template {
      metadata {
        labels = {
          app = var.application_name
        }
      }

      spec {
        container {
          image = "onlydole/beacon:datadog"
          name  = var.application_name
        }
      }
    }
  }
}
Finally, the kubernetes_service resource exposes the beacon service using a load balancer on port 8080.
resource "kubernetes_service" "beacon" {
  metadata {
    name      = var.application_name
    namespace = kubernetes_namespace.beacon.id
  }

  spec {
    selector = {
      app = kubernetes_deployment.beacon.metadata[0].labels.app
    }

    port {
      port        = 8080
      target_port = 80
    }

    type = "LoadBalancer"
  }
}
Your application_name variable is defined in the variables.tf file and is set to a default value of beacon.
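A declaration matching that description would look roughly like this. The name and default come from the text above; the description string is an assumption, so the repository's wording may differ.

```hcl
variable "application_name" {
  type        = string
  description = "Name used for the deployment, service, and labels" # assumed wording
  default     = "beacon"
}
```

Because the variable has a default, you do not need to set it to follow this tutorial.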
Now that you have reviewed the infrastructure, initialize your configuration.
$ terraform init
Apply your configuration. Remember to confirm your apply with a yes.
$ terraform apply
Verify your namespace.
$ kubectl get namespaces
NAME     STATUS   AGE
beacon   Active   10m
## ...
Verify your deployment.
$ kubectl get deployment --namespace=beacon
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
beacon   3/3     3            3           10m
In the next step, you will deploy the Datadog Agent to your Kubernetes cluster as a DaemonSet to start collecting your cluster and application metrics, traces, and logs. To do this, you will use the Helm provider to deploy the datadog/datadog Helm chart.
Deploy the Datadog Agent to your nodes with Helm
Next, deploy the Datadog Helm chart. This chart adds the Datadog Agent to all nodes in your cluster via a DaemonSet.
Copy and paste the configuration below into helm_datadog.tf.
provider "helm" {
  kubernetes {
    config_path = "~/.kube/config"
  }
}

resource "helm_release" "datadog_agent" {
  name       = "datadog-agent"
  chart      = "datadog"
  repository = "https://helm.datadoghq.com"
  version    = "3.10.9"
  namespace  = kubernetes_namespace.beacon.id

  set_sensitive {
    name  = "datadog.apiKey"
    value = var.datadog_api_key
  }

  set {
    name  = "datadog.site"
    value = var.datadog_site
  }

  set {
    name  = "datadog.logs.enabled"
    value = true
  }

  set {
    name  = "datadog.logs.containerCollectAll"
    value = true
  }

  set {
    name  = "datadog.leaderElection"
    value = true
  }

  set {
    name  = "datadog.collectEvents"
    value = true
  }

  set {
    name  = "clusterAgent.enabled"
    value = true
  }

  set {
    name  = "clusterAgent.metricsProvider.enabled"
    value = true
  }

  set {
    name  = "networkMonitoring.enabled"
    value = true
  }

  set {
    name  = "systemProbe.enableTCPQueueLength"
    value = true
  }

  set {
    name  = "systemProbe.enableOOMKill"
    value = true
  }

  set {
    name  = "securityAgent.runtime.enabled"
    value = true
  }

  set {
    name  = "datadog.hostVolumeMountPropagation"
    value = "HostToContainer"
  }
}
This configuration requires your Datadog API and application keys: the Helm release uses the API key, and the Datadog provider you configure later uses both. Set these values as environment variables in your terminal.
Run the following command, replacing <Your-API-Key> with the Datadog API key you saved earlier.
$ export TF_VAR_datadog_api_key="<Your-API-Key>"
Repeat this process with the application key. Replace <Your-App-Key> with the Datadog application key you saved earlier.
$ export TF_VAR_datadog_app_key="<Your-App-Key>"
Note the URL of the Datadog website and refer to the Getting Started with Datadog Sites documentation to determine the correct values for the datadog_site and datadog_api_url variables. This tutorial defaults to using values for site US1. If you are on a different site, set the datadog_site and datadog_api_url variables to the values in the Datadog documentation. For example, if you are on site US5, run the following commands.
$ export TF_VAR_datadog_site="us5.datadoghq.com"
$ export TF_VAR_datadog_api_url="https://api.us5.datadoghq.com"
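Both the US1 defaults and the US5 example follow the same pattern: the API URL is the site name prefixed with https://api. If you would rather set only one value, a local value could derive the URL from the site. This is an optional sketch, not part of the tutorial's configuration; the local name datadog_api_url is hypothetical, and you should confirm in the Datadog documentation that the pattern holds for your site.

```hcl
locals {
  # Hypothetical helper: "us5.datadoghq.com" becomes "https://api.us5.datadoghq.com".
  datadog_api_url = "https://api.${var.datadog_site}"
}
```

To use it, you would reference local.datadog_api_url wherever the configuration would otherwise read var.datadog_api_url.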
Add the variable declarations for your Datadog keys, site, and API URL to the variables.tf file. Terraform will apply the environment variable values to the corresponding variable declarations.
variable "datadog_api_key" {
  type        = string
  description = "Datadog API Key"
}

variable "datadog_app_key" {
  type        = string
  description = "Datadog Application Key"
}

variable "datadog_site" {
  type        = string
  description = "Datadog Site Parameter"
  default     = "datadoghq.com"
}

variable "datadog_api_url" {
  type        = string
  description = "Datadog API URL"
  default     = "https://api.datadoghq.com"
}
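Because these keys grant access to your Datadog account, you may also want Terraform to redact them from plan and apply output. One way, shown here as an optional variation and not as the tutorial's own configuration, is to add sensitive = true to the two key declarations above (editing them in place, not adding second copies).

```hcl
variable "datadog_api_key" {
  type        = string
  description = "Datadog API Key"
  sensitive   = true # redacts the value in CLI output
}

variable "datadog_app_key" {
  type        = string
  description = "Datadog Application Key"
  sensitive   = true # redacts the value in CLI output
}
```

Sensitive values are still written to the Terraform state, so the state needs to be protected as well.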
Apply your configuration. Remember to confirm your apply with a yes.
$ terraform apply
In the next section, you will create monitoring criteria for this cluster with the Datadog provider.
Create a metric alert with the Datadog provider
The datadog_monitor resource defines a monitor that warns and alerts when the number of running Kubernetes pods falls to or below the thresholds you set.
Copy and paste the configuration below into datadog_metrics.tf.
provider "datadog" {
  api_key = var.datadog_api_key
  app_key = var.datadog_app_key
  api_url = var.datadog_api_url
}

resource "datadog_monitor" "beacon" {
  name               = "Kubernetes Pod Health"
  type               = "metric alert"
  message            = "Kubernetes Pods are not in an optimal health state. Notify: @operator"
  escalation_message = "Please investigate the Kubernetes Pods, @operator"

  query = "max(last_1m):sum:kubernetes.containers.running{short_image:beacon} <= 1"

  monitor_thresholds {
    ok       = 3
    warning  = 2
    critical = 1
  }

  notify_no_data = true

  tags = ["app:beacon", "env:demo"]
}
The datadog_monitor.beacon resource notifies and escalates the health of the Kubernetes "beacon" application. The query argument defines the metric expression that Datadog evaluates: the number of running beacon containers over the last minute, compared against the thresholds.
- If all three pods are operational, your Datadog monitor status reports as "OK".
- If one pod goes down, your Datadog monitor status will change to "Warn".
- If more than one pod goes down, your Datadog monitor status will change to "Alert".
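The three thresholds follow from the replica count, and Datadog expects the critical threshold to match the value compared in the query. A sketch of how those values could be derived from a single number follows. The local names are hypothetical and are not part of the tutorial's configuration.

```hcl
locals {
  beacon_replicas = 3 # assumed to mirror "replicas" in kubernetes_deployment.beacon

  beacon_thresholds = {
    ok       = local.beacon_replicas     # 3: all pods running
    warning  = local.beacon_replicas - 1 # 2: one pod down
    critical = local.beacon_replicas - 2 # 1: more than one pod down
  }

  # The value compared in the query equals the critical threshold.
  beacon_query = "max(last_1m):sum:kubernetes.containers.running{short_image:beacon} <= ${local.beacon_thresholds.critical}"
}
```

With these locals in place, the monitor could set query = local.beacon_query and fill monitor_thresholds from local.beacon_thresholds, so that changing the replica count updates all three values together.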
Apply your configuration to create a new Datadog monitor. Remember to confirm your apply with a yes.
$ terraform apply
Navigate to the Datadog Monitor page. Your Kubernetes Pod Health monitor is reporting here now.
Create a synthetic alert with the Datadog provider
A synthetic check allows Datadog to check a specific webpage at intervals of your choice. The datadog_synthetics_test resource can create and manage API and URL performance monitors. If the URL times out or does not return the expected value, Datadog will alert you.
Copy and paste the configuration below into datadog_synthetics.tf.
resource "datadog_synthetics_test" "beacon" {
  type    = "api"
  subtype = "http"

  request_definition {
    method = "GET"
    url    = "http://<Host_URL>"
  }

  assertion {
    type     = "statusCode"
    operator = "is"
    target   = "200"
  }

  locations = ["aws:us-west-2"]

  options_list {
    tick_every          = 900
    min_location_failed = 1
  }

  name    = "Beacon API Check"
  message = "Oh no! Light from the Beacon app is no longer shining!"
  tags    = ["app:beacon", "env:demo"]

  status = "live"
}
Use terraform output to return the endpoint of the Beacon service.
$ terraform output beacon_endpoint
Update <Host_URL> with the Beacon service address.
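The beacon_endpoint output is defined in the cloned repository. For reference, an output of this kind can be built from the load balancer hostname that the kubernetes_service resource exports. The sketch below assumes the output appends the service port; compare it with the repository's own definition and do not add a second one.

```hcl
output "beacon_endpoint" {
  description = "Public address of the beacon service (sketch)"

  # The hostname is populated once AWS has provisioned the load balancer.
  value = "${kubernetes_service.beacon.status[0].load_balancer[0].ingress[0].hostname}:8080"
}
```

The same expression could also be interpolated into the url argument of the synthetic test, which would avoid pasting the address by hand.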
Apply your configuration to create a new synthetic monitor. Remember to confirm your apply with a yes.
$ terraform apply
Navigate to the Datadog Monitor page to view your "Beacon API Check" monitor.
Create a Datadog dashboard
A Datadog dashboard groups related monitors and graphs on a single page in the Datadog UI, which is useful when you have several monitors and want to view them together. The configuration below uses the datadog_dashboard resource to set up a dashboard for your metrics and synthetics monitors.
Copy and paste the configuration below into datadog_dashboard.tf.
resource "datadog_dashboard" "beacon" {
  title       = "Beacon Service"
  description = "A Datadog Dashboard for the ${kubernetes_deployment.beacon.metadata[0].name} deployment"
  layout_type = "ordered"

  widget {
    hostmap_definition {
      no_group_hosts  = true
      no_metric_hosts = true
      node_type       = "container"
      title           = "Kubernetes Pods"

      request {
        fill {
          q = "avg:process.stat.container.cpu.total_pct{image_name:onlydole/beacon} by {host}"
        }
      }

      style {
        palette      = "hostmap_blues"
        palette_flip = false
      }
    }
  }

  widget {
    timeseries_definition {
      show_legend = false
      title       = "CPU Utilization"

      request {
        display_type = "line"
        q            = "top(avg:kubernetes.cpu.usage.total{image_name:onlydole/beacon} by {short_image,container_id}, 10, 'mean', 'desc')"

        style {
          line_type  = "solid"
          line_width = "normal"
          palette    = "dog_classic"
        }
      }

      yaxis {
        include_zero = true
        max          = "auto"
        min          = "auto"
        scale        = "linear"
      }
    }
  }

  widget {
    alert_graph_definition {
      alert_id = datadog_monitor.beacon.id
      title    = "Kubernetes Node CPU"
      viz_type = "timeseries"
    }
  }

  widget {
    hostmap_definition {
      no_group_hosts  = true
      no_metric_hosts = true
      node_type       = "host"
      title           = "Kubernetes Nodes"

      request {
        fill {
          q = "avg:system.cpu.user{*} by {host}"
        }
      }

      style {
        palette      = "hostmap_blues"
        palette_flip = false
      }
    }
  }

  widget {
    timeseries_definition {
      show_legend = false
      title       = "Memory Utilization"

      request {
        display_type = "line"
        q            = "top(avg:kubernetes.memory.usage{image_name:onlydole/beacon} by {container_name}, 10, 'mean', 'desc')"

        style {
          line_type  = "solid"
          line_width = "normal"
          palette    = "dog_classic"
        }
      }

      yaxis {
        include_zero = true
        max          = "auto"
        min          = "auto"
        scale        = "linear"
      }
    }
  }
}
Apply your configuration to create a new Datadog dashboard for your metrics and synthetics monitors. Remember to confirm your apply with a yes.
$ terraform apply
Navigate to the Datadog Dashboard page. Your Beacon Service dashboard is reporting here now. Click on the Beacon Service dashboard to see all of your monitors reporting.
Clean up resources
After verifying that the resources were deployed successfully, run terraform destroy to destroy them. Remember to respond to the confirmation prompt with yes.
$ terraform destroy
Note
If you provisioned an EKS cluster for use with this tutorial, destroy it as well.
Next steps
Now that you have successfully created a metric monitor, an endpoint monitor, and a Datadog dashboard, consider reviewing the resources below.