Grafana Plugin (`grafana/v1-alpha`)

The Grafana plugin is an optional plugin that scaffolds Grafana dashboards for the default metrics exported by projects that use controller-runtime.

When to use it?

Use it when you want to view, in Grafana, the metrics that controller-runtime exports and Prometheus collects.

How to use it?

Prerequisites:

- Your project must use controller-runtime to expose its metrics through the default controller metrics, and Prometheus must collect them.
- Access to Prometheus.
  - Prometheus should have an endpoint exposed. (For prometheus-operator, this looks similar to: http://prometheus-k8s.monitoring.svc:9090 )
  - The endpoint is ready to be added, or has already been added, as a data source in your Grafana. See Add a data source.
- Access to Grafana. Make sure you have:
  - Dashboard edit permission
  - Prometheus data source
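
To confirm that Prometheus is actually collecting the controller metrics before importing any dashboard, you can query its HTTP API. The following is a minimal sketch, assuming the standard `/api/v1/query` instant-query endpoint. The helper names `countSeries` and `queryCount` are hypothetical, and the stand-in server exists only so the sketch runs without a cluster.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// countSeries parses the body of a Prometheus instant-query response and
// returns the number of series in the result.
func countSeries(body []byte) (int, error) {
	var resp struct {
		Status string `json:"status"`
		Data   struct {
			Result []json.RawMessage `json:"result"`
		} `json:"data"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return 0, err
	}
	if resp.Status != "success" {
		return 0, fmt.Errorf("prometheus returned status %q", resp.Status)
	}
	return len(resp.Data.Result), nil
}

// queryCount sends an instant query to baseURL and counts the matching series.
func queryCount(baseURL, query string) (int, error) {
	res, err := http.Get(baseURL + "/api/v1/query?query=" + url.QueryEscape(query))
	if err != nil {
		return 0, err
	}
	defer res.Body.Close()
	body, err := io.ReadAll(res.Body)
	if err != nil {
		return 0, err
	}
	return countSeries(body)
}

func main() {
	// Stand-in for a real Prometheus so the sketch runs without a cluster.
	fake := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"status":"success","data":{"resultType":"vector","result":[{"metric":{"pod":"demo"},"value":[0,"1"]}]}}`)
	}))
	defer fake.Close()

	// Replace fake.URL with your own endpoint,
	// for example http://prometheus-k8s.monitoring.svc:9090.
	n, err := queryCount(fake.URL, "controller_runtime_reconcile_total")
	if err != nil {
		fmt.Println("query failed:", err)
		return
	}
	fmt.Printf("%d series found\n", n)
}
```

A count of zero against your real endpoint suggests that the metrics endpoint is not being scraped yet, in which case the dashboards would stay empty.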

Basic Usage

The Grafana plugin is attached to the init subcommand and the edit subcommand:

# Initialize a new project with the grafana plugin
kubebuilder init --plugins grafana.kubebuilder.io/v1-alpha

# Enable the grafana plugin on an existing project
kubebuilder edit --plugins grafana.kubebuilder.io/v1-alpha

The plugin creates a new directory and scaffolds the JSON files under it (e.g. grafana/controller-runtime-metrics.json).

Showcase:

[Figure: example of using the plugin in a project]

Now, let’s check how to use the Grafana dashboards

1. Copy the contents of the JSON file.
2. Visit <your-grafana-url>/dashboard/import to import a new dashboard.
3. Paste the JSON content into the "Import via panel json" field, then click the Load button.
4. Select the data source for Prometheus metrics.
5. Once the JSON is imported in Grafana, the dashboard is ready.
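
Before pasting, it can help to confirm that the scaffolded file is valid JSON and contains the panels you expect. The following is a minimal sketch, assuming the file follows Grafana's dashboard JSON model with top-level `title` and `panels` fields. The `summarize` helper and the built-in `sample` are hypothetical and only keep the sketch runnable without a scaffolded project.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// sample is a tiny stand-in dashboard used when no file path is given.
const sample = `{"title":"Sample","panels":[{"title":"A"},{"title":"B"}]}`

// summarize returns the dashboard title and the titles of its top-level panels.
func summarize(data []byte) (string, []string, error) {
	var d struct {
		Title  string `json:"title"`
		Panels []struct {
			Title string `json:"title"`
		} `json:"panels"`
	}
	if err := json.Unmarshal(data, &d); err != nil {
		return "", nil, err
	}
	titles := make([]string, 0, len(d.Panels))
	for _, p := range d.Panels {
		titles = append(titles, p.Title)
	}
	return d.Title, titles, nil
}

func main() {
	data := []byte(sample)
	// Pass a path such as grafana/controller-runtime-metrics.json to check a real file.
	if len(os.Args) > 1 {
		b, err := os.ReadFile(os.Args[1])
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		data = b
	}
	title, panels, err := summarize(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, "not valid dashboard JSON:", err)
		os.Exit(1)
	}
	fmt.Printf("dashboard %q with %d panels\n", title, len(panels))
	for _, p := range panels {
		fmt.Println(" -", p)
	}
}
```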

Grafana Dashboard

Controller Runtime Reconciliation total & errors

Metrics:
- controller_runtime_reconcile_total
- controller_runtime_reconcile_errors_total
Query:
- sum(rate(controller_runtime_reconcile_total{job="$job"}[5m])) by (instance, pod)
- sum(rate(controller_runtime_reconcile_errors_total{job="$job"}[5m])) by (instance, pod)
Description:
- Per-second rate of total reconciliations, measured over the last 5 minutes
- Per-second rate of reconciliation errors, measured over the last 5 minutes
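
The queries above wrap a counter in `rate(...[5m])`. As a rough illustration of what that yields, the sketch below computes a simplified per-second rate from raw counter samples. It is not Prometheus' implementation: the real `rate()` also extrapolates to the window boundaries. The `Sample` type and `simpleRate` function are hypothetical names.

```go
package main

import "fmt"

// Sample is one scraped counter value at time T (in seconds).
type Sample struct {
	T float64
	V float64
}

// simpleRate returns the average per-second increase across the samples.
// A drop in value is treated as a counter reset, so the new value counts
// as the increase since the restart.
func simpleRate(samples []Sample) float64 {
	if len(samples) < 2 {
		return 0
	}
	increase := 0.0
	for i := 1; i < len(samples); i++ {
		delta := samples[i].V - samples[i-1].V
		if delta < 0 {
			delta = samples[i].V // counter restarted from zero
		}
		increase += delta
	}
	elapsed := samples[len(samples)-1].T - samples[0].T
	if elapsed <= 0 {
		return 0
	}
	return increase / elapsed
}

func main() {
	// 60 reconciliations over a 300-second window.
	window := []Sample{{T: 0, V: 100}, {T: 150, V: 130}, {T: 300, V: 160}}
	fmt.Printf("%.3f reconciliations per second\n", simpleRate(window))
}
```

With these made-up samples the counter grows by 60 over 300 seconds, so the panel would show roughly 0.2 reconciliations per second.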

Controller CPU & Memory Usage

Metrics:
- process_cpu_seconds_total
- process_resident_memory_bytes
Query:
- rate(process_cpu_seconds_total{job="$job", namespace="$namespace", pod="$pod"}[5m]) * 100
- process_resident_memory_bytes{job="$job", namespace="$namespace", pod="$pod"}
Description:
- CPU usage as a percentage of one core, averaged over the last 5 minutes
- Resident memory of the running controller process

Seconds of P50/90/99 Items Stay in Work Queue

Metrics:
- workqueue_queue_duration_seconds_bucket
Query:
- histogram_quantile(0.50, sum(rate(workqueue_queue_duration_seconds_bucket{job="$job", namespace="$namespace"}[5m])) by (instance, name, le))
Description:
- How long, in seconds, an item stays in the workqueue before being requested. The P90 and P99 series use the same query with 0.90 and 0.99 as the quantile.

Seconds of P50/90/99 Items Processed in Work Queue

Metrics:
- workqueue_work_duration_seconds_bucket
Query:
- histogram_quantile(0.50, sum(rate(workqueue_work_duration_seconds_bucket{job="$job", namespace="$namespace"}[5m])) by (instance, name, le))
Description:
- How long, in seconds, processing an item from the workqueue takes. The P90 and P99 series use the same query with 0.90 and 0.99 as the quantile.

Add Rate in Work Queue

Metrics:
- workqueue_adds_total
Query:
- sum(rate(workqueue_adds_total{job="$job", namespace="$namespace"}[5m])) by (instance, name)
Description:
- Per-second rate of items added to the workqueue

Retries Rate in Work Queue

Metrics:
- workqueue_retries_total
Query:
- sum(rate(workqueue_retries_total{job="$job", namespace="$namespace"}[5m])) by (instance, name)
Description:
- Per-second rate of retries handled by the workqueue

Number of Workers in Use

Metrics:
- controller_runtime_active_workers
Query:
- controller_runtime_active_workers{job="$job", namespace="$namespace"}
Description:
- The number of active controller workers

WorkQueue Depth

Metrics:
- workqueue_depth
Query:
- workqueue_depth{job="$job", namespace="$namespace"}
Description:
- Current depth of the workqueue

Unfinished Seconds

Metrics:
- workqueue_unfinished_work_seconds
Query:
- rate(workqueue_unfinished_work_seconds{job="$job", namespace="$namespace"}[5m])
Description:
- How many seconds of work have been done that are still in progress and have not yet been observed by work_duration.

Visualize Custom Metrics

The Grafana plugin supports scaffolding manifests for custom metrics.

Generate Config Template

When the plugin is triggered for the first time, grafana/custom-metrics/config.yaml is generated.

---
customMetrics:
#  - metric: # Raw custom metric (required)
#    type:   # Metric type: counter/gauge/histogram (required)
#    expr:   # Prom_ql for the metric (optional)
#    unit:   # Unit of measurement, examples: s,none,bytes,percent,etc. (optional)

Add Custom Metrics to Config

You can list multiple custom metrics in the file. Each entry must specify the metric and its type. If expr is omitted, the Grafana plugin generates an expression for the visualization automatically; if you provide expr, the plugin uses it as given.

---
customMetrics:
  - metric: memcached_operator_reconcile_total # Raw custom metric (required)
    type: counter # Metric type: counter/gauge/histogram (required)
    unit: none
  - metric: memcached_operator_reconcile_time_seconds_bucket
    type: histogram
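
To illustrate how an expression could be derived from a metric's type when expr is left empty, here is a hypothetical sketch. It is not the plugin's implementation: the generated expressions are an assumption modeled on the built-in dashboard queries shown earlier, and the `CustomMetric` type, `defaultExpr` and `resolve` are names invented for this example.

```go
package main

import "fmt"

// CustomMetric mirrors one entry under customMetrics in config.yaml.
type CustomMetric struct {
	Metric string // required
	Type   string // required: counter, gauge or histogram
	Expr   string // optional
	Unit   string // optional
}

// selector reuses the dashboard variables seen in the built-in queries.
const selector = `{job="$job", namespace="$namespace"}`

// defaultExpr returns an assumed default PromQL expression for a metric type.
func defaultExpr(metric, metricType string) (string, error) {
	switch metricType {
	case "counter":
		return fmt.Sprintf("sum(rate(%s%s[5m])) by (instance, pod)", metric, selector), nil
	case "histogram":
		return fmt.Sprintf("histogram_quantile(0.90, sum(rate(%s%s[5m])) by (instance, le))", metric, selector), nil
	case "gauge":
		return metric + selector, nil
	default:
		return "", fmt.Errorf("unsupported metric type %q", metricType)
	}
}

// resolve checks the required fields and fills Expr only when it is empty,
// so a user-provided expression is always kept as given.
func resolve(m CustomMetric) (CustomMetric, error) {
	if m.Metric == "" || m.Type == "" {
		return m, fmt.Errorf("metric and type are required")
	}
	if m.Expr == "" {
		expr, err := defaultExpr(m.Metric, m.Type)
		if err != nil {
			return m, err
		}
		m.Expr = expr
	}
	return m, nil
}

func main() {
	entries := []CustomMetric{
		{Metric: "memcached_operator_reconcile_total", Type: "counter", Unit: "none"},
		{Metric: "memcached_operator_reconcile_time_seconds_bucket", Type: "histogram"},
	}
	for _, e := range entries {
		m, err := resolve(e)
		if err != nil {
			fmt.Println("invalid entry:", err)
			continue
		}
		fmt.Println(m.Expr)
	}
}
```

Compare the output with the expr values in the dashboard the plugin actually scaffolds, and set expr explicitly in config.yaml whenever you need a specific query.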

Subcommands

The Grafana plugin implements the following subcommands:

- edit ($ kubebuilder edit [OPTIONS])
- init ($ kubebuilder init [OPTIONS])

Affected files

The following scaffolds will be created or updated by this plugin:

grafana/*.json

Further resources

- Check out the video showing how it works
- Check out the video showing how the custom metrics feature works
- Refer to a sample ServiceMonitor provided by the kustomize plugin
- Check the plugin implementation
- Grafana docs on importing a JSON file
- Usage of ServiceMonitor with the Prometheus Operator
