Installing Prometheus and Alertmanager on Kubernetes

- May 16, 2017

Environment
- Ubuntu 16.04
- Kubernetes 1.6
- Prometheus 1.6

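The previous two posts covered installing Kubernetes. Next up is a monitoring tool: Prometheus.
Well... I'm not really going to introduce Prometheus itself. This post is about installing Alertmanager, the alerting component of the Prometheus stack.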

First, a quick rundown of how to install Prometheus.
Prepare two YAML files: one that runs the Prometheus container, and one that holds its configuration.

prometheus-deployment.yaml

apiVersion: v1
kind: Service
metadata:
  name: <PROMETHEUS_SERVICE_NAME>
  annotations:
    prometheus.io/scrape: 'true'
  labels:
spec:
  selector:
    app: prometheus
  type: NodePort
  ports:
  - name: prometheus
    protocol: TCP
    port: 9090
    nodePort: 30900
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: <PROMETHEUS_DEPLOYMENT_NAME>
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: quay.io/prometheus/prometheus:v1.6.0
        args:
        - '-storage.local.retention=6h'
        - '-storage.local.memory-chunks=500000'
        - '-config.file=/etc/prometheus/prometheus.yml'
        # Address of the Alertmanager that Prometheus sends alerts to.
        # Both containers share the pod's network, so http://localhost:9093 also reaches it.
        - '-alertmanager.url=http://<master ip>:9093'
        ports:
        - name: web
          containerPort: 9090
        volumeMounts:
        - name: config-volume
          mountPath: /etc/prometheus
        # Alert rules: evaluated by Prometheus, which forwards firing alerts to Alertmanager
        - name: config-volume-alert-rules
          mountPath: /etc/prometheus-rules
      # The Alertmanager container
      - name: alertmanager
        image: quay.io/prometheus/alertmanager:v0.6.0
        args:
        - '-config.file=/etc/prometheus/alertmanager.yml'
        volumeMounts:
        - name: config-volume-alertmanager
          mountPath: /etc/prometheus
      # ConfigMap names are placeholders; use the names set in the three ConfigMap files
      volumes:
      - name: config-volume
        configMap:
          name: <PROMETHEUS_CONFIGMAP>
      - name: config-volume-alertmanager
        configMap:
          name: <ALERTMANAGER_CONFIGMAP>
      - name: config-volume-alert-rules
        configMap:
          name: <ALERT_RULES_CONFIGMAP>

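If you only want Prometheus itself, you can leave out the commented sections above (the -alertmanager.url argument, the alert-rules mount and the Alertmanager container), together with the config-volume-alertmanager and config-volume-alert-rules volumes.
Be careful with indentation: YAML is whitespace-sensitive, and a misaligned line will break the manifest.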
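One common way to break these files is a stray tab: YAML does not allow tabs for indentation. Below is a minimal Python sketch (the script and helper names are hypothetical) that flags lines whose leading whitespace contains a tab before you hand the file to kubectl. It is a simple heuristic and may also flag a tab that is legitimate content inside a block scalar.

```python
import sys
from typing import List


def find_tab_indents(text: str) -> List[int]:
    """Return 1-based numbers of lines whose leading whitespace contains a tab.

    YAML forbids tabs for indentation. Heuristic only: a tab that is real
    content inside a block scalar (e.g. under "|-") is flagged as well.
    """
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        content = line.lstrip(" \t")
        indent = line[: len(line) - len(content)]
        if "\t" in indent:
            bad.append(lineno)
    return bad


def main(paths: List[str]) -> int:
    """Print every offending line and return 1 if any file has a tab indent."""
    status = 0
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for lineno in find_tab_indents(f.read()):
                print(f"{path}:{lineno}: tab used for indentation")
                status = 1
    return status


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python check_tabs.py prometheus-deployment.yaml prometheus-configmap.yaml
    sys.exit(main(sys.argv[1:]))
```

Running it over every manifest before each kubectl create gives a cheap early warning; a full YAML parser would catch more, but needs a third-party library.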

prometheus-configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  # Must match the configMap name of config-volume in prometheus-deployment.yaml
  name: <PROMETHEUS_CONFIGMAP>
data:
  prometheus.yml: |-
    # my global config
    global:
      scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      external_labels:
        monitor: 'codelab-monitor'
    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # Tells Prometheus where the alert rules live
      - '/etc/prometheus-rules/alert.rules'
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
      - job_name: etcd
        static_configs:
          - targets: ['<ETCD_IP>:2379']
      - job_name: 'kubernetes-apiservers'
        .
        .
        .

For the ". . ." part, copy the rest directly from the official documentation.
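Both Services in this post carry the annotation prometheus.io/scrape: 'true'. In the official Kubernetes example configuration that the ". . ." stands for, a job such as kubernetes-service-endpoints (the job the HighCPU rule later filters on) typically uses a keep relabel rule on that annotation, so only annotated Services get scraped. The sketch below is a simplified Python simulation of that keep/drop decision, not Prometheus code; the Service names in it are hypothetical.

```python
import re
from typing import Dict


def annotation_meta_label(annotation: str) -> str:
    """Service annotations show up as __meta_kubernetes_service_annotation_<name>,
    with every character outside [a-zA-Z0-9_] replaced by "_"."""
    return "__meta_kubernetes_service_annotation_" + re.sub(r"[^a-zA-Z0-9_]", "_", annotation)


def keep_target(labels: Dict[str, str], source_label: str, regex: str = "true") -> bool:
    """Relabel action "keep": drop the target unless the regex matches the
    whole value of source_label (a missing label counts as "")."""
    return re.fullmatch(regex, labels.get(source_label, "")) is not None


SOURCE = annotation_meta_label("prometheus.io/scrape")

# Hypothetical discovered Services, reduced to their annotations.
services = {
    "node-exporter": {"prometheus.io/scrape": "true"},
    "some-other-service": {},
}

for name, annotations in services.items():
    labels = {annotation_meta_label(k): v for k, v in annotations.items()}
    print(name, "kept" if keep_target(labels, SOURCE) else "dropped")
```

This is also why the annotation matters on the node-exporter Service further down: without it, that job would drop its endpoints.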

Once both YAML files are ready, cd into their directory and run:

kubectl create -f prometheus-configmap.yaml

kubectl create -f prometheus-deployment.yaml

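Prometheus is now installed, but it still needs something that collects the data: node-exporter, which exposes each node's hardware and OS metrics for Prometheus to scrape. It runs as a DaemonSet, so there is one copy on every node.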

node-exporter-deployment.yaml

apiVersion: v1
kind: Service
metadata:
  name: <NODE_EXPORTER_SERVICE_NAME>
  annotations:
    prometheus.io/scrape: 'true'
  labels:
    app: node-exporter
spec:
  clusterIP: None
  ports:
  - name: scrape
    port: 9100
    protocol: TCP
  selector:
    app: node-exporter
  type: ClusterIP
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: <NODE_EXPORTER_DAEMONSET_NAME>
spec:
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      containers:
      - name: node-exporter
        image: quay.io/prometheus/node-exporter:0.12.0
        ports:
        - containerPort: 9100
          hostPort: 9100
      # Use the node's network and PID namespaces so the exporter sees host-level metrics
      hostNetwork: true
      hostPID: true

Then start the exporter and you're done!

kubectl create -f node-exporter-deployment.yaml
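To see what Prometheus will actually scrape, you can open http://<node ip>:9100/metrics on any node (the DaemonSet uses hostNetwork and hostPort 9100). The response is Prometheus's plain-text exposition format. Below is a minimal Python parser sketch for simple sample lines only (no escaped quotes or commas inside label values, no timestamps), applied to a hypothetical excerpt in the shape node-exporter 0.12 uses for node_cpu; the numbers are made up.

```python
import re
import urllib.request
from typing import Dict, List, Tuple

# metric name, optional {labels}, then the sample value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

Sample = Tuple[str, Dict[str, str], float]


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Download the exposition text, e.g. from http://<node ip>:9100/metrics."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text: str) -> List[Sample]:
    """Parse simple sample lines, skipping blank lines and # HELP / # TYPE comments."""
    samples: List[Sample] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, label_text, value = match.groups()
        labels = dict(LABEL_RE.findall(label_text or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical excerpt; values are invented for illustration.
EXAMPLE = """\
# HELP node_cpu Seconds the cpus spent in each mode.
# TYPE node_cpu counter
node_cpu{cpu="cpu0",mode="idle"} 12345.67
node_cpu{cpu="cpu0",mode="user"} 234.5
node_load1 0.42
"""

for name, labels, value in parse_metrics(EXAMPLE):
    print(name, labels, value)
```

Against a live node you would call parse_metrics(fetch_metrics("http://<node ip>:9100/metrics")) instead of using EXAMPLE.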

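When we started Prometheus, we also started Alertmanager, since it runs as a second container in the same Deployment.
What remains are Alertmanager's own configuration file and the file with the alert rules.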

alert-manager-configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  # Must match the configMap name of config-volume-alertmanager in prometheus-deployment.yaml
  name: <ALERTMANAGER_CONFIGMAP>
data:
  alertmanager.yml: |-
    global:
      # The smarthost and SMTP sender used for mail notifications.
      smtp_smarthost: 'localhost:25'
      smtp_from: 'serviceadmin@yyy.com.tw'
    # The root route on which each incoming alert enters.
    route:
      receiver: 'pager_duty'
      group_by: ['alertname', 'cluster']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 3h
      routes:
      - match:
          service: backend
        receiver: pager_duty
        continue: true
    inhibit_rules:
    - source_match:
        severity: 'critical'
      target_match:
        severity: 'warning'
      # Apply inhibition if the alertname is the same.
      equal: ['alertname']
    receivers:
    - name: 'pager_duty'
      pagerduty_configs:
      - service_key: xxxxxxxxxxxxxxxxxx

You can also refer to the example configuration in the official documentation.
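Two parts of this config are easy to misread. group_by decides which alerts are bundled into one notification group (group_wait, group_interval and repeat_interval then control how soon a new group is sent, how often new alerts in it are sent, and when an unchanged group is re-sent). inhibit_rules mutes warning alerts while a critical alert with the same alertname is firing. Below is a simplified Python simulation of just those two behaviours for the config above; it is not Alertmanager's code, and the alerts in it are hypothetical.

```python
from typing import Dict, List, Tuple

Alert = Dict[str, str]  # an alert reduced to its label set


def group_alerts(alerts: List[Alert],
                 group_by: Tuple[str, ...] = ("alertname", "cluster")) -> Dict[Tuple[str, ...], List[Alert]]:
    """Bucket alerts like route.group_by: one group per distinct
    combination of the listed label values (missing labels count as "")."""
    groups: Dict[Tuple[str, ...], List[Alert]] = {}
    for alert in alerts:
        key = tuple(alert.get(name, "") for name in group_by)
        groups.setdefault(key, []).append(alert)
    return groups


def is_inhibited(target: Alert, firing: List[Alert]) -> bool:
    """Mirror of the single inhibit_rules entry: a severity=warning alert is
    muted while a severity=critical alert with the same alertname fires."""
    if target.get("severity") != "warning":
        return False
    return any(
        source.get("severity") == "critical"
        and source.get("alertname", "") == target.get("alertname", "")
        for source in firing
    )


firing = [
    {"alertname": "HighCPU", "cluster": "prod", "severity": "critical"},
    {"alertname": "HighCPU", "cluster": "prod", "severity": "warning"},
    {"alertname": "DiskFull", "cluster": "prod", "severity": "warning"},
]

for alert in firing:
    state = "muted" if is_inhibited(alert, firing) else "notify"
    print(alert["alertname"], alert["severity"], state)
print(sorted(group_alerts(firing)))
```

Here the HighCPU warning is muted by the HighCPU critical, while the DiskFull warning still goes out, and the three alerts fall into two notification groups.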

Next comes the most important part: the alert rules.

alert-rules-configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  # Must match the configMap name of config-volume-alert-rules in prometheus-deployment.yaml
  name: <ALERT_RULES_CONFIGMAP>
data:
  alert.rules: |-
    ## alert.rules ##

    # CPU Alerts
    ALERT HighCPU
      IF (100 - (avg(irate(node_cpu{job="kubernetes-service-endpoints",mode="idle"}[1m])) BY (instance) * 100)) > 80
      FOR 10m
      ANNOTATIONS {
        summary = "High CPU Usage",
        description = "This machine has really high CPU usage for over 10m",
      }

    # DNS Lookup failures
    ALERT DNSLookupFailureFromPrometheus
      IF prometheus_dns_sd_lookup_failures_total > 5
      FOR 1m
      LABELS { service = "frontend" }
      ANNOTATIONS {
        summary = "Prometheus reported over 5 DNS lookup failures",
        description = "The prometheus unit reported that it failed to query the DNS. Look at the kube-dns to see if it is having any problems",
      }

These are two example rules in the Prometheus 1.x rule format; for the syntax and details, see the official documentation.
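The HighCPU expression is easier to follow as plain arithmetic. node_cpu{mode="idle"} is a per-core counter of idle seconds, so irate(...[1m]) (the per-second rate from the last two samples in the window) is the fraction of time that core was idle; avg(...) BY (instance) averages the cores of one machine; and 100 - avg * 100 turns that into a busy percentage. A minimal Python sketch of that calculation, with hypothetical counter values for a two-core machine scraped 15 seconds apart:

```python
from statistics import mean
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def irate(samples: List[Sample]) -> float:
    """Per-second rate from the last two samples, like PromQL irate().
    Counter resets are ignored in this sketch."""
    (t1, v1), (t2, v2) = samples[-2], samples[-1]
    return (v2 - v1) / (t2 - t1)


def busy_percent(idle_counters: Dict[str, List[Sample]]) -> float:
    """100 - avg(irate(node_cpu{mode="idle"}[1m])) * 100 for one instance.
    Keys are CPU cores; values are samples of that core's idle-seconds counter."""
    idle_fraction = mean(irate(s) for s in idle_counters.values())
    return 100 - idle_fraction * 100


# Hypothetical data for one instance.
instance = {
    "cpu0": [(0.0, 1000.0), (15.0, 1001.5)],   # idle 1.5 s out of 15 s -> 0.10
    "cpu1": [(0.0, 2000.0), (15.0, 2002.25)],  # idle 2.25 s out of 15 s -> 0.15
}

busy = busy_percent(instance)
print(f"busy = {busy:.1f}%, condition > 80: {busy > 80}")
```

The average idle fraction is 0.125, so the machine is 87.5% busy and the IF condition holds. The FOR 10m clause then keeps the alert pending until the condition has stayed true across evaluations for 10 minutes; only after that does it fire and get sent to Alertmanager.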

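Because the pod mounts these two files as ConfigMap volumes (the Alertmanager config is read by Alertmanager, the rules by Prometheus), the ConfigMaps must exist before the Deployment starts. The final order is therefore: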

kubectl create -f prometheus-configmap.yaml
kubectl create -f alert-manager-configmap.yaml
kubectl create -f alert-rules-configmap.yaml

kubectl create -f prometheus-deployment.yaml
kubectl create -f node-exporter-deployment.yaml
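After everything is applied, one way to check that Alertmanager and its pager_duty receiver are wired up is to push a hand-made alert to it. Port 9093 is not exposed by the Service above, so this sketch assumes you first forward it, for example with kubectl port-forward <pod name> 9093:9093, and that this Alertmanager version accepts a JSON array of alerts on POST /api/v1/alerts (the endpoint Prometheus 1.x itself posts to). The helper names and label values are hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Dict, List, Optional


def build_test_alert(alertname: str = "ManualTestAlert",
                     extra_labels: Optional[Dict[str, str]] = None,
                     summary: str = "Test alert sent by hand") -> List[dict]:
    """Build the request body: a list of alerts, each with labels and annotations."""
    labels = {"alertname": alertname}
    labels.update(extra_labels or {})
    return [{
        "labels": labels,
        "annotations": {"summary": summary},
        # RFC 3339 timestamp in UTC, whole seconds
        "startsAt": datetime.now(timezone.utc).replace(microsecond=0).isoformat(),
    }]


def post_alerts(base_url: str, alerts: List[dict], timeout: float = 5.0) -> int:
    """POST the alerts to <base_url>/api/v1/alerts and return the HTTP status code."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    body = build_test_alert(extra_labels={"severity": "warning"})
    print(json.dumps(body, indent=2))
    # Uncomment once the port-forward to the Alertmanager container is running:
    # print(post_alerts("http://localhost:9093", body))
```

If the post succeeds, the alert should show up in the Alertmanager web UI on the forwarded port and, after group_wait, be handed to the pager_duty receiver.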

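Still, isn't this a hassle? Every time you change a condition you have to edit alert-rules-configmap.yaml again, and you also have to learn the rule syntax. There's already more work than time, so who has time to learn this too...
That's why the next post introduces Grafana: we feed the Prometheus data into Grafana and use Grafana's UI to create alert rules and notifications.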

Reference: http://blog.wercker.com/how-to-setup-alerts-on-prometheus


Kimi的筆記 (Kimi's Notes)
