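Installing Prometheus and Alertmanager on Kubernetes
Environment
- Ubuntu 16.04
- Kubernetes 1.6
- Prometheus 1.6
The previous two posts covered installing Kubernetes. This one moves on to a monitoring tool: Prometheus.
Well... it is not really an introduction to Prometheus itself. The focus is installing Alertmanager alongside it.
First, a quick look at how to install Prometheus.
Prepare two YAML files: one that starts the Prometheus container and one that holds its configuration.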
prometheus-deployment.yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  labels:
    name: prometheus
  name: prometheus
spec:
  selector:
    app: prometheus
  type: NodePort
  ports:
  - name: prometheus
    protocol: TCP
    port: 9090
    nodePort: 30900
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      name: prometheus
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: quay.io/prometheus/prometheus:v1.6.0
        args:
          - '-storage.local.retention=6h'
          - '-storage.local.memory-chunks=500000'
          - '-config.file=/etc/prometheus/prometheus.yml'
          # Alertmanager address that Prometheus sends alerts to
          - '-alertmanager.url=http://<master ip>:9093'
        ports:
        - name: web
          containerPort: 9090
        volumeMounts:
        - name: config-volume
          mountPath: /etc/prometheus
        # Alert rules to be loaded for Alertmanager
        - name: config-volume-alert-rules
          mountPath: /etc/prometheus-rules
      # Alertmanager container
      - name: alertmanager
        image: quay.io/prometheus/alertmanager:v0.6.0
        args:
          - '-config.file=/etc/prometheus/alertmanager.yml'
        volumeMounts:
        - name: config-volume-alertmanager
          mountPath: /etc/prometheus
      volumes:
      - name: config-volume
        configMap:
          name: prometheus
      # Volumes that Alertmanager needs mounted
      - name: config-volume-alertmanager
        configMap:
          name: prometheus-alertmanager
      - name: config-volume-alert-rules
        configMap:
          name: prometheus-alert-rules
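If you only want to install Prometheus, the four commented sections above can be left out.
Indentation matters in YAML: get it wrong and things break, so be careful.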
prometheus-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus
data:
  prometheus.yml: |-
    # my global config
    global:
      scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      external_labels:
        monitor: 'codelab-monitor'
    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # Tells Prometheus where the alert rules live
      - '/etc/prometheus-rules/alert.rules'
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
      - job_name: etcd
        static_configs:
          - targets: ['<ETCD_IP>:2379']
      - job_name: 'kubernetes-apiservers'
    .
    .
    .
    .
    .
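For the parts marked . . ., refer to the official documentation.
Once both YAML files are edited, change to the directory that contains them and run: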
kubectl create -f prometheus-configmap.yaml
kubectl create -f prometheus-deployment.yaml
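To check that both objects came up, something like the following may help. It is only a sketch: it assumes kubectl points at the right cluster and that everything was created in the current namespace. The verify_prometheus function and the KUBECTL override variable are illustrative names, not part of the original setup. The app=prometheus label and the prometheus Service name are taken from the YAML above.

```shell
# KUBECTL is a hypothetical override, handy for dry runs (e.g. KUBECTL=echo).
KUBECTL="${KUBECTL:-kubectl}"

# List the Prometheus pod and its NodePort Service.
verify_prometheus() {
  "$KUBECTL" get pods -l app=prometheus
  "$KUBECTL" get service prometheus
}
```

Calling verify_prometheus should list the pod and a Service of type NodePort. With the Service defined above, the Prometheus UI would then be reachable on port 30900 of a node.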
Prometheus is now installed, but it still needs something to collect node metrics: node-exporter.
node-exporter-deployment.yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  labels:
    app: node-exporter
    name: node-exporter
  name: node-exporter
spec:
  clusterIP: None
  ports:
  - name: scrape
    port: 9100
    protocol: TCP
  selector:
    app: node-exporter
  type: ClusterIP
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  template:
    metadata:
      labels:
        app: node-exporter
      name: node-exporter
    spec:
      containers:
      - image: quay.io/prometheus/node-exporter:0.12.0
        name: node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          name: scrape
      hostNetwork: true
      hostPID: true
Then start the exporter, and you are done!
kubectl create -f node-exporter-deployment.yaml
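A quick way to confirm that the exporter runs on every node could look like this. It is a sketch under the same assumptions as before: kubectl is configured for the cluster and the objects are in the current namespace. The verify_node_exporter function and the KUBECTL variable are hypothetical helper names. The node-exporter DaemonSet name and the app=node-exporter label come from the YAML above.

```shell
# KUBECTL is a hypothetical override, handy for dry runs (e.g. KUBECTL=echo).
KUBECTL="${KUBECTL:-kubectl}"

# Show the DaemonSet and which node each exporter pod landed on.
verify_node_exporter() {
  "$KUBECTL" get daemonset node-exporter
  "$KUBECTL" get pods -l app=node-exporter -o wide
}
```

Because the pods use hostNetwork with hostPort 9100, each node's metrics should also be visible at http://<node ip>:9100/metrics.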
Starting Prometheus also brought up Alertmanager along with it.
What remains is Alertmanager's own configuration file and the alert rules file.
alert-manager-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-alertmanager
data:
  alertmanager.yml: |-
    # The official example configuration is also a useful reference.
    global:
      # The smarthost and SMTP sender used for mail notifications.
      smtp_smarthost: 'localhost:25'
      smtp_from: 'serviceadmin@yyy.com.tw'
    # The root route on which each incoming alert enters.
    route:
      receiver: 'pager_duty'
      group_by: ['alertname', 'cluster']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 3h
      routes:
      - match:
          service: backend
        receiver: pager_duty
        continue: true
    inhibit_rules:
    - source_match:
        severity: 'critical'
      target_match:
        severity: 'warning'
      # Apply inhibition if the alertname is the same.
      equal: ['alertname']
    receivers:
    - name: 'pager_duty'
      pagerduty_configs:
      - service_key: xxxxxxxxxxxxxxxxxx
Next comes the most important part: the alert rules.
alert-rules-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-alert-rules
data:
  alert.rules: |-
    ## alert.rules ##
    #
    # CPU Alerts
    #
    ALERT HighCPU
      IF (100 - (avg(irate(node_cpu{job="kubernetes-service-endpoints",mode="idle"}[1m])) BY (instance) * 100)) > 80
      FOR 10m
      ANNOTATIONS {
        summary = "High CPU Usage",
        description = "This machine has really high CPU usage for over 10m",
      }
    #
    # DNS Lookup failures
    #
    ALERT DNSLookupFailureFromPrometheus
      IF prometheus_dns_sd_lookup_failures_total > 5
      FOR 1m
      LABELS { service = "frontend" }
      ANNOTATIONS {
        summary = "Prometheus reported over 5 DNS lookup failure",
        description = "The prometheus unit reported that it failed to query the DNS. Look at the kube-dns to see if it is having any problems",
      }
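The HighCPU expression turns an idle rate into a busy percentage. irate(node_cpu{mode="idle"}[1m]) is the fraction of time a core spent idle, avg ... BY (instance) averages that across a machine's cores, and 100 minus that average times 100 is the value compared with 80. The arithmetic alone can be sketched as below. The function names cpu_busy_percent and exceeds_threshold and the sample idle fractions are made up for illustration, and nothing here queries Prometheus.

```shell
# Convert an average idle fraction (0..1) into a busy percentage,
# mirroring the "100 - idle * 100" part of the HighCPU rule.
cpu_busy_percent() {
  awk -v idle="$1" 'BEGIN { printf "%.1f\n", 100 - idle * 100 }'
}

# Exit status 0 when the busy percentage exceeds the rule's threshold of 80.
exceeds_threshold() {
  awk -v idle="$1" 'BEGIN { exit !((100 - idle * 100) > 80) }'
}

cpu_busy_percent 0.15   # prints 85.0
cpu_busy_percent 0.40   # prints 60.0
```

An idle fraction of 0.15 gives 85% busy and satisfies the condition, while 0.40 gives 60% and does not. In the rule itself, FOR 10m additionally requires the condition to hold for ten minutes before the alert fires.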
For Alertmanager to pick up these two configuration files, they must exist before Alertmanager starts, so the final order is:
kubectl create -f prometheus-configmap.yaml
kubectl create -f alert-manager-configmap.yaml
kubectl create -f alert-rules-configmap.yaml
kubectl create -f prometheus-deployment.yaml
kubectl create -f node-exporter-deployment.yaml
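The ordering above can be wrapped in a small helper so the ConfigMaps are always created before the workloads that mount them. This is only a sketch: create_all and the KUBECTL override are hypothetical names, and it assumes all five files sit in the current directory.

```shell
# KUBECTL is a hypothetical override, handy for dry runs (e.g. KUBECTL=echo).
KUBECTL="${KUBECTL:-kubectl}"

# Create the ConfigMaps first, then the Deployment and DaemonSet.
# Stops at the first failure so later objects are not created out of order.
create_all() {
  for f in \
    prometheus-configmap.yaml \
    alert-manager-configmap.yaml \
    alert-rules-configmap.yaml \
    prometheus-deployment.yaml \
    node-exporter-deployment.yaml
  do
    "$KUBECTL" create -f "$f" || return 1
  done
}
```

Calling create_all runs the five commands in order, and KUBECTL=echo create_all prints them without touching the cluster.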
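But isn't this a hassle? Every time a condition changes, alert-rules-configmap.yaml has to be updated again, and you also need to learn the rule syntax. With so much work already piling up, who has time to learn that...
So the next post will introduce Grafana: feed the Prometheus data into Grafana and use the Grafana UI to create alert rules and notifications.
Reference: http://blog.wercker.com/how-to-setup-alerts-on-prometheus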