Installing Prometheus and Alertmanager on Kubernetes

Environment
 - Ubuntu 16.04
 - Kubernetes 1.6
 - Prometheus 1.6

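The previous two posts covered installing Kubernetes. Next up is a monitoring tool: Prometheus.
Well... not really an introduction. This post goes straight to installing Alertmanager for Prometheus.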

First, a quick look at how to install Prometheus.
Prepare two YAML files: one starts the Prometheus container, the other holds its configuration.

prometheus-deployment.yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  labels:
    name: prometheus
  name: prometheus
spec:
  selector:
    app: prometheus
  type: NodePort
  ports:
  - name: prometheus
    protocol: TCP
    port: 9090
    nodePort: 30900
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      name: prometheus
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: quay.io/prometheus/prometheus:v1.6.0
        args:
          - '-storage.local.retention=6h'
          - '-storage.local.memory-chunks=500000'
          - '-config.file=/etc/prometheus/prometheus.yml'
          # Address of the Alertmanager that Prometheus sends alerts to
          - '-alertmanager.url=http://<master ip>:9093'
        ports:
        - name: web
          containerPort: 9090
        volumeMounts:
        - name: config-volume
          mountPath: /etc/prometheus

        # Mount for the alert rules
        - name: config-volume-alert-rules
          mountPath: /etc/prometheus-rules

      # Alertmanager container
      - name: alertmanager
        image: quay.io/prometheus/alertmanager:v0.6.0
        args:
        - '-config.file=/etc/prometheus/alertmanager.yml'
        volumeMounts:
        - name: config-volume-alertmanager
          mountPath: /etc/prometheus

      volumes:
      - name: config-volume
        configMap:
          name: prometheus

      # Volumes that Alertmanager and the alert rules are mounted from
      - name: config-volume-alertmanager
        configMap:
          name: prometheus-alertmanager
      - name: config-volume-alert-rules
        configMap:
          name: prometheus-alert-rules

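If you only want to install Prometheus, the four commented sections above can be left out.
Indentation matters here; getting it wrong will break things, so be careful.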


prometheus-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus
data:
  prometheus.yml: |-
    # my global config
    global:
      scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      external_labels:
          monitor: 'codelab-monitor'

    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
       # Tells Prometheus where the alert rules are
       - '/etc/prometheus-rules/alert.rules'

    scrape_configs:
        - job_name: 'prometheus'
          static_configs:
            - targets: ['localhost:9090']

        - job_name: etcd
          static_configs:
            - targets: ['<ETCD_IP>:2379']

        - job_name: 'kubernetes-apiservers'
          .
          .
          .
For the parts elided with . . . please refer to the official documentation.

Once both YAML files are edited, switch to the directory that contains them and run
kubectl create -f prometheus-configmap.yaml
kubectl create -f prometheus-deployment.yaml


Prometheus is now installed, but it still needs something to collect the data: node-exporter
node-exporter-deployment.yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  labels:
    app: node-exporter
    name: node-exporter
  name: node-exporter
spec:
  clusterIP: None
  ports:
  - name: scrape
    port: 9100
    protocol: TCP
  selector:
    app: node-exporter
  type: ClusterIP
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  template:
    metadata:
      labels:
        app: node-exporter
      name: node-exporter
    spec:
      containers:
      - image: quay.io/prometheus/node-exporter:0.12.0
        name: node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          name: scrape
      hostNetwork: true
      hostPID: true

Then start the exporter and you're done!
kubectl create -f node-exporter-deployment.yaml
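
To confirm the exporter is serving data, one option is to fetch its /metrics page and count the idle-CPU series that the HighCPU rule later in this post relies on. The snippet below is only a sketch, not part of the original setup: the helper name count_idle_series is made up for illustration, and it assumes the metric is named node_cpu, as the rule in this post expects.

```shell
#!/bin/sh
# count_idle_series (hypothetical helper): reads node-exporter /metrics text
# on stdin and prints how many node_cpu series have mode="idle".
count_idle_series() {
  grep -c '^node_cpu{.*mode="idle"'
}

# Example usage against a node (replace <node ip> with a real address):
#   curl -s http://<node ip>:9100/metrics | count_idle_series
```

On a healthy node the count should normally match the number of CPUs; a count of 0 suggests the exporter is not exposing the metric under that name.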


When we started Prometheus, Alertmanager was started along with it.
What remains is the configuration file for Alertmanager itself and the file holding the alert rules.
alert-manager-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-alertmanager
data:
  alertmanager.yml: |-
    global:
      # The smarthost and SMTP sender used for mail notifications.
      smtp_smarthost: 'localhost:25'
      smtp_from: 'serviceadmin@yyy.com.tw'

    # The root route on which each incoming alert enters.
    route:
      receiver: 'pager_duty'
      group_by: ['alertname', 'cluster']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 3h

      routes:
      - match:
          service: backend
        receiver: pager_duty
        continue: true

    inhibit_rules:
    - source_match:
        severity: 'critical'
      target_match:
        severity: 'warning'
      # Apply inhibition if the alertname is the same.
      equal: ['alertname']

    receivers:
    - name: 'pager_duty'
      pagerduty_configs:
      - service_key: xxxxxxxxxxxxxxxxxx
You can also refer to the official configuration.

Next, the most important part: the alert rules
alert-rules-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-alert-rules
data:
  alert.rules: |-
    ## alert.rules ##
    #
    # CPU Alerts
    #
    ALERT HighCPU
      IF (100 - (avg(irate(node_cpu{job="kubernetes-service-endpoints",mode="idle"}[1m])) BY (instance) * 100)) > 80
      FOR 10m
      ANNOTATIONS {
        summary = "High CPU Usage",
        description = "This machine has really high CPU usage for over 10m",
      }
    #
    # DNS Lookup failures
    #
    ALERT DNSLookupFailureFromPrometheus
      IF prometheus_dns_sd_lookup_failures_total > 5
      FOR 1m
      LABELS { service = "frontend" }
      ANNOTATIONS {
        summary = "Prometheus reported over 5 DNS lookup failures",
        description = "The prometheus unit reported that it failed to query the DNS.  Look at the kube-dns to see if it is having any problems",
      }

These are two example rules; for the syntax and details, see the official documentation.



For Alertmanager to pick up these two configuration files, they must be created before Alertmanager starts, so the final order is
kubectl create -f prometheus-configmap.yaml
kubectl create -f alert-manager-configmap.yaml
kubectl create -f alert-rules-configmap.yaml
   
kubectl create -f prometheus-deployment.yaml
kubectl create -f node-exporter-deployment.yaml
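
The ordering above can also be wrapped in a small script. This is a sketch under assumptions, not something from the original setup: the function name create_all and the KUBECTL variable are made up for illustration, and by default the script only prints the commands (a dry run) so the order can be checked before touching the cluster.

```shell
#!/bin/sh
# KUBECTL is a hypothetical override: it defaults to a dry run that only
# echoes each command. Set KUBECTL=kubectl to actually create the objects.
KUBECTL="${KUBECTL:-echo kubectl}"

create_all() {
  # ConfigMaps come first so the volumes exist when the pods start.
  for f in \
    prometheus-configmap.yaml \
    alert-manager-configmap.yaml \
    alert-rules-configmap.yaml \
    prometheus-deployment.yaml \
    node-exporter-deployment.yaml
  do
    # $KUBECTL is unquoted on purpose so "echo kubectl" splits into two words.
    $KUBECTL create -f "$f" || return 1
  done
}

create_all
```

Saved as, say, create-all.sh (a hypothetical file name), running it as is previews the five commands; running it with KUBECTL=kubectl applies them and stops at the first failure.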

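But isn't this a hassle? Every time a condition changes you have to edit alert-rules-configmap.yaml again, and you also have to learn the rule syntax. There is already more work than time, so who has room to learn that...
So the next post introduces Grafana: feed the Prometheus data into Grafana and use Grafana's UI to create alert rules and notifications.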


Reference: http://blog.wercker.com/how-to-setup-alerts-on-prometheus
