Kubernetes: collecting cluster events to monitor cluster behavior
Published: 2019-06-12


I. Overview

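Our production Kubernetes cluster has now been through Double 11 (the Singles' Day shopping festival). We optimized the network and the monitoring beforehand, and it got through Double 11 smoothly with good performance. First, a brief overview of how we run Kubernetes: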

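    Host OS: Ubuntu 16.04 (kernel upgraded to 4.17)

    Kubernetes version: 1.13.2

    Network plugin: Calico (BGP mode + BGP route reflector)

    kube-proxy: IPVS mode

    Monitoring: Prometheus + Grafana

    Logging: fluentd + ES

    Metrics: metrics-server

    HPA: CPU + memory

    Alerting: DingTalk

    CI/CD: gitlab-ci/gitlab-runner

    Application management: helm, chartmuseum (I don't recommend using helm directly: helm charts are hard to read and have a steep learning curve)

    Since k8s and physical-machine environments coexist, the two networks have to be connected to allow access between them: kube-gateway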

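Some details involve company-internal systems and can't be published, but most of the above is covered in my earlier blog posts if you are interested.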

 

Reflections:

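After the cluster had been running in production for a while, I realized I had no clear view of what was changing inside it: a pod being rescheduled, image GC failing on a node, an HPA being triggered, and so on. All of this is available through events, but events are not stored permanently (kube-apiserver keeps them for one hour by default, controlled by its --event-ttl flag). Because events record the state changes of every kind of resource in the cluster, collecting and analyzing them lets us follow what is happening across the whole cluster.

After some searching I found the open-source eventrouter for collecting events and modified it to fit our use case, renaming it eventrouter-kafka (https://github.com/cuishuaigit/eventrouter-kafka). It ships events straight to Kafka after a simple configuration change instead of requiring a pile of settings; I found the original's configuration rather cumbersome. Our logs also go through a Kafka queue, which reduces the write pressure on ES. The current event collection pipeline is: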

eventrouter ----> kafka ----> logstash (filtering, parsing) ----> ES ----> elastalert ----> DingTalk

Adding event collection in this way made our k8s cluster one step more complete.

 

II. Pipeline Walkthrough

1. Deploy eventrouter

eventrouter is written in Go, so you can extend it for your own needs. Deployment is simple; see https://github.com/cuishuaigit/eventrouter-kafka. I won't go into the details here.
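The Logstash and ElastAlert configuration further down drops an `old_event` field and reads `verb` and `event.*` fields, which suggests that each message eventrouter-kafka writes to Kafka is a JSON object with `verb`, `event` and `old_event` keys, as in the upstream eventrouter. The sketch below assumes that shape; `make_record` is a hypothetical helper (not part of eventrouter-kafka) and the sample event is made up.

```python
import json
from typing import Optional


def make_record(verb: str, event: dict, old_event: Optional[dict] = None) -> str:
    """Serialize one event change into the assumed eventrouter message shape.

    verb describes the change seen by the watch ("ADDED" for a new event,
    "UPDATED" when an existing event changes); old_event is the previous
    version and is only included for updates.
    """
    record = {"verb": verb, "event": event}
    if old_event is not None:
        record["old_event"] = old_event
    return json.dumps(record, sort_keys=True)


# A made-up Warning event about a pod, trimmed to the fields used later on.
sample_event = {
    "type": "Warning",
    "reason": "BackOff",
    "message": "Back-off restarting failed container",
    "involvedObject": {"kind": "Pod", "name": "web-0", "namespace": "default"},
    "source": {"component": "kubelet"},
}

print(make_record("ADDED", sample_event))
```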

 

2. Kafka cluster

See: https://github.com/cuishuaigit/k8s-kafka
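The broker addresses in the Logstash input below follow the stable DNS names a StatefulSet gets behind a headless Service: `<pod>.<service>.<namespace>.svc.cluster.local`, with pods numbered from 0. Assuming the k8s-kafka manifests define a StatefulSet `kafka` and a headless Service `kafka-svc` in namespace `kafka` listening on port 9092 (inferred from those addresses, not checked against the repository), the bootstrap list can be generated with a small sketch like this:

```python
def bootstrap_servers(replicas: int, statefulset: str = "kafka",
                      service: str = "kafka-svc", namespace: str = "kafka",
                      port: int = 9092) -> str:
    """Build a comma-separated Kafka bootstrap list from StatefulSet pod DNS names."""
    hosts = [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.cluster.local:{port}"
        for ordinal in range(replicas)  # StatefulSet pods are named <name>-0 .. <name>-(n-1)
    ]
    return ",".join(hosts)


print(bootstrap_servers(3))
```

Listing every broker is not strictly required (the client discovers the rest of the cluster from any one of them), but it keeps the consumer working when a single broker pod is down.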

 

3. Logstash

Download the matching version of Logstash: https://www.elastic.co/guide/en/logstash/6.5/installing-logstash.html

Then configure it. Here is my test configuration:

```
input {
  kafka {
    bootstrap_servers => ["kafka-0.kafka-svc.kafka.svc.cluster.local:9092,kafka-1.kafka-svc.kafka.svc.cluster.local:9092,kafka-2.kafka-svc.kafka.svc.cluster.local:9092"]
    client_id => "eventrouter-prod"
    #auto_offset_reset => "latest"
    group_id => "eventrouter"
    consumer_threads => 2
    #decorate_events => true
    id => "eventrouter"
    topics => ["eventrouter"]
  }
}

filter {
  # drop the noisy DNSConfigForming events before parsing
  if [message] =~ 'DNSConfigForming' {
    drop { }
  }
  # parse the JSON emitted by eventrouter into top-level fields
  json {
    source => "message"
  }
  # remove the raw JSON string and the previous version of the event
  mutate {
    remove_field => [ "message", "old_event" ]
  }
}

output {
  elasticsearch {
    hosts => "10.4.9.28:9200"
    # one index per day
    index => "eventrouter-%{+YYYY-MM-dd}"
  }
}
```
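To make the filter stage concrete, here is a rough Python model of what this pipeline does to a single Kafka message. It only illustrates the config above, not how Logstash works internally; `process` is a hypothetical name, and JSON parse failures (which Logstash would tag with `_jsonparsefailure` rather than raise) are not modelled. As far as I know, the `%{+YYYY-MM-dd}` index suffix is formatted from the event's `@timestamp` in UTC, so events between 00:00 and 08:00 China time (UTC+8) land in the previous day's index.

```python
import json
import re
from datetime import datetime, timezone
from typing import Optional, Tuple


def process(message: str, timestamp: datetime) -> Optional[Tuple[str, dict]]:
    """Mimic the filter and index naming for one message; timestamp must be tz-aware.

    Returns (index_name, document), or None when the event is dropped.
    """
    # if [message] =~ 'DNSConfigForming' { drop { } }
    if re.search("DNSConfigForming", message):
        return None
    # json { source => "message" }: parsed keys are merged into the event root
    doc = {"message": message}
    doc.update(json.loads(message))
    # mutate { remove_field => [ "message", "old_event" ] }
    doc.pop("message", None)
    doc.pop("old_event", None)
    # index => "eventrouter-%{+YYYY-MM-dd}": the date comes from @timestamp in UTC
    day = timestamp.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return "eventrouter-" + day, doc


raw = json.dumps({"verb": "ADDED", "event": {"type": "Warning", "reason": "BackOff"}})
print(process(raw, datetime(2019, 3, 21, 23, 30, tzinfo=timezone.utc)))
```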

 

4. ES

```yaml
version: '2'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.5.1
    container_name: elasticsearch
    environment:
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms4096m -Xmx4096m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - /data/es1:/usr/share/elasticsearch/data
      - /data/backups:/usr/share/elasticsearch/backups
      - /data/longterm_backups:/usr/share/elasticsearch/longterm_backups
      - ./config/jvm.options:/usr/share/elasticsearch/config/jvm.options
    ports:
      - "9200:9200"
    networks:
      - esnet
#  elasticsearch2:
#    image: docker.elastic.co/elasticsearch/elasticsearch:6.5.1
#    container_name: elasticsearch2
#    environment:
#      - cluster.name=docker-cluster
#      - bootstrap.memory_lock=true
#      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
#      - "discovery.zen.ping.unicast.hosts=elasticsearch"
#    ulimits:
#      memlock:
#        soft: -1
#        hard: -1
#    volumes:
#      - /data/es2:/usr/share/elasticsearch/data
#    networks:
#      - esnet
  kibana:
    image: docker.elastic.co/kibana/kibana:6.5.1
    container_name: kibana
    environment:
      SERVER_NAME: kibana
      SERVER_HOST: "0.0.0.0"
      ELASTICSEARCH_URL: http://elasticsearch:9200
      # maps to xpack.monitoring.ui.container.elasticsearch.enabled
      XPACK_MONITORING_UI_CONTAINER_ELASTICSEARCH_ENABLED: "true"
    volumes:
      # Kibana loads plugins from /usr/share/kibana/plugins
      - /data/plugin:/usr/share/kibana/plugins
      - /tmp/:/etc/archives
    ports:
      - "5601:5601"
    networks:
      - esnet
    depends_on:
      - elasticsearch
networks:
  esnet:
    driver: bridge
```

 

cat config/jvm.options

```
## JVM configuration

################################################################
## IMPORTANT: JVM heap size
################################################################
##
## You should always set the min and max JVM heap
## size to the same value. For example, to set
## the heap to 4 GB, set:
##
## -Xms4g
## -Xmx4g
##
## See https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html
## for more information
##
################################################################

# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space

-Xms2g
-Xmx2g

################################################################
## Expert settings
################################################################
##
## All settings below this section are considered
## expert settings. Don't tamper with them unless
## you understand what you are doing
##
################################################################

## GC configuration
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly

## G1GC Configuration
# NOTE: G1GC is only supported on JDK version 10 or later.
# To use G1GC uncomment the lines below.
# 10-:-XX:-UseConcMarkSweepGC
# 10-:-XX:-UseCMSInitiatingOccupancyOnly
# 10-:-XX:+UseG1GC
# 10-:-XX:InitiatingHeapOccupancyPercent=75

## optimizations

# pre-touch memory pages used by the JVM during initialization
-XX:+AlwaysPreTouch

## basic

# explicitly set the stack size
-Xss1m

# set to headless, just in case
-Djava.awt.headless=true

# ensure UTF-8 encoding by default (e.g. filenames)
-Dfile.encoding=UTF-8

# use our provided JNA always versus the system one
-Djna.nosys=true

# turn off a JDK optimization that throws away stack traces for common
# exceptions because stack traces are important for debugging
-XX:-OmitStackTraceInFastThrow

# flags to configure Netty
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0

# log4j 2
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true

-Djava.io.tmpdir=${ES_TMPDIR}

## heap dumps

# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError

# specify an alternative path for heap dumps; ensure the directory exists and
# has sufficient space
-XX:HeapDumpPath=data

# specify an alternative path for JVM fatal error logs
-XX:ErrorFile=logs/hs_err_pid%p.log

## JDK 8 GC logging

8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:logs/gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=32
8:-XX:GCLogFileSize=64m

# JDK 9+ GC logging
9-:-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m
# due to internationalization enhancements in JDK 9 Elasticsearch need to set the provider to COMPAT otherwise
# time/date parsing will break in an incompatible way for some date patterns and locals
9-:-Djava.locale.providers=COMPAT

# temporary workaround for C2 bug with JDK 10 on hardware with AVX-512
10-:-XX:UseAVX=2
```
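Note that the compose file sets `ES_JAVA_OPTS=-Xms4096m -Xmx4096m` while this jvm.options sets `-Xms2g -Xmx2g`. As far as I know, the Elasticsearch 6.x startup scripts place the `ES_JAVA_OPTS` flags after those read from jvm.options, and the JVM honors the last `-Xms`/`-Xmx` it sees, so the container should end up with a 4 GB heap. The sketch below models that resolution, including the `8:`, `9-:` and `10-:` JDK-version prefixes used in the file; it is a simplified stand-in for Elasticsearch's own launcher, which may differ in edge cases, and the function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Version prefixes: "8:" (JDK 8 only), "9-:" (JDK 9 and later), "8-9:" (JDK 8 to 9).
# Lines without a prefix apply to every JDK.
_PREFIX = re.compile(r"^(\d+)(-(\d+)?)?:(.*)$")


def parse_jvm_options(text: str, jdk_major: int) -> List[str]:
    """Return the flags from a jvm.options file that apply to the given JDK."""
    flags = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        m = _PREFIX.match(line)
        if m is None:
            flags.append(line)
            continue
        lower = int(m.group(1))
        if m.group(2) is None:      # "8:" exact version
            upper: Optional[int] = lower
        elif m.group(3) is None:    # "9-:" open-ended range
            upper = None
        else:                       # "8-9:" closed range
            upper = int(m.group(3))
        if jdk_major >= lower and (upper is None or jdk_major <= upper):
            flags.append(m.group(4))
    return flags


def effective_heap(flags: List[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (Xms, Xmx); when a flag repeats, the last occurrence wins."""
    xms = xmx = None
    for flag in flags:
        if flag.startswith("-Xms"):
            xms = flag[4:]
        elif flag.startswith("-Xmx"):
            xmx = flag[4:]
    return xms, xmx


jvm_options = """
-Xms2g
-Xmx2g
8:-XX:+PrintGCDetails
9-:-Djava.locale.providers=COMPAT
"""
es_java_opts = "-Xms4096m -Xmx4096m"
# Assumed ordering: jvm.options flags first, then ES_JAVA_OPTS.
flags = parse_jvm_options(jvm_options, jdk_major=11) + es_java_opts.split()
print(effective_heap(flags))
```

Either way, keep the two settings in one place so the heap size you read in jvm.options is the one actually in effect.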

 

5. elastalert

For deployment, see https://github.com/Yelp/elastalert.git

Usage:

mkdir  /etc/elastalert

Copy config.yaml.example from the cloned elastalert directory into the directory created above:

cp elastalert/config.yaml.example /etc/elastalert/config.yaml

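Only the following need to be changed:

rules_folder, es_host and es_port; if Elasticsearch requires a username and password, set es_username and es_password as well.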
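Only those few keys need editing; for repeatable setups the same change can be scripted. The sketch below is a plain line-based substitution of top-level `key: value` lines (no YAML library), and the excerpt it edits is an abbreviated assumption about what config.yaml.example contains, so check it against your copy. `set_top_level_keys` is a hypothetical helper, `es-host.example` is a placeholder for your Elasticsearch address, and `/etc/elastalert/rules` is the rules directory this post uses.

```python
import re
from typing import Dict


def set_top_level_keys(yaml_text: str, values: Dict[str, str]) -> str:
    """Replace the values of unindented `key: value` lines whose key is in `values`.

    Indented (nested) lines and comments never match; keys that are not already
    present are not added.
    """
    out = []
    for line in yaml_text.splitlines():
        m = re.match(r"([A-Za-z_]\w*):", line)
        if m and m.group(1) in values:
            line = f"{m.group(1)}: {values[m.group(1)]}"
        out.append(line)
    return "\n".join(out) + "\n"


# Abbreviated, assumed excerpt of config.yaml.example.
example_config = """rules_folder: example_rules
run_every:
  minutes: 1
buffer_time:
  minutes: 15
es_host: elasticsearch.example.com
es_port: 9200
writeback_index: elastalert_status
"""

patched = set_top_level_keys(example_config, {
    "rules_folder": "/etc/elastalert/rules",
    "es_host": "es-host.example",  # placeholder: your Elasticsearch address
    "es_port": "9200",
})
print(patched)
```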

 

Create the rules directory:

mkdir /etc/elastalert/rules

 

6. DingTalk

To create the robot, see my other blog posts. Obtain its token, then download the DingTalk plugin: https://github.com/xuyaoqiang/elastalert-dingtalk-plugin

Copy elastalert_modules under /etc/elastalert:

cp  -r elastalert-dingtalk-plugin/elastalert_modules   /etc/elastalert/elastalert
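The plugin ultimately posts to the DingTalk custom robot webhook. For `dingtalk_msgtype: "text"`, the robot API expects a JSON body of the form `{"msgtype": "text", "text": {"content": ...}}`. I have not checked exactly how elastalert-dingtalk-plugin builds its request, so treat this as a sketch of the robot API rather than of the plugin; `build_request` is a hypothetical helper, the token is a placeholder, and the request is only constructed, not sent.

```python
import json
import urllib.request


def build_request(access_token: str, content: str) -> urllib.request.Request:
    """Build (but do not send) a DingTalk robot text-message request."""
    url = "https://oapi.dingtalk.com/robot/send?access_token=" + access_token
    body = {"msgtype": "text", "text": {"content": content}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json;charset=utf-8"},
        method="POST",
    )


req = build_request("<your_access_token>", "test alert from elastalert")
# urllib.request.urlopen(req) would actually send it.
```

Sending a test message this way is a quick check that the token works before wiring it into a rule.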

 

An example rule:

```yaml
# Alert when the rate of events exceeds a threshold

# (Optional)
# Elasticsearch host
es_host: 10.2.9.28

# (Optional)
# Elasticsearch port
es_port: 9200

# (Optional) Connect with SSL to Elasticsearch
#use_ssl: True

# (Optional) basic-auth username and password for Elasticsearch
#es_username: someusername
#es_password: somepassword

# (Required)
# Rule name, must be unique
name: Other event frequency rule

# (Required)
# Type of alert.
# the frequency rule type alerts when num_events events occur with timeframe time
type: frequency

# (Required)
# Index to search, wildcard supported
index: eventrouter-*

# (Required, frequency specific)
# Alert when this many documents matching the query occur within a timeframe
num_events: 5

# (Required, frequency specific)
# num_events must occur within this amount of time to trigger an alert
timeframe:
  #hours: 4
  minutes: 15

# (Required)
# A list of Elasticsearch filters used to find events
# These filters are joined with AND and nested in a filtered query
# For more info: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl.html
filter:
#- term:
#    some_field: "some_value"
- query:
    query_string:
      query: "event.type: Warning NOT event.involvedObject.kind: Node"

# (Required)
# The alert is used when a match is found
#smtp_host: smtp.exmail.qq.com
#smtp_port: 25
#smtp_auth_file: /etc/elastalert/smtp_auth_file.yaml
#email_reply_to: ci@qq.com
#from_addr: ci@qq.com

realert:
  minutes: 5

exponential_realert:
  hours: 1

alert:
#- "email"
- "elastalert_modules.dingtalk_alert.DingTalkAlerter"

dingtalk_webhook: "https://oapi.dingtalk.com/robot/send?access_token=<your_access_token>"
dingtalk_msgtype: "text"

alert_text_type: alert_text_only
alert_text: "   ====elastalert message====\n   EventTime>>:  {0}\n Event_involvedObject_name>>: {1}\n Event_involvedObject_kind>>: {2}\n Event_involvedObject_namespace>>: {3}\n Message>>: {4}\n Event_reason>>: {5}\n verb>>: {6}"
alert_text_args:
- "@timestamp"
- event.involvedObject.name
- event.involvedObject.kind
- event.involvedObject.namespace
- event.message
- event.reason
- verb

# (required, email specific)
# a list of email addresses to send alerts to
#email:
#- "ci@qq.com"
```
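To make the rule semantics concrete: a frequency rule fires when at least `num_events` matching documents fall within a sliding `timeframe`, and `realert` then silences the rule for a while; `exponential_realert` can additionally grow that silence period, capped here at 1 hour. The sketch below is a simplified model using this rule's values (5 events, 15 minutes, 5-minute realert), not ElastAlert's implementation; `frequency_alerts` is a hypothetical function, and it ignores `exponential_realert`, `query_key` and ElastAlert's query scheduling.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Deque, Iterable, List, Optional


def frequency_alerts(timestamps: Iterable[datetime],
                     num_events: int = 5,
                     timeframe: timedelta = timedelta(minutes=15),
                     realert: timedelta = timedelta(minutes=5)) -> List[datetime]:
    """Return the times at which a simplified frequency rule would send an alert.

    timestamps are the events that already matched the rule's filter
    (Warning events whose involvedObject is not a Node), in ascending order.
    """
    window: Deque[datetime] = deque()
    alerts: List[datetime] = []
    silenced_until: Optional[datetime] = None
    for ts in timestamps:
        window.append(ts)
        # Drop events that have slid out of the timeframe.
        while ts - window[0] > timeframe:
            window.popleft()
        if len(window) >= num_events:
            window.clear()  # start counting afresh after a match
            if silenced_until is None or ts >= silenced_until:
                alerts.append(ts)
                silenced_until = ts + realert
            # otherwise the match is swallowed by realert


    return alerts


start = datetime(2019, 3, 21, 10, 0)
burst = [start + timedelta(seconds=30 * i) for i in range(10)]
print(frequency_alerts(burst))
```

In this model, ten warnings 30 seconds apart produce a single alert at the fifth event; the second batch of five completes inside the 5-minute realert window and is suppressed.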

 

My customized alert message format:

```yaml
alert:
#- "email"
- "elastalert_modules.dingtalk_alert.DingTalkAlerter"

dingtalk_webhook: "https://oapi.dingtalk.com/robot/send?access_token=<your_access_token>"
dingtalk_msgtype: "text"

alert_text_type: alert_text_only
alert_text: "   ====elastalert message====\n   EventTime>>:  {0}\n Event_involvedObject_name>>: {1}\n Event_involvedObject_kind>>: {2}\n Event_involvedObject_namespace>>: {3}\n Message>>: {4}\n Event_reason>>: {5}\n verb>>: {6}"
alert_text_args:
- "@timestamp"
- event.involvedObject.name
- event.involvedObject.kind
- event.involvedObject.namespace
- event.message
- event.reason
- verb
```
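ElastAlert fills the `{0}` to `{6}` placeholders in `alert_text` positionally, Python `str.format` style, with the values of the fields listed in `alert_text_args`; dotted names such as `event.involvedObject.name` walk into nested objects. The sketch below models that; `lookup` and `render_alert_text` are hypothetical names, and the `<MISSING VALUE>` placeholder for absent fields is what I recall ElastAlert using by default (configurable via `alert_missing_value`), so verify it against your version.

```python
from typing import Any, Dict, List

MISSING = "<MISSING VALUE>"  # assumed default placeholder for absent fields


def lookup(doc: Dict[str, Any], dotted_key: str) -> Any:
    """Resolve a field name such as event.involvedObject.name; None if absent.

    A top-level key that matches the whole name (e.g. "@timestamp") wins.
    """
    if dotted_key in doc:
        return doc[dotted_key]
    value: Any = doc
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def render_alert_text(alert_text: str, alert_text_args: List[str],
                      doc: Dict[str, Any]) -> str:
    """Fill {0}, {1}, ... in alert_text with the fields named in alert_text_args."""
    values = [lookup(doc, key) for key in alert_text_args]
    return alert_text.format(*[MISSING if v is None else v for v in values])


doc = {
    "@timestamp": "2019-03-21T10:00:00Z",
    "verb": "ADDED",
    "event": {"involvedObject": {"name": "web-0", "kind": "Pod"}, "reason": "BackOff"},
}
print(render_alert_text(" EventTime>>: {0}\n Event_involvedObject_kind>>: {1}",
                        ["@timestamp", "event.involvedObject.kind"], doc))
```

Because the mapping is purely positional, each label in `alert_text` must line up with the field at the same index in `alert_text_args`.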

For details, see the official documentation: https://elastalert.readthedocs.io/en/latest/recipes/writing_filters.html#writingfilters

 

 

Reposted from: https://www.cnblogs.com/cuishuai/p/10573586.html
