grok_exporter is a log analysis tool built on the grok patterns known from logstash. It matches unstructured log lines against regular expressions and turns the results into metrics in the Prometheus exposition format.
Build and Install
Download the code
https://github.com/fstab/grok_exporter
Build requirements
1. go
2. gcc
3. Oniguruma
The first two need no introduction; the last one deserves a short installation note.
1. Installing the Oniguruma library on OS X
brew install oniguruma
2. Installing the Oniguruma library on Ubuntu Linux
sudo apt-get install libonig-dev
3. Installing the Oniguruma library from source
curl -sLO https://github.com/kkos/oniguruma/releases/download/v6.9.4/onig-6.9.4.tar.gz
tar xfz onig-6.9.4.tar.gz
cd onig-6.9.4
./configure
make
make install
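After a source install you may want to make sure the shared library can be found by the linker; a minimal check, assuming the default /usr/local prefix:
sudo ldconfig
ls /usr/local/lib/libonig*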
Build
git clone https://github.com/fstab/grok_exporter
cd grok_exporter
git submodule update --init --recursive
go install .
Usage
Basic startup
./grok_exporter -config ./example/config.yml
The metrics are then available at http://localhost:9144/metrics.
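A quick check that the exporter is serving metrics (port 9144 is the default used by the example config):
curl -s http://localhost:9144/metrics | head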
Other command-line flags
Usage of ./grok_exporter:
-config string
Path to the config file. Try '-config ./example/config.yml' to get started.
-showconfig
Print the current configuration to the console. Example: 'grok_exporter -showconfig -config ./example/config.yml'
-version
Print the grok_exporter version.
Configuration file
The grok_exporter configuration file consists of five main sections:
global:
# Config version
input:
# How to read log lines (file or stdin).
grok:
# Available Grok patterns.
metrics:
# How to map Grok fields to Prometheus metrics.
server:
# How to expose the metrics via HTTP(S).
Details
global mainly sets the configuration version; the current version is 2.
global:
    config_version: 2
    retention_check_interval: 53s
1. config_version
grok_exporter version    config_version
≤ 0.1.4                  1 (see CONFIG_v1.md)
0.2.X, 1.0.X             2 (current version)
2. retention_check_interval
The retention_check_interval is the interval at which grok_exporter checks for expired metrics.
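The check interval only matters together with per-metric retention. A minimal sketch, assuming the per-metric retention option documented for recent grok_exporter versions (the values are illustrative, not recommendations):
global:
    config_version: 2
    retention_check_interval: 53s   # how often grok_exporter looks for expired metrics
metrics:
    - type: counter
      name: example_requests_total
      help: Example counter whose stale label values are removed.
      match: 'alice'
      retention: 2h30m0s            # label values not updated for this long are dropped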
input specifies where log lines are read from: a file, stdin, and so on. We use file input, so we only need to configure the file path here.
1. file
input:
    type: file
    paths:
    - /var/logdir1/*.log
    - /var/logdir2/*.log
    readall: false
    fail_on_missing_logfile: true
    poll_interval_seconds: 5 # should not be needed in most cases, see below
- type is the input type.
- paths lists the log files (or glob patterns) to read.
- readall controls where reading starts: true reads each file from the beginning, false starts at the end and only reads newly appended lines.
- fail_on_missing_logfile controls startup when a configured file is missing: true means grok_exporter refuses to start if the file does not exist, false means it starts anyway.
- poll_interval_seconds is rarely needed; as the comment above says, it should not be required in most cases.
2. stdin
input:
    type: stdin
For example, to monitor the output of journalctl:
journalctl -f | grok_exporter -config config.yml
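A minimal sketch of a full config for that pipeline, counting journal lines that contain the word error (the pattern name and metric name below are made up for illustration):
global:
    config_version: 2
input:
    type: stdin
grok:
    additional_patterns:
    - 'MYERROR error'
metrics:
    - type: counter
      name: journal_error_lines_total
      help: Number of journal lines containing "error".
      match: '%{MYERROR}'
server:
    port: 9144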
3. webhook
input:
    type: webhook

    # HTTP Path to POST the webhook
    # Default is `/webhook`
    webhook_path: /webhook

    # HTTP Body POST Format
    # text_single: Webhook POST body is a single plain text log entry
    # text_bulk: Webhook POST body contains multiple plain text log entries
    #     separated by webhook_text_bulk_separator (default: \n\n)
    # json_single: Webhook POST body is a single json log entry. Log entry
    #     text is selected from the value of a json key determined by
    #     webhook_json_selector.
    # json_bulk: Webhook POST body contains multiple json log entries. The
    #     POST body envelope must be a json array "[ <entry>, <entry> ]". Log
    #     entry text is selected from the value of a json key determined by
    #     webhook_json_selector.
    # Default is `text_single`
    webhook_format: json_bulk

    # JSON Path Selector
    # Within a json log entry, text is selected from the value of this json selector
    # Example ".path.to.element"
    # Default is `.message`
    webhook_json_selector: .message

    # Bulk Text Separator
    # Separator for text_bulk log entries
    # Default is `\n\n`
    webhook_text_bulk_separator: "\n\n"
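With webhook_format: json_bulk, a sender POSTs a JSON array and the log text is taken from the key named by webhook_json_selector. A hedged example call, assuming the exporter's HTTP server is on the default port 9144 and webhook_path is /webhook:
curl -X POST http://localhost:9144/webhook \
     -H 'Content-Type: application/json' \
     -d '[{"message":"line one"},{"message":"line two"}]'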
grok defines the regular-expression patterns used for matching. For example, we can define custom patterns for the URL paths we care about and then match log lines against them.
Configuration
grok:
    patterns_dir: ./logstash-patterns-core/patterns
    additional_patterns:
    - 'EXIM_MESSAGE [a-zA-Z ]*'
    - 'EXIM_SENDER_ADDRESS F=<%{EMAILADDRESS}>'
- patterns_dir points to the directory containing pre-written pattern files. logstash-patterns-core was originally built for logstash and already covers most common patterns, so you can browse that directory and reuse them; for custom patterns you can also add your own files there.
- additional_patterns lets you name patterns directly in the config file, so they can be used without writing a pattern file.
For example, here are the patterns I use to match an nginx URL and capture a request parameter as a label:
grok:
    additional_patterns:
    - 'URL /springRed/getRewards.do!?'        # match the url
    - 'ID (?<=promotionId=).*?(?=["|&| ])'    # capture the promotionId parameter
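For illustration, a hypothetical nginx access-log line that these two patterns would match (url captures the path, id captures the promotionId value):
10.2.3.4 - - [01/Feb/2021:10:15:32 +0800] "GET /springRed/getRewards.do?promotionId=21 HTTP/1.1" 200 58 "-" "curl/7.29.0"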
metrics defines the metrics we export.
Four metric types are supported:
Counter Gauge Histogram Summary
1. counter
metrics:
    - type: counter
      name: alice_occurrences_total
      help: number of log lines containing alice
      match: 'alice'
      labels:
          logfile: '{{base .logfile}}'
- match is the expression each log line is matched against; it can be a regular expression and may match any part of the line.
- labels are the Prometheus labels of the metric; label values can reference fields captured in match (Go template syntax).
- A counter does not need a value; it simply increments for every matching line.
For example:
metrics:
    - type: counter
      name: count_total
      help: Total Number of RedPackage Request.
      match: '%{URL:url}.*%{ID:id}'
      labels:
          url: '{{.url}}'
          id: '{{.id}}'
2. gauge
metrics:
    - type: gauge
      name: grok_example_values
      help: Example gauge metric with labels.
      match: '%{DATE} %{TIME} %{USER:user} %{NUMBER:val}'
      value: '{{.val}}'
      cumulative: false
      labels:
          user: '{{.user}}'
3. histogram
metrics:
    - type: histogram
      name: grok_example_values
      help: Example histogram metric with labels.
      match: '%{DATE} %{TIME} %{USER:user} %{NUMBER:val}'
      value: '{{.val}}'
      buckets: [1, 2, 3]
      labels:
          user: '{{.user}}'
4. summary
metrics:
    - type: summary
      name: grok_example_values
      help: Summary metric with labels.
      match: '%{DATE} %{TIME} %{USER:user} %{NUMBER:val}'
      value: '{{.val}}'
      quantiles: {0.5: 0.05, 0.9: 0.01, 0.99: 0.001}
      labels:
          user: '{{.user}}'
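For the three value-based types above, an input line like the following (purely illustrative) would match the example pattern, setting user to alice and val to 1.5; the gauge stores the value, the histogram sorts it into buckets, and the summary feeds its quantiles:
30.07.2016 14:37:03 alice 1.5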
server configures the HTTP(S) listener.
Configuration
server:
    protocol: https
    host: localhost
    port: 9144
    path: /metrics
    cert: /path/to/cert
    key: /path/to/key
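If you enable https as above, a quick check could look like this (the -k flag skips certificate verification and is only acceptable for a self-signed test certificate):
curl -k https://localhost:9144/metrics | head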
Example
Count the requests in an nginx log whose URL looks like http://test.com:8088/spring/getRewards?promotionId=21.
[root@test grok_exporter]# cat config.yml
global:
config_version: 2
input:
type: file
path: /usr/local/nginx/logs/access_http.log
readall: true # Read from the beginning of the file? False means we start at the end of the file and read only new lines.
grok:
additional_patterns:
- 'URL /spring/getRewards!?'
- 'ID (?<=promotionId=).*?(?=["|&| ])'
metrics:
- type: counter
name: count_total
help: Total Number of RedPackage Request.
match: '%{URL:url}.*%{ID:id}'
labels:
url: '{{.url}}'
id: '{{.id}}'
server:
host: 0.0.0.0
port: 9144
Start
./grok_exporter -config ./config.yml
Collected data
# HELP count_total Total Number of RedPackage Request.
# TYPE count_total counter
count_total{id="21",url="/springRed/getRewards.do"} 387184
count_total{id="22",url="/springRed/getRewards.do"} 384322
count_total{id="23",url="/springRed/getRewards.do"} 381606
How it works
The process continuously reads the log file and applies the patterns, so even when nothing is scraping the metrics, grok_exporter still spends resources on this work. Where reading starts is configurable: from the beginning of the file or from its end.