Parsing CentOS's native-format nginx logs with Grok
My personal blog: 逐步前行STEP
On CentOS, nginx's native log format looks like this:
112.95.209.146 - - [13/Jun/2019:09:32:50 +0800] "GET /css/web.css HTTP/1.1" 200 27518 "http://www.hezehua.net/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
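This format contains special characters such as - (hyphen), " (double quote), [ and ], so Grok's predefined patterns alone are not enough; you also need custom regular expressions. The predefined patterns used here are: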
- IP: matches an IP address
- HTTPDATE: matches a timestamp such as 04/Jul/2019:10:43:15 +0800
- WORD: matches a single word
- PATH: matches a path, e.g. /css/web.css
- NUMBER: matches a number
- HOSTNAME: matches a domain name, e.g. www.hezehua.net
The Grok statement that matches this format is:
%{IP:ip} ([\s\S]{5})%{HTTPDATE:date}([\s\S]{3})%{WORD:method} %{PATH:path} (?<http>HTTP\/%{NUMBER})([\s\S]{1}) %{NUMBER:code} %{NUMBER:length}([\s\S]{2})(?<referer>http[s]?://%{HOSTNAME}%{PATH})([\s\S]{3})((?<user-agent>[\s\S]*)"$)
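Fields defined by the statement:
- ip: client IP address
- date: request time
- method: HTTP request method
- path: request path
- http: HTTP protocol version
- code: HTTP response status code
- length: length of the response body
- referer: URL of the referring page
- user-agent: the client's user agent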
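To try the statement outside a Grok tool, here is a rough Python translation run against the sample line. The IP/HTTPDATE/WORD/PATH/NUMBER/HOSTNAME regexes are simplified stand-ins of my own (an assumption: Grok's real definitions are broader), and because Python group names cannot contain a hyphen, user-agent becomes user_agent. parse_line is a hypothetical helper name.

```python
import re
from typing import Optional

# Simplified stand-ins for the Grok patterns (assumption: narrower than
# Grok's real definitions, but enough for lines like the sample).
IP = r"\d{1,3}(?:\.\d{1,3}){3}"
HTTPDATE = r"\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}"
WORD = r"\w+"
PATH = r"/[^\s\"]*"
NUMBER = r"\d+(?:\.\d+)?"
HOSTNAME = r"[A-Za-z0-9.-]+"

# Same layout as the Grok statement: each [\s\S]{n} skips a fixed-width run
# of separators, e.g. the 5 characters '- - [' before the timestamp.
NGINX_LINE = re.compile(
    rf"(?P<ip>{IP}) ([\s\S]{{5}})(?P<date>{HTTPDATE})([\s\S]{{3}})"
    rf"(?P<method>{WORD}) (?P<path>{PATH}) (?P<http>HTTP/{NUMBER})([\s\S]{{1}}) "
    rf"(?P<code>{NUMBER}) (?P<length>{NUMBER})([\s\S]{{2}})"
    rf"(?P<referer>https?://{HOSTNAME}{PATH})([\s\S]{{3}})"
    rf"(?P<user_agent>[\s\S]*)\"$"
)


def parse_line(line: str) -> Optional[dict]:
    """Return the named fields of one log line, or None if it does not match."""
    m = NGINX_LINE.match(line)
    return m.groupdict() if m else None


sample = (
    '112.95.209.146 - - [13/Jun/2019:09:32:50 +0800] "GET /css/web.css HTTP/1.1" 200 27518 '
    '"http://www.hezehua.net/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"'
)
fields = parse_line(sample)
print(fields["code"], fields["referer"])  # 200 http://www.hezehua.net/
```

One limitation worth checking: when a request has no referer, nginx writes "-" in that position, so neither this sketch nor the Grok statement (whose referer part requires http:// or https://) matches such a line.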
Matching with this statement also produces many fields you won't use, which need to be filtered out. Here is a test result:
{
  "ip": [["42.156.136.107"]],
  "IPV6": [[null]],
  "IPV4": [["42.156.136.107"]],
  "HTTPDATE": [["16/Oct/2019:07:40:06 +0800"]],
  "MONTHDAY": [["16"]],
  "MONTH": [["Oct"]],
  "YEAR": [["2019"]],
  "TIME": [["07:40:06"]],
  "HOUR": [["07"]],
  "MINUTE": [["40"]],
  "SECOND": [["06"]],
  "INT": [["+0800"]],
  "method": [["GET"]],
  "path": [["/js/share.js/fonts/iconfont.woff"]],
  "UNIXPATH": [["/js/share.js/fonts/iconfont.woff", "/js/share.js/css/share.min.css"]],
  "WINPATH": [[null, null]],
  "http": [["HTTP/1.1"]],
  "NUMBER": [["1.1"]],
  "BASE10NUM": [["1.1", "200", "6364"]],
  "code": [["200"]],
  "length": [["6364"]],
  "referer": [["http://www.hezehua.net/js/share.js/css/share.min.css"]],
  "HOSTNAME": [["www.hezehua.net"]],
  "PATH": [["/js/share.js/css/share.min.css"]],
  "user-agent": [["Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 YisouSpider/5.0 Safari/537.36"]]
}
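A minimal sketch of the filtering step, assuming the result has the shape shown in the test result ({name: [[value, ...]]}). WANTED, FALLBACK and keep_named_fields are hypothetical names; the fallback exists because that test result reports the timestamp under HTTPDATE rather than date.

```python
import json

# Named fields from the Grok statement; everything else (IPV4, MONTHDAY,
# BASE10NUM, UNIXPATH, ...) is dropped.
WANTED = ["ip", "date", "method", "path", "http", "code", "length", "referer", "user-agent"]

# The test result reports the timestamp under HTTPDATE, not date.
FALLBACK = {"date": "HTTPDATE"}


def keep_named_fields(result: dict) -> dict:
    """Reduce a {name: [[value, ...]]} result to {name: value} for WANTED fields."""
    cleaned = {}
    for name in WANTED:
        groups = result.get(name) or result.get(FALLBACK.get(name, ""))
        if groups and groups[0]:
            cleaned[name] = groups[0][0]  # first captured value
    return cleaned


# A trimmed excerpt of the test result.
raw = json.loads("""{
  "ip": [["42.156.136.107"]],
  "IPV4": [["42.156.136.107"]],
  "HTTPDATE": [["16/Oct/2019:07:40:06 +0800"]],
  "code": [["200"]],
  "BASE10NUM": [["1.1", "200", "6364"]]
}""")
print(keep_named_fields(raw))
# {'ip': '42.156.136.107', 'date': '16/Oct/2019:07:40:06 +0800', 'code': '200'}
```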
Extract just the useful fields. The date also needs converting to the YYYY-MM-DD HH:II:SS format (e.g. 2019-10-16 07:40:06).
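The date conversion can be sketched with Python's datetime module, assuming the HTTPDATE string has the form shown in the test result and uses English month abbreviations; to_ymd_hms is a hypothetical helper name. In strftime terms, YYYY-MM-DD HH:II:SS (II being minutes) is %Y-%m-%d %H:%M:%S.

```python
from datetime import datetime


def to_ymd_hms(httpdate: str) -> str:
    """Convert an nginx HTTPDATE string to YYYY-MM-DD HH:II:SS (II = minutes)."""
    # "%d/%b/%Y:%H:%M:%S %z" parses e.g. "16/Oct/2019:07:40:06 +0800";
    # %b expects English month abbreviations.
    dt = datetime.strptime(httpdate, "%d/%b/%Y:%H:%M:%S %z")
    # The result stays in the log's own +0800 time zone (no conversion).
    return dt.strftime("%Y-%m-%d %H:%M:%S")


print(to_ymd_hms("16/Oct/2019:07:40:06 +0800"))  # 2019-10-16 07:40:06
```

The offset is parsed but the output keeps the server's local wall-clock time; call dt.astimezone(timezone.utc) (with timezone imported from datetime) before formatting if UTC is wanted.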