
My personal blog: 逐步前行STEP

On CentOS, nginx's default access log format looks like this:

112.95.209.146 - - [13/Jun/2019:09:32:50 +0800] "GET /css/web.css HTTP/1.1" 200 27518 "http://www.hezehua.net/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"

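This format contains special characters such as -, " and [ ], so Grok's predefined patterns alone are not enough and custom regular expressions are needed as well. The predefined patterns used are: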

  1. IP: matches an IP address
  2. HTTPDATE: matches a timestamp such as 04/Jul/2019:10:43:15 +0800
  3. WORD: matches a single word
  4. PATH: matches a path, e.g. /css/web.css
  5. NUMBER: matches a number
  6. HOSTNAME: matches a domain name, e.g. www.hezehua.net

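The matching Grok expression follows. Each ([\s\S]{n}) group skips exactly n characters of any kind; ([\s\S]{5}), for example, consumes the five characters - - [ between the IP and the timestamp: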

%{IP:ip} ([\s\S]{5})%{HTTPDATE:date}([\s\S]{3})%{WORD:method} %{PATH:path} (?<http>HTTP\/%{NUMBER})([\s\S]{1}) %{NUMBER:code} %{NUMBER:length}([\s\S]{2})(?<referer>http[s]?://%{HOSTNAME}%{PATH})([\s\S]{3})((?<user-agent>[\s\S]*)"$)

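Fields captured (named as in the expression above):

  • ip: client IP
  • date: request time
  • method: request method
  • path: request path
  • http: HTTP protocol version
  • code: HTTP response status code
  • length: response content length
  • referer: URL of the referring page
  • user-agent: user agent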
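To check what the expression captures without a Grok debugger, here is a minimal Python sketch. Assumptions: PATTERNS holds hypothetical, simplified stand-ins for the six predefined patterns (they are not the real Logstash definitions), and because Python's re module spells named groups (?P<name>...) and does not allow - in group names, user-agent comes back as user_agent. The Grok expression itself is copied verbatim; to_python_regex and parse are illustrative names.

```python
import re
from typing import Optional

# Hypothetical, simplified stand-ins for the Grok patterns used above. They are
# close enough for this log format but are NOT the real Logstash definitions
# (IP here is IPv4 only, PATH is Unix-style only).
PATTERNS = {
    "IP": r"(?:\d{1,3}\.){3}\d{1,3}",
    "HTTPDATE": r"\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}",
    "WORD": r"\b\w+\b",
    "PATH": r'/[^\s"]*',
    "NUMBER": r"[+-]?\d+(?:\.\d+)?",
    "HOSTNAME": r"[0-9A-Za-z][0-9A-Za-z.-]*",
}

# The Grok expression from the article, copied verbatim (split for width).
GROK = (
    r'%{IP:ip} ([\s\S]{5})%{HTTPDATE:date}([\s\S]{3})%{WORD:method} %{PATH:path} '
    r'(?<http>HTTP\/%{NUMBER})([\s\S]{1}) %{NUMBER:code} %{NUMBER:length}([\s\S]{2})'
    r'(?<referer>http[s]?://%{HOSTNAME}%{PATH})([\s\S]{3})((?<user-agent>[\s\S]*)"$)'
)


def to_python_regex(grok: str) -> str:
    """Translate the Grok expression into Python re syntax."""
    # (?<name>...) -> (?P<name>...); '-' is not allowed in Python group names.
    grok = re.sub(
        r"\(\?<([A-Za-z][\w-]*)>",
        lambda m: "(?P<" + m.group(1).replace("-", "_") + ">",
        grok,
    )

    def expand(m):
        # %{NAME:field} -> named group, %{NAME} -> non-capturing group.
        body = PATTERNS[m.group(1)]
        field = m.group(2)
        return f"(?P<{field}>{body})" if field else f"(?:{body})"

    return re.sub(r"%\{(\w+)(?::(\w+))?\}", expand, grok)


LOG_RE = re.compile(to_python_regex(GROK))


def parse(line: str) -> Optional[dict]:
    """Return the named fields of one access-log line, or None if it does not match."""
    m = LOG_RE.search(line)
    return m.groupdict() if m else None


LINE = (
    '112.95.209.146 - - [13/Jun/2019:09:32:50 +0800] "GET /css/web.css HTTP/1.1" '
    '200 27518 "http://www.hezehua.net/" "Mozilla/5.0 (Windows NT 6.1; WOW64) '
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"'
)

if __name__ == "__main__":
    for name, value in parse(LINE).items():
        print(f"{name}: {value}")
```

As with the original expression, a line whose referer is - (a direct visit) does not match at all, because the referer group requires an http:// or https:// URL.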

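The Grok match also produces many fields we don't need, which have to be filtered out. Here is the test result for another line from the log: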

{
  "ip": [
    [
      "42.156.136.107"
    ]
  ],
  "IPV6": [
    [
      null
    ]
  ],
  "IPV4": [
    [
      "42.156.136.107"
    ]
  ],
  "HTTPDATE": [
    [
      "16/Oct/2019:07:40:06 +0800"
    ]
  ],
  "MONTHDAY": [
    [
      "16"
    ]
  ],
  "MONTH": [
    [
      "Oct"
    ]
  ],
  "YEAR": [
    [
      "2019"
    ]
  ],
  "TIME": [
    [
      "07:40:06"
    ]
  ],
  "HOUR": [
    [
      "07"
    ]
  ],
  "MINUTE": [
    [
      "40"
    ]
  ],
  "SECOND": [
    [
      "06"
    ]
  ],
  "INT": [
    [
      "+0800"
    ]
  ],
  "method": [
    [
      "GET"
    ]
  ],
  "path": [
    [
      "/js/share.js/fonts/iconfont.woff"
    ]
  ],
  "UNIXPATH": [
    [
      "/js/share.js/fonts/iconfont.woff",
      "/js/share.js/css/share.min.css"
    ]
  ],
  "WINPATH": [
    [
      null,
      null
    ]
  ],
  "http": [
    [
      "HTTP/1.1"
    ]
  ],
  "NUMBER": [
    [
      "1.1"
    ]
  ],
  "BASE10NUM": [
    [
      "1.1",
      "200",
      "6364"
    ]
  ],
  "code": [
    [
      "200"
    ]
  ],
  "length": [
    [
      "6364"
    ]
  ],
  "referer": [
    [
      "http://www.hezehua.net/js/share.js/css/share.min.css"
    ]
  ],
  "HOSTNAME": [
    [
      "www.hezehua.net"
    ]
  ],
  "PATH": [
    [
      "/js/share.js/css/share.min.css"
    ]
  ],
  "user-agent": [
    [
      "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 YisouSpider/5.0 Safari/537.36"
    ]
  ]
}
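The extra fields above come from the sub-patterns inside composite patterns (HTTPDATE, for instance, contributes MONTHDAY, MONTH, YEAR, TIME and so on), and every value is wrapped in [[...]]. In Logstash itself the grok filter's named_captures_only option, which defaults to true, should already drop most of them. For output like this, a minimal Python sketch of the filtering step, assuming the result has been loaded into a dict of this shape (e.g. with json.loads): keep only the names used in the Grok expression and take the first value of each. WANTED and keep_named_fields are hypothetical names.

```python
from typing import Dict, List, Optional

# Capture names used in the Grok expression; anything else in the debugger
# output (IPV4, MONTHDAY, BASE10NUM, ...) comes from nested predefined patterns.
WANTED = ["ip", "date", "method", "path", "http", "code", "length", "referer", "user-agent"]


def keep_named_fields(result: Dict[str, List[List[Optional[str]]]]) -> Dict[str, str]:
    """Flatten {name: [[value, ...]]} to {name: first value}, keeping WANTED names only."""
    fields = {}
    for name in WANTED:
        matches = result.get(name)
        # Skip names that are absent, empty, or matched nothing (null).
        if matches and matches[0] and matches[0][0] is not None:
            fields[name] = matches[0][0]
    return fields


if __name__ == "__main__":
    # An excerpt of the test output above.
    sample = {
        "ip": [["42.156.136.107"]],
        "IPV4": [["42.156.136.107"]],
        "MONTHDAY": [["16"]],
        "method": [["GET"]],
        "BASE10NUM": [["1.1", "200", "6364"]],
        "code": [["200"]],
        "length": [["6364"]],
    }
    print(keep_named_fields(sample))
    # {'ip': '42.156.136.107', 'method': 'GET', 'code': '200', 'length': '6364'}
```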

Keep only the useful fields; the date also has to be converted to YYYY-MM-DD HH:II:SS format (II being minutes).
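A minimal Python sketch of the date conversion, assuming the date field holds an HTTPDATE string like the one in the test output. It keeps the wall-clock time in the log's own +0800 offset rather than converting to UTC, and %b expects English month abbreviations, which holds under the default C locale. format_httpdate is a hypothetical name.

```python
from datetime import datetime


def format_httpdate(value: str) -> str:
    """Convert '16/Oct/2019:07:40:06 +0800' to '2019-10-16 07:40:06'.

    The wall-clock time is kept in the log's own offset (+0800 here);
    it is not converted to UTC.
    """
    # %b expects English month abbreviations (true under the default C locale).
    parsed = datetime.strptime(value, "%d/%b/%Y:%H:%M:%S %z")
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    print(format_httpdate("16/Oct/2019:07:40:06 +0800"))  # 2019-10-16 07:40:06
```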