Grok it Yourself

Amora The Enchantress
Mar 16, 2021

To represent %{SYNTAX:Semantic} for grok

According to Wikipedia, the term “grok” was coined by Robert A. Heinlein in his 1961 novel “Stranger in a Strange Land” as a Martian word that cannot be defined in Earthling terms. It can be associated with literal meanings such as “water”, “to drink”, “life”, or “to live”, and it carries a much more profound figurative meaning that is hard for terrestrial culture to understand because of its assumption of a singular reality.

Reference: goodreads.com

“I grok in fullness.”

― Robert A. Heinlein, Stranger in a Strange Land

Grok is used to parse unstructured log data into a structured, queryable form. It works best for logs coming from endpoint and security systems, such as syslog, Windows event logs, and web server logs. You can find many online debuggers for grok; one of the most popular is http://grokdebug.herokuapp.com/

If you are using the ELK stack for logging, parsing and visualizing your data, you can also find a grok debugger tool at

Kibana → Dev Tools → Grok Debugger

How to Grok?

The basic syntax for grok is %{SYNTAX:SEMANTIC}, where SYNTAX is the name of the pattern that matches your text and SEMANTIC is the name you give to the matched piece, which becomes the field name in the output.

There are already predefined grok patterns available which can be found here: https://github.com/elastic/elasticsearch/blob/master/libs/grok/src/main/resources/patterns/grok-patterns

Example:

Let us consider a simple Apache log sample with response 200:

03:01:06 127.0.0.1 GET /images/cat.gif 200

The pattern to match this log can be:

%{TIME:time} %{IPV4:IP} %{WORD:method} %{URIPATH:uri} %{BASE10NUM:response}

Let us write a configuration for logstash filter plugin:

input {
  file {
    path => "/var/log/http.log"
  }
}
filter {
  grok {
    match => { "message" => "%{TIME:time} %{IPV4:IP} %{WORD:method} %{URIPATH:uri} %{BASE10NUM:response}" }
  }
}

This will give an output like:

time: 03:01:06
IP: 127.0.0.1
method: GET
uri: /images/cat.gif
response: 200
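
By default, grok stores every captured value as a string. If you would rather have the response code as a number, grok also accepts an optional third part, %{SYNTAX:SEMANTIC:TYPE}, where the type can be int or float. Here is a minimal sketch reusing the same pattern; converting response is just an assumption for illustration, nothing in the log requires it:

```logstash
filter {
  grok {
    # Same pattern as before, but the response capture is converted to an integer
    match => { "message" => "%{TIME:time} %{IPV4:IP} %{WORD:method} %{URIPATH:uri} %{BASE10NUM:response:int}" }
  }
}
```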

If the pattern that you are searching for is not available, you can create your own custom pattern.

Regular Expressions:

Grok sits on top of regular expressions, so any regular expression is valid in grok as well.

Custom Patterns:

Logstash might not have the pattern you need. In that case, you can create a custom pattern to match the piece of information in the log.

One option is the Oniguruma syntax for named capture, which lets you match a piece of text and save it as a field:

(?<field_name>the pattern here)
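
To see the named-capture syntax inside a filter, here is a small sketch that pulls an HH:MM:SS value out of the message without using any predefined pattern. The field name duration and the character classes are assumptions chosen for illustration:

```logstash
filter {
  grok {
    # (?<duration>...) saves whatever the inner regex matches into a field named "duration"
    match => { "message" => "(?<duration>[0-9]{2}:[0-9]{2}:[0-9]{2})" }
  }
}
```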

Alternatively, you can define a named custom pattern and reuse it like any built-in one. Let us create a sample custom pattern for the following string:

12:30:56

CUSTOM PATTERN:

DURATION %{HOUR}:%{MINUTE}:%{SECOND}

PATTERN:

%{DURATION:time}

OUTPUT:

time: 12:30:56

Save the custom pattern line in a file inside a patterns directory, then use patterns_dir to tell Logstash where that directory is:

filter {
  grok {
    patterns_dir => ["./patterns"]
    match => { "message" => "%{DURATION:time}" }
  }
}
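
If a separate patterns file feels like too much for a single pattern, the grok filter also has a pattern_definitions option that defines custom patterns inline. Here is a sketch of the same DURATION example done that way; it assumes your version of the grok plugin supports the option, so check before relying on it:

```logstash
filter {
  grok {
    # Define DURATION inline instead of loading it from patterns_dir
    pattern_definitions => { "DURATION" => "%{HOUR}:%{MINUTE}:%{SECOND}" }
    match => { "message" => "%{DURATION:time}" }
  }
}
```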

What makes grok so special?

When you give the grok filter several patterns, it tries them in the order they are listed and, by default, stops at the first one that matches. So list the more specific patterns before the more generic ones. Also keep in mind that a pattern has to follow the sequence of the data in the log line, so write its pieces in the same order as the fields appear.
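
To make the ordering concrete, here is a sketch with two patterns for the same field. The catch-all second pattern and its field name rest are assumptions for illustration. Grok tries the list from top to bottom and, with the default break_on_match => true, stops at the first match, so the specific pattern goes first:

```logstash
filter {
  grok {
    match => {
      "message" => [
        # Specific: the full Apache-style line from the earlier example
        "%{TIME:time} %{IPV4:IP} %{WORD:method} %{URIPATH:uri} %{BASE10NUM:response}",
        # Generic fallback: only tried when the specific pattern does not match
        "%{TIME:time} %{GREEDYDATA:rest}"
      ]
    }
  }
}
```

If the generic pattern came first, it would match every line that starts with a time, and the specific one would never get a chance.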

Here are the common configuration options of the grok filter plugin:

filter {
  grok {
    add_field => … # hash (optional), default: {}
    add_tag => … # array (optional), default: []
    break_on_match => … # boolean (optional), default: true
    drop_if_match => … # boolean (optional), default: false
    keep_empty_captures => … # boolean (optional), default: false
    match => … # hash (optional), default: {}
    named_captures_only => … # boolean (optional), default: true
    overwrite => … # array (optional), default: []
    patterns_dir => … # array (optional), default: []
    remove_field => … # array (optional), default: []
    remove_tag => … # array (optional), default: []
    tag_on_failure => … # array (optional), default: ["_grokparsefailure"]
  }
}
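
As a rough illustration of how a few of these options fit together, here is a sketch with some of them filled in. The tag name apache_parsed and the decision to drop unparsed events are assumptions made for the example, not recommendations:

```logstash
filter {
  grok {
    match => { "message" => "%{TIME:time} %{IPV4:IP} %{WORD:method} %{URIPATH:uri} %{BASE10NUM:response}" }
    # add_tag is applied only when the match succeeds
    add_tag => ["apache_parsed"]
    # Events that do not match get this tag instead (this is already the default value)
    tag_on_failure => ["_grokparsefailure"]
  }
  # Hypothetical follow-up: discard events that grok could not parse
  if "_grokparsefailure" in [tags] {
    drop { }
  }
}
```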

BONUS

What if there is a string that does not fit any of the usual patterns? Or... you feel a little lazy to write a specific pattern. JK.

Get as greedy as Gollum! If you have a string whose values make no sense, grab it and put it inside “GREEDYDATA”.

EXAMPLE:

Help me, Obi-Wan Kenobi. You’re my only hope.

PATTERN:

%{GREEDYDATA:message}

OUTPUT:

message: "Help me, Obi-Wan Kenobi. You’re my only hope."
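
One thing to watch for: message is also the field that holds the original log line. Here is a sketch, assuming a line that starts with a time followed by free text, that uses the overwrite option so the captured text replaces the existing message field rather than being added alongside the original value:

```logstash
filter {
  grok {
    # Parse the part we understand, and let GREEDYDATA swallow the rest
    match => { "message" => "%{TIME:time} %{GREEDYDATA:message}" }
    # Replace the existing "message" field with the captured text
    overwrite => ["message"]
  }
}
```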

Last but not least…

HAPPY GROKKING AND…………

References:

https://dzone.com/articles/logstash-sequence-grok-blocks
