Skip to main content
Version: Next

Creating parsers

Foreword#

This documentation assumes you're trying to create a parser for crowdsec with the intent of submitting to the hub, and thus create the associated functional testing. The creation of said functional testing will guide our process and will make it easier.

We're going to create a parser for the imaginary service "myservice" that produce three types of logs via syslog :

Dec  8 06:28:43 mymachine myservice[2806]: bad password for user 'toto' from '1.2.3.4'Dec  8 06:28:43 mymachine myservice[2806]: unknown user 'toto' from '1.2.3.4'Dec  8 06:28:43 mymachine myservice[2806]: accepted connection for user 'toto' from '1.2.3.4'

As we are going to parse those logs to further detect bruteforce and user-enumeration attacks, we're simply going to "discard" the last type of logs.

Pre-requisites#

  1. Create a local test environment

  2. Clone the hub

git clone https://github.com/crowdsecurity/hub.git

Create our test#

From the root of the hub repository :

โ–ถ cscli hubtest create myservice-logs --type syslog
  Test name                   :  myservice-logs  Test path                   :  /home/dev/github/hub/.tests/myservice-logs  Log file                    :  /home/dev/github/hub/.tests/myservice-logs/myservice-logs.log (please fill it with logs)  Parser assertion file       :  /home/dev/github/hub/.tests/myservice-logs/parser.assert (please fill it with assertion)  Scenario assertion file     :  /home/dev/github/hub/.tests/myservice-logs/scenario.assert (please fill it with assertion)  Configuration File          :  /home/dev/github/hub/.tests/myservice-logs/config.yaml (please fill it with parsers, scenarios...)

Configure our test#

Let's add our parser to the test configuration (.tests/myservice-logs/config.yaml). He specify that we need syslog-logs parser (because myservice logs are shipped via syslog), and then our custom parser.

parsers:- crowdsecurity/syslog-logs- ./parsers/s01-parse/crowdsecurity/myservice-logs.yamlscenarios:postoverflows:log_file: myservice-logs.loglog_type: syslogignore_parsers: false

note: as our custom parser isn't yet part of the hub, we specify its path relative to the root of the hub directory

Parser creation : skeleton#

For the sake of the tutorial, let's create a very simple parser :

filter: 1 == 1debug: trueonsuccess: next_stagename: crowdsecurity/myservice-logsdescription: "Parse myservice logs"grok:#our grok pattern : capture .*  pattern: ^%{DATA:some_data}$#the field to which we apply the grok pattern : the log message itself  apply_on: messagestatics:  - parsed: is_my_service    value: yes
  • a filter : if the expression is true, the event will enter the parser, otherwise, it won't
  • a onsuccess : defines what happens when the event was successfully parsed : shall we continue ? shall we move to next stage ? etc.
  • a name & a description
  • some statics that will modify the event
  • a debug flag that allows to enable local debugging information
  • a grok pattern to capture some data in logs

We can then "test" our parser like this :

โ–ถ cscli hubtest run myservice-logsINFO[01-10-2021 12:41:21 PM] Running test 'myservice-logs'                WARN[01-10-2021 12:41:24 PM] Assert file '/home/dev/github/hub/.tests/myservice-logs/parser.assert' is empty, generating assertion: 
len(results) == 2len(results["s00-raw"]["crowdsecurity/syslog-logs"]) == 3results["s00-raw"]["crowdsecurity/syslog-logs"][0].Success == true...len(results["s01-parse"]["crowdsecurity/myservice-logs"]) == 3results["s01-parse"]["crowdsecurity/myservice-logs"][0].Success == trueresults["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Parsed["program"] == "myservice"results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Parsed["timestamp"] == "Dec  8 06:28:43"results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Parsed["is_my_service"] == "yes"results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Parsed["logsource"] == "syslog"results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Parsed["message"] == "bad password for user 'toto' from '1.2.3.4'"results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Parsed["some_data"] == "bad password for user 'toto' from '1.2.3.4'"...

Please fill your assert file(s) for test 'myservice-logs', exiting

What happened here ?

  • Our logs have been processed by syslog-logs parser and our custom parser
  • As we have no existing assertion(s), cscli hubtest kindly generated some for us

This mostly allows us to ensure that our logs have been processed by our parser, even if it's useless in its current state. Further inspection can be seen with cscli hubtest explain :

โ–ถ cscli hubtest explain myservice-logsline: Dec  8 06:28:43 mymachine myservice[2806]: bad password for user 'toto' from '1.2.3.4'    โ”œ s00-raw    |   โ”” ๐ŸŸข crowdsecurity/syslog-logs    โ”” s01-parse        โ”” ๐ŸŸข crowdsecurity/myservice-logs
line: Dec  8 06:28:43 mymachine myservice[2806]: unknown user 'toto' from '1.2.3.4'    โ”œ s00-raw    |   โ”” ๐ŸŸข crowdsecurity/syslog-logs    โ”” s01-parse        โ”” ๐ŸŸข crowdsecurity/myservice-logs
line: Dec  8 06:28:43 mymachine myservice[2806]: accepted connection for user 'toto' from '1.2.3.4'    โ”œ s00-raw    |   โ”” ๐ŸŸข crowdsecurity/syslog-logs    โ”” s01-parse        โ”” ๐ŸŸข crowdsecurity/myservice-logs

We can see that our log lines were successfully parsed by both syslog-logs and myservice-logs parsers.

Parser creation : actual parser#

Let's modify our parser, ./parsers/crowdsecurity/s01-parse/myservice-logs.yaml :

onsuccess: next_stagefilter: "evt.Parsed.program == 'myservice'"name: crowdsecurity/myservice-logsdescription: "Parse myservice logs"#for clarity, we create our pattern syntax beforehandpattern_syntax:  MYSERVICE_BADPASSWORD: bad password for user '%{USERNAME:user}' from '%{IP:source_ip}' #[1]  MYSERVICE_BADUSER: unknown user '%{USERNAME:user}' from '%{IP:source_ip}' #[1]nodes:#and we use them to parse our two type of logs  - grok:      name: "MYSERVICE_BADPASSWORD" #[2]      apply_on: message      statics:        - meta: log_type #[3]          value: myservice_failed_auth        - meta: log_subtype          value: myservice_bad_password  - grok:      name: "MYSERVICE_BADUSER" #[2]      apply_on: message      statics:        - meta: log_type #[3]          value: myservice_failed_auth        - meta: log_subtype          value: myservice_bad_userstatics:    - meta: service #[3]      value: myservice    - meta: username      expression: evt.Parsed.user    - meta: source_ip #[1]      expression: "evt.Parsed.source_ip"

Various changes have been made here :

  • We created to patterns to capture the two relevant type of log lines, Using an online grok debugger or an online regex debugger [2] )
  • We keep track of the username and the source_ip (Please note that setting the source_ip in evt.Meta.source_ip and evt.Parsed.source_ip is important [1])
  • We setup various statics information to classify the log type [3]

Let's run out tests again :

โ–ถ cscli hubtest run myservice-logs                    INFO[01-10-2021 12:49:56 PM] Running test 'myservice-logs'                WARN[01-10-2021 12:49:59 PM] Assert file '/home/dev/github/hub/.tests/myservice-logs/parser.assert' is empty, generating assertion: 
len(results) == 2len(results["s00-raw"]["crowdsecurity/syslog-logs"]) == 3results["s00-raw"]["crowdsecurity/syslog-logs"][0].Success == true...len(results["s01-parse"]["crowdsecurity/myservice-logs"]) == 3results["s01-parse"]["crowdsecurity/myservice-logs"][0].Success == true...results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Parsed["timestamp"] == "Dec  8 06:28:43"results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Parsed["program"] == "myservice"results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Parsed["source_ip"] == "1.2.3.4"results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Parsed["user"] == "toto"results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Meta["log_subtype"] == "myservice_bad_password"results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Meta["log_type"] == "myservice_failed_auth"results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Meta["service"] == "myservice"results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Meta["source_ip"] == "1.2.3.4"results["s01-parse"]["crowdsecurity/myservice-logs"][0].Evt.Meta["username"] == "toto"...results["s01-parse"]["crowdsecurity/myservice-logs"][1].Evt.Meta["log_subtype"] == "myservice_bad_user"results["s01-parse"]["crowdsecurity/myservice-logs"][2].Success == false

Please fill your assert file(s) for test 'myservice-logs', exiting

We can see that our parser captured all the relevant information, and it should be enough to create scenarios further down the line.

Again, further inspection with cscli hubtest explain will show us more about what happened :

โ–ถ cscli hubtest explain myservice-logsline: Dec  8 06:28:43 mymachine myservice[2806]: bad password for user 'toto' from '1.2.3.4'    โ”œ s00-raw    |   โ”” ๐ŸŸข crowdsecurity/syslog-logs    โ”” s01-parse        โ”” ๐ŸŸข crowdsecurity/myservice-logs
line: Dec  8 06:28:43 mymachine myservice[2806]: unknown user 'toto' from '1.2.3.4'    โ”œ s00-raw    |   โ”” ๐ŸŸข crowdsecurity/syslog-logs    โ”” s01-parse        โ”” ๐ŸŸข crowdsecurity/myservice-logs
line: Dec  8 06:28:43 mymachine myservice[2806]: accepted connection for user 'toto' from '1.2.3.4'    โ”œ s00-raw    |   โ”” ๐ŸŸข crowdsecurity/syslog-logs    โ”” s01-parse        โ”” ๐Ÿ”ด crowdsecurity/myservice-logs

note: we can see that our log line accepted connection for user 'toto' from '1.2.3.4' wasn't parsed by crowdsecurity/myservice-logs as we have no pattern for it

Closing word#

We have now a fully functional parser for myservice logs ! We can either deploy it to our production systems to do stuff, or even better, contribute to the hub !

If you want to know more about directives and possibilities, take a look at the parser reference documentation !

See as well this blog article on the topic.