The Rosie Pattern Language, a better way to mine your data

Hello RPL, goodbye regex! Rosie makes finding that data needle in the data haystack a lot easier.

Page 2 of 2

As you’ll have also noticed, the output is selectively colored. You can check which colors are assigned to which patterns with the -patterns switch:

RedQueen:rosie mgibbs$ rosie -patterns
This is Rosie v0.99i

Pattern                        Type            Color   
------------------------------ --------------- --------
$                              alias                   
.                              alias           black   
D                              alias                   
S                              alias                   
W                              alias                   
ampm                           alias                   
any                            alias                   
basic.datetime_patterns        definition      blue    
basic.element                  alias                   
basic.element_bracketed        alias                   
basic.element_quoted           alias                   
basic.matchall                 definition              
basic.network_patterns         definition      red     
basic.punctuation              definition              
basic.unmatched                definition      black   
c.any_comment                  definition              
c.code_only                    definition              
comment                        definition              
common.denoted_hex             definition      underline
common.dotted_identifier       definition              
common.dq                      alias                   
common.dquoted_string          alias                   
common.float                   definition      underline
common.graph                   alias                   
common.hex                     definition      underline

(snip, snip, snip)

If you want to change the color assignments for patterns, you’ll need to either use the command line switch -encode to specify colors or edit /usr/local/Cellar/rosie/current/share/rosie/src/core/color-output.lua.

Let’s just extract one pattern from the data, the IP addresses:

RedQueen:rosie mgibbs$ rosie network.ip_address ifconfig.txt
RedQueen:rosie mgibbs$

Hmmm. No output! What’s happened here is that Rosie was looking for a match with network.ip_address and, when something else was found first, Rosie quit. The behavior we want is that of the Unix grep command, which finds all matches, so we need to use the -grep switch:

RedQueen:rosie mgibbs$ rosie -grep network.ip_address ifconfig.txt
127.0.0.1 
192.168.0.180 192.168.0.255 
169.254.239.110 169.254.255.255 
RedQueen:rosie mgibbs$
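
The difference between the two runs is loosely analogous to anchored matching versus searching in a regex library. The Python sketch below is only an analogy, not a description of how Rosie is implemented: the `IPV4` regex is a deliberately simplified stand-in for network.ip_address, and the sample line is one of the lines from ifconfig.txt shown later in this article.

```python
import re

# Simplified IPv4 regex for illustration only. It is NOT Rosie's
# network.ip_address pattern and it accepts out-of-range octets such as 999.
IPV4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")

line = "\tinet 192.168.0.180 netmask 0xffffff00 broadcast 192.168.0.255"

# Anchored: the pattern has to match at the very start of the line.
anchored = IPV4.match(line)

# Grep-like: scan the whole line and collect every match.
everywhere = IPV4.findall(line)

print(anchored)    # None, because the line starts with a tab and "inet"
print(everywhere)  # ['192.168.0.180', '192.168.0.255']
```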

Now, if you’re going to do really interesting stuff with Rosie, you’re going to want a more processable output format, which is where Rosie’s JSON encoding comes in. I’m going to have Rosie output JSON by using the -encode json switch and then pipe that to jq, a really cool command-line JSON processor:

RedQueen:rosie mgibbs$ rosie -grep -encode json network.ip_address ifconfig.txt | jq
{
  "*": {
    "pos": 1,
    "subs": [
      {
        "network.ip_address": {
          "pos": 7,
          "text": "127.0.0.1"
        }
      }
    ],
    "text": "\tinet 127.0.0.1"
  }
}
{
  "*": {
    "pos": 1,
    "subs": [
      {
        "network.ip_address": {
          "pos": 7,
          "text": "192.168.0.180"
        }
      },
      {
        "network.ip_address": {
          "pos": 50,
          "text": "192.168.0.255"
        }
      }
    ],
    "text": "\tinet 192.168.0.180 netmask 0xffffff00 broadcast 192.168.0.255"
  }
}
{
  "*": {
    "pos": 1,
    "subs": [
      {
        "network.ip_address": {
          "pos": 7,
          "text": "169.254.239.110"
        }
      },
      {
        "network.ip_address": {
          "pos": 52,
          "text": "169.254.255.255"
        }
      }
    ],
    "text": "\tinet 169.254.239.110 netmask 0xffff0000 broadcast 169.254.255.255"
  }
}
RedQueen:rosie mgibbs$

Ta-da! Now I’ve got the data I wanted in a useful structure that can be easily manipulated, analyzed, and stored (actually, using jq to extract and re-encode Rosie’s JSON into new JSON structures adds even more flexibility).
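
If you would rather do that manipulation in a script than in jq, here is a minimal Python sketch that walks the same structure. The helper names (`iter_json_objects`, `ip_matches`) are made up for this example, and the sample is two of the records above written one object per line, which is an assumption about how the raw stream is laid out. In this sample, "pos" lines up as a 1-based offset into the line's "text", and the code checks that rather than taking it on faith.

```python
import json

# Two records copied from the output above. The raw string keeps the
# backslash in "\t" so the JSON decoder, not Python, turns it into a tab.
SAMPLE = r'''
{"*": {"pos": 1, "subs": [{"network.ip_address": {"pos": 7, "text": "127.0.0.1"}}], "text": "\tinet 127.0.0.1"}}
{"*": {"pos": 1, "subs": [{"network.ip_address": {"pos": 7, "text": "192.168.0.180"}}, {"network.ip_address": {"pos": 50, "text": "192.168.0.255"}}], "text": "\tinet 192.168.0.180 netmask 0xffffff00 broadcast 192.168.0.255"}}
'''


def iter_json_objects(stream):
    """Yield each top-level JSON value from a string of concatenated values."""
    decoder = json.JSONDecoder()
    index = 0
    while index < len(stream):
        # raw_decode does not skip leading whitespace, so do it here.
        while index < len(stream) and stream[index].isspace():
            index += 1
        if index >= len(stream):
            break
        obj, index = decoder.raw_decode(stream, index)
        yield obj


def ip_matches(record, name="network.ip_address"):
    """Return (pos, text) pairs for every sub-match called `name`."""
    found = []
    for sub in record["*"].get("subs", []):
        if name in sub:
            found.append((sub[name]["pos"], sub[name]["text"]))
    return found


addresses = []
for record in iter_json_objects(SAMPLE):
    line = record["*"]["text"]
    for pos, text in ip_matches(record):
        # Check the 1-based offset against the full line for this sample.
        assert line[pos - 1:pos - 1 + len(text)] == text
        addresses.append(text)

print(addresses)  # ['127.0.0.1', '192.168.0.180', '192.168.0.255']
```

Parsing with `raw_decode` means the same loop works whether the objects arrive one per line or pretty-printed by jq.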

At the start of this article I showed the horrible regex to find IPv6 addresses. To extract the same data using Rosie, we first need to download the IPv6 pattern, which is specified in rfc3986.rpl in Rosie’s GitHub repo (this pattern is pretty new, which is why it’s not included when you install via Brew), and then add it to Rosie’s manifest. Let’s assume you downloaded the pattern to /usr/local/Cellar/rosie/current/share/rosie/rpl, where the other RPL patterns are stored:

RedQueen:rosie mgibbs$ echo "/usr/local/Cellar/rosie/current/share/rosie/rpl/rfc3986.rpl" >> /usr/local/Cellar/rosie/current/share/rosie/MANIFEST
RedQueen:rosie mgibbs$ rosie -grep -encode json IPv6address ifconfig.txt | jq
{
  "*": {
    "pos": 1,
    "text": "\tinet6 ::1"
  }
}
{
  "*": {
    "pos": 1,
    "text": "\tinet6 fe80::1"
  }
}
{
  "*": {
    "pos": 1,
    "text": "\tinet6 fe80::425:850:43a6:5863"
  }
}
{
  "*": {
    "pos": 1,
    "text": "\tinet6 2605:e000:6a0b:2500:86d:8286:ca88:9f8"
  }
}
{
  "*": {
    "pos": 1,
    "text": "\tinet6 2605:e000:6a0b:2500:e952:c1e6:2031:b452"
  }
}
{
  "*": {
    "pos": 1,
    "text": "\tinet6 fe80::1493:92de:58ee:96eb"
  }
}
{
  "*": {
    "pos": 1,
    "text": "\tinet6 fe80::42fb:52c5:cff7:807e"
  }
}
{
  "*": {
    "pos": 1,
    "text": "\tinet6 fe80::37b0:cdae:5ad4:4f0e"
  }
}
{
  "*": {
    "pos": 1,
    "text": "\tinet6 fe80::8e14:3930:f5f9:ae9c"
  }
}
RedQueen:rosie mgibbs$ 
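
Unlike the IPv4 run, these records carry only the whole matching line in "text" and no "subs" entries, so the address itself still has to be pulled out of the line. One way to do that downstream is sketched below in Python using the standard library's ipaddress module. This is post-processing of my own, not something Rosie does, and the function name `ipv6_addresses` is made up for the example.

```python
import ipaddress

# "text" values copied from three of the records above.
TEXTS = [
    "\tinet6 ::1",
    "\tinet6 fe80::1",
    "\tinet6 2605:e000:6a0b:2500:86d:8286:ca88:9f8",
]


def ipv6_addresses(text):
    """Return the whitespace-separated tokens in `text` that parse as IPv6."""
    found = []
    for token in text.split():
        try:
            addr = ipaddress.ip_address(token)
        except ValueError:
            continue  # not an IP address at all, e.g. "inet6"
        if addr.version == 6:
            found.append(token)
    return found


extracted = [ipv6_addresses(text) for text in TEXTS]
print(extracted)
# [['::1'], ['fe80::1'], ['2605:e000:6a0b:2500:86d:8286:ca88:9f8']]
```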

Pretty cool, eh? Rosie is available as a C library that can be called from Go, Python, Node.js, Ruby, Java, etc., and hopefully in the near future as native libraries for those languages.

The Rosie documentation spends some very useful time discussing how to extract data from CSV and Apache Spark log files, techniques that are incredibly valuable for data scientists and network administrators.

This has been a brief foray into Rosie and, as I hope you’ve grokked, it’s a powerful tool that's more predictable than regexes, faster than grep, and leaps over tall data piles extracting the data you want in a useful format. This is not just a cool utility; for anyone mining large data sets, it’s a fantastic and much needed power tool that will save time, avoid errors, and generally make your life easier.

Comments? Thoughts? Drop me a line or comment below then follow me on Twitter and Facebook. And sign up for my new newsletter!


Copyright © 2017 IDG Communications, Inc.
