XML, JSON, and YAML… Oh my!

I’m a network engineer who codes. Maybe even a network coder. Probably not a network programmer. Definitely not a programmer who knows networking. I’m in that weird zone where I’m enough of two things that don’t normally go together that it makes some of the conversations I have with my peers awkward.

I had one such conversation today, trying to explain the different data serialization formats you can work with in Python and why, at the end of the day, they really don’t matter.

The conversation started with one of those “But they have an XML API!!!” comments thrown out as a criticism of someone’s product. My response was something like “And why does that matter?”

The person who made the comment certainly couldn’t answer that question. It was just something they had read in a competitive deck somewhere.

I’m all about competing and making sure that customers have the BEST possible information to make the best decisions for their particular requirements, but this little criticism was definitely not, IMHO, the best information. In fact, it was totally irrelevant. This post is my way of trying to explain why. Hopefully, it will help clear up some of the confusion around data structures and APIs and why they, or at least their formatting, really don’t matter so much.

XML

You can read more about XML here. In a nutshell, XML uses tags, similar to HTML, to represent different values in your data stream. The <item> tag opens an item, the </item> tag closes it, and what lives between the two is the value for that item. Take a look at the following XML output from the HP IMC NMS. I copied and pasted this straight out of the API interface, so you should be able to do the same if you want to follow along at home. In this code, I have created a string called x and pasted in the XML formatted text, which is a bunch of information about a Cisco 2811 router that lives in my lab. Pay attention to the values, as they will stay the same going through this exercise.

XML is the oldest of the bunch, having become a W3C Recommendation in 1998. It’s important to note, though, that XML is still relevant: it is the native data format of NETCONF and is still used in a lot of places. It’s old, but that doesn’t mean it’s devoid of value.
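To make the open-tag/close-tag idea concrete, here is a minimal sketch using Python’s standard-library xml.etree.ElementTree. The XML string is a hand-written stand-in, not the real IMC output: the <device>, <label> and <ip> tag names are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written XML; the tag names are illustrative only
# and this is NOT the actual HP IMC output.
x = "<device><label>Cisco2811</label><ip>10.101.0.1</ip></device>"

root = ET.fromstring(x)        # parse the string into an element tree
for child in root:             # each child is one <tag>value</tag> pair
    print(child.tag, "=", child.text)

ip = root.find("ip").text      # look up a single value by its tag name
```

Whatever sits between an opening and closing tag comes back as that element’s text, which is all the “value lives between the tags” description really means.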

Ordered Dictionary

A dictionary is a way of storing data in python that uses keys instead of an index to access the content or value of a specific piece of information you want. Example item[‘ip’] would return “10.101.0.1” with a dictionary.
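As a quick sketch of key-based access, assuming a made-up device record (only the ip value comes from the example above, the other key is hypothetical):

```python
# Hypothetical device record; only the 'ip' value is taken from the text.
item = {"label": "Cisco2811", "ip": "10.101.0.1"}

ip = item["ip"]                          # look up a value by key, not by position
vendor = item.get("vendor", "unknown")   # .get() returns a default instead of raising KeyError
print(ip, vendor)
```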

One of the “issues” with dictionaries is that, historically, they are unordered (plain dictionaries only guarantee insertion order from Python 3.7 onwards). That means there’s no guarantee that when you print out a dictionary the values will be in the same order. (Pretty obvious when you read the word “unordered”, I know.) The OrderedDict is a “dict subclass that remembers the order entries were added”. So we’re going to use a great little library called xmltodict, which takes an XML string (called x above) and transforms it into a Python ordered dictionary. Now we can do interesting things to it in Python. We can access the keys and get to the values directly. We can iterate over it because it’s one of Python’s native data structures. It’s easy to use. People know it and understand it. It’s a good thing. Lists and dictionaries are the bread and butter of data structures in Python. You need, need, need them.

In this code example, we’re going to take the XML string from above, run it through xmltodict to convert it to an ordered dictionary and assign it to the variable y. Once I’ve got the ordered dictionary y, I could also use xmltodict to convert it back into XML with little to no effort. Cool?
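With xmltodict installed, the two calls involved are xmltodict.parse(x) and xmltodict.unparse(y). The sketch below is a stand-in that uses only the standard library so it runs anywhere; it is not how xmltodict is implemented, it ignores attributes and repeated sibling tags, and the XML string is a hypothetical sample rather than the real IMC output.

```python
import xml.etree.ElementTree as ET
from collections import OrderedDict

# Hypothetical sample XML, not the actual HP IMC output.
x = "<device><label>Cisco2811</label><ip>10.101.0.1</ip></device>"

def element_to_dict(elem):
    """Recursively turn an element into nested OrderedDicts (leaves become text)."""
    children = list(elem)
    if not children:
        return elem.text
    return OrderedDict((child.tag, element_to_dict(child)) for child in children)

def dict_to_element(tag, value):
    """Inverse of element_to_dict: rebuild an element from a tag and its value."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(dict_to_element(key, child))
    else:
        elem.text = value
    return elem

root = ET.fromstring(x)
y = OrderedDict([(root.tag, element_to_dict(root))])   # XML -> ordered dictionary
for key, value in y["device"].items():                 # iterate like any other dict
    print(key, value)

root_tag, body = next(iter(y.items()))
x_again = ET.tostring(dict_to_element(root_tag, body), encoding="unicode")  # and back to XML
```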

JSON

JSON has become one of the standard ways to represent data between machines. It’s structured, well understood and it’s mostly human readable. A lot of “newer” systems now use JSON as the default data type. Most RESTful APIs for instance seem to have settled on JSON.

This is where things get interesting. Now that I’ve got XML in an ordereddict, I can use the JSON library to convert it to a JSON formatted string which I can then send along to any system that understands JSON. Or write it to a file, or just stare at those pretty, pretty braces.

Note: If I convert from JSON back to a python structure using the json.loads method, it will actually return a regular dictionary, not an Ordered Dictionary, so the values might appear out of order which COULD, in theory, cause issues with an upstream system, but I haven’t seen that in any of my work.
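Here is a minimal sketch of that round trip with the standard-library json module. The ordered dictionary y is a hypothetical stand-in for the one built from the IMC XML. If key order does matter to you, json.loads accepts an object_pairs_hook argument that hands the ordering back.

```python
import json
from collections import OrderedDict

# Hypothetical stand-in for the ordered dictionary built from the XML.
y = OrderedDict([("device", OrderedDict([("label", "Cisco2811"),
                                         ("ip", "10.101.0.1")]))])

j = json.dumps(y, indent=2)     # ordered dictionary -> JSON formatted string
print(j)

plain = json.loads(j)                                    # regular dict by default
ordered = json.loads(j, object_pairs_hook=OrderedDict)   # OrderedDict at every level
```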

YAML

Although JSON is “more” readable than XML, it’s still got all those braces and quotation marks to worry about. And so YAML was born. YAML is easily the most human readable of the formats I’ve worked with. It uses white space, dashes and colons to denote the different levels of the data structure. It’s what is commonly used with Jinja2 templating and Ansible and other cool buzzwords that we are all starting to play with.

Just like with the JSON example above, I can take the Ordered Dictionary and convert it to a YAML format (shown below) and back again. The yaml.load method does actually return an Ordered Dictionary here, because yaml.dump records the Python type in the YAML it writes.
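A hedged sketch of the YAML round trip with PyYAML follows. It deliberately uses yaml.safe_dump and yaml.safe_load rather than the plain dump/load pair, so it works on plain dictionaries and returns plain dictionaries; that is a swapped-in, more conservative variant, not exactly what I ran for this post. The data is again a hypothetical stand-in, and the import is guarded because PyYAML is not part of the standard library.

```python
from collections import OrderedDict

try:
    import yaml        # PyYAML, installed separately (pip install pyyaml)
except ImportError:
    yaml = None        # the sketch simply skips the YAML step if it is missing

# Hypothetical stand-in for the ordered dictionary built from the XML.
y = OrderedDict([("device", OrderedDict([("label", "Cisco2811"),
                                         ("ip", "10.101.0.1")]))])

def to_plain(value):
    """Recursively convert OrderedDicts to plain dicts, which safe_dump accepts."""
    if isinstance(value, dict):
        return {key: to_plain(child) for key, child in value.items()}
    return value

if yaml is not None:
    text = yaml.safe_dump(to_plain(y), default_flow_style=False)  # dict -> YAML text
    print(text)
    restored = yaml.safe_load(text)                               # YAML text -> plain dicts
else:
    restored = None
```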

 

What’s my point?

So the original criticism was “But they have an XML API!!!”, right? Well, in these little code snippets I just demonstrated how, using Python and a couple of readily available libraries (pyyaml and xmltodict are not part of the Python standard library and must be installed), I was able to go from XML, to OrderedDict, to JSON, to YAML, with almost no effort. I could take any of these and convert it to something like a Python pickle, pull it back and convert it to something else. It really doesn’t matter. I can go from one to another without much effort.
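As a small illustration of that “convert it to something else and pull it back” idea, this sketch pushes a hypothetical record through JSON and then pickle using only the standard library. One caveat worth stating: pickle should only ever be loaded from sources you trust.

```python
import json
import pickle
from collections import OrderedDict

# Hypothetical record standing in for the device data.
y = OrderedDict([("label", "Cisco2811"), ("ip", "10.101.0.1")])

as_json = json.dumps(y)                                         # native -> JSON string
from_json = json.loads(as_json, object_pairs_hook=OrderedDict)  # JSON string -> native
as_pickle = pickle.dumps(from_json)                             # native -> pickle bytes
from_pickle = pickle.loads(as_pickle)                           # pickle bytes -> native

print(from_pickle == y)   # the data survives every hop unchanged
```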

Personally, I don’t like working with XML. I can do it, but I would RATHER work with JSON. But that’s just my personal preference, there’s no technical reason why JSON is superior to XML that I can see. At least not in the implementations and the levels that I’m dealing with.

Just like Bilbo Baggins, I can go there and back again without worrying too much about the actual format in between because when I’m doing something in Python, I’m really looking to be working with a native structure like a list or a dictionary anyway.

Anything that I get from an external source, I’m just going to convert into a native Python data type, munge away, then convert back to whatever data format I need, be that JSON, XML or YAML, and be on to the next task.

The actual data is what matters.

As long as it’s structured in a way that I can parse easily, I couldn’t care less how it comes in and how it goes out.

Don’t even get me started about simply wrapping CLI commands in XML…

Does that mean the format doesn’t matter at all?

No, I’m sure there are many more experienced programmers who can explain the horror stories of converting between different data formats, or that time when this thing happened that caused this other thing to blow up. But for me, I’d much rather you had a well structured API that gives me data in a way that I can easily access, convert to a format I can work with, and move on.

Hopefully, if you’ve made it to the end of this blog, you’ll agree that the actual format is much less important than you might once have believed. Disagree? Let me know in the comments below. I’m always looking to learn something, and in the coding realm, I know I’ve got a LOT to learn!!!

@netmanchris

 


Introduction to R and SWIRL

So I’m taking a Coursera course from Johns Hopkins on Data Science.

The course uses the R programming language, which is a derivative of the S programming language that came out of Bell Labs in the 1970s. I’m a huge believer in network programmability and SDN in general. From a traditional network management point of view, most of the work getting done and discussed today is really around the C in the FCAPS model. There are some people, like Jason Edelman, Matt Oswalt, etc… who are using network programmability for automating troubleshooting tasks, but most of those are pretty straightforward:

  • automate information gathering
  • automate troubleshooting
  • identify the corrective action

Once you’ve got the corrective action nailed down, you could also automate the fix, but there are a lot of people who are still nervous about having changes happen without a human being involved. 

Automating configuration management and configuration based fault detection and error correction are great things. But there are other parts of the network that can benefit from the application of a programming language to old problems. 

I’m personally interested in the massive amounts of data that the network holds. We’ve got a ton of instrumentation within the network that is just sitting there waiting to be accessed, tracked, and mined for useful insights.

Data Science is all about different methods to sift through all that data in a scientifically reproducible manner and, hopefully, gain some insights.

The Tools

Like Python, R has an IDE available that will allow you to run R code interactively, or through R files. It can be downloaded at the CRAN site here.

There’s also a better IDE available called RStudio that offers some additional functionality, which is available here.

SWIRL is a library which allows learners to access some interactive tutorials written in R for R. There’s a Git repository here which provides a set of tutorials for different courses that allow you to get a feel for the language syntax, creating functions, etc…

 

R Studio and Swirl

Once you install the SWIRL library, which is really easy using the RStudio Install Packages feature, you load the SWIRL library (think import in Python) using the library(swirl) function. Once you’ve done that, you can either download the course files from the Git repository, or you can install them directly from within R (which uses curl in the background to download the files directly into your working directory).

As you can see in the screen capture, I’ve got a few different courses installed, and each of the courses has a bunch of lessons inside it. The screen capture shows the lessons within the R Programming course. What’s also cool about this is that, assuming you’re enrolled in the Coursera R Programming course, you can complete the lesson, input your username and password (specific to your course, not your Coursera password) and magically, you get extra credit for the course lessons you complete.

Extra credit is a good thing.

 

Wrap Up

I’ve only been into R for about a week. It’s got some nice features, but to be honest, I don’t have enough coding experience to really give a qualified opinion on the subject. I’ll continue to work with it and see where things go. There’s still a ton of Python that I need to learn, but I’ve already found a Python library called rpy2 that allows me to access native R libraries from within my Python code. Best of both worlds I guess. 🙂

 

Plans for 2015: Where to from here?

I know I’m a little bit late for New Year’s resolutions, but it’s been a tough decision making process. There is so much going on right now in the networking industry and, to be honest, I’m not sure that networking is going to be a skill that will demand the premium that it’s been able to for the last 10-15 years. Don’t get me wrong, I’m not saying that networking is dead. In fact, just the opposite: networking is going to flourish. There is going to be so much networking that needs to be done that the only way we will be able to deal with it is to dump all of our collective knowledge into code and start to automate what would have previously been the domain of the bit-plumbers that we are.

 

What skills to pick up in 2015:

So the question: what skills am I looking at picking up in 2015? I am a huge believer in the infrastructure-as-code movement. Looking at where leaders like Matt Oswalt, Jason Edelman, Brent Salisbury, Dave Tucker, Colin McNamara, Jeremy Schulman, etc… are taking us, it’s obvious that coding is becoming a mandatory skill for anyone in the networking field who wants to reach, or remain at, the top of the field. That’s not to say that core networking skills are not going to be important, but I’m definitely branching out this year in trying to pick up another language, as well as improve my chops with what I already know.

Increase Python Skills

As anyone who’s been here for the last year knows, I’ve been playing around with Python a lot. I’m hoping that 2015 will allow me to continue to increase my Python skills, specifically as focused around networking, and I’m hoping that I will have enough time to go from just learning to actually contributing some code back to the community. I’m signed up for Kirk Byers’ Python for Network Engineers course starting in January, as well as going through a few different books. Best of all, my 9 year old son has also shown some interest in learning to code, so this might actually become a father-son project.

I’m also hoping to get more involved with things like Ansible and Schprokits, as well as possibly releasing some of my own projects. Crossing my fingers on the stretch goals. 🙂

Gain Data Analysis Skills

Coursera is awesome. If you haven’t checked it out, you need to. You would have to be living under a rock buried in a lead can stored in a Faraday cage at the bottom of the ocean to not have heard about SDN. I believe that there’s an ENORMOUS opportunity within the networking space for applying data analysis techniques to the massive amounts of information that flows across our networks every day. There’s a Coursera Data Science Specialization that I’m signed up for that I’m hoping will start me down the path of being able to execute on some of the ideas that I’ve had bouncing around in my skull for more than half a decade. I’m sure I will be blogging on the course, but you might have to wait for some of the ideas.

Virtualization-Ho!

Docker, Rocket, NSX, ESX, KVM, OVS. They are all going to get a little love this year from this guy. I’m not sure how much I’m going to be able to consume, but I believe these are all technologies that are going to be relevant in the coming years. I believe that containers are going to get a lot of love in the industry, and companies like http://www.socketplane.io are going to be something I’m watching closely.

Networking Networking Networking

This is my core knowledge set and, I believe, what will continue to be the foundation of my value for the foreseeable future. I hit my CCIE Emeritus this year and also had a chance to attend a Narbik bootcamp. It was an incredibly humbling experience and reminded me of how much there is still to learn in this space that I love. If you get a chance to attend a Micronics CCIE bootcamp, I couldn’t recommend it highly enough. There are very few people who understand and can TEACH this information at the level Narbik can. I’m actually planning on finding time to resit the bootcamp this year just to soak up more of the goodness.

 

Plans Plans Plans

2014 was a bit of a mess for me. But I think I still did fairly well in executing on gaining some of the programming skills that I wanted. 2015 is going to be a crazy time for the whole industry. I’m not sure which of these four areas is going to consume the most of my time. The way our industry has been going, it’s entirely possible that I will fall in love with something else entirely. 🙂

If at the end of 2015 I have managed to move forward in these four areas by at least a few steps, I think I will consider the year a success. 

 

What about you?

 

@netmanchris

 

Surfing your NMS with Python

Python is my favourite programming language. But then again, it’s also the only one I know. 🙂

I made a choice to go with Python because, honestly, that’s what all the cool kids were doing at the time. But after spending the last year or so learning the basics of the language, I do find that it’s something that I can easily consume, and I’m starting to get better with all the different resources out there. BTW, http://www.stackoverflow.com is your friend. You will learn to love it.

On with the show…

So in this post, I’m going to show how to use Python to build a quick script that calls the RealTimeLocate API on the HP IMC server. In theory, you can build this against any RESTful API, but I make no promises that it will work without some tinkering.

Planning the project.

I’ve written before about how I’m a huge fan of OPML tools like Mind Node Pro. The first step for me was planning out the pieces I needed to make this:

  • usable in the future
  • actually work in the present

In this case I’m far more concerned about the present, as I’m fairly sure that I will look back on this code a year from now and think some words that I won’t put in print.

Aside: I’ve actually found that using the troubleshooting skills I’ve honed over the years as a network engineer helps me immensely when trying to decompose what pieces will need to go in my code. I actually think that network engineers have a lot of skills that are extremely transportable to the programming domain, especially because we tend to think of the individual components and the system at the same time, not to mention our love of planning out failure domains and forcing our failures into known scenarios as much as possible.


Auth Handler

Assuming that the RESTful service you’re trying to access will require you to authenticate, you will need an authentication handler to deal with the username/password stuff that a human being is usually required to enter. There are a few different options here. Python actually ships with urllib, or some variant depending on the version of Python you’re working with. For ease of use reasons, and because of a strong recommendation from one of my coding mentors, I chose to use the requests library. This is not shipped by default with the version of Python you download over at http://www.python.org but it’s well worth the effort of pip installing it into your system.

The beautiful thing about requests is that the documentation is pretty good and easily readable.

In looking through the HP IMC eAPI documentation and the requests library documentation, I settled on HTTPDigestAuth.


So here’s how this looks for IMC.

Building the Authentication Info

>>> import requests   # imports the requests library; you may need to pip install it if you don't have it already
>>> from requests.auth import HTTPDigestAuth   # imports the HTTPDigestAuth class from the requests library
>>>
>>> imc_user = '''admin'''   # the username used to auth against the HP IMC server
>>> imc_pw = '''admin'''   # the password of the account used to auth against the HP IMC server
>>>
>>> auth = HTTPDigestAuth(imc_user, imc_pw)   # puts the username and password together and stores them in a variable called auth

We’ve now built the auth handler to use the username “admin” with the password “admin”. For a real environment, you’ll probably want to setup an Operator Group with only access to the eAPI functions and lock this down to a secret username and password. The eAPI is power, make sure you protect it.
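One way to keep those credentials out of the script itself is to read them from the environment and only prompt when they are missing. This is just a sketch under my own assumptions: the IMC_USER and IMC_PW variable names and the load_imc_credentials helper are hypothetical, not anything HP IMC or requests defines.

```python
import os
from getpass import getpass

def load_imc_credentials(env=None):
    """Return (user, password), preferring environment variables over prompts.

    IMC_USER and IMC_PW are hypothetical variable names chosen for this sketch.
    """
    env = os.environ if env is None else env
    user = env.get("IMC_USER") or input("IMC username: ")
    password = env.get("IMC_PW") or getpass("IMC password: ")
    return user, password

# Demonstrated with a plain dict standing in for os.environ, so nothing prompts.
imc_user, imc_pw = load_imc_credentials({"IMC_USER": "api_operator",
                                         "IMC_PW": "not-admin"})
```

The resulting pair can be handed to HTTPDigestAuth exactly as in the snippet above.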

Building the URL

So for this to work, I need to assign a value to the host_ip variable below so that the URL will complete with a valid response. The other thing to watch for is types. Python can be quite forgiving at times, but if you try to add two objects of the wrong type together… it mostly won’t work. So we need to make sure host_ip is a string, and the easiest way to do that is to put quotes around the value (I use triple quotes here).

In a “real” program, I would probably use the input function to allow this variable to be input as part of the flow of the program, but we’re not quite there yet.

>>> host_ip = '''10.101.0.109'''   # variable that you can assign to a host you want to find on the network
>>> h_url = '''http://'''   # prefix for building URLs; use HTTP or HTTPS
>>> imc_server = '''10.3.10.220:8080'''   # match port number of IMC server, default 8080 or 8443
>>> url = h_url + imc_server   # combines the h_url and the IP address of the IMC box as a base URL to use later
>>> find_ip_host_url = '''/imcrs/res/access/realtimeLocate?type=2&value=''' + host_ip + '''&total=false'''   # this is the RealTimeLocate API URL with a variable set
>>>
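To show what “adding two objects of the wrong type” looks like in practice, here is a tiny sketch. The host_octet variable is hypothetical; it just stands in for a number that arrived from somewhere else in a program.

```python
host_octet = 109                       # an int, e.g. computed elsewhere (hypothetical)

try:
    bad = "10.101.0." + host_octet     # str + int is not allowed
except TypeError as err:
    print("concatenation failed:", err)

host_ip = "10.101.0." + str(host_octet)   # convert explicitly...
same_ip = f"10.101.0.{host_octet}"        # ...or let an f-string do the formatting
```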

Putting it all together.

This line puts together the URL that we’re going to send to the web server. You could ask “Hey man, why didn’t you just drop the whole string in one variable to begin with?” That’s a great question. There’s a concept in programming called DRY (Don’t Repeat Yourself). The idea is that when you write code, you should never write the same thing twice. Think in a modular fashion, which allows you to reuse pieces of code again and again.

In this example, I can easily write another f_url variable and assign to it another RESTful API that gets me something interesting from the HP IMC server. I don’t need to write the h_url portion or the server IP address portion of the URL again. Make sense?

>>> f_url = url + find_ip_host_url   # simple string concatenation that joins url and find_ip_host_url to produce the full URL for the HTTP call
>>>
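Taking the DRY idea one step further, the URL pieces could live in a small helper so that any other API path reuses the same scheme and server. This is a sketch, not part of the original script: the imc_url function name and its default arguments are my own, and urlencode from the standard library builds the query string.

```python
from urllib.parse import urlencode

def imc_url(path, server="10.3.10.220:8080", scheme="http", **params):
    """Build a full IMC URL from a resource path and optional query parameters."""
    base = f"{scheme}://{server}{path}"
    return f"{base}?{urlencode(params)}" if params else base

# Same URL as f_url above, built from reusable parts.
f_url = imc_url("/imcrs/res/access/realtimeLocate",
                type=2, value="10.101.0.109", total="false")
print(f_url)
```

Pointing the script at a different server or switching to HTTPS then means changing one argument instead of editing every URL string.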

Executing the code.

Now the last piece is where we actually execute the code. This will issue a get request, using the requests library.  It will use the f_url as the actual URL it’s going to pass, and it will use the variable auth that we created in the Authentication Info step above to automatically populate the username and password.

The response will get returned in a variable called r.

>>> r = requests.get(f_url, auth=auth)   # using the requests library get method, we pass f_url as the URL we're going to access and auth as the auth argument to define how we authenticate. Pretty simple, actually.
>>>

The Results

So this is the coolest part. We can now see what’s in r. Did it work? Did we find our lost, scared little host? Let’s take a look.

>>> r
<Response [200]>

Really? That’s it?

The answer is “yes”. That’s what’s been assigned to the variable r. 200 OK may look familiar to you voice engineers who know SIP, and it means mostly the same thing here. This is a response code to let you know that your request was successful, but it’s not what we’re looking for. I want that content, right? If I do a type(r), which tells me what kind of object Python thinks r is, I get the following.

>>> type(r)

<class 'requests.models.Response'>

So this tells us that maybe we need to go back to the requests documentation and look for info on the response object. That’s where we learn how to access the part of the response that I wanted to see, which is the reply to my request about where the host with IP address 10.101.0.111 is actually located on the network.

So let’s try out one of the options and see what we get

>>> r.content
b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><list><realtimeLocation><locateIp>10.101.0.111</locateIp><deviceId>4</deviceId><deviceIp>10.10.3.5</deviceIp><ifDesc>GigabitEthernet1/0/16</ifDesc><ifIndex>16</ifIndex></realtimeLocation></list>'

How cool is that? We put in an IP address and we actually learned four new things about that IP address without touching a single GUI. And the awesome part of this? This works across any of the devices that HP IMC supports.
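A raw byte string is not much fun to work with, so here is a minimal sketch of pulling those fields out with the standard-library xml.etree.ElementTree. The bytes are the r.content value from above, pasted in so the example runs without an IMC server; the locations variable name is my own.

```python
import xml.etree.ElementTree as ET

# The bytes returned by r.content above, pasted in so the sketch runs offline.
content = (b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
           b'<list><realtimeLocation><locateIp>10.101.0.111</locateIp>'
           b'<deviceId>4</deviceId><deviceIp>10.10.3.5</deviceIp>'
           b'<ifDesc>GigabitEthernet1/0/16</ifDesc><ifIndex>16</ifIndex>'
           b'</realtimeLocation></list>')

root = ET.fromstring(content)
# One plain dictionary per <realtimeLocation> entry: tag name -> text value.
locations = [{field.tag: field.text for field in loc}
             for loc in root.findall("realtimeLocation")]

for loc in locations:
    print(f"{loc['locateIp']} is on {loc['deviceIp']} port {loc['ifDesc']}")
```

From here each location is an ordinary dictionary, so the device IP and interface can be fed straight into whatever call comes next.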

Where to from here?

So we’ve just started on our little journey here.  Now that we have some hints to the identity of the network devices and specific interface that is currently harbouring this lost host, we need to use that data as hints to continue filling in the picture.

But that’s in the next blog…

Comments or Questions?  Feel free to post below!