I’m a network engineer who codes. Maybe even a network coder. Probably not a network programmer. Definitely not a programmer who knows networking. I’m in that weird zone where I’m enough of two things that don’t normally go together that it makes some of my conversations with peers awkward.
I had one such conversation today, trying to explain the different data serialization formats in Python and why, at the end of the day, they really don’t matter.
The conversation started with one of those “But they have an XML API!!!” comments, thrown out as a criticism of someone’s product. My response was something like “And why does that matter?”
The person who made the comment certainly couldn’t answer that question. It was just something they had read in a competitive deck somewhere.
I’m all about competing and making sure that customers have the BEST possible information to make the best decisions for their particular requirements, but this little criticism was definitely not, IMHO, the best information. In fact, it was totally irrelevant. This post is my attempt to explain why. Hopefully it will help clear up some of the confusion around data structures and APIs, and show why the formatting of the data, at least, doesn’t matter nearly as much as people think.
In a nutshell, XML uses tags, similar to HTML, to represent different values in your data stream. The <item> tag opens an item, the </item> tag closes it, and whatever lives between the two is the value of that item. Take a look at the following XML output from the HP IMC NMS. I cut and pasted this straight out of the API interface, so you should be able to do the same if you want to follow along at home. In this code, I have created a string called x and pasted in the XML-formatted text, which is a bunch of information about a Cisco 2811 router that lives in my lab. Pay attention to the values, as they will stay the same throughout this exercise.
XML is the oldest of the bunch; it became a W3C Recommendation in 1998. It’s important to note, though, that XML is still relevant: it’s the native data format of NETCONF and is still used in a lot of places. It’s old, but that doesn’t make it devoid of value.
```python
>>> x = '''<device>
<contact>changed this too</contact>
<location>changed this too</location>
<link op="GET" rel="self" href="http://10.3.10.220:8080/imcrs/plat/res/device/116"/>
</device>'''
```
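If you only need a few values, Python’s standard library can already walk those tags without any third-party package. The following is a minimal sketch using xml.etree.ElementTree on a trimmed copy of the device record (only a few of the fields are kept). It shows the two places values live in XML: between an opening and closing tag, and inside the opening tag as attributes.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the device record: most fields are left out.
x = '''<device>
<ip>10.101.0.1</ip>
<typeName>Cisco 2811</typeName>
<link op="GET" rel="self" href="http://10.3.10.220:8080/imcrs/plat/res/device/116"/>
</device>'''

root = ET.fromstring(x)

# An element's value is the text between its opening and closing tag.
ip = root.findtext("ip")
type_name = root.findtext("typeName")

# Attributes such as op, rel and href live inside the opening tag itself.
href = root.find("link").get("href")

print(root.tag)   # device
print(ip)         # 10.101.0.1
print(type_name)  # Cisco 2811
print(href)       # http://10.3.10.220:8080/imcrs/plat/res/device/116
```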
A dictionary is a way of storing data in Python that uses keys instead of an index to access the value of a specific piece of information. For example, with a dictionary, item['ip'] would return '10.101.0.1'.
One of the “issues” with dictionaries used to be that they were unordered, meaning there was no guarantee that when you printed out a dictionary, the values would come out in the same order. (Pretty obvious when you read the word “unordered”, I know.) Since Python 3.7, plain dictionaries are guaranteed to keep insertion order, so this matters less than it once did. The OrderedDict is a “dict subclass that remembers the order entries were added”. So we’re going to use a great little library called xmltodict, which takes the XML string (called x) above and transforms it into a Python OrderedDict. Now we can do interesting things to it in Python. We can access the keys and get to the values directly. We can iterate over it because it’s one of Python’s native data structures. It’s easy to use. People know it and understand it. It’s a good thing. Lists and dictionaries are the bread and butter of data structures in Python. You need, need, need them.
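Here is a quick illustration of key-based lookup, plus the one place where an OrderedDict still behaves differently from a plain dict on current Python: comparing two OrderedDicts is order-sensitive. The values are taken from the device record; the variable names are just for illustration.

```python
from collections import OrderedDict

# A plain dict: values are looked up by key, not by position.
item = {"ip": "10.101.0.1", "mask": "255.255.255.0", "typeName": "Cisco 2811"}
print(item["ip"])  # 10.101.0.1

# Same key/value pairs, inserted in a different order.
a = OrderedDict([("ip", "10.101.0.1"), ("mask", "255.255.255.0")])
b = OrderedDict([("mask", "255.255.255.0"), ("ip", "10.101.0.1")])

print(a == b)              # False: OrderedDict equality checks order too
print(dict(a) == dict(b))  # True: plain dict equality ignores order

# Iterating yields keys in insertion order.
for key, value in a.items():
    print(key, "=", value)
```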
In this code example, we’re going to take the XML string from above, run it through xmltodict to convert it to an OrderedDict, and assign it to the variable y. Once I’ve got the OrderedDict y, I could also use xmltodict (its unparse function) to convert it back into XML with little to no effort. Cool?
```python
>>> import xmltodict
>>> y = xmltodict.parse(x)
>>> y
OrderedDict([('device', OrderedDict([('id', '116'), ('label', 'Cisco2811.haw.int'), ('ip', '10.101.0.1'), ('mask', '255.255.255.0'), ('status', '1'), ('statusDesc', 'Normal'), ('sysName', 'Cisco2811.haw.int'), ('contact', 'changed this too'), ('location', 'changed this too'), ('sysOid', '1.3.6.1.4.1.9.1.576'), ('sysDescription', None), ('devCategoryImgSrc', 'router'), ('topoIconName', 'iconroute'), ('categoryId', '0'), ('symbolId', '1147'), ('symbolName', 'Cisco2811.haw.int'), ('symbolType', '3'), ('symbolDesc', None), ('symbolLevel', '2'), ('parentId', '1'), ('typeName', 'Cisco 2811'), ('mac', '00:1b:d4:47:1e:68'), ('link', OrderedDict([('@op', 'GET'), ('@rel', 'self'), ('@href', 'http://10.3.10.220:8080/imcrs/plat/res/device/116')]))]))])
```
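To demystify what xmltodict is doing, here is a rough standard-library approximation of the mapping rules visible in that output: the root tag becomes the outer key, child tags become keys, attributes get an '@' prefix, and an empty element becomes None. This is a sketch under simplifying assumptions, not xmltodict’s actual implementation; element_to_value and parse are hypothetical helper names, and repeated child tags or mixed text and attributes are not handled the way xmltodict handles them.

```python
import xml.etree.ElementTree as ET
from collections import OrderedDict


def element_to_value(elem):
    """Convert one element into a string, None, or a nested OrderedDict.

    Simplifications (assumptions, not xmltodict's real rules): repeated child
    tags overwrite each other, and text on an element that also has
    attributes or children is dropped.
    """
    if len(elem) == 0 and not elem.attrib:
        # Leaf element: its text, or None for an empty tag such as <sysDescription></sysDescription>.
        return elem.text
    # Attributes first, with an '@' prefix like in the xmltodict output.
    result = OrderedDict(("@" + name, value) for name, value in elem.attrib.items())
    for child in elem:
        result[child.tag] = element_to_value(child)
    return result


def parse(xml_string):
    """Hypothetical helper: wrap the root element as {root_tag: contents}."""
    root = ET.fromstring(xml_string)
    return OrderedDict([(root.tag, element_to_value(root))])


x = '''<device>
<id>116</id>
<ip>10.101.0.1</ip>
<sysDescription></sysDescription>
<link op="GET" rel="self" href="http://10.3.10.220:8080/imcrs/plat/res/device/116"/>
</device>'''

y = parse(x)
print(y["device"]["ip"])              # 10.101.0.1
print(y["device"]["sysDescription"])  # None
print(y["device"]["link"]["@href"])   # http://10.3.10.220:8080/imcrs/plat/res/device/116
```

For real work, xmltodict (or lxml) covers the edge cases this sketch skips; the point is only that "XML in, dictionary out" is not magic.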
JSON has become one of the standard ways to represent data between machines. It’s structured, well understood and it’s mostly human readable. A lot of “newer” systems now use JSON as the default data type. Most RESTful APIs for instance seem to have settled on JSON.
This is where things get interesting. Now that I’ve got the XML data in an OrderedDict, I can use the json library to convert it to a JSON-formatted string, which I can then send along to any system that understands JSON. Or write it to a file, or just stare at those pretty, pretty braces.
Note: If I convert from JSON back to a Python structure using json.loads, it returns a regular dictionary, not an OrderedDict. On older Python versions that meant the values might come back out of order, which COULD, in theory, cause issues with an upstream system, although I haven’t seen that in any of my work. On Python 3.7 and later, the plain dictionary keeps the keys in the order they appear in the JSON text.
```python
>>> import json
>>> print(json.dumps(y, indent=4))
"contact": "changed this too",
"location": "changed this too",
"typeName": "Cisco 2811",
```
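If you do want an OrderedDict back from JSON, json.loads accepts an object_pairs_hook argument. Below is a minimal round trip on a trimmed copy of the record, comparing the default behavior with object_pairs_hook=OrderedDict.

```python
import json
from collections import OrderedDict

# Trimmed copy of the parsed device record.
y = OrderedDict([("device", OrderedDict([
    ("id", "116"),
    ("ip", "10.101.0.1"),
    ("typeName", "Cisco 2811"),
    ("sysDescription", None),
]))])

text = json.dumps(y, indent=4)  # Python None is written as JSON null

plain = json.loads(text)                                   # nested plain dicts
ordered = json.loads(text, object_pairs_hook=OrderedDict)  # nested OrderedDicts

print(type(plain["device"]).__name__)    # dict
print(type(ordered["device"]).__name__)  # OrderedDict
print(ordered == y)                      # True
```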
Although JSON is “more” readable than XML, it’s still got all those braces and quotation marks to worry about. And so YAML was born. YAML is easily the most human-readable of the formats I’ve worked with. It uses indentation, dashes and colons to denote the different levels and items of the data structure. It’s what Ansible playbooks are written in, it’s commonly used to feed variables into Jinja2 templates, and it shows up alongside plenty of other cool buzzwords that we’re all starting to play with.
Just like with the JSON example above, I can take the OrderedDict and convert it to YAML (shown below) and back again. Whether loading the YAML hands you back an OrderedDict depends on how it was dumped and which loader you use; yaml.safe_load, the usual choice, returns plain dictionaries.
```python
>>> import yaml
>>> print(yaml.dump(y, default_flow_style=False))
contact: changed this too
location: changed this too
typeName: Cisco 2811
```
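A sketch of a clean YAML round trip with PyYAML (pip install pyyaml), assuming PyYAML 5.1 or later for the sort_keys argument. The safe dumper does not register a representer for OrderedDict, so the sketch converts to a plain dict first; sort_keys=False keeps the original key order instead of sorting alphabetically. The three fields are a trimmed copy of the device record, deliberately reordered so the effect of sort_keys is visible.

```python
from collections import OrderedDict

import yaml  # third-party: PyYAML

device = OrderedDict([
    ("typeName", "Cisco 2811"),
    ("contact", "changed this too"),
    ("location", "changed this too"),
])

# dict() converts only the top level; nested OrderedDicts would need the same treatment.
text = yaml.safe_dump(dict(device), default_flow_style=False, sort_keys=False)
print(text)

# safe_load returns a plain dict, which keeps insertion order on Python 3.7+.
back = yaml.safe_load(text)
print(back == dict(device))  # True
```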
What’s my point?
So the original criticism was “But they have an XML API!!!”, right? Well, in these little code snippets I just demonstrated how, using Python and a couple of readily available libraries (PyYAML and xmltodict are not part of the standard library and must be installed), I was able to go from XML, to OrderedDict, to JSON, to YAML, with almost no effort. I could take any of these and convert it to something like a Python pickle, pull it back, and convert it to something else. It really doesn’t matter. I can go from one to another without much effort.
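To make the “there and back again” point concrete with nothing but the standard library (so no xmltodict or PyYAML step here), here is a sketch that takes a flat device record from XML to a dict, through JSON and pickle, and back out to XML. It assumes a flat record with one level of child tags that contain only text; deeper nesting would need a recursive converter.

```python
import json
import pickle
import xml.etree.ElementTree as ET

x = '''<device>
<id>116</id>
<ip>10.101.0.1</ip>
<typeName>Cisco 2811</typeName>
</device>'''

# XML -> native dict (flat records only: one level of child tags).
root = ET.fromstring(x)
device = {child.tag: child.text for child in root}

# dict -> JSON -> dict
from_json = json.loads(json.dumps(device))

# dict -> pickle -> dict
from_pickle = pickle.loads(pickle.dumps(from_json))

# dict -> XML again
out = ET.Element("device")
for tag, value in from_pickle.items():
    ET.SubElement(out, tag).text = value
xml_out = ET.tostring(out, encoding="unicode")

print(from_pickle == device)  # True: the data survived every hop
print(xml_out)  # <device><id>116</id><ip>10.101.0.1</ip><typeName>Cisco 2811</typeName></device>
```

The munging step would go in the middle, on the plain dict, where the original wire format no longer matters.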
Personally, I don’t like working with XML. I can do it, but I would RATHER work with JSON. That’s just my personal preference, though; there’s no technical reason I can see why JSON is superior to XML, at least not in the implementations and at the levels I’m dealing with.
Just like Bilbo Baggins, I can go there and back again without worrying too much about the actual format in between, because when I’m doing something in Python, I really want to be working with a native structure like a list or a dictionary anyway.
Anything I get from an external system, I’m just going to convert into a native Python data type, munge away, then convert back to whatever data format I need, be that JSON, XML or YAML, and be on to the next task.
The actual data is what matters.
As long as it’s structured in a way that I can parse easily, I couldn’t care less how it comes in and how it goes out.
Don’t even get me started on simply wrapping CLI commands in XML…
Does that mean the format doesn’t matter at all?
No. I’m sure there are many more experienced programmers who can tell horror stories about converting between different data formats, or about that time when this thing happened that caused that other thing to blow up. But for me, I’d much rather you had a well-structured API that gives me data in a way that I can easily access, convert to a format I can work with, and move on.
Hopefully, if you’ve made it to the end of this post, you’ll agree that the actual format is much less important than you might once have believed. Disagree? Let me know in the comments below. I’m always looking to learn something, and in the coding realm, I know I’ve got a LOT to learn!!!