So in a previous post, I made the recommendation to go find an ITSM framework. For the rest of this series, I’ll be referring to the ITILv3 ITSM framework a lot. The two books that, IMHO, apply the most to Network Operations are the Service Transition and the Service Operation books.
For the next few posts, I’m going to focus on the Service Transition volume, and specifically on the Configuration Management sections.
So in ITILv3, one of the MOST important things to understand is the concept of a Configuration Item.
What’s a CI?
The way I explain this to customers is it’s the smallest managed thing, or set of things, in the environment.
How does that apply to my network?
Well, hopefully, I’m going to try and explain that now.
The first CI in a network might be the hardware devices that are in the network. These are your switches, routers, firewalls, load balancers, servers, etc…
So most people are good with the idea of standardization. It makes sense that it’s easier to manage fewer kinds of devices. This is recommendation #1.
1) Standardize on as few hardware platforms as possible.
The good thing is that this is fairly easy to achieve. In fact a lot of people do this instinctively. They standardize on the same two chassis switches in their core, they use the same model in their distribution, and they use the same model for the access layer.
Here’s where things get crazy though.
Many of the same customers who try to standardize on a current device often have no processes in place to ensure that they are all running the same version of code.
So think back to the ITIL Configuration Item. If you have five HP 5500EI switches, and five different OS versions on them, you now have five different CIs to track. Make sense?
Five different versions of commands.
Five different versions of bugs.
Five times the headache.
If a configuration item is the smallest manageable object, then each different combination of hardware and software counts as its own CI. BUT… if we standardize on one version of code for that hardware platform, we get one configuration item.
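Here’s a minimal sketch of that counting in Python, assuming you can export an inventory of hostname, hardware model, and software version from whatever tool you use. The hostnames and version labels below are made up for illustration; they are not real HP release names.

```python
from collections import Counter

# Hypothetical inventory rows: (hostname, hardware model, software version).
# Hostnames and version labels are placeholders, not real release names.
mixed = [
    ("sw-01", "HP 5500EI", "version-A"),
    ("sw-02", "HP 5500EI", "version-B"),
    ("sw-03", "HP 5500EI", "version-C"),
    ("sw-04", "HP 5500EI", "version-D"),
    ("sw-05", "HP 5500EI", "version-E"),
]

# The same five switches after standardizing on one version of code.
standardized = [(host, model, "version-C") for host, model, _ in mixed]


def count_cis(devices):
    """Count devices per (model, version) pair; each distinct pair is one CI."""
    return Counter((model, version) for _, model, version in devices)


for label, devices in (("mixed", mixed), ("standardized", standardized)):
    cis = count_cis(devices)
    print(f"{label}: {len(devices)} devices, {len(cis)} CI(s) to track")
```

The only design choice here is the key: treating the (model, version) pair as the CI means the count drops the moment every switch of a given model runs the same code.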
So the first thing that I recommend customers do is…
GET ON ONE VERSION OF CODE!!!!!!
This is commonly called a golden software version. One version of commands, one version of bugs. One CI.
On the flip side, one of the other common mistakes I see made by customers who have taken the first step of getting on a single version is upgrading without a reason.
My recommendation here is to do your homework. When a new version of code is released, read the release notes, check the bug fixes, check the new features. If there’s nothing in there that is addressing an issue you’re having, or new functionality that you NEED to have,
WHY WOULD YOU CHANGE?
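That homework can be written down as a simple check. The sketch below assumes you have summarized the release notes by hand into a small dictionary; the keys, bug IDs, and feature names are all hypothetical and do not correspond to any vendor’s release note format.

```python
def reasons_to_upgrade(release_notes, open_issues, needed_features):
    """Return the reasons, if any, that a new release is worth deploying.

    release_notes   -- dict with "bug_fixes" and "new_features" lists (hypothetical layout)
    open_issues     -- bug IDs you are actually hitting in your network
    needed_features -- features you actually NEED, not just want
    """
    fixed = sorted(set(release_notes.get("bug_fixes", [])) & set(open_issues))
    added = sorted(set(release_notes.get("new_features", [])) & set(needed_features))
    return [f"fixes {bug}" for bug in fixed] + [f"adds {feature}" for feature in added]


# Hypothetical release note summary and local requirements.
notes = {"bug_fixes": ["BUG-101", "BUG-102"], "new_features": ["feature-x"]}

reasons = reasons_to_upgrade(notes, open_issues=["BUG-102"], needed_features=[])
if reasons:
    print("Upgrade justified:", ", ".join(reasons))
else:
    print("No reason to change; stay on the golden version.")
```

An empty list is the answer to the question above: nothing in the release touches a problem you have, so the golden version stays put.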
It may seem strange, but when you get a new switch out of the box, you may want to just plug that into your network and downgrade it to the older software. More thoughts on this in this blog post.
Any decent NMS should be able to define, report on, and deploy the correct version of code to the hardware devices.
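To make the “define and report” part concrete, here is a rough sketch of a golden version compliance report. It is not how any particular NMS implements this; the golden version table, hostnames, and version labels are assumptions for illustration, and the deploy step is left out entirely.

```python
# Hypothetical golden version table: one approved version per hardware model.
GOLDEN_VERSIONS = {"HP 5500EI": "version-C"}

# Hypothetical inventory rows: (hostname, hardware model, running version).
devices = [
    ("sw-01", "HP 5500EI", "version-C"),
    ("sw-02", "HP 5500EI", "version-A"),
    ("fw-01", "Example Firewall", "version-X"),
]


def compliance_report(devices, golden):
    """Sort devices into compliant, noncompliant, and models with no golden version."""
    report = {"compliant": [], "noncompliant": [], "no_standard": []}
    for host, model, version in devices:
        target = golden.get(model)
        if target is None:
            # No golden version has been defined for this model yet.
            report["no_standard"].append(host)
        elif version == target:
            report["compliant"].append(host)
        else:
            # Record what is running and what it should be running.
            report["noncompliant"].append((host, version, target))
    return report


report = compliance_report(devices, GOLDEN_VERSIONS)
for host, running, target in report["noncompliant"]:
    print(f"{host}: running {running}, golden version is {target}")
for host in report["no_standard"]:
    print(f"{host}: no golden version defined for this model")
```

The “no_standard” bucket is there on purpose: a model with no golden version defined is a gap in the standard, not a compliant device.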
Funny enough, after writing this, I found another great blog post by Terry Slattery, this time over at http://www.nojitter.com.
What about you guys? What configuration tools are you using? HP IMC? Orion NCM? Rancid? Prime? A TFTP Server on a wandering laptop?
Orion NCM right now. It’s not bad, but it’s a little limiting in that the templates used to retrieve the configs are locked into an XML schema that assumes a “startup” and “running” config. That’s fine for most, but not all, Cisco devices. It’s not fine for, say, F5 devices with configs living in a different set of files, and which require a completely different process to obtain a true backup, i.e. the creation and retrieval of a UCS file. NCM can still be used to get the job done, but it’s more involved, requiring that the admin write custom scripts to back up the device config, and even then the retrieved config is not truly integrated into the overall Orion system like running/startup configs are, where you can see those configs as a web part right on the device’s page.
I think this is why point solutions continue to thrive in the NMS space. No management solution handles hardware as well as a vendor’s own. Usually.
Hi Ethan,
Orion definitely seems to be one of the more popular non-manufacturer-supplied tools out there. I remember using the Solarwinds tools in the late 90’s as the TFTP and syslog server of choice. They have definitely pushed in the last couple of years. One of the biggest questions still outstanding in my mind is how the integration of all of their acquisitions has been going.
When I did my SCP, although they presented themselves as a single interface, it seemed that in the background, there wasn’t much going on between Orion NCM and Orion NPM from a SOA standpoint. I’ve heard some of that was addressed in Orion 10, but I haven’t had a chance to REALLY dig into the new version. ( It’s on my list! ). How has your experience been?
I don’t think the point solution in the NMS space will ever really go away. There’s always the balance between deep management and multi-vendor support. To my knowledge IMC is still the only major networking manufacturer-provided NMS that does have 3rd party support ( and I don’t mean MIB-II SNMP and ICMP support ). I believe that the NMS/EMS space is going to be very exciting to watch in the next few years, as it seems that in many ways SDN is just a sexier name for NMS from a certain perspective. It’s going to be fun to watch what happens!
On the F5 front, it will be interesting to see how integrated F5 device management gets into HP’s IMC. It wouldn’t surprise me at all if F5 config backup/restore functionality is in a future version of IMC. I guess time will tell.
Full Disclosure: I work for HP Networking for what that’s worth. 🙂 I like to think I’m open minded, but I want to make sure people have all the info to judge my comments.
“…networking manufacturer-provided NMS”
That’s a key point – there are software focused companies that do NCCM for multiple vendors. (And of course software divisions of networking companies that do it – HP Network Automation). But then they all tend to fall down when you want wireless mgmt, and you need to go for the vendor software.
Just one other thing on selecting an appropriate software image – while I tend to agree with not upgrading unless there are specific fixes or features you need, you also need to take into account vendor support lifecycles. You most definitely do not want to be stuck running old code. Even then, if you aren’t running the absolute latest code, many support organisations will tell you to upgrade, as a knee-jerk response, before they’ll even begin to assist you with your problem.
Hey @Lindsey,
Thanks for pointing that out. I was actually having a conversation yesterday about this very thing with regard to firewall management. This is one of the “other” network devices that seem to have specific management requirements that result in a need for a point solution. It’s still always great to have the major FCAPS functions in a single management solution if possible. Consolidated alarms for fault, a consolidated config repository for configuration, etc…
On the software image, great catch, lifecycle management is definitely something that should be taken into account. I think it’s just good network hygiene to make sure that you are running reasonably current code. I wouldn’t suggest going on the bleeding edge, unless you have one of the reasons above, but it’s also a bad idea to fall too far behind, as we saw with the internet outage last year, which was caused by a service provider’s failure to upgrade in a timely manner to versions of code that had addressed the known issue.
Have to agree on the support org comment. I actually recommend pushing back: ask them why they want you to upgrade without giving a good reason, and cite your change control policies as the reason you can’t upgrade without a documented bug. This usually forces them to start troubleshooting the issue before they make a recommendation.