Rethinking Change Control in an SDN world

I had the opportunity to attend the Open Networking User Group (ONUG) event in New York recently and had a chance to talk through some of my musings around change management in an SDN world with some very smart, knowledgeable people from a range of different backgrounds.

Let’s talk a little about change control

In a nutshell, people screw things up when left to their own devices. Individuals will inevitably type a wrong command, misplace a decimal point, not have sufficient information, or just plain not-think-something-through.

People are frail, fragile and error prone. But when people come together in groups, share information, share experience, and double check each other’s work, then the error-rate per change tends to drop significantly and changes start to be implemented in a much higher quality fashion.

Change Policy in Modern Organizations

Most modern organizations have some change management process in place. Whether they have succumbed to a full ITIL-based process, gone the DevOps route of continuous integration, or fall somewhere in between, people have generally figured out that change management is a good thing.

I’ve seen good change management that promotes healthy growth, and I’ve seen bad change management that restricts the business into stagnation because nothing is ever allowed to change in the organization. (There’s another word for something that never changes: dead. 🙂)

Change Control in an SDN environment

One of the major issues I see in SDN environments is that many of the changes that we are not only capable of making, but are actively advocating, are currently heavily restricted by the organization’s existing change policies.

To make this more concrete, let’s talk about an app from Guardicore that uses SDN to detect potential advanced persistent threat (APT) attacks in the data center. It then uses OpenFlow and the HP VAN SDN Controller to dynamically keep the session alive and re-route (re-bridge?) the flow directly to a honeypot, which can perform further analysis on that particular session to see whether the flow is trying to do anything more interesting, like executing shell code or some other dubious shenanigans.

Now imagine how the Change Advisory Board is going to react to this request. I imagine it could go something like this.

“What? You want to reconfigure the edge, distribution and core of my data center based on an unknown event at an unknown time because something may or may not be going on?”

How do you think that’s going to go?

ITSM Pre-Approved Changes

There is a concept in ITSM frameworks like ITIL and MOF that allows for a common change to be pre-approved. The change request still has to be fed into the system, but the approvals are automatic and no one has to actively log into a system and click the “I Approve” button.

One of the approaches I’ve been advocating is the possibility of repurposing the pre-approved change to allow for dynamic flow modification based on known conditions. This seems to be the simplest way for us to allow the ITSM structures in well-run IT organizations to continue to work without having to scrap the whole change approval process.
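To make the idea a little more tangible, here is a minimal Python sketch of how a pre-approved change could gate a dynamic flow modification. Everything in it is an assumption for illustration: the `PreApprovedChange` and `FlowChangeRequest` classes, the `STD-0042` change ID, and the trigger and target names are hypothetical, and no real ITSM tool or SDN controller API is being called. The point is only that the request still produces a change record, but approval is automatic when it matches a known condition.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class PreApprovedChange:
    """A standard change signed off ahead of time (hypothetical schema)."""
    change_id: str
    trigger: str                 # the known condition that may fire the change
    action: str                  # the only action this template permits
    allowed_targets: frozenset   # where a flow may be redirected


@dataclass(frozen=True)
class FlowChangeRequest:
    """What the SDN application asks for at run time."""
    trigger: str
    action: str
    target: str
    flow: str                    # free-text description of the affected flow


def evaluate(request, catalog):
    """Build a change record: auto-approve on a template match, else send to the CAB."""
    matched = next(
        (t for t in catalog
         if request.trigger == t.trigger
         and request.action == t.action
         and request.target in t.allowed_targets),
        None,
    )
    return {
        "status": "auto-approved" if matched else "needs-cab-review",
        "template": matched.change_id if matched else None,
        "flow": request.flow,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical catalog entry: redirect a suspicious session to a honeypot.
CATALOG = [
    PreApprovedChange(
        change_id="STD-0042",
        trigger="suspected-apt-session",
        action="redirect-flow",
        allowed_targets=frozenset({"honeypot-1"}),
    )
]

request = FlowChangeRequest(
    trigger="suspected-apt-session",
    action="redirect-flow",
    target="honeypot-1",
    flow="10.1.1.20 -> 10.2.2.30 tcp/445",
)
print(evaluate(request, CATALOG)["status"])
```

Anything that falls outside the template (a different action, or a target the board never signed off on) comes back as needing review, so the existing approval process still applies to everything that is not a known condition.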

This is new ground and I think that this topic requires a lot more discussion than we are currently giving it.

What do you think? Is pre-approved change the way to go? Is there another, better way? Is your organization currently using SDN, and has it found a way to rationalize this to the Change Advisory Board?

Please blog it up or post in the comments below.


Network Management – How to get started

Network Management Skills

In the last few years, I’ve noticed that I’m a little different. It’s not just because I wear coloured socks or because my hair looks like I style it after Albert Einstein. I noticed that I’ve developed a different skill set than the majority of my pre-sales or post-sales network professional peers. What skills, you ask? Network Management and Operations.

Why I chose to develop Network Management skills

About five years ago, I took a look at the market and thought “This stuff is complicated”. Earth-shattering observation, right? It sounds simple, but then I started looking at some of the tools we had at the time and I realized that NMS tools could really help to automate not only the information gathering, but also the configuration tasks in our networks. At the time, we had a cool little tool called 3Com Network Director. It ran on a single PC. No web interface, and it really only managed 3Com gear. But it was better than running CLI commands all day long. And the monitoring aspects really helped my customers identify and resolve problems quickly. This was a moment of inspiration for me. I chose to develop skills in network management and operations.

Let me say that again.

I chose to develop skills in network management and operations.

I didn’t choose to develop skills in 3ND, or IMC, or Solarwinds, Cisco Prime, or any of the various other tools. Over time, I’ve gained experience on all of those products, but I would say my true value is having gone through the process to develop skills in the sub-disciplines of network management. Learning a product is only a very small part of the whole domain.

What does that mean? 

It’s easy to learn a product. They have bells and whistles. Click this check box. Fill in this box. Etc. Those skills are important. But they don’t help us understand how to apply the product to resolve our customers’ business challenges. They don’t help us understand when not to click that box. And they don’t help us to design a network management strategy, or to consult with our customers on operational efficiencies and what can be done to help increase their network’s stability, to reduce MTTR, or to mitigate pressures put on the operations team. Learning the domain knowledge has helped me to understand WHY we have developed the product features and what they are to be used for.

My Learning Roadmap

To put it simply, I consumed everything I could on the subject. It’s amazing how much free information is out there if you set your mind on finding it. If anyone’s looking to increase their skills in this area, I’ve put together the following list of resources that have really helped me in this domain. I’ve tried to keep this out of vendor-specific products, but I’m sure you’ll find that any product you choose will probably have training and learning resources around it as well. This list is in NO way exhaustive; there are a lot of resources out there. I highly encourage everyone to read, watch, and listen to as many of them as you can and to think about them critically.

Free Resources

Solarwinds SCP training – The Solarwinds SCP training is online and free. What I really liked about this training is that it’s really focused on network management, netman protocols, and the operational aspects of network management. There are, of course, some product-specific aspects to the training, but in general this is a really good primer on network management. Oh… did I mention there’s a bunch of videos as well? Great stuff to rip and put on your tablet when you’re stuck on a plane and you’ve seen all the movies.

Solarwinds has also provided a bunch of whitepapers going further in depth on network management specific subjects which are a great reference. If you’re interested, there’s also the Solarwinds Certified Professional certification if you’re looking for a way to validate your knowledge.

The Information Technology Infrastructure Library (ITIL) is a set of IT service management practices compiled over the last 30 years. There’s a lot of great stuff in here. The books are expensive though. There is an entire industry that’s sprung up around ITSM. If you have some commute time to spare, I would highly suggest typing “ITSM” into your favourite podcast app, then sitting back and listening.

If you’re interested, there’s also the ITILv3 Foundations certification if you’re looking for a way to validate your knowledge.

Blogs and Podcasts

Social media is a great way to learn how people apply ITIL concepts to the real world. I particularly like http://www.itskeptic.org as it’s got a great following of a bunch of smart people who disagree on a regular basis. You never know whether the customer you’re going into is operating in a traditional ITIL-based ops model, is using the Microsoft Operations Framework, or has moved on to Agile and DevOps. It’s good to have at least a cursory knowledge of all of these approaches to IT operations, not to mention traditional network management frameworks like FCAPS and eTOM.

Paid resources

Books are a great way to learn about network management and operations. Here’s my abbreviated reading list. These are the reference books that sit on my shelf within easy reach.

Network Maturity Model – This book is actually an academic thesis focused on trying to extend the CMMI models to network-specific capability maturity models. Of course, network operations is part of the capabilities of an organization, so there’s a lot of great content in here. The book is definitely academic, but it’s got a LOT of great content in it, assuming you can get through all of the required footnotes and pointers to other academic works.

Fundamentals of EMS, NMS, and OSS/BSS – This book is wonderful. It covers all aspects of traditional telecom management from FCAPS to eTOM, as well as looking at OSS/BSS architectures which usually exist only in Service Provider networks. Great information in here. My biggest problem with this book is the font size. I have glasses and it’s tiny. Worth the effort to make it through, but plan on multiple reading sessions. This is not a book you’re going to get through in one sitting.

Network Management Fundamentals – Cisco Press book that’s a great read. A lot of information in here is covered in some of the other books. What I like about this is that it is written as an introduction to network management for people already working in the field. This is not an academic text.

Network Management: Accounting and Performance Strategies – Cisco Press book again. This one focuses strictly on performance management, focusing a lot on NetFlow and how it can be applied to accounting and performance in large networks.

Performance and Fault Management – Cisco Press book again. This is an older book, so the technologies discussed may not be as relevant as they once were. The nice thing though is that we’re talking about operational models and processes here, so the principles still apply.

VoIP Performance Management and Optimization – Last Cisco Press book. This book looks at the operational aspects of VoIP/IP Telephony/Unified Communications networks specifically. There are a lot of very detailed recommendations in here that can be leveraged to give customers guidance on what they should be doing and what they should be monitoring. This book has helped me a few times when working with customers who have chosen to implement a dual-vendor strategy and want to have HP Intelligent Management Center managing and monitoring their Cisco CallManager environment in addition to their network.

The Phoenix Project – This book is written as a novel to teach people about the DevOps movement. This is a MUST read for anyone interested in IT operations and the current trends in the industry. It will also help you get a first-hand account of what many customers go through. Read it. Read it. Read it.

The Visible Ops – From the same authors as The Phoenix Project. This book tries to tie DevOps and ITIL together. Interesting read. Many people see DevOps and ITIL as opposite ends of the spectrum. Most have had a bad ITIL experience and now the pendulum swings in the other direction. Finding a happy middle is a good goal. I’m not sure they’ve hit the mark, but it’s a start.

Network Management: Principles and Practice – Expensive book. Good information, but the technology is also quite dated. The concepts and knowledge are great. Good diagrams, but it’s sometimes hard to get through the hubs and token ring.

Domain Related Knowledge

Network Management is really about ensuring stability and helping the business to meet its operational requirements with the greatest efficiency possible. In that light, it’s important to understand what some of those operational burdens are. In recent years, businesses have had a ton of GRC (governance, risk, and compliance) requirements put on the operations teams that threaten to break an already overloaded team. On the bright side, I believe that the requirements imposed through legislation and governance like SOX, COSO, PCI-DSS, HIPAA, Gramm-Leach-Bliley, etc. have actually forced network operations teams to get much tighter on their controls, pushing us toward more stable and secure networks.

Note: This list is US-specific. If international readers can post some examples in the comments section, I would be happy to add them to the list of references.

In my experience, one of the issues with GRC requirements in general is that they are very rarely descriptive of what actually needs to be done. They have generic statements like “monitor network access” or “secure the IT assets”.

ISACA noticed this and put together the COBIT framework which is a very detailed list of over 30 high-level processes and over 200 specific IT control objectives. Most of the GRC requirements can be mapped to specific COBIT objectives. COBIT is a good thing to be familiar with. 

Next Steps

As we move forward in IT, operations and orchestration skills are starting to become some of the hottest requirements in IT. 

Whether it’s products like HP’s CloudSystem, or industry-wide projects like OpenStack, CloudStack or Eucalyptus, having solid operational knowledge and skills is going to be a requirement for anyone seeking the coveted Trusted IT Advisor role with any customer.

For anyone looking to gain or just brush up on their network management skills, I would recommend:

  • Watching the Solarwinds videos as a place to get started with the basics of network management
  • Becoming familiar with the basics of COBIT and GRC in general. Doing some reading on the various GRC requirements that apply to your specific regions and customers is also a great way to change the conversation from speeds and feeds to the challenges of the business.
  • Reading up on OpenStack
  • Learning about ITIL and DevOps

Social media is always a great way to stay current as well. One of the biggest challenges of operations is that the best way to learn it is to do it. Unfortunately, many of the really good network professionals, whether pre-sales or professional services, don’t get the opportunity, as they are usually hands-off or turning over the keys to an ops team after the project has been delivered. Social media helps to connect to the daily challenges of people who are living in the trenches.

Get some ITSM experience. If you don’t work in a company where you get to babysit the same environment, you can always do what I did and experiment with it at home.

Anyone else have any suggestions on how to get up to speed? Feel free to comment below!

@netmanchris

Providing Network Leadership

So I have to give credit where credit is due… a lot of this post is directly inspired by the book Network Maturity Model by William J. Bauman et al. It’s written in a very academic style, but there are a ton of little gems in there which I think are worth pointing out. I’m expanding a lot on some of these key points, so please feel free to drink from the source rather than the muddy water down river. 🙂

 

The first section of the actual maturity model deals with Enterprise Network Leadership. I think it’s important to say that when I’m using the word Enterprise, I’m not talking about a large organization. I’m just talking about the business. Whether you are responsible for a few switches and a router, firewall or UTM appliance, or you are responsible for a multinational organization with a global WAN, several large campus environments, and smaller branches spanning the globe, I think the same general guidelines apply.

 

Have a Plan

The network leaders are responsible for creating a network business plan that aligns with business strategy. Now keep in mind that there are a LOT of very talented people in the industry who are consultants. These hired guns are often jumping from engagement to engagement, so this might not apply to them. But for those who are in a network operations role, it’s critically important to understand:

  • What the business goals are
  • Who the LOB application stakeholders are
  • What their requirements are, and which applications are important to them
  • How the LOB stakeholders directly impact the profitability of the business

and, most importantly:

  • How the ability, or lack thereof, to successfully run the network can impact the business directly

The Network Leaders are responsible for creating both the vision/strategy and the specific policies and procedures to support the vision in the short, mid, and long term. From specific policies such as acceptable-use statements to longer-term procedures such as a planned equipment refresh on a well-defined rotational schedule to avoid a massive CAPEX hit, the network leader is responsible for making sure the network has the appropriate capacity, resiliency, availability, redundancy, etc. to meet the business requirements.

To create the vision/strategy from which the policies and procedures are derived, they should also be ensuring that the requirements of those stakeholders are taken into account when planning out the network and all the operational tasks around it. This is very broad and can be summed up as “understand the business requirements”.

 

Understanding the Business Requirements

This one gets thrown around a lot in our industry. But to be honest, I find that VERY few hardcore network professionals actually take the time to do this. It’s my opinion, obvious bias aside, that the network is one of the fundamental pillars of almost every business in the world now. I’m choosing not to use the word “foundation” because I don’t believe that’s true.

A foundation to me is something that the business is built upon. Imagine, if you will, a business that is responsible for making hand-made clothes. Or one that is responsible for growing food. I think it’s obvious that the network is not the MOST important thing. In both of these examples, I don’t think anyone would argue that the business would be incapable of creating its product without the network.

But imagine if the network is down and they are unable to receive orders from their customers? What if the network is down and they are unable to use their ERP system to ship orders? Or to send invoices?  

I think we can all agree that if the products sit on the shelf, it’s not a good thing. Money doesn’t come in. And soon, global economic catastrophe is created, cats sleeping with dogs, total chaos!!!

All because a network went down. 

(OK… maybe I’m exaggerating a little.)

 

So what kind of things should be taken into account when we say “understand the business requirements”? Here are some from the top of my list:

What governance, risk, or compliance initiatives does the company have to adhere to?

GRC? Huh? Depending on the specific industry, country, or region of the world that the company operates in, there may be many legally enforced burdens placed on the company. The major examples everyone seems to know are SOX, Gramm-Leach-Bliley, HIPAA, etc. These all have different, although often complementary, requirements that, depending on the nature of the business, you need to be aware of as a network leader.

If you are a network leader and you are having trouble getting budget approval for some much-needed networking upgrades, learn about which GRC requirements apply to your organization. It’s amazing how quickly the purse strings open when the business leaders understand that the failure to do these upgrades may have a direct impact on a GRC requirement that they can be personally held liable for.

What are the different Line of Business applications and how critical are they to the success or failure of the business?

Most companies have a LOT of applications they “need” to do their business. But there is a BIG difference between their Microsoft Lync implementation, which they use to increase collaboration between globally dispersed teams, and their ERP system, which is responsible for making sure that orders are received, shipping requests are sent to the warehouse, and invoices are sent to the customer.

If you are a network leader and you are having trouble getting budget for some much-needed networking upgrades, learn which of the LOB applications are directly related to the business’s ability to take orders, ship product, or invoice customers. When requesting budget for the upgrade, make sure you make it clear what the hourly cost to the business is for network downtime.

An easy way to calculate this, if you have access to the numbers, is to look at the annual report. Figure out what the revenue was last year, divide by 365, then divide by 8 (the hours in a typical business day), and you now have the hourly cost of downtime.
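As a quick worked example, the back-of-the-envelope calculation above fits in a few lines of Python. The revenue figure is made up for illustration, and treating the 8 as the hours in a business day is my reading of the formula rather than something stated explicitly; adjust both defaults to match how the business actually operates.

```python
def hourly_downtime_cost(annual_revenue, days_per_year=365, hours_per_day=8):
    """Rough revenue at risk per hour: annual revenue / 365 / 8."""
    return annual_revenue / days_per_year / hours_per_day


def outage_cost(annual_revenue, outage_hours):
    """Rough revenue at risk for an outage of the given length."""
    return hourly_downtime_cost(annual_revenue) * outage_hours


# Hypothetical company with $29.2M in annual revenue.
print(hourly_downtime_cost(29_200_000))   # 10000.0
print(outage_cost(29_200_000, 3))          # 30000.0
```

A number like "$10,000 per hour" is crude, but it is far easier to put in front of a budget holder than a list of switch model numbers.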

 

For me, these are two of the most important “understand the business” requirements, but I’m sure there are a ton of others. Please feel free to call out more examples in the comments! I’d love to hear them!

 

@netmanchris

 

 

 

Configuration Management – Configuration Baselines

Many times when I’m speaking with customers, one of the first questions I get asked is

“OK, I’ve got this NMS. What’s the first thing I should do that’s going to make the biggest difference in my network?”

There are probably a lot of opinions on the answer to this question. For me, the answer is always this:

Start with Configuration Management.

In ITILv3, one of the main aspects of the configuration management domain is to track all of the configuration items that relate to an IT service. For more on ITILv3 CIs check out this video.

For those of you who suffer from insomnia and would like a cure, most of the ITILv3 change management stuff is found in Volume III, Service Transition. In ITILv3, the first thing you need to do is to define your CMS.

Configuration Management System

This is the ITIL term for the software that handles your configs for you.

Again, remember that ITIL is about process. So it’s possible to actually run an ITIL-based shop without tools in place. It’s POSSIBLE… but I think this falls in the JBYCDMYS (Just because you can doesn’t mean you should) bucket.

What to look for in your CMS

So for NMS newbies who are trying to get into more process-driven network operations, your CMS is the software that does basic tasks like:

Backup of Configurations

Any NCCM solution should allow you to back up configurations. If you’re lucky, your NMS may have additional features that allow you to move beyond basic configuration backups. Ideally, your NMS will have features that will enable you to define configuration baselines and snapshots for any given device.

Configuration Baselines: A configuration baseline is the configuration of a service, product or infrastructure that has been formally reviewed and agreed on, that thereafter can be changed only through formal change procedures.

Configuration Snapshots: A snapshot of the current state of a configuration item or an environment. It also serves as a fixed historical record.

In plain English terms, a configuration baseline is the last point at which you know for certain that everything was working. A snapshot is an automatic backup that lets you know what the state of the device was at the time of that backup.
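One practical use of having both is checking a snapshot against the baseline to see what has drifted. Here is a minimal sketch using Python’s standard `difflib` module. The two configurations are invented for illustration, and `config_drift` is a hypothetical helper rather than a feature of any particular NMS.

```python
import difflib


def config_drift(baseline: str, snapshot: str) -> list[str]:
    """Return the lines removed from (-) or added to (+) the baseline."""
    diff = list(difflib.unified_diff(
        baseline.splitlines(),
        snapshot.splitlines(),
        fromfile="baseline",
        tofile="snapshot",
        lineterm="",
        n=0,  # no context lines, only the changes themselves
    ))
    # The first two lines are the "---"/"+++" file headers; "@@" lines mark hunks.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


# Invented example configurations.
BASELINE = "hostname sw1\nntp server 10.0.0.1\nsnmp-server community private RO"
SNAPSHOT = "hostname sw1\nntp server 10.0.0.2\nsnmp-server community private RO"

for change in config_drift(BASELINE, SNAPSHOT):
    print(change)
```

An empty result means the device still matches the baseline; anything else is either an unrecorded change or a baseline that needs updating.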

We’ll come back to this in a subsequent blog post, but snapshots are also great to have around for helping to address your compliance initiatives like SOX, PCI, or HIPAA. Having a configuration snapshot from a certain date is an easy way for you to prove to the auditors what the configuration state of a given device was on that date.

Configuration Templates: A complete device configuration, or a portion of one.

This could be your standard configuration for your access switches, a secure configuration for your routers, or even just a portion of a configuration, such as the config required to change the local admin password on all your switches.
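As a rough illustration of a partial template, here is a sketch using Python’s standard `string.Template`. The CLI lines, device names and address are placeholders I made up, not any vendor’s exact syntax, and `render_for_devices` is a hypothetical helper; a real NMS would have its own template format.

```python
from string import Template

# A partial configuration template; $hostname and $ntp_server are filled in per device.
PARTIAL_TEMPLATE = Template("hostname $hostname\nntp server $ntp_server\n")


def render_for_devices(template, devices, **shared):
    """Render the template once per device, merging in values shared by all devices."""
    return {name: template.substitute(hostname=name, **shared) for name in devices}


configs = render_for_devices(
    PARTIAL_TEMPLATE,
    ["access-sw1", "access-sw2"],
    ntp_server="10.0.0.1",
)
print(configs["access-sw1"])
```

Because `substitute` raises a `KeyError` when a placeholder is left unfilled, a half-rendered configuration never gets as far as a device.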

Scheduling Configuration Changes: The ability to schedule changes to your network devices at a specific time.

The ability to schedule changes is nice. Assuming your changes have gone through a peer-review process and through your company’s Change Advisory Board, why do you need to be up at 3am during your company’s change window?

Now there may be cases where you will still need to be onsite to verify that a critical change went through, and to perform the change validation tests that I KNOW you all had in your change plan. Right?

But for those cases where you are simply changing a local admin password, or adding an NTP server, or some other low-risk change, you may want to just schedule this for the wee hours of the morning while you are home in your toasty bed.

One last thing…

When making major or minor changes to your network configurations, it’s a good practice to go back and update your CMS to reflect the new Configuration Baseline for that device. You did actually run through a series of tests to make sure you didn’t break something, right?

So although this could be a TFTP server on the network somewhere, hopefully it’s software that will automate the backup of network device configurations for you. Examples could include HP’s Intelligent Management Center, Solarwinds Orion, Cisco Prime, or perhaps an open source tool like RANCID.

In this video, I’ll go through the basic CMS functions of HP’s IMC to show how baselining and snapshots can be applied.

Getting ITSM Experience

Many of the best network engineers I know have little to no network operation experience.

“What? How can that be?” you ask? Well it’s really quite simple.

Most of the best network engineers I know, and we’re talking some double and triple CCIEs in this crowd, have never actually operated a network for any length of time. They were professional services guys, short-term contract guys, some pre-sales or post-sales guys. There are a LOT of paths to the top of Mt. Fuji after all.

Although I did a short-term network ops gig early on in my career, I actually feel I squandered the opportunity as I just wasn’t mature enough to understand the experience that I should have been gaining.

So this question came up with a colleague last week: “How do you get network operations experience if you’re not in a network ops group?”

This blog post is dedicated to him.

A few years back, after I decided to really get serious about network management, I had the same issue. I wanted to get some experience in network management, but I had no large network to run. In my day job, I’m actually a pre-sales resource, so it’s not likely I’m going to get any experience in the near future. It occurred to me that I could start a simulation to try and gain that experience.

At this point, I had already done some Ciscoworks LMS projects (long sleeve shirts to cover the scars to prove it!). I had successfully passed my ITILv3 Foundations certification, and I had even gained the honor of being one of the first Solarwinds Certified Professionals shortly after the SCP program was launched.

The Project

So with a bit of knowledge, I decided to run my home network under an ITSM framework for a year. This meant that I had to implement good network management hygiene. Good change management practices. Good fault management practices. Try to implement some of the ITIL processes around Service Strategy, Service Design, Service Operations, and Continual Service Improvement. Basically, run it like a business whose success depended on the network.

The Tools

So I had some ideas around the processes I wanted to put in place, but it always takes the three P’s to successfully implement any ITSM initiative: people, products and processes. Fortunately for me, I had access to HP’s Intelligent Management Center, as well as the trial versions of Solarwinds Orion NPM and NCM, but I was still missing some critical pieces of the puzzle.

Service Operations: One of the primary activities in Service Operations is really around the help desk. How are tickets logged? How are they tracked? What are the escalation procedures? How do you build out and grow the KMS (knowledge management system)?

I didn’t have any help desk or ticketing software in place, so I decided to go the free route: Spiceworks.

For those of you who don’t know it, Spiceworks is a free IT management app which “includes a free IT management app for everything from network inventory and monitoring to help desk and more!”

It’s not what I would call a full FCAPS system, but it does have an ok help desk system, and it’s hard to beat free, right?

Note: I noticed last week that my Synology NAS now has a help desk app named osTicket in the available apps. I haven’t tried this, but considering it’s free and installs easily on the Synology box, it might be a good option for those of you who are lucky enough to have one of these great little machines.

Financial Management

Financial Management falls under the Service Strategy volume of the ITILv3 core books. I’ll be honest: this wasn’t exactly the strongest part of my little experiment, but I did try to implement some financial processes.

But unlike some of the helpdesk and change control procedures, this wasn’t exactly something that I could count on good self-discipline to track. Can you imagine that conversation?

“Hey Me… I’d really like this new Synology RS812.”

“Hmm… Don’t we already have a 411?”

“Yeah, but this one has TWO gigabit ports!”

“Let me think about that… ok. Let’s buy it!”

As you can see, I had to come up with a different plan.

Fortunately, I’m married, so I merely formalized the process of having to ask my wife for permission to buy any new toys. I have to say, this was probably the year that I got the fewest new tech toys, but I like to think the experience I gained was worth it. (<– What’s the HTML tag for the sarcasm font again?)

The Results

So how did things go? Well, it was a little funny at times. Emailing myself a support ticket so that I could fix something that wasn’t working. I did try to get my wife to e-mail the tickets in, but that lasted about a week before she just said “Can you just fix it!?!?!?!”

For the other things, it felt a little strange asking myself for permission so that I could make a change to the environment and then having to consult myself to see what the effects might be (Change Advisory Board). Implementing the RACI (responsible, accountable, consulted, informed) matrix was pretty easy because I generally get along with myself.

To be honest, I wish I had been blogging back then, because I think it would have made for some interesting reading in retrospect. I’d like to say that I followed all the processes and ran a bulletproof network for the year, but I didn’t. Sometimes I slipped, made a change and locked myself out of my own gear.

But on the bright side… I did learn why change management is important.

Anyone else gone through an experiment like this? Anyone willing to take up the challenge and blog on the experience?

Configuration Management – Software Management

So in the last post I introduced the concepts of the Configuration Management System, and the Configuration Item. Today, I’m going to introduce the concept of the Definitive Media Library.

The DML is really nothing more than a software library. Ideally, this should be tied directly into your element management system so that you can define the baseline software image, deploy the image out to the appropriate devices, and audit the network to ensure that all of the devices are inline with your golden software definitions.

As I laid out in the last post, standardization is there to make your lives easier. But it takes a lot of commitment, especially if your network has gone through significant “organic growth”. Making the choice to commit to good configuration management hygiene is sort of like committing to going to the gym or committing to eat healthier.

Just like going to the gym, the first thing you need to do is figure out your current software state. Hopefully, your NMS software will have the ability to discover and audit the software running on the devices in your network and report against a known good state.
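If your NMS can’t produce that report for you, the comparison itself is simple enough to sketch. This is only a rough illustration in Python: the inventory records, the platform names other than the HP 5500EI, and the version strings are all made-up placeholders, and `audit_software` is a name I invented for this example, not a feature of any particular product.

```python
# Hypothetical golden versions, keyed by hardware platform.
# The version strings are placeholders, not real release numbers.
GOLDEN_VERSIONS = {
    "HP 5500EI": "golden-1",
    "core-chassis": "golden-7",
}


def audit_software(inventory, golden_versions):
    """Return (device name, finding) pairs for devices that are off baseline."""
    findings = []
    for device in inventory:
        expected = golden_versions.get(device["platform"])
        if expected is None:
            # Nobody has picked a golden version for this platform yet.
            findings.append((device["name"], "no golden version defined"))
        elif device["version"] != expected:
            findings.append(
                (device["name"], f"running {device['version']}, expected {expected}")
            )
    return findings


# Made-up inventory, standing in for whatever your NMS or scripts discover.
inventory = [
    {"name": "access-01", "platform": "HP 5500EI", "version": "golden-1"},
    {"name": "access-02", "platform": "HP 5500EI", "version": "old-3"},
    {"name": "lab-fw-01", "platform": "lab-firewall", "version": "x-2"},
]

for name, finding in audit_software(inventory, GOLDEN_VERSIONS):
    print(f"{name}: {finding}")
```

Devices that match their golden version produce no finding, so an empty report is the goal.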

Audit the Current State of the Network

If you don’t have an NCCM tool in place with these features, you may end up writing scripts, or worst case, logging into your devices manually and noting the software version in an Excel spreadsheet. Once you have a handle on what’s out there, the next step is choosing what version of code you need to be running.
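If you do end up going the script route, here is a rough sketch of the “note it in a spreadsheet” step, assuming you have already collected a platform and version for each device somehow. The `discovered` records and version labels are invented for illustration, and how you actually pull the version off each device is deliberately left out.

```python
import csv
import io
from collections import Counter


def tally_versions(rows):
    """Count how many devices run each (platform, version) combination."""
    return Counter((row["platform"], row["version"]) for row in rows)


def to_csv(counts):
    """Render the tally as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["platform", "version", "device_count"])
    for (platform, version), count in sorted(counts.items()):
        writer.writerow([platform, version, count])
    return buffer.getvalue()


# Made-up discovery results; the version labels are placeholders.
discovered = [
    {"name": "access-01", "platform": "HP 5500EI", "version": "rel-a"},
    {"name": "access-02", "platform": "HP 5500EI", "version": "rel-b"},
    {"name": "access-03", "platform": "HP 5500EI", "version": "rel-a"},
]

print(to_csv(tally_versions(discovered)))
```

A tally like this also shows you which version most of your devices are already on, which is handy input for the next step.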

Choosing your Software Version

So now that you’ve figured out that your devices are all over the place, it’s time to figure out what version of software you actually want to be running. Whether you are running Comware, IOS, NXOS, Junos, FTOS, or some other OS that I haven’t mentioned, the guidelines are pretty much the same.

Wash, Rinse and Repeat.

What about the exceptions?

I was going to try to sugarcoat this, but I’ll just come out and say it. Cisco has licensing for many of their platforms, and this can create situations where you can’t actually get on a common code version without incurring additional CAPEX to buy the licenses and OPEX to deal with SMARTnet. Or you can get into a situation where the features you’re looking for are mutually exclusive in two different IOS images for your routers. Or you’re running Cisco CallManager, and your gateways require the voice image while your regular WAN routers require another image.

In any event, my recommendation is still the same. Find the fewest possible combinations of software for the hardware platforms in your network and stick to them unless there is a REALLY good reason to change.
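One way to keep the exceptions honest is to write them down next to the baseline, each with its reason. The sketch below is only an illustration: the platform and role names, the image labels, and the helper functions are all hypothetical, and a real NCCM tool will have its own way of modeling this.

```python
# Hypothetical baseline: one software image per hardware platform.
BASELINE = {"wan-router": "base-image-1"}

# Hypothetical exceptions, keyed by (platform, role). Each one carries the
# REALLY good reason that justifies it.
EXCEPTIONS = {
    ("wan-router", "voice-gateway"): {
        "version": "voice-image-1",
        "reason": "gateways need the voice feature set",
    },
}


def target_version(platform, role, baseline=BASELINE, exceptions=EXCEPTIONS):
    """Return the image a device should run, honoring documented exceptions."""
    override = exceptions.get((platform, role))
    if override is not None:
        return override["version"]
    return baseline[platform]


def combination_count(baseline=BASELINE, exceptions=EXCEPTIONS):
    """Count the distinct (platform, image) combinations you have to track."""
    combos = set(baseline.items())
    combos |= {(platform, e["version"]) for (platform, _role), e in exceptions.items()}
    return len(combos)


print(target_version("wan-router", "voice-gateway"))
print(target_version("wan-router", "branch"))
print(combination_count())
```

The number coming out of `combination_count` is the one to keep as small as you can.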

Check out this video of the basic NCCM features in HP’s Intelligent Management Center to help you navigate through your software baseline woes.

Anything I missed here? Feel free to post in the comments below.

Intro to Configuration Management

So in a previous post, I made the recommendation to go find an ITSM framework. For the rest of this series, I’ll be referring to the ITILv3 ITSM framework a lot. The two books that, IMHO, apply the most to Network Operations are the Service Transition and the Service Operation books.

For the next few posts, I’m going to focus on the Service Transition volume, and specifically on the Configuration Management sections.

So in ITILv3, one of the MOST important things to understand is the concept of a Configuration Item.

What’s a CI?

The way I explain this to customers is it’s the smallest managed thing, or set of things, in the environment.

How does that apply to my network?

Well, hopefully, I’m going to try and explain that now.

The first CI in a network might be the hardware devices that are in the network. These are your switches, routers, firewalls, load balancers, servers, etc…

So most people are good with the idea of standardization. It makes sense that it’s easier to manage fewer kinds of devices. This is recommendation #1.

1) Standardize on as few hardware platforms as possible.

The good thing is that this is fairly easy to achieve. In fact a lot of people do this instinctively. They standardize on the same two chassis switches in their core, they use the same model in their distribution, and they use the same model for the access layer.

Here’s where things get crazy though.

Many of the same customers who try to standardize on a current device often have no processes in place to ensure that they are all running the same version of code.

So think back to the ITIL Configuration Item. If you have five HP 5500EI switches, and five different OS versions on them, you now have five different CIs to track. Make sense?

Five different versions of commands.

Five different versions of bugs.

Five times the headache.

If a configuration item is the smallest manageable object, then each of the different combinations of hardware and software counts as a single CI. BUT… if we standardize on one version of code for that hardware platform, we get one configuration item.

So the first thing that I recommend customers do is…

GET ON ONE VERSION OF CODE!!!!!!

This is commonly called a golden software version. One version of commands, one version of bugs. One CI.
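To put numbers on it, here is a tiny Python sketch of the counting, assuming a CI is simply a distinct combination of hardware platform and software version. The version labels are placeholders and `count_cis` is just a name I made up for the example.

```python
def count_cis(devices):
    """Count distinct (hardware platform, software version) combinations."""
    return len({(d["platform"], d["version"]) for d in devices})


# Five HP 5500EI switches, each on a different (placeholder) version.
mixed = [
    {"name": f"sw-{i}", "platform": "HP 5500EI", "version": f"version-{i}"}
    for i in range(1, 6)
]

# The same five switches after moving to one golden version.
standardized = [dict(device, version="golden") for device in mixed]

print(count_cis(mixed))
print(count_cis(standardized))
```

Same five boxes, but the second list collapses into a single combination to track.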

On the flip side, one of the other common mistakes I see made by customers who have taken the first step of getting on a single version is upgrading without a reason.

My recommendation here is to do your homework. When a new version of code is released, read the release notes, check the bug fixes, check the new features. If there’s nothing in there that is addressing an issue you’re having, or new functionality that you NEED to have,

WHY WOULD YOU CHANGE?
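That homework boils down to a simple check, roughly sketched here in Python. The bug IDs, feature names, and the `should_upgrade` helper are all hypothetical, and in real life the inputs come from you actually reading the release notes.

```python
def should_upgrade(release, open_issues, required_features):
    """Recommend an upgrade only if the release fixes a bug you are hitting
    or adds a feature you actually need."""
    relevant_fixes = sorted(set(release["fixed_bugs"]) & set(open_issues))
    relevant_features = sorted(set(release["new_features"]) & set(required_features))
    return bool(relevant_fixes or relevant_features), relevant_fixes, relevant_features


# Hypothetical release notes, boiled down to IDs and feature names.
release = {
    "fixed_bugs": ["BUG-101", "BUG-102"],
    "new_features": ["feature-x"],
}

# Nothing here touches this network: stay on the golden version.
print(should_upgrade(release, open_issues=["BUG-900"], required_features=[]))

# One of the fixes addresses an issue we are actually having.
print(should_upgrade(release, open_issues=["BUG-102"], required_features=[]))
```

If the first value comes back false, you have your answer to the question above.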

It may seem strange, but when you get a new switch out of the box, you may want to just plug that into your network and downgrade it to the older software. More thoughts on this in this blog post.

Any decent NMS should be able to define, report on, and deploy the correct version of code to the hardware devices.

Funny enough, after writing this, I found another great blog post by Terry Slattery, this time over at http://www.nojitter.com.

What about you guys? What configuration tools are you using? HP IMC? Orion NCM? Rancid? Prime? A TFTP Server on a wandering laptop?