Network Management – How to get started

Network Management Skills

In the last few years, I’ve noticed that I’m a little different. It’s not just because I wear coloured socks or because my hair looks like I style it after Albert Einstein. I noticed that I’ve developed a different skill set than the majority of my pre-sales or post-sales network professional peers. What skills, you ask? Network management and operations.

Why I chose to develop Network Management skills

About five years ago, I took a look at the market and thought, “This stuff is complicated.” Earth-shattering observation, right? It sounds simple, but then I started looking at some of the tools we had at the time and I realized that NMS tools could really help to automate not only the information gathering, but also the configuration tasks in our networks. At the time, we had a cool little tool called 3Com Network Director. It ran on a single PC, had no web interface, and really only managed 3Com gear. But it was better than running CLI commands all day long. And the monitoring aspects really helped my customers identify and resolve problems quickly. This was a moment of inspiration for me. I chose to develop skills in network management and operations.

Let me say that again.

I chose to develop skills in network management and operations.

I didn’t choose to develop skills in 3ND, or IMC, or Solarwinds, Cisco Prime, or any of the various other tools. Over time, I’ve gained experience on all of those products, but I would say my true value is having gone through the process to develop skills in the sub-disciplines of network management. Learning a product is only a very small part of the whole domain.

What does that mean? 

It’s easy to learn a product. They have bells and whistles. Click this check box. Fill in this box. Etc. Those skills are important. But they don’t help us understand how to apply the product to resolve our customers’ business challenges. They don’t help us understand when not to click that box. And they don’t help us to design a network management strategy, or to consult with our customers on operational efficiencies and what can be done to help increase their network’s stability, to reduce MTTR, or to mitigate pressures put on the operations team. Learning the domain knowledge has helped me to understand WHY we have developed the product features and what they are to be used for.

My Learning Roadmap

To put it simply, I consumed everything I could on the subject. It’s amazing how much free information is out there if you set your mind on finding it. If anyone’s looking to increase their skills in this area, I’ve put together the following list of resources that have really helped me in this domain. I’ve tried to keep this away from vendor-specific products, but I’m sure you’ll find that any product you choose will probably have training and learning resources around it as well. This list is in NO way exhaustive; there are a lot of resources out there. I highly encourage everyone to read, watch, and listen to as many of them as you can and to think about them critically.

Free Resources

Solarwinds SCP training – The Solarwinds SCP training is online and free. What I really liked about this training is that it’s focused on network management, network management protocols, and the operational aspects of network management. There are, of course, some product-specific aspects to the training, but overall this is a really good primer on network management. Oh… did I mention there’s a bunch of videos as well? Great stuff to rip and put on your tablet when you’re stuck on a plane and you’ve seen all the movies.

Solarwinds has also provided a bunch of whitepapers going further in depth on network management specific subjects which are a great reference.  If you’re interested there’s also the Solarwinds Certified Professional certification if you’re looking for a way to validate your knowledge.

The Information Technology Infrastructure Library (ITIL) is a compilation of IT service management practices gathered over the last 30 years. There’s a lot of great stuff in here. The books are expensive though. There is an entire industry that’s sprung up around ITSM. If you have some commute time to spare, I would highly suggest typing “ITSM” into your favourite podcast app and sitting back to listen.

If you’re interested, there’s also the ITILv3 Foundations certification if you’re looking for a way to validate your knowledge.

Blogs and Podcasts

Social media is a great way to learn how people apply ITIL concepts to the real world. I particularly like http://www.itskeptic.org as it’s got a great following of smart people who disagree on a regular basis. You never know whether the customer you’re going into is operating in a traditional ITIL-based ops model, is using the Microsoft Operations Framework, or has moved on to Agile and DevOps. It’s good to have at least a cursory knowledge of all of these approaches to IT operations, not to mention traditional network management frameworks like FCAPS and eTOM.

Paid resources

Books are a great way to learn about network management and operations. Here’s my  abbreviated reading list. These are the reference books that sit on my shelf within easy reach. 

Network Maturity Model – This book is actually an academic thesis focused on trying to extend the CMMI models to network-specific capability maturity models. Of course, network operations is part of the capabilities of an organization, so there’s a lot of relevant material in here. The book is definitely academic, but it’s got a LOT of great content in it, assuming you can get through all of the required footnotes and pointers to other academic works.

Fundamentals of EMS, NMS, and OSS/BSS – This book is wonderful. It covers all aspects of traditional telecom management from FCAPS to eTOM, as well as looking at OSS/BSS architectures which usually exist only in Service Provider networks. Great information in here. My biggest problem with this book is the font size. I have glasses and it’s tiny. Worth the effort to make it through, but plan on multiple reading sessions. This is not a book you’re going to get through in one sitting.

Network Management Fundamentals – Cisco Press book that’s a great read. A lot of information in here is covered in some of the other books. What I like about this one is that it is written as an introduction to network management for people already working in the field. This is not an academic text.

Network Management: Accounting and Performance Strategies – Cisco Press book again. This one focuses strictly on performance management, with a lot of attention on NetFlow and how it can be applied to accounting and performance in large networks.

Performance and Fault Management – Cisco Press Book again. This is an older book, so the technologies discussed may not be as relevant as they once were. The nice thing though is that we’re talking about operational models and processes here, so the principles still apply. 

VoIP Performance Management and Optimization – Last Cisco Press book. This book looks at the operational aspects of VoIP/IP Telephony/Unified Communications networks specifically. There are a lot of very detailed recommendations in here that can be leveraged to give customers guidance on what they should be doing and what they should be monitoring. This book has helped me a few times when working with customers who have chosen to implement a dual-vendor strategy and want to have HP Intelligent Management Center managing and monitoring their Cisco CallManager environment in addition to their network.

The Phoenix Project – This book is written as a novel to teach people about the DevOps movement. This is a MUST read for anyone interested in IT operations and the current trends in the industry. It will also give you a first-hand account of what many customers go through. Read it. Read it. Read it.

The Visible Ops – From the same authors as The Phoenix Project. This book tries to tie DevOps and ITIL together. Interesting read. Many people see DevOps and ITIL as opposite ends of the spectrum. Most have had a bad ITIL experience and now the pendulum swings in the other direction. Finding a happy middle is a good goal. I’m not sure they’ve hit the mark, but it’s a start.

Network Management: Principles and Practice – Expensive book. Good information, but the technology is also quite dated. The concepts and knowledge are great. Good diagrams, but it’s sometimes hard to get through the hubs and token ring.

Domain Related Knowledge

Network management is really about ensuring stability and helping the business to meet its operational requirements with the greatest efficiency possible. In that light, it’s important to understand what some of those operational burdens are. In recent years, businesses have had a ton of GRC (governance, risk, and compliance) requirements put on their operations teams that threaten to break an already overloaded team. On the bright side, I believe that although operations teams have been forced into these requirements through legislation and governance like SOX, COSO, PCI-DSS, HIPAA, Gramm-Leach-Bliley, etc., the requirements have actually forced network operations teams to get much tighter on their controls, pushing us toward more stable and secure networks.

Note: This list is US specific. If international readers can post some examples in the comments section, I would be happy to add them to the list of references.

In my experience, one of the issues with GRC requirements in general is that they very rarely describe what actually needs to be done. They have generic statements like “monitor network access” or “secure the IT assets”.

ISACA noticed this and put together the COBIT framework which is a very detailed list of over 30 high-level processes and over 200 specific IT control objectives. Most of the GRC requirements can be mapped to specific COBIT objectives. COBIT is a good thing to be familiar with. 

Next Steps

As we move forward in IT, operations and orchestration skills are starting to become some of the hottest requirements in IT. 

Whether it’s products like HP’s Cloudsystem, or industry wide projects like OpenStack, CloudStack or Eucalyptus, having solid Operational knowledge and skills is going to be a requirement for anyone seeking the coveted Trusted IT Advisor role in any customer. 

For anyone looking to gain or just brush up on their network management skills, I would recommend the following:

  • Watch the Solarwinds videos as a place to get started with the basics of network management.
  • Become familiar with the basics of COBIT and GRC in general. Doing some reading on the various GRC requirements that apply to your specific regions and customers is also a great way to change the conversation from speeds and feeds to the challenges of the business.
  • Read up on OpenStack.
  • Learn about ITIL and DevOps.

Social media is always a great way to stay current as well. One of the biggest challenges of operations is that the best way to learn it is to do it. Unfortunately, many of the really good network professionals, whether pre-sales or professional services, don’t get the opportunity, as they are usually hands-off or turn over the keys to an ops team after the project has been delivered. Social media helps you connect to the daily challenges of people who are living in the trenches.

Get some ITSM experience. If you don’t work in a company where you get to babysit the same environment, you can always do what I did and experiment with it at home.

Anyone else have any suggestions on how to get up to speed? Feel free to comment below!

@netmanchris


Providing Network Leadership

So I have to give credit where credit is due… a lot of this post is directly inspired by the book Network Maturity Model by William J. Bauman et al. It’s written in a very academic style, but there are a ton of little gems in there which I think are worth pointing out. I’m expanding a lot on some of these key points, so please feel free to drink from the source rather than the muddy water down river. 🙂

 

The first section of the actual maturity model deals with Enterprise Network Leadership. I think it’s important to say that when I’m using the word Enterprise, I’m not talking about a large organization. I’m just talking about the business. Whether you are responsible for a few switches and a router, firewall or UTM appliance, or for a multinational organization with a global WAN, several large campus environments, and smaller branches spanning the globe, I think the same general guidelines apply.

 

Have a Plan

The network leaders are responsible for creating a network business plan that aligns with business strategy. Now keep in mind that there are a LOT of very talented people in the industry who are consultants. These hired guns are often jumping from engagement to engagement, so this might not apply to them. But for those who are in a network operations role, it’s critically important to understand:

  • What the business goals are
  • Who the LOB application stakeholders are
  • What their requirements are, and which applications are important to them
  • How the LOB stakeholders directly impact the profitability of the business

and, most importantly:

  • How the ability, or lack thereof, to successfully run the network can impact the business directly

The network leaders are responsible for creating both the vision/strategy and the specific policies and procedures to support the vision in the short, mid, and long term. From specific policies such as acceptable-use statements to longer-term procedures such as a planned equipment refresh on a well-defined rotational schedule to avoid a massive CAPEX hit, the network leader is responsible for making sure the network has the appropriate capacity, resiliency, availability, redundancy, etc. to meet the business requirements.

To create the vision/strategy from which the policies and procedures are derived, they should also be ensuring that the requirements of those stakeholders are taken into account when planning out the network and all the operational tasks around it. This is very broad and can be summed up as “understand the business requirements”.

 

Understanding the Business Requirements

This one gets thrown around a lot in our industry. But to be honest, I find that VERY few hardcore network professionals actually take the time to do this. It’s my opinion, obvious bias aside, that the network is one of the fundamental pillars of almost every business in the world now. I’m choosing not to use the word “foundation” because I don’t believe that’s true.

A foundation to me is something that a business is built upon. Imagine, if you will, a business that makes hand-made clothes, or one that grows food. I think it’s obvious that the network is not the MOST important thing. In both of these examples, I don’t think anyone would argue that the business would be incapable of creating its product without the network.

But imagine if the network is down and they are unable to receive orders from their customers? What if the network is down and they are unable to use their ERP system to ship orders? Or to send invoices?  

I think we can all agree that if the products sit on the shelf, it’s not a good thing. Money doesn’t come in. And soon, global economic catastrophe is created, cats sleeping with dogs, total chaos!!!

All because a network went down. 

(OK… maybe I’m exaggerating a little. )

 

So what kind of things should be taken into account when we say “understand the business requirements”? Here are some from the top of my list:

What governance, risk, or compliance initiatives does the company have to adhere to?

GRC? Huh? Depending on the specific industry, country, or region of the world that the company operates in, there may be many legally enforced burdens placed on the company. The major examples everyone seems to know are SOX, Gramm-Leach-Bliley, HIPAA, etc. These all have different, although often complementary, requirements that, depending on the nature of the business, you need to be aware of as a network leader.

If you are a network leader and you are having trouble getting budget approval for some much-needed networking upgrades, learn about which GRC requirements apply to your organization. It’s amazing how quickly the purse strings open when the business leaders understand that the failure to do these upgrades may have a direct impact on a GRC requirement that they can be personally held liable for.

What are the different Line of Business applications and how critical are they to the success or failure of the business?

Most companies have a LOT of applications they “need” to do their business. But there is a BIG difference between their Microsoft Lync implementation which they use to increase collaboration between globally dispersed teams, and their ERP system which is responsible for making sure that orders are received, shipping requests are sent to the warehouse, and invoices are sent to the customer. 

If you are a network leader and you are having trouble getting budget for some much-needed networking upgrades, learn which of the LOB applications are directly related to the business’s ability to take orders, ship product, or invoice customers. When requesting budget for the upgrade, make sure you make it clear what the hourly business cost of network downtime can be.

An easy way to calculate this, if you have access to the numbers, is to look at the annual report. Figure out what the revenue was last year, divide by 365, divide by 8 (assuming an eight-hour business day), and you now have a rough hourly cost of downtime.
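As a quick sanity check, here is that back-of-the-envelope calculation as a minimal Python sketch. The function name, the default of 365 days and 8 hours, and the revenue figure are all illustrative assumptions; plug in whatever your own annual report and business hours say.

```python
def hourly_downtime_cost(annual_revenue, days_per_year=365, hours_per_day=8):
    """Rough revenue lost per hour of network downtime.

    Assumes revenue is earned evenly across every day of the year and
    every business hour of the day, which is a simplification.
    """
    if days_per_year <= 0 or hours_per_day <= 0:
        raise ValueError("days_per_year and hours_per_day must be positive")
    return annual_revenue / days_per_year / hours_per_day


# Hypothetical company with 2,920,000 in annual revenue.
cost = hourly_downtime_cost(2_920_000)
print(f"Estimated cost of one hour of downtime: {cost:,.2f}")
```

It is a crude number, but it is usually enough to turn a conversation about switches into a conversation about money.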

 

For me, these are two of the most important “understand the business” requirements, but I’m sure there are a ton of other ones. Please feel free to call out more examples in the comments! I’d love to hear them!

 

@netmanchris

 

 

 

Getting ITSM Experience

Many of the best network engineers I know have little to no network operation experience.

“What? How can that be?” you ask? Well it’s really quite simple.

Most of the best network engineers I know, and we’re talking some double and triple CCIEs in this crowd, have never actually operated a network for any length of time. They were professional services guys, short-term contract guys, some pre-sales or post-sales guys. There are a LOT of paths to the top of Mt. Fuji after all.

Although I did a short-term network ops gig early on in my career, I actually feel I squandered the opportunity as I just wasn’t mature enough to understand the experience that I should have been gaining.

So this question came up with a colleague last week: “How do you get network operations experience if you’re not in a network ops group?”

This blog post is dedicated to him.

A few years back, after I decided to really get serious about network management, I had the same issue. I wanted to get some experience in network management, but I had no large network to run. In my day job, I’m actually a pre-sales resource, so it’s not likely I’m going to get any experience in the near future, so it occurred to me that I could start a simulation to try and gain that experience.

At this point, I had already done some Ciscoworks LMS projects (long sleeve shirts to cover the scars to prove it!). I had successfully passed my ITILv3 foundations certifications, and I had even gained the honor of being one of the first Solarwinds Certified Professionals shortly after the SCP program was launched.

The Project

So with a bit of knowledge, I decided to run my home network under an ITSM framework for a year. This meant that I had to implement good network management hygiene. Good change management practices. Good fault management practices. Try to implement some of the ITIL processes around Service Strategy, Service Design, Service Operations, and Continual Service Improvement. Basically, run it like a business whose success depended on the network.

The Tools

So I had some ideas around the processes I wanted to put in place, but it always takes the three P’s to successfully implement any ITSM initiative: people, products, and processes. Fortunately for me, I had access to HP’s Intelligent Management Center, as well as the trial versions of Solarwinds Orion NPM and NCM, but I was still missing some critical pieces of the puzzle.

Service Operations: One of the primary activities in Service Operations is really around the help desk. How are tickets logged? How are they tracked? What are the escalation procedures? How do you build out and grow the KMS (knowledge management system)?

I didn’t have any help desk or ticketing software in place, so I decided to go the free route: Spiceworks.

For those of you who don’t know it, Spiceworks is a free IT management app which, in its own words, “includes a free IT management app for everything from network inventory and monitoring to help desk and more!”

It’s not what I would call a full FCAPS system, but it does have an ok help desk system, and it’s hard to beat free, right?

Note: I noticed last week that my Synology NAS now has a help desk app named osTicket in the available apps. I haven’t tried this, but considering it’s free and installs easily on the Synology box, it might be a good option for those of you who are lucky enough to have one of these great little machines.

Financial Management

Financial Management falls under the Service Strategy volume of the ITILv3 core books. I’ll be honest: this wasn’t exactly the strongest part of my little experiment, but I did try to implement some financial processes.

But unlike some of the helpdesk and change control procedures, this wasn’t exactly something that I could count on good self-discipline to track. Can you imagine that conversation?

“Hey Me… I’d really like this new Synology RS812.”

“Hmm… Don’t we already have a 411?”

“Yeah, but this one has TWO gigabit ports!”

“Let me think about that… ok. Let’s buy it!”

As you can see, I had to come up with a different plan.

Fortunately, I’m married, so I merely formalized the process of having to ask my wife for permission to buy any new toys. I have to say, this was probably the year that I got the fewest new tech toys, but I like to think the experience I gained was worth it. ( <– What’s the HTML tag for the sarcasm font again? )

The Results

So how did things go? Well, it was a little funny at times. Emailing myself a support ticket so that I could fix something that wasn’t working. I did try to get my wife to e-mail the tickets in, but that lasted about a week before she just said ” Can you just fix it!?!?!?!”

For the other things, it felt a little strange asking myself for permission so that I could make a change to the environment and then having to consult myself to see what the effects might be (Change Advisory Board). Implementing the RACI (responsible, accountable, consulted, informed) matrix was pretty easy because I generally get along with myself.

To be honest, I wish I had been blogging back then, because I think it would have made for some interesting reading in retrospect. I’d like to say that I followed all the processes and ran a bulletproof network for the year, but I didn’t. Sometimes I slipped, made a change and locked myself out of my own gear.

But on the bright side…  I did learn why change management is important.

Anyone else gone through an experiment like this? Anyone willing to take up the challenge and blog on the experience?

FCAPS – A Quick Introduction

It occurs to me that I’ve been writing the last few posts about network management tasks based on an ITSM model, and I didn’t even introduce what is arguably the more useful model for breaking down and understanding network management tasks: the FCAPS model.

FCAPS has its roots in the ISO, similar to another model we all know and love: the OSI model. Everyone remember that one? Please Do Not Take Sales People’s Advice? You may have learned another mnemonic for it, but this is probably the most basic conceptual model that every networking person uses to understand the world we live in.

For those of you who are looking for some extra credit reading, or need a cure for insomnia, you can find the actual FCAPS standards in the ITU-T M.3400 recommendations. For the rest, I’m hoping to give a brief overview to help you understand the different aspects of the disciplines of network management.

F is for Fault

This involves the detection, isolation, and correction of a fault condition. Or in plain English, this lets you know when things are broken.

Fault Management could involve things like syslog messages and SNMP traps being escalated to alarms; root-cause analysis and alarm suppression, or some AI which tries to separate the signal from the noise during event storms; and alarm notification policies (sending out an e-mail once you get an alarm).
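To make the alarm suppression idea a little more concrete, here is a minimal Python sketch of one very simple approach: raise an alarm for the first event of a given kind from a device, then swallow repeats of the same event for a quiet period. The event tuples, the 60-second window, and the function name are all assumptions for illustration; real NMS products use much richer correlation than this.

```python
def suppress_duplicates(events, window_seconds=60):
    """Turn a noisy event stream into a shorter list of alarms.

    events: iterable of (timestamp_seconds, device, message) tuples,
            assumed to be sorted by timestamp.
    An event becomes an alarm only if the same (device, message) pair
    has not raised an alarm within the last window_seconds.
    """
    last_alarm_time = {}
    alarms = []
    for timestamp, device, message in events:
        key = (device, message)
        previous = last_alarm_time.get(key)
        if previous is None or timestamp - previous >= window_seconds:
            alarms.append((timestamp, device, message))
            last_alarm_time[key] = timestamp
    return alarms


# Hypothetical event storm: sw1 flaps and repeats itself, sw2 reports once.
storm = [
    (0, "sw1", "linkDown"),
    (10, "sw1", "linkDown"),
    (20, "sw2", "linkDown"),
    (70, "sw1", "linkDown"),
]
for alarm in suppress_duplicates(storm):
    print(alarm)
```

The point is not the algorithm; it is that an operator gets three alarms to look at instead of every raw event.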

Traditionally this was implemented in a lot of NMSs as green-is-good management. Basically, if everything is green, things are OK. If they are yellow or red, you’ve probably got a long night ahead of you.

In recent years, Fault Management has started to include application performance management as well. In modern networks, it’s not enough to know that an application is “up”. Now we must also make sure that the level of service, or SLA, that is being delivered to the end-user is adequate to meet their needs.

Note: Whether an activity falls into one category of FCAPS or another might depend on your perspective. If you are measuring bandwidth on a particular port, you may be in the “P”, but if you are measuring the bandwidth and raising an alarm if you cross a certain threshold, you’re now in the “F”.

This may seem confusing at first, but remember that FCAPS is just a conceptual model. This is similar to the 7 Layer OSI model. Ask any good network person what layer MPLS falls at and they will either answer “It depends” or potentially “2.5”.
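The bandwidth example from the note above can be sketched in a few lines of Python. Measuring utilization from two interface octet counter samples is the “P”; comparing that number to a threshold and raising an alarm is the “F”. The function names, the 80% threshold, and the sample values are assumptions for illustration, and the sketch ignores counter wrap for simplicity.

```python
def utilization_percent(octets_start, octets_end, interval_seconds, link_bps):
    """Performance: average link utilization between two counter samples."""
    bits_transferred = (octets_end - octets_start) * 8
    return 100.0 * bits_transferred / (interval_seconds * link_bps)


def check_threshold(utilization, threshold=80.0):
    """Fault: return an alarm string if utilization crosses the threshold."""
    if utilization >= threshold:
        return f"ALARM: utilization {utilization:.1f}% >= {threshold:.1f}%"
    return None


# Hypothetical 1 Mbps link polled 300 seconds apart.
usage = utilization_percent(0, 33_750_000, 300, 1_000_000)
print(f"Utilization: {usage:.1f}%")   # the measurement lives in the "P"
print(check_threshold(usage))          # the alarm lives in the "F"
```

Same data, two different FCAPS letters, depending on what you do with it.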

C is for Configuration

This involves the configuration of the software and hardware in the network. This includes the versions of software, the actual configurations, change management, etc…

This is probably the easiest to understand. If you’re upgrading code on a switch or router, if you’re logging into a router to make a configuration change, or if you’re just plugging a network cable in to a PC, you’re in the “C”s.

A is for Accounting

This involves the identification of cost to the service provider and payment due from the customer, i.e., billing.

Personally, I find this definition a little restrictive and prefer to apply the definition that I heard in a presentation. I wish I could remember the name of the gentleman to give him credit. He started out in a thick southern drawl:

“The thing to remember about a’counting, is that the rest of the world just calls it counting.”

I know. Barely funny, right?

But it does allow us to use this to include things like

  • netflow for counting the different protocols running across a certain WAN link.
  • SNMP polling of T1/PRI interfaces for ensuring that your Erlang calculations are accurate and you don’t need to raise or lower the number of trunks on your voice gateways.
  • RADIUS to track how long a user was logged into a specific port on the network or how much bandwidth he actually used.
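For the voice trunk bullet above, the Erlang calculation in question is usually Erlang B, which gives the probability that a call is blocked for a given offered traffic and number of trunks. Here is a minimal Python sketch using the standard iterative form of the formula. The function names and the sample numbers are assumptions for illustration; the offered traffic in erlangs is assumed to come from your own busy-hour measurements.

```python
def erlang_b(traffic_erlangs, trunks):
    """Blocking probability for offered traffic on a number of trunks.

    Uses the recurrence B(0) = 1, B(n) = A*B(n-1) / (n + A*B(n-1)).
    """
    blocking = 1.0
    for n in range(1, trunks + 1):
        blocking = (traffic_erlangs * blocking) / (n + traffic_erlangs * blocking)
    return blocking


def trunks_needed(traffic_erlangs, target_blocking=0.01):
    """Smallest number of trunks that keeps blocking at or below the target."""
    trunks = 0
    blocking = 1.0
    while blocking > target_blocking:
        trunks += 1
        blocking = (traffic_erlangs * blocking) / (trunks + traffic_erlangs * blocking)
    return trunks


# Hypothetical gateway carrying 2 erlangs of busy-hour traffic on 2 trunks.
print(f"Blocking with 2 trunks: {erlang_b(2.0, 2):.2f}")
print(f"Trunks needed for 1% blocking: {trunks_needed(2.0, 0.01)}")
```

If the measured traffic from polling drifts away from what you planned for, rerun the numbers before you buy or retire trunks.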

You get the picture. Basically, accounting is just counting things which might be interesting to you.
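In code, “just counting” really is that simple. Here is a minimal Python sketch that totals bytes per protocol across a set of flow records. The record layout (dicts with "protocol" and "bytes" keys) is a made-up stand-in, not a real NetFlow export format; a real collector would hand you something richer.

```python
from collections import Counter


def bytes_by_protocol(flows):
    """Total bytes seen per protocol across a list of flow records."""
    totals = Counter()
    for flow in flows:
        totals[flow["protocol"]] += flow["bytes"]
    return totals


# Hypothetical flow records from a WAN link.
sample_flows = [
    {"protocol": "http", "bytes": 100},
    {"protocol": "dns", "bytes": 20},
    {"protocol": "http", "bytes": 50},
]
for protocol, total in bytes_by_protocol(sample_flows).most_common():
    print(protocol, total)
```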

Although this is not the strict definition from the ITU-T M.3400, this amended version makes it easier for me to apply because I don’t have very many customers (read: any) who actually do charge-backs for their services.

Obviously, in an XaaS world, this domain is probably going to get a lot of attention in the coming years.

P is for Performance

This involves evaluating and reporting on the effectiveness of the network, and individual network devices.

Way back when I did my CCNA, one of the things I remember reading about was how you should be checking your routers and switches often to see if their CPU or memory was running high. I’ve never actually met anyone who logged into a device to check on a daily basis, but the advice was actually really good.

With a good NMS, you can

  • use SNMP polling for the CPU and Memory to track their trending over time.
  • use ICMP to track availability of the devices ( assuming it responds!)
  • use ICMP to track the latency of the device to test the quality of the link.
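As a rough illustration of the ICMP bullets above, here is a minimal Python sketch that turns a batch of probe results into an availability percentage and an average latency. It does not send any pings itself; the input list of round-trip times (with None for a lost probe) is assumed to come from whatever poller you already have.

```python
from statistics import mean


def summarize_probes(rtts_ms):
    """Return (availability_percent, average_latency_ms) for a probe batch.

    rtts_ms: list of round-trip times in milliseconds, with None for
             every probe that got no response.
    """
    if not rtts_ms:
        return 0.0, None
    answered = [rtt for rtt in rtts_ms if rtt is not None]
    availability = 100.0 * len(answered) / len(rtts_ms)
    average_latency = mean(answered) if answered else None
    return availability, average_latency


# Hypothetical results from four probes, one of which was lost.
availability, latency = summarize_probes([10.0, 20.0, None, 30.0])
print(f"Availability: {availability:.1f}%  Average latency: {latency:.1f} ms")
```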

As I mentioned in the Fault section, performance often blurs with fault in that good performance management habits can alert you to  faults in the network. In fact, good performance management can even allow you to proactively avoid faults by identifying a potential performance block in the network, and addressing the issue before it turns into a fault.
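One simple way to get proactive is to fit a straight line to a polled metric and estimate when it will cross a threshold. The Python sketch below does that with an ordinary least-squares fit. The function names, the daily CPU samples, and the 80% threshold are assumptions for illustration, and real traffic is rarely this linear, so treat the answer as a hint rather than a forecast.

```python
def linear_trend(samples):
    """Least-squares slope and intercept for a list of (x, y) samples."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    spread_x = sum((x - mean_x) ** 2 for x, _ in samples)
    if spread_x == 0:
        raise ValueError("need samples taken at two or more distinct times")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / spread_x
    return slope, mean_y - slope * mean_x


def time_until_threshold(samples, threshold):
    """Estimated time after the last sample until the trend hits threshold.

    Returns None if the metric is flat or trending down.
    """
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None
    crossing_time = (threshold - intercept) / slope
    return max(0.0, crossing_time - samples[-1][0])


# Hypothetical daily CPU averages: (day number, percent busy).
cpu_samples = [(0, 40.0), (1, 42.0), (2, 44.0), (3, 46.0)]
print(time_until_threshold(cpu_samples, 80.0), "days until 80% CPU")
```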

Probably the most important thing to know about performance management is that it helps you make better decisions.

Most good network engineers can instinctively know where the bottlenecks are in their networks and can usually correctly identify what needs to be upgraded to get the most benefit.

Most great network engineers can use the pretty graphs from a good performance management tool to get the money from their CFO for those upgrades.

In my home network, I actually track the response time of all my links, as well as additional services, such as the one below which allows me to keep my wife happy.

Figure: Facebook Response Time Performance Tracking

Note: Probably the most recognizable performance management tools would be MRTG and PRTG. I can’t even imagine how many network upgrades were justified by the pretty graphs that came out of these tools.

S is for Security

Security is… well, security. These are the network management activities that involve securing the network and the data running over it.

In a lot of ways, I strongly believe that security should be addressed in every waking (and sleeping!) moment that you’re thinking about your networks. Security should become so second nature to us that it should be almost impossible to perform any of the other tasks without security entering the conversation.

What do I mean?

Fault – CIA – Confidentiality, Integrity, and Availability. It’s hard to be secure when the network is not available, and the Fault domain helps us keep it that way!

Configuration – Auditing – Good configuration management practices can involve automated IT Control objective verification tools, otherwise known as “scripts” which will allow us to have the NMS ensure all the configurations are what they should be and no unneeded services are on our routers and switches.

Performance – You can’t get performance data without SNMP, and if you’re using SNMP, PLEASE USE SNMPv3 if possible! It supports both authentication (for integrity) and encryption. Also, lock down your management interfaces with ACLs on your devices.
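The configuration auditing “scripts” mentioned above do not have to be fancy. Here is a minimal Python sketch that checks a device configuration against a list of required and forbidden lines. The rule lists are example assumptions in Cisco IOS style, not a recommended baseline, and the config text is made up; in practice the NMS would pull the running configuration for you.

```python
# Example rules only; build your own from your control objectives.
REQUIRED_LINES = ["service password-encryption"]
FORBIDDEN_LINES = ["ip http server", "snmp-server community public RO"]


def audit_config(config_text, required=REQUIRED_LINES, forbidden=FORBIDDEN_LINES):
    """Return a list of findings for one device configuration.

    Lines are compared exactly (after stripping whitespace), so
    "no ip http server" does not match the rule "ip http server".
    """
    lines = {line.strip() for line in config_text.splitlines()}
    findings = []
    for rule in required:
        if rule not in lines:
            findings.append(f"missing required line: {rule}")
    for rule in forbidden:
        if rule in lines:
            findings.append(f"forbidden line present: {rule}")
    return findings


# Hypothetical running configuration fragment.
sample_config = """
hostname edge-router-1
ip http server
snmp-server community public RO
"""
for finding in audit_config(sample_config):
    print(finding)
```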
FCAPS

It’s just a model

Please don’t take it too seriously. It’s not a binary model. Feel free to apply some fuzzy logic here and be confident that it’s 46% Fault Management and 54% Performance Management.

The important thing here is that it helps us understand the network management world we live in. It gives us a conceptual model to be able to understand the different activities involved in network management. As an added bonus, it also gives us a handy tool to evaluate different NMS software packages.

Think about the tools you’re using. Are you using a point solution, like Solarwinds Orion NPM which focuses on Performance monitoring, or an Open Source tool like RANCID which focuses on Configuration?

Or are you looking at a SPOG solution like HP’s IMC which provides full FCAPS (and more!) in the base package?

What tools are you using? Are they full FCAPS? Or are they more focused on one particular area?

Configuration Management – Software Management

So in the last post I introduced the concepts of the Configuration Management System, and the Configuration Item. Today, I’m going to introduce the concept of the Definitive Media Library.

The DML is really nothing more than a software library. Ideally, this should be tied directly into your element management system so that you can define the baseline software image, deploy the image out to the appropriate devices, and audit the network to ensure that all of the devices are in line with your golden software definitions.

As I laid out in the last post, standardization is there to make your lives easier. But it takes a lot of commitment, especially if your network has gone through significant “organic growth”. Making the choice to commit to good configuration management hygiene is sort of like committing to going to the gym or committing to eating healthier.

Just like going to the gym, the first thing you need to do is figure out your current software state. Hopefully, your NMS software will have the ability to discover and audit the software running on the devices in your network and report against a known good state.

Audit the Current State of the Network

If you don’t have an NCCM tool in place with these features, you may end up writing scripts or, worst case, logging into your devices manually and noting the software version in an Excel spreadsheet. Once you have a handle on what’s out there, the next step is choosing what version of code you need to be running.
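If you do end up on the “writing scripts” path, the audit itself is not much more than a lookup. Here’s a rough sketch, assuming you already have an inventory of device name, platform, and running version from somewhere (discovery, a spreadsheet export, whatever). The device names and version strings below are placeholders, not real releases.

```python
# Hypothetical golden software definitions: one approved version per platform.
GOLDEN_VERSIONS = {
    "HP 5500EI": "golden-1.0",
    "Example WAN Router": "golden-2.4",
}

# Hypothetical inventory, e.g. exported from discovery or a spreadsheet.
INVENTORY = [
    {"name": "core-sw-01", "platform": "HP 5500EI", "version": "golden-1.0"},
    {"name": "core-sw-02", "platform": "HP 5500EI", "version": "older-0.9"},
    {"name": "wan-rtr-01", "platform": "Example WAN Router", "version": "golden-2.4"},
    {"name": "lab-fw-01", "platform": "Example Firewall", "version": "3.1"},
]


def audit_software(inventory, golden):
    """Compare each device's running version with the golden definition."""
    report = []
    for device in inventory:
        expected = golden.get(device["platform"])
        if expected is None:
            status = "no golden version defined"
        elif device["version"] == expected:
            status = "compliant"
        else:
            status = f"non-compliant (expected {expected})"
        report.append((device["name"], status))
    return report


if __name__ == "__main__":
    for name, status in audit_software(INVENTORY, GOLDEN_VERSIONS):
        print(f"{name}: {status}")
```

Devices with no golden version defined are reported separately on purpose. They are usually the platforms nobody has made a decision about yet.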

Choosing your Software Version

So now that you’ve figured out that your devices are all over the place, it’s time to figure out what version of software you actually want to be running. Whether you are running Comware, IOS, NXOS, Junos, FTOS, or some other OS that I haven’t mentioned, the guidelines are pretty much the same.

Wash, Rinse and Repeat.

What about the exceptions?

I was going to try to sugar coat this, but I’ll just come out and say it. Cisco has licensing for many of their platforms, and this can create situations where you can’t actually get on a common code version without incurring additional CAPEX for the licenses and OPEX for the SMARTnet contracts. Or you can get into a situation where the features you’re looking for are mutually exclusive in two different IOS images for your routers. Or you’re running Cisco CallManager, and your gateways require the Voice image while your regular WAN routers need another image.

In any event, my recommendation is still the same. Find the fewest possible combinations of software for the hardware platforms in your network and stick to them unless there is a REALLY good reason to change.

Check out this video of the basic NCCM features in HP’s Intelligent Management Center to help you navigate through your software baseline woes.

Anything I missed here? Feel free to post in the comments below.

Intro to Configuration Management

So in a previous post, I made the recommendation to go find an ITSM framework.  For the rest of this series, I’ll be referring to the ITILv3 ITSM framework a lot.  The two books that, IMHO, apply the most to Network Operations are the Service Transition and the Service Operation books.

For the next few posts, I’m going to focus on the Service Transition volume, and specifically on the Configuration Management sections.

So in ITILv3, one of the MOST important things to understand is the concept of a Configuration Item.

What’s a CI?

The way I explain this to customers is it’s the smallest managed thing, or set of things, in the environment.

How does that apply to my network?

Well, hopefully, I’m going to try and explain that now.

The first CI in a network might be the hardware devices that are in the network. These are your switches, routers, firewalls, load balancers, servers, etc…

So most people are good with the idea of standardization. It makes sense that it’s easier to manage fewer kinds of devices. This is recommendation #1.

1) Standardize on as few hardware platforms as possible.

The good thing is that this is fairly easy to achieve. In fact a lot of people do this instinctively. They standardize on the same two chassis switches in their core, they use the same model in their distribution, and they use the same model for the access layer.

Here’s where things get crazy though.

Many of the same customers who try to standardize on a current device often have no processes in place to ensure that they are all running the same version of code.

So think back to the ITIL Configuration Item. If you have five HP 5500EI switches, and five different OS versions on them, you now have five different CIs to track. Make sense?

Five different versions of commands.

Five different versions of bugs.

5 times the headache.

If a configuration item is the smallest manageable object, then each different combination of hardware and software counts as a single CI. BUT… if we standardize on one version of code for that hardware platform, we get one configuration item.
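If you want to see the arithmetic, counting CIs this way is just counting distinct (hardware, software) pairs. A tiny sketch, with made-up version labels:

```python
def count_configuration_items(devices):
    """Count distinct (hardware platform, software version) combinations."""
    return len({(platform, version) for platform, version in devices})


# Five switches of the same model, each on a different (placeholder) version.
mixed = [("HP 5500EI", f"version-{n}") for n in range(1, 6)]

# The same five switches after standardizing on a single version.
standardized = [("HP 5500EI", "version-3")] * 5

if __name__ == "__main__":
    print(count_configuration_items(mixed))
    print(count_configuration_items(standardized))
```

Same five boxes in both cases. The only thing that changed is how many different things you have to keep track of.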

So the first thing that I recommend customers do is…

GET ON ONE VERSION OF CODE!!!!!!

This is commonly called a golden software version. One version of commands, one version of bugs. One CI.

On the flip side, one of the other common mistakes I see made by customers who have taken the first step of getting on a single version is upgrading without a reason.

My recommendation here is to do your homework. When a new version of code is released, read the release notes, check the bug fixes, check the new features. If there’s nothing in there that is addressing an issue you’re having, or new functionality that you NEED to have,

WHY WOULD YOU CHANGE?
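That homework can be boiled down to a very small check. The sketch below assumes you’ve already read the release notes and jotted the bug fixes and new features down as short labels. The dictionary layout and every identifier in it are hypothetical, not anything a vendor publishes in this form.

```python
def reasons_to_upgrade(release_notes, open_issues, required_features):
    """Return the fixes and features in a release that actually matter to us.

    An empty list means the release gives us no reason to change.
    """
    relevant_fixes = set(release_notes["bug_fixes"]) & set(open_issues)
    relevant_features = set(release_notes["new_features"]) & set(required_features)
    return sorted(relevant_fixes | relevant_features)


# Hypothetical release notes, reduced to labels by whoever read them.
NEW_RELEASE = {
    "bug_fixes": ["BUG-101 stack member reboot", "BUG-205 LLDP memory leak"],
    "new_features": ["feature-x"],
}

if __name__ == "__main__":
    reasons = reasons_to_upgrade(
        NEW_RELEASE,
        open_issues=["BUG-205 LLDP memory leak"],
        required_features=[],
    )
    print(reasons if reasons else "No reason to change. Stay on the golden version.")
```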

It may seem strange, but when you get a new switch out of the box, you may want to just plug that into your network and downgrade it to the older software. More thoughts on this in this blog post.

Any decent NMS should have the ability to be able to define, report, and deploy the correct version of code to the hardware devices.

Funny enough, after writing this, I found another great blog post by Terry Slattery, this time over at http://www.nojitter.com.

What about you guys? What configuration tools are you using? HP IMC? Orion NCM? Rancid? Prime? A TFTP Server on a wandering laptop?

So you want to install an NMS?

So when I started this blog, one of the things I wanted to do was to start putting out some of the knowledge that I’ve gained in network operations over the years. About 5 years ago, I decided that I really wanted to become a network management expert. Except… there was no CCIE Management. So I had to do it the hard way. One of my original reasons for starting this blog was to try and make that journey a little easier for the next person who has an interest in network management, or for someone who just gets told they need to implement an NMS project and has no idea where to get started.

Getting started

I know I’m going to take flak for this one from some of the grumpy old network gods, but one of the first places I would HIGHLY recommend visiting on your journey is the ITIL homepage.

The IT Infrastructure Library is essentially a set of “best practices” that have been codified and put on paper. If you’re interested, there’s an intro white paper here.   I’ll wait….

So in a nutshell, ITIL is really just an IT Service Management Framework. There are others, like the Microsoft Operations Framework, not to mention FCAPS, eTOM, and probably a few others that bear mentioning, but have slipped my mind.

I’ll admit it, I’m a certification junkie, so I went out and got ITIL v3 Foundation certified, and what I found out was that ITIL is really just a bunch of common sense best practices that you would learn if you had spent the last 20 years in operations.

For those of us that didn’t, I’ll do a VERY quick and dirty

Intro to ITIL

ITIL v3 Wheel

       1) Service Strategy: What should we do?

       2) Service Design: How should we do it?

       3) Service Transition: Let’s do it.

       4) Service Operation: Are we doing it?

       5) Continual Service Improvement: Can we do it better?

 

Yup. 5 volumes. Thousands of pages, and I just reduced it down to 5 lines.

Now there’s a lot more to ITIL and there’s a ton of good information in there around specific things like change control processes, or how to define SLAs (service level agreements), etc… but IMHO, at its very root, it’s just some wisdom written down by people who have been there and done that.

If you’re lucky enough to have an employer pay for your books, I would suggest picking up the last 3 volumes since that’s where the meat of operations actually happens. If you’re not, I would suggest searching iTunes for ITIL and ITSM, and checking out blogs like the IT Skeptic to get a balanced view of the subject.

So step one in the journey to become a network management expert for me was to figure out what actually goes into managing a network. Not just the technical aspects; any CCNA can tell you how to turn on SNMP or set up SSH on a switch. I think the first step in this journey is to start to learn about how the business feels about IT Service Management and to start to fill in the other “W”s.

So my first big piece of advice for anyone who wants to really get into network management

Find a Framework

Whether that’s ITIL, or MOF, or your own company’s ITSM methodology, having a framework will provide you with a way to see the world. The only additional piece of caution I would give you is that it’s just a framework. Use it to help you understand until it’s no longer useful.  Remember the OSI model? 7 discrete layers that were written in stone? And then MPLS comes along and suddenly you have to troubleshoot things at layer 2.5.

For ITIL specifically, there are a lot of people who have made a lot of money by preaching it as if it’s gospel. Just smile at them and remember, they are just guidelines. Adjust where you need to.

I think that’s enough for tonight. Hopefully someone finds this useful and I’ll keep motivated to get this out. 🙂

@netmanchris