Network Management – How to get started

Network Management Skills

In the last few years, I’ve noticed that I’m a little different. It’s not just because I wear coloured socks or because my hair looks like I style it after Albert Einstein. I noticed that I’ve developed a different skill set than the majority of my pre-sales or post-sales network professional peers. What skills, you ask? Network Management and Operations.

Why I chose to develop Network Management skills

About five years ago, I took a look at the market and thought “This stuff is complicated”. Earth-shattering observation, right? It sounds simple, but then I started looking at some of the tools we had at the time and I realized that NMS tools could really help to automate not only the information gathering, but also the configuration tasks in our networks. At the time, we had a cool little tool called 3Com Network Director. It ran on a single PC. It had no web interface and it really only managed 3Com gear. But it was better than running CLI commands all day long. And the monitoring aspects really helped my customers identify and resolve problems quickly. This was a moment of inspiration for me. I chose to develop skills in network management and operations.

Let me say that again.

I chose to develop skills in network management and operations.

I didn’t choose to develop skills in 3ND, or IMC, or Solarwinds, Cisco Prime, or any of the various other tools. Over time, I’ve gained experience on all of those products, but I would say my true value is having gone through the process of developing skills in the subdisciplines of network management. Learning a product is only a very small part of the whole domain.

What does that mean? 

It’s easy to learn a product. They have bells and whistles. Click this check box. Fill in this box. Etc. Those skills are important. But they don’t help us understand how to apply the product to resolve our customers’ business challenges. They don’t help us understand when not to click that box. And they don’t help us to design a network management strategy, or to consult with our customers on operational efficiencies and what can be done to help increase their network’s stability, to reduce MTTR, or to mitigate pressures put on the operations team. Learning the domain knowledge has helped me to understand WHY we have developed the product features and what they are to be used for.

My Learning Roadmap

To put it simply, I consumed everything I could on the subject. It’s amazing how much free information is out there if you set your mind on finding it. If anyone’s looking to increase their skills in this area, I’ve put together the following list of resources that have really helped me in this domain. I’ve tried to keep this free of vendor specific products, but I’m sure you’ll find that any product you choose will probably have training and learning resources around it as well. This list is in NO way exhaustive; there are a lot of resources out there. I highly encourage everyone to read, watch, and listen to as many of them as you can and to think about them critically.

Free Resources

Solarwinds SCP training – The Solarwinds SCP training is online and free. What I really liked about this training is that it’s focused on network management, netman protocols, and the operational aspects of network management. There are, of course, some product specific aspects to the training, but overall this is a really good primer on network management. Oh… did I mention there’s a bunch of videos as well? Great stuff to rip and put on your tablet when you’re stuck on a plane and you’ve seen all the movies.

Solarwinds has also provided a bunch of whitepapers that go further in depth on network management specific subjects and are a great reference. If you’re interested, there’s also the Solarwinds Certified Professional certification if you’re looking for a way to validate your knowledge.

The Information Technology Infrastructure Library (ITIL) is a compilation of IT service management practices compiled over the last 30 years. There’s a lot of great stuff in here. The books are expensive though. There is an entire industry that’s sprung up around ITSM. If you have some commute time to spare, I would highly suggest typing “ITSM” into your favourite podcast app, then sitting back and listening.

If you’re interested, there’s also the ITILv3 Foundations certification if you’re looking for a way to validate your knowledge.

Blogs and Podcasts

Social Media is a great way to learn how people apply ITIL concepts to the real world. I particularly like http://www.itskeptic.org as it’s got a great following of a bunch of smart people who disagree on a regular basis. You never know whether the customer you’re going into is operating in a traditional ITIL based ops model, is using the Microsoft Operations Framework, or has moved on to Agile and DevOps. It’s good to have at least a cursory knowledge of all of these approaches to IT operations, not to mention traditional Network Management Frameworks like FCAPS and eTOM.

Paid resources

Books are a great way to learn about network management and operations. Here’s my  abbreviated reading list. These are the reference books that sit on my shelf within easy reach. 

Network Maturity Model – This book is actually an academic thesis focused on trying to extend the CMMI models to network specific capability maturity models. Of course, network operations is part of the capabilities of an organization, so it’s directly relevant here. The book is definitely academic, but it’s got a LOT of great content in it, assuming you can get through all of the required footnotes and pointers to other academic works.

Fundamentals of EMS, NMS, and OSS/BSS – This book is wonderful. It covers all aspects of traditional telecom management from FCAPS to eTOM, as well as looking at OSS/BSS architectures which usually exist only in Service Provider networks. Great information in here. My biggest problem with this book is the font size. I have glasses and it’s tiny. Worth the effort to make it through, but plan on multiple reading sessions. This is not a book you’re going to get through in one sitting.

Network Management Fundamentals – Cisco Press book that’s a great read. A lot of information in here is covered in some of the other books. What I like about this one is that it’s written as an introduction to network management for people already working in the field. This is not an academic text.

Network Management: Accounting and Performance Strategies – Cisco Press book again. This one concentrates strictly on performance management, with a lot of attention on NetFlow and how it can be applied to accounting and performance in large networks.

Performance and Fault Management – Cisco Press Book again. This is an older book, so the technologies discussed may not be as relevant as they once were. The nice thing though is that we’re talking about operational models and processes here, so the principles still apply. 

VoIP Performance Management and Optimization – Last Cisco Press book. This book looks at the operational aspects of VoIP/IP Telephony/Unified Communications networks specifically. There are a lot of very detailed recommendations in here that can be leveraged to give customers guidance on what they should be doing and what they should be monitoring. This book has helped me a few times when working with customers who have chosen to implement a dual-vendor strategy and want to have HP Intelligent Management Center managing and monitoring their Cisco CallManager environment in addition to their network.

The Phoenix Project – This book is written as a novel to teach people about the DevOps movement. This is a MUST read for anyone interested in IT operations and the current trends in the industry. It will also give you a first-hand account of what many customers go through. Read it. Read it. Read it.

The Visible Ops – From the same authors as The Phoenix Project. This book tries to tie DevOps and ITIL together. Interesting read. Many people see DevOps and ITIL as opposite ends of the spectrum. Most have had a bad ITIL experience and now the pendulum swings in the other direction. Finding a happy middle is a good goal. I’m not sure they’ve hit the mark, but it’s a start.

Network Management: Principles and Practice – Expensive book. Good information, but the technology is quite dated. The concepts and knowledge are great. Good diagrams, but it’s sometimes hard to get through the hubs and token ring.

Domain Related Knowledge

Network Management is really about ensuring stability and helping the business to meet its operational requirements with the greatest efficiency possible. In that light, it’s important to understand what some of those operational burdens are. In recent years, businesses have had a ton of GRC (governance, risk, and compliance) requirements put on their operations teams, which threaten to break teams that are already overloaded. On the bright side, I believe that although these requirements have been forced on them through legislation and governance like SOX, COSO, PCI-DSS, HIPAA, Gramm-Leach-Bliley, etc., they have actually pushed network operations teams to get much tighter on their controls, giving us more stable and secure networks.

note: This list is US specific. If international readers can post some examples in the comments section, I would be happy to add them to the list of references.

In my experience, one of the issues with GRC requirements in general is that they very rarely describe what actually needs to be done. They have generic statements like “monitor network access” or “secure the IT assets”.

ISACA noticed this and put together the COBIT framework, which is a very detailed list of over 30 high-level processes and over 200 specific IT control objectives. Most of the GRC requirements can be mapped to specific COBIT objectives. COBIT is a good thing to be familiar with.

Next Steps

As we move forward in IT, operations and orchestration skills are starting to become some of the hottest requirements in IT. 

Whether it’s products like HP’s CloudSystem, or industry wide projects like OpenStack, CloudStack or Eucalyptus, having solid operational knowledge and skills is going to be a requirement for anyone seeking the coveted Trusted IT Advisor role with any customer.

For anyone looking to gain or just brush up on their network management specific skills, I would recommend:

  • Watch the Solarwinds videos as a place to get started with the basics of network management.
  • Become familiar with the basics of COBIT and GRC in general. Doing some reading on the various GRC requirements that apply to your specific regions and customers is also a great way to change the conversation from speeds and feeds to the challenges of the business.
  • Read up on OpenStack.
  • Learn about ITIL and DevOps.

Social Media is always a great way to stay current as well. One of the biggest challenges of operations is that the best way to learn it is to do it. Unfortunately, many really good network professionals, whether pre-sales or professional services, don’t get that opportunity, as they are usually hands-off or turn over the keys to an ops team once the project has been delivered. Social media helps you connect to the daily challenges of people who are living in the trenches.

Get some ITSM experience. If you don’t work in a company where you get to babysit the same environment, you can always do what I did and experiment with it at home.

Anyone else have any suggestions on how to get up to speed? Feel free to comment below!

@netmanchris

FCAPS – A Quick Introduction

It occurs to me that I’ve been writing the last few posts about network management tasks based on an ITSM model, and I didn’t even introduce what is arguably the more useful model for breaking down and understanding network management tasks: the FCAPS model.
FCAPS has its roots in the ISO, similar to another model we all know and love: the OSI model. Everyone remember that one? Please Do Not Take Sales People’s Advice? You may have learned another mnemonic for it, but this is probably the most basic conceptual model that every networking person uses to understand the world we live in.

For those of you who are looking for some extra credit reading, or need a cure for insomnia, you can find the actual FCAPS standards in the ITU-T M.3400 recommendations. For the rest, I’m hoping to give a brief overview to help you understand the different aspects of the disciplines of network management.

F is for Fault

This involves the detection, isolation, and correction of a fault condition. Or in plain English, this lets you know when things are broken.

Fault Management could involve things like syslog messages and SNMP traps being escalated to alarms; root-cause analysis and alarm suppression, or some AI which tries to separate the signal from the noise during event storms; and alarm notification policies (sending out an e-mail once you get an alarm).
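To make the escalation and suppression idea a little more concrete, here is a minimal Python sketch. It relies on the standard syslog convention that the number in angle brackets is facility * 8 + severity. Everything else is an assumption made up for illustration: the AlarmFilter name, the choice to alarm on severity 3 ("error") or more urgent, and the five minute suppression window. A real NMS does far more than this.

```python
import re

# Syslog PRI value = facility * 8 + severity (a lower severity number is more urgent).
PRI_PATTERN = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)

ALARM_SEVERITY = 3      # assumption: escalate "error" (3) and anything more urgent
SUPPRESS_SECONDS = 300  # assumption: hide repeats of the same alarm for 5 minutes


def parse_pri(line):
    """Return (facility, severity, message) for a syslog line, or None if it has no PRI."""
    match = PRI_PATTERN.match(line)
    if match is None:
        return None
    pri = int(match.group(1))
    return pri // 8, pri % 8, match.group(2).strip()


class AlarmFilter:
    """Escalates urgent events to alarms and suppresses repeats during event storms."""

    def __init__(self, suppress_seconds=SUPPRESS_SECONDS):
        self.suppress_seconds = suppress_seconds
        self.last_raised = {}  # (source, message) -> time the alarm was last raised

    def handle(self, source, line, now):
        """Return an alarm dict if this event should notify someone, else None."""
        parsed = parse_pri(line)
        if parsed is None:
            return None
        _facility, severity, message = parsed
        if severity > ALARM_SEVERITY:
            return None  # informational noise: worth logging, not worth an alarm
        key = (source, message)
        last = self.last_raised.get(key)
        if last is not None and now - last < self.suppress_seconds:
            return None  # same alarm was raised recently: suppress the duplicate
        self.last_raised[key] = now
        return {"source": source, "severity": severity, "message": message}
```

Feeding the same "link down" message from the same device twice within the window produces one alarm, not two, which is the whole point of suppression during an event storm.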

Traditionally this was implemented in a lot of NMSs as Green-is-good management. Basically, if everything is green, things are OK. If they are yellow or red, you’ve probably got a long night ahead of you.

In recent years, Fault Management has started to include application performance management as well. In modern networks, it’s not enough to know that an application is “up”. Now we must also make sure that the level of service, or SLA, that is being delivered to the end-user is adequate to meet their needs.

Note: Whether an activity falls into one category of FCAPS or another might depend on your perspective. If you are measuring bandwidth on a particular port, you may be in the “P”, but if you are measuring the bandwidth and raising an alarm if you cross a certain threshold, you’re now in the “F”.
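The same measurement landing in two different letters can be sketched in a few lines of Python. The function names, the 80% threshold, and the sample counter values are all hypothetical, and the arithmetic assumes the octet counter did not wrap or reset between the two readings.

```python
def utilization_percent(octets_then, octets_now, seconds, link_bps):
    """Average link utilization between two readings of an interface octet counter.

    Assumes the counter did not wrap or reset between the two readings.
    """
    bits = (octets_now - octets_then) * 8
    return 100.0 * bits / (seconds * link_bps)


def threshold_alarm(percent, threshold=80.0):
    """Return an alarm message when utilization reaches the threshold, else None."""
    if percent >= threshold:
        return f"ALARM: utilization {percent:.1f}% is at or above {threshold:.1f}%"
    return None


# The "P": measure the port and record the number for trending.
usage = utilization_percent(0, 33_750_000, seconds=300, link_bps=1_000_000)

# The "F": take the very same number and raise an alarm when it crosses a line.
alarm = threshold_alarm(usage)
```

Nothing about the measurement changed between the two steps. Only what we did with it changed.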

This may seem confusing at first, but remember that FCAPS is just a conceptual model. This is similar to the 7 Layer OSI model. Ask any good network person what layer MPLS falls at and they will either answer “It depends” or potentially “2.5”.

C is for Configuration

This involves the configuration of the software and hardware in the network. This includes the versions of software, the actual configurations, change management, etc…

This is probably the easiest to understand. If you’re upgrading code on a switch or router, if you’re logging into a router to make a configuration change, or if you’re just plugging a network cable into a PC, you’re in the “C”s.
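One small piece of the “C” that is easy to picture in code is spotting what changed between two saved copies of a device configuration. The sketch below uses Python’s standard difflib module; the config_changes name and the sample configuration lines are made up for illustration and are not taken from any particular tool.

```python
import difflib


def config_changes(before, after):
    """Return unified-diff lines describing what changed between two config snapshots."""
    return list(difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile="before",
        tofile="after",
        lineterm="",
    ))


# Hypothetical snapshots taken before and after someone logged in to make a change.
yesterday = "hostname r1\nntp server 10.0.0.1"
today = "hostname r1\nntp server 10.0.0.2"

for line in config_changes(yesterday, today):
    print(line)
```

An empty result means nothing changed. Anything else is a change that somebody should be able to explain.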

A is for Accounting

This involves the identification of cost to the service provider and payment due from the customer, i.e. billing.

Personally, I find this definition a little restrictive and prefer to apply the definition that I heard in a presentation.  I wish I could remember the name of the gentleman to give him credit. He started out in a thick southern drawl

The thing to remember about a’counting, is that the rest of the world just calls it counting.

I know. Barely funny, right?

But it does allow us to use this to include things like

  • NetFlow for counting the different protocols running across a certain WAN link.
  • SNMP polling of T1/PRI interfaces for ensuring that your Erlang calculations are accurate and you don’t need to raise or lower the number of trunks on your voice gateways.
  • RADIUS to track how long a user was logged into a specific port on the network or how much bandwidth they actually used.
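For the Erlang item in the list above, the arithmetic is short enough to sketch in Python. This uses the standard Erlang B recurrence for blocking probability. The function names and the 1% default blocking target are assumptions for illustration, and turning polled channel usage into offered traffic is reduced here to "busy seconds divided by the length of the interval".

```python
def offered_erlangs(busy_seconds, interval_seconds):
    """Traffic in erlangs: total channel-busy time divided by the observation interval."""
    return busy_seconds / interval_seconds


def erlang_b(traffic_erlangs, trunks):
    """Erlang B blocking probability for the offered traffic on a number of trunks."""
    blocking = 1.0  # with zero trunks every call is blocked
    for m in range(1, trunks + 1):
        blocking = (traffic_erlangs * blocking) / (m + traffic_erlangs * blocking)
    return blocking


def trunks_needed(traffic_erlangs, target_blocking=0.01):
    """Smallest trunk count whose blocking probability meets the target."""
    trunks = 0
    while erlang_b(traffic_erlangs, trunks) > target_blocking:
        trunks += 1
    return trunks
```

Comparing trunks_needed for your measured busy-hour traffic against the number of trunks actually configured is one way to decide whether to raise or lower the count.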

You get the picture. Basically, accounting is just counting things which might be interesting to you.

Although this is not the strict definition from ITU-T M.3400, this amended version makes it easier for me to apply because I don’t have very many customers (read: any) who actually do charge-backs for their services.

Obviously, in an XaaS world, this domain is probably going to get a lot of attention in the coming years.

P is for Performance

This involves evaluating and reporting on the effectiveness of the network, and individual network devices.

Way back when I did my CCNA, one of the things I remember reading about was how you should be checking your routers and switches often to see if their CPU or memory was running high. I’ve never actually met anyone who logged into a device to check on a daily basis, but the advice was actually really good.

With a good NMS, you can

  • use SNMP polling for the CPU and Memory to track their trending over time.
  • use ICMP to track availability of the devices (assuming they respond!)
  • use ICMP to track the latency to the device to test the quality of the link.
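"Trending over time" in the list above can be as simple as fitting a straight line through the polled samples and asking when that line reaches a level you care about. The Python sketch below does an ordinary least-squares fit by hand. The function names, the sample values, and the 80% threshold are hypothetical, and a straight line is a simplifying assumption, since real CPU and memory usage is rarely that well behaved.

```python
def linear_trend(samples):
    """Least-squares slope and intercept for a list of (time, value) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    spread = sum((t - mean_t) ** 2 for t, _ in samples)
    if spread == 0:
        raise ValueError("samples must cover more than one point in time")
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / spread
    return slope, mean_v - slope * mean_t


def time_until(samples, threshold):
    """Estimated time from the last sample until the trend reaches the threshold.

    Returns None when the trend is flat or falling.
    """
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - samples[-1][0])


# Hypothetical daily CPU averages as (day, percent) pairs from SNMP polling.
cpu_samples = [(0, 40.0), (1, 42.0), (2, 44.0), (3, 46.0)]
days_left = time_until(cpu_samples, threshold=80.0)
```

The time unit of the answer is whatever unit the sample timestamps use, days in this made-up data.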

As I mentioned in the Fault section, performance often blurs with fault in that good performance management habits can alert you to faults in the network. In fact, good performance management can even allow you to proactively avoid faults by identifying a potential performance block in the network, and addressing the issue before it turns into a fault.

Probably the most important thing to know about performance management is that it helps you make better decisions.

Most good network engineers can instinctively know where the bottlenecks are in their networks and can usually correctly identify what needs to be upgraded to get the most benefit.

Most great network engineers can use the pretty graphs from a good performance management tool to get the money from their CFO for those upgrades.

In my home network, I actually track the response time of all my links, as well as additional services, such as the one below which allows me to keep my wife happy.

Facebook Response Time Performance Tracking

note: probably the most recognizable performance management tools would be MRTG and PRTG. I can’t even imagine how many network upgrades were justified by the pretty graphs that came out of these tools.

S is for Security

Security is… well, security. These are the network management activities that involve securing the network and the data running over it.

In a lot of ways, I strongly believe that security should be addressed in every waking (and sleeping!) moment that you’re thinking about your networks. Security should become so second nature to us that it should be almost impossible to perform any of the other tasks without security entering the conversation.

What do I mean?

Fault – CIA – Confidentiality, Integrity, and Availability. It’s hard to be secure when the network isn’t available, and the Fault domain helps us keep it available!

Configuration – Auditing – Good configuration management practices can involve automated IT control objective verification tools, otherwise known as “scripts”, which allow us to have the NMS ensure that all the configurations are what they should be and that no unneeded services are running on our routers and switches.
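As a rough idea of what one of those “scripts” might look like, here is a minimal Python sketch that checks a saved configuration against a short policy. The audit_config name is hypothetical, and the IOS-style lines in the two lists are chosen purely for illustration. They are not a recommended baseline and would need to match your own control objectives and platform syntax.

```python
# Illustrative policy only: IOS-style lines picked as examples, not a recommended baseline.
REQUIRED_LINES = ["service password-encryption", "no ip http server"]
FORBIDDEN_LINES = ["ip http server", "snmp-server community public RO"]


def audit_config(config_text, required=REQUIRED_LINES, forbidden=FORBIDDEN_LINES):
    """Return a list of findings for one device configuration (empty means compliant)."""
    # Compare whole lines so "no ip http server" is not mistaken for "ip http server".
    lines = {line.strip() for line in config_text.splitlines() if line.strip()}
    findings = []
    for line in required:
        if line not in lines:
            findings.append(f"missing: {line}")
    for line in forbidden:
        if line in lines:
            findings.append(f"forbidden: {line}")
    return findings
```

Running this across every saved configuration and reporting only the devices with findings gives the operations team a short list to fix and an audit trail to show afterwards.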

Performance – You can’t get performance data without SNMP, and if you’re using SNMP, PLEASE USE SNMPv3 if possible! It supports both encryption and integrity checking. Also, lock down your management interfaces with ACLs on your devices.

It’s just a model

Please don’t take it too seriously. It’s not a binary model. Feel free to apply some fuzzy logic here and be confident that it’s 46% Fault Management and 54% Performance Management.

The important thing here is that it helps us understand the network management world we live in. It gives us a conceptual model to be able to understand the different activities involved in network management. As an added bonus, it also gives us a handy tool to evaluate different NMS software packages.

Think about the tools you’re using. Are you using a point solution, like Solarwinds Orion NPM, which focuses on Performance monitoring, or an Open Source tool like RANCID, which focuses on Configuration?

Or are you looking at a SPOG solution like HP’s IMC which provides full FCAPS (and more!) in the base package?

What tools are you using? Are they full FCAPS? Or are they more focused on one particular area?