Introduction to R and SWIRL

So I’m taking a Coursera course from Johns Hopkins on Data Science.

The course uses the R programming language, which is a derivative of the S programming language that came out of Bell Labs in the 1970s. I’m a huge believer in network programmability and SDN in general. From a traditional network management point of view, most of the work getting done and discussed today is really around the C in the FCAPS model. There are some people, like Jason Edelman, Matt Oswald, etc… who are using network programmability for automating troubleshooting tasks, but most of those are pretty straightforward:

  • Automate information gathering
  • Automate troubleshooting
  • Identify the corrective action

Once you’ve got the corrective action nailed down, you could also automate the fix, but there are a lot of people who are still nervous about having changes happen without a human being involved. 
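To make that gather, troubleshoot, recommend flow a bit more concrete, here is a minimal Python sketch with a human still in the loop. Everything in it is made up for illustration: the sample interface data, the thresholds, and the function names (gather, diagnose, apply_fix). Real code would pull the data from a device API or CLI rather than a hard-coded list.

```python
# Hypothetical sample data standing in for what an API call or CLI scrape returns.
SAMPLE_INTERFACES = [
    {"name": "Gi1/0/1", "admin": "up", "oper": "up", "crc_errors": 0},
    {"name": "Gi1/0/2", "admin": "up", "oper": "down", "crc_errors": 0},
    {"name": "Gi1/0/3", "admin": "up", "oper": "up", "crc_errors": 4210},
]


def gather():
    """Step 1: automate information gathering (stubbed with sample data)."""
    return SAMPLE_INTERFACES


def diagnose(interfaces):
    """Steps 2 and 3: find problems and attach a suggested corrective action."""
    findings = []
    for intf in interfaces:
        if intf["admin"] == "up" and intf["oper"] == "down":
            findings.append((intf["name"], "link down", "bounce the port"))
        elif intf["crc_errors"] > 1000:  # arbitrary illustrative threshold
            findings.append((intf["name"], "high CRC errors", "open a cabling ticket"))
    return findings


def apply_fix(finding, approved):
    """Optional step 4: only act once a human has approved the change."""
    name, problem, action = finding
    if not approved:
        return f"{name}: {problem} -> recommended '{action}' (waiting for approval)"
    return f"{name}: {problem} -> applying '{action}'"


if __name__ == "__main__":
    for finding in diagnose(gather()):
        print(apply_fix(finding, approved=False))
```

The approved flag is the nervous human: the script can do all the legwork and still stop short of touching the network until someone says go.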

Automating configuration management and configuration-based fault detection and error correction are great things. But there are other parts of the network that can benefit from the application of a programming language to old problems.

I’m personally interested in the massive amounts of data that the network holds. We’ve got a ton of instrumentation within the network that is just sitting there to be accessed, tracked, and mined for useful insights.

Data Science is all about different methods for sifting through all that data in a scientifically reproducible manner and, hopefully, gaining some insights.

The Tools

Like Python, R has an IDE available that will allow you to run R code interactively, or through R files. It can be downloaded at the CRAN site here.

There’s also a better IDE available called RStudio that adds some additional functionality, which is available here.

SWIRL is a library which allows learners to access some interactive tutorials written in R for R. There’s a Git repository here which provides a set of tutorials for different courses that allow you to get a feel for the language syntax, creating functions, etc…

 

RStudio and SWIRL

Once you install the SWIRL library, which is really easy using the RStudio Install Packages feature, you load it (think import in Python) using the library(swirl) function. Once you’ve done that, you can either download the course files from the Git repository, or you can install them directly from within R (it uses curl in the background to download the files directly into your working directory).

As you can see in the screen capture, I’ve got a few different courses installed, and each of the courses has a bunch of lessons inside it. The screen capture shows the lessons within the R Programming course. What’s also cool about this is, assuming that you’re enrolled in the Coursera R Programming course, you can complete the lesson, input your username and password (specific to your course, not your Coursera password) and magically, you get extra credit for the course lessons you complete.

Extra credit is a good thing.

 

Wrap Up

I’ve only been into R for about a week. It’s got some nice features, but to be honest, I don’t have enough coding experience to really give a qualified opinion on the subject. I’ll continue to work with it and see where things go. There’s still a ton of Python that I need to learn, but I’ve already found a Python library called rpy2 that allows me to access native R libraries from within my Python code. Best of both worlds I guess. 🙂

 

Network Developer: A Network Engineer’s Journey into Python

Like most other people in the networking industry, I’ve been struggling with the question of whether or not network engineers need to become programmers. It’s not an easy question to answer, and after a few years down this SDN journey, I’m still no closer to figuring out which of the following categories network engineers need to fall into:

Become Full-Time Software Developers

DaveTucker

For those of you who don’t know @dave_tucker, he was a talented network engineer who chose to make the jump to becoming a full-time programmer. He worked on creating consumption libraries in Python for the HP VAN SDN Controller, contributed to the OpenDaylight controller, and has now joined up with @networkstatic (another great example) and @MadhuVenugopal to form SocketPlane, which is focused on the networking stack in Docker.

Gain some level of proficiency in a modern programming language

One of the people that I think has started to lead in this category is @jedelman8. Jason is a CCIE who glimpsed what the future may hold for some in our profession and has done a great job sharing what he’s been learning on his journey on his blog at http://www.jedelman.com/. Definitely check it out if you haven’t already.

This is also where I’ve chosen to be for now. The more I code, I think it’s possible that I could go full programmer, but I also love networking too much. I guess the future will tell with that one. 

For this category, this will mean putting in extra time on nights and weekends to focus on learning the craft. As someone once told me, it takes about 10 years to become a really good network engineer; no one can be expected to become a good programmer in a year, especially not with a full-time day job.

On the bright side, there are a lot of resources out there, like:

Coursera.org – Just search for the keyword “python” and there are several good courses that can help you gain the basics of this language.

Codecademy.com – Codecademy has a focused Python track that will allow you to get some guided hands-on labs as long as you have an internet connection.

pynet.twb-tech.com – @kirkbyers has put together an email-led Python course specifically for network engineers. He’s also got some great blog posts that discuss how to use Python for tasks specific to network engineers’ day-to-day jobs. Having something relevant always helps to make your life easier.

Gain the ability to think programmatically and articulate this in terms software developers understand

I don’t have any really good examples of this particular category. For some reason that has so far eluded me, there just aren’t many network engineers in this category. If you know of any great examples, please comment below and I’ll be happy to update the post!

This is where I was a couple of years ago. I knew logic. I could follow simplistic code if it was already written, and I could do a good enough job communicating with my programming friends to ensure that the bottle of tequila I was bribing them with would most likely result in something like what I had in my head.

 

Stay right where they are today. 

The starfish is one of the few creatures in the history of evolution that went “Hmmm. I think I’m good!” This isn’t a judgement, but you need to decide where you want to be, and if starfish is it… you might find your future career prospects limited.

starfish

 

 

Journey Ahead

 

As I get back into actually posting, I’m planning on sharing some of the simplistic code that I’ve been able to cobble together. I make no claims as to how good this code is, but I hope that it will inspire someone else reading this to take some classes, find a project, and then write and share some small script or program that makes their life just a little bit easier. Guys like Jason have done this for me. I recently hit a place where I finally have enough skills to be able to accomplish some of the goals I had in mind. My code is crap, but it’s so simplistic that it’s easy to understand exactly what I’m doing. And that’s where I think the value of sharing comes from right now.

 

Comments or thoughts? Please feel free to comment below!

 

 

Solarwinds NPM – Take 2

Ok. So I’m back at it now.

The first step of this mulligan was to remove the activated license from the corrupted Windows box that caused me all the trouble in the first place.

While I deployed a brand new Windows 2012 image, I headed over to the Solarwinds website and read through this document. As detailed in the doc, I installed the licensing application, deactivated the NPM license, and everything went great.

Good news so far. I’m really looking forward to digging into how NPM manages HP Networking gear.

An Update

So after the fiasco of the last attempted install, I learned a couple of things.

  • The Solarwinds NPM install package from the customer portal does NOT include the embedded Microsoft SQL server. If you want to run this with SQL Express, then you need to install the eval version. Good thing to know if you are trying to install NPM in a smaller environment. Keep in mind though, it is STRONGLY recommended – I read it multiple times in the docs – to use an external SQL server when using NPM in production. This makes sense for a “real” network, but for my purposes, I have a small lab so there’s really no need.

  • My Windows image was hosed. Screwed. Burned out. Totally useless. When I did the install on a brand new Windows 2012 server, it went totally smoothly. I pre-installed the IIS server, as mentioned in the docs, and everything else went off without a hitch, so much so that the only reason I’m mentioning it is the fact that I had so much trouble the first time. The blame for that one goes on a bad Windows build.

 

First thoughts

Initial Discovery

It’s been a couple of years since I was at the helm of an NPM box, but to be honest, it feels pretty comfortable. Having a lot of stick time on some other products, I had a bit of trouble getting the desired results from the discovery process (IP ranges vs. subnets didn’t do exactly what I wanted; I kept getting more ranges than I wanted), but after a few tries, I managed to get the initial discovery up and running.

The Good:

In general, the discovery process went smoothly. Interestingly, NPM asked me for Windows, VMware, telnet/SSH, and SNMP credentials. The nice thing, which kind of surprised me, was that NPM was now able to discover my VMware ESXi and vCenter servers. This is a good thing as I’m a big fan of providing a consolidated view of the entire network, whether that’s physical or virtual, wired or wireless. I’ll check later into what virtualization support is actually offered in NPM, but for now, I’m happy to see that I can at least identify the resources on my network.

 

NewImage

 The not so good:

There were a couple of mislabeled devices. Specifically, the HP 5500EI and the HP 5120EI, which are a couple of boxes that have been on the market for a few years now. As you can see from the images below, both of these devices are HP devices. The description (which is pulled directly from the device through the sysDescr OID, .1.3.6.1.2.1.1.1.0 for anyone who’s counting) does show that this is an HP device.
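For the curious, here is a rough Python sketch of how a tool could guess a vendor from the sysDescr text alone. This is not how NPM does it (I have no idea what NPM does under the hood, and my assumption is that a real NMS would more likely key off sysObjectID). The pattern table and the sample strings in the usage note are made up for illustration.

```python
import re

SYS_DESCR_OID = "1.3.6.1.2.1.1.1.0"  # sysDescr.0, the OID the description comes from

# Hypothetical pattern table, checked in order; not taken from any real product.
VENDOR_PATTERNS = {
    "HP": re.compile(r"\b(hp|hewlett-packard|procurve)\b", re.IGNORECASE),
    "Cisco": re.compile(r"\bcisco\b", re.IGNORECASE),
    "Juniper": re.compile(r"\b(juniper|junos)\b", re.IGNORECASE),
}


def guess_vendor(sys_descr):
    """Return the first vendor whose pattern appears in the sysDescr text."""
    for vendor, pattern in VENDOR_PATTERNS.items():
        if pattern.search(sys_descr):
            return vendor
    return "Unknown"
```

Feeding it an invented string such as "HP Comware Platform Software" returns "HP", while anything it doesn’t recognize comes back as "Unknown", which is roughly the bucket my two switches landed in.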

 

NewImage

On the bright side, the error has been submitted to the NPM unknown device thread here so hopefully this will be addressed in a future update. 

Topology Maps

In previous versions of Solarwinds, one of the things that did bother me was having to jump back and forth between the web interface and the Windows console depending on the task that I needed to accomplish. I know Solarwinds has done a lot of work to move all the administrative functions into the web interface, but it doesn’t look like Network Atlas has made the cut yet.

This is a first glance, so it’s possible I just haven’t clicked on the right button yet. One of the most powerful pieces of a good NMS is an accurate topology map. Now that I’ve got the network discovered and up and running, creating some network maps is going to be my next task.

 

NewImage

Closing

In general, Solarwinds feels familiar. It’s not too far removed from the versions I was more familiar with, so I’m hoping that digging in is going to go more smoothly. I’m also VERY happy that I’m over my initial install issues. That was a painful experience and it’s nice to be able to say I just had a corrupted Windows build. The new install went perfectly. I’ve been spending some time upgrading my lab to ESX 5.5 this week, as well as playing with the HP SDN Controller, so I might take a break from Solarwinds for a bit, but expect more info in the future as I start to spend some more time with NPM.

 

@netmanchris

Network Management – How to get started

Network Management Skills

In the last few years, I’ve noticed that I’m a little different. It’s not just because I wear coloured socks or my hair looks like I style it after Albert Einstein. I noticed that I’ve developed a different skill set than the majority of my pre-sales or post-sales network professional peers. What skills, you ask? Network Management and Operations.

Why I chose to develop Network Management skills

About five years ago, I took a look at the market and thought “This stuff is complicated”. Earth-shattering observation, right? It sounds simple, but then I started looking at some of the tools we had at the time and I realized that NMS tools could really help to automate not only the information gathering, but also the configuration tasks in our networks. At the time, we had a cool little tool called 3Com Network Director. It ran on a single PC. No web interface and it really only managed 3Com gear. But it was better than running CLI commands all day long. And the monitoring aspects really helped my customers identify and resolve problems quickly. This was a moment of inspiration for me. I chose to develop skills in network management and operations.

Let me say that again.

I chose to develop skills in network management and operations.

I didn’t choose to develop skills in 3ND, or IMC, or Solarwinds, Cisco Prime, or any of the various other tools. Over time, I’ve gained experience on all of those products, but I would say my true value is having gone through the process to develop skills in the sub-disciplines of network management. Learning a product is only a very small part of the whole domain.

What does that mean? 

It’s easy to learn a product. They have bells and whistles. Click this check box, fill in this field, etc. Those skills are important. But they don’t help us understand how to apply the product to resolve our customers’ business challenges. They don’t help us understand when not to click that box. And they don’t help us to design a network management strategy, or to consult with our customers on operational efficiencies and what can be done to help increase their networks’ stability, to reduce MTTR, or to mitigate pressures put on the operations team. Learning the domain knowledge has helped me to understand WHY we have developed the product features and what they are to be used for.

My Learning Roadmap

To put it simply, I consumed everything I could on the subject. It’s amazing how much free information is out there if you set your mind on finding it. If anyone’s looking to increase their skills in this area, I’ve put together the following list of resources that have really helped me in this domain. I’ve tried to keep this clear of vendor-specific products, but I’m sure you’ll find that any product you choose will probably have training and learning resources around it as well. This list is in NO way exhaustive; there are a lot of resources out there. I highly encourage everyone to read, watch, and listen to as many of them as you can and to think about them critically.

Free Resources

Solarwinds SCP training – The Solarwinds SCP training is online and free. What I really liked about this training is that it’s really focused on network management, netman protocols, and the operational aspects of network management. There are, of course, some product-specific aspects to the training, but this is a really good primer on network management in general. Oh… did I mention there’s a bunch of videos as well? Great stuff to rip and put on your tablet when you’re stuck on a plane and you’ve seen all the movies.

Solarwinds has also provided a bunch of whitepapers going further in depth on network management specific subjects which are a great reference.  If you’re interested there’s also the Solarwinds Certified Professional certification if you’re looking for a way to validate your knowledge.

The Information Technology Infrastructure Library (ITIL) is a compilation of IT service management practices compiled over the last 30 years. There’s a lot of great stuff in here. The books are expensive though. There is an entire industry that’s sprung up around ITSM. If you have some commute time to spare, I would highly suggest typing “ITSM” into your favourite podcast app, then sitting back and listening.

If you’re interested, there’s also the ITILv3 Foundations certification if you’re looking for a way to validate your knowledge.

Blogs and Podcasts

Social media is a great way to learn how people apply ITIL concepts to the real world. I particularly like http://www.itskeptic.org as it’s got a great following of a bunch of smart people who disagree on a regular basis. You never know whether the customer you’re going into is operating in a traditional ITIL-based ops model, using the Microsoft Operations Framework, or has moved on to Agile and DevOps. It’s good to have at least a cursory knowledge of all of these approaches to IT operations, not to mention traditional network management frameworks like FCAPS and eTOM.

Paid resources

Books are a great way to learn about network management and operations. Here’s my  abbreviated reading list. These are the reference books that sit on my shelf within easy reach. 

Network Maturity Model – This book is actually an academic thesis focused on trying to extend the CMMI models to network-specific capability maturity models. Of course, network operations is part of the capabilities of an organization, so there’s a lot of great content in here. The book is definitely academic, but it’s got a LOT of great content in it, assuming you can get through all of the required footnotes and pointers to other academic works.

Fundamentals of EMS, NMS, and OSS/BSS – This book is wonderful. It covers all aspects of traditional telecom management from FCAPS to eTOM, as well as looking at OSS/BSS architectures which usually exist only in Service Provider networks. Great information in here. My biggest problem with this book is the font size. I have glasses and it’s tiny. Worth the effort to make it through, but plan on multiple reading sessions. This is not a book you’re going to get through in one sitting.

Network Management Fundamentals – Cisco Press book that’s a great read. A lot of information in here is covered in some of the other books. What I like about this is that it is written as an introduction to network management for people already working in the field. This is not an academic text.

Network Management: Accounting and Performance Strategies – Cisco Press book again. This one focuses strictly on performance management, focusing a lot on NetFlow and how it can be applied to accounting and performance in large networks.

Performance and Fault Management – Cisco Press Book again. This is an older book, so the technologies discussed may not be as relevant as they once were. The nice thing though is that we’re talking about operational models and processes here, so the principles still apply. 

VoIP Performance Management and Optimization – Last Cisco Press book. This book looks at the operational aspects of VoIP/IP Telephony/Unified Communications networks specifically. There are a lot of very detailed recommendations in here that can be leveraged to give customers guidance on what they should be doing and what they should be monitoring. This book has helped me a few times when working with customers who have chosen to implement a dual-vendor strategy and want to have HP Intelligent Management Center managing and monitoring their Cisco CallManager environment in addition to their network.

The Phoenix Project – This book is written as a novel to teach people about the DevOps movement. This is a MUST read for anyone interested in IT operations and the current trends in the industry. It will also give you a first-hand account of what many customers go through. Read it. Read it. Read it.

The Visible Ops – From the same authors as The Phoenix Project. This book tries to tie DevOps and ITIL together. Interesting read. Many people see DevOps and ITIL as opposite ends of the spectrum. Most have had a bad ITIL experience and now the pendulum swings in the other direction. Finding a happy middle is a good goal. I’m not sure they’ve hit the mark, but it’s a start.

Network Management: Principles and Practice – Expensive book. Good information, but the technology is also quite dated. Concepts and knowledge are great. Good diagrams, but it’s sometimes hard to get through the hubs and token ring.

Domain Related Knowledge

Network Management is really about ensuring stability and helping the business to meet its operational requirements with the greatest efficiency possible. In that light, it’s important to understand what some of those operational burdens are. In recent years, businesses have had a ton of GRC (governance, risk, and compliance) requirements put on their operations teams that threaten to break an already overloaded team. On the bright side, I believe that legislation and governance requirements like SOX, COSO, PCI-DSS, HIPAA, Gramm-Leach-Bliley, etc. have actually forced network operations teams to get much tighter on their controls, pushing us toward more stable and secure networks.

Note: This list is US-specific. If international readers can post some examples in the comments section, I would be happy to add them to the list of references.

In my experience, one of the issues with GRC requirements in general is that they are very rarely descriptive of what actually needs to be done. They have generic statements like “monitor network access” or “secure the IT assets”.

ISACA noticed this and put together the COBIT framework which is a very detailed list of over 30 high-level processes and over 200 specific IT control objectives. Most of the GRC requirements can be mapped to specific COBIT objectives. COBIT is a good thing to be familiar with. 

 Next Steps

As we move forward in IT, operations and orchestration skills are starting to become some of the hottest requirements in IT. 

Whether it’s products like HP’s Cloudsystem, or industry wide projects like OpenStack, CloudStack or Eucalyptus, having solid Operational knowledge and skills is going to be a requirement for anyone seeking the coveted Trusted IT Advisor role in any customer. 

For anyone looking to gain or just brush up on their network management skills, I would recommend:

  • Watch the Solarwinds videos as a place to get started with the basics of network management
  • Become familiar with the basics of COBIT and GRC in general. Doing some reading on the various GRC requirements that apply to your specific regions and customers is also a great way to change the conversation from speeds and feeds to the challenges of the business.
  • Read up on OpenStack
  • Learn about ITIL and DevOps

Social media is always a great way to stay current as well. One of the biggest challenges of operations is that the best way to learn it is to do it. Unfortunately, many of the really good network professionals, whether pre-sales or professional services, don’t get an opportunity, as they are usually hands-off or turning over the keys to an ops team after the project has been delivered. Social media helps to connect to the daily challenges of people who are living in the trenches.

Get some ITSM experience. If you don’t work in a company where you get to babysit the same environment, you can always do what I did and experiment with it at home.

Anyone else have any suggestions on how to get up to speed? Feel free to comment below!

@netmanchris

Juniper EX4200T- Management Observations

So I’ve been spending some time playing around with a Juniper EX4200T from a management standpoint.

This post is just a place to put some observations and questions. Hopefully, some Junos Peeps will be able to shed a little light on some of these questions.

First, as both a criticism and a defence: Juniper does not use SNMP as their primary interface. I get that SNMP has its problems, but it’s what we have, and if you want to bring Juniper into a network where there is already a network management system in place, I would think that they should at least do the minimum to improve their SNMP support to meet the bar.

I have to say, as an operationally focused network engineer, it does disturb me that I can’t even set the sysLocation from a simple SNMP set command.

ifIndex

One of the first things I noticed about the Juniper box is that it seems to have some strangeness, at least compared to other vendors, around the number of interfaces. Specifically, I’ve got a 48-port switch with more than twice that many interfaces. Upon a closer look, it seems that the Juniper switches, or at the very least the EX4200T, have two index values for every physical port.

Juniper SNMP Interfaces

 

One of the interesting questions that comes up here is “What ifIndex value do I poll?” I’d like to get interface stats on this device, but do I poll the ethernet port or the prop virtual port? And if both return the same values, why would I choose one over the other?

Anyone have a good explanation of WHY they went this direction? @steve did suggest thinking of this like a sub-interface in Cisco terms. I’ve been trying to figure this out, but the most common reason I’ve used a sub-interface has been to create dot1q router-on-a-stick configurations. I don’t see how that applies here.
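To at least see the pattern, here is a small Python sketch that pairs each physical port with its virtual twin based on ifType. The ifType numbers are the standard IANA values (6 for ethernetCsmacd, 53 for propVirtual). Everything else is my assumption: the sample rows are invented, and I’m guessing the second entry is a logical unit named after the port (something like ge-0/0/0.0).

```python
ETHERNET_CSMACD = 6   # IANA ifType for a physical Ethernet port
PROP_VIRTUAL = 53     # IANA ifType for a proprietary virtual interface

# Hypothetical ifTable rows: (ifIndex, ifDescr, ifType). Index values are made up.
SAMPLE_IF_TABLE = [
    (501, "ge-0/0/0", ETHERNET_CSMACD),
    (502, "ge-0/0/0.0", PROP_VIRTUAL),
    (503, "ge-0/0/1", ETHERNET_CSMACD),
    (504, "ge-0/0/1.0", PROP_VIRTUAL),
]


def pair_interfaces(if_table):
    """Map each physical port name to its own ifIndex and its logical units."""
    ports = {}
    # First pass: register the physical ports.
    for if_index, descr, if_type in if_table:
        if if_type == ETHERNET_CSMACD:
            ports[descr] = {"physical": if_index, "logical": []}
    # Second pass: attach virtual entries whose name starts with a known port.
    for if_index, descr, if_type in if_table:
        parent = descr.split(".", 1)[0]
        if if_type == PROP_VIRTUAL and parent in ports:
            ports[parent]["logical"].append(if_index)
    return ports
```

With a table like this in hand, you can at least pick one consistently (say, always poll the physical index for port counters) instead of polling both and double counting.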

 

MIB Walking 

Another strange thing is that it seems that the EX4200 cannot return all the interfaces when reading the ifTable by SNMP.  It may be that this is an issue with my MIB browser, but it’s definitely a pain in the butt.  

Junos-peeps: Anyone have a MIB browser that works here? Suggestions on code? Possibly a bug?

 

VLAN 0

One of the other things I noticed is that the default VLAN of the EX4200 is 0. Huh? VLAN 0? All of the interfaces on the switch belong to VLAN 0 initially. I did find this article on the Juniper website, which says that “Some attached devices may not accept 802.1q-tagged frames, and therefore can reside only in VLAN 0.”

Coming from a Cisco and HP background, I’ve always seen the native VLAN initially on an interface listed as VLAN 1. Anyone able to explain this to me?

VLAN-Range: Anyone able to explain this to me? I checked the Juniper documentation, but I wasn’t able to find an article which explained exactly what the function is for.

 

If anyone has comments, I’d love to learn here. I freely admit I haven’t had time to get far enough into this to understand the benefits and I do bring the baggage of history to my perspective on this.  If someone has made the jump to Junos, I’d love to hear from you! 

 

@netmanchris

Getting ITSM Experience

Many of the best network engineers I know have little to no network operation experience.

“What? How can that be?” you ask? Well it’s really quite simple.

Most of the best network engineers I know, and we’re talking some double and triple CCIEs in this crowd, have never actually operated a network for any length of time. They were professional services guys, short-term contract guys, some pre-sales or post-sales guys. There are a LOT of paths to the top of Mt. Fuji after all.

Although I did a short-term network ops gig early on in my career, I actually feel I squandered the opportunity as I just wasn’t mature enough to understand the experience that I should have been gaining.

So this question came up with a colleague last week: “How do you get network operations experience if you’re not in a network ops group?”

This blog post is dedicated to him.

A few years back, after I decided to really get serious about network management, I had the same issue. I wanted to get some experience in network management, but I had no large network to run. In my day job, I’m actually a pre-sales resource, so it’s not likely I’m going to get any experience in the near future, so it occurred to me that I could start a simulation to try and gain that experience.

At this point, I had already done some Ciscoworks LMS projects (long sleeve shirts to cover the scars to prove it!). I had successfully passed my ITILv3 foundations certifications, and I had even gained the honor of being one of the first Solarwinds Certified Professionals shortly after the SCP program was launched.

The Project

So with a bit of knowledge, I decided to run my home network under an ITSM framework for a year. This meant that I had to implement good network management hygiene. Good change management practices. Good fault management practices. Try to implement some of the ITIL processes around Service Strategy, Service Design, Service Operations, and Continual Service Improvement. Basically, run it like a business whose success depended on the network.

The Tools

So I had some ideas around the processes I wanted to put in place, but it always takes the three P’s to successfully implement any ITSM initiative: people, products, and processes. Fortunately for me, I had access to HP’s Intelligent Management Center, as well as the trial versions of Solarwinds Orion NPM and NCM, but I was still missing some critical pieces of the puzzle.

Service Operations: One of the primary activities in Service Operations is really around the help desk. How are tickets logged? How are they tracked? What are the escalation procedures? How do you build out and grow the KMS (knowledge management system)?

I didn’t have any help desk or ticketing software in place, so I decided to go the free route: Spiceworks.

For those of you who don’t know it, Spiceworks is a free IT Management app which ” includes a free IT management app for everything from network inventory and monitoring to help desk and more!”

It’s not what I would call a full FCAPS system, but it does have an ok help desk system, and it’s hard to beat free, right?

Note: I noticed last week that my Synology NAS now has a help desk app named OS Ticket in the available apps. I haven’t tried this, but considering it’s free and installs easily on the synology box, it might be a good option for those of you who are lucky enough to have one of these great little machines.

Financial Management

Financial Management falls under the Service Strategy volume of the ITILv3 core books. I’ll be honest: this wasn’t exactly the strongest part of my little experiment, but I did try to implement some financial processes.

But unlike some of the helpdesk and change control procedures, this wasn’t exactly something that I could count on good self-discipline to track. Can you imagine that conversation?

“Hey Me… I’d really like this new synology RS812.”

“Hmm… Don’t we already have a 411?”

“Yeah, but this one has TWO gigabit ports!”

“Let me think about that… ok. Let’s buy it!”

As you can see, I had to come up with a different plan.

Fortunately, I’m married, so I merely formalized the process of having to ask my wife for permission to buy any new toys. I have to say, this was probably the year that I got the fewest new tech toys, but I like to think the experience I gained was worth it. ( < – What’s the HTML tag for the sarcasm font again? )

The Results

So how did things go? Well, it was a little funny at times, emailing myself a support ticket so that I could fix something that wasn’t working. I did try to get my wife to e-mail the tickets in, but that lasted about a week before she just said “Can you just fix it!?!?!?!”

For the other things, it felt a little strange asking myself for permission to make a change to the environment and then having to consult myself to see what the effects might be (Change Advisory Board). Implementing the RACI (responsible, accountable, consulted, informed) matrix was pretty easy because I generally get along with myself.

To be honest, I wish I had been blogging back then, because I think it would have made for some interesting reading in retrospect. I’d like to say that I followed all the processes and ran a bulletproof network for the year, but I didn’t. Sometimes I slipped, made a change, and locked myself out of my own gear.

But on the bright side…  I did learn why change management is important.

Anyone else gone through an experiment like this? Anyone willing to take up the challenge and blog on the experience?

FCAPS – A Quick Introduction

It occurs to me that I’ve been writing the last few posts about network management tasks based on an ITSM model and I didn’t even introduce what is arguably the more useful model for breaking down and understanding network management tasks: the FCAPS model.

FCAPS has its roots in the ISO, similar to another model we all know and love: the OSI model. Everyone remember that one? Please Do Not Take Sales People’s Advice? You may have learned another mnemonic for it, but this is probably the most basic conceptual model that every networking person uses to understand the world we live in.

For those of you who are looking for some extra credit reading, or need a cure for insomnia, you can find the actual FCAPS standards in the ITU-T M.3400 recommendations. For the rest, I’m hoping to give a brief overview to help you understand the different aspects of the disciplines of network management.

F is for Fault

This involves the detection, isolation, and correction of a fault condition. Or in plain English, this lets you know when things are broken.

Fault Management could involve things like syslog messages and SNMP traps being escalated to alarms; root-cause analysis and alarm suppression, or some AI which tries to separate the signal from the noise during event storms; and alarm notification policies (sending out an e-mail once you get an alarm).

Traditionally this was implemented in a lot of NMSs as Green-is-good management. Basically, if everything is green, things are OK. If they are yellow or red, you’ve probably got a long night ahead of you.

In recent years, Fault Management has started to include application performance management as well. In modern networks, it’s not enough to know that an application is “up”. Now we must also make sure that the level of service, or SLA, that is being delivered to the end-user is adequate to meet their needs.
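
To make the alarm suppression idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Event` fields, the device names, and the 60-second window are made up, and a real NMS does far more than this (topology-aware root cause, for one). It only shows the basic trick of not raising the same alarm over and over during an event storm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since an arbitrary start time
    device: str       # hypothetical device name
    condition: str    # e.g. "linkDown"


def suppress_duplicates(events, window_seconds=60.0):
    """Return only the events that should raise an alarm.

    An event is suppressed when the same (device, condition) pair already
    raised an alarm less than window_seconds earlier.
    """
    last_alarm = {}  # (device, condition) -> timestamp of the last alarm raised
    alarms = []
    for event in sorted(events, key=lambda e: e.timestamp):
        key = (event.device, event.condition)
        previous = last_alarm.get(key)
        if previous is None or event.timestamp - previous >= window_seconds:
            alarms.append(event)
            last_alarm[key] = event.timestamp
    return alarms


# A small, made-up event storm: one flapping link plus one unrelated event.
storm = [
    Event(0.0, "core-sw1", "linkDown"),
    Event(5.0, "core-sw1", "linkDown"),
    Event(12.0, "core-sw1", "linkDown"),
    Event(20.0, "edge-rtr2", "linkDown"),
    Event(90.0, "core-sw1", "linkDown"),
]
alarms = suppress_duplicates(storm)
for alarm in alarms:
    print(alarm.timestamp, alarm.device, alarm.condition)
```

Five events go in and three alarms come out: the repeats from core-sw1 inside the window are dropped, while the one that arrives 90 seconds later is treated as a new alarm.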

Note: Whether an activity falls into one category of FCAPS or another might depend on your perspective. If you are measuring bandwidth on a particular port, you may be in the “P”, but if you are measuring the bandwidth and raising an alarm if you cross a certain threshold, you’re now in the “F”.

This may seem confusing at first, but remember that FCAPS is just a conceptual model. This is similar to the 7-layer OSI model. Ask any good network person what layer MPLS falls at and they will either answer “It depends” or potentially “2.5”.
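
The bandwidth example from the note can be sketched in a few lines of Python. The counter values, the 100 Mbit/s link speed, and the 80% threshold are all assumptions I picked for illustration, and the sketch ignores counter wrap. The point is only that the first function is pure measurement (the “P”) and the second one is where it turns into an alarm decision (the “F”).

```python
def utilization_percent(octets_start, octets_end, interval_seconds, link_bps):
    """Percent utilization from two octet-counter readings (the "P" part)."""
    bits = (octets_end - octets_start) * 8
    return 100.0 * bits / (interval_seconds * link_bps)


def check_threshold(utilization, threshold_percent=80.0):
    """Turn a measurement into an alarm decision (the "F" part)."""
    return utilization > threshold_percent


# Hypothetical readings: two samples of an interface octet counter taken
# 300 seconds apart on a 100 Mbit/s link.
util = utilization_percent(1_000_000_000, 4_375_000_000, 300, 100_000_000)
print(f"utilization: {util:.1f}%")
if check_threshold(util):
    print("ALARM: utilization above threshold")
```

Same data, same port, and all that changed between the two categories is whether anybody gets woken up.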

C is for Configuration

This involves the configuration of the software and hardware in the network. This includes the versions of software, the actual configurations, change management, etc…

This is probably the easiest to understand. If you’re upgrading code on a switch or router, if you’re logging into a router to make a configuration change, or if you’re just plugging a network cable into a PC, you’re in the “C”s.

A is for Accounting

This involves the identification of cost to the service provider and payment due for the customer, i.e., billing.

Personally, I find this definition a little restrictive and prefer to apply the definition that I heard in a presentation. I wish I could remember the name of the gentleman to give him credit. He started out in a thick southern drawl:

“The thing to remember about a’counting, is that the rest of the world just calls it counting.”

I know. Barely funny, right?

But it does allow us to use this to include things like

  • netflow for counting the different protocols running across a certain WAN link.
  • SNMP polling of T1/PRI interfaces for ensuring that your Erlang calculations are accurate and you don’t need to raise or lower the number of trunks on your voice gateways.
  • RADIUS to track how long a user was logged into a specific port on the network or how much bandwidth he actually used.

You get the picture. Basically, accounting is just counting things which might be interesting to you.
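
To show how literal the “it’s just counting” definition can be, here is a minimal Python sketch. The flow records are made up and already parsed into (protocol, bytes) pairs; it does not read real NetFlow exports, which would need a collector in front of it.

```python
from collections import Counter

# Hypothetical, already-parsed flow records: (protocol, bytes).
flows = [
    ("http", 1200),
    ("dns", 80),
    ("http", 5400),
    ("ssh", 300),
    ("dns", 120),
]

flow_count = Counter()  # number of flows seen per protocol
byte_count = Counter()  # total bytes seen per protocol
for protocol, size in flows:
    flow_count[protocol] += 1
    byte_count[protocol] += size

# Report the protocols from the heaviest to the lightest by bytes.
for protocol, total in byte_count.most_common():
    print(f"{protocol}: {flow_count[protocol]} flows, {total} bytes")
```

Whether you then attach a price to those numbers is what separates the strict definition from the relaxed one.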

Although this is not the strict definition from the ITU-T M.3400, this amended version makes it easier for me to apply because I don’t have very many customers (read: any) who actually do charge-backs for their services.

Obviously, with XaaS offerings, this domain is probably going to get a lot of attention in the coming years.

P is for Performance

This involves evaluating and reporting on the effectiveness of the network, and individual network devices.

Way back when I did my CCNA, one of the things I remember reading about was how you should be checking your routers and switches often to see if their CPU or memory was running high. I’ve never actually met anyone who logged into a device to check on a daily basis, but the advice was actually really good.

With a good NMS, you can

  • use SNMP polling for the CPU and memory to track their trending over time.
  • use ICMP to track availability of the devices (assuming they respond!)
  • use ICMP to track the latency of the device to test the quality of the link.

As I mentioned in the Fault section, performance often blurs with fault in that good performance management habits can alert you to faults in the network. In fact, good performance management can even allow you to proactively avoid faults by identifying a potential performance bottleneck in the network and addressing the issue before it turns into a fault.
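
As a rough illustration of the “proactive” part, here is a small Python sketch that fits a straight line to some CPU samples and estimates when the trend would hit a threshold. The daily samples and the 90% threshold are invented, and a straight line is a big simplifying assumption; real utilization is rarely this polite.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def day_threshold_reached(threshold, xs, ys):
    """Day on which the fitted line reaches threshold, or None if not rising."""
    slope, intercept = linear_fit(xs, ys)
    if slope <= 0:
        return None
    return (threshold - intercept) / slope


# Hypothetical daily CPU averages (percent) for one router.
days = [0, 1, 2, 3, 4]
cpu = [40.0, 42.0, 44.0, 46.0, 48.0]
estimate = day_threshold_reached(90.0, days, cpu)
print(f"trend reaches 90% around day {estimate:.0f}")
```

With these made-up numbers the CPU climbs two points a day, so the line crosses 90% around day 25, which is plenty of notice to do something about it.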

Probably the most important thing to know about performance management is that it helps you make better decisions.

Most good network engineers can instinctively know where the bottlenecks are in their networks and can usually correctly identify what needs to be upgraded to get the most benefit.

Most great network engineers can use the pretty graphs from a good performance management tool to get the money from their CFO for those upgrades.

In my home network, I actually track the response time of all my links, as well as additional services, such as the one below which allows me to keep my wife happy.

Facebook Response Time Performance Tracking

Note: probably the most recognizable performance management tool would be MRTG/PRTG. I can’t even imagine how many network upgrades were justified by the pretty graphs that came out of these tools.

S is for Security

Security is… well, security. These are the network management activities that involve securing the network and the data running over it.

In a lot of ways, I strongly believe that security should be addressed in every waking (and sleeping!) moment that you’re thinking about your networks. Security should become so second nature to us that it should be almost impossible to perform any of the other tasks without security entering the conversation.

What do I mean?

Fault – CIA – Confidentiality, Integrity, and Availability. Hard to be secure when it’s not available and the Fault domain helps us keep it that way!

Configuration – Auditing – Good configuration management practices can involve automated IT Control objective verification tools, otherwise known as “scripts” which will allow us to have the NMS ensure all the configurations are what they should be and no unneeded services are on our routers and switches.
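
Here is what one of those “scripts” might look like at its very simplest, in Python. The forbidden list is an illustrative policy I made up using IOS-style commands, and the sample configuration is invented. A real audit would pull configs from the devices or from a config archive and would need smarter matching than exact lines.

```python
# Illustrative policy only: IOS-style lines we do not want in a device config.
FORBIDDEN_LINES = {
    "ip http server",
    "service tcp-small-servers",
    "service udp-small-servers",
}


def audit_config(config_text):
    """Return the forbidden lines found in a configuration, sorted."""
    lines = {line.strip() for line in config_text.splitlines()}
    return sorted(lines & FORBIDDEN_LINES)


# Made-up configuration snippet for a lab router.
sample_config = """\
hostname lab-rtr1
service udp-small-servers
ip http server
no ip domain-lookup
"""

violations = audit_config(sample_config)
for line in violations:
    print(f"unneeded service enabled: {line}")
```

Because it matches whole lines, “no ip http server” does not trip the check, which is the behavior you want from even the dumbest audit.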

Performance – You can’t get performance data without SNMP, and if you’re using SNMP, PLEASE USE SNMPv3 if possible! It supports authentication and integrity checking, and the traffic can be encrypted. Also, lock down your management interfaces with ACLs on your devices.

FCAPS – It’s just a model

Please don’t take it too seriously. It’s not a binary model. Feel free to apply some fuzzy logic here and be confident that it’s 46% Fault Management and 54% Performance Management.

The important thing here is that it helps us understand the network management world we live in. It gives us a conceptual model to be able to understand the different activities involved in network management. As an added bonus, it also gives us a handy tool to evaluate different NMS software packages.

Think about the tools you’re using. Are you using a point solution, like SolarWinds Orion NPM which focuses on Performance monitoring, or an Open Source tool like RANCID which focuses on Configuration?

Or are you looking at a SPOG (single pane of glass) solution like HP’s IMC which provides full FCAPS (and more!) in the base package?

What tools are you using? Are they full FCAPS? Or are they more focused on one particular area?