It occurs to me that I’ve written the last few posts about network management tasks based on an ITSM model without ever introducing what is arguably the more useful model for breaking down and understanding network management tasks: the FCAPS model.
FCAPS has its roots in the ISO, similar to another model we all know and love: the OSI model. Everyone remember that one? Please Do Not Take Sales People’s Advice? You may have learned another mnemonic for it, but it is probably the most basic conceptual model that every networking person uses to understand the world we live in.
For those of you who are looking for some extra credit reading, or need a cure for insomnia, you can find the actual FCAPS standards in the ITU-T M.3400 recommendations. For the rest, I’m hoping to give a brief overview to help you understand the different aspects of the disciplines of network management.
F is for Fault
This involves the detection, isolation, and correction of a fault condition. Or in plain English, this lets you know when things are broken.
Fault Management could involve things like syslog messages and SNMP traps being escalated to alarms; root-cause analysis and alarm suppression, or some AI which tries to separate the signal from the noise during event storms; and alarm notification policies (sending out an e-mail once you get an alarm).
Traditionally this was implemented in a lot of NMSs as green-is-good management. Basically, if everything is green, things are OK. If they are yellow or red, you’ve probably got a long night ahead of you.
In recent years, Fault Management has started to include application performance management as well. In modern networks, it’s not enough to know that an application is “up”. Now we must also make sure that the level of service, or SLA, that is being delivered to the end-user is adequate to meet their needs.
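To make the escalation and suppression idea a bit more concrete, here is a minimal sketch. The event format, the severity names, and the 60-second suppression window are all assumptions made up for the example; a real NMS will have its own event model and much smarter correlation.

```python
from dataclasses import dataclass

# Hypothetical severity ordering; real NMS products define their own.
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since some arbitrary start
    device: str
    message: str
    severity: str     # one of the keys in SEVERITY_RANK


def escalate(events, min_severity="warning", window=60.0):
    """Return the events that should become alarms.

    An event is escalated when its severity is at least min_severity and
    no alarm for the same (device, message) pair was raised within the
    last `window` seconds. Everything else is treated as noise.
    """
    last_alarm = {}  # (device, message) -> timestamp of the last alarm raised
    alarms = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        if SEVERITY_RANK[ev.severity] < SEVERITY_RANK[min_severity]:
            continue  # below the escalation threshold
        key = (ev.device, ev.message)
        previous = last_alarm.get(key)
        if previous is not None and ev.timestamp - previous < window:
            continue  # duplicate inside the suppression window
        last_alarm[key] = ev.timestamp
        alarms.append(ev)
    return alarms


# A tiny, made-up event storm.
storm = [
    Event(0.0, "core-sw1", "link down Gi1/0/1", "critical"),
    Event(5.0, "core-sw1", "link down Gi1/0/1", "critical"),
    Event(10.0, "edge-rtr2", "config saved", "info"),
    Event(90.0, "core-sw1", "link down Gi1/0/1", "critical"),
]
for alarm in escalate(storm):
    print(alarm.timestamp, alarm.device, alarm.message)
```

The repeat at 5 seconds is suppressed, the informational event is never escalated, and the repeat at 90 seconds raises a fresh alarm because it falls outside the window.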
Note: Whether an activity falls into one category of FCAPS or another might depend on your perspective. If you are measuring bandwidth on a particular port, you may be in the “P”, but if you are measuring the bandwidth and raising an alarm if you cross a certain threshold, you’re now in the “F”.
This may seem confusing at first, but remember that FCAPS is just a conceptual model. This is similar to the 7-layer OSI model. Ask any good network person what layer MPLS falls at and they will either answer “It depends” or potentially “2.5”.
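The bandwidth example from the note can be sketched in a few lines. This is only an illustration: the counter readings, the 100 Mb/s link speed, and the 80% threshold are assumed values, and counter wrap is ignored for simplicity.

```python
def utilization_percent(octets_start, octets_end, seconds, link_bps):
    """Percent utilization of a link from two byte-counter readings.

    Assumes the counter did not wrap between the two readings.
    """
    bits = (octets_end - octets_start) * 8
    return 100.0 * bits / (seconds * link_bps)


def check_threshold(utilization, threshold=80.0):
    """Return an alarm string if utilization reaches the threshold, else None."""
    if utilization >= threshold:
        return f"ALARM: utilization {utilization:.1f}% >= {threshold:.1f}%"
    return None


# "P": just measuring. Hypothetical readings taken 300 s apart on a 100 Mb/s port.
util = utilization_percent(1_000_000_000, 4_375_000_000, 300, 100_000_000)
print(f"utilization: {util:.1f}%")

# "F": the very same number, but now we raise an alarm on it.
print(check_threshold(util))
```

The measurement function lives in the “P”; the moment its result is fed into `check_threshold`, you have crossed into the “F”.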
C is for Configuration
This involves the configuration of the software and hardware in the network. This includes the versions of software, the actual configurations, change management, etc…
This is probably the easiest to understand. If you’re upgrading code on a switch or router, if you’re logging into a router to make a configuration change, or if you’re just plugging a network cable into a PC, you’re in the “C”s.
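For the change management side of things, a lot of the value comes from simply knowing what changed. Here is a minimal sketch using Python’s standard `difflib` module; the two configuration snippets are invented for the example, and how you actually collect the configurations is left out.

```python
import difflib


def config_diff(old_config, new_config):
    """Return a unified diff between two configuration texts as a list of lines."""
    return list(
        difflib.unified_diff(
            old_config.splitlines(),
            new_config.splitlines(),
            fromfile="before",
            tofile="after",
            lineterm="",
        )
    )


# Hypothetical snippets of a device configuration before and after a change.
before = "hostname edge-rtr1\nip http server\nsnmp-server community public RO\n"
after = "hostname edge-rtr1\nno ip http server\nsnmp-server community public RO\n"

for line in config_diff(before, after):
    print(line)
```

An empty diff means nothing changed, which is often exactly what you want to hear.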
A is for Accounting
This involves the identification of cost to the service provider and payment due from the customer, i.e., billing.
Personally, I find this definition a little restrictive and prefer to apply the definition that I heard in a presentation. I wish I could remember the name of the gentleman to give him credit. He started out in a thick Southern drawl:
The thing to remember about a’counting, is that the rest of the world just calls it counting.
I know. Barely funny, right?
But it does allow us to use this to include things like:
- NetFlow for counting the different protocols running across a certain WAN link.
- SNMP polling of T1/PRI interfaces for ensuring that your Erlang calculations are accurate and you don’t need to raise or lower the number of trunks on your voice gateways.
- RADIUS to track how long a user was logged into a specific port on the network or how much bandwidth he actually used.
You get the picture. Basically, accounting is just counting things which might be interesting to you.
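Two of the items in the list above can be sketched quickly. The first function just counts bytes per protocol from flow records; the second computes the Erlang B blocking probability using the standard recurrence. The flow record format and the traffic figures are assumptions for the example, not output from any real collector.

```python
from collections import Counter


def bytes_per_protocol(flows):
    """Sum byte counts per protocol from (protocol, bytes) flow records."""
    totals = Counter()
    for protocol, byte_count in flows:
        totals[protocol] += byte_count
    return totals


def erlang_b(traffic_erlangs, trunks):
    """Blocking probability for offered traffic on a number of trunks (Erlang B).

    Uses the recurrence B(0) = 1, B(n) = A*B(n-1) / (n + A*B(n-1)),
    where A is the offered traffic in erlangs.
    """
    blocking = 1.0
    for n in range(1, trunks + 1):
        blocking = traffic_erlangs * blocking / (n + traffic_erlangs * blocking)
    return blocking


# Hypothetical flow records: (protocol, bytes).
flows = [("https", 1200), ("dns", 80), ("https", 800), ("ssh", 300)]
print(bytes_per_protocol(flows).most_common())

# Hypothetical voice load: 10 erlangs offered to 24 channels.
print(f"blocking with 24 trunks: {erlang_b(10.0, 24):.4%}")
```

If the measured traffic from your polling drifts away from what you fed into the Erlang calculation, that is your cue to revisit the trunk count.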
Although this is not the strict definition from ITU-T M.3400, this amended version makes it easier for me to apply because I don’t have very many customers (read: any) who actually do charge-backs for their services.
Obviously, with XaaS offerings, this domain is probably going to get a lot of attention in the coming years.
P is for Performance
This involves evaluating and reporting on the effectiveness of the network, and individual network devices.
Way back when I did my CCNA, one of the things I remember reading about was how you should be checking your routers and switches often to see if their CPU or memory was running high. I’ve never actually met anyone who logged into a device to check on a daily basis, but the advice was actually really good.
With a good NMS, you can
- use SNMP polling for the CPU and memory to track their trending over time.
- use ICMP to track availability of the devices (assuming they respond!)
- use ICMP to track the latency of the device to test the quality of the link.
As I mentioned in the Fault section, performance often blurs with fault in that good performance management habits can alert you to faults in the network. In fact, good performance management can even allow you to proactively avoid faults by identifying a potential performance block in the network, and addressing the issue before it turns into a fault.
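As a rough illustration of getting ahead of a fault, here is a sketch that fits a straight line to polled values and estimates when they would reach a threshold. The CPU samples and the 90% threshold are invented, and a straight-line fit is a simplifying assumption; real utilization rarely behaves this politely.

```python
def linear_trend(samples):
    """Least-squares slope and intercept for (time, value) samples.

    Needs at least two samples taken at different times.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    denom = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / denom
    intercept = mean_v - slope * mean_t
    return slope, intercept


def time_to_threshold(samples, threshold):
    """Time at which the fitted line reaches threshold, or None if it never does."""
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None  # a flat or falling trend never climbs to the threshold
    return (threshold - intercept) / slope


# Hypothetical daily CPU utilization polls: (day, percent).
cpu = [(0, 40.0), (1, 42.0), (2, 44.0), (3, 46.0), (4, 48.0)]
print(time_to_threshold(cpu, 90.0))
```

The result is in the same time units as the samples, so here it is a day number you can put in front of whoever approves the upgrade.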
Probably the most important thing to know about performance management is that it helps you make better decisions.
Most good network engineers can instinctively know where the bottlenecks are in their networks and can usually correctly identify what needs to be upgraded to get the most benefit.
Most great network engineers can use the pretty graphs from a good performance management tool to get the money from their CFO for those upgrades.
In my home network, I actually track the response time of all my links, as well as additional services, such as the one below which allows me to keep my wife happy.
Note: probably the most recognizable performance management tool would be MRTG/PRTG. I can’t even imagine how many network upgrades were justified by the pretty graphs that came out of these tools.
S is for Security
Security is… well, security. These are the network management activities that involve securing the network and the data running over it.
In a lot of ways, I strongly believe that security should be addressed in every waking (and sleeping!) moment that you’re thinking about your networks. Security should become so second nature to us that it should be almost impossible to perform any of the other tasks without security entering the conversation.
What do I mean?
Fault – CIA – Confidentiality, Integrity, and Availability. Hard to be secure when it’s not available and the Fault domain helps us keep it that way!
Configuration – Auditing – Good configuration management practices can involve automated IT Control objective verification tools, otherwise known as “scripts” which will allow us to have the NMS ensure all the configurations are what they should be and no unneeded services are on our routers and switches.
Performance – You can’t get performance data without SNMP, and if you’re using SNMP, PLEASE USE SNMPv3 if possible! It supports both encryption and integrity checking. Also, lock down your management interfaces with ACLs on your devices.
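The configuration auditing idea above can be as simple as checking every configuration against a list of things that should never be there. This is only a sketch: the forbidden lines are a made-up policy, and exact line matching is a deliberate simplification.

```python
# Hypothetical policy: lines that must not appear in any device configuration.
FORBIDDEN_LINES = [
    "ip http server",
    "service finger",
    "snmp-server community public RO",
]


def audit_config(config_text, forbidden=FORBIDDEN_LINES):
    """Return the forbidden lines found in a configuration, in policy order.

    Matches whole lines only, so "no ip http server" does not trigger
    the "ip http server" rule.
    """
    present = {line.strip() for line in config_text.splitlines()}
    return [rule for rule in forbidden if rule in present]


# Hypothetical running configuration pulled from a router.
running = """hostname edge-rtr1
no ip http server
snmp-server community public RO
"""
print(audit_config(running))
```

Run something like this across every device on a schedule and the audit more or less takes care of itself.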
It’s just a model
Please don’t take it too seriously. It’s not a binary model. Feel free to apply some fuzzy logic here and be confident that it’s 46% Fault Management and 54% Performance Management.
The important thing here is that it helps us understand the network management world we live in. It gives us a conceptual model to be able to understand the different activities involved in network management. As an added bonus, it also gives us a handy tool to evaluate different NMS software packages.
Think about the tools you’re using. Are you using a point solution, like SolarWinds Orion NPM, which focuses on performance monitoring, or an open source tool like RANCID, which focuses on configuration?
Or are you looking at a SPOG (single pane of glass) solution like HP’s IMC which provides full FCAPS (and more!) in the base package?
What tools are you using? Are they full FCAPS? Or are they more focused on one particular area?