Shedding the Lights on Operations: REST, an NMS and a Lightbulb

It’s obvious I’ve caught the automation bug. Beyond just automating the network I’ve finally started to dip my toes in the home automation pool as well.

The latest addition to the home project was a set of Philips Hue light bulbs. Basically, I just wanted a new toy, but imagine my delight when I found that there’s a full REST API available.

I’ve got a REST API and a light bulb and suddenly I was inspired!

The Project

Network Management Systems have long suffered from information overload.

Notifications have to be tuned, and if you’re really good you can eventually get the stream down to a dull roar. Unfortunately, the notification process is still broken: the notifications are generally dumped into your email inbox which, if you are anything like me…

NewImage

Yes. That’s really my number as of this writing

One of the ways of dealing with the deluge is to use a different medium to deliver the message. Many NMS systems, including HPE IMC, have the capability of issuing audio alarms, but let’s be honest. That can get pretty annoying as well and it’s pretty easy to mute them.

I decided that I would use the REST interfaces of the HPE IMC NMS and the Philips Hue lightbulbs to provide a visual indication of the general state of the system. Yes, there’s a valid, business-justifiable reason for doing this. But c’mon, we’re friends?  The real reason I worked on this was because they both have REST APIs and I was bored. So why not, right?

The other great thing about this is that you don’t need to spend your day looking at a NOC screen. You can log in when the light goes to whatever color you decide is bad.

Getting Started with Philips Hue API

The Philips SDK getting started guide was actually really easy to work through. There’s also an embedded HTML interface that allows you to play around with the REST API directly on the Hue bridge.

Once you’ve set up your initial authentication to the bridge ( see the getting started guide ) you can log in to the bridge at http://ip_address/debug/clip.html

From there it’s all fun and games. For instance, if you wanted to see the state of light number 14, you would navigate to api/%app_name%/lights/14 and you would get back the following in nice easy to read JSON.

http://ipaddress/debug/clip.html/

NewImage

From here, it would be fairly easy to use an HTTP library like Requests to start issuing HTTP commands at the bridge but, as I’m sure you’re aware by now, there’s very little untrodden territory in the land of Python.
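For the curious, here is a minimal sketch of what talking to the bridge directly might look like using only the standard library. It builds the requests but does not send them. The bridge address, the `app_name` username and the helper names are placeholders invented for illustration, and the `/state` endpoint with `on` and `hue` fields is an assumption based on the Hue bridge API rather than something shown in this post.

```python
import json
from urllib.request import Request


def light_url(bridge_ip, app_name, light_id):
    """Build the URL for a single light on the bridge."""
    return "http://{}/api/{}/lights/{}".format(bridge_ip, app_name, light_id)


def build_state_request(bridge_ip, app_name, light_id, **state):
    """Prepare (but do not send) a PUT that changes a light's state."""
    body = json.dumps(state).encode("utf-8")
    return Request(light_url(bridge_ip, app_name, light_id) + "/state",
                   data=body, method="PUT")


# Placeholder values: substitute your own bridge IP and registered username.
req = build_state_request("192.0.2.10", "app_name", 14, on=True, hue=25500)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Actually sending it would be: urllib.request.urlopen(req)
```

Keeping the request construction separate from the sending makes it easy to check what would go over the wire before pointing it at a real bridge.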

PHUE library

Of course someone has been here before me and has written a nice library that works with both Python 2 and Python 3.  You can see the library source code here, or you can simply run

$ pip install phue

from your terminal.

The Proof of Concept

You can check out the code for the proof of concept here. Or you can watch the video below.

Breaking down the code

1) Grab Current Alarm List

2) Iterate over the Alarms and find the one with the most severe alarm state

3) Create a function to correlate the alarm state to the color of the Philips Hue lightbulb.

4) Wait for things to move away from green.
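Steps 2 and 3 above can be sketched roughly as follows. This is not the proof-of-concept code itself, just an illustration under stated assumptions: each alarm is a dictionary with an `alarmLevel` key where 1 is the most severe, and the hue numbers are example picks on the bridge's 0 to 65535 scale. The function names are hypothetical.

```python
# Assumed severity scale: 1 = critical ... 5 = info (lower is worse).
# Hue values run 0-65535 on the bridge; 0 is red and 25500 is green.
GREEN = 25500
HUE_BY_SEVERITY = {
    1: 0,       # critical -> red
    2: 5000,    # major    -> somewhere between red and yellow
    3: 12750,   # minor    -> yellow
    4: 25500,   # warning  -> green
    5: 25500,   # info     -> green
}


def worst_severity(alarms):
    """Return the most severe (lowest) level in the alarm list, or None."""
    levels = [int(alarm["alarmLevel"]) for alarm in alarms if "alarmLevel" in alarm]
    return min(levels) if levels else None


def hue_for_alarms(alarms):
    """Translate the worst alarm into a hue value; no alarms means green."""
    return HUE_BY_SEVERITY.get(worst_severity(alarms), GREEN)


sample = [{"alarmLevel": "4"}, {"alarmLevel": "2"}]
print(hue_for_alarms(sample))
# With phue, the result could then be pushed to a bulb along the lines of:
#   Bridge(bridge_ip).set_light(light_id, 'hue', hue_for_alarms(alarms))
```

Keeping the mapping in one dictionary means the colour tinkering happens in a single place.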

Lessons Learned

The biggest lesson here was that colours on a screen and colours on a light bulb don’t translate very well. The green and the yellow lights weren’t far enough apart to be useful as a visual indicator of the health of the network, at least not IMHO.

The other thing I learned is that you can waste a lot of time working on aesthetics. Because I was leveraging the PHUE library and the PYHPEIMC library, 99% of the code was already written. The project probably took me less than 10 minutes to get the logic together and more than a few hours playing around with different colour combinations to get something that I was at least somewhat ok with. I imagine the setting and the ambient light would very much affect whether or not this looks good in your place of business. If you use my code, you’ll want to tinker with it.

Where to Next

We see IoT devices all over in our personal lives, but it’s interesting to me that I could set up a visual indicator for a NOC environment on network health state for less than $100.  Just thinking about some of the possibilities here:

  • Connect each NOC agent’s ticket queue with the light color. Once they are assigned a ticket, they go orange for DO-NOT-DISTURB
  • Connect the app to a Clearpass authentication API and flash the bulbs blue when the boss walks in the building. Always good to know when you should be shutting down solitaire and looking like you’re doing something useful, right?
  • Connect the app to a Meridian location API and turn all the lights green when the boss walks on the floor.

Now I’m not advocating you should hide things from your boss, but imagine how much faster network outages would get fixed if we didn’t have to stop fixing them to explain to our boss what was happening and what we were going to be doing to fix them, right?

Hopefully, this will have inspired someone to take the leap and try something out.

Comments, questions?

@netmanchris

Serial numbers how I love thee…

No one really likes serial numbers, but keeping track of them is one of the “brushing your teeth” activities that everyone needs to take care of. It’s like eating your Brussels sprouts. Or listening to your mom. You’re just better off if you do it quickly as it just gets more painful over time.

Not only is it just good hygiene, but you may be subject to regulations, like eRate in the United States where you have to be able to report on the location of any device by serial number at any point in time.

Trust me, having to play hide-and-go seek with an SSH session is not something you want to do when government auditors are looking for answers.

I’m sure you’ve already guessed what I’m about to say, but I’ll say it anyway…

There’s an API for that!!!

HPE IMC base platform has a great network assets function that automatically gathers all the details of your various devices, assuming of course they support RFC 4133, otherwise known as the Entity MIB. On the bright side, most vendors have chosen to support this standards-based MIB, so chances are you’re in good shape.

And if they don’t support it, they really should. You should ask them. Ok?

So without further ado, let’s get started.

 

Importing the required libraries

I’m sure you’re getting used to this part, but it’s important to know where to look for these different functions. In this case, we’re going to look at a new library that is specifically designed to deal with network assets, including serial numbers.

In [1]:
from pyhpeimc.auth import *
from pyhpeimc.plat.netassets import *
import csv
In [2]:
auth = IMCAuth("http://", "10.101.0.203", "8080", "admin", "admin")
In [3]:
ciscorouter = get_dev_asset_details('10.101.0.1', auth.creds, auth.url)
 

How many assets in a Cisco Router?

As some of you may have heard, HPE IMC is a multi-vendor tool and offers support for many of the common devices you’ll see in your daily travels.

In this example, we’re going to use a Cisco 2811 router to showcase the basic function.

Routers, like chassis switches, have multiple components. As anyone who’s ever been the victim, er, owner of a SMARTnet contract will know, individual components have serial numbers as well, and all of them have to be reported for them to be covered. So let’s see if we managed to grab all of those by first checking how many individual items we got back in the asset list for this Cisco router.

In [4]:
len(ciscorouter)
Out[4]:
7
 

What’s in the box???

Now that we’ve got an idea of how many assets are in here, let’s take a look at exactly what’s in one of the asset records to see if there’s anything useful.

In [5]:
ciscorouter[0]
Out[5]:
{'alias': '',
 'asset': 'http://10.101.0.203:8080/imcrs/netasset/asset/detail?devId=15&phyIndex=1',
 'assetNumber': '',
 'boardNum': 'FHK1119F1DX',
 'bom': '',
 'buildInfo': '',
 'cleiCode': '',
 'containedIn': '0',
 'desc': '2811 chassis',
 'devId': '15',
 'deviceIp': '10.101.0.1',
 'deviceName': 'router.lab.local',
 'firmwareVersion': 'System Bootstrap, Version 12.4(13r)T11, RELEASE SOFTWARE (fc1)',
 'hardVersion': 'V04 ',
 'isFRU': '2',
 'mfgName': 'Cisco',
 'model': 'CISCO2811',
 'name': '2811 chassis',
 'phyClass': '3',
 'phyIndex': '1',
 'physicalFlag': '0',
 'relPos': '-1',
 'remark': '',
 'serialNum': 'FHK1119F1DX',
 'serverDate': '2016-01-26T15:20:40-05:00',
 'softVersion': '15.1(4)M, RELEASE SOFTWARE (fc1)',
 'vendorType': '1.3.6.1.4.1.9.12.3.1.3.436'}
 

What can we do with this?

With some basic Python string manipulation we could easily print out some of the attributes that we want into what could easily turn into a nicely formatted report.

Again, realise that the example below is just a subset of what’s available in the JSON above. If you want more, just add it to the list.

In [7]:
for i in ciscorouter:
    print ("Device Name: " + i['deviceName'] + " Device Model: " + i['model'] +
           "\nAsset Name is: " + i['name'] + " Asset Serial Number is: " +
           i['serialNum']+ "\n")
 
Device Name: router.lab.local Device Model: CISCO2811
Asset Name is: 2811 chassis Asset Serial Number is: FHK1119F1DX

Device Name: router.lab.local Device Model: VIC2-2FXO
Asset Name is: 2nd generation two port FXO voice interface daughtercard on Slot 0 SubSlot 2 Asset Serial Number is: FOC11063NZ4

Device Name: router.lab.local Device Model:
Asset Name is: 40GB IDE Disc Daughter Card on Slot 1 SubSlot 0 Asset Serial Number is: FOC11163P04

Device Name: router.lab.local Device Model:
Asset Name is: AIM Container Slot 0 Asset Serial Number is:

Device Name: router.lab.local Device Model:
Asset Name is: AIM Container Slot 1 Asset Serial Number is:

Device Name: router.lab.local Device Model:
Asset Name is: C2811 Chassis Slot 0 Asset Serial Number is:

Device Name: router.lab.local Device Model:
Asset Name is: C2811 Chassis Slot 1 Asset Serial Number is:

 

Why not just write that to disk?

Although we could go directly to the formatted report without a lot of extra work, we would be losing a lot of data which we may have use for later. Instead, why don’t we export all the available data from the JSON above into a CSV file which can later be opened in your favourite spreadsheet viewer and manipulated to your heart’s content.

Pretty cool, no?

In [9]:
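# Use the first record's keys as the CSV header row
keys = ciscorouter[0].keys()
# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open('ciscorouter.csv', 'w', newline='') as file:
    dict_writer = csv.DictWriter(file, keys)
    dict_writer.writeheader()
    dict_writer.writerows(ciscorouter)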
 

Reading it back

Now we’ll read it back from disk to make sure it worked properly. When working with data like this, I find it useful to think about who’s going to be consuming the data. For example, when looking at this, remember this is a CSV file which can be easily opened in Python, or something like Microsoft Excel, to manipulate further. It’s not really intended to be read by human beings in this particular format. You’ll need another program to consume and munge the data first to turn it into something human consumable.

In [12]:
with open('ciscorouter.csv') as file:
    print (file.read())
 
firmwareVersion,vendorType,phyIndex,relPos,boardNum,phyClass,softVersion,serverDate,isFRU,alias,bom,physicalFlag,deviceName,deviceIp,containedIn,cleiCode,mfgName,desc,name,hardVersion,remark,asset,model,assetNumber,serialNum,buildInfo,devId
"System Bootstrap, Version 12.4(13r)T11, RELEASE SOFTWARE (fc1)",1.3.6.1.4.1.9.12.3.1.3.436,1,-1,FHK1119F1DX,3,"15.1(4)M, RELEASE SOFTWARE (fc1)",2016-01-26T15:20:40-05:00,2,,,0,router.lab.local,10.101.0.1,0,,Cisco,2811 chassis,2811 chassis,V04 ,,http://10.101.0.203:8080/imcrs/netasset/asset/detail?devId=15&phyIndex=1,CISCO2811,,FHK1119F1DX,,15
,1.3.6.1.4.1.9.12.3.1.9.3.114,14,0,FOC11063NZ4,9,,2016-01-26T15:20:40-05:00,1,,,2,router.lab.local,10.101.0.1,13,,Cisco,2nd generation two port FXO voice interface daughtercard,2nd generation two port FXO voice interface daughtercard on Slot 0 SubSlot 2,V01 ,,http://10.101.0.203:8080/imcrs/netasset/asset/detail?devId=15&phyIndex=14,VIC2-2FXO,,FOC11063NZ4,,15
,1.3.6.1.4.1.9.12.3.1.9.15.25,30,0,FOC11163P04,9,,2016-01-26T15:20:40-05:00,1,,,2,router.lab.local,10.101.0.1,29,,Cisco,40GB IDE Disc Daughter Card,40GB IDE Disc Daughter Card on Slot 1 SubSlot 0,,,http://10.101.0.203:8080/imcrs/netasset/asset/detail?devId=15&phyIndex=30, ,,FOC11163P04,,15
,1.3.6.1.4.1.9.12.3.1.5.2,25,6,,5,,2016-01-26T15:20:40-05:00,2,,,0,router.lab.local,10.101.0.1,3,,Cisco,AIM Container Slot 0,AIM Container Slot 0,,,http://10.101.0.203:8080/imcrs/netasset/asset/detail?devId=15&phyIndex=25,,,,,15
,1.3.6.1.4.1.9.12.3.1.5.2,26,7,,5,,2016-01-26T15:20:40-05:00,2,,,0,router.lab.local,10.101.0.1,3,,Cisco,AIM Container Slot 1,AIM Container Slot 1,,,http://10.101.0.203:8080/imcrs/netasset/asset/detail?devId=15&phyIndex=26,,,,,15
,1.3.6.1.4.1.9.12.3.1.5.1,2,0,,5,,2016-01-26T15:20:40-05:00,2,,,0,router.lab.local,10.101.0.1,1,,Cisco,C2811 Chassis Slot,C2811 Chassis Slot 0,,,http://10.101.0.203:8080/imcrs/netasset/asset/detail?devId=15&phyIndex=2,,,,,15
,1.3.6.1.4.1.9.12.3.1.5.1,27,1,,5,,2016-01-26T15:20:40-05:00,2,,,0,router.lab.local,10.101.0.1,1,,Cisco,C2811 Chassis Slot,C2811 Chassis Slot 1,,,http://10.101.0.203:8080/imcrs/netasset/asset/detail?devId=15&phyIndex=27,,,,,15
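As a small illustration of that "another program" idea, the sketch below reads the CSV back with `csv.DictReader` and keeps only the rows that actually carry a serial number. The helper name is hypothetical, and an in-memory sample trimmed to two columns stands in for the real file so the snippet runs on its own.

```python
import csv
import io


def assets_with_serials(csv_file):
    """Yield (name, serialNum) for every row that actually has a serial number."""
    for row in csv.DictReader(csv_file):
        if row["serialNum"].strip():
            yield row["name"], row["serialNum"]


# Stand-in for open('ciscorouter.csv', newline=''), trimmed to two columns.
sample = io.StringIO(
    "name,serialNum\n"
    "2811 chassis,FHK1119F1DX\n"
    "AIM Container Slot 0,\n"
)
for name, serial in assets_with_serials(sample):
    print(name + ": " + serial)
```

Because `DictReader` keys each row by the header names, the column order in the file doesn't matter.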

 

What about all my serial numbers at once?

That’s a great question! I’m glad you asked. One of the most beautiful things about learning to automate things like asset gathering through an API is that it’s often not much more work to do something 1000 times than it is to do it a single time.

This time instead of using the get_dev_asset_details function that we used above which gets us all the assets associated with a single device, let’s grab ALL the devices at once.

In [13]:
all_assets = get_dev_asset_details_all(auth.creds, auth.url)
In [14]:
len (all_assets)
Out[14]:
1013
 

That’s a lot of assets!

Exactly why we automate things. Now let’s write the all_assets list to disk as well.

**Note: for reasons unknown to me at this time, although the majority of the assets have 27 different fields, a few of them actually have 28 different attributes. Something I’ll have to dig into later.

In [15]:
keys = all_assets[0].keys()
with open('all_assets.csv', 'w') as file:
    dict_writer = csv.DictWriter(file, keys)
    dict_writer.writeheader()
    dict_writer.writerows(all_assets)
 
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-e4c553049911> in <module>()
      3     dict_writer = csv.DictWriter(file, keys)
      4     dict_writer.writeheader()
----> 5     dict_writer.writerows(all_assets)

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/csv.py in writerows(self, rowdicts)
    156         rows = []
    157         for rowdict in rowdicts:
--> 158             rows.append(self._dict_to_list(rowdict))
    159         return self.writer.writerows(rows)
    160

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/csv.py in _dict_to_list(self, rowdict)
    147             if wrong_fields:
    148                 raise ValueError("dict contains fields not in fieldnames: "
--> 149                                  + ", ".join([repr(x) for x in wrong_fields]))
    150         return [rowdict.get(key, self.restval) for key in self.fieldnames]
    151

ValueError: dict contains fields not in fieldnames: 'beginDate'
 

Well That’s not good….

So it looks like there are a few network assets that have a different number of attributes than the first one in the list. We’ll write some quick code to figure out how big of a problem this is.

In [16]:
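print("The length of the first items keys is " + str(len(keys)))
# enumerate() hands us each index directly, so there is no need to search
# the list with .index(), which is slow and returns the first equal record
for index, asset in enumerate(all_assets):
    if len(asset) != len(keys):
        print("The length of index " + str(index) + " is " + str(len(asset)))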
 
The length of the first items keys is 27
The length of index 39 is 28
The length of index 41 is 28
The length of index 42 is 28
The length of index 474 is 28
The length of index 497 is 28
The length of index 569 is 28
The length of index 570 is 28
The length of index 585 is 28
The length of index 604 is 28
The length of index 605 is 28
The length of index 879 is 28
The length of index 880 is 28
The length of index 881 is 28
The length of index 882 is 28
The length of index 883 is 28
The length of index 884 is 28
The length of index 885 is 28
The length of index 886 is 28
 

Well that’s not so bad

It looks like the items which don’t have exactly 27 attributes have exactly 28 attributes. So we’ll just pick one of the longer ones to use as the headers for our CSV file and then run the script again.

For this one, I’m going to ask you to trust me that the file is on disk and save us all the trouble of having to print out 1013 separate assets into this blog post.

In [18]:
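# Use one of the 28-field records (index 879) for the header row
keys = all_assets[879].keys()
# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open('all_assets.csv', 'w', newline='') as file:
    dict_writer = csv.DictWriter(file, keys)
    dict_writer.writeheader()
    dict_writer.writerows(all_assets)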
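Picking a known-long record works here, but it relies on every 28-field record sharing the same extra field. A slightly more defensive variation, sketched below, builds the header from the union of every record's keys and lets `restval` fill in the blanks. The function name and the sample records are made up for illustration; only `beginDate` comes from the error above.

```python
import csv
import io


def write_assets(assets, out_file):
    """Write dicts with differing keys to CSV using the union of all keys."""
    fieldnames = sorted({key for asset in assets for key in asset})
    # restval is what gets written when a record lacks one of the columns
    writer = csv.DictWriter(out_file, fieldnames, restval="")
    writer.writeheader()
    writer.writerows(assets)
    return fieldnames


# Tiny made-up sample: the second record carries the extra 'beginDate' field.
sample = [
    {"name": "2811 chassis", "serialNum": "FHK1119F1DX"},
    {"name": "example module", "serialNum": "", "beginDate": "2016-01-26"},
]
buffer = io.StringIO()  # stand-in for open('all_assets.csv', 'w', newline='')
print(write_assets(sample, buffer))
print(buffer.getvalue())
```

Sorting the field names also gives a stable column order from run to run.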
 

What’s next?

So now that we’ve got all of our assets into a CSV file which is easily consumable by something like Excel, you can choose what to do with the data.

For me it’s interesting to see how vendors internally instrument their boxes. Some have serial numbers on power supplies and fans, some don’t. Some use the standard way of doing things. Some don’t.

From an operations perspective, not all gear is created equal and it’s nice to understand what’s supported when trying to make a purchasing choice for something you’re going to have to live with for the next few years.

If you’re looking at your annual SMARTnet upgrade, at least you’ve now got a way to easily audit all of your discovered environment and figure out what line cards need to be tied to a particular contract.

Or you could just look at another vendor who makes your life easier. Entirely your choice.

@netmanchris

Automating your NMS build – Part 4 Adding Custom Views

This is part four in a series of using python and the RESTful API to automate the configuration of the HP IMC network management station.  Like with all things, it’s easier to learn something when you’re able to find a good reason to use those skills. I decided to extend my python skills by figuring out how to use python to configure my NMS using the RESTful API. 

If you’re interested, check out the other posts in this series

Creating Operators

Adding Devices

Changing Device Categories

Adding Custom Views

In this post, we’re going to use the RESTful API to programmatically add a custom view, and then add devices to that custom view.  For those of you who don’t know, a custom view is simply a logical grouping of devices.  In IMC, custom views also form the basis of the topology maps. In fact, a custom view and a topology map are essentially the same object internally.  The difference is just whether you choose to look at it as a list of devices in the normal interface or choose to look at it in the typical topology/Visio format which we all know and might-not-love.

Why might we want to add a custom view programmatically, you ask? Well, the answer to that might be simply that we’re lazy and it’s easier and faster.  Or it might be that you don’t want to have to look through all the different devices, or, as in my case, that you simply want an excuse to extend your Python skills.  Whatever your reasons are, custom views are something that just don’t get used enough in my opinion.

Custom views are a great way to be able to zoom in on the status of a specific branch, a geographic area, maybe a logical grouping of devices that support a specific application?  It really doesn’t matter and the best part is that a single device can exist in multiple custom views at the same time, so there’s really no limit to how you put these views together. It all depends on what makes sense to you. 

 

The Code

This code is a little bit more complicated than some of the other examples we’ve looked at. We’re actually going to be using multiple functions together, but the logic should be pretty easy to follow.  I’m sure there are better ways to do this, but this seems to work for me.  I’m sure I’ll be back here in a year going “Why did I write it that way!?!?!?!?!?” but for now, I hope it’s simple enough for someone else to follow and possibly get inspired. 

The main function is really just calling the other functions which we will break down below

Step 1 – Create the Custom View

In this code, we’re simply going to use a small function that I created to gather the name of the custom view ( the view_name variable ) and then use that to create the JSON payload ( the payload variable ).  The other part of the JSON payload is the autoAddDevType variable, which is hard-coded to 0.  This could be used to have the system automatically add new devices of a given type, but I’m interested in doing the automation myself here. Finally, the function returns the view name, which will be used as the input for another function.  You can see this in the main function in the view_name = create_new_view() line.
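As a rough illustration of the payload described above (not the original function), the body could be built with the `json` module rather than by hand. The helper name is hypothetical, and whether the API expects `autoAddDevType` as the string "0" or the number 0 is an assumption here.

```python
import json


def build_view_payload(view_name):
    """Return the JSON body for a new custom view, using the fields named above."""
    # autoAddDevType stays hard-coded to 0, as described in the text
    # (sent as a string here, which is an assumption about the API).
    return json.dumps({"name": view_name, "autoAddDevType": "0"})


print(build_view_payload("Branch Office"))
```

Letting `json.dumps` do the quoting avoids broken payloads when a view name contains quotes or other special characters.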

Step 2 – Get Custom Views

For the next part, we are going to need to figure out what ID was assigned to this new view. To do this we’re going to have to go through a couple of steps. The first one is to ask the NMS to send us a list of all the known custom views. The following function will request the list, which will be returned as a JSON array, and then convert it over into a Python list of dictionaries so we can work with it natively as a Python object. You can see this in the main function in the view_list = get_custom_views() line.

Now that we have the view_list, which is the list of all the views, we’re going to have to find the ID for the new view that we just created.  We do that by using the get_view_id() function with the view_name as the input.  Essentially, this will look through each of the views in the view_list that was returned above and let us know when the ‘name’ value is equal to the view_name value that we captured above. Once it’s equal, we then return the ‘symbolId’, which is the internal unique numeric value assigned to this particular custom view. This is the number we’re going to use to identify the view that we want to add devices to.  Make sense? Now that we’ve got this number, we’re going to assign it to the object view_id for use later on.
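The lookup just described can be sketched in a few lines. This is an illustration of the logic rather than the original code: it assumes view_list is a list of dictionaries carrying the 'name' and 'symbolId' keys mentioned above, and the sample values are made up.

```python
def get_view_id(view_list, view_name):
    """Return the symbolId of the view whose name matches, or None if absent."""
    for view in view_list:
        if view.get("name") == view_name:
            return view.get("symbolId")
    return None


# Made-up sample shaped like the list of views described in the text.
view_list = [
    {"name": "Head Office", "symbolId": "1017"},
    {"name": "Branch Office", "symbolId": "1020"},
]
view_id = get_view_id(view_list, "Branch Office")
print(view_id)
```

Returning None when nothing matches gives the caller a chance to stop before trying to add devices to a view that doesn't exist.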

Note: I actually could have added the devices directly to the view in the original add_custom_view code above, but then I’d have to write the modify function later if I ever wanted to change or add new devices to the view. I’m trying to follow the DRY ( Don’t Repeat Yourself ) advice here, so I just write the modify function here and can then leverage it later without having to re-write the code.

Step 3 – Generate the Device List

We’re not going to spend too much time on this part as I’m essentially re-using the code from the Changing Device Categories blog.  It’s pretty straightforward. I’m using some user input to gather a list of devices that we want to add to this specific view and then capturing it in the dev_list object.  What’s cool about doing it this way is that the search will look through all the IP addresses assigned to your devices, not just the managed address which might come up in the NMS interface where you would normally perform this step. There are ways around that as well, but that’s another blog.

Step 4 – Add Devices to Existing Custom View

This last step is where things come together. We’re going to run the add_device_to_view(dev_list, view_id) function, which uses the dev_list generated in step 3 and the view_id that we captured at the end of step 2 as the inputs.  Essentially, we’re just saying here “add all the devices I want to the view I just created”.

Wrapping it Up

So this is just an example of how you can tie a few pieces of code together to help automate something that might otherwise take you a lot of manual labour. In this case, that labour would be a bunch of mouse clicks, and it would depend on you being able to manually identify all the devices you wanted to add to a specific view.  Personally, I’d rather leave the hard work to the computers and move on to something that requires my brain.

Network Management – How to get started

Network Management Skills

In the last few years, I’ve noticed that I’m a little different. It’s not just because I wear coloured socks or my hair looks like I style it after Albert Einstein.  I noticed that I’ve developed a different skill set than the majority of my pre-sales or post-sales network professional peers. What skills, you ask?  Network Management and Operations.

Why I chose to develop Network Management skills

About five years ago, I took a look at the market and thought “This stuff is complicated”.  Earth-shattering observation, right?   It sounds simple, but then I started looking at some of the tools we had at the time and I realized that NMS tools could really help to automate not only the information gathering, but also the configuration tasks in our networks.  At the time, we had a cool little tool called 3Com Network Director.  It ran on a single PC. No web interface and it really only managed 3Com gear. But it was better than running CLI commands all day long. And the monitoring aspects really helped my customers identify and resolve problems quickly.  This was a moment of inspiration for me.  I chose to develop skills in network management and operations.

Let me say that again.

I chose to develop skills in network management and operations.

I didn’t choose to develop skills in 3ND, or IMC, or Solarwinds, Cisco Prime, or any of the various other tools. Over time, I’ve gained experience on all of those products, but I would say my true value is having gone through the process to develop skills in the subdisciplines of network management. Learning a product is only a very small part of the whole domain.

What does that mean? 

It’s easy to learn a product. They have bells and whistles. Click this check box. Fill in this box. Etc.    Those skills are important. But they don’t help us understand how to apply the product to resolve our customers’ business challenges. They don’t help us understand when not to click that box. And they don’t help us to design a network management strategy, or to consult with our customers on operational efficiencies and what can be done to help increase their networks’ stability, to reduce MTTR, or to mitigate pressures put on the operations team. Learning the domain knowledge has helped me to understand WHY we have developed the product features and what they are to be used for.

My Learning Roadmap

To put it simply, I consumed everything I could on the subject. It’s amazing how much free information is out there if you set your mind on finding it. If anyone’s looking to increase their skills in this area, I’ve put together the following list of resources that have really helped me in this domain. I’ve tried to keep this out of vendor specific products, but I’m sure you’ll find that any product you choose will probably have training and learning resources around it as well. This list is in NO way exhaustive; there are a lot of resources out there. I highly encourage everyone to read, watch, and listen to as many of them as you can and to think about them critically.

Free Resources

Solarwinds SCP training  The Solarwinds SCP training is online and free. What I really liked about this training is that it’s really focused on network management, netman protocols, and the operational aspects of network management. There are, of course, some product specific aspects to the training, but in general this is a really good primer on network management in general. Oh… did I mention there’s a bunch of videos as well?  Great stuff to rip and put on your tablet when you’re stuck on a plane and you’ve seen all the movies. 

Solarwinds has also provided a bunch of whitepapers going further in depth on network management specific subjects which are a great reference.  If you’re interested there’s also the Solarwinds Certified Professional certification if you’re looking for a way to validate your knowledge.

The Information Technology Infrastructure Library (ITIL) is a compilation of IT service management practices compiled over the last 30 years. There’s a lot of great stuff in here. The books are expensive though.  There is an entire industry that’s sprung up around ITSM.  If you have some commute time to spare, I would highly suggest typing “ITSM” into your favourite podcast app, then sitting back and listening.

If you’re interested, there’s also the ITILv3 Foundations certification if you’re looking for a way to validate your knowledge.

Blogs and Podcasts

Social media is a great way to learn how people apply ITIL concepts to the real world. I particularly like http://www.itskeptic.org as it’s got a great following of a bunch of smart people who disagree on a regular basis. You never know whether the customer you’re going into is operating in a traditional ITIL-based ops model, using the Microsoft Operations Framework, or has moved on to Agile and DevOps. It’s good to have at least a cursory knowledge of all of these approaches to IT operations, not to mention traditional network management frameworks like FCAPS and eTOM.

Paid resources

Books are a great way to learn about network management and operations. Here’s my  abbreviated reading list. These are the reference books that sit on my shelf within easy reach. 

Network Maturity Model – This book is actually an academic thesis focused on trying to extend the CMMI models to network-specific capability maturity models. Of course, network operations is part of the capabilities of an organization, so there’s a lot of great content in here.  The book is definitely academic, but it’s got a LOT of great content in it, assuming you can get through all of the required footnotes and pointers to other academic works.

Fundamentals of EMS, NMS, and OSS/BSS – This book is wonderful. It covers all aspects of traditional telecom management from FCAPS to eTOM, as well as looking at OSS/BSS architectures which usually exist only in Service Provider networks. Great information in here.  My biggest problem with this book is the font size. I have glasses and it’s tiny.  Worth the effort to make it through, but plan on multiple reading sessions. This is not a book you’re going to get through in one sitting.

Network Management Fundamentals – Cisco Press book that’s a great read. A lot of information in here is covered in some of the other books. What I like about this is that it is written as an introduction to network management for people already working in the field. This is not an academic text.

Network Management: Accounting and Performance Strategies – Cisco Press book again. This one focuses strictly on performance management, focusing a lot on Netflow and how it can be applied to accounting and performance in large networks.

Performance and Fault Management – Cisco Press Book again. This is an older book, so the technologies discussed may not be as relevant as they once were. The nice thing though is that we’re talking about operational models and processes here, so the principles still apply. 

VoIP Performance Management and Optimization – Last Cisco Press book. This book looks at the operational aspects of VoIP/IP Telephone/Unified Communications networks specifically. There are a lot of very detailed recommendations in here that can be leveraged to give customer guidance on what they should be doing and what they should be monitoring. This book has helped me a few times when working with customer who have chosen to implement a dual-vendor strategy and want to have HP Intelligent Management Center managing and monitoring there Cisco Callmanager environment in addition to their network.

The Phoenix Project – This book is written as a novel to teach people about the DevOps movement. This is a MUST read for anyone interested in IT operations and the current trends in the industry. It will also give you a first-hand account of what many customers go through. Read it. Read it. Read it.

The Visible Ops – From the same authors as The Phoenix Project. This book tries to tie DevOps and ITIL together. Interesting read. Many people see DevOps and ITIL as opposite ends of the spectrum. Most have had a bad ITIL experience and now the pendulum swings in the other direction. Finding a happy middle is a good goal. I’m not sure they’ve hit the mark, but it’s a start.

Network Management: Principles and Practice – Expensive book. Good information, but the technology is also quite dated. The concepts and knowledge are great. Good diagrams, but it’s sometimes hard to get through the hubs and token ring.

Domain Related Knowledge

Network Management is really about ensuring stability and helping the business to meet its operational requirements with the greatest efficiency possible. In that light, it’s important to understand what some of those operational burdens are. In recent years, businesses have had a ton of GRC (governance, risk, and compliance) requirements put on the operations teams that threaten to break an already overloaded team. On the bright side, I believe that the requirements imposed through legislation and governance like SOX, COSO, PCI-DSS, HIPAA, Gramm-Leach-Bliley, etc. have actually forced network operations teams to get much tighter on their controls, pushing us toward more stable and secure networks.

Note: This list is US-specific. If international readers can post some examples in the comments section, I would be happy to add them to the list of references.

In my experience, one of the issues with GRC requirements in general is that they are very rarely descriptive of what actually needs to be done. They have generic statements like “monitor network access” or “secure the IT assets”.

ISACA noticed this and put together the COBIT framework which is a very detailed list of over 30 high-level processes and over 200 specific IT control objectives. Most of the GRC requirements can be mapped to specific COBIT objectives. COBIT is a good thing to be familiar with. 

 Next Steps

As we move forward, operations and orchestration skills are becoming some of the hottest requirements in IT.

Whether it’s products like HP’s CloudSystem, or industry-wide projects like OpenStack, CloudStack or Eucalyptus, having solid operational knowledge and skills is going to be a requirement for anyone seeking the coveted Trusted IT Advisor role with any customer.

For anyone looking to gain or just brush up on their network management skills, I would recommend:

  • Solarwinds videos as a place to get started with the basics of network management
  • Become familiar with the basics of COBIT and GRC in general. Doing some reading on the various GRC requirements that apply to your specific regions and customers is also a great way to change the conversation from speeds and feeds to the challenges of the business.
  • Read on OpenStack
  • Learn about ITIL and DevOps

Social media is always a great way to stay current as well. One of the biggest challenges of operations is that the best way to learn it is to do it. Unfortunately, many of the really good network professionals, whether pre-sales or professional services, don’t get that opportunity, as they are usually either hands-off or turning over the keys to an ops team after the project has been delivered. Social media helps to connect to the daily challenges of people who are living in the trenches.

Get some ITSM experience. If you don’t work in a company where you get to babysit the same environment, you can always do what I did and experiment with it at home.

Anyone else have any suggestions on how to get up to speed? Feel free to comment below!

@netmanchris

Providing Network Leadership

So I have to give credit where credit is due…  a lot of this post is directly inspired by the book Network Maturity Model By William J. Bauman et al.   It’s written in a very academic style, but there are a ton of little gems in there which I think are worth pointing out. I’m expanding a lot on some of these key points, so please feel free to drink from the source rather than the muddy water down river. 🙂

 

The first section of the actual maturity model deals with Enterprise Network Leadership. I think it’s important to say that when I’m using the word Enterprise, I’m not talking about a large organization. I’m just talking about the business. Whether you are responsible for a few switches and a router, firewall or UTM appliance, or you are responsible for a multinational organization with a global WAN, several large campus environments, and smaller branches spanning the globe, I think the same general guidelines apply.

 

Have a Plan

The network leaders are responsible for creating a network business plan that aligns with business strategy. Now keep in mind that there are a LOT of very talented people in the industry who are consultants. These hired guns are often jumping from engagement to engagement, so this might not apply to them. But for those who are in a network operations role, it’s critically important to understand:

  • What the business goals are
  • Who the LOB application stakeholders are
  • What their requirements are, and what applications are important to them
  • How the LOB stakeholders directly impact the profitability of the business

and most importantly:

  • How the ability, or lack thereof, to successfully run the network can impact the business directly

The Network Leaders are responsible for creating both the vision/strategy, and the specific policies and procedures to support the vision in the short, mid, and long term. From specific policies such as acceptable-use statements to longer-term procedures such as a planned equipment refresh on a well-defined rotational schedule to avoid a massive CAPEX hit, the network leader is responsible for making sure the network has the appropriate capacity, resiliency, availability, redundancy, etc. to meet the business requirements.

To create the vision/strategy from which the policies and procedures are derived, they should also be ensuring that the requirements of those stakeholders are taken into account when planning out the network and all the operational tasks around it. This is very broad and can be summed up as “understand the business requirements”.

 

Understanding the Business Requirements

This one gets thrown around a lot in our industry. But to be honest, I find that VERY few hardcore network professionals actually take the time to do this. It’s my opinion, obvious bias aside, that the network is one of the fundamental pillars of almost every business in the world now. I’m choosing not to use the word “foundation” because I don’t believe that’s true.

A foundation to me is something that a business is built upon. Imagine if you will that a business is responsible for making hand-made clothes. Or is responsible for growing food. I think it’s obvious that the network is not the MOST important thing. In both of these examples, I don’t think anyone would argue that the business would be incapable of creating its product without the network.

But imagine if the network is down and they are unable to receive orders from their customers? What if the network is down and they are unable to use their ERP system to ship orders? Or to send invoices?  

I think we can all agree that if the products sit on the shelf, it’s not a good thing. Money doesn’t come in. And soon, global economic catastrophe is created, cats sleeping with dogs, total chaos!!!

All because a network went down. 

(OK… maybe I’m exaggerating a little. )

 

So what kind of things should be taken into account when we say “understand the business requirements”? Here are some from the top of my list:

What governance, risk, or compliance initiatives does the company have to adhere to?

GRC? Huh? Depending on the specific industry, country, or region of the world that the company operates in, there may be many legally enforced burdens that are placed on the company. The major examples everyone seems to know are SOX, Gramm-Leach-Bliley, HIPAA, etc. These all have different, although often complementary, requirements that, depending on the nature of the business, you need to be aware of as a network leader.

If you are a network leader and you are having trouble getting budget approval for some much needed networking upgrades, learn about which GRC requirements apply to your organization. It’s amazing how quickly the purse strings open when the business leaders understand that the failure to do these upgrades may have a direct impact on a GRC requirement that they can be personally held liable for.

What are the different Line of Business applications and how critical are they to the success or failure of the business?

Most companies have a LOT of applications they “need” to do their business. But there is a BIG difference between their Microsoft Lync implementation which they use to increase collaboration between globally dispersed teams, and their ERP system which is responsible for making sure that orders are received, shipping requests are sent to the warehouse, and invoices are sent to the customer. 

If you are a network leader and you are having trouble getting budget for some much needed networking upgrades, learn which of the LOB applications are directly related to the business’s ability to take orders, ship product, or invoice customers. When requesting budget for the upgrade, make it clear what the hourly business cost of network downtime can be.

An easy way to calculate this, if you have access to the numbers, is to look at the annual report. Figure out what the revenue was last year, divide by 365, then divide by 8 (assuming an eight-hour business day), and you now have the hourly cost of downtime.
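As a rough illustration of that back-of-the-napkin math, here is a minimal Python sketch. The function name, the default of eight business hours per day, and the $50M revenue figure are all assumptions made up for the example; plug in the numbers from your own annual report.

```python
def hourly_downtime_cost(annual_revenue, days_per_year=365, hours_per_day=8):
    """Rough hourly revenue at risk when the network is down.

    Assumes revenue is earned evenly across every day of the year and
    across an eight-hour business day, which is a simplification.
    """
    return annual_revenue / days_per_year / hours_per_day


# Hypothetical company with $50M in annual revenue
cost = hourly_downtime_cost(50_000_000)
print(f"Estimated cost of downtime: ${cost:,.2f} per hour")
```

If the business takes orders around the clock, passing hours_per_day=24 is probably the more honest number to bring to the budget meeting.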

 

For me, these are two of the most important “understand the business” requirements, but I’m sure there are a ton of other ones. Please feel free to call out more examples in the comments! I’d love to hear them!

 

@netmanchris

 

 

 

Configuration Management – Configuration Baselines

Many times when I’m speaking with customers, one of the first questions I get asked is

“OK, I’ve got this NMS, what’s the first thing I should do that’s going to make the biggest difference in my network?”

There are probably a lot of opinions on the answer to this question. For me, the answer is always this:

Start with Configuration Management.

In ITILv3, one of the main aspects of the configuration management domain is to track all of the configuration items that relate to an IT service. For more on ITILv3 CIs check out this video.

For those of you who suffer from insomnia and would like a cure, most of the ITILv3 change management stuff is found in Volume III, Service Transition. In ITILv3, the first thing you need to do is to define your CMS.

Configuration Management System

This is the ITIL term for the software that handles your configs for you.

Again, remember that ITIL is about process. So it’s possible to actually run an ITIL based shop without tools in place. It’s POSSIBLE… but I think this falls in the JBYCDMYS (Just because you can doesn’t mean you should) bucket.

What to look for in your CMS

So for NMS newbies who are trying to get into more process-driven network operations, your CMS is the software that does basic tasks like:

Backup of Configurations

Any NCCM solution should allow you to back up configurations. If you’re lucky, your NMS may have additional features that allow you to move beyond basic configuration backups. Ideally, your NMS will have features that will enable you to define configuration baselines and snapshots for any given device.

Configuration Baselines: A configuration baseline is the configuration of a service, product or infrastructure that has been formally reviewed and agreed on, that thereafter can be changed only through formal change procedures.

Configuration Snapshots: A snapshot of the current state of a configuration item or an environment. It also serves as a fixed historical record.

In plain English terms, a configuration baseline is the last place where you absolutely know that everything was working. A snapshot is an automatic backup that lets you know what the state of the device was at the time of that backup.

We’ll come back to this in a subsequent blog post, but snapshots are also great to have around for helping to address your compliance initiatives like SOX, PCI, or HIPAA. Having a configuration snapshot from a certain date is an easy way for you to prove to the auditors what the configuration state of a given device was on that date.
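To make the baseline vs. snapshot idea a bit more concrete, here is a minimal sketch using Python’s standard difflib module to show how a snapshot has drifted from the agreed baseline. This is not how any particular NMS does it; the function name and the sample configuration lines are invented for illustration.

```python
import difflib


def config_drift(baseline, snapshot):
    """Return unified-diff lines showing how a snapshot differs from the baseline."""
    return list(
        difflib.unified_diff(
            baseline.splitlines(),
            snapshot.splitlines(),
            fromfile="baseline",
            tofile="snapshot",
            lineterm="",
        )
    )


# Made-up configuration fragments, not taken from a real device
baseline = """hostname access-sw-01
ntp server 10.0.0.1
snmp-server community example-ro ro"""

snapshot = """hostname access-sw-01
ntp server 10.0.0.1
ntp server 10.0.0.2
snmp-server community example-ro ro"""

drift = config_drift(baseline, snapshot)
print("\n".join(drift) if drift else "Snapshot matches baseline")
```

An empty result means the device still matches what went through change control. Anything else is either an approved change that never made it back into the baseline, or a change nobody approved.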

Configuration Templates: A complete device configuration, or a portion of one.

This could be your standard configuration for your access switches, a secure configuration for your routers, or even just a portion of a configuration, such as the config required to change the local admin password on all your switches.
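As a sketch of the partial-template idea, the snippet below uses Python’s string.Template to stamp the same local admin change out for a list of switches. The command syntax, device names, and credentials are placeholders I made up; they are not meant to match any vendor’s exact CLI or any NMS’s template format.

```python
from string import Template

# Hypothetical partial template; the command syntax is illustrative only
ADMIN_PASSWORD_TEMPLATE = Template(
    "username $admin_user privilege 15 secret $admin_password\n"
)


def render_for_devices(template, devices, **values):
    """Render the same template once per device and return {device: config}."""
    return {device: template.substitute(values) for device in devices}


configs = render_for_devices(
    ADMIN_PASSWORD_TEMPLATE,
    ["access-sw-01", "access-sw-02"],
    admin_user="netadmin",
    admin_password="example-only",
)

for device, snippet in configs.items():
    print(f"--- {device} ---\n{snippet}", end="")
```

Template.substitute raises a KeyError if a placeholder is left unfilled, which is the behaviour you want before pushing anything to production gear.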

Scheduling Configuration Changes: The ability to schedule changes to your network devices at a specific time.

The ability to schedule changes is nice. Assuming your changes have gone through a peer-review process and through your company’s Change Approval Board, why do you need to be up at 3am during your company’s change window?

Now there may be cases where you will still need to be onsite to verify that a critical change went through, and to perform the change validation tests that I KNOW you all had in your change plan. Right?

But for those cases where you are simply changing a local admin password, or adding an NTP server, or some other low-risk change, you may want to just schedule this for the wee hours of the morning while you are home in your toasty bed.
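Your NMS will have its own scheduler, but the logic behind “run this at the next change window” is simple enough to sketch. The helper below is hypothetical: it assumes a single daily window that opens at 3am local time, and the sample date is arbitrary.

```python
from datetime import datetime, time, timedelta


def next_change_window(now, window_start=time(3, 0)):
    """Return the next datetime at which the daily change window opens."""
    candidate = datetime.combine(now.date(), window_start)
    if candidate <= now:
        # Today's window has already opened, so use tomorrow's
        candidate += timedelta(days=1)
    return candidate


# Example: a change approved late in the afternoon (arbitrary sample date)
run_at = next_change_window(datetime(2024, 1, 15, 17, 30))
print(f"Low-risk change scheduled for {run_at:%Y-%m-%d %H:%M}")
```

A real scheduler would also need to deal with time zones, blackout dates, and windows that only open on certain days of the week.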

One last thing…

When making major or minor changes to your network configurations, it’s a good practice to go back and update your CMS to reflect the new Configuration Baseline for that device. You did actually run through a series of tests to make sure you didn’t break something, right?

So although this could be a TFTP server on the network somewhere, hopefully it’s software that will automate the backup of network device configurations for you. Examples could include HP’s Intelligent Management Center, Solarwinds Orion, Cisco Prime, or perhaps an open source tool like RANCID.

In this video, I’ll go through the basic CMS functions of HP’s IMC to show how baselining and snapshots can be applied.

Getting ITSM Experience

Many of the best network engineers I know have little to no network operation experience.

“What? How can that be?” you ask? Well it’s really quite simple.

Most of the best network engineers I know, and we’re talking some double and triple CCIE’s in this crowd, have never actually operated a network for any length of time. They were professional services guys, short term contract guys, some pre-sales or post sales guys. There are a LOT of paths to the top of Mt. Fuji after all.

Although I did a short term network ops. gig early on in my career, I actually feel I squandered the opportunity as I just wasn’t mature enough to understand the experience that I should have been gaining.

So this question came up with a colleague last week: “How do you get network operations experience if you’re not in a network ops group?”

This blog post is dedicated to him.

A few years back, after I decided to really get serious about network management, I had the same issue. I wanted to get some experience in network management, but I had no large network to run. In my day job, I’m actually a pre-sales resource, so it’s not likely I’m going to get any experience in the near future, so it occurred to me that I could start a simulation to try and gain that experience.

At this point, I had already done some Ciscoworks LMS projects (long sleeve shirts to cover the scars to prove it!). I had successfully passed my ITILv3 foundations certifications, and I had even gained the honor of being one of the first Solarwinds Certified Professionals shortly after the SCP program was launched.

The Project

So with a bit of knowledge, I decided to run my home network under an ITSM framework for a year. This meant that I had to implement good network management hygiene. Good change management practices. Good fault management practices. Try to implement some of the ITIL processes around Service Strategy, Service Design, Service Operations, and Continual Service Improvement. Basically, run it like a business whose success depended on the network.

The Tools

So I had some ideas around the processes I wanted to put in place, but it always takes the three P’s to successfully implement any ITSM initiative: people, products and processes. Fortunately for me, I had access to HP’s Intelligent Management Center, as well as the trial versions of Solarwinds Orion NPM and NCM, but I was still missing some critical pieces of the puzzle.

Service Operations: One of the primary activities in Service Operations is really around the help desk. How are tickets logged? How are they tracked? What are the escalation procedures? How do you build out and grow the KMS (knowledge management system)?

I didn’t have any help desk or ticketing software in place, so I decided to go the free way; Spiceworks.

For those of you who don’t know it, Spiceworks is a free IT Management app which “includes a free IT management app for everything from network inventory and monitoring to help desk and more!”

It’s not what I would call a full FCAPS system, but it does have an ok help desk system, and it’s hard to beat free, right?

Note: I noticed last week that my Synology NAS now has a help desk app named OS Ticket in the available apps. I haven’t tried this, but considering it’s free and installs easily on the synology box, it might be a good option for those of you who are lucky enough to have one of these great little machines.

Financial Management

Financial Management falls under the service strategy volume of the ITILv3 core books. I’ll be honest: this wasn’t exactly the strongest part of my little experiment, but I did try to implement some financial processes.

But unlike some of the helpdesk and change control procedures, this wasn’t exactly something that I could count on good self-discipline to track. Can you imagine that conversation?

“Hey Me… I’d really like this new synology RS812.”

“Hmm… Don’t we already have a 411?”

“Yeah, but this one has TWO gigabit ports!”

“Let me think about that… ok. Let’s buy it!”

As you can see, I had to come up with a different plan.

Fortunately, I’m married, so I merely formalized the process of having to ask my wife for permission to buy any new toys. I have to say, this was probably the year that I got the fewest new tech toys, but I like to think the experience I gained was worth it. ( < – What’s the HTML tag for the sarcasm font again? )

The Results

So how did things go? Well, it was a little funny at times. Emailing myself a support ticket so that I could fix something that wasn’t working. I did try to get my wife to e-mail the tickets in, but that lasted about a week before she just said “Can you just fix it!?!?!?!”

For the other things, it felt a little strange asking myself for permission so that I could make a change to the environment and then having to consult myself to see what the effects might be (Change Advisory Board). Implementing the RACI (responsible, accountable, consulted, informed) was pretty easy because I generally get along with myself, etc.

To be honest, I wish I had been blogging back then, because I think it would have made for some interesting reading in retrospect. I’d like to say that I followed all the processes and ran a bulletproof network for the year, but I didn’t. Sometimes I slipped, made a change and locked myself out of my own gear.

But on the bright side…  I did learn why change management is important.

Anyone else gone through an experiment like this? Anyone willing to take up the challenge and blog on the experience?