Home Automation Setup with Apple #HomeKit

As many of you know, I’ve been diving into the home automation pond for a while now. I’ve been asked to blog about my current home setup, and this is an attempt to do that. There’s MUCH too much to fit into a single post, which is why I started a new blog on this subject over at www.homekitgeek.com and have been doing video reviews of different HomeKit accessories. It’s a work in progress, but I’ll do the short version here.


 

Apple HomeKit


I chose to use the Apple HomeKit framework as the base for my home automation journey for a few reasons. The biggest one is that I already owned a couple of Apple TV (4th gen) devices, which fit the home automation hub role. These devices act as HomeKit hubs: the always-on, always-present devices that perform orchestration and automation actions whether I’m home or away. They also tie directly into Apple’s iCloud, which gives me remote access to my HomeKit gear without having to VPN into my home network.

 

Home Setup Room-By-Room

There’s a lot to talk about here, but I thought I would just give a quick room-by-room description of what’s going on, including any specific features or automations that I’ve set up.

 

Front Entrance (outside)

DoorBell

I’ve got a Ring Pro Doorbell, which is NOT HomeKit compatible yet. Ring has been promising HomeKit support for a couple of years and has assured me, both publicly and in DMs, that they are committed to updating the Ring Pro for HomeKit. I ran into some small issues with the wiring in my house, but I was able to eliminate the doorbell chime itself and get the Ring Pro up and running. The Ring Chime Pro, which I’ve set up in the living room, was a nice addition as well.


Lights

For the lights, I’ve got the front entrance lights tied to a Lutron Caseta light switch in the house. The switch is set up with an automation that turns the lights on at sunset and off at sunrise. The nice part is that the system automatically checks the internet to see exactly when those will be every day. Living in a northern country, sunrise and sunset times vary pretty wildly throughout the year. This automation gives me the benefits of a timer system without the hassle of resetting the timer every month as the times change.
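To put some numbers behind that, here’s a rough Python sketch of why a fixed timer drifts so badly up north. It uses the textbook sunrise equation, cos(ω₀) = −tan(φ)·tan(δ), with a simple cosine approximation for the sun’s declination δ. It ignores refraction and the size of the sun’s disc, so treat the output as ballpark day lengths rather than the exact times the automation looks up. The example latitudes are arbitrary.

```python
import math


def declination_deg(day_of_year: int) -> float:
    """Approximate solar declination in degrees (simple cosine model)."""
    return -23.44 * math.cos(math.radians(360.0 / 365.0 * (day_of_year + 10)))


def day_length_hours(latitude_deg: float, day_of_year: int) -> float:
    """Hours between sunrise and sunset from the basic sunrise equation."""
    phi = math.radians(latitude_deg)
    delta = math.radians(declination_deg(day_of_year))
    cos_omega = -math.tan(phi) * math.tan(delta)
    # Clamp for polar day / polar night, when the sun never sets or never rises.
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.degrees(math.acos(cos_omega))  # hour angle at sunrise, in degrees
    return 2.0 * omega / 15.0  # the Earth turns 15 degrees per hour


if __name__ == "__main__":
    for lat in (25.0, 45.0, 60.0):  # example latitudes only
        june = day_length_hours(lat, 172)      # around June 21
        december = day_length_hours(lat, 355)  # around December 21
        print(f"lat {lat:4.1f}: {june:4.1f} h in June, {december:4.1f} h in December")
```

At 60° north, the sketch puts the swing between June and December at close to 13 hours of daylight. A fixed timer simply can't follow that.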

The one thing I haven’t automated yet is the motion-activated floodlight. I would really like to get a Ring Floodlight Cam, but I’m not investing any more money into Ring until they deliver on their promise around the doorbell.

Front Entrance (inside)

Lock

I decided not to replace the whole deadbolt system and just use the August lock, which fits nicely on top of most existing locks. I particularly like the August iPhone app feature that automatically unlocks your front door as you pull into the driveway. Really nice not to have to mess around with keys. 🙂


Sensor

In addition to the lock, I also installed an Elgato Eve Door & Window sensor on the front door. Just because it’s locked doesn’t mean it’s closed. 🙂 Yes, I found that out the hard way.


Lights

In the mud-room, I’ve got a light that is connected to the same Lutron Caseta switch which controls the outdoor lights. Once you’re in the house, a Philips Hue motion sensor turns on a Philips Hue GU10 light bulb at the bottom of the stairs. Unfortunately, the Philips Hue motion sensors are not exposed directly to HomeKit. That forced me to set up an on/off automation pair for a Lutron Caseta switch that controls 6 GU10 lights in the ceiling.

Living Room

Sensors

I’ve got an Elgato Eve Degree for tracking general room conditions (temperature, air pressure, humidity).

Cameras

I’ve got a D-Link Omna 180 Cam HD camera which, at the time of this writing, is the only camera on the market with official HomeKit support. I was unsure about it when I first installed it, but it’s started to grow on me. The device is billed as a 180-degree camera, which gives me a full view of my kitchen/dining room/living room areas (open concept), and it also provides night vision. The feature that gets used the most, though, is the 2-way audio, which lets me talk with someone in the living room without having to be physically present.

Parent note: Also handy for discreetly checking in on the kids when you hear things starting to go sideways.


Lights

The living room definitely has the most complicated light setup in the whole house. I’ve got GU10 (recessed) lights throughout the whole upstairs, which can get quite expensive with the Philips Hue system. So the living room is the only place I used Hue GU10s en masse. Typically, I would use a Lutron Caseta switch to power a bunch of regular GU10 bulbs (for anything over 2 bulbs, the Lutron is cheaper!!!). For the entertainment area, though, I really wanted the ability to set different scenes. Philips Hue has a bunch of different models available, and for the living room, I decided to use:

  • Philips Hue GU10 lights (both coloured and white)
  • Philips Hue Bloom (accent light)
  • Philips Hue Light Strip

The combination of these products lets me go from regular family time to chill movie watching with the touch of a button. I’ve also got a couple of IFTTT automations set up to help manage the kids’ schedule.

At 8:10pm, the coloured GU10 lights go purple, which is a visual signal for the kids to stop what they’re doing and get ready for bed. If they are quick, they can come back and finish whatever they were doing. At 8:30pm, every light in the living room flashes to let them know it’s time to go to bed. To be honest, the flashing light is pretty rough, but it is impossible to ignore, even for a kid who’s deep into a Minecraft session.

Other

The other thing I added to the living room is a Logitech POP button with the HomeKit bridge. Voice control with Siri is great and all, but it’s also nice to be able to just tap a button to set the scene. You can set three different scenes with the button, which I’ve chosen as follows:

  • Single Press – Turns on the Living Room lights bright scene. This is “normal” mode
  • Double Press – Turns on the Movie Time scene. Dims the lights and turns on the Philips Hue lights for accent lighting.
  • Long Press – Turns on Good night Scene which turns off all the lights, locks the doors, etc… 

Dining Room

Lights

Dining Room is pretty simple. I have 6 lights controlled by a Lutron Caseta light switch. I also got a couple of IKEA accent light boxes, which are hooked up to an iDevices plug. This gives me a candle-light vibe without the fire.

Back Door

The dining room opens directly onto a small deck. This is an entry point I wanted to add a bit of security to, so I used an Elgato Eve Door & Window sensor to let me know when the door opens and closes. This is really nice when you’re leaving the house for a quick “Hey, did I close the back door?” check. It doesn’t tell me whether I locked it, though. Still haven’t found a solution to that problem.

Kitchen

Lights

Kitchen lights are also pretty simple. Overhead are 6 GU10 bulbs controlled by a single Lutron Caseta switch. I’ve also got under-cabinet lighting, and I’m planning on adding another Lutron switch to control it. The idea here is to set up a motion sensor to turn on the under-cabinet lighting at night. No need to blind anyone, right?

Sensors

I have an iHome iSS50 5-in-1 sensor in the kitchen. No particular reason, to be honest. I originally bought it for the master bedroom, but it didn’t work out at all there. It constantly lost its Wi-Fi signal, and the fact that it’s a wall-powered device makes it difficult to hide.

Hallway to Bedrooms

Lights

In the hallway to the bedrooms, I have a couple of Philips Hue GU10 bulbs with a motion sensor. This automatically turns on the lights as you walk down the hallway. I have young kids (6, 10 and 11) who need to get up at night, so I also set up a rule that switches the lights into a “nightlight” mode if motion is detected after 9pm. This keeps the blinding of the children down and makes sure that they can still hit the target in the wee hours of the morning.
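The hallway rule boils down to picking a brightness based on the time of day. Here’s a hypothetical Python sketch of that decision. The post only specifies “after 9pm”, so the 7am cut-off and the 100%/5% brightness levels are assumptions (5% matches the nightlight level used in the laundry hallway). Note that the night window wraps past midnight, which is the part that’s easy to get wrong.

```python
from datetime import time

# Hypothetical values: only the 9pm start comes from the post.
NIGHT_START = time(21, 0)
NIGHT_END = time(7, 0)       # assumed morning cut-off
DAY_BRIGHTNESS = 100         # percent, assumed
NIGHTLIGHT_BRIGHTNESS = 5    # percent


def is_night(now: time) -> bool:
    """True between NIGHT_START and NIGHT_END; the window wraps past midnight."""
    return now >= NIGHT_START or now < NIGHT_END


def brightness_for_motion(now: time) -> int:
    """Brightness to use when the hallway motion sensor fires at `now`."""
    return NIGHTLIGHT_BRIGHTNESS if is_night(now) else DAY_BRIGHTNESS


if __name__ == "__main__":
    for t in (time(18, 30), time(21, 15), time(3, 0), time(7, 30)):
        print(t, "->", brightness_for_motion(t), "%")
```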

Kids Bedrooms

Kid1

Lutron Caseta switch for the ceiling lights. I also set up a Philips Hue Bloom with the sunrise scene turned on to help him get out of bed in the morning. His room is in the basement, so I also got a Fibaro window & door sensor.

Kid2

I’ve got a Philips Hue color light bulb paired with a Philips Hue light switch. He’s also got a colorful “moonlight,” which is connected to an iDevices plug. The plug is tied to a time-of-day automation that turns it on at bedtime and off at midnight (after he’s fallen asleep). The Philips Hue color bulb is also set up for the sunrise scene.

Kid3

Of the three kids’ rooms, this was the hardest. We had a legacy fluorescent light that simply didn’t work with the Lutron Caseta light switch. I ended up replacing the light fixture completely, as I REALLY like the Lutron switches. I also have a Philips Hue Bloom set up with, you guessed it, the sunrise scene.

Master Bedroom

Lights

There are no ceiling lights in the master bedroom, so I use a couple of different lamps. In one lamp I’ve used a Philips Hue color bulb, and for the other, larger lamp I use a Lutron Caseta wall plug. The large lamp uses small candle bulbs (not sure of the exact model), and it was far easier to just grab the Lutron Caseta wall plug. The Caseta wall plug also acts as a repeater, extending the range and reliability of the Lutron automation. (Lutron uses a protocol called Clear Connect, not Wi-Fi or Bluetooth.) There’s also a Philips Hue Go light in the master bedroom. The Hue Go is great because it’s got an internal battery. This came in handy recently when we got hit by a power outage. 🙂

Sensors

I originally used the iHome iSS50 sensor in the master bedroom, but it just didn’t work out. I swapped it for the Elgato Eve Room sensor, which has been working great. The Eve Room sensor is a Bluetooth device powered by batteries. That means no issues with dropping off the Wi-Fi network or losing the device completely during a power outage. The Eve Room sensor gives me temperature and humidity, and I use the humidity measurement as a trigger to turn on the humidifier if the air is too dry.

The Eve Room sensor also measures VOCs (volatile organic compounds) as an indicator of air quality. As allergy season is about to hit, I’m going to get an indoor air purifier and use the VOC measurement to turn the purifier on or off.

Humidifier

The humidifier is a generic Honeywell unit. I got lucky in that it’s an analog, switch-based unit. This means that if it’s in the ON position and you plug it in, it just starts working. I use an iDevices wall plug (with nightlight!) to control whether power is turned on to the humidifier. Basically, if it gets too dry, as measured by the Elgato Eve Room sensor, a trigger is sent to the iDevices wall plug to turn on and start pumping a little moisture into the air.
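The post doesn’t say when the plug switches back off, so here’s one common way to express that rule, as a hypothetical Python sketch. The plug turns on below one humidity reading, off above a higher one, and is left alone in between. That gap (hysteresis) keeps the plug from clicking on and off every few minutes when the reading hovers near a single threshold. Both thresholds are made-up values, and in the real setup this logic lives in a HomeKit automation, not in Python.

```python
# Hypothetical thresholds in percent relative humidity; not from the post.
TURN_ON_BELOW = 35.0
TURN_OFF_ABOVE = 45.0


def next_plug_state(humidity: float, plug_is_on: bool) -> bool:
    """Return whether the humidifier's smart plug should be powered."""
    if humidity < TURN_ON_BELOW:
        return True            # too dry: start the humidifier
    if humidity > TURN_OFF_ABOVE:
        return False           # humid enough: cut power
    return plug_is_on          # inside the band: keep the current state


if __name__ == "__main__":
    state = False
    for reading in (40.0, 33.0, 38.0, 44.0, 46.0, 41.0):
        state = next_plug_state(reading, state)
        print(f"{reading:4.1f}% RH -> plug {'ON' if state else 'OFF'}")
```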

Air Purifier

I don’t have the air purifier set up yet, but I’m planning on this as my next purchase. Same basic principle as the humidifier, just swap in the purifier. If the air quality is poor, as measured by the Elgato Eve Room sensor, then the plug powers on and the air starts getting clearer.

Home Office

Lights

The home office lights use a Lutron Caseta switch. I also included an Elgato Eve Motion sensor to automatically turn on the lights when I walk into the office. The Eve is Bluetooth-based and doesn’t respond as quickly as the Philips Hue motion sensor, but the 2-3 second delay doesn’t bother me at all when I know I’m going to be in the office for a while. The Eve Motion sensor also turns on some desk accent lighting, which is plugged into an Elgato Eve Energy wall plug. In a nutshell, the lights turn on when I walk into the office and turn off when I’m no longer there (after 15 minutes with no movement detected).
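The “off after 15 minutes of no movement” behaviour is really a resettable timer. Here’s a hypothetical Python model of it: every motion event turns the lights on and restarts the countdown, and a periodic check turns them off once 15 quiet minutes have passed. The OfficeLights class and its methods are invented for illustration; the real logic lives in the Eve/HomeKit automation.

```python
from datetime import datetime, timedelta
from typing import Optional

NO_MOTION_TIMEOUT = timedelta(minutes=15)  # the 15-minute window from the post


class OfficeLights:
    """Hypothetical model of a motion-triggered light with an idle timeout."""

    def __init__(self) -> None:
        self.on = False
        self.last_motion: Optional[datetime] = None

    def motion_detected(self, when: datetime) -> None:
        """Motion turns the lights on and restarts the countdown."""
        self.on = True
        self.last_motion = when

    def tick(self, now: datetime) -> None:
        """Periodic check: turn the lights off after the quiet period."""
        if self.on and self.last_motion is not None:
            if now - self.last_motion >= NO_MOTION_TIMEOUT:
                self.on = False


if __name__ == "__main__":
    start = datetime(2017, 3, 6, 9, 0)
    lights = OfficeLights()
    lights.motion_detected(start)
    lights.tick(start + timedelta(minutes=10))
    print(lights.on)  # still inside the 15-minute window
    lights.motion_detected(start + timedelta(minutes=12))  # movement resets the timer
    lights.tick(start + timedelta(minutes=27))
    print(lights.on)  # 15 minutes since the last motion, so the lights go off
```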

Sensors

In addition to the Elgato Eve Motion sensor mentioned above, I also included a couple of Elgato Eve Door & Window sensors to make sure the house is secured, since my home office is in the basement. Living in a northern country, leaving a window open can have consequences.

Laundry Hallway

Lights

This was my latest addition. I noticed that when I walked downstairs with the laundry basket in my hands, I had to put it down to turn on the light. I know. The horror!!!! I installed 2 Philips Hue GU10 bulbs with a Philips Hue motion sensor to automatically turn on the lights as I carry my basket. Kid1’s room is also at the end of this hallway, so I set up the nightlight (5% light) to trigger if it’s after 9pm. No need to blind the kid, right?

BackYard

Plug

I have a single iDevices outdoor plug, which I use to plug in my electric lawn mower. Although this is a dual-outlet unit, both outlets are controlled simultaneously, meaning they are either both ON or both OFF. The plug is set up to be off during the week and to power on at 9pm on Friday night. Just enough time to get a full charge to mow the lawn on the weekend.

As the season gets colder, I’m planning on using this to power the Christmas lights as well. The plan is to set up a time-based trigger that turns the lights on at sundown, which saves a little electricity.

Garage

Lights

I have a single Philips Hue A19 bulb with a Philips Hue motion sensor installed to turn on when motion is detected in the garage. I also had a spare Koogeek Smart Socket that I plugged a regular A19 light bulb into. I’ve set up an ON/OFF trigger pair so the Koogeek light matches the Philips Hue light.

The one other thing to note in the garage is a time-of-day automation I setup to prevent me from banging my head until the wee hours of the morning. At 11pm, the lights go into nightlight mode reminding me to go to bed and work on it tomorrow.  

Boiler Room

This is really just a small closet which contains the hot water tank and my furnace. I installed a Fibaro flood sensor in here to detect any leaks before they become a big problem. The hot water tank is a little bit older. Installing the leak sensor just gives me a little peace of mind. This sensor also includes a temperature sensor which is super important for a water tank in a northern country. If it starts to get cold, bad things can happen. Not a bill I’m interested in seeing. 

Downstairs Windows

There are a few different windows in my basement that I installed the Elgato Eve Door & Window sensors on. Peace of mind to make sure that we’re all locked up when we leave at night. Already mentioned, but also important when you live in a Northern country. Bad things can happen when the inside of your house goes below freezing temperatures. 

What’s next?

As you can tell, I’ve really developed a passion for home automation. I currently only have two projects I’m planning, but as new HomeKit enabled devices come out, I expect this list to grow. 

HomeBridge Nest Thermostat integration

The Nest thermostat is not currently Apple HomeKit compatible. The Homebridge project exists to let non-HomeKit devices participate in a HomeKit ecosystem. I recently purchased a Raspberry Pi 3 for this purpose. My goal here is basic thermostat integration with the rest of my smart home.

Soma Smart Shades

Soma has recently announced a HomeKit bridge for pre-order. I’m pretty wary about ordering any products without HomeKit certification. Too many companies have announced HomeKit support and never delivered. Soma is an after-market device that connects to your existing shades and allows you to control them as part of your smart home. Combine this with the different temperature sensors and we can do some interesting things, such as:

  • Shut Blinds at 12pm every day
  • Shut blinds when outdoor temperature exceeds desired indoor temperature
  • Open blinds at 4pm
  • Shut blinds for Movie Time scene

 

Hopefully this was interesting. If you have any questions or want more information, feel free to post in the comments below or reach out on Twitter:

@netmanchris

The First IoT Culling: My Devices are Dying.

 

Cull: to reduce the population of (a wild animal) by selective slaughter

As an early adopter of technology, I sometimes feel like I get to live in the future. Or as William Gibson said, “The future is already here, it’s just not evenly distributed.” There are a lot of benefits to be gained from this, but there are also risks. One of the biggest risks is:

How long is the product you choose going to be around?

 

I was an early adopter in the first wave of IoT devices. From wearables to home convenience devices, I dipped my toes in the pool early. Most of these platforms were Kickstarter projects, and I’ve been generally happy with most of them, at least the ones that were actually delivered. (That’s a story for another time…)

But in the last six months, the market seems to have decided that there are just too many of these small companies.

The Death Bells are Ringing

In the last year, I’ve noticed a trend starting. Many of the early platforms that I invested in seem to be disappearing. Some have been bought out and killed. Remember Pebble, which was acquired by Fitbit? I’ve got an original Pebble and a Pebble Time that are now little more than traditional watches with short battery life.

 

Some are just dying on the vine.

The latest victim? The Sense sleep monitor system by Hello.  This was a Kickstarter project that really helped to define a new category. When the project was launched in 2013, there was nothing else like it in the market, at least nothing that I’m aware of. Like most Kickstarter projects, they shipped later than their Aug 2014 estimate, but when it arrived it was definitely worth the wait.

This device had multiple sensors, including light, sound, humidity, VOC (air quality), and temperature. It also had remote Bluetooth motion sensors that attached to your pillows to track body movement while you sleep. The basic idea is that you spend a third of your life asleep, so shouldn’t we make sure we are doing it right? Combining the sensor data with sleep research helps users understand why they feel good or bad in the morning, how to create the optimal conditions in their bedroom, etc. Obviously I’m not a sleep expert, but I can say that Sense has improved the quality of my sleep since I started using it.

Just last week, we all received some sad news that Hello, the company behind the Sense product is shutting down.


What’s happening?

Although there are some people who believe that Sense was just a faulty product, I think there is something deeper going on here. This shows a fundamental flaw with the business models that some of the early IoT and wearables players came to market with. There is a very simple business principle that somehow they seemed to completely miss.

If you want to survive as a business, your incoming cash must be more than your outgoing cash.

 

Pebble, Sense, and several other wearable and IoT products in the market right now were built on a single-purchase model. You buy the product and you get unlimited rights to use it. This is the consumer’s preferred model. When I spend my money on something, I want to own it. I want to be able to use it, and I don’t want to have to pay for it again and again. At least, that’s the simple version of the thought process.

Spending some time watching various devices become little more than expensive bricks has made me re-examine that thought process though.

Why I’m looking for Subscription models now

Yup. That’s right. You heard me.

I want to find companies that are actively looking to provide value funded through a subscription model of some kind. Companies like Nest and Ring who are providing cloud storage for security cameras are a great example of this in action.

Looking at the failing companies, the one thing I’m starting to see in common across these different devices is that they tend to make a whole bunch of money up front ($10M+ for the initial Pebble Kickstarter project, one of the largest ever on that platform!). But they tend to be niche products with a limited target audience, and when that target market has been saturated… no more money comes in. They’re left having to continue paying for the “cloud” infrastructure required to keep their products going.

Looking at Sense and Pebble, both of these platforms sold on a hardware model. Their product offering has your devices connecting to cloud-based infrastructure; whether that’s AWS or something else is irrelevant. What most consumers don’t realize is that cloud-based infrastructure has a recurring monthly cost. That doesn’t even include the cost of ongoing platform development, whether that’s adding new features, creating a better user experience, or just upgrading to stay current with the newest versions of Apple iOS or Android shipping on current devices.

This is fine as long as you continue to sell new hardware product, but as the number of new users start to trend down and your costs stay the same… we start to see what’s happening in the market right now.
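Here’s a back-of-the-envelope Python sketch of that squeeze. Every number in it is made up for illustration; these are not Pebble’s or Sense’s real figures. It models a big up-front raise, a per-unit margin on hardware sales that shrink each month as the niche saturates, and flat monthly cloud and development costs. The hypothetical subscription_fee parameter shows how a small recurring charge per installed device changes the picture.

```python
from typing import Optional

# Illustrative numbers only; NOT Pebble's or Sense's real figures.
UPFRONT_CASH = 10_000_000        # one big crowdfunding raise
MARGIN_PER_UNIT = 20             # profit on each device sold after launch
INITIAL_MONTHLY_SALES = 30_000   # units sold in the first month
SALES_DECAY = 0.85               # niche market saturates: sales drop 15% a month
MONTHLY_COSTS = 400_000          # cloud hosting plus ongoing app development


def months_until_broke(subscription_fee: float = 0.0,
                       max_months: int = 120) -> Optional[int]:
    """Return the first month cash goes negative, or None if it never does.

    subscription_fee is a hypothetical monthly charge per installed device;
    the model assumes nobody ever stops using their device (no churn).
    """
    cash = float(UPFRONT_CASH)
    sales = float(INITIAL_MONTHLY_SALES)
    installed_base = 0.0
    for month in range(1, max_months + 1):
        installed_base += sales
        revenue = sales * MARGIN_PER_UNIT + installed_base * subscription_fee
        cash += revenue - MONTHLY_COSTS
        if cash < 0:
            return month
        sales *= SALES_DECAY
    return None


if __name__ == "__main__":
    print("hardware only:", months_until_broke())
    print("with a $3/month subscription:", months_until_broke(subscription_fee=3.0))
```

With these made-up inputs, the hardware-only run goes negative a little under three years in, while a few dollars a month per device keeps things afloat indefinitely. The exact month doesn’t matter; the shape of the curve does.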

Are Subscription models the only way?

Absolutely not. There are other companies, like D-Link, iHome or iDevices, that have a fairly broad portfolio of products and are continuously creating new ones. This helps ensure they have a healthy income stream as individual product segments become saturated. They can afford to keep funding app development and the infrastructure required to host it because they are spreading that cost over many devices.

 

More Deaths in the future

There have been some notable passings, such as Pebble and Sense, but I don’t think they are going to be the last by any stretch of the imagination. 2017 and 2018 are going to be hard years on early adopters as we watch our mirrors, watches, and gadgets blink eternally with no home in the cloud to call back to. I’m hoping that many of the new IoT players start to realize that having a good technology idea isn’t enough if you want to survive. It’s strange that I’m now looking at business models in a consumer product purchasing decision. I guess this just goes to show how educated the consumer is truly becoming.

As I invest in my smart home products, I look for established companies with multiple streams of revenue, companies like Lutron or Philips. In some cases, like the Soma Smart Shades, I really don’t have another option. I’ll probably buy them, but I’m not expecting them to last for the long term. I wish Soma the best of luck, but I don’t see a subscription model, and it’s not like shades are something you replace every year.

Bottom line is enjoy your first generation wearables now. They might not be around for that much longer. 

 

@netmanchris

Using JSONSchema to Validate Input

There are a lot of REST APIs out there. Quite a few of them use JSON as the data structure for getting data in and out of network devices. There are a lot of network-focused blogs that detail how to send and receive data, but I wasn’t able to find anything that specifically talked about validating the input and output to make sure we’re sending and receiving the expected information.

Testing is a crucial, and IMO too often overlooked, part of the Infrastructure as Code movement. Hopefully this post will help others start to think more about validating input and output of these APIs, or at the very least, spend just a little more time thinking about testing your API interactions before you decide to automate the massive explosion of your infrastructure with a poorly tested script. 🙂

What is JSONSchema?

I’m assuming that you already know what JSON is, so let’s skip directly to talking about JSON Schema. JSON Schema is a specification for describing what valid JSON data looks like. jsonschema is a Python library that lets you take your input/output and verify it against a known schema, which defines the data types you’re expecting to see.

For example, consider this snippet of a schema, which defines what a valid VLAN object looks like:

"vlan_id":
{
    "description": "The unique ID of the VLAN. IEEE 802.1Q VLAN identifier (VID)", 
    "minimum": 1, 
    "maximum": 4094, 
    "type": "integer", 
    "sql.not_null": true
}

You can see that this is a small set of rules defining a valid entry for the vlan_id property of a VLAN. As network professionals, it’s obvious to us that a valid VLAN ID must be between 1 and 4094. We know this because we deal with it every day. But our software doesn’t know this. We have to teach it what a valid vlan_id property looks like, and that’s what the schema does.
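To experiment with that fragment on its own, it needs to sit inside a complete schema. Here’s a minimal Python sketch using the jsonschema library, assuming the fragment lives under properties of an object schema. The name property and the required list are hypothetical additions, not the vendor’s schema. Note that sql.not_null isn’t a standard JSON Schema keyword. jsonschema ignores keywords it doesn’t recognize, so here the “not null” intent is only enforced indirectly, because null isn’t an integer.

```python
from jsonschema import validate
from jsonschema.exceptions import ValidationError

VLAN_SCHEMA = {
    "type": "object",
    "properties": {
        "vlan_id": {
            "description": "The unique ID of the VLAN. IEEE 802.1Q VLAN identifier (VID)",
            "minimum": 1,
            "maximum": 4094,
            "type": "integer",
            "sql.not_null": True,  # vendor-specific keyword; jsonschema ignores it
        },
        "name": {"type": "string"},  # hypothetical extra property
    },
    "required": ["vlan_id"],  # assumption for this sketch
}


def is_valid_vlan(vlan: dict) -> bool:
    """Return True if `vlan` conforms to VLAN_SCHEMA."""
    try:
        validate(instance=vlan, schema=VLAN_SCHEMA)
    except ValidationError:
        return False
    return True


if __name__ == "__main__":
    for candidate in ({"vlan_id": 10}, {"vlan_id": 0}, {"vlan_id": None}, {}):
        print(candidate, is_valid_vlan(candidate))
```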


Why do we care?

Testing is SUPER important. By being able to test the input/output of what you’re feeding into your automation/orchestration framework, it can help you to avoid, at worst, a total meltdown, or, at best, a couple of hours trying to figure out why your code doesn’t work.

Using JSONSchema

So the two things you’re going to need to use JSONSchema are:

  • The JSON Schema for a specific API endpoint
  • The input/output that you want to validate.

In this case, we’ll use a VLAN object that is coming out of an ArubaOS-Switch REST API.

You did know the ArubaOS-Switches have a REST API, right?

Step 1 – Loading the VLAN object

We’re going to gather the output from the VLANs API. Instead of writing custom code, we’ll just use the pyarubaoss library. I’ll leave you to check out the GitHub repo and just paste the output of a single VLAN JSON object here. I’m also going to create a second VLAN with a VLAN_ID of 5000, just to show how this works. 5000, of course, is not valid, and we’d like to prove that. Right?

Step 2 – Loading the JSON Schema Definition

Now that we have the output, we want to make sure that it complies with the JSON Schema definition that we’ve been provided.

Loading the JSON schema

Here’s a subset of the JSON schema that defines what a valid VLAN looks like:

Step 3 – Importing the JSON Schema Library and Validating

Now we’re going to load the JSON Schema library into our python session and use it to validate the VLAN object using the Schema we defined above.

First, we’ll take the vlan_good object and run it through the validate function:

As you can see, there’s nothing to see here. Basically this means that the vlan_good object conforms to the provided JSON Schema. The VLAN ID is valid, as it’s an integer value between 1 and 4094.

Now let’s take the vlan_bad object and run it through the same validate function:

We can see that the validate function now raises an exception and lets us know very specifically that the VLAN ID 5000 is not valid:

jsonschema.exceptions.ValidationError: 5000 is greater than the maximum of 4094
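Here’s a self-contained Python sketch of steps 1-3 with the jsonschema library’s validate() function. The two VLAN dicts are hypothetical and trimmed to two fields. Real output from the ArubaOS-Switch REST API (for example via pyarubaoss) carries more properties, and the schema here is only a vlan_id/name subset.

```python
import jsonschema

# Step 1: hypothetical VLAN objects, trimmed to two fields for illustration.
vlan_good = {"vlan_id": 10, "name": "USERS"}
vlan_bad = {"vlan_id": 5000, "name": "BROKEN"}

# Step 2: a subset of a VLAN schema, reusing the vlan_id rules quoted earlier.
vlan_schema = {
    "type": "object",
    "properties": {
        "vlan_id": {"type": "integer", "minimum": 1, "maximum": 4094},
        "name": {"type": "string"},
    },
}


# Step 3: validate() returns None on success and raises ValidationError on failure.
def check(vlan: dict):
    """Return None if `vlan` is valid, otherwise the validator's error message."""
    try:
        jsonschema.validate(instance=vlan, schema=vlan_schema)
    except jsonschema.exceptions.ValidationError as err:
        return err.message
    return None


print(check(vlan_good))  # None: nothing to see here
print(check(vlan_bad))   # 5000 is greater than the maximum of 4094
```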

Pretty cool, right? We can still definitely shoot ourselves in the foot, but at least we know the input/output data that we’re using for our API is valid. To me, this is important for two reasons:

  • I can validate that the data I’m trying to send to a given API conforms to what that API is expecting
  • I can validate that the vendor didn’t suddenly change their API which is going to break my code
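Both checks can use the same helper. Here’s a minimal sketch, assuming the jsonschema library’s Draft7Validator; the schema and payloads are hypothetical. iter_errors() reports every violation rather than stopping at the first one. That’s handy both when checking a payload before you send it and when checking a response after you get it back.

```python
from jsonschema import Draft7Validator

# Hypothetical schema for illustration; not the vendor's published schema.
VLAN_SCHEMA = {
    "type": "object",
    "properties": {
        "vlan_id": {"type": "integer", "minimum": 1, "maximum": 4094},
        "name": {"type": "string"},
    },
    "required": ["vlan_id", "name"],
}

VALIDATOR = Draft7Validator(VLAN_SCHEMA)


def schema_problems(payload: dict) -> list:
    """Collect every schema violation as 'path: message' strings."""
    return sorted(
        f"{'/'.join(str(p) for p in err.absolute_path) or '<root>'}: {err.message}"
        for err in VALIDATOR.iter_errors(payload)
    )


if __name__ == "__main__":
    outgoing = {"vlan_id": 5000}             # what we're about to send
    incoming = {"vlan_id": 10, "name": "USERS"}  # what the API sent back
    print(schema_problems(outgoing))
    print(schema_problems(incoming))
```

An empty list means the payload conforms. If a response that used to pass suddenly comes back with problems, that’s a strong hint the vendor changed the API.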

Wrap Up

There are a lot of networking folk who have started to take on the new set of skills required to automate their infrastructure. One of the most crucial parts of this is testing and validating the code to ensure that you’re not just blowing up your network more efficiently. JSON Schema is one of the tools in your tool box that can help you do that.

Comments, questions? Let me know

@netmanchris

Amazon S3 Outage: Another Opinion Piece

So Amazon S3 had some “issues” last week and it’s taken me a few days to put my thoughts together around this. Hopefully I’ve made the tail-end of the still interested-enough-to-find-this-blog-valuable period.

Trying to make the best of a bad situation, the good news, in my opinion, is that this shows that infrastructure people still have a place in the automated cloudy world of the future. At least that’s something right?

What happened:

You can read the detailed explanation on Amazon’s summary here.

In a nutshell

  • there was a small problem
  • they tried to fix it
  • things went bad for a relatively short time
  • they fixed it

What happened during:

The internet lost its mind. Or, more accurately, some parts of the internet went down. Some of the casualties were extremely ironic.


Initial thoughts

The reaction to this event is amusing and it drives home the point that infrastructure engineers are as critical as ever, if not even more important considering the complete lack of architecture that seems to have gone into the majority of these “applications”.

First, let’s talk about availability. Looking at the Amazon AWS S3 SLA, available here, it looks like they did fall below their 99.9% SLA for availability. If we take a quick look at https://uptime.is/, we can see that for a monthly period, they were aiming for no more than 43m 49.7s of outage. It seems like they had about 6-8 hours of outage, so clearly they missed it. Looking at the S3 SLA page, it looks like customers might be eligible for 25% service credits. I’ll let you guys work that out with AWS.
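For anyone who wants to check the uptime.is math, here’s a small Python sketch. It uses an average Gregorian month (365.2425 / 12 days), which reproduces the 43m 49.7s figure, and it naively treats the whole outage as 100% unavailable. AWS’s SLA actually computes Monthly Uptime Percentage from request error rates over short intervals, so this is an approximation, not the SLA formula.

```python
# Average Gregorian month in minutes; reproduces the uptime.is figure.
MINUTES_PER_MONTH = 365.2425 / 12 * 24 * 60


def allowed_downtime_minutes(sla_percent: float) -> float:
    """Minutes of downtime per month permitted by an availability target."""
    return MINUTES_PER_MONTH * (1 - sla_percent / 100)


def monthly_uptime_percent(outage_hours: float) -> float:
    """Naive uptime for a month with a single full outage of `outage_hours`."""
    return 100 * (1 - outage_hours * 60 / MINUTES_PER_MONTH)


def as_min_sec(minutes: float) -> str:
    """Format fractional minutes as e.g. '43m 49.7s'."""
    mins, secs = divmod(minutes * 60, 60)
    return f"{int(mins)}m {secs:.1f}s"


if __name__ == "__main__":
    print("99.9% budget:", as_min_sec(allowed_downtime_minutes(99.9)))
    for hours in (6, 8):
        print(f"{hours}h outage -> {monthly_uptime_percent(hours):.2f}% uptime")
```

Plugging in the 6-8 hour estimate gives roughly 98.9% to 99.2% for the month, well short of 99.9%.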

Don’t “JUST CLICK NEXT”

One of the first things that struck me as funny was that it was the US-EAST-1 region that was affected. US-EAST is the default region for most AWS services; you have to intentionally select another region if you want your service hosted somewhere else. But because it’s easier to just click next, it seems the majority of people clicked right past that part. They didn’t think about where they were actually hosting their services, or about the implications of hosting everything in the same region, and probably the same availability zone. For more on this topic, take a look here.

There’s been a lot of criticism of infrastructure people, with the argument being that anyone with a credit card can go to Amazon, sign up for an AWS account, and start consuming infrastructure. This gets thrown around as if it’s actually a good thing, right?

Well this is exactly what happens when “anyone” does that. You end up with all your eggs in one basket.  (m/n in round numbers)

“Design your infrastructure for the four S’s: Stability, Scalability, Security, and Stupidity.” – Jeff Kabel

Again, this is not an issue with AWS or any cloud provider’s offerings. This is an issue with people who think that infrastructure and architecture don’t matter and can just be “automated” away. Automation is important, but its purpose is to free up your infrastructure people from mind-numbing tasks so they can help you properly architect the infrastructure components your applications rely upon.

Why o Why o Why

Why anyone would architect their revenue-generating system on infrastructure that was only guaranteed to 99.9% is beyond me. The right answer, at least from an infrastructure engineer’s point of view, is obvious, right?

You would use redundant architecture to raise the overall resilience of the application, relying on the fact that it’s highly unlikely you’re going to lose the different redundant pieces at the same time. Put simply, what are the chances that two different systems, both guaranteed to a 99.9% SLA, are going to go down at the exact same time?

Well, doing some really basic probability calculations, and assuming the outages are independent events, we multiply the probability of the non-SLA’d time happening in system 1 (0.1%, or 0.001) by the same metric in system 2, and we get:

0.001 * 0.001 = 0.000001 probability of both systems going down at the same time.

Or another way of saying that is 99.9999% uptime. Pretty great, right?

Note: I’m not an availability calculation expert, so if I’ve messed up a basic assumption here, someone please feel free to correct me. Always looking to learn!
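Here’s the same arithmetic as a tiny Python sketch. The redundant (“parallel”) case multiplies the failure probabilities, matching the 0.001 * 0.001 above. For contrast, the “serial” case, where the app needs both services up at once, multiplies the availabilities instead and ends up worse than either part. Both functions assume independent failures, which is exactly the assumption that breaks when every copy sits in the same region.

```python
def parallel_availability(*availabilities: float) -> float:
    """Redundant copies where ANY one being up is enough (independent failures)."""
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= (1.0 - a)   # every copy has to fail at the same time
    return 1.0 - p_all_down


def serial_availability(*availabilities: float) -> float:
    """A chain where EVERY dependency must be up (independent failures)."""
    p_all_up = 1.0
    for a in availabilities:
        p_all_up *= a
    return p_all_up


if __name__ == "__main__":
    print(f"one 99.9% service:     {0.999:.4%}")
    print(f"two copies, redundant: {parallel_availability(0.999, 0.999):.4%}")
    print(f"two services, chained: {serial_availability(0.999, 0.999):.4%}")
```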

So application people, most of whom probably didn’t even read the SLA for the S3 service or sit down to think it through, made the mistake of just signing over responsibility for their application uptime to “the cloud”.

Really? We had people armed with an IDE and a credit card move our apps to “the cloud” and then wonder why things failed.

What could they have done?

There’s a million ways to answer this I’m sure, but let’s just look at what was available within the AWS list of service offerings.

CloudFront is AWS’s content delivery network (CDN). It’s extremely easy to use and set up, and it takes care of automatically distributing your content to AWS edge locations around the world.

Route 53 is AWS’s DNS service that will allow you to perform health checks and only direct DNS queries to resources which are “healthy” or actively available.
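Here is a rough sketch of what the Route 53 option can look like. The function below builds a PRIMARY/SECONDARY failover record pair in the dictionary shape that boto3's `change_resource_record_sets` expects. The record name, IP addresses and health check ID are placeholders, and the health check itself has to be created separately.

```python
def failover_change_batch(record_name: str, primary_ip: str, secondary_ip: str,
                          primary_health_check_id: str, ttl: int = 60) -> dict:
    """Build a Route 53 ChangeBatch with a PRIMARY/SECONDARY failover pair.

    Route 53 answers with the PRIMARY record while its health check passes
    and falls back to the SECONDARY record when it fails.
    """

    def record(set_id: str, role: str, ip: str, health_check_id=None) -> dict:
        rrset = {
            "Name": record_name,
            "Type": "A",
            "SetIdentifier": set_id,   # unique per record in the failover pair
            "Failover": role,          # "PRIMARY" or "SECONDARY"
            "TTL": ttl,                # short TTL so clients re-resolve quickly
            "ResourceRecords": [{"Value": ip}],
        }
        if health_check_id is not None:
            rrset["HealthCheckId"] = health_check_id
        return {"Action": "UPSERT", "ResourceRecordSet": rrset}

    return {
        "Comment": "Failover pair for " + record_name,
        "Changes": [
            record("primary", "PRIMARY", primary_ip, primary_health_check_id),
            record("secondary", "SECONDARY", secondary_ip),
        ],
    }
```

With AWS credentials configured, you would pass the result as `boto3.client("route53").change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)`.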

There are probably a lot of other options as well, both within AWS and without, but my point is that the applications that went down most likely didn’t bother. Or they were denied the budget to properly architect resiliency into their system.

On the bright side, the latter just had a budget opening event.

Look who did it right

Unsurprisingly, there were companies who weathered the S3 storm like nothing happened. In fact, I was able to sit and binge-watch Netflix while the rest of the internet was melting down. Yes, it looks like it cost them 25% more, but then again, I had no problems with season 4 of The Big Bang Theory at all last week, so I’m a happy customer.

Companies still like happy customers, don’t they?

The Cloud is still a good thing

I’m hoping that no one reads this as an anti-cloud post. There’s enough anti-cloud rhetoric happening right now, which I suppose is inevitable considering last week’s highly visible outage, and I don’t want to add to that.

What I do want is for people who read this to spend a little bit of time thinking about their applications and the infrastructure that supports them. This type of thing happens in enterprise environments every day. Systems die. Hardware fails. Get over it and design your architecture to treat these failures as a foregone conclusion. It IS going to happen; it’s just a matter of when. So shouldn’t we design around that up front?

Alternatively, we could also choose to take the risk for those services that don’t generate revenue for the business. If it’s not making you money, maybe you don’t want to pay for it to be resilient. That’s OK too. Just make an informed decision.

For the record, I’m a network engineer well versed in the arcane discipline of plumbing packets. Cloud and application architectures are pretty far away from the land of BGP peering and routing tables where I spend my days. But for the low, low price of $15 and a bit of time on Udemy, I was able to dig into AWS and build some skills that let me look at last week’s outage with a much more informed perspective. To all my infrastructure engineer peeps: I highly encourage you to take the time, learn a bit, and get involved in these conversations at your companies. Hoping we can all raise the bar together.

Comments, questions?

@netmanchris

Shedding the Lights on Operations: REST, a NMS and a Lightbulb

It’s obvious I’ve caught the automation bug. Beyond just automating the network I’ve finally started to dip my toes in the home automation pool as well.

The latest addition to the home project was the Philips Hue light bulbs. Basically, I just wanted a new toy, but imagine my delight when I found that there’s a full REST API available.

I’ve got a REST API and a light bulb and suddenly I was inspired!

The Project

Network Management Systems have long suffered from information overload.

Notifications have to be tuned and if you’re really good you can eventually get the stream down to a dull roar. Unfortunately, the notification process is still broken in that the notifications are generally dumped into your email which if you are anything like me…

NewImage

Yes. That’s really my number as of this writing

One of the ways of dealing with the deluge is to use a different medium to deliver the message. Many NMS systems, including HPE IMC, have the capability of issuing audio alarms, but let’s be honest: that can get pretty annoying as well, and it’s pretty easy to mute them.

I decided that I would use the REST interfaces of the HPE IMC NMS and the Philips Hue light bulbs to provide a visual indication of the general state of the system. Yes, there’s a valid, business-justifiable reason for doing this. But c’mon, we’re friends, right? The real reason I worked on this was because they both have REST APIs and I was bored. So why not, right?

The other great thing about this is that you don’t need to spend your day looking at a NOC screen. You can log in when the light turns whatever color you decide is bad.

Getting Started with the Philips Hue API

The Philips Hue SDK getting-started guide was actually really easy to work through. As well, there’s an embedded HTML interface on the Hue bridge that lets you play around with the REST API directly.

Once you’ve set up your initial authentication to the bridge (see the getting-started guide), you can log in to the bridge at http://ip_address/debug/clip.html

From there it’s all fun and games. For instance, if you wanted to see the state of light number 14, you would send a GET to api/%app_name%/lights/14 and you would get back the following in nice, easy-to-read JSON.

http://ipaddress/debug/clip.html/

NewImage

From here, it would be fairly easy to use an HTTP library like requests to start issuing HTTP commands at the bridge but, as I’m sure you’re aware by now, there’s very little uncharted territory in the land of Python.
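For the curious, this is roughly what talking to the bridge directly looks like. This sketch uses the standard library's `urllib` instead of requests, so there's nothing extra to install. The bridge IP and username are placeholders for the values you get from the getting-started guide. The endpoints are the same ones you can poke at in the CLIP debugger.

```python
import json
import urllib.request


def light_url(bridge_ip: str, username: str, light_id: int) -> str:
    """URL for a single light: the same path used in the CLIP debugger."""
    return f"http://{bridge_ip}/api/{username}/lights/{light_id}"


def get_light(bridge_ip: str, username: str, light_id: int) -> dict:
    """GET the light's description; its current values live under 'state'."""
    with urllib.request.urlopen(light_url(bridge_ip, username, light_id), timeout=5) as resp:
        return json.load(resp)


def set_light_state(bridge_ip: str, username: str, light_id: int, **state) -> list:
    """PUT new state values, e.g. on=True, hue=0, sat=254, bri=254."""
    request = urllib.request.Request(
        light_url(bridge_ip, username, light_id) + "/state",
        data=json.dumps(state).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=5) as resp:
        # the bridge answers with a list of success/error objects
        return json.load(resp)
```

For example, `set_light_state("192.168.1.10", "your-username", 14, on=True, hue=0, sat=254)` should turn light 14 red. Hue runs from 0 to 65535 around the colour wheel, and sat and bri top out at 254.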

PHUE library

Of course someone has been here before me and has written a nice library that works with both Python 2 and Python 3. You can see the library source code here, or you can simply run

pip install phue

from your terminal.

The Proof of Concept

You can check out the code for the proof of concept here. Or you can watch the video below.

Breaking down the code

1) Grab Current Alarm List

2) Iterate over the Alarms and find the one with the most severe alarm state

3) Create a function to correlate the alarm state to the color of the Philips Hue light bulb.

4) Wait for things to move away from green.
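Here is a minimal sketch of steps 2 through 4. It is not the actual proof-of-concept code, and it rests on a few assumptions:

- Each alarm is a dict with an `alarmLevel` field where 1 is the most severe. Check what pyhpeimc actually returns on your IMC version.
- `bridge` is a connected phue `Bridge`.
- The hue numbers are approximate and, as the lessons below show, need tuning.

```python
import time

GREEN = 25500   # approximate hues on the Hue 0-65535 colour wheel
YELLOW = 12750
ORANGE = 5000   # hypothetical; tune to taste
RED = 0

# Hypothetical mapping: a lower alarmLevel means a more severe alarm.
LEVEL_TO_HUE = {1: RED, 2: ORANGE, 3: YELLOW, 4: YELLOW}


def most_severe_level(alarms):
    """Return the most severe (lowest) alarm level, or None if there are no alarms."""
    levels = [int(alarm["alarmLevel"]) for alarm in alarms if "alarmLevel" in alarm]
    return min(levels) if levels else None


def hue_for_level(level):
    """Map an alarm level to a bulb hue; anything unmapped shows green."""
    return LEVEL_TO_HUE.get(level, GREEN)


def update_bulb(bridge, light_id, alarms):
    """Set the bulb colour from the current alarm list and return the hue used."""
    hue = hue_for_level(most_severe_level(alarms))
    # phue's Bridge.set_light accepts a dict of state values
    bridge.set_light(light_id, {"on": True, "hue": hue, "sat": 254})
    return hue


def watch(bridge, light_id, fetch_alarms, interval_seconds=60):
    """Step 4: poll forever. fetch_alarms is any callable returning the alarm list."""
    while True:
        update_bulb(bridge, light_id, fetch_alarms())
        time.sleep(interval_seconds)
```

Wiring it up would look something like `bridge = Bridge("192.168.1.10")` (from phue) followed by `bridge.connect()`. You would then call `watch(bridge, 14, fetch_imc_alarms)`, where `fetch_imc_alarms` is a hypothetical wrapper around your pyhpeimc alarm call.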

Lessons Learned

The biggest lesson here was that colours on a screen and colours on a light bulb don’t translate very well. The green and the yellow lights weren’t far enough apart to be useful as a visual indicator of the health of the network, at least not IMHO.

The other thing I learned is that you can waste a lot of time working on aesthetics. Because I was leveraging the PHUE library and the PYHPEIMC library, 99% of the code was already written. The project probably took me less than 10 minutes to get the logic together and more than a few hours playing around with different colour combinations to get something I was at least somewhat OK with. I imagine the setting and the ambient light would very much affect whether or not this looks good in your place of business. If you use my code, you’ll want to tinker with it.

Where to Next

We see IoT devices all over our personal lives, but it’s interesting to me that I could set up a visual indicator of network health for a NOC environment for less than $100. Just thinking about some of the possibilities here:

  • Connect each NOC agent’s ticket queue to their light color. Once they are assigned a ticket, their light goes orange for DO-NOT-DISTURB.
  • Connect the app to a ClearPass authentication API and flash the bulbs blue when the boss walks into the building. Always good to know when you should be shutting down solitaire and looking like you’re doing something useful, right?
  • Connect the app to a Meridian location API and turn all the lights green when the boss walks onto the floor.

Now I’m not advocating you should hide things from your boss, but imagine how much faster network outages would get fixed if we didn’t have to stop fixing them to explain to our boss what was happening and what we were going to be doing to fix them, right?

Hopefully, this will have inspired someone to take the leap and try something out.

Comments, questions?

@netmanchris

Auto Network Diagram with Graphviz

One of the most useful and least updated pieces of network documentation is the network diagram. We all know this, and yet we still don’t have (or make) time to update it until something catastrophic happens, and then we say to ourselves:

Wow. I wish I had updated this sooner…

Graphviz

According to the website:

Graphviz is open source graph visualization software. Graph visualization is a way of representing structural information as diagrams of abstract graphs and networks. It has important applications in networking, bioinformatics,  software engineering, database and web design, machine learning, and in visual interfaces for other technical domains.

note: Lots of great examples and docs there BTW.  Definitely check it out.

Getting started

So you’re going to have to first install Graphviz from their website. Go ahead… I’ll wait here.

Install the graphviz python binding

This should be easy assuming you’ve already got python and pip installed. I’m assuming that you do.

pip install graphviz

Getting LLDP Neighbors from Arista Devices

You can use the Arista pyeapi library, which is also installable through pip. There’s a blog which introduces you to the basics here, which you can check out. Essentially, I followed that blog and then substituted the “show lldp neighbors” command to get the output I was looking for.
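Here is a small sketch of the parsing side. It assumes that pyeapi's `node.enable()` returns a list with one dict per command, whose `result` holds the EOS JSON. It also assumes the LLDP JSON uses the `lldpNeighbors`, `port`, `neighborDevice` and `neighborPort` keys, so verify those against your own switch output. The connection itself is shown only in comments, because it needs a real switch and an eapi.conf profile. The profile name `spine1` is made up.

```python
def parse_lldp_neighbors(local_host: str, response: list) -> list:
    """Turn 'show lldp neighbors' eAPI output into link tuples.

    Returns (local_host, local_port, neighbor_host, neighbor_port) tuples.
    """
    result = response[0]["result"]  # one entry per command sent
    links = []
    for neighbor in result.get("lldpNeighbors", []):
        links.append((local_host, neighbor["port"],
                      neighbor["neighborDevice"], neighbor["neighborPort"]))
    return links


# Against a real switch (needs pyeapi and a matching ~/.eapi.conf profile):
#   import pyeapi
#   node = pyeapi.connect_to("spine1")
#   links = parse_lldp_neighbors("spine1", node.enable("show lldp neighbors"))
```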

Creating a Simple Network Diagram

The code for this is available here

Essentially, I’m just parsing the JSON output from the Arista eAPI and creating a DOT file, which is used to generate the diagram.
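To show what that DOT file actually contains, here is a hand-rolled sketch. It turns link tuples of the form (local host, local port, neighbor host, neighbor port) into DOT text. It is not the linked code; the graphviz package's `node()` and `edge()` methods write this text for you. The sketch uses an undirected `graph` (edges written `--`), which sidesteps arrowheads entirely.

```python
from pathlib import Path


def links_to_dot(links, graph_name="topology"):
    """Build DOT source for an undirected topology graph.

    links: iterable of (local_host, local_port, remote_host, remote_port).
    """
    links = list(links)
    hosts = sorted({link[0] for link in links} | {link[2] for link in links})
    lines = [f"graph {graph_name} {{"]
    for host in hosts:
        lines.append(f'    "{host}";')
    for local, local_port, remote, remote_port in links:
        # label the link with both interface names, local side first
        lines.append(f'    "{local}" -- "{remote}" [label="{local_port} - {remote_port}"];')
    lines.append("}")
    return "\n".join(lines) + "\n"


def write_dot(links, path="topology.dot"):
    """Write the DOT file; render it with: dot -Tpng topology.dot -o topology.png"""
    Path(path).write_text(links_to_dot(links))
    return path
```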

Pros: It’s automated

Cons: It’s not very pretty at all.

SimpleTopo.png

 

Prettying it up a Bit

Code for this is available here

So with a little bit of work using the .attr() method, we can pretty this up a bit. For example, the

dot.attr('node', shape='box')

method turns the node shape from an ellipse into a box shape. The other transformations are pretty obvious as well.

Notice that we changed the shape of the nodes, the style of the arrows a bit, and also shaded in the box. There are lots of other modifications we can make, but I’ll leave you to check out the docs for that.

SimplePrettierTopo.png

 

 

Adding your own graphics

Code for this is available here

Getting a bit closer to what I want, but I still think we can do a bit better. For this example, I used MS Paint to create a simple PNG file with a switch-ish image on it. From what I can tell, there’s no reason you couldn’t just use the vendor icons for whatever devices you’re using, but for now, I’m just playing with something quick and dirty.

Once the file is created and placed somewhere in the path, you can use this method

dot.attr('node', image="./images/switch1.png")

to get the right image.  You’ll also notice I used

dot.attr('edge', arrowhead='none')

to remove the arrowheads. (I actually removed another command. Can you spot it?)

SimplePrettierGraphicTopo.png

 

Straighter Lines

Code for this is available here

So looking at this image, one thing I don’t like is the curved lines. This is where Graphviz beat me for the day. I did find that I was able to apply the

dot.graph_attr['splines'] = "ortho"

attribute to the dot object to get the straight lines I wanted, but when I did that, I got a great message telling me that I would need to use xlabels instead of standard labels.

SimplePrettierGraphicOrthoTopo.png
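Following that message's advice, the DOT that the ortho version needs looks roughly like this. Set `splines=ortho` at graph level and use `xlabel` in place of `label` on each edge. In the graphviz package, that is `dot.graph_attr['splines'] = "ortho"` plus `dot.edge(a, b, xlabel=...)`. Below is a stdlib sketch that takes (local host, local port, neighbor host, neighbor port) tuples.

```python
def ortho_dot(links, graph_name="topology"):
    """DOT source with orthogonal (right-angled) edges and xlabels.

    links: iterable of (local_host, local_port, remote_host, remote_port).
    """
    lines = [
        f"graph {graph_name} {{",
        "    splines=ortho;",      # straight, right-angled edges
        "    node [shape=box];",
    ]
    for local, local_port, remote, remote_port in links:
        # ortho routing does not place ordinary edge labels, so use xlabel
        lines.append(f'    "{local}" -- "{remote}" [xlabel="{local_port} - {remote_port}"];')
    lines.append("}")
    return "\n".join(lines) + "\n"
```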

Next Steps

Code for this is available here

For this next step, it was time to get the info live from the device, and also to attempt to stitch multiple devices together into a single topology. One thing I noticed is that the name of the node MUST match the hostname of the device; otherwise you end up with multiple nodes for the same switch. You can see there’s still a lot of work to do to clean this up, but I think it’s worth sharing. Hopefully you do too.

MultiTopo.png
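One way to handle the hostname problem, sketched under assumptions, has two parts:

- Normalise names before building nodes. Here, hypothetically, that means dropping any domain suffix and lower-casing the name.
- De-duplicate links, because every link shows up twice, once from each end. Treating a link as an unordered pair of (host, port) endpoints makes both sightings compare equal.

```python
def normalise_host(name: str) -> str:
    """Hypothetical normalisation: 'Leaf1.example.com' -> 'leaf1'."""
    return name.split(".")[0].lower()


def merge_links(per_device_links):
    """Merge LLDP link lists from several devices into unique links.

    per_device_links: iterable of lists of
    (local_host, local_port, remote_host, remote_port) tuples.
    """
    seen = set()
    merged = []
    for links in per_device_links:
        for local, local_port, remote, remote_port in links:
            a = (normalise_host(local), local_port)
            b = (normalise_host(remote), remote_port)
            key = frozenset((a, b))  # same key whichever end reported it
            if key in seen:
                continue
            seen.add(key)
            merged.append((a[0], a[1], b[0], b[1]))
    return merged
```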

 

Thoughts

Pros: Graphviz is definitely cool. I can see a lot of time being saved on drawing network diagrams here. The fact that you could automatically run this every X period to ensure you have an up-to-date network diagram at all times is pretty awesome. It’s customizable, which is nice, and multi-vendor support would be pretty easy to implement. Worst-case scenario, you could just poll the LLDP MIB with SNMP and dump the data into the appropriate bucket. Not elegant, but definitely functional.

Cons: The link labels are a pain. In the short time I was playing with it, I wasn’t able to Google or read the documentation my way into what I want, which is a label on each end of the link telling me which interface on which device. Not the glob of data in the middle that makes me wonder which end is which.
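Graphviz does have edge attributes aimed at exactly this. `taillabel` and `headlabel` put text near the tail end and the head end of an edge; in an undirected `a -- b` edge, `a` is the tail. `labeldistance` and `labelangle` nudge where those labels land. How they interact with `splines=ortho` is not covered here, so treat this as a starting point rather than an answer. In the graphviz package, it would be `dot.edge(a, b, taillabel=..., headlabel=...)`. The stdlib sketch below just shows the DOT.

```python
def end_labelled_dot(links, graph_name="topology"):
    """DOT source that labels each end of a link with its own interface.

    links: iterable of (local_host, local_port, remote_host, remote_port).
    """
    # labeldistance scales how far end labels sit from the node (default 1)
    lines = [f"graph {graph_name} {{", "    edge [labeldistance=2];"]
    for local, local_port, remote, remote_port in links:
        # tail = local end, head = remote end
        lines.append(
            f'    "{local}" -- "{remote}" '
            f'[taillabel="{local_port}", headlabel="{remote_port}"];'
        )
    lines.append("}")
    return "\n".join(lines) + "\n"
```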

The other thing I don’t like is the curvy lines. I want straight lines. Whether that’s an issue with the graphviz python library that I’m using or it’s actually a problem with the whole graphviz framework isn’t clear to me yet. Considering the time saved, I could probably live with this as is, but I’d also like to do better.

If anyone has figured out how to get past these minor issues, please drop me a line!  @netmanchris on twitter or comment on the blog.

As always, comments and fixes are appreciated!

@netmanchris

Pseudo-Math to Measure Network Fragility Risk

Some of you may have heard me ranting on Packet Pushers on stupid network tricks and why we continue to be forced to implement kluges as a result.  I made some comment about trying to come up with some metric to help measure the deviation of the network from the “golden” desired state to the dirty, dirty thing that it’s become over time due to kluges and just general lack of network hygiene.

So I decided that I would write a bit of code to get the conversation started. All code discussed is available on my GitHub here.

The Idea

What I wanted here was to create some pseudo-mathematical way of generating a measurement that can communicate to the management structure WHY the requested change is a really, really, bad idea.

Imagine these two conversations:

bad-conversation

good-conversation

Which conversation would you like to be part of?

Assumptions:

I’m making some assumptions here that I think are important to talk about.

  1. You have a source-of-truth defined for your network state. That is, you have abstracted your network state into some YAML files or something like that.
  2. You have golden configurations defined in templates (e.g. Jinja2). These templates can be combined with your source-of-truth and used to generate your “golden” config for any network device at any time.
  3. You have “current-state” templates (Jinja2) defined that include all your kluges and can be combined with your source-of-truth and used to generate your “current-state” config for any network device at any time.

The Fragility Metric

So how does one calculate the fragility of a network?

Wow! Thanks for asking!

My methodology is something like this.

  1. Generate the configurations for all network devices using the golden configuration templates.
  2. Generate the configurations for all network devices using the “current-state” configuration templates.

We should now be left with a directory full of pairs of configs.

We then use the SequenceMatcher class from Python’s difflib module to calculate the difference between each pair of files. difflib lets us compare the contents of two text files once the whitespace has been stripped out. One of the cool things is that SequenceMatcher’s ratio() method gives us a number between zero and one that measures how similar the two files are, where 1.0 means identical.
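Here is a minimal sketch of the pairwise comparison. It assumes the golden and current-state configs are rendered into two directories with matching `*.cfg` file names; that directory layout is an assumption. One judgement call is worth flagging: this compares lists of lines rather than raw characters. It also turns off SequenceMatcher's `autojunk` heuristic, which otherwise starts ignoring frequent elements (like `!` separator lines) once a sequence reaches 200 items. As a result, the numbers won't exactly match a character-level comparison.

```python
import difflib
from pathlib import Path


def normalise(text: str) -> list:
    """Drop blank lines and trailing whitespace so formatting noise doesn't count."""
    return [line.rstrip() for line in text.splitlines() if line.strip()]


def stability_ratio(golden_text: str, current_text: str) -> float:
    """0.0 means nothing in common, 1.0 means identical (after normalising)."""
    matcher = difflib.SequenceMatcher(
        None, normalise(golden_text), normalise(current_text), autojunk=False
    )
    return matcher.ratio()


def compare_dirs(golden_dir: str, current_dir: str) -> dict:
    """Return {file name: stability ratio} for every golden/current pair."""
    ratios = {}
    for golden_path in sorted(Path(golden_dir).glob("*.cfg")):
        current_path = Path(current_dir) / golden_path.name
        if current_path.exists():
            ratios[golden_path.name] = stability_ratio(
                golden_path.read_text(), current_path.read_text()
            )
    return ratios


if __name__ == "__main__":
    for name, ratio in compare_dirs("golden", "current").items():
        print(f"{name} stability metric: {ratio}")
```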

What this means is that you can get this as output.

5930-1.cfg stability metric: 1.0
5930-2.cfg stability metric: 0.9958677685950413
7904-1.cfg stability metric: 0.9428861101723556
7904-2.cfg stability metric: 0.9405405405405406

Now that we’ve got a ratio for how different all of the pairs of files are, we can then calculate the mean average of all the files to calculate the network stability metric and network fragility metric

Network Stability Metric: 0.9698236048269844
Network Fragility Metric: 0.030176395173015624

HINT: If you add the two numbers together…
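Rolling the per-file ratios up into the two network-level numbers is just a mean and a subtraction. That is also why the hint works: fragility is one minus stability. Here is a sketch using the four ratios from the sample output:

```python
from statistics import fmean


def network_metrics(ratios: dict) -> tuple:
    """Return (stability, fragility) for a {file name: ratio} dict."""
    if not ratios:
        raise ValueError("need at least one config pair")
    stability = fmean(ratios.values())  # plain arithmetic mean
    return stability, 1.0 - stability


if __name__ == "__main__":
    sample = {
        "5930-1.cfg": 1.0,
        "5930-2.cfg": 0.9958677685950413,
        "7904-1.cfg": 0.9428861101723556,
        "7904-2.cfg": 0.9405405405405406,
    }
    stability, fragility = network_metrics(sample)
    print(f"Network Stability Metric: {stability}")
    print(f"Network Fragility Metric: {fragility}")
```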

You can also get a nice graph

blog_graphic

Note: The pygal library produces a much cooler graphic which you can see here

The Approach

So the first thing I want to make clear is that I don’t intend this to REALLY measure the risk of a given configuration.

One idea I did have was to adjust the weighting of a specific configuration based on the role of that device.

Example – The core switch blowing up is PROBABLY a lot worse than an edge switch tanking because of some kludgey configuration.

This would be fairly easy to implement by placing some metadata on the configs to record their role.
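A sketch of that role weighting as a weighted mean. Several pieces are hypothetical: the role names, the weights, and passing roles as a separate {file name: role} dict that stands in for the metadata on each config.

```python
# Hypothetical weights: a kludge on a core box counts five times as much.
ROLE_WEIGHTS = {"core": 5.0, "distribution": 2.0, "edge": 1.0}


def weighted_stability(ratios: dict, roles: dict, weights=ROLE_WEIGHTS,
                       default_weight: float = 1.0) -> float:
    """Weighted mean of per-device stability ratios.

    ratios: {file name: stability ratio}
    roles:  {file name: role}; missing or unknown roles get default_weight.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for name, ratio in ratios.items():
        weight = weights.get(roles.get(name), default_weight)
        weighted_sum += weight * ratio
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no weighted configs to average")
    return weighted_sum / total_weight
```

The weighted fragility is then simply `1 - weighted_stability(...)`.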

It would be fairly easy to go down rat holes here, trying to identify every single line that’s changed and weight individual changes.

Example – Look for ['BGP', 'OSPF', 'ISIS', 'EIGRP'] in the dirty config and then weight those lines higher. Look for ['RIP'] and rate that even higher.

’Cause... c’mon... friends don’t let friends run RIP, right?
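And a sketch of the keyword idea. It diffs each golden/current pair line by line and scores every added or removed line by the heaviest routing keyword it contains. The weights are made up, with RIP on top, naturally. Matching whole words rather than substrings keeps a line like `description rip-out-later` from being scored as RIP.

```python
import difflib

# Hypothetical weights; any line without a listed keyword counts as 1.
KEYWORD_WEIGHTS = {"bgp": 3.0, "ospf": 3.0, "isis": 3.0, "eigrp": 3.0, "rip": 10.0}


def changed_lines(golden_text: str, current_text: str) -> list:
    """Lines added to or removed from the golden config."""
    diff = difflib.ndiff(golden_text.splitlines(), current_text.splitlines())
    # ndiff prefixes: "+ " added, "- " removed, "  " unchanged, "? " hints
    return [line[2:] for line in diff if line.startswith(("+ ", "- "))]


def line_weight(line: str) -> float:
    """Weight of one changed line: its heaviest keyword, else 1."""
    return max((KEYWORD_WEIGHTS.get(word, 1.0) for word in line.lower().split()),
               default=1.0)


def weighted_change_score(golden_text: str, current_text: str) -> float:
    """Sum of weights over every changed line; 0 means no drift."""
    return sum(line_weight(line) for line in changed_lines(golden_text, current_text))
```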

Again, all the code is available here. Have a look. Think about it. Give me your feedback, I’d love to know if this is something you see value in.

 

@netmanchris