The First IoT Culling: My Devices are Dying.

 

Cull: to reduce the population of (a wild animal) by selective slaughter

As an early adopter of technology, I sometimes feel like I get to live in the future. Or, as William Gibson said, “The future is already here, it’s just not evenly distributed.” There are a lot of benefits to be gained from this, but there are also risks. One of the biggest risks is:

How long is the product you choose going to be around?

 

I was an early adopter in the first wave of IoT devices. From wearables to home convenience devices, I dipped my toes in the pool early. Most of these platforms were Kickstarter projects and I’ve been generally happy with most of them, at least the ones that were actually delivered. (That’s a story for another time…)

But in the last six months, the market seems to have decided that there are just too many of these small companies.

The Death Bells are Ringing

In the last year, I’ve started to notice a trend. Many of the early platforms that I invested in seem to be disappearing. Some have been bought and killed off. Remember Pebble, the watch maker that was acquired by Fitbit? I’ve got an original Pebble and a Pebble Time that are now little more than traditional watches with a short battery life.

 

Some are just dying on the vine.

The latest victim? The Sense sleep monitor system by Hello.  This was a Kickstarter project that really helped to define a new category. When the project was launched in 2013, there was nothing else like it in the market, at least nothing that I’m aware of. Like most Kickstarter projects, they shipped later than their Aug 2014 estimate, but when it arrived it was definitely worth the wait.

This device had multiple sensors including light, sound, humidity, VOC (air quality), and temperature. It also had remote Bluetooth motion sensors that attached to your pillows to track body movement while you sleep. The basic idea is that you sleep for 1/3 of your life, so shouldn’t we make sure we are doing it right? The sensor data, combined with sleep research, helps users understand why they feel good or bad in the morning and how to create the optimal conditions in their bedroom. Obviously I’m not a sleep expert, but I can say that Sense has improved the quality of my sleep since I started using it.

Just last week, we all received the sad news that Hello, the company behind the Sense product, is shutting down.


What’s happening?

Although there are some people who believe that Sense was just a faulty product, I think there is something deeper going on here. This shows a fundamental flaw with the business models that some of the early IoT and wearables players came to market with. There is a very simple business principle that somehow they seemed to completely miss.

If you want to survive as a business, your incoming cash must be more than your outgoing cash.

 

Pebble, Sense, and several other wearable and IoT products in the market right now were built on a single-purchase model. You buy the product and you get an unlimited right to use that product. This is the consumer’s preferred model. When I spend my money on something, I want to own it. I want to be able to use it, and I don’t want to have to pay for it again and again. At least this is the simple version of the thought process.

Spending some time watching various devices become little more than expensive bricks has made me re-examine that thought process though.

Why I’m looking for Subscription models now

Yup. That’s right. You heard me.

I want to find companies that are actively looking to provide value funded through a subscription model of some kind. Companies like Nest and Ring who are providing cloud storage for security cameras are a great example of this in action.

Looking at the failing companies, the one thing I’m starting to see in common across these different devices is that they tend to make a whole bunch of money up front ($10M+ for the initial Pebble Kickstarter project, one of the largest ever on that platform!). But they tend to be niche products with a limited target audience, and when that target market has been saturated, no more money comes in and they’re left having to continue to pay for the “cloud” infrastructure required to keep their products going.

Looking at Sense and Pebble, both of these platforms were sold on a hardware model. Their devices connect to cloud-based infrastructure; whether that’s AWS-based or something else is irrelevant. What most consumers don’t realize is that cloud-based infrastructure has a recurring monthly cost. That doesn’t even include the cost of ongoing platform development, whether that’s adding new features, creating a better user experience, or just upgrading to stay current with the newest versions of Apple iOS or Android that are shipping on current devices.

This is fine as long as you continue to sell new hardware, but as the number of new users starts to trend down and your costs stay the same… we start to see what’s happening in the market right now.

Are Subscription models the only way?

Absolutely not. There are other companies, like D-Link, iHome or iDevices, that have a fairly broad portfolio of products and are continuously creating new ones. This helps to ensure they have a healthy income stream as individual product segments become saturated. They can afford to continue to fund app development and the infrastructure required to host those apps because they are spreading that cost over many devices.

 

More Deaths in the future

There have been some notable passings, such as Pebble and Sense, but I don’t think they are going to be the last by any stretch of the imagination. 2017 and 2018 are going to be hard years for early adopters as we watch the mirrors, watches, and gadgets blink eternally because they have no home in the cloud to call back to. I’m hoping that many of the new IoT players start to realize that having a good technology idea isn’t enough if you want to survive. It’s strange that I’m now looking at business models as part of a consumer product purchasing decision. I guess this just goes to show how educated the consumer is truly becoming.

As I invest in my SmartHome products, I look for companies that are established and have multiple streams of revenue. Companies like Lutron or Philips. In some cases, like the Soma Smart Blinds, I really don’t have another option. I’ll probably buy them, but I’m not expecting them to last for the long term. I wish Soma the best of luck, but I don’t see a subscription model and it’s not like shades are something you replace every year.

Bottom line is enjoy your first generation wearables now. They might not be around for that much longer. 

 

@netmanchris


Amazon S3 Outage: Another Opinion Piece

So Amazon S3 had some “issues” last week and it’s taken me a few days to put my thoughts together around this. Hopefully I’ve made the tail-end of the still interested-enough-to-find-this-blog-valuable period.

Trying to make the best of a bad situation, the good news, in my opinion, is that this shows that infrastructure people still have a place in the automated cloudy world of the future. At least that’s something right?

What happened:

You can read the detailed explanation on Amazon’s summary here.

In a nutshell

  • there was a small problem
  • they tried to fix it
  • things went bad for a relatively short time
  • they fixed it

What happened during:

The internet lost its mind. Or more accurately, some parts of the internet went down. Some of the outages were extremely ironic.


Initial thoughts

The reaction to this event is amusing and it drives home the point that infrastructure engineers are as critical as ever, if not even more important considering the complete lack of architecture that seems to have gone into the majority of these “applications”.

First, let’s talk about availability. Looking at the Amazon AWS S3 SLA, available here, it looks like they did fall below their 99.9% SLA for availability. A quick look at https://uptime.is/ shows that for the monthly period, they were aiming for no more than 43m 49.7s of outage. It seems like the outage lasted about 6-8 hours, so clearly they failed. Looking at the S3 SLA page, it looks like customers might be eligible for 25% service credits. I’ll let you guys work that out with AWS.
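The 43m 49.7s budget is easy to reproduce. Here is a rough sketch, assuming an average month of 365.2425 / 12 days (my assumption, chosen because it reproduces the figure above); the function name is hypothetical.

```python
def allowed_downtime_seconds(sla_percent, period_days=365.2425 / 12):
    """Seconds of downtime permitted by an availability SLA over a period.

    period_days defaults to an average calendar month (an assumption).
    """
    period_seconds = period_days * 24 * 60 * 60
    return period_seconds * (1 - sla_percent / 100)


budget = allowed_downtime_seconds(99.9)
minutes, seconds = divmod(budget, 60)
print(f"{int(minutes)}m {seconds:.1f}s")  # monthly downtime budget at 99.9%
```

Compare that budget with an outage measured in hours and it is clear why service credits come into play.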

Don’t “JUST CLICK NEXT”

One of the first things that struck me as funny here was the fact that this was the US-EAST-1 Region which was affected. US-EAST is the default region for most of the AWS services. You have to intentionally select another region if you want your service to be hosted somewhere else. But because it’s easier to just click next, it seems that the majority of people just clicked past that part and didn’t think about where they were actually hosting their services or the implications of hosting everything in the same region and probably the same availability zone. For more on this topic, take a look here.

There’s been a lot of criticism of infrastructure people now that anyone with a credit card can go to Amazon, sign up for an AWS account, and start consuming their infrastructure. This has been thrown around like it’s actually a good thing, right?

Well this is exactly what happens when “anyone” does that. You end up with all your eggs in one basket.  (m/n in round numbers)

“Design your infrastructure for the four S’s. Stability Scalability, Security, and Stupidity” — Jeff Kabel

Again, this is not an issue with AWS, or any cloud provider’s offerings. This is an issue with people who think that infrastructure and architecture don’t matter and can just be “automated” away. Automation is important, but it’s there so that your infrastructure people can free up some time from mind-numbing tasks to help you properly architect the infra components your applications rely upon.

Why o Why o Why

Why anyone would architect their revenue-generating system on an infrastructure that was only guaranteed to 99.9% is beyond me. The right answer, at least from an infrastructure engineer’s point of view, is obvious, right?

You would use redundant architecture to raise the overall resilience of the application, relying on the fact that it’s highly unlikely that you’re going to lose the different redundant pieces at the same time. Put simply, what are the chances that two different systems, both guaranteed to a 99.9% SLA, are going to go down at the exact same time?

Well, doing some really basic probability calculations, and assuming the outages are independent events, we multiply the probability of being outside the SLA (0.001, or 0.1%) in system 1 by the same metric in system 2 and we get:

0.001 * 0.001 = 0.000001 probability of both systems going down at the same time.

Or another way of saying that is 99.9999% uptime. Pretty great, right?

Note: I’m not an availability calculation expert, so if I’ve messed up a basic assumption here, someone please feel free to correct me. Always looking to learn!
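For anyone who wants to check that arithmetic, here is a minimal sketch. It assumes the outages are fully independent, which is exactly the assumption that breaks when both systems live in the same region; the function name is hypothetical.

```python
def combined_availability(*availabilities):
    """Availability of N redundant systems, assuming independent failures.

    The combination is only down when every system is down at once.
    """
    p_all_down = 1.0
    for availability in availabilities:
        p_all_down *= 1 - availability
    return 1 - p_all_down


both = combined_availability(0.999, 0.999)
print(f"{both:.4%}")  # two 99.9% systems side by side
```

Adding a third independent system multiplies the downtime probability by 0.001 again, which is why redundancy across regions buys so much more than chasing a better single-system SLA.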

So application people made the mistake of just signing over responsibility for their application uptime to “the cloud”. Most of them probably didn’t even read the SLA for the S3 service or sit down to think it through.

Really? We had people armed with an IDE and a credit card move our apps to “the cloud” and wonder why things failed.

What could they have done?

There’s a million ways to answer this I’m sure, but let’s just look at what was available within the AWS list of service offerings.

CloudFront is AWS’s content delivery system. Extremely easy to use. Easy to set up, and it takes care of automatically caching copies of your content at multiple AWS edge locations.

Route 53 is AWS’s DNS service that will allow you to perform health checks and only direct DNS queries to resources which are “healthy” or actively available.
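The idea behind health-checked DNS can be sketched in a few lines. This is not the Route 53 API, just a minimal illustration of “only answer with targets that are healthy”; the endpoint names and the is_healthy callback are hypothetical.

```python
def pick_healthy(endpoints, is_healthy):
    """Return the first endpoint, in priority order, that passes its health check."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    raise RuntimeError("no healthy endpoints available")


# Hypothetical endpoints in two regions, listed in priority order.
endpoints = ["app.us-east-1.example.com", "app.us-west-2.example.com"]

# Pretend the primary region is failing its health check.
down = {"app.us-east-1.example.com"}
answer = pick_healthy(endpoints, lambda endpoint: endpoint not in down)
print(answer)
```

The hard part is not the selection logic; it is having a second copy of the application in another region for the answer to point at.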

There are probably a lot of other options as well, both within AWS and without, but my point is that the applications that went down most likely didn’t bother. Or they were denied the budget to properly architect resiliency into their system.

On the bright side, the latter just had a budget opening event.

Look who did it right

Unsurprisingly, there were companies who weathered the S3 storm like nothing happened. In fact, I was able to sit and binge watch Netflix while the rest of the internet was melting down. Yes, it looks like it cost 25% more, but then again, I had no problems with season 4 of Big Bang Theory at all last week, so I’m a happy customer.

Companies still like happy customers, don’t they?

The Cloud is still a good thing

I’m hoping that no one reads this as an anti-cloud post. There’s enough anti-cloud rhetoric happening right now, which I suppose is inevitable considering last week’s highly visible outage, and I don’t want to add to that.

What I do want is for people who read this to spend a little bit of time thinking about their applications and the infrastructure that supports them. This type of thing happens in enterprise environments every day. Systems die. Hardware fails. Get over it and design your architecture to take these failures into consideration as a foregone conclusion. It IS going to happen, it’s just a matter of when. So shouldn’t we design up front around that?

Alternatively, we could also choose to take the risk for those services that don’t generate revenue for the business. If it’s not making you money, maybe you don’t want to pay for it to be resilient. That’s ok too. Just make an informed decision.

For the record, I’m a network engineer well versed in the arcane discipline of plumbing packets. Cloud and application architectures are pretty far away from the land of BGP peering and routing tables where I spend my days. But for the low low price of $15 and a bit of time on Udemy, I was able to dig into AWS and build some skills that let me look at last week’s outage with a much more informed perspective. To all my infrastructure engineer peeps, I highly encourage you to take the time, learn a bit, and get involved in these conversations at your companies. I’m hoping we can all raise the bar together.

Comments, questions?

@netmanchris

Devops for Networking Forum in Santa Clara

Normally, I would have written this a few weeks ago, but sometimes the world just takes the luxury of time away from you. In this case, I couldn’t be happier though, as I’m about to be part of something that I believe is going to be really, really amazing. This event is really a testimony to Brent Salisbury and John Willis’s commitment to community and their relentless pursuit of trying to evolve the whole industry, bringing along as many of the friends they’ve made along the way as possible.

Given the speaker list, I don’t believe there’s been any event in recent ( or long term!) memory that has such an amazing list of speakers. The most amazing part is that this event was really put together in the last month!!!! 

If you’re in the Bay Area, you should definitely be there. If you’re not in the area, you should buy a plane ticket as you might not ever get a chance like this again.

 

DevOps Forum for Networking

From the website

 

previously known as DevOps4Networks is an event started in 2014 by John Willis and Brent Salisbury to begin a discussion on what Devops and Networking will look like over the next five years. The goal is to create a conversation for change similar to what CloudCamp did for Cloud adoption and DevopsDays for Devops.

 

When and Where

You can register here

DevOps Networking Forum 2016

Monday, March 14, 2016 9:00 AM – 5:00 PM (Pacific Time)

Santa Clara Convention Center
5001 Great America Pkwy
Santa Clara, California 95054
United States
Questions? Contact us at events@linuxfoundation.org

 Who

You can hit the actual speakers page here, but here’s the short list

  • Kelsey Hightower, Google
  • Kenneth Duda, Arista
  • Dave Meyer, Brocade
  • Anees Shaikh, Google
  • Chris Young, HPE
  • Leslie Carr, SFMIX
  • Dinesh Dutt, Cumulus
  • Petr Lapukhov, Facebook
  • Matt Oswalt, keepingitclassless
  • Scott Lowe, VMware

I’ve also heard of a few other industry notables who will be wandering the hallways as ONS starts to spin up for the week.

Yup. What an amazing list and for the low low price of $100, you can join us as well!

OMG

I’m absolutely honoured and, to be honest, a little intimidated to be sharing a spot with some of the industry luminaries who have been guiding lights personally for me in the last five years. I’m hoping to be a little educational, a little entertaining, and other than that, I’ll be in the front row with a box of popcorn soaking up as much as I can from the rest of the speakers.

Hope to see you there!

 

@netmanchris

 

2015 Recap and plans for 2016

About this time last year, I wrote this post.  It’s time to revisit again and plan for 2016. 

 

How did I do?

In 2015, I planned to work on skills in four major areas: python, data science, virtualization, and then just keeping up on networking. In general, I think I did well in all areas, with the breakaway really being in the python area. I made a concerted effort this year to seek out project after project that would allow me to explore different aspects of python and force me to grow in as many areas as possible. Attending Interop sessions led by such trailblazers as Jason Edelman, Jeremy Schulman, and Matt Oswalt definitely gave me ideas and inspiration to explore new areas and push boundaries. They have really helped me to grow as a coder/developer, not to mention helped me to cement some opinions on the future of networking in general.

I did manage to get through a bunch of the R courses on Coursera and they were great. I’d love to say that I’m going to return and finish the specialization (I’m two courses short) but if I’m honest, I’m probably going to move towards the Data Science aspects of Python and get into Anaconda and Pandas more in 2016. Nothing like combining two growth areas into one to really push the growth.

 

More so this year than others, having great conversations over beers, Moscow Mules, and many a sugary umbrella drink helped me expand my knowledge and firm up some of the thoughts I’ve been having for the last couple of years. No need to name drop, but you all know who you are and I’d like to say thanks for all of the conversations and laughs. As much as I love the tech, it’s the people that always help to drive me forward.

If I’m keeping score, I think that 2015 was a good year. 

 

Plans Plans Plans

Now comes time to publicly declare what I want to accomplish in 2016. This is always the scary part as I know I’m now publicly accountable for any grandiose designs I throw into the ether. 🙂 

 

Practice to Application:

 

I’ve got a bit of a lab at home. If things go as planned, at the end of the year, I will be able to factory reset *almost* the entire lab and have it come back from the dead in a completely automated fashion. The plan here is to use a combination of python, jinja2, IMC, Ansible, and whatever other pieces I need to duct tape together to make this work by the end of the year. Just because I like to make my life harder than it has to be, I’m planning on building out the topology using vendor-independent methodologies, meaning that I want to be able to place a Cisco, HP, or Juniper box into any position in my lab (access/distribution/core/dmz/wan/etc…) and use a YAML file to dynamically build the required configurations on demand.

 

Yeah… I know….   But it’s good to set goals right?

 

OpenSwitch

OpenSwitch is another area which I will definitely be exploring during 2016. The project is still very new and definitely has some places, mostly in the documentation area, where there’s room to make a difference. I’ve been really lucky to be able to work at a place where I have direct access to some of the project’s core developers and I’m hoping I can share the fruits of that access in more blog posts, pull requests to enhance the documentation, as well as some interoperability testing with some of the usual-suspect network kit that I already have in my lab. Right now, I’m thinking OSPF, BGP, Spanning-Tree as an unambitious start, and moving from there to using the declarative interface and REST interfaces to see how I can incorporate it into project one.

 

 

Thoughts on 2015

2015 was a good year. As an industry, I think we’ve made some great gains in general. The whole “Is SDN really a thing?” conversations seem to be over and we’ve moved on to the “I don’t care what you call it, what does it do for me?” conversations, which are starting to get really interesting. The projects with value are starting to separate themselves from the science fair exhibits and it looks like parts of the networking profession are finally past the outright denial and have reached the bargaining stage (“Can someone else write the scripts for me? Please?”).

I’ve been able to make forward momentum in all the areas I wanted to and I’m generally where I thought I was going to be at the start of the year.

 

Looking forward to 2016!

 

@netmanchris

XML, JSON, and YAML… Oh my!

I’m a network engineer who codes. Maybe even a network coder. Probably not a network programmer. Definitely not a programmer who knows networking. I’m in that weird zone where I’m enough of two things that don’t normally go together that it makes conversations I’m having with some of my peers awkward.

I had one such conversation today trying to explain the different data serialization formats in python and why, at the end of the day, they really don’t matter.

The conversation started with one of those “But they have an XML API!!!” comments thrown out as a criticism of someone’s product. My response was something like “ And why does that matter? ”

The person who made the comment certainly couldn’t answer that question. It was just something they had read in a competitive deck somewhere.

I’m all about competing and trying to make sure that customers have the BEST possible information to make the best decisions for their particular requirements, but this little criticism was definitely not, IMHO, the best information. In fact, it was totally irrelevant. This post is my way of trying to explain why. Hopefully, this will help clear up some of the confusion around data structures and APIs and why they really don’t matter so much, at least not their formatting.

XML

You can read more about XML here. In a nutshell, XML uses tags, similar to HTML, to represent different values in your data stream. The <item> tag opens an item and the </item> tag closes it, and what lives between the two is the value for that item. Take a look at the following XML output from the HP IMC NMS. I just cut and pasted this straight out of the API interface, so you should be able to do the same if you want to follow along at home. In this code, I have created a string called x and pasted in the XML formatted text, which is a bunch of information about a Cisco 2811 router that lives in my lab. Pay attention to the values as they will stay the same going through this exercise.

XML is the oldest of the bunch, having become a W3C recommendation in 1998. It’s important to note though that XML is still relevant, being the native data format of NETCONF and still used in a lot of places. It’s old, but that doesn’t mean it’s devoid of value.

Ordered Dictionary

A dictionary is a way of storing data in python that uses keys instead of an index to access the content or value of a specific piece of information you want. For example, item['ip'] would return “10.101.0.1” with a dictionary.
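As a quick illustration of key-based access, here is a minimal sketch. Only the ip value comes from the example above; the label key and its value are hypothetical.

```python
# A small dictionary describing a device; "label" is a made-up field for illustration.
item = {"label": "Cisco2811", "ip": "10.101.0.1"}

# Look up a value by key rather than by position.
print(item["ip"])

# Iterating walks over the key/value pairs.
for key, value in item.items():
    print(key, value)
```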

One of the “issues” with dictionaries is that they are unordered. That means that there’s no guarantee that when you print out a dictionary the values will be in the same order. (Pretty obvious when you read the word “unordered”, I know. Newer versions of Python, 3.7 and later, do preserve insertion order in regular dictionaries.) The OrderedDict is a “dict subclass that remembers the order entries were added”. So we’re going to use a great little library called xmltodict which takes an XML string (called x above) and transforms it into a python ordered dictionary. Now we can do interesting things to it in python. We can access the keys and get to the values directly. We can iterate over it because it’s one of python’s native data structures. It’s easy to use. People know it and understand it. It’s a good thing. Lists and dictionaries are the bread and butter of data structures in python. You need, need, need them.

In this code example, we’re going to take the XML string from above, run it through xmltodict to convert it to an ordered dictionary, and assign it to the variable y. Once I’ve got the OrderedDict y, I could also use xmltodict to convert it back into XML with little to no effort. Cool?
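To show the shape of that conversion without installing anything, here is a rough stand-in that uses the standard library’s xml.etree.ElementTree instead of xmltodict (where xmltodict.parse would do the job in one call). The XML below is a tiny hypothetical sample, not the real IMC output, and it only handles one level of nesting.

```python
import xml.etree.ElementTree as ET
from collections import OrderedDict

# Hypothetical, heavily trimmed device record; the real IMC output has many more fields.
x = """<device>
  <id>1</id>
  <label>Cisco2811</label>
  <ip>10.101.0.1</ip>
</device>"""


def xml_to_ordered_dict(xml_text):
    """Convert a flat XML document into nested OrderedDicts (one level deep only)."""
    root = ET.fromstring(xml_text)
    children = OrderedDict((child.tag, child.text) for child in root)
    return OrderedDict([(root.tag, children)])


y = xml_to_ordered_dict(x)
print(y["device"]["ip"])
print(list(y["device"].keys()))
```

Once the data is in y, it behaves like any other python dictionary, which is the whole point.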

JSON

JSON has become one of the standard ways to represent data between machines. It’s structured, well understood and it’s mostly human readable. A lot of “newer” systems now use JSON as the default data type. Most RESTful APIs for instance seem to have settled on JSON.

This is where things get interesting. Now that I’ve got XML in an ordereddict, I can use the JSON library to convert it to a JSON formatted string which I can then send along to any system that understands JSON. Or write it to a file, or just stare at those pretty, pretty braces.

Note: If I convert from JSON back to a python structure using the json.loads method, it will actually return a regular dictionary, not an Ordered Dictionary, so the values might appear out of order which COULD, in theory, cause issues with an upstream system, but I haven’t seen that in any of my work.
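A short sketch of that round trip, using only the standard library json module. The device data is a hypothetical sample; object_pairs_hook is the json.loads parameter that lets you ask for an OrderedDict back if ordering matters to you.

```python
import json
from collections import OrderedDict

# Hypothetical sample data standing in for the converted XML.
y = OrderedDict([
    ("device", OrderedDict([("id", "1"), ("label", "Cisco2811"), ("ip", "10.101.0.1")])),
])

# Python structure -> JSON formatted string.
j = json.dumps(y, indent=2)
print(j)

# JSON string -> regular dict (the default behaviour).
plain = json.loads(j)

# JSON string -> OrderedDict, if you want to be explicit about ordering.
ordered = json.loads(j, object_pairs_hook=OrderedDict)
print(type(plain).__name__, type(ordered).__name__)
```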

YAML

Although JSON is “more” readable than XML, it’s still got all those braces and quotation marks to worry about. And so YAML was born. YAML is easily the most human readable of the formats I’ve worked with. It uses white space, dashes and asterisks to denote different levels of the data structure. It’s what is commonly used with Jinja2 templating and Ansible and other cool buzzwords that we all are starting to play with.

Just like with the JSON example above, I can take the Ordered Dictionary and convert it to a YAML format (shown below ) and back again.  The yaml.load method does actually return an Ordered Dictionary.

 

What’s my point?

So the original criticism was “But they have an XML api!!!” right?  Well in these little code snippets I just demonstrated how using python and a couple of readily available libraries ( pyyaml and xmltodict are not native python and must be installed ) I was able to go from XML, to OrderedDict, to JSON, to YAML,  with almost no effort. I could take any of these and convert it to something like a Python Pickle, pull it back and convert it to something else. It really doesn’t matter. I can go from one to another without much effort.

Personally, I don’t like working with XML. I can do it, but I would RATHER work with JSON. But that’s just my personal preference, there’s no technical reason why JSON is superior to XML that I can see. At least not in the implementations and the levels that I’m dealing with.

Just like Bilbo Baggins, I can go there and back again without worrying too much about the actual format in between, because when I’m doing something in python, I’m really looking to be working with a native structure like a list or a dictionary anyways.

Anything that I get from an external source, I’m just going to convert into a native python data type, munge away, then I’m going to convert it back to whatever data format I need, be that JSON, XML or YAML, and be on to the next task.

The actual data is what matters.

As long as it’s structured in a way that I can parse easily, I couldn’t care less how it comes in and how it goes out.

Don’t even get me started about simply wrapping CLI commands in XML…

Does that mean the format doesn’t matter at all?

No, I’m sure there are many more experienced programmers who can explain the horror stories of converting between different data formats, or that time when this thing happened that caused this other thing to blow up.  But for me; I’d much rather you had a well structured API that gives me data in a way that I can easily access, convert to a format I can work with, and move on.

Hopefully, if you’ve made it to the end of this blog, you’ll agree that the actual format is much less important than you might once have believed. Disagree? Let me know in the comments below. Always looking to learn something, and in the coding realm, I know I’ve got a LOT to learn!!!

@netmanchris

 

Sometimes Size Matters: I’m sorry, but you’re just not big enough.

So now that I’ve got your attention… I wanted to put together some thoughts around a design principle that I call the acceptable unit of loss, or AUL.

Acceptable Unit of Loss: def. A unit to describe the amount of a specific resource that you’re willing to lose

 

Sounds pretty simple doesn’t it?  But what does it have to do with data networking? 

White Boxes and Cattle

2015 is the year of the white box. For those of you who have been hiding under a router for the last year, a white box is basically a network infrastructure device, right now limited to switches, that ships with no operating system.

The idea is that you:

  1. Buy some hardware
  2. Buy an operating system license ( or download an open source version) 
  3. Install the operating system on your network device
  4. Use DevOps tools to manage the whole thing

Add ice. Shake, and IT operational goodness ensues.

Where’s the beef?

So where do the cattle come in? Pets vs. Cattle is something you can research elsewhere for a more thorough treatment, but in a nutshell, it’s the idea that pets are something that you love and care for and let sleep on the bed and give special treats to at Christmas. Cattle, on the other hand, are things you give a number, feed from a trough, and kill off without remorse if a small group suddenly becomes ill. You replace them without a second thought.

Cattle vs. Pets is a way to describe the operational model that’s been applied to the server operations at scale. The metaphor looks a little like this:

The servers are cattle. They get managed by tools like Ansible, Puppet, Chef, SaltStack, Docker, Rocket, etc… which at a high level are all tools that allow a new version of the server to be instantiated with a very specific configuration and little to no human intervention. Fully orchestrated.

Your server starts acting up? Kill it. Rebuild it. Put it back in the herd.

Now one thing that a lot of enterprise engineers seem to be missing is that this operational model is predicated on the fact that your application has been built with a well thought out scale-out architecture, one that allows the distributed application to continue to operate when the “sick” servers are destroyed and seamlessly integrates the new servers into the collective without a second thought. Pretty cool, no?

 

Are your switches Cattle?

So this brings me to the Acceptable Unit of Loss. I’ve had a lot of discussions with enterprise focused engineers who seem to believe that Whitebox and DevOps tools are going to drive down all their infrastructure costs and solve all their management issues.

“It’s broken? Just nuke it and rebuild it!”  “It’s broken? grab another one, they’re cheap!”

For me, the only way this particular argument from customers works is if their AUL metric is big enough.

To hopefully make this point I’ll use a picture and a little math:

 

Consider the following hardware:

  • HP C7000 Blade Server Chassis – 16 Blades per Chassis
  • HP 6125XLG Ethernet Interconnect – 4 x 40Gb Uplinks
  • HP 5930 Top of Rack Switch – 32 40G ports, but from the data sheet “ 40GbE ports may be split into four 10GbE ports each for a total of 96 10GbE ports with 8 40GbE Uplinks per switch.”

So let’s put this together

[Image: hand-drawn diagram of the design]

So we’ll start with

  • 2 x HP 5930 ToR switches

For the math, I’m going to assume dual 5930s with dual 6125XLGs in each C7000 chassis. We will assume all links are redundant, which makes the math a little bit easier. ( We’ll only count this with 1 x 5930, cool? )

  • ( 32 x 40Gb ports on the HP 5930 – 8 x 40Gb ports reserved for uplinks ) = 24 x 40Gb ports for connection to those HP 6125XLG interconnects in the C7000 Blade Chassis.
  • 24 x 40Gb ports from the HP 5930 will allow us to connect 6 x 6125XLGs using all four of their 40Gb uplinks. 

Still with me? 

  • 6 x 6125XLGs means 6 x C7000, which then translates into 6 x 16 physical servers.

Just so we’re all on the same page, if my math is right, we’ve got 96 physical servers on six blade chassis. Each chassis is connected through its interconnects at 320Gb ( 4 x 40Gb x 2, remember the redundant links? ) to the dual HP 5930 ToR switches, which will have 640Gb ( 16 x 40Gb, that is 8 x 40Gb from each HP 5930 ) of bandwidth out to the spine.

If we go with a conservative VM to server ratio of 30:1,  that gets us to 2,880 VMs running on our little design. 
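For anyone who wants to check the arithmetic or plug in their own hardware, here is a minimal sketch of the sizing math above. The port counts, blade count, and 30:1 ratio come straight from this post; the constant and function names are just illustrative.

```python
# Sizing math for the design above, counted against a single HP 5930
# (the second switch and interconnect are treated as the redundant path).
TOR_PORTS_40G = 32             # 40Gb ports on one HP 5930
TOR_UPLINKS_40G = 8            # 40Gb ports reserved for uplinks to the spine
UPLINKS_PER_INTERCONNECT = 4   # 40Gb uplinks on one HP 6125XLG
BLADES_PER_CHASSIS = 16        # blades in one HP C7000
VMS_PER_SERVER = 30            # the "conservative" VM to server ratio
LINK_GBPS = 40


def design_capacity():
    """Return the headline numbers for the two-ToR, six-chassis design."""
    downlinks = TOR_PORTS_40G - TOR_UPLINKS_40G          # ports left for chassis
    chassis = downlinks // UPLINKS_PER_INTERCONNECT      # one interconnect per chassis per ToR
    servers = chassis * BLADES_PER_CHASSIS
    vms = servers * VMS_PER_SERVER
    # Two interconnects per chassis and two ToR switches, hence the "* 2".
    per_chassis_gbps = UPLINKS_PER_INTERCONNECT * LINK_GBPS * 2
    spine_gbps = TOR_UPLINKS_40G * LINK_GBPS * 2
    return {
        "downlinks": downlinks,
        "chassis": chassis,
        "servers": servers,
        "vms": vms,
        "per_chassis_gbps": per_chassis_gbps,
        "spine_gbps": spine_gbps,
    }


if __name__ == "__main__":
    for name, value in design_capacity().items():
        print(f"{name}: {value}")
```

Change the ratio or the port counts and the number of VMs riding on that pair of switches moves with them, which is really the whole point of the exercise.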

How much can you lose?

So now is where you ask the question:  

Can you afford to lose 2,880 VMs? 

According to the cattle & pets analogy, cattle can be replaced with no impact to operations because the herd will move on without noticing, i.e. the Acceptable Unit of Loss is small enough that you’re still able to get the required value from the infrastructure assets. 

The obvious first objection I’m going to get is

“But wait! There are two REDUNDANT switches right? No problem, right?”

The reality of most networks today is that they are designed to maximize network throughput and make efficient use of all available bandwidth. MLAG, in this case brought to you by HP’s IRF, allows you to bind interfaces from two different physical boxes into a single link aggregation pipe. ( Think vPC, VSS, or whatever other MLAG technology you’re familiar with ). 

So I ask you, what are the chances that you’re running the unit at below 50% of the available bandwidth? 

Yeah… I thought so.

So the reality is that when we lose that single ToR switch, we’re actually going to start dropping packets somewhere, because you’ve been running the system at 70-80% utilization to maximize the value of those infrastructure assets. 
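To put a rough number on that, here is a back-of-the-envelope sketch. It assumes traffic is spread evenly across both switches and that everything lands on the survivor when one fails, which is a simplification; the function names are made up for illustration.

```python
def survivor_load(utilization, switches=2):
    """Load on each surviving switch, as a fraction of its capacity,
    after one of `switches` equally loaded members fails.

    Simplifying assumption: traffic was evenly balanced before the
    failure and is evenly rebalanced across the survivors afterwards.
    """
    return utilization * switches / (switches - 1)


def unservable_fraction(utilization, switches=2):
    """Fraction of the offered traffic the survivors cannot carry."""
    load = survivor_load(utilization, switches)
    return max(0.0, 1.0 - 1.0 / load)


if __name__ == "__main__":
    for u in (0.5, 0.7, 0.8):
        print(f"{u:.0%} utilized -> survivor asked for {survivor_load(u):.0%}, "
              f"{unservable_fraction(u):.0%} of traffic has nowhere to go")
```

At 50% or below the survivor can absorb the load. Anywhere in the 70-80% range it is being asked for well over 100% of what it has, and that excess has to be dropped or queued somewhere.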

So what happens to a TCP-based application when we start to experience packet loss?  For a full treatment of the subject, feel free to go check out Terry Slattery’s excellent blog on TCP Performance and the Mathis Equation. For those of you who didn’t follow the math, let me sum it up for you.

Really Bad Things.  

On a ten gig link, bad things start to happen at 0.0001% packet loss. 
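If you want to play with the numbers yourself, the Mathis equation puts an upper bound on the throughput of a single TCP flow at roughly (MSS / RTT) x (C / sqrt(p)), where p is the packet loss probability and C is a constant of order one. The sketch below is only an illustration: the 1460-byte MSS, the 1 ms round trip time, and C = 1 are my assumptions, not figures from Terry’s post.

```python
import math


def mathis_ceiling_bps(mss_bytes, rtt_s, loss, c=1.0):
    """Approximate upper bound on single-flow TCP throughput, in bits/s,
    from the Mathis equation: (MSS / RTT) * (C / sqrt(p)).

    `loss` is a probability, so 0.0001% is written as 1e-6.
    """
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss))


if __name__ == "__main__":
    MSS_BYTES = 1460   # assumed: typical Ethernet MSS
    RTT_S = 0.001      # assumed: 1 ms round trip time
    for loss in (1e-6, 1e-5, 1e-4, 1e-3):
        gbps = mathis_ceiling_bps(MSS_BYTES, RTT_S, loss) / 1e9
        print(f"loss {loss:.4%} -> ceiling of about {gbps:.2f} Gb/s")
```

With those assumptions, 0.0001% loss is about where the ceiling drops to the neighbourhood of a ten gig link, and every hundredfold increase in loss after that cuts the ceiling by another factor of ten. A longer round trip time makes it worse.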

Are your Switches Cattle or Pets?

So now that we’ve done a bit of math and metaphors, we get to the real question of the day: Are your switches Cattle? Or are they Pets? I would argue that if you’re measuring your AUL in fewer than 2,000 servers, then your switches are probably Pets. You can’t afford to lose even one without bad things happening to your network and, more importantly, to the critical business applications that are being accessed by those pesky users. Did I mention they are the only reason the network exists?

Now this doesn’t mean that you can’t afford to lose a device. It’s going to happen. Plan for it. Have spares, support contracts, whatever. But my point is that you probably won’t be able to go with a disposable infrastructure model like the one suggested by many of the engineers I’ve talked to in recent months about why they want white boxes in their environments.

Wrap up

So are white boxes a bad thing if I don’t have a ton of servers and a well-architected distributed application? Not at all! There are other reasons why white box could be a great choice for comparatively smaller environments. If you’ve got the right human resource pool internally with the right skill set, there are some REALLY interesting things that you can do with a white box switch running an OS like Cumulus Linux. For some ideas, check out this Software Gone Wild podcast with Ivan Pepelnjak and Matthew Stone.

But in general, if your metric for  Acceptable Unit of Loss is not measured in Data Centres, Rows, Pods, or Entire Racks, you’re probably just not big enough. 

 

Agree? Disagree? Hate the hand-drawn diagram? All comments welcome below.

 

@netmanchris

IT Generalist or Network Specialist?

Very shortly after posting this blog, I was pleasantly surprised by a comment from Ethan Banks. Apparently my posting and his free time aligned. 🙂   One line of his comment really got me thinking about where I want to take my career in the future. You can read the whole comment, but the part that got me thinking was Ethan’s line: “I’m looking into becoming more of an IT generalist in the long run.”

 

IT Generalist vs. Network Specialist

When I went back and read my post, I could definitely see where the IT Generalist thought came from. There are a lot of different skills that I’m trying to develop this year, and to be honest, there are probably a lot more that I’m going to have to gain before I can start developing in the areas that I really want to go after.  For example, I just spent a few hours reading about Git.

Git was not on the list, but I’ve just spent a few very precious hours of my time on it because it’s almost unthinkable now to be doing any software development without using Git to share your work. It’s the standard, and I just don’t have enough working knowledge of repos, forks, merges, etc. It’s just not something I’ve had to pick up in my career yet. So although it’s not on my list, it’s a prerequisite for really being able to use the skills I want to develop.

So, back to the question: Am I trying to become an IT generalist or a network specialist? I’m not quite sure I have an answer to that yet. But I think that we first need to ask the question “what is the network?” before I can figure out an answer.

I think this was an easy question a few years ago, but it’s getting much, much harder to figure out where my areas of responsibility as a networking professional now end.

 

How many Tiers?

I was having a conversation with @networkstatic a couple of weeks ago and we were laughing at the whole idea of a two tier network. Now, I work in marketing and I understand that the phrase “two tier network” is really designed to communicate that there’s less hardware involved, therefore less cost, less latency, less complexity, etc. But once the marketing is done and the POs have been cut, it becomes time to operate the network, and then the number of tiers suddenly is a whole different number.

Think about a typical connection from a user to an application. Let’s assume that we’ve got a typical architecture where there’s a user on a tablet trying to access an application in a data centre somewhere. I’m sure someone is going to disagree and we could easily expand the DMZ into a few more tiers, but for discussion purposes, just work with me here.

 

Network Tiers

Now by my count, traffic on this “two tier” network is actually going through thirteen different tiers where policy can be applied. Any of them could alter how the traffic flows through the network, which of course impacts the quality of the specific application that the user is trying to access.

Arguably, the last OVS/Linux Bridge layer would not be there for a lot of typical networks, but with technologies like Docker and Rocket, I think we’re going to see that become more commonplace over the next year. This also doesn’t even factor in the whole overlay/underlay and VTEP mess that can also throw another wrench into the mix. But I think you get my point.

Closing the loop

So bringing this back to the opening question: if a networking professional has responsibility for understanding the end-to-end path and where issues may arise for a connection between a consumer and a service, it would seem that we need to develop skills across the entire stack.  There are a lot of other places where a problem can arise: database seek times, noisy neighbour issues in a storage array, badly coded applications, etc.  I would not argue that a network engineer needs to be knowledgeable on the ins and outs of ALL of the things that can go wrong, but I do believe that we should be making a fighting attempt at understanding the parts that affect our craft.

In the long run, I think we’re going to continue to see networking divide into sub-professions that specialize in specific architectural blocks of the network. Although there may be a lot of common knowledge, I would argue there’s also a lot of specialized knowledge that can only be gained through experience dedicated to one of these architectural blocks over a period of time.

Network Disciplines List

 

Then again, I might be overthinking the whole thing. It’s also possible that network knowledge will start to be considered generalist knowledge.

What do you think? Post your comments below.

 

@netmanchris