The Illusion of Perfection

Let’s face it: anyone with the mindset of an engineer likes to deal in exacts. Exact measurements are required to fit together the incredibly complex Lego pieces that exist in every realm of engineering. Because of this quest for certainty, when we engineer a solution to a problem we almost immediately try to account for every situation the solution might be used in, and every failure scenario that might crop up when using it. But what we need to realize is that this quest for perfection in design is an illusion.

Perfection is a marriage of knowledge and perspective, and both are ever changing, especially in the IT world where I work. With knowledge and perspective constantly shifting, it would be silly to look at something I’m designing at this particular point in time and expect any semblance of perfection out of it. Through the process of engineering I will inevitably learn something new or gain a new perspective, and that can then be fed back into the solution I’d already declared complete and ‘perfect’. That makes the entire process cyclical. I see it something like this:

[Figure: the cyclical process of design, knowledge, and perspective]

This isn’t a new concept; iterative engineering is a thing. But I think a lot of operators in the IT world are constantly striving for what they’d consider a ‘perfect’ design of the platforms they’re working on. When that ‘perfection’ is never realized, a lot of things can happen, such as an outage caused by something missed in the design, whether through lack of knowledge of the environment you’re dealing with or lack of perspective on how a certain aspect of the service should’ve been engineered. Then throw in the fallible nature of the human psyche and you’re asking for a disaster. Pride comes into play, feelings get hurt, words fly. It’s not pretty. I’ve been part of these types of interactions and they are NOT fun.

With that in mind, we should prepare for the fact that our idea of perfection will keep shifting as we grow, learn, and apply what we’ve seen. I’ll leave you with a little anecdote about what made me think of this.

A few months ago I visited a friend who works for an organization that is doing some pretty amazing things in technology right now, whether it be open sourcing certain internal tooling or using its presence to back certain directions in the industry. Hint: it has something to do with a big blue ‘F’.

When I arrived, I walked into the building and went to check in on the iPad at the reception counter. I noticed that the iPad was sitting on a little wooden stand with some words burned into it: “Done is better than perfect.” Because you’ll never attain true perfection anyway.

The Whole Picture

Greg Ferro recently presented a challenge to the industry, 30 blogs in 30 days, and I think I’m going to try to tackle it. I’ve always struggled with posting to my blog and curating content to give back to the community, especially given the rate at which I consume what’s out there. I have a responsibility to give back more. So, without further ado.

As I’ve been working through some projects at Plexxi, I’ve started to broaden my skill set with respect to computing as a whole, not just networking the machines together. And I realized something very quickly: I don’t know nearly as much as I’d hoped. All of the time I spent proving it _wasn’t_ the network didn’t help much in learning exactly how applications are architected, why those applications communicate the way they do, and how I could actually provide feedback that would help the developers and administrators.

Moving from consuming abstractions and using them to configure single devices in a network, to building workflows (and the abstractions that enable them) for an entire system, has been absolutely eye opening. The networking field seems so small when you look at it from the big picture. Yes, there are considerations to be made about which protocols to deploy within the network and how to architect it, what failure scenarios exist, separation of failure domains, etc. But all of those times my defenses went up when an application administrator or developer complained that the network was the problem now seem so shameful.

What I should’ve been doing is shutting up, swallowing my pride (network engineers are a prideful bunch), and sitting down with the application administrators and developers, helping them to understand the network while they helped me to understand their systems. Sound familiar? (DevOps, for those who didn’t make the connection.)

As I work through some of these integrations I feel like I’m in the first few years of my career again. I’m a stumbling, bumbling toddler trying to sort out and make sense of all of these new languages and technologies, how we can architect them, and how to ensure they’re consumable and agile enough to be deployed across many different infrastructures and iterations. It’s almost liberating. Terrifying, because I am now dealing with everything BUT what the last 8 years of my career have been, but liberating.

Moving up and back down the stack again, and beginning to understand what the bits look like in flight above L4, is an awesome experience. I’ve been reading white papers and research papers, learning about concepts like eventual consistency, tying them back to something like OSPF, and realizing that John Moy was a genius in his own right. He created a distributed application, albeit primitive compared to what we have today, that has lasted decades and scaled very, very well.
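To make the eventual-consistency connection a bit more concrete, here is a toy sketch of link-state flooding in Python. It is a deliberate simplification and not OSPF itself: real OSPF also relies on acknowledgements, LSA aging, and checksums. The `Router` class, the `build_ring()` and `flood()` helpers, the ring topology, and the payload strings are all hypothetical. Each router keeps only the newest instance of each origin's advertisement (by sequence number) and re-floods anything newer to its neighbors, so every database converges to the same view regardless of delivery order, and stale copies are simply ignored.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Router:
    """Hypothetical router holding a link-state database keyed by originating router."""
    name: str
    neighbors: list = field(default_factory=list)
    lsdb: dict = field(default_factory=dict)  # origin -> (sequence number, payload)

    def receive(self, origin, seq, payload, queue):
        current = self.lsdb.get(origin)
        # Install only strictly newer instances (higher sequence number), then re-flood them.
        if current is None or seq > current[0]:
            self.lsdb[origin] = (seq, payload)
            for neighbor in self.neighbors:
                queue.append((neighbor, origin, seq, payload))


def build_ring(names):
    """Connect routers in a ring, each adjacent to the next (hypothetical topology)."""
    routers = {n: Router(n) for n in names}
    for i, n in enumerate(names):
        nxt = names[(i + 1) % len(names)]
        routers[n].neighbors.append(nxt)
        routers[nxt].neighbors.append(n)
    return routers


def flood(routers, start, origin, seq, payload):
    """Deliver an advertisement at `start` and process hop by hop until nothing new is learned."""
    queue = deque([(start, origin, seq, payload)])
    while queue:
        name, o, s, p = queue.popleft()
        routers[name].receive(o, s, p, queue)


routers = build_ring(["A", "B", "C", "D"])
flood(routers, "A", "A", 2, "A-links-v2")  # the newer advertisement floods everywhere
flood(routers, "C", "A", 1, "A-links-v1")  # a stale copy arriving late is ignored
print({name: r.lsdb["A"] for name, r in routers.items()})
```

No router coordinates with any other beyond its neighbors, yet every database ends up holding the same (newest) advertisement, which is the essence of eventual consistency.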

So, I encourage everyone in the networking industry: talk to the developers at your organizations. Talk to the application administrators and understand what their software looks like. Understand _why_ it was built that way and how it could possibly be refactored for better performance. Who knows, you might even find that you actually like working on systems outside of the network, and the knowledge you gain through these interactions might let you make different employment decisions in the future.

So that’s my advice. Drop the attitude that every network engineer seems to have, and sit down and work together with everyone else. Be a bit of a hippie and embrace diversity and different perspectives. You won’t be sorry.


Bittersweet – Moving On

I wanted to take a moment to let everyone in the community know that I have accepted a position with the Plexxi team. I’ll be working within the Product Management team, concentrating specifically on Big Data Fabrics and the advances and gains that can be made by applying the Plexxi product (I sound like a vendor already, eh?). In this role I will also be contributing a bit to the OpenDaylight (ODL) community moving forward. I’m extremely excited about the opportunity and the contributions I can make to the organization and to the general advancement of the technology. That’s the sweet part.

The bitter part is the job I am leaving. I’ve really enjoyed bringing connectivity to the entire educational community of Connecticut, all K-20 educational institutions, putting impressive speeds in the hands of the kids in the educational system, and truly helping to bridge the digital divide that exists today. The team I worked with was impressive, handling our workload with such a lean staff, and I will miss working with them.

The second semi-bitter part is having to give up my independent status within the community. From now on I will no longer be able to participate in non-vendor events such as Tech Field Day, at least not from the independent side. However, I look forward to bringing some potentially interesting solutions to the table for the delegates to vet, which will help me better myself as well.

I remember talking with Brent Salisbury and Nick Buraglio about 3 years ago about the potential impacts of SDN, back when OpenFlow had just hit the scene and we were still trying to understand what it meant and where it fit. Fast forward 3 years: I talk about seeing the network as another consumable that will ultimately be abstracted and consumed much more easily than it is today, and I now get to directly contribute to making that happen at one of only three companies I’d ever convinced myself I’d leave the independent scene for. I’m excited about what the opportunity will bring, and I really look forward to continuing to deliver content back to the community as much as possible.

Here’s to the future!

TFD ONUG 2014 – Glue Networks

One of the roundtable discussions at the Tech Field Day event at the Open Networking User Group (ONUG) was with Glue Networks. They offer a solution that addresses some of the challenges experienced in the WAN optimization space. Without further ado, I will jump right into my views on the product and where I think its advantages and disadvantages lie today.

The first thing I took away from the discussion was the use of the phrase ‘Software Defined Networks’ when talking about Glue’s product. While they offer what looks like a competent product in the realm of configuration automation and orchestration, I don’t know if I would qualify that as ‘SDN’. Then again, SDN is such an ambiguous term today that we can probably call it just that and be completely fine with it. I don’t necessarily qualify configuration management as ‘SDN’ in this case, as it’s just templating and revision control, but if done right through APIs and libraries (onePK in the Cisco world), it could push the industry forward.

All that said, the auto-provisioning and configuration management piece of the product is definitely useful for companies with tens or maybe hundreds of remote sites, as automating device provisioning and configuration is a real time saver. For organizations with fewer than 10 remote sites, though, I don’t know if the capital increase would be offset by the operational savings. Granted, I haven’t seen pricing for the product.

While Glue seems to offer a lot in the provisioning and configuration automation aspect of their product, I can’t really find a reason to add more SKUs to orders for another piece of software for such a corner case, meaning companies that have a small(er) number of remote sites, especially when we can perform the same configuration templating using open source tooling like Python and Jinja2 and configuration management frameworks like Ansible. WAN optimization is an interesting animal as well. A lot of the discussion centered on being able to intelligently deploy Performance Routing (PfR) without having a rocket scientist on staff, but I haven’t really ever seen an organization that had the time and patience to deploy, care for, and feed PfR. So, if there are any gains to be made, they would be made there.
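As a rough illustration of the templating mentioned above, here is a minimal Python sketch that renders per-branch router configuration from a list of site records. To stay dependency-free it uses the standard library's `string.Template` as a stand-in for Jinja2 (Jinja2 layers loops, conditionals, and filters on top of the same idea). The site fields, hostnames, and addresses are hypothetical placeholders (the WAN addresses come from a documentation range), and the IOS-style lines are illustrative rather than a complete branch config. The rendered text is the kind of artifact you would commit to revision control or hand to a tool like Ansible to push.

```python
from string import Template

# Hypothetical IOS-style branch template; each $placeholder is filled per site.
BRANCH_TEMPLATE = Template("""\
hostname $hostname
!
interface Loopback0
 ip address $loopback 255.255.255.255
!
interface GigabitEthernet0/0
 description WAN uplink to $hub
 ip address $wan_ip $wan_mask
""")

# Hypothetical site inventory; in practice this might live in YAML or a CMDB.
SITES = [
    {"hostname": "branch-01", "loopback": "10.255.0.1", "hub": "hub-east",
     "wan_ip": "192.0.2.2", "wan_mask": "255.255.255.252"},
    {"hostname": "branch-02", "loopback": "10.255.0.2", "hub": "hub-east",
     "wan_ip": "192.0.2.6", "wan_mask": "255.255.255.252"},
]


def render_branch(site):
    """Render one site's config; substitute() raises KeyError if a field is missing."""
    return BRANCH_TEMPLATE.substitute(site)


configs = {site["hostname"]: render_branch(site) for site in SITES}
print(configs["branch-02"])
```

Using `substitute()` rather than `safe_substitute()` is a deliberate choice here: an incomplete site record fails loudly instead of producing a config with a literal `$wan_ip` left in it.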

Overall, I think the architecture/engineering team would have to do the cost-benefit analysis on whether the capital expense for the product is worth the operational impact it would have on the environment. If you know me, then you know that I’m a firm believer in the architect having a foot in both realms (business and technical), and cost does come into play when deploying solutions. But if you’re currently looking to throw money at a monster of a WAN and want what seem to be quick results with minimal implementation effort, then I would definitely give Glue a look, as they know their market, their position, and their play quite well. The product looks polished and could definitely give you a hand if you have a large WAN and a small network staff, which we all know is a constant.

Abstraction vs Automation

There has been quite a bit of Twitter buzz these last few weeks, during the OpenStack Summit, around the notion of abstraction vs. automation, and I’m not sure what I’m missing here or why the two ideas almost seem to be treated as mutually exclusive. I’ll give you my viewpoint, and then anyone who has a response can respond, because maybe I’m missing something.

First, let’s talk about automation. Is automation possible today? Yes, yes it is. Is it the easiest thing to accomplish? Not necessarily, but it’s possible. Mike Bushong from Plexxi writes about the linearity in thinking behind today’s Network Engineers, which definitely exists, but I think it’s because they haven’t been challenged to think any other way.

Using tools that exist today, like Ansible (I prefer this CM tool because it’s agentless, which works well with network devices that typically run closed NOSes), you can build out the workflows that Mike writes about while still instantiating configuration on the devices via the CLI, just in an ‘automated’ way, i.e., not pushing line after line of config to the box by hand. Is this much different from Perl/Expect of the past? No. But it becomes more structured and controlled through the use of the Ansible framework.

So, in a way, we’ve already been able to ‘abstract’ direct human interaction with devices away. This might not be the best way to do things, but we’re working toward better through the many efforts in the industry today. This is the point I made earlier in the post: I don’t think today’s Network Engineers have been challenged in their thinking about how to design a workflow from end to end. The typical concern is, ‘how can I connect all of these disparate devices and deliver bits as fast as possible?’ In reality, they should be thinking about delivering the service that the network supports, end to end. From how we connect the cables (L1), to how we deliver the bits (L2/L3), to how the app ingests those bits (L4-L7): that is what we should be paying attention to. Stop limiting ourselves in the stack and start thinking about end-to-end service delivery with respect to the infrastructure.

I keep thinking back to Derick’s post on the Plexxi blog about UX as well. He describes how policy constructs that closely mirror the design choices of the underlying system make it much more difficult for a consumer of that system to express how they intend to use it. Let me give you an example: when configuring a campus network (let’s stay away from the DC for a minute) and choosing the VLAN identifiers to use within the design, does it really matter what numbers you use? Sure, you have a 12-bit field’s worth of space to utilize (IDs 1 through 4094 are usable; 0 and 4095 are reserved), but does it really matter WHICH numbers you use? Probably not. How many times have you had to reference an Excel spreadsheet to figure out which IDs have been used and which you can use for the new segment you’re trying to configure? Does this decision still get left up to the engineer responsible for designing and instantiating the system? It sure does. That’s an example of a knob that doesn’t really matter to the people consuming the system (end users utilizing whatever systems ride the wires).

So, abstract that away using some sort of automation toolkit. Code into the tool that it’s a 12-bit value and that it should sequentially utilize the range of VLAN IDs at your disposal, and you shouldn’t care what that number is. Take it a step further and make sure the tool is aware of all of the tagged links you have in your network and adds the VLAN to them through whatever security workflow you’re trying to honor as well. Get where I’m going with this? Abstraction takes that decision out of the equation. Not having to decide which VLAN IDs to use and where they need to propagate isn’t a life-changing event, but it’s the little changes in tooling and thought process that will take us where we need to go. Once that function is abstracted, automating it is that much easier. The real effort comes in deciding how that abstraction works. Once that is decided, consuming that abstraction becomes easier.
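Here is a minimal Python sketch of that abstraction, assuming a hypothetical in-memory inventory. `VlanAllocator`, the segment names, and the interface names are illustrative, not any product's API. The tool knows VIDs are 12-bit values with 1 through 4094 usable, hands out the lowest free ID for each named segment, returns the same ID if a segment is requested again, and emits IOS-style `switchport trunk allowed vlan add` lines for every known tagged link. A real tool would route those lines through your change or security approval workflow rather than pushing them directly.

```python
from dataclasses import dataclass, field

# 802.1Q VIDs are 12 bits wide; 0 and 4095 are reserved, leaving 1-4094.
VLAN_MIN, VLAN_MAX = 1, 4094


@dataclass
class VlanAllocator:
    """Hypothetical allocator: consumers name a segment, the tool picks the number."""
    trunk_links: list                                   # tagged links to propagate VLANs onto
    reserved: set = field(default_factory=lambda: {1})  # e.g. keep the default VLAN out of the pool
    allocations: dict = field(default_factory=dict)     # segment name -> VLAN ID

    def allocate(self, segment):
        if segment in self.allocations:              # idempotent: same segment, same ID
            return self.allocations[segment]
        used = set(self.allocations.values()) | self.reserved
        for vid in range(VLAN_MIN, VLAN_MAX + 1):    # walk the range sequentially
            if vid not in used:
                self.allocations[segment] = vid
                return vid
        raise RuntimeError("VLAN ID space exhausted")

    def release(self, segment):
        """Return a segment's ID to the pool so it can be reused."""
        self.allocations.pop(segment, None)

    def trunk_commands(self, segment):
        """IOS-style lines adding the segment's VLAN to every known trunk."""
        vid = self.allocate(segment)
        return [f"interface {link}\n switchport trunk allowed vlan add {vid}"
                for link in self.trunk_links]


alloc = VlanAllocator(trunk_links=["GigabitEthernet1/0/48", "Port-channel1"])
alloc.allocate("guest-wifi")
alloc.allocate("voice")
for cmd in alloc.trunk_commands("cameras"):
    print(cmd)
```

The consumer only ever says "guest-wifi" or "cameras"; which number backs that name, and which trunks it lands on, is the tool's problem, which is exactly the spreadsheet this abstraction replaces.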

Overall, I’m just trying to point out that it isn’t necessarily abstraction vs. automation, or a problem that we have with today’s tools. It can all be accomplished. Again, is it necessarily easy? No. But abstraction will ultimately lead to easier automation.

Thoughts? Comments? Suggestions?

ONUG Hackathon and Tech Field Day

I’m sitting on the Amtrak headed to NYC as I type this, and I wanted to share how excited I am to take part in two parts of the Open Networking User Group over the next few days. If you’ve spoken to me, then you know how keen I am on what the next few years will bring to data networks and what the impact will be. I’m one of the analytics believers who see a whole lot of state in today’s data networks that is ripe and ready to be collected and crunched, helping to bring us faster, more agile systems to develop on top of in the near future.

The first part of ONUG that I will be participating in is the hackathon. I was invited a few months back to be part of a team, I eagerly accepted, and we’re working on how to apply SDN concepts over the WAN for organizations that have a highly distributed footprint. I’ll be sure to post more information about that when we’re done with the hackathon. The concentration on the Data Center with respect to SDN is warranted, but we shouldn’t forget about how we have to get data back to the Data Center to begin with.

The second event that I’m going to be a part of, and am equally excited about, is the Tech Field Day ONUG events that will be taking place, and the vendors we will get to talk to. It’s always a blast to participate in Tech Field Day events, as you get to talk to a lot of incredibly intelligent people about some harshly complex problems that we’re all trying to solve on a daily basis, along with getting some exposure to the latest and greatest that companies have to offer.

Here’s to more nerdiness!

From Micro to Macro Management with Plexxi DSE

Where we are:

When working on today’s infrastructures, we’re typically weighed down by the fact that we are left to micromanage a lot of the systems within them. While listening again to the Networking Field Day 7 recording of Nils Swart giving an overview of the Plexxi Data Services Engine (DSE), it dawned on me: we’re shifting from micromanagers to macromanagers.

Right now we’re tasked with dealing with every little action that needs to be performed on our infrastructure, whether it be entering commands to carve off a LUN within a SAN, configuring the required resources for VMs within our compute infrastructure, or even configuring a VLAN on an interface. These are all forms of micromanagement: excessive control and attention to detail. We’ve all worked under micromanagers in meatspace, and we’ve probably all hated it. But I want to speak more to the opposite of micromanagement: macromanagement.

When looking at a system today, as engineers and architects operating under yesterday’s paradigms of designing and engineering our systems, we’re constantly paying attention to Every. Little. Detail. We worry about whatever flag is set in whatever header is used to signal something within our systems, and about how to turn that flag on within a particular piece of infrastructure, whether by issuing an API call, configuring the device via the CLI, or ticking a check box within a GUI. This becomes incredibly cumbersome and time consuming, and leads to some of the most inefficient workflows, ones that can cripple and eventually close the doors of today’s biggest and best businesses.

Tomorrow’s service offerings will, no doubt, revolve around speed and flexibility more than they have in the past decade. How fast can we deploy our service, and at the same time, how quickly can we change it?

Where we are going:

There is already an industry trend within IT where engineers and architects are starting to talk about the importance of having one foot in both the technical and the business worlds, and I wholeheartedly agree. It’s vital that the individuals working to translate business policy into infrastructure configuration know how to speak both languages, as they’ll then be able to move toward exactly what Plexxi’s Data Services Engine is doing for businesses today.

Within DSE, Plexxi is offering a message bus from toolkits and frameworks like Chef into the affinity component within the DSE. In essence, this allows you to map your business structure and policy directly into roles within Chef, which are then applied to the network through Affinities within the DSE. You no longer care about every little underlying detail of how the process is completed. All you do is describe, within the role, what the end product of the network should look like, and allow the DSE and its affinities to handle instantiating it all.

This leads back to the macromanagement comment I made earlier in the post. It allows you, as the Engineer / Architect / Developer / Whatever you’re called, to concentrate on translating business semantics into policy semantics and allow the infrastructure to more accurately reflect the business as a whole, leading to tighter integration of the technology into the revenue streams of our businesses. We will work on identifying higher-order business constraints, rules, definitions, etc., define them within a role, and allow the infrastructure to move toward the desired state on its own, instead of having to worry about configuring every little bit within a device. No longer, on a micro level, do we care how the infrastructure has been configured. We can concentrate again on why it is being configured that way.
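The ‘describe the end state and let the system converge’ idea is essentially a reconciliation loop. The sketch below is a generic illustration of that pattern in Python, not the DSE, affinity, or Chef API; the policy names and attributes are hypothetical. It diffs a declared desired state against the current state, produces the add/update/remove steps needed to converge, and running it again after those steps are applied yields no further work.

```python
def reconcile(desired, current):
    """Return (action, key, value) steps that move `current` toward `desired`."""
    steps = []
    for key in sorted(desired.keys() | current.keys()):
        want, have = desired.get(key), current.get(key)
        if want == have:
            continue                           # already converged for this key
        if want is None:
            steps.append(("remove", key, have))
        elif have is None:
            steps.append(("add", key, want))
        else:
            steps.append(("update", key, want))
    return steps


def apply_steps(steps, current):
    """Apply steps to a copy of the current state (a stand-in for touching devices)."""
    state = dict(current)
    for action, key, value in steps:
        if action == "remove":
            del state[key]
        else:
            state[key] = value
    return state


# Hypothetical business-level intent, expressed without any device-level knobs.
desired = {
    "backup": {"path": "bulk"},
    "storage-replication": {"path": "low-latency", "min_gbps": 10},
}
current = {
    "backup": {"path": "default"},
    "legacy-app": {"path": "default"},
}

steps = reconcile(desired, current)
for step in steps:
    print(step)
converged = apply_steps(steps, current)
```

The operator only edits `desired`; working out which individual changes get there is left to the loop, which is the micro-to-macro shift in miniature.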

As Albert Einstein (supposedly) put it, “If I had only one hour to save the world, I would spend fifty-five minutes defining the problem, and only five minutes finding the solution.”