User defined UX

When looking at all of the different ways that we can interact with the systems that exist within the realm of IT today, we’ve all stopped at some point and realized how painful it was to use a product being sold by vendor X. We have different ideas about how the user experience of this product should flow, and we’re disappointed that we can’t use it in the logical order we’ve envisioned for this particular operation. Whether it be CLI, API, GUI, etc. – we’re always upset that the experience just isn’t there. It doesn’t meet _our_ needs.

Stepping back and taking a look at the world around us, we’re not at all thrilled with the overall user experience of anything that we use today. We’re constantly complaining about where button Y is located, or why knob Z isn’t in the location that we consider logical. A good example is the location of the door unlock button in the Chevy Cruze that I bought last year. Over the last 15 years, I’ve been programmed to understand that the door unlock button is located ON THE DOOR, and in the case of the Cruze (and other new makes and models) they’ve relocated it to the center console. Now, every time I go to lock and unlock the doors of my vehicle, I’m reaching to the wrong place at the wrong time. This is what I would call bad user experience. I would also call this a terrible example of first world problems, but I’ll use the illustration nonetheless. It doesn’t fit the logic that I’ve become accustomed to over the past decade and a half.

Something as simple as the location of a button has caused a small amount of grief for me when operating my vehicle. Can I still use the vehicle? Sure, but it’s going to take some getting used to, and some modifying of my ‘unlock car door’ workflow. Stepping back and looking at the bigger picture: for the next generation of kids who inherit these cars as their first modes of transportation, this will be the way they understand the location of that single button within the referential space of the vehicle. Not on the door, like our generation understands it. We each have different interpretations, either preconceived or unbiased, formed through the use of the product.

UX is a fluid concept, not something that’s etched in stone. It’s entirely subjective to the individual who is actually using the product. And that subjectivity is usually driven by some sort of perspective that the user has built, over time, on how something should look, feel, and operate within the logical constraints of how they, personally, view the system.

We’re not concentrating on the fact that everyone interprets UX differently. We’ve talked before about how the network world is largely workflow driven, but we haven’t taken the time to understand that everyone brings small nuances to how they perform a certain workflow. There is a certain set of atomic constructs that we use to accomplish a given task, and we all string them together a bit differently throughout the execution of our workflow. This is why we think that UX sucks on every product out there: it isn’t customized to our logical interpretation of how that task should be accomplished.

When defining the workflows we need with respect to administering an IT infrastructure, the same atomic elements are used when creating the workflow. VLAN X needs to propagate throughout infrastructure Y and connect Z nodes to comprise whatever system it is that we’re building. Now the nuances of that particular workflow come into play. We may see fit to ensure that the VLAN has been created, through some sort of show command or log parse, before moving on to step 2, or we may not. We may want to publish that information to another system rather than just validating it through the platform we’re currently working with. And that’s just service instantiation; it doesn’t include validation, troubleshooting, tearing a service down, etc.
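To make the “same atomic tasks, different ordering” idea concrete, here’s a minimal Python sketch. The task functions and the inventory shape are hypothetical stand-ins, not any vendor’s API; the point is that the verification step is optional and reorderable per operator.

```python
def create_vlan(inventory, vlan_id, name):
    """Atomic task: record the VLAN on every node in the inventory."""
    for node in inventory:
        node.setdefault("vlans", {})[vlan_id] = name

def verify_vlan(inventory, vlan_id):
    """Optional step: the 'show command' check some operators insist on."""
    return all(vlan_id in node.get("vlans", {}) for node in inventory)

def provision_service(inventory, vlan_id, name, verify=True):
    """One operator's workflow; another might skip or reorder the check."""
    create_vlan(inventory, vlan_id, name)
    if verify and not verify_vlan(inventory, vlan_id):
        raise RuntimeError("VLAN %d missing on some nodes" % vlan_id)

nodes = [{"hostname": "leaf1"}, {"hostname": "leaf2"}]
provision_service(nodes, 100, "app-tier")
```

The atomic constructs stay fixed; the `verify` flag (and where you call it) is the per-operator nuance.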

Everyone interacts with these systems in a different manner, and we all bring certain nuances, and history, to the table through those interactions. It would be best to be able to provide a user defined UX that fits, precisely, the workflow that a particular user is looking for: a workflow modeling tool that gives you well defined atomic tasks, but lets you define how, when, where, and in what order to use them. This would provide the user experience that everyone within IT is craving today.

The illusion of perfection

Let’s face it, anyone who has the mindset of an engineer likes to deal in exacts. Exact measurements are required to fit together the incredibly complex Legos that exist in all realms of engineering. Through this quest for certainty, when we’re engineering a solution to a problem we almost immediately try to account for every possible situation that the solution might be used in, and every possible failure scenario that might crop up when using it. But what we need to realize is that this quest for perfection is an illusion.

The embodiment of perfection is a marriage of knowledge and perspective. Both of these factors are ever changing, especially in the IT world I work in. With this ever shifting plane of knowledge and perspective, it would be silly to look at something I’m designing at this particular point in time and expect any semblance of perfection out of it, as I will ultimately learn something new, or acquire a new perspective through the process of engineering, that can then be tied into the solution I’d already declared complete and ‘perfect’. Thereby making the entire process cyclical. I see it kinda like this:

[Figure: the cyclical loop of knowledge, perspective, and design]

This isn’t a new concept; iterative engineering is a thing. But I think a lot of operators in the IT world are constantly striving for what would be considered a ‘perfect’ design of the respective platforms they’re working on. And when that ‘perfection’ is never realized, a lot of things can happen, whether it be an outage caused by missing something in the design through lack of knowledge, or lack of perspective on the environment you’re dealing with or on how a certain aspect of the service should’ve been engineered. Then throw in the fallible nature of the human psyche and you’re asking for a disaster. Pride comes into play, feelings get hurt, words fly. It’s not pretty. I’ve been part of these types of interactions and they are NOT fun.

We should prepare for the fact that our idea of perfection will keep shifting as we grow and learn and apply what we’ve seen. With that, I leave you with a little anecdote about what made me think of this.

A few months ago I visited a friend who works for a particular organization that is doing some pretty amazing things in the realm of technology right now, whether it be open sourcing certain internal tooling or using their presence to back certain directions in the industry. Hint: it’s something to do with a big blue ‘F’.

When I arrived at the facility, I walked into the building and proceeded to check in on the iPad at the reception counter. I noticed that the iPad was sitting on a little wooden stand with some words burned into it: “Done is better than perfect.” Because you’ll never attain true perfection anyway.

The Whole Picture

Greg Ferro recently presented a challenge to the industry, 30 blogs in 30 days, and I think I’m going to try and tackle that. I’ve always struggled with posting to my blog and curating content to give back to the community – especially at the rate at which I consume what’s out there. I have a responsibility to give back more. So, without further ado.

As I’ve been working through some projects at Plexxi, I’m starting to broaden my skill set with respect to computing as a whole, not just networking the machines together. And I made a realization, very quickly: I don’t know nearly as much as I’d hoped. All of the time I spent proving it _wasn’t_ the network didn’t help much in learning exactly how applications are architected, why those applications communicate the way they do, and how I could actually provide some feedback to the developers and administrators that could help them.

Moving from consuming abstractions and using them to configure single devices in a network, to building workflows and the abstractions that enable those workflows with respect to an entire system, has been absolutely eye opening. The networking field seems so small when looking at it from the big picture. Yes, there are considerations to be made with respect to what protocols to deploy within that network and how to architect it, what failure scenarios exist, separation of failure domains, etc., but all of those times when the defenses went up because an application administrator or developer complained that the network was the problem now seem so shameful.

What I should’ve been doing is shutting up, sucking up my pride (network engineers are a prideful bunch), and sitting down with the application administrators and developers, helping them to understand the network while they helped me to understand their systems. Sound familiar? (DevOps, for those who didn’t make the connection.)

As I work through some of these integrations I feel like I’m in the first few years of my career again. I’m a stumbling, bumbling toddler trying to sort out and make sense of all of these new languages and technologies: how we can architect them, and how we can ensure they’re consumable and agile enough to be deployed through many different infrastructures and iterations. It’s almost liberating. Terrifying, because I’m now dealing with everything BUT what the last 8 years of my career have been about, but liberating.

Talking up and back down the stack again, and beginning to understand what the bits look like in flight above L4, is an awesome experience. Reading white papers and research papers, understanding things like eventual consistency, and tying that back to something like OSPF makes you realize that John Moy was a genius in his own right: he created a distributed application, albeit primitive compared to what we have today, that has lasted decades and scaled very, very well.

So, I encourage everyone in the networking industry: talk to the developers at your organizations. Talk to the application administrators and understand what their software looks like. Understand _why_ it was built that way and how it could possibly be refactored for better performance. Who knows, you might even find out that you actually like working on systems outside of the network, and you’ll be able to make different employment decisions in the future thanks to the knowledge you gain through these interactions.

So that’s my advice. Drop the attitude that every network engineer seems to have, and sit down and work together with everyone else. Be a bit of a hippy and embrace diversity and different perspectives. You won’t be sorry.


Bitter Sweet – Moving On

I wanted to take a chance to give everyone in the community the news that I have accepted a position with the Plexxi team. I’ll be working within the Product Management team, concentrating specifically on Big Data Fabrics and what advances and gains can be made with the application of the Plexxi product (I sound like a vendor already, eh?). I’ll also be helping contribute to the ODL community a bit in this role as well. I’m extremely excited about the opportunity and the contributions I can make to the organization and to the general advancement of the technology. That’s the sweet part.

The bitter part is the job that I’m leaving, as I’ve really enjoyed bringing connectivity to the entire educational community of Connecticut, all K-20 educational institutions, putting impressive speeds in the hands of the kids of the educational system and truly helping to bridge the digital divide that exists today. The team I worked with was impressive, handling the workload we had with such a lean staff, and I will miss working with them.

The second, semi-bitter part is having to give up my independent status within the community. I will no longer have the ability to participate in non-vendor events such as Tech Field Day, at least from the independent side. However, I look forward to bringing some potentially interesting solutions to the table for them to vet, and to letting them help me better myself as a whole as well.

I remember about 3 years ago, talking with Brent Salisbury and Nick Buraglio about the potential impacts of SDN, back when OpenFlow had just hit the scene and we were still trying to understand what that meant and where it fit. Fast forward 3 years and I talk about seeing the network as another consumable that will ultimately be abstracted and consumed much easier than it is today, and I now get to directly contribute to making that happen within one of three companies I’d ever convinced myself that I’d leave the independent scene for. I’m excited for what the opportunity will bring and I really look forward to still delivering content back to the community as much as possible.

Here’s to the future!

TFD ONUG 2014 – Glue Networks

One of the round table discussions held at the Tech Field Day event at the Open Networking User Group was with Glue Networks. They offer a solution that addresses some of the challenges experienced in the WAN optimization space. Without further ado, I’ll jump right into my views on the product and where I think the advantages and disadvantages exist today.

The first thing I took away from the discussion was the use of the phrase ‘Software Defined Networking’ when talking about the product that Glue has to offer. While they offer what looks like a competent product in the realm of configuration automation and orchestration, I don’t know if I would qualify that as ‘SDN’. Though SDN is such an ambiguous term today that we can probably deem it just that and be completely fine with it. I don’t necessarily qualify configuration management as ‘SDN’ in this case, as it’s just templating and revision control, but if done right, through APIs and libraries (onePK in the Cisco world), it could push the industry forward.

All that said, the auto provisioning and configuration management piece of the product is definitely useful for companies that have tens or maybe hundreds of remote sites, as automating the provisioning and configuration of the devices is a definite time saver. But for organizations that have fewer than 10 remote sites, I don’t know if the capital increase will be offset by the operational savings on such a small number of remote sites. Though this is without seeing costs for the product.

While Glue offers what seems to be a lot in the provisioning and configuration automation aspects of their product, I can’t really find a reason to add more SKUs to orders for another piece of software for such a corner case use, meaning companies that have a small(er) number of remote sites. Especially when we’re able to perform the same configuration templating using open source tooling like Python and Jinja2, and configuration management frameworks like Ansible. WAN optimization is an interesting animal as well. A lot of talk was had about being able to intelligently deploy Performance Routing (PfR) without having a brain scientist on staff, but I haven’t really ever seen organizations that have had the time and patience to deploy, care for, and feed PfR. So, if there are any gains to be made, they would be made there.
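For illustration, the kind of configuration templating I mean can be sketched in a few lines. A real deployment would more likely use Jinja2 as mentioned above; to stay dependency-free this sketch uses Python’s stdlib `string.Template`, but the idea is the same: one template, many per-site variable sets.

```python
from string import Template

# One config template shared by every remote site; only the
# per-site variables change. Hostnames and interface names here
# are made up for the example.
site_template = Template(
    "hostname $hostname\n"
    "interface $uplink\n"
    " description uplink-to-$hub\n"
)

sites = [
    {"hostname": "branch-01", "uplink": "GigabitEthernet0/0", "hub": "hq"},
    {"hostname": "branch-02", "uplink": "GigabitEthernet0/0", "hub": "hq"},
]

# Render a full config per site; pushing it is left to your CM tool.
configs = [site_template.substitute(site) for site in sites]
```

Scale that list to hundreds of sites and the time savings of automated provisioning become obvious; at fewer than ten sites, much less so.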

Overall, I think the cost benefit analysis would have to be made by the architecture/engineering team on whether or not the capital expense for the product would be worth the operational impact that it would have on the environment. If you know me, then you know that I’m a firm believer in the architect having a foot in both realms (business and technical) and cost does come into play when deploying solutions. But, if you’re currently looking to throw money at a homunculus of a WAN and are looking for what seem to be quick results for minimal effort for implementation, then I would definitely give Glue a look as they know their market, their position, and their play quite well. The product looks to be polished and could definitely give you a hand if you have a large WAN and a small network staff, which we all know is a constant.

Abstraction vs Automation

There has been quite a bit of Twitter buzz these last few weeks, during the OpenStack Summit, around the notion of abstraction vs. automation, and I’m not really sure I understand what I’m missing here and why it almost seems that both ideas are being treated as mutually exclusive. I’ll give you all my viewpoint and then let anyone who has a response, respond, because maybe I’m missing something.

First, let’s talk about automation. Is automation possible today? Yes, yes it is. Is it the easiest thing to accomplish? Not necessarily, but it’s possible. Mike Bushong from Plexxi writes about the linearity in the thinking of today’s network engineers, which definitely exists, but I think it’s because they haven’t been challenged to think any other way.

Using tools that exist today, like Ansible (I prefer this CM tool because it’s agentless, which works well with network devices that typically run closed NOSes), you can build out the workflows that Mike writes about while still instantiating configuration on the devices that need to be configured via the CLI, just in an ‘automated’ way, i.e., not pushing line after line of config to the box by hand. Is this much different from the Perl/Expect of the past? No. But it becomes more structured and controlled through the use of the Ansible framework.
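The structural difference from raw Perl/Expect can be boiled down to idempotency: a CM framework compares desired state against what’s already on the box instead of blindly replaying lines. Here’s a toy Python sketch of that idea (my own illustration, not Ansible’s actual internals):

```python
def lines_to_push(running_config, desired_lines):
    """Return only the desired lines not already in the running config.

    Replaying everything (the Expect approach) works, but diffing first
    makes the operation repeatable and safe to run twice.
    """
    current = set(running_config.splitlines())
    return [line for line in desired_lines if line not in current]

# The running config already has the hostname and vlan 10 ...
running = "hostname edge1\nvlan 10"
desired = ["hostname edge1", "vlan 10", "vlan 20"]

# ... so only vlan 20 actually needs to be pushed.
to_push = lines_to_push(running, desired)
```

The framework adds inventory, ordering, and error handling on top, but the diff-then-apply loop is the heart of it.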

So, in a way, we’ve already been able to ‘abstract’ that away from direct human interaction with devices. This might not necessarily be the best way to do things, but we’re working toward that through the many efforts in the industry today. This is the point I made earlier in the post: I don’t think today’s network engineers have been challenged with respect to their thinking when it comes to designing the workflow from end to end. The typical concerns exist: ‘how can I connect all of these disparate devices and deliver bits as fast as possible?’ When in reality, we should be thinking about delivering the service that the network supports, end to end: from how we connect the cables (L1), to how we deliver the bits (L2/L3), to how the app ingests those bits (L4-L7). Stop limiting ourselves in the stack and start thinking about end to end service delivery with respect to the infrastructure.

I keep thinking back to Derick’s post on the Plexxi blog as well, surrounding UX. He references how policy constructs that closely represent the design choices of the underlying system make it much more difficult for a consumer of that system to express the intent with which they want to use it. Let me give you an example: when configuring a campus network (let’s stay away from the DC for a minute) and choosing the VLAN identifiers to use within the design, does it really matter what numbers you use as identifiers? Sure, you have a 12 bit unsigned integer’s worth of space to utilize, but does it really matter WHAT numbers you use? Probably not. How many times have you had to reference an Excel spreadsheet to figure out which IDs have been used and which you can use for the new segment you’re trying to configure? Does this decision still get left up to the engineer responsible for designing and instantiating the system? It sure does. That’s an example of a knob that doesn’t necessarily matter to the overlying consumption of the system (end users utilizing whatever systems ride the wires).

So, abstract that away using some sort of automation toolkit. Code into the tool that it’s a 12 bit value and have it sequentially utilize the range of VLAN IDs at your disposal; you shouldn’t care what that number is. Take it a step further and make sure the tool is aware of all of the tagged links you have in your network and adds them through whatever security workflow you’re trying to honor as well. Get where I’m going with this? Abstraction takes that decision out of the equation. It isn’t a life changing event to not have to decide which VLAN IDs to use and where they need to propagate, but it’s the little changes in tooling and thought process that will take us where we need to go. Once that function is abstracted, automating it is that much easier. The real effort comes in deciding how that abstraction works. Once that is decided, consuming it becomes easier.
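A minimal sketch of what that abstraction might look like; the function name and the inventory format are made up for illustration, and the range reflects the usable IDs within the 12 bit space:

```python
VLAN_MIN, VLAN_MAX = 1, 4094  # usable IDs in the 12-bit VLAN space

def next_free_vlan(used_ids):
    """Pick the next sequentially free VLAN ID.

    This is the Excel spreadsheet, abstracted away: the operator
    no longer cares what the number is, only that it's unused.
    """
    for vid in range(VLAN_MIN, VLAN_MAX + 1):
        if vid not in used_ids:
            return vid
    raise ValueError("VLAN ID space exhausted")

used = {1, 2, 3, 10, 11}   # IDs already deployed in the network
new_id = next_free_vlan(used)
```

Trunk-link propagation and the security workflow would layer on top of this, but the core point stands: once the decision is encoded, automating it is trivial.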

Overall, I’m just trying to point out that it isn’t necessarily abstraction vs. automation, or a problem that we have with today’s tools. It can all be accomplished. Again, is it necessarily easy? No. But abstraction will ultimately lead to easier automation.

Thoughts? Comments? Suggestions?

ONUG Hackathon and Tech Field Day

I’m sitting on the Amtrak headed to NYC as I type this and I wanted to throw out there how excited I am to be able to partake in two parts of the Open Networking User Group in these next few days. If you’ve spoken to me then you know how keen I am on what the next few years will bring to data networks and what the impact will be. I’m one of the analytics believers that sees a whole lot of state that exists in today’s data networks ripe and ready to be collected and crunched to help bring faster, and more agile systems for us to develop on top of in the near future.

The first part of ONUG that I’ll be participating in is the hackathon. I was invited a few months back to be part of a team and I eagerly accepted the offer; we’re working on how to apply SDN concepts over the WAN for organizations that have a highly distributed footprint. I’ll be sure to post more information surrounding that when we’re done with the hackathon. The concentration on the Data Center with respect to SDN is warranted, but we shouldn’t forget about how we have to get data back to the Data Center to begin with.

The second event that I’m going to be a part of, and am equally excited about, is the Tech Field Day ONUG event and the vendors that we’ll get to talk to. It’s always a blast to participate in Tech Field Day events, as you get to talk to a lot of incredibly intelligent people about some harshly complex problems that we’re all trying to solve on a daily basis, along with getting some exposure to the latest and greatest that companies have to offer.

Here’s to more nerdiness!

From Micro to Macro Management with Plexxi DSE

Where we are:

When working on today’s infrastructures, we’re typically weighed down by the fact that we’re left to micromanage a lot of the systems within them. When listening again to the Networking Field Day 7 recording of Nils Swart giving an overview of the Plexxi Data Services Engine, it dawned on me: we’re shifting from micromanagers to macromanagers.

Right now we’re tasked with dealing with every little action that needs to be performed on our infrastructure, whether it be entering commands for carving off a LUN within a SAN, configuring required resources for VMs within our compute infrastructures, or even configuring a VLAN on an interface. These are all forms of micromanagement: excessive control and attention to detail. We’ve all worked under individuals who are micromanagers, in meatspace, and we’ve probably all hated it. But I want to speak more to the opposite of micromanagement: macromanagement.

When looking at a system today, as engineers and architects operating under yesterday’s paradigms of designing and engineering our systems, we’re constantly paying attention to Every. Little. Detail. Worrying about the flag that is set in whatever header is used to signal something within our systems, and how to turn that flag on within a particular piece of infrastructure, whether it be issuing an API call, configuring the device via the CLI, or even ticking a check-box within a GUI. This becomes incredibly cumbersome and time consuming, and leads to some of the most inefficient workflows, the kind that can cripple and eventually close the doors of today’s biggest and best businesses.

Tomorrow’s service offerings will, no doubt, revolve around speed and flexibility, more than they have in the past decade. How fast can we deploy our service, and at the same time, how quickly can we change it?

Where we are going:

There is already a trend going on within IT where engineers and architects are starting to talk about the importance of having one foot in both the technical and the business, and I wholeheartedly agree with that. It’s vital that the individuals working to translate business policy into infrastructure configuration know how to speak both languages, as they’ll then be able to move toward exactly what Plexxi’s Data Services Engine is doing for businesses today.

Within DSE, Plexxi offers a message bus from toolkits and frameworks like Chef into the Affinity component within the DSE. What this in essence does is allow you to directly map your business structure and policy into roles within Chef, which are then applied to the network through their Affinities within the DSE. No longer do you care about every little underlying detail of how the process is completed. All you do is describe, within the role, what the end product of the network should look like, and allow the DSE and Affinities to handle instantiating it all.

This leads back to the macromanagement comment I made earlier in the post. It allows you, as the Engineer / Architect / Developer / whatever you’re called, to concentrate on translating business semantics into policy semantics and allow the infrastructure to more accurately reflect the business as a whole, thus leading to a tighter integration of the technology into the revenue streams of our businesses. We will work on identifying higher order business constraints, rules, definitions, etc., define them within a role, and allow the infrastructure to spontaneously move toward the desired state, instead of having to worry about configuring every little bit within a device. No longer, on a micro level, do we care how the infrastructure has been configured. We can start to concentrate again on why it is being configured that way.
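As a toy illustration of macromanagement, here’s what “describe the end state and let the tooling derive the actions” can look like. The names and the state format are illustrative only, not Plexxi DSE’s actual interface:

```python
def plan(current, desired):
    """Diff two state maps into the micro-level actions needed to converge.

    The operator only writes `desired`; the tooling works out the rest.
    """
    actions = []
    for key, value in desired.items():
        if current.get(key) != value:       # missing or changed entries
            actions.append(("set", key, value))
    for key in current:
        if key not in desired:              # entries to clean up
            actions.append(("remove", key))
    return actions

current_state = {"vlan10": "web", "vlan20": "db"}
desired_state = {"vlan10": "web", "vlan30": "analytics"}
steps = plan(current_state, desired_state)
```

The macromanager writes `desired_state`; the `plan` layer, not the human, sweats the micro-level detail of which knobs to turn.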

As Albert Einstein (supposedly) put it, “If I had only one hour to save the world, I would spend fifty-five minutes defining the problem, and only five minutes finding the solution.”

Hung Process in IOS-XR

So, quick post to help anyone else who runs into this problem in IOS-XR land.

While attempting to issue commands related to the BGP process on the box, I was met with no response. I was able to break out with the typical Ctrl+C, and issue commands to other processes on the box, but the BGP process just refused to respond.

Reviewing the logs, I was able to find some errors indicating that no response was being received from the BGP process:

RP/0/RSP0/CPU0:Apr  9 22:55:42.109 : sysdb_shared_nc[382]: %SYSDB-SYSDB-6-TIMEOUT_EDM : EDM request for 'oper/ip-bgp/gl/act/shared/vrf/default/afi/' from 'bgp_show' (jid 65855, node 0/RSP0/CPU0). No response from 'bgp' (jid 1047, node 0/RSP0/CPU0) within the timeout period (100 seconds)

You can see the ‘No response from ‘bgp’’ string in this log message. The quick and easy way to take care of a hung process like this is to restart it by issuing the following command:

RP/0/RSP0/CPU0: router#process restart 1047 location 0/RSP0/CPU0

WARNING: Issuing this command will bounce the BGP process, so plan accordingly. You may experience a brief outage, so schedule it during a typical maintenance window.

One CLI to rule them all? It’s more than that.

Prior to the Tail-f Networking Field Day presentation, I had done some research on who Tail-f was and generally understood how their NCS product worked, but after sitting with Carl Moberg and his team and listening to what this product can offer, I was immediately drawn in. It would seem they’ve developed an appliance that can effectively be, as Nick Buraglio wrote in his post here, the Rosetta Stone for all of the vendor NOSes that exist today.

Carl started the presentation with the phrase “hopefully we’ll show you something that will make you really upset, in a very good way”, and I think he did just that. He proceeded to tell us how Tail-f was founded and what they were up to for the first few years of their existence. There’s a good chance that if you’ve used any current generation networking kit from the large names we all know and love, you’ve used Tail-f software without ever being told so. He then pointed out the pain of configuring multiple OSes within an environment and handling the operational load of configuration management, service provisioning, revision control, etc.

It’s cumbersome to configure a mixed vendor environment across all of the syntaxes that most vendors pride themselves on maintaining. This has always been a pain point for operators, and running a few mixed vendor networks in the past was always a pain point for me, especially jumping quickly between devices and having to switch gears into either OS. Even in the network I’m in today, jumping between IOS-XR, IOS-XE, IOS, and now Junos, I find myself mixing up syntax all the time.

Enter NCS. Tail-f offers an appliance that can bring some simplicity to the idea of running multiple NOSes in today’s networks. But it can do more than that. It allows you to model your services within the appliance and then deploy them throughout the network, while also maintaining the integrity of those services across your infrastructure through the transaction modeling the appliance performs when applying configurations to devices. This allows for change tracking and validation when rolling services out to network OSes that don’t offer any type of two phased commit process; once a command is entered on those platforms, we all know there is no turning back. It brings a certain level of sanity checking to the process as well, allowing NCS to validate configuration before rolling it out to the device. Think about it like this: NCS knows about point A, the current state of the devices within your network; you describe to NCS point B, the service you want to instantiate; and NCS handles getting you there, along with making sure you don’t lose any limbs in the process.
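To illustrate the two phased idea, here’s a rough Python sketch of the general pattern (my own simplification, not the NCS implementation): validate the change against every device first, apply only if everything passes, and roll back on any failure.

```python
def two_phase_commit(devices, change, validate, apply_fn, rollback):
    """Generic validate-then-apply with rollback on partial failure."""
    for dev in devices:                 # phase 1: validate everywhere
        if not validate(dev, change):
            return False                # abort before touching anything
    applied = []
    try:
        for dev in devices:             # phase 2: apply everywhere
            apply_fn(dev, change)
            applied.append(dev)
    except Exception:
        for dev in applied:             # partial failure: undo what landed
            rollback(dev, change)
        return False
    return True

# Toy devices: config is just a list of lines.
devs = [{"name": "r1", "cfg": []}, {"name": "r2", "cfg": []}]
ok = two_phase_commit(
    devs, "vlan 30",
    validate=lambda d, c: True,
    apply_fn=lambda d, c: d["cfg"].append(c),
    rollback=lambda d, c: d["cfg"].remove(c),
)
```

The real thing layers device drivers, candidate datastores, and diff computation on top, but this is the safety net the monolithic NOSes lack natively.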

Though NCS offers a lot of functionality through its transaction process, I don’t think that’s the biggest feature NCS has to offer. The biggest thing Tail-f offers with the NCS platform is the use of the YANG modeling language to model services within their product. YANG is a data modeling language originally written for use with NETCONF. Since Tail-f built their service modeling platform on both NETCONF and YANG, along with other southbound protocols, I would say they’re not your typical American Hustle (see what I did there, Stephen?) tech company like we have here in the US; I should add that Tail-f is a Swedish company. Their service is built using completely open standards, allowing possible future portability to other platforms, such as OpenDaylight (shameless plug). OpenDaylight’s entire Model-Driven Service Abstraction Layer (MD-SAL) architecture operates on YANG models, so what is built within NCS could possibly be ported into ODL. I’m starting to write some modules in YANG myself, and while I’m nowhere near expert in the language and still have a lot to learn, the potential is definitely there.

The Tail-f product no doubt offers a ton of functionality in the way of faster operational application throughout the network, along with full-fledged service modeling within the NCS solution. The ability to apply two phased commits is also quite the addition for today’s network devices, as we don’t have that functionality within the monolithic network OSes of today. Being able to apply two phased logic to something like Cisco’s IOS and IOS-XE brings better version control, along with the operational flexibility to roll a change back should any problems present themselves. On top of all of this is the fact that Tail-f has decided to build their solution on YANG modeling, an open source modeling language developed for use with NETCONF, which opens the possibility of porting service models between different products in the future. This would eliminate a HUGE operational burden for organizations that use lock-in products much like Infoblox’s NetMRI, etc.

Pay attention to Tail-f; I think you’ll see some pretty cool things come down the pipe, especially after speaking with the team about their plans for ODL (shameless plug again). Pay attention to my blog as well, as I saddle up my YANG unicorn (YANGicorn?) and ride it around a while; I definitely see a future in spending the time to learn the intricacies of a language such as YANG.