Hung Process in IOS-XR

So, quick post to help anyone else who runs into this problem in IOS-XR land.

While attempting to issue commands related to the BGP process on the box, I was met with no response. I was able to break out of the hung command with the usual Ctrl + C and issue commands to other processes on the box, but the BGP process just refused to respond.

Reviewing the logs, I was able to find some errors related to no response being received from the BGP process:

RP/0/RSP0/CPU0:Apr  9 22:55:42.109 : sysdb_shared_nc[382]: %SYSDB-SYSDB-6-TIMEOUT_EDM : EDM request for 'oper/ip-bgp/gl/act/shared/vrf/default/afi/' from 'bgp_show' (jid 65855, node 0/RSP0/CPU0). No response from 'bgp' (jid 1047, node 0/RSP0/CPU0) within the timeout period (100 seconds)

You can see the “No response from 'bgp'” string in this log message, along with the job ID (jid 1047) of the hung process. The quick and easy way to take care of a hung process like this is to restart it by job ID with the following command:

RP/0/RSP0/CPU0:router#process restart 1047 location 0/RSP0/CPU0

WARNING: Issuing this command will rock the BGP process, so plan accordingly. You may experience a brief outage, so schedule it during a typical maintenance window.
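If you'd rather not pick the job ID and node out of the log by eye, here is a minimal Python sketch that does it for you. It is only an illustration: the regular expression is based solely on the one log line shown above (other releases may word the message differently), and the function name restart_command is my own, not anything from IOS-XR.

```python
import re

# Matches the part of the SYSDB timeout message that names the unresponsive
# process. Assumption: the message always looks like the single sample above.
NO_RESPONSE = re.compile(
    r"No response from '(?P<name>[^']+)' "
    r"\(jid (?P<jid>\d+), node (?P<node>[^)]+)\)"
)


def restart_command(log_line):
    """Return the 'process restart' command for the hung process, or None."""
    match = NO_RESPONSE.search(log_line)
    if match is None:
        return None
    return f"process restart {match['jid']} location {match['node']}"


log = (
    "RP/0/RSP0/CPU0:Apr  9 22:55:42.109 : sysdb_shared_nc[382]: "
    "%SYSDB-SYSDB-6-TIMEOUT_EDM : EDM request for "
    "'oper/ip-bgp/gl/act/shared/vrf/default/afi/' from 'bgp_show' "
    "(jid 65855, node 0/RSP0/CPU0). No response from 'bgp' "
    "(jid 1047, node 0/RSP0/CPU0) within the timeout period (100 seconds)"
)

print(restart_command(log))
```

The sketch only builds the command string. Actually running it on the router is still a decision for a human in a maintenance window.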

CCIE Bootcamp Review

It’s been almost 3 months since I sat for my CCIE Bootcamp with Marko Milivojevic, and I wanted to take the time to write a review of the class to help others who are considering a bootcamp before their CCIE Lab attempts.

First I want to give you a little background on the format I was using to study for my CCIE before I was able to sit in on the bootcamp. I was using a particular vendor’s training solutions, including their workbooks, their training videos, and their rack tokens, and all seemed to be going well. I was doing relatively well on the workbook material and content and I was able to finish tasks in reasonable amounts of time. Nothing that I was tracking with any accuracy, though. Just working through training videos and workbooks and doing my best to keep pace with the timeline that I had set forth for myself for my CCIE lab date. I’d read the blog posts about speed and accuracy and other requirements for being able to pass and I felt confident that if I stayed the course, I’d have a healthy chance at passing. Maybe not the first time, but the second or third for sure. So, after passing my CCIE Written at Cisco Live last year, I started my usual routine, and targeted this February to sit for the real thing.

From June through November, I’d put in around 200-250 lab hours and worked a decent way through some workbooks, labs and videos as I’d said, and I was fairly confident when I flew out to attend the Bootcamp in RTP. On the flight I decided to read up on some BGP design as I was working through some design challenges for work. I specifically remember reading up on conditional injection and some path manipulation methods. I landed, got my rental, and made it to my Dad’s house, which is conveniently about 30 minutes from Cisco’s RTP campus. We went out to dinner that Sunday night and caught up on our lives, as we hadn’t seen each other in a few years.

The next day came and I have to say I was excited to see what the class had in store. I was up a bit early and out on my way to the class. We had the benefit of having the bootcamp on the RTP campus, so I was able to find out where I would be going to take the lab, and take that unknown away. Once I got to the campus, I was quickly able to find the building that the bootcamp was going to be in. I walked in and was met by about 10 other people who were there to sit for the bootcamp as well. We all got our name tags from the greeter and waited patiently for Marko to arrive.

Once Marko arrived, he greeted us with a smile and a handshake and we found our way to the conference room that we would be working in for the next two weeks. There was some back and forth banter for about 20-30 minutes as we first arrived and then Marko dove right in. We did a round-the-room introduction and all learned a little about each other, helping break what would become a pretty thick slab of ice within the room and the people who were in it. We were also asked to address what we thought to be the weak points in our theory so Marko could help shape the class into what he thought would benefit us all the most.

He was able to gear the class toward our concerns and help us identify the gap between theoretical and experiential knowledge. And I can NOT stress this enough: the gap was larger than I’d thought. I vividly remember him putting an example on the board that consisted of 3 routers. None of his examples consisted of much more than this, other than the BGP examples later on in the week. After some questions from Marko and a lot of uneasy quiet in the room, we weren’t able to figure out the problem in any reasonable amount of time. A room full of 15 Network Engineers, all striving to obtain one of the industry’s most recognized certifications for doing just that, network engineering, had just been shown how little we knew about a protocol we all work with on a, probably, regular basis. Now, how much of that was “I don’t want to look like an idiot in front of my peers” syndrome, I can only speak for myself, and I say it was a lot. And I would’ve looked like an idiot most of the time if I had decided to speak up.

However, I have to stress, this was not Marko’s intention. He wasn’t trying to make us feel bad about how much we knew about a particular protocol, or whether we knew how that protocol would react in a specific scenario or topology. He wanted to show us what to strive to become. What it would take to be considered a CCIE. It was on that first day that I realized I wasn’t doing NEARLY enough in my studies and I needed to up my game drastically.

Fast forward to Friday: after 4 more days of realizing how much I needed to increase my efforts, we wrapped up the week of theory. During that week, I thought to myself, “I don’t think I’ll ever be able to pass this exam”, on more than a few occasions. A few of us from class decided to get some dinner that night and I had a conversation with Marko. I remember telling him that of everything I’d taken away from his class so far, I’d taken one thing to heart the most. I realized how much I had to rely on myself and no one and nothing else. When you’re in the heat of the moment, it’s the trust you have to have in yourself to methodically walk yourself through a problem and figure out exactly what it is that needs to be done to fix it. There is no Google, there is no co-worker to collaborate with. You have yourself and some horribly organized and sometimes written Cisco documentation and that’s it.

We had the weekend to recover and started in on the labs Monday morning. All I have to say about those labs is that they are pure evil. Pure, calculated, evil. They are designed to push you to your absolute limits in terms of mental stamina, and when you think you’ve got it figured out, you don’t. Go back and try again. I thought to myself, and even texted my wife a few times, “I don’t want to / can’t get this certification and I don’t know why I am wasting my time”. She would text me back and encourage me to stick with it and that everything would be OK. I even thought about leaving the class, that I was wasting my time and I didn’t want the stupid certification. What was to be a 2 hour troubleshooting section took me the better part of 12 hours, and don’t even get me started on the configuration!

That said, the labs were tough, but they were fair. Nothing in the labs was anything we shouldn’t expect to see. It was my failure to close the gap between theoretical and experiential knowledge that was breaking me. Circling back to the reliance on myself, it was myself that was failing me. Nothing more. It was now that I was realizing that a lot of what I’d read online about time saving techniques for typing, etc., was just a sham. After all of this was said and done, I took that one over-arching theme back to my studies outside of the class. I needed to up my personal game and investment.

Fast forward to today and I’m writing this post in my basement office at 0045 after a 3 hour QoS lab and I’m feeling much more confident with where I am at in terms of the lab. I hope to pass in a single attempt, but will be completely happy if it takes me multiple times. Reason being, in the time between the bootcamp and now, I’ve slowed down and tried to listen to what my peers have been telling me from the beginning. Obtaining your CCIE is much less about the number and far more about the journey.

To sum this up in a few words, go take the bootcamp with Marko. I regret nothing about the time I spent in his class and I believe it made me a better person, both personally and professionally. So much so, in fact, that if he offers an SP class when I decide to go for that, he will be the first instructor I turn to to invest my time and money into.

Thank you Marko for helping me take myself to that next level. The level required to really call yourself a CCIE.



I ran into an interesting problem a few months back, but haven’t had a chance to blog about it.

Working with a Cisco GET VPN environment we have deployed, we were notified by our ISP that they would be performing some “circuit grooming”. That phrase always makes me cringe. The usual “this shouldn’t affect your connectivity at all” emails came and went.

Needless to say, the maintenance was performed and all of a sudden all of the GM routers were throwing errors pertaining to anti-replay. And as we all know with anti-replay, that’s a bad thing. That means the tunnels are toast. GET VPN uses a pseudo-time solution to protect against replay attacks, called Time Based Anti-Replay (TBAR). This pseudo-time is handled by the Key Server within the GDOI group.

Long story short, no traffic was passing. Entering the “clear crypto gdoi” command took care of this. We did some shallow research and found that there was a particular bug for the version of code that we were running and patched accordingly. A few weeks went by and the problem presented itself again. The same “clear crypto gdoi” command, again, took care of the problem. Checking the GDOI pseudo-time that was negotiated, I noticed a sizable delta between the server time and the time that the branch router was reporting.
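To show why a sizable pseudo-time delta is enough to kill traffic, here is a deliberately simplified Python model of a time-based replay check. Treat it as a sketch under my own assumptions, not Cisco's actual implementation: I'm assuming the receiver simply rejects any packet whose pseudo-time stamp differs from its own pseudo-time by more than the configured window, and the window and timestamp values below are made up.

```python
def is_replay(sender_pseudotime, receiver_pseudotime, window):
    """Simplified time-based check: reject packets stamped outside the window.

    All three values are in seconds. This is an assumed, symmetric model of
    the check, used only to illustrate the effect of clock drift.
    """
    return abs(receiver_pseudotime - sender_pseudotime) > window


WINDOW_SECONDS = 5  # hypothetical anti-replay window

# Group members in sync: a 2 second delta fits inside the window.
print(is_replay(sender_pseudotime=1000, receiver_pseudotime=1002,
                window=WINDOW_SECONDS))

# Group members out of sync: a 90 second delta means every packet is rejected.
print(is_replay(sender_pseudotime=1000, receiver_pseudotime=1090,
                window=WINDOW_SECONDS))
```

In this model, once the pseudo-time on a branch router has drifted past the window, every packet looks like a replay until the router re-registers and picks up fresh pseudo-time, which is what clearing GDOI forces.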

This gets me to the point of my story: Cisco’s Embedded Event Manager (EEM).

EEM is a fantastic subsystem built into IOS. It allows you to build an “applet” that can respond to certain criteria that you can define as triggers.

Being new to EEM, I wrote a less than complex script to parse the syslog buffer of the remote router, looking for a particular replay error, and issue the ‘clear crypto GDOI’ command.

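Something along these lines:

event manager applet GDOI_TBAR_RESET
 ! Trigger on the anti-replay syslog message
 event syslog pattern "%CRYPTO-4-PKT_REPLAY_ERR"
 ! Enter privileged EXEC, clear GDOI, and answer the confirmation prompt
 action 1.0 cli command "enable"
 action 2.0 cli command "clear crypto gdoi" pattern "confirm"
 action 3.0 cli command "yes"
 action 4.0 syslog msg "Clear Crypto GDOI command issued due to Anti-Replay Error!"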

This doesn’t begin to scratch the surface of EEM, but it still taught me a little bit about its structure and how it can function. I plan to update it in the near future once I have the time to track down the OID that issues that particular error within IOS.

If anyone has any interesting stories / uses for EEM, feel free to leave a comment below. I’m always up for tips and tricks!

Up, Up and Away


It’s been quite some time since I’ve posted to my blog and I feel it necessary to check in with everyone. I’ve recently passed the ROUTE portion of my CCNP and I plan on sitting for the SWITCH in a month or two. I work with switching hardware more than anything else, so I feel more confident in that area of the curriculum. After that will obviously be the TSHOOT, and that will conclude the track for my NP. If you’d asked me a year ago if I thought I’d be this far along in my studies, I would have probably laughed at you. I thought passing even the NA was a feat that I might not be able to achieve. But God and my family continue to be my driving force behind working as hard as I do. So, I will keep you all updated on when I sit for the SWITCH.

As for my WGU studies, I’ve not been concentrating on them as much as I should be, and plan on kicking that into high gear starting right after Christmas. I’m taking no technical courses this semester and I think that has derailed me a bit on wanting to study any of the course materials. But, I will do it to get it out of the way so I can finally finish the degree.

I’ve recently been thinking about what it would take to obtain a CCIE and I think I am going to continue on my Cisco studies after the NP and complete the IE. I want to have those numbers and have that feeling of accomplishment that so many others before me have. I am slowly building my library for the studies and I plan on having all of the literature at arm’s length within the next few months. I am currently going by the reading list on Cisco’s learning network, but if anyone has any suggestions, I would gladly take them. I know this is going to be a long and arduous process. But totally worth it.

I’ve also been in my new position with CSC now for almost 90 days. I am working with a group of really smart guys and I am learning a lot from them. It’s definitely contributing to the wealth of knowledge that I know will be required for my attempt at an IE R&S. But, through my studies, I’ve also been able to contribute to the team, which is definitely a fulfilling experience. Being able to apply concepts and theory that I’ve studied so hard and watching them play out just as I’ve done in many, many lab scenarios.

I’ve got a few ideas still jotted down for some technical posts and I promise I will post them soon. Just trying to wrap up some loose ends before the holidays are here and gone. Off the top of my head, I can think of a RADIUS implementation post using Server 2008 R2 authenticating against Active Directory, while using Wireshark to verify the PSK exchange between the RADIUS server and the end point device. Stay tuned!


Becoming Vendor Agnostic

As you all know, I am studying for my CCNP currently and enjoying every minute of it. And I recently started following many, many Network Engineers or Network oriented individuals on Twitter and Google+. For a long time, I had that usual assumption that Cisco was the best network hardware/software company that existed. It wasn’t until I started following all of these individuals that I realized that I was wrong. Dead wrong. In fact, with Cisco’s current lay-offs, I started thinking that I shouldn’t be putting all of my eggs in one basket. Cisco is indeed a giant in the networking industry and no one can argue the abilities of their equipment and software. But 6,000 projected lay-offs would make anyone stop and take a look at what’s going on in the industry.

Companies like Juniper, Brocade, HP, and many others all have viable alternatives to the Cisco equipment that we all know and love. And the equipment performs on par, if not better in some circumstances, judging by what I’ve seen in social media these days.

This led me to look into studying for some Juniper certifications, as I’ve noticed a lot of attention being given to this company by the people I follow. Come to find out, Juniper is offering a program right now called Fast Track. It’s designed for any type of professional in the networking industry, from entry level with very little knowledge to people who have been working in the field for decades. You can sign up, log in, and start using their study materials to learn the ins and outs of the Juniper OS called JunOS. Along with the free training they’re providing, once you pass their assessments, they give you discounted vouchers to go and get the certifications. I think it’s a win-win for everyone involved.

I plan to move my studies toward a broader horizon for the industry and allow myself to gain as much knowledge about all vendors as possible, and make myself more marketable in the long run. I suggest anyone looking to pursue this path do the same. We all know that companies are like countries, and they can never last forever. And I do NOT want to be one of the people left high and dry because I am only trained and seasoned in one vendor’s equipment.

Any thoughts?

Work Stuff – More Studying – Collapsing the Data Center


I created a task for myself at work, not that my workload hadn’t already been enough.

I wanted to implement Active Directory authentication on our Alcatel-Lucent equipment using a Windows Server 2008 R2 server acting as an NPS server. I wasn’t able to find much documentation on this at all on the web, at Alcatel Unleashed, or even in Alcatel’s configuration manuals or user guides. They were, however, nice enough to provide me with the Vendor Specific Attributes (VSAs) that needed to be added to the RADIUS server to provide the appropriate information to the device. I will post the complete write-up on that at a later time, as I am still writing the documentation at work and I want to finalize it before I post it.

On a better note! I’ve been doing a lot of CCNP preparation as of late, and I am starting to feel more and more comfortable with the concepts and application of the material. I’ve worked my way through EIGRP and OSPF and I have moved on to route redistribution. The more I read and the more I work in my home lab and GNS3, the more thirsty I become for learning and digesting anything network oriented. Using route-maps and ACLs to efficiently redistribute routes between different domains or to assign specific metrics to routes to make it even more efficient! I find myself analyzing everything at work to see how I can make it more efficient. Which can’t be a bad thing, and they certainly benefit from it.

OK, I’m sure you’ve heard enough in my posts about my studies. Let’s talk about some industry buzz for a while. Collapsing the data center, in a good way. In the past, traditional networks required an abundant use of distribution layer switches to communicate with the core layer of the network. An article I read states that there isn’t as much of a need for the distribution layer anymore, and that the access layer switches could communicate directly with the core for core services. As we move further into the future and the more I work with these types of technology, the more I realize that there may actually be some logic to this theory. In the past, equipment couldn’t be considered as reliable as it is today. Decreases in power demand, footprint and cost, and increases in reliability and performance are starting to allow devices to last longer and produce better results. This leaves us with the option to start eliminating some distribution layer devices and start uplinking access devices directly to the core.

Most arguments I’ve heard against this idea address redundancy and availability. But, if you have an access layer switch uplinked to a distribution layer device which is then uplinked to the core, and that distribution layer switch were to fail, you’re still left with the same result as if the access layer switch failed. No connectivity. So, the idea of less complexity and more performance is always something we’re all keen on as network junkies.
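To put some rough numbers on that argument, here is a quick Python sketch. It assumes a single, non-redundant path where every device has to be up for traffic to flow and where devices fail independently. The availability figures are invented purely for illustration, and a design with redundant distribution pairs would need a different calculation.

```python
from math import prod


def path_availability(*device_availabilities):
    """Availability of devices in series: every device in the path must be up."""
    return prod(device_availabilities)


# Hypothetical per-device availabilities, not measurements from real gear.
ACCESS = 0.999
DISTRIBUTION = 0.999
CORE = 0.9999

three_tier = path_availability(ACCESS, DISTRIBUTION, CORE)
collapsed = path_availability(ACCESS, CORE)

print(f"access -> distribution -> core: {three_tier:.6f}")
print(f"access -> core:                 {collapsed:.6f}")
```

Under these assumptions, every extra box in the path can only lower the end-to-end number, which is the point I was getting at.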

Another topic I’d like to discuss in a later post would be “cloud” technologies. While good in theory, I think the world is in for a bit of an eye-opener when we really start moving heavily toward companies who provide a cloud service. I see it as one giant security threat. Even with the ideas of public, private, or hybrid clouds. But, I digress. I will write that up in another post.

Though I’m still new to the in-depth studies and theories of networks, I still strive to make the best judgements with what I know. Please feel free to add input or correct me in any of my statements.

Thanks for reading!