


How to… Find a fault with your hands in your pockets

Head to the lights...

Being in IT, you can end up in some crazy places. On this occasion it’s an airbase. Admittedly it’s a comms room on an airbase, but it’s an exciting airbase nonetheless.

The fault has come in as intermittent loss of connectivity. For the whole airbase. That translates to military planes being diverted to other locations. We used to say that IT never killed anyone, but here we start to see that IT has a real impact on people and operations.

Intermittent faults are a nightmare. They are always complicated to find, generally hidden in something you don’t know about, and often the result of some unexpected and complex interaction between systems. This particular one has gathered so much attention that the “go to site” call has been made, and so the engineer and comms room meet again.

This has been an extended visit because the fault has not recurred, and by now there is a rather grumpy man in a uniform rattling away about the criticality of the airbase to the nation’s security. That is the exact moment the fault occurs.

A combination of phones, pagers and the arrival of more grumpy people in the comms room makes it abundantly clear that the airbase is well and truly off the air, and that it is time for the engineer to start looking at laptops and typing furiously.

However, this is not what our hero engineer does. He instead walks down the aisle and points to a box in a rack that has just lit up like a Christmas tree. Fault finding is sometimes that simple.

For the curious, the box in question was what we know as a bridge. A bridge basically connects two layer 2 domains together, often over a distance. You don’t see standalone bridges much these days, as the advent of IP has pretty much killed them. Seen another way, though, you have more of them than ever, in the form of multi-port bridges, or switches as we know them today. The older among you will have spotted that the problem here was a loop. Bridges don’t like loops. Loops create storms, and storms shut down airbases.
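For anyone who wants to see why a loop is so destructive, here is a minimal sketch in Python. It is a toy model, not a description of the airbase network: the bridge names and topologies are made up, and it assumes bridges that simply flood every broadcast frame out of every link except the one it arrived on, with no spanning tree and no limit on link capacity. Ethernet frames have no hop count, so nothing in the model ever discards a circulating frame.

```python
from collections import defaultdict


def frames_in_flight(links, origin, steps):
    """Count copies of one broadcast frame in flight at each step.

    links  -- list of (a, b) pairs, one per cable between bridges
    origin -- the bridge that first sends the broadcast
    steps  -- how many forwarding rounds to simulate
    """
    # Map each bridge to the (link_id, neighbour) pairs it can send on.
    ports = defaultdict(list)
    for link_id, (a, b) in enumerate(links):
        ports[a].append((link_id, b))
        ports[b].append((link_id, a))

    # A frame in flight is (bridge it arrives at, link it arrived on).
    in_flight = [(nbr, link_id) for link_id, nbr in ports[origin]]
    counts = []
    for _ in range(steps):
        counts.append(len(in_flight))
        next_round = []
        for bridge, arrived_on in in_flight:
            # Flood out of every link except the one the frame came in on.
            for link_id, nbr in ports[bridge]:
                if link_id != arrived_on:
                    next_round.append((nbr, link_id))
        in_flight = next_round
    return counts


# Hypothetical topologies, purely for illustration.
chain = [("A", "B"), ("B", "C")]                 # no loop
triangle = [("A", "B"), ("B", "C"), ("C", "A")]  # one loop
triple = [("A", "B"), ("A", "B"), ("A", "B")]    # three parallel cables

print(frames_in_flight(chain, "A", 4))     # [1, 1, 0, 0]
print(frames_in_flight(triangle, "A", 4))  # [2, 2, 2, 2]
print(frames_in_flight(triple, "A", 4))    # [3, 6, 12, 24]
```

Without a loop the frame dies out after two hops. With a single loop it circulates forever, and with more redundant paths the copies double every round, all from one broadcast. Add the normal background of broadcasts on a busy network and the links saturate very quickly.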



Learning point for everyone:

Something that took me a long time to learn was that to fix something in the world of IT you don’t always need to get a load of equipment out.

Often standing back and asking questions is far more useful than getting your toolkit out. Generally speaking, most things did work and then stopped working. That means only one thing: something changed.

Find what changed and you’ll have a pretty good idea of what’s gone wrong. Often the change is human-initiated, occasionally it is a hardware fault, and sometimes it’s a full disk or a software ‘glitch’. In most cases, though, just look for the change and you’ll find the problem.
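If you are lucky enough to have a "before" and "after" copy of a config or status output, finding the change can be as simple as a diff. The sketch below uses Python's standard difflib module. The function name and the sample snapshots are hypothetical and only there to show the idea.

```python
import difflib


def what_changed(before, after):
    """Return only the removed (-) and added (+) lines between two snapshots."""
    diff = list(difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile="before",
        tofile="after",
        lineterm="",
        n=0,  # no context lines, only the changes themselves
    ))
    # The first two lines are the ---/+++ file headers; @@ lines mark hunks.
    return [line for line in diff[2:] if not line.startswith("@@")]


# Hypothetical snapshots of a device's port status.
yesterday = "port 1 up\nport 2 down\n"
today = "port 1 up\nport 2 up\nport 3 up\n"

for line in what_changed(yesterday, today):
    print(line)
# -port 2 down
# +port 2 up
# +port 3 up
```

It won't tell you who changed it or why, but it turns "something changed" into a short list of things to ask questions about.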

Business details

Registered company no. 11869849 
VAT number 317720513