Thursday, May 1, 2014

Finding the Bottleneck

Hi Friends,

One of the most difficult things to do in a production datacenter is finding the bottleneck.  There's ALWAYS a bottleneck.  When you first build your infrastructure you may not notice the bottleneck, but it's there, waiting...  Ever played that game Whack-a-Mole?

If you've never played, the idea is pretty simple.  When one of the little guys sticks his head out of his hole you whack him with the included hammer as seen in the image.  Where he sticks his head out is random and there's a lot of watching, quick reflexes and usually lots of colorful language that goes along with it.

Why do I bring up Whack-a-Mole?  Well, finding the bottleneck in your datacenter is a lot like this game.  You're not sure where it's going to pop up, when, or how often, and it causes lots of watching, quick reflexes and usually lots of colorful language!  Is it the Storage?  Compute?  Network?

The way it usually starts is the application folks call the DBA and say the database is slow.  The DBA calls the system administrator and says the compute and/or storage is slow.  The system administrator calls the network administrator and says the network is slow.  So where's the bottleneck and how do you get running at peak performance again?  To help demonstrate I've invited a good friend of mine, Eddy.

Hi Eddy!  To make a point you're going to represent data and I'm going to shove you through a straw.

No worries Eddy, this won't hurt a bit and you'll be helping to show the readers what I'm talking about!

Hang in there Eddy!  Now, let's say this straw is your network and Eddy is data.  Is it the straw that's too small?  Can the compute layer process and push Eddy through the straw fast enough?  Can the storage hold and process Eddy quickly enough to send him through the straw?  Clearly we have a problem, but sometimes it's not as obvious as trying to shove an elephant through a straw.

Everyone, give Eddy a big hand!

So how do you tell where the elephant is stuck?  Usually it takes tools, analysis, smarter people than me and time!  The cool thing is if you have Nimble Storage, we take most of the headache out of finding the bottleneck.  You've heard me praise InfoSight before, but it's gotten even better!  Let me explain.  Say you have a DBA who keeps telling you the storage is slowing down his database.  Well, you can pull up InfoSight, take a look under the Performance tab and voilà!

What you're seeing is an actual CS420-X2 that is running at a peak cache usage of 131%.  And if we don't have enough cache, we have to grab data off of spinning disk.  Reading data off of spinning disk isn't the end of the world, but it is going to slow things down a bit.  It might be time for an SSD upgrade.

Notice the Latency button next to Recommendations?  This is pretty cool too.  If you click on this it will show you tons of historical data regarding your array's read and write latencies.  Well, what if you're seeing higher than normal latencies...

But your cache and CPU are just fine?  Time to give your network administrator a call.  Maybe something is up with the switch: a speed mismatch, or a frame size set incorrectly somewhere down the line?
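
If you like to see that thought process written down, here's a rough sketch in Python of the triage I just walked through: cache over 100% points at the cache, and high latency with healthy cache and CPU points at the network.  To be clear, this is NOT an InfoSight API.  The ArrayStats fields, the triage function, the 90% CPU limit and the "2x baseline" latency rule are all made-up assumptions for illustration, and you'd type in the numbers you read off the graphs yourself.

```python
from dataclasses import dataclass


@dataclass
class ArrayStats:
    """Hypothetical container for numbers read off the performance graphs."""
    cache_usage_pct: float      # peak cache usage, e.g. 131.0 for 131%
    cpu_usage_pct: float        # peak CPU usage
    latency_ms: float           # latency you are seeing now
    baseline_latency_ms: float  # what "normal" latency looks like for you


def triage(stats, cpu_limit_pct=90.0, latency_factor=2.0):
    """Return a list of suspects: "cache", "cpu" and/or "network".

    The thresholds are illustrative assumptions, not vendor guidance.
    """
    suspects = []

    # More than 100% means the working set no longer fits in cache,
    # so some reads are coming off spinning disk.
    if stats.cache_usage_pct > 100.0:
        suspects.append("cache")

    if stats.cpu_usage_pct > cpu_limit_pct:
        suspects.append("cpu")

    # High latency with healthy cache and CPU: look outside the array.
    latency_high = stats.latency_ms > latency_factor * stats.baseline_latency_ms
    if latency_high and not suspects:
        suspects.append("network")

    return suspects


# The array from the cache graph above: 131% peak cache usage.
# The CPU and latency values here are invented for the example.
print(triage(ArrayStats(131.0, 40.0, 3.0, 2.0)))
```

An empty list just means nothing crossed the thresholds you picked, not that the mole is gone for good.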

These are just a couple of graphs from InfoSight, and remember, this comes with the support of your array!  Where was this when the DBAs used to tell me things were "slow"?

Don't let those bottlenecks get you down!  Arm yourself with the right tools and you'll know where the moles will stick their heads out of the ground before they do!


