Wednesday, February 11, 2015

Dedupe is Great, but What Effect Does it Have on Nimble Storage?

Hi Friends,

I got a great question on my Microsoft Windows File Sharing with Nimble Storage - What About Deduplication? blog post, and I felt it deserved a post of its own instead of a reply.

The question is, "Did you see a big increase in the cache consumption as the dedupe process churns through your files?"

Again, fantastic question. Too many times we get focused on the trees and forget about the forest. What does that mean? We get obsessed with the end goal and forget that our actions can have an effect on the system as a whole, and it may not be a positive effect. For example, I want to make my car faster, so I get an aftermarket turbo, a cat-back exhaust, and maybe I start wandering into stage 2 or 3 mods. Will the car perform better? Maybe... But as any car guy can tell you, if you start changing one thing without changing others, you might start damaging the car.

The same can be true with technology. If I bolt on dedupe through the operating system, am I going to mess up my storage performance as a whole? The person who asked the question wanted to know, and so did I.

So here's what I did: I ran a Vdbench workload with some SMB clients accessing the file share, doing a 50/50 read/write split. While the performance test was running, I kicked off deduplication on a volume that had data on it but had never been deduped before. I'm happy to report the array handled the dedupe like a champ! The only change I saw was a little additional read throughput: no drop in cache hit rate (it stayed at 100%) and no drop in IOPS. Let's have a look!

But remember, your mileage will vary, so it's always important to test performance modifications before you go into production or onto the track!  :-)
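
If you want to run the same kind of before-and-during comparison on your own gear, here's a minimal sketch of how I'd line up the numbers. To be clear, this is not a tool I used for the test above. The metric names, the 5% tolerance, and every sample value are made-up placeholders, so swap in whatever your benchmark tool and array monitoring actually report.

```python
from statistics import mean

def compare_runs(baseline, during, tolerance=0.05):
    """Compare averaged metrics from a baseline run against a run taken
    while a background job (such as dedupe) is active.

    A metric counts as regressed when its average drops by more than
    `tolerance` (a fraction, so 0.05 means 5%) relative to the baseline.
    """
    report = {}
    for metric, samples in baseline.items():
        base_avg = mean(samples)
        test_avg = mean(during[metric])
        change = (test_avg - base_avg) / base_avg
        report[metric] = {
            "baseline": base_avg,
            "during": test_avg,
            "change": change,
            "regressed": change < -tolerance,
        }
    return report

# Placeholder samples only: these are NOT measurements from my test.
baseline = {
    "iops": [10000, 10200, 9800],
    "cache_hit_pct": [100, 100, 100],
    "read_mbps": [200, 210, 190],
}
during_dedupe = {
    "iops": [10100, 9900, 10000],
    "cache_hit_pct": [100, 100, 100],
    "read_mbps": [230, 240, 220],
}

report = compare_runs(baseline, during_dedupe)
for metric, row in report.items():
    status = "REGRESSED" if row["regressed"] else "ok"
    print(f"{metric}: {row['change']:+.1%} ({status})")
```

Only drops get flagged here, since higher IOPS, cache hit, or throughput is not a problem. If you track something where lower is better, like latency, you'd want to flip the comparison for that metric.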

Until Next Time!


  1. What size were the volumes you tested?

  2. Did you see higher overall cache consumption? It seems to me that since you are touching all those files, they would be put into cache. My thought is that once it is done, cache consumption would probably return roughly to normal.