Hi Friends,
If you've been reading my blog for a while now, you'll know I've talked about SESparse vs. VMFSsparse, and if we can believe what's written on the box, we could certainly have a winner! A little overview for those who haven't been long-time Glick's Gray Matter readers: in VMware Horizon View 5.1, VMware introduced a new format called Space Efficient Sparse Virtual Disks. The problem in the past was that View wrote to disk in 512-byte chunks. If the virtual desktops wrote everything in 512-byte chunks there would be no problem, but they don't. Since writes are variable in size (512 B, 4K, 16K, 32K, 64K, etc.), we can start to misalign our storage. Take a look at this diagram.
As you can see, I've drawn out how VMFSsparse writes to disk. Depending on what boundaries your storage uses, this diagram will vary. Here I'm using a 4K block boundary, and all would be cool if only 512-byte chunks of data were written to it. In the drawing I've got one 512-byte chunk of data written and then 4K. Since the storage in this example has 4K boundaries, a piece of our 4K chunk has to be written to another block, across another 4K boundary. The problem occurs when I go to read that data: since it's written across two blocks, I have to do double the work on the storage. This isn't too big of a deal when you have just a few desktops, but multiply this problem by thousands and you can see it really begins to put extra demands on the storage.
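To make the math behind the diagram concrete, here's a minimal sketch (my own illustration, not anything from VMware or Nimble) that counts how many 4K backend blocks a guest write touches, assuming the 4K block boundary from the diagram:

```python
# Illustrative sketch: count how many 4K backend blocks a write touches,
# given its offset and size in the sparse disk. Not VMware code.
BLOCK = 4096  # assumed 4K backend block boundary, as in the diagram above

def blocks_touched(offset, size):
    """Number of 4K backend blocks a write at `offset` of `size` bytes spans."""
    first = offset // BLOCK
    last = (offset + size - 1) // BLOCK
    return last - first + 1

# A 4K write that lands right after a 512-byte chunk straddles two blocks,
# so reading that data back later costs two backend I/Os instead of one.
print(blocks_touched(0, 512))      # 1 block
print(blocks_touched(512, 4096))   # 2 blocks -> misaligned, double the work
```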
In the past I thought there was an easy way to choose between SESparse and VMFSsparse, but it looks like in 5.2 it's always on when certain environmental conditions are met. If anyone from VMware is out there and I'm wrong, please let me know! According to this article, SESparse cannot be turned off from the GUI and must be done on the Horizon View server itself. http://myvirtualcloud.net/?p=4745
We ran some tests with SESparse, with VMFSsparse, and with View Storage Accelerator. Here are our results:
The top line is the average I/O the desktops ran at steady state (~8 IOPS), the next two blue lines are two tests running VMFSsparse, and the two red lines are SESparse. You can see the number of misaligned I/Os is about double for VMFSsparse!
If you'd like to learn more, we have a paper coming out shortly that will discuss this in further detail, or you can talk to us about it at VMworld. Just drop by the Nimble Storage booth and we'd be happy to chat with you about it!
Until Next Time
-Brain
Brian,
This is Matt from VMware. I first read the blog post and was confused, as it sounded like you were concerned there wasn't a way to disable SE sparse. But I think your data is showing what you hoped: that SE sparse eliminates unaligned I/Os (it sort of has to by definition, since it only allocates in 4K chunks).
SE sparse also has a huge advantage in that it can support deleted space reclamation, which allows you to control the growth of linked clones without having to refresh or recompose (mandatory for persistent desktop use cases).
It's for these two reasons, along with some improved metadata caching performance and some additional operational experience with it, that we felt confident enough to all but retire VMFSsparse as an option for View (a format that has served us well for nearly 12 years, since ESX 1.0).
If we're missing something though, please let us know!
Hi Matt,
Thanks for the comment! A little of both. Our tests were to see if SESparse eliminated unaligned I/O, and there was a little confusion since we weren't sure if there was a way to easily turn VMFSsparse back on. Thanks for the clarification and for reading my blog!
-Neil
Hi Neil,
With respect to unaligned I/O, all SESparse guarantees is the following:
1. Data grains of size 'grainsize' (4K) are grainsize-aligned in the VMDK LBA space.
That is, if a 4K write is made to a 4K-aligned offset in the VMDK, the resulting grain in SESparse will be placed at some VMDK file offset that is divisible by 4K.
(So a 4K-aligned, 4K-multiple-sized write workload will perform best; the Windows guest flushes in nice 4K multiples from its guest cache.)
2. The worst-case workload for SESparse, in terms of I/O amplification on the backing array, is a first write of VMDK data at a non-4K boundary. This results in COWing (read-update-write) of two adjacent grains, as sketched below.
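Here's a minimal sketch of those two points (my own illustration with an assumed 4K grain size, not the actual SESparse on-disk logic): an aligned 4K write fills exactly one grain, while a first write at a non-4K boundary only partially covers two adjacent grains, each of which needs a read-update-write:

```python
# Illustrative only: count grains that a write covers only partially and
# would therefore need COW (read-update-write) on a first write.
GRAIN = 4096  # assumed SESparse grain size

def grains_cowed(offset, size):
    """Grains partially covered by a write at `offset` of `size` bytes."""
    first, last = offset // GRAIN, (offset + size - 1) // GRAIN
    cowed = 0
    for g in range(first, last + 1):
        covered = min(offset + size, (g + 1) * GRAIN) - max(offset, g * GRAIN)
        if covered < GRAIN:
            cowed += 1
    return cowed

print(grains_cowed(8192, 4096))        # 0 -> aligned 4K write, no COW
print(grains_cowed(8192 + 512, 4096))  # 2 -> straddles two grains, worst case
```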
So, when compared to VMFSsparse and its 512-byte grains:
- Point 2 above is a fundamental change, i.e., SESparse is COW-based. With VMFSsparse there is no COWing, since all VMDK writes are 512-byte-sized and 512-byte-aligned; a VMFSsparse-based redo log never has to read data grains from the base VMDK to serve a write.
- SESparse, by using larger grains, does a far better job of maintaining contiguity of logical data in VMDK space. (Read as: if the array is doing any prefetching of VMDK data, SESparse-based guests will see more direct/predictable benefits in terms of IOPS; see the sketch after this list.)
- SESparse metadata is stored as a group at the start of the file, with data following it (as opposed to VMFSsparse, which intermixes data and metadata). Read as: if the back-end array is good at picking up access streams, it can optimize metadata and data access very well. On the other hand, VMFSsparse's interspersed 512-byte data and metadata grain accesses, which don't maintain logical contiguity, are a bane to the backing storage.
- SESparse saves space through dead-space reclamation.
- SESparse uses journaling for updates and does not suffer from orphan grains after crashes. Orphan grains, though benign, occupy unnecessary space in redo logs.
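On the contiguity point, here's a tiny sketch (again my own illustration, not the on-disk format) of how many redo-log grains a single 4K guest write becomes under each grain size:

```python
# Illustrative only: ceiling-divide a write into grains to compare the formats.
def grain_count(write_size, grain_size):
    """Number of grains a write of write_size bytes occupies."""
    return -(-write_size // grain_size)  # ceiling division

# A single 4K guest write becomes 8 separate 512-byte grains under VMFSsparse
# (allocated in arrival order, interleaved with metadata), but stays one
# contiguous 4K grain under SESparse, which is friendlier to array prefetching.
print(grain_count(4096, 512))    # 8
print(grain_count(4096, 4096))   # 1
```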
Where available, SESparse should be the clearer choice now. Some real, honest effort was put into SESparse to alleviate the issues with VMFSsparse.
Looking forward to your white paper.
-Faraz