With respect to my work in filesystem caching strategies, this new Solaris release introduces three excellent new features.
First is the introduction of L2ARC cache support. This means that you can now use Readzilla and Writezilla SSD devices in any Sun server.
Second is the introduction of ZFS ARC cache controls through the primarycache (i.e. L1 ARC cache) and secondarycache (i.e. L2 ARC cache) filesystem properties. These new cache controls let you define what is in and, more importantly, what is not in the ZFS ARC cache. You may recall from my blog post on filesystem caching strategies that controlling the contents of the ZFS ARC cache can produce much better and more consistent performance for data-centric services such as directory server.
For example, let's say that I am deploying directory server (e.g. DSEE) with the following layout:
* DSEE Bits: ZFS filesystem zpool/bits - primary and secondary caches are disabled
* DS info and txn logs: ZFS filesystem zpool/logs - primary and secondary caches are disabled
* DS DB and ChangeLog: ZFS filesystem zpool/data - primary and secondary caches are enabled
By using the primary and secondary cache controls, I guarantee that, for the zpool ZFS pool, the only data stored in the ARC cache is the DS data and changelog.
Here is how to disable both primary and secondary cache for the zpool/logs filesystem at the time of filesystem creation:
# zfs create -o primarycache=none -o secondarycache=none zpool/logs
Here is how to disable both primary and secondary cache for the zpool/logs filesystem after the filesystem has already been created:
# zfs set primarycache=none zpool/logs
# zfs set secondarycache=none zpool/logs
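To pull the two forms above together for the whole example layout, here is a minimal dry-run sketch. It only prints the commands rather than running them, so it is safe to try anywhere. The helper names cache_opts and plan_layout are hypothetical, and the script assumes a POSIX-style shell such as ksh93 or bash. The last printed line uses zfs get to read the two properties back.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the zfs commands for the example layout.
# Pool and filesystem names come from the layout above; helper names are hypothetical.

# cache_opts <all|none>: emit the -o options that set both cache properties
cache_opts() {
    printf '%s' "-o primarycache=$1 -o secondarycache=$1"
}

# plan_layout: one "zfs create" line per filesystem, then a verification line
plan_layout() {
    echo "zfs create $(cache_opts none) zpool/bits"
    echo "zfs create $(cache_opts none) zpool/logs"
    echo "zfs create $(cache_opts all) zpool/data"
    echo "zfs get -H -o name,property,value primarycache,secondarycache zpool/bits zpool/logs zpool/data"
}

plan_layout
```

Review the printed commands and then run them as root, or pipe the output to a root shell once you are happy with it.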
Note that if you want to create pre-tuned ZFS filesystems and associate them with a zone at the same time you create the zone, you can do so through The Zone Manager with the -r or -w flags. The latest release makes this possible through an extension that lets you pass ZFS options such as "primarycache=none;secondarycache=none;compression=gzip" to the -r or -w flags. Click here to see full usage help.
The third new feature is the breakout of ZFS ARC cache accounting in the ::memstat kernel metrics. Although the Solaris documentation doesn't make mention of this feature, I presume that it is present in support of the new ARC cache controls. You can see for yourself by running the following command:
# echo "::memstat" | nice -10 mdb -k
Note that you should not run this command on a production server as it may significantly reduce performance of the system while it scans through all physical memory. Note also that the time to complete running is proportional to the amount of physical memory installed in the server.
Here is a sample output of ::memstat metrics prior to Solaris 10 10/09:
Page Summary Pages MB %Tot
------------ ---------------- ---------------- ----
Kernel 94731 370 9%
Anon 35113 137 3%
Exec and libs 4544 17 0%
Page cache 150191 586 14%
Free (cachelist) 394526 1541 38%
Free (freelist) 367163 1434 35%
Here is a sample of what I hope that you will see with Solaris 10 10/09:
Page Summary Pages MB %Tot
------------ ---------------- ---------------- ----
Kernel 428086 3344 3%
ZFS File Data 25006 195 0%
Anon 13992767 109318 85%
Exec and libs 652 5 0%
Page cache 24979 195 0%
Free (cachelist) 1809 14 0%
Free (freelist) 1979424 15464 12%
Total 16452723 128536
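If you want to pull the ZFS figure out of that output in a script, here is a small sketch. The function name zfs_file_data_mb is hypothetical, and it assumes the row label is exactly "ZFS File Data" as in the sample above. Because the labels contain spaces, it counts columns from the right instead of from the left.

```shell
#!/bin/sh
# zfs_file_data_mb: read ::memstat output on stdin and print the MB column
# of the "ZFS File Data" row. Prints nothing when the row is absent,
# as in the older sample output without the ZFS breakout.
zfs_file_data_mb() {
    # Row labels contain spaces, so take the second-to-last field (MB).
    awk '/^ZFS File Data/ { print $(NF-1) }'
}

# Try it against a few rows copied from the sample above; prints 195.
zfs_file_data_mb <<'EOF'
Page Summary Pages MB %Tot
Kernel 428086 3344 3%
ZFS File Data 25006 195 0%
Anon 13992767 109318 85%
EOF
```

On a live system you would feed it the mdb output instead of the sample text, keeping in mind the warning above about running ::memstat on production servers.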
Have a super day!
Brad
PS: As soon as I get the chance to download and install Solaris 10 10/09, I will check the memstat output and confirm or deny the presence of the new ZFS accounting.
7 comments:
I just confirmed that the ::memstat kernel metrics do break out ZFS data as mentioned in the blog post.
Have a super day!
Brad
Very helpful, thanks. Can you confirm if the /etc/system set command:
set zfs:zfs_arc_max=
is still valid in release 10/09?
r4sutton@mac.com
Yes, zfs:zfs_arc_max still applies to Solaris 10/09. However, always refer to the Solaris documentation, the ZFS Evil Tuning Guide and the ZFS Best Practices Guide for the latest tuning features of the ZFS filesystem.
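For illustration only, here is a sketch of how such an /etc/system line could be generated. The value is given in bytes, and the 4 GB cap below is an assumed example, not a recommendation for any particular server. The helper name arc_max_line is hypothetical, and the script assumes a shell with 64-bit $(( )) arithmetic such as ksh93 or bash.

```shell
#!/bin/sh
# arc_max_line <gigabytes>: print an /etc/system line that caps the ZFS ARC.
# The tunable takes a byte count; it is printed here in hexadecimal.
arc_max_line() {
    printf 'set zfs:zfs_arc_max=0x%x\n' $(( $1 * 1024 * 1024 * 1024 ))
}

# Assumed example: a 4 GB cap. Prints: set zfs:zfs_arc_max=0x100000000
arc_max_line 4
```

The printed line goes into /etc/system, and the setting takes effect after a reboot.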
Seeking how ZFS cache is handled by the Solaris VM subsystem when application memory demand increases.
On one of our Sun servers, we noted that ::memstat output shows all extra available memory going to ZFS cache: 67% of the memory is in the ZFS cache according to ::memstat, while the application only needs 19% (~6GB) of the 32GB. Traditionally Solaris would free the older file cache segments as the application memory demand increases. Can someone chime in on the behavior of the Solaris VM manager in terms of freeing ZFS cache when an application demands more memory?
I think ZFS related memory segments will free if the app demands based upon my observations.
Donovan,
The ZFS cache will free up space when an application needs it. However, if the application needs a lot of memory quickly, it may not get the memory fast enough and can time out. This is rare, though. The best thing to do is to cap the ZFS ARC at a safe upper boundary so that you mitigate contention between the cache and applications. You can learn more on this topic from my blog on the same subject:
Filesystem Cache Optimization Strategies
Brad,
We experienced a problem on an M5000 where the whole box hung. The Sun engineers pointed to a possible memory shortage due to the over-usage of memory by the ZFS ARC cache and its inability to give it up. I would understand if the application had a timeout, but I am surprised and a little bit skeptical that it would cause the canister and the global zone to completely hang.
Have you ever seen this behaviour and resolved it by setting the zfs_arc_max lower?
Hello Trevor,
Thanks for your comment. I personally have not observed the specific scenario that you have described with current versions of Solaris (e.g. U7 or U8). However, I can envision a scenario where the kernel (or some other large consumer or heavy user of RAM) and the ZFS ARC compete for memory, which could slow things down from time to time.
You might want to make sure that you are on the latest version of Solaris, because earlier versions were more susceptible to RAM overlap and contention than current releases. For example, I encountered similar issues on my network backup server (basically little RAM plus lots of zones with rsyncs over ssh to each zone) with pre-u6 versions of Solaris 10. For both past and current versions, reducing the ZFS ARC size helped.
I hope that helps!
Brad