Monday, June 1, 2009

Serving Up Samples Without Giving Away The Farm

If you are in the business of creating and selling multimedia productions online, you may have wondered how to make samples of your content available for free without giving it all away. For example, if you produce daily or weekly audiocasts that people are willing to pay for, you probably don't want to post the mp3 files in a year/month/day.mp3 format on the web site. Otherwise, people can easily guess the scheme and download all of your recordings for free.

One two-part approach to this problem is to obscure the file name so that it isn't easy to guess. For example, you could use the md5 sum of the file name: if the md5 sum for 20090529.mp3 were d41d8cd98f00b204e9800998ecf8427e, you would rename the file to d41d8cd98f00b204e9800998ecf8427e.mp3. (That particular value is actually the md5 sum of an empty input, so treat it as a placeholder.)
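As a minimal sketch of the renaming step, the bash function below assumes GNU coreutils `md5sum` is available (on Solaris, `digest -a md5` fills the same role). One caveat: an md5 of a predictable name such as 20090529.mp3 can be recomputed by anyone who guesses the naming scheme, so this sketch prepends a secret string before hashing, a small departure from the plain md5 described above. The function name `obscure_name` and the `SECRET` placeholder are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive an obscured file name from a secret and the
# original name. Prepending a secret keeps outsiders from recomputing the
# digest just by guessing the year/month/day naming scheme.
obscure_name() {
    local secret="$1" name="$2"
    local ext="${name##*.}"             # keep the extension, e.g. mp3
    local digest
    digest=$(printf '%s%s' "$secret" "$name" | md5sum | cut -d' ' -f1)
    printf '%s.%s\n' "$digest" "$ext"
}

# Example (SECRET is an assumed placeholder; keep the real one private):
# SECRET='change-me'
# mv 20090529.mp3 "$(obscure_name "$SECRET" 20090529.mp3)"
```

Keep a record of each original-name-to-digest pair as you rename, since that record is what the lookup step needs.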

This obfuscation method makes other file names hard to guess, but it can make the files a little more challenging to incorporate into your web site. Therefore, the second part of this two-part scheme is to provide an easy programmatic way to map each obfuscated file name to a more meaningful data point, like a date or title. For example, your web site may currently determine which audiocast sample to serve via the date. Here is a JavaScript snippet from one web site:
var mfile = 'audio/' + month + day + '.mp3'
The developer for this web site could use CGI (my favorite), PHP, or even ASP to look up the md5 value in a reference file or a database. For example, they could keep a simple text file (hidden from web site view, of course) like the following, which maps the date of a recording to its corresponding md5 sum value.
2009/05/29 d41d8cd98f00b204e9800998ecf8427e
The web site would simply read the file, look for the specified date, parse out the md5 sum value for that date, and use that value as the name of the mp3 file to serve up.
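Here is a rough sketch of that lookup in bash, assuming the mapping file uses the whitespace-separated `date md5` layout shown above. The function names `lookup_md5` and `audio_path` are hypothetical, and the `audio/` prefix is borrowed from the snippet above; a CGI script could call something like this and hand the resulting path to the page.

```bash
# Print the md5 value mapped to a date (e.g. 2009/05/29) in a mapping file
# whose lines look like "2009/05/29 d41d8cd98f00b204e9800998ecf8427e".
# Exits non-zero when the date is not listed.
lookup_md5() {
    local date="$1" mapfile="$2"
    awk -v d="$date" '$1 == d { print $2; found = 1; exit } END { exit !found }' "$mapfile"
}

# Build the path the web site would serve (hypothetical audio/ prefix).
audio_path() {
    local md5
    md5=$(lookup_md5 "$1" "$2") || return 1
    printf 'audio/%s.mp3\n' "$md5"
}

# audio_path 2009/05/29 /path/outside/docroot/mapping.txt
```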

I hope this trivial idea helps you protect your digital assets.

Blessings to you and yours!

Brad


Wednesday, April 22, 2009

Which Cloud Is Right For You?

As you read the title of this blog post, you are probably thinking "What are you about to sell me?". Being the architect and consultant at heart that I am, I couldn't bring myself to do that just yet. Instead, I want you to think about the various kinds of cloud computing.

Conventional definitions are mostly restricted to public versus private clouds. However, the Cloud Cube white paper from the Jericho Forum and Chris Hoff's elaboration on the cloud cube expanded my concept to include a much broader spectrum of possibilities.

Not only can clouds be categorized in terms of public and private, but they can also be characterized in many other terms. Consider the following possibilities:
  • Internal/External (e.g. in your company's datacenter),
  • Open/Proprietary (e.g. Interoperable, Portable and Interchangeable),
  • Perimeterised/De-perimeterised (e.g. Operating within the traditional IT perimeter),
  • Insourced/Outsourced,
  • Offshore/Onshore (e.g. would offshoring data create a security or privacy risk?),
  • Regulatory X compliant/Non-compliant (e.g. HIPAA, FISMA, SOX, etc.),
  • Single tenancy/Multi-tenancy (dedicated/shared server, network storage, etc.),
  • Isolated data/co-mingled data (e.g. disk storage, database, backups, directory),
  • Dedicated security/Socialist security,
  • and On-premise/Off-premise
Once you start to consider these many facets of Cloud Computing, it becomes easier to understand why it can be so complicated. The conclusion I came to from this broadened perspective is that it is vital to understand as many aspects of your service's requirements as possible before blindly jumping into a specific Cloud Computing solution. You may find that hosting your service within your private datacenter inherently addressed more requirements than you thought. On the other hand, you may find that your service is very flexible and has a wide variety of cloud options. The main point is to understand your requirements so that you can make the most informed decision possible.

Thanks to the folks at the Jericho Forum and Chris Hoff for broadening my perspective.
Thanks also to Joel Weise for the referral to Chris Hoff's blog.

Brad




The 10 "S"es of Cloud Computing

Hello again,

In my efforts to keep up with all things virtualization, I have begun to learn about Cloud Computing. The starting point for everyone on this journey is to find out "What is Cloud Computing?". You will most likely find, as I did in this video, that there are about as many definitions of Cloud Computing as there are people.

Drawing on The Five Pillars of Cloud Computing, Cloud Computing in Plain English, Rethinking Application Delivery, and other resources, I suggest that Cloud Computing is the intersection of the following 10 "S"es:
  1. Services - Infrastructure (IaaS), Platform (PaaS), Software (SaaS), Storage (DaaS), etc.
  2. Servers - Physical or virtual servers.
  3. Storage - The footprint of storage required to host the operating systems, services, and data for your service(s).
  4. Service Networking - Physical and/or virtual networking.
  5. Self Service Orchestration - The ability for end users, manually or in an automated way, to sign up for, provision, manage, use, and deprovision hardware, operating systems, or services; to integrate with other services (like data stores, identity services, or any other services); to populate them (load data); and to control participation in local and geographic load balancing.
  6. Self Updating - The underlying hardware, operating systems, and software services are maintained in an automated or at the very least controlled way.
  7. Service Metering - Account for services (IaaS, PaaS, and SaaS) consumed in as near real time as possible. Think of something like Google Analytics, which benefits both the user and the service provider. For the user, it reports the usage of the user's service(s) at a granular or global level. For the provider, it clearly accounts for what services are being consumed and how.
  8. Service Billing - Automated chargeback mechanism for metered services.
  9. Security - Although this isn't talked about much, it needs to be an inherent part of every architectural component of Cloud Computing.
  10. Service Level Agreement (SLA) - An SLA is essentially a contract between a provider and a consumer that defines the terms the provider agrees to meet, for a certain price, when delivering the service to the consumer. If the provider fails to satisfy the terms of the SLA, the provider typically pays a fine to the consumer as compensation. SLA terms are typically defined in terms of availability (e.g. 99.999% uptime), response times (e.g. no greater than 1 second), or throughput (e.g. no less than 1,000 operations per second). At present, the main cloud computing providers offer SLAs, but they are typically very weak and not on par with most enterprise computing expectations, much less service provider level expectations.
I hope you find this definition useful in your pilgrimage into the world of Cloud Computing.

Regards,
Brad

Thursday, March 26, 2009

Filesystem Cache Optimization Strategies

Introduction
One often overlooked high performance feature of the Solaris operating system is the filesystem cache. The filesystem cache improves performance by temporarily storing data read from disk in available system memory. The time required to retrieve data from disk based storage varies from a few milliseconds to several hundred milliseconds, depending on the type, configuration, performance, and utilization of the storage holding the data. Retrieving the same data from memory, however, is considerably faster. The difference in speed comes from the length of the I/O path. Read operations from disk traverse the system bus, an I/O controller (SCSI, SATA, FC, ...), and a disk controller, finally getting the data from one or more spinning platters, and then traverse back through the same path to return the data to the process. Read operations from the filesystem cache go no further than the system bus because system memory is usually on the main system board. Each step along these paths adds an incremental amount of time to the overall response time of the read operation.

Why Use Filesystem Cache
What are the potential benefits that come with a filesystem caching strategy? One or more of the following benefits may apply to your data centric application.
  • If the application is a 32-bit application, it may not be able to address more than 4GB of memory. Using this strategy may enable the application to benefit from far more memory because the data is read from the filesystem cache instead of the storage devices.
  • May deliver performance equal to, or in some cases better than, the application's native caching methods.
  • May not flood the underlying storage during application checkpoints (or garbage collections in the case of Java applications). With a very large database cache and slow or very busy disk storage, checkpoints can cause drops in performance simply because the filesystem is so busy flushing data from the transaction logs into the database files. Filesystem caching can eliminate many unnecessary disk read operations, which leaves the storage free for better write throughput.
  • The probability of a memory leak existing in the filesystem or its cache is very low.
  • If a memory leak ever does occur in the application, you will be able to detect it sooner, track it longer, and let it run longer before running out of available memory.
  • If the application ever core dumps, the resulting core file could be significantly smaller for applications that otherwise would have a large internal cache. This has two benefits. First, it doesn't take nearly as long to upload the core file to support. Second, there is much less data in the core file for the engineers to cull through when diagnosing a problem.
  • May improve overall disk throughput because fewer read operations are going to disk. In some customer cases, this has made a very dramatic difference because the disk drives and I/O controllers were so busy with read operations that they could barely keep up with replication (write operations).
  • Can enable you to maximize application memory efficiency. Filesystem caching maps directory data directly into memory without any inflation. Some caches, like the database cache of Directory Server Enterprise Edition (DSEE) [nsslapd-dbcache], use roughly 1.2x the amount of memory of the on-disk format. The DSEE entry cache uses roughly 4x the amount of memory of the on-disk format because it converts the data to LDIF format for optimal consumption. Thus for DSEE, the filesystem cache offers the best memory efficiency; in other words, you can fit more data into the filesystem cache than into the db or entry caches.
Solaris Filesystem Caches: segmap cache
Although Solaris supports many different filesystems, the two most commonly used filesystems are UFS and ZFS. Prior to ZFS, filesystems including UFS, NFS, QFS, and VxFS used a common filesystem cache called segmap. Segmap is a pre-allocated memory pool used to map portions of file data into pages of kernel virtual memory. Once file data is mapped into the segmap cache, subsequent read operations are served from either the segmap cache or its overflow cache, the Free cachelist, instead of from the underlying disk storage. Once data is evicted from the Free cachelist, it must be retrieved from disk again. See Richard McDougall's blog post to learn more about how segmap works.

Solaris Filesystem Caches: Vnode Page Mapping cache
Over the life span of the segmap design, some performance penalties have been associated with page eviction and replacement from the cache. For example, excessive cross calls can occur when page mappings are unloaded during eviction, which can drive excessive CPU utilization during heavy volumes of page evictions. These issues are addressed for UFS via CR 6256083, which implements a new lightweight file mapping mechanism that in most cases substitutes for segmap. This new file mapping mechanism is called the Vnode Page Mapping (VPM) cache. As of Solaris 10 update 6, VPM is implemented on the SPARC and x86 64-bit/x64 architectures.

Solaris Filesystem Caches: ZFS Adaptive Replacement Cache (ARC)
The ZFS ARC cache has two levels of filesystem caching. The primary cache, called the ARC, uses physical memory, and the secondary cache, called the L2ARC, uses solid state disks to cache data. The L2ARC is not currently in Solaris 10 update 6 but is in OpenSolaris 2008.11. The default configuration of the ZFS ARC (primary) cache is to use all unused memory for caching data. Note that one of the primary design goals for ZFS is to be self adjusting so that it does not require deep tuning skills to optimize it for specific use cases. With that in mind, some or all of the tuning options described in this blog post may change in future versions of the ZFS filesystem. To ensure that you have the most up-to-date information on optimizing ZFS performance, see the ZFS Evil Tuning Guide.

Solaris Filesystem Caches: Memory Contention
Once server memory reaches capacity, the filesystem caches will give up memory to applications requesting memory. I refer to this overlapping use of memory as memory contention. The ZFS ARC and segmap filesystem caches handle memory contention a little differently.

The segmap cache architecture consists of two levels of caching. The primary level is the segmap cache itself. The second is the Free cachelist. This second cache is really free memory that still happens to hold file data, so applications can claim memory from the Free cachelist with very little overhead.

The ZFS ARC cache, on the other hand, is a single cache that is maintained in the kernel itself. So when an application makes a request for memory, the overhead associated with freeing up that memory is a little greater than with the segmap cache. Consequently, the ZFS ARC cache may need a little more care to ensure that memory contention is avoided altogether.

The best way to avoid memory contention is to eliminate potential overlap of memory consumption. I will talk more about this later. For now, just consider that a data centric application may not reach optimal performance when it constantly contends for memory.

How Filesystem Cache Can Improve Performance
The easiest way to show how a filesystem cache can improve data centric application performance is to consider the find command. The find command, as its name implies, traverses a filesystem tree to find one or more files within a specified tree structure. To see the difference in performance between reading data from disk and reading it from the filesystem cache, run the following find command twice in a row. For reference later in this document, this find command looks for all files with the setuid or setgid (SUID/SGID) bits set. This is a common command run by systems administrators to find potential security vulnerabilities.

# find / -type f \( -perm -004000 -o -perm -002000 \) -exec ls -lg {} \;

The first time it is run, the command reads the data from disk as it traverses the directory tree starting from the root (/) directory. As the find command reads the data from disk, that data is stored in the filesystem cache. More and more data is populated into the filesystem cache until either the command completes or the filesystem cache reaches maximum capacity. The second time the command is run, some or all of the data is retrieved from the filesystem cache instead of from the disk storage. For example, let's look at how long the find command takes to complete when run two times in a row. The first, non-cached run took 32.93 seconds to complete. The second, cached run took 1.74 seconds to complete. That is a 95% reduction in the time to complete the same command. Here is the memstat data from a fresh Solaris reboot and after the first run:

Fresh boot memstat data: Page Cache=69MB  Free (cachelist)=42MB
First run memstat data:  Page Cache=141MB Free (cachelist)=42MB

From the above data, we see that the find pushed approximately 72MB of data into the filesystem cache.
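The two figures quoted above can be checked with a couple of small bash helpers. This is only arithmetic on the numbers already given: the reduction is (32.93 - 1.74) / 32.93, and the cache growth is the change in Page cache plus Free (cachelist). The function names are illustrative.

```bash
# Percentage reduction in elapsed time between an uncached and a cached run.
pct_reduction() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

# Growth in filesystem cache (Page cache + Free cachelist), in MB.
# Arguments: before_page before_free after_page after_free
cache_growth_mb() {
    echo $(( ($3 + $4) - ($1 + $2) ))
}

pct_reduction 32.93 1.74        # prints 94.7, i.e. roughly 95%
cache_growth_mb 69 42 141 42    # prints 72
```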

Now that you have seen the value of leveraging the filesystem cache to improve data centric application performance, let's look at the configuration considerations for properly implementing a filesystem caching strategy.

Establish A Safe Ceiling
One of the goals of an effective filesystem caching strategy is to avoid memory contention. To do this, configure the filesystem cache to use just enough memory that the operating system and applications won't contend with the filesystem cache for available memory. Consider the following example: a server that has 32GB of RAM. Let's assume that the sum of memory consumed by Solaris, the primary application, and all other programs is approximately 8GB, and that the primary application is a data centric application that interacts with as much as 80GB of data. Over a sufficient amount of time, the filesystem cache eventually fills to capacity as the primary application interacts with its 80GB of data.

You can get an estimate of the sum of memory consumed by all running processes with the following command.

# ps -eo rss | awk ' BEGIN { t = 0; }{ t += $1; } END { print t; } '

This command returns, in KB, the sum of the resident set size (rss) memory of all running processes. Because shared pages are counted once for every process that maps them, treat the result as an upper-bound estimate. Alternatively, use the modular debugger (mdb) to get a big picture view of how memory is allocated. Consider the following sample memstat output.
# echo "::memstat"|mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                      94731               370    9%
Anon                        35113               137    3%
Exec and libs                4544                17    0%
Page cache                 150191               586   14%
Free (cachelist)           394526              1541   38%
Free (freelist)            367163              1434   35%
Among other things, this output tells us that 35% of memory is unused (Free freelist), that 52% of memory is allocated to the UFS filesystem cache (Page cache plus Free cachelist), and that 9% is used by the Kernel. Note that if this example were showing ZFS caching, the bulk of the percentage would show up in the Kernel line because the ZFS ARC cache is currently accounted for in the Kernel. In a future Solaris 10 version, the ZFS ARC caches will be broken out into their own lines.
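If you want to pull the filesystem cache figure out of ::memstat without reading the table by eye, a small awk filter can sum the "Page cache" and "Free (cachelist)" rows. This is a sketch that assumes the four-column layout shown above; as noted, it would not capture the ZFS ARC, which is reported under Kernel. The function name is hypothetical.

```bash
# Hypothetical helper: sum the MB and %Tot columns of the "Page cache" and
# "Free (cachelist)" rows of ::memstat output read from stdin. This is the
# UFS/segmap filesystem cache; the ZFS ARC is reported under "Kernel".
fs_cache_from_memstat() {
    awk '
        $NF ~ /%$/ {
            # Row label = every field except the last three numeric columns.
            label = $1
            for (i = 2; i <= NF - 3; i++) label = label " " $i
            if (label == "Page cache" || label == "Free (cachelist)") {
                mb  += $(NF - 1)
                pct += $NF + 0          # "38%" + 0 evaluates to 38
            }
        }
        END { printf "%d MB (%d%%)\n", mb, pct }
    '
}

# On Solaris: echo "::memstat" | mdb -k | fs_cache_from_memstat
```

For the sample output above it reports 2127 MB (52%), matching the 52% figure.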

Let's get back to our example. Assume that the data centric application, like most, follows the 80/20 rule. The 80/20 rule suggests that 20% of users drive 80% of a service load. Let's also assume that those 20% of users touch 20% of the 80GB of data. Data centric applications perform best when active data is kept in memory so that very few read operations go to disk storage. In this example, the ideal cache size would be 20% of 80GB, which is 16GB. Let's consider three filesystem caching configurations to determine which is optimal.

UFS Default Caching
The default UFS cache configuration for 64-bit Solaris allocates 12% (3.84GB) of physical memory (32GB) for the filesystem cache. After subtracting the 8GB used by Solaris and the applications, this leaves approximately 20GB (63%) of physical memory unused. This configuration would cache only 24% of the ideal 16GB needed for optimal performance. As a result, the I/O subsystem would spend significantly more time performing read operations for data that is not in the filesystem cache, and overall read performance of the application would be slower than if the data were in the filesystem cache.


ZFS Default Caching
The default ZFS cache configuration would consume all free memory (~24GB, the 32GB of RAM minus the 8GB used by Solaris and the applications). However, it wouldn't leave any memory buffer to avoid contention with Solaris or other applications on the system. The performance of the application would be good because more than the ideal 16GB of data would be cached. However, some performance would be lost to the give and take between the filesystem cache and all other applications' consumption of physical memory. Memory contention may bring an additional issue: if ZFS consumes all available memory while the application is offline, the application may take longer to start as it waits for ZFS to free enough memory.

Optimized ZFS Filesystem Caching
In this configuration for ZFS, we establish a safe upper boundary on the memory consumed by the filesystem cache. The goal of the upper boundary is to avoid memory contention. In this example, we set the upper boundary of the filesystem cache to 22GB, which leaves a 2GB safety buffer (32GB - 8GB - 22GB) and is 6GB larger than the ideal size of 16GB. This should further improve application performance while avoiding contention with Solaris or applications.
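The sizing arithmetic for this example can be captured in a hypothetical helper. It simply restates the assumptions above (32GB of RAM, 8GB used by Solaris and applications, 80GB of data, a 20% active working set, a 2GB safety buffer) and the 12% UFS default quoted earlier; plug in your own measurements, for example the rss total from the ps command, before relying on the result.

```bash
# Hypothetical sizing helper for the example above. Arguments: physical RAM
# (GB), memory used by Solaris and all programs (GB), application data size
# (GB), percent of the data that is active, and safety buffer (GB).
# The 12% UFS default is the figure quoted in this post, not a measured value.
cache_sizing() {
    awk -v ram="$1" -v used="$2" -v data="$3" -v hot="$4" -v buf="$5" 'BEGIN {
        ideal   = data * hot / 100     # active working set to keep cached
        ceiling = ram - used - buf     # safe upper boundary for the cache
        ufs     = ram * 12 / 100       # default UFS cache on 64-bit Solaris
        printf "ideal=%.2fGB ceiling=%.2fGB headroom=%.2fGB\n", ideal, ceiling, ceiling - ideal
        printf "ufs_default=%.2fGB covers=%.0f%% of ideal\n", ufs, ufs / ideal * 100
    }'
}

cache_sizing 32 8 80 20 2
```

For the example values it prints ideal=16.00GB ceiling=22.00GB headroom=6.00GB, followed by ufs_default=3.84GB covers=24% of ideal.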

Clearly the optimized filesystem cache configuration offers the lowest risk with the best potential performance. Let's look at how to set the upper boundary of the filesystem cache.

Tuning ZFS Cache
The ZFS filesystem cache (ARC) size is specified as a hex value. Thus, to change the ZFS ARC cache from its default of no upper boundary to a fixed upper boundary of 22GB, you can use the following commands to determine the hexadecimal value.

# bash
# numGigs=22
# decVal=$((${numGigs}*(2**30)));
# echo "obase=16;ibase=10; ${decVal}" | bc
580000000

Apply this upper boundary by adding the following entry to /etc/system and rebooting the server so that the change takes effect.

set zfs:zfs_arc_max = 0x580000000
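As an alternative to the bc pipeline, bash's built-in printf can do the hex conversion and emit the whole /etc/system entry in one step. This is a minimal sketch; the function name `arc_max_entry` is hypothetical.

```bash
# Print the /etc/system entry that caps the ZFS ARC at a given number of GB.
# bash arithmetic is 64-bit, so 22 * 1024**3 = 23622320128 fits easily.
arc_max_entry() {
    local gigs="$1"
    printf 'set zfs:zfs_arc_max = 0x%X\n' $(( gigs * 1024**3 ))
}

arc_max_entry 22    # prints: set zfs:zfs_arc_max = 0x580000000
```

Its output matches the entry above; as root you could append it to /etc/system and then reboot.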

Unlock The Governors
Although using the filesystem cache may improve the performance of some applications, it is not necessarily best for all applications. Solaris is tuned by default for general purpose computing. Consequently, the default Solaris configuration is not optimized to realize the full potential of the filesystem cache for some applications.

For UFS, the FreeBehind UFS filesystem kernel option is used to prevent caching sequentially read data. The freebehind option needs to be re-configured to allow all data read in to go into the filesystem cache. To do this, add the following line to /etc/system.

set ufs:freebehind = 0

Let's examine the limiting effect of freebehind on UFS filesystem cache occupancy by looking at the cache before and after reading in large amounts of sequential data for two different configurations.

Here are the prerequisites for this experiment.
  • A non-production server that is running Solaris 10 Update 6 or greater,
  • with a minimum of 8GB of RAM,
  • a 10+GB disk drive that can be formatted with UFS and ZFS filesystems (ALL DATA ON THE DISK WILL BE DESTROYED after formatting),
  • and root access to the server.
Once you have found a suitable server, here is the sequence of steps for this experiment.
  1. Find or create a UFS filesystem with 6GB of free space, and change directories into it.
     # cd /my_ufs_filesystem
  2. Create a 4GB file with randomized data.
     # dd if=/dev/urandom of=datafile bs=1024k count=4096
  3. Reboot the server to clear the filesystem cache.
     # init 6
  4. Log in to the server as root and run the following command to get a big picture view of memory allocation prior to populating the filesystem cache. Note the values of "Page cache" and "Free (cachelist)".
     # echo "::memstat"|mdb -k
  5. Use dd to load the file contents into the filesystem cache.
     # dd if=datafile of=/dev/null bs=512k
  6. Run the following command again and compare the "Page cache" and "Free (cachelist)" values with those from the previous run.
     # echo "::memstat"|mdb -k
Note that the sum of the "Page cache" and "Free (cachelist)" values has grown, but not by 4GB. In order to put the full contents of the datafile into the filesystem cache, we need to make sure the filesystem cache is large enough to hold the file, and we need to adjust freebehind to allow ungoverned population of the filesystem cache. To do this, add the following line to /etc/system and then reboot the server.

set ufs:freebehind = 0

Load the datafile into the filesystem cache, then check the "Page cache" value, with the following two commands.

# dd if=datafile of=/dev/null bs=512k
# echo "::memstat"|mdb -k

Note that the sum of the "Page cache" and "Free (cachelist)" values has now increased by the full 4GB compared to the prior iteration with the default values for freebehind and segmap.

Note that although they are not in the Solaris documentation, the segmap and freebehind options have been in Solaris since version 7.[2]

Avoid Diluting The Filesystem Cache
The filesystem cache is a general purpose facility intended to improve general performance by minimizing frequent disk storage read operations. The downside of it being general purpose is that any file can get loaded into the filesystem cache. Once the cache reaches capacity, old data is pushed out as new data is read in. This process can dilute the cache with data that you don't want in it. One common example that can dilute the filesystem cache is an administrator running the find command to look for setuid programs. Another example is copying or backing up large files from one filesystem to another. Any of these examples can push a significant amount of data out of the filesystem cache.

The best strategy to avoid filesystem cache dilution is to ensure that only the application data can be loaded into the filesystem cache. There are two parts to this strategy. First, make sure that the application data is in a filesystem by itself. Second, make sure that all other filesystems do not use the filesystem cache. This latter part is managed through the following methods.

UFS caching is disabled by adding the forcedirectio option in the /etc/vfstab to filesystems that you don't want cached. Here is a sample filesystem entry.
/dev/dsk/c0t0d0s1  /dev/rdsk/c0t0d0s1 /var  ufs   1    no   logging,forcedirectio
Note that there is a known deadlock condition[9] that should be avoided by leaving the UFS "logging" option, shown in the sample entry above, out of such entries.

As for ZFS, there aren't any per-filesystem cache controls as of Solaris 10 update 6. However, a future update will introduce the ability to disable the primary and secondary caches per filesystem. Both options are configured through the "zfs set" command. The following two sections, taken from the OpenSolaris 2008.11 zfs man page, describe the primary and secondary caches, their possible values, and their default values.
primarycache=all | none | metadata

Controls what is cached in the primary cache (ARC). If this property is set to "all", then both user data and metadata is cached. If this property is set to "none", then neither user data nor metadata is cached. If this property is set to "metadata", then only metadata is cached. The default value is "all".

secondarycache=all | none | metadata

Controls what is cached in the secondary cache (L2ARC). If this property is set to "all", then both user data and metadata is cached. If this property is set to "none", then neither user data nor metadata is cached. If this property is set to "metadata", then only metadata is cached. The default value is "all".
Note again that these two options don't yet exist in Solaris 10 (as of Solaris 10 update 6).
Here are OpenSolaris 2008.11 commands for disabling both the primary and secondary caches of the dump zfs filesystem.

# zfs set primarycache=none rpool/dump
# zfs set secondarycache=none rpool/dump

Match Data Access Patterns
Data access patterns typically fall into one of the following three categories.
  • Sequential Access – This access pattern reads data blocks one after another in sequence from the disk.
  • Random Access – This access pattern reads data blocks randomly throughout the disk.
  • Hybrid Access – This access pattern is some mixture of Sequential and Random access patterns.
Aligning the filesystem tuning according to the anticipated data access patterns can significantly improve performance of the filesystem and application interacting with the filesystem. For example, the changelog of just about any database has a sequential access pattern. However, the data access pattern of the database data may be random. Thus, tuning the filesystem containing the changelog for sequential access would improve its performance.
For ZFS, the main configuration option that improves sequential access patterns is pre-fetching. File level and block level pre-fetching is on by default. For random access patterns, you may want to consider disabling pre-fetching. To disable ZFS pre-fetching, add the following lines to /etc/system.

* Disable file level pre-fetching
set zfs:zfs_prefetch_disable = 1

* Disable block level pre-fetching.
set zfs:zfs_vdev_cache_bshift = 13

Unfortunately, because ZFS sets pre-fetching system wide, you should make sure that the predominant access pattern is random before making this change. Alternatively, you can put just the random access data in a ZFS filesystem with a random access configuration, and put the sequential data (like the changelog) in a UFS filesystem that is optimized for read-ahead pre-fetching.

UFS read-ahead is configured per filesystem indirectly via the maxcontig option. The maxcontig option itself is defined (from the Solaris 10 tunefs man page) as the “disk drive maximum transfer size” divided by “the disk block size”. If the disk drive's maximum transfer size cannot be determined, the default value for maxcontig is calculated from kernel parameters as follows:

If maxphys is less than ufs_maxmaxphys, which is 1 Mbyte, then maxcontig is set to maxphys. Otherwise, maxcontig is set to ufs_maxmaxphys.
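The fallback rule quoted above can be sketched as a small bash function. The quoted rule does not spell out the units, so this sketch assumes the chosen transfer size (in bytes) is converted to blocks by dividing by the filesystem block size, consistent with the "maximum transfer size divided by block size" definition; the 8KB default block size is also an assumption to check against your own filesystem.

```bash
# Sketch of the tunefs fallback rule for maxcontig. ASSUMPTION: the selected
# transfer size (bytes) becomes a block count by dividing by the block size.
default_maxcontig() {
    local maxphys="$1"                        # kernel maxphys, in bytes
    local block_size="${2:-8192}"             # assumed 8KB UFS block size
    local ufs_maxmaxphys=$(( 1024 * 1024 ))   # 1 Mbyte, per the man page
    local transfer=$ufs_maxmaxphys
    if (( maxphys < ufs_maxmaxphys )); then
        transfer=$maxphys
    fi
    echo $(( transfer / block_size ))
}

default_maxcontig 131072     # 128KB maxphys -> 16 blocks
default_maxcontig 8388608    # 8MB maxphys, capped at 1 Mbyte -> 128 blocks
```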

UFS read-ahead is determined indirectly through the maxcontig value. A large maxcontig value looks further ahead than a small one. The ideal value should be determined based on the average data size that the application uses and through thorough testing. Note also that forcedirectio (recommended earlier for non-data filesystems) not only bypasses the filesystem cache, it also disables read-ahead.[5]

Reducing the UFS read-ahead for the directory server of DSEE may improve throughput depending on the average entry size of the directory data. To decrease the UFS read-ahead, consider dropping the maxcontig from its default value to 16M through tunefs.

Consider Disabling vdev Caching
ZFS allocates a small amount of memory for each virtual device (a.k.a. vdev) that participates in a zpool. You may consider disabling this feature in order to eliminate all unnecessary caching. I have not done extensive testing to determine the full extent of the memory savings from disabling it, so I recommend that you test it out in the lab before trying it in production. Having said that, here is the value to set in /etc/system to disable vdev caching:

* Disable vdev caching
set zfs:zfs_vdev_cache_size = 0

Minimize Application Data Pagesize
Data centric applications like relational databases and directory services often specify a page size when writing data to storage. The page size determines the amount of data to allocate for each write operation. In the case of directory services, the actual data size may be significantly smaller than the page size. A mismatched page size can degrade performance through less efficient use of memory and by writing more data to disk than necessary. More information on how to determine whether your DSEE page size is mismatched, and how to correct it, is available here.

Match Average I/O Block Sizes
The average block size of the application's data should be the metric to which all other data block sizes are matched. For example, the ZFS recordsize is 128KB by default. If the average block (or page) size of a directory server is 2KB, the mismatch in size will result in degraded throughput for both read and write operations. One of the benefits of ZFS is that you can change the recordsize at any time; the new value applies to write operations from the time you set it going forward. With UFS, however, you have to set the block size when the filesystem is created. Fortunately, for most directory services deployments the UFS block size is adequate.
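To pick a ZFS recordsize that matches an application's average block or page size, one hypothetical heuristic is to round the average up to the nearest power of two within the range ZFS accepted for recordsize in the releases discussed here (512 bytes to 128KB). The function name and the `tank/ds` dataset in the usage comment are placeholders.

```bash
# Hypothetical helper: round an average block/page size (bytes) up to the
# nearest power of two, clamped to the 512-byte..128KB recordsize range.
suggest_recordsize() {
    local avg="$1" size=512
    while (( size < avg && size < 131072 )); do
        size=$(( size * 2 ))
    done
    echo "$size"
}

# e.g. for a 2KB directory server page size (dataset name is a placeholder):
# zfs set recordsize=$(suggest_recordsize 2048) tank/ds
```

With this rule an average of 2048 bytes stays at 2048, while an average of 3000 bytes rounds up to 4096.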

Consider The Pros and Cons of Cache Flushes
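The default ZFS configuration assumes that the devices making up a zpool are plain disks rather than a storage array. If the devices are actually LUNs on a storage array whose write cache is non-volatile (for example, battery-backed), consider disabling cache flushes by adding the following entry to /etc/system. If the array cache is volatile, leave cache flushes enabled. Otherwise a power failure could lose data that ZFS believes has already reached stable storage.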

* If using a storage array with a non-volatile cache, disable cache flushes.
set zfs:zfs_nocacheflush = 1
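Before rebooting, it can be worth confirming what an /etc/system-style file actually sets. Here is a minimal sketch for a POSIX shell and awk (on Solaris 10, put /usr/xpg4/bin first in PATH). The function name system_setting is hypothetical. Comment lines (starting with *) are skipped, and if a variable appears more than once the sketch simply reports the last occurrence.

```shell
# Sketch for a POSIX shell and awk; the function name is hypothetical.
# system_setting FILE MODULE:VARIABLE
# Prints the value assigned to MODULE:VARIABLE by "set" lines in FILE,
# or exits non-zero if the variable is not set there.
system_setting() {
    awk -v name="$2" '
        $1 == "set" {
            line = $0
            sub(/^[ \t]*set[ \t]+/, "", line)   # drop the set keyword
            split(line, kv, "=")                # kv[1] = name, kv[2] = value
            key = kv[1]; val = kv[2]
            gsub(/[ \t]/, "", key)
            gsub(/[ \t]/, "", val)
            if (key == name) found = val        # later lines overwrite earlier ones
        }
        END { if (found != "") print found; else exit 1 }
    ' "$1"
}
```

For example, system_setting /etc/system zfs:zfs_nocacheflush prints 1 once the entry above is in place, and exits non-zero if the variable is not set at all.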

Prime The Filesystem Cache
If your application's data fits entirely in the available filesystem cache, you may prefer to load the data into the cache before starting the application. This can make response times more consistent for operations that would otherwise read data from disk storage. If the data is much larger than the available cache, the performance benefit will be smaller.

There are a few ways to prime the filesystem cache. One way is to read every data file and send the output to the null device. For example, the following command could be used to load all of the DSEE data files of a directory instance that lives in /ds into the filesystem cache:

# find /ds/db -type f -name "*.db3" -exec dd if={} of=/dev/null bs=512k \;
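Because the benefit depends on whether the data fits in the cache, it helps to know how much data the command above will read. Here is a minimal sketch for a POSIX shell that totals the size of the *.db3 files under a directory. The function name db_bytes is hypothetical, and the /ds/db path and *.db3 pattern are taken from the example above.

```shell
# Sketch for a POSIX shell; the function name is hypothetical.
# db_bytes DIR
# Prints the total size in bytes of all *.db3 files under DIR.
db_bytes() {
    # wc -c prints one "size path" line per file plus a "total" line per
    # batch when given several files; skip the total lines and sum the rest.
    find "$1" -type f -name '*.db3' -exec wc -c {} + |
        awk '$NF != "total" { sum += $1 } END { printf "%.0f\n", sum }'
}
```

For example, db_bytes /ds/db prints the total in bytes. You can compare that with the memory available for the filesystem cache (prtconf reports the machine's physical memory size) before deciding whether priming is worthwhile.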

Conclusion
Before closing out this blog post, I want to highlight again that one of the design goals of the ZFS filesystem is that you shouldn't have to tune anything; ZFS is meant to have the intelligence necessary to tune itself optimally. That being said, any of the /etc/system settings mentioned in this blog post could change with future versions of ZFS. I highly suggest that you bookmark the ZFS Evil Tuning Guide, because it will keep up with the ZFS performance tuning options as they evolve over time.

That concludes this blog entry. I hope that you find this information useful. Thanks to Arnaud Lacour, Steve Sistare, Mark Maybee, Marcos Ares, Pedro Vazquez, and others who contributed to this blog post. Your help and input were tremendously valuable.

End Notes
1 - ZFS Evil Tuning Guide
2 - Historical freebehind and segmap references: Segmap Tuning, Understanding Perforce Server Performance, Performance Oriented System Administration, NFS CookBook - Based on Solaris 8, 9 (written prior to Solaris 10)
3 - Solaris 10 Solaris Tunable Parameters Reference Manual: freebehind
4 - Solaris 10 Solaris Tunable Parameters Reference Manual: segmap_percent
5 - System Performance Tuning, by Gian-Paolo D. Musumeci and Mike Loukides, pages 158-160
6 - System Administration Commands: tunefs
7 - Directory Server Databases and Usage of db_stat
8 - Three way deadlock involving UFS logging and directio
9 - UFS Directio Implementation
10 - Segmap default and max values
11 - Understanding Memory Allocation and File System Caching in OpenSolaris


Monday, November 17, 2008

Resizing VDI Windows Sessions

Hello again,

Here is a juicy morsel from my pilgrimage into the world of Virtual Desktop Infrastructure (VDI).

This is a variation on Maurice Bonotto's Resizing Windows Sessions blog entry. The primary difference is that I added the necessary step of killing the active RDP connection (the uttsc session) so that the RDP connection is re-created at the proper resolution.

Place the following contents into a script named /opt/SRSS-Addons/utresize, and make the script executable (for example, chmod 755 /opt/SRSS-Addons/utresize).
#!/bin/sh
# utaction script to resize the VDI desktop when the DTU resolution changes
PATH=$PATH:/opt/SUNWut/bin:/opt/SUNWut/sbin:/usr/X11/bin

# DTDEVROOT is set in the Sun Ray session; its last path component
# names the DTU whose display information we need.
dtuDev=`basename "$DTDEVROOT"`
newRes=`grep RESOL "/var/opt/SUNWut/dispinfo/$dtuDev" | cut -d= -f2 | cut -d: -f2`

# If the new resolution could not be determined, leave the session alone.
[ -n "$newRes" ] || exit 0

# Determine the current settings. If the resolution hasn't changed,
# don't do anything. However, if it has, change the X11 palette, xrandr,
# and reset the RDP connection.
oldRes=`xrandr | grep "^*" | sed -e "s/ x /x/g" | awk '{ print $2 }'`

if [ "$newRes" != "$oldRes" ]
then
    # utxconfig defines the X11 palette size
    utxconfig -r "$newRes"

    # xrandr (X11 Resize and Rotate) changes the X11 resolution of
    # the DTU session.
    xrandr -s "$newRes"

    # For VDI, the RDP session needs to be re-established at the new
    # resolution. There isn't a clean way to do that, so we just
    # kill the existing RDP connection (uttsc).
    pkill uttsc-bin
fi

Then add the following line near the top of the vda script (just after its #! line, if it has one), which is found here:
/etc/opt/SUNWkio/sessions/vda/vda
/opt/SUNWut/bin/utaction -i -c /opt/SRSS-Addons/utresize &
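If you prefer to script this change, here is a minimal sketch for a POSIX shell, grep, and awk (on Solaris 10, put /usr/xpg4/bin first in PATH). It inserts the utaction line immediately after the vda script's #! line, or at the very top if there is no #! line, and does nothing if the line is already present. The function name add_utaction_hook is hypothetical. Keep a backup copy of /etc/opt/SUNWkio/sessions/vda/vda and try the function on a copy first.

```shell
# Sketch for a POSIX shell, grep and awk; the function name is hypothetical.
# add_utaction_hook FILE
add_utaction_hook() {
    file=$1
    hook='/opt/SUNWut/bin/utaction -i -c /opt/SRSS-Addons/utresize &'
    # Do nothing if the exact line is already present.
    if grep -F -x -e "$hook" "$file" >/dev/null 2>&1; then
        return 0
    fi
    tmp="$file.tmp.$$"
    # Keep a leading "#!" line first, then the hook, then the rest of the file.
    awk -v hook="$hook" '
        NR == 1 && /^#!/ { print; print hook; next }
        NR == 1          { print hook }
                         { print }
        END              { if (NR == 0) print hook }
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Copy back with cat so the original file keeps its owner and mode.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

For example, add_utaction_hook /etc/opt/SUNWkio/sessions/vda/vda adds the line once. Running it again leaves the script unchanged.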

When you insert your token (a.k.a. Java Card) into a Desktop Unit (DTU) that has a different resolution than your current DTU, it will seem as if the SunRay is rebooting. This is because the script has killed the RDP (uttsc) session. Also, when the Windows login screen comes up, it may not yet be at the proper resolution. However, after you log in, the session will resize to the proper resolution.

Enjoy!

Brad



Thursday, October 9, 2008

Wireless Options for Thin Clients

Hello all,

It has been a while since my last post. Part of the reason is that I am expanding my focus to include the broader spectrum of Sun's virtualization portfolio. This blog post relates to the desktop virtualization part of the portfolio.

Sun's desktop virtualization solution breaks down into three layers. The bottom layer is the virtualization layer, which is where your desktop lives. Your desktop may live on a shared server such as a Windows Terminal Server, a Solaris server, or even a Linux server. It may be contained within a virtual machine running on top of a physical server, or it may simply run on a physical blade server or even a plain old desktop. Most likely, though, to get the full benefits of the desktop virtualization solution, your desktop will be running within a datacenter.

The next layer is the session management layer. In this layer, we manage your desktop session so that you can access it from just about any device. Session management enables you to go from one device to another and pick up right where you left off on your desktop. This awesome feature is called hotdesking.

The third and last layer is the access layer. This layer represents the thing that you interact with. It could be a thin device like a SunRay, a browser on your Mac desktop, or even a microbrowser in your mobile phone or PDA.

With that brief overview out of the way, the real purpose of this blog post is to talk about the various wireless options for thin client devices in the access layer of Sun's virtual desktop solution.

Sun's SunRay thin clients do not currently offer native wireless support. However, there are several wireless/Ethernet bridge devices available that you can use to put a SunRay thin client on a wireless network. I did a brief market survey this morning to find out what devices are available for this task. Here are the results of my survey.

Product | Model | Company | Price (USD)
Wireless Pocket Router/AP w/ Client Mode | DWL-G730AP | D-Link | $64.99
High Speed Mode Wireless Pocket Access Point | WL-330gE | Asus | $89.76
5 Port 802.11G Enet Bridge Adptr PoE | WET200 | Linksys | $161.69
Wireless-G Ethernet Bridge | WET54G | Linksys | $120.07

The DWL-G730AP and WL-330gE are very compact devices that appear to be great options for demos. The WET200 is nice if you want to connect up to five SunRays through a single wireless device. This might be great for an Internet cafe that can't or doesn't want to run Ethernet cables through its facility.

If you prefer a notebook form factor, the Tadpole division of General Dynamics offers the Tadpole M1400 Ultra Thin Client. You can call either Sun or General Dynamics for pricing on this thin client device. The only con with this device that I can see is that it doesn't offer wireless encryption. If you need wireless security, consider using one of the wireless/Ethernet bridges mentioned earlier.

Have a great day!

Brad


Monday, September 8, 2008

Having fun in Colorado...


The CMA SE and Sales team at a business planning meeting.



Brad