Monday, June 30, 2008 1:08 PM
bart
A Lap Around Microsoft "Velocity" - Cache It NOW!
At the beginning of this month, we released the first CTP of Velocity, an early preview of our distributed object cache solution. You can download it here. Note that it's a very early preview, so things will definitely change moving forward. This post shows how to install and use Velocity.
Introduction
But first... what's in a name? Multi-tiered distributed applications are commonplace nowadays, and with cloud computing within reach, the need to build scalable distributed services has never been greater. One of the core aspects in enabling those scenarios is intelligent caching of objects, not only to reduce the number of accesses to the underlying data source but also to boost availability by employing scale-out techniques. Obviously, developers want to be able to do all of this without having to worry about the complexities involved, such as dealing with load balancing and availability themselves. That's where Velocity comes into play.
The core idea is very straightforward: we have a cache that behind the scenes is distributed and replicated across a bunch of machines called the cluster. Storing data in the distributed cache is as easy as calling some Add or Put method, and retrieving it is as easy as calling Get. With some creative stealing from the documentation we end up with the following picture:
An important thing to emphasize is that cache clients deal with regular .NET objects all the time and don't have to worry about how those objects are stored: .NET serialization takes care of that. There are more concepts to it. Cache eviction policies determine when objects are removed from the cache, for example least recently used (LRU). The distribution mechanism comes in two flavors: simple clients just contact the cluster "in the cloud" through any cache host and get redirected to whatever host the object is available on, whereas routing clients are aware of object placement through a routing table. Other important pieces include the supported concurrency models and associated locking mechanisms, but let's not go there in this introductory post.
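To make the routing-table idea a bit more tangible, here's a minimal sketch of how a client could map a key to a host on its own. This is a conceptual illustration only: the RoutingTable class, the host names and the hash-modulo placement are all assumptions of mine, not how Velocity actually places objects.

```csharp
using System;

// Conceptual sketch only: the class, the host names and the hash-modulo
// placement are assumptions, not Velocity's actual placement algorithm.
public class RoutingTable
{
    private readonly string[] hosts;

    public RoutingTable(string[] hosts)
    {
        if (hosts == null || hosts.Length == 0)
            throw new ArgumentException("At least one host is required.", "hosts");
        this.hosts = hosts;
    }

    // FNV-1a-style hash over the UTF-16 code units of the key. A stable hash
    // is used so that every client computes the same placement for a key.
    private static uint StableHash(string key)
    {
        unchecked
        {
            uint hash = 2166136261;
            foreach (char c in key)
            {
                hash ^= c;
                hash *= 16777619;
            }
            return hash;
        }
    }

    // A routing client can go straight to this host; a simple client would
    // ask any host and be redirected to it.
    public string HostFor(string key)
    {
        return hosts[StableHash(key) % (uint)hosts.Length];
    }
}

public static class RoutingDemo
{
    public static void Main()
    {
        // Hypothetical host names.
        var table = new RoutingTable(new[] { "cachehost1", "cachehost2", "cachehost3" });
        foreach (string key in new[] { "Name", "Time" })
            Console.WriteLine("{0} -> {1}", key, table.HostFor(key));
    }
}
```

The trade-off is the usual one: the routing client saves a hop per request, at the price of having to keep its view of the cluster up to date.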
Installation
Installing Velocity is fairly straightforward. Just run the MSI. After a while you'll see the following window:
This is where you configure the "cache host". Under Cluster Configuration Share you can enter the UNC path to the cluster's configuration share. This is where an XML file is kept that ensures configuration settings are consistent across all hosts in the cluster. Create this folder and grant the Everyone account full access to it; this is a known issue in the CTP and by no means the final design. Also notice that in the first CTP this share is a single point of failure: if the share goes down, the cluster can die. Obviously this will be addressed in subsequent releases. For now, we'll just specify a local path and enter a new name under "Cluster Name". The Cluster Size is self-explanatory and for the purpose of this introductory sample, we'll stick with a one-host cluster (a degenerate case of a cluster if you will...).
Next, two ports are specified: the service port and the cluster port. The defaults are just fine here. The service port is what clients connect to in order to talk to the cache host. The cluster port, on the other hand, is used by the servers in the cluster to talk to one another. There's also a third port, called the arbitration port, which listens on the cluster port number plus one (i.e. 22235 in the sample above).
Last but not least, there's the Max Server Memory setting which defaults to half of the available physical memory. I've reduced it to 256 MB since I'm using my main dev machine as my playground but obviously in production scenarios boxes will get dedicated to the distributed cache cluster, where it makes sense to boost this.
After clicking Save & Close, setup will ask you to open the required ports on the firewall, which can be done easily by allowing the DistributedCache.exe program (that runs as a service) through the firewall.
Doing so isn't hard at all. Geeks can go the netsh way as illustrated below. Alternatively one can use the Firewall Settings in the Control Panel:
With this, setup has completed.
Configuration
Before we can start to use the service, we need to make a few changes to its configuration. Let's take a look at the configuration share's file structure first:
The XML file contains the configuration that's shared by all hosts in the cluster. In addition, the ConfigStore file is a SQL CE 3.5 database that keeps additional information about partitions, nodes, regions, etc.; the CTP's documentation has more information on these. Notice that you won't find the cached data here since we're talking about an in-memory distributed cache. Geeks can investigate what goes inside this little database, but we'll instead just focus on the XML file:
In here you can find the list of hosts and caches that are part of the cluster. We'll take another look at this file in just a minute, once we've altered the cluster configuration. In order to configure the service, Velocity comes with a command-line tool named, originally, the "Administration Tool", which you can find through the Start menu.
In the configuration steps below, we're adding a cache to the cluster after investigating the hosts and caches that are part of the cluster. Once we've added the cache, the cluster is started which puts all the hosts online by starting their Windows Services.
The service is called DistributedCacheService, a name that is kept in the cacheHostName property in the XML file:
Taking a look back at the XML file, you'll notice a new cache configuration entry has been added:
The new cache has an eviction type of Least Recently Used (LRU) which means that, as the name implies, the least recently used objects get evicted from the cache when necessary. In addition, objects expire from the cache after a time-to-live (TTL) of 10 minutes. The type of the cache is set to partitioned, allowing objects to be distributed across the hosts in the cluster.
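If you want to see how LRU eviction and a time-to-live play together, here's a small in-process sketch. It is emphatically not Velocity's implementation: the LruTtlCache class, its capacity parameter and the injectable clock are my own assumptions, chosen so that the behavior can be simulated without waiting 10 minutes.

```csharp
using System;
using System.Collections.Generic;

// Illustrative in-process cache, NOT Velocity's implementation.
// Combines a capacity-bound LRU policy with a fixed time-to-live.
public class LruTtlCache<TKey, TValue>
{
    private class Entry
    {
        public TKey Key;
        public TValue Value;
        public DateTime ExpiresAt;
    }

    private readonly int capacity;
    private readonly TimeSpan ttl;
    private readonly Func<DateTime> clock; // injectable so expiry can be simulated
    // Front of the list = most recently used, back = least recently used.
    private readonly LinkedList<Entry> order = new LinkedList<Entry>();
    private readonly Dictionary<TKey, LinkedListNode<Entry>> index =
        new Dictionary<TKey, LinkedListNode<Entry>>();

    public LruTtlCache(int capacity, TimeSpan ttl, Func<DateTime> clock)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException("capacity");
        this.capacity = capacity;
        this.ttl = ttl;
        this.clock = clock;
    }

    public void Put(TKey key, TValue value)
    {
        LinkedListNode<Entry> node;
        if (index.TryGetValue(key, out node))
        {
            // Overwrite: drop the old entry first.
            order.Remove(node);
            index.Remove(key);
        }
        else if (index.Count >= capacity)
        {
            // Cache is full: evict the least recently used entry.
            LinkedListNode<Entry> lru = order.Last;
            order.RemoveLast();
            index.Remove(lru.Value.Key);
        }
        Entry entry = new Entry { Key = key, Value = value, ExpiresAt = clock() + ttl };
        index[key] = order.AddFirst(entry);
    }

    // Returns false on a miss: never stored, evicted, or expired.
    public bool TryGet(TKey key, out TValue value)
    {
        value = default(TValue);
        LinkedListNode<Entry> node;
        if (!index.TryGetValue(key, out node)) return false;
        if (clock() >= node.Value.ExpiresAt)
        {
            // TTL elapsed: remove the stale entry and report a miss.
            order.Remove(node);
            index.Remove(key);
            return false;
        }
        // A hit makes this entry the most recently used one.
        order.Remove(node);
        order.AddFirst(node);
        value = node.Value.Value;
        return true;
    }
}

public static class LruTtlDemo
{
    public static void Main()
    {
        DateTime now = new DateTime(2008, 6, 30, 13, 0, 0);
        var cache = new LruTtlCache<string, string>(2, TimeSpan.FromMinutes(10), () => now);
        string v;
        cache.Put("a", "1");
        cache.Put("b", "2");
        cache.TryGet("a", out v);                    // touch "a", so "b" is now the LRU entry
        cache.Put("c", "3");                         // cache is full: "b" gets evicted
        Console.WriteLine(cache.TryGet("b", out v)); // False (evicted)
        now = now.AddMinutes(11);                    // simulate the TTL elapsing
        Console.WriteLine(cache.TryGet("a", out v)); // False (expired)
    }
}
```

Notice the two distinct ways an object can disappear: eviction kicks in when room is needed, while expiration happens purely because time has passed.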
Using it
After covering the server side, we should take a look at the client side picture of using Velocity. Here's the simple program we want to use:
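using System;
using System.Data.Caching; // namespace of the Velocity CTP client assemblies

namespace VelocityDemo
{
    class Program
    {
        static void Main(string[] args)
        {
            // The factory locates the cluster through the client configuration file.
            CacheFactory factory = new CacheFactory();
            Cache cache = factory.GetCache("Test");

            // Store a key/value pair, then read it back.
            cache.Put("Name", "Bart De Smet");
            string name = (string)cache.Get("Name");
            Console.WriteLine(name);
        }
    }
}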
Notice how we get access to the cache called Test through a factory object, after which we simply use a Put method to add an item (key, value pair) to the cache which subsequently can be retrieved (on any client connected to the cache) using the corresponding Get method. Obviously more complex serializable objects will be stored in the cache but this simply shows the main idea.
In order to make this work we need to add a couple of references to the project. Notice that in the CTP the number of client-side assemblies that need to be referenced isn't optimized in any way, so here it goes:
Core ones are CacheBaseLibrary and ClientLibrary. All of these carry the System.Data.Caching namespace:
But wait a minute, how can the client know where to find the cache retrieved through the factory? Indeed, it can't:
In order to make things work, the exception hints that you should create a client configuration file. The documentation that comes with the CTP contains such a configuration file, so I won't paste it here, but the key takeaway is the mirrored nature of the client-side and server-side files:
Basically, the client just needs to point at a host in the "distributed caching cloud" to gain access to the cache (depending on the type of client, things look a little different - in the sample above the deployment type is for a simple client - more information can be found in the documentation).
Once you've run the program you can take a look at the cluster's cache by means of the show cachestats command in the administration tool:
To see the cache behavior in action, try caching the current time (DateTime.Now) and reading it back. Here's a piece of sample code illustrating this (using indexers instead of Get/Put calls):
// Requires "using System.Threading;" for Thread.Sleep.
while (true)
{
    // Cache the current time.
    object data = DateTime.Now;
    Console.WriteLine("Add to cache: {0}", data);
    cache["Time"] = data;

    // Poll every 10 seconds until the entry is gone (the indexer returns null).
    data = cache["Time"];
    while (data != null)
    {
        Console.WriteLine("Retrieved from cache: {0}", data);
        Thread.Sleep(10000);
        data = cache["Time"];
    }
    Console.WriteLine();
}
For the sake of this demo, I've lowered the TTL to 1 minute (stop the cluster using stop cluster, change the TTL value in the config file and restart the cluster using start cluster). Here's a sample output:
You can clearly see how the data becomes null once the object has expired and been removed from the cache. This is an important point: an application should never assume a cache hit will occur and should always be prepared to handle a cache miss. If such a miss occurs, the client can refresh the data in the cache if it feels the need for it. Notice there's a method called ResetObjectTimeout that can be used to reset the TTL counter for an object specified through its key value.
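Handling a miss typically boils down to the same few steps: look in the cache, and if nothing is there, go to the underlying source and put the result back. Here's a hedged sketch of that pattern. To keep it runnable without a cluster it doesn't touch the Velocity API at all: ISimpleCache, DictionaryCache and CacheAside are hypothetical names of mine, with Get returning null on a miss just like in the sample above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstraction for this sketch; not a Velocity type.
public interface ISimpleCache
{
    object Get(string key);             // returns null on a miss
    void Put(string key, object value);
}

// In-memory stand-in so the sketch runs without a cache cluster.
public class DictionaryCache : ISimpleCache
{
    private readonly Dictionary<string, object> items = new Dictionary<string, object>();

    public object Get(string key)
    {
        object value;
        return items.TryGetValue(key, out value) ? value : null;
    }

    public void Put(string key, object value) { items[key] = value; }

    // Simulates an object expiring from the cache.
    public void Expire(string key) { items.Remove(key); }
}

public static class CacheAside
{
    // Returns the cached value on a hit; on a miss, loads the value from
    // the underlying source, stores it in the cache and returns it.
    public static T GetOrLoad<T>(ISimpleCache cache, string key, Func<T> load)
    {
        object cached = cache.Get(key);
        if (cached != null) return (T)cached;

        T fresh = load();
        cache.Put(key, fresh);
        return fresh;
    }
}

public static class CacheAsideDemo
{
    public static void Main()
    {
        var cache = new DictionaryCache();
        int loads = 0;
        // Stand-in for an expensive call to the underlying data source.
        Func<string> loadName = () => { loads++; return "Bart De Smet"; };

        CacheAside.GetOrLoad(cache, "Name", loadName); // miss: loads and caches
        CacheAside.GetOrLoad(cache, "Name", loadName); // hit: no load
        cache.Expire("Name");                          // simulate TTL expiry
        CacheAside.GetOrLoad(cache, "Name", loadName); // miss again: reloads
        Console.WriteLine(loads);                      // 2
    }
}
```

One caveat with this simple version: because null signals a miss, a null value can't be told apart from "not cached", so the loader would run again every time.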
More stuff
There's more to Velocity than just the stuff covered in this post. For example, there's the concept of regions, which allows objects to be located on a specific node, offering additional search capabilities to find cached objects at the price of scalability across hosts. To handle concurrency when dealing with cached objects, Velocity supports optimistic and pessimistic locking semantics. And finally, for ASP.NET applications, there's integration with session state. Obviously there's much more to come in upcoming CTPs, but there's more than enough to explore already in this one.
Enjoy playing with Velocity (but as usual, keep in mind the CTP quality)!