Saturday, June 26, 2010

SharePoint 2010 Cache

Introduction

When architecting your SharePoint 2010 solution, you need to consider how you will leverage caching to make your applications faster and more scalable. I think a lot of people (myself included) forget about caching strategies and how they can benefit from them. Each one has different pros and cons, and correctly picking one based on the business requirements is important for success.

I found a detailed whitepaper called SharePoint Server Caches Overview, Advanced details on the SharePoint BLOB, Output, and Object Caches which goes over the topic. You will need to download SharePointServerCachesPerformance.docx.

The following are my summary notes from the whitepaper.

Really, the purpose of caching in SharePoint is to reduce the number of calls to SQL Server so that you can quickly return results to users while lowering SQL Server utilization. The downside is that there can be a lag before users see the latest content. Once the cache is created, it is maintained locally on the SharePoint WFEs (web front ends). There are three caching strategies you need to be aware of for SharePoint 2010: BLOB cache, output cache and object cache.

BLOB Cache

BLOB cache helps improve performance by storing requested files on the WFE server so that they do not need to be retrieved from SQL Server every time. There are two basic ways files can be stored in SharePoint: they can be placed directly on the server (like in the layouts directory on the WFE) or they can be stored in a SharePoint library. Serving files placed on the SharePoint server is quicker than retrieving them out of a library, however only administrators can update those files. BLOB caching specifically addresses the cost of retrieving files from a document library by caching them on the WFE. This gives you the best of both worlds: files centrally managed in SharePoint and improved load times.

  • BLOB cache should be used when frequently accessed pages have JavaScript, CSS, image files, and large rich media files that can be cached on the WFEs. However, BLOB caching is not useful if the files are not frequently accessed or if they are modified on a frequent basis.
  • Another advantage of BLOB cache is that it reduces the time to reload web pages. This is because cache control headers can be added to the HTTP responses for the cached files on the WFE. These headers let the user's browser keep its own copy of the files cached on the WFE. This results in even fewer HTTP requests to the WFE itself.
  • BLOB cache is particularly helpful for caching large files out of SQL Server. The whitepaper goes into the details, but there is no disk buffering when serving up larger files (5MB), which results in low latency. SharePoint is optimized for serving up smaller chunk sizes (100KB).
  • When using BLOB cache, HTTP range requests are supported, which allows the browser to request pieces of a file to cache locally instead of the entire file. Media players that run on the client benefit when this is supported.
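
To make the range request bullet concrete, here is a minimal Python sketch of how a server might resolve a single Range header into byte offsets. This is generic HTTP behaviour and not SharePoint's actual implementation; the function name parse_byte_range is my own, and multi-range requests are deliberately left out.

```python
def parse_byte_range(header_value, file_size):
    """Resolve a single 'bytes=...' Range header to inclusive (start, end) offsets.

    Returns None when the header is not a single satisfiable byte range.
    """
    unit, sep, spec = header_value.partition("=")
    if not sep or unit.strip().lower() != "bytes" or "," in spec:
        return None  # unsupported unit, or multiple ranges (not handled in this sketch)
    first, dash, last = spec.strip().partition("-")
    if not dash:
        return None
    if first == "":
        # Suffix form 'bytes=-N': the last N bytes of the file.
        if not last.isdigit() or int(last) == 0 or file_size == 0:
            return None
        return (max(file_size - int(last), 0), file_size - 1)
    if not first.isdigit() or (last != "" and not last.isdigit()):
        return None
    start = int(first)
    end = int(last) if last else file_size - 1  # open-ended 'bytes=N-'
    if start >= file_size or start > end:
        return None
    return (start, min(end, file_size - 1))


# A media player asking for the first 100 bytes of a 1000-byte file.
print(parse_byte_range("bytes=0-99", 1000))
```

A client that only needs part of a video can ask for that slice, and the server answers with just those bytes instead of the whole file.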

Let's take a deeper look at BLOB caching:

  • There is performance overhead to initially build the BLOB cache, which is around five times more expensive. One reason is that the permissions and metadata associated with the file need to be brought over to the cache to ensure security is still maintained.
  • BLOB cache is created by each web application on each WFE machine in the farm. This means each virtual site in IIS (which maps to a SharePoint web application) will have its own BLOB cache. BLOB cache cannot be reused across web gardens (or zones).
  • It is possible to configure the files that can be placed into the BLOB cache. There is a file with a list of extensions which can be modified based on your business requirements.
  • BLOB cache can handle multiple concurrent requests even when the requested file has not been cached yet. The example given was a link to a large video file being mailed out: when everyone starts clicking that link, you want to make sure the server does not get flooded with requests for the same large file. With BLOB cache on, even if the file has not been fully cached, the video file will only be retrieved once per WFE from SQL Server.
  • An interesting thing to know is that BLOB files are stored on the WFE in folders that match their location in SharePoint. There is a 260 character limitation on file paths in Windows, so if your URLs are too long there will be problems building your cache. It is recommended to keep relative URLs shorter than 160 characters.
  • You will need to plan for RAM utilization when using BLOB caching. The BLOB cache index will use 800 bytes of RAM per entry.
  • BLOB cache is a persistent cache because it is periodically written to file on the WFE. This means an IIS recycle or shutdown will not lose the built cache. If the BLOB cache is very large, there is a lag before the cache becomes available again once the IIS operation is complete.
  • Items in the cache are invalidated based on a polling mechanism that checks SQL Server every 5 seconds. This interval can be changed. Invalidated items will not be added back to the cache until they are requested again.
  • BLOB cache also has a configurable size limit to keep the cache from growing at an uncontrolled rate. If the max is exceeded, the least-used files will be removed until the cache is 70% below the max size. If this threshold is exceeded a lot, there will be performance overhead incurred, and it would be recommended to increase the max size.
  • It is possible to manually flush the BLOB cache forcing it to be rebuilt.
  • BLOB cache is optimized for returning files anonymously because the file can be immediately returned without making any SQL Server round trips. This can be done by marking the site as anonymous or storing the files in a library that has AllowEveryoneViewItems set to true.
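
Two of the numbers above lend themselves to a quick planning check: the 800 bytes of RAM per index entry and the 160 character guideline for relative URLs. The Python sketch below is only back-of-the-envelope arithmetic built on those two figures; the function names and the sample URLs are my own.

```python
INDEX_BYTES_PER_ENTRY = 800   # per-entry index cost quoted in the notes above
MAX_RELATIVE_URL_CHARS = 160  # recommended ceiling for relative URLs


def blob_index_ram_bytes(cached_file_count):
    """Estimate RAM used by the BLOB cache index for a number of cached files."""
    return cached_file_count * INDEX_BYTES_PER_ENTRY


def urls_at_risk(relative_urls, limit=MAX_RELATIVE_URL_CHARS):
    """Return the relative URLs that are not shorter than the recommended limit."""
    return [url for url in relative_urls if len(url) >= limit]


# One million cached files costs roughly 800 MB of RAM for the index alone.
print(blob_index_ram_bytes(1_000_000) / 1_000_000, "MB")

# Hypothetical URLs: the second one is far too long to cache safely.
sample = ["/SiteAssets/logo.png", "/" + "deeply-nested-folder/" * 10 + "video.wmv"]
print(urls_at_risk(sample))
```

Note that this only counts the index, not the disk space taken by the cached files themselves.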

Configuring and Managing BLOB Cache:

  • As mentioned earlier, BLOB Cache is enabled per IIS site or SharePoint Web Application. All you need to do is go to the web.config and set the BlobCache node to enabled (Reference). There are several other settings available for tuning the BLOB cache; I recommend reading the whitepaper for those details.
  • For the application pool you will need to increase the startup and shutdown time limits. It is recommended to set them to 300 seconds, which allows enough time to initialize on startup or serialize on shutdown. Note this does not mean it takes 300 seconds to perform the operation; it prevents IIS from terminating the application until 300 seconds have elapsed so the cache is not lost (great reference).
  • It is recommended to keep all content to be cached in a specified list and to make sure the site containing that list is stable. This is because frequent changes to the site or list will invalidate all the cached files.
  • To flush the cache, a simple IIS Reset can be performed, the SharePoint API can be used, or finally you can disable the cache, delete the folder that contains the cache and then re-enable the cache in the web.config.
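
As a rough illustration of the web.config change, the Python sketch below flips the enabled attribute on a BlobCache node. The sample XML is a trimmed-down stand-in I wrote from memory of a default SharePoint 2010 web.config (attributes location, path, maxSize, enabled), so check the names and values against your own file. The helper enable_blob_cache is hypothetical, and since ElementTree drops XML comments on the way through, treat this as a way to see the change rather than a tool to run against a production file.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; a real web.config contains far more than this.
SAMPLE_WEB_CONFIG = r"""<configuration>
  <SharePoint>
    <BlobCache location="C:\BlobCache\14" path="\.(gif|jpg|png|css|js)$" maxSize="10" enabled="false" />
  </SharePoint>
</configuration>"""


def enable_blob_cache(web_config_xml, max_size_gb=None):
    """Return the XML text with BlobCache enabled and, optionally, a new maxSize."""
    root = ET.fromstring(web_config_xml)
    node = root.find("SharePoint/BlobCache")
    if node is None:
        raise ValueError("no <BlobCache> element found under <SharePoint>")
    node.set("enabled", "true")
    if max_size_gb is not None:
        node.set("maxSize", str(max_size_gb))  # assumed here to be expressed in GB
    return ET.tostring(root, encoding="unicode")


print(enable_blob_cache(SAMPLE_WEB_CONFIG, max_size_gb=20))
```

In practice most people make this edit by hand on each WFE, after backing up the file.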

In summary use BLOB Cache:

  • If there is a high read to write ratio, BLOB caching should be used. For instance, you would want to cache a site logo that is used on every page request, but not a collaboration Word document that is actively updated.
  • It is optimized for supporting large files which can significantly reduce bandwidth between the WFEs and SQL Server.
  • It is optimized to support cache control headers so that clients can cache small files which can reduce overall number of hits to the WFEs.
  • If there is anonymous access, there can be dramatic improvements because permissions do not have to be validated for cached files.
  • Client applications that use range requests can optimize load times to access large files.

Output Cache

The second caching option you have with SharePoint 2010 is the ASP.NET Output Cache. This is an in-memory cache that saves rendered ASPX pages. Using Output cache improves performance in two ways. First, it reduces the number of SQL calls. Second, it reduces the workload on the WFE because pages do not need to be re-rendered. Along those lines, if the pages are anonymous, then no SQL check needs to be done at all to present the cached pages. Microsoft testing concluded a ninefold improvement in throughput when compared to having to render the page every time it was requested.

The only catch with Output Cache is that it can only be used in conjunction with Publishing pages. It cannot be used with a collaboration site. Output cache is configured on a per site collection basis using cache profiles. A cache profile is the set of settings and parameters that control how pages will be cached. One example of a rule that can be captured in a cache profile is to not cache when the requestor is a user who can edit pages, to ensure they see the latest version of the content. The cache profile also specifies rules for when a page is invalidated so that the next request is served from the database.

There are two options for cache invalidation:

  • Time to live (TTL) – A basic rule that retains the page until a length of time has been exceeded. Microsoft's testing found that TTL cache invalidation performed well when the site content changes frequently.
  • Check for Changes – A rule that states all pages using the profile are invalidated when there is any site change or when the TTL has passed. This is best used for sites where changes do not occur often.
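
The difference between the two options is easier to see in code. The Python sketch below is a simplified model, not how ASP.NET or SharePoint implement it: I am using a hypothetical integer site_change_token as a stand-in for "something on the site changed", and all the names are my own.

```python
from dataclasses import dataclass


@dataclass
class CachedPage:
    html: str
    cached_at: float        # time (in seconds) when the page was rendered and cached
    site_change_token: int  # hypothetical counter value for the site at caching time


def is_still_valid(entry, now, ttl_seconds, current_change_token, check_for_changes):
    """Decide whether a cached page may be served under the given profile rules."""
    if now - entry.cached_at >= ttl_seconds:
        return False  # TTL expired: applies to both options
    if check_for_changes and current_change_token != entry.site_change_token:
        return False  # any site change invalidates the page under this option
    return True


page = CachedPage(html="<html>...</html>", cached_at=0.0, site_change_token=7)

# 60 seconds later the site has changed once (token 7 -> 8), TTL is 180 seconds.
print(is_still_valid(page, now=60.0, ttl_seconds=180,
                     current_change_token=8, check_for_changes=False))
print(is_still_valid(page, now=60.0, ttl_seconds=180,
                     current_change_token=8, check_for_changes=True))
```

With TTL only, the page keeps being served until the timer runs out; with the change check on, a busy site would throw the page away almost as soon as it is cached.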

One of the main considerations for Output cache is the memory needed to support it. Each rendered page needs (2 × size of the page) + 32KB of memory. Depending on your cache profile, you may store multiple versions of the same page in the cache. You may create different cached versions based on the user's role, the type of browser they use, or the page layout type. For each version, a separate cache entry is made for the same page. It is also possible to create a rule so that specific types of publishing pages become invalid every 10 minutes while other types are invalidated every 24 hours.
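
Here is that memory formula as a small Python calculation. The formula itself comes from the paragraph above; the function name and the example numbers (a 100KB page with three cached versions) are just assumptions for illustration.

```python
KB = 1024
PER_ENTRY_OVERHEAD_BYTES = 32 * KB  # fixed cost per cached entry, from the formula above


def output_cache_bytes(page_size_bytes, versions=1):
    """Memory for one page: each cached version costs 2 x page size + 32KB."""
    return versions * (2 * page_size_bytes + PER_ENTRY_OVERHEAD_BYTES)


# A 100KB page cached once, then cached in three variations
# (for example by browser type or user role).
print(output_cache_bytes(100 * KB) / KB, "KB")
print(output_cache_bytes(100 * KB, versions=3) / KB, "KB")
```

Multiply the result by the number of pages you expect to stay in cache to get a first estimate for the WFE.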

To configure Output Cache:

  • To configure and set up Output cache and profiles read here.
  • Next you need to go to the site collection where publishing has been turned on and, in the site collection settings page, turn on the output cache.
  • On the page you can set up the Output cache profiles – read the whitepaper for details on those configurations.

In summary, Output Cache should not be used with sites that have a low read to write ratio, because frequent changes to content make it hard to keep the cache fresh. So it is important to understand how critical it is for users to have the most current content. Another consideration is how dynamic the content is and whether per-user content has to be supported. Supporting a per-user cache requires more space to store the cache. Likewise, supporting lots of cache variations requires more memory.

Object Cache

Object cache is the third caching option we have for SharePoint 2010. Object cache stores metadata about SharePoint Server objects (like SPWeb, SPSite, SPList, etc.) on the WFEs. When a page is rendered, if there is data that needs to be retrieved through these objects, SQL Server will not be hit. Features of SharePoint that use Object cache are publishing, the content query web part, navigation, the search query box and metadata navigation. These features are specifically written to use the Object cache API instead of the SharePoint API directly. Developers writing custom functionality can also tap into the Object cache API.

The Object cache algorithm for determining what to cache is complicated because user permissions have to be accounted for. Obviously we want to make sure security trimming is respected, but it would be completely inefficient to create an Object cache for each and every user who comes to the web site. At a high level, there are accounts (Portal Super Accounts) which can be created and assigned standard permission levels. The cache is built based on these accounts, and then the ACL of the current user is applied so that the user only sees the data from the Object cache they have permission to see. Along with this, there is the Object cache multiplier, which is set on a site collection. This multiplier controls the number of rows that will be cached. Increasing the multiplier increases the number of items returned from cache at the expense of using more RAM to store more data.
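
The idea of building the cache once and then trimming it per user can be modelled in a few lines of Python. This is a conceptual sketch only: the data shapes, the group names and the function trim_for_user are all hypothetical, and the real Object cache logic is considerably more involved.

```python
def trim_for_user(cached_items, user_principals):
    """Return only the cached items whose ACL shares a principal with the user."""
    return [item for item in cached_items if item["acl"] & user_principals]


# Items as they might look after being cached once under a super account
# that can read everything. Titles and groups are made up.
cached = [
    {"title": "Public news", "acl": {"Visitors", "Members"}},
    {"title": "Draft budget", "acl": {"Finance"}},
]

# A visitor gets only what their groups allow, without another database query.
print([item["title"] for item in trim_for_user(cached, {"Visitors"})])
```

One shared cache serves everyone, and the per-user cost is reduced to a filter over data already in memory.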

Object cache cannot be disabled. Object cache configuration is very similar to Output cache. Object cache can check the site for changes every time there is a request, or check for updates periodically. The main difference is that Object cache does not become completely invalidated when a change is made to the site. For instance, if a list item were to change, only that list item would be invalidated and all of the other list items would remain in Object cache. Also, dependencies between cached items are maintained, so if the list itself were deleted, all the cached list items for that list would become invalidated as well.
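
To picture the item-level invalidation and the dependency tracking, here is a toy Python cache. It only models the behaviour described above; the class TinyObjectCache and its keys are my own invention and have nothing to do with the real Object cache API.

```python
class TinyObjectCache:
    """Toy cache where invalidating a parent also invalidates its dependents."""

    def __init__(self):
        self._items = {}     # key -> cached value
        self._children = {}  # parent key -> set of dependent keys

    def put(self, key, value, parent=None):
        self._items[key] = value
        if parent is not None:
            self._children.setdefault(parent, set()).add(key)

    def get(self, key):
        return self._items.get(key)

    def invalidate(self, key):
        # Drop the entry itself, then everything that depends on it.
        self._items.pop(key, None)
        for child in self._children.pop(key, set()):
            self.invalidate(child)


cache = TinyObjectCache()
cache.put("list:Tasks", {"title": "Tasks"})
cache.put("item:1", {"title": "Write notes"}, parent="list:Tasks")
cache.put("item:2", {"title": "Publish post"}, parent="list:Tasks")

cache.invalidate("item:1")       # one item changed: only that item is dropped
print(cache.get("item:2"))

cache.invalidate("list:Tasks")   # the list was deleted: remaining items go too
print(cache.get("item:2"))
```

Compare this with the Output cache's Check for Changes option, where any site change invalidates every page using the profile.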

The first step in configuring Object Cache is the setup of the Portal Super Accounts, which control how the cache is built. Please read the whitepaper for more information about how to set up these accounts. Some settings can be easily accessed in the site collection administration page. There are also some settings that can be applied in the web.config which are new to SharePoint 2010.

Finally, you need to plan the sizing of the Object Cache. According to Microsoft, with a small number of site collections (fewer than 50) the Object cache should have little to no memory issues, but beyond that you may need to do some planning. Microsoft's recommendation is to plan for 500KB of RAM per site collection that has Object caching turned on.
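
That recommendation turns into a one-line estimate. The Python sketch below assumes 1KB = 1024 bytes and uses a made-up farm of 200 site collections; the function name is my own.

```python
KB = 1024
OBJECT_CACHE_BYTES_PER_SITE_COLLECTION = 500 * KB  # planning figure quoted above


def object_cache_ram_bytes(site_collection_count):
    """Planning estimate of Object cache RAM for a number of site collections."""
    return site_collection_count * OBJECT_CACHE_BYTES_PER_SITE_COLLECTION


# A hypothetical farm with 200 site collections using Object caching.
print(object_cache_ram_bytes(200) / (KB * KB), "MB")
```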
