Wednesday, December 29, 2010

SharePoint 2010 Search Configuring for Scale

This is the third part in my series on SharePoint 2010 Search Architecture.

SharePoint 2010 Search Scaling Configuration

Now if you have been reading this far, you may be thinking this is really hard to configure. Well, it is actually very simple. I am going to skip the step of setting up the SharePoint Search Service Application instance and instead give you a quick introduction to the screens you need to know about to scale SharePoint 2010 Search.

Note that the screenshots are just off my single-server development box, so I cannot show you the nicer screenshots with different server names.

Search Application Topology Screen

This is a screenshot of the Search Application Topology in Central Admin. Here you can see:

  • The Admin Component and what machine it is configured to run on.
  • Crawl Components – in this case I only have one, along with the machine it is configured to run on. Notice the name of the crawl database is shown so you can quickly see which crawl component maps to which crawl database.
  • Databases – lists all of the search databases associated with this Search Service Application instance.
  • Index Partition – shows the single index partition that has been created and the query component for that partition.
  • You will notice there is a Modify button. Pressing that button will allow us to make changes.

clip_image001
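If you prefer scripts to screens, the same topology can be inspected from the SharePoint 2010 Management Shell. This is a minimal sketch, assuming a Search Service Application named "Search Service Application" (substitute the name of your own instance):

```powershell
# Load the snap-in when running from a plain PowerShell console
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$ssa = Get-SPEnterpriseSearchServiceApplication "Search Service Application"

# Crawl side: crawl topologies (with their crawl components) and crawl databases
$ssa | Get-SPEnterpriseSearchCrawlTopology
$ssa | Get-SPEnterpriseSearchCrawlDatabase

# Query side: query topologies (index partitions / query components) and property databases
$ssa | Get-SPEnterpriseSearchQueryTopology
$ssa | Get-SPEnterpriseSearchPropertyDatabase
```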

Search Modification Screen

This screen shows the same information as above, but there are a few differences:

clip_image002

  • There is a New button in the top left-hand corner. This will allow us to create new crawl databases, crawl components, query components/index partitions, index partition mirrors and property databases.
  • Note there is an Apply Topology Changes button in the bottom right which allows us to make all of our changes at one time.

clip_image003

New Crawl Component

On this screen you basically just select the server in the farm where you want to install the crawl component. Then you select the crawl database for the crawl component; that is it.

Remember you can have multiple crawler components installed on the same machine. It would not make much sense, however, to have two crawler components on the same machine that are using the same crawl database.

clip_image004

Reference - http://technet.microsoft.com/en-us/library/ee805950.aspx
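The same operation can be scripted. Here is a hedged sketch from memory of the 2010 cmdlet reference ("CrawlServer2" is a placeholder server name); in SharePoint 2010 you stage changes in a cloned crawl topology and then activate it:

```powershell
$ssa = Get-SPEnterpriseSearchServiceApplication "Search Service Application"

# Clone the active crawl topology so changes are staged before activation
$active = $ssa | Get-SPEnterpriseSearchCrawlTopology | Where-Object { $_.State -eq "Active" }
$new    = New-SPEnterpriseSearchCrawlTopology -SearchApplication $ssa -Clone -CrawlTopology $active

# The search service instance on the target server, and the crawl database to map to
$instance = Get-SPEnterpriseSearchServiceInstance | Where-Object { $_.Server.Name -eq "CrawlServer2" }
$crawlDb  = $ssa | Get-SPEnterpriseSearchCrawlDatabase | Select-Object -First 1

New-SPEnterpriseSearchCrawlComponent -SearchApplication $ssa -CrawlTopology $new `
    -CrawlDatabase $crawlDb -SearchServiceInstance $instance

# The scripted equivalent of the "Apply Topology Changes" button
$new | Set-SPEnterpriseSearchCrawlTopology -Active
```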

New Query Component

This is where you create new query components. Remember, when you create a new query component, a new index partition is created. The SharePoint Search service has the responsibility of ensuring that each index partition has an even distribution of items.

Here you select which server it will be installed on, and you select the property database that will be used. The location of the index file is important; it needs to be in a location that is big enough to handle the index.

clip_image005

Reference - http://technet.microsoft.com/en-us/library/ee805955.aspx
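Scripted, this looks roughly like the sketch below. The server name and index path are placeholders, and the cmdlet parameters are from memory of the 2010 reference, so treat this as a sketch rather than a recipe; the partition count and the property database mapping mirror the choices the UI makes for you:

```powershell
$ssa = Get-SPEnterpriseSearchServiceApplication "Search Service Application"

# A new, inactive query topology with two index partitions
$new = New-SPEnterpriseSearchQueryTopology -SearchApplication $ssa -Partitions 2

$instance = Get-SPEnterpriseSearchServiceInstance | Where-Object { $_.Server.Name -eq "QueryServer1" }
$propDb   = $ssa | Get-SPEnterpriseSearchPropertyDatabase | Select-Object -First 1

foreach ($partition in (Get-SPEnterpriseSearchIndexPartition -QueryTopology $new)) {
    # One query component per partition; the index location must be big enough for the index
    New-SPEnterpriseSearchQueryComponent -QueryTopology $new -IndexPartition $partition `
        -SearchServiceInstance $instance -IndexLocation "E:\SearchIndex"

    # Map the partition to the property database used at query time
    $partition | Set-SPEnterpriseSearchIndexPartition -PropertyDatabase $propDb
}

$new | Set-SPEnterpriseSearchQueryTopology -Active
```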

New Query Component / Index Partition Mirror

To create a mirror, go back to the Modify Topology screen; if you hover your mouse over the Query Component link, a dropdown will come up with an "Add Mirror" option.

clip_image002[1]

This will allow you to pick the server where you would like to create the mirror. Remember, there is no real point in creating a mirror for an index partition on the same server where the partition resides. The same sort of screen will come up as when adding a new query component. The checkbox at the bottom, "Set this query component as failover-only", is important. If this is checked, the mirror will be used only for redundancy. If it is not checked, the mirror will also be used as part of the query process.

Reference - http://technet.microsoft.com/en-us/library/ee805953.aspx
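And the scripted version of adding a failover-only mirror, again as a hedged sketch that stages the change in a cloned topology ("QueryServer2" is a placeholder):

```powershell
$ssa = Get-SPEnterpriseSearchServiceApplication "Search Service Application"

$active = $ssa | Get-SPEnterpriseSearchQueryTopology | Where-Object { $_.State -eq "Active" }
$new    = New-SPEnterpriseSearchQueryTopology -SearchApplication $ssa -Clone -QueryTopology $active

$partition = Get-SPEnterpriseSearchIndexPartition -QueryTopology $new | Select-Object -First 1
$instance  = Get-SPEnterpriseSearchServiceInstance | Where-Object { $_.Server.Name -eq "QueryServer2" }

# -FailoverOnly corresponds to the "Set this query component as failover-only"
# checkbox; omit it if the mirror should also serve queries.
New-SPEnterpriseSearchQueryComponent -QueryTopology $new -IndexPartition $partition `
    -SearchServiceInstance $instance -FailoverOnly

$new | Set-SPEnterpriseSearchQueryTopology -Active
```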

New Crawl Database

This is the screen where you create a new crawl database. Here you simply fill out the basics; not much to it.

One example of why you may need a second crawl database is to dedicate crawl resources to specific content sources. If you want to do that, you have to make sure to check the checkbox below. If you do not, the new crawl database will simply be used alongside the other existing ones you have. After you check the box below, you will need to apply the changes and then perform some additional steps, like creating a new content source and host distribution rules, which map content sources to crawl databases.

clip_image006

Reference – Crawl Database - http://technet.microsoft.com/en-us/library/ee805952.aspx

Reference – Host Distribution Rule - http://technet.microsoft.com/en-us/library/ee805949.aspx
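For reference, creating the crawl database itself is nearly a one-liner in PowerShell. A hedged sketch with placeholder database and server names (the dedicated checkbox and host distribution rules are then handled as described above, and a crawl component is mapped to the new database as shown earlier):

```powershell
$ssa = Get-SPEnterpriseSearchServiceApplication "Search Service Application"

# A new crawl database on a specific SQL Server instance
New-SPEnterpriseSearchCrawlDatabase -SearchApplication $ssa `
    -DatabaseName "SSA_CrawlStore_FileShares" -DatabaseServer "SQL01"
```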

New Property Database

Finally, there is an option to create new property databases; nothing special about this. If you were to do this, you may have to go back and reconfigure your existing query components to use the new property database. That will in turn re-create your index partitions underneath the hood and will also require you to create new index partition mirrors. All in all, not a big deal, just a few extra steps.

clip_image007

Reference - http://technet.microsoft.com/en-us/library/ee805954.aspx
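Here is a hedged PowerShell sketch of those steps (placeholder names throughout); note that re-mapping an index partition to the new property database is what triggers the re-provisioning described above:

```powershell
$ssa = Get-SPEnterpriseSearchServiceApplication "Search Service Application"

$propDb = New-SPEnterpriseSearchPropertyDatabase -SearchApplication $ssa `
    -DatabaseName "SSA_PropertyStore2" -DatabaseServer "SQL02"

# Point an index partition at the new property database (re-provisions the partition)
$qt = $ssa | Get-SPEnterpriseSearchQueryTopology | Where-Object { $_.State -eq "Active" }
Get-SPEnterpriseSearchIndexPartition -QueryTopology $qt | Select-Object -First 1 |
    Set-SPEnterpriseSearchIndexPartition -PropertyDatabase $propDb
```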


Scaling SharePoint 2010 Search

This is the second part of my series on the architecture of SharePoint 2010 Search.

SharePoint 2010 Search Component Scaling

As I mentioned, the SSP was one of the biggest constraints on SharePoint 2007 Search's ability to scale. Now that it is gone, we have a ton more flexibility.

Let’s talk about the components I introduced earlier and how each is commonly scaled.

Crawler – There can be multiple instances of the crawler component within a single Search Service Application. This allows you to more efficiently crawl content based on the scenario you need to support.

One example would be adding a second crawl component (on a different server) to improve crawl performance. As I mentioned, the crawl component is stateless; data about what has or has not been crawled is stored in the crawl database. So you can create multiple crawl components that are configured to use the same crawl database, which will help with the performance of building the index (i.e. multiple crawlers running in parallel).

Another example would be dedicating higher-end machines with crawl components to particular content sources. For instance, let's say there is content to be indexed in SharePoint and in file shares, and there is significantly more data on the file shares. You have the ability to create new crawler components that have a dedicated crawl database. Using host distribution rules, you can configure a new crawl component and crawl database to only crawl content in the file shares while others crawl SharePoint content.

Some other guidelines you should be aware of are:

  • There should be at least 4 cores dedicated for each crawl component running on a server.
  • It is not recommended to have more than 16 crawler components in a single Search Service Application.

Crawl Database – Multiple instances of the crawl database can be created to support different scaling scenarios, as just mentioned. It is not recommended to have more than 25 million items in a single crawl database or more than 10 crawl databases per Search Service Application.

Query Component – In SharePoint 2007 we had the ability to run multiple instances of the query component on each load-balanced WFE, so we had some ability to scale. We now have more granular control over how we scale out multiple query components. We still have the ability to create multiple instances of a query component for redundancy purposes.

Index Partition – As I mentioned the Index Partition was added to allow for queries to perform more efficiently. Each query component has an associated partition. Whenever a new partition is created, a new query component must be created as there is a one-to-one relationship between the query component and the index partition. There is an emphasis to ensure that each index partition is evenly balanced with documents (a hash algorithm is used based on the document id).

For example, let’s say you have 2 million documents in your single index and it takes 1 second to return results. If the index were split into two partitions with two query components, the time to complete the search would be roughly cut in half, because there are now two query components each searching only 1 million documents.
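To make the even-distribution idea concrete, here is a purely conceptual PowerShell sketch; the modulo arithmetic is my stand-in illustration, not SharePoint's actual internal hash:

```powershell
# Conceptual only: spread document ids across partitions the way a hash would
$partitionCount = 2
1..10 | ForEach-Object {
    $partition = $_ % $partitionCount    # stand-in for hash(docId) mod partitionCount
    "Document $_ -> index partition $partition"
}
```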

A couple of notes:

  • It is not recommended to exceed 20 index partitions within a single Search Service Application instance, even though the hard boundary is 128 index partitions.
  • There is a hard limit of 10 million items in any one index partition with SharePoint 2010 Search, and it is not recommended to exceed 10 partitions (100 million items). If you are exceeding that many items, you will probably want to look into separating search into multiple Search Service Application instances or into using FAST.

Index Partition Mirror – Earlier I introduced the concept of index mirrors as a way to provide redundancy and better performance. The mirror is just what it sounds like; it is an identical copy of an index partition. When mirrors are used, you now have a one-to-many relationship between the index partition and its query components, but there will only be one primary. The mirror can be used as a way to provide redundancy, so if the machine hosting the primary were to fail, the mirror becomes the primary. The mirror can also be configured so it can be searched against as well, to help support scenarios where there is a heavy query load.

Property Database – There can be more than one property database created in the farm, again to support scaling and query performance. It is recommended that once 25 million items have been indexed, a new property database should be introduced into the solution architecture. This is because with that many items, a significant amount of metadata will have been created, and the property database is responsible for managing that metadata for search queries. This is not a hard limit; just a recommendation.

It is also recommended to put the property and crawl databases on separate storage to remove the I/O contention that can occur when crawls and queries execute at the same time. This is because the property database is highly utilized as part of the querying process. If full crawls are done during off hours and incremental crawls are not very intensive, this should not be an issue.

Search Admin Component – There is no capability to scale this; it is not needed either.

Search Admin Database – Since I mentioned this earlier, I will mention it here to be consistent. There is no capability to scale the search admin database, since there can only be one per Search Service Application instance. You can create redundancy at the SQL Server level.

SharePoint 2010 Search Scaling Scenarios

Up to this point you have been consuming a lot of information about what the new SharePoint 2010 Search components are and how they can scale. It is a lot to take in. What I personally like to do is start with the most basic scenario and then scale out from there.

I am going to focus each scenario based on the number of items to be searched.

To save myself time, I am going to re-use several of the diagrams provided to us here - http://technet.microsoft.com/en-us/library/cc263199.aspx. Specifically, I am using the "Search Architectures for Microsoft SharePoint Server 2010" and "Design Search Architectures for Microsoft SharePoint Server 2010" diagrams. I highly recommend reading both of these in detail as reinforcement once you have finished reading my blog.

Scenario 1 – 0 to 1 Million Items

This is probably the most basic scenario; according to Microsoft, either of the farms below can support this many items.

clip_image001

HOWEVER, I would never recommend these two environments for production where Service Level Agreements (SLAs) might be in place, because they are not redundant. Out of the gate, I would expect to minimally have something like the configuration below. It is a best practice from a SharePoint architecture perspective to have your Web Front Ends (WFEs) and application servers on different machines. As well, you typically dedicate the search service to its own application server so as not to create contention with other services that may be running in the farm.

clip_image002

I think the point, at the end of the day, is that a single application server running all of the search components described earlier can support up to 1 million items. So if you are running a small SharePoint production site where SLAs are not stringent, you should be fine using search on one application server for up to 1 million items.

Scenario 2 – 1 to 10 Million Items

In this scenario we now have a few million items and are at the point where scale must be introduced. This is referred to as a Small Search Farm.

clip_image003

Observations:

  • There is a single crawl server with a single crawl component that builds a single index of all content.
  • There is a single database server used where the crawl, property and admin databases are hosted.
  • There are two query components, and they have been configured to run on the web front ends (WFEs). This means each WFE needs enough disk space to store the index.
  • There is only one index partition created by the crawler where all of the indexed items can be found.
  • There is one primary query component and one mirror query component. All queries will go against the primary query component and will fail over to the mirror.
  • One question you may have is whether having only one active query component will become a performance bottleneck. The answer is it could, if you have a website with a lot of users performing concurrent queries. Remember, you can configure the query component mirror to accept queries.

Scenario 3 – 10 to 20 Million Items

Next we have a scenario where we have roughly 10 to 20 million items to be indexed and we want to start scaling the architecture. This is the one many organizations will start with, because it is scaled for redundancy and performance. So even though you may not have that much content right off the bat, you will be able to grow into it.

clip_image004

Observations:

  • There are now two crawler components that reside on different machines. The two crawler components work in parallel with a single crawl database to build up the index. This also adds a level of redundancy if one of the crawl servers were to go down.
  • There are now two partitions of the index, mostly based on the 10 million item limit for each index partition. The need for two index partitions requires that two query components are created. The primary for each query component is installed on a different query server, with the query mirror installed on the other machine. The net effect is that query time will be reduced, because each query component does not have to search the entire index; it only has to search half of it.
  • The Crawl and Property databases have been split apart onto different servers. This is because contention can be created between them when crawling and querying occurs concurrently. I really think this is probably one of the last things you need to do to improve performance if you have a highly available SQL Server production environment. This will only help when there are database load issues, there are more than 25 million items or there is a significant amount of metadata that has been built up.

Scenario 4 – 20 to 40 Million Items

In this next diagram the amount of content is dialed up to 20 to 40 million items.

clip_image005

Observations:

  • This is still actually referred to as a medium farm, but it is called a dedicated farm because the query components are no longer hosted on the web front ends. The query components are now hosted on individual machines.
  • Next you will notice there are now four index partitions for the corresponding four query components. As well, the mirror for each query partition is placed on a different query server.
  • Next notice there are now two crawl databases and four crawler components. Two crawler components are dedicated to each crawl database. This configuration supports the ability to have different crawler components crawl different content sources. For instance, one set of crawl components may crawl SharePoint while the other set crawls MySites, file shares and public Exchange folders. The point is that if you have scaled up to searching this amount of content, it is likely that you will be breaking apart how content is indexed for performance reasons.

Scenario 5 – 40 to 100 Million Items

Last is a fully scaled out SharePoint Search Service Application instance.

clip_image007

Observations:

  • This is a fully scaled out farm based on the recommended capacity for SharePoint 2010 search, being 100 million items. As you can see there are now ten index partitions.
  • There are several crawl components and crawl servers.
  • Crawl databases have been broken into multiple database servers.
  • Multiple property databases on different database servers.

Personally, I am not sure how often you will see something like this. Do I believe more than 100 million items will need to be indexed by SharePoint? Yes – I do believe that will happen in large environments. However, you may employ a strategy where you have:

  • Multiple Search Service Application instances in the farm, with less data being indexed by each.
  • A centrally hosted search farm, while smaller farms have their own search configuration.
  • FAST, when you need to provide a single search experience across more than 100 million items. I want to point out that even if you have fewer than 100 million items to be indexed, FAST should still be considered, because there are a ton of search and usability features available in FAST which are not available in the out-of-the-box SharePoint search.

Closing

To this point I have discussed only how to scale based on the number of items to be indexed. However, that alone is not really an accurate way to build a SharePoint 2010 Search architecture. In many cases business rules or other environmental factors can drive how search is architected. Here are some examples:

  • Connectivity to specific content locations is slower in some cases. What you may do is add dedicated crawler components and databases just for indexing this specific content.
  • Content that is indexed must be fresh and full indexes are needed on a regular basis. In this scenario you will again add more crawl components.
  • There is a significant number of users who will query. In this case, you may only have 5 million items, but you may have two or even three query components dedicated to their own machines.

Given these examples, you should take the information above and start scaling out based on the business requirements and SLAs you need to support.

My personal recommendation is to start with scenario 3 and make tweaks to the configuration based on what you need for a production environment that must support growth and redundancy. However, adding new components later is no big deal either.

Now that you have finished reading this, I highly recommend you read both of the technical diagrams referenced earlier: "Search Architectures for Microsoft SharePoint Server 2010" and "Design Search Architectures for Microsoft SharePoint Server 2010".

SharePoint 2010 Search Architecture Introduction

Introduction

Every time there is a major version release of SharePoint, there is a major improvement in the area of search. Users need the ability to search for content, and they demand the same sort of experience they have with their everyday search engine. Microsoft really stepped up to the plate this time with SharePoint 2010.

SharePoint 2007 introduced tons of features that were not available in SharePoint 2003: searches could be done across site collections, it actually returned correct results, and there were scopes, best bets, search analytics, search federation, the Business Data Catalog (searching external line-of-business systems), a search API, etc. Still, however, there were some challenges. Scale became an issue because SharePoint 2007 saw exponential growth due to its ease of use. The Shared Services Provider (SSP) and the way it is architected was a contributing factor; for instance, an SSP could only have one crawler, which provided no ability to control large amounts of content. As well, users were demanding the same sort of user experience they have with Bing, Google, etc.

For SharePoint 2010, there are a lot of improvements:

  • There is the new service architecture of SharePoint 2010 and the removal of the SSP. Read this blog. As you will see, this now enables SharePoint 2010 Search to scale, and that will be the focus of this blog.
  • The ability to index 100 million items.
  • Continued support for indexing file shares, external web sites, line of business systems, public exchange folders, etc.
  • Boolean search (and, or, not) are supported.
  • Range symbols such as =, <, >, <=, and >= can be used.
  • Wildcard searches are now supported out of the box.
  • Support for property based searches on the metadata (title:“XXX YYY”).
  • Improved relevancy models, like phrase matching and click-through counts.
  • Refiners, which provide the ability to filter down the search results using the returned metadata without having to re-run the actual search.
  • A "Did you mean" feature which provides suggestions – for example, when the user misspells a word.
  • Search suggestions, which provide auto-complete based on what the user commonly searches on.
  • Search Alerts and RSS Feeds
  • Improved query federation
  • More extensible search web parts
  • Several new administration features
  • Mobile search

Another major search improvement is Microsoft's acquisition of FAST Enterprise Search. Microsoft spent well over a billion dollars to acquire one of the most highly regarded enterprise search engines on the market and incorporate it into the SharePoint platform. The goal of this blog is not to do a feature and architecture comparison with FAST; I may do one in the near future. At a high level you should know that FAST has:

  • Limitless scalability. Can search and return results in sub-second times over petabytes of data.
  • High scale search refiners.
  • Extremely powerful and tunable search relevancy model.
  • User contextual search results and relevancy.
  • Entity extraction.
  • Ability to index almost any type of content imaginable.
  • Similar search result suggestions.
  • Thumbnails and document previewing in search results.
  • Visual best bets.

Still, getting your arms around the out-of-the-box search can be a daunting task, even if you are familiar with SharePoint 2007 search. I was able to pull together a bunch of information, and I am going to consolidate it down for you.

  1. I will capture the new components for SharePoint 2010 Search.
  2. I will then discuss how each component can be scaled.
  3. I will then discuss scenarios on how SharePoint 2010 Search is scaled.
  4. I will actually show you how simple it is to do the scaling of SharePoint 2010 Search.

SharePoint 2010 Search Components

Let’s first talk about all the new components and architecture you need to know about right off the bat.

Crawler – You will hear about this a lot; it is commonly referred to as the crawl component or indexer. It is responsible for building indexes. Unlike in the previous version of SharePoint, the crawl component is stateless, meaning the index that is created is not actually stored in the crawl component. The index is pushed to the appropriate query server.

Crawl Database – As you just learned, the crawling component itself is stateless. State is actually managed in the crawl database which will track what needs to be crawled and what has been crawled.

Query Component – This is the component that performs a search against an index created by the crawl component. The query component applies such things as security trimming, best bets, relevancy, duplicate removal, etc. It is also commonly referred to as the query server.

Index Partition – This is a new feature of SharePoint 2010 and is directly correlated to the query component. We now have the ability to break the index into multiple partitions to improve the amount of time it takes the query component to perform a search. For every query component there will be a single index partition that is queried by that component. Another way of putting it is, every time a query component is created, another index partition is created.

Index Partition Mirror – There is a new capability to create mirrors of the index partitions. These mirrors provide redundancy and better search performance.

Property Database – Stores metadata and security information for items in the index. The property database will be associated with one or more query components and is used as part of the query process. These properties are populated as part of the crawling process which creates the index.

Search Admin Component – The admin component that manages the configuration of the Search Service Application instance.

Search Admin Database – It is worth noting there is a search administration database, and it is mostly responsible for managing information associated with the configuration and topology of the SharePoint Search service. There will only ever be one instance of this database for each Search Service Application instance.

Now that I have introduced you to the major components of SharePoint 2010 Search, I will dive into details about the components and how they can be used together to create search solutions.

Sunday, December 26, 2010

SharePoint 2010 High Availability with SQL Server

Introduction

For almost my entire career I have been an application developer. I got into SharePoint roughly six years ago and continued that philosophy; however, a few years ago I found out I could no longer get away with it. I had to start learning the architecture of SharePoint and how to design and configure it to support the major "ilities". The one on everyone's mind was how to scale SharePoint. I like to think I became very knowledgeable on this topic for SharePoint 2007 and now SharePoint 2010. But again, I was lacking in an area – SQL Server.

I actually like to think I am very knowledgeable about SQL Server from an application development perspective. I am strong at normalizing databases, views, creating indexes, stored procedures, optimistic/pessimistic locking, partitioning data, creating data access layers (DALs), object-relational mapping (ORM), etc., but when it comes to the administration of SQL Server I needed to brush up. As you can see, my focus has been all about getting data in and out.

Understanding SQL Server architectures for high availability is critical to the overall SharePoint architecture because of the strong relationship between the two. I have usually been able to get away with a client already having a highly available SQL Server environment ready for me, but now I am finding out that is not always the case. The purpose of this blog posting is to provide information on:

  • What is SQL Server High Availability?
  • How does SQL Server High Availability relate to SharePoint?

SQL Server High Availability

I am not going to give the definition of all the SQL Server high availability solutions. This section will mostly be comprised of highly recommended readings that will make you smart enough to have an educated discussion on SharePoint and SQL Server.

When you start reading about SQL Server High Availability, you will find the following solutions. I am going to give my very quick definitions of what they are:

  • Clustering – An approach where multiple SQL Server resources (that share the same disks) are presented as a single machine to calling applications. Applications, like SharePoint, do not know how many machines there are; all they know is that there is a SQL Server. If there is a failure, the clustering services will bring the other SQL machine online without affecting the application. Clustering does not provide protection against disk failure, as it is a software-level solution.
  • Mirroring – This is an approach where a primary and a mirror database server are set up. There are two modes you have to understand. There is high-safety mode (synchronous), which ensures that any transaction executed against the primary is completed on the mirror. One drawback of running in high-safety mode is that it takes more time. The second is high-performance mode (asynchronous), which commits transactions on both the primary and mirror databases; however, the primary will not wait on the mirror to complete the transaction. This provides better performance for highly transactional systems, but there is the potential for data loss. So at best this solution provides a "warm" standby, with potential data loss if asynchronous mode is used. There is a configuration of mirroring called high-safety with automatic failover, where a third server (called the witness) evaluates communication between the two machines and can make the mirror server the primary. This effectively makes the mirror server a "hot" standby, but the calling applications must be notified, and there are solutions for this. Another limitation of database mirroring is that it can only be configured between two machines.
  • Log Shipping – Discussions usually start off by saying this solution operates at the database level. It allows you to create one-to-many "warm" standbys, where there is a primary database and the secondary databases monitor the logs of the primary and bring themselves up to date. As you can see, there is a delay before a secondary is completely up to date with the primary (which is not a bad thing in some scenarios). For a secondary database to become the primary, all the logs must be applied. This can be used as a solution to supplement mirroring.
  • Replication – I jokingly call this the tried and true methodology; it uses a publish-subscribe model where the primary distributes data to one or more secondary servers. We are not going to dive into this for SharePoint.

Now you may say that is not much – but at a high level, that is what you need to know. I am not a SQL Server administrator, nor do I want to be. Here are readings I believe you must read so that you can become knowledgeable on this topic. The great thing about these readings is that they save you the cost of going out and buying a book; it is all here. All of these articles have links to more articles to go deeper wherever you need to.

  1. High Availability Solutions Overview - http://msdn.microsoft.com/en-us/library/ms190202.aspx - I highly recommend starting here. My overview aligns to this.
  2. Selecting a High Availability Solution - http://msdn.microsoft.com/en-us/library/bb510414.aspx - this is a great read that summarizes the pros and cons of the approaches above. Highly recommend reading this.
  3. High Availability: Interoperability and Coexistence - http://msdn.microsoft.com/en-us/library/bb500117.aspx - This is a very solid series about how the high availability solutions I discussed above can be used together. Specifically, there are two articles in this section that show how mirroring and clustering are implemented together, and another discussion on how mirroring and log shipping work together. These are enlightening to read.
  4. High Availability with SQL Server 2008 - http://msdn.microsoft.com/en-us/library/ee523927(SQL.100).aspx - This is a great article that says "wait" – understand your requirements before you start talking about high availability solutions. Plus it goes into backup and recovery, which is another major piece of the puzzle. I would read this after my other recommendations, as it will bring the entire discussion into perspective.
  5. Achieve High Availability for SQL Server - http://technet.microsoft.com/en-us/magazine/2007.03.highavailability.aspx - This is an optional recommendation on the basics of availability. It is about SQL Server 2005, but it is well written and supplements the first two readings I referenced.

At a minimum you have to become knowledgeable with the first two readings. If you have the time, the third reading is worth your time.

SharePoint and SQL Server for High Availability

So now you may be wondering how this translates into SharePoint 2010 and how to create your architecture. Hopefully I will be able to provide you some references on how you should configure SharePoint 2010 for high availability based on your requirements.

SharePoint 2010 Databases

First it is important to understand all of the databases of SharePoint 2010. I wrote a blog here based on this technical diagram - http://go.microsoft.com/fwlink/?LinkId=187969. I highly recommend reading these and understanding the nature of each SharePoint database. You will need to understand this in the context of the SharePoint configuration you are using.

SharePoint 2010 Capacity Testing Results

Yes, Microsoft has done a significant amount of testing to understand the boundaries of SharePoint 2010 and has made the results available. I have written a summary about these test results on my blog here. At the top of that post I provide references to detailed whitepapers published by Microsoft. These are very in-depth and I would not recommend reading them outright. However, there is a ton of good information there which will point you in the right direction for how you may want to design for high availability.

SQL Server 2008 R2 and SharePoint 2010 Products: Better Together

This is a whitepaper (http://technet.microsoft.com/en-us/library/cc990273.aspx) that has been written on the topic of how SharePoint 2010 and SQL Server 2008 together make a great solution. However, you will find it to be no more than an overview, and it will not give you information on how to configure a highly available environment. I would recommend it as supplemental only, or provide it to folks who have a very limited understanding of SQL Server, for whom it could be useful.

Storage and SQL Server capacity planning and configuration (SharePoint Server 2010)

This whitepaper (http://technet.microsoft.com/en-us/library/cc298801.aspx) is on the completely other side of the spectrum from the one just referenced. I would immediately give this article to any high-end SQL administrator who is evaluating or designing a solution to support SharePoint 2010. It is full of best practices and official recommendations. There are gems of information in here like:

  • Making sure I/O operations per second (IOPS) are the fastest they can be.
  • Equations for estimating database sizing for all the types of databases.
  • Direct Attached Storage (DAS), Storage Area Network (SAN), and Network Attached Storage (NAS).
  • Disk types
  • Memory
  • Server architecture. Interesting facts I read were that an additional database server is recommended when there are more than four web servers running at full capacity, and that an additional SQL Server should be added when content databases exceed 5 terabytes.
  • SQL Server configuration options
  • Disk prioritization
  • Performance Counters to watch.

Plan for availability (SharePoint Server 2010)

Now this is the whitepaper (http://technet.microsoft.com/en-us/library/cc748824.aspx) that really brings the whole thing together, and I highly recommend it. I purposely brought this up LAST because if you have read this far you should be rewarded :)

When you start reading this you will see this has little to do with the SharePoint 2010 Service architecture and more to do with SQL Server high availability solutions. Specifically it recommends:

  • Failover clustering
  • Mirroring – “For mirroring within a SharePoint Server farm, you must use high-availability mirroring, also known as high-safety mode with automatic failover”. This means synchronous.


A few other interesting notes are:

  • SharePoint 2010 is now mirroring aware, meaning that SharePoint 2010 can be configured to know which SQL Server is the primary and which is the mirror (a sketch of this follows the list below). This reduces the need for clustering.
  • There is a good chart that compares clustering against mirroring.
  • There is no mention of log shipping; however, you may very well want to implement it. I can see scenarios where you can do some creative things, like having a log shipping configuration refresh a testing environment with production data on an interval, etc.
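As a sketch of the mirroring-aware configuration (the database name and mirror instance are placeholders), each SharePoint database can be told its failover SQL Server instance from PowerShell:

```powershell
# Register the mirror so SharePoint 2010 can fail its connection over automatically
$db = Get-SPDatabase | Where-Object { $_.Name -eq "WSS_Content" }   # placeholder database name
$db.AddFailoverServiceInstance("SQLMIRROR")                          # placeholder mirror instance
$db.Update()
```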

Please note that this is not a requirement for running SharePoint 2010. These are best-practice solutions for creating highly available SharePoint environments to meet Service Level Agreements (SLAs). So if you have a 99%-type SLA, you will have to consider these sorts of architectures.

Plan for disaster recovery (SharePoint Server 2010)

Disaster recovery is another piece of the puzzle when designing a highly available SharePoint 2010 architecture. Please read this whitepaper (http://technet.microsoft.com/en-us/library/ff628971.aspx). On the topic of SQL Server, there is specific mention of using either log shipping or asynchronous mirroring to create a "hot" disaster recovery environment. There are specific discussions of which databases need to be part of the configuration. This is a good place to start understanding the topic.


Conclusions

There are a couple conclusions I came to while performing this research:

  • The solutions for scaling SQL Server itself are not really all that different in concept from the solutions we have at hand for SharePoint 2010 services.
  • Much of what I talked about in this blog applies to both SharePoint 2007 and 2010.
  • I have always understood there to be a direct correlation between SharePoint and SQL Server when it comes to performance. I have seen SharePoint flat out perform better when SQL Server is architected well.
  • High availability does not equate to better performance in all circumstances. Typically, SQL machines that are in a highly available environment will be more "beefy". The point is, if you expect good performance and you put SharePoint on an already highly utilized SQL Server, there will be performance issues. In those situations you may want to have dedicated SQL Servers for SharePoint. I wish I could give a one-size-fits-all recommendation, but it really comes down to how your environment is configured and how SharePoint is ultimately used.
  • Backup and recovery are not addressed as part of this discussion. That is a completely separate topic but is absolutely critical to the overall high availability architecture.

Monday, December 20, 2010

SharePoint 2010 Development Patterns and Practices

Introduction

I would highly recommend reading the Developing Applications for SharePoint 2010 best practices that have been written by the Microsoft Patterns and Practices team. In this blog posting I am going to capture my notes as I read this.

Part 1 - Application Foundations for SharePoint 2010

This first section covers several basic concepts. It is basically a discussion of three reusable components that have been created as part of this effort, and I would seriously recommend bringing them into your custom solutions.

  • First, they created a Service Locator pattern, which will help you build more layered architectures instead of embedding all of your logic into a web part.
  • Second, they created an Application Setting Manager, which follows several best practices for using configuration data in your custom solutions.
  • Third, they provided a SharePoint Logger, which gives you great utility for better application logging within SharePoint.

There is also a detailed discussion on creating solutions and custom code that are testable. Over the past few years, I have seen a lot of custom SharePoint solutions that are not as testable as they should be. I think the real point is that you should continue to use the same custom software development practices with SharePoint as you would if you were creating a custom application. It seems like so many solutions I have seen have forgotten this.

Part 2 - Execution Models in SharePoint 2010

If you are a developer, architect or technical team leader – this section is an absolute must read. I was so happy to see this written because I cannot tell you how many times while I was doing SharePoint 2003 and 2007 projects that we would have to figure this out every time because there was no good guidance.

They basically break it down into understanding two aspects. The first is the execution environment, which is basically where the DLL you create will be deployed (i.e. the bin or the GAC). Understanding this has several Code Access Security (CAS) considerations you must understand. The second is the execution logic, which means understanding how your custom code will be invoked. Code can be invoked in a web part, timer job, workflow, etc. Understanding these aspects will give you a fundamental understanding of how SharePoint works underneath the hood.

In the Farm Solutions section there is a great discussion describing the process flow of how libraries are loaded into memory. This documentation was not available in the SharePoint 2007 timeframe, and we had to figure it out ourselves. The discussion specifically covers loading custom DLLs into either the bin or the GAC and the design decisions you should make around this. I still stand firm that it is best to deploy to the bin, mostly for security reasons.

There is a good justification made here for sandboxed solutions. The basic point is that deploying a standard SharePoint WSP solution requires administration permission levels and access to the SharePoint box, which has to be tightly controlled for production. Developers should never have access. Sandboxed solutions allow developers to deploy into production environments safely. The biggest issue that will come up is the limitations on the SharePoint API the developer has when running in a sandboxed solution. In the Sandbox section there is a listing of what can and cannot be done in a sandboxed solution. At a very high level, you have the ability to create web parts that can work with lists within your site collection and pretty much do anything you need within that scope. Here are the namespaces that are available to you:

  • Microsoft.SharePoint
  • Microsoft.SharePoint.Administration
  • Microsoft.SharePoint.Navigation
  • Microsoft.SharePoint.UserCode
  • Microsoft.SharePoint.Utilities
  • Microsoft.SharePoint.WebControls
  • Microsoft.SharePoint.WebPartPages
  • Microsoft.SharePoint.Workflow

I can say that many of the departmental-level custom solutions I have written would adhere to this. The only places where this would not work well are if:

  • I were creating some sort of Farm Level solution that had to be used everywhere.
  • If you have to go outside of the SharePoint context (i.e. to an external database) you will not be able to call anything that has reference to the SQL API. However you can use the BDC object model to access external data.
  • If you need to run with SPSecurity.RunWithElevatedPrivileges.

There is a great discussion of how sandboxed solution DLLs are loaded. Specifically, it discusses both the new SharePoint User Code Service and the Sandbox Worker Process and how they work together to create the processes in which sandboxed solutions run. There are maximum levels that will be managed, so if too many sandboxed solutions are running, ones that are not being used will be unloaded.

Part 3 - Data Models in SharePoint 2010

Everything in this section is a good read. They really help you with the decision process of determining what type of data you have, how it should be stored, and how the data should be accessed based on its nature and life cycle. You should never make a long-term decision about how data will be stored because it is "easy" to do it one way versus another. A real decision needs to be made on the long-term management of that data.

There are so many new features in the area of ECM for developers in SharePoint 2010. You have Business Connectivity Services, LINQ to SharePoint, relational lists, and more.

Part 4 - Client Application Models in SharePoint 2010

This section is a whole discussion of client-based access to SharePoint 2010, specifically focusing on Rich Internet Applications (RIA) like Silverlight, Ajax, ECMAScript (JavaScript), the client-side object model, REST services, etc. You really need to have an understanding of all the tools in the toolbox before you make a commitment on how you are going to build your custom solution, because Microsoft has pretty much provided a tool to solve any use case that can be thrown at you.

I personally liked the "RIA Technologies: Benefits, Tradeoffs, and Considerations" section, because it covered several things that I had been taking for granted. I also really liked reading the article "Using the Client Object Model" because it broke down the three main client scripting approaches and focused in on the new client-side object model (CSOM).

SharePoint 2010 Developer Notes

I recently went to SharePoint 2010 developer training. The following are some notes I took which were really interesting for me when comparing to my past experiences with SharePoint 2007. There are tons of published articles out there that capture the big things; this should capture some subtle changes you should know about as a developer.

My instructor was Scot Hillier with Critical Path Training. The course was really good.

  • Sandbox Solutions – We can talk about the challenges of working within the SharePoint sandbox from a developer perspective, because you are working with a restricted API. However, the best practice moving forward is that all solutions should initially be developed for the sandbox. My personal experience so far is that forcing yourself to run in the sandbox is a good thing and will push you to write well-performing and secure solutions. If you cannot create a solution that reasonably fits in the sandbox, then you should move out of it. Still, there are ways to get around running in the SharePoint sandbox: enterprise data can be pulled in using External Content Types, Silverlight and ECMA JavaScript have no constraints, and it is possible to call a custom DLL from a sandboxed solution by writing a Full Trust Proxy. This right here is the best discussion I have read on the SharePoint sandbox to date - http://msdn.microsoft.com/en-us/library/ff798382.aspx
  • Upgrade Actions and Event Handler – Can I say Hallelujah? This is a new ability to add custom actions to deployed Features without having to completely remove old versions and redeploy. There is a new tag called <UpgradeActions> which can be added to the Feature.xml, and there is also a new FeatureUpgrading event handler. One note is that PowerShell must be used to actually initiate the upgrade (a sketch of this follows the list below). This will be very helpful for managing deployed Features. When Features are activated across a large number of sites, it can be challenging to push out changes and make sure that all new activations of that Feature are the same from that point on.
  • Declarative Features – A little tip: in Visual Studio 2010, if you are building a purely declarative Feature, like content types, turn off DLL generation, because there really is no DLL that needs to be deployed.
  • SharePoint Explorer – Do not take this for granted. This is a great way to dig around SharePoint and get things you may need for development, like those pesky instance GUIDs.
  • Common Master Page – Application and Site pages no longer have different master pages and CSS. Default.master is available for backwards compatibility to SharePoint 2007 (i.e. an upgrade). v4.master is used for new pages that are created in SharePoint 2010. Application pages use the same v4.master page. The v4.master links to corev4.css, has a reference to the ECMA javascript library, and heavily utilizes the ribbon.
  • Asynchronous Web Parts – For long running processes in your web parts. You can use System.Web.UI.PageAsyncTask class to facilitate handling the callback when the processing is complete.
  • Upgrading Content Types – Along the lines previously mentioned, <UpgradeActions> has a tag called <AddContentTypeField> which can be used to help with content type Feature deployments. Sometimes pushing out changes in SharePoint 2007 would not go well, and this should resolve those problems.
  • Relational Lists – This will be very interesting to work with in SharePoint 2010. We all know that SharePoint lists do not replace a real database like SQL Server; however, we can now do joins between lists, and referential integrity is now supported. Another interesting feature is that columns can now be marked as unique, forcing a unique value for that column in the list.
  • List Throttling – Has been introduced to ensure that developers do not continue bad behavior by querying too many items and taking down your SharePoint site.
  • List Item and List Item Field Validation – We can now create formulas for validations reducing the amount of custom event handlers we would have to write.
  • Post Synchronous Events – THANK YOU. We can now add a property to the receiver definition (SPEventReceiverDefinition.Synchronization) which allows for post-processing after the commit but before the user is shown the result.
  • Cancel Event Error Page – We now have the ability, when we want to cancel an event handler (maybe because a business rule failed), to route the user to a screen with a custom error explaining why the event failed.
  • LINQ to SharePoint – This is news everyone should know about, but man is working with data in SharePoint lists so much easier without CAML code. NOTE – the classes generated by SPMetal are partial classes, so you can write your own partial class containing custom logic which will not be lost when you regenerate your classes.
  • WCF Data Services – REST-based services make it very easy for external applications to have direct access to data in SharePoint. Generating a proxy for a simple Windows or command-line application is easy.
  • Content Organizer – You should really look into the new features here instead of custom coding like we have had to do in the past.
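On the feature upgrade note above, here is a hedged sketch of initiating the upgrade from PowerShell, assuming a web-application-scoped query; the URL and feature GUID are placeholders:

```powershell
# Find activated instances of a Feature that need upgrading, then upgrade them
$wa = Get-SPWebApplication "http://intranet"                    # placeholder URL
$featureId = [Guid]"11111111-2222-3333-4444-555555555555"       # placeholder feature id

$wa.QueryFeatures($featureId, $true) | ForEach-Object {         # $true = only ones needing upgrade
    $_.Upgrade($false)                                          # $false = do not force the upgrade
}
```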