
Wednesday, September 26, 2012

SharePoint 2013 Technical Diagrams Notes

Introduction

SharePoint 2013 Preview Technical Diagrams are now available here - http://technet.microsoft.com/en-us/library/cc263199(v=office.15).aspx

Ever since Microsoft started publishing these technical diagrams with SharePoint 2007, I have recommended that architects become very familiar with them. I always start here when trying to understand a new major product release for SharePoint. If you search my blog, you will see that I have directly referenced these diagrams when building SharePoint strategies for customers.

The following is a high-level review of the new architecture changes available with SharePoint 2013.

Corporate Portal Diagrams

I reviewed the two new Corporate Portal Diagrams for SharePoint 2013. From a logical architecture perspective, these diagrams do not have any major changes from the SharePoint 2010 versions. These diagrams accurately capture how organizations should build web applications, site collections, application pools, SharePoint services, etc. to support major business initiatives. The diagrams are still a must read for people who are new or who need a refresher to understand how they should be segregating content and business functions across SharePoint.

Extranet Diagram

The new diagram for SharePoint 2013 extranet architecture closely resembles the corporate portal diagrams; however, it is not very revealing about the type of information organizations need when deciding how to deploy an extranet. Looking back at the SharePoint 2010 Extranet Topologies diagram (http://technet.microsoft.com/en-us/library/cc263199.aspx), I find that diagram to be much more helpful, and the information contained there still holds true for SharePoint 2013. I would recommend reviewing both of them together.

Services in SharePoint 2013 Diagram

I admit this has always been one of my favorite diagrams. When it was released for SharePoint 2010, it captured a fundamental change in how SharePoint services are configured and delivered. This new architecture was created to support Microsoft’s ability to deliver SharePoint Online as a SaaS solution.

I reviewed this diagram and nothing has significantly changed with regard to sharing services across farms, the logical architecture of services, service groups, and service deployment.

In the services table, a few new services have been added.

  • Access Services – Do not be confused by this. Yes, there was an Access Services in SharePoint 2010. At this early point, I know that Access Services for SharePoint 2013 has been changed to focus on utilizing the new App Architecture. As such, Access Services for SharePoint 2013 is quite different. Access Services solutions created in SharePoint 2010 will still be supported moving forward; however, they will run in a different service.
  • App Management Services – This is a new service that will be used specifically to support the new internal app catalog and the public SharePoint Store. Remember that in SharePoint 2013, everything is an app; EVERYTHING. Even everyday SharePoint lists are now called apps. Once you get over the name change, you will find it makes complete sense: Microsoft has simply aligned what it does with how business users talk about technology.
  • Machine Translation Service – This is a new one, and as of right now I do not have much information on the purpose of this service other than the description, which says it performs automated machine translation.
  • Work Management Services – This service provides task aggregation across management systems including SharePoint, Exchange and Project Server. This is huge from a user perspective: one single place to see all of your tasks. No more building content query web parts to find all tasks; this service effectively does that, plus it goes outside the SharePoint boundary to find more tasks. This is a very exciting service.
  • Office Web Apps Service – Is called out here as a service that no longer runs inside of SharePoint Server. Why? Microsoft’s strategy is to provide the Office Web Apps Service to other enterprise applications beyond just SharePoint, and it strategically made sense to move it out of SharePoint.

In the rest of this diagram there are architecture diagrams for how to architect service groups across farms, none of which have changed from SharePoint 2010. If you are not familiar with this material, it is a must read, and I recommend reading my old posting on it here.

Mobile Architecture Diagram

There is a brand new mobile architecture diagram provided, and obviously this is driven by Microsoft’s focus on being a “services and devices” company. It is a pretty simple architecture that basically describes some things you need to think about if you are going to support mobile users, and it discusses some of the mobile capabilities. It can serve as a launch point for you to begin to dive deeper into how you will support mobile for your organization. The following are some high-level observations I had when reading it the first time:

Extranet – If you are not thinking about an extranet, you need to, so your users can access content when they are on a mobile device. There are some diagrams that will get you started thinking about it, and additionally about how you can use Unified Access Gateway (UAG) as a reverse proxy to help with that.

Mobile Device Management (MDM) – One interesting question brought up in this diagram is: how do you manage mobile devices? If you need something simple, you can leverage Exchange ActiveSync for remote device wipe, password enforcement, etc. If you are looking for application-level MDM, there are additional solutions in the marketplace today that provide even more capabilities.

Application Architecture – The new SharePoint 2013 mobile architecture is introduced. They break it down basically into two logical layers: mobile and SharePoint. Some key points are:

  • Automatic Mobile Browser Redirection – This is a new capability that can be used to optimize the mobile experience based on the connecting device. This Feature must be activated on the site and will be activated by default on numerous site templates. First there is the Classic View, which provides backward compatibility and delivers a SharePoint 2010 mobile browser experience. Then there is the Contemporary View, which is geared to support HTML5 browsers and offers several enhanced features for navigating SharePoint sites. Additionally, Full Site View is available so the SharePoint site page can be viewed as if it were in a desktop browser or on a tablet device.
  • Office Hub for Windows Phone – Is an application for Windows phone devices that provides enhanced capabilities to access SharePoint content from multiple places in one spot. It also leverages mobile Office.
  • Location – There is a new geo-location field type that is available in a SharePoint List. This can make a list location aware to capture latitude and longitude which can be used with map applications. For instance, if a user enters in data on their mobile device, it will capture where it was done from and then can be displayed on a map. Here is some more information about this - http://technet.microsoft.com/en-us/library/fp161355(v=office.15).aspx
  • Push Notifications – There is a new capability to allow notifications to be sent from a SharePoint site to registered applications running on a mobile device. The nice thing about this is that Windows Phone Apps can receive notifications without having to poll. Here is some additional reading on the topic - http://msdn.microsoft.com/en-us/library/jj163784(v=office.15).aspx
  • Device Channels – This is a really important new capability, as device channels allow you to deliver a publishing site geared specifically to support different types of remote devices. Basically, the site can be mapped to multiple master pages and style sheets, and you can even control what content you want to make available to specific devices. Here is an overview on the new device channels - http://technet.microsoft.com/en-us/library/fp161351(v=office.15).aspx
  • Office Web Apps – As mentioned earlier in this posting, Office Web Apps is now a separate standalone server which does not run inside of the SharePoint boundary. Office Web Apps has been improved a lot to support mobile devices. There are Word, Excel and PowerPoint Mobile Viewers.

SharePoint 2013 Upgrade Process Diagrams

There are two upgrade diagrams that have been provided. Here are some high points I walked away with:

  • Must be on SharePoint 2010 – To upgrade, you must be on SharePoint 2010 technologies. This means if you are on SharePoint 2003 or 2007, you will need to upgrade to SharePoint 2010 first to get to SharePoint 2013. There are Microsoft migration partners that have solutions to assist with this. I have seen many times that this is a big value proposition for using SharePoint Online, as this is handled for customers.
  • Database Attach Upgrade – This is the only supported method for upgrading; there is no more “in-place” upgrade option. Frankly that is fine, because most customers went down the database-attach path anyway (a sketch of the SQL side of this follows this list).
  • Preparation – Much of the preparation activities that we discussed in the past with SharePoint 2010 hold true for SharePoint 2013. There is a lot of information you are responsible for gathering.
  • Manual Configuration Settings – In the preparation phase, it is recommended to get an understanding of all the custom configurations that you may have made, because not all of them are going to be migrated (not all databases are upgraded). Many custom configurations in Central Administration, such as alternate access mappings, timer job tweaks, managed paths, incoming/outgoing email settings, certificates, etc., will need to be documented and reconfigured in the new farm.
  • Databases That Can Be Upgraded – There is a set of databases that can be upgraded. They are Content, BDC, Managed Metadata, PerformancePoint, Secure Store, Search and User Profile databases.
  • Customizations – This is an important task that needs to be completed. I have seen many cases where good software organizations have not implemented a strong configuration management process, and the result is an organization may not know about all the customized code that has been deployed. There are numerous ways to find it all, such as running PowerShell commands, doing system directory diffs, checking web.config, etc.
  • Upgrade Health Checks – There are some new features that are available to site collection administrators that will show you a health check of a site collection before actually upgrading the site collection.
  • Evaluation Site Collection – Site Collection Administrators also have the ability to request the site collection be copied into a new site collection to evaluate how the upgrade will affect any customizations they may have. This is helpful so you can remediate issues before you actually perform the upgrade. This is also nice because your site collection will run in a SharePoint 2010 mode until you are ready to actually upgrade it.
  • Testing – Just like for SharePoint 2010, the best way to prepare for a migration is to build up your new SharePoint 2013 farm and then do multiple practice runs of the upgrade into the new production environment. An entire process is defined in one of the diagrams and is a great place to start.
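To make the database-attach point above concrete, the SQL Server side of the move is an ordinary backup and restore. This is a minimal sketch only; the database name, share path, and file locations are hypothetical, and the attach into the new farm itself is done with SharePoint tooling afterwards:

-- On the SharePoint 2010 farm's SQL Server: back up the content database.
BACKUP DATABASE [WSS_Content_Portal]
TO DISK = N'\\backupshare\WSS_Content_Portal.bak'
WITH COMPRESSION, STATS = 10;

-- On the SharePoint 2013 farm's SQL Server: check the logical file names,
-- then restore before attaching the database to the new farm.
RESTORE FILELISTONLY FROM DISK = N'\\backupshare\WSS_Content_Portal.bak';

RESTORE DATABASE [WSS_Content_Portal]
FROM DISK = N'\\backupshare\WSS_Content_Portal.bak'
WITH MOVE N'WSS_Content_Portal' TO N'E:\Data\WSS_Content_Portal.mdf',
     MOVE N'WSS_Content_Portal_log' TO N'F:\Logs\WSS_Content_Portal_log.ldf',
     STATS = 10;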

SharePoint 2013 Search Diagrams

If you are a reader of my blog, you know I wrote some long postings about the search architecture for both SharePoint 2010 Search (here) and FAST for SharePoint 2010 (here). I am not going to do a deep dive into all the search components and roles because they are basically covered there. As many people now know, the FAST search engine is now the core search engine for SharePoint; it will just be referred to as SharePoint Search. Now you will be able to leverage a very powerful search engine out of the box. However, many of the advanced enterprise features of search will only be available in the SharePoint Enterprise edition. I am also really excited about this for SharePoint Online because it can leverage FAST too. SharePoint Online will not be able to do enterprise search of line-of-business systems, but a Search Farm (which is FAST under the hood) can be configured on-premises, and SharePoint Online can invoke that search and provide the results in the cloud; pretty exciting.

I highly recommend taking the time to review both of these diagrams. They explain how each of the components interacts with the others. Additionally, there is a diagram that goes into how to scale the server farms for the amount of content you will need to index. There is a great new table in there that shows how scaling will work. To be honest, the folks who are really serious about search will say it is an art, and a table does not always communicate how you will do it. It always comes down to how many items, the types of data sources, custom transformations, query latency, index latency, etc.

SharePoint 2013 App Overview Diagram

This is an area I plan to explore a lot more this coming year and write about on this blog. Why? This is something we have been waiting on for a long time in SharePoint development. There are several ways to look at this. SharePoint Features, which we have been writing for years, are Apps; this is a name change to better communicate our technology to the end users who have to use SharePoint. However, the new SharePoint App architecture is way more than that.

I have seen so many things over the years.

  • I think one of the biggest challenges people would run into is developing great SharePoint solutions only to find out they incorporated some dependency they should not have, they wrote some high-end code that should not be running in the SharePoint layer, they cannot leverage their solution outside the SharePoint boundary, etc. We want to resolve those problems by helping developers deploy solutions in a way that will keep their SharePoint environment nimble.
  • Plus we want to provide third-party vendors quicker access to customers. We want to help customers to quickly acquire third-party solutions.
  • Additionally we want to allow customers to leverage commodity-based SharePoint Online. As you may know, SharePoint Online has restrictions on high-end custom development; if that code were to run in another location while being highly integrated with SharePoint Online, that would be a huge win.

I will think of many more reasons this year why this is so great :)

Now we can achieve this through the new SharePoint App architecture. The old SharePoint Solution architecture, where you create a WSP, is still around. Nothing has changed there. This is used to create deployment packages and in many cases is used to deploy code that requires full trust. SharePoint Solution packages will continue to be used by third-party vendors or developed internally with such tools as Visual Studio. You can still create Sandboxed Solutions, which run in a more secure runtime and can be deployed in SharePoint Online.

The new Apps framework for SharePoint 2013 is packaged up in a file with a .APP extension. It is composed of many of the same types of files: an AppManifest, an embedded Solution.wsp, etc. Once an app is loaded into SharePoint, it is accessible through the App Catalog, which can be controlled at an organizational level.

Remember, the big point with Apps is that the custom solution you are writing may or may not actually run in SharePoint. Full trust code is not supported. Your custom solution code itself may run in a different SharePoint farm, on an IIS server as ASP.NET pages, as ASP.NET pages running in Windows Azure, etc. So how does SharePoint access these solutions running outside the SharePoint context? In simple terms, there is an IFrame (with some extensions) through which the external solution is surfaced. OAuth provides the secure connection for accessing SharePoint objects from a remote location. We will additionally use a new, extended and robust event model and a remote client SharePoint library to write integrated remote code.

Why is this so great? We are going to ensure that custom applications and solutions being developed with SharePoint are isolated. No more writing a bunch of code and services that should not be running on SharePoint servers. It is great that you can do whatever you want with SharePoint; however, this will improve solution management.

So you may be asking: where does this get deployed? There are several different options for hosting.

  • SharePoint Hosted – This means the app and all its resources run in SharePoint. Remember, server-side code is not supported; however, you can write applications with SharePoint’s JavaScript libraries and such.
  • Windows Azure Autohosted – This model is only supported in SharePoint Online. In this case you can write an App package that has code for Azure and SQL Azure embedded into it. When the application is deployed, the Azure solution will be automatically deployed for you. You do not have to go to Azure and set anything up at all; it is all handled for you behind the scenes.
  • Provider-Hosted – This is the third model where custom code and solutions are hosted in a separate server in your organization, hosted in Azure, hosted in different SharePoint servers, etc.

Once an App package is installed, it can be managed and monitored through the catalog. End users have the ability to select an app to run in their sites (much in the same way as turning on a Feature). If and when an app is updated, the user can decide how they want to upgrade to the new app.

Again, I really plan to go much deeper on this on my blog, but for right now these are just some introductory notes and ramblings on how excited I am about this new capability :)

Back Up and Recovery Diagram

There is a new diagram that goes into the details of doing your own back-up and recovery for SharePoint 2013. I know many people have become accustomed to using third-party vendors for supporting these operations and I still believe these vendors will continue to provide features above and beyond what is out of the box. However, if you are a do it yourself sort of person, this is a great diagram to review.

Not much has changed in regards to developing back-up procedures for both the SQL Servers and the SharePoint Servers. There are tons of scenarios covered in here, and I recommend reading this if it is important to you.

SharePoint 2013 Database Diagram

Finally, the database server diagram has been updated. This is a really, really important diagram to review if you are managing on-premises servers. It goes over all the SharePoint databases, plus provides sizing and scaling guidance. Great information.

Thursday, March 15, 2012

Updated Independent Analysis on Microsoft Productivity Solutions for On-Prem and Cloud

There have been a lot of updates to Gartner analysis of the Microsoft productivity stack and the other related stacks that I work with. Specifically, SharePoint, Office 365, Lync and BI have all been getting great independent reviews. The important takeaway that I discuss with my customers is that all of these top-quartile-rated technologies are delivered on a single, integrated solution stack. They are NOT hodgepodge solutions brought together through custom integration.
Many of these reports, including the newest ones, can always be accessed from here - http://www.microsoft.com/presspass/itanalyst/default.mspx
SharePoint and Office 365
As you can see, we have received great assessments for enterprise content management, enterprise search with FAST for SharePoint 2010, great recognition as a portal provider, social computing, and business process analysis with Visio Services and SharePoint workflow. Much of this capability is delivered through the Office 365 cloud as well.
SQL Server and BI
As you can see, these are very recent reports giving us good reviews in the areas of BI and data warehouses. I think the big takeaway is again total cost of ownership (TCO). We deliver strong BI solutions, and customers do not need to make significant additional investments to deliver BI through the same SharePoint investments discussed above.
Exchange and Office 365
For email with Exchange, either delivered on-premises or through the Office 365 cloud, we continue to be evaluated as a leader. This really speaks to our versatility in allowing customers to select the most appropriate place to run it while getting the same level of end user experience for business users.
Lync and Office 365
Microsoft is again recognized as a leader in all these types of solutions with Lync, which can additionally be delivered through the Office 365 cloud. Our unified communications solutions are completely integrated into Office, Outlook and SharePoint, allowing users to quickly transition between the tasks they are working on, making them highly productive. These capabilities are extremely useful for telework and for communication between people spread across large geographies.

Tuesday, March 13, 2012

SQL Server 2012 Brings New Features and Capabilities for SharePoint 2010

Introduction
If you are a SharePoint customer or architect, you really should be looking at the new SQL Server 2012 release. As we all know, SharePoint success is highly contingent on SQL Server. You need a strong SQL Server deployment to ensure there is good performance, high availability, backup/recovery, etc. Additionally, Microsoft delivers its Business Intelligence (BI) stack through SharePoint, and there have been several new features and capabilities added there as well.
For SQL 2012 there are three areas where there have been major capability additions:
  1. Mission Critical Confidence – This is the capability to more easily deliver high-availability with lower total cost of ownership.
  2. Breakthrough Insight – New and expanded capabilities for Business Intelligence.
  3. Cloud on Your Terms – Additional capabilities to create SQL databases in either the private or public cloud (Office 365).
In this blog I am going to cover some of the new SQL 2012 capabilities, specifically focusing on how they can improve your SharePoint 2010 implementation. Please note that there are a lot of new capabilities for SQL Server 2012 which I have not covered, such as data warehousing, resource management, full text searching, auditing, Big Data (Hadoop), etc. Here is a good reference to quickly spin up on all the new capabilities (“What's New in SQL Server 2012 Whitepaper” located here).
Mission Critical Confidence
AlwaysOn
First and foremost is the new high-availability solution in SQL Server 2012 called AlwaysOn. One of the biggest challenges organizations face when deploying SharePoint 2010 is providing a solution architecture that will meet the SLAs of their business users. Business processes, technical processes and governance need to be put in place to ensure that SharePoint 2010 will be up as much as possible.
In the past with SQL Server 2008 R2, you employed such solutions as clustering, mirroring, log shipping and replication (see my previous blog on this topic). However, this could require a lot of planning, configuration, and management. With SQL 2012 AlwaysOn, new configuration wizards and tuning tools are now provided that make set-up and configuration of high availability extremely simple.
The concept of Availability Groups has been added, which specifically makes the kind of configuration previously done with database mirroring easier. An Availability Group is a logical group of databases that fail over together. Through the configuration wizard, you can determine whether you need such things as automatic or manual failover, set up primary and multiple secondary instances, choose synchronous or asynchronous data movement, etc.
Availability Groups remove the need for the shared disk storage (SAN or NAS) required by a failover cluster instance. Note that AlwaysOn Failover Cluster Instances support multiple-site clustering across subnets, which subsequently enables cross-datacenter failover.
This feature is very useful when setting up your SharePoint 2010 (or 2007) farm, because high availability is one of the most important tasks when creating a mission-critical SharePoint environment.
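To give a feel for what the wizard is doing under the hood, here is a minimal T-SQL sketch of creating an availability group for a content database. This assumes hypothetical server names (SQLNODE1, SQLNODE2), an existing Windows failover cluster, mirroring endpoints already created on port 5022, and a database in the full recovery model:

-- On the primary replica:
CREATE AVAILABILITY GROUP [SharePointAG]
FOR DATABASE [WSS_Content_Portal]
REPLICA ON
    N'SQLNODE1' WITH (
        ENDPOINT_URL = N'TCP://sqlnode1.contoso.local:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC),
    N'SQLNODE2' WITH (
        ENDPOINT_URL = N'TCP://sqlnode2.contoso.local:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC);

-- On the secondary replica, after restoring the database WITH NORECOVERY:
ALTER AVAILABILITY GROUP [SharePointAG] JOIN;
ALTER DATABASE [WSS_Content_Portal] SET HADR AVAILABILITY GROUP = [SharePointAG];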
Recovery Advisor
The Database Recovery Advisor provides many new features and capabilities for the support of backup and restore of databases through SQL Server Management Studio. The new Recovery Advisor streamlines the backup and restore process. One such feature is a visual timeline that shows the backup history and all the available points you can restore from.
There is also a new capability called Split File Backup which allows you to split a backup into multiple files. This allows for quicker backups and restores because the files can be written and read across disks in parallel.
This is very helpful for reducing the time required to work with large SharePoint 2010 (or 2007) content databases that have grown significantly.
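As a quick illustration, a split backup is just a BACKUP statement with multiple destination files; the database name and paths below are hypothetical:

-- Stripe the backup across three files on separate disks so the
-- writes (and later the restore reads) run in parallel.
BACKUP DATABASE [WSS_Content]
TO DISK = N'E:\Backups\WSS_Content_1.bak',
   DISK = N'F:\Backups\WSS_Content_2.bak',
   DISK = N'G:\Backups\WSS_Content_3.bak'
WITH COMPRESSION, STATS = 10;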
Breakthrough Insight
As you probably already know, SharePoint 2010 is where Microsoft’s Business Intelligence (BI) stack is delivered. It is a combination of solutions such as Excel Services, PerformancePoint, Visio Services, Chart Web Parts (Dundas), PowerPivot, and SQL Server Reporting Services (SSRS). With the release of SQL 2012, a new solution called Power View is now provided, PowerPivot can now run server side on SharePoint 2010, and there is significantly enhanced integration with SSRS.
Power View
Power View is a new, highly interactive data exploration and reporting tool that allows business users to visually explore data in an ad-hoc fashion. End users can create reports and dashboards very quickly, create shapes and graphs with clicks, create animations, highlight data based on rules, and drill down through relationships, and it performs very well with large datasets. The design environment is very similar to Office. It can be published through SharePoint 2010.
Note that Power View requires SQL Server Reporting Services (SSRS) to be integrated with SharePoint 2010 and that an instance of SQL Server 2012 Analysis Services (SSAS) or PowerPivot be available.
PowerPivot
Up to this point, PowerPivot was available through the Excel client only, via an add-in. It is a really powerful, end-user-friendly capability for data analytics. There are actually several new capabilities that have been introduced that I recommend you read up on (in the references below).
Specific to SharePoint 2010, PowerPivot for SharePoint is available as an add-in and can run server side in conjunction with Excel Services. This now allows end users to publish PowerPivot reports through the browser, giving everyone easy access to this data.
As well, in SharePoint 2010 Central Administration there is a new PowerPivot Management Dashboard that provides several reports on the performance and usage of published reports.
SQL Reporting Services (SSRS)
Personally, the enhancements to SSRS are very exciting to me, as I have had to configure SSRS with SharePoint 2007 and 2010 in production environments in the past. If you have done it before, you may recall it was a tedious task of working through the SSRS Configuration Wizard and getting Kerberos authentication set up correctly. Now with SQL Server 2012, configuration of SSRS with SharePoint 2010 is done completely through Central Administration. Additionally, there is a new service in SharePoint 2010 that runs SSRS; it supports WCF/claims authentication, integrated ULS logging, built-in load balancing across SharePoint servers, and report performance improvements, and there are PowerShell commands for management.
Additionally, there is an SSRS alerting capability that allows end users to subscribe to alerts associated with an SSRS report. Users can create conditions for any report, and when they are met a notification will be sent to them.
One last change: you may know that SSRS reports can generate Word or Excel documents. Up to now, the file formats generated were .doc or .xls, which are not Open XML document renditions. With SQL Server 2012 and SSRS, .docx and .xlsx file types will now be generated.
Cloud on Your Terms
With SQL Server 2012, there are a bunch of new capabilities to support even tighter integration with SQL Azure. You will have a unified database development experience between SQL Server 2012 on-premises and SQL Azure. The reason I bring this up is that it is very common, when creating a solution inside SharePoint, to have complex data management requirements that can be better supported with a SQL Server database instead of a SharePoint list.
Now we have the ability to more quickly and efficiently create custom databases in SQL Azure which can be utilized with SharePoint 2010. Plus, this is really good for working with Office 365 SharePoint Online, because we can make a cloud-to-cloud connection to work with advanced data structures.
References

Tuesday, November 1, 2011

SharePoint 2010 Content Database Sizing

A lot of people already know this, but on July 14, 2011 it was announced that the content database sizing guidance has been updated: http://technet.microsoft.com/en-us/library/cc262787.aspx#ContentDB.

There is still a recommendation to keep content databases at 200 GB. Why? This is still a good recommendation because if you need to back up and restore a database very quickly, you do not want to have to move around large backup files. Plus, if you have an information architecture that drives content to specific site collections (with dedicated content databases), you will be much more agile in the face of requirement changes, upgrades, etc. One big massive content database is an indication of poor planning and governance.
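If you want to check where you stand, a quick query against SQL Server will list your content database sizes. This is a simple sketch that assumes your content databases follow the common WSS_Content* naming convention:

-- Data file size (in GB) per content database; sizes are stored in 8 KB pages.
SELECT DB_NAME(database_id) AS database_name,
       CAST(SUM(size) * 8.0 / 1024 / 1024 AS DECIMAL(10,2)) AS data_size_gb
FROM sys.master_files
WHERE type_desc = 'ROWS'
  AND DB_NAME(database_id) LIKE N'WSS_Content%'
GROUP BY DB_NAME(database_id)
ORDER BY data_size_gb DESC;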

The first new recommendation in this article is that up to 4 TB of data can be stored in a content database. There are some parameters around this recommendation that you should read.

The second new recommendation is that there is no explicit size limit for document archiving scenarios. However, there are some very specific recommendations made to support that scenario, so review http://technet.microsoft.com/en-us/library/cc262787.aspx#ContentDB. This is really important for records management solutions. On this specific point, it would also be good to review the “Extremely large-scale content archive” section in http://technet.microsoft.com/en-us/library/cc263028.aspx. When going down this path, you will need Remote Blob Storage and FAST to support this solution architecture.

Finally SQL Server tuning is fundamental to your success for managing large content databases. Here is a blog that I wrote on the topic - http://www.astaticstate.com/2010/12/sharepoint-2010-high-availability-with.html

Saturday, August 20, 2011

SharePoint 2010 SQL Server Encryption

I have been asked several times about SharePoint’s ability to encrypt data that resides in the SQL Server databases. I was answering this question when I ran across this awesome presentation that I highly recommend: “SQL Transparent Data Encryption for SharePoint Content Databases” at http://technet.microsoft.com/en-us/edge/Video/ff711430. Just watch the first few minutes of the video.

Smaller notes I usually bring up are:

· Row-level encryption is not really possible, as encryption is done at the database level.

· An important note is that the SQL connection from the SharePoint application layer to SQL Server is made through a service account. Users are not directly authenticating against and accessing data in the database.

· Documents themselves can have rights management applied so the BLOB is encrypted when it is stored in SharePoint. However, metadata associated with the document is not encrypted, and such documents will not be searchable.
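For reference, here is a minimal sketch of what enabling TDE on a content database looks like in T-SQL. The database name, certificate name, and password are hypothetical; watch the video for the SharePoint-specific caveats:

USE master;
GO
CREATE MASTER KEY ENCRYPTION BY PASSWORD = N'<strong password here>';
CREATE CERTIFICATE TDECert
    WITH SUBJECT = N'TDE certificate for SharePoint content databases';
-- Back up the certificate immediately; without it the encrypted
-- database cannot be restored on another server.
GO
USE [WSS_Content];
GO
CREATE DATABASE ENCRYPTION KEY
    WITH ALGORITHM = AES_256
    ENCRYPTION BY SERVER CERTIFICATE TDECert;
GO
ALTER DATABASE [WSS_Content] SET ENCRYPTION ON;
GO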

Using Transparent Data Encryption really is a great solution.

Sunday, December 26, 2010

SharePoint 2010 High Availability with SQL Server

Introduction

For almost my entire career I have been an application developer. I got into SharePoint roughly six years ago and continued that philosophy; however, a few years ago I found out I could no longer get away with it. I had to start learning the architecture of SharePoint and how to design and configure it to support the major “ilities”. The one on everyone’s mind was how to scale SharePoint. I like to think I became very knowledgeable on this topic for SharePoint 2007 and now SharePoint 2010. But again, I was lacking in an area: SQL Server.

I actually like to think I am very knowledgeable on SQL Server from an application development perspective. I am strong at normalizing databases, views, creating indexes, stored procedures, optimistic/pessimistic locking, partitioning data, creating data access layers (DALs), object-relational mapping (ORM), etc., but when it comes to the administration of SQL Server I needed to brush up. As you can see, my focus has been all about getting data in and out.

Understanding SQL Server architectures for high availability is critical to the overall SharePoint architecture because of the strong relationship between the two. I have usually been able to count on a client already having a highly available SQL Server environment ready for me, but now I am finding out that is not always the case. The purpose of this blog posting is to provide information on:

  • What is SQL Server High Availability?
  • How does SQL Server High Availability relate to SharePoint?

SQL Server High Availability

I am not going to give the definition of every SQL Server high availability solution. This section will mostly be comprised of highly recommended readings that will make you smart enough to have an educated discussion on SharePoint and SQL Server.

When you start reading about SQL Server High Availability, you will find the following solutions. I am going to give my very quick definitions of what they are:

  • Clustering – An approach where multiple SQL Server machines (that share the same disks) are presented as a single instance to calling applications. Applications, like SharePoint, do not know how many machines there are; all they know is that there is a SQL Server. If there is a failure, clustering services will bring the other SQL machine online without affecting the application. Clustering does not provide protection against disk failure, as it is a software-level solution.
  • Mirroring – This is an approach where a primary and a mirror database server are set up. There are two modes you have to understand. High-safety mode (synchronous) ensures that any transaction executed against the primary is completed on the mirror; one drawback of running in high-safety mode is that it takes more time. The second is high-performance mode (asynchronous), which commits transactions on both the primary and mirror databases, but the primary does not wait on the mirror to complete the transaction. This provides better performance for highly transactional systems, but there is the potential for data loss. So at best this solution provides a “warm” standby, with potential data loss if asynchronous mode is used. There is a configuration of mirroring called high-safety with automatic failover in which a third server (called the witness) evaluates communication between the two machines and can promote the mirror server to primary. This effectively makes the mirror server a “hot” standby, but the calling applications must be notified, and there are solutions for this (a T-SQL sketch follows this list). Another limitation of database mirroring is that it can only be configured between two machines.
  • Log Shipping – Descriptions usually start off by saying this solution operates at the database level. It allows you to create one-to-many “warm” standbys, where secondary databases monitor the logs of the primary and bring themselves up to date. As you can see, there is a delay before a secondary is completely up to date with the primary (which is not a bad thing in some scenarios). For a secondary database to become the primary, all the logs must be applied. This can be used as a solution to supplement mirroring.
  • Replication – I jokingly call this the tried-and-true methodology; it uses a publish-subscribe model where the primary distributes data to one or more secondary servers. We are not going to dive into this for SharePoint.
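To make the mirroring option a little more concrete, here is a minimal T-SQL sketch of setting up high-safety mirroring with a witness. The server names are hypothetical, and it assumes mirroring endpoints already exist on port 5022 and the database has been restored on the mirror WITH NORECOVERY:

-- On the mirror server first:
ALTER DATABASE [WSS_Content] SET PARTNER = N'TCP://sqlprincipal.contoso.local:5022';

-- Then on the principal: point at the mirror, add the witness, and
-- make sure the session runs in synchronous (high-safety) mode.
ALTER DATABASE [WSS_Content] SET PARTNER = N'TCP://sqlmirror.contoso.local:5022';
ALTER DATABASE [WSS_Content] SET WITNESS = N'TCP://sqlwitness.contoso.local:5022';
ALTER DATABASE [WSS_Content] SET PARTNER SAFETY FULL;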

Now you may say that is not much, but at a high level that is what you need to know. I am not a SQL Server administrator, nor do I want to be. Here are readings I believe you must read to become knowledgeable on this topic. The great thing about these readings is that they save you the cost of going out and buying a book; it is all here. All of these articles have links to more articles to go deeper wherever you need to.

  1. High Availability Solutions Overview - http://msdn.microsoft.com/en-us/library/ms190202.aspx - I highly recommend starting here. My overview aligns to this.
  2. Selecting a High Availability Solution - http://msdn.microsoft.com/en-us/library/bb510414.aspx - this is a great read that summarizes the pros and cons of the approaches I listed above. Highly recommend reading this.
  3. High Availability: Interoperability and Coexistence - http://msdn.microsoft.com/en-us/library/bb500117.aspx - This is a very solid series about how the high availability solutions I discussed above can be used together. Specifically, there are two articles in this section that show how mirroring and clustering are implemented together, and another discussion on how mirroring and log shipping work together. These are enlightening to read.
  4. High Availability with SQL Server 2008 - http://msdn.microsoft.com/en-us/library/ee523927(SQL.100).aspx - This is a great article that says “wait”: understand your requirements before you start talking about high availability solutions. Plus it goes into backup and recovery, which is another major piece of the puzzle. I would read this after you have read my other recommendations, as it will bring the entire discussion into perspective.
  5. Achieve High Availability for SQL Server - http://technet.microsoft.com/en-us/magazine/2007.03.highavailability.aspx - This is an optional recommendation on the basics of availability. It covers SQL 2005, but it is well written and supplements the first two I referenced.

At a minimum you have to become knowledgeable with the first two readings. If you have the time, the third reading is worth your time.

SharePoint and SQL Server for High Availability

So now you may be wondering how this translates into SharePoint 2010 and how to create your architecture. Hopefully I will be able to provide you some references on how you should configure SharePoint 2010 for high availability based on your requirements.

SharePoint 2010 Databases

First, it is important to understand all of the databases of SharePoint 2010. I wrote a blog here based on this technical diagram - http://go.microsoft.com/fwlink/?LinkId=187969. I highly recommend reading these and understanding the nature of each SharePoint database. You will need to understand this in the context of the configuration of SharePoint you are using.

SharePoint 2010 Capacity Testing Results

Yes, Microsoft has done a significant amount of testing to understand the boundaries of SharePoint 2010 and has made the results from these tests available. I have written a summary of these test results on my blog here. At the top of that post I provide references to detailed whitepapers published by Microsoft. These are very in-depth and I would not recommend reading them outright. However, there is a ton of good information here which will point you in the right direction for how you may want to design for high availability.

SQL Server 2008 R2 and SharePoint 2010 Products: Better Together

This whitepaper (http://technet.microsoft.com/en-us/library/cc990273.aspx) has been written on the topic of how SharePoint 2010 and SQL Server 2008 together make a great solution. However, you will find it to be no more than an overview, and it will not give you information on how to configure a highly available environment. I would recommend it as supplemental only, or provide it to folks who have a very limited understanding of SQL Server (for whom this whitepaper could be useful).

Storage and SQL Server capacity planning and configuration (SharePoint Server 2010)

This whitepaper (http://technet.microsoft.com/en-us/library/cc298801.aspx) is on the completely opposite side of the spectrum from the one just referenced. I would give this article immediately to any high-end SQL administrator who is evaluating or designing a solution to support SharePoint 2010. It is full of best practices and official recommendations. There are gems of information in here like:

  • Making sure I/O operations per second (IOPS) are as fast as they can be.
  • Equations for estimating database sizing for all the types of databases.
  • Direct Attached Storage (DAS), Storage Area Network (SAN), and Network Attached Storage (NAS).
  • Disk types
  • Memory
  • Server architecture. Interesting facts I read were that an additional database server is recommended when there are more than four web servers running at full capacity, and that you should add an additional SQL Server when content databases exceed 5 terabytes.
  • SQL Server configuration options
  • Disk prioritization
  • Performance Counters to watch.

Plan for availability (SharePoint Server 2010)

Now this is the whitepaper (http://technet.microsoft.com/en-us/library/cc748824.aspx) that really brings the whole thing together, and I highly recommend it. I purposely brought this up LAST because if you read this far you should be rewarded :)

When you start reading this, you will see it has little to do with the SharePoint 2010 service architecture and more to do with SQL Server high availability solutions. Specifically, it recommends:

  • Failover clustering
  • Mirroring – “For mirroring within a SharePoint Server farm, you must use high-availability mirroring, also known as high-safety mode with automatic failover”. This means synchronous.


A few other interesting notes are:

  • SharePoint 2010 is now mirroring aware, meaning that SharePoint 2010 can be configured to know which SQL Server is the primary and which is the mirror. This reduces the need for clustering.
  • There is a good chart that compares clustering against mirroring.
  • There is no mention of log shipping; however, you may very well want to implement it. I can see scenarios where you can do some creative things, like having a log shipping configuration refresh a testing environment with production data on an interval, etc.

Please note that none of this is a requirement for running SharePoint 2010. These are best-practice solutions for creating highly available SharePoint environments to meet Service Level Agreements (SLAs). So if you have a 99% sort of SLA, you will have to consider these sorts of architectures.

Plan for disaster recovery (SharePoint Server 2010)

Disaster recovery is another piece of the puzzle when designing a highly available SharePoint 2010 architecture. Please read this whitepaper (http://technet.microsoft.com/en-us/library/ff628971.aspx). On the topic of SQL Server, there is specific mention of using either log shipping or asynchronous mirroring to create a “hot” disaster recovery environment. There are specific discussions of which databases need to be part of the configuration. This is a good place to start understanding the topic.
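The essence of the log shipping option is just transaction log backups applied on a delay. This sketch uses hypothetical names and leaves out the agent jobs that automate the copy/restore schedule:

-- On the primary (production) server: take periodic log backups to a share.
BACKUP LOG [WSS_Content]
TO DISK = N'\\drshare\logs\WSS_Content_0800.trn';

-- On the disaster recovery server: apply each log in sequence, leaving
-- the database in a restoring state so later logs can still be applied.
RESTORE LOG [WSS_Content]
FROM DISK = N'\\drshare\logs\WSS_Content_0800.trn'
WITH NORECOVERY;

-- At failover time: bring the DR copy online.
RESTORE DATABASE [WSS_Content] WITH RECOVERY;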


Conclusions

There are a couple conclusions I came to while performing this research:

  • The solutions for scaling SQL Server itself are not really all that much different in concept to the solutions we have at our hands for SharePoint 2010 services.
  • Much of what I talked about in this blog applies to both SharePoint 2007 and 2010.
  • I have always understood there to be a direct correlation between SharePoint and SQL Server when it comes to performance; I have seen SharePoint flat out perform better when SQL Server is healthy and well resourced.
  • High availability does not equate to better performance in all circumstances. Typically, SQL machines in a highly available environment will be more “beefy”. The point is, if you expect good performance and you put SharePoint on an already highly utilized SQL Server, there will be performance issues. In those situations you may want dedicated SQL Servers for SharePoint. I wish I could give a one-size-fits-all recommendation, but it really comes down to how your environment is configured and how SharePoint is ultimately used.
  • Backup and recovery are not addressed as part of this discussion. That is a completely separate topic but is absolutely critical to the overall high availability architecture.

Monday, June 28, 2010

SharePoint 2010 Databases

Introduction

One of the first things that will come up with any SharePoint 2010 deployment is what databases are needed to support it. With every SharePoint deployment there are two people you need to make your best friends: the system admin and the database admin. Now, both of these folks have very distinctive views of the world, and at times we can find them downright annoying :) However, they do what they do because they have gone through lessons that we as solution developers have not.

In this blog I am going to go over the databases that are needed for SharePoint 2010. The number and types of databases needed have changed from SharePoint 2007. As you are about to see, when I say more databases, I really mean more databases. Many SharePoint engagements suffer maintenance, sustainment, governance, etc. challenges because teams take this for granted or think it can be resolved later; by then it is too late.

I would highly recommend you understand this along with the new service architecture and logical topology of SharePoint 2010.

External Databases

I figured I would put this first because it is important. I have always said that understanding the databases of SharePoint is not good enough. Once you bring in data from external line-of-business systems, those databases become part of SharePoint from a user's perspective. So capacity planning, continuity management, etc. need to be part of your SharePoint governance plan.

Configuration Database (SharePoint 2010 Foundation)

This database is responsible for managing data associated with all the SharePoint databases in the farm, all IIS web sites, trusted solutions, WSP packages, site templates, and all web application and farm settings.

From a size perspective this database will be small and there can only be one per farm. This database has a strong dependency to the Central Administration Content Database and they must reside on the same database instance.

Central Administration Content Database (SharePoint 2010 Foundation)

This is the content database for the Central Administration web site. It will not grow very large and has a strong dependency on the Configuration Database (i.e., they must be located on the same instance). Only one of these databases will be created per farm.

Content Database (SharePoint 2010 Foundation)

This is the database (or databases) responsible for storing all content stored in SharePoint websites: lists, documents, web part properties, audit logs, sandboxed solutions, etc. It will also store data for Office Web Applications (Excel, Access, OneNote, InfoPath, etc.).

A content database can store data for multiple site collections; however, data for a specific site collection can only be stored in one content database. There will potentially be numerous content databases based on the design of your SharePoint environment.

Microsoft strongly recommends that content database size be limited to 200 GB. Content databases with terabytes of data are supported for large single repositories like a Records Center. If you have gone over 200 GB of data in a general-purpose content database, you have not done your planning nor put the governance in place to manage your environment. I personally recommend dedicated content databases per site collection, and for an enterprise deployment of SharePoint there should be multiple site collections, not just one big one.

Content databases can be located anywhere; there are no dependencies other than being accessible to the SharePoint farm. For very large sites, you may even create dedicated instances to support performance.

Usage Database (SharePoint 2010 Foundation)

This is a new database dedicated to supporting the new Usage and Health Data Collection service application. It stores all of the health monitoring and usage data collected, and the data within it is temporary. This database needs to support heavy write operations because data will be continually written to it. The health monitoring service will later take this data, aggregate it, and store it in the Web Analytics Reporting Database.

This database can get very large relative to the amount of content you have stored in SharePoint, as well as how many reports you have running. It will never be as large as the content database(s), because the actual data will not be stored in it, but it will store information about all data in all content databases across the entire farm. There can only be one of these databases per farm.

Business Data Connectivity (SharePoint 2010 Foundation)

This is the database used to support BCS services. All it stores is external content types and associated metadata, so it will remain small because it does not store any data from the external systems. The only thing this database will need to support is heavy read operations, based on the usage of BCS within SharePoint.

Application Registry Database (SharePoint 2010 Foundation)

This database stores data required to support backwards compatibility for Business Data Connectivity (BDC) from SharePoint 2007. This database is only used during the upgrade process and can actually be deleted after the upgrade is complete.

Subscription Settings Database (SharePoint 2010 Foundation)

This is a new database for SharePoint 2010 that supports the Subscription Settings service. It is used to support the new partitioning feature of SharePoint 2010. If you did not know, SharePoint data can now be partitioned by service subscription. This will be used if you are providing hosted, centrally managed services and you want to make sure one service subscriber cannot access data of another service subscriber. This way, services can be shared in a farm but the data can be protected. This database needs to support heavy read operations for hosted services that are highly utilized.

This database is not big and will not be created by default. The SharePoint administrator will create this database using PowerShell.

Search Administration Database (SharePoint 2010 Standard)

This database is used to support the SharePoint 2010 Search service. It contains all the configuration information associated with search, plus the access control lists (ACLs) used for securing content that is indexed. This database is neither small nor big. An instance of this database can be created for each search service that is running.

Crawl Database (SharePoint 2010 Standard)

This is another database that is used to support SharePoint 2010 Search service. This database will store the state of the crawled data and the crawl history.

This database can grow to be very large based on the amount of content you are indexing. More crawl databases can always be added to the farm to scale out. This database must support heavy read operations, and it is recommended to run it on SQL Server Enterprise Edition.

Property Database (SharePoint 2010 Standard)

This is the third database used to support the SharePoint 2010 Search service. It stores information associated with crawled data (i.e., properties, history, and crawl queries). This database can become large, but not as big as the Crawl Database. It is recommended for very large SharePoint environments that this database be put on a different database server, separate from the crawl database. It must support heavy write operations, and it is recommended to run it on SQL Server Enterprise Edition.

Web Analytics Staging Database (SharePoint 2010 Standard)

This database stores temporary usage data collected from the Usage Database. The data arrives in an un-aggregated format, and the web analytics service will take this data, process it, aggregate it, and send it to the Web Analytics Reporting Database. This database is cleaned out every 24 hours and then refilled with new data that has been collected.

Web Analytics Reporting Database (SharePoint 2010 Standard)

This is a new database for SharePoint 2010 used to support the Web Analytics service. It stores all the aggregated analytics data collected across the SharePoint 2010 farm. This is the database the usage reports run against, and there will only be one of these databases per farm.

This database can grow to become very large relative to the amount of data stored in the entire farm. This database will only have analytics data; it will not have any actual data from the content databases. By default, data will be stored in here for up to 25 months.

State Database (SharePoint 2010 Standard)

The State service is used to support storing temporary data across HTTP requests. This database is utilized by InfoPath Forms Services, Visio Services, Exchange, the Chart Web Part, etc. The space required for this database is driven by the usage of the services that utilize it. Multiple state databases can be added through PowerShell commands.

Profile Database (SharePoint 2010 Standard)

This is a database used by the User Profile service to store profile data. It will not become very big; the size will be based on the amount of data being stored about each user. The database needs to support heavy read operations to get user data, which is accessed commonly (user permissions are not stored here; they would be in the content database).

Synchronization Database (SharePoint 2010 Standard)

This is another database used by the User Profile service. Its purpose is to store the configuration of the service that brings user profile data into SharePoint. It is also used to stage data that is being synchronized from directory services like Active Directory. The size of this database will be relative to the number of users and groups that are being synchronized. This database needs to support both heavy reading and writing when the synchronization service is running.

Social Tagging Database (SharePoint 2010 Standard)

This is the third database used by the User Profile service. It is used for storing social tags and notes created by users for content in SharePoint. The size of this database is completely based on the utilization of social networking services. This database will experience mostly heavy read operations.

Managed Metadata Service Database (SharePoint 2010 Standard)

This database supports the new Managed Metadata service, which stores centralized term sets and content types that can be used across the farm. This database will not get very big. If managed metadata is used a lot, this database will need to support heavy read operations.

Secure Store Database (SharePoint 2010 Standard)

This database is used by the Secure Store service, the new SharePoint 2010 service that supports single sign-on. It stores user credentials and passwords. This database will be small. It is recommended that this database have limited access, and potentially even reside in a different location from the other databases.

Word Automation Services Database (SharePoint 2010 Enterprise)

This database is used by the Word Automation service and stores all pending and completed document conversions. The database will not get very large and has processes to ensure that it does not get too large.

PerformancePoint Database (SharePoint 2010 Enterprise)

This is another small database used to support PerformancePoint. It will store temporary objects and settings needed to support dashboards.

FAST Search Administration Database (SharePoint 2010 FAST)

This stores all configuration associated with groups, keywords, synonyms, term entities, inclusions, exclusions, spell check, best bets, search schema, etc. This will be a small database but must support heavy read operations to support both indexing and querying of data.

Tuesday, May 13, 2008

Recursive SQL Query

Some time ago I was doing a ton of SQL 2005 development and had tons of requirements for recursive queries. Well, it came up again where I wanted to return a tree of data. Not to say I could not do the recursion in .NET code, however it is a little cleaner in my situation to return the data from a stored procedure ready to go. There are tons of various solutions out there, so use this one if you like… I found this solution on some blog two years ago, lost the link, and luckily found the code sitting deep in the bowels of one of my code directories...

Here is the table definition:

CREATE TABLE [dbo].[Reference](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](100) NOT NULL,
[Phone] [nvarchar](50) NULL,
[Fax] [nvarchar](50) NULL,
[ParentID] [int] NULL,
[SortOrder] [int] NULL,
[Location] [nvarchar](50) NULL)

The table schema is really simple. ParentID is the ID of the record that is the parent. The SortOrder is the order specific to the level. This will make sense shortly, but it is the sort for each level; an example would be:


ID    ParentID    SortOrder
1     NULL        0
2     1           0
3     1           1
4     2           0
5     2           1
6     2           2
7     3           0
8     3           1

So the SortOrder is relative to the grouping the ID is associated with. To select all of the data in the tree, nicely ordered just like above, use this query.

WITH Tree (id, [name], phone, fax, parentID, sortOrder, location, GroupLevel, SortKey) AS
(
-- Anchor query: selects the root node (parentID IS NULL).
SELECT R.id, R.[name], R.phone, R.fax, R.parentID, R.sortOrder, R.location,
0, CAST(R.sortOrder AS VARBINARY(900))
FROM Reference AS R
WHERE R.parentID IS NULL

UNION ALL

-- Recursive query: joins each child row to its parent in the CTE and
-- appends the child's sortOrder to the parent's SortKey.
SELECT R.id, R.[name], R.phone, R.fax, R.parentID, R.sortOrder, R.location,
T.GroupLevel + 1, CAST(T.SortKey + CAST(R.sortOrder AS BINARY(4)) AS VARBINARY(900))
FROM Reference AS R
INNER JOIN Tree AS T ON R.parentID = T.id
)
SELECT
T.id,
T.[name],
T.phone,
T.fax,
T.parentID,
T.sortOrder,
T.location,
T.GroupLevel,
T.SortKey
FROM Tree AS T
ORDER BY T.SortKey

This is how it works:

  • There is an inner query referred to as the anchor; this gets the root node. One limitation of this query is that there must be a root node. In this case, it is where parentID IS NULL. However, if you do not want to return the root node, you can add WHERE T.parentID IS NOT NULL to the outer query.
  • Then there is the recursive query, which references the Tree CTE itself and joins each child row to its parent.
  • Both the anchor and the recursive member build the SortKey from sortOrder.
  • The outer query selects data from the two inner queries and orders it by the SortKey. The SortKey is built up level by level by the two inner queries and ensures that both the hierarchy and the SortOrder within each level are respected. A quick way to test all of this is shown below.
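
To try this end to end, here is a minimal, hypothetical set of rows that reproduces the ID/ParentID/SortOrder table above. The names are made up for illustration, and this assumes a freshly created table, so IDENTITY(1,1) assigns IDs 1 through 8 in insert order:

-- Sample data reproducing the hierarchy shown earlier.
-- Assumes an empty table; IDENTITY(1,1) assigns IDs 1-8 in insert order.
INSERT INTO dbo.Reference ([Name], ParentID, SortOrder) VALUES ('Root', NULL, 0)
INSERT INTO dbo.Reference ([Name], ParentID, SortOrder) VALUES ('Child A', 1, 0)
INSERT INTO dbo.Reference ([Name], ParentID, SortOrder) VALUES ('Child B', 1, 1)
INSERT INTO dbo.Reference ([Name], ParentID, SortOrder) VALUES ('A.1', 2, 0)
INSERT INTO dbo.Reference ([Name], ParentID, SortOrder) VALUES ('A.2', 2, 1)
INSERT INTO dbo.Reference ([Name], ParentID, SortOrder) VALUES ('A.3', 2, 2)
INSERT INTO dbo.Reference ([Name], ParentID, SortOrder) VALUES ('B.1', 3, 0)
INSERT INTO dbo.Reference ([Name], ParentID, SortOrder) VALUES ('B.2', 3, 1)

Running the CTE query against this data returns the rows depth-first: 1, 2, 4, 5, 6, 3, 7, 8. All of Child A's children come back before Child B because A has the lower SortOrder at its level, which is exactly what the binary SortKey ordering guarantees.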

Friday, January 18, 2008

Demystify MOSS Content Database Sizing

1) Introduction

A common question that has arisen is how big content databases should be and what governance should surround the management of those databases. Hopefully this will demystify a lot around content database sizing for MOSS. Much of this information I gathered while trying to gain my own understanding of the topic. I will discuss considerations, assumptions, various best practices, how to use Central Administration functionality, managing growth, recommendations, etc. One conclusion is that not all of this can be done out of the box, and tools like AvePoint DocAve and K2 BlackPearl can be used to create governance around an enterprise deployment of MOSS.

2) Content Database Sizing
MOSS has no restrictions on the size of content databases. Many available resources state that a content database should range from 30 to 50 GB, but this guidance originated from WSS v2/SPS 2003 in regards to quick recovery of documents and site collections. If configured correctly, WSS v3 and MOSS 2007 implementations can have content databases that handle terabytes of data. [1] Still, the ability to quickly back up and restore content databases is important. Most state that a best practice is to keep content databases capped at 20 GB of data to support backup/restore requirements. Tools like AvePoint DocAve can alleviate this issue; however, both the logical and physical architecture of the SharePoint implementation should be evaluated.

When evaluating the size of a SharePoint content database, things like high availability, disaster recovery, and backup methods must be understood. [2] The size of the database should always be discussed with database administrators. Knowing their preferences and what their environment can support is critical to understanding how many content databases there should be and what sizes can be managed moving forward.

It is important to know the database administrators' strategy for managing backups and availability (clustering, database mirroring, log shipping, etc.). For instance, if mirroring is used (new to SQL 2005), higher availability can be achieved, but there is a higher operational cost and learning curve associated with it. SQL clustering is the most common solution, offering greater processing power and scalability, but it uses shared storage. A log shipping strategy could be used with either clustering or mirroring, but things such as latency between publishing and subscribing servers should be considered. [3] This may or may not affect your end decisions but is valuable information to know.
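
Since these conversations are driven by database size, a quick way to get the lay of the land is to have the DBA team report the current size of every database on the instance. Here is a minimal sketch, assuming SQL Server 2005 or later (the content database names will vary by farm):

-- Approximate size of each database on the instance, in MB.
-- sys.master_files reports size in 8 KB pages, hence the * 8 / 1024.
SELECT d.name AS DatabaseName,
       SUM(mf.size) * 8 / 1024 AS SizeMB
FROM sys.master_files AS mf
INNER JOIN sys.databases AS d ON d.database_id = mf.database_id
GROUP BY d.name
ORDER BY SizeMB DESC

This counts log files as well as data files, which is usually what you want when planning backup windows.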

In the end, the two architectural questions that need to be answered are:

  • How large to make the content databases?
  • How many content database instances should be created?

3) Types of High Capacity MOSS Sites

MOSS supports high-capacity document storage, which can allow for the storage of 5 million documents within a document library. Performance of a site containing a large document library can degrade depending on how the content is used. Microsoft has provided some guidance to help support the design of large-scale content management solutions. Design decisions and governance around the capacities of site collections, sites, and libraries in MOSS must account not only for the physical storage constraints but also for the content usage and viewing patterns of the users.

Microsoft classifies three types of large scale content management databases. [9]

  • Large Scale Authoring Environment: Contains a library within which users actively edit 50,000 or more documents across 500 or more folders. There is heavy collaboration on documents, versioning, workflows, custom content types, etc. The database could be as large as 150 GB. [9]
  • Large Scale Content Archive: A large-scale archive is a repository where users view or upload documents. Little to no collaboration occurs on the site. These sites can be classified as a knowledge base or records management solution. A knowledge base site would typically have a single version of most documents, containing around 1,000,000 or more documents. The content would be stored in a single database as large as 400 GB. On the other hand, a records center site provides template features and functionality used to manage the retention and disposition of company records. Records can be described as evidence of activities or transactions performed by the organization that must be retained for some time period for compliance or regulatory requirements. [9]
  • Extremely Large Scale Content Archive: Microsoft describes this as an archive that might contain up to 10,000,000 documents distributed across 5,000 or more folders with a database larger than 3 TB. There could be 50,000 or more users primarily browsing content by searching. [9]
Personally I have not run into any scenario yet where MOSS has had to go to that scale.

4) Database Sizing Considerations
For large content databases, understanding the topology of the website and the organization of the content will help drive how many site collections and content databases are needed. A single large content database can provide a better overall picture of all content and the ability to aggregate data. It typically allows for easier backup/restore policies, as there is only a single content database. Putting content that is highly related into the same site collection allows administrators to focus on a single content database. [4]

The following are some performance recommendations that have been made by Microsoft. Before going down the path of using a few large content databases, there are a few things that should be considered. First is the potential performance implication if the site collection will host a large amount of content (either at the root level or in the sub webs). Typically, performance will become progressively worse as the number of total objects exceeds 2,000 (sites, libraries, documents, etc.). [4] The following are recommendations provided by Microsoft for improving performance at each level of storage when high volumes of content are being stored. [9]
  • Site Collection: 2,000 subsites of any site is the recommended limit.
  • Site: 2,000 libraries and lists is the recommended limit.
  • Library: 10,000,000 documents is the recommended limit. 2,000 items per view is the recommended limit.
  • Folder: 2,000 items per folder is the recommended limit.
Deleting content can become problematic as the size of the content database grows significantly. It is possible for SQL blocking to occur when a large number of items are deleted, because of the high number of transactions occurring on the SQL database server during a delete operation. Additionally, sites that have high numbers of concurrent users actively collaborating can also have negative implications, as SQL transaction logs can grow rapidly, creating challenges for backup/recovery. [4] Both symptoms can be watched for directly, as shown below.
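
Here is a minimal sketch a DBA might use to watch for these symptoms, relying only on standard SQL Server 2005 commands and DMVs (nothing SharePoint-specific):

-- Report transaction log utilization for every database on the instance.
DBCC SQLPERF(LOGSPACE)

-- List requests currently blocked by another session, e.g. during a
-- large delete operation against a content database.
SELECT session_id, blocking_session_id, wait_type, wait_time
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0

If either number trends badly during bulk deletes, that is a signal to partition content across more site collections and databases.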

If the content can be accommodated by a number of site collections with multiple content databases to manage them, some benefits can be realized. For instance, content can be managed by business line, and usage analysis reporting can be more targeted, which can drive governance, manageability, and metrics that can be used to gain insight into the business. Even though content would be broken apart across site collections, aggregation of data can still be achieved by creating an aggregation portal. [4]

An aggregation portal would be a top-level site collection that has links to all of the site collections below it. There is currently no out-of-the-box way to elegantly link site collections together. The aggregation portal will have some custom code built to make it aware that there are site collections below it and provide navigation functionality to them. The aggregation portal will more than likely require views of data across all site collections. This can be accomplished with custom web parts written to retrieve this data and present it. Another option for an aggregation portal is a records center to which content is published.

Joel Oleson, a SharePoint MVP, specifically notes that "in WSS 3.0/MOSS 2007 I recommend you pick a maximum site collection quota of no larger than 15GB, excluding the top level site collection and to move sites to dedicated databases where they need to grow beyond this. SharePoint content databases can scale to hundreds of GB and SharePoint farms can handle TBs of content. These recommendations are based on my experience with MS IT and in conversations with product support, and your experience may vary." [10]

This statement is consistent with what other colleagues of mine have experienced when confronted with this challenge.

5) Facilities to Manage Content Database Sizing
Content databases are databases used to store all MOSS content. By default, a single content database will be created for each web application. As new site collections are added to a web application, the content for each site collection will be stored in the database assigned to the parent web application. The Central Administration site can be used to add new databases.

Within Central Administration there is some basic information indicating the database location, the account used to make a connection to the database, and capacity settings which control how many sites can be added to the content database. Using the capacity settings can help control the size of the databases from a governance perspective. If the site capacity is low, more content databases will be required to provision new sites in a site collection. Now, this does not control the actual size of the database, nor the amount of content in a specific site. But if it can be planned how large "typical" sites within the site collection will be, administrators can be warned when certain thresholds are exceeded.


Site quotas should be created to help with the governance of the actual size of the content databases. First a site quota template should be created that specifies that when a site collection reaches a specific size the site collection administrator will be notified.

Then when a new site collection is created the quota template can be selected.

It is possible to change the site quota template after a site collection has been created, and it is possible to assign unique quota size values to a specific site collection. As well, locks can be placed on the site collection to block new content from being added. This allows an administrator to take action before the site collection reaches the maximum size.


As well, SharePoint Central Administration provides workflows that manage content that is not being actively used. This is turned on at the site collection level and provides a proactive way to delete web content that is no longer in use. When turned on, SharePoint will email the site collection administrator asking to confirm that the content is still being used. If the confirmations go unanswered by the site collection administrator, the site collection will be removed. This functionality would be used in collaborative environments where team, project, and workspace sites are heavily used and do not need to be retained (i.e., never to be used with a records center).

6) Managing Growth
As a site collection grows to the point where performance suffers, administrators can break content up using the stsadm operations to move the fastest-growing sites into their own site collections. To support the possible partitioning of content across site collections, the topology of content must be designed so that it is agile. This can be as simple as capturing guidance on what should be considered a top-level site such that it can be moved easily.

Note that site collections, even those defined within the same web application, are not linked to each other by the out-of-the-box portal template. The following screenshot shows a screen where an administrator can create a new site collection. In the Web Site Address section there are two options by default. The first is a "/", which signifies the site collection being created will be assigned to the root. The second, "/sites/", is a managed path. Managed paths in MOSS are used to provide a location where site collections are hosted. This is a different definition from what they were in previous versions of SharePoint.

There are two types of managed paths, explicit and wildcard inclusions, which can be used to create a hierarchy of site collections. It would be possible to have a top-level site collection, then create a "/dept/" wildcard managed path under which site collections like "/dept/hr/", "/dept/fin", "/dept/it", etc. are placed. However, the out-of-the-box site templates do not have the intelligence built in to know, first, that the site collection is within a managed path and, second, that there is possibly a parent site collection. The left navigation, breadcrumb, top right parent link, etc. will not reference the parent site collection, and the user will not be able to navigate up or between site collections.

A workaround for this issue would be to customize the site collections to have a link to the parent site. There is a MOSS configuration to assign a portal site address to a site collection, but the link is not managed; if the address were to change, it would have to be manually updated everywhere. A better option would be to create a custom site definition and make it configurable to point to a parent site collection. Yet another option would be to create a custom Feature which, when turned on, makes a site collection aware that it has a parent and provides links to navigate back to it.

Note that enterprise search supports crawling content across site collections, so there are no issues other than managing search. It is not trivial to re-index content at the drop of a hat, so understanding how users will be searching across site collections is very important.

Repartitioning a Content Database

If the situation occurs where a content database has grown to the point where it has reached its quota limit, a new content database can be created and site collections can be moved to it. [9] The following steps can be used:

  • Lock down the site collection that will be moved so that users cannot update it.
  • Make sure that no one else creates any new site collections while this is being executed if the new content database is intended to be fully dedicated to the site collection being moved.
  • Back up the site collection using the stsadm tool (e.g., stsadm -o backup -url http://server/sites/sitename -filename sitename.dat).
  • Then go to Central Administration and create a new content database in the web application where the site collection currently resides. Make sure to set the maximum number of sites to a very high number, like 50,000, way more than the others, so the restore will target the new database.
  • Next, back up the content databases just in case any issues arise. This can be done through SQL Server.
  • Delete the site collection through Central Administration.
  • Restore the site collection using stsadm (e.g., stsadm -o restore -url http://server/sites/sitename -filename sitename.dat) and it will be placed in the new content database.
  • In Central Administration, set the maximum number of site collections to 1 for the new content database if it is to be dedicated to a single site collection.

Assign a Site Collection to a Content Database

It is possible to designate a specific content database for a new site collection. The following steps can be done to accomplish this:

  • Use the stsadm command-line tool to create a site collection in a specific database (the createsiteinnewdb operation creates the site collection in a new, named content database).
  • Dedicate a database to a single site collection by applying the following database capacity settings: Number of sites before a warning event is generated = 1 and Maximum number of sites that can be created in this database = 1
  • Add a group of site collections to a dedicated database by performing the following steps:
    • Within the Web application, create the database and set the database status to Ready.
    • Set the status of all other databases to Offline. While content databases are offline, new site collections cannot be created. However, existing site collections in offline databases are still accessible for both read and write operations.
    • Create the site collections. They are automatically added to the database.
    • Set the status of all other databases back to Ready.

Note that tools such as AvePoint DocAve provide comprehensive functionality to allow administrators to back up and restore content at a granular level. I highly recommend investing in these sorts of tools.

7) Monitoring Best Practices


Content in site collections needs to be continually reviewed by designated owners. Typically a "librarian" role needs to be created where this person's job is to review the content and ensure it is well organized. If a succinct plan for maintaining a site or sites is developed from the beginning, they will stay current, useful, and usable.


"Plan for site maintenance and management" by Microsoft provides a great set questions that should be asked on a regular basis: [13]

  • Talk with users in their sites and perform surveys to determine what users need.
  • Make sure logging and reporting are used to determine which sites are used the most and map that back to user survey results. This will help you determine if content should be preemptively partitioned.
  • Archive or delete obsolete content, sites, etc.
  • Regularly review user site permissions and remove permissions for users who no longer need access. Workflows can be created to manage this (K2 BlackPearl).
  • Have a plan for regular backups of site content, determine how often backups will be made, and the process for restoring content.
  • Ensure that site collections are being used correctly and content is being grouped into the correct site collection. From what I have seen, this is extremely important, and having librarians whose job is to manage this content makes a big difference.
  • Creating too many sub-sites within a site collection will affect performance and usability. Microsoft recommends a limit of 2,000 sub-sites in any site collection. If you have more than that, you should put in place policies, procedures, and a topology design to move this content and organize it better.
  • It is highly recommended that an extremely active site be moved into its own site collection.

8) Recommendations

The following are some final recommendations and conclusions:

  • Utilize DocAve - DocAve has a lot of functionality that better allows administrators to manage content. Because of this, the logical architecture of site collections can be organized to support larger content databases, making the implementation less complex to manage.
  • Adaptable Architecture - Create an adaptable environment that can be scaled both logically and physically. The content placed in site collections should be organized so that if it grows over a threshold it can be partitioned across multiple content databases and site collections.
  • Use Content Quotas - A good practice is to set a maximum site collection quota template that can be supported using SharePoint administration tools (stsadm). With the usage of a more robust backup and restore tool like DocAve, the point of setting a 20 GB quota is less about supporting the stsadm tool and more about knowing which content databases are growing most quickly.
  • Aggregation Portal - Create an aggregation portal solution if it is known that certain areas will grow exponentially.
  • Dedicated Content Databases - Create dedicated site collections and content databases for very active sites.
  • Agile Environment - Design the topology of site collections such that if size becomes an issue, based on predetermined thresholds, content can be moved without adversely affecting its organization.
  • Audit Content Regularly - Perform regular reviews of site content using standardized criteria. Be sure to use the reports that are available in Central Administration.
  • Site Creation Governance with K2 BlackPearl - Limit site collection creation to a unique administration group to provide oversight and management of the environment's content. Using workflow products like K2 BlackPearl can assist with creating automated processes for this. Make sure to hide site collection templates, making them unavailable for self-service site creation. It is also suggested that standard site templates be designed or created.
  • Content Lifecycle Plan - Establish a life-cycle management plan for sites, documents, and content. Usage reporting can be used to help determine the life cycle, which may not be immediately apparent. Life-cycle management can be tuned according to individual growth, retention, and data management planning. Again, automated workflows using K2 BlackPearl can be used to assist with this management.

9) Resources


  1. SharePoint Content Database Restrictions... There are none!
  2. How large for a single SharePoint content database?
  3. Cluster or mirror?
  4. Site Collection Sizing Considerations
  5. MOSS 2007 - Import/Export Lessons Learnt
  6. Site Quotas
  7. Create quota templates (Office SharePoint Server)
  8. Logical architecture model: Corporate deployment
  9. Plan enterprise content storage
  10. Tips on Site Collection Sizing
  11. MOSS 2007 - Import/Export Lessons Learnt
  12. Export/Import traps for young players
  13. Plan for site maintenance and management (Office SharePoint Server)
  14. Can SharePoint Backup Harm Your Business
  15. Microsoft Office SharePoint Server (MOSS) 2007 Backup Strategies