Showing posts with label Content Database. Show all posts
Showing posts with label Content Database. Show all posts

Wednesday, September 26, 2012

SharePoint 2013 Technical Diagrams Notes

Introduction

SharePoint 2013 Preview Technical Diagrams are now available here - http://technet.microsoft.com/en-us/library/cc263199(v=office.15).aspx

Ever since SharePoint 2007 started publishing these technical diagrams, I have recommended that architects become very familiar with them. I always start here when trying to understand a new major product release for SharePoint. If you search my blog, you will see that I have directly referenced these diagrams when building SharePoint strategies for customers.

The following is a high-level review of the new architecture changes available with SharePoint 2013.

Corporate Portal Diagrams

I reviewed the two new Corporate Portal Diagrams for SharePoint 2013. From a logical architecture perspective, these diagrams do not have any major changes from the SharePoint 2010 versions. These diagrams accurately capture how organizations should build web applications, site collections, application pools, SharePoint services, etc. to support major business initiatives. The diagrams are still a must read for people who are new or who need a refresher to understand how they should be segregating content and business functions across SharePoint.

Extranet Diagram

The new diagram for SharePoint 2013 extranet architecture closely resembles the corporate portal diagrams, however it is not very revealing on the type of information organizations need when making a decision on how to deploy an extranet. Looking back at the SharePoint 2010 Extranet Topologies diagrams (http://technet.microsoft.com/en-us/library/cc263199.aspx), I find that diagram to be much more helpful and the information contained here still holds true with SharePoint 2013. I would recommend reviewing both of them together.

Services in SharePoint 2013 Diagram

I admit this has always been one of my more favorite diagrams. When this was released in SharePoint 2010, it captured a fundamental change in how SharePoint services are configured and delivered. This new architecture was created to support Microsoft’s ability to deliver SharePoint Online as a SaaS solution.

I reviewed this diagram and nothing has significantly changed in regards to sharing services across farms, the logical architecture of services, service groups and service deployment.

In the services table there are a few new services that have been added.

  • Access Services – Do not be confused by this. Yes there was Access Services in SharePoint 2010. At this early point, I know that that Access Services for SharePoint 2013 have been changed to be more focused on utilize the new App Architecture. As such, Access Services for SharePoint 2013 is pretty different. Access Services solutions created in SharePoint 2010 will still be supported moving forward, however they will run in a different service.
  • App Management Services - This is a new service that will be used specifically for supporting the new internal catalog or the public SharePoint store. Remember that in SharePoint 2013, everything is an app; EVERYTHING. Even everyday SharePoint lists are now called an app. Once you get over the name change, you will find out it makes complete sense and Microsoft has just aligned what is does with how business users talk about technology.
  • Machine Translation Service – This is a new one and as of right now, I do not have much information on the purpose of this service other than the description which says it performs automated machine translation.
  • Work Management Services – This service provides task aggregation across management systems including SharePoint, Exchange and Project Server. This is huge from a user perspective. One single place to see all of your tasks. No more building content query web parts to find all tasks; this effectively does this plus goes outside the SharePoint boundary to find more tasks. This is a very exciting service.
  • Office Web App Services – Is called out in here as a service that is no longer running inside of SharePoint server. Why? Microsoft strategy is to provide Office Web App Service to other enterprise application than just SharePoint and it strategically made sense to move it out of SharePoint.

In the rest of this diagram there are architecture diagrams for how to architect service groups across farms, none of which have changed from SharePoint 2010. If you are not familiar with this stuff, this is a must read and I recommend reading my old posting on it here.

Mobile Architecture Diagram

There is a brand new mobile architecture diagram provided and obviously this is drive by Microsoft’s focus on being a “services and devices” company. This is a pretty simple architecture that basically describes some things you need to think about if you are going to support mobile to your users and discusses some of the mobile capabilities. This can serve as a launch point for you to begin to dive deeper into how you will support mobile for your organization. The following are some high-level observations I had when reading this the first time:

Extranet – If you are not thinking extranet, you need to so you mobile users can access content when they are on a mobile device. They have some diagrams which will get your started thinking about it and additionally how you can use Unified Access Gateway (UAG) as a reverse proxy to help with that.

Mobile Device Management (MDM) – One interesting thing brought up in this diagram is how do you manage mobile devices? If you need something simple, you can leverage Exchange ActiveSync for remote device wipe, password enforcement, etc. If you are looking for application level MDM there are additional solutions out in the marketplace today that provide even more capabilities.

Application Architecture – The new SharePoint 2013 mobile architecture is introduced. They break it down basically into two logical layers: mobile and SharePoint. Some key points are:

  • Automatic Mobile Browser Redirection – Is a new capability that can be used to optimize the mobile experience based on the connecting device. This Feature must be active on the site and will be activated by default on numerous site templates. First there is the Classic View which is used to provide backwards capability to mobile devices and will have a SharePoint 2010 mobile browser experience. Then there is the Contemporary View which is geared to support HTML5 browsers. The Contemporary View is several enhanced features for navigation of SharePoint sites. Additionally, Full Site View is available so the SharePoint site page can be viewed as if it were on a desktop browser or a tablet device.
  • Office Hub for Windows Phone – Is an application for Windows phone devices that provides enhanced capabilities to access SharePoint content from multiple places in one spot. It also leverages mobile Office.
  • Location – There is a new geo-location field type that is available in a SharePoint List. This can make a list location aware to capture latitude and longitude which can be used with map applications. For instance, if a user enters in data on their mobile device, it will capture where it was done from and then can be displayed on a map. Here is some more information about this - http://technet.microsoft.com/en-us/library/fp161355(v=office.15).aspx
  • Push Notifications – There is a new capability to allow notifications to be sent from a SharePoint site to registered applications running on a mobile device. The nice thing about this is that Windows Phone Apps can receive notifications without having to poll. Here is some additional reading on the topic - http://msdn.microsoft.com/en-us/library/jj163784(v=office.15).aspx
  • Device Channels – This is a really important new capability as device channels allow you to deliver a publishing site geared specifically to support different types of remote devices. Basically the site can mapped to multiple master pages and style sheets and even control what content you want to make available to specific devices. Here is an overview on the new device channels - http://technet.microsoft.com/en-us/library/fp161351(v=office.15).aspx
  • Office Web Apps – As mentioned earlier in this posting, Office Web Apps is now a separate standalone server which does not run inside of the SharePoint boundary. Office Web Apps has been improved a lot to support mobile devices. There are Word, Excel and PowerPoint Mobile Viewers.

SharePoint 2013 Upgrade Process Diagrams

There are two upgrade diagrams that have been provided. Here are some high points I walked away with:

  • Must be on SharePoint 2010 – To upgrade, you must be on SharePoint 2010 technologies. This means if you are on SharePoint 2003, 2007 or 2010, you will need to upgrade to the appropriate version to get to SharePoint 2013. There are Microsoft migration partners that have solutions to assist with this. I saw many times, this is the big value proposition for using SharePoint Online as this is handled for customers.
  • Database Attach Upgrade – Is the only supported method for upgrading. There is no more “in-place” upgrade option. Frankly that is fine because most customers always went down a database-attach upgrade.
  • Preparation – much of the preparation activities that we have discussed in the past with SharePoint 2010 hold true with SharePoint 2013. There is a bunch of information you are responsible for gathering.
  • Manual Configuration Settings – In the preparation phase is recommended to get a understanding of all the custom configurations that you may have done because not all of them are going to be migrated. This is because not all databases are upgraded. So many custom configurations in central admin such as alternate access methods, time job tweaks, managed paths, incoming/outgoing email settings, certificates, etc. will need to be documented and reconfigured in the new farm.
  • Databases That Can Be Upgraded – There is a set of databases that can be upgraded. They are Content, BDC, Managed Metadata, PerformancePoint, Secure Store, Search and User Profile databases.
  • Customizations – This is an important task that needs to be completed. I have seen many cases where good software organizations have not implemented a strong configuration management process and the result is an organization may not know about all the customize code that may be implemented. There are numerous ways to find all of them by running PowerShell commands, doing system directory diffs, checking web.config, etc.
  • Upgrade Health Checks – There are some new features that are available to site collection administrators that will show you a health check of a site collection before actually upgrading the site collection.
  • Evaluation Site Collection – Site Collection Administrators also have the ability to request the site collection be copied into a new site collection to evaluate how the upgrade will affect any customizations they may have. This is helpful so you can remediate issues before you actually perform the upgrade. This is also nice because your site collection will run in a SharePoint 2010 mode until you are ready to actually upgrade it.
  • Testing – Just like for SharePoint 2010, the best way to prepare for a migration is to build up your new SP 2013 farm and then multiple practices runs of that upgrade into the new production environment. An entire process is defined in one of the diagrams and is a great place to start.

SharePoint 2013 Search Diagrams

If you are a reader of my blog, I wrote some long postings about the search architecture for both SharePoint 2010 Search (here) and FAST for SharePoint 2010 (here). I am not going to do a deep dive into all these search components and roles because they are basically covered. As many people now know, the FAST search engine is now the core search engine for SharePoint. It will just be referred to as SharePoint Search. Now you will be able to leverage a very powerful search engine out of the box. However many of the advanced enterprise features of search will only be available in the SharePoint Enterprise addition. I am also really excited about this for SharePoint Online because it can leverage FAST too. SharePoint Online will not be able to do Enterprise Search of line of business systems but a Search Farm (which is FAST underneath the hood) can be configured on premise and SharePoint Online can invoke that search and provide the search results in the cloud; pretty exciting.

I highly recommend taking the time to review both of these diagrams. It explains how each of the components interacts with each other. Additionally there is a diagram the goes into how to scale the server farms for the amount of content you will need to index. There is a great, new table in there that shows you how scaling will work. To be honest, the folks who are really serious about search will say it is an art and a table does not always communicate how you will do it. It always comes down to how many items, the types of data sources, custom transformations, query latency, index latency, etc.

SharePoint 2013 App Overview Diagram

This is an area I plan to do a lot more exploration of this coming year on and writing on this blog. Why? This is something we have been waiting for a long time with SharePoint development. There are several ways to look and this. SharePoint Features which we have been writing for years are Apps. This is name change to better communicate our technology to the end users who have to use SharePoint. However the new SharePoint App architecture is way more than that.

I have seen so many things over the years.

  • I think one of the biggest challenges people would run into is developing great SharePoint solutions only to find out they incorporated some dependency they should not have, they wrote some high-end code that should not be running in the SharePoint layer, they cannot leverage their solution outside the SharePoint boundary, etc. We want to resolve those problems by helping developers to deploy solutions in way that will keep their SharePoint environment nimble.
  • Plus we want to provide third-party vendors quicker access to customers. We want to help customers to quickly acquire third-party solutions.
  • Additionally we want to allow customers to leverage commodity based SharePoint Online. As you may know SharePoint Online has restrictions on high-end custom development and if that code where to run in another location, while be highly integrated with SharePoint Online, that is a huge win.

I will thing of many more reason this year on why this is so great J

Now we can achieve this through the new SharePoint App architecture. The old SharePoint Solution architecture where you create a WSP is still around. Nothing has changed there. This is used to create deployment packages and in many cases is used to deploy code that requires full trust. SharePoint Solution packages will continue to be used by third-party vendors or developed internally with such tools as Visual Studio. You can still create Sandbox Solutions which run in a more secure runtime and can be deployed in SharePoint Online.

Now the new Apps framework for SharePoint 2013 is a packaged up in a file called .APP. It is composed of many of the same types of files, AppManifest, embedded Solution.wsp, etc. Once an app is loaded into SharePoint, it is accessible through the App Catalog in SharePoint. This App Catalog can be controlled at an organizational level.

Remember the big point with Apps is, that the custom solution you are writing may or may not actually run in SharePoint. Full trust code is not supported. Your custom solution code itself may run in a different SharePoint farm, on an IIS server as ASP.net pages, ASP.net pages running in Windows Azure, etc. So how does SharePoint access these solutions running outside the SharePoint context? In simple terms we have an IFrame (with some extensions) that external solution is available through. OAuth provides the secure connection for access SharePoint objects from a remote location. We will additionally use a new extended and robust event model and remote client SharePoint library to write integrated, remote code.

Why is this so great? We are going to ensure that custom applications and solutions that are being developed with SharePoint are isolated. No more writing a bunch code and services that should not be running in SharePoint servers. It is great that you can do whatever you want with SharePoint, however this will drive solution management.

So you may be asking where does this get deployed? There are many different options for hosting.

  • SharePoint Hosted – This means the app and all the resources run in SharePoint. Remember you server side is not supported however you can write applications with SharePoint’s JavaScript libraries and such.
  • Windows Azure Autohosted – This is a model that is only supported in SharePoint Online. In This case you can write an App package that will have code for Azure and SQL Azure embedded into it. When the application is deployed, the azure solution will be automatically deployed for you. You do not have to go to Azure and set anything up at all; it is all handled for you behind the scenes.
  • Provider-Hosted – This is the third model where custom code and solutions are hosted in a separate server in your organization, hosted in Azure, hosted in different SharePoint servers, etc.

Once an App package is installed, it can be managed and monitored through the catalog. End users have the ability to select an app to run in their sites (much in the same way as turning on a Feature). If and when an app is updated, the user can decide how they want to upgrade to the new app.

Again I really plan to go much deeper on this in my blog but for right now, these are just some introduction notes and ramblings on how excited I am about this new capability J

Back Up and Recovery Diagram

There is a new diagram that goes into the details of doing your own back-up and recovery for SharePoint 2013. I know many people have become accustomed to using third-party vendors for supporting these operations and I still believe these vendors will continue to provide features above and beyond what is out of the box. However, if you are a do it yourself sort of person, this is a great diagram to review.

Not much has changed in regards to developing back-up procedures for both the SQL Servers and the SharePoint Servers. There are tons of scenarios covered in here, and I recommend reading this if it is important to you.

SharePoint 2013 Database Diagram

Finally the database server diagram has been updated. This is a really really really important diagram to review if you are managing on-premise servers. It goes over all the SharePoint databases, plus provide sizing and scaling guidance. Great information.

Tuesday, March 13, 2012

SQL Server 2012 Brings New Features and Capabilities for SharePoint 2010

Introduction
If you have SharePoint customer or architect and you really should be looking at the new SQL 2012 release. As we all know SharePoint success is highly contingent on SQL Server. You need a strong SQL Server deployment to ensure there is good performance, high availability, back up/recovery, etc. Additionally Microsoft delivers their Business Intelligence (BI) stack through SharePoint and there have been several new features and capabilities added there as well.
For SQL 2012 there are three areas where there have been major capability additions:
  1. Mission Critical Confidence – This is the capability to more easily deliver high-availability with lower total cost of ownership.
  2. Breakthrough Insight – New and expanded capabilities for Business Intelligence.
  3. Cloud on Your Terms – Additional capabilities to create SQL databases in either the private or public cloud (Office 365).
In this blog I am going to cover some of the new SQL 2010 capabilities specifically focusing on how they can improve your SharePoint 2010 implementation. Please note that there are a lot of new capabilities for SQL Server 2012 which I have not covered such as data warehousing, resource management, full text searching, auditing, Big Data (Hadoop), etc. Here is a good reference to quickly spin up on all the new capabilities (“What's New in SQL Server 2012 Whitepaper” located here).
Mission Critical Confidence
AlwaysOn
First and foremost is the new high-availability solution in SQL Server 2012 called AlwaysOn. One of the first and foremost challenges that organizations face with deploying SharePoint 2010 is providing a solution architecture that will meet the SLA’s of their business users. Business processes, technical processes and Governance need to be put in place to ensure that SharePoint 2010 will be up as much as possible.
In the past with SQL Server 2008 R2, you employ such solutions as clustering, mirroring, log shipping and replication (my previous blog on this topic). However this could require a lot of planning, configuration, and management. With SQL 2012 AlwaysOn, new configuration wizards and tuning tools are now provided that makes set-up and configuration of High-Availability extremely simple.
The concept of Availability Groups have been added which specifically makes configuration of Database Mirroring easier. Availability Groups are a logical of databases that failover together. Through the configuration wizard, you can determine if you need such things as automatic or manual failover, set-up of primary and multiple secondary instances, synchronous or asynchronous data movement, etc.
Availability Groups remove the need for shared disk storage (SAN or NAS) for deployment of a failover cluster instance. Note that AlwaysOn Failover Cluster Instances support multiple-site clustering across subnets which subsequently enables cross datacenter failover.
This feature is very useful when setting up your SharePoint 2010 (or 2007) farm because High Availability is one of the most paramount tasks that are needed when setting up a mission critical SharePoint environment.
Recovery Advisor
Database Recovery Advisor provides many new features and capability for the support of back-up and restore of databases through SQL Server Management Studio. The new Recovery Advisor streamlines the back-up process. One such solution is a visual timeline that provides the backup history and all the available points in which you can restore from.
There is also new capability called Split File Backup which allows you to split a backup into multiple files. This allows for quicker backup and restores because they can be written and restored across disks running in parallel.
This is very helpful for improving the amount of time to work with large SharePoint 2010 (or 2007) Content Databases that have grown significantly.
Breakthrough Insight
As you probably already know SharePoint 2010 is where Microsoft’s Business Intelligence (BI) stack is delivered. This is a combination of solutions such as Excel Services, PerformancePoint, Visio Services, Chart Web Parts (Dundas), PowerPivot, and SQL Reporting Services (SSRS). With the release of SQL 2012 a new solution called Power View is now provided, PowerPivot can now be done on SharePoint 2010 server side and there is significantly enhanced integration with SSRS.
Power View
Power View is a new highly interactive data exploration and reporting tool that allows business users to visually explore data, in an ad-hoc fashion. End users can create the reports/dashboards very quickly, create shapes/graphs with clicks, create animation, highlight capabilities based on rules, drill down relationships and performs very well with large datasets. The design environment is very similar to Office. It can be published through SharePoint 2010 as shown in the diagram below.image
Note that Power View requires SQL Reporting Services (SSRS) to be integrated with SharePoint 2010 and there be an instance of SQL Server 2012 Analysis Services (SSAS) or PowerPivot be available.
PowerPivot
Up to this point PowerPivot was a solution that was available through Excel client only via an Add-in. This is a really powerful, end user friendly capability that allows for data analytics. There are actually several new capabilities that have been introduced that I recommend you read up on (in references below).
Specifically to SharePoint 2010, PowerPivot for SharePoint is available as an add-in and can run server side in conjunction with Excel Services. This now allows end users the ability to publish PowerPivot reports through the browser allowing end users easy access to this data.
image
As well, in SharePoint 2010 Central Administration there is a new PowerPivot Management Dashboard that provides several reports on performance of reports.
image
SQL Reporting Services (SSRS)
Personally the enhancements to SSRS are very exciting to me as I have had to do configuration of SSRS with SharePoint 2007 and 2010 in the past in production environments. If you have done it before, you may recall it was tedious task of configuration the SSRS Configuration Wizard and getting Kerberos Authentication set-up correctly. Now with SQL Server 2012, configuration of SSRS with SharePoint 2010 is completely done through Central Administration. Additionally there is a new service in SharePoint 2010 that runs SSRS, it supports WCF/Claims authentication, integrated ULS logging, built in load balancing across SharePoint servers, report performance improvements and there are PowerShell commands for management.
image
Additionally there is a SSRS alerting capability that allows end users to subscribe to alerts that are associated to SSRS report. Users can create conditions for any report, and when they are met a notification will be sent to them.
image
One more last addition change is you may know that SSRS reports can generate Word or Excel Documents. Up to now, the file formats generated were either .doc or .xls which means they are not Open XML document renditions. With SQL Server 2012 and SSRS now .docx and .xlsx file types will now be generated.
Cloud on Your Terms
With SQL Server 2012, there are a bunch of new capabilities to support even tighter integration with SQL Azure. You will have a unified database development experience between SQL 2012 on-prem and SQL Server. The reason why I bring this up is because it is very common that when creating a solution inside SharePoint, you will have complex data management requirements that can be better supported with a SQL Server database instead of a SharePoint list.
Now we have the ability to more quickly and efficiently create custom database in SQL Azure which can be utilized with SharePoint 2010. Plus this is really good for working with Office 365 SharePoint Online because we can make a cloud to cloud connection to work with advanced data structures.
References

Tuesday, November 1, 2011

SharePoint 2010 Max File Size

I have as well been receiving an inordinate amount of questions associated to the max file size the SharePoint 2010 supports. It is been clearly stated for some time that the max file size supported by SharePoint 2010 is 2 GB - http://technet.microsoft.com/en-us/library/cc262787.aspx#ListLibrary. A lot of customers are asking why. I am not on product team but usually couch it as SharePoint is a web based ECM platform. Two GB uploads and downloads is not very effective over the internet in general. Yes, internet speeds are getting better but still, that is a big file. Plus, if you are not using Remote BLOB storage, that is a lot of data to put into SQL Server.

If you have files that are larger than 2 GB you should look into storing the content externally and then utilize enterprise search and third party web parts to make the large files accessible through SharePoint 2010.

SharePoint 2010 Content Database Sizing

A lot of people already know this but on July 14, 2011 it was announced that Content Database Sizing has been updated http://technet.microsoft.com/en-us/library/cc262787.aspx#ContentDB.

There is still recommendation to keep content databases at 200GB. Why? This recommendation is still a good recommendation because if you need to back-up and restore a database very quickly, you do not have move around large back-up files. Plus if you have an Information Architecture that drives content to specific site collections (with dedicated content databases), you will be much more agile to requirement changes, upgrades, etc. One big massive content database is an indication of poor planning and governance.

The first new recommendations in this article is that 4TB of data can be stored in a content database. There are some parameters around this recommendation that you should read.

The second new recommendation is that there is no explicit recommendation on sizing for document archiving scenarios. However there are some very specific recommendations made in here to support that scenario – so review http://technet.microsoft.com/en-us/library/cc262787.aspx#ContentDB. This is really important for Records Management solutions. On this specific point, it would also be good to review the “Extremely large-scale content archive” section in http://technet.microsoft.com/en-us/library/cc263028.aspx. When going down this path you will need to have Remote Blob Storage and FAST to support this solution architecture.

Finally SQL Server tuning is fundamental to your success for managing large content databases. Here is a blog that I wrote on the topic - http://www.astaticstate.com/2010/12/sharepoint-2010-high-availability-with.html

Saturday, January 24, 2009

Content Database Restore File Not Found

We recently experienced a rather weird experience when moving some SharePoint content databases. We are not using stsadmn. We have several environments, all the same, and we just wanted to move the content DB from one environment to another. We have done this several times with no error.

This time around when we did it, we were getting File Not Found errors when we try to go to the site. We went into Central Admin and the Content Database was listed on the Manage Content Databases page, however if you went to view the Site Collection list no information was listed. On the Site Collection page you should see things such as a URL, Title, some other stuff and a Database Name. All of this was blank. There were no exceptions in the Event Log on the box and no errors either in the SharePoint log.

We did some searching and could not find anything. Andy on my team found this posting. It had nothing to do with this error however we decided to give it a try. It does not actually delete the databases, it just removes the association between the MOSS site and the database.

Stsadm -o deletecontentdb -url [Site URL] -databasename [database name]

Stsadm -o addcontentdb -url [Site URL] -databasename [database name]

Once we did this, everything started running again.

References:

Friday, January 18, 2008

Demystify MOSS Content Database Sizing

1) Introduction

A common question that has arisen how big content databases should be and what is the governance around the management of those databases. Hopefully this will demystify a lot around content database sizing for MOSS. Much of this information I have gathered in trying to gain my own understanding of the topic. I will discuss considerations, assumptions, various best practices, how to use central administration functionality, managing growth, recommendations, etc. One conclusion is not all of this can be done out of the box and tools like AvePoint DocAve and K2 BlackPearl can be used to create governance around an enterprise deployment of MOSS.

2) Content Database Sizing
MOSS has no restrictions on the size of content databases. Many available resources available state that a content database should range from 30 to 50GB but this guidance originated from WSS v2/SPS 2003 in regards to quick recovery of documents and site collections. If configured correctly, WSS v3 and MOSS 2007 implementations can have content databases that can handle terabytes of data. [1] Still the ability to quickly backup and restore content database is important. Most state that a best practice is keep content databases capped at 20 GBs of data to support backup/restore requirements. Tools like AvePoint can be purchased that can alleviate this issue however both the logical and physical architecture of the SharePoint implementation should be evaluated.

When evaluating the size of a SharePoint content database things like high availability, disaster recovery and backup methods must be understood. [2] The size of the database should always be discussed with database administrators. Knowing their preferences and what their environment can support is critical to understanding how many and acceptable sizes for the content databases that can be manage moving forward.

It is important to know how the database administrators strategy for managing back-ups (clustering, database mirroring, log shipping, etc.). For instance if mirroring is used (new to SQL 2005) a higher availability could be achieved but there is a higher operational cost and learning experience associated to it. Using SQL Clustering is the most common solution offering greater processing power and scalability but uses shared storage. A log shipping strategy could be used with either clustering or mirroring but things such as latency between publishing and subscribing servers should be considered. [3] This may or may not affect your end decisions but is value information to know.

In the end there are two architectural questions that need to be answered are:

  • How large to make the content databases?
  • How many content database instances should be created?

3) Types of High Capacity MOSS Sites

MOSS supports high-capacity document storage which can allow for the storage of a 5 million documents with a document library. Performance of a site containing a large document library can degrade depending on how the content is used. Microsoft has provided some guidance to help support the design of large-scale content management solutions. Design decisions and governance made around the capacities of site collections, sites, and libraries in MOSS must be taken into account not only for the physical storage constraints but also the content usage and viewing patterns of the users.

Microsoft classifies three types of large scale content management databases. [9]

  • Large Scale Authoring Environment: Contains a library within which users actively edit 50,000 or more documents across 500 or more folders. There is heavy collaboration on documents, versioning, workflows, custom content types, etc. The database could be as large as 150GB of data. [9]
  • Large Scale Content Archive: A large-scale archive is a repository where users view or upload documents. Little to no collaboration occurs on the site. These sites can be classified as a knowledge base or records management solution. A knowledge base site would typically have a single version of most documents, containing around1,000,000 or more documents. The content would be stored in a single database as large as 400 GB. On the other hand a records center site template features and functionality that are used to manage the retention and disposition of company records. Records are can be described as evidence of activities or transactions performed by the organization and that must be retained for some time period for compliance or regulatory requirements. [9]
  • Extremely Large Scale Content Archive: Microsoft describes this as an archive might contain up to 10,000,000 documents distributed across 5,000 or more folders with a database larger than 3TB. There could be 50,000 or more users primarily browsing content by searching. [9]
Personally I have not run into any scenario yet where MOSS has had to go to that scale.

4) Database Sizing Considerations
For large content databases understanding the topology of the website and the organization of the content will help drive how many site collections and content databases are needed. A single large content database can provide a better overall picture of all content and the ability to aggregate data. They typically allow for easier backup/restore policies as there is only a single content database. Putting content that is highly related to each other into the same site collection allows administrators to focus on a single content database. [4]

The following are some performance recommendations that have been made by Microsoft. Before going down the path of using a few large content databases there are a few things that should be considered. First is the potential performance implication if the site collection will host a large amount of content (either at the root level or in the sub webs). Typically, performance will become progressively worsened as the number of total objects exceeds 2,000 (sites, libraries, documents, etc.). [4] The following are recommendations provided by Microsoft for improving performance at each level of storage when high volumes of content are being stored. [9]
  • Site Collection: 2,000 subsites of any site is the recommended limit.
  • Site: 2,000 libraries and lists is the recommended limit.
  • Library: 10,000,000 documents is the recommended limit. 2,000 items per view is the recommended limit.
  • Folder: 2,000 items per folder is the recommended limit.
Delete content is can become problematic as the size of the content database grows significantly. It is possible that SQL blocking can occur when a large number of items are deleted because the high number of transactions occurring on the SQL database server during a delete operation. Additionally, sites that will have high amounts of concurrent users actively collaborating can also have negative implications as SQL transaction logs can grow rapidly creating challenges for backup/recovery. [4]

If a number of site collections can be accommodated with multiple content databases to manage them some benefits can be realized. For instance content can be managed by business line and usage of analysis reporting can be more targeted which can drive governance, manageability and metrics which can be used to gain insight into the business. Even though content would be broken apart over site collections aggregation of data can still be achieved by creating an aggregation portal. [4]

An aggregation portal would be a top level site collection that has the links to all of the site collections below it. There is currently no out of the box way to elegantly link site collections together. The aggregation portal will have some custom code built to make it aware that there are site collections below it and provide navigation functionality to them. The aggregation portal will more than likely require views of data across all site collections. This can be accomplished by using custom web parts written to retrieve data this data and present it. Another option for an aggregation portal is a records center where content is published it.

Joel Oleson, a SharePoint MVP, specifically notes that "in WSS 3.0/MOSS 2007 I recommend you pick a maximum site collection quota of no larger than 15GB, excluding the top level site collection and to move sites to dedicated databases where they need to grow beyond this. SharePoint content databases can scale to hundreds of GB and SharePoint farms can handle TBs of content. These recommendations are based on my experience with MS IT and in conversations with product support, and your experience may vary." [10]

This statement is consistent with other colleagues that I work with that have been confronted with this challenge.

5) Facilities to Manage Content Database Sizing
Content databases are databases used to store all MOSS content. By default a single content database will be created for each web application. As new site collections are added to a web application the content for that site collection will be stored in the database assigned to the parent web app. Central Administration site can be used to add new databases.

Within central administration there are is some basic information indicating the database location, the account used to make a connection to the database and capacity setting which control how many sites can be added to the content database. Using the capacity setting can help control the size of the databases from a governance perspective. If the site capacity is low, more content databases will be required to provision new sites in a site collection. Now this does not control the actual size of the database nor control the amount of content that is in a specific site. But if it can be planned on large "typical" sites within the site collection are, when certain "thresholds" are exceeded administrations can be warned.


Site quotas should be created to help with the governance of the actual size of the content databases. First a site quota template should be created that specifies that when a site collection reaches a specific size the site collection administrator will be notified.

Then when a new site collection is created the quota template can be selected.

It is possible to change the site quota template after a site collection has been created. It is possible to assign unique quota size values to specific site collection. As well, locks can be placed on the site collection to block new content from being added. This allows an administrator to take action before the site collection reaches the maximum size.


As well SharePoint Central Administration provides workflows that manage content that is not being actively used. This is turned on at the site collection level and provides a proactive way to delete web content that is no longer in use. When turned on SharePoint will email the site collection administrator if the content is still being used. If the confirmation remain un-challenged by the site collection administer the site collection will be removed. This functionality would be used in collaborative environment where team, project and workspace sites are heavily used and not needed to be retained (i.e. never to be used with a records center).

6) Managing Growth
As a site collection grows to a point where performance suffers administrators can break content up using the stadmn operations to manage the fastest growing sites into their own site collection. To support the possible partitioning of content across site collections the topology of content must be considered to it is agile. This can be as simple as having capturing guidance on should be considered a top level site such that it can be moved easily.

Note that site collections, even defined within the same web applications, are not linked to each other by using the out of the box portal template. The following screenshot shows a screen where an administrator can create a new site collection. In the Web Site Address section there are two options by default. The first is a "/" which signifies the site collection being created will be assigned to the root. The second is "/sites/" is a managed path. Managed paths in MOSS are used to provide a location to where site collections are hosted. This is a different definition from what they were in previous versions of SharePoint.

There are two types of managed paths, explicit and wildcard inclusion which can be used to create a hierarchy of site collections. It would be possible to have a top level site collection then create a "/dept/" wildcard managed path under which site collections like "/dept/hr/", "/dept/fin", "/dept/it", etc. are placed. However the out of the box site templates do not have the intelligence built into them to instinctively know that first the site collection is within a managed path and second there is possibly a parent site collection. The left navigation, breadcrumb, top right parent link, etc. will not reference the parent site collection and the user will not be able to navigate up or between site collections.

A work around for this issue would be to customize the site collections to have a link to the parent site. There is a MOSS configuration to assign an address to the site to site collection but the link is not managed. If the address were to change it would have to be manually updated everywhere. A better option would be to create a custom site definition and make it configurable to point to a parent site collection. Even another option would be to create a custom Feature which when turned on will make a site collection aware that is has parent and provide links to navigate back to it.

Note that enterprise search supports crawling content across site collections so there are no issues other than managing search. It is not trivial to just re-index content at drop of a hat so understanding how users will be searching across site collections is very important.

Repartitioning a Content Database

If the situation occurs when a content database has grown to the point where it has reached its quote limit, a new content database can be created and site collections can be moved to the new one. [9] The following steps can be used:

  • Lock down the site collection that will be moved so that no users can update the site collection.
  • Make sure that no one else will create any new site collections while this is being executed if the intent of the new content database is to be fully dedicated to the site collection to be moved.
  • Backup the site collection using the stsadm tool.
  • Then go to Central Administration and create a new content database in the web application where the site collection currently resides. Make sure to set the maximum number of sites to a very high number like 50,000 way more than the others.
  • Next backup the content databases just in case any issues could arise. This could be done through SQL Server.
  • Delete the site collection through Central Administration.
  • Restore the site collection using stsadm and it will be moved to the new site collection.
  • In Central Administration set the maximum number of site collections to 1 for the new content database if it is to be dedicated to a single site collection.

Assign a Site Collection to a Content Database

It is possible to designate a specific content database for a new site collection. The following steps can be done to accomplish this:

  • Use the Stsadm command-line tool to create a site collection in a specific database.
  • Dedicate a database to a single site collection by applying the following database capacity settings: Number of sites before a warning event is generated = 1 and Maximum number of sites that can be created in this database = 1
  • Add a group of site collections to a dedicated database by performing the following steps:
    • Within the Web application, create the database and set the database status to Ready.
    • Set the status of all other databases to Offline. While content databases are offline, new site collections cannot be created. However, existing site collections in offline databases are still accessible for both read and write operations.
    • Create the site collections. They are automatically added to the database.
    • Set the status of all other databases back to Ready.

Note that tools such as AvePoint DocAve provide comprehensive functionality to allow administrators to backup/restore content at a grandular level. I highly recommend in investing in these sort of tools.

7) Monitoring Best Practices


Content in site collections need to be continually reviewed by designated owners. Typically a "librarian" role needs to be created where this person's job is to review the content and ensure the content is well organized. If a succinct plan for maintaining site or sites is developed from the beginning they will stay current, useful, and usable.


"Plan for site maintenance and management" by Microsoft provides a great set questions that should be asked on a regular basis: [13]

  • Discuss with users in their sites and perform surveys to determine what users.
  • Make sure logging and reporting are used to determine which sites are used the most and map that back to user survey results. This will help you determine if content should be preemptively partitioned.
  • Archive or delete obsolete content, sites, etc.
  • Regularly review user site permissions and remove permissions to users whom no longer need access. Workflows can be created to manage this (K2 BlackPearl).
  • Have a plan for regular backups of site content, determine how often backups will be made, and the process for restoring content.
  • Ensure that site collections are being used correctly and content is being grouped into the correct site collection. From what I have seen this is extremely important and having librarians whose job is to manage this content is extremely important.
  • Creating too many sub-sites within a site collection will affect performance and usability. Microsoft recommends a limit of 2,000 sub-sites be in any site collection. If you have more than that, you should put in policies, procedures and a topology design to move this content and organize it better.
  • It is highly recommended that if a site collection will be extremely active just move it into its own site collection.

8) Recommendations

The following are some final recommendations and conclusions:

  • Utilize DocAve - The usage of DocAve has a lot of functionality to better allow administrators to manage content. Because of this the logical architecture site collections can be organized to support larger content databases making management less complex implementation.
  • Adaptable Architecture - Create an adaptable environment that can be scaled both logically and physically. The content placed in site collections should be organized that if content will grow over a threshold the content be partitioned across many content database and site collections.
  • Use Content Quotas - A good practice is to set a maximum site collection quota template that can be supported using SharePoint administration tools (stsadm). With the usage of a more robust back/up and restore tool like DovAve, the of setting quota to 20 GBs is less about supporting the stsadm tool versus knowing which content databases are growing more quickly.
  • Aggregate Portal - Create an aggregation portal solution if it is known that certain areas will grow exponentially.
  • Dedicated Content Databases - Create dedicated site collections and content databases for very active sites.
  • Agile Environment - Design the topology of site collections such that if the size becomes an issue a based predetermined thresholds it can be moved without adversely affecting organization of the content.
  • Audit Content Regularly - Perform regular reviews of site content using standardized criteria. Ensure to used the reports that are available in Central Administration.
  • Site Creation Governance with K2 BlackPearl - Limit site collection creation to a unique administration group to provide oversight and management of the environments content. Using workflow products like K2 BlackPearl can assist with creating automated processes for this. Make sure to hide site collection templates making them unavailable for self-service site creation. It is also suggested that standard site templates be designed or created.
  • Content Lifecycle Plan for Content - Establish a life-cycle management plan for sites, documents and content. Usage reporting can be used to help determine the life-cycle which may not be immediately apparent. Life cycle management can be tuned according to individual growth, retention, and data management planning. Again automated workflows using K2 BlackPearl can used assist with this management.

9) Resources


  1. SharePoint Content Database Restrictions... There are none!
  2. How large for a single SharePoint content database?
  3. Cluster or mirror?
  4. Site Collection Sizing Considerations
  5. MOSS 2007 - Import/Export Lessons Learnt
  6. Site Quotas
  7. Create quota templates (Office SharePoint Server)
  8. Logical architecture model: Corporate deployment
  9. Plan enterprise content storage
  10. Tips on Site Collection Sizing
  11. MOSS 2007 - Import/Export Lessons Learnt
  12. Export/Import traps for young players
  13. Plan for site maintenance and management (Office SharePoint Server)
  14. Can SharePoint Backup Harm Your Business
  15. Microsoft Office SharePoint Server (MOSS) 2007 Backup Strategies