Friday, January 29, 2010

SharePoint 2010 Beta VPC

I was talking with someone in the community and they pointed this resource out to me.

It is called the "2010 Information Worker Demonstration Virtual Machine (Beta)" and you can download it from here - http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=0c51819b-3d40-435c-a103-a5481fe0a0d2

Look at all the cool stuff that comes preconfigured and ready to go.

  • Windows Server 2008 SP2 Standard Edition x64, running as an Active Directory Domain Controller for the "CONTOSO.COM" domain with DNS and WINS
  • Microsoft SQL Server 2008 SP2 Enterprise Edition with Analysis, Notification, and Reporting Services
  • Microsoft Office Communication Server 2007 R2
  • Visual Studio 2010 Beta 2 Ultimate Edition
  • Microsoft SharePoint Server 2010 Enterprise Edition Beta 2
  • Microsoft Office Web Applications Beta 2
  • FAST Search for SharePoint 2010 Beta 2
  • Microsoft Project Server 2010 Beta 2
  • Microsoft Office 2010 Beta 2
  • Microsoft Exchange Server 2010
  • Active Directory has been preconfigured with over 200 "demo" users with metadata in an organizational structure

I have not had time to play with it…

Monday, January 25, 2010

SharePoint 2010 Service Architecture

Introduction

Recently I wrote a series of blogs on Enterprise Search and FAST ESP with the intention of having a deep dive into SharePoint Search 2010. There are many changes to both the logical and physical topologies of SharePoint 2010. Most of the readers of this blog series are very familiar with SharePoint 2007 but not so much with SharePoint 2010 (since it is in beta). Before doing a deep dive into SharePoint Search 2010 we need to have a good grasp of the changes for SharePoint 2010.

A well architected logical and physical SharePoint environment tends to revolve around Search. Search drove much of the logical and physical architecture for SharePoint 2007. In this blog series I am going to give an introduction to the SharePoint 2010 services and the logical and physical architectures. It is critical to have this worked out from the beginning because the SharePoint architecture must scale with the business. What we have seen is that once SharePoint is implemented, it becomes highly adopted.

SharePoint 2010 Versions

In SharePoint 2007 the versions we became very familiar with were:

  • Windows SharePoint Services (WSS)
  • MOSS 2007 Standard
  • MOSS 2007 Enterprise

For SharePoint 2010 it has changed to:

  • SharePoint Foundation 2010
  • SharePoint Server 2010 Standard
  • SharePoint Server 2010 Enterprise

Tier Architecture

SharePoint 2010 has not changed from a tier perspective. There are Web, Application and Database tiers. What is important is to understand how to architect those tiers. For SharePoint 2010 the Application tier has changed significantly, as it is more sophisticated than what was available in SharePoint 2007. Some things that we will get into within this series are creating service farms and partitioned services.

Service Application

Important changes that you should be aware of:

  1. In SharePoint 2007 there were Shared Service Providers (SSPs), which were used to host services. SSPs have been completely removed from SharePoint 2010, and services can now be run independently.
  2. Some services in SharePoint 2010 will be referred to as Service Applications. Not all services in Central Administration are Service Applications. In the table below, you will see which services are considered to be Service Applications. You will see a trend that Service Applications tend to map to major features of SharePoint rather than services which could be considered part of the infrastructure of SharePoint.
  3. Service Groups have been introduced to logically manage Service Applications. When Service Applications are added they will be included in a Default Group. Web Applications can use that Default Group or a Custom Group of Service Applications. This provides greater control over which Service Applications are available to specific web applications. If you are familiar with SharePoint 2007, the Service Group concept addresses one of the reasons we created different SSPs: sometimes we needed to create barriers between web applications.
  4. SharePoint 2010 services can be reused within and across farms. This was not available in SharePoint 2007 and provides a significant amount of scalability options.
  5. SharePoint 2010 supports Service Partitioning. If you are familiar with database partitioning, think of it as horizontal partitioning of data within a SharePoint service. Not all services support service partitioning; partitioning is only used in services that are data driven. A typical scenario is a centrally managed, cross-farm service with data that should not be exposed to all subscribing farms. In that case, a farm subscribes to a partition of the centrally managed service. In SharePoint lingo each partition is referred to as a "tenant".

Knowing what we now know, when scaling out these services we will take the following into consideration:

  • Multiple instances of the same Service Applications can be initiated within a farm.
  • Service Applications are shared across Web Applications within a farm by Service Group.
  • Some Service Applications can be shared across multiple farms while others cannot.
  • Service Groups can logically contain Service Applications that reside in other farms.
  • Web Applications have the flexibility to use multiple instances of the same type of Service Application (regardless of which farm hosts that service).
  • A Service Application can have its data partitioned and made accessible only to specific subscribers.
  • Service Groups can be used to logically organize Service Applications for performance, security and scalability.

Some side notes:

  • Service Applications are hosted within IIS. It is possible to deploy Service Applications to different application pools to achieve process isolation (important for both security and fault tolerance), so a single machine can host many services with isolation between them.
  • Each Service Application instance has a Connection which Web Applications connect to. Web Applications use these Connections (sometimes referred to as proxies) to send and retrieve data from a Service Application.
  • If the same type of Service Application is used more than once in a single Web Application, one of the Connections will be marked as the primary.
  • Services are deployed through the Configuration Wizard, Central Administration and Windows PowerShell, and they can be managed through Central Administration and PowerShell. (See the sketch below for how service applications and their Connections show up in the server object model.)
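
To make the Service Application and Connection (proxy) concepts a little more concrete, below is a minimal C# sketch that walks a farm and lists each service application and each Connection. It is a sketch only, assuming the beta's server object model (Microsoft.SharePoint.Administration) exposes SPFarm.Local.Services and SPFarm.ServiceProxies the way I remember; it would need to run on a farm server under an account with farm rights.

```csharp
using System;
using Microsoft.SharePoint.Administration;

// Hypothetical console app; assumes it runs on a server that is joined to the farm.
class ListServiceApplications
{
    static void Main()
    {
        SPFarm farm = SPFarm.Local;

        // Each SPService can host zero or more service applications.
        foreach (SPService service in farm.Services)
        {
            foreach (SPServiceApplication app in service.Applications)
            {
                Console.WriteLine("Service Application: {0} ({1})", app.Name, app.TypeName);
            }
        }

        // The Connections (proxies) are what web applications actually talk to.
        foreach (SPServiceProxy serviceProxy in farm.ServiceProxies)
        {
            foreach (SPServiceApplicationProxy connection in serviceProxy.ApplicationProxies)
            {
                Console.WriteLine("Connection (proxy): {0}", connection.Name);
            }
        }
    }
}
```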

Given this flexibility in service configuration you now can:

  • Have better ability to share dedicated services across regional locations.
  • Have the ability to create dedicated services by business unit. For instance a Finance Web Application may have a dedicated Excel Services Service Application instance while a different Excel Services Service Application instance may be available to the rest of the farm.
  • Now have greater control to ensure that data cannot be shared between logical groups of users. For instance lock down departmental or intranet data.
  • Have the ability to support hosted models in a more secure and efficient manner.
  • Centralize Service Applications that have expensive operations, like Search, and reuse them across farms.

We will put these rules for Service Applications into action in the next part of this series.

SharePoint 2010 Services

In SharePoint 2007 we commonly had to configure the following services:

  • Document Conversions Launcher Service
  • Document Conversions Load Balancer Service
  • Excel Calculation Services
  • Office SharePoint Server Search
  • Windows SharePoint Services Help Search
  • Windows SharePoint Services Web Application

In many cases with SharePoint 2007 implementations, services were not configured correctly. This resulted in poor performance and the inability to scale to meet business demand. Many people implementing SharePoint 2007 did not understand that both the logical and physical architectures have to be aligned with how SharePoint services will be utilized. This will remain a problem for many on the 2010 platform.

As discussed the service architecture has changed for SharePoint 2010 in many ways. Before we dive into all of the strategies of how services should be aligned in both the logical and physical architectures let us understand what the new services are.

Here is a list of services for SharePoint 2010. I pulled together several pieces of information and manually created the table below from what I have.

| Service | Description | Service Application | Cross Farm | Partitioning | Available On |
| --- | --- | --- | --- | --- | --- |
| Access Database Services | New service that allows viewing, editing and interacting with MS Access databases through a browser. | Yes | No | Yes ** | SharePoint Server 2010 Enterprise |
| Application Registry Service | Enables users to search and collaborate around business data. Provides backward compatibility with the BDC service. | No | No | NA | SharePoint Foundation 2010 and up |
| Business Data Connectivity | Access to line-of-business systems. The service now supports write operations. | Yes | Yes | Yes | SharePoint Foundation 2010 and up |
| Central Administration | Central Administration site. | No | No | NA | SharePoint Foundation 2010 and up |
| Document Conversions Launcher Service | Schedules and initiates document conversions. | No | No | NA | SharePoint Foundation 2010 and up |
| Document Conversions Load Balancer Service | Balances document conversions across the SharePoint farm. | No | No | NA | SharePoint Foundation 2010 and up |
| Excel Calculation Services | Ability to interact with Excel files in a browser. New extended functionality. | Yes | No | No | SharePoint Server 2010 Enterprise |
| Lotus Notes Connector | Index service connector to index Lotus Notes Domino servers. | Yes | Yes | Yes | SharePoint Server 2010 Standard and up |
| InfoPath Service | Supports hosting InfoPath forms in SharePoint. | No | Yes | Yes ** | SharePoint Server 2010 Enterprise |
| Managed Metadata Service | New service that manages taxonomy structures and definitions. | Yes | Yes | Yes | SharePoint Server 2010 Standard and up |
| Microsoft SharePoint Foundation Incoming E-mail | E-mail service. This runs on the machine where the web application is running. | No | No | NA | SharePoint Foundation 2010 and up |
| Microsoft SharePoint Foundation Subscription Settings Service | New service used to track subscription IDs and settings for services deployed in partitioned mode. | Yes | NA | NA | SharePoint Foundation 2010 and up |
| Microsoft SharePoint Foundation User Code Service | New service that runs code deployed as part of a sandboxed solution in restricted mode. Must be started on any machine in the farm that needs to run sandboxed code. | No | NA | NA | SharePoint Foundation 2010 and up |
| Microsoft SharePoint Foundation Web Application | The service that runs the web application. | No | No | NA | SharePoint Foundation 2010 and up |
| Microsoft SharePoint Foundation Workflow Timer Service | Responsible for running timer jobs. | No | No | NA | SharePoint Foundation 2010 and up |
| PerformancePoint | BI dashboard services. | Yes | No | NA | SharePoint Server 2010 Enterprise |
| PowerPoint | New service that allows viewing, editing and broadcasting PowerPoint presentations in a browser. | Yes | No | Yes ** | SharePoint Server 2010 Enterprise |
| Project | Hosts Project Server 2010. | Yes | No | Yes | Additional server product |
| Search Query and Site Settings Service | Service that performs queries across built indexes. | Yes | Yes | Yes * | SharePoint Server 2010 Standard and up |
| Secure Store Service | Service that provides SSO authentication. | Yes | Yes | Yes | SharePoint Server 2010 Standard and up |
| SharePoint Foundation Search | Provides search capabilities for SharePoint Foundation only. On SharePoint Server 2010 Standard and Enterprise this service performs online Help search. | No | No | NA | SharePoint Foundation 2010 and up |
| SharePoint Server Search | Crawls content, creates indexes and performs queries. Automatically configured. | Yes | Yes | Yes * | SharePoint Server 2010 Standard and up |
| State Service | New service that provides temporary storage of user session data for SharePoint components. | Yes | No | Yes ** | SharePoint Server 2010 Standard and up |
| Usage and Health Data Collection | Reporting services that provide farm-wide usage and health data. | Yes | No | Yes | SharePoint Foundation 2010 and up |
| User Profile | New and expanded social networking services and features. | Yes | Yes | Yes | SharePoint Server 2010 Standard and up |
| User Profile Synchronization Service | Synchronizes user and group profile information stored in the SharePoint Server 2010 profile store with profile information stored in directory services across the enterprise. Works with AD, BDC, Novell LDAP and Sun LDAP. | Yes | Yes | Yes | SharePoint Server 2010 Standard and up |
| Visio Graphics Service | Ability to view published Visio diagrams in a browser. | Yes | No | Yes ** | SharePoint Server 2010 Enterprise |
| Web Analytics Data Processing Service | Captures data for analytics. | Yes | Yes | Yes | SharePoint Foundation 2010 and up |
| Web Analytics Web Service | Web service interfaces for analytics. | Yes | Yes | Yes | SharePoint Foundation 2010 and up |
| Word Automation Services | Service that performs automated bulk document conversions. | Yes | No | Yes ** | SharePoint Server 2010 Standard and up |

* FAST Search cannot be partitioned.

** Supports partitioning, but it is not needed because there is no tenant data.

Next

In the next blog we will jump into the topologies of SharePoint farms (with diagrams) based on the information we have gone over here.


Thursday, January 14, 2010

SharePoint 2009 Conference Notes

I unfortunately did not get selected to go to the SharePoint Conference; however, I have tons of notes from colleagues that I have gone through. Greg Galipeau, a colleague of mine, posted tons of great notes on his site (blog 1, blog 2, blog 3). Now that the dust has settled, I have started to sift through a couple of presentations from the conference. It is still too early to tell how things are going to play out.

Here are a few notes I took while scanning over a couple SharePoint 2009 Conference presentations.

Terminology Changes

  • Shared Services Provider (SSP) renamed to Service Application
  • SSP database renamed to Service Application Database
  • WSS now called SharePoint Foundation

System Requirements & Installation Process

  • OS – Windows Server 2008 SP2 or R2 and 64 bit
  • SQL – 2005 SP2 or SQL 2008 and 64 bit
  • Prerequisites
    • SQL Native Client
    • Geneva Framework
    • Sync Framework
    • Chart Controls Framework
    • Filter Pack
    • SQL Server 2008 Analysis Services
    • Web Server Role
    • Application Server Role
  • Installation has not really changed much; the wizard has stayed pretty much the same but has been rebranded.
  • The configuration wizard is now available through Central Administration.
  • There is a new Farm Configuration Wizard that is available inside of Central Administration.

System Administration

  • Managed Accounts – SharePoint can manage password changes. It will auto-generate passwords, or you can get reminders when a password needs to be changed.
  • stsadm is being replaced by Windows PowerShell, which is supposed to have more functionality and power.
  • Backup and Restore
    • Can be run on separate threads.
    • Have the ability to do configuration-only backups.
    • Granular backups of site collections, webs or lists are available in Central Admin.
  • HTTP Request Monitoring
    • Has the ability to throttle performance during high peaks of usage. If a threshold is exceeded, GET requests will receive 503 errors and timer jobs may not start, but PUT requests are still allowed.
    • This was added to protect the server during peak loads. It evaluates available memory, CPU utilization, the ASP.NET queue and the wait time queue, checked on a regular interval of five seconds.
  • Logging
    • Will be more compressed by default.
    • Ability to have a SQL Logging database.
      • Opinion - This was very interesting to me because searching and managing logs can potentially become easier.
  • Best Practice Analyzer – uses best practice rules and will alert if there are issues. The rules can be customized and can be executed by timer jobs. (A hedged sketch of a custom rule follows this list.)
    • Opinion - Let's see where this goes, but it could be helpful for clients who do not have much in-house expertise and commonly make configuration mistakes.
  • Patch Levels and Upgrades – Servers and databases can be run at different patch levels as necessary but must maintain a level of compatibility.
    • Opinion - This is important, as it would have helped me with a client I had; however, it does not negate the need for good Configuration Management of the SharePoint farm.
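
The Best Practice Analyzer mentioned above surfaces in the beta as the SharePoint Health Analyzer, and its rule set can be extended in code. Here is a minimal sketch of a custom rule, assuming the SPHealthAnalysisRule base class in Microsoft.SharePoint.Administration.Health and its member names as I remember them from the beta documentation; the rule logic and threshold are made up for illustration.

```csharp
using Microsoft.SharePoint.Administration;
using Microsoft.SharePoint.Administration.Health;

// Sketch of a custom Health Analyzer ("best practice") rule.
// In practice the assembly is deployed to the GAC and the rule registered with the farm.
public class ContentDatabaseCountRule : SPHealthAnalysisRule
{
    public override string Summary
    {
        get { return "A web application has an unusually large number of content databases."; }
    }

    public override string Explanation
    {
        get { return "Too many content databases per web application can complicate backup and upgrade."; }
    }

    public override string Remedy
    {
        get { return "Review the content database layout for the flagged web application."; }
    }

    public override SPHealthCategory Category
    {
        get { return SPHealthCategory.Configuration; }
    }

    public override SPHealthCheckErrorLevel ErrorLevel
    {
        get { return SPHealthCheckErrorLevel.Warning; }
    }

    public override SPHealthCheckStatus Check()
    {
        foreach (SPWebApplication webApp in SPWebService.ContentService.WebApplications)
        {
            if (webApp.ContentDatabases.Count > 50)   // arbitrary threshold for the example
            {
                return SPHealthCheckStatus.Failed;
            }
        }
        return SPHealthCheckStatus.Passed;
    }
}
```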

SharePoint Lists

  • SharePoint Lists can now handle 50 million items per list versus the previous 2,000 item UI limit.
  • As the list grows, there will still be a performance impact.
    • Opinion – I am still not a believer in managing highly relational or referenced data in SharePoint lists. This should be done in database tables.

New Development Tools

  • Developer Dashboard – is used to monitor page load and performance. It will show times it takes to load components on a page, database query information, check out status, web part processing times and critical events. Its purpose is to provide developers with a better tool to diagnose problems with web parts or lists that may take a long time to load.
    • Opinion - This should be pretty useful, and I know several clients that could take advantage of it.
  • Visual Studio 2010 will address many of the issues we were having with SharePoint development. The reality was that SharePoint 2007 development was very difficult because we did not have development tooling like we have for ASP.NET, WinForms, SQL Server, etc. Debugging was a major challenge; Microsoft heard this loud and clear and it has been fixed.
  • Business Connectivity Services – basically a new version of the BDC that allows you to both read and write.
  • LINQ to SharePoint – will provide a better, strongly typed way to get data out of SharePoint lists.
    • Opinion – This was created because writing dynamic CAML queries is very challenging given the structure of that query language.
  • Client Object Model – this allows SharePoint API code to run on external machines and not just on the SharePoint machine itself.
    • Opinion – This will be very beneficial; we will see if it does everything we need it to do. (A small sketch follows this list.)
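
To illustrate the Client Object Model item above, here is a small sketch that runs on a remote machine (not a SharePoint server) and reads the web title and a list's item count in a single round trip. The site URL and list name are hypothetical, and it assumes the beta's Microsoft.SharePoint.Client assemblies.

```csharp
using System;
using Microsoft.SharePoint.Client;

// Runs on any machine with the SharePoint client assemblies installed.
class ClientObjectModelSample
{
    static void Main()
    {
        using (ClientContext context = new ClientContext("http://intranet.contoso.com"))
        {
            Web web = context.Web;
            List tasks = web.Lists.GetByTitle("Tasks");   // hypothetical list name

            // Declare what to retrieve, then make one round trip to the server.
            context.Load(web, w => w.Title);
            context.Load(tasks, l => l.ItemCount);
            context.ExecuteQuery();

            Console.WriteLine("Web: {0}, Tasks items: {1}", web.Title, tasks.ItemCount);
        }
    }
}
```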

General

  • SharePoint Workspace – If you are familiar with the product called Groove, it has been rebranded as SharePoint Workspace. This tool will provide a rich offline user experience with SharePoint 2010. If you tried Groove with SharePoint 2007 it did not work very well, but I suspect it will do the job now. There is a need for this, especially to support mobile personnel like a sales team.

Improved cross-browser support (sorry, few details).

Dispose SPSite and SPWeb Check Tool

I read something pretty interesting. Scot Hillier in a presentation said some of the things that drive a SharePoint consultant crazy are:

  1. Running in Full trust. (Drives me nuts, read here.)
  2. Not automating web.config settings. (Very true, read here.)
  3. Deploying non-Release code to production. (Doh – do not forget!)
  4. Deploying to the GAC. (Drives me nuts, read here.)
  5. Not using Features and Solutions properly to do deployments. (My addition)
  6. Forgetting to run SPDisposeCheck. (???)

Given that I had recently seen issues in SharePoint production environments where connections were not being closed, I wondered what SPDisposeCheck was. I had never heard of it, and it sounded useful.

SPDisposeCheck was developed by Microsoft and provided to the community to avoid this issue. It will check all of your assemblies to see whether SPSite and SPWeb objects are being disposed of correctly.

If you are seeing performance problems on your SharePoint sites, check your logs! If you see the following errors, objects are not being disposed of correctly and memory is being consumed:

ERROR: request not found in the TrackedRequests. We might be creating and closing webs on different threads. ThreadId = 21, Free call stack = at Microsoft.SharePoint.SPRequestManager.Release(SPRequest request) at Microsoft.SharePoint.SPSite.Close() ......

An SPRequest object was not disposed before the end of this thread. To avoid wasting system resources, dispose of this object or its parent (such as an SPSite or SPWeb) as soon as you are done using it. This object will now be disposed. Allocation Id: {B274FEBF-1D42-463C-B0C3-7A9005371494} To determine where this object was allocated, create a registry key at HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\HeapSettings. Then create a new DWORD named SPRequestStackTrace with the value 1 under this key.

An SPRequest object was reclaimed by the garbage collector instead of being explicitly freed. To avoid wasting system resources, dispose of this object or its parent (such as an SPSite or SPWeb) as soon as you are done using it. Allocation Id: {617E63C9-32A3-42AE-AEEB-BC41CB798C88} To determine where this object was allocated, create a registry key at HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\HeapSettings. Then create a new DWORD named SPRequestStackTrace with the value 1 under this key.

Here is a good blog that goes into the details of the error. The error is associated with not disposing of SPSite and SPWeb objects correctly in custom code. This is extremely important; it can take down a SharePoint site.
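
For reference, the pattern SPDisposeCheck is looking for is straightforward: dispose of any SPSite or SPWeb your code creates (a using block is the easiest way), and do not dispose of objects the framework hands you, such as SPContext.Current.Web. A minimal sketch, with a hypothetical site URL:

```csharp
using System;
using Microsoft.SharePoint;

public class DisposalExamples
{
    // Objects you create yourself must be disposed; using blocks make this automatic.
    public static void EnumerateLists()
    {
        using (SPSite site = new SPSite("http://intranet.contoso.com"))   // hypothetical URL
        using (SPWeb web = site.OpenWeb())
        {
            foreach (SPList list in web.Lists)
            {
                Console.WriteLine(list.Title);
            }
        }   // SPWeb and SPSite are both disposed here
    }

    // Objects handed to you by the framework must NOT be disposed.
    public static string TitleFromContext()
    {
        SPWeb web = SPContext.Current.Web;   // no using block and no Dispose() call
        return web.Title;
    }
}
```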

Monday, January 11, 2010

FAST ESP SharePoint Connector

Series

Introduction

After introducing the components and providing a preview of design considerations for scaling a FAST ESP implementation, let us take a look at how FAST ESP works with SharePoint today. In this post I will introduce you to the architecture of the FAST SharePoint Connector and explain how content is fed, processed, stored and queried. We will cover considerations and strategies for a successful implementation.

In the next set of postings, we will discuss in detail what has been planned for SharePoint 2010.

Note, if you have no FAST ESP experience or training, you must read this blog to understand some of the concepts.

FAST Connector for SharePoint Today

FAST supports both SharePoint 2003 and 2007 in the same manner it would support any other enterprise application that FAST indexes. FAST provides an API (Java, .Net, C++) and the FAST Content Connector Toolkit, which facilitates the building of Connector applications. The SharePoint Connector is built on these frameworks to feed content from SharePoint into FAST.

There are three features of the FAST SharePoint Connector you should be aware of:

  1. It will index sites, lists, list items, files and associated metadata from SharePoint.
  2. It can incrementally retrieve content from SharePoint.
  3. It will capture SharePoint item permissions and incorporate them into the access control list.

Architecture

The architecture of the FAST SharePoint Connector is pretty simple and well contained.

  1. A custom web service will be installed into the SharePoint farm. This web service will be accessible just like the out-of-the-box web services provided in SharePoint. Side note: if you are interested in writing your own custom web service for SharePoint, read this blog.
  2. The FAST SharePoint Connector must be installed on a machine that can access the SharePoint web services and is able to connect to FAST ESP Content Distributors. It really does not matter where this is installed as long as it can make the required connections. That being said, the Connector could be installed on either the SharePoint Farm or on the FAST Farm.
  3. The Windows Authentication Proxy must be installed onto the FAST Farm.

[Diagram: SharePoint–FAST Connector architecture]

Basic Processing Flow

The installed components work together to retrieve content from SharePoint and make it searchable within FAST. Here is the process:

  1. The SharePoint Connector calls the FAST SharePoint Web Service to retrieve content.
  2. The FAST Connector connects to the FAST Content Distributor and sends along the SharePoint data.
  3. The FAST Windows Authentication Proxy “may” be used to get additional SharePoint data.
  4. The document processors process the content into FIXML documents so an index can be built.

Now let’s dive a little bit deeper into some of the details about how this works.

Incremental Loading

The FAST SharePoint Connector performs incremental loading of content. The first load will be heavy because all of the SharePoint data will be retrieved. However, subsequent content loads will only retrieve changes. The interval for incremental loading is configurable.

Incremental Loading Strategy

If you need to completely reload the data, you must clear the Collection the documents were fed to. Doing this has ramifications that you should be aware of. The most important one that comes to mind is that a Collection can have documents from other locations. If so, all of that content will have to be re-indexed too! That can be a big deal, so it is important to organize your Collections and anticipate whether you will ever have to do this.

You may be wondering how you can control the amount of data that is indexed at any given time. Well there are probably many ways but here are some options that come to mind first.

Create multiple pipeline instances for processing SharePoint data, then configure the pipelines to include or exclude specific URLs within SharePoint. I might create a dedicated pipeline for processing content in areas where I know there will be lots of updates. For instance, collaboration or project sites will have data updated on a regular basis. I would then configure that pipeline to refresh on a frequent interval. The advantage of doing this is that a smaller subset of regularly updated data will be polled more frequently. Then I would create a dedicated pipeline for a publishing site where data is updated less frequently; the interval between retrieving data can be longer.

Another thing I may take into consideration is pushing data to different collections. For instance, you can have dedicated collections for intranet, extranet and Internet (remember I am not talking about SharePoint collections, I am talking about FAST collections). Typically in the SharePoint world you logically group data into different content databases, shared service providers and even different hardware. It may be good to maintain that logical separation knowing that it is recommended to feed content into different FAST Collections based on these logical boundaries. Doing this will also give more control over the Search Profiles and what Collections people have access to.

Document Feeding

When SharePoint data is read through the web service by the SharePoint Connector, both metadata and security information will be sent to the Connector. However, depending on the configuration you set, SharePoint files may or may not be part of that payload. By default, a reference to the file (a URL) will be part of the information sent from the SharePoint Connector to the Content Distributor(s). During Document Processing, the Windows Authentication Proxy will use this reference to retrieve the actual document from SharePoint. You have the ability to change this configuration and send the file as a BLOB.

Document Feeding Strategy

Why are you provided with this option? Mostly for flexibility reasons. If you pass the files by reference:

  • The Connector is going to perform more quickly because it has less data to work with.
  • Your network I/O will be better utilized because the document will not have to be passed twice; it will only be retrieved once from SharePoint. This is a big deal if you have large files.

If you choose to pass the file immediately:

  • Document Processing will be quicker because it does not have to go out and retrieve the file from SharePoint.
  • The machines where Document Processing is located do not need to have access to the SharePoint sites because all of the content is available.

Processing SharePoint Data

Working with SharePoint data is not really different from working with other data that may come into FAST, but we need to be aware of some strategies you may want to employ. First, you should know how the data type mapping from SharePoint to FAST will be handled. Second, metadata from columns in SharePoint will be mapped into the fields within the Index Profile. This mapping is based on the unique name of the field. For example, if you have a column called Last Update Date in SharePoint, there must be a field called lastupdatedate in the FAST index profile (notice it is lower case, with no spaces and no special characters). If this is the case, the data from the SharePoint column will be automatically mapped into that index field and become searchable. Note that if the SharePoint column data is not mapped, Document Processing will discard it.
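
As a quick side illustration of that naming rule, a hypothetical helper that turns a SharePoint column display name into the matching FAST index field name (lower case, no spaces, no special characters) could look like the following. Only the naming rule comes from the product documentation; the helper itself is mine.

```csharp
using System.Text;

public static class FieldNameMapper
{
    // "Last Update Date" -> "lastupdatedate"
    public static string ToIndexFieldName(string sharePointColumnName)
    {
        StringBuilder sb = new StringBuilder();
        foreach (char c in sharePointColumnName.ToLowerInvariant())
        {
            if (char.IsLetterOrDigit(c))   // drop spaces and special characters
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```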

If you have a good understanding of SharePoint, this will raise a red flag for you, because you know that columns of SharePoint data can be added in an ad-hoc fashion. The question on your mind is: how can data be made searchable if it is not mapped to a field? In FAST there is a concept called Scope fields. Scope fields take metadata and store it in a structured format (similar to XML) in a single field in the index. Scope fields are specifically provided to support storing of index data without having to know the schema of that data in advance. When you store data in a Scope field you have the ability to query it back out using FQL (FAST Query Language), similar to writing an XPath query.

Processing SharePoint Data Strategy

There are some considerations that you must take into account. First, you can add new fields into the Index Profile to match all of the data that you want to bring into SharePoint. This is fine; however, if you need to add a field this is considered to be a “warm update.” After making the change, only new documents will have the data but all previously indexed documents will not. For old documents to have the data, they must be completely re-processed. This will require you to clear the collection and completely re-index (discussed above). A second consideration is that using Scope fields to support querying all column data has a query performance penalty.

Here are some additional recommendations:

  • Come up with a hybrid approach where important SharePoint columns are mapped to fields in the Index Profile, and allow all other columns to be indexed automatically into a Scope field. This is a common practice. It will give you good query performance on the most common columns of data and still allow you access to all other column data.
  • Earlier, we mentioned potentially creating separate Collections for publishing sites versus collaboration sites. In that scenario, do not turn on Scope fields for the publishing Collection because the metadata should be very well defined. This way you can get better query performance. All you need to do is either add new fields that map directly to SharePoint columns or add document processing stage(s) that will save the data in existing index fields.

Wrap Up

This post provided you with some insight into how FAST ESP indexes data from SharePoint. Hopefully you will take these factors into consideration before you start to index your content. This is why we say it is so important to understand the life-cycle of the data you are indexing - because it will influence your approach.

SharePoint 2010 Planning, Development and Architecture

I recently stumbled across a bunch of documents that give SharePoint professionals insight into the new SharePoint 2010 platform.

SharePoint 2010 Architecture

SharePoint 2010 Development

SharePoint 2010 Search

FAST for SharePoint 2010

Thursday, January 7, 2010

Scaling FAST ESP Enterprise Search

Series

Introduction

In the first part of this series I introduced all of the components and their role for FAST ESP 5.3. Now I want to discuss how these components should be scaled and the associated design decisions.

Just as with a SharePoint installation, typically the first thing a client wants to know is how much hardware they need to buy and what the licensing costs will be. So in the SharePoint world we have a multitude of questions that we ask up front to gather this information, which is usually easy to get since SharePoint is an application server. For FAST you need to focus on understanding the content that will be indexed, which can be pretty challenging at times. This is because you need a full understanding of the characteristics and lifecycle of that data. This is usually not very well documented, especially with legacy systems.

In this post I am going to expand upon the previous one by focusing on how to scale the components of FAST based on business rules.

General Comment

FAST can be implemented on a single machine or completely scaled out with each component having its own dedicated hardware. FAST is linearly scalable, meaning that doubling the hardware will double the capacity. This is unlike SharePoint 2007, where there can be diminishing returns (for example, with more than four WFEs). FAST best practices also state that better performance can be achieved by using multiple small machines instead of one big machine.

Requirements for Scaling

Some initial questions you need answered right off the bat for scaling are:

  • How many documents are going to be processed (content volume)?
  • What is the size of the documents that will be fed into FAST?
  • What is the estimated total number of documents that are going to be searchable (which is different than number of documents that could be processed)?
  • How would the documents be described (format, percentage of content to be indexed, type, metadata, etc.)?
  • Will data be continually fed or pulled in periodically?
  • What is the life-cycle of the data to be indexed?
  • What is the acceptable amount of time between a document being fed into FAST and being made searchable to a user (Index latency)?
  • How long should it take for search results to be returned to the user (Search latency)?
  • What are the expected peaks of queries that must be handled? Specifically, how many queries per second must be supported?
  • What features are required for document search (spell checking, lemmatization, entity extraction, categorization, ranking, navigation, etc.)?
  • What sort of service level agreement exists in case there is a failure at any point in the component architecture? For example, must search continue to work at all times (even on stale data)?
  • What can the network bandwidth support, specifically for copying large amounts of data?
  • What storage capacity can be supported?
  • What service level agreement must be maintained for business owners and users?

Here is a picture of the components that make up FAST.

[Diagram: FAST ESP components]

Connectors

Connector applications will have to be scaled appropriately to support the feeding of content into FAST. Multiple servers may need to be used depending on what is fed into FAST and how often. Content may have to be sent in batches or run based on schedule.

Make sure you do not create bottlenecks when you have multiple Connector applications running. It's probably best to create a schedule of when content is fed into FAST.

Collections

I mention Collections here because we introduced them earlier but there is no scaling for them. They are just a logical grouping of searchable documents.

Content Distributors

Having multiple Content Distributor machines will provide fault tolerance but will not improve performance as they cannot be load balanced. When a Connector application connects, you have the ability to provide 1 to n Content Distributors and when one fails, the Connector will simply connect to the next one available.

If you need to support a high volume of content processing (or continuous feeding) you can use multiple Content Distributors and dedicate them to specific Connector applications. Remember Content Distributors are simply a pass through and do not have special hardware requirements outside the FAST basic recommendations.

Feeding Proxy

A Feeding Proxy (not enabled by default) can be placed in front of the Content Distributors. It will replicate submitted content to another installation (like a cold backup), which can provide a very high level of fault tolerance for your entire installation. The Connector application(s) would then connect to the Feeding Proxy instead of directly to the Content Distributors.

Note: if a Feeding Proxy fails, the non-failing Feeding Proxy will buffer the new content until the failing Feeding Proxy comes back online.

Document Processors

Multiple Document Processors can provide fault tolerance and increased document throughput. All of the Document Processors will pass content to only one Content Distributor at a time. Should a Content Distributor fail, that Document Processor will connect to the next available Content Distributor. The Content Distributor is responsible for sending content across Document Processors, which increases throughput of content.

Document Processors place the biggest demands on CPU and RAM.

Index Nodes

Multiple Index servers can be used to provide fault tolerance, capacity and performance. Basically you create a "matrix" of index servers. The more rows you create, the more fault tolerance you have. The more columns you create, the more content can be indexed and the better indexing performance will be. Below is a diagram depicting the Index server matrix (a node is the same thing as a server).

[Diagram: Index node matrix]

First, let's talk about fault tolerance. Fault tolerance is created by adding more rows to the index matrix. The first server in a column is the Index Master and all subsequent Index servers in the column are Backups. The Index Dispatcher will push all of the FIXML files to each index server in the column; however, only the Master builds the index file. If the Master goes down, the next Backup is promoted to Master and the index partitions will have to be rebuilt from the FIXML already on that machine.

Adding new columns to the matrix will increase capacity and can help indexing performance. Performance is improved because we can decrease the index latency (amount of time for a document to become searchable) because there are more indexing services running.

Index Servers require machines with high RAM (to build the index), disk (to store the index), and network I/O (to send the index to the Search Servers). You need enough memory to build the index and enough space to store the index being built. If there is not enough space for new content, it will not be indexed. That is why it is extremely important to know the size of the content that is being fed into FAST. Network I/O is needed to receive processed content and to push the index to the Search Servers.

Index Partitions

Let’s go ahead and dive a little deeper into how indexing works as there are ways to improve performance. Within each Index node there are Index Partitions (by default there are three). Indexes are built by Index Partitions. The goal of using Index Partitions is to have low index latency (time it takes for a document to become visible in search) and support real time indexing with high volumes of content.

The first Index Partition should be the smallest with the other partitions being increasingly larger. The size of the Index Partition is controlled by a document count. This document count states how many documents can be stored within each Index Partition. New FIXML documents are always added into the first Index Partition (smallest) for indexing. Once a document count threshold is met for an Index Partition, the documents within that Index Partition will be merged with the next Index Partition. This merging will cause a re-index of that partition.

The goal of all this is to avoid indexing large amounts of documents at one time; instead, index only what needs to be indexed. Once a document has been indexed it is highly likely that the document will not need to be re-indexed on a regular basis. As mentioned, new documents are added to the first partition and indexed immediately. Since the first partition is significantly smaller than the other partitions, indexing it is not an expensive operation. Once the count of documents goes over the threshold in the first partition, the documents are merged into the next partition and re-indexed within that partition. Obviously that operation is a little more expensive, but it is less frequent, and you are not re-indexing content on a regular basis. More Index Partitions can be added and the document counts for each Index Partition can be adjusted. This can be strategically aligned with the schedules of when content is fed into FAST.
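
To make the cascade concrete, here is a small, purely illustrative C# model of the merge behavior just described. The document-count thresholds are made up (the real values come from the FAST configuration); the point is that new documents always land in the smallest partition, and a partition is merged into the next, larger one (forcing a re-index of that partition) only when its threshold is exceeded.

```csharp
using System;

// Illustrative model only - not the FAST ESP implementation.
class IndexPartitionModel
{
    // Hypothetical per-partition document-count thresholds, smallest partition first.
    static readonly int[] Thresholds = { 10000, 100000, 1000000 };
    static int[] counts = { 0, 0, 0 };

    static void AddDocuments(int newDocs)
    {
        counts[0] += newDocs;   // new content always enters the first (smallest) partition

        for (int i = 0; i < counts.Length - 1; i++)
        {
            if (counts[i] > Thresholds[i])       // threshold exceeded: merge into the next partition
            {
                counts[i + 1] += counts[i];      // the larger partition must now be re-indexed
                counts[i] = 0;
                Console.WriteLine("Merged partition {0} into partition {1}; partition {1} is re-indexed", i, i + 1);
            }
        }
    }

    static void Main()
    {
        for (int batch = 0; batch < 20; batch++)
        {
            AddDocuments(2500);                  // frequent small feeds mostly touch only the small partition
        }
        Console.WriteLine("Documents per partition: {0}, {1}, {2}", counts[0], counts[1], counts[2]);
    }
}
```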

You may be wondering how updated documents are handled across Index Partitions. This is handled through a process called Index Blacklisting. Based on what was just discussed, it is possible for different versions of the same document to be spread across Index Partitions. The Index Blacklisting process (which runs on an interval) will suppress old versions of a document from search results. Only when an Index Partition is re-indexed and both versions reside on the same Index Partition will the old version be discarded. An interesting thing to know is that the index can become fragmented when there are multiple versions of the FIXML residing on the Index Server. This can lead to query performance issues, disk space issues, and increased time requirements to rebuild the entire index server. It is important to determine how often content is updated and develop a strategy to align that with the Index Partitions.

One last side note - a document’s Collection does not dictate where a document will be stored in an Index Partition. Documents from the same Collection will be stored across all Index Nodes. Remember, a Collection can be thought of as a piece of metadata.

Index Dispatchers

Multiple Index Dispatchers can provide fault tolerance and better scaling. It will be more scalable as the documents created by Document Processors will be load balanced between the available Index Dispatchers. It is not common to need to scale here unless you want to add Fault Tolerance.

The Index Dispatcher has the responsibility of sending processed documents to the correct Index columns in a round robin fashion (the actual policy is beyond the scope of this article).

Note: Index Dispatchers are a pass through and do not have special hardware requirements outside the FAST basic recommendations.

Search Nodes

Search Servers typically require machines with good CPU, RAM and disk to store the index. Multiple Search Nodes (servers) can provide fault tolerance, better query performance and support for more volume. Like the Index Nodes, Search Nodes are arranged into a "matrix." The number of Search rows does not have to match the number of Index rows; however, there must be at least one Search Node for each Index Node column.

[Diagram: Search node matrix]

More rows add fault tolerance and better query performance. Adding more Search Node rows will directly improve query performance because the Top Level Dispatchers balance queries across the Search Nodes in a column. Additionally, having many Search rows provides fault tolerance in case one goes down.

It is possible to have both Search and Index roles installed on the same machine, but this is not a good practice for the following reasons. First, indexing is an expensive operation and can directly affect query performance. When the roles are on separate machines, the index will be replicated to each Search Node in the respective column, and the Search Node will then search on the local index file. Second, more fault tolerance is introduced by having the Index and Search nodes on separate machines. If the Index nodes were to all fail, search results can still be returned because the Search nodes will find results using the local index file. The only downside is the results will be stale; however, that is acceptable in many cases.

If you are familiar with MOSS 2007 Search this is not new stuff. In MOSS 2007 you typically set up a single index server and run all the query services on each WFE server to achieve the performance and redundancy that was just discussed.

Query/Result (QR) Server

QR Servers require good CPU and RAM to support the query request from applications. Adding more QR servers will provide you with better query performance and provide redundancy. Performance of the QR server can be directly affected by the features that have been activated. If it takes a long time to prepare a query or to prepare the results, having multiple QR servers can be helpful.

Search Front End Servers (SFE)

SFEs are the front-end applications that make query requests. The scaling of these applications is beyond the scope of this article.

Administration Server

If you are looking to have a truly fault tolerant environment, it is recommended that you set up a separate second Admin server; however, this is not usually done. One of the more important Administrative services that can be scaled is the CORBA Name Service. This service ensures that components will be able to resolve each other.

Index and Query Expansion

Another concept I will introduce you to is Index and Query expansion. This can affect performance and storage capacity and should be considered as part of your planning efforts. Expansion is specifically related to lemmatization, which is a commonly implemented linguistic feature. Lemmatization provides the ability to search for alternate forms of a word (walk >> walked, walks, walking, etc.). Using lemmatization will improve the search experience, but at a cost of disk space, content processing and query processing. Typically, you will not use lemmatization when content is small, very structured, and requires exact text matching. Lemmatization really helps when there is lots of rich, verbose and unstructured content.

There are three ways to approach lemmatization: index expansion, query expansion and expansion by reduction. Index expansion stores the lemma values in the index, which impacts the size of the index but makes querying faster. Query expansion has the opposite effect: a smaller index but slower queries. Expansion by reduction stores the reduced lemma value in the index, and query terms are reduced to the same lemma value; it is basically a middle ground, and the details are beyond the scope of this article.

Applying What We Learned

This is a lot to consume for a newcomer to FAST, let alone Enterprise Search, but I wanted to introduce you to some of the considerations you need to be thinking about when designing a FAST implementation. The goals are low index latency (time to make a document searchable) and maximum queries per second. Some things you should consider right off the bat are:

  • Establish dedicated machines for searching and indexing. Rows can always be added to increase capacity or to improve performance.
  • Try to process documents and index during periods when you know there will be a low amount of queries.
  • Try to use machines with high disk speed, 8 GB RAM, and multiple cores.
  • Only use the features that you need (for example, turn off lemmatization if you do not need it).
  • Optimize the document processing so that it is efficient.

Depending on the business rules you have to support, you will have to scale the components to different machines. Here is an example:

[Diagram: example three-server FAST farm]

In this small farm, the first machine is dedicated to receiving content and then processing that content. The second machine is devoted to building indexes that can be searched. The third machine is devoted to accepting queries, performing searches, and returning query results.

Some benefits of this configuration are:

  • A level of redundancy is created in that Search results can still be returned if the other components should fail (which is important).
  • Hardware is dedicated for searching and indexing.
  • It is easy to scale by adding more search and index nodes.

Some drawbacks of this configuration are:

  • Latency will be introduced when copying the index from the Index Node to the Search Node.
  • There is no redundancy for each specific component.

Here is another strategy for FAST configuration:

[Diagram: combined Index/Search node FAST farm]

In this case both the Index and Search Nodes are placed onto the same machine. Why would you do that? Because there could be data that, when updated, must be available very quickly, and we have reduced that time because the index does not have to be copied between machines. Obviously, issues can arise in this configuration if indexing occurs on a regular basis and there are lots of users concurrently running queries.

We know that FAST can run perfectly well on one or two servers and that can even make sense in some environments. Still, more servers will be required to support the proper level of redundancy and performance required by production users. Simply take the information provided here and scale the environment based on your business requirements. Knowing the answers to the questions presented at the beginning of this posting is critical and will ensure that you have a successful and sustainable implementation.

Wednesday, January 6, 2010

FAST ESP Components Introduction

Series

Introduction

I am going to provide an introduction in the architecture of FAST ESP 5.3. As many of you know, FAST is part of the enterprise MOSS 2010 CAL but not for MOSS 2007. I will delve into the future architecture of FAST with MOSS later in this series. Right now I wanted to introduce the architecture of FAST as it is today. Many of the concepts will carry over for MOSS 2010 so having a good understanding of FAST will be valuable.

In this blog I am going to start by giving you an introduction to the various different servers, components, and logical architecture that make up a FAST ESP 5.3 implementation. This blog is meant to provide a tip of the iceberg view of FAST.

In this blog there will be a technical focus on understanding how content is actually made searchable with FAST. One of the first things you should understand is the purpose of each component and its role in making content searchable.

MOSS 2007 Search Architecture

I am going to assume the reader is familiar with SharePoint and is learning FAST from a SharePoint perspective. Some of the Enterprise Search concepts with MOSS 2007 are similar to FAST. Let me touch on these so you have a frame of reference before we jump into FAST. In MOSS:

  • There is an index server that builds the index.
  • There is a physical index file that is pushed to each WFE server.
  • There is a query service that runs on the WFE that queries the index file.
  • There are SSPs, which host search services and control what content is indexed.
  • The BDC is used to add external data (from a database and web services) to the index.
  • There is an access control list which is used to manage security to items.

[Diagram: MOSS 2007 search architecture]

FAST ESP 5.3

The most current version of FAST is called ESP 5.3. If you are familiar with FAST and the value proposition (further reading) you will know that it is a really powerful engine for crawling content and providing relevant results to users.

Let us look at the FAST ESP 5.3 components from a 20,000 foot level (if you are new to Enterprise Search please review this blog first). A simple way of looking at FAST is to think of it as a big ETL project. First you need to get content from various locations across your enterprise. Then you need to process and format the content so it can be indexed. Once content is indexed, you need to quickly return results in a relevant manner. There are numerous components within FAST that perform this processing, indexing and searching:

  • Connectors – Applications that can feed content into FAST.
  • Content Distributors – Connectors feed content to Content Distributors, which send the content on for document processing.
  • Document Processors – Processing workflows that create searchable documents from submitted content.
  • Collections – Logical grouping of searchable documents.
  • Index Dispatchers – Route processed documents to the appropriate Index Node.
  • Index Node – Builds searchable indexes from processed documents.
  • Search Node – Queries built indexes for matching documents.
  • Top Level Dispatcher (TLD) – Manages communications and performance between the QR Server and multiple Search Nodes.
  • Query/Result (QR) Servers – Prepare queries to be sent to the Search Nodes and refine the search results returned.
  • Search Front End Servers (SFE) – Applications where a user makes a search request.
  • Administration Server – Administrative features, configuration management, etc. for the enterprise installation.

The diagram below depicts how all of these components interact with each other at a high level. This should give you a general idea of the life-cycle of a document.

[Diagram: FAST ESP component interaction]

Connectors

Connectors are also commonly referred to as content feeders because that is exactly what they do. Whether Connectors do it by pushing or pulling content to FAST, they are built on a common API (available in Java, C# and C++) which submits content to FAST. Think of Connectors as standalone applications that submit content into FAST. FAST provides several Connector applications. Those that you should be immediately familiar with include:

  • Enterprise Crawler – an extremely powerful crawler for web content.
  • File Traverser – This app can crawl file directories and supports over 270 file formats out of the box. It also provides extensive support for consuming XML content.
  • JDBC Connector – This app is used for submitting structured data/content from databases into FAST.
  • SharePoint Connector – For the SharePoint people reading this blog, there is a connector for SharePoint 2003 and 2007. We will discuss details later in this series.

This is by no means an exhaustive list of Connectors. Many FAST partners have built FAST connectors for many of the well-known enterprise servers we work with today. You can also create your own feeder applications using the API.

Collections

Collections are one of the fundamental concepts you need to know about when implementing FAST. Content is always submitted to Collections, which are logical groupings of searchable documents. In FAST terminology, a “document” is anything that has been indexed. This can be a file, some web content, a database record, or something else that can be processed by FAST. Each document has fields which are populated based on document processing rules. FAST has a very powerful relevancy model and fields are used to provide more relevant results to users.

Why would you create Collections? For example, you may want to create different Collections for different types of content that you are indexing (Internet versus intranet), or you may want to index content differently based on business rules and relevance models.

We will touch more on Collections later when discussing Document Processing.

Note that if you have a SharePoint background you may be tempted to say a Collection is similar to an SSP, but they are not the same; the SSP is a concept unique to SharePoint.

Content Distributors

When Connectors feed content to FAST, they must provide two things: the Collection (which is the logical destination for the content) and the Content Distributor. The purpose of Content Distributors is to provide fault tolerance and increased Document Processing throughput for FAST. They are responsible for routing content directly to the Document Processing servers. Specifically, the Content Distributor is responsible for sending content to a Collection, which has a Document Processing pipeline mapped to it. Content only passes through the Content Distributor; it is never modified there.

Document Processors

Content processing occurs within the Document Processing servers of FAST. Pipelines in the servers support an end-to-end process that transforms submitted content into a FAST searchable document. Within each pipeline there are stages which perform the actual tasks. FAST comes with numerous pipelines already preconfigured with stages. It is possible to write your own stages in Python and add them to existing pipelines, and you can also create your own pipelines for custom content processing. The goal is to create a document in a proprietary format called FIXML. FIXML is a physical file that is used by the FAST index servers to build an index that can be searched. A common comparison is that a pipeline and its stages are like an ETL job, such as in SQL Server Integration Services (SSIS).

When a Collection is defined, a single pipeline will be assigned to perform the Document Processing for that Collection. Pipelines can be reused between Collections but it is important to note that a Collection will only have one defined pipeline. This is because some pipelines are geared for processing unstructured content (web) versus structured content (database and XML). Also, documents are uniquely identified by the internal ID along with the Collection name the document belongs to. It is possible that the same document could be in the index more than once but in different Collections. They are not considered to be the same document (even though the document source is the same) because it “may” have passed through a different document processing pipeline.

Note there is a significant amount of work (beyond the scope of this blog) that is performed on the Document Processor servers, including everything from extracting complex entities to applying linguistics.

Index Dispatchers

Before data is sent to the Index nodes, a component called an Index Dispatcher sends the FIXML to the correct Index node. The job of the Index Dispatcher is to hide the topology of the Index nodes from the Document Processing servers. I will discuss details of the Index Server topology later.

Index Nodes

The Index nodes are responsible for building binary indexes from FIXML files created by the Document Processors. Searches are completed against the built indexes, not the FIXML files. There may be many Index Nodes to support fault tolerance, performance or the amount of content that must be indexed. We will dive into this in the next section.

Search Nodes

Search Nodes are the processes that perform queries against Index Nodes. There will always be at least one Search Node for every Index Node. The Search Node will only search the Index Node that it is assigned to. We will go into the details of topology of Index and Search nodes when we discuss scaling FAST.

Within a Search Node, there is a process called fSearch which is created for each index partition within the Index Node. The fSearch process searches the index partition for matching documents. Then a single process within the Search Node, called FDispatch, takes the results from each index partition and merges them into a single result set that is ranked and sorted appropriately based on the rules specified in the Index Profile.

We will not go into all the details of the Index Profile, but since it was mentioned I will give you a brief overview. The Index Profile (an XML file) defines the search index schema for the search cluster. It defines document fields, document processing features, search features and results features. Almost every server in the search cluster configuration uses this file in some way.

Note that a Search Node is very similar in concept to the Query Service in MOSS 2007. The Query Service has the responsibility of searching the index file that is built by the Index service.

Query/Result (QR) Server

The QR Server is responsible for preparing queries to be sent to the Search Nodes and for refining the results before they are returned to the calling Search Front End Server (SFE). Query transformation includes spell checking, query-side lemmatization, query-side synonym expansion, anti-phrasing and stop word removal. It is applied to ensure that the best possible query is submitted. Some of this processing can be controlled by providing parameters with the query.

The QR Server is also responsible for preparing the results for the calling SFE. Results transformation performs result-side duplicate removal, builds document clusters and builds shallow navigators based on the query parameters that were given to the QR Server.

Note much of the configuration for the QR server is managed in the Index Profile (mentioned in the previous section).

Top Level Dispatcher (TLD)

Between the Search Nodes and the QR server is the Top Level Dispatcher (TLD). This must be installed anytime there is more than one Search Node. The TLD is responsible for distributing queries across the Search Nodes to improve query performance. The TLD is also responsible for merging search results from each FDispatch in each Search Node into a consolidated result set.

Search Front End Servers (SFE)

The Search Front End Server (SFE) is simply the front-end application that calls out to the FAST server. FAST provides an SFE application, but it is not recommended for production use. The majority of the time, SFEs are custom built or integrated into existing applications. There is an API available in C#, Java and C++.
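
For a sense of where a custom SFE sits, here is a tiny hedged C# sketch of the typical call pattern. The names (SearchClient, Query, the QR Server endpoint) are hypothetical placeholders rather than the real FAST query API; the shape of the interaction is what matters: build a query, hand it to the QR Server, and render the transformed results.

```csharp
// Hypothetical sketch only - not the real FAST query API.
using System;
using System.Collections.Generic;

class SearchHit { public string Title; public string Url; }

class SearchClient   // hypothetical placeholder for a real query API client
{
    private readonly string qrServer;

    public SearchClient(string qrServer)
    {
        this.qrServer = qrServer; // e.g. "qrserver01:15100" (placeholder)
    }

    public IEnumerable<SearchHit> Query(string collection, string userQuery)
    {
        // A real client would send the query to the QR Server, which applies
        // spell checking, lemmatization, synonym expansion, etc., dispatches it
        // to the Search Nodes, and returns merged, de-duplicated results.
        yield break; // sketch only - no results produced here
    }
}

class SearchPageDemo
{
    static void Main()
    {
        var client = new SearchClient("qrserver01:15100");
        foreach (var hit in client.Query("intranet", "expense policy"))
            Console.WriteLine("{0} - {1}", hit.Title, hit.Url);
    }
}
```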

For MOSS 2007, there is a set of web parts available on CodePlex that allows you to display results of FAST ESP 5.3. This will be integrated into SharePoint 2010.

Administration Server

There are several administrative components to FAST, such as the CORBA Name Service, License Manager, Resource Service, Log Server, Config Server, Cache Manager, Admin Server, etc. Going into their details is beyond the scope of this article.

Round Up

Hopefully this was a good introduction to the major components of FAST ESP 5.3 and their roles. In the next post, I am going to discuss how these components are scaled and the design decisions you should consider as part of a FAST deployment.

Tuesday, January 5, 2010

SharePoint Web Configuration Options

Background

We recently had some questions come up about best practices for pushing out custom configurations into SharePoint, especially across the farm. Many of the custom SharePoint solutions you write will need to push a custom configuration value into a web.config (such as a database connection string) or even deploy a custom configuration file (like a log4net.config).

After some discussion, here are some approaches with considerations for each.

Use a Feature

Whatever you do, we recommend using a Feature to do the deployment. Going through an alternate method, or doing it manually, can introduce errors. SharePoint Features are meant to provide a way to deploy files and solutions across your entire SharePoint farm.

Provision to 12 Hive

Scenario – How do you deploy a custom configuration file?

Approach - One solution is to provision a file to the 12 hive of SharePoint. In the manifest.xml of the WSP solution package, use the <RootFiles> element to push the file to a specific location. For instance, <RootFile Location="CONFIG\MyCustomConfig\Foo.xml" /> will create a folder called MyCustomConfig under CONFIG in the 12 hive and place the custom config file there. You can use this approach to push the file anywhere inside the 12 hive.

Considerations:

  • This file will be accessible to all web applications. This is actually a good approach for many solutions.
  • Your calling solutions within SharePoint (let's say a web part) will have to be able to access these deployed files. You will need the proper Code Access Security FileIOPermission to access the files. We recommend keeping your security level at a minimum and adding a CodeAccessSecurity tag to the manifest.xml of the web part only (a minimal sketch of reading such a file follows this list). For more information on how to deploy Code Access Security with web parts, please read this - http://www.k2distillery.com/2009/06/deploy-web-part-as-feature-with-cas.html.
  • If you like this approach but need to deploy custom configurations per web application, you can do the following: create a separate Feature for each web application and set the Feature scope=WebApplication, then deploy and activate the Feature on the appropriate web application. However, calling applications (a web part, for example) will have to know which directory to read from within the CONFIG directory.
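
As mentioned above, here is a minimal sketch of how a web part might read the file provisioned under the 12 hive, using the CONFIG\MyCustomConfig\Foo.xml location from the example (and assuming the web part has been granted the necessary FileIOPermission):

```csharp
using System.IO;
using System.Xml;
using Microsoft.SharePoint.Utilities;

public static class CustomConfigReader
{
    // Minimal sketch: load the config file provisioned via <RootFiles>.
    // Assumes the CONFIG\MyCustomConfig\Foo.xml location used in the example above.
    public static XmlDocument LoadCustomConfig()
    {
        // GetGenericSetupPath resolves a path relative to the 12 hive on this server.
        string configFolder = SPUtility.GetGenericSetupPath(@"CONFIG\MyCustomConfig");
        string configPath = Path.Combine(configFolder, "Foo.xml");

        XmlDocument doc = new XmlDocument();
        doc.Load(configPath);   // requires FileIOPermission to the 12 hive
        return doc;
    }
}
```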

Feature Activation

Scenario 1 – How do you deploy a custom configuration value (like a database connection string)?

Approach – A well accepted approach is to use an SPFeatureReceiver in a Feature to push out the changes. One approach would be to use the SPWebConfigModification object to push out the changes on activation (FeatureActivated) and remove the changes on deactivation (FeatureDeactivating). For more information on SPWebConfigModification, please read this detailed blog on the topic http://www.crsw.com/mark/Lists/Posts/Post.aspx?ID=32 as well as http://weblogs.asp.net/wesleybakker/archive/2009/01/21/web.config-modifications-with-a-sharepoint-feature.aspx
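
Here is a minimal sketch of that approach, assuming a WebApplication-scoped Feature: an appSettings entry (placeholder key and value) is added on activation and removed on deactivation. Refer to the linked posts for the finer points of ownership and sequencing.

```csharp
using Microsoft.SharePoint;
using Microsoft.SharePoint.Administration;

// Minimal sketch: push an appSettings value (e.g. a connection string) into
// web.config when a WebApplication-scoped Feature is activated.
public class ConfigFeatureReceiver : SPFeatureReceiver
{
    private const string Owner = "MyCompany.ConfigFeature"; // placeholder owner name

    public override void FeatureActivated(SPFeatureReceiverProperties properties)
    {
        SPWebApplication webApp = (SPWebApplication)properties.Feature.Parent;

        SPWebConfigModification mod = new SPWebConfigModification
        {
            Path = "configuration/appSettings",
            Name = "add[@key='MyConnectionString']",      // placeholder key
            Value = "<add key='MyConnectionString' value='Server=...;Database=...' />",
            Owner = Owner,
            Sequence = 0,
            Type = SPWebConfigModification.SPWebConfigModificationType.EnsureChildNode
        };

        webApp.WebConfigModifications.Add(mod);
        webApp.Update();
        webApp.WebService.ApplyWebConfigModifications(); // apply across the farm's web servers
    }

    public override void FeatureDeactivating(SPFeatureReceiverProperties properties)
    {
        SPWebApplication webApp = (SPWebApplication)properties.Feature.Parent;

        // Remove only the modifications this Feature owns.
        for (int i = webApp.WebConfigModifications.Count - 1; i >= 0; i--)
        {
            if (webApp.WebConfigModifications[i].Owner == Owner)
                webApp.WebConfigModifications.Remove(webApp.WebConfigModifications[i]);
        }

        webApp.Update();
        webApp.WebService.ApplyWebConfigModifications();
    }

    // Empty overrides are required when targeting SharePoint 2007, where these are abstract.
    public override void FeatureInstalled(SPFeatureReceiverProperties properties) { }
    public override void FeatureUninstalling(SPFeatureReceiverProperties properties) { }
}
```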

Scenario 2 – How do you deploy a custom configuration file?

Approach – Again, use an SPFeatureReceiver; however, you will have to write different code that pushes the file to the location of your choice. Here is an example to get you started: http://geekswithblogs.net/bjackett/archive/2009/12/01/deploy-files-to-sharepoint-web-application-virtual-directories-at-feature-again.aspx
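
Below is a hedged sketch of that idea, in the spirit of the linked post: a WebApplication-scoped Feature receiver copies a config file (the log4net.config name and source folder are placeholders) from the 12 hive into the virtual directory of each zone of the web application. Exact property types may vary slightly between SharePoint versions, so treat this as a starting point rather than drop-in code.

```csharp
using System.IO;
using Microsoft.SharePoint;
using Microsoft.SharePoint.Administration;
using Microsoft.SharePoint.Utilities;

// Hedged sketch: copy a custom config file into the virtual directory of each
// zone of the web application on activation, and remove it on deactivation.
public class ConfigFileFeatureReceiver : SPFeatureReceiver
{
    private const string FileName = "log4net.config"; // placeholder file name

    public override void FeatureActivated(SPFeatureReceiverProperties properties)
    {
        SPWebApplication webApp = (SPWebApplication)properties.Feature.Parent;

        // Source: the copy of the file provisioned to the 12 hive by the solution
        // (placeholder folder name).
        string source = Path.Combine(
            SPUtility.GetGenericSetupPath(@"CONFIG\MyCustomConfig"), FileName);

        foreach (SPIisSettings iis in webApp.IisSettings.Values)
        {
            // Path points at the IIS virtual directory for that zone.
            string target = Path.Combine(iis.Path.ToString(), FileName);
            File.Copy(source, target, true);
        }
    }

    public override void FeatureDeactivating(SPFeatureReceiverProperties properties)
    {
        SPWebApplication webApp = (SPWebApplication)properties.Feature.Parent;

        foreach (SPIisSettings iis in webApp.IisSettings.Values)
        {
            string target = Path.Combine(iis.Path.ToString(), FileName);
            if (File.Exists(target))
                File.Delete(target);
        }
    }

    // Empty overrides are required when targeting SharePoint 2007, where these are abstract.
    public override void FeatureInstalled(SPFeatureReceiverProperties properties) { }
    public override void FeatureUninstalling(SPFeatureReceiverProperties properties) { }
}
```

Note that this only runs on activation, so, as with the previous approach, web servers added to the farm later will not receive the file until the Feature is deactivated and reactivated.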

Considerations

  • Requires custom code.
  • Small note - if new web servers are added, the configuration values will not be pushed to the new web servers unless the Feature is deactivated and then activated again.

Deploy to 12 Hive Bin

Scenario – How do you deploy a custom configuration file?

Approach - We found a trick described in this blog that will push a file into the bin of the web application - http://oricode.wordpress.com/2008/02/27/deploy-a-xml-file-in-your-sharepoint-solution-to-the-web-application-folder/

Considerations

  • This directory is really meant for binaries. We do not consider this a good approach.

Final Thoughts

We believe all of the solutions listed here are valid; however, you need to use the right one based on the scenario you need to support.