Wednesday, February 24, 2010

K2 User Group March 2010 Meeting – K2 blackpearl 4.5

There is a K2 User Group meeting Tuesday, March 9, 11am-1pm central time. In this meeting we will be going over the new release of K2 blackpearl 4.5! If you are a K2 professional or you use K2 at your company I highly recommend that you attend to learn about it and use this as a forum to ask questions. This is a community driven event and is not a sales event.

------------------------------------------------------

Hello Everyone,

Phillip Knight from Merit Energy will be hosting the K2 user group meetings at Merit Energy, located at 13727 Noel Road, 2nd Floor Conference Room, Tower 2, Dallas, Texas 75240. Parking information is included in the linked map below. Remote attendance information is included at the bottom of this message.

Link to map: http://www.meritenergy.com/content/MeritMap.pdf. Reminder: Merit Energy is on the 5th floor, but the meeting will be held in the 2nd floor conference room. Once off the elevator, go to the reception area and we will bring you back to the conference room.

Please RSVP to me via email and let me know whether you will attend via Live Meeting or in person, so that we can plan how much food to order.

Check out the K2 Underground site and our user group at http://www.k2underground.com/groups/greater_texas_user_group/default.aspx. We are posting webexes/live meetings from our meetings at this site.

4/13/2010 11am – 1pm
5/11/2010 11am – 1pm
6/8/2010 11am – 1pm
7/13/2010 11am – 1pm
8/10/2010 11am – 1pm

Meeting Agenda:

11-11:15 Networking/Refreshments
11:15-11:30 Announcements/Intros of New people
11:30-11:45 Tips & Tricks
11:45-12:45 Technical Presentation
12:45-1:00 Meeting Wrapup

The Announcements section of the meeting will include any information regarding K2 upcoming events and user group events as well as brief introductions of our presenter and refreshment provider.

The Tips & Tricks Presentation is when we as members can pose questions to each other on projects that we are working on and having difficulty with. It is also a time when if we have learned something that we feel will be helpful to others, we can share it with the group. Bring yours to share/ask.

Meeting Presentation and Sponsoring Company:

We thank K2 for providing a presenter for our March meeting to show the new features of K2 blackpearl 4.5.

The K2 platform is for delivering process-driven applications that improve business efficiency. Visual tools make it easy for anyone to assemble reusable objects into applications that use workflow and line-of-business information.

K2-based solutions are deployed by a growing number of the global Fortune 100. K2 is a division of SourceCode Technology Holdings, Inc., based in Redmond, Washington, and has offices all over the world.

For more information, contact Joe Bocardo at joeb@k2.com.

Meeting Presenter:

Our meeting presenter information will be available in the next meeting announcement.

For Virtual Attendees:

Note: please keep your phone on mute until you are ready to speak.

Audio Information

Telephone conferencing

Choose one of the following:

  • Start Live Meeting client, and then in Voice & Video pane under Join Audio options, click Call Me. The conferencing service will call you at the number you specify. (Recommended)

Use the information below to connect:

Toll: +1 (719) 867-1571
Participant code: 914421
First Time Users: To save time before the meeting, check your system to make sure it is ready to use Microsoft Office Live Meeting.

Copy and paste the required information:

Location: https://www.livemeeting.com/cc/scna
Meeting ID: 34PCR2
Entry Code: s53+9^P

Troubleshooting

Unable to join the meeting? Follow these steps:

Copy this address and paste it into your web browser: https://www.livemeeting.com/cc/scna/join?id=CH5CQQ&role=attend&pw=k3T@z(h

If you would like to provide refreshments at an upcoming meeting or present at an upcoming meeting, please contact me.

Let me know if you have any questions prior to the meeting.

The next meeting announcement will be sent out next Tuesday.

Thanks,

Have a great day!

Friday, January 29, 2010

SharePoint 2010 Beta VPC

I was talking with someone in the community and they pointed this resource out to me.

It is called the "2010 Information Worker Demonstration Virtual Machine (Beta)" and you can download it from here - http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=0c51819b-3d40-435c-a103-a5481fe0a0d2

Look at all the cool stuff that is preconfigured and ready to go.

  • Windows Server 2008 SP2 Standard Edition x64, running as an Active Directory Domain Controller for the "CONTOSO.COM" domain with DNS and WINS
  • Microsoft SQL Server 2008 SP2 Enterprise Edition with Analysis, Notification, and Reporting Services
  • Microsoft Office Communication Server 2007 R2
  • Visual Studio 2010 Beta 2 Ultimate Edition
  • Microsoft SharePoint Server 2010 Enterprise Edition Beta 2
  • Microsoft Office Web Applications Beta 2
  • FAST Search for SharePoint 2010 Beta 2
  • Microsoft Project Server 2010 Beta 2
  • Microsoft Office 2010 Beta 2
  • Microsoft Exchange Server 2010

Active Directory has been preconfigured with over 200 "demo" users with metadata in an organizational structure.

I have not had time to play with it…

Monday, January 25, 2010

SharePoint 2010 Service Architecture

Introduction

Recently I wrote a series of blogs on Enterprise Search and FAST ESP with the intention of having a deep dive into SharePoint Search 2010. There are many changes to both the logical and physical topologies of SharePoint 2010. Most of the readers of this blog series are very familiar with SharePoint 2007 but not so much with SharePoint 2010 (since it is in beta). Before doing a deep dive into SharePoint Search 2010 we need to have a good grasp of the changes for SharePoint 2010.

A well architected logical and physical SharePoint environment tends to revolve around Search. Search tended to drive much of the logical and physical architectures for SharePoint 2007. In this blog series I am going to introduce the SharePoint 2010 services and the logical and physical architectures. It is critical to have this worked out from the beginning because the SharePoint architecture must scale with the business. What we have seen is that once SharePoint is implemented, it becomes highly adopted.

SharePoint 2010 Versions

In SharePoint 2007 the versions we became very familiar with were:

  • Windows SharePoint Services (WSS)
  • MOSS 2007 Standard
  • MOSS 2007 Enterprise

For SharePoint 2010 it has changed to:

  • SharePoint Foundation 2010
  • SharePoint Server 2010 Standard
  • SharePoint Server 2010 Enterprise

Tier Architecture

SharePoint 2010 has not changed from a tier perspective. There are Web, Application and Database tiers. What is important to understand is how to architect those tiers. For SharePoint 2010 the Application tier has changed significantly, as it is more sophisticated than what was available in SharePoint 2007. Some things that we will get into within this series are creating service farms and partitioned services.

Service Application

Important changes that you should be aware of:

  1. In SharePoint 2007 there were Shared Service Providers (SSPs), which were used to host services. SSPs have been completely removed from SharePoint 2010 and services can be run independently.
  2. Some services in SharePoint 2010 will be referred to as Service Applications. Not all services in Central Administration are Service Applications. In the table below, you will see which services are considered to be Service Applications. You will see a trend that Service Applications tend to map to major features of SharePoint rather than services which could be considered part of the infrastructure of SharePoint.
  3. Service Groups have been introduced to logically manage Service Applications. When Service Applications are added they will be included in a Default Group. Web Applications can use that Default Group or use a Custom Group of Service Applications. What this provides is greater control over which Service Applications are available to specific web applications. If you are familiar with SharePoint 2007, this need is one of the reasons we created different SSPs: sometimes we needed to create barriers between web applications.
  4. SharePoint 2010 services can be reused within and across farms. This was not available in SharePoint 2007 and provides a significant amount of scalability options.
  5. SharePoint 2010 supports Service Partitioning. If you are familiar with database partitioning, think of it as horizontal partitioning of data within a SharePoint service. Not all services support service partitioning; partitioning is only used in services that are data driven. A typical scenario is a centrally managed, cross farm service with data that should not be exposed to all subscribing farms. If that is the case, a farm would subscribe to a partition of the centrally managed service. In SharePoint lingo each partition is referred to as a “tenant”.
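
To make the partitioning idea in item 5 concrete, here is a minimal Python sketch. It is not SharePoint code: the `PartitionedService` class, the tenant IDs and the keys are all hypothetical, and it only illustrates the general idea of horizontally partitioning one service's data so each subscriber sees only its own slice.

```python
from collections import defaultdict


class PartitionedService:
    """Toy data-driven service whose data is keyed by a tenant ID (hypothetical)."""

    def __init__(self):
        # tenant_id -> {key: value}; each tenant only ever sees its own partition.
        self._partitions = defaultdict(dict)

    def put(self, tenant_id, key, value):
        self._partitions[tenant_id][key] = value

    def query(self, tenant_id):
        # Return a copy so callers cannot mutate the stored partition.
        return dict(self._partitions.get(tenant_id, {}))


service = PartitionedService()
service.put("farm-finance", "term:1", "Invoice")
service.put("farm-hr", "term:1", "Benefit")

finance_view = service.query("farm-finance")
hr_view = service.query("farm-hr")
```

Both tenants store a value under the same key, yet neither can see the other's data, which is the barrier that partitioning is meant to provide.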

Knowing what we now know, when scaling out these services we will take the following into consideration:

  • Multiple instances of the same Service Applications can be initiated within a farm.
  • Service Applications are shared across Web Applications within a farm by Service Group.
  • Some Service Applications can be shared across multiple farms while others cannot.
  • Service Groups can logically contain Service Applications that reside in other farms.
  • Web Applications have the flexibility to use multiple instances of the same type of Service Application (regardless of which farm hosts that service).
  • Service Applications can have their data partitioned and made accessible only to specific subscribers.
  • Service Groups can be used to logically organize services for performance, security and scalability.

Some side notes:

  • Service Applications are hosted within IIS. It is possible to have Service Applications deployed to different application pools to achieve process isolation (important for both security and fault tolerance). So it is possible to have a single machine host many services and still have isolation between them.
  • Each Service Application instance has a Connection which Web Applications connect to. Web Applications use these Connections (sometimes referred to as proxies) to send and retrieve data from a Service Application.
  • If the same type of Service Application is used more than once in a single Web Application, one of the Connections will be marked as the primary.
  • Services are deployed through the Configuration Wizard, Central Admin and using Windows PowerShell. Services can be managed through Central Admin and PowerShell.

Given this flexibility in service configuration you now can:

  • Have a better ability to share dedicated services across regional locations.
  • Have the ability to create dedicated services by business unit. For instance, a Finance Web Application may have a dedicated Excel Services Service Application instance while a different Excel Services Service Application instance may be available to the rest of the farm.
  • Have greater control to ensure that data cannot be shared between logical groups of users. For instance, lock down departmental or intranet data.
  • Have the ability to support hosted models in a more secure and efficient manner.
  • Centralize Service Applications that have expensive operations, like Search, and reuse them across farms.
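
The relationship between Web Applications, Service Groups and Connections described above can be sketched in a few lines of Python. This is only a conceptual model, not the SharePoint object model: the class names, instance names and farm names are hypothetical, and the rule that the first Connection of a type becomes the primary is an assumption made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Connection:
    service_type: str   # e.g. "Excel Services"
    instance: str       # hypothetical Service Application instance name
    farm: str           # farm that hosts the Service Application


@dataclass
class ServiceGroup:
    name: str
    connections: list = field(default_factory=list)
    primary: dict = field(default_factory=dict)  # service_type -> Connection

    def add(self, conn, primary=False):
        self.connections.append(conn)
        # Assumption: the first connection of a type is primary unless overridden.
        if primary or conn.service_type not in self.primary:
            self.primary[conn.service_type] = conn

    def resolve(self, service_type):
        return self.primary.get(service_type)


default_group = ServiceGroup("Default")
default_group.add(Connection("Excel Services", "ExcelShared", "FarmA"))
default_group.add(Connection("Search", "EnterpriseSearch", "FarmB"))

finance_group = ServiceGroup("Finance")
finance_group.add(Connection("Excel Services", "ExcelFinance", "FarmA"))
finance_group.add(Connection("Search", "EnterpriseSearch", "FarmB"))

# Each Web Application is associated with one group of connections.
web_apps = {"Intranet": default_group, "FinancePortal": finance_group}

excel_for_finance = web_apps["FinancePortal"].resolve("Excel Services")
excel_for_intranet = web_apps["Intranet"].resolve("Excel Services")
```

Here the Finance portal gets its own Excel Services instance, while both Web Applications reuse the same Search instance hosted on another farm.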

We will put these rules for Service Applications into action in the next part of this series.

SharePoint 2010 Services

In SharePoint 2007 we commonly had to configure the following services:

  • Document Conversions Launcher Service
  • Document Conversions Load Balancer Service
  • Excel Calculation Services
  • Office SharePoint Server Search
  • Windows SharePoint Services Help Search
  • Windows SharePoint Services Web Application

In many SharePoint 2007 implementations, services were not configured correctly. This resulted in poor performance and the inability to scale to meet business demand. Many people implementing SharePoint 2007 did not understand that both the logical and physical architectures have to be aligned to how SharePoint services will be utilized. This will be a problem for many on the 2010 platform.

As discussed, the service architecture has changed for SharePoint 2010 in many ways. Before we dive into all of the strategies for how services should be aligned in both the logical and physical architectures, let us understand what the new services are.

Here is a list of services for SharePoint 2010. I found several pieces of information and I manually created this table with the information that I have.

| Services | Description | Service Application | Cross Farm | Partitioning | Available On |
| --- | --- | --- | --- | --- | --- |
| Access Database Services | New service that allows for viewing, editing and interacting with MS Access through a browser. | Yes | No | Yes ** | SharePoint Server 2010 Enterprise |
| Application Registry Service | Enables users to search and collaborate around business data. Provides backward compatibility to BDC service. | No | No | NA | SharePoint Foundation 2010 and up |
| Business Data Connectivity | Access to line of business systems. Service now supports writing to data services. | Yes | Yes | Yes | SharePoint Foundation 2010 and up |
| Central Administration | Central Admin Site | No | No | NA | SharePoint Foundation 2010 and up |
| Document Conversions Launcher Service | Schedules and initiates document conversions. | No | No | NA | SharePoint Foundation 2010 and up |
| Document Conversions Load Balancer Service | Balances document conversions across the SharePoint farm. | No | No | NA | SharePoint Foundation 2010 and up |
| Excel Calculation Services | Ability to interact with Excel files in a browser. New extended functionality. | Yes | No | No | SharePoint Server 2010 Enterprise |
| Lotus Notes Connector | Index service connector to index Lotus Notes Domino Servers. | Yes | Yes | Yes | SharePoint Server 2010 Standard and up |
| InfoPath Service | Supports hosting InfoPath forms in SharePoint. | No | Yes | Yes ** | SharePoint Server 2010 Enterprise |
| Managed Metadata Service | New service that manages taxonomy structures and definitions. | Yes | Yes | Yes | SharePoint Server 2010 Standard and up |
| Microsoft SharePoint Foundation Incoming E-mail | Email service. This will run on the machine where the web application is running. | No | No | NA | SharePoint Foundation 2010 and up |
| Microsoft SharePoint Foundation Subscription Settings Services | New service used to track subscription IDs and settings for services that are deployed in partition mode. | Yes | NA | NA | SharePoint Foundation 2010 and up |
| Microsoft SharePoint Foundation User Code Service | New service that runs code deployed as part of a sandbox solution and runs in restricted mode. Must be started on any machine in the farm that needs to run sandbox code. | No | NA | NA | SharePoint Foundation 2010 and up |
| Microsoft SharePoint Foundation Web Application | The service that runs the web application. | No | No | NA | SharePoint Foundation 2010 and up |
| Microsoft SharePoint Foundation Workflow Timer Service | Responsible for running timer jobs. | No | No | NA | SharePoint Foundation 2010 and up |
| PerformancePoint | BI Dashboard services. | Yes | No | NA | SharePoint Server 2010 Enterprise |
| PowerPoint | New service that allows viewing, editing and broadcasting PowerPoint in a browser. | Yes | No | Yes ** | SharePoint Server 2010 Enterprise |
| Project | Hosts Project Server 2010. | Yes | No | Yes | Additional server product |
| Search Query and Site Settings Service | Service that performs a query across built indexes. | Yes | Yes | Yes * | SharePoint Server 2010 Standard and up |
| Secure Store Service | Service that provides SSO authentication. | Yes | Yes | Yes | SharePoint Server 2010 Standard and up |
| SharePoint Foundation Search | Service that provides search capabilities for SharePoint Foundation Search only. For SharePoint Server 2010 Standard and Enterprise this service will perform online Help search. | No | No | NA | SharePoint Foundation 2010 and up |
| SharePoint Server Search | Crawls content, creates indexes and performs queries. Automatically configured. | Yes | Yes | Yes * | SharePoint Server 2010 Standard and up |
| State Service | New service that provides temporary storage of user session data for SharePoint components. | Yes | No | Yes ** | SharePoint Server 2010 Standard and up |
| Usage and Health Data Collection | Reporting services that provide farm wide usage and health. | Yes | No | Yes | SharePoint Foundation 2010 and up |
| User Profile | New and expanded social networking services and features. | Yes | Yes | Yes | SharePoint Server 2010 Standard and up |
| User Profile Synchronization Service | Synchronizes user and group profile information that is stored in the SharePoint Server 2010 profile store with profile information that is stored in directory services across the enterprise. Works with AD, BDC, Novell LDAP and Sun LDAP. | Yes | Yes | Yes | SharePoint Server 2010 Standard and up |
| Visio Graphics Service | Ability to view published Visio diagrams in a browser. | Yes | No | Yes ** | SharePoint Server 2010 Enterprise |
| Web Analytics Data Processing Service | Captures data for analytics. | Yes | Yes | Yes | SharePoint Foundation 2010 and up |
| Web Analytics Web Service | Web service interfaces for analytics. | Yes | Yes | Yes | SharePoint Foundation 2010 and up |
| Word Automation Services | Service that performs automated bulk document conversions. | Yes | No | Yes ** | SharePoint Server 2010 Standard and up |

* FAST Search cannot be partitioned.

** Supports partitioning but is not needed because there is no tenant data.

Next

In the next blog we will actually jump into the topologies of SharePoint farms (with diagrams) based on the information we have gone over.


Thursday, January 14, 2010

SharePoint 2009 Conference Notes

Unfortunately I did not get selected to go to the SharePoint Conference; however, I have tons of notes from colleagues that I have gone through. Greg Galipeau, a colleague of mine, puts tons of great notes on his site (blog 1, blog 2, blog 3). Since the dust has settled, I started to sift through a couple of presentations from the conference. It is still early to tell how things are going to play out.

Here are a few notes I took while scanning over a couple SharePoint 2009 Conference presentations.

Terminology Changes

  • Web Application renamed to Service Application
  • Content Database renamed to Service Application Database
  • WSS now called SharePoint Foundation

System Requirements & Installation Process

  • OS – Windows Server 2008 SP2 or R2 and 64 bit
  • SQL – 2005 SP2 or SQL 2008 and 64 bit
  • Prerequisites
    • SQL Native Client
    • Geneva Framework
    • Sync Framework
    • Chart Controls Framework
    • Filter Pack
    • SQL Server 2008 Analysis Services
    • Web Server Role
    • Application Server Role
  • Installation has not really changed too much. The wizard has stayed pretty much the same but has been rebranded.
  • The configuration wizard is now available through Central Administration.
  • There is a new Farm Configuration Wizard that is available inside of Central Administration.

System Administration

  • Managed Accounts – SharePoint can manage password changes. SharePoint will auto-generate the password or you can get reminders for when the password needs to be changed.
  • stsadm is being replaced by Windows PowerShell. It is supposed to have more functionality and power.
  • Backup and Restore
    • Can be run on separate threads.
    • Have the ability to do Configuration only backups.
    • Available in Central Admin to do granular backups of site collections, webs or lists.
  • HTTP Request Monitoring
    • Has the ability to throttle requests during peaks of high usage. If a threshold is exceeded, GET requests will receive 503 errors and timer jobs may not start, but PUT requests are still allowed.
    • This is added to protect the server during peak loads. It evaluates available memory, CPU utilization, the ASP.NET queue and the queue wait time, and checks them every 5 seconds.
  • Logging
    • Will be more compressed by default.
    • Ability to have a SQL Logging database.
      • Opinion - This was very interesting to me because searching and management of logs can potentially be easier.
  • Best Practice Analyzer – uses best practice rules and will alert if there are issues. The rules can be customized and it can be executed by timer jobs.
    • Opinion - Let's see where this goes but it could be helpful for clients who do not have lots of expertise in house and they commonly make configuration mistakes.
  • Patch Levels and Upgrades – Servers and databases can be run at different patch levels as necessary but must maintain a level of compatibility.
    • Opinion - This is important as it would have helped me with a client I had; however this does not negate good Configuration Management of the SharePoint farm.
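
The HTTP request throttling behavior noted above can be sketched as a small decision function. This is an illustration only: the counter names follow the notes, but the threshold values and the `HealthSample`, `is_overloaded` and `handle_request` names are hypothetical and do not come from SharePoint.

```python
from dataclasses import dataclass

CHECK_INTERVAL_SECONDS = 5  # interval quoted in the notes above


@dataclass
class HealthSample:
    available_memory_mb: float
    cpu_percent: float
    aspnet_queue_length: int
    queue_wait_ms: float


# Hypothetical thresholds, chosen only for illustration.
THRESHOLDS = HealthSample(
    available_memory_mb=200.0,
    cpu_percent=90.0,
    aspnet_queue_length=500,
    queue_wait_ms=1000.0,
)


def is_overloaded(sample, limits=THRESHOLDS):
    """Return True if any monitored counter crosses its limit."""
    return (
        sample.available_memory_mb < limits.available_memory_mb
        or sample.cpu_percent > limits.cpu_percent
        or sample.aspnet_queue_length > limits.aspnet_queue_length
        or sample.queue_wait_ms > limits.queue_wait_ms
    )


def handle_request(method, throttling):
    """Return the HTTP status a request would get under the described policy."""
    if throttling and method.upper() == "GET":
        return 503  # GET requests are rejected while throttling
    return 200      # PUT requests are still allowed


busy = HealthSample(available_memory_mb=150.0, cpu_percent=95.0,
                    aspnet_queue_length=10, queue_wait_ms=20.0)
calm = HealthSample(available_memory_mb=4000.0, cpu_percent=20.0,
                    aspnet_queue_length=0, queue_wait_ms=1.0)
```

A monitor would take a new sample every `CHECK_INTERVAL_SECONDS` and pass the result of `is_overloaded` into the request handling path.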

SharePoint Lists

  • SharePoint Lists can now handle 50 million items per list versus the previous 2,000 item UI limit.
  • As the list grows, there will still be a performance impact.
    • Opinion – I am still not a believer in managing highly relational or referenced data in SharePoint lists. This should be done in database tables.

New Development Tools

  • Developer Dashboard – is used to monitor page load and performance. It will show times it takes to load components on a page, database query information, check out status, web part processing times and critical events. Its purpose is to provide developers with a better tool to diagnose problems with web parts or lists that may take a long time to load.
    • Opinion - This should be pretty useful and I know several clients that could take advantage of it.
  • Visual Studio 2010 will address many of the issues that we were having with SharePoint development. The reality was that SharePoint 2007 development was very difficult because we did not have development tools like those we have for ASP.NET, WinForms, SQL Server, etc. Debugging was a major challenge. Microsoft heard this loud and clear and it has been fixed.
  • Business Connectivity Services – basically a new version of the BDC but allows you to do a read and write.
  • LINQ to SharePoint – will provide a better, strongly typed way to get data out of SharePoint lists.
    • Opinion – This is created because writing dynamic CAML queries is very challenging based on the structure of that query language.
  • Client Object Model – this allows for SharePoint API code to run on external machines and not just on the SharePoint machine itself.
    • Opinion – This will be very beneficial; we will see if it will do everything we need it to do.

General

  • SharePoint Workspace – If you are familiar with the product called Groove, it has been rebranded as SharePoint Workspace. This tool will provide a rich offline user experience with SharePoint 2010. If you tried Groove with SharePoint 2007, it did not work very well, but I suspect it will do the job now. There is a need for this, especially for supporting mobile personnel like a sales team.
  • Improved cross browser support (sorry, few details).

Dispose SPSite and SPWeb Check Tool

I read something pretty interesting. Scot Hillier in a presentation said some of the things that drive a SharePoint consultant crazy are:

  1. Running in Full trust. (Drives me nuts, read here.)
  2. Not automating web.config settings. (Very true, read here.)
  3. Deploying non-Release code to production. (Doh – do not forget!)
  4. Deploying to the GAC. (Drives me nuts, read here.)
  5. Not using Feature and Solutions properly to do deployments. (My addition)
  6. Forgetting to run SPDisposeCheck. (???)

Given that I had recently seen some issues in SharePoint production environments where connections were not being closed, I wondered what SPDisposeCheck was. I had never heard of it, and it sounded useful.

SPDisposeCheck was developed by Microsoft and provided to the community to avoid this issue. It will check all of your assemblies to see whether SPSite and SPWeb objects are being disposed of correctly. Here is information for using it:

If you are seeing performance problems on your SharePoint sites, check your logs! If you see the following errors, objects are not being disposed of correctly and memory is being consumed:

ERROR: request not found in the TrackedRequests. We might be creating and closing webs on different threads. ThreadId = 21, Free call stack = at Microsoft.SharePoint.SPRequestManager.Release(SPRequest request) at Microsoft.SharePoint.SPSite.Close() ......

An SPRequest object was not disposed before the end of this thread. To avoid wasting system resources, dispose of this object or its parent (such as an SPSite or SPWeb) as soon as you are done using it. This object will now be disposed. Allocation Id: {B274FEBF-1D42-463C-B0C3-7A9005371494} To determine where this object was allocated, create a registry key at HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\HeapSettings. Then create a new DWORD named SPRequestStackTrace with the value 1 under this key.

An SPRequest object was reclaimed by the garbage collector instead of being explicitly freed. To avoid wasting system resources, dispose of this object or its parent (such as an SPSite or SPWeb) as soon as you are done using it. Allocation Id: {617E63C9-32A3-42AE-AEEB-BC41CB798C88} To determine where this object was allocated, create a registry key at HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\HeapSettings. Then create a new DWORD named SPRequestStackTrace with the value 1 under this key.

Here is a good blog that goes into the details of the error. The error is associated with not disposing of SPSite and SPWeb objects correctly in custom code. This is extremely important because it can take down a SharePoint site.
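
If you want a quick way to check your logs for these messages, a small script can do the searching. The sketch below is not part of SPDisposeCheck; it is a hypothetical Python helper that scans log lines for the message fragments quoted above and pulls out the Allocation Id when one is present. How you read the lines from your log files is left out, since log locations and formats vary.

```python
import re

# Message fragments copied from the log entries quoted above.
SIGNATURES = {
    "not_disposed": "An SPRequest object was not disposed before the end of this thread",
    "gc_reclaimed": "An SPRequest object was reclaimed by the garbage collector",
    "tracked_requests": "request not found in the TrackedRequests",
}

# Matches "Allocation Id: {GUID}" and captures the GUID without braces.
ALLOCATION_ID = re.compile(r"Allocation Id: \{([0-9A-Fa-f-]{36})\}")


def scan_log_lines(lines):
    """Yield (kind, allocation_id_or_None) for each line matching a signature."""
    for line in lines:
        for kind, fragment in SIGNATURES.items():
            if fragment in line:
                match = ALLOCATION_ID.search(line)
                yield (kind, match.group(1) if match else None)
                break


sample = [
    "An SPRequest object was not disposed before the end of this thread. "
    "This object will now be disposed. "
    "Allocation Id: {B274FEBF-1D42-463C-B0C3-7A9005371494}",
    "ERROR: request not found in the TrackedRequests. ThreadId = 21",
    "Some unrelated log entry",
]
findings = list(scan_log_lines(sample))
```

Counting findings by kind over time gives a rough signal of whether a new deployment introduced undisposed objects.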

Monday, January 11, 2010

FAST ESP SharePoint Connector

Introduction

After introducing the components and providing a preview of design considerations for scaling a FAST ESP implementation, let us take a look at how FAST ESP works with SharePoint today. In this post I will introduce you to the architecture of the FAST SharePoint Connector and explain how content is fed, processed, stored and queried. We will cover considerations and strategies for a successful implementation.

In the next set of postings, we will discuss in detail what has been planned for SharePoint 2010.

Note, if you have no FAST ESP experience or training, you must read this blog to understand some of the concepts.

FAST Connector for SharePoint Today

FAST supports both SharePoint 2003 and 2007 in the same manner it would support any other enterprise application that FAST would index. FAST provides an API (Java, .NET, C++) and the FAST Content Connector Toolkit, which facilitates the building of Connector applications. The SharePoint Connector is built on these frameworks to feed content from SharePoint into FAST.

There are three features of the FAST SharePoint Connector you should be aware of:

  1. It will index sites, lists, list items, files and associated metadata from SharePoint.
  2. It can incrementally retrieve content from SharePoint.
  3. It will capture SharePoint item permissions and incorporate them into the access control list.

Architecture

The architecture of the FAST SharePoint Connector is pretty simple and well contained.

  1. A custom web service will be installed into the SharePoint farm. This web service will be accessible just like the out of the box web services provided in SharePoint. Side note: if you are interested in writing your own custom web service for SharePoint, read this blog.
  2. The FAST SharePoint Connector must be installed on a machine that can access the SharePoint web services and is able to connect to FAST ESP Content Distributors. It really does not matter where this is installed as long as it can make the required connections. That being said, the Connector could be installed on either the SharePoint Farm or on the FAST Farm.
  3. The Windows Authentication Proxy must be installed onto the FAST Farm.

[Figure: FAST SharePoint Connector architecture]

Basic Processing Flow

The installed components work together to retrieve content from SharePoint and make it searchable within FAST. Here is the process:

  1. The SharePoint Connector calls the FAST SharePoint Web Service to retrieve content.
  2. The FAST Connector connects to the FAST Content Distributor and sends along the SharePoint data.
  3. The FAST Windows Authentication Proxy “may” be used to get additional SharePoint data.
  4. The document processors process the content into FIXML documents so an index can be built.

Now let’s dive a little bit deeper into some of the details about how this works.

Incremental Loading

The FAST SharePoint Connector will perform incremental loading of content. The first time it will be heavy because all of the SharePoint data will be loaded. However, subsequent content loads will only retrieve changes. The interval for incremental loading is configurable.
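
The general pattern behind incremental loading can be sketched as follows. This is a minimal Python illustration that assumes changes are detected by comparing a modified timestamp against a stored high-water mark; the post does not say how the Connector actually tracks changes, so the function and data below are hypothetical.

```python
def incremental_load(items, last_sync):
    """Return items changed after last_sync and the new high-water mark.

    items: iterable of (url, modified) pairs; modified is a comparable timestamp.
    last_sync: timestamp of the previous run, or None for the initial full load.
    """
    changed = [
        (url, modified)
        for url, modified in items
        if last_sync is None or modified > last_sync
    ]
    # Keep the old mark when nothing changed.
    new_mark = max((m for _, m in changed), default=last_sync)
    return changed, new_mark


content = [("/sites/a/doc1", 100), ("/sites/a/doc2", 150), ("/sites/b/doc3", 220)]

first_batch, mark = incremental_load(content, None)     # heavy initial load
content.append(("/sites/a/doc4", 300))
second_batch, mark2 = incremental_load(content, mark)   # only the new change
```

The first run returns everything, which is why the initial load is heavy; later runs return only what changed since the stored mark.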

Incremental Loading Strategy

If you need to completely reload the data, you must clear the Collection the documents were fed to. Doing this has ramifications that you should be aware of. The most important one that comes to mind is that the Collection can have documents from other locations. If so, all of that content will have to be re-indexed too! That can be a big deal. So it is important to organize your Collections and anticipate whether you will ever have to do this.

You may be wondering how you can control the amount of data that is indexed at any given time. Well there are probably many ways but here are some options that come to mind first.

Create multiple pipeline instances for processing SharePoint data, then configure the pipelines to include or exclude specific URLs within SharePoint. I might create a dedicated pipeline for processing content in areas where I know there will be lots of updates. For instance, collaboration or project sites will have data updated on a regular basis. I would then configure that pipeline with a short refresh interval. The advantage of doing this is that a smaller subset of data that is regularly updated will be polled more frequently. Then I would create a dedicated pipeline for a publishing site where data is updated less frequently. The interval between getting data may be longer.
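
As a rough illustration of this strategy, the Python sketch below models two feed configurations with include and exclude URL prefixes and different refresh intervals. It is not FAST configuration syntax; the `FeedConfig` class, the URLs and the interval values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FeedConfig:
    name: str
    interval_minutes: int
    include: list = field(default_factory=list)  # URL prefixes to include
    exclude: list = field(default_factory=list)  # URL prefixes to exclude

    def accepts(self, url):
        # Exclusions win over inclusions.
        if any(url.startswith(p) for p in self.exclude):
            return False
        return any(url.startswith(p) for p in self.include)


feeds = [
    # Frequently updated project sites are polled often.
    FeedConfig("collaboration", 15, include=["http://intranet/projects/"]),
    # Publishing content changes less often, so it is polled rarely.
    FeedConfig("publishing", 720, include=["http://intranet/"],
               exclude=["http://intranet/projects/"]),
]


def feed_for(url):
    """Return the name of the first feed configuration that accepts the URL."""
    for feed in feeds:
        if feed.accepts(url):
            return feed.name
    return None
```

Keeping the include and exclude lists disjoint, as here, avoids the same content being fed twice on two different schedules.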

Another thing I may take into consideration is pushing data to different collections. For instance, you can have dedicated collections for intranet, extranet and Internet (remember I am not talking about SharePoint collections, I am talking about FAST collections). Typically in the SharePoint world you logically group data into different content databases, shared service providers and even different hardware. It may be good to maintain that logical separation knowing that it is recommended to feed content into different FAST Collections based on these logical boundaries. Doing this will also give more control over the Search Profiles and what Collections people have access to.

Document Feeding

When SharePoint data is read through the web service by the SharePoint Connector, both metadata and security information will be sent to the Connector. However, depending on the configuration you set, SharePoint files may or may not be part of that payload. By default, a reference to the file (a URL) will be part of the information sent from the SharePoint Connector to the Content Distributor(s). During Document Processing, the Windows Authentication Proxy will use this reference to retrieve the actual document from SharePoint. You have the ability to change this configuration and send the file as a BLOB.

Document Feeding Strategy

Why are you provided with this option? Mostly for flexibility reasons. If you pass the files by reference:

  • The Connector is going to perform more quickly because it has less data to work with.
  • Your network I/O will be better utilized because the document will not have to be passed twice; it will only be retrieved once from SharePoint. This is a big deal if you have large files.

If you choose to pass the file immediately:

  • Document Processing will be quicker because it does not have to go out and retrieve the file from SharePoint.
  • The machines where Document Processing is located do not need to have access to the SharePoint sites because all of the content is available.
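The two options can be pictured with a small sketch. This is not the Connector's real payload format; the dictionary keys, the build_feed_item function and the fetch callback are hypothetical names used only to show what travels with the item in each mode.

```python
from typing import Any, Callable, Dict


def build_feed_item(url: str,
                    metadata: Dict[str, Any],
                    acl: list,
                    fetch: Callable[[str], bytes],
                    by_reference: bool = True) -> Dict[str, Any]:
    """Build a hypothetical feed item for one SharePoint file.

    Metadata and security information are always included. The file is
    either passed as a URL reference (fetched later during document
    processing) or downloaded now and embedded as a BLOB.
    """
    item: Dict[str, Any] = {"url": url, "metadata": metadata, "acl": acl}
    if by_reference:
        item["content_ref"] = url      # small payload, file fetched later
    else:
        item["content"] = fetch(url)   # larger payload, no later fetch needed
    return item


if __name__ == "__main__":
    # Stand-in for a real download; returns fixed bytes for the demo.
    def fake_fetch(u: str) -> bytes:
        return b"file bytes"

    url = "http://sharepoint.example.com/docs/report.docx"
    print(build_feed_item(url, {"title": "Report"}, ["DOMAIN\\readers"], fake_fetch))
    print(build_feed_item(url, {"title": "Report"}, ["DOMAIN\\readers"], fake_fetch,
                          by_reference=False))
```

In the by-reference case the processing machines need access to SharePoint; in the BLOB case they do not, at the cost of a heavier item moving through the Connector.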

Processing SharePoint Data

Working with SharePoint data is not really different from working with other data that may come into FAST. But there are some strategies you may want to employ. First, you should know how the data type mapping from SharePoint to FAST will be handled. Second, metadata from columns in SharePoint will be mapped into the fields within the Index Profile. This mapping is based on the unique name of the field. For example, if you have a column called Last Update Date in SharePoint, in the FAST index profile there must be a field called lastupdatedate (notice it is lower case, no spaces and no special characters). If this is the case, the data from the SharePoint column will be automatically mapped into that index field and become searchable. Note that if the SharePoint column data is not mapped, Document Processing will discard the data.
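The naming rule is easy to sketch. The snippet below is an approximation under stated assumptions: it treats "lower case, no spaces and no special characters" as "keep only a-z and 0-9", which may not match the Connector's exact rule, and the function names are hypothetical.

```python
import re
from typing import Any, Dict, List, Set, Tuple


def to_index_field_name(column_name: str) -> str:
    """Approximate the field name a SharePoint column would map to.

    Assumption: lower-case the name and drop everything except a-z and 0-9.
    """
    return re.sub(r"[^a-z0-9]", "", column_name.lower())


def map_columns(columns: Dict[str, Any],
                index_fields: Set[str]) -> Tuple[Dict[str, Any], List[str]]:
    """Split column data into mapped index fields and discarded columns."""
    mapped: Dict[str, Any] = {}
    discarded: List[str] = []
    for column, value in columns.items():
        field_name = to_index_field_name(column)
        if field_name in index_fields:
            mapped[field_name] = value
        else:
            discarded.append(column)  # no matching field: data is dropped
    return mapped, discarded


if __name__ == "__main__":
    item = {"Last Update Date": "2010-02-24", "Project Code": "X1"}
    print(map_columns(item, {"lastupdatedate"}))
    # ({'lastupdatedate': '2010-02-24'}, ['Project Code'])
```

Running a list of your SharePoint column names through a check like this before indexing is a quick way to see which columns would be dropped.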

If you have a good understanding of SharePoint, this will raise a red flag for you. This is because you know that columns of SharePoint data can be added in an ad-hoc fashion. The question on your mind is how can data be made searchable if it is not mapped to a field? FAST has a concept called Scope fields. Scope fields take metadata and store it in a structured format (similar to XML) in a single field in the index. Scope fields are specifically provided to support storing index data without having to know the schema of that data in advance. When you store data in a Scope field you have the ability to query it back out using FQL, the FAST Query Language (similar to writing an XPath query).

Processing SharePoint Data Strategy

There are some considerations that you must take into account. First, you can add new fields into the Index Profile to match all of the data that you want to bring in from SharePoint. This is fine; however, adding a field is considered to be a “warm update.” After making the change, only new documents will have the data; previously indexed documents will not. For old documents to have the data, they must be completely re-processed. This will require you to clear the collection and completely re-index (discussed above). A second consideration is that using Scope fields to support querying all column data has a query performance penalty.

Here are some additional recommendations:

  • Come up with a hybrid approach where important SharePoint columns are mapped to a field in the Index Profile. Then allow all other columns to be indexed automatically into a Scope field. This is a common practice. This will give you good query performance on the most common columns of data and still allow you access to all other column data.
  • Earlier, we mentioned potentially creating separate Collections for publishing sites versus collaboration sites. In that scenario, do not turn on Scope fields for the publishing Collection because the metadata should be very well defined. This way you can get better query performance. All you need to do is either add new fields that map directly to SharePoint columns or add document processing stage(s) that will save the data in existing index fields.
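The hybrid approach in these recommendations can be sketched as follows. This is only an illustration: the XML layout stands in for the "structured format (similar to XML)" of a Scope field and is not FAST's real storage format, and the function names and the use_scope flag are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Any, Dict, Optional, Set, Tuple


def normalize(column_name: str) -> str:
    """Assumed naming rule: lower case, only a-z and 0-9 kept."""
    return re.sub(r"[^a-z0-9]", "", column_name.lower())


def split_hybrid(columns: Dict[str, Any],
                 index_fields: Set[str],
                 use_scope: bool = True) -> Tuple[Dict[str, Any], Optional[str]]:
    """Send important columns to dedicated fields, the rest to one scope blob.

    With use_scope=False (for example, a well-defined publishing
    collection) unmapped columns are simply dropped.
    """
    fields: Dict[str, Any] = {}
    scope = ET.Element("metadata")
    for column, value in columns.items():
        name = normalize(column)
        if name in index_fields:
            fields[name] = value
        elif use_scope:
            ET.SubElement(scope, "column", name=column).text = str(value)
    has_scope_data = use_scope and len(scope) > 0
    scope_xml = ET.tostring(scope, encoding="unicode") if has_scope_data else None
    return fields, scope_xml


if __name__ == "__main__":
    item = {"Last Update Date": "2010-02-24", "Project Code": "X1"}
    print(split_hybrid(item, {"lastupdatedate"}))
    print(split_hybrid(item, {"lastupdatedate"}, use_scope=False))
```

The design choice is the same one described above: pay the Scope field query cost only for the long tail of ad-hoc columns, and keep the commonly queried columns in dedicated fields.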

Wrap Up

This post provided you with some insight into how FAST ESP indexes data from SharePoint. Hopefully you will take these factors into consideration before you start to index your content. This is why we say it is so important to understand the life-cycle of the data you are indexing - because it will influence your approach.

SharePoint 2010 Planning, Development and Architecture

I recently stumbled across a number of documents that give SharePoint professionals insight into the new SharePoint 2010 platform.

SharePoint 2010 Architecture

SharePoint 2010 Development

SharePoint 2010 Search

FAST for SharePoint 2010