Monday, January 17, 2011

Part 6 - FAST for SharePoint 2010 References

References

Below is a full list of all the resources I used for this series on FAST for SharePoint.

Background References

FAST for SharePoint Planning and Architecture References

FAST for SharePoint Deployment References

Part 5 - FAST for SharePoint 2010 Service Configuration

In part five of this series, we are going to focus on the SharePoint side of FAST farm configuration.

Configuration of SharePoint Services

After the FAST farm has been set up, you will need to configure the SharePoint 2010 farm to communicate with it. As I mentioned earlier in this blog series, this is done through two services:

  • FAST Search Connector service – also commonly referred to as the Content SSA.
  • FAST Search Query service

Before configuring the search components in SharePoint, there is a file that will help. Go to the server where the FAST Administration Component is installed. There will be a file under c:\FASTSearch called install_info.txt. The values within this file will be used in the configuration of the SharePoint services that communicate with the FAST for SharePoint Farm.

FAST Search Connector service

To set up a new FAST Search Connector service, go to Central Administration as normal and create a new service. Add a New Search Service Application and in the FAST Service Application section, select the FAST Search Connector option.

Next you need to specify a search service account, which in this case will be a SharePoint managed service account. You also need to specify an application pool.

image

Next we need to configure some settings specific to FAST, and this is where the information from the install_info.txt file will assist us.

As I mentioned earlier, the job of the FAST Search Connector service is to send information to the FAST Content Distributor component which will subsequently feed documents to the various processing components on the FAST farm. As you see below, there is a Content Distributor server name.

There is also a Content Collection Name. The term “collection” is a carryover from the FAST ESP product. A collection is a logical grouping of searchable documents in FAST. If you would like to read more about what a FAST collection is, read my old blog here. You can only specify one collection name. The default collection name is “sp”. You may ask, why should I care about this? Well, you may have multiple SharePoint farms feeding content to the same FAST farm, and you may want to logically organize the content being fed (i.e. intranet content versus extranet content).

image

Once the FAST Search Connector service has been created, go take a look at the service. You will see right off the bat that it is almost identical to the out-of-the-box search administration, except that none of the non-crawling links on the left are available. Everything is about crawling.

All you need to do from here is add content sources, set up some crawl schedules and you are off to the races.

image

Below is the topology of the FAST Search Connector service (which is only the topology of the SharePoint side, not the FAST side). There is nothing of real interest other than the fact that there is an administration component. The crawl component is not the same as the one in SharePoint Search. It is used just to support the feeding of content; it is not crawling in the traditional sense. There are no SQL performance considerations you have to account for here.

image

FAST Search Query service

Next you need to configure the FAST Search Query service. As I mentioned earlier in this blog, this service has two purposes: to send / retrieve query results from FAST and to perform the People Search. To support the people search, this service includes both indexing and querying.

Just like before, add a New Search Service Application but this time select FAST Search Query. Again you will need to specify a service account; you can probably just reuse the one you created for the connector service.

image

Next you will need to configure two application pools.

image

Finally there are four configuration settings for FAST. All of the information you need to enter here is again located in the install_info.txt file we pulled off the FAST Admin server.

image

When it is all done, you will have a service that looks like the following. It looks identical to the out-of-the-box SharePoint search. You will notice that there are both Crawling and Query and Result links on the left. Remember, there are Crawling links because People Search is part of this service. You will need to configure everything like you normally would for a search service.

image

Finally here again is the search topology for this FAST Search Query service. You see that there are several databases and what not that have been created all to support the crawling and querying.

image

Once we have finished creating both of these services, you will see the two FAST services in the Service Application list in Central Admin.

image

Conclusion

I really hope that this blog series will help you to get started in understanding what a FAST for SharePoint deployment would be like before you go off and do one.

Part 4 - FAST for SharePoint 2010 Farm Configuration

Part four of this series is going to focus on configuration of FAST for SharePoint.

Configuration of the FAST for SharePoint Farm

Now the next question is how to configure this FAST farm, because in many instances you know how to do SharePoint but FAST is a foreign concept. Well, the FAST for SharePoint 2010 farm is not configured through SharePoint 2010 Administration. At a high level, there is an xml file that you need to create that captures which FAST for SharePoint components are configured on which server in the FAST farm. This xml file basically drives the entire configuration of the FAST farm.

In this part of the series I am going to give an introduction and pointers to information on how to configure your FAST for SharePoint 2010 Farm.

The best resource for you to begin your understanding of a FAST for SharePoint deployment is “Deployment guide for FAST Search Server 2010 for SharePoint” located at - http://go.microsoft.com/fwlink/?LinkId=204984. This whitepaper pretty much has it all. This post is really just a supplement with some added details.

Understanding this document will go a long ways in understanding the architecture of FAST for SharePoint. At a high level:

  1. There are several service accounts, firewall configuration, IP address work, windows updating, anti-virus and proxy settings you need to do before you can even start the installation.
  2. Next I would make sure you have SharePoint 2010 installed and then make sure that the servers that will host the FAST for SharePoint farm have access to the same SQL environment.
  3. Next you need to install FAST for SharePoint onto each server in the FAST farm. There is a prerequisites installer that you need to run to make sure all the required components are on the machine.
  4. Then you need to configure the FAST for SharePoint farm. There are two options: stand-alone or multiple server farm. The stand-alone is as simple as it sounds and you need to go through the configuration steps.
  5. The multiple server farm configuration entails the creation of a deployment.xml file that I referred to earlier. After installing the FAST for SharePoint bits on each server in the farm, you will go through a configuration process which will use the deployment.xml file to activate components on that server. We will take a deeper look into that shortly.
  6. Next we need to create the FAST Search Connector service in SharePoint Central Administration which will send SharePoint content to FAST. You will need to configure SSL for the communication between the two.
  7. Finally you will need to set up the FAST Search Query service in SharePoint Central Administration which crawls people information and calls the FAST for SharePoint query servers. There is an extra step of this configuration which requires claims authentication to support the call to the FAST query servers.
  8. There are several other steps in this whitepaper that lay out how to get the SharePoint search centers set up and how to test your installation.

For the remaining parts of this blog, I am going to focus on steps 5, 6 and 7.

Deployment XML File

As I mentioned, the deployment.xml file is the key to the deployment of the FAST for SharePoint farm. There is no nice GUI that will show you the farm and allow you to configure it. In step five you will install the FAST for SharePoint bits on each server of the FAST farm. Then, on the machine that will have the admin component installed on it, you need to go through the configuration process, which uses the deployment.xml file. Then you will go to each of the other servers in the FAST farm and configure them using the same deployment.xml file.

Now you see the importance of this file and why I wanted to focus on it. I was able to find some good references and examples.

First let’s talk about the deployment.xml file. You need to read this - http://technet.microsoft.com/en-us/library/ff354931.aspx. It may be a boring read, but it shows how the file works. Below is a listing of the components that I described earlier in this blog series, with a mapping to the XML nodes.

  • Administration component - <admin>
  • Document Processing component - <document-processor>
  • Content Distributor component - <content-distributor>
  • Indexing Dispatcher component - <indexing-dispatcher>
  • Web Crawler component - <crawler>
  • Web Analyzer component - <webanalyzer>
  • Indexing component - <searchengine>
  • Query Matching component - <searchengine>
  • Query Processing component - <query>

All of the tags described in the deployment.xml reference are important, and several do not map directly to components. It was not easy at first to gain an understanding of how this works. The best way to do this is to open up the deployment.xml reference and then review some examples of deployment.xml files to really understand how it works.

The following are several places where I found examples of the deployment.xml file:

To save myself time, I am going to pull one of the examples from the FAST Search Server 2010 for SharePoint Capacity Planning document here. I picked this one because it really shows what a scaled-out, medium-size FAST farm looks like. I am not saying this is the best configuration to start with either; read the FAST Search Server 2010 for SharePoint Capacity Planning document to find a base farm that best meets your needs and tweak as needed. In many cases a simple three-server farm configuration is the best place to start (i.e. the small server farm deployment shown earlier in this series). But that would be no fun to talk about :)

So here is a picture of a medium farm. As you can see:

  • This farm has two rows and three columns.
  • On the first row, document processing components have been spread across multiple servers to provide good content consumption.
  • The first row is also really dedicated to indexing, as the Indexing components will be turned on there along with Query Matching components.
  • The second row is dedicated to searching as there are Query Matching components and Query Processing components.
  • Note that components such as content distribution and index dispatcher have been turned on in the first row.
  • Looking at the very first server in the farm, you may ask why there are query and document processor components there too. The justification was that they would not be actively referenced by the FAST farm but would be available if maintenance is being done on the FAST farm.

Now honestly, given all the information I have provided thus far in this blog series, there are a few things I would potentially change in this configuration to ensure that I had a highly available FAST farm. I will get into those changes shortly.

image

Here is the deployment.xml file for this farm configuration above:

<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
modifiedTime="2009-03-14T14:39:17+01:00" comment="M4"
xmlns="http://www.microsoft.com/enterprisesearch"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">

<instanceid>M4</instanceid>

<connector-databaseconnectionstring>
<![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=M4.jdbc]]>
</connector-databaseconnectionstring>

<host name="fs4sp1.contoso.com">
<admin />
<query />
<webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4"/>
<document-processor processes="12" />
</host>

<host name="fs4sp2.contoso.com">
<content-distributor />
<searchengine row="0" column="0" />
<document-processor processes="12" />
</host>

<host name="fs4sp3.contoso.com">
<content-distributor />
<indexing-dispatcher />
<searchengine row="0" column="1" />
<document-processor processes="12" />
</host>

<host name="fs4sp4.contoso.com">
<indexing-dispatcher />
<searchengine row="0" column="2" />
<document-processor processes="12" />
</host>

<host name="fs4sp5.contoso.com">
<query />
<searchengine row="1" column="0" />
</host>

<host name="fs4sp6.contoso.com">
<query />
<searchengine row="1" column="1" />
</host>

<host name="fs4sp7.contoso.com">
<query />
<searchengine row="1" column="2" />
</host>

<searchcluster>
<row id="0" index="primary" search="true" />
<row id="1" index="none" search="true" />
</searchcluster>

</deployment>


Now looking at that deployment.xml file, here are some notes that will help you better understand it:

  • The <deployment> tag is a wrapper tag. It has some attributes for you to manage the version, last modified date, etc. I highly recommend you use these attributes for configuration management reasons.
  • The <instanceid> tag can be used by SCOM.
  • The <connector-databaseconnectionstring> is the location where you specify JDBC connection strings if that connector is being used.
  • The <host> tag is a specific server in the farm. Within the <host> tag is where you identify all of the components that will be turned on for a specific server. This is where <admin>, <document-processor>, <content-distributor>, <indexing-dispatcher>, <crawler>, <webanalyzer>, <searchengine> and <query> are defined. You can read the specification about these tags as they are pretty straightforward in their configuration.
  • The <searchengine> tag is the most important component tag when it comes to understanding how many Indexing and Query Matching servers there are. In the <searchengine> tag, you specify the row number and column number; however, you do not specify here whether the Indexing and/or Query Matching components are turned on. This is done in correlation with the <searchcluster> and <row> tags.
  • The <searchcluster> tag is an important tag that is used to wrap the <row> tag. Earlier in this series I presented a diagram that discusses columns and rows. These <row> tags are used to define the number of Indexing and Query Matching component rows that are being used in the farm configuration.
  • A <row> tag can be defined as primary, secondary or none using the index attribute. There can only be one primary index row. Marking the <row> tag as primary indicates this is the primary row of servers with Indexing components. Marking the <row> tag as secondary means the row is a redundant index row. Finally, marking the <row> tag with the value of “none” means there are no Indexing components on this row.
  • The <row> tag also has an attribute called search which can be true or false. If marked true, Query Matching component is deployed on that row.
  • So here are some examples of the <row> tag which I find critical to understanding the deployment. There are more permutations; however, these are the ones you will run up against the most:
    • <row id="0" index="primary" search="true" /> - this means this is a primary row with the Indexing component. As well there are Query Matching components on that row.
    • <row id="0" index="primary" search="false" /> - this means this row is completely dedicated to indexing.
    • <row id="1" index="secondary" search="true" /> - this means this second row has redundant Indexing components and the primary purpose of this row is for Query Matching components.
    • <row id="2" index="none" search="true" /> - this means this row is completely dedicated to Query Matching components only.
  • The <row> tag is tightly correlated back to the <searchengine> tag. Having a <searchengine> tag defined for a <host> tag indicates that the server will have Indexing and/or Query Matching components. As you can see, the <row> tag actually controls which one is active on that server.
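To make the <searchengine> / <row> correlation concrete, here is a rough Python sketch that reads a deployment.xml and works out which search roles end up active on each host. It only uses the standard library. The sample XML is a trimmed-down, hypothetical two-host farm I wrote for illustration, and the function name searchengine_roles is my own; treat it as a reading aid, not as a tool that ships with FAST.

```python
import xml.etree.ElementTree as ET

# Namespace used by the deployment.xml example in this post.
NS = {"d": "http://www.microsoft.com/enterprisesearch"}

# Hypothetical, trimmed-down farm used only for illustration.
SAMPLE = """<deployment xmlns="http://www.microsoft.com/enterprisesearch">
  <host name="fs4sp2.contoso.com">
    <content-distributor />
    <searchengine row="0" column="0" />
  </host>
  <host name="fs4sp5.contoso.com">
    <query />
    <searchengine row="1" column="0" />
  </host>
  <searchcluster>
    <row id="0" index="primary" search="true" />
    <row id="1" index="none" search="true" />
  </searchcluster>
</deployment>
"""


def searchengine_roles(xml_text):
    """Map each host that has a <searchengine> to the roles its row enables."""
    root = ET.fromstring(xml_text)
    # Row id -> (index mode, whether Query Matching is enabled on that row).
    rows = {
        row.get("id"): (row.get("index"), row.get("search") == "true")
        for row in root.findall("d:searchcluster/d:row", NS)
    }
    roles = {}
    for host in root.findall("d:host", NS):
        engine = host.find("d:searchengine", NS)
        if engine is None:
            continue  # this host has no Indexing or Query Matching component
        index_mode, search = rows[engine.get("row")]
        active = []
        if index_mode in ("primary", "secondary"):
            active.append(f"Indexing ({index_mode})")
        if search:
            active.append("Query Matching")
        roles[host.get("name")] = active
    return roles


if __name__ == "__main__":
    for name, active in searchengine_roles(SAMPLE).items():
        print(name, "->", ", ".join(active))
```

The point of the sketch is that the host entry only places a server in the grid; the row definition decides what actually runs there.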

As I mentioned earlier, I believe there are some changes I would make to this farm to ensure there is some redundancy. For instance, I would change <row id="1" index="none" search="true" /> to <row id="1" index="secondary" search="true" />. The reason is that if an index server were to fail, the query server in the same column would take over as the index server until the failed server was fixed. You could experience some query performance issues while this is going on if high volumes of content are being consumed.
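If you maintain deployment.xml under source control, a change like this can be scripted. The following is a minimal Python sketch using only the standard library; the function name set_row_index and the tiny sample document are my own illustrations. Note the assumption that re-serializing the file this way is acceptable for your setup (for example, comments and the original attribute layout are not preserved), so editing by hand is just as valid.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://www.microsoft.com/enterprisesearch"
VALID_INDEX_MODES = ("primary", "secondary", "none")

# Hypothetical fragment used only for illustration.
SAMPLE = """<deployment xmlns="http://www.microsoft.com/enterprisesearch">
  <searchcluster>
    <row id="0" index="primary" search="true" />
    <row id="1" index="none" search="true" />
  </searchcluster>
</deployment>
"""


def set_row_index(xml_text, row_id, index_mode):
    """Return the XML with the index attribute of the given row changed."""
    if index_mode not in VALID_INDEX_MODES:
        raise ValueError(f"index must be one of {VALID_INDEX_MODES}")
    # Keep the deployment namespace as the default namespace on output.
    ET.register_namespace("", NS_URI)
    root = ET.fromstring(xml_text)
    for row in root.iter(f"{{{NS_URI}}}row"):
        if row.get("id") == row_id:
            row.set("index", index_mode)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_row_index(SAMPLE, "1", "secondary"))
```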

The way the redundancy works is that the FIXML documents (the searchable representation of every document in the index) are replicated to the second row. However, an index is not actually built from the FIXML documents there. When an indexing component fails, a new index is built using the FIXML documents on the redundant index row.

Once you start to get the hang of it, this deployment.xml file is not too hard.
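Before rolling a deployment.xml out to every server, it can help to sanity-check it. Here is a small Python sketch that checks three things discussed in this series: the Administration component lives on exactly one machine, there is exactly one primary index row, and every <searchengine> points at a row that is actually defined in <searchcluster>. The third check is my own reading of how the tags correlate. The function name check_deployment and the sample XML are hypothetical, and this is no substitute for validating against the real deployment.xsd.

```python
import xml.etree.ElementTree as ET

NS = {"d": "http://www.microsoft.com/enterprisesearch"}

# Hypothetical minimal farm used only for illustration.
GOOD = """<deployment xmlns="http://www.microsoft.com/enterprisesearch">
  <host name="fs4sp1.contoso.com">
    <admin />
  </host>
  <host name="fs4sp2.contoso.com">
    <searchengine row="0" column="0" />
  </host>
  <searchcluster>
    <row id="0" index="primary" search="true" />
  </searchcluster>
</deployment>
"""


def check_deployment(xml_text):
    """Return a list of problems found; an empty list means the checks passed."""
    root = ET.fromstring(xml_text)
    problems = []

    # The Administration component can only be installed on one machine.
    admins = root.findall("d:host/d:admin", NS)
    if len(admins) != 1:
        problems.append(f"expected exactly one <admin>, found {len(admins)}")

    # There can only be one primary index row.
    rows = root.findall("d:searchcluster/d:row", NS)
    primaries = [r for r in rows if r.get("index") == "primary"]
    if len(primaries) != 1:
        problems.append(f"expected exactly one primary row, found {len(primaries)}")

    # Every <searchengine> should reference a row defined in <searchcluster>.
    row_ids = {r.get("id") for r in rows}
    for host in root.findall("d:host", NS):
        for engine in host.findall("d:searchengine", NS):
            if engine.get("row") not in row_ids:
                problems.append(
                    f"{host.get('name')}: row {engine.get('row')} is not defined"
                )
    return problems


if __name__ == "__main__":
    print(check_deployment(GOOD) or "no problems found")
```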

When you have completed an installation on a specific machine, run the following command:

nctrl status

This will return the status of all the FAST components that are running on that machine.


image

Another thing you can do is go into the services console and check out the FAST services.

image

In the next part of the series I am going to focus on the SharePoint side of the configuration.

Part 3 - FAST for SharePoint 2010 Physical Architecture

Part three of this series will now focus on the physical architecture of FAST for SharePoint farm deployment.

Physical Architecture / Topology

Now I am going to take what we have learned about the logical architecture and scaling and apply it to some farm scenarios. From what I know, scaling a FAST for SharePoint 2010 farm is similar to scaling a SharePoint 2010 farm. We basically need to analyze the requirements and then scale the FAST components appropriately to meet those requirements. Some of the immediate questions that will come up are:

  • How much content needs to be searchable?
  • What is the total number of items that need to be searched?
  • What is the format, size and any other interesting characteristic of the items to be searched?
  • What sort of availability is needed to support search?
  • How fresh must the search results be and how often is content changed?
  • How many users will be performing searches concurrently?

This is probably just a starting point. I actually wrote a set of questions that need to be asked here.

Up to this point we have learned about the FAST for SharePoint components such as:

  • Administration component
  • Document Processing component
  • Content Distributor component
  • Indexing Dispatcher component
  • Web Crawler component
  • Web Analyzer component
  • Indexing component
  • Query Matching component
  • Query Processing component

We have also learned about how they scale. Now honestly, it is pretty much impossible for me to give you all the examples out there on how to scale FAST. So I am going to pick a few (which I shamelessly already have pictures for) and discuss those :)

Minimum Deployment

The following is a very basic, non-redundant implementation of SharePoint 2010 and FAST for SharePoint 2010. It is never really recommended to install both on the same machine unless you are creating a local development environment. In both of these scenarios there really is no scaling other than placing SharePoint and FAST on different machines.

In the case of FAST all of the core components such as admin, document processing, content distributor, index dispatcher, web analyzer, indexing, query matching and query processing are all deployed to a single machine.

image

It is interesting to note here that SQL is not heavily utilized by FAST for SharePoint. FAST only needs to access configuration information from SQL. This is different than the out of the box SharePoint 2010 Search which heavily utilizes SQL as part of the indexing and querying process.

Small Deployment

The following is considered to be a small deployment of FAST for SharePoint. You will notice there is both a SharePoint and FAST farm. I am not going to discuss the SharePoint Farm; you can read my blog on how to scale SharePoint 2010 farms. However let’s look at the FAST for SharePoint 2010 Farm. As you can see there are three boxes.

image

The first box:

  • Has the Administrative component which can only be installed on one machine in the farm.
  • Has the Content Distributor which is responsible for routing documents to document processors.
  • Has a Document Processor component with 12 processes running. The more processes you have running, the more content can be consumed. The number of processes that can be supported on the machine depends on the cores available on the server.
  • Has an instance of the Web Analyzer running.

The second box:

  • Has an additional document processing component.
  • Has the index dispatcher component which is responsible for sending processed content to be indexed.
  • Has the indexing component, query matching and query processing components installed.

The third box:

  • Has redundant indexing component, query matching and query processing components.

In the end this is a really simple / standard implementation that can handle in the range of 10+ million items and manage about 10 queries per second.

Medium Deployment

This is a medium sized FAST for SharePoint deployment. This should be able to index roughly 40 million items and continue to support 10 queries per second.

image

Observations:

  • The first two FAST servers have all the administration, content distributor, and web analyzer components. They also have document processing components.
  • The following six servers have been configured using the column and row patterns I introduced earlier.
  • There are three index columns to partition the content to allow for a higher volume of content than the previous.
  • On the first row we have additional document processing components. This will improve consumption and processing of content. Plus it adds redundancy.
  • As well on the first row index dispatching and indexing components have been installed.
  • On the second row are secondary indexing components which are redundant copies of the indexes on the first row. It is not exactly clear in the diagram but the query matching and query processing components are configured. This way there are dedicated machines to support the querying of content.

Large Deployment

This final farm is very similar to the previous one, other than some of the administration components have been scaled to a third server and there are now a total of six index columns to support up to 100 million searchable items.

image

As I said before, these are by no means the only three configurations of the FAST for SharePoint farm. These components can be scaled however best meets the requirements.

References

Part 2 - FAST for SharePoint 2010 Logical Architecture

This is part two in this FAST for SharePoint series and it will focus on the logical architecture.

Logical Architecture

If you are a reader of my blog, you will notice I always start out with understanding the features (previous section), then understand the logical architecture and then finally understand the physical architecture. I have always said the most common mistake to make is to jump right to the physical architecture because everyone wants to know how many “boxes” they need. However, both the features and logical architecture heavily influence the physical architecture.

If you are an experienced SharePoint professional, right off the bat you want to know, how are the FAST services made available to SharePoint? Hopefully the below diagram will clear it up.

image

Here is the reference to the above diagram.

If you have been reading up on the new SharePoint 2010 Service Architecture you will know that it is much more scalable than the prior version. Basically there are two services that have to be configured when using FAST for SharePoint:

  • FAST Search Connector service – In the above diagram, this is the FAST Content SSA. This service is responsible for feeding content to the FAST for SharePoint farm. Architecturally, it performs the same function as a connector application would in the FAST ESP 5.3 product. This service is configured in Central Administration and you configure it like the out-of-the-box search (i.e. set up content sources, rules, etc.). The configured content sources will be used to feed the content to FAST for indexing. One thing to point out is that all the data for the content sources will flow through this connector whether it is in SharePoint, a file share, a public Exchange folder, etc. All content will be fed to the FAST Content Distributors, which will subsequently send the content for processing.
  • FAST Search Query service – In the above diagram, this is the FAST Query SSA. This service has two responsibilities. First and foremost it is responsible for forwarding all search queries to the FAST farm query service. The FAST farm query service has the responsibility of building the search results and then returning them to SharePoint FAST Search Query service. Second the SharePoint FAST Search Query service is responsible for performing all people searches. This service actually performs the indexing of user profile information and also performs the querying of the data in the user profile index. In summary, when a query is made content data will be retrieved from FAST while user profile data will be returned from this service.

Now let’s take this one step further by understanding the roles and responsibilities. In the diagram below you can see how it aligns with the previous diagram:

  • On the far right we see the FAST Search Connector service which is configured to feed content from other locations.
  • On the bottom far left we see the FAST Search Query service which will receive queries from SharePoint web parts and then perform a query for content in FAST as well in the user profile index it has built.

In the middle you can see several services that are part of the FAST for SharePoint Search farm.

image

Here is some background on these services. If you read up on my FAST ESP 5.3 series, you will see that the architecture is not really different. Starting on the far right within the “FAST Search Server 2010 for SharePoint Farm” area:

  • FAST Indexing Connectors – This is where additional FAST Indexing Connectors reside, like the JDBC and Lotus Notes connectors. These connectors are conceptually no different than the FAST Search Connector running in SharePoint. They are responsible for feeding content to FAST. The only difference is these are configured as part of the FAST server deployment configuration file and they are not configurable in SharePoint Central Administration.
  • Item Processing – This is where a lot of the secret sauce of FAST for SharePoint 2010 lives. There are a couple of sub-components which are not shown in this picture that you should be aware of. First there are Content Distributor(s) which have the responsibility for directing content to the correct Document Processing Pipeline for processing. The Document Processing Pipeline does much of the FAST magic such as entity extraction, linguistic processing, and content normalization.
  • Indexing – This is the component that manages the content that is produced by the Document Processing Pipelines. We will talk about scaling later, but there is a good chance that there will be multiple Index nodes in the FAST for SharePoint farm. Multiple index nodes are needed to support searching large volumes of content quickly. The concept is very similar to the SharePoint out-of-the-box search where you create Index Partitions to allow for quicker querying of large volumes of content. Another sub-component you should be aware of is called the Index Dispatcher, which is responsible for routing a searchable item to a location in the index.
  • Query Matching – This component is responsible for actually searching and retrieving items from the index. It will build the result set from the Index node that it is associated to. This is another component that can be scaled to assist with improving performance. It will also do things such as return a summary of content for an item, highlight the search terms and supports both shallow and deep refiners. This component is the same as the Search Node in the FAST ESP 5.3 product and is similar to the Query Component in the SharePoint 2010 out of the box search.
  • Query Processing – This component will perform both pre and post query processing. This is also the component that the SharePoint FAST Search Query service connects to. For pre-querying such things as language parsing, linguistic and security processing will be applied. For post-query operations such as result merging from multiple index nodes, formatting and duplicate removal will be performed.
  • FAST Search Authorization (FSA) – This component works with the query processing component to ensure that the user performing the search only sees content they have permission to access.
  • Web Link Analysis – This is also referred to as the Web Analyzer. This component provides various different features to improve the relevancy. For instance if a piece of content is linked to a lot by other content sources, that piece of content will be considered more relevant than others. Also this component does click-through analysis from the search results. The logic being if the content is clicked on a lot in the search results it may be more relevant than other content.
  • Web Crawler – This component is not shown in the diagram above but you will run across it in your FAST for SharePoint research. This is a component that will crawl and feed web content to FAST. This component does not have to be used if SharePoint FAST Search Connector service has been configured to crawl websites in SharePoint Central Administration.
  • Administration Component – This is the component that manages the FAST farm. The interesting thing about this one is that it cannot be scaled or made redundant. However, if this component were to fail, FAST will continue to run in the configuration it is currently running in.
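To picture the post-query work described for the Query Processing component (merging results from multiple index nodes and removing duplicates), here is a purely conceptual Python sketch. It is not how FAST implements it. The (score, item id) tuples, the function name merge_results and the sample hits are all made up for illustration, and the sketch assumes each index node returns its hits already sorted by descending score.

```python
import heapq


def merge_results(per_node_hits, limit=10):
    """Merge per-node hit lists into a single ranked, de-duplicated list.

    Each hit is a (score, item_id) tuple and every input list must already be
    sorted by descending score. If an item appears more than once, only its
    highest-scoring occurrence is kept.
    """
    merged = heapq.merge(*per_node_hits, key=lambda hit: hit[0], reverse=True)
    seen = set()
    results = []
    for score, item_id in merged:
        if item_id in seen:
            continue  # duplicate removal
        seen.add(item_id)
        results.append((score, item_id))
        if len(results) == limit:
            break
    return results


if __name__ == "__main__":
    node0 = [(0.9, "doc-a"), (0.4, "doc-c")]
    node1 = [(0.8, "doc-b"), (0.3, "doc-a")]
    print(merge_results([node0, node1]))
```

Because every node returns a sorted list, the merge never has to re-sort the full result set, which is part of why splitting the index across nodes can keep queries quick.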

Scaling the FAST for SharePoint 2010 Architecture

In this section I am going to give an introduction into how FAST for SharePoint 2010 components can be scaled.

The diagram below comes from this MSDN article. In the previous diagram discussion I introduced the two components called Indexing and Query Matching. If you have been doing research on FAST you have probably heard about creating columns and rows to support better querying and indexing. This applies directly to both the Indexing and Query Matching components.

An entire index can be made up of multiple index columns. Breaking the index into multiple columns is commonly referred to as partitioning. Searches are performed across the search rows, which in turn search all of the index columns. The Query Processing component that I introduced earlier (which is different than Query Matching) is responsible for merging all of the search results from all of the index columns together into a single result set. It is not depicted in this diagram but there will be Query Processing components for each column. This is conceptually very similar to the new out-of-the-box features of SharePoint 2010 Search.

image

The point of this diagram is to show how FAST search is scaled; scaling the index and query nodes is the most common case. Here are some basic rules you should know right off the bat:

  • When there are more Index Columns – more content volume can be managed and better search performance can be realized.
  • When there are more Index Rows – there is better fault tolerance as there is a primary Indexing node and back-ups. Should the primary fail, the back-up will kick in.
  • Adding more Search Columns – will provide better search performance against a large volume of content.
  • Adding more Search Rows – will provide not just fault tolerance but also better performance as queries can be load-balanced across the search rows.
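
To make the rows-and-columns idea a little more concrete, here is a minimal Python sketch. It is my own illustration and not anything taken from the FAST product. The row names, scores and document IDs are made up, and picking a search row round-robin is only an assumption about how load balancing could behave. What it shows is the shape of the work: a query lands on one search row, every index column is searched, and the partial results are merged into one ranked list.

```python
import heapq
from itertools import cycle, islice

# Hypothetical layout: 2 search rows x 3 index columns.
# Every row holds a full copy of the index; each column holds one partition.
# Each partition is a list of (score, doc_id) hits, already sorted best-first.
PARTITIONS = [
    [(0.91, "doc-a"), (0.40, "doc-d")],  # column 0
    [(0.87, "doc-b"), (0.15, "doc-f")],  # column 1
    [(0.66, "doc-c"), (0.52, "doc-e")],  # column 2
]
SEARCH_ROWS = ["row-0", "row-1"]

# Assumption: queries are spread over the search rows round-robin.
_next_row = cycle(SEARCH_ROWS)


def run_query(top_n=3):
    """Pick a search row, search every column, and merge the hits."""
    row = next(_next_row)
    # heapq.merge with reverse=True expects each input sorted high to low
    # and yields one combined stream that is also sorted high to low.
    merged = heapq.merge(*PARTITIONS, reverse=True)
    return row, [doc_id for _score, doc_id in islice(merged, top_n)]
```

Calling run_query() twice in a row returns the same top hits from alternating rows, which is the fault tolerance and load balancing idea in miniature: any row can answer, and every answer covers all columns.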

The information you just learned will help you understand how many FAST servers you may potentially need.

Now I know the next question you may be asking is “when do I need to add a new index column”? Well, the guidance that I found is that it depends. The FAST folks I have talked with say 10 to 25 million items can be managed per index column. How many columns you need is driven by things such as:

  • How much data is being consumed?
  • How often is data fed?
  • How quickly does the data need to be made available?
  • How many concurrent queries will be made by users?
  • Etc.
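
As a rough illustration of that rule of thumb, the Python sketch below turns an item count into a range of index columns. The function names are hypothetical, and multiplying columns by rows to get a node count assumes one server per cell of the grid, which a real deployment need not follow. Treat the result as a starting point for a conversation, since the drivers listed above will move the number.

```python
import math

# Rule-of-thumb range quoted above: 10 to 25 million items per index column.
ITEMS_PER_COLUMN_LOW = 10_000_000
ITEMS_PER_COLUMN_HIGH = 25_000_000


def estimate_columns(total_items):
    """Return (fewest, most) index columns suggested by the rule of thumb."""
    # Packing 25 million items into each column gives the fewest columns.
    fewest = max(1, math.ceil(total_items / ITEMS_PER_COLUMN_HIGH))
    # Packing only 10 million items into each column gives the most.
    most = max(1, math.ceil(total_items / ITEMS_PER_COLUMN_LOW))
    return fewest, most


def estimate_nodes(columns, rows):
    """Assumption for this sketch: one server per column-by-row cell."""
    return columns * rows
```

For example, estimate_columns(60_000_000) gives (3, 6), and three columns with two rows would come to six nodes under the one-server-per-cell assumption.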

Now the Indexing and Query Matching components are not the only components that can be scaled. Pretty much every component, with the exception of Administration, can be scaled:

  • FAST Indexing Connectors – Multiple indexing connectors can be added to better support feeding of JDBC and Lotus Notes data.
  • Item Processing – The Document Processing Pipeline I referred to earlier can be scaled. You will want to add more item processors to support the rate at which content is fed into FAST. This is important if you are trying to reduce index latency, which is the amount of time it takes to make content searchable in the index.
  • Content Distributor – I earlier mentioned this as part of the Item Processing component. Adding multiple Content Distributors will add fault tolerance but there will only ever be one primary.
  • Query Processing – If you recall, this is the component that does the pre and post processing around making a call to the Query Matching node(s). More Query Processing components can be added to better support queries per second and to provide fault tolerance.
  • Web Link Analysis – Can have multiple instances added to reduce the amount of time needed to complete the analysis.

Hopefully this has provided you a good introduction to how FAST for SharePoint 2010 can be scaled.

Part 1 - FAST for SharePoint 2010 Features and Proposition

Introduction

FAST for SharePoint 2010 (FS4SP) is one of the newest and coolest features of SharePoint 2010. FAST was a major acquisition for Microsoft and it is one of the top Enterprise Search engines in the Gartner Magic Quadrant. With the two combined, Microsoft really has a best of breed platform for managing content and building enterprise solutions.

FAST ESP 5.3 was the last version of the product before it was integrated with SharePoint 2010. The legacy product is still available in different formats from Microsoft because they want to continue to support existing clients and there are scenarios where FAST ESP 5.3 is a better fit. It has been rebranded as FAST Search for Internet Sites (FSIS) and FAST Search for Internal Applications (FSIA). I am not going to go into either of these. If you want an understanding of FAST ESP 5.3 I have written a blog series about it here which goes into the architecture.

After writing that series over a year ago, I have been wondering how FAST is integrated into SharePoint 2010. I really wanted to understand this from an architecture perspective so I can provide good guidance on how to set up SharePoint 2010 to use FAST. When I set out to do this, it took me a while to really find the information but I was able to piece it together. I have listed all the references I have used at the end of the series. What you will soon find out is the architecture of FAST did not really change.

Features

The following are some of the additional features of FS4SP that are added on top of SharePoint Search you should know about right off the bat:

  • Advanced Content Processing – Extract and create metadata by using content within the documents. This will improve search results, relevancy, sorting and refinement. Plus this reduces the human workload to actually create the metadata.
  • Advanced Sorting – Provides ability to sort results based on any managed properties or rank profiles that are available to the current user.
  • Business Intelligence Indexing Connector – Supports the ability to index such things as Excel workbooks, SSRS reports, etc. So for instance there may be data within the report, a title in a pie chart, etc. which will now appear in a search result.
  • Contextual Search – Have the ability to customize search results and refinement options based on the user profile or the audience that is performing the search. This is a major feature when it comes to building relevant search results to the user.
  • Tunable Relevance with Multiple Rank Profiles – Ability to create rank profiles that incorporate things such as freshness, authority and quality to provide more relevant results. This is important because these rankings are not part of the index and are applied to the query, which makes them tunable. You can identify authoritative pages, and you can identify documents for promotion (and associate that promotion with the user context).
  • Deep Refinement – SharePoint Search only performs a shallow refiner which only allows refinement for the first 50 results in the original query. FAST provides deep refinement which is based on statistical aggregation of managed property values within an entire result set. Because FS4SP provides a deep refiner, the refiners can show the exact count of documents.
  • Extensible Search Platform – Build complex search solutions, search driven applications, etc. Personally this is where I see lots of opportunity to use search in ways you have not thought about before.
  • Extreme Scale Search – The ability to search millions upon millions of documents with sub-second response times. There is no scale boundary when it comes to FAST.
  • Rich Web Indexing – Ability to index dynamic HTML and JavaScript content with a custom indexing connector.
  • Similar Results – When results are returned, there will be a link called Similar Results which the user can click. Basically when a user clicks on the link the search is re-defined and re-executed to include documents that are similar to the result.
  • Result Collapsing – FS4SP can perform a checksum on data within the index and will collapse results into a single result where possible. Now documents that are popular and stored in many places across the organization will not be shown multiple times. The user has the ability to click a link to see where all of the duplicates are stored.
  • Thumbnails and Previews – Review content quickly with thumbnail and preview images in search results. One example is the PowerPoint previewer. In reality this feature is given to us through Office Web Apps however it is bundled as part of the FAST solution offering.
  • Two Way Synonyms – SharePoint Search supports the usage of synonyms but they are only applied to keywords in the query. However FS4SP also supports the ability to apply synonyms to the documents themselves, so if the query has a keyword with a synonym but the document only has the synonym, that document will be returned in the result set.
  • Managed Property – With FAST, you have the ability to create metadata mappings and create rules for that metadata and how it will be used to provide a better search experience. Specifically this identifies if the property can be used for sorting (i.e. can the property be sorted in the search results), filtering (i.e. can the property be used to filter), as a refiner (i.e. can be used with deep refiners), priority (i.e. used in ranking algorithm against other documents), and dynamic/static summaries (i.e. dynamic summary will display a hit-highlighted summary of the managed property in the result).
  • Property Extraction – FAST will identify key information like people names, company names, geographic names/location, etc. within a document and then use that data to provide more relevant search results.
  • Rank Profiles – Are a feature of FS4SP which control how relevancy is calculated for all items that are indexed. Rank Profiles can be aligned to a type of user to provide them a more relevant search experience or even let users select a rank profile based on what they are trying to accomplish. Rank Profiles in FS4SP take into account more inputs such as freshness, proximity, authority, query authority, context and managed properties.
  • Visual Best Bets – Provide the ability to return rich, editorialized results based on keywords. This can as well be tied contextually to the type of user viewing the best bet.
  • Linguistics – Supports the ability for language variations to be used to allow users to find relevant information. What this helps with is finding relevant results based on words and phrases that may not be identical between the query and the indexed item. Such features as character normalization, normalization of stemming variations and suggested spelling corrections are part of the linguistic processing. Linguistic processing is applied both during item processing (before the document is indexed) and to the queries that are submitted by the user.
  • Multiple Language Support – FAST supports well over 80 languages - http://technet.microsoft.com/en-us/library/ff793350.aspx
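
To illustrate the Deep Refinement bullet above, here is a small Python sketch of why counts differ between a shallow and a deep refiner. This is only a toy model of the idea and says nothing about how either engine is implemented. The function names and the made-up result set are assumptions, and the cut-off of 50 is the figure quoted in the bullet.

```python
from collections import Counter


def shallow_refiner(results, prop, first_n=50):
    """Count property values over only the first `first_n` results."""
    return Counter(r[prop] for r in results[:first_n])


def deep_refiner(results, prop):
    """Count property values over the entire result set."""
    return Counter(r[prop] for r in results)


# Hypothetical result set: 200 hits, and the first 50 are all Word documents.
results = [{"filetype": "docx"}] * 50 + [{"filetype": "pdf"}] * 150
```

With this data the shallow refiner never sees a PDF at all, while the deep refiner reports 50 Word documents and 150 PDFs. That is the difference a user notices in the refinement panel.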

Value Proposition

I will be honest, all those features sound great, but taking a step back you really need to ask yourself how a powerful search engine will help your organization. I actually wrote about this topic from my perspective when I was starting to learn about Enterprise Search - http://www.astaticstate.com/2009/11/why-is-fast-enterprise-search-important.html. I highly recommend that you review this posting to understand why Enterprise Search is so important. Upon reading this, you will see how important having FAST is whether you have massive amounts of content or you are just a medium sized organization with 1 million items. At the end of the day:

  • Users are bombarded with information that is located in numerous locations.
  • Users are not aware, educated or trained on how to find that data.
  • In many instances documents do not have good descriptive data making them hard to find.
  • Companies have challenges with retention of people and technology. They are looking for ways to keep that data available to the organization.
  • Companies have challenges onboarding new personnel.
  • Companies have challenges communicating across geographical boundaries where information is not shared.
  • Users are challenged with finding the right information from authoritative experts.
  • Organizations have challenges with customer relationships, satisfaction and retention.
  • Companies are not enabled to bring the offline office community online. Documents are not recognized in the way they are actually used within the organization.
  • Companies have challenges with effectively sharing or selling data they have.
  • Organizations have challenges with knowledge management as information is managed in silos.
  • Organizations are not responsive to their employees and customers.
  • Companies and organizations have challenges bridging together business processes that span geographies, organizational hierarchies and technologies.

Really the list can go on and on. The utilization of FAST in SharePoint, regardless of how big or small the content sources are, will ultimately improve a company’s ability to react to these challenges.

Tuesday, January 11, 2011

SharePoint and JSR Compliance

I had an interesting question come up a few weeks ago that I wanted to share my notes on. I was asked: is SharePoint 2010 JSR 168 compliant? Well I did some searching and this is what I came up with. The straight answer is no; it is not. Now there are several really good reasons why:

  • SharePoint is built on numerous open standards. One of them is Web Services for Remote Portlets (WSRP). JSR 168 is a Java standard.
  • What is the purpose of the JSR 168 standard? It is a standard that allows portlets (web parts) to be supported across different Java portal platforms.
  • From what my research turned up, the JSR 168 standard is supported by many Java portals but portlets still require customization to some degree.

SharePoint 2010 provides a WSRP Consumer WebPart. Using this WebPart, an administrator can define producer URLs that are available for consumption, and users can choose remote portlets and render them in Web Part pages. The consumer supports any Web Part in strict compliance with the WSRP 1.1 specification. This allows SharePoint technology-based portals to support two portlet models, a local one based on Web Parts, and a remote one based on WSRP. Microsoft has also written Producer toolkits and there are partners that have written accelerators.

Closing thoughts which directly align to some of the references below:

  • I bet there are scenarios out there where a company may want to leverage an existing investment in a Java Portlet and render it natively in SharePoint; however I have not come across one to date. I recommend linking to it or some sort of service based integration solution. Even if you get it to work, I bet there will be some challenges with user experience which have to be heavily weighed.
  • If you read the JSR definition it talks about eventing, web part communication, etc. All of these sorts of things are defined as part of the SharePoint Web Part framework. All developers and Microsoft partners develop to this. ASP.net web parts and user controls can be rendered across versions of SharePoint and in ASP.net applications outside of SharePoint. It appears to me that the JSR standard was created as a way to enforce standards and consistency for Java Portlet development in an open source sort of world. That is not needed when it comes to .NET so it is not really fair to hold SharePoint to that criterion.

References

SharePoint 2010 SSRS Local Mode

For SharePoint 2010 you may have heard about SQL Server Reporting Services (SSRS) running in Local or Connected mode. Here is some clarity on it. Simply put:

  • Connected Mode – Is when both the SSRS add-in and an SSRS server have been fully installed.
  • Local Mode – Is new to SharePoint 2010 and is a light-weight version of SSRS that works with SharePoint only. This only has the SSRS add-in with no SSRS server.

One reason local mode was created was to support Access Services because SSRS may not be available to every SharePoint 2010 environment. All reports created in an Access 2010 web database will be converted to an SSRS report when deployed to Access Services. Here is a reference - http://technet.microsoft.com/en-us/library/ee662542.aspx

Now trying to define exactly what light-weight means is not always clear. I found this - http://msdn.microsoft.com/en-us/library/ff487969(v=SQL.105).aspx. Some interesting points are:

  • The Report Viewer is used to render reports in SharePoint.
  • Works with MS Access 2010 reporting extension.
  • Reporting services SharePoint list data extension – so SharePoint list data can be used as a data source.
  • Will only support rendering reports that have an embedded data source or shared data source.

Thursday, January 6, 2011

SharePoint 2010 WCM Highlight Changes Between Page Versions

I was recently asked whether there is a version comparison feature in SharePoint 2010 that will show you the actual differences highlighted. My initial answer was no; but I was thinking about documents. For instance, if you have version control set up for a document library and you have Word, PDFs, etc. in there you will not get that capability. But you can access the major and minor versions and compare as you need (or use track changes as in an MS Word document).

However the question posed to me was around Web Content Management (WCM). A new feature of SharePoint 2010 is versioning of the web parts. So if you are doing publishing, the publishing pages will support showing differences between versions. Check this out.

Here is a simple publishing page from a demo environment that I have.

clip_image002

Now here is a view of the publishing page versions. Here you see the differences between version two and three and that I put in “This is a test”.

clip_image004

Even more interesting, here are two screenshots that capture the differences between versions 1 and 2. Notice that the changes that are captured are not just the content in the text field I had on the publishing page; it shows all of the differences in all of the metadata fields. In here I made comments, I added an image, added a bunch of new text, and I changed some of the metadata fields.

clip_image006

clip_image008