Monday, January 17, 2011

Part 2 - FAST for SharePoint 2010 Logical Architecture

This is part two of my FAST for SharePoint series, and it focuses on the logical architecture.

Logical Architecture

If you are a reader of my blog, you will notice I always start by understanding the features (previous part), then the logical architecture, and finally the physical architecture. I have always said the most common mistake is to jump right to the physical architecture, because everyone wants to know how many “boxes” they need. However, both the features and the logical architecture heavily influence the physical architecture.

If you are an experienced SharePoint professional, the first thing you will want to know is how the FAST services are made available to SharePoint. Hopefully the diagram below will clear it up.

[Diagram: how the FAST Content SSA and FAST Query SSA connect SharePoint to the FAST farm]

Here is the reference to the above diagram.

If you have been reading up on the new SharePoint 2010 Service Architecture, you will know that it is much more scalable than the prior version. Basically, there are two services that have to be configured when using FAST for SharePoint:

  • FAST Search Connector service – In the above diagram, this is the FAST Content SSA. This service is responsible for feeding content to the FAST for SharePoint farm. Architecturally, it performs the same function as a connector application would in the FAST ESP 5.3 product. This service is configured in Central Administration, and you configure it like the out-of-the-box search (i.e. set up content sources, rules, etc.). The configured content sources are used to feed content to FAST for indexing. One thing to point out is that all the data for the content sources flows through this connector, whether it lives in SharePoint, a file share, a public Exchange folder, etc. All content is fed to the FAST Content Distributors, which subsequently send the content for processing.
  • FAST Search Query service – In the above diagram, this is the FAST Query SSA. This service has two responsibilities. First and foremost, it is responsible for forwarding all search queries to the FAST farm query service. The FAST farm query service is responsible for building the search results and returning them to the SharePoint FAST Search Query service. Second, the SharePoint FAST Search Query service is responsible for performing all people searches. This service actually indexes user profile information and also queries the data in the user profile index. In summary, when a query is made, content data is retrieved from FAST while user profile data is returned from this service.
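To make the split between the two query-side responsibilities concrete, here is a minimal sketch assuming a toy in-memory model. `FastFarm`, `FastQuerySSA`, `index_profile` and the `scope` parameter are hypothetical names for illustration only; they are not a SharePoint or FAST API. Content queries are forwarded to the farm, while people queries are answered from the profile index the SSA builds itself.

```python
from dataclasses import dataclass, field


@dataclass
class FastFarm:
    """Hypothetical stand-in for the FAST farm query service (content index)."""
    documents: dict[str, str] = field(default_factory=dict)

    def search(self, terms: str) -> list[str]:
        # The farm builds the result set: titles whose text contains the terms.
        needle = terms.lower()
        return [title for title, text in self.documents.items() if needle in text.lower()]


@dataclass
class FastQuerySSA:
    """Hypothetical model of the FAST Query SSA's two responsibilities."""
    farm: FastFarm
    # User profile index that lives in the SSA itself, not in the FAST farm.
    profiles: dict[str, str] = field(default_factory=dict)

    def index_profile(self, name: str, details: str) -> None:
        # The SSA indexes user profile information on its own.
        self.profiles[name] = details

    def query(self, terms: str, scope: str = "content") -> list[str]:
        needle = terms.lower()
        if scope == "people":
            # People searches are answered from the SSA's own profile index.
            return [n for n, d in self.profiles.items() if needle in d.lower()]
        # Content searches are forwarded to the FAST farm, which returns the results.
        return self.farm.search(terms)


farm = FastFarm({"Budget.xlsx": "2011 budget forecast", "Roadmap.docx": "Product roadmap"})
ssa = FastQuerySSA(farm)
ssa.index_profile("alice", "SharePoint architect, FAST")
print(ssa.query("budget"))                # ['Budget.xlsx']
print(ssa.query("fast", scope="people"))  # ['alice']
```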

Now let’s take this one step further by understanding the roles and responsibilities. In the diagram below, you can see how it aligns with the previous diagram:

  • On the far right, we see the FAST Search Connector service, which is configured to feed content from other locations.
  • On the bottom far left, we see the FAST Search Query service, which receives queries from SharePoint web parts and then queries for content in FAST as well as in the user profile index it has built.

In the middle you can see several services that are part of the FAST for SharePoint Search farm.

[Diagram: roles and responsibilities of the FAST Search Server 2010 for SharePoint farm components]

Here is some background on these services. If you read my FAST ESP 5.3 series, you will see that the architecture is not really that different. Starting on the far right within the “FAST Search Server 2010 for SharePoint Farm” area:

  • FAST Indexing Connectors – This is where additional FAST Indexing Connectors, like the JDBC and Lotus Notes connectors, reside. These connectors are conceptually no different than the FAST Search Connector running in SharePoint; they are responsible for feeding content to FAST. The only difference is that these are configured as part of the FAST server deployment configuration file and are not configurable in SharePoint Central Administration.
  • Item Processing – This is where a lot of the secret sauce of FAST for SharePoint 2010 lives. There are a couple of sub-components, not shown in this picture, that you should be aware of. First, there are Content Distributor(s), which are responsible for directing content to the correct Document Processing Pipeline. Second, the Document Processing Pipeline does much of the FAST magic, such as entity extraction, linguistic processing, and content normalization.
  • Indexing – This is the component that manages the content produced by the Document Processing Pipelines. We will talk about scaling later, but there is a good chance that there will be multiple Index nodes in the FAST for SharePoint farm. Multiple index nodes are needed to search large volumes of content quickly. The concept is very similar to the SharePoint out-of-the-box search, where you create Index Partitions to allow quicker querying of large volumes of content. Another sub-component you should be aware of is the Index Dispatcher, which is responsible for routing a searchable item to a location in the index.
  • Query Matching – This component is responsible for actually searching and retrieving items from the index. It builds the result set from the Index node it is associated with. This is another component that can be scaled to improve performance. It also returns a summary of content for an item, highlights the search terms, and supports both shallow and deep refiners. This component is the same as the Search Node in the FAST ESP 5.3 product and is similar to the Query Component in the SharePoint 2010 out-of-the-box search.
  • Query Processing – This component performs both pre- and post-query processing. It is also the component that the SharePoint FAST Search Query service connects to. During pre-query processing, language parsing, linguistic processing, and security processing are applied. During post-query processing, operations such as merging results from multiple index nodes, formatting, and duplicate removal are performed.
  • FAST Search Authorization (FSA) – This component works with the Query Processing component to ensure that the user performing the search only sees content they have permission to access.
  • Web Link Analysis – This is also referred to as the Web Analyzer. This component provides several features to improve relevancy. For instance, if a piece of content is linked to a lot by other content, that piece of content will be considered more relevant than others. This component also performs click-through analysis on the search results; the logic is that if content is clicked on a lot in the search results, it may be more relevant than other content.
  • Web Crawler – This component is not shown in the diagram above, but you will run across it in your FAST for SharePoint research. It crawls and feeds web content to FAST. It does not have to be used if the SharePoint FAST Search Connector service has been configured to crawl websites in SharePoint Central Administration.
  • Administration Component – This is the component that manages the FAST farm. The interesting thing about this one is that it cannot be scaled or made redundant; however, if it fails, FAST will continue to run in its current configuration.
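The feeding path described above (Content Distributor, Document Processing Pipeline, Index Dispatcher) can be sketched as a toy model. This is only an illustration under loud assumptions: round-robin distribution, "capitalized words" as a stand-in for entity extraction, and CRC32 hashing to pick an index column are all my simplifications, not how FAST actually implements these components. All class and function names are hypothetical.

```python
import itertools
import zlib


def document_processing_pipeline(item: dict) -> dict:
    """Hypothetical pipeline: content normalization plus toy 'entity extraction'."""
    text = " ".join(item["body"].split())  # normalize whitespace
    # Toy stand-in for entity extraction: capitalized words (FAST's real extractors are far richer).
    entities = sorted({w.strip(".,") for w in text.split() if w[:1].isupper()})
    return {"id": item["id"], "body": text, "entities": entities}


class ContentDistributor:
    """Hypothetical: directs each fed item to one of the processing pipelines (round-robin)."""

    def __init__(self, pipelines):
        self._next = itertools.cycle(pipelines)

    def distribute(self, item: dict) -> dict:
        return next(self._next)(item)


class IndexDispatcher:
    """Hypothetical: routes each processed item to a location (column) in the index."""

    def __init__(self, num_columns: int):
        self.columns = [dict() for _ in range(num_columns)]

    def dispatch(self, doc: dict) -> int:
        # crc32 is a stable hash, so the same id always lands in the same column.
        column = zlib.crc32(doc["id"].encode()) % len(self.columns)
        self.columns[column][doc["id"]] = doc
        return column


distributor = ContentDistributor([document_processing_pipeline, document_processing_pipeline])
dispatcher = IndexDispatcher(num_columns=2)
for fed in [{"id": "a", "body": "Contoso  signed with Fabrikam."}, {"id": "b", "body": "Quarterly report"}]:
    print(fed["id"], "-> column", dispatcher.dispatch(distributor.distribute(fed)))
```

Hashing on the item id is just one simple way to give every item a single, repeatable home in the index; the real Index Dispatcher's routing rules are not covered here.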

Scaling the FAST for SharePoint 2010 Architecture

In this section I am going to give an introduction to how FAST for SharePoint 2010 components can be scaled.

The diagram below comes from this MSDN article. In the previous diagram discussion, I introduced the two components called Indexing and Query Matching. If you have been doing research on FAST, you have probably heard about creating columns and rows to support better querying and indexing. This applies directly to both the Indexing and Query Matching components.

An entire index can be made up of multiple index columns. Breaking the index into multiple columns is commonly referred to as partitioning. Searches are performed across the search rows, which in turn search all of the index columns. The Query Processing component that I introduced earlier (which is different than Query Matching) is responsible for merging the search results from all of the index columns into a single result set. It is not depicted in this diagram, but there will be Query Processing components for each column. This is conceptually very similar to the new out-of-the-box features of SharePoint 2010 Search.
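Here is a minimal scatter-gather sketch of that merge step, assuming each index column is a plain dictionary of item id to term scores. `search_column` plays the role of Query Matching on one column and `query_processing` plays the role of Query Processing. Both names, the scoring model and the `top_k` parameter are hypothetical, and "duplicate removal" is simplified to collapsing repeated item ids. Search rows are left out here because they hold copies of the same columns.

```python
from collections.abc import Iterable

Hit = tuple[str, float]  # (item id, relevance score): a hypothetical result shape
Column = dict[str, dict[str, float]]  # item id -> {term: score}


def search_column(column: Column, term: str) -> list[Hit]:
    """Query Matching on one index column: hits for `term`, best first."""
    hits = [(doc_id, terms[term]) for doc_id, terms in column.items() if term in terms]
    return sorted(hits, key=lambda h: h[1], reverse=True)


def query_processing(columns: Iterable[Column], term: str, top_k: int = 10) -> list[Hit]:
    """Post-query processing: merge per-column results into one ranked result set."""
    best: dict[str, float] = {}
    for column in columns:  # fan the query out to every index column
        for doc_id, score in search_column(column, term):
            # Simplified duplicate removal: keep only the highest-scoring copy of an id.
            if score > best.get(doc_id, float("-inf")):
                best[doc_id] = score
    # Rank by score (descending), breaking ties by id so output is deterministic.
    merged = sorted(best.items(), key=lambda h: (-h[1], h[0]))
    return merged[:top_k]


col0 = {"doc1": {"fast": 0.9}, "doc2": {"sharepoint": 0.5}}
col1 = {"doc3": {"fast": 0.7}, "doc1": {"fast": 0.4}}
print(query_processing([col0, col1], "fast"))  # [('doc1', 0.9), ('doc3', 0.7)]
```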

[Diagram: FAST index columns and search rows (from the MSDN article)]

The point of this diagram is to show how FAST search is scaled; scaling the index and query nodes is the most common case. Here are some basic rules you should know right off the bat:

  • Adding more Index Columns – allows more content volume to be managed and better search performance to be realized.
  • Adding more Index Rows – provides better fault tolerance, as there is a primary Indexing node and backups. Should the primary fail, the backup will kick in.
  • Adding more Search Columns – will provide better search performance against a large volume of content.
  • Adding more Search Rows – will provide not just fault tolerance but also better performance as queries can be load-balanced across the search rows.
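The two benefits of extra search rows (load balancing and fault tolerance) can be illustrated with a small round-robin balancer that skips rows marked as down. This is a minimal sketch assuming simple round-robin and a manual health flag; `SearchRowBalancer`, `mark_down`, `mark_up` and `pick_row` are hypothetical names, not how FAST actually dispatches queries to rows.

```python
import itertools


class SearchRowBalancer:
    """Hypothetical: spreads queries across search rows and skips rows that are down."""

    def __init__(self, rows: list[str]):
        self.rows = rows
        self.healthy = set(rows)
        self._cycle = itertools.cycle(rows)

    def mark_down(self, row: str) -> None:
        self.healthy.discard(row)

    def mark_up(self, row: str) -> None:
        self.healthy.add(row)

    def pick_row(self) -> str:
        # Round-robin over all rows, skipping unhealthy ones; try at most one full lap.
        for _ in range(len(self.rows)):
            row = next(self._cycle)
            if row in self.healthy:
                return row
        raise RuntimeError("no healthy search row available")


balancer = SearchRowBalancer(["row0", "row1"])
print([balancer.pick_row() for _ in range(4)])  # ['row0', 'row1', 'row0', 'row1']
balancer.mark_down("row0")  # simulate a failed search row
print([balancer.pick_row() for _ in range(2)])  # ['row1', 'row1']
```

With one row, every query lands on the same servers and a failure stops searching; with two or more, queries are shared and a failed row is simply skipped.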

The information you just learned will help you understand how many FAST servers you may potentially need.

Now, I know the next question you may be asking is, “When do I need to add a new index column?” Well, the guidance I have found is that it depends. From talking with FAST folks, 10 to 25 million items can be managed per index column. Factors that drive how many columns you need include:

  • How much data is being consumed?
  • How often is data fed?
  • How quickly does the data need to be made available?
  • How many concurrent queries will be made by users?
  • Etc.
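As a rough starting point, the 10 to 25 million items per column guidance turns into simple ceiling arithmetic. This sketch assumes item count is the only driver, which the list above shows it is not, so treat the result as a range to refine rather than an answer. The function names are hypothetical.

```python
def index_columns_needed(total_items: int, items_per_column: int) -> int:
    """Minimum number of index columns so no column holds more than items_per_column."""
    if items_per_column <= 0:
        raise ValueError("items_per_column must be positive")
    # Integer ceiling division, with at least one column.
    return max(1, -(-total_items // items_per_column))


def column_range(total_items: int, low: int = 10_000_000, high: int = 25_000_000) -> tuple[int, int]:
    """Columns needed at the optimistic (high) and conservative (low) ends of the guideline."""
    return index_columns_needed(total_items, high), index_columns_needed(total_items, low)


print(column_range(60_000_000))  # (3, 6)
```

So 60 million items would land somewhere between 3 and 6 index columns, and the feed rate, freshness and query load questions above decide which end of that range you need.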

Now, the Indexing and Query Matching components are not the only components that can be scaled. Pretty much every component, with the exception of Administration, can be scaled:

  • FAST Indexing Connectors – Multiple indexing connectors can be added to better support feeding JDBC and Lotus Notes data.
  • Item Processing – The Document Processing Pipeline I referred to earlier can be scaled. You will want to add more item processors to keep up with the rate at which content is fed into FAST. This is important if you are trying to reduce index latency, which is the amount of time it takes to make content searchable in the index.
  • Content Distributor – I mentioned this earlier as part of the Item Processing component. Adding multiple Content Distributors adds fault tolerance, but there will only ever be one primary.
  • Query Processing – If you recall, this is the component that does the pre- and post-processing around calls to the Query Matching node(s). More Query Processing components can be added to better support queries per second and to provide fault tolerance.
  • Web Link Analysis – Multiple instances can be added to reduce the time needed to complete the analysis.
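To see why item processors matter for index latency, here is some back-of-the-envelope arithmetic. The throughput figures are made-up placeholders (measure your own), and the sketch only covers the processing stage, ignoring connector and indexing time. The function names are hypothetical.

```python
import math


def item_processors_needed(feed_rate: float, per_processor_rate: float) -> int:
    """Processors needed so processing keeps pace with the feed (both in items/sec)."""
    if per_processor_rate <= 0:
        raise ValueError("per_processor_rate must be positive")
    return max(1, math.ceil(feed_rate / per_processor_rate))


def backlog_drain_seconds(backlog: int, processors: int, per_processor_rate: float) -> float:
    """Rough time to push a backlog of items through processing (ignores indexing)."""
    return backlog / (processors * per_processor_rate)


# Placeholder numbers: 100 items/sec fed, 30 items/sec per processor.
print(item_processors_needed(100, 30))        # 4
print(backlog_drain_seconds(12_000, 4, 30))   # 100.0
```

If processing falls behind the feed rate, the backlog (and therefore index latency) keeps growing; adding processors both keeps pace and drains any backlog faster.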

Hopefully this has provided you a good introduction to how FAST for SharePoint 2010 can be scaled.
