Monday, January 17, 2011

Part 3 - FAST for SharePoint 2010 Physical Architecture

Part three of this series focuses on the physical architecture of a FAST for SharePoint farm deployment.

Physical Architecture / Topology

Now I am going to take what we have learned about the logical architecture and scaling and apply it to some farm scenarios. From what I know, scaling a FAST for SharePoint 2010 farm is similar to scaling a SharePoint 2010 farm. We basically need to analyze the requirements and then scale the FAST components appropriately to meet them. Some of the immediate questions that will come up are:

  • How much content needs to be searchable?
  • What is the total number of items that need to be searched?
  • What is the format, size and any other interesting characteristic of the items to be searched?
  • What sort of availability is needed to support search?
  • How fresh must the search results be and how often is content changed?
  • How many users will be performing searches concurrently?

This is probably just a starting point. I actually wrote a set of questions that need to be asked here.

Up to this point we have learned about the FAST for SharePoint components such as:

  • Administration component
  • Document Processing component
  • Content Distributor component
  • Indexing Dispatcher component
  • Web Crawler component
  • Web Analyzer component
  • Indexing component
  • Query Matching component
  • Query Processing component

We have also learned how they scale. Honestly, it is pretty much impossible for me to give you every example of how to scale FAST, so I am going to pick a few (which I shamelessly already have pictures for) and discuss those :)

Minimum Deployment

The following is a very basic, non-redundant implementation of SharePoint 2010 and FAST for SharePoint 2010. It is never really recommended to install both on the same machine unless you are creating a local development environment. In both farms there really is no scale, other than SharePoint and FAST being on different machines.

In the case of FAST, all of the core components (admin, document processing, content distributor, index dispatcher, web analyzer, indexing, query matching and query processing) are deployed to a single machine.

image

It is interesting to note here that SQL is not heavily utilized by FAST for SharePoint. FAST only needs to access configuration information from SQL. This is different than the out of the box SharePoint 2010 Search which heavily utilizes SQL as part of the indexing and querying process.

Small Deployment

The following is considered to be a small deployment of FAST for SharePoint. You will notice there is both a SharePoint farm and a FAST farm. I am not going to discuss the SharePoint farm; you can read my blog on how to scale SharePoint 2010 farms. Instead, let's look at the FAST for SharePoint 2010 farm. As you can see, there are three boxes.

image

The first box:

  • Has the Administration component, which can only be installed on one machine in the farm.
  • Has the Content Distributor which is responsible for routing documents to document processors.
  • Has a Document Processor component with 12 processes running. The more processes you have running, the more content can be consumed. The number of processes a machine can support depends on the number of cores available on the server.
  • Has an instance of the Web Analyzer running.

The second box:

  • Has an additional document processing component.
  • Has the index dispatcher component which is responsible for sending processed content to be indexed.
  • Has the indexing component, query matching and query processing components installed.

The third box:

  • Has redundant indexing, query matching and query processing components.

In the end this is a really simple, standard implementation that can handle in the range of 10+ million items and about 10 queries per second.
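To make the three boxes a bit more concrete, here is a minimal sketch that writes the small deployment down as plain data and runs two checks against it. The host names (`fast1` to `fast3`), the component labels and the helper functions are hypothetical and only for illustration; this is not a FAST configuration format. The "exactly one admin" rule comes from the description above, while the redundancy check simply mirrors what this particular layout does and is not a product requirement.

```python
from collections import Counter

# Hypothetical host names; component placement follows the three boxes above.
SMALL_FARM = {
    "fast1": ["admin", "content_distributor", "document_processor", "web_analyzer"],
    "fast2": ["document_processor", "indexing_dispatcher", "indexing",
              "query_matching", "query_processing"],
    "fast3": ["indexing", "query_matching", "query_processing"],
}


def component_counts(farm):
    """Count how many hosts run each component."""
    return Counter(c for components in farm.values() for c in components)


def check_farm(farm):
    """Return a list of problems found in the layout (empty list means none)."""
    counts = component_counts(farm)
    problems = []
    # The administration component can only live on one machine in the farm.
    if counts["admin"] != 1:
        problems.append("admin must be on exactly one host")
    # Illustrative rule: the small deployment keeps a second copy of these.
    for component in ("indexing", "query_matching", "query_processing"):
        if counts[component] < 2:
            problems.append(f"{component} has no redundant copy")
    return problems


print(check_farm(SMALL_FARM))  # []
```

Running the same check against a single-box layout like the minimum deployment would report the three missing redundant copies, which is exactly the trade-off between the two scenarios.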

Medium Deployment

This is a medium sized FAST for SharePoint deployment. This should be able to index roughly 40 million items and continue to support 10 queries per second.

image

Observations:

  • The first two FAST servers have all the administration, content distributor and web analyzer components. They also have document processing components.
  • The following six servers have been configured using the column and row pattern I introduced earlier.
  • There are three index columns to partition the content, which allows for a higher volume of content than the previous deployment.
  • On the first row we have additional document processing components. This improves consumption and processing of content, and it adds redundancy.
  • The first row also has the index dispatching and indexing components installed.
  • On the second row are secondary indexing components, which are redundant copies of the indexes on the first row. It is not exactly clear in the diagram, but the query matching and query processing components are also configured here. This way there are dedicated machines to support the querying of content.
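The column and row pattern in these observations can be sketched as a small grid: columns partition the content, and each extra row holds a copy of every column. The function name, the component labels and the zero-based numbering below are assumptions for illustration only, not anything taken from FAST itself.

```python
# Components per row, following the observations above (labels are hypothetical).
PRIMARY_ROW = ("document_processor", "indexing_dispatcher", "indexing")
SEARCH_ROW = ("indexing_copy", "query_matching", "query_processing")


def build_index_grid(columns, rows=2):
    """Map each (row, column) position to the components placed on that server."""
    grid = {}
    for row in range(rows):
        for column in range(columns):
            # Row 0 does the indexing work; later rows hold redundant copies
            # and serve queries.
            grid[(row, column)] = PRIMARY_ROW if row == 0 else SEARCH_ROW
    return grid


medium = build_index_grid(columns=3)
for (row, column), components in sorted(medium.items()):
    print(f"row {row}, column {column}: {', '.join(components)}")
print(len(medium), "index/search servers")  # 6 index/search servers
```

Three columns across two rows gives the six index and search servers in the diagram; the two administration servers sit outside this grid.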

Large Deployment

This final farm is very similar to the previous one, except that some of the administration components have been scaled out to a third server and there are now a total of six index columns to support up to 100 million searchable items.

image

As I said before, these are by no means the only three configurations of a FAST for SharePoint farm. The components can be scaled in whatever way best meets the requirements.
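As a rough starting point, the three scenarios can be put into a lookup that returns the smallest reference deployment covering a given item count. The item limits are the approximate figures quoted in this post; the single index column for the small deployment is my reading of the three-box layout rather than a stated number. The function name is hypothetical, and a real sizing exercise would also have to weigh freshness, availability and query load.

```python
# (label, approximate max searchable items, index columns), from the scenarios above.
REFERENCE_DEPLOYMENTS = [
    ("small", 10_000_000, 1),    # one column is an assumption read from the diagram
    ("medium", 40_000_000, 3),
    ("large", 100_000_000, 6),
]


def closest_reference(item_count):
    """Return (label, columns) of the smallest reference deployment that fits."""
    for label, max_items, columns in REFERENCE_DEPLOYMENTS:
        if item_count <= max_items:
            return label, columns
    return None  # beyond the scenarios covered in this post


print(closest_reference(25_000_000))   # ('medium', 3)
print(closest_reference(150_000_000))  # None
```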

2 comments:

Unknown said...

Hi Jason,
Here is a question on a medium size configuration. I would have:
1. Admin with query, web analyzer, and 12 docprocs
2. Non-Admin with Content Dist. searchengine, and doc procs
3. QR server with content dist. column 0 and doc procs
4. QR index-dispatcher, column 1

About 40 million docs, with different indexing schedules (some once a month, others once every couple of hours). Low QPS is expected, maybe 5 to 10 at most.

Do we need another row, which is a substantial cost? Don't we still have fault tolerance if one of the QR servers goes down?

Thanks.

Mark

MarkSmith0306@att.net

Jason Apergis said...

A four-server farm configuration is a very common place to start. I am going to have to give a generic response, sorry about that. I would say write up an SLA around what you can support, test, and have a plan for how you will add more boxes or reconfigure services when you need to.