Monday, January 17, 2011

Part 4 - FAST for SharePoint 2010 Farm Configuration

Part four of this series is going to focus on configuration of FAST for SharePoint.

Configuration of the FAST for SharePoint Farm

The next question is how to configure the FAST farm, because in many cases you know SharePoint well but FAST is a foreign concept. The FAST for SharePoint 2010 farm is not configured through SharePoint 2010 Central Administration. At a high level, you create an XML file that captures which FAST for SharePoint components run on which server in the FAST farm. This XML file drives the entire configuration of the FAST farm.

In this part of the series I am going to give an introduction and pointers to information on how to configure your FAST for SharePoint 2010 Farm.

The best resource to begin understanding FAST for SharePoint is the “Deployment guide for FAST Search Server 2010 for SharePoint”, located at http://go.microsoft.com/fwlink/?LinkId=204984. This whitepaper pretty much has it all. This post is really just a supplement with some added details.

Understanding this document will go a long way toward understanding the architecture of FAST for SharePoint. At a high level:

  1. There are several service accounts, firewall settings, IP address settings, Windows updates, anti-virus settings and proxy settings you need to take care of before you can even start the installation.
  2. Next I would make sure you have SharePoint 2010 installed and then make sure that the servers that will host the FAST for SharePoint farm have access to the same SQL environment.
  3. Next you need to install FAST for SharePoint onto each server in the FAST farm. There is a prerequisites installer that you need to run to make sure all the required components are on the machine.
  4. Then you need to configure the FAST for SharePoint farm. There are two options: stand-alone or multiple server farm. The stand-alone option is as simple as it sounds; you just go through the configuration steps.
  5. The multiple server farm configuration entails the creation of the deployment.xml file that I referred to earlier. After installing the FAST for SharePoint bits on each server in the farm, you will go through a configuration process which uses the deployment.xml file to activate components on that server. We will take a deeper look at that shortly.
  6. Next we need to create the FAST Search Connector service in SharePoint Central Administration which will send SharePoint content to FAST. You will need to configure SSL for the communication between the two.
  7. Finally you will need to set up the FAST Search Query service in SharePoint Central Administration which crawls people information and calls the FAST for SharePoint query servers. There is an extra step of this configuration which requires claims authentication to support the call to the FAST query servers.
  8. There are several other steps in this whitepaper that lay out how to get the SharePoint search centers set up and how to test your installation.

For the remaining parts of this blog, I am going to focus on steps 5, 6 and 7.

Deployment XML File

As I mentioned, the deployment.xml file is the key to deploying the FAST for SharePoint farm. There is no nice GUI that shows you the farm and lets you configure it. In step five you install the FAST for SharePoint bits on each server of the FAST farm. Then, on the machine that will host the admin component, you go through the configuration process, which uses the deployment.xml file. After that you go to each of the other servers in the FAST farm and configure it using the same deployment.xml file.

Now you see the importance of this file and why I wanted to focus on it. I was able to find some good references and examples.

First let’s talk about the deployment.xml file. You need to read the reference at http://technet.microsoft.com/en-us/library/ff354931.aspx. It may be a boring read, but it shows how the file works. Below is a listing of the components that I provided in this blog with a mapping to the XML nodes.
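
One way to keep the configuration order straight is to read it out of the deployment.xml itself. The following is a minimal Python sketch and not part of the FAST tooling: SAMPLE is a trimmed, hypothetical deployment.xml, and configuration_order is a helper name I made up for illustration. It lists the host that carries the <admin> tag first, followed by the remaining hosts in file order.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down deployment.xml used only for illustration.
SAMPLE = """<deployment xmlns="http://www.microsoft.com/enterprisesearch">
  <host name="fs4sp2.contoso.com"><searchengine row="0" column="0" /></host>
  <host name="fs4sp1.contoso.com"><admin /><query /></host>
  <host name="fs4sp5.contoso.com"><query /><searchengine row="1" column="0" /></host>
</deployment>"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def configuration_order(xml_text):
    """Return host names with the admin host first, the rest in file order."""
    root = ET.fromstring(xml_text)
    hosts = [h for h in root if local(h.tag) == "host"]

    def has_admin(host):
        return any(local(child.tag) == "admin" for child in host)

    admin = [h.get("name") for h in hosts if has_admin(h)]
    others = [h.get("name") for h in hosts if not has_admin(h)]
    return admin + others


print(configuration_order(SAMPLE))
```

The script only reads the file; the actual configuration on each server is still done through the process described in the deployment guide.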

  • Administration component - <admin>
  • Document Processing component - <document-processor>
  • Content Distributor component - <content-distributor>
  • Indexing Dispatcher component - <indexing-dispatcher>
  • Web Crawler component - <crawler>
  • Web Analyzer component - <webanalyzer>
  • Indexing component - <searchengine>
  • Query Matching component - <searchengine>
  • Query Processing component - <query>

All of the tags described in the deployment.xml reference are important, and several do not map directly to components. It was not easy at first to understand how this works. The best approach is to open up the deployment.xml reference and review some example deployment.xml files alongside it.
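
To make the tag-to-component mapping concrete, here is a small Python sketch that reads a deployment.xml and lists the component names on each host. It is an illustration only: the two hosts in SAMPLE are borrowed from the medium farm example in this post, and TAG_TO_COMPONENT and components_by_host are names I made up. Note that <searchengine> covers both Indexing and Query Matching, so the host tag alone cannot tell you which of the two is active.

```python
import xml.etree.ElementTree as ET

# Mapping taken from the component list above; <searchengine> covers two components.
TAG_TO_COMPONENT = {
    "admin": "Administration",
    "document-processor": "Document Processing",
    "content-distributor": "Content Distributor",
    "indexing-dispatcher": "Indexing Dispatcher",
    "crawler": "Web Crawler",
    "webanalyzer": "Web Analyzer",
    "searchengine": "Indexing / Query Matching",
    "query": "Query Processing",
}

# Two-host excerpt used only for illustration.
SAMPLE = """<deployment xmlns="http://www.microsoft.com/enterprisesearch">
  <host name="fs4sp1.contoso.com">
    <admin />
    <query />
    <webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4" />
    <document-processor processes="12" />
  </host>
  <host name="fs4sp2.contoso.com">
    <content-distributor />
    <searchengine row="0" column="0" />
    <document-processor processes="12" />
  </host>
</deployment>"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def components_by_host(xml_text):
    """Return {host name: [component names]} in the order the tags appear."""
    root = ET.fromstring(xml_text)
    result = {}
    for host in root:
        if local(host.tag) != "host":
            continue
        result[host.get("name")] = [
            TAG_TO_COMPONENT.get(local(child.tag), local(child.tag))
            for child in host
        ]
    return result


for name, components in components_by_host(SAMPLE).items():
    print(name, "->", ", ".join(components))
```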

The following are several places where I found examples of the deployment.xml file:

To save myself time, I am going to pull one of the examples from the FAST Search Server 2010 for SharePoint Capacity Planning document. I picked this one because it shows a medium-size FAST farm that is scaled out. I am not saying this is the best configuration to start with; read the FAST Search Server 2010 for SharePoint Capacity Planning document to find a base farm that best meets your needs and tweak as needed. In many cases a simple three-server farm is the best place to start (i.e. the small server farm deployment shown earlier in this series). But that would be no fun to talk about :)

So here is a picture of a medium farm. As you can see:

  • This farm has two rows and three columns.
  • On the first row, document processing components have been spread across multiple servers to provide good content consumption.
  • The first row is also dedicated to indexing, as the Indexing components are turned on there along with Query Matching components.
  • The second row is dedicated to searching as there are Query Matching components and Query Processing components.
  • Note that components such as the Content Distributor and Indexing Dispatcher have been turned on in the first row.
  • Looking at the very first server in the farm, you may ask why query and document processor components are there too. The justification was that they would not be actively used by the FAST farm but would be available while maintenance is being done on the FAST farm.

Now honestly, given all the information I have provided thus far in this blog series, there are a few things I would potentially change in this configuration to ensure that I had a highly available FAST farm. I will get into those changes shortly.

[Image: medium FAST farm with two rows and three columns]

Here is the deployment.xml file for the farm configuration above:

<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
modifiedTime="2009-03-14T14:39:17+01:00" comment="M4"
xmlns="http://www.microsoft.com/enterprisesearch"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">

<instanceid>M4</instanceid>

<connector-databaseconnectionstring>
<![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=M4.jdbc]]>
</connector-databaseconnectionstring>

<host name="fs4sp1.contoso.com">
<admin />
<query />
<webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4"/>
<document-processor processes="12" />
</host>

<host name="fs4sp2.contoso.com">
<content-distributor />
<searchengine row="0" column="0" />
<document-processor processes="12" />
</host>

<host name="fs4sp3.contoso.com">
<content-distributor />
<indexing-dispatcher />
<searchengine row="0" column="1" />
<document-processor processes="12" />
</host>

<host name="fs4sp4.contoso.com">
<indexing-dispatcher />
<searchengine row="0" column="2" />
<document-processor processes="12" />
</host>

<host name="fs4sp5.contoso.com">
<query />
<searchengine row="1" column="0" />
</host>

<host name="fs4sp6.contoso.com">
<query />
<searchengine row="1" column="1" />
</host>

<host name="fs4sp7.contoso.com">
<query />
<searchengine row="1" column="2" />
</host>

<searchcluster>
<row id="0" index="primary" search="true" />
<row id="1" index="none" search="true" />
</searchcluster>

</deployment>


Now looking at that deployment.xml file, here are some notes that will help you better understand it:

  • The <deployment> tag is a wrapper tag. It has some attributes for you to manage the version, last modified date, etc. I highly recommend using these attributes for configuration management reasons.
  • The <instanceid> tag can be used by SCOM.
  • The <connector-databaseconnectionstring> is the location where you specify JDBC connection strings if that connector is being used.
  • The <host> tag is a specific server in the farm. Within the <host> tag is where you identify all of the components that will be turned on for that server. This is where <admin>, <document-processor>, <content-distributor>, <indexing-dispatcher>, <crawler>, <webanalyzer>, <searchengine> and <query> are defined. You can read the specification for these tags, as they are pretty straightforward in their configuration.
  • The <searchengine> tag is the most important component tag when it comes to understanding how many Indexing and Query Matching servers there are. In the <searchengine> tag you specify the row number and column number, but you do not specify here whether the Indexing and/or Query Matching components are turned on. That is done in correlation with the <searchcluster> and <row> tags.
  • The <searchcluster> tag is an important tag that is used to wrap the <row> tag. Earlier in this series I presented a diagram that discusses columns and rows. These <row> tags are used to define the number of Indexing and Query Matching component rows that are being used in the farm configuration.
  • A <row> tag can be defined as primary, secondary or none using the index attribute. There can only be one primary index row. Marking the <row> tag as primary indicates this is the primary row of servers with Indexing components. Marking the <row> tag as secondary means the row is a redundant index row. Finally, marking the <row> tag with the value of “none” means there are no Indexing components on this row.
  • The <row> tag also has an attribute called search which can be true or false. If marked true, Query Matching component is deployed on that row.
  • So here are some examples of the <row> tag which I find critical to understanding the deployment. There are more permutations, but these are the ones you will run up against the most:
    • <row id="0" index="primary" search="true" /> - this means this is a primary row with the Indexing component. As well there are Query Matching components on that row.
    • <row id="0" index="primary" search="false" /> - this means this row is completely dedicated to indexing.
    • <row id="1" index="secondary" search="true" /> - this means this second row has redundant Indexing components and the primary purpose of this row is for Query Matching components.
    • <row id="2" index="none" search="true" /> - this means this row is completely dedicated to Query Matching components only.
  • The <row> tag is tightly correlated back to the <searchengine> tag. Having a <searchengine> tag defined for a <host> tag indicates that the server will have Indexing and/or Query Matching components. As you can see, the <row> tag that matches its row number actually controls which one is active on that server.
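
The correlation between <searchengine> and <row> can be sketched in a few lines of Python. This is a minimal illustration, not how FAST itself resolves the configuration: SAMPLE is a trimmed one-column version of the medium farm, and searchengine_roles is a helper name I made up. For each host with a <searchengine> tag it looks up the matching <row> and reports the row, the column, the index attribute and whether Query Matching is on.

```python
import xml.etree.ElementTree as ET

# Trimmed, one-column version of the medium farm (illustration only).
SAMPLE = """<deployment xmlns="http://www.microsoft.com/enterprisesearch">
  <host name="fs4sp2.contoso.com"><searchengine row="0" column="0" /></host>
  <host name="fs4sp5.contoso.com"><query /><searchengine row="1" column="0" /></host>
  <searchcluster>
    <row id="0" index="primary" search="true" />
    <row id="1" index="none" search="true" />
  </searchcluster>
</deployment>"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def searchengine_roles(xml_text):
    """Map each <searchengine> host to (row, column, index role, query matching)."""
    root = ET.fromstring(xml_text)

    # Collect the per-row settings from the <row> tags inside <searchcluster>.
    rows = {}
    for node in root.iter():
        if local(node.tag) == "row":
            rows[node.get("id")] = (node.get("index"), node.get("search") == "true")

    roles = {}
    for host in root:
        if local(host.tag) != "host":
            continue
        for component in host:
            if local(component.tag) == "searchengine":
                index_role, search = rows[component.get("row")]
                roles[host.get("name")] = (
                    int(component.get("row")),
                    int(component.get("column")),
                    index_role,
                    search,
                )
    return roles


for name, role in searchengine_roles(SAMPLE).items():
    print(name, role)
```

With the sample above, the row 0 host comes back as a primary indexer that also does Query Matching, while the row 1 host does Query Matching only.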

As I mentioned earlier, there are some changes I would make to this farm to add redundancy. For instance, I would change <row id="1" index="none" search="true" /> to <row id="1" index="secondary" search="true" />. The reason is that if an index server were to fail, the query server in the same column would take over as the index server until the failed server is fixed. You could experience some query performance issues while this is going on if you are consuming high volumes of content.

The way the redundancy works is that the FIXML documents (the searchable representation of every document in the index) are replicated to the second row. However, an index is not actually built from the FIXML documents there. When an indexing component fails, a new index is built from the FIXML documents on the redundant index row.
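
If you want to check a deployment.xml for this kind of redundancy before rolling it out, a short Python sketch like the following can help. It is only an illustration under my own assumptions: SAMPLE_CLUSTER holds just the <searchcluster> section of the medium farm, and has_backup_index_row and promote_to_secondary are helper names I made up. It edits the XML in memory and does not touch a real farm.

```python
import xml.etree.ElementTree as ET

NS = "http://www.microsoft.com/enterprisesearch"

# Only the <searchcluster> section of the medium farm (illustration only).
SAMPLE_CLUSTER = """<deployment xmlns="http://www.microsoft.com/enterprisesearch">
  <searchcluster>
    <row id="0" index="primary" search="true" />
    <row id="1" index="none" search="true" />
  </searchcluster>
</deployment>"""


def index_rows(root):
    """Return {row id: index attribute} for every <row> in the document."""
    return {row.get("id"): row.get("index") for row in root.iter("{%s}row" % NS)}


def has_backup_index_row(root):
    """True when at least one row is marked index="secondary"."""
    return "secondary" in index_rows(root).values()


def promote_to_secondary(root, row_id):
    """Change a search-only row (index="none") into a redundant index row."""
    for row in root.iter("{%s}row" % NS):
        if row.get("id") == row_id and row.get("index") == "none":
            row.set("index", "secondary")


root = ET.fromstring(SAMPLE_CLUSTER)
print(has_backup_index_row(root))  # farm as published
promote_to_secondary(root, "1")
print(has_backup_index_row(root))  # after the suggested change
```

The first check reports no redundant index row for the farm as published; after the change to row 1 the check passes.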

Once you start to get the hang of it, this deployment.xml file is not too hard.

When you have completed an installation on a specific machine, run the following command:

nctrl status

This will return the status of all the FAST components that are running on that machine.


[Image: output of nctrl status]

Another thing you can do is go into the services console and check out the FAST services.

[Image: FAST services in the services console]

In the next part of the series I am going to focus on the SharePoint side of the configuration.
