Saturday, January 26, 2013

New Capabilities eDiscovery with Office 365

Introduction

I can happily say I am very excited about the new eDiscovery capability that is part of SharePoint 2013. I am even more excited that this capability is being delivered with Office 365. Up to this point in Office 365 you have had the ability to do eDiscovery in the cloud, but you have had to execute multiple searches. Now with the new features that will be released, there is unified eDiscovery for Exchange, SharePoint and Lync data in Office 365. Additionally there are some new features for Exchange 2013, available in Office 365, which allow a more granular approach to eDiscovery and to placing legal holds on individual items.

The following are my notes and some additional resources which you will find valuable as you start exploring this technology more.

  • The new eDiscovery solution is part of SharePoint Online Plan 2 and is part of the SharePoint 2013 Enterprise CAL on-premise.
  • This solution allows you to search, hold, and export content from Exchange and SharePoint. Lync instant messages can be captured when they are stored in Exchange's conversation history folder.
  • The high-level process is: you create a case, identify the locations to search, and define any filters needed to find content. You can manage sources, eDiscovery Sets (source/filter combinations), queries and all exports performed. All of these operations can be done from a site in SharePoint Online.
  • The major steps are: Create a Case >> Create an eDiscovery set to find and preserve content (optional in-place hold) >> Create a Query to find and export content (previewing and filtering) >> Release Hold
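The case workflow above can be sketched as a small model. To be clear, every class and method name here is hypothetical, invented purely to illustrate the sequence; the real feature is driven through the SharePoint Online eDiscovery site, not an API like this:

```python
# Hypothetical model of the eDiscovery workflow: case -> discovery set
# (optional in-place hold) -> query -> export -> release hold.

class DiscoverySet:
    """A source/filter combination that can optionally place an in-place hold."""
    def __init__(self, name, sources, in_place_hold=False):
        self.name = name
        self.sources = sources        # e.g. mailboxes or SharePoint sites
        self.on_hold = in_place_hold

class Case:
    def __init__(self, name):
        self.name = name
        self.discovery_sets = []
        self.exports = []

    def add_discovery_set(self, discovery_set):
        self.discovery_sets.append(discovery_set)

    def run_query(self, keyword):
        # Search only the content covered by this case's discovery sets.
        return [item
                for ds in self.discovery_sets
                for item in ds.sources
                if keyword.lower() in item.lower()]

    def export(self, results):
        self.exports.append(list(results))  # each export is tracked on the case
        return len(results)

    def release_holds(self):
        for ds in self.discovery_sets:
            ds.on_hold = False

# Walk through the four major steps.
case = Case("Contoso v. Fabrikam")
case.add_discovery_set(DiscoverySet(
    "Exec mail", ["Email: merger plan", "Email: lunch order"], in_place_hold=True))
hits = case.run_query("merger")
exported = case.export(hits)
case.release_holds()
```

The point of the sketch is simply that queries and exports always operate within the scope of the case's discovery sets, and releasing the hold is the explicit final step.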

General Legal Hold Improvements for SharePoint and Exchange

With SharePoint 2013 delivered in Office 365, there will be some additional new features available.

  • The state of the content is recorded, thus allowing users to continue to work with the data. With SharePoint 2010 technologies this was not possible: once an item was placed on hold, it was locked down until the hold was released. Even though users now have the ability to edit or delete, SharePoint will ensure there is no loss of an item that was placed on hold. Discovery managers will continue to have access to all the data that was put on hold. There is a new special SharePoint library created at the site level to handle edit and delete scenarios; when either of those operations occurs, the original file or item on hold is stored there.
  • Preservation can be done at the site level now. Users can continue to use preserved content.

With Exchange 2013, there are again several new features available.

  • With Exchange Server 2010, the notion of legal hold was to hold all mailbox data for a user indefinitely or until the hold was removed. The legal hold was placed at the mailbox level. With Exchange 2013 in Office 365 you can now determine what to hold and for how long.
  • Indefinite Holds – This is the way it was done with Exchange 2010 and it is still available. The entire mailbox is put on indefinite hold; nothing can be deleted and edits are managed until the mailbox is released.
  • Query Holds – This is new. If you need the ability to query for items to be placed on hold, this is now supported. When items are found by the query, just those items are placed on hold. Additionally this supports not just finding existing items, but also applying the hold to future email items that have not yet arrived.
  • Time-based Holds – This is where legal hold and retention policies are used in conjunction with each other. Specifically this allows you to place a hold on items for a specific period of time, calculated from the date the item is received or when the hold is created. This makes it easy to create a rule ensuring that all items are retained for X days or years; for instance, a policy that says all emails must be kept for 90 days. When an item is deleted out of the user's mailbox, it will be retained for the remaining amount of time required by the policy.
  • Multiple Holds – This is new: a user can now be placed on multiple holds. When a user is on multiple holds, all of the holds are applied together using an OR operator.
  • In summary, with the addition of these new capabilities in Exchange Online you will be able to place an entire mailbox or specific items on hold; email will be preserved whether a user or process edits or deletes it; users can be placed on multiple holds; items can be held indefinitely; legal hold can be made transparent to the user; and eDiscovery searches can be done on items that have been placed on hold.
  • One Small Note – In-Place Hold utilizes the Recoverable Items folder, which is a replacement for the “dumpster”. The Recoverable Items folder is used in support of legal hold. There are four sub-folders which are used to manage items. First there is the Deletions sub-folder, used when a user shift-deletes an item from the inbox or deletes an item from the Deleted Items folder in their mailbox. Users have the ability to recover items from that folder using the item recovery feature in Outlook or OWA. Second there is the DiscoveryHold sub-folder, which manages items that were on legal hold but deleted by the user. Third is the Versions sub-folder, which manages email items that have been edited, along with all versions of those edits. For both the DiscoveryHold and Versions sub-folders, items will be removed once the hold is released. None of the data stored in these three folders counts against the end-user's mailbox size limit either. Finally there is the Purges sub-folder, which is responsible for deleting items once all retention rules have passed.
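To make the time-based and multiple-hold behavior concrete, here is a small illustrative calculation. The function names are mine, not part of Exchange; this just models "retain for a period from the received date" and "multiple holds combine with OR":

```python
from datetime import date, timedelta

def hold_expiry(received, retention_days):
    """Time-based hold: retain for a fixed period from the date received."""
    return received + timedelta(days=retention_days)

def must_retain(received, today, hold_periods):
    """Multiple holds combine with OR: retain if ANY hold still applies."""
    return any(today <= hold_expiry(received, days) for days in hold_periods)

# A 90-day policy: an email received Jan 1 is retained through Apr 1,
# even if the user deletes it from their mailbox in the meantime.
received = date(2013, 1, 1)
print(hold_expiry(received, 90))                           # 2013-04-01
print(must_retain(received, date(2013, 4, 15), [90]))      # False: the 90-day hold has expired
print(must_retain(received, date(2013, 4, 15), [90, 365])) # True: the 365-day hold still applies
```

The last line is the OR behavior: adding a second, longer hold keeps the item retained even after the first hold would have let it go.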

If you are deep in both SharePoint and Exchange, what you may notice is some convergence. SharePoint has always had query-based holds, which have now been brought forward into Exchange. Likewise, Exchange has always allowed users to continue working with their mailbox even while on hold, which SharePoint did not allow. Additionally, SharePoint previously supported legal hold only at the item level and not at the site level (which is conceptually similar to a mailbox). This is great and is needed to support the next part of this discussion!

eDiscovery Sites

Now let’s talk a little bit about this new eDiscovery site that will be available in SharePoint Online to do eDiscovery on Exchange, SharePoint and Lync.

First you create a case using a site template; a pretty straightforward process. Once the site is created you will have some major buckets of features. First, there are Discovery Sets, which facilitate defining the sources to search and creating holds. Second, there are Queries, which search those sources, allow for previewing of results and ultimately export the data.

[Screenshot: the eDiscovery case site]

The screen below shows a new discovery set. As you can see you have the ability to identify the sources, filters, date ranges, etc. for the information you want to hold. You have the ability to define what type of hold you want, and in most cases an In-Place Hold will be utilized.

[Screenshot: creating a new discovery set]

Then once items have been placed on hold, you can use the Search and Export feature to run queries to narrow down and find items. Once you find those items, you can export them.

Below is a screen capture that shows how a query can be created across your held data sources. You have the ability to run the query, get information about that query and even preview items before you do an export. You can see that when the Exchange tab is selected you can see all the email, contacts, etc. objects that were found.

[Screenshot: query results with the Exchange tab selected]

The following is from the same query above and shows you the SharePoint data that was returned.

[Screenshot: SharePoint data returned by the same query]

Finally you can export a query once you have it fine-tuned. The big question everyone always asks is how the data will be exported. Data will be exported conforming to the Electronic Discovery Reference Model (EDRM) standard:

  • SharePoint Documents – exported in their native format
  • Lists – a .csv file will be created
  • Pages – exported as MIME HTML (.mht) files
  • Exchange Objects – email, tasks, contacts, calendar items and attachments exported in a .pst file
  • An additional XML manifest that complies with EDRM is provided, capturing all the information exported
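As a quick reference, the mapping above can be expressed as a simple lookup. The content-type keys below are my own labels for illustration, not official identifiers:

```python
# Illustrative lookup of the EDRM-style export formats listed above.
EXPORT_FORMATS = {
    "sharepoint_document": "native format",
    "sharepoint_list": ".csv",
    "sharepoint_page": ".mht",   # MIME HTML
    "exchange_object": ".pst",   # email, tasks, contacts, calendar, attachments
}

def export_format(content_type):
    """Return the export format, plus the EDRM XML manifest that
    accompanies every export regardless of content type."""
    return EXPORT_FORMATS.get(content_type, "unknown"), "EDRM XML manifest"

print(export_format("sharepoint_page"))  # ('.mht', 'EDRM XML manifest')
```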

[Screenshot: exporting query results]


Friday, January 25, 2013

New Exchange Online DLP

There is a great new feature of Exchange 2013 that will be part of Exchange Online. I am really happy to now have native Data Loss Prevention (DLP) features to share with customers. Up to this point, you have been able to utilize Transport Rules to implement light DLP; however, if you wanted to implement real DLP, organizations were required to manage an appliance on-premises to support it. Now organizations have the ability to remove that dependency and utilize DLP delivered through the Office 365 cloud.
Below are some notes and resources that you should know about. The new DLP capability:
  • The goal is to help identify, monitor, and protect sensitive information from leaving the organization.
  • DLP can be configured through the Exchange Administration Center.
  • You have the ability to start with pre-configured DLP templates to detect information such as PII. You also have the ability to create custom templates with sensitive information types. This will save you a lot of time.
  • Types – Detect sensitive information in attachments, body text and subject lines, and adjust the sensitivity level used to take action (via transport rules).
  • DLP Policies are tied directly into Transport Rules. They are no more than packages of conditions, actions and exceptions implemented as transport rules.
  • Transport Rules – You have the ability to coordinate DLP rules with Transport Rules and create actions to capture information. Transport Rules look for specific conditions on a message and then take action on it. Transport Rules let you apply messaging policies, secure messages, protect messages and prevent leakage. You can prevent information from leaving, filter confidential information, track or copy messages sent and received by individuals, redirect email for inspection, apply disclaimers, etc. You have the ability to incorporate classification of sensitive information. Additionally you can perform content analysis through keyword matches, dictionary matches, regular expressions, etc.
  • Testing – There is the ability to test rules before actually enforcing them. This is possible by creating rules but not activating them. Mail flow is not affected until they are finalized.
  • Policy Tips – This is truly a great feature in that preventive action can be taken with an end user before they actually send an email that could violate a DLP policy. Policy Tips show users warnings in Outlook in the same manner as MailTips. This does require the Outlook 2013 client.
  • Reporting – DLP reports are available, and you can create your own specific reports to monitor issues.
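Since a DLP policy is essentially a package of conditions, actions and exceptions riding on transport rules, its evaluation can be sketched like this. The predicates and action strings below are invented for illustration; real policies use Exchange's transport rule predicates:

```python
# Sketch of DLP-policy evaluation as a transport rule: apply the action
# only when the condition matches and no exception does.

def evaluate(message, condition, action, exception=None):
    if exception is not None and exception(message):
        return "deliver"   # exception wins: the message flows normally
    if condition(message):
        return action      # condition matched: enforce the DLP action
    return "deliver"       # nothing matched: normal delivery

contains_ssn = lambda m: "ssn" in m["body"].lower()
from_legal   = lambda m: m["sender"].endswith("@legal.contoso.com")

blocked = evaluate({"sender": "user@contoso.com", "body": "my SSN is ..."},
                   condition=contains_ssn, action="block", exception=from_legal)
allowed = evaluate({"sender": "counsel@legal.contoso.com", "body": "SSN audit"},
                   condition=contains_ssn, action="block", exception=from_legal)
print(blocked, allowed)  # block deliver
```

This mirrors the common modification mentioned below of exempting certain users (here, the legal team) from a policy that would otherwise block the message.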
Some additional deeper notes:
  • There are three ways to create a template: 1) start from an out-of-the-box (OOB) template, 2) import one, or 3) start from scratch. There are OOB templates for things like the PCI Data Security Standard, US financial data, the U.S. Gramm-Leach-Bliley Act (GLBA), HIPAA, the Patriot Act, PII, etc. Common modifications might be to make certain types of users exempt from specific policies in specific situations, or even to invoke RMS in certain situations when a DLP policy may be broken. This native integration into Exchange Online itself is really exciting.
  • There are Sensitive Information Types like a US SSN, driver's license number, etc. Each is the common rule used to find that type of data. You have the ability to create XML files that can be imported through PowerShell to define custom ones. You have the ability to create Entity Rules, which define identifiers like an SSN. Then there are Affinity Rules, which are targeted towards documents. These are built from multiple evidence rules; when matches are aggregated together and occur in proximity to each other, they can cause a DLP policy to be triggered. In other words, the number of times a rule is tripped within a single item can determine whether a DLP policy fires.
  • Sensitive Information Rules can be used with transport rules to create hard and soft rules. There is a new “If this message contains…Sensitive Information” transport rule. This can be used with existing transport rules and Boolean logic. For example: limit interaction between recipients and senders (between internal groups and external groups), apply separate policies for internal and external communications, prevent inappropriate information from entering or leaving, filter confidential information, track or archive messages sent or received by specific individuals, redirect inbound or outbound messages for inspection before delivery, and apply disclaimers.
  • DLP Supported File Types – All the core file types are supported (including zips and cabs). However, if an unknown file type that must go through DLP evaluation is attached, an exception will be raised to allow you to take action. For Exchange Online you cannot extend this the way you can on-premises, because doing so requires creating your own IFilter packages, which is not supported in the cloud.
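The entity/affinity idea, where repeated matches within a single item trip the policy, can be illustrated with a toy detector. The pattern below only matches the SSN shape; a real sensitive information type also checks context and validity, so treat this purely as a sketch of the counting behavior:

```python
import re

# Toy sensitive-information detection: an SSN-shaped pattern plus a
# match-count threshold, mimicking "how many times a rule is tripped
# in a single item" determining whether the policy fires.
SSN_SHAPE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def trips_policy(text, threshold=2):
    """Fire only when the pattern appears at least `threshold` times."""
    return len(SSN_SHAPE.findall(text)) >= threshold

print(trips_policy("One number: 123-45-6789"))                   # False
print(trips_policy("Two: 123-45-6789 and 987-65-4321"))          # True
```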

Windows Azure AD Whitepaper

Here is a new whitepaper entitled “Active Directory from on-premises to the cloud” - http://www.microsoft.com/en-us/download/details.aspx?id=36391

This is a great whitepaper if you want to learn more about Windows Azure AD. Additionally you will find out that Windows Azure AD is the AD solution that is used for Office 365. This whitepaper gives insightful information on:

  • How AD Authentication actually works inside of Office 365.
  • There is some great information in here that explains how once you get it set up, it can be utilized across Office 365 and Azure services.
  • There is information in here that will shed light on how Office 365 supports non-AD directories. This is because you can sync your corporate LDAP directories up to Windows Azure AD, and Office 365 will utilize them.

Wednesday, January 9, 2013

SharePoint 2013 and SharePoint Online Upgrade Notes

Introduction

For SharePoint 2013, there is a lot of great stuff coming for performing upgrades. I may have mentioned this recently on my blog, but a lot of this is optimization of the SharePoint upgrade process. Much of this is driven by experience from the five major releases of SharePoint; however, SharePoint Online is proving to be a strong catalyst for improvement. When SharePoint 2010 was released we saw how the architecture of SharePoint completely changed, and much of that change is associated with a long-term Microsoft vision to deliver SharePoint through the cloud. Whether you are implementing on-premises or purchasing SharePoint Online, you will be able to benefit.

I know many folks have done upgrades in past versions of 2003, 2007 and 2010 and have dealt with a lot of work to get through those. There is a mixed bag of reasons why upgrades were complex. When I was consulting I would also remind customers to design and implement for the future. Every large SharePoint customization should have a path for the future. There is really too much to go into here and it is not the topic of this posting…

In this blog I am going to discuss some of the big improvements that you should know right off the bat in regards to performing SharePoint 2013 upgrades. Second I am going to discuss how these new upgrade changes apply to SharePoint Online.

The following are some of my notes I have captured from multiple events and presentations.

New Facts About Upgrades for SharePoint 2013

The following are some of the big facts that you should know about SharePoint 2013 Upgrades:

  • Upgrade Approach – The only approach available for SharePoint 2013 upgrades is DB Attach. There is no more In-Place Upgrade (2003, 2007 and 2010) or Gradual Upgrade (2003).
  • Upgrade Versions – It is pretty much the same: you will need to be on the previous version to perform an upgrade, which means you must be on SharePoint 2010. There are numerous third-party solutions available in the marketplace that can assist you in upgrading from older versions of SharePoint to SharePoint 2013.
  • Requires New Server Instances – Given the fact that you must do DB Attach, you will have to create SharePoint 2013 servers. You cannot upgrade your existing SharePoint 2010 servers.
  • Important >> Site Collections – Upgrading is now focused at the site collection level. Databases will be attached to the new SharePoint 2013 farm; however, the actual upgrade itself will be done at the site collection level, when the site collection is ready. This is a really important point: “when the site collection is ready”. This means SharePoint 2010 site collections will run with the full SharePoint 2010 experience on a SharePoint 2013 server until such time as it is decided they will be upgraded.
  • Improvement >> Health Checks – There is a new capability for performing an upgrade health check before actually moving forward with an upgrade. The goal is to provide a capability that will tell organizations that an upgrade will fail prior to it actually failing. A health check report will provide detailed warnings and information on issues to remediate. This is extensible: you can build your own custom rules that detect issues, and custom repair operations can be created for specialized scenarios you may need to support.
  • Improvement >> Evaluation Site – This is an exciting capability. There is a new capability that will provide you the ability to spin up a new site collection to preview an upgrade of a site collection before it is actually done. This is a great way to test and remediate issues before you actually perform an upgrade. It is basically a replacement for the visual upgrade process that was provided in SharePoint 2010 (in my opinion a better solution). These evaluation site collections cannot be made permanent, nor are they recommended for long-term usage. There is configuration available to administrators to control the maximum size of an evaluation site (for instance, when an administrator did not put site quotas in place and you have TBs of data sitting in a single site collection). There is PowerShell available to automate the creation of evaluation site collections (which can come in really handy if you need to do some automation around testing large numbers of site collections). As well, an expiration date can be applied to the evaluation site collection to ensure that these sites are not accidentally used for too long.
  • Improvement >> Quicker Upgrades – With the focus on site collection upgrades versus an entire content database upgrade, significant amounts of time are saved when actually performing an upgrade. This is because you upgrade the farm once, but the site collection upgrades (which require the most processing) can be spread out over time. This reduces having to do big-bang upgrades.
  • Improvement >> Communications – New features have been implemented that will communicate to users via email during an upgrade. There are events and different email templates you can work with to communicate the status of an upgrade. Additionally, a system status bar will be displayed indicating that an upgrade is being performed. Administrators have the ability to customize the messages displayed to the user to give customized instructions and information.
  • Easy User Transition – As I mentioned earlier, upgrades are done at the site collection level and the SharePoint 2010 experience will remain until the site collection is actually upgraded. This means you can start delivering new SharePoint 2013 solutions in other site collections while continuing to run SharePoint 2010 solutions until you have created a proper transition path for the end users. Additionally, this is great for user transition as it will allow end users to get used to SharePoint 2013. Organizations have the ability to strategically identify site collections for upgrade to SharePoint 2013 and select which ones should go first.
  • Initiating an Upgrade – Upgrades of site collections can be automated through PowerShell, administrators can manually execute them or they can be delegated to site collection admins (which may be a person within the business). Remember there are still controls available to control how site collection admins can do it (max sizes, specific site collection blocks, etc.).
  • Queuing Site Collections – There is a new capability to throttle the number of parallel upgrades. This can become important if you need to coordinate the upgrade of a lot of production site collections; specifically, you do not want to overload your SQL Server databases. The web application processes will wait for processing capacity and database space before executing. Timer jobs are responsible for assisting with this. Large, oversized site collections will not be upgraded through the web application; they will be handled through a timer job instead. The queue can be managed if you need. Note that each queue is assigned at the content database level. Even if there is a failure (for whatever reason), the upgrade will be re-initiated in the queue. Finally, PowerShell is available to place site collections in the queue or even change the throttle.
  • Logging Improvements – Additionally, there have been many new improvements to logging. ULS logs can easily be pulled into Excel, and correlation IDs, more error codes and more details are now provided when an error occurs.
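The queuing behavior described above, a throttle plus automatic re-queue on failure, can be sketched as follows. All names here are invented; the real mechanism is timer jobs scoped to each content database:

```python
from collections import deque

def drain_upgrade_queue(site_collections, upgrade, max_parallel=2):
    """Process at most `max_parallel` site collections per pass;
    failed upgrades are re-initiated by putting them back in the queue."""
    queue = deque(site_collections)
    upgraded = []
    while queue:
        batch = [queue.popleft() for _ in range(min(max_parallel, len(queue)))]
        for site in batch:
            if upgrade(site):
                upgraded.append(site)
            else:
                queue.append(site)   # failure: retried on a later pass
    return upgraded

# Simulate a site collection ("B") whose first upgrade attempt fails.
attempts = {}
def flaky_upgrade(site):
    attempts[site] = attempts.get(site, 0) + 1
    return not (site == "B" and attempts[site] == 1)

result = drain_upgrade_queue(["A", "B", "C"], flaky_upgrade)
print(result)  # ['A', 'C', 'B'] -- B succeeded on its second attempt
```

The key design point the sketch captures is that a failure does not stall the queue; the failed site collection simply goes to the back and the throttle keeps the parallel load bounded.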

Beyond new upgrade capabilities your process for performing an upgrade will change a little bit given these changes. Here are some things to keep in mind.

  • Review the upgrade charts located here - http://msdn.microsoft.com/en-us/subscriptions/cc263199.aspx. There are two specific diagrams: one workflow for performing an actual upgrade and another for testing.
  • Do not skip doing due diligence. Please audit yourself, know what you have deployed and determine what the best path is to ensure continuity of operations.
  • As discussed above, you will need to build out your SharePoint 2013 farms prior to performing the upgrade. It is also good practice to keep those SharePoint 2010 farms around as long as you can, until you feel the production migration is complete.
  • Additionally, if you are using federated service farms (i.e., you have a SharePoint 2010 services farm that provides common services to other SharePoint farms), you MUST upgrade those farms first. Do not worry: an upgraded federated service farm on SharePoint 2013 can communicate with a SharePoint 2010 content farm. However, it does not work in the reverse order.
  • Similar to the prior note, even if you do not have a federated service farm, you should upgrade your services first, then bring over content databases and then gradually upgrade site collections to SharePoint 2013.
  • Do not forget, if you have a ton of customization and third-party tools, you will need to work through the process of moving them to the SharePoint 2013 farm.

SharePoint Online

As you have been reading, you may have noticed a lot of new features that are focused on supporting large SharePoint environments. The obvious one is Office 365 and upgrading SharePoint Online customers. Knowing that:

  • Previously I talked about upgrading federated service farms first and then upgrading the content farms. Well, this is exactly what is going to happen with SharePoint Online. The common federated service farms will be upgraded first by building new service farms on SharePoint 2013 and connecting them to the SharePoint 2010 content farms. Then new SharePoint 2013 content farms will be created and the content databases will be moved over.
  • Next customers in SharePoint Online will determine when they want to upgrade the Site Collections to use SharePoint 2013. New PowerShell commands are available for SharePoint Online to automate this activity. Both the new Health Check Report and Evaluation Site Collection features are available.
  • Note that with SharePoint Online, customers will eventually be required to upgrade all site collections to the SharePoint 2013 user experience. Even on-premises customers should do this, as the goal is not to allow SharePoint 2010 solutions to run forever. This capability is really intended to provide support for transition.

As I mentioned, moving forward it is always good to understand how you are building custom solutions. I know a lot of people are doing this with SharePoint. I really believe that you can build a lot of solutions with out-of-the-box features and capabilities. I recommend that if you are building highly customized solutions, you investigate building them with the new SharePoint Apps model.

Thursday, December 13, 2012

Office 365 for Government FISMA ATO

The Recovery Accountability and Transparency Board (RATB) has granted Office 365 for Government the Authority to Operate (ATO) under the Federal Information Security Management Act (FISMA). This is a moderate rating. To read more, please see this announcement - http://blogs.office.com/b/microsoft_office_365_blog/archive/2012/11/29/office-365-government-customer-ratb.aspx

Additionally here is another article on how The Recovery Accountability and Transparency Board (RATB) is being a trend setter for cloud computing within US Federal - http://gcn.com/Articles/2012/12/12/Recovery-Board-hub-gathers-multiple-clouds.aspx?Page=1

Friday, November 30, 2012

Office 365 Preview Service Descriptions

Not sure if anyone has noticed but the Office 365 Preview Service Descriptions are published here - http://technet.microsoft.com/en-us/library/jj819284.aspx. There is really good information in here to help you evaluate all the features and capabilities that are available in each type of Office 365 plan. Remember these are Preview and subject to change; however this will really help you with your planning!!!

If you read my blog at all, you will know that I constantly talk about Service Description updates and such. These are really the most important documents customers should be reviewing as part of their decision to move to the cloud.

Friday, November 23, 2012

SharePoint Conference, SharePoint Online Operations Team Presentations

1.0 Introduction
I recently went to the SharePoint Conference and attended several great sessions. I am going to put up some of my notes from the conference as there was a ton of great information for the 10,000 people who attended. I openly admit that I am a little focused on SharePoint Online because I am constantly talking with customers about all our solutions available in Office 365. Also excuse the grammar – I am just trying to get content up as quickly as I can…

1.1 Cloud First and Aligned Management
From a SharePoint perspective, Office 365 has really driven Microsoft towards building solutions that are more scalable and manageable. For instance, in SharePoint 2010 we were given the new SharePoint federated services model, which not many organizations actually stood up on-premises, but which is heavily used in SharePoint Online. Now with SharePoint 2013 we can really see that everything is being built for a cloud environment. For instance, SharePoint 2013 upgrades – wow. If you look at the way we will be doing upgrades from SharePoint 2010 to 2013, you are going to say “this is so much better thought out now”. First, we have lessons learned since the product has been around since 2001; however, the cloud has really driven Microsoft to deliver better solutions. Why? Because Office 365 delivers a financially backed SLA with a promise to keep customers moving forward on the latest and greatest solutions in the cloud. Customers will not get stuck on an old version, and this has forced Microsoft to deliver even better upgrade capabilities.
In this blog I am going to be capturing some information from two specific sessions I attended on “How We Do It” for SharePoint Online. Throughout these sessions it really resonated with me how we are changing the architecture of the SharePoint product to be cloud first. The Microsoft SharePoint Product Group is the same group of people supporting SharePoint Online. I will talk about this later in the blog, but this is a big deal as it really demonstrates Microsoft's commitment to have people, process and technology strategically aligned.

1.2 Sessions on How We Do It at SharePoint Online
There were two amazing sessions at the SharePoint Conference. One was called Operating SharePoint Online and the other was called Building and Managing SharePoint Online. If you are a SharePoint developer you may not have attended these two sessions, but they will blow your mind. Specifically, if you are a person who has ever managed a production SharePoint farm, you will really appreciate what they have done; not just from a physical and logical architecture perspective, but also because there is program management and governance built into Office 365 that frankly organizations have a very tough time building and delivering on-premises. This is driven by the 99.9% financially backed SLA that Microsoft delivers with Office 365.

1.3 Session on Operating SharePoint Online
Here are some of my notes (formalized a little) from the presentation.

Some Current Stats – Microsoft has made over a $3.28 billion investment in data centers that are supporting Office 365. This really demonstrates Microsoft's commitment to the cloud. At the time of this presentation, in support of SharePoint Online, there were more than 13,000 servers with over 37,000 SQL servers in the cloud data centers. They indicated they are currently bringing on 30,000 companies a week! They have a 24/7 development staff. Plus they have actually maintained 99.95% YTD delivery of service for SharePoint Online – basically beating their stated SLA. When you hear stats like this, and you have done SharePoint administration, it really makes you understand the value that Microsoft is delivering to your organization. Stop and think for a second about all the people, business processes, governance, management, etc. that is needed to deliver this.

Goals – The SharePoint Online Operations team discussed their focus on several things such as zero downtime, zero loss of data, always up to date, and security/compliance. These are all things which organizations that deploy SharePoint themselves try to adhere to, and they are very challenging to implement because they can require serious investment in people, process and technology beyond simply deploying some SharePoint servers.

Zero Downtime – During the session they spent a lot of time discussing this.
  • In order to support this, the SharePoint Online team is constantly monitoring from multiple different angles. SCOM is highly utilized in support of this activity. They implement rather comprehensive scenario-based monitoring, so it is not just checking server machine status. As part of this they do a lot of live traffic monitoring and watch for patterns.
  • They stated that one of the biggest reasons for their success thus far is their alignment with the SharePoint Product Group. They have direct integration with the people who write the actual code for SharePoint, and these same people have direct responsibility for supporting SharePoint Online. As I mentioned earlier, it is this sort of alignment which drives great delivery, as there is direct access to the people who wrote SharePoint to support SharePoint Online.
  • They also stated that even though they have access to the people who built SharePoint Online, from a support perspective they have a goal to “automate everything”. Really this is the only way they can ever scale and they have demonstrated that with the level they are currently delivering at.
  • They stated they are doing close to 172 million probes per month to make sure there are no issues. They stated that this can result in roughly 600,000 anomalies a month that SCOM may identify. Through correlation systems, they are able to identify roughly 200 escalations a month they may need to deal with. This is pretty amazing when you look at the number of probes and the amount of automation they put in place to discover and automatically resolve issues. Plus they continually find ways to reduce this.
  • When issues are discovered they have an entire automated system that will communicate with engineers, manage workflows and tasks, and proactively initiate meetings between responsible engineers. The system even provides a full report and a list of past resolutions on how to immediately resolve the issue if it is something that has been encountered before.
  • Additionally they talked a little about an internal solution they created with Microsoft Research that can parse ULS logs. When I have had to debug SharePoint on-premise production issues in the past, working with ULS logs was never a fun task. This tool, however, provides a dashboard, drill-down capabilities and pattern analysis across every ULS log in the entire SharePoint Online cloud. It is impressive.
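To give a feel for the kind of parsing such a tool does, here is a minimal sketch in Python. The column layout, sample lines, and function names are my own simplified assumptions; real ULS logs carry more columns, and the actual Microsoft Research tool is far more sophisticated.

```python
from collections import Counter

# Simplified ULS-style log lines: tab-separated fields
# (Timestamp, Process, Area, Category, Level, Correlation, Message).
# This layout is an assumption for illustration only.
SAMPLE_LOGS = [
    "01/26/2013 10:01:02\tw3wp.exe\tSharePoint Foundation\tDatabase\tHigh\tabc-123\tSQL timeout",
    "01/26/2013 10:01:05\tw3wp.exe\tSharePoint Foundation\tDatabase\tHigh\tabc-123\tRetrying query",
    "01/26/2013 10:02:10\towstimer.exe\tSharePoint Foundation\tTimer\tMedium\tdef-456\tJob started",
]

def parse_line(line):
    ts, process, area, category, level, correlation, message = line.split("\t")
    return {"timestamp": ts, "process": process, "area": area,
            "category": category, "level": level,
            "correlation": correlation, "message": message}

def entries_for_correlation(lines, correlation_id):
    """Pull every entry for one correlation ID, the way a support
    engineer drills into a single failing request."""
    return [e for e in map(parse_line, lines)
            if e["correlation"] == correlation_id]

def category_pattern(lines):
    """Count entries per category -- a crude form of pattern analysis."""
    return Counter(parse_line(l)["category"] for l in lines)

print(len(entries_for_correlation(SAMPLE_LOGS, "abc-123")))  # 2
print(category_pattern(SAMPLE_LOGS)["Database"])             # 2
```

Now imagine the dashboard doing this kind of correlation and counting across every farm in the cloud, and you can see why it impressed the room.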
Zero Data Loss – The SharePoint Online Operations team spent some time talking about zero data loss. If you have ever read the SharePoint Online Service Descriptions, this directly correlates to RTO and RPO. RTO is Recovery Time Objective, the target time between when a disaster occurs and when the service is running again. RPO is Recovery Point Objective, the amount of possible data loss during an unexpected event. The key phrase here is “disaster recovery”. The definitions are nice, but how is this actually achieved with SharePoint Online? They explained how they do it. If you are a SharePoint architect you know this would be driven by SQL Server configuration, and the SharePoint Online team stated that this is what they do today. When a document hits SharePoint Online:
  • The document is first stored in the content database associated with the site, so that is one place it is stored.
  • Second, all the SQL databases are using RAID 10, so there is an immediate duplication.
  • Third, there is synchronous SQL mirroring to a DR SQL server in the same data center, so that is 4 copies of the file.
  • Fourth, there is asynchronous log shipping from the primary cloud data center to the secondary data center. So that is roughly 4 additional copies of the file into the secondary data center.
  • Fifth, there are scheduled backups at the primary data center and then asynchronous replication of those backups to the secondary data center.
As many of you may know, getting that sort of SQL Server redundancy built and managed can be challenging for many organizations to handle; however, it is required for Microsoft to meet the RTO/RPO.
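To make the copy counting in the steps above concrete, here is a back-of-the-envelope tally. The multipliers are my reading of the session notes (RAID 10 doubling, a mirrored DR server, log shipping replaying everything to the secondary data center), not official Microsoft figures.

```python
# Tally the approximate number of physical copies of one document,
# following the redundancy layers described above. The counts are an
# interpretation of the session notes, not official figures.
copies = 1                  # 1. stored in the site's content database
copies *= 2                 # 2. RAID 10 mirrors every database disk
copies *= 2                 # 3. synchronous mirroring to a DR SQL server
                            #    in the same data center -> 4 copies
primary_dc = copies
secondary_dc = copies       # 4. async log shipping replays roughly the
                            #    same 4 copies into the secondary data center
total = primary_dc + secondary_dc
print(total)  # 8, before even counting the scheduled backups in step 5
```

Eight-plus copies per document is the kind of redundancy most on-premise shops never get close to funding.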

On top of all this, remember users have the Recycle Bin to recover items that they may have deleted. Note I re-checked the service descriptions and they state the Recycle Bin will keep deleted items for 30 days and backups are stored for 14 days. Also note that with SharePoint Online, the Recycle Bin can also be used to recover objects such as sites and even site collections (through tenant administration).
The Operations team stated that disaster recovery for them is a hot standby where data centers are always paired with each other. They adhere to an Active-Passive farm set-up with automated failover using DNS. They do tons of monitoring, testing, and production failover tests. They stated specifically that each data center is production; there is no such thing as one primary data center taking all the traffic while the secondary just sits there waiting. They ensure that all data centers are performing primary workloads, and if there ever is a disaster, they would just re-distribute that workload across the data centers. As part of this they demand resiliency at both the hardware and software layers. They also indicated that for SharePoint Online they have never really run into the situation where a data center has gone down. More realistic scenarios they run into are connectivity issues or something to that effect, where they will do a DNS flip and keep operations going.

Always Up to Date – Again, one of the biggest reasons why customers want to move to SharePoint Online is to ensure they are always up to date with the latest SharePoint software, along with all the security and feature patching that is provided to ensure the best, most secure user experience is being delivered. The SharePoint Online Operations team discussed some of the change management and governance they implement to support this for their customers. They need to make sure that security patches, platform upgrades, escalation responses and the latest/greatest features are deployed.
Doing this across a large cloud environment requires a significant amount of automation, so they built internal tools that orchestrate these changes. There is a Change Manager application that manages all the physical and virtual machines. The manager knows the state of every machine, its patch level and how it is being utilized, and has deep logic to know how to apply patches based on scenarios. Plus the VMs (where SharePoint server roles run for a SharePoint farm) are not all located on the same physical servers. VMs are deployed across multiple physical machines, and “availability groups” are created so that when a patch is run, it is executed by availability group to ensure there are no performance issues during patching. The Change Manager handles lock management across VMs and SharePoint farms, and they stated they do patching roughly every two weeks worldwide, though it could be more frequent depending on the need.
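The availability-group idea above can be sketched in a few lines. This is a hypothetical toy, not the real Change Manager: the round-robin grouping and the role names are my own assumptions about how you would keep every role partially online while one group is patched.

```python
import itertools

def availability_groups(vms, group_count):
    """Spread a farm's VMs across availability groups round-robin, so
    no single group holds every instance of one role (an assumption
    about how the real Change Manager balances them)."""
    groups = [[] for _ in range(group_count)]
    for vm, group in zip(vms, itertools.cycle(groups)):
        group.append(vm)
    return groups

def patch_farm(vms, group_count, apply_patch):
    """Patch one availability group at a time; the other groups keep
    serving traffic, which is why patching causes no outage."""
    for group in availability_groups(vms, group_count):
        for vm in group:
            apply_patch(vm)   # in reality: drain, patch, reboot, rejoin

patched = []
farm = ["WFE-1", "WFE-2", "APP-1", "APP-2", "SQL-1", "SQL-2"]
patch_farm(farm, group_count=2, apply_patch=patched.append)
print(patched[:3])  # first group: ['WFE-1', 'APP-1', 'SQL-1']
```

Note that each group gets one WFE, one app server and one SQL server, so the farm never loses all instances of any role at once.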

The Operations team also noted that changes are not rolled out whenever they feel like it :) There is a phased roll-out process including a change approval board which analyzes every proposed change. They have an automated, multi-step process across numerous environments where they test these changes before ever going into production. The SharePoint Online Operations team even said “we eat our own dog food” by pushing all completely vetted patches into Microsoft Corporate’s own SharePoint Online production tenant before they go to customers. SharePoint Online is highly utilized by Microsoft employees.

Secure and Compliant – The final goal they discussed was security and compliance.
  • First there was a good discussion on how they are fully patched 100% of the time. They have a team of security specialists (they joked, hackers) whose job is to continually search and test for vulnerabilities.
  • Security by Design was something the team stressed. Role-based access is required at all times, regardless of the task or operation at hand. If there is an operation that must be done by a human, there are secure consoles provided based on your role. Plus, permissions are managed using an on-demand access model. They said almost no operations require admin-level access. They also stated that operations people do not need access to customer data to perform the tasks they need to complete; they work with system logs and such. If support needs to work with customer data, that would be done as part of a customer request. The goal is to be extremely respectful of customer data. They even discussed that for the US Government cloud, personnel must be US citizens.
  • They discussed something I talk a lot about: Office 365 support for compliance through audits such as ISO 27001, the EU Model Clauses, HIPAA, FISMA, etc. This is the only way to scale, and Microsoft has demonstrated they adhere to most of them.
  • One last thing they discussed is that they always assume there could be a breach. This is basically to ensure that they are always proactive: checking, monitoring and improving. To assist them with this they actually have a Big Data solution (which is compliant with all their standards and scrubs out PII) that consumes log data for proactive searching and security analysis. For instance, they said SharePoint Online today generates roughly 2TB of ULS logs per day (that is amazing). They scrub this data and then push it into the system, where they can check, for instance, SharePoint correlation IDs in less than a second going back three or more years.
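The scrub-then-index pipeline described in that last bullet can be sketched roughly like this. The PII patterns, the log line shape and the indexing approach are illustrative assumptions on my part; the real system is a proprietary Big Data solution operating at terabytes per day.

```python
import re

# Illustrative PII patterns -- a real scrubber would cover far more.
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<email>"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "<ssn>"),
]

def scrub(line):
    """Replace PII with placeholders before the line leaves the farm."""
    for pattern, placeholder in PII_PATTERNS:
        line = pattern.sub(placeholder, line)
    return line

def index_by_correlation(lines):
    """Scrub, then index lines by correlation ID so a lookup years
    later is a dictionary hit rather than a scan over raw logs."""
    index = {}
    for line in lines:
        correlation, _, message = scrub(line).partition(" ")
        index.setdefault(correlation, []).append(message)
    return index

logs = ["abc-123 Upload failed for jane.doe@contoso.com",
        "abc-123 Retry succeeded"]
idx = index_by_correlation(logs)
print(idx["abc-123"][0])  # Upload failed for <email>
```

The important order of operations, which the team stressed, is that scrubbing happens before the data ever enters the analysis system.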
All in all, this was a very impressive session to sit in, where the SharePoint Online Operations team shared with customers what they do to meet their SLAs.

1.4 Session on Building and Managing SharePoint Online
The second session I sat in on and took notes from was about how SharePoint Online is built and managed. In this session they discussed at length how SharePoint farms are provisioned.

Layers of Office 365 – They had a good discussion on how they logically break out the layers of Office 365.
  • Office 365 Portals – This is the sign-up experience and tenant administration services that allow customers to manage purchased services.
  • Office 365 Platform Services – This is made up of Commerce / Billing, Identity Platform, authentication, and DNS.
  • Office 365 Services – These are the services that you know and purchase today – SharePoint, Lync, Exchange and Office Web Apps.
The SharePoint Online team then discussed some of the components of Office 365. They noted that the SharePoint 2013 bits are the same bits that customers purchase and install on-premise. The Service Fabric is made up of all the components that are needed to run the service. For instance, this includes several things such as deployment / environments, authentication, tenant administration, upgrade, high availability and production support management.

Layers of SharePoint Online – They then broke out the layers of SharePoint Online as being three core layers:
  • Physical – this is all the data centers, machines and physical networks that are used to support SharePoint Online.
  • Virtual Machines – they then discussed how Hyper-V is central to their delivery strategy. They also discussed how they break out units of scale by “networks”. Now the term network does not really mean what you normally think; let’s come back to that a little later.
  • Services – they noted that every service that runs in SharePoint Online has a 1+ redundancy strategy. There are thousands of services that are running and everything must be integrated.
Topology – Next the SharePoint Online service team showed a topology of SharePoint in Office 365.
  • First they have a network. On that network they have a lot of common services that are available, for instance AD synchronization, provisioning services, SCOM, DNS, administration, back-up, etc.
  • Then within each network they create what they call a stamp. A stamp is a set of SharePoint farms that customers are brought into. First, within the stamp they have a SharePoint Federated Services farm. This was introduced in SharePoint 2010 as a way to create scaled-out services for such things as search, the managed metadata service, etc. The second farm in the stamp is the SharePoint farm itself, including all the WFEs, crawl WFEs, app servers, timer jobs, sandboxes, etc. They said this will usually be around 10 or more SharePoint servers. The third farm is a SQL Server farm. Finally there is a local Active Directory with accounts for the customers who have been provisioned to that stamp. Remember this could be a mixture of cloud-based IDs and federated IDs from on-premise. Once a stamp is built, there will be a second identical stamp set up on the network. They stated that each one of these stamps could support roughly 100,000 users.
  • Third, they discussed a component of Office 365 called the Grid Manager. This is the component of SharePoint Online that is responsible for basically running, coordinating and automating almost everything. Then there are other services such as the global directory, tenant administration, the commerce backend, DNS, authentication, incident management, Azure services and CDN services.
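My rough mental model of the topology described above, expressed as a simple data structure. The role names and the roughly-10-server and 100,000-user figures come from the session notes; the specific breakdown of roles and service names is my own approximation.

```python
# A rough model of the network/stamp layering described above.
# Role counts are an approximation of "10 or more SharePoint servers".
network = {
    "shared_services": ["AD sync", "provisioning", "SCOM", "DNS",
                        "administration", "backup"],
    "stamps": [
        {
            "federated_services_farm": ["search", "managed metadata"],
            "content_farm": ["WFE"] * 4 + ["crawl WFE"] * 2
                            + ["app server"] * 4,
            "sql_farm": ["SQL-primary", "SQL-DR-mirror"],
            "local_ad": "accounts for tenants provisioned to this stamp",
            "capacity_users": 100_000,
        },
    ],
}

stamp = network["stamps"][0]
print(len(stamp["content_farm"]))  # 10
print(stamp["capacity_users"])     # 100000
```

Laying it out this way makes the unit-of-scale idea obvious: to add capacity, the Grid Manager stamps out another copy of the inner dictionary rather than growing one giant farm.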
Grid Manager – They then proceeded to discuss the Grid Manager at more length. Basically the Grid Manager is a solution that is in constant communication with all the stamps and networks of SharePoint Online. It does this communication through APIs, web services and PowerShell scripts. It does a significant amount of remote orchestration through scripts to really support this goal of complete automation. The Grid Manager stores the state information for all managed objects in all of SharePoint Online, and it has hundreds of automated jobs to strategically manage all these objects.

Provisioning Process – The operations team then discussed at a high level how the Grid Manager provisions a new stamp. Many of these are operations SharePoint administrators do themselves, but here everything is completely automated. For instance they have steps such as bringing in the standard VMs, deploying the local AD and SQL farms, creating the federated services farm, then the content management farm, then post-deployment patching of VMs and SharePoint, etc.

Provisioning New Customers – The operations team then had another interesting discussion on how they provision customers based on the layered architecture described earlier. They also gave some interesting stats: they on-board roughly 30K new tenants a week, or roughly 4K new tenants a day. They then discussed some of the rules that determine which network and stamp a customer is provisioned to. The Grid Manager has tons of factors that it evaluates as part of that, such as geography, capacity of existing farms, operational activities currently occurring within a stamp, tenant version (is it primarily a SP 2010 or 2013 farm), and dependency of services (for instance a government customer will go into a government network and stamps). Once that is done there is a whole other set of provisioning services responsible for setting up the initial site collections for the customer, creating DNS entries, creating user groups, etc. They even discussed how they have become pretty smart about pre-provisioning tenants in advance so they can just adjust them as customers come into the service, to be even more efficient with delivery.
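A toy version of that placement decision might look like the sketch below. The factor names follow the list above (geography, capacity, in-progress operations, version, government isolation), but the eligibility rules and scoring weights are entirely hypothetical.

```python
def choose_stamp(tenant, stamps):
    """Pick a stamp for a new tenant using the kinds of factors the
    session described; the specific rules are invented for illustration."""
    def eligible(stamp):
        # Hard constraints first: government tenants only land on
        # government stamps, and geography/version must match.
        return (stamp["is_government"] == tenant["is_government"]
                and stamp["version"] == tenant["version"]
                and stamp["geography"] == tenant["geography"])

    def score(stamp):
        # Prefer stamps with spare capacity; penalize stamps that
        # currently have operational activity in progress.
        free = stamp["capacity"] - stamp["tenants"]
        return free - (1_000 if stamp["maintenance_in_progress"] else 0)

    candidates = [s for s in stamps if eligible(s)]
    return max(candidates, key=score) if candidates else None

stamps = [
    {"name": "NA-1", "geography": "NA", "version": "2013",
     "is_government": False, "capacity": 100_000, "tenants": 95_000,
     "maintenance_in_progress": False},
    {"name": "NA-2", "geography": "NA", "version": "2013",
     "is_government": False, "capacity": 100_000, "tenants": 40_000,
     "maintenance_in_progress": False},
]
tenant = {"geography": "NA", "version": "2013", "is_government": False}
print(choose_stamp(tenant, stamps)["name"])  # NA-2, the emptier stamp
```

At 30K tenants a week, a deterministic function like this running inside the Grid Manager is the only way placement could possibly keep up.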

Upgrades – The operations team had a very interesting discussion on this but my next blog will be focused on that with notes I captured from another session. Will post a link here once I have that done.

1.5 Conclusions
You can draw a ton of conclusions from this. The point everyone should take away is that building this on your own, even if it is nowhere near as automated as SharePoint Online, is a major task for many organizations to take on. Why? Organizations are in the business of providing goods and services. Even though organizations create IT groups in support of their mission, it is really hard to justify this level of automation and management for an organization that may have just a 12-server farm on-premise. The value of SharePoint Online is that your business can focus IT resources on building solutions versus running them.

1.6 Additional References
There is more information in the service descriptions. In this case, read the SharePoint Online, Security and Continuity, and Support service descriptions and you will see how all this information plays into supporting them.
http://www.microsoft.com/en-us/download/details.aspx?id=13602