Tuesday, December 29, 2015

Advanced eDiscovery with Equivio

Introduction
About a year ago, Microsoft announced the acquisition of a company called Equivio, with plans to incorporate it into Office 365.  Equivio, now Office 365 Advanced eDiscovery, is available as part of the new E5 suite.

Advanced eDiscovery with Equivio gives organizations the ability to perform eDiscovery analysis across significant volumes of unstructured content using machine learning, relevance/predictive coding and text analytics.  This enables organizations to run targeted searches through emails, documents, files and instant messages stored in Exchange Online, SharePoint Online, OneDrive for Business and Skype for Business Online.  Advanced eDiscovery achieves this by removing duplicate files, reconstructing email threads, and identifying key themes and data relationships.

Enterprise organizations face a real eDiscovery challenge today: they have significant volumes of data to go through.  With Office 365 (depending on your plan), a user could have unlimited email storage, 1 TB of personal OneDrive storage, plus the SharePoint Online collaboration data they use.  An organization can use the Office 365 eDiscovery tools to find data across Office 365, but the results can be very large.  That data can be exported, but lawyers then need to review it file by file, at significant cost to the organization.  Organizations are looking for solutions that can prune, pare down and identify the most relevant data, and Office 365 Advanced eDiscovery with Equivio can assist with this.
Let’s review the new Office 365 Advanced eDiscovery.
At a High Level
At a high level, there are several key features of the new Advanced eDiscovery (Equivio).
  • Near-Duplicate Detection – With large volumes of data, there will be documents that are identical or nearly identical.  Instead of having multiple people review different versions of the same document, it is more efficient to group identical/similar documents together and have a single person review them.
  • Email Threading – Allows reviewers to focus on the unique text of an email thread instead of all the duplicated text in that thread.
  • Themes – Helps reviewers by grouping (or clustering) related documents for added context.
  • Predictive Coding – This capability will allow the Advanced eDiscovery solution to automatically identify relevant data.  Advanced eDiscovery has a machine learning component that can be “trained” using a sample set of data.  Once “trained” the system can make relevancy decisions across all the data in the case.
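To make near-duplicate detection a little more concrete, here is a minimal Python sketch.  The post does not describe how Equivio actually detects near-duplicates, so this swaps in a common textbook technique: split each document into overlapping 3-word "shingles" and compare documents by Jaccard similarity (shared shingles divided by all shingles).  The function names, the shingle size of 3 and the 0.6 similarity threshold are all illustrative assumptions.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_near_duplicates(docs, threshold=0.6, k=3):
    """Greedily put each document into the first group whose pivot is similar enough.

    docs: dict of doc_id -> text. Returns a list of groups (lists of doc_ids).
    Threshold and k are assumed values, not Equivio settings.
    """
    groups = []  # list of (pivot_shingles, [doc_ids])
    for doc_id, text in docs.items():
        s = shingles(text, k)
        for pivot, members in groups:
            if jaccard(s, pivot) >= threshold:
                members.append(doc_id)
                break
        else:
            # No similar group found: this document starts a new group.
            groups.append((s, [doc_id]))
    return [members for _, members in groups]


# Hypothetical documents: "a" and "b" differ by one word, "c" is unrelated.
docs = {
    "a": "the quarterly budget review meeting is moved to friday afternoon",
    "b": "the quarterly budget review meeting is moved to friday morning",
    "c": "please send the signed contract to legal before the deadline",
}
print(group_near_duplicates(docs))
```

Here "a" and "b" share 7 of their 9 distinct shingles (similarity of about 0.78), so they land in one group for a single reviewer.  The greedy pairwise comparison is fine for a sketch; at the volumes discussed in this post, techniques such as MinHash are typically used to avoid comparing every pair.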
These features will become more apparent later in this post.
So what is the relationship between all the Office 365 eDiscovery solutions?
Advanced eDiscovery with Equivio works in concert with the eDiscovery capabilities already delivered in Office 365.  You will still use Office 365 eDiscovery to initially search content sources.  Once the searches have been completed, you will use the new Advanced eDiscovery (Equivio) to perform analysis on that data.  Here are some of the major solutions.
  • Exchange Online In-Place eDiscovery – This capability has been in Exchange Online for a while.  Administrators can perform eDiscovery across mailboxes, place items on legal hold, export data, etc.
  • Skype for Business Online eDiscovery – Enabled through Exchange Online capabilities.
  • SharePoint Online and OneDrive for Business Search / Hold / Export – SharePoint Online has native capabilities to discover, hold and export data.
  • eDiscovery Center in SharePoint Online – An eDiscovery portal that allows you to create and manage cases for eDiscovery, legal hold and data export across SharePoint Online, OneDrive for Business, Exchange Online and Skype for Business.
  • Compliance Search – A new capability that works with Exchange, SharePoint, OneDrive for Business and Skype In-Place eDiscovery.  Notably, Compliance Search has no limit on the number of target mailboxes that can be searched.  This is especially important if your organization has more than 10,000 mailboxes and you want to search across the entire organization.  Once you have identified the data in Compliance Search, you can manage legal hold, data export, etc. through the respective In-Place eDiscovery capabilities previously mentioned.
Knowing these solutions, where does Advanced eDiscovery with Equivio fit in?
Advanced eDiscovery with Equivio can be used after you complete a Compliance Search.  Depending on the scope of your search, a Compliance Search can easily yield a lot of data.  However, not all of that data may be relevant to the case.  Organizations need tools that help them identify only the relevant data.  When there are significant volumes of content, it is not realistic to read each of hundreds of thousands of files.  This is where Advanced eDiscovery with Equivio can help.
Simple Scenario
A common scenario would be that an organization wants to ensure that the amount of data being provided meets the minimum required by a court.  The organization wants to make sure it provides the most relevant data, and it has a significant amount of data to go through.
First we go to the Compliance Admin Center >> eDiscovery and then create a case.
 
Then in the eDiscovery case, we have the ability to search across Exchange Online, SharePoint Online, OneDrive for Business and Skype for Business Online.  Once you have your queries defined, you can then click the “Analyze with Equivio Analytics” link.
 
A screen will then come up asking you to confirm sending that query's data over to Equivio for preparation.
 
Once the preparation is complete, click the “Go to Equivio Analytics” link back on the Compliance Admin Center >> eDiscovery screen.  You then need to create a Case in Equivio and associate it with the search results you just sent over.  Here is an article covering the steps just described - https://technet.microsoft.com/EN-US/library/ms.o365.cc.customizeexportwithzoom.aspx. 
Once that is done, you can configure the near-duplicate, email thread and themes settings, as well as any text that should be ignored.  These configurations contain a lot of detail and can significantly impact the Equivio analysis; here are articles with more information - https://technet.microsoft.com/EN-US/library/mt297882.aspx and https://technet.microsoft.com/EN-US/library/mt297873.aspx. 
Once you have your configurations complete, press the Analyze button.
Upon completion of the analysis, you can see the following results.  The report shows what types of files you have and how many of each, along with how much of the data is unique; its emphasis is on identifying unique data.  Here is some more information about this report - https://technet.microsoft.com/EN-US/library/mt297878.aspx.
 
Once you have completed the initial assessment of the data, the next step is to determine its relevancy.  This is done by training the system (Equivio) to identify relevant data, using the machine learning / predictive coding features of Equivio.  As you can see below, the initial Assessment has been completed and training needs to be initiated.  Simply press the Training button.
Next you will go through a sample of the data selected by Equivio (commonly in 40-file batches).  Your analyst (someone knowledgeable about the case) marks each file as relevant or not-relevant.  Equivio uses these markings to determine the richness of the data and to compute statistics.  For more details on this process, read this subset of articles - https://technet.microsoft.com/en-us/library/mt303697.aspx.  In the screenshot below, the file was marked as relevant.
In the screenshot below, the file was marked as not-relevant.
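The richness idea can be illustrated with basic sampling arithmetic.  The post does not say which statistics Equivio computes, so this Python sketch assumes the tagged files are a simple random sample and uses the ordinary sample proportion with a normal-approximation (Wald) 95% interval.  The function name, the sample of 10 batches of 40 files, and the count of 21 relevant files are all hypothetical; only the collection size of 685,591 files is taken from the example later in this post.

```python
import math


def estimate_richness(sample_labels, collection_size, z=1.96):
    """Estimate richness (share of relevant files) from a random tagged sample.

    sample_labels: list of booleans, True = analyst marked the file relevant.
    Returns (point estimate, (low, high) normal-approximation interval,
    estimated number of relevant files in the whole collection).
    """
    n = len(sample_labels)
    if n == 0:
        raise ValueError("sample is empty")
    p = sum(sample_labels) / n
    margin = z * math.sqrt(p * (1 - p) / n)
    low, high = max(0.0, p - margin), min(1.0, p + margin)
    return p, (low, high), round(p * collection_size)


# Hypothetical training sample: 10 batches of 40 files, 21 tagged relevant.
sample = [True] * 21 + [False] * 379
p, (low, high), est = estimate_richness(sample, 685_591)
print(f"richness ~{p:.2%} ({low:.2%} to {high:.2%}), ~{est:,} relevant files")
```

The width of the interval is why more training batches help: the margin shrinks roughly with the square root of the sample size.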
Once enough data has been reviewed, you will see that training is complete and it is time to run Batch Calculation.  Press the Batch Calculation button.
Batch Calculation goes through all the remaining files for you and assigns each one a relevance score.
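As an illustration of "train on a tagged sample, then score everything else", here is a toy Python sketch.  Equivio's actual model is not described in this post; this swaps in a multinomial naive Bayes text classifier with add-one (Laplace) smoothing, a standard predictive-coding baseline.  Mapping its probability onto a 0 to 100 scale only mirrors the score range the report uses, and is an assumption: Equivio's scores are not necessarily probabilities.  All function names and the training documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labeled_docs):
    """labeled_docs: list of (text, is_relevant). Returns word and document counts."""
    counts = {True: Counter(), False: Counter()}
    doc_totals = Counter()
    for text, label in labeled_docs:
        counts[label].update(tokenize(text))
        doc_totals[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    return {"counts": counts, "docs": doc_totals, "vocab": vocab}


def relevance_score(model, text):
    """P(relevant | text) under multinomial naive Bayes, scaled to 0-100."""
    counts, docs, vocab = model["counts"], model["docs"], model["vocab"]
    total_docs = docs[True] + docs[False]
    log_post = {}
    for label in (True, False):
        denom = sum(counts[label].values()) + len(vocab)  # add-one smoothing
        lp = math.log((docs[label] + 1) / (total_docs + 2))  # smoothed prior
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                lp += math.log((counts[label][word] + 1) / denom)
        log_post[label] = lp
    # Normalize the two log scores into a probability without overflow.
    m = max(log_post.values())
    p_rel = math.exp(log_post[True] - m)
    p_not = math.exp(log_post[False] - m)
    return 100 * p_rel / (p_rel + p_not)


# Hypothetical analyst-tagged training files.
training = [
    ("merger pricing discussion with acme", True),
    ("acme merger draft terms", True),
    ("team lunch on friday", False),
    ("holiday party photos", False),
]
model = train(training)
# "Batch calculation": score files the analyst never looked at.
for doc in ["acme merger timeline", "friday lunch menu"]:
    print(f"{relevance_score(model, doc):6.2f}  {doc}")
```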
Once Batch Calculation is complete, review the findings in the Decide step.  This is where you decide how many more documents you want to review and how much that review will cost.  The objective is to find an acceptable Cutoff point.  Why?  Because pulling out all of the data for legal review is not cost effective, especially when working with significant volumes of unstructured data.  An organization needs to find the right balance between discovering as much relevant data as possible and reducing the number of files a lawyer has to review.  This saves real money when we are talking about significant volumes of data.
 
Let’s dig into this report a little more.
First, “Review” is the percentage of files in the collection that fall above the cut-off point and will therefore be reviewed.  Second, “Recall” is the percentage of all the relevant files in the collection that are contained in that review set.
In this specific example below, the total number of files in the collection being analyzed is 685,591 files.
Richness is the proportion of relevant files in the entire collection.  In this case, Richness was determined to be 5.20% of the files (roughly 35,000 relevant files).
The number of documents Reviewed in this specific set is roughly 137,000 files (or 20% of the total files in the collection).
At this specific Review-recall ratio, the Cutoff Score is 7.42.  What does this mean?  During the training and batch calculation steps, each file was given a relevance score between 0 and 100, with 100 being the most relevant. 
Here is the really important part of this analysis.  If lawyers were to review all files that have a Relevance score of higher than 7.42 (which is 20% of the total data) they would find 84.6% of the relevant files (roughly 30,000 relevant files).
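The Review and Recall definitions above can be written as a small Python function.  This sketch assumes every file's true relevance is known, which is only the case for a fully tagged test set; in practice Equivio estimates these figures statistically from samples.  The function name and the ten scored files are hypothetical.

```python
def review_and_recall(scored_files, cutoff):
    """scored_files: list of (relevance_score, is_relevant) pairs.

    Review = share of all files scoring above the cutoff (the review set).
    Recall = share of all relevant files that land in the review set.
    """
    review_set = [rel for score, rel in scored_files if score > cutoff]
    total_relevant = sum(rel for _, rel in scored_files)
    review = len(review_set) / len(scored_files)
    recall = sum(review_set) / total_relevant if total_relevant else 0.0
    return review, recall


# Hypothetical collection of 10 scored files, 4 of them truly relevant.
files = [(95, True), (80, True), (60, False), (40, True), (20, False),
         (10, False), (5, True), (2, False), (1, False), (0.5, False)]
review, recall = review_and_recall(files, cutoff=7.42)
print(f"Review {review:.0%}, Recall {recall:.0%}")
```

Applied to the report's figures, the same arithmetic gives 0.052 × 685,591 ≈ 35,651 relevant files in the collection, and 0.846 × 35,651 ≈ 30,161 of them above the cutoff, which matches the "roughly 30,000" above.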
The report also includes a simple cost analysis.  If it costs $1 to have a lawyer review each file, reviewing this set would cost roughly $135,700.  The $1 figure was chosen for simplicity and is a configurable parameter.
You may then wonder what the cost implications would be of reviewing more data.  Equivio supports this with the slider in the Review-recall ratio section.  In this specific example, if you drag the slider to review 28% of the data, Recall increases to 90.4% of the relevant files being produced for lawyer review.  This increases the cost to $189,900 to review those additional files, which could be considered a cost-effective decision depending on your organization’s risk profile.
You can increase the Review again to 56% of the files, which raises Recall to 97% of the relevant files.  However, the cost rises to $384,700, roughly double the cost of the 28% scenario.  At some point, you need to determine what Cutoff is acceptable for your organization.  In this third case, you are spending twice the amount of money to review documents that have a very low relevance score.
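One way to see the diminishing return behind the cutoff decision is to compute the extra cost per extra relevant file found as the slider moves.  This Python sketch uses the dollar figures from the report directly rather than recomputing them from the percentages, because the Review percentages shown are rounded (for example, 20% of 685,591 files would be about $137,100, not $135,700).  It also assumes the number of relevant files is simply Richness × total files; the function name is hypothetical.

```python
TOTAL_FILES = 685_591
RICHNESS = 0.052  # 5.20% from the report
RELEVANT = TOTAL_FILES * RICHNESS  # about 35,651 relevant files

# (review share, recall, reported cost at $1 per file) for each slider position.
scenarios = [(0.20, 0.846, 135_700), (0.28, 0.904, 189_900), (0.56, 0.970, 384_700)]


def marginal_cost_per_relevant_file(prev, curr):
    """Extra dollars spent per extra relevant file found when lowering the cutoff."""
    extra_cost = curr[2] - prev[2]
    extra_found = (curr[1] - prev[1]) * RELEVANT
    return extra_cost / extra_found


for prev, curr in zip(scenarios, scenarios[1:]):
    cost = marginal_cost_per_relevant_file(prev, curr)
    print(f"{prev[0]:.0%} -> {curr[0]:.0%}: ${cost:,.2f} per extra relevant file")
```

Moving from 20% to 28% costs about $26 per additional relevant file, while moving from 28% to 56% costs about $83, roughly three times as much for each extra relevant file found.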
 
If you want to do some additional testing based on the information in this report, there is a Test Relevance feature where you can test the rest of the data.  You can manually tag more data and perform more analysis - https://technet.microsoft.com/EN-US/library/mt303692.aspx.
The final step is to use the Export capability, which exports all the files used in the case in Equivio.  Remember that this export removes all of the near-duplicates and consolidates the email threads, so lawyers have fewer files to review and those files are grouped together.
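Email thread consolidation can be sketched in a few lines as well.  The post does not describe Equivio's threading method, so this Python sketch assumes each reply quotes the earlier messages verbatim: any message whose (whitespace- and case-normalized) text appears inside another message adds no unique text, and only the remaining "inclusive" messages need review.  The function names and the sample thread are hypothetical, and exact duplicates are deliberately left to the separate duplicate-removal step.

```python
def normalize(text):
    """Collapse whitespace and case so quoted copies compare equal."""
    return " ".join(text.lower().split())


def inclusive_messages(thread):
    """Return ids of messages whose text is not contained in any other message.

    thread: dict of message_id -> full message body (including quoted text).
    Identical bodies are not treated as containing each other, so exact
    duplicates are both kept here and left to duplicate removal.
    """
    bodies = {mid: normalize(body) for mid, body in thread.items()}
    keep = []
    for mid, body in bodies.items():
        contained = any(
            other_id != mid and body != other and body in other
            for other_id, other in bodies.items()
        )
        if not contained:
            keep.append(mid)
    return keep


# Hypothetical three-message thread in which each reply quotes the previous one.
thread = {
    "m1": "Can we move the review to Friday?",
    "m2": "Friday works for me.\n> Can we move the review to Friday?",
    "m3": "Confirmed, see you then.\n> Friday works for me.\n"
          "> Can we move the review to Friday?",
}
print(inclusive_messages(thread))
```

Reviewing only the last message covers the text of the whole thread, which is why consolidated threads reduce the number of files lawyers have to read.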
Can Advanced eDiscovery with Equivio be used with data outside of Office 365?
The answer is yes.
With Office 365 you have the ability to bring data from your on-premises solutions into Office 365.
 
Additionally, third-party archiving solutions are being introduced that can feed data from other platforms into Office 365.  Once that data is in Office 365, it can be managed just like any other file in Office 365, which means it is now discoverable.
 
Conclusions
This is a really interesting solution that has been introduced into Office 365.  It will go a long way toward helping enterprise customers manage the significant volumes of data they have to work with.
References
Office 365 Advanced eDiscovery in TechNet - https://technet.microsoft.com/en-us/library/mt303716.aspx - many articles that show how to use this capability.
 
 
