About a year ago, Microsoft announced the acquisition of a company called Equivio, with plans to incorporate its technology into Office 365. That technology, now called Office 365 Advanced eDiscovery, is available as part of the new E5 suite.
Advanced eDiscovery with Equivio gives organizations the ability to perform eDiscovery analysis across significant volumes of unstructured content using machine learning, relevance/predictive coding and text analytics. This enables organizations to tactically search through emails, documents, files and instant messages that are stored in Exchange Online, SharePoint Online, OneDrive for Business and Skype for Business Online. Advanced eDiscovery achieves this by removing duplicate files, reconstructing email threads, and identifying key themes and data relationships.
Enterprise organizations face a real eDiscovery challenge today: they have significant volumes of data to go through. With Office 365 (depending on your plan), a user could have unlimited email storage, 1 TB of personal OneDrive storage, plus the SharePoint Online collaboration data they use. An organization can use the Office 365 eDiscovery tools to find data across Office 365; however, that search could return a significant amount of data. That data can be exported, but it will need to be reviewed file by file by lawyers, and that review carries a significant cost to the organization. Organizations are looking for solutions that can prune, pare down and identify the most relevant data. Office 365 Advanced eDiscovery with Equivio can assist with this.
Let’s review the new Office 365 Advanced eDiscovery.
At a High Level
At a high level, there are several key features of the new Advanced eDiscovery (Equivio).
- Near-Duplicate Detection – With large volumes of data there will be documents that are the same or nearly the same. Instead of having multiple people review different versions of the same document, it is more efficient to group identical and similar documents together and have a single person review them.
- Email Threading – Allows reviewers to focus on the unique text of an email thread instead of re-reading all the duplicated text in that thread.
- Themes – Will help reviewers by grouping (or clustering) related documents for added context.
- Predictive Coding – This capability will allow the Advanced eDiscovery solution to automatically identify relevant data. Advanced eDiscovery has a machine learning component that can be “trained” using a sample set of data. Once “trained” the system can make relevancy decisions across all the data in the case.
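To make the near-duplicate idea a little more concrete, here is a minimal Python sketch. It is only an illustration and not how Equivio works internally: the word-shingle comparison, the function names and the 0.65 similarity threshold are all assumptions made for this example.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word runs) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_near_duplicates(docs, threshold=0.65):
    """Greedily place each document in the first group whose pivot it resembles.

    docs: dict of document id -> text.
    threshold: hypothetical similarity cutoff, chosen only for this sketch.
    Returns a list of groups (lists of ids); the first id in each is the pivot.
    """
    groups = []  # each entry: (pivot shingle set, [document ids])
    for doc_id, text in docs.items():
        doc_shingles = shingles(text)
        for pivot_shingles, members in groups:
            if jaccard(doc_shingles, pivot_shingles) >= threshold:
                members.append(doc_id)
                break
        else:
            groups.append((doc_shingles, [doc_id]))
    return [members for _, members in groups]


# Made-up sample documents, not data from a real case.
docs = {
    "report_v1": "The quarterly report shows revenue growth in all regions this year",
    "report_v2": "The quarterly report shows revenue growth in all regions this quarter",
    "lunch_menu": "Lunch menu for Friday includes pasta and salad",
}
print(group_near_duplicates(docs))
# prints [['report_v1', 'report_v2'], ['lunch_menu']]
```

With groups like these, a single reviewer can look at the pivot document and only the differences in the rest of the group.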
These features will become more apparent later in this blog post.
So what is the relationship between all Office 365 eDiscovery Solutions?
Advanced eDiscovery with Equivio works in concert with the eDiscovery capabilities already delivered in Office 365. You will still use Office 365 eDiscovery to initially search content sources. Once the searches have been completed, you will use the new Advanced eDiscovery (Equivio) to perform analysis on that data.
Here are some of the major solutions.
- Exchange Online In-Place eDiscovery – This capability has been in Exchange Online for a while. Administrators have the ability to perform eDiscovery across mailboxes, place items on legal hold, data export, etc.
- Skype for Business Online eDiscovery – Is enabled through Exchange Online capabilities.
- SharePoint Online and OneDrive for Business Search / Hold / Export – SharePoint Online has capabilities to discover, hold and export data natively in that service.
- eDiscovery Center in SharePoint Online – Is an eDiscovery portal solution that allows for you to create and manage cases for eDiscovery, legal hold and data export across SharePoint Online, OneDrive for Business, Exchange Online and Skype for Business.
- Compliance Search – Is a new capability that works with Exchange, SharePoint, OneDrive for Business and Skype In-Place eDiscovery. Specifically, Compliance Search has no limit on the number of target mailboxes that can be searched. This is especially important if your organization has more than 10,000 mailboxes and you want to search across the entire organization. Once you have identified the data in Compliance Search, you can manage legal hold, data export, etc. through the respective In-Place eDiscovery capabilities previously mentioned.
Knowing these solutions, where does Advanced eDiscovery with Equivio play in?
Advanced eDiscovery with Equivio can be used after you complete a Compliance Search. It is very common for a Compliance Search to yield a lot of data, depending on the scope of your search. However, not all of that data may be relevant to the case. Organizations need tools that will help them identify only the relevant data. When there are significant volumes of content, it is not realistic to expect that hundreds of thousands of files will each be read. This is where Advanced eDiscovery with Equivio can help.
Simple Scenario
A common scenario would be that an organization wants to ensure that the amount of data being provided meets the minimum required by a court. The organization wants to make sure that it provides the most relevant data, and it has a significant amount of data it needs to go through.
First we go to the Compliance Admin Center >> eDiscovery and then create a case.
Then in the eDiscovery case, we have the ability to search across Exchange Online, SharePoint Online, OneDrive for Business and Skype for Business Online. Once you have your queries defined, you can then click the “Analyze with Equivio Analytics” link.
A screen will come up for you to confirm sending the data for that query over to Equivio for preparation.
Once the preparation is completed, you need to click the “Go to Equivio Analytics” link back on the Compliance Admin Center >> eDiscovery screen. You then need to create a case in Equivio and associate the search results you just sent over. Here is an article covering the past few steps - https://technet.microsoft.com/EN-US/library/ms.o365.cc.customizeexportwithzoom.aspx.
Once that is done, you can configure near-duplicates, email threads, themes and the text that should be ignored. Here are articles with more information on these configurations, as there is a lot of detail that can impact the Equivio analysis - https://technet.microsoft.com/EN-US/library/mt297882.aspx and https://technet.microsoft.com/EN-US/library/mt297873.aspx.
Once you have your configurations complete, press the Analyze button.
Upon completion of the analysis, you can see the following results. This will tell you how many types of files you have as well as how much unique data there is. The report puts its emphasis on identifying unique data. Here is some more information about this report - https://technet.microsoft.com/EN-US/library/mt297878.aspx.
Once you have completed the initial assessment of the data, we need to determine the relevancy of the data. This is done by training the system (Equivio) to identify relevant data, which is enabled through the machine learning / predictive coding features of Equivio. As you can see below, the initial Assessment has been completed and training needs to be initiated. Simply press the Training button.
Next you will go through a sample of the data selected by Equivio (commonly in 40-file batches). Your analyst (someone knowledgeable about the case) will mark each file as relevant or not-relevant. This marking of files will be used by Equivio to determine the richness of the data and to calculate statistics. For more details on this process, read this subset of articles - https://technet.microsoft.com/en-us/library/mt303697.aspx. In the screenshot below, the file was marked as relevant.
In the screenshot below, the file was marked as not-relevant.
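To give a feel for how tagged samples can turn into a richness figure, here is a minimal Python sketch. It uses a textbook normal-approximation confidence interval, which is an assumption on my part and not necessarily the statistic Equivio calculates; the function name and the sample counts are hypothetical.

```python
import math


def estimate_richness(tags, z=1.96):
    """Estimate richness (share of relevant files) from a tagged random sample.

    tags: list of booleans, True meaning the analyst marked the file relevant.
    z: z-score for the interval (1.96 is roughly 95% confidence).
    Returns (estimate, lower bound, upper bound) as fractions between 0 and 1.
    """
    n = len(tags)
    if n == 0:
        raise ValueError("need at least one tagged file")
    p = sum(tags) / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical sample: 500 tagged files, 26 of them marked relevant.
sample = [True] * 26 + [False] * 474
richness, low, high = estimate_richness(sample)
print(f"richness ~ {richness:.1%} (between {low:.1%} and {high:.1%})")
```

The more files the analyst tags, the narrower that interval gets, which is one way to think about why several batches have to be reviewed before training is considered complete.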
Once enough data has been reviewed, you will see that training is complete and now it is time to complete Batch Calculation. Press the Batch Calculation button.
Batch calculation will go through all of the rest of the files for you and apply a relevancy score.
Once Batch Calculation is complete, you need to review the findings on the Decide step. This is the step where you decide how many documents you want to review and how much that review will cost. The objective is to find an acceptable Cutoff point. Why? Because pulling out all of the data for legal review is not cost effective, especially when we are working with significant volumes of unstructured data. An organization needs to find the right balance between discovering as much relevant data as possible and reducing the number of files a lawyer would have to review. This saves real money when we are talking about significant volumes of data.
Let’s dig into this report a little more.
First, “Review” is the percentage of all files in the collection that fall above the cutoff point and would therefore be reviewed. Second, “Recall” is the percentage of all relevant files in the collection that are captured in that review set.
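These two definitions can be written down as a small Python sketch. The function name and the ten toy files are made up for illustration. The sketch also assumes we already know which files are truly relevant, which is not the case in a real review, where the tool has to estimate Recall from the tagged samples.

```python
def review_and_recall(scored_files, cutoff):
    """Compute the Review and Recall fractions for a given cutoff score.

    scored_files: list of (relevance_score, is_relevant) pairs.
    Returns (review, recall): the share of all files scoring above the cutoff,
    and the share of all relevant files that land in that review set.
    """
    total = len(scored_files)
    total_relevant = sum(1 for _, relevant in scored_files if relevant)
    selected = [(score, relevant) for score, relevant in scored_files if score > cutoff]
    relevant_found = sum(1 for _, relevant in selected if relevant)
    review = len(selected) / total if total else 0.0
    recall = relevant_found / total_relevant if total_relevant else 0.0
    return review, recall


# Ten toy files: (score from 0 to 100, truly relevant?). Not real case data.
toy_files = [
    (95, True), (80, True), (60, False), (40, True), (9, False),
    (7, True), (5, False), (3, False), (2, False), (1, False),
]
print(review_and_recall(toy_files, cutoff=7.42))
# prints (0.5, 0.75)
```

In the toy data, reviewing the top half of the files finds three of the four relevant ones. Lowering the cutoff would pick up the fourth, at the price of reviewing more files.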
In this specific example below, the total number of files in the collection being analyzed is 685,591 files.
The Richness is the percentage of relevant files in the entire collection. In this case the Richness was determined to be 5.20% of the files (or roughly 35,000 relevant files).
The number of documents Reviewed in this specific set is roughly 137,000 files (or 20% of the total files in the collection).
At this specific Review-recall ratio, the Cutoff Score is 7.42. What does this mean? During the training and batch calculation steps, each file was given a relevance score between 0 and 100, with 100 being the most relevant.
Here is the really important part of this analysis. If lawyers were to review all files that have a Relevance score higher than 7.42 (which is 20% of the total data), they would find 84.6% of the relevant files (roughly 30,000 relevant files).
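The arithmetic behind those figures can be reproduced in a few lines of Python. The inputs are the numbers quoted above; the variable names are mine, and the last figure (the share of reviewed files that are relevant) is something I derived and is not shown in the report. The percentages in the report are rounded, so treat the results as approximate.

```python
# Figures quoted in this post for the example case.
total_files = 685_591
richness = 0.0520         # share of relevant files in the whole collection
review_fraction = 0.20    # share of files scoring above the 7.42 cutoff
recall = 0.846            # share of relevant files captured in the review set

relevant_files = total_files * richness
reviewed_files = total_files * review_fraction
relevant_found = relevant_files * recall

# Derived for illustration only: how dense the review set is in relevant files.
share_relevant_in_review = relevant_found / reviewed_files

print(f"Relevant files in collection: {relevant_files:,.0f}")
print(f"Files to review:              {reviewed_files:,.0f}")
print(f"Relevant files found:         {relevant_found:,.0f}")
print(f"Relevant share of review set: {share_relevant_in_review:.1%}")
```

In other words, about one in five files in the review set is relevant, compared to about one in twenty in the collection as a whole.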
In this example there is some simple cost analysis. If it costs $1 to have a lawyer review each file, the review would cost roughly $135,700. The $1 figure was chosen to keep things simple and is a configurable parameter.
You may then wonder what the cost implications would be of reviewing more data. Equivio supports this by allowing you to use the slider in the Review-recall ratio section. In this specific example, if you drag that slider to review 28% of the data, Recall will increase to 90.4% of the relevant files being produced for lawyer review. This does increase the cost to $189,900 to review those additional files. This could be considered a cost-effective decision depending on your organization’s risk profile.
You can again increase the Review of documents to 56% of the files, which raises Recall to 97% of the relevant files. However, as you will see, the cost is now $384,700, which is roughly double the cost of the previous option. At some point a determination needs to be made about where the acceptable Cutoff is for your organization. In this third case, you are spending twice the amount of money to review documents that have a very low relevance score.
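One way to compare the three slider positions is to work out what each additional relevant file costs as you move from one cutoff to the next. The Python sketch below uses the Review, Recall and cost figures quoted in this post; the function name is hypothetical, and the calculation assumes the 5.20% richness estimate is exact, which it will not be in practice.

```python
total_files = 685_591
richness = 0.0520
relevant_total = total_files * richness  # roughly 35,650 relevant files

# (Review %, Recall, cost in USD at $1 per file) as quoted in this post.
cutoff_points = [
    (20, 0.846, 135_700),
    (28, 0.904, 189_900),
    (56, 0.970, 384_700),
]


def marginal_costs(points, relevant_total):
    """Extra cost per extra relevant file found between consecutive cutoffs."""
    results = []
    for (_, recall_a, cost_a), (_, recall_b, cost_b) in zip(points, points[1:]):
        extra_relevant = (recall_b - recall_a) * relevant_total
        results.append((cost_b - cost_a) / extra_relevant)
    return results


for (review_a, _, _), (review_b, _, _), cost in zip(
    cutoff_points, cutoff_points[1:], marginal_costs(cutoff_points, relevant_total)
):
    print(f"Review {review_a}% -> {review_b}%: about ${cost:,.2f} per extra relevant file")
```

Under these assumptions, each extra relevant file costs roughly $26 when moving from 20% to 28% Review, and roughly $83 when moving from 28% to 56%. That growing marginal cost is the trade-off the Cutoff decision is really about.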
If you want to do some additional testing based on the information in this report, there is a Test Relevance feature where you can test the rest of the data. You can manually tag more data and perform more analysis - https://technet.microsoft.com/EN-US/library/mt303692.aspx.
The final step is to use the Export capability. This will allow you to export all the files used in the case in Equivio. Remember that this export removes all of the near-duplicates and consolidates the email threads. This means that lawyers have fewer files to review and those files are grouped together.
Can Advanced eDiscovery with Equivio be used with data outside of Office 365?
The answer is yes. With Office 365 you have the ability to bring data from your on-premises solutions into Office 365.
Additionally, third-party archiving solutions are being introduced that can feed data from other platforms into Office 365. Once that data is in Office 365, it can be managed just like any other file in Office 365. Thus the data is now discoverable.
Conclusions
This is a really interesting solution that has been introduced into Office 365. It will go a long way towards helping enterprise customers work with the significant volumes of data that they need to work with.
References
Announcement of Office 365 Advanced eDiscovery - https://blogs.office.com/2015/12/10/reduce-ediscovery-costs-and-challenges-with-office-365-advanced-ediscovery/
Exchange Online In-Place eDiscovery - https://technet.microsoft.com/en-us/library/dd298021(v=exchg.150).aspx
eDiscovery Center in SharePoint Online - https://support.office.com/en-us/article/Set-up-an-eDiscovery-Center-in-SharePoint-Online-a18f8975-aa7f-43b4-a7d6-001d14744d8e
Prepare search results for Advanced eDiscovery - https://technet.microsoft.com/EN-US/library/ms.o365.cc.customizeexportwithzoom.aspx
Office 365 Advanced eDiscovery in TechNet - https://technet.microsoft.com/en-us/library/mt303716.aspx
- tons of articles that show how to execute this capability…