At a High Level
At a high level, there are several key features of the new
Advanced eDiscovery (Equivio).
Near-Duplicate Detection – With large volumes of data, there will be documents that are the same or nearly the same. Instead of having multiple people review different versions of the same document, it is more efficient to group identical and similar documents together and have a single person review them.
Email Threading – Allows reviewers to focus on the unique text of an email thread rather than on all the duplicated text in the thread.
Themes – Helps reviewers by grouping (or clustering) related documents for added context.
Predictive Coding – Allows the Advanced eDiscovery solution to automatically identify relevant data. Advanced eDiscovery has a machine learning component that can be “trained” using a sample set of data. Once “trained”, the system can make relevancy decisions across all the data in the case.
These features will become more apparent later within this
blog.
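To make the near-duplicate idea concrete, here is a minimal sketch in Python. It is not how Equivio works internally (the source does not describe its algorithm). It assumes a simple, commonly used stand-in: documents are compared by the Jaccard similarity of their two-word shingles, and each document joins the first group whose pivot document it resembles. The function names, the sample documents and the 0.5 threshold are all hypothetical.

```python
import re


def shingles(text, k=2):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_near_duplicates(docs, threshold=0.5):
    """Greedily place each document in the first group whose pivot it resembles.

    The threshold is an illustrative assumption, not a product setting.
    """
    groups = []  # list of (pivot_shingles, [document indices])
    for i, text in enumerate(docs):
        s = shingles(text)
        for pivot, members in groups:
            if jaccard(s, pivot) >= threshold:
                members.append(i)
                break
        else:
            groups.append((s, [i]))
    return [members for _, members in groups]


# Hypothetical sample documents.
docs = [
    "Please review the attached contract before Friday and send comments.",
    "Please review the attached contract before Monday and send comments.",
    "Lunch menu for the cafeteria this week.",
]
print(group_near_duplicates(docs))
```

With these sample documents, the two versions of the contract email land in one group and the lunch menu in another, so a single reviewer could handle both contract emails together.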
So what is the relationship between all Office 365 eDiscovery Solutions?
Advanced eDiscovery with Equivio works in concert with the
eDiscovery capabilities already delivered in Office 365. You will still use Office 365 eDiscovery to
initially search content sources. Once
the searches have been completed, you will use the new Advanced eDiscovery
(Equivio) to perform analysis on that data.
Here are some of the major solutions.
Exchange Online In-Place eDiscovery – This capability has been in Exchange Online for a while. Administrators can perform eDiscovery across mailboxes, place items on legal hold, export data, etc.
Skype for Business Online eDiscovery – Is enabled through Exchange Online capabilities.
SharePoint Online and OneDrive for Business Search / Hold / Export – SharePoint Online has capabilities to discover, hold and export data natively in that service.
eDiscovery Center in SharePoint Online – Is an eDiscovery portal solution that lets you create and manage cases for eDiscovery, legal hold and data export across SharePoint Online, OneDrive for Business, Exchange Online and Skype for Business.
Compliance Search – Is a new capability that works with Exchange, SharePoint, OneDrive for Business and Skype In-Place eDiscovery. Specifically, Compliance Search has no limits on the number of target mailboxes that can be searched. This is especially important if you have an organization larger than 10,000 mailboxes and you want to perform a search across the entire organization. Once you have identified the data in Compliance Search, you can manage legal hold, data export, etc. through the respective In-Place eDiscovery capabilities previously mentioned.
Knowing these solutions, where does Advanced eDiscovery with Equivio fit in?
Advanced eDiscovery with Equivio can be used after you complete a Compliance Search. It is very common for a Compliance Search to yield a lot of data, depending on the scope of your search. However, not all of that data may be relevant to the case. Organizations need tools that will help them identify only the relevant data. When there are significant volumes of content, it is not realistic to expect that hundreds of thousands of files will each be read. This is where Advanced eDiscovery with Equivio can help.
Simple Scenario
A common scenario would be that an organization wants to ensure that the amount of data being provided meets the minimum required by a court. The organization wants to make sure that it provides the most relevant data, and it has a significant amount of data to go through.
First we go to the Compliance Admin Center >> eDiscovery and then create a case.
Then in the eDiscovery case, we have the ability to search across Exchange Online, SharePoint Online, OneDrive for Business and Skype for Business Online. Once you have your queries defined, you can then click the “Analyze with Equivio Analytics” link.
A screen will come up for you to confirm sending the data for that query over to Equivio for preparation.
Once you have your configurations complete,
press the Analyze button.

Upon completion of the analysis, you can see the following results. This will tell you how many types of files you have, as well as how much unique data there is. The report emphasizes identifying unique data. More information about this report is available at https://technet.microsoft.com/EN-US/library/mt297878.aspx.
Once you have completed the initial assessment of the data, we need to determine the relevancy of the data. This is done by training the system (Equivio) to identify relevant data, which is enabled through the machine learning / predictive coding features of Equivio. As you can see below, the initial Assessment has been completed and training needs to be initiated. Simply press the Training button.
Next you will go through a sample of the data selected by Equivio (commonly in 40 file batches). Your analyst (knowledgeable about the case) will mark data as being relevant or not-relevant. This marking of files will be used by Equivio to determine the richness of the data and to perform statistics. For more details on this process, read this subset of articles - https://technet.microsoft.com/en-us/library/mt303697.aspx. In the screenshot below, the file was marked as relevant.
In the screenshot below, the file was marked as not-relevant.
Once enough data has been reviewed, you will see that training is complete and now it is time to complete Batch Calculation. Press the Batch Calculation button.
Batch calculation will go through all of the rest of the files for you and apply a relevancy score.
Once Batch Calculation is complete, you need to review the findings in the Decide step. This is the step where you decide how many more documents you want to review and how much that review will cost. The objective is to find an acceptable Cutoff point. Why? Because pulling out all of the data for legal review is not cost effective, especially when we are working with significant volumes of unstructured data. An organization needs to find the right balance between discovering as much relevant data as possible and reducing the number of files a lawyer has to review. This saves real money when we are talking about significant volumes of data.

Let’s dig into this report a little more.
First, the “Review” is the percentage of files in the collection that would be reviewed at the chosen cut-off point. Second, the “Recall” is the percentage of all relevant files in the collection that fall within that review set.
In this specific example below, the total number of files in
the collection being analyzed is 685,591 files.
The Richness is the percentage of relevant files in the entire collection. In this case the Richness was determined to be 5.20% of the files (or roughly 35,000 relevant files).
The number of documents Reviewed in this specific set is roughly
137,000 files (or 20% of the total files in the collection).
At this specific Review-recall ratio, the Cutoff Score is
7.42. What does this mean? During the training and batch calculation
steps, each file was given a relevance score between 0 and 100; 100 being the
most relevant.
Here is the really important part of this analysis. If lawyers were to review all files that have a relevance score higher than 7.42 (which is 20% of the total data), they would find 84.6% of the relevant files (roughly 30,000 relevant files).
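The relationship between the Cutoff Score, Review and Recall can be sketched in a few lines of Python. This is an illustration only: the function name and the ten-file toy collection are hypothetical, and the sketch assumes the true relevance of every file is known. In a real case the relevance of unreviewed files is not known, and the tool estimates these figures statistically from the training sample.

```python
def review_and_recall(scored_files, cutoff):
    """Compute (review_fraction, recall) for files scoring above the cutoff.

    scored_files is a list of (relevance_score, is_relevant) pairs, where
    relevance_score is between 0 and 100.
    """
    review_set = [(score, rel) for score, rel in scored_files if score > cutoff]
    total_relevant = sum(1 for _, rel in scored_files if rel)
    found_relevant = sum(1 for _, rel in review_set if rel)
    review_fraction = len(review_set) / len(scored_files)
    recall = found_relevant / total_relevant if total_relevant else 0.0
    return review_fraction, recall


# Hypothetical toy collection: 10 files, 3 of them relevant.
files = [
    (95.0, True), (60.2, True), (12.5, False), (7.9, False), (6.0, True),
    (3.1, False), (2.4, False), (1.0, False), (0.5, False), (0.2, False),
]

review, recall = review_and_recall(files, cutoff=7.42)
print(f"Review: {review:.0%}, Recall: {recall:.1%}")
```

Lowering the cutoff adds more files to the review set, so Review goes up and Recall can only stay the same or go up. That is the trade-off the Decide step asks you to weigh.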
In this example there is some simple cost analysis. If it costs $1 to have a lawyer review each file, the review would cost roughly $135,700. The $1 figure was chosen for simplicity and is a configurable parameter.
You may then wonder what the cost implications would be of reviewing more data. Equivio supports this by allowing you to use the slider in the Review-recall ratio section. In this specific example, if you drag that slider to review 28% of the data, Recall will increase to 90.4% of the relevant files being produced for lawyer review. This does increase the cost to $189,900 to review those additional files. This could be considered a cost effective decision depending on your organization’s risk profile.

You can again increase the Review to 56% of the files, which raises Recall to 97% of the relevant files. However, as you will see, the cost is $384,700, which is roughly double the cost of the previous setting. At some point, a determination needs to be made about where an acceptable Cutoff lies for your organization. In this third case, you are spending twice the amount of money to review documents that have a very low relevance score.
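One way to compare the three cutoff points is to ask what each step up costs per additional relevant file found. The Python sketch below does that arithmetic using only the figures quoted in this walkthrough (685,591 files, 5.20% Richness, and the three Review / Recall / cost combinations). The function name is hypothetical, and because the quoted percentages are rounded, the results are approximations rather than values reported by the tool.

```python
TOTAL_FILES = 685_591
RICHNESS = 0.0520  # share of relevant files in the collection, as quoted

# (review share, recall, quoted review cost in USD) for the three settings
scenarios = [
    (0.20, 0.846, 135_700),
    (0.28, 0.904, 189_900),
    (0.56, 0.970, 384_700),
]


def marginal_costs(scenarios, total_files=TOTAL_FILES, richness=RICHNESS):
    """Cost per additional relevant file when moving to each next cutoff."""
    relevant_total = total_files * richness
    results = []
    for (_, recall_a, cost_a), (_, recall_b, cost_b) in zip(scenarios, scenarios[1:]):
        extra_relevant = (recall_b - recall_a) * relevant_total
        results.append((cost_b - cost_a) / extra_relevant)
    return results


for step, cost in enumerate(marginal_costs(scenarios), start=1):
    print(f"Step {step}: about ${cost:,.0f} per additional relevant file")
```

With these inputs, moving from 20% to 28% Review works out to roughly $26 per additional relevant file, while moving from 28% to 56% works out to roughly $83. That rising marginal cost is why an organization eventually has to settle on a Cutoff.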
The final step is to use the Export capability. This will allow you to export all the files used in the case in Equivio. Remember that this export removes all of the near-duplicates and consolidates the email threads. This means that with this tool, lawyers have fewer files to review and those files are grouped together.
Can Advanced eDiscovery with Equivio be used with data outside of Office 365?
The answer is yes.
With Office 365, you can bring data from your on-premises solutions into Office 365.
Additionally, third-party archiving solutions are being introduced that can feed data from other platforms into Office 365. Once that data is in Office 365, it can be managed just like any other file in Office 365. Thus the data is now discoverable.
Conclusions
This is a really interesting solution that has been
introduced into Office 365. It will go a
long way towards helping enterprise customers work with the significant volumes
of data that they need to work with.
References