Mar 21

Review & Merge: Segmentation

With DataTrim Dupe Alerts, matching and merging records is NOT limited to being a one-man job.
By intelligently segmenting your data into logical parts, you can match and review these parts separately.
By involving multiple data stewards and even end-users, you will improve the effectiveness of your cleaning process and increase the visibility of the cleaning initiative.

In this and 2 related blogs we will talk about:

  1. Segmentation: How to break your data into logical segments, and how to create views for more effective reviewing and matching.
  2. Permissions: Sharing Rules, Record Ownership, and how to set up review and merge processes that respect them.
  3. Collaboration: Engaging the End-users in the Review and Merging process, or just allowing them to provide feedback on the duplicates.

Segmentation

If your database is large (hundreds of thousands or millions of records), if you cover multiple countries, regions or business units, or if the data, its quality and the way it is used otherwise vary from segment to segment, then applying the same matching and merging rules across the entire database will NEVER work.

Certain parts of your company may have had different rules for data entry, or no rules at all. Certain source systems may have enforced mandatory fields to be populated, leaving you with a lot of nonsense in those fields, or you may be sitting on a pile of contacts called: Accounts Payable etc.

Splitting the database into logical segments, and managing these with respect for the local particularities, is critical for any data quality initiative, and for the deduplication initiative in particular.

But segmenting the data in the matching process may not be the only key to an effective cleaning process.
Will your team in the US be able to decide whether 2 German companies are duplicates?

Should Marketing be cleaning leads only, or should they also clean up accounts and contacts? Or should this be done by Sales? Because you weren't thinking of having someone in IT do it all, were you?

Engaging multiple users in your org improves overall end-user engagement and their ownership of the problem, and in the end it increases the quality of your data.

Using Filters to segment your data

In DataTrim Dupe Alerts, we separate the deduplication process into 2 steps: a) Matching records and b) Reviewing and Merging.

Matching records is simple, because we have already built in best-practice matching rules and algorithms from more than 12 years of doing this on a consulting basis for companies and organizations all over the world. Segmenting, on the other hand, can seem rather complex, but it doesn't have to be.

Once you have your data segment defined and matched, you will of course need to orchestrate your reviewing process.

Not all potential duplicates should be reviewed by the same person, so you need to be able to provide each of these users (even an end-user) with simple-to-use views, where they have the necessary options to take appropriate action on the duplicates found during the matching process.

How To …

In DataTrim Dupe Alerts we operate with 2 sets of records in the matching process: a Secondary Data Set and a Master Data Set. During the matching, these 2 sets are compared to each other in order to identify the potential duplicates, e.g. when you match Leads against Contacts.

This also allows you to define 2 sets of records within an Object for your matching, e.g. matching your newly imported leads against the active leads already in the database.

And of course it allows you to perform the simple process of matching one set of records against itself, with a single click (Master = Secondary).

In DataTrim Dupe Alerts we support the entire querying language of the underlying salesforce database. This gives great flexibility, but it also imposes a strict syntax which needs to be followed.

Simple filters and more complex filters can be created to identify subsets of records:

Owner.Name = 'Frank Scaramanga'   (all records owned by a particular user)
RecordType.Name = 'Prospect'   (all records of a particular record type)
Country = 'USA'   (by geography)
Country = 'USA' OR Country = 'Canada'   (by geography, multiple values)
Name > 'A' AND Name < 'F'   (by alphabetic grouping)
Type IN ('Prospect', 'Lead', 'Active')   (by type, multiple values)
LastModifiedDate = LAST_N_DAYS:1   (by time period)
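These conditions can be combined into a single segment filter. As an illustrative sketch (the record type name and country values here are assumptions, not defaults):

RecordType.Name = 'Prospect' AND (Country = 'Germany' OR Country = 'Deutschland') AND LastModifiedDate = LAST_N_DAYS:30

This would scope the matching to recently modified German prospect records, including rows where the country was entered in the local language.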

The built-in fields Include New and Changed data since and Campaign can be used in addition to the main filter field to select a specific set of records:

[Screenshot: All records with Country = USA, linked to the campaign Dreamforce, and updated or added since 01/01/2016]
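For illustration, a roughly equivalent raw filter (with a hypothetical campaign Id, anticipating the campaign filter technique described under Matching Campaigns or Imports below) would be:

Country = 'USA' AND LastModifiedDate >= 2016-01-01T00:00:00Z AND Id IN (SELECT ContactId FROM CampaignMember WHERE CampaignId = '701300000112345')

The built-in fields save you from writing and maintaining this by hand.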

Reviewing and Merging

Once an Alert has been processed, it will generate an Alert Result record, which is like a log record with static data, providing you with details on what happened at the time of the matching.

The potential duplicates are all stored in the Matched Record object, and it is from the DataTrim Matched Records tab that anyone reviewing and merging the duplicates should initiate the process.

[Screenshot: Matched Records tab]

The package will install a number of default views. These are convenient for getting started, but as anywhere else in salesforce, List Views are the basic way of selecting the set of records you will work on, and that naturally applies here too.

[Screenshots: editing a List View filter in Lightning and in Classic]

Make sure that your views include the 2 filters for Resolved and False Dupes, as you do not want to waste time browsing through potential duplicates which have already been reviewed in the past.

[Screenshots: List View filters in Lightning and in Classic, with the additional filters applied]

Jan 26

How to understand which solution is the best for us?

Fuzzy Logic – Probabilistic Matching – Blocking and Scoring?

We are often asked about our technology:

  • Do you use Fuzzy Logic?
  • Are you using Probabilistic Matching?
  • What’s your process for Blocking and Scoring?
  • etc.

We normally take pride in protecting our end-users from having to understand any of this; for those who would like to take a deeper look, Record linkage on Wikipedia may be a starting point.

But to be brief, the answer is: YES, we do use fuzzy logic; YES, we do probabilistic matching; and YES, we do have algorithms in place for Blocking and Scoring. But so does everyone else, because there is no single clear definition of what these terms cover when it comes to implementing them.

No, look for Completeness, Accuracy and Speed

[Image: Completeness, Accuracy and Speed]
The nature of any deduplication problem lies in optimizing 3 needs, where any 2 of them combined work against the 3rd.

  1. Completeness
  2. Accuracy
  3. Speed

If a solution were capable of finding all duplicates, without producing 'false' dupes or missing real ones, in no time at all, then you would have the optimal solution.

Unfortunately, the 3 elements aren't easily combined.

  1. You can imagine a fast solution which finds exact duplicates, but being fast means that you may not get all duplicates (completeness).
  2. To get all duplicates with a high level of accuracy, even a computer will need a lot of time.
  3. Identification of all duplicates in limited time (speed) can only be achieved by compromising on accuracy and accepting that some of the identified duplicates are 'false' duplicates.

In practice this can be addressed in various ways:
Simplify the dupe definition: If dupes are easily identified, e.g. they share the same email, then a solution can be fast and accurate, although it is of course not complete, as many duplicates will escape this simplicity (see the query sketch below). To improve this approach you can add criteria (Match Keys), e.g. Name and Mobile, Name and Street, etc., but the more of these match keys you add, the slower your solution becomes.
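For illustration, this simplest class of dupes, records sharing an exact email, can be found with a plain SOQL aggregate like the one below; it is fast and accurate, but blind to every other kind of duplicate:

SELECT Email, COUNT(Id) FROM Lead WHERE Email != null GROUP BY Email HAVING COUNT(Id) > 1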

This approach is very common and will take care of the most obvious duplicates in any database. The factors which make this approach fail are lack of data normalization and standardization, as well as missing data, all of which have a negative impact on the result.

By multiplying the number of Match Keys you can achieve completeness with regard to identifying all potential duplicates in your database. But with many potential duplicates, the Blocking and Scoring algorithm becomes the critical factor in eliminating and excluding candidates which, after closer review, turn out not to be duplicates after all. E.g. you can have 2 records with the same email (info@datatrim.com) even though the 2 contacts are not the same.
If, in addition, your time (computer processor time) is limited, your Scoring and Blocking algorithms will introduce a level of inaccuracy in your result, and you will therefore encounter many 'false' duplicates during your review process.

DataTrim Dupe Alerts combines the best of the 2 scenarios above, and because DataTrim Dupe Alerts runs on dedicated servers, the computing power can be spent on achieving completeness and accuracy in the deduplication process.

The User Experience is what matters!

In the end, it’s the user experience that matters, so test the solution. 
With DataTrim Dupe Alerts you do not have to worry about which match keys work best for you, or gradually improve your deduplication effort as you build your rules; it is all built in from day one.

DataTrim Dupe Alerts comes with an initial set of predefined Match Keys. Based on years of experience matching large international B2B and B2C databases, we have identified and included a matching process based on best practice.
All you have to do is install it and run!

To provide completeness in large databases, records which look like potential duplicates from an algorithmic point of view, but may not be duplicates from a business point of view, become a natural part of the matching result.
To provide a structured overview of the result, DataTrim Dupe Alerts has a built-in Classification process which classifies each duplicate for you.
It provides a confidence indicator (the Match Class) which allows you to prioritize your review process and enables you to review and merge your duplicates in an efficient way.

 

Curious to see how it works? Take a look at our 4 min demo: View Demo or install a Trial from the AppExchange: Install Trial

Nov 04

Matching Campaigns or Imports

You might have realized that matching ALL records EVERY time isn't always the optimal way to perform your deduplication. You constantly run into those odd dupes which take up all your time and prevent you from focusing on the new records you just added, or on those you want to use in your next campaign.

So why not limit the matching of records to only those records you want to use?

In this blog we will talk about Campaigns and Imports. Click here for more information about segmentation and filtering, or check out the user guide.

Matching by Campaign

Say that you have created a campaign with 10,000 contacts in it, and now you want to make sure that there are no duplicates within the campaign, as duplicates will clearly impact your response rate, and also degrade your prospects' view of you as a well-organized company.

Creating a filter which identifies your 10,000 records can be tricky, but it would look something like this:
Id IN (SELECT ContactId FROM CampaignMember WHERE CampaignId = '701300000112345')
where 701300000112345 is the Id of your campaign (campaign Ids start with the prefix 701).
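If you are matching Leads rather than Contacts, the equivalent filter references LeadId instead (same hypothetical campaign Id):
Id IN (SELECT LeadId FROM CampaignMember WHERE CampaignId = '701300000112345')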

Not so straightforward, but with DataTrim Dupe Alerts we have already made preparations to make this much easier.

Simply add a field on the DataTrim Dupe Alerts object, called Campaign, make it a lookup to the Campaign object, place it on the page layout of the Dupe Alerts page, and you are ready.

[Screenshot: Campaign field setup]

On the Dupe Alerts page you will now be able to select the Campaign simply by providing the name of the campaign, or using the built-in function to look it up.

[Screenshot: Campaign field selection]

Notice that the Campaign filter value is used in combination with the other filter values for the Secondary Data Set, meaning that by default only the Secondary Data Set is limited to the records in your Campaign. So, in order to match the Campaign records against themselves, you must also tick the checkbox: Master = Secondary.
On the other hand, if this was an incoming campaign, e.g. records which you imported from a show, exhibition or other source, you may actually want to match the 10,000 records against ALL of the records in your database. In this case you leave the checkbox Master = Secondary unticked. Simple, right?

Matching by Import

Many companies use the process above to manage their imports.
Each set of imported records is linked to a Campaign during the import (even if it is just a placeholder).
This allows them to keep track of where each individual record came from, and ensures a consistent process which can be repeated for dupe checking each import.

If you don’t use the Campaign module for your imports, we still recommend you to make sure that you track the origin of all your new data.

Not only does this give you traceability back to the original source, which will later help you evaluate whether a source is good, bad or in between; it also allows you to effectively match the new data against the existing data.

Simply create a field called Data Source (API name: Data_Source__c) on Account, Contact and Lead, and make sure that each time you import, this field is populated with a unique value, e.g. MIPIM_04_2017 for all the contacts imported from the MIPIM April 2017 event in Cannes, France.

When you then want to perform a matching of the newly imported records against the records already in the system, all you have to do is create a simple filter for the Secondary Data Set to scope the matching: Data_Source__c = 'MIPIM_04_2017'
[Screenshot: Data Source import filter]
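The same pattern extends to matching several imports at once, e.g. (the second source value below is hypothetical):
Data_Source__c IN ('MIPIM_04_2017', 'MIPIM_04_2016')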
Don’t hesitate to reach out to our support team if you run into any questions.

Oct 19

Merge Rules/Options – Keep Values from Dupe

By default, the salesforce merge process takes 2 records: a Master ('Survivor') and a Dupe. The selected Master takes priority over the Dupe, which means that the following rules are applied:

  1. When a field is populated on both records, the value from the Master is kept and the value from the Dupe is lost (unless you use the option to store it with the Master).
  2. If a field on the Master is empty and the corresponding field on the Dupe is populated, then the value from the Dupe is carried across to the Master, and the Master is in this way enriched with the additional information from the Dupe.
  3. Related data, such as campaign info, activities, tasks etc., is moved across from the Dupe to the Master, so that the entire history of the consolidated records is available on the Master record.
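A small worked example of these default rules (hypothetical values):

Field    Master       Dupe       Result on Master
Phone    555-0100     555-0199   555-0100 (rule 1: Master wins)
Title    (empty)      VP Sales   VP Sales (rule 2: filled from Dupe)
Tasks    2 tasks      3 tasks    5 tasks (rule 3: moved to the Master)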

Keep Values from Dupe
Often the Master is selected based on simple principles, e.g. a new Lead is considered the Master over an older Lead, as the new information is considered more up to date.
But in certain situations, e.g. when you want to track the origin of your leads, you want to proceed with the new Lead (merged with the older record) but keep the original Lead Source. Using the new "Keep Values from Dupe" rule you can easily do this by setting the rule to keep the Dupe value for particular fields.
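For example (hypothetical values): with a Keep Values from Dupe rule on LeadSource, a merge where the Master has LeadSource = 'Web' and the Dupe has LeadSource = 'Trade Show' leaves LeadSource = 'Trade Show' on the surviving record, while all other fields follow the default rules above.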

[Screenshot: Merge Rules setup]
This functionality applies to the DataTrim Quick Merge wizard and the Mass Merge functions of DataTrim Dupe Alerts.

This functionality is available to all users of DataTrim Dupe Alerts.

See also:
Merge Rules/Options, Keep Max value e.g. Lead Score from Marketo
Merge Rules/Options – Exclude fields from Merge

Oct 16

Merge Rules/Options, Keep Max value e.g. Lead Score from Marketo

If you are using Marketo, you may be leveraging their Lead Score feature to rank your leads.

With the Keep Max Value merge option, you can now make sure that no matter how 2 leads are merged, you will always keep the highest Lead Score.

By default, the salesforce merge process takes 2 records: a Master ('Survivor') and a Dupe. The selected Master takes priority over the Dupe, which means that the following rules are applied:

  1. When a field is populated on both records, the value from the Master is kept and the value from the Dupe is lost (unless you use the option to store it with the Master).
  2. If a field on the Master is empty and the corresponding field on the Dupe is populated, then the value from the Dupe is carried across to the Master, and the Master is in this way enriched with the additional information from the Dupe.
  3. Related data, such as campaign info, activities, tasks etc., is moved across from the Dupe to the Master, so that the entire history of the consolidated records is available on the Master record.

Keep Max or Min Values
You may be ranking your leads using apps like Marketo to prioritize the efforts of your lead management team. Marketo and others use a Lead Score to distinguish hot leads from less interesting or less qualified leads.

Assume that you get a duplicate lead into your database which hasn't yet been qualified (user entry, lead import etc.), and you want the details from this new lead to survive as the Master. By choosing a Master, you then also choose its Lead Score.

By applying a Merge Rule to e.g. keep the maximum value of a given field, you ensure that during the merge you keep the Master record as per your selection, but always keep the maximum value of the 2 records for e.g. Lead Score.
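A quick illustration (hypothetical values): if the new Master has a Lead Score of 10 and the older Dupe has a Lead Score of 85, a Keep Max Value rule on Lead Score leaves the merged record with a score of 85, while all other fields follow the normal Master-first rules.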
[Screenshot: Merge Rules setup]
This functionality applies to the DataTrim Quick Merge wizard and the Mass Merge functions of DataTrim Dupe Alerts.

This functionality is available to all users of DataTrim Dupe Alerts.

See also:
Merge Rules/Options – Exclude fields from Merge
Merge Rules/Options – Keep Values from Dupe

Oct 13

Merge Rules/Options – Exclude fields from Merge

By default, the salesforce merge process takes 2 records: a Master ('Survivor') and a Dupe. The selected Master takes priority over the Dupe, which means that the following rules are applied:

  1. When a field is populated on both records, the value from the Master is kept and the value from the Dupe is lost (unless you use the option to store it with the Master).
  2. If a field on the Master is empty and the corresponding field on the Dupe is populated, then the value from the Dupe is carried across to the Master, and the Master is in this way enriched with the additional information from the Dupe.
  3. Related data, such as campaign info, activities, tasks etc., is moved across from the Dupe to the Master, so that the entire history of the consolidated records is available on the Master record.

Exclude fields from Merge
When merging records, all updatable fields are processed as described above. This includes fields which may be used in workflow rules and triggers.
When you perform a merge, the Master record is updated with information from the Dupe, which may generate a new event that was already taken care of on the Dupe.
By adding these fields to the list of fields to exclude, you prevent this double triggering and speed up the merge process. A (hypothetical) example would be a Status_Changed__c field that fires a notification workflow each time it is updated.

[Screenshot: Merge Rules setup]
This functionality applies to the DataTrim Quick Merge wizard and the Mass Merge functions of DataTrim Dupe Alerts.

This functionality is available to all users of DataTrim Dupe Alerts.

See also:
Merge Rules/Options, Keep Max value e.g. Lead Score from Marketo
Merge Rules/Options – Keep Values from Dupe

Aug 22

Optimizing the Matching Process

In a constant effort to improve the matching algorithms, we continuously introduce new Advanced Parameters which let you optimize the matching process.

Our comparison routines are highly complex algorithms, capable of analysing the content of the fields to find the most significant words, and thus decide which words to weight heaviest during the scoring process.

Here are a few examples of how our algorithms improve the matching by 'understanding' and processing the field values.

Example A: If you have 2 account names, 'ABC Incorporated' and 'XYZ Incorporated', our algorithm will recognize the word 'Incorporated' and flag it as a company legal type, which is more descriptive in nature than part of the actual company name. The comparison and the score will therefore mainly be derived from comparing 'ABC' against 'XYZ', i.e. the score is qualitatively more accurate than if the whole field values were compared.

Example B: Account A: ‘International Business Machines’ and Account B: ‘IBM’

Based on a list of normalization formulas, our solution will recognize 'International Business Machines' (A) and NORMALIZE it into 'IBM' during the comparison, so that the final comparison is between 'IBM' (A) and 'IBM' (B).

Example C: Account A: ‘St Mary’s Hospital’, Account B: ‘SaintMarysHospital’

In this example our solution will again NORMALIZE the word 'Saint' into 'St', and identify the word 'Hospital' as a descriptive word; thus the main comparison will be between 'St Mary's' (A) and 'St Marys' (B).

Example D: Email A: 'Unknown@Unknown.com', Email B: 'info@datatrim.com'
In this example our solution will reference the internal list and identify the email of A as a dummy email (used as a placeholder). The advantage of this is that the comparison is now between a value (B) and a blank field.

The score in this case is 0, and the pair can therefore easily be identified in the review process by filtering on the Email score field. A normal comparison of the 2 original emails would likely give a score somewhere between 1 and 75, and could in many cases be perceived as a comparison between 2 valid emails.

In the 4 examples above, our matching algorithm is supported by an internal reference database containing thousands of words to RELAX about, to NORMALIZE or to IGNORE.

Although our list is long and we are constantly adding to it on your behalf, your database may contain words which, in your data, become significant for the deduplication process.
In a database with account names from all over the world, the word 'Florida' may not be frequent in account names; but if you are a local company in Florida and you happen to work with the public sector, you are likely to have the word 'Florida' in many of your account names.

To address this and related specific needs, we have Advanced Parameters which allow you to add custom words to the list, so that our algorithm will use the standard set plus anything you want to add.

The 3 parameters are defined as follows:
Relax:=<FieldReference>:<Word>
Normalize:=<FieldReference>:<Word>=<NormalizedWord>
Exclude:=<FieldReference>:<Word>

Where <Word> is any word which may occur in the field (not case sensitive).
Where <NormalizedWord> is the normalized word to be used for the comparison.
Where <FieldReference> is one of the following values:
ORGNAME
EMAIL
ALL

Examples (Relax):
Relax:=ORGNAME:Builders
Relax:=ORGNAME:Telco
Relax:=ORGNAME:Sparkasse
Relax:=ORGNAME:Florida
Relax:=ORGNAME:Tampa
Relax:=ORGNAME:Miami

Examples (Normalize):
Normalize:=ORGNAME:sforce=salesforce
Normalize:=ORGNAME:BMS=Bristol-Myers Squibb
Normalize:=ORGNAME:Minnesota Mining and Manufacturing Company=3M
Normalize:=NAME:Dave=David

Examples (Exclude):
Exclude:=EMAIL:.@.com
Exclude:=EMAIL:none@none.com
Exclude:=EMAIL:noreply@none.com
Exclude:=ALL:NOCITY
Exclude:=ALL:NOZIP
Exclude:=ALL:LastName
Exclude:=ALL:FirstName

Classic view of the field on the DataTrim Dupe Alert:
[Screenshot: Classic view]
Lightning view:
[Screenshot: Lightning view]
Note that you can of course have multiple entries for each of these Advanced Parameters (unlike other advanced parameters, which are single-line parameters).
TIP: You might eventually hit the size limit of the Advanced Parameters text field in salesforce. You can increase the size, but you can also replace the keywords 'Relax' with 'R', 'Normalize' with 'N' and 'Exclude' with 'E' to save space, as in the sketch below.
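For example, some of the earlier entries in abbreviated form:
R:=ORGNAME:Builders
N:=ORGNAME:sforce=salesforce
E:=EMAIL:none@none.com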

This functionality is available to all users of DataTrim Dupe Alerts.

Jul 17

Matching – with Social CRM dupe check

DataTrim Dupe Alerts also includes Website, Facebook, LinkedIn, Twitter and Google+ references to better detect duplicates, all by default.
[Screenshot: Social CRM matching setup]

This functionality is already available; all you have to do is go to the setup and provide field references to where you store your Social CRM data.

This feature is of course also part of the Dupe Detection upon Entry functionality, which is part of the Dupe Alerts application.


So when you enter new leads and contacts from e.g. LinkedIn, DataTrim Dupe Alerts will IMMEDIATELY check whether a record with the same LinkedIn id already exists in your database and alert you to the duplicate.

This functionality complements the existing matching using fuzzy logic, and gives you the possibility to detect duplicates at a very early stage, before you start working with the new record.

This functionality is available to all users of DataTrim Dupe Alerts.

Jul 13

Matching – Cross checking phone numbers

In salesforce phone numbers are stored in multiple fields, and your contacts don’t always differentiate between an office phone and a mobile.

We have often seen that phone and mobile numbers are swapped around, or that the phone number on one record is the mobile while on the other it is the office number.

DataTrim Dupe Alerts uses phone numbers by default to identify and match records, and has incorporated this challenge:

A cross-check between the phone number fields, improving the likelihood of finding duplicates and refining the classification with a more accurate phone score.
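A small illustration (hypothetical numbers): Record A has Phone = 555-0100 and Mobile = 555-0177, while Record B has Phone = 555-0177 and Mobile = 555-0100. A straight field-by-field comparison finds no overlap, whereas the cross-check recognizes that the same 2 numbers are present, just stored in opposite fields.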

[Screenshot: Phone cross-checking setup]

A best-practice feature, included in the default matching rules.

This functionality is available to all users of DataTrim Dupe Alerts.

Apr 19

Dupe Alerts, Spring Release 2016

Spring 2016

New Features and Functions:
- Console updated to Lightning Design, with more interactivity and direct click-through to Matched Records
- Dupe Alerts Results, incl. analytics and direct click-through to Matched Records
- Quick Merge improved to also include an end-user oriented review process
- Improved initialization and first-time setup process

Console: Interactive and click through to Matched Records

The Dupe Alerts Console has been improved with Lightning Design and interactivity, allowing you to jump directly to the Matched Records from an Alert or an Alert Result.

[Screenshot: Dupe Alerts Console]

Dupe Alerts Results, updated View Page

The Dupe Alerts Result has now finally stepped out of the shadows of being 'boring' log data. The result of a matching process is now presented in an intuitive way, with key information and direct, one-click access to reviewing the potential duplicates.


[Screenshot: Dupe Alert Results detail]

Review Wizard and Quick Merge Redesigned

The Review Wizard, which allows you to effectively go through a list of potential duplicates and take action while reviewing all the details carefully, has been redesigned to meet the Lightning Design standards, but also to simplify the process of reviewing, merging, linking or classifying as False Dupes.
The Quick Merge and Convert Wizard has been improved for ease of use, and as displayed below, we now incorporate a user feedback process, which enables even more collaborative workflows.


[Screenshot: Review Wizard for Accounts]

Support for End-User Review process

Involve your end-users in reviewing and validating dupes, without necessarily giving them the ability to merge (update and delete). With the Review Wizard in Review mode, end-users can now review their potential dupes, give feedback, and participate in cleaning up your database and improving its quality. They take ownership of their data and its quality, and become more engaged.


[Screenshot: Review Wizard for Accounts, in Review mode]

Improved Initialization and first time setup process

First impressions count, so we have also improved the setup and initialization wizard, so that you can get to the interesting part without having to go through a difficult setup process.


[Screenshot: Setup, step 1]
New to this release, though, is the addition of a Remote Site, which enables our app to create List Views on the Matched Records tab on the fly.


[Screenshot: Setup, step 5]

New Release – DataTrim Dupe Alerts

Now generally available: Spring 2016, a new version of DataTrim Dupe Alerts.
Take advantage of upgrading your existing version and benefit from the new features and bug fixes in this version.
Get Started

Learn more about Dupe Alerts
Contact Us for more information about this solution
