Find Duplicates

This model identifies duplicate entities, such as Person, Contact, Product, or any other Object, in a given list. The list is provided through the Data Streaming integration.

Any entity that is provided as part of the Object Data Object with the required attributes is automatically added to the index that serves this model. To prepare the data in the right format, refer to the instructions in the Object attribute configuration.

Note: The Data Streaming integration is required for this model. Before using the model, complete the integration setup, verify the data connection, and ensure that streaming is Live.

Model Input Parameters

Parameter Name    Parameter Type    Required
input             Text              Yes
object type       Text              Yes
result count      Number            No
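
The parameter table above can be sketched as a small client-side check that runs before a request is sent. This is a minimal illustration, not the product's API: the helper name `buildFindDuplicatesParams` is hypothetical, the key names are assumed to mirror the table labels (the real request may spell them differently), the sample product name is made up, and treating result count as a positive integer is an assumption, since the source only says it is a Number.

```javascript
// Hypothetical helper: validates the three documented input parameters and
// returns a plain object that could be serialized as a request body.
// Assumption: key names mirror the table labels; the actual API may differ.
function buildFindDuplicatesParams(input, objectType, resultCount) {
  if (typeof input !== "string" || input.trim() === "") {
    throw new TypeError('"input" is required and must be non-empty text');
  }
  if (typeof objectType !== "string" || objectType.trim() === "") {
    throw new TypeError('"object type" is required and must be non-empty text');
  }
  const params = { "input": input, "object type": objectType };
  // "result count" is optional, so it is only added when the caller supplies it.
  if (resultCount !== undefined) {
    // Assumption: a count should be a positive whole number.
    if (!Number.isInteger(resultCount) || resultCount < 1) {
      throw new TypeError('"result count" must be a positive integer when provided');
    }
    params["result count"] = resultCount;
  }
  return params;
}

// Illustrative values only; the product name is invented for this sketch.
const params = buildFindDuplicatesParams("Stainless Steel Water Bottle", "Product", 5);
const body = JSON.stringify(params);
```

Validating on the client keeps a missing required parameter from turning into a failed model execution; the endpoint and authentication details are not covered here because the source does not state them.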

Rest API Example

In this example, the request checks whether a product already exists in the catalog. The input parameter can contain the exact name of the product or a variation of it. The first example provides the exact product name; the second changes the search input slightly.

Example 1: Find Exact Match

JS


Example 2: Find match based on similarity

JS


Model Output Result

Parameter Name        Parameter Type    Info
results               Dictionary        The full list of results, based on the result count parameter provided.
top result            Dictionary        The result with the highest similarity score in the list of results.
highest confidence    Number            The confidence value of the top result.
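
As a sketch of how a caller might act on these output fields, the helper below compares highest confidence with a caller-chosen threshold and returns top result only when the threshold is met. The function name `checkForDuplicate`, the key spellings, the sample values, and the 0 to 1 confidence scale used in the sample are all assumptions; the source does not state the scale, so the threshold must be expressed in whatever units the model actually returns.

```javascript
// Hypothetical helper: decides whether the model output indicates a duplicate.
// Assumption: output keys match the table labels above.
function checkForDuplicate(output, threshold) {
  const confidence = output["highest confidence"];
  // Treat a missing or non-numeric confidence as "no duplicate found".
  if (typeof confidence !== "number" || Number.isNaN(confidence)) {
    return { isDuplicate: false, match: null, confidence: null };
  }
  const isDuplicate = confidence >= threshold;
  return {
    isDuplicate,
    match: isDuplicate ? output["top result"] : null,
    confidence,
  };
}

// Made-up output for illustration; the real shape of each result is not documented here.
const sampleOutput = {
  "results": { "0": { name: "Stainless Steel Water Bottle" } },
  "top result": { name: "Stainless Steel Water Bottle" },
  "highest confidence": 0.97,
};

const decision = checkForDuplicate(sampleOutput, 0.9);
```

Keeping the threshold as a parameter lets exact-match checks use a strict cutoff while similarity searches use a looser one.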

Rest API Output Example

Example 1 Response: Find Exact Match

JSON


Example 2 Response: Find Match based on Similarity

JSON


Standard Output Parameters

Every model execution output includes the following standard output parameters:

  • input
    • The input string passed to the model.
  • original input
    • The input provided to the first step of the model, which is retained across multiple steps in a Data Machine workflow.
  • final result
    • The result of the model executed in the final step of the Data Machine workflow.
  • sessionid
    • A unique session ID generated for every execution of a Data Machine, which can be used to retain results across multiple sessions.
  • status
    • The result of the Data Machine execution. If all of the steps in a sequence execute successfully, the value is "Completed". If the execution is interrupted at any point, the value is "Terminated", along with the reason for termination.
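
The status handling described in the list above can be sketched as follows. The helper name `readFinalResult` is hypothetical and the key spellings are assumed to match the parameter names. The source does not say how the termination reason is delivered, so this sketch assumes it is part of the status text and passes the whole string through.

```javascript
// Hypothetical helper: returns the final result of a completed run and
// raises an error for any other status.
// Assumption: output keys match the standard output parameter names.
function readFinalResult(output) {
  const status = String(output["status"] ?? "");
  if (status === "Completed") {
    return output["final result"];
  }
  if (status.startsWith("Terminated")) {
    // Include the session id so the failed run can be traced later.
    throw new Error(`Data Machine run ${output["sessionid"]} did not complete: ${status}`);
  }
  throw new Error(`Unexpected status: ${status}`);
}

// Made-up outputs for illustration only.
const completedRun = { "status": "Completed", "final result": { matched: true }, "sessionid": "abc-123" };
const finalResult = readFinalResult(completedRun);
```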