Find Duplicates
This model provides a deduplication solution for identifying duplicate entities such as Person, Contact, Product or any other Object from a given list. This list can be provided using the Data Streaming integration.
Any entity provided as part of the Object Data Object with the required attributes is automatically included in the index that serves this model. Refer to the instructions in the Object attribute configuration to prepare the data in the right format.
Note: Data Streaming integration is required for this model. Please complete the integration setup, verify the data connection and ensure that the streaming is Live before using this model.
Parameter Name | Parameter Type | Required |
---|---|---|
input | Text | Yes |
object type | Text | Yes |
result count | Number | No |
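
As a minimal sketch of how the parameters in the table fit together, the helper below assembles and validates a model input in Python. The function name `build_find_duplicates_input` is hypothetical, and the key spellings (`input`, `object type`, `result count`) are copied from the table on the assumption that the API accepts them in that form; the actual request envelope and endpoint are not shown here.

```python
from typing import Any, Dict, Optional


def build_find_duplicates_input(
    input_text: str,
    object_type: str,
    result_count: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble the model input from the parameters listed in the table."""
    # 'input' and 'object type' are marked as required in the table.
    if not input_text or not input_text.strip():
        raise ValueError("'input' is required and must not be empty")
    if not object_type or not object_type.strip():
        raise ValueError("'object type' is required and must not be empty")

    # Assumption: key spellings mirror the table; the real API may differ.
    payload: Dict[str, Any] = {"input": input_text, "object type": object_type}

    # 'result count' is optional, so it is only sent when the caller sets it.
    if result_count is not None:
        if (
            isinstance(result_count, bool)
            or not isinstance(result_count, int)
            or result_count < 1
        ):
            raise ValueError("'result count' must be a positive integer")
        payload["result count"] = result_count
    return payload


# Hypothetical values, for illustration only.
example = build_find_duplicates_input("Acme Wireless Mouse", "Product", result_count=5)
print(example)
```

Leaving `result count` out of the payload when it is not set lets the service apply whatever default it uses, which this page does not specify.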
REST API Example
In this example, the request checks whether a product already exists in the catalog. The input parameter can contain the exact name of the product or a variation of that name. The first example provides the exact product name; the second changes the search input slightly.
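
To give a feel for why a slightly changed input can still match, here is a small local illustration in Python. It uses `difflib.SequenceMatcher` from the standard library purely as a stand-in: this page does not describe how the model computes similarity, so the scores below are not the model's scores. The product names are hypothetical.

```python
from difflib import SequenceMatcher


def local_similarity(a: str, b: str) -> float:
    """Case-insensitive character similarity in [0, 1].

    A stand-in for illustration only, not the model's scoring method.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


catalog_name = "Acme Wireless Mouse M200"    # hypothetical catalog entry
exact_input = "Acme Wireless Mouse M200"     # exact name, as in Example 1
varied_input = "acme wireless mouse m-200"   # slight variation, as in Example 2

exact_score = local_similarity(catalog_name, exact_input)
varied_score = local_similarity(catalog_name, varied_input)

# The exact name scores 1.0; the variation scores lower but stays close to it.
print(exact_score, varied_score)
```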
Example 1: Find Exact Match
Example 2: Find match based on similarity
Parameter Name | Parameter Type | Info |
---|---|---|
results | Dictionary | The full list of results based on the result count parameter provided. |
top result | Dictionary | The result with the highest similarity score in the list of results. |
highest confidence | Number | The confidence value of the top result. |
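
The sketch below shows one way a caller might read these output parameters in Python. Several things are assumptions: the key spellings follow the table, the shape of the `results` and `top result` dictionaries in the sample is invented for illustration, and `threshold` is a caller-chosen cutoff because the scale of `highest confidence` is not stated here.

```python
from typing import Any, Dict


def summarize_output(output: Dict[str, Any], threshold: float) -> Dict[str, Any]:
    """Pick out the documented output fields and apply a caller-chosen cutoff."""
    results = output.get("results") or {}
    confidence = output.get("highest confidence")
    # Treat a missing confidence as "no duplicate found".
    likely = confidence is not None and float(confidence) >= threshold
    return {
        "candidate_count": len(results),
        "top_result": output.get("top result"),
        "highest_confidence": confidence,
        "likely_duplicate": likely,
    }


# Hypothetical response; the real dictionary layout may differ.
sample_output = {
    "results": {"Acme Wireless Mouse M200": 0.97, "Acme Wired Mouse M100": 0.61},
    "top result": {"Acme Wireless Mouse M200": 0.97},
    "highest confidence": 0.97,
}

summary = summarize_output(sample_output, threshold=0.9)
print(summary["likely_duplicate"], summary["candidate_count"])
```

Keeping the cutoff outside the function makes it easy to tune per object type once real confidence values have been observed.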
REST API Output Example
Example 1 Response: Find Exact Match
Example 2 Response: Find Match based on Similarity
Every model execution output includes the following standard output parameters:
- input
  - The input string that the model checks for duplicates.
- original input
  - The input provided to the first step in the model, which is retained across multiple steps in a Data Machine workflow.
- final result
  - The result of the model executed in the final step of the Data Machine workflow.
- sessionid
  - A unique session ID generated for every execution of a Data Machine, which can be used to retain results across multiple sessions.
- status
  - The result of the Data Machine execution. If all of the steps in a sequence execute successfully, the value is "Completed". If the execution is interrupted at any point, the value is "Terminated", along with the reason for termination.
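
A caller will typically want to check `status` before using `final result`. The following Python sketch assumes the output is available as a dictionary keyed by the parameter names listed above; the exception class and function names are hypothetical, and because the exact format of the termination reason is not specified here, any status other than "Completed" is surfaced as-is.

```python
from typing import Any, Dict


class DataMachineTerminated(RuntimeError):
    """Raised when an execution did not finish with status 'Completed'."""


def final_result_or_raise(output: Dict[str, Any]) -> Any:
    """Return 'final result' for a completed run, otherwise raise with context."""
    status = str(output.get("status", ""))
    if status == "Completed":
        return output.get("final result")
    # Include the session id so the failed execution can be traced later.
    raise DataMachineTerminated(
        f"session {output.get('sessionid')!r} ended with status: {status!r}"
    )


# Hypothetical outputs, for illustration only.
completed = {"status": "Completed", "final result": {"match": True}, "sessionid": "abc-123"}
print(final_result_or_raise(completed))
```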