Detect PII

Detect PII identifies, and optionally redacts and/or extracts, mentions of PII-related entities (persons, organizations, dates, email addresses, Social Security numbers, bank account numbers, credit card numbers, etc.).

These models detect, classify, and provide options to de-identify personally identifiable information (PII) in unstructured text. As a simple identification example, if the phrase "My email is [email protected]" is analyzed, the model can return the specific entity and the label for that entity: "text":"[email protected]" and "type":"EmailAddress".

A more complex example of Detect PII takes the following text:

Hello Support Team,

I am reaching out to seek help with my credit card number 1234 5678 9873 2345 expiring on 11/23. There was a suspicious transaction on 12-Aug-2022 which I reported by calling from my mobile number +1 (423) 111-9999 also I emailed from my email id [email protected]. Would you please let me know the refund status?

Regards,

Sarah

Processing it to redact the PII results in:

Hello Support Team, I am reaching out to seek help with my credit card number ******************* expiring on ***** . There was a suspicious transaction on *********** which I reported by calling from my mobile number ** ************** also I emailed from my email id *************************** . Would you please let me know the refund status? Regards, *****
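The redacted text above keeps the length of each masked value, which suggests a character-for-character replacement. A minimal sketch of that idea follows; it is not the model's actual implementation. The maskEntities function and the { offset, length } entity shape are hypothetical names introduced only for illustration.

```javascript
// Illustrative sketch only: replaces each detected span with the masking
// character, one mask character per original character.
// The { offset, length } entity shape is an assumption, not the model's format.
function maskEntities(text, entities, maskingCharacter = "*") {
  const chars = Array.from(text);
  for (const { offset, length } of entities) {
    const end = Math.min(offset + length, chars.length);
    for (let i = offset; i < end; i++) {
      chars[i] = maskingCharacter;
    }
  }
  return chars.join("");
}

// "11/23" starts at offset 12 and is 5 characters long.
const sampleText = "expiring on 11/23.";
const maskedSample = maskEntities(sampleText, [{ offset: 12, length: 5 }]);
console.log(maskedSample);
```

Passing a third argument, for example maskEntities(sampleText, spans, "#"), mirrors the optional masking character parameter described under Model Input Parameters.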

This can be particularly helpful for the following use cases:

Detecting private information in user feedback

Many organizations collect user feedback through various channels such as product reviews, return requests, support tickets, and feedback forums. You can use the Language PII detection service to detect PII entities automatically, which lets you proactively warn users about sharing private data and lets applications anonymize or mask posted feedback before storing it.

Scanning object storage for presence of sensitive data

Cloud storage solutions are widely used by employees to store business documents in locations that are either locally controlled or shared by multiple teams. Ensuring that these shared locations do not store private information, such as employee names, demographics, and payroll information, requires automatic scanning of all documents for the presence of PII. This model can support this process at scale.

Model Input Parameters

| Parameter Name | Parameter Type | Required |
| --- | --- | --- |
| input | Text | Yes |
| masking character | Character | No (if not provided, '*' is used) |

REST API Example
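As a rough illustration of how the two input parameters might be sent, the sketch below builds a JSON request. The endpoint URL, the HTTP method, the JSON key spellings, and the function names buildDetectPiiRequest and detectPii are all assumptions made for this example. Authentication is omitted and is likely to be required in practice.

```javascript
// Hypothetical endpoint; replace with the URL of your own Data Machine.
const ENDPOINT_URL = "https://example.invalid/data-machines/detect-pii";

// Builds fetch options for a Detect PII call. The JSON keys mirror the names
// in the parameter table, but their exact spelling is an assumption.
function buildDetectPiiRequest(input, maskingCharacter) {
  if (typeof input !== "string" || input.length === 0) {
    throw new Error("input is required and must be non-empty text");
  }
  const body = { input };
  if (maskingCharacter !== undefined) {
    if (typeof maskingCharacter !== "string" || Array.from(maskingCharacter).length !== 1) {
      throw new Error("masking character must be a single character");
    }
    body["masking character"] = maskingCharacter;
  }
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

// Sends the request and returns the parsed JSON response.
async function detectPii(input, maskingCharacter) {
  const response = await fetch(ENDPOINT_URL, buildDetectPiiRequest(input, maskingCharacter));
  if (!response.ok) {
    throw new Error(`Request failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

Leaving out the second argument omits the masking character from the request body, so the documented default of '*' would apply.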


Model Output Result

| Parameter Name | Parameter Type |
| --- | --- |
| masked text | Text |
| PII entity count | Number |
| detected PII entities | JSON String |

REST API Output Example
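Because detected PII entities is typed as a JSON String, a client would probably need to parse it before use. The sketch below shows one way to read the three output parameters. The JSON key spellings, the helper names, and the sample object are assumptions written by hand for illustration; the sample is not real model output. Only the "text" and "type" entity fields and the "EmailAddress" label come from the example earlier on this page.

```javascript
// Reads the three documented output parameters from a response object.
// Key spellings follow the output table; the real keys may differ.
function readDetectPiiOutput(output) {
  const raw = output["detected PII entities"];
  let entities = [];
  if (typeof raw === "string") {
    entities = JSON.parse(raw); // the entity list arrives as a JSON string
  } else if (Array.isArray(raw)) {
    entities = raw;
  }
  return {
    maskedText: output["masked text"],
    entityCount: Number(output["PII entity count"]),
    entities,
  };
}

// Groups the detected entity texts by their type label.
function groupByType(entities) {
  const groups = {};
  for (const { text, type } of entities) {
    if (!groups[type]) groups[type] = [];
    groups[type].push(text);
  }
  return groups;
}

// Hand-written sample with a placeholder address, for illustration only.
const sampleOutput = {
  "masked text": "My email is " + "*".repeat("user@example.com".length),
  "PII entity count": 1,
  "detected PII entities": JSON.stringify([{ text: "user@example.com", type: "EmailAddress" }]),
};
const parsedOutput = readDetectPiiOutput(sampleOutput);
console.log(groupByType(parsedOutput.entities));
```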


Standard Output Parameters

Every model execution output consists of the following standard output parameters:

  • input
    • The input string the model requires to detect PII entities
  • original input
    • The input provided to the first step in the model, which is retained across multiple steps in a Data Machine workflow
  • final result
    • The result of the model executed in the final step of the Data Machine workflow
  • sessionid
    • A unique session ID generated for every execution of a Data Machine, which can be used to retain results across multiple sessions
  • status
    • The result of the Data Machine execution. If all steps in the sequence execute successfully, the value is "Completed". If the execution is interrupted at any point, the value is "Terminated", along with the reason for termination.
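A caller would typically check status before using final result. The sketch below assumes the standard output parameters arrive as keys of a JSON object spelled as in the list above; that spelling and the finalResultOrThrow helper are hypothetical. How the termination reason is delivered is not specified here, so the sketch treats any status other than "Completed" as a failure and reports the raw value.

```javascript
// Returns the final result of a completed execution, or throws with the
// session id and the raw status so the failure can be traced.
function finalResultOrThrow(execution) {
  if (execution.status === "Completed") {
    return execution["final result"];
  }
  throw new Error(
    `Data Machine execution did not complete (session ${execution.sessionid}): ${execution.status}`
  );
}
```

For example, finalResultOrThrow({ status: "Completed", sessionid: "abc", "final result": "masked" }) returns "masked", while an object whose status is "Terminated" raises an error that includes the session id.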