Configure PII detection rules, masking patterns, and de-identification pipelines
De-identification in OpenRails automatically detects and masks personally identifiable information (PII) in documents and conversation data. Rules use pattern matching, dictionary lookups, and category-based detection to find sensitive data, then apply configurable masking to protect it.
| Rule Type | Description | Example |
|---|---|---|
| Pattern | Regex-based detection of structured PII | SSN: `\d{3}-\d{2}-\d{4}`, Email: `[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}` |
| Dictionary | Lookup-based detection using curated word lists | Names, cities, medical terms, organization names |
| Category | Broad category-based detection using NLP | Person names, locations, financial data, medical records |
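To make the first two rule types concrete, here is a minimal Python sketch of pattern and dictionary detection. It is not OpenRails code: the `Match` type, the rule names, the function names, and the tiny city list are all hypothetical. The regexes are modelled on the table examples, with word boundaries on the SSN pattern and a top-level-domain suffix on the email pattern added as assumptions. Category rules rely on NLP models and are not sketched here.

```python
import re
from typing import NamedTuple


class Match(NamedTuple):
    rule: str   # name of the rule that fired
    start: int  # start offset in the text
    end: int    # end offset (exclusive)
    text: str   # the detected value


# Hypothetical pattern rules, modelled on the table examples.
PATTERN_RULES = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"),
}

# Hypothetical dictionary rule: a small curated word list.
CITY_DICTIONARY = {"Boston", "Lisbon", "Osaka"}


def detect_patterns(text: str) -> list[Match]:
    """Run every regex rule over the text and collect its matches."""
    found = []
    for name, regex in PATTERN_RULES.items():
        for m in regex.finditer(text):
            found.append(Match(name, m.start(), m.end(), m.group()))
    return found


def detect_dictionary(text: str) -> list[Match]:
    """Flag each whole word that appears in the curated list."""
    found = []
    for m in re.finditer(r"\b\w+\b", text):
        if m.group() in CITY_DICTIONARY:
            found.append(Match("CITY", m.start(), m.end(), m.group()))
    return found


sample = "Jane (jane.doe@example.com, SSN 123-45-6789) moved to Lisbon."
matches = sorted(
    detect_patterns(sample) + detect_dictionary(sample),
    key=lambda m: m.start,
)
for m in matches:
    print(m.rule, repr(m.text))
```

The exact-match dictionary lookup is case-sensitive and handles single words only. A real word list would likely need normalization and multi-word entries.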
From the sidebar, go to Governance > De-identification.
Click New Rule to open the rule editor.
Choose Pattern, Dictionary, or Category based on the type of PII you want to detect.
Based on the rule type, provide the pattern, dictionary, or category settings.
Choose how detected PII should be masked, for example with a replacement label ([REDACTED], [NAME], [SSN]) or a partial mask (***-**-1234).
Assign a priority to determine execution order when multiple rules match the same text. Higher-priority rules take precedence.
Click Save to activate the rule. The rule applies to new content and can also be applied retroactively to existing data.
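The interaction between masking and priority can be illustrated with a short Python sketch. Everything here is hypothetical (`Detection`, `label_mask`, `partial_mask`, `apply_masks`), and it assumes that a larger priority number means higher priority and that a lower-priority rule is skipped when it overlaps a span already claimed. The product may resolve conflicts differently.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Detection:
    start: int                  # start offset of the detected span
    end: int                    # end offset (exclusive)
    priority: int               # assumption: larger number = higher priority
    mask: Callable[[str], str]  # turns the detected value into its masked form


def label_mask(label: str) -> Callable[[str], str]:
    """Build a mask that replaces the whole value with a label such as [SSN]."""
    return lambda _value: f"[{label}]"


def partial_mask(value: str, keep: int = 4) -> str:
    """Replace all but the last `keep` digits with '*', keeping separators."""
    total_digits = sum(ch.isdigit() for ch in value)
    digits_seen = 0
    out = []
    for ch in value:
        if ch.isdigit():
            digits_seen += 1
            out.append(ch if digits_seen > total_digits - keep else "*")
        else:
            out.append(ch)
    return "".join(out)


def apply_masks(text: str, detections: list[Detection]) -> str:
    """Mask the text, letting higher-priority detections win on overlap."""
    ranked = sorted(
        detections,
        key=lambda d: (-d.priority, d.start, -(d.end - d.start)),
    )
    accepted: list[Detection] = []
    for d in ranked:
        # Keep a detection only if it does not overlap an accepted one.
        if all(d.end <= a.start or d.start >= a.end for a in accepted):
            accepted.append(d)
    # Apply right to left so earlier offsets stay valid after replacement.
    for d in sorted(accepted, key=lambda d: d.start, reverse=True):
        text = text[:d.start] + d.mask(text[d.start:d.end]) + text[d.end:]
    return text


text = "SSN 123-45-6789 on file"
detections = [
    Detection(4, 15, priority=1, mask=label_mask("REDACTED")),
    Detection(4, 15, priority=10, mask=partial_mask),
]
print(apply_masks(text, detections))
```

Both detections cover the same span, so only the priority-10 partial mask is applied and the generic label is dropped.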
Configure where de-identification rules are applied in the data pipeline:
Use the Test feature to validate rules against sample text before deploying them:
Paste text containing the type of PII your rule targets.
Click Test to see which parts of the text are detected and how they would be masked.
Adjust the pattern, dictionary, or category settings to improve detection accuracy. Repeat until false positives and negatives are minimized.
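The same test-and-refine loop can be approximated offline for pattern rules. The sketch below is an assumption-laden illustration rather than the built-in Test feature: the `evaluate` helper, the sample text, and the expected values are all made up for the example. It compares what a regex detects against a hand-labelled set and reports false positives and false negatives.

```python
import re


def evaluate(regex: re.Pattern, text: str, expected: set[str]) -> dict[str, list[str]]:
    """Compare a rule's detections with the values a reviewer labelled as PII."""
    detected = {m.group() for m in regex.finditer(text)}
    return {
        "true_positives": sorted(detected & expected),
        "false_positives": sorted(detected - expected),
        "false_negatives": sorted(expected - detected),
    }


# Hypothetical sample text and hand-labelled expected detections.
sample = "SSN 123-45-6789, order 555-12-0000X, alt SSN 987 65 4321."
expected = {"123-45-6789", "987 65 4321"}

# First attempt: no word boundaries, hyphen separators only.
first_try = re.compile(r"\d{3}-\d{2}-\d{4}")
# Revised rule: word boundaries, and either a hyphen or a space as separator.
revised = re.compile(r"\b\d{3}[- ]\d{2}[- ]\d{4}\b")

print(evaluate(first_try, sample, expected))
print(evaluate(revised, sample, expected))
```

On this sample the first pattern flags part of the order number and misses the space-separated SSN, while the revised pattern has neither problem. A pattern that is clean on one sample can still fail on others, so varied test text matters.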