PII detection, also known as sensitive data discovery, is the automated process of scanning structured and unstructured data to identify columns, fields, or documents that contain personally identifiable information (PII). This includes direct identifiers such as Social Security numbers, email addresses, and phone numbers, as well as quasi-identifiers such as ZIP code, birth date, and gender, which are not identifying on their own but can be combined to re-identify individuals. Detection typically combines pattern matching (for example, regular expressions), natural language processing (NLP), and machine learning classifiers to recognize both the data formats and the semantic context that indicate sensitive content.
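
To make the pattern-matching part concrete, here is a minimal sketch in Python of a regex-based column scanner for structured data. Everything in it is an illustrative assumption rather than a reference implementation: the names `PATTERNS`, `classify_column`, and `scan_table` are hypothetical, the patterns cover only simplified US-style formats, and the 0.8 match threshold is an arbitrary choice. It does not attempt the NLP or machine learning steps, and it does not handle quasi-identifiers.

```python
import re

# Hypothetical, deliberately simplified patterns (assumption: US-style formats).
# Production detectors usually add checksum or range validation and context cues.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "us_phone": re.compile(
        r"(?:\+1[ .-]?)?(?:\(\d{3}\)|\d{3})[ .-]?\d{3}[ .-]?\d{4}"
    ),
}


def classify_column(values, threshold=0.8):
    """Return the label whose pattern fully matches at least `threshold`
    of the non-empty values, or None if no pattern reaches the threshold."""
    cleaned = [str(v).strip() for v in values if v is not None and str(v).strip()]
    if not cleaned:
        return None

    best_label, best_ratio = None, 0.0
    for label, pattern in PATTERNS.items():
        matches = sum(1 for v in cleaned if pattern.fullmatch(v))
        ratio = matches / len(cleaned)
        if ratio > best_ratio:
            best_label, best_ratio = label, ratio

    return best_label if best_ratio >= threshold else None


def scan_table(table, threshold=0.8):
    """Scan a table given as {column_name: [values, ...]} and return
    {column_name: label} for every column flagged as likely PII."""
    flagged = {}
    for name, values in table.items():
        label = classify_column(values, threshold)
        if label is not None:
            flagged[name] = label
    return flagged


if __name__ == "__main__":
    sample = {
        "contact": ["alice@example.com", "bob@example.org"],
        "tax_id": ["123-45-6789", "987-65-4321"],
        "notes": ["called twice", "prefers email"],
    }
    print(scan_table(sample))
```

Classifying a column by the share of values that match, rather than by a single hit, is one way to tolerate placeholders such as "unknown" while avoiding a flag on a free-text column that happens to contain one email address. Format-only matching of this kind is prone to false positives and misses sensitive values that have no fixed format, which is where the contextual and learned methods come in.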




