Masking and anonymization of data

Why do businesses need data masking?

In an enterprise environment, data almost always travels beyond the production system: to test/dev, BI, DWH, data lakes, RAG, Document AI, service support, contractors, integrators and internal sandboxes. This is where risk most often appears: records about real customers, employees, payments, contracts, medical information or commercial terms end up in environments with less control, more copies and a wider range of users.

Masking and anonymization are not only about compliance. They make it possible to ship releases, AI pilots and analytics faster, without manually approving every data export. A good project answers a simple management question: which data can be used safely, where it must be replaced, where restricting access is enough, and where real records should not leave production at all.

Key Terms, Plainly Explained

Term	What does it mean in practice	When to apply
Data Masking	Replacing sensitive values with safe substitutes: part of a number, full name, address, account or contract is hidden or transformed.	Test/dev, support, demonstrations, reports, data exports to contractors.
Depersonalization	Transformation of personal data so that without additional information it is impossible to identify a specific person. In the Russian context, it is related to Federal Law No. 152-FZ and the requirements of Roskomnadzor.	Analytics, research, data marts, model training, data set sharing.
Pseudonymization	Identifiers are replaced by aliases, but the relationship can be restored if a separate key or mapping table is available.	Scenarios where reversibility is needed: investigations, reconciliations, process support.
Tokenization	The sensitive value is replaced with a token, and the original is stored in a separate secure perimeter.	Payment data, client IDs, integrations, APIs.
Synthetic data	Artificially created records that are similar in structure and distribution to real ones, but are not data on specific people.	Development, testing, training, load scenarios, demonstrations.
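
To make the difference between masking, pseudonymization and tokenization concrete, here is a minimal Python sketch. The names mask_phone, pseudonymize and TokenVault, as well as the in-memory storage, are illustrative assumptions and not any vendor's API; in a real project the key and the token store would sit in a separate secure perimeter with their own access control.

```python
import hashlib
import hmac
import secrets


def mask_phone(phone: str, visible: int = 4) -> str:
    """Masking: hide all digits except the last `visible` ones.

    Numbers with `visible` digits or fewer are returned unmasked,
    so a real rule would also need a policy for short values.
    """
    digits = [c for c in phone if c.isdigit()]
    hidden = max(len(digits) - visible, 0)
    return "*" * hidden + "".join(digits[hidden:])


def pseudonymize(value: str, key: bytes) -> str:
    """Pseudonymization: a keyed, deterministic alias (HMAC-SHA256).

    The same value and key always give the same alias, so joins still work.
    Linking an alias back to a person requires the key (to recompute
    aliases for known values) or a separately stored mapping table.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


class TokenVault:
    """Tokenization: random tokens; originals exist only inside the vault."""

    def __init__(self) -> None:
        self._by_token: dict[str, str] = {}
        self._by_value: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so one value always maps to one token.
        if value not in self._by_value:
            token = "tok_" + secrets.token_hex(8)
            self._by_value[value] = token
            self._by_token[token] = value
        return self._by_value[value]

    def detokenize(self, token: str) -> str:
        # In production this call would be restricted and logged.
        return self._by_token[token]


if __name__ == "__main__":
    print(mask_phone("+7 (912) 345-67-89"))
    print(pseudonymize("client-42", key=b"demo-key-not-for-production"))
    vault = TokenVault()
    token = vault.tokenize("4111 1111 1111 1111")
    print(token, "->", vault.detokenize(token))
```

The design point is where reversibility lives: the masked value cannot be restored from the output alone, the pseudonym can be re-linked only by whoever holds the key or mapping, and the token means nothing outside the vault.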

Abbreviations often found in such projects: PDn (personal data); ISPDn (personal data information system); DWH (corporate data warehouse); BI (business intelligence); RAG (Retrieval-Augmented Generation, an approach in which AI answers on the basis of corporate sources); DLP (Data Loss Prevention, leak control); DAM/DBF (Database Activity Monitoring / Database Firewall, monitoring and protection of databases).

Where risks arise

Development and testing

A copy of the production database ends up in a less secure environment, where developers, testers, contractors, and CI/CD processes have access to it.

Analytics and DWH

Data is combined from CRM, ERP, 1C, SAP, personal accounts and file storage. The richer the data mart, the higher the risk of re-identification.
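
One common way to put a number on this risk is a k-anonymity style check: count how many records share each combination of indirect identifiers and look at the smallest group. The sketch below is a simplified illustration; the function name smallest_group, the column names and the sample rows are assumptions, and a real assessment would also consider external data an attacker could join against.

```python
from collections import Counter


def smallest_group(rows: list[dict], quasi_identifiers: list[str]) -> int:
    """Return the size of the smallest group of rows that share the same
    values in all quasi-identifier columns (the "k" in k-anonymity).

    A result of 1 means at least one record is unique on these columns.
    """
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(counts.values()) if counts else 0


# Hypothetical data mart rows without any direct identifiers.
rows = [
    {"birth_year": 1985, "city": "Kazan", "position": "manager"},
    {"birth_year": 1985, "city": "Kazan", "position": "analyst"},
    {"birth_year": 1991, "city": "Perm", "position": "manager"},
    {"birth_year": 1991, "city": "Perm", "position": "manager"},
]

if __name__ == "__main__":
    print(smallest_group(rows, ["birth_year", "city"]))
    print(smallest_group(rows, ["birth_year", "city", "position"]))
```

In this sample every record shares its birth year and city with one other record, but adding the position column makes two records unique, which is exactly how a richer data mart raises re-identification risk.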

AI and RAG

Documents, requests, logs and knowledge base fragments may contain personal data, trade secrets, internal prices, contractual terms and restricted internal regulations.

Contractors and support

An external team often needs realistic data for diagnostics, but does not need real passports, phone numbers, accounts, addresses and personal profiles.

Russian and international practice

In Russia, the basic context is set by Federal Law No. 152-FZ “On Personal Data”, requirements for the protection of ISPDn and orders of the FSTEC of Russia, including FSTEC order No. 21. For depersonalization, it is important to consider Roskomnadzor order No. 140 dated June 19, 2025 on requirements for anonymization of personal data. If the environment relates to critical information infrastructure, Federal Law No. 187-FZ and requirements for CII are added.

Useful international references include the NIST Privacy Framework, NIST SP 800-188 on de-identification, NIST SP 800-122 on PII protection, ENISA materials on pseudonymization and EDPB Guidelines 01/2025. We use these approaches as a practical framework: re-identification risk assessment, data minimization, key separation, access control, auditable procedures and regular review of rules.

What RESTART Delivers

Finding sensitive data

We build a map of systems, tables, documents, logs, APIs, file storages and data owners. We examine production, test/dev, DWH, BI, AI and contractor access separately.

Classifying scenarios

We determine where real data is needed, where masking is sufficient, where pseudonymization, tokenization, depersonalization or a synthetic set is needed.

Designing the rules

We describe fields, conversion algorithms, reversibility, key storage, exceptions, roles, logs, quality criteria and residual risk.
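
Such rules are easier to review and audit when they are written down in a machine-checkable form. The sketch below shows one possible shape, assuming a hypothetical FieldRule record and a validate function; the method names, the key reference format and the sample tables are illustrative and not a prescribed RESTART format.

```python
from dataclasses import dataclass
from typing import Optional

KNOWN_METHODS = {"keep", "mask", "pseudonymize", "tokenize", "synthetic"}
REVERSIBLE_METHODS = {"pseudonymize", "tokenize"}


@dataclass(frozen=True)
class FieldRule:
    table: str
    column: str
    method: str                    # one of KNOWN_METHODS
    reversible: bool               # may the original be restored?
    key_ref: Optional[str] = None  # where the key lives, never the key itself
    owner: str = ""                # data owner who approved the rule


def validate(rules: list[FieldRule]) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: list[str] = []
    seen: set[tuple[str, str]] = set()
    for rule in rules:
        target = f"{rule.table}.{rule.column}"
        if (rule.table, rule.column) in seen:
            problems.append(f"{target}: more than one rule")
        seen.add((rule.table, rule.column))
        if rule.method not in KNOWN_METHODS:
            problems.append(f"{target}: unknown method {rule.method!r}")
        if rule.reversible and rule.method not in REVERSIBLE_METHODS:
            problems.append(f"{target}: method {rule.method!r} cannot be reversible")
        if rule.reversible and not rule.key_ref:
            problems.append(f"{target}: reversible rule without a key reference")
        if not rule.owner:
            problems.append(f"{target}: no data owner assigned")
    return problems


if __name__ == "__main__":
    rules = [
        FieldRule("clients", "phone", "tokenize", True, "vault://keys/phone", "CRM owner"),
        FieldRule("clients", "full_name", "synthetic", False, None, "CRM owner"),
        FieldRule("contracts", "amount", "mask", True),  # deliberately wrong
    ]
    for problem in validate(rules):
        print(problem)
```

A check like this can run in CI before a masked copy is built, so a rule with missing ownership or an undefined key location is caught before any data moves.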

Integrating into the architecture

We prepare the HLD/LLD, integrations, roles, regulations, pilot, acceptance and operations, and align the work with information security, data, DevSecOps, ERP and AI environments.

AI as an accelerator, not an autopilot

AI can significantly speed up a project: find probable personal data in documents, tables, logs and code; classify fields; highlight the risk of re-identification; propose masking rules; check whether sensitive fragments are included in the RAG index, prompts, responses and logs. In test/dev, AI helps generate synthetic data sets that are similar to real ones in structure and distribution.

But the final decision cannot be left to the model. The transformation rules, the admissibility of anonymization, the residual risk and the access mode must be confirmed by data owners, information security, lawyers and architects. RESTART's role is to make AI a useful assessment and control tool, rather than a source of uncontrolled legal or architectural conclusions.
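
As an illustration of a control step before documents reach a RAG index, the sketch below scans text for two kinds of probable personal data and returns both a redacted version and a findings summary for a human reviewer. It is a deliberately simple pattern scan, not an AI model; the PATTERNS table, the redact function and the two regular expressions (e-mail addresses and Russian-format phone numbers) are assumptions and will miss many real cases.

```python
import re

# Illustrative patterns only; real projects need a much wider, tested set.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(
        r"(?<!\d)(?:\+7|8)[\s(-]*\d{3}[\s)-]*\d{3}[\s-]?\d{2}[\s-]?\d{2}(?!\d)"
    ),
}


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace probable personal data with placeholders.

    Returns the redacted text and a count of findings per category,
    which can be shown to a data owner before the text is indexed.
    """
    findings: dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            findings[label] = count
    return text, findings


if __name__ == "__main__":
    sample = "Write to ivan.petrov@example.com or call +7 (912) 345-67-89."
    clean, report = redact(sample)
    print(clean)
    print(report)
```

The findings summary matters as much as the redacted text: it gives data owners and information security something concrete to confirm or reject, in line with keeping the final decision with people.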


Partner technologies

The technology stack depends on the task: static or dynamic masking, tokenization, database protection, DLP, DBF/DAM, user activity control, secure test/dev preparation and integration with existing data stores. RESTART does not start by selecting a single product: first the data, scenarios, risks and architecture are documented, and only then is the tooling chosen.

DAMASCUS

masking, tokenization, dynamic data protection

Garda

DLP, DBF, Data Masking, NDR, WAF, Anti-DDoS

Partners are listed as the technology foundation for this class of solutions. The specific set of products, versions, licenses, certificates and delivery conditions is confirmed before the project starts.


What does the client get?

Artifact	Why is it needed?
Sensitive data map	Shows where personal data, trade secrets, payment and contract data are stored, who owns them and where they are transferred.
Scenario Matrix	Separates production, test/dev, BI, DWH, AI, contractors, support and data exchange by risk level.
Transformation rules	Record fields, algorithms, reversibility, keys, exceptions, quality and residual risk.
Implementation architecture	Describes HLD/LLD, integrations, roles, logs, security perimeters, pilot, and operations.
Evidence pack	Provides an evidence base for information security, external audit, compliance and internal reviews.
Roadmap	Helps you start with the highest-risk and highest-value scenarios without turning the project into an endless inventory.

First step

A practical start is a short assessment of one or two environments: for example, production → test/dev, a RAG pilot for corporate documents or a BI/DWH data mart with personal and contract data. At this stage, it is important not to promise “complete anonymization of everything,” but to quickly understand the real flows, fields, owners, risks and limitations.

After the initial assessment, you can choose a route: a masking pilot, a test/dev preparation policy, rules for AI/RAG, connecting a vendor solution, refining the DWH process, or a full-fledged ISPDn protection project.


Testing masking in the laboratory

Masking and anonymization are best tested on a controlled data set: you need to make sure that business logic is preserved, tests pass, analytics do not break, and real personal data does not reach test/dev, contractors, or AI scenarios. The Information Security Laboratory helps verify these conditions before scaling.
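
Two of these conditions can be checked automatically on a small controlled set: that no real value survives in the masked copy, and that masking is consistent enough for joins and tests to keep working. The sketch below is one possible form of such a check; the function name check_masked_copy, the sample rows and the assumption that both copies keep the same row order are illustrative.

```python
def check_masked_copy(
    original: list[dict], masked: list[dict], sensitive_fields: list[str]
) -> list[str]:
    """Compare a masked copy with its source and return a list of problems.

    Assumes both lists contain the same rows in the same order.
    """
    if len(original) != len(masked):
        return ["row count changed"]
    problems: list[str] = []
    for field in sensitive_fields:
        # 1. No real value may appear anywhere in the masked column.
        real_values = {row[field] for row in original}
        leaked = real_values & {row[field] for row in masked}
        if leaked:
            problems.append(f"{field}: {len(leaked)} real value(s) leaked")
        # 2. Equal originals must map to equal masked values,
        #    otherwise joins and duplicate detection break.
        mapping: dict = {}
        for src, dst in zip(original, masked):
            if mapping.setdefault(src[field], dst[field]) != dst[field]:
                problems.append(f"{field}: inconsistent masking breaks joins")
                break
    return problems


if __name__ == "__main__":
    source = [
        {"id": 1, "phone": "111"},
        {"id": 2, "phone": "222"},
        {"id": 3, "phone": "111"},
    ]
    masked_copy = [
        {"id": 1, "phone": "aaa"},
        {"id": 2, "phone": "222"},  # real value left in place
        {"id": 3, "phone": "ccc"},  # same source value, different mask
    ]
    for problem in check_masked_copy(source, masked_copy, ["phone"]):
        print(problem)
```

A check like this does not prove that the data is anonymized; it only catches two frequent mistakes early, before a masked copy is handed to test/dev or a contractor.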


Frequently asked questions

Does masking replace personal data protection?

No. It is one of several technical and organizational mechanisms. Access roles, logs, regulations, a threat model, control over data exports and clearly assigned data owners are also needed.

Is it possible to use real data in test/dev?

Sometimes complex business logic is difficult to test without it, but such an approach must be justified, limited and controlled. It is often safer to use masked, pseudonymized or synthetic sets.

Is depersonalization always irreversible?

Not always: in everyday usage, companies often conflate depersonalization, pseudonymization and masking. It is therefore important to state reversibility, keys, re-identification risk and acceptable use cases explicitly in the design.

How does this relate to RAG and enterprise AI?

RAG and AI work with documents, indexes, queries and logs. If the sources contain confidential information or trade secrets, masking, access and logging rules should be designed before the pilot, and not after the demonstration.

Is it possible to start without purchasing DLP or DBF?

Yes. Often the first step is a map of data, scenarios and risks. After that, it becomes clear whether a separate product, process refinement, a change of roles, masking at the database level, or a combination of measures is needed.

What will be a good result of the first stage?

A clear data map, list of fields and systems, selected conversion methods, restrictions, pilot environment, acceptance criteria and implementation roadmap.

Let's discuss your environment

Describe the task, current systems, constraints, and expected results. We will offer a practical first step: diagnostics, pilot, audit, roadmap or project team.