The Easiest Way To Protect Personal Information: Applying The Privacy Commissioner’s De-identification Guidelines for Structured Data in 10 Easy Steps

Ask: Could someone still figure out who this person is?
If the answer might be yes, the data is probably still personal information.

Pseudonymized data: names and direct identifiers removed, but people may still be identifiable.

De-identified data: both direct and indirect identifiers have been transformed enough that re-identification risk is very low.


Even without names, people can often be identified by combinations such as:

  • full or partial postal code
  • exact date of birth
  • rare job title
  • location
  • dates of service
  • gender
  • diagnosis or event type
  • unique transaction details

The guideline stresses that most known re-identification attacks rely on these indirect identifiers.
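To see how combinations single people out, a small script can count how many records are unique on a chosen set of indirect identifiers. This is a minimal sketch; the records and field names are made up for illustration and are not taken from the guideline.

```python
from collections import Counter

# Hypothetical records with names already removed.
records = [
    {"postal": "M5V", "birth_year": 1980, "gender": "F", "job": "accountant"},
    {"postal": "M5V", "birth_year": 1980, "gender": "F", "job": "accountant"},
    {"postal": "M5V", "birth_year": 1975, "gender": "M", "job": "accountant"},
    {"postal": "K1A", "birth_year": 1962, "gender": "M", "job": "harbour pilot"},
]


def group_sizes(rows, fields):
    """Count how many rows share each combination of the given fields."""
    return Counter(tuple(row[f] for f in fields) for row in rows)


def unique_rows(rows, fields):
    """Return the rows whose combination of fields appears exactly once."""
    sizes = group_sizes(rows, fields)
    return [row for row in rows if sizes[tuple(row[f] for f in fields)] == 1]


quasi_identifiers = ["postal", "birth_year", "gender", "job"]
singled_out = unique_rows(records, quasi_identifiers)
print(len(singled_out), "of", len(records), "records are unique on", quasi_identifiers)
```

In this toy data no record is unique on gender alone, but two records become unique once postal code, birth year, and job title are combined.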

Be clear about the business purpose for the data before you share or reuse it.

Separate fields into:

  • direct identifiers
  • indirect identifiers
  • other business data

Delete, encrypt, tokenize, or replace the direct identifiers. This is pseudonymization.
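One way to tokenize a direct identifier is a keyed hash, sketched below with Python's standard `hmac` module. The field names, the key, and the choice of email as the source of the token are assumptions for illustration; the guideline does not prescribe a specific technique.

```python
import hashlib
import hmac

# Assumption: the key is generated randomly once and stored apart from the data.
SECRET_KEY = b"replace-with-a-randomly-generated-secret"

# Hypothetical names of the direct-identifier fields in this dataset.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def tokenize(value, key=SECRET_KEY):
    """Return a keyed SHA-256 hash of the value, truncated to 16 hex characters."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def pseudonymize(row, id_field="email"):
    """Drop the direct identifiers and add a token derived from one of them."""
    cleaned = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["token"] = tokenize(row[id_field])
    return cleaned


row = {"name": "Jane Doe", "email": "jane@example.com", "phone": "555-0100",
       "postal": "M5V 2T6", "birth_date": "1980-04-12"}
print(pseudonymize(row))
```

The output still contains the full postal code and exact birth date, so it is pseudonymized data, not de-identified data.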

Examples of transforming indirect identifiers:

  • convert exact birth date to age range or year of birth
  • shorten postal code
  • group rare categories together
  • remove unusually unique records
  • suppress small cells in reports
  • add limited noise where appropriate
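The first three transformations in the list can be sketched with the standard library alone. The function names, the 10-year band width, and the "Other" label are illustrative assumptions, not values from the guideline.

```python
from collections import Counter
from datetime import date


def age_band(birth_date, today, width=10):
    """Convert a birth date to an age range such as '40-49'."""
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    age = today.year - birth_date.year - (0 if had_birthday else 1)
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def shorten_postal(postal_code, keep=3):
    """Keep only the first characters of a postal code.

    For Canadian codes the first three characters are the forward sortation area.
    """
    return postal_code.replace(" ", "").upper()[:keep]


def group_rare(values, min_count, other="Other"):
    """Replace categories that occur fewer than min_count times with one label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


print(age_band(date(1980, 4, 12), today=date(2024, 1, 1)))
print(shorten_postal("m5v 2t6"))
print(group_rare(["accountant", "accountant", "harbour pilot"], min_count=2))
```

How coarse to make each field is a judgment call: wider bands and shorter codes lower the risk but also remove detail.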

The guideline says re-identification risk depends on:

  • how vulnerable the data is
  • how likely an attack is

For public release, assume someone will try.

For non-public sharing, contracts, security, and recipient controls matter.
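As a rough illustration of how the two factors interact, the sketch below multiplies a data vulnerability score by an attack likelihood. Treating vulnerability as one over the smallest group size, and combining the factors by multiplication, are simplifying assumptions made here for illustration. The 0.3 likelihood for a vendor is an invented number.

```python
def data_vulnerability(smallest_group_size):
    """Assumed measure: the chance of picking the right person in the smallest group."""
    if smallest_group_size < 1:
        raise ValueError("group size must be at least 1")
    return 1.0 / smallest_group_size


def overall_risk(smallest_group_size, attack_likelihood):
    """Combine vulnerability and attack likelihood (assumed to multiply)."""
    if not 0.0 <= attack_likelihood <= 1.0:
        raise ValueError("attack_likelihood must be between 0 and 1")
    return data_vulnerability(smallest_group_size) * attack_likelihood


# Public release: assume an attempt will happen, so likelihood is 1.0.
public_risk = overall_risk(smallest_group_size=5, attack_likelihood=1.0)

# Non-public sharing: hypothetical likelihood lowered by contracts and security.
vendor_risk = overall_risk(smallest_group_size=5, attack_likelihood=0.3)

print(f"public: {public_risk:.2f}, vendor: {vendor_risk:.2f}")
```

The same dataset carries a lower overall risk when recipient controls reduce the chance of an attack, which is why the two release types are treated differently.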


If you publish data openly, you should assume an attack is certain. That means you must rely mainly on stronger data transformation, not promises or policies.

If data moves to another department, affiliate, consultant, or vendor, the transfer still needs controls. The guideline emphasizes privacy, security, and contractual safeguards for non-public sharing.

The guideline gives practical threshold examples tied to privacy sensitivity:

  • low privacy invasion: risk threshold 0.09, roughly equivalent to groups of at least 11
  • medium: 0.075, roughly 15
  • high: 0.05, roughly 20

For a small business, a simple takeaway is:

Do not publish or share reports where tiny groups can reveal individuals.
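A simple check along these lines flags any group in a report that is smaller than the minimum size for the chosen sensitivity level. The minimum sizes are the ones quoted above; the rows, field names, and function name are hypothetical.

```python
from collections import Counter

# Minimum group sizes as quoted in the text above.
MIN_GROUP_SIZE = {"low": 11, "medium": 15, "high": 20}


def small_cells(rows, fields, sensitivity):
    """Return each combination of fields whose group is below the minimum size."""
    minimum = MIN_GROUP_SIZE[sensitivity]
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return {combo: n for combo, n in counts.items() if n < minimum}


# Hypothetical report data: one large group and one tiny group.
rows = ([{"region": "M5V", "age_band": "40-49"}] * 25
        + [{"region": "K1A", "age_band": "60-69"}] * 3)

flagged = small_cells(rows, ["region", "age_band"], sensitivity="low")
print(flagged)
```

Any flagged group is a candidate for suppression or for merging with a neighbouring group before the report goes out.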


There is always a privacy-versus-utility tradeoff. If you transform too little, privacy risk stays high. If you transform too much, the data may become useless. The goal is an acceptable balance.

Keep a short record of:

  • when it should be reviewed again
  • why you de-identified the data
  • what fields you changed
  • what method you used
  • what threshold you applied
  • who received the data
  • what controls were required
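The record can be as simple as one small structured entry per release. The sketch below uses a dataclass saved as JSON; the field names and sample values are assumptions for illustration, and a spreadsheet row would serve equally well.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass
class DeidentificationRecord:
    purpose: str             # why you de-identified the data
    fields_changed: list     # what fields you changed
    method: str              # what method you used
    threshold: float         # what threshold you applied
    recipients: list         # who received the data
    controls_required: list  # what controls were required
    review_due: str          # when it should be reviewed again (ISO date)


# Hypothetical example entry.
record = DeidentificationRecord(
    purpose="Share quarterly service statistics with a consultant",
    fields_changed=["birth_date", "postal_code", "job_title"],
    method="Generalization and small-cell suppression",
    threshold=0.09,
    recipients=["Example Consulting Inc."],
    controls_required=["data sharing agreement", "access limited to named staff"],
    review_due=date(2027, 1, 15).isoformat(),
)

print(json.dumps(asdict(record), indent=2))
```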

10. Review it again later

The guideline says de-identification decisions are not forever. Risk assessments should usually be reviewed every two to three years, and sooner if something material changes.

  • Removing names is not enough.
  • Think in combinations.
  • Treat public release as high risk.
  • Use contracts and controls for private sharing.
  • Document your method.
  • Review it over time.

With thanks to the Office of the Information and Privacy Commissioner of Ontario for their excellent De-identification Guidelines for Structured Data.