Advanced Data Scrubbing (Beta)

In addition to using beforeSend in your SDK or our server-side data scrubbing features to redact sensitive data, we are currently beta-testing ways to give you more granular control over server-side data scrubbing of your events. Additional functionality includes:

  • Define custom regular expressions to match on sensitive data
  • Detailed tuning on which parts of an event to scrub
  • Partial removal or hashing of sensitive data instead of deletion

Overview

Advanced Data Scrubbing is available only if your organization is enabled as an Early Adopter. To enable this option, navigate to your organization’s settings and enable the “Early Adopter” option. Turning on this option gives you access to features before their full release, and you can disable it at any time.

Early adopters have access to a new section, Data Privacy Rules, in both the organization settings and the settings of each project. It is located under Data Privacy (or Security and Privacy) in the settings sidebar.

Everything you configure there directly affects all new events, just like the other data privacy settings. However, these rules cannot break or undo any other data privacy settings you may have configured. In other words, you can only accidentally remove too much data, never too little.

If you have any questions related to this feature, feel free to contact us at markus@sentry.io.

A Basic Example

Go to your project- or organization-settings and click Data Privacy (or Security and Privacy) in the sidebar. Scrolling down, you will find a new section Data Privacy Rules.

Click Add Rule. This adds a simple default rule:

[Mask] [credit card numbers] from [    ]

As soon as you click Save, we will attempt to find all credit card numbers in your events going forward and replace them with a series of ******, keeping only the last 4 digits.
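To make the default rule more concrete, here is a minimal Python sketch of masking credit card numbers across a whole event payload while keeping the last four digits. The names CREDIT_CARD_RE, mask_card, and scrub_cards are hypothetical. The pattern is a simplified assumption (13 to 19 digits, optionally separated by single spaces or dashes), not the detector Sentry actually uses, and this sketch keeps separators in place.

```python
import re
from typing import Any

# Assumption: simplified credit card pattern (13-19 digits with optional
# single space/dash separators). Not Sentry's real detector.
CREDIT_CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def mask_card(match: re.Match) -> str:
    """Replace every digit except the last four with '*', keeping separators."""
    text = match.group(0)
    total_digits = sum(ch.isdigit() for ch in text)
    digits_seen = 0
    out = []
    for ch in text:
        if ch.isdigit():
            digits_seen += 1
            out.append(ch if digits_seen > total_digits - 4 else "*")
        else:
            out.append(ch)
    return "".join(out)


def scrub_cards(value: Any) -> Any:
    """Recursively mask card-like substrings in every string of a JSON-like event."""
    if isinstance(value, str):
        return CREDIT_CARD_RE.sub(mask_card, value)
    if isinstance(value, dict):
        return {k: scrub_cards(v) for k, v in value.items()}
    if isinstance(value, list):
        return [scrub_cards(v) for v in value]
    return value  # numbers, booleans, and null pass through unchanged


event = {"extra": {"note": "paid with 4111 1111 1111 1111"}}
print(scrub_cards(event))
# {'extra': {'note': 'paid with **** **** **** 1111'}}
```

Because the rule has no source, the sketch walks every string in the payload, which mirrors a rule that applies to the whole event.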

Rules generally consist of three parts:

Methods

  • Remove: Remove the entire field. We may choose to either set it to null, remove it entirely or replace it with an empty string depending on technical constraints.
  • Mask: Replace all characters with *. For credit card numbers, this replaces everything but the last 4 digits.
  • Hash: Replace the matched substring with a hashed value.
  • Replace: Replace the matched substring with a constant placeholder value such as [Filtered] or [creditcard]. Right now this value cannot be configured.
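The following Python sketch contrasts the four methods on a single regex match. Every name in it is hypothetical, and SHA-1 and the "[Filtered]" placeholder are assumptions chosen for illustration; Sentry's actual hash function and placeholder text may differ. Note that Remove acts on the whole field, while the other three rewrite only the matched substring.

```python
import hashlib
import re
from typing import Optional

# Hypothetical helpers, one per method. SHA-1 and "[Filtered]" are
# assumptions; Sentry's real hash and placeholder may differ.


def mask(value: str) -> str:
    # Mask: every character becomes '*', keeping the original length.
    return "*" * len(value)


def hash_value(value: str) -> str:
    # Hash: a deterministic hex digest, so equal inputs map to equal outputs.
    return hashlib.sha1(value.encode("utf-8")).hexdigest()


def replace(_value: str, placeholder: str = "[Filtered]") -> str:
    # Replace: a constant placeholder, independent of the input.
    return placeholder


def apply_method(text: str, pattern: "re.Pattern[str]", method: str) -> Optional[str]:
    """Apply one method to every match of `pattern` inside `text`."""
    if method == "remove":
        # Remove drops the whole field; None stands in for null/removal.
        return None if pattern.search(text) else text
    transforms = {"mask": mask, "hash": hash_value, "replace": replace}
    return pattern.sub(lambda m: transforms[method](m.group(0)), text)


secret = re.compile(r"abc\d+")
print(apply_method("token=abc123", secret, "mask"))     # token=******
print(apply_method("token=abc123", secret, "replace"))  # token=[Filtered]
print(apply_method("token=abc123", secret, "remove"))   # None
```

Hashing is deterministic, so two events carrying the same secret still carry the same hashed value, which keeps them comparable without revealing the original.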

Data Types

  • Regex Matches: Custom Perl-style regex (PCRE).
  • Credit Card Numbers: Any substrings that look like credit card numbers.
  • Password Fields: Any substrings that look like they may contain passwords, such as any string that mentions passwords, auth tokens, or credentials, and any variable called password or auth.
  • IP Addresses: Any substrings that look like valid IPv4 or IPv6 addresses.
  • IMEI Numbers: Any substrings that look like an IMEI or IMEISV.
  • Email Addresses
  • UUIDs
  • PEM Keys: Any substrings that look like the content of a PEM-keyfile.
  • Auth in URLs: Usernames and passwords in URLs like https://user:pass@example.com/foo.
  • US social security numbers: 9-digit US Social Security numbers.
  • Usernames in filepaths: For example myuser in /Users/myuser/file.txt, C:/Users/myuser/file.txt, C:/Documents and Settings/myuser/file.txt, /home/myuser/file.txt, …
  • MAC Addresses
  • Anything: Matches any value. This is useful if you want to remove a certain JSON key by path using Sources regardless of the value.
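To illustrate what "looks like" means for a few of these data types, here is a Python sketch that detects them with regular expressions. The PATTERNS table and find_sensitive are hypothetical, and the patterns are deliberately simplified assumptions, not Sentry's built-in detectors. For example, the SSN pattern assumes the dashed 123-45-6789 form.

```python
import re
from typing import List, Tuple

# Illustrative, simplified patterns; assumptions, not Sentry's detectors.
PATTERNS = {
    "ip": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "uuid": re.compile(
        r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
    ),
    # user:pass between "://" and "@" in a URL
    "url_auth": re.compile(r"(?<=://)[^/\s:@]+:[^/\s@]+(?=@)"),
    # Assumes the dashed 123-45-6789 format
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # Group 1 captures only the username segment of the path
    "path_username": re.compile(
        r"(?:/Users/|/home/|C:/Users/|C:/Documents and Settings/)([^/\\]+)"
    ),
}


def find_sensitive(text: str) -> List[Tuple[str, str]]:
    """Return (data type, matched substring) pairs found in `text`."""
    hits = []
    for name, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            # Report the capture group when the pattern defines one.
            hits.append((name, m.group(1) if pattern.groups else m.group(0)))
    return hits


print(find_sensitive("login from 10.0.0.1 by /home/myuser"))
# [('ip', '10.0.0.1'), ('path_username', 'myuser')]
```

Simple patterns like these overlap: the email pattern also matches pass@example.com inside https://user:pass@example.com, which is one reason rules let you pick a single data type and a source.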

Sources

Sources (also called selectors) allow you to restrict rules to certain parts of the event. This is useful for unconditionally removing certain data by event attribute, and for conservatively testing rules on real data. A few examples:

  • ** to scrub everything
  • $error.value to scrub in the exception message
  • $message to scrub the event-level log message
  • extra.'My Value' to scrub the key My Value in “Additional Data”
  • extra.** to scrub everything in “Additional Data”
  • $http.headers.x-custom-token to scrub the request header X-Custom-Token
  • $user.ip_address to scrub the user’s IP address
  • $frame.vars.foo to scrub a stack trace frame variable called foo
  • contexts.device.timezone to scrub a key from the Device context
  • tags.server_name to scrub the tag server_name

All key names are treated case-insensitively.

Advanced source names

Data scrubbing always works on the raw event payload. Keep in mind that some fields have different names in the UI than in the JSON schema. When viewing an event, use the “JSON” link to see the payload exactly as the data scrubber sees it.

For example, what is called “Additional Data” in the UI is called extra in the event payload. To remove a specific key called foo, you would write:

[Remove] [Anything] from [extra.foo]

As another example, Sentry distinguishes two kinds of error messages: the exception message and the top-level log message. Here is what such an event payload looks like as sent by the SDK (and as downloadable from the UI):

{
  "logentry": {
    "formatted": "Failed to roll out the dinglebop"
  },
  "exception": {
    "values": [
      {
        "type": "ZeroDivisionError",
        "value": "integer division or modulo by zero"
      }
    ]
  }
}

Since the “error message” is taken from the exception’s value, and the “message” is taken from logentry, we would have to write the following to remove both from the event:

[Remove] [Anything] from [exception.values.*.value]
[Remove] [Anything] from [logentry.formatted]
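Applying these two Remove rules to the payload above can be sketched in Python. The helper remove_path is hypothetical: it walks dotted paths where * matches any single key or list item, compares keys case-insensitively, and splits on plain dots, so quoted keys are not supported. The same helper handles the earlier [Remove] [Anything] from [extra.foo] rule.

```python
import copy
from typing import Any, List


def remove_path(node: Any, parts: List[str]) -> None:
    """Delete whatever `parts` selects, in place (hypothetical helper).

    '*' matches any single key or list item; keys compare case-insensitively.
    """
    if not parts:
        return
    head, rest = parts[0], parts[1:]
    if isinstance(node, dict):
        keys = [k for k in node if head == "*" or k.lower() == head.lower()]
        for key in keys:
            if rest:
                remove_path(node[key], rest)
            else:
                del node[key]
    elif isinstance(node, list) and head == "*":
        for i, item in enumerate(node):
            if rest:
                remove_path(item, rest)
            else:
                node[i] = None  # "Remove" may null out a value instead


payload = {
    "logentry": {"formatted": "Failed to roll out the dinglebop"},
    "exception": {
        "values": [
            {"type": "ZeroDivisionError", "value": "integer division or modulo by zero"}
        ]
    },
}

scrubbed = copy.deepcopy(payload)
for rule in ["exception.values.*.value", "logentry.formatted"]:
    remove_path(scrubbed, rule.split("."))
print(scrubbed)
# {'logentry': {}, 'exception': {'values': [{'type': 'ZeroDivisionError'}]}}
```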

Boolean Logic

You can combine sources using boolean logic.

  • Prefix with ! to invert the source. foo matches the JSON key foo, while !foo matches everything but foo.
  • Build the conjunction (AND) using &&. For example, foo && !extra.foo matches the key foo except when it is inside extra.
  • Build the disjunction (OR) using ||. For example, foo || bar matches foo or bar.
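One way to picture these operators is as combinators over path predicates, sketched in Python below. The names key, not_, and and_/or_ are hypothetical. The leaf semantics are an assumption inferred from the foo && !extra.foo example: a leaf matches when its segments equal the trailing segments of the path, so foo matches a key named foo at any depth.

```python
from typing import Callable, Sequence

Path = Sequence[str]
Matcher = Callable[[Path], bool]


def key(selector: str) -> Matcher:
    """Leaf selector such as "foo" or "extra.foo" (hypothetical).

    Assumption: a leaf matches when its segments equal the trailing
    segments of the path, compared case-insensitively.
    """
    parts = [p.lower() for p in selector.split(".")]

    def match(path: Path) -> bool:
        if len(path) < len(parts):
            return False
        return [p.lower() for p in path[len(path) - len(parts):]] == parts

    return match


def not_(m: Matcher) -> Matcher:
    # !m
    return lambda path: not m(path)


def and_(a: Matcher, b: Matcher) -> Matcher:
    # a && b
    return lambda path: a(path) and b(path)


def or_(a: Matcher, b: Matcher) -> Matcher:
    # a || b
    return lambda path: a(path) or b(path)


# foo && !extra.foo
rule = and_(key("foo"), not_(key("extra.foo")))
print(rule(["contexts", "foo"]))  # True
print(rule(["extra", "foo"]))     # False
```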

Wildcards

  • ** matches all subpaths, so that foo.** matches all JSON keys within foo.
  • * matches a single path item, so that foo.* matches all JSON keys one level below foo.
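The two wildcards can be sketched as a small recursive matcher in Python. The function matches is hypothetical and, for simplicity, anchors the selector at the event root. Whether ** may also match zero path items is not stated above; this sketch assumes it matches one or more.

```python
from typing import Sequence, Union

PathItem = Union[str, int]


def matches(selector: Sequence[str], path: Sequence[PathItem]) -> bool:
    """Match selector segments against a full path (hypothetical sketch).

    '*' matches exactly one path item; '**' matches one or more items
    (assumption). Keys compare case-insensitively.
    """
    if not selector:
        return not path
    head, rest = selector[0], selector[1:]
    if head == "**":
        # Consume one item, then either stop here or keep consuming.
        return bool(path) and (matches(rest, path[1:]) or matches(selector, path[1:]))
    if not path:
        return False
    if head == "*" or head.lower() == str(path[0]).lower():
        return matches(rest, path[1:])
    return False


print(matches("foo.**".split("."), ["foo", "a", "b"]))  # True
print(matches("foo.*".split("."), ["foo", "a", "b"]))   # False
```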

Value Types

Select subsections by JSON type using the following:

  • $string matches any string value
  • $number matches any integer or float value
  • $datetime matches any field in the event that represents a timestamp
  • $array matches any JSON array value
  • $object matches any JSON object
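As a rough illustration of the JSON-type selectors, the following Python sketch classifies decoded JSON values and collects the paths that a given type selector would hit. The names value_type and select_by_type are hypothetical. $datetime is left out because it depends on which fields the event schema treats as timestamps, not on the JSON type of the value. Excluding booleans from $number is an assumption based on JSON booleans not being numbers.

```python
from typing import Any, List, Optional, Tuple, Union

PathItem = Union[str, int]


def value_type(value: Any) -> Optional[str]:
    """Map a decoded JSON value to the matching $-type selector (sketch)."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; treat JSON booleans as non-numbers.
        return None
    if isinstance(value, str):
        return "$string"
    if isinstance(value, (int, float)):
        return "$number"
    if isinstance(value, list):
        return "$array"
    if isinstance(value, dict):
        return "$object"
    return None


def select_by_type(
    node: Any, wanted: str, path: Tuple[PathItem, ...] = ()
) -> List[Tuple[PathItem, ...]]:
    """Collect the paths of all values in `node` whose type matches `wanted`."""
    found = [path] if value_type(node) == wanted else []
    if isinstance(node, dict):
        for k, v in node.items():
            found += select_by_type(v, wanted, path + (k,))
    elif isinstance(node, list):
        for i, v in enumerate(node):
            found += select_by_type(v, wanted, path + (i,))
    return found


event = {"message": "hi", "extra": {"count": 3, "ok": True, "ids": [1, 2]}}
print(select_by_type(event, "$number"))
# [('extra', 'count'), ('extra', 'ids', 0), ('extra', 'ids', 1)]
```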

Select known parts of the schema using the following:

  • $error matches a single exception instance in {"exception": {"values": [...]}}
  • $stack matches a stack trace instance
  • $frame matches a frame in a stack trace
  • $http matches the HTTP request context of an event
  • $user matches the user context of an event
  • $message matches the top-level log message in {"logentry": {"formatted": ...}}
  • $logentry matches the logentry attribute of an event
  • $thread matches a single thread instance in {"threads": {"values": [...]}}
  • $breadcrumb matches a single breadcrumb in {"breadcrumbs": {"values": [...]}}
  • $span matches a trace span
  • $sdk matches the SDK context in {"sdk": ...}
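To relate these schema selectors to the raw payload, here is a Python sketch that rewrites a leading $-selector into concrete path patterns, so that $error.value becomes the exception.values.*.value path used earlier. SCHEMA_PATHS and expand are hypothetical, and the table is an incomplete approximation: Sentry presumably resolves these selectors through its event schema rather than by textual expansion, and the real set of locations is larger.

```python
from typing import Dict, List

# Approximate payload locations for some schema selectors.
# Assumption: incomplete, for illustration only.
SCHEMA_PATHS: Dict[str, List[str]] = {
    "$error": ["exception.values.*"],
    "$message": ["logentry.formatted"],
    "$logentry": ["logentry"],
    "$http": ["request"],
    "$user": ["user"],
    "$sdk": ["sdk"],
    "$thread": ["threads.values.*"],
    "$breadcrumb": ["breadcrumbs.values.*"],
    "$stack": ["exception.values.*.stacktrace", "threads.values.*.stacktrace"],
    "$frame": [
        "exception.values.*.stacktrace.frames.*",
        "threads.values.*.stacktrace.frames.*",
    ],
}


def expand(selector: str) -> List[str]:
    """Rewrite a leading $-selector into concrete path patterns."""
    head, sep, rest = selector.partition(".")
    if head not in SCHEMA_PATHS:
        return [selector]  # plain paths pass through unchanged
    return [base + sep + rest for base in SCHEMA_PATHS[head]]


print(expand("$error.value"))  # ['exception.values.*.value']
print(expand("$frame.vars.foo"))
```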

Escaping Special Characters

If the object key you want to match contains whitespace or special characters, you can use quotes to escape it:

[Remove] [Anything] from [extra.'my special value']

This matches the key my special value in Additional Data.

To escape ' (single quote) within the quotes, replace it with '' (two quotes):

[Remove] [Anything] from [extra.'my special '' value']

This matches the key my special ' value in Additional Data.
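The quoting rules can be sketched as a small Python parser that splits a selector into keys. The function parse_selector is hypothetical. It treats text inside single quotes as one key, reads '' inside quotes as a literal single quote, and expects a dot between segments.

```python
from typing import List


def parse_selector(selector: str) -> List[str]:
    """Split a dotted selector into keys, honoring '...' quoting (sketch)."""
    parts: List[str] = []
    i, n = 0, len(selector)
    while i < n:
        if selector[i] == "'":
            # Quoted key: read until a quote that is not doubled.
            i += 1
            buf = []
            while True:
                if i >= n:
                    raise ValueError("unterminated quote")
                if selector[i] == "'":
                    if i + 1 < n and selector[i + 1] == "'":
                        buf.append("'")  # '' is an escaped single quote
                        i += 2
                        continue
                    i += 1  # closing quote
                    break
                buf.append(selector[i])
                i += 1
            parts.append("".join(buf))
        else:
            # Unquoted key: runs until the next dot or end of input.
            end = selector.find(".", i)
            end = n if end == -1 else end
            parts.append(selector[i:end])
            i = end
        if i < n:
            if selector[i] != ".":
                raise ValueError(f"expected '.' at position {i}")
            i += 1
    return parts


print(parse_selector("extra.'my special '' value'"))
# ['extra', "my special ' value"]
```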