Reduce False Positives in Twilio Detector #4516

shahzadhaider1 · 2025-10-23T12:26:18Z

This PR addresses false positive issues in the Twilio detector by making the regex patterns more context-aware and removing an overly generic keyword.

Changes Made

1. Added Context-Aware Regex Patterns

Updated both sidPat and keyPat to require a contextual keyword within the 40 characters preceding the credential:

// Before
sidPat = regexp.MustCompile(`\bAC[0-9a-f]{32}\b`)
keyPat = regexp.MustCompile(`\b[0-9a-f]{32}\b`)

// After
sidPat = regexp.MustCompile(detectors.PrefixRegex([]string{"twilio", "account", "sid"}) + `\b(AC[0-9a-f]{32})\b`)
keyPat = regexp.MustCompile(detectors.PrefixRegex([]string{"twilio", "auth", "token", "key"}) + `\b([0-9a-f]{32})\b`)

Why: The previous keyPat matched any 32-character hexadecimal string, which is extremely common in codebases (MD5 hashes, UUIDs written without dashes, etc.). By requiring proximity to Twilio-related keywords, we significantly reduce false matches while maintaining detection of legitimate credentials.
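To illustrate the effect, here is a minimal, self-contained sketch. The local `prefixRegex` helper is an assumption standing in for `detectors.PrefixRegex`: it approximates the behavior described above (a case-insensitive keyword followed by up to 40 arbitrary characters) and is not copied from the repository. The helper `extractKeys` and all sample strings are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// prefixRegex is a hypothetical stand-in for detectors.PrefixRegex: a
// case-insensitive keyword followed, lazily, by up to 40 arbitrary characters.
func prefixRegex(keywords []string) string {
	return `(?i:` + strings.Join(keywords, "|") + `)(?:.|[\n\r]){0,40}?`
}

var (
	oldKeyPat = regexp.MustCompile(`\b[0-9a-f]{32}\b`)
	newKeyPat = regexp.MustCompile(prefixRegex([]string{"twilio", "auth", "token", "key"}) + `\b([0-9a-f]{32})\b`)
)

// extractKeys returns the candidate credentials found by pat: capture group 1
// when the pattern has one, otherwise the whole match.
func extractKeys(pat *regexp.Regexp, data string) []string {
	var out []string
	for _, m := range pat.FindAllStringSubmatch(data, -1) {
		out = append(out, m[len(m)-1])
	}
	return out
}

func main() {
	// Made-up sample values, not real credentials.
	checksum := `md5sum: d41d8cd98f00b204e9800998ecf8427e`
	config := `TWILIO_AUTH_TOKEN=0123456789abcdef0123456789abcdef`

	fmt.Println(len(extractKeys(oldKeyPat, checksum))) // old pattern flags the bare hash
	fmt.Println(len(extractKeys(newKeyPat, checksum))) // new pattern finds no keyword nearby
	fmt.Println(extractKeys(newKeyPat, config))        // new pattern still finds the token
}
```

Under these assumptions, the bare hash is a candidate only for the old pattern, while the token next to `TWILIO_AUTH_TOKEN` is still extracted by the new one.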

2. Removed "sid" from Keywords

// Before
func (s Scanner) Keywords() []string {
    return []string{"sid", "twilio"}
}

// After
func (s Scanner) Keywords() []string {
    return []string{"twilio"}
}

Why: The keyword "sid" is extremely common in code (session IDs, database fields, variable names like user_sid, request_sid, etc.) and was causing the detector to run unnecessarily on a large percentage of scanned files. Since Twilio Account SIDs always start with "AC" and our regex already requires contextual keywords, keeping only "twilio" as the trigger is sufficient and improves performance.
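The pre-filtering effect can be sketched as follows. This is a simplified model, not the scanner's actual implementation: `chunkTriggers` is a hypothetical helper that treats keyword matching as a case-insensitive substring check, and the sample chunks are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// chunkTriggers is a hypothetical, simplified model of keyword pre-filtering:
// a detector is considered for a chunk only if the chunk contains at least one
// of its keywords (compared case-insensitively here, which is an assumption).
func chunkTriggers(chunk string, keywords []string) bool {
	lower := strings.ToLower(chunk)
	for _, kw := range keywords {
		if strings.Contains(lower, kw) {
			return true
		}
	}
	return false
}

func main() {
	oldKeywords := []string{"sid", "twilio"}
	newKeywords := []string{"twilio"}

	chunks := []string{
		`SELECT user_sid, request_sid FROM sessions`, // unrelated to Twilio
		`The results were considered final.`,         // "considered" contains "sid"
		`TWILIO_ACCOUNT_SID=placeholder`,             // mentions Twilio
	}
	for _, c := range chunks {
		fmt.Printf("old=%t new=%t %q\n", chunkTriggers(c, oldKeywords), chunkTriggers(c, newKeywords), c)
	}
}
```

In this model, the first two chunks trigger the detector only with the old keyword list, while the Twilio-related chunk triggers it with both.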

3. Switched to FindAllStringSubmatch

Updated the pattern matching to use FindAllStringSubmatch instead of FindAllString:

// Before
keyMatches := keyPat.FindAllString(dataStr, -1)
sidMatches := sidPat.FindAllString(dataStr, -1)

// After
keyMatches := keyPat.FindAllStringSubmatch(dataStr, -1)
sidMatches := sidPat.FindAllStringSubmatch(dataStr, -1)

for _, sidMatch := range sidMatches {
    sid := sidMatch[1] // capture group 1: the SID without the prefix context
    // ...
}

Why: With the addition of capturing groups in the regex patterns, we need FindAllStringSubmatch to properly extract just the credential values (capture group [1]) without the surrounding context keywords that are used for filtering.
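The difference between the two calls can be seen in a small standalone sketch. The pattern below has the same shape as the new sidPat, but the prefix is written inline as an approximation rather than built with `detectors.PrefixRegex`; the helpers `fullMatches` and `capturedSIDs` and the sample SID are made up.

```go
package main

import (
	"fmt"
	"regexp"
)

// Simplified pattern with the same shape as the new sidPat: a context prefix
// followed by one capture group. The inline prefix is an assumption about what
// detectors.PrefixRegex produces.
var sidPat = regexp.MustCompile(`(?i:twilio|account|sid)(?:.|[\n\r]){0,40}?\b(AC[0-9a-f]{32})\b`)

// fullMatches returns whole matches, which include the context prefix.
func fullMatches(data string) []string {
	return sidPat.FindAllString(data, -1)
}

// capturedSIDs returns only capture group 1 of each match.
func capturedSIDs(data string) []string {
	var sids []string
	for _, m := range sidPat.FindAllStringSubmatch(data, -1) {
		sids = append(sids, m[1])
	}
	return sids
}

func main() {
	// Made-up SID, not a real credential.
	data := `twilio_account_sid = "AC0123456789abcdef0123456789abcdef"`
	fmt.Printf("%q\n", fullMatches(data))  // whole match, prefix included
	fmt.Printf("%q\n", capturedSIDs(data)) // SID only
}
```

With `FindAllString`, the keyword and the characters between it and the SID would end up in the value passed on for verification; taking index 1 of each submatch avoids that.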

Impact

Reduces false positives: Only matches hex strings that appear near Twilio-related keywords
Improves performance: Detector runs less frequently by removing the generic "sid" keyword
Maintains detection accuracy: Legitimate Twilio credentials will still be detected as they typically appear with contextual keywords like "twilio_auth_token", "TWILIO_ACCOUNT_SID", etc.

Testing

Verified that the detector still matches valid Twilio credentials in common formats while filtering out unrelated hex strings and reducing unnecessary detector invocations.

Checklist:

Tests passing (make test-community)?
Lint passing (make lint; this requires golangci-lint)?

camgunz

I think your analysis here is correct--the profile indicated that regexp was allocating a lot and the O(N^2) combined with a very common pattern is probably why that was happening. Let's merge this and be very careful w/ the rollout--I'll sync up w/ you on Slack

martinlocklear

This really feels like something we want to have unit test coverage on. Feel free to push back if that's not the style in this area of code, but (as I read it) this begs for a unit test to make sure that the new regex isn't catching the false positives that it was previously.

martinlocklear · 2025-12-10T21:45:12Z

pkg/detectors/twilio/twilio.go


 // Keywords are used for efficiently pre-filtering chunks.
 // Use identifiers in the secret preferably, or the provider name.
 func (s Scanner) Keywords() []string {


I really think we need to have unit test coverage that covers this case, especially since it's already been causing problems, just to prevent regressions in the future (and make absolutely clear why we're making this change).

added contextual keywords in regex and removed unnecessary detector keyword

911dd63

shahzadhaider1 requested a review from a team as a code owner October 23, 2025 12:26

Merge branch 'main' into INS-65-enhance-twilio-detector

5ba3645

shahzadhaider1 changed the title ~~added contextual keywords in regex and removed unnecessary detector k…~~ Reduce False Positives in Twilio Detector Oct 24, 2025

shahzadhaider1 requested a review from a team October 24, 2025 11:45

added more keywords in the regex prefix

7dfc3d6

shahzadhaider1 force-pushed the INS-65-enhance-twilio-detector branch from d48eec3 to 7dfc3d6 Compare October 24, 2025 13:42

camgunz approved these changes Oct 27, 2025

Merge branch 'main' into INS-65-enhance-twilio-detector

594676e

shahzadhaider1 requested a review from a team November 5, 2025 05:07

martinlocklear requested changes Dec 10, 2025
