-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Reduce False Positives in Twilio Detector #4516
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Reduce False Positives in Twilio Detector #4516
Conversation
d48eec3 to
7dfc3d6
Compare
camgunz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your analysis here is correct--the profile indicated that regexp was allocating a lot and the O(N2) combined with a very common pattern is probably why that was happening. Let's merge this and be very careful w/ the rollout--I'll sync up w/ you on Slack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This really feels like something we want to have unit test coverage on. Feel free to push back if that's not the style in this area of code, but (as I read it) this begs for a unit test to make sure that the new regex isn't catching the false positives that it was previously.
|
|
||
| // Keywords are used for efficiently pre-filtering chunks. | ||
| // Use identifiers in the secret preferably, or the provider name. | ||
| func (s Scanner) Keywords() []string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really think we need to have unit test coverage the covers this case, especially since it's already been causing problems, just to prevent regressions in the future (and make absolutely clear why we're making this change).
This PR addresses false positive issues in the Twilio detector by making the regex patterns more context-aware and removing an overly generic keyword.
Changes Made
1. Added Context-Aware Regex Patterns
Updated both
sidPatandkeyPatto require contextual keywords within 40 characters of the credential:Why: The previous
keyPatmatched any 32-character hexadecimal string, which is extremely common in codebases (MD5 hashes, commit SHAs, etc.). By requiring proximity to Twilio-related keywords, we significantly reduce false matches while maintaining detection of legitimate credentials.2. Removed "sid" from Keywords
Why: The keyword "sid" is extremely common in code (session IDs, database fields, variable names like
user_sid,request_sid, etc.) and was causing the detector to run unnecessarily on a large percentage of scanned files. Since Twilio Account SIDs always start with "AC" and our regex already requires contextual keywords, keeping only "twilio" as the trigger is sufficient and improves performance.3. Switched to FindAllStringSubmatch
Updated the pattern matching to use
FindAllStringSubmatchinstead ofFindAllString:Why: With the addition of capturing groups in the regex patterns, we need
FindAllStringSubmatchto properly extract just the credential values (capture group [1]) without the surrounding context keywords that are used for filtering.Impact
Testing
Verified that the detector still matches valid Twilio credentials in common formats while filtering out unrelated hex strings and reducing unnecessary detector invocations.
Checklist:
make test-community)?make lintthis requires golangci-lint)?