[INS-206] Store Gitlab Project ID in secret location metadata #4601
Description:
This PR adds GitLab project details (project ID, project name, and project owner name) to the metadata of a chunk.
Achieving this wasn't straightforward because the chunks are generated by the git source, which calls an injected `SourceMetadataFunc` to create the metadata. The signature of this function does not include the GitLab project ID.

The solution implemented here is to maintain a `repoToProjectCache` map that stores the project details for each repo. This cache is populated as we enumerate the repos, and the callback function uses it to fill in the project details.

The only concerning part of this solution is memory usage. To put the impact in perspective, I found a public organization with ~3000 projects and ran benchmarks before and after making the changes. (I know this isn't anywhere close to the largest organizations we've come across, but it's the biggest public one I could find.)
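The lookup pattern behind `repoToProjectCache` can be illustrated with a minimal sketch. It assumes the cache is keyed by repository URL, and it uses hypothetical names (`projectDetails`, `chunkMetadata`, `newMetadataFunc`) that are not taken from the PR. The point it shows is that the metadata callback keeps a signature with no GitLab-specific parameters and recovers the project details through a closure over the cache, which is filled in during enumeration.

```go
package main

import (
	"fmt"
	"sync"
)

// projectDetails holds the GitLab project fields attached to chunk metadata.
// Hypothetical type, for illustration only.
type projectDetails struct {
	ID        int
	Name      string
	OwnerName string
}

// repoToProjectCache maps a repository URL to its GitLab project details.
// The RWMutex guards the map in case enumeration and scanning run concurrently.
type repoToProjectCache struct {
	mu    sync.RWMutex
	cache map[string]projectDetails
}

func newRepoToProjectCache() *repoToProjectCache {
	return &repoToProjectCache{cache: make(map[string]projectDetails)}
}

// set records the project for a repo; called while enumerating projects.
func (c *repoToProjectCache) set(repoURL string, p projectDetails) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.cache[repoURL] = p
}

// get returns the project for a repo and whether it was present.
func (c *repoToProjectCache) get(repoURL string) (projectDetails, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	p, ok := c.cache[repoURL]
	return p, ok
}

// chunkMetadata is a simplified, hypothetical stand-in for a chunk's metadata.
type chunkMetadata struct {
	File         string
	Repository   string
	ProjectID    int
	ProjectName  string
	ProjectOwner string
}

// newMetadataFunc returns a callback whose signature knows nothing about GitLab;
// it fills in project details by looking the repository up in the cache.
func newMetadataFunc(c *repoToProjectCache) func(file, repository string) chunkMetadata {
	return func(file, repository string) chunkMetadata {
		md := chunkMetadata{File: file, Repository: repository}
		if p, ok := c.get(repository); ok {
			md.ProjectID = p.ID
			md.ProjectName = p.Name
			md.ProjectOwner = p.OwnerName
		}
		return md
	}
}

func main() {
	cache := newRepoToProjectCache()
	// Populated during enumeration; these values are made up for illustration.
	repo := "https://gitlab.com/example-group/example-repo.git"
	cache.set(repo, projectDetails{ID: 42, Name: "example-repo", OwnerName: "example-owner"})

	metaFn := newMetadataFunc(cache)
	md := metaFn("config.yaml", repo)
	fmt.Println(md.ProjectID, md.ProjectName, md.ProjectOwner)
}
```

Running this sketch prints `42 example-repo example-owner`. A repo missing from the cache simply yields zero-valued project fields rather than an error, which is one possible choice; the actual PR may handle misses differently.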
Results before the change:
Total memory usage: 43234992 bytes (43.23 MB)
Results after the change:
Total memory usage: 44666216 bytes (44.67 MB)
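To quantify the difference between the two runs, the small Go program below computes the absolute increase, the relative increase, and a rough per-project cost. The project count of 3000 is the approximate figure given above, so the per-project number is only an estimate; `memoryDelta` is a hypothetical helper written for this calculation.

```go
package main

import "fmt"

// memoryDelta returns the absolute increase in bytes, the relative increase
// in percent, and the average extra bytes per project.
func memoryDelta(before, after, projects int) (int, float64, float64) {
	delta := after - before
	pct := float64(delta) / float64(before) * 100
	perProject := float64(delta) / float64(projects)
	return delta, pct, perProject
}

func main() {
	// Byte counts from the benchmark runs; 3000 is the approximate project count.
	delta, pct, perProject := memoryDelta(43234992, 44666216, 3000)
	fmt.Printf("increase: %d bytes (%.2f MB, %.2f%%)\n", delta, float64(delta)/1e6, pct)
	fmt.Printf("per project: ~%.0f bytes\n", perProject)
}
```

This prints an increase of 1431224 bytes (1.43 MB, 3.31%), or roughly 477 bytes per cached project. Because the benchmark does not perform a full scan, these figures describe the cache overhead in isolation rather than end-to-end memory usage.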
Note that this benchmark doesn't perform a full scan (I tried that first, but it would have taken hours). Instead, I tweaked the code to expose the callback function and called it directly after enumerating the repos. This is the code used for benchmarking:
Checklist:
- Tests passing (`make test-community`)?
- Lint passing (`make lint`; this requires golangci-lint)?