-
Notifications
You must be signed in to change notification settings - Fork 686
[#9448] improvement(catalogs): Validate table location and add unit tests for location checks #9450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| } | ||
|
|
||
| File file = new File(location); | ||
| return file.isAbsolute(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My feeling is that Lance will throw an exception if the path is illegal, we don't have to check here in Gravitino.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides, why do we need to ensure that the path is absolute? My understanding is that relative path should also work (FS will figure out the working directory and how to normalize the relative path), do we need such a strict restriction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is used in registering a table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides, why do we need to ensure that the path is absolute? My understanding is that relative path should also work (FS will figure out the working directory and how to normalize the relative path), do we need such a strict restriction?
You are right, the whole check is not so necessary in the beginning. I also have the same consideration, let hold it until the 1.1.0 is released.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request enhances the Lance catalog implementation by adding validation for table locations to prevent invalid or ambiguous paths from being used during table creation. The changes introduce a new isValidLanceLocation method that validates table locations are either valid URIs with schemes or absolute file paths, along with comprehensive test coverage.
Key Changes
- Added
isValidLanceLocationmethod to validate table locations before table creation - Integrated validation into the
createTablemethod with informative error messages - Added parameterized unit tests covering various valid and invalid location formats
- Added integration test to verify the error handling for invalid locations
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/lance/LanceTableOperations.java |
Implements location validation logic using URI parsing and File.isAbsolute() checks, integrated into table creation flow |
catalogs/catalog-lakehouse-generic/src/test/java/org/apache/gravitino/catalog/lakehouse/lance/TestLanceTableOperations.java |
Adds parameterized unit tests for the validation method covering URIs with schemes, absolute paths, and invalid cases |
lance/lance-rest-server/src/test/java/org/apache/gravitino/lance/integration/test/LanceRESTServiceIT.java |
Adds integration test verifying that table registration with invalid location fails with appropriate error message |
| static Stream<Arguments> pathProvider() { | ||
| return Stream.of( | ||
| Arguments.of("/data/lance/table1", true), | ||
| Arguments.of("hdfs://namenode:8020/data/lance/table2", true), | ||
| Arguments.of("s3a://bucket/data/lance/table3", true), | ||
| Arguments.of("file:///data/lance/table4", true), | ||
| Arguments.of("ftp://server/data/lance/table5", true), | ||
| Arguments.of("invalid/path", false), | ||
| Arguments.of("", false), | ||
| Arguments.of(null, false)); |
Copilot
AI
Dec 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test cases don't include Windows-style absolute paths (e.g., C:/data/lance/table or C:\\data\\lance\\table). While the implementation should handle these correctly via File.isAbsolute(), adding test cases for Windows paths would improve cross-platform test coverage. Consider adding:
Arguments.of("C:/data/lance/table", true)for forward slashArguments.of("C:\\\\data\\\\lance\\\\table", true)for backslash (note: double escaping in Java strings)
| static Stream<Arguments> pathProvider() { | ||
| return Stream.of( | ||
| Arguments.of("/data/lance/table1", true), | ||
| Arguments.of("hdfs://namenode:8020/data/lance/table2", true), | ||
| Arguments.of("s3a://bucket/data/lance/table3", true), | ||
| Arguments.of("file:///data/lance/table4", true), | ||
| Arguments.of("ftp://server/data/lance/table5", true), | ||
| Arguments.of("invalid/path", false), | ||
| Arguments.of("", false), | ||
| Arguments.of(null, false)); |
Copilot
AI
Dec 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Consider adding test cases for edge cases such as:
- Relative paths with special characters (e.g.,
"./path","../path") - Paths with only whitespace (e.g.,
" ") - URI with scheme but invalid syntax (e.g.,
"http://invalid path with spaces")
These would help ensure the validation is robust against various malformed inputs.
| StringUtils.isNotBlank(location), "Table location must be specified"); | ||
|
|
||
| Preconditions.checkArgument( | ||
| isValidLanceLocation(location), String.format("Table location is invalid:'%s'", location)); |
Copilot
AI
Dec 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a spacing issue in the error message format. There should be a space after "invalid:" for better readability. The format should be "Table location is invalid: '%s'" instead of "Table location is invalid:'%s'".
| isValidLanceLocation(location), String.format("Table location is invalid:'%s'", location)); | |
| isValidLanceLocation(location), String.format("Table location is invalid: '%s'", location)); |
|
|
||
| @ParameterizedTest | ||
| @MethodSource("pathProvider") | ||
| void testRegisterWithInvalidLocation(String location, boolean isValid) { |
Copilot
AI
Dec 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test method name testRegisterWithInvalidLocation is misleading because it tests both valid and invalid locations. Consider renaming it to testLocationValidation or testIsValidLanceLocation to better reflect what the test is actually verifying.
| void testRegisterWithInvalidLocation(String location, boolean isValid) { | |
| void testLocationValidation(String location, boolean isValid) { |
What changes were proposed in this pull request?
This pull request improves validation for table locations in the Lance catalog implementation. The main change is the introduction of a stricter check to ensure table locations are valid URIs or absolute file paths, preventing invalid or ambiguous locations from being used. Comprehensive unit and integration tests are also added to verify this behavior.
Validation improvements:
isValidLanceLocationmethod inLanceTableOperationsto validate that table locations are either valid URIs with a scheme or absolute file paths. This method is now used in table creation to enforce correct location formats. [1] [2] [3]Testing enhancements:
TestLanceTableOperationsto cover various valid and invalid location formats, ensuring the new validation logic works as intended. [1] [2] [3]LanceRESTServiceITto assert that registering a table with an invalid location fails with the expected error message.Why are the changes needed?
Better user experience.
Fix: #9448
Does this PR introduce any user-facing change?
N/A.
How was this patch tested?
UTs and ITs