@yuqi1129
Contributor

What changes were proposed in this pull request?

This pull request improves validation for table locations in the Lance catalog implementation. The main change is the introduction of a stricter check to ensure table locations are valid URIs or absolute file paths, preventing invalid or ambiguous locations from being used. Comprehensive unit and integration tests are also added to verify this behavior.

Validation improvements:

  • Added a new isValidLanceLocation method in LanceTableOperations to validate that table locations are either valid URIs with a scheme or absolute file paths. This method is now used in table creation to enforce correct location formats.
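A minimal sketch of the check described above, assuming it parses the location with java.net.URI, accepts any URI that has a scheme, and otherwise falls back to java.io.File.isAbsolute() (the approach the Copilot overview attributes to the PR). The class name LanceLocationCheck is hypothetical, and this is not the PR's actual code:

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical class: a sketch of the "URI with scheme, else absolute path" rule.
public class LanceLocationCheck {

  public static boolean isValidLanceLocation(String location) {
    // Null or blank locations are never valid.
    if (location == null || location.trim().isEmpty()) {
      return false;
    }
    try {
      // A parseable URI with a scheme (hdfs://, s3a://, file://, ...) is accepted.
      if (new URI(location).getScheme() != null) {
        return true;
      }
    } catch (URISyntaxException e) {
      // Not a legal URI (for example, unescaped spaces or backslashes);
      // fall through to the plain file-path check.
    }
    // Otherwise require an absolute path; File.isAbsolute() is platform-dependent.
    return new File(location).isAbsolute();
  }

  public static void main(String[] args) {
    String[] samples = {
      "hdfs://namenode:8020/data/lance/table2", "s3a://bucket/data/lance/table3", "invalid/path", ""
    };
    for (String s : samples) {
      System.out.println("'" + s + "' -> " + isValidLanceLocation(s));
    }
  }
}
```

Under this sketch the two URIs are accepted and "invalid/path" and "" are rejected on every platform, while a bare path such as "/data/lance/table1" is accepted only where File.isAbsolute() treats a leading "/" as absolute (Linux and macOS).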

Testing enhancements:

  • Added parameterized unit tests in TestLanceTableOperations to cover various valid and invalid location formats, ensuring the new validation logic works as intended.
  • Updated integration test LanceRESTServiceIT to assert that registering a table with an invalid location fails with the expected error message.

Why are the changes needed?

To reject invalid table locations up front with a clear error message, improving the user experience.

Fix: #9448

Does this PR introduce any user-facing change?

N/A.

How was this patch tested?

Unit tests (UTs) and integration tests (ITs).

Copilot AI review requested due to automatic review settings December 10, 2025 12:52
}

File file = new File(location);
return file.isAbsolute();
Contributor

My feeling is that Lance will throw an exception if the path is illegal, so we don't need to check it here in Gravitino.

Contributor

Besides, why do we need to ensure that the path is absolute? My understanding is that a relative path should also work (the file system will resolve it against the working directory and normalize it). Do we need such a strict restriction?
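To illustrate the point about relative paths, a minimal sketch using java.nio.file (plain JDK code, not Gravitino or Lance code) shows how a relative location is resolved against the JVM's working directory and normalized. The class name RelativeLocationDemo is hypothetical, and whether Lance resolves relative locations the same way is an assumption here:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical demo: how a relative location resolves on the default file system.
public class RelativeLocationDemo {

  public static Path resolve(String location) {
    // toAbsolutePath() resolves against the default directory (the JVM's working
    // directory, i.e. the "user.dir" property); normalize() drops "." and ".." segments.
    return Paths.get(location).toAbsolutePath().normalize();
  }

  public static void main(String[] args) {
    System.out.println("working dir: " + System.getProperty("user.dir"));
    System.out.println("resolved:    " + resolve("./data/../lance/table1"));
  }
}
```

The catch, and one possible reason to be strict, is that the result depends on wherever the server process happens to be started.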

Contributor Author

It is used when registering a table.

Contributor Author

You are right; the whole check may not have been necessary in the first place. I have the same concern, so let's hold this until 1.1.0 is released.

Copilot AI left a comment

Pull request overview

This pull request enhances the Lance catalog implementation by adding validation for table locations to prevent invalid or ambiguous paths from being used during table creation. The changes introduce a new isValidLanceLocation method that checks that table locations are either valid URIs with schemes or absolute file paths, along with comprehensive test coverage.

Key Changes

  • Added isValidLanceLocation method to validate table locations before table creation
  • Integrated validation into the createTable method with informative error messages
  • Added parameterized unit tests covering various valid and invalid location formats
  • Added integration test to verify the error handling for invalid locations

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

  • catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/lance/LanceTableOperations.java: Implements location validation logic using URI parsing and File.isAbsolute() checks, integrated into table creation flow
  • catalogs/catalog-lakehouse-generic/src/test/java/org/apache/gravitino/catalog/lakehouse/lance/TestLanceTableOperations.java: Adds parameterized unit tests for the validation method covering URIs with schemes, absolute paths, and invalid cases
  • lance/lance-rest-server/src/test/java/org/apache/gravitino/lance/integration/test/LanceRESTServiceIT.java: Adds integration test verifying that table registration with invalid location fails with appropriate error message

Comment on lines +103 to +112
static Stream<Arguments> pathProvider() {
return Stream.of(
Arguments.of("/data/lance/table1", true),
Arguments.of("hdfs://namenode:8020/data/lance/table2", true),
Arguments.of("s3a://bucket/data/lance/table3", true),
Arguments.of("file:///data/lance/table4", true),
Arguments.of("ftp://server/data/lance/table5", true),
Arguments.of("invalid/path", false),
Arguments.of("", false),
Arguments.of(null, false));

Copilot AI Dec 10, 2025

The test cases don't include Windows-style absolute paths (e.g., C:/data/lance/table or C:\data\lance\table). The implementation may handle these via File.isAbsolute(), but that check treats drive-letter paths as absolute only on Windows, so adding test cases for Windows paths would improve cross-platform test coverage. Consider adding:

  • Arguments.of("C:/data/lance/table", true) for forward slash
  • Arguments.of("C:\\data\\lance\\table", true) for backslash (note the escaped backslashes in the Java string literal; File.isAbsolute() returns true for this path only on Windows)
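A short sketch of why Windows-style paths deserve explicit test cases, assuming the URI-then-File.isAbsolute() approach described in the overview: java.net.URI parses "C:/data/lance/table" as a URI whose scheme is "C", while the backslash form is not a legal URI and would fall back to File.isAbsolute(), whose answer depends on the operating system. The class and method names are hypothetical:

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo of how Windows-style locations look to java.net.URI and java.io.File.
public class WindowsPathDemo {

  // Returns the URI scheme, or null if there is none or the string is not a legal URI.
  public static String schemeOf(String location) {
    try {
      return new URI(location).getScheme();
    } catch (URISyntaxException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    String forward = "C:/data/lance/table";
    String backslash = "C:\\data\\lance\\table"; // Java literal for C:\data\lance\table

    // The drive letter is parsed as a one-letter URI scheme.
    System.out.println(schemeOf(forward));
    // Backslashes are illegal URI characters, so parsing fails and this prints null.
    System.out.println(schemeOf(backslash));
    // Fallback file check: true on Windows, false on Linux and macOS.
    System.out.println(new File(backslash).isAbsolute());
  }
}
```

So under a scheme-based check the forward-slash form would pass on every platform, while the backslash form would pass only on Windows; a test expecting true for it would fail on a Linux CI runner.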

Copilot uses AI. Check for mistakes.
Comment on lines +103 to +112

Copilot AI Dec 10, 2025

[nitpick] Consider adding test cases for edge cases such as:

  • Relative paths with dot segments (e.g., "./path", "../path")
  • Paths containing only whitespace (e.g., " ")
  • A URI with a scheme but invalid syntax (e.g., "http://invalid path with spaces")

These would help ensure the validation is robust against various malformed inputs.
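A minimal sketch of how these edge cases look to java.net.URI, assuming validation parses the string as a URI first: "./path" and "../path" parse without a scheme (so they would fall through to an absolute-path check), a whitespace-only string is blank, and "http://invalid path with spaces" is not a legal URI at all. The class EdgeCaseDemo and its classify method are hypothetical:

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper that labels how java.net.URI sees each edge-case input.
public class EdgeCaseDemo {

  public static String classify(String location) {
    if (location.trim().isEmpty()) {
      return "blank";
    }
    try {
      URI uri = new URI(location);
      return uri.getScheme() == null ? "no-scheme" : "scheme:" + uri.getScheme();
    } catch (URISyntaxException e) {
      // Thrown for illegal characters such as unescaped spaces.
      return "illegal-uri";
    }
  }

  public static void main(String[] args) {
    String[] inputs = {"./path", "../path", " ", "http://invalid path with spaces"};
    for (String input : inputs) {
      System.out.println("'" + input + "' -> " + classify(input));
    }
  }
}
```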

Preconditions.checkArgument(
StringUtils.isNotBlank(location), "Table location must be specified");

Preconditions.checkArgument(
isValidLanceLocation(location), String.format("Table location is invalid:'%s'", location));

Copilot AI Dec 10, 2025

There's a spacing issue in the error message format. There should be a space after "invalid:" for better readability. The format should be "Table location is invalid: '%s'" instead of "Table location is invalid:'%s'".

Suggested change
isValidLanceLocation(location), String.format("Table location is invalid:'%s'", location));
isValidLanceLocation(location), String.format("Table location is invalid: '%s'", location));
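A minimal, Guava-free sketch of the two guards quoted above, assuming they run in this order and throw IllegalArgumentException as Guava's Preconditions.checkArgument does. The names LocationGuard and requireValidLocation are hypothetical, and the validator is passed in as a Predicate so the sketch does not depend on the real isValidLanceLocation:

```java
import java.util.function.Predicate;

// Hypothetical guard mirroring the blank check and the validity check.
public class LocationGuard {

  public static void requireValidLocation(String location, Predicate<String> isValid) {
    // First guard: the location must be present and non-blank.
    if (location == null || location.trim().isEmpty()) {
      throw new IllegalArgumentException("Table location must be specified");
    }
    // Second guard: the location must pass validation (note the space after the colon).
    if (!isValid.test(location)) {
      throw new IllegalArgumentException(
          String.format("Table location is invalid: '%s'", location));
    }
  }

  public static void main(String[] args) {
    // Stand-in validator for the demo only: accept strings containing "://".
    Predicate<String> hasScheme = s -> s.contains("://");
    try {
      requireValidLocation("invalid/path", hasScheme);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // Table location is invalid: 'invalid/path'
    }
  }
}
```

Guava's checkArgument also has an overload that takes a message template with %s placeholders plus arguments; it formats the message only when the check fails, which avoids calling String.format on every valid request.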

@ParameterizedTest
@MethodSource("pathProvider")
void testRegisterWithInvalidLocation(String location, boolean isValid) {

Copilot AI Dec 10, 2025

The test method name testRegisterWithInvalidLocation is misleading because it tests both valid and invalid locations. Consider renaming it to testLocationValidation or testIsValidLanceLocation to better reflect what the test is actually verifying.

Suggested change
void testRegisterWithInvalidLocation(String location, boolean isValid) {
void testLocationValidation(String location, boolean isValid) {

Development

Successfully merging this pull request may close these issues.

[Improvement] Lance Rest Server: precheck the invalid location

2 participants