-
Notifications
You must be signed in to change notification settings - Fork 14
[JSON Validator] Tuple validation, multi-type enum checks #213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JSON Validator] Tuple validation, multi-type enum checks #213
Conversation
|
@adamziel I purposefully pushed the failing test first so that you can see the exact test I added failing in CI. Second commit adds the fix, which makes all tests pass again (except 1 that's stuck in setting up PHP in CI). |
| // In PHP, an associative array like {"type":"string"} is still an array. | ||
| // We need to distinguish between a list of schemas (tuple validation) | ||
| // and a single schema object (list validation). A simple way is to | ||
| // check if the keys are sequential integers from 0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like a good candidate for array_is_list polyfill (not a blocker here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still a bit confused about this logic. It's, effectively:
$is_tuple_validation = array_is_list( $schema['items'] );
$is_list_validation = ! array_is_list( $schema['items'] );Which reads like a contradiction. Also, a "list of schemas (tuple validation)" vs "single schema object (list validation)" confuses me, too. Can we find clearer terms to describe it?
I've looked it up in the JSON schema spec and perhaps this quote would help:
* List validation: a sequence of arbitrary length where each item matches the same schema
* Tuple validation: a sequence of fixed length where each item may have a different schema. In this usage, the index (or location) of each item is meaningful as to how the value is interpreted. (This usage is often given a whole separate type in some programming languages, such as Python's tuple).
Although it tells me that array_is_list() is not the right check here, since we need to confirm we have a specific number of items, each defining a separate schema. It seems like this check needs to be more involved than it is now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adamziel Thanks for the sharp & helpful feedback! I pushed another commit in hopes of making it more readable. Since "list" and "tuple" is how JSON schema spec defined it, I renamed the flag vars to is_tuple_schema after added a detailed explanation in the comments.
I have added a second check inside the tuple validation loop to make it a two-layered approach (first checking the schema's intent, then its integrity) which should fully address the concerns you raised. If I misunderstood, please let me know.
I have added some tuple related to test cases to the data provider of testArrayValidation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And better polyfills available on array_is_list() page. Top voted note has the best one.
Do you want that one added to components/Polyfill/php-functions.php and used here instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ashfame up to you, it doesn't seem like a big deal and maybe it's for the best that the validator doesn't have an implicit dependency on polyfills 🤔
| // If there's no schema at $schema['items'][$idx], it's an "additional item", handled above. | ||
| } | ||
| } else { | ||
| // === LIST VALIDATION (existing logic) === |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LLMs telltale sign: marking a code branch as "existing logic". Once we merge this PR, both branches become existing logic and that comment no longer makes sense. How about:
| // === LIST VALIDATION (existing logic) === | |
| // List validation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very true!
| $is_tuple_validation = is_array( $schema['items'] ) && array_keys( $schema['items'] ) === range( 0, count( $schema['items'] ) - 1 ); | ||
|
|
||
| if ( $is_tuple_validation ) { | ||
| // === TUPLE VALIDATION === |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // === TUPLE VALIDATION === | |
| // Tuple validation |
|
|
||
| if ( $is_tuple_schema ) { | ||
| // Tuple validation. | ||
| // Note: We do not support the "additionalItems" keyword. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be cool to support it eventually. Not a blocker here
|
Thank you @ashfame! |
In my wpcom work, I found the following valid blueprint fixtures were out of date as per blueprints v2 spec. Our validator flagged them, and it now passes with the changes in this PR. Also, note that these fixtures files are currently not used in the repo. Nor they were in the original commit. Seems like an intermediary artifact. With this PR, we can atleast have them to be upto dated & can possibly start using them in tests that we can add. In #213 I have the entire incorrect valid fixture file in the test, but if they are indeed to kept around, it can also use the fixture file.
This PR addresses two related bugs in the schema validator that caused it to throw fatal errors (
UnsupportedSchemaExceptionor aTypeError) when it should have gracefully returned aValidationError.The Problem
The validator would crash when attempting to validate data against a schema that used either of the following valid JSON Schema constructs:
itemskeyword was an array of schemas (e.g.,{"type": "array", "items": [{"type": "string"}, {"type": "object"}]}), the validator would throw anUnsupportedSchemaExceptioninstead of validating the data.typekeyword along with anenum(e.g.,{"type": ["string", "number"]}), the validator's internal integrity check would cause a fatalTypeError.This meant that schemas that are perfectly valid according to the JSON Schema spec were not supported and would break the validation process entirely.
Root Cause Analysis
validate_array()method was only implemented to handle "list validation" (whereitemsis a single schema object). It lacked the logic to handle "tuple validation," causing it to misinterpret the array of schemas and throw an exception.validate_type()was incorrectly callingtype_matches()—which only accepts a string for the type—instead of thetype_matches_any()method, which is designed to handle an array of types.The Solution
This PR implements two fixes to make the validator more compliant with the JSON Schema spec:
validate_array()method now detects when theitemskeyword is an array (a tuple) and applies the correct validation logic, iterating through the data and schema arrays in lockstep.validate_type()has been updated to use thetype_matches_any()method, allowing it to correctly validate schemas that use multiple types.How to Verify
I have added a new unit test,
testItReturnsValidationErrorForInvalidTupleData(), which uses a schema with tuple validation.UnsupportedSchemaException.ValidationErrorobject as expected.