Skip to content

Conversation

@ashfame
Copy link
Member

@ashfame ashfame commented Dec 4, 2025

This PR addresses two related bugs in the schema validator that caused it to throw fatal errors (UnsupportedSchemaException or a TypeError) when it should have gracefully returned a ValidationError.

The Problem

The validator would crash when attempting to validate data against a schema that used either of the following valid JSON Schema constructs:

  1. Tuple Validation: When an array's items keyword was an array of schemas (e.g., {"type": "array", "items": [{"type": "string"}, {"type": "object"}]}), the validator would throw an UnsupportedSchemaException instead of validating the data.
  2. Multi-type Enums: When a schema used an array for the type keyword along with an enum (e.g., {"type": ["string", "number"]}), the validator's internal integrity check would cause a fatal TypeError.

This meant that schemas that are perfectly valid according to the JSON Schema spec were not supported and would break the validation process entirely.

Root Cause Analysis

  1. The validate_array() method was only implemented to handle "list validation" (where items is a single schema object). It lacked the logic to handle "tuple validation," causing it to misinterpret the array of schemas and throw an exception.
  2. The schema integrity check inside validate_type() was incorrectly calling type_matches()—which only accepts a string for the type—instead of the type_matches_any() method, which is designed to handle an array of types.

The Solution

This PR implements two fixes to make the validator more compliant with the JSON Schema spec:

  1. Added Tuple Validation Logic: The validate_array() method now detects when the items keyword is an array (a tuple) and applies the correct validation logic, iterating through the data and schema arrays in lockstep.
  2. Corrected the Enum Integrity Check: The check in validate_type() has been updated to use the type_matches_any() method, allowing it to correctly validate schemas that use multiple types.

How to Verify

I have added a new unit test, testItReturnsValidationErrorForInvalidTupleData(), which uses a schema with tuple validation.

  • Before this fix: Running this test causes the validator to throw a fatal UnsupportedSchemaException.
  • After this fix: The test now passes, as the validator correctly identifies the invalid data and returns a ValidationError object as expected.

@ashfame
Copy link
Member Author

ashfame commented Dec 4, 2025

@adamziel I purposefully pushed the failing test first so that you can see the exact test I added failing in CI. Second commit adds the fix, which makes all tests pass again (except 1 that's stuck in setting up PHP in CI).

// In PHP, an associative array like {"type":"string"} is still an array.
// We need to distinguish between a list of schemas (tuple validation)
// and a single schema object (list validation). A simple way is to
// check if the keys are sequential integers from 0.
Copy link
Collaborator

@adamziel adamziel Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like a good candidate for array_is_list polyfill (not a blocker here)

Copy link
Collaborator

@adamziel adamziel Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a bit confused about this logic. It's, effectively:

$is_tuple_validation =   array_is_list( $schema['items'] );
$is_list_validation  = ! array_is_list( $schema['items'] );

Which reads like a contradiction. Also, a "list of schemas (tuple validation)" vs "single schema object (list validation)" confuses me, too. Can we find clearer terms to describe it?

I've looked it up in the JSON schema spec and perhaps this quote would help:

* List validation: a sequence of arbitrary length where each item matches the same schema
* Tuple validation: a sequence of fixed length where each item may have a different schema. In this usage, the index (or location) of each item is meaningful as to how the value is interpreted. (This usage is often given a whole separate type in some programming languages, such as Python's tuple).

Although it tells me that array_is_list() is not the right check here, since we need to confirm we have a specific number of items, each defining a separate schema. It seems like this check needs to be more involved than it is now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamziel Thanks for the sharp & helpful feedback! I pushed another commit in hopes of making it more readable. Since "list" and "tuple" is how JSON schema spec defined it, I renamed the flag vars to is_tuple_schema after added a detailed explanation in the comments.

I have added a second check inside the tuple validation loop to make it a two-layered approach (first checking the schema's intent, then its integrity) which should fully address the concerns you raised. If I misunderstood, please let me know.

I have added some tuple related to test cases to the data provider of testArrayValidation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And better polyfills available on array_is_list() page. Top voted note has the best one.

Do you want that one added to components/Polyfill/php-functions.php and used here instead?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ashfame up to you, it doesn't seem like a big deal and maybe it's for the best that the validator doesn't have an implicit dependency on polyfills 🤔

// If there's no schema at $schema['items'][$idx], it's an "additional item", handled above.
}
} else {
// === LIST VALIDATION (existing logic) ===
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLMs telltale sign: marking a code branch as "existing logic". Once we merge this PR, both branches become existing logic and that comment no longer makes sense. How about:

Suggested change
// === LIST VALIDATION (existing logic) ===
// List validation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very true!

$is_tuple_validation = is_array( $schema['items'] ) && array_keys( $schema['items'] ) === range( 0, count( $schema['items'] ) - 1 );

if ( $is_tuple_validation ) {
// === TUPLE VALIDATION ===
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// === TUPLE VALIDATION ===
// Tuple validation


if ( $is_tuple_schema ) {
// Tuple validation.
// Note: We do not support the "additionalItems" keyword.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be cool to support it eventually. Not a blocker here

@adamziel adamziel changed the title Fix: Implement tuple validation and correct multi-type enum checks [JSON Validator] Tuple validation, multi-type enum checks Dec 9, 2025
@adamziel adamziel added the enhancement New feature or request label Dec 9, 2025
@adamziel adamziel merged commit 203ebca into WordPress:trunk Dec 9, 2025
22 checks passed
@adamziel
Copy link
Collaborator

adamziel commented Dec 9, 2025

Thank you @ashfame!

adamziel pushed a commit that referenced this pull request Dec 9, 2025
In my wpcom work, I found the following valid blueprint fixtures were
out of date as per blueprints v2 spec. Our validator flagged them, and
it now passes with the changes in this PR.

Also, note that these fixtures files are currently not used in the repo.
Nor they were in the original commit. Seems like an intermediary
artifact. With this PR, we can atleast have them to be upto dated & can
possibly start using them in tests that we can add. In #213 I have the
entire incorrect valid fixture file in the test, but if they are indeed
to kept around, it can also use the fixture file.
@ashfame ashfame deleted the validator/add_tuple_handling_support branch December 9, 2025 12:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants