Skip to content

Conversation

@pggPL
Copy link
Collaborator

@pggPL pggPL commented Dec 11, 2025

Description

Skip L1 tests which should not be run with debug tools.

Fixes # (issue)

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@pggPL
Copy link
Collaborator Author

pggPL commented Dec 11, 2025

/te-ci pytorch L1

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Dec 11, 2025

Greptile Overview

Greptile Summary

Skips delay_wgrad_compute tests in distributed numerics when debug mode (NVTE_TEST_NVINSPECT_ENABLED) is enabled. Also fixes environment variable parsing by converting to int instead of using the string value directly in boolean context.

Changes:

  • Converted NVTE_TEST_NVINSPECT_ENABLED environment variable to int for proper boolean evaluation
  • Added skip logic in test_linear, test_layernorm_linear, and test_layernorm_mlp to skip tests when delay_wgrad_compute=True and debug mode is enabled
  • Pattern matches existing skip logic in tests/pytorch/test_numerics.py (lines 1911, 2055)

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The changes are minimal, well-targeted, and follow established patterns from other test files. The fix addresses both a bug (incorrect env var parsing) and adds necessary test skips for incompatible configurations. The implementation is consistent with tests/pytorch/test_numerics.py which already has identical skip logic.
  • No files require special attention

Important Files Changed

File Analysis

Filename Score Overview
tests/pytorch/distributed/run_numerics.py 5/5 Added skip logic for delayed wgrad tests when debug mode is enabled, and fixed environment variable parsing to use int() conversion

Sequence Diagram

sequenceDiagram
    participant Test as Test Runner
    participant Env as Environment Check
    participant Config as Test Configuration
    participant Skip as Skip Logic
    participant Execute as Test Execution

    Test->>Env: Check NVTE_TEST_NVINSPECT_ENABLED
    Env->>Env: Convert env var to int (0 or 1)
    Env->>Config: Store as global variable
    
    alt Debug mode enabled
        Env->>Config: Initialize nvdlfw_inspect.api
        Config->>Config: Load debug config & features
    end
    
    Test->>Config: Iterate over test kwargs
    Config->>Skip: Check delay_wgrad_compute flag
    
    alt delay_wgrad_compute=True AND debug enabled
        Skip->>Test: Skip test (continue to next)
        Note over Skip,Test: Delayed wgrad is incompatible<br/>with debug mode
    else
        Config->>Execute: Run test with kwargs
        Execute->>Test: Return test results
    end
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (3)

  1. tests/pytorch/distributed/run_numerics.py, line 745 (link)

    logic: Inconsistent env var handling - os.environ.get() returns strings, not booleans. Other test files use int(os.environ.get("NVTE_TEST_NVINSPECT_ENABLED", "0")) (see tests/pytorch/test_numerics.py:106). While this works when set to "True", it would incorrectly evaluate to true if someone sets NVTE_TEST_NVINSPECT_ENABLED="0" or "False".

  2. tests/pytorch/distributed/run_numerics.py, line 929 (link)

    logic: Same env var handling issue - should use int(os.environ.get("NVTE_TEST_NVINSPECT_ENABLED", "0")) for consistency with test_numerics.py.

  3. tests/pytorch/distributed/run_numerics.py, line 1042 (link)

    logic: Same env var handling issue - should use int(os.environ.get("NVTE_TEST_NVINSPECT_ENABLED", "0")) for consistency with test_numerics.py.

1 file reviewed, 3 comments

Edit Code Review Agent Settings | Greptile

…nabled

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, no comments

Edit Code Review Agent Settings | Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant