fix: jarowinkler_similarity failure on NULL input #15736

@natashasehgal

Description

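The Presto SOT expression fuzzer fails on jarowinkler_similarity when the input is NULL.

Update the Velox implementation or fix the fuzzer as needed.
Local testing with the queries below shows Presto and Velox returning the same result, so the mismatch has so far been seen only in the fuzzer run.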

SELECT
JAROWINKLER_SIMILARITY(col1, col1) as same_col_null,
JAROWINKLER_SIMILARITY(col1, col2) as diff_cols_null,
JAROWINKLER_SIMILARITY(col1, col3) as one_null_one_value
FROM (
VALUES
(NULL, NULL, '1')
) AS t(col1, col2, col3);

(or)

SELECT
JAROWINKLER_SIMILARITY(col1, col1) as same_col_null,
JAROWINKLER_SIMILARITY(col1, col2) as diff_cols_null
FROM (
VALUES
(CAST(NULL AS VARCHAR), CAST(NULL AS VARCHAR))
) AS t(col1, col2);
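
For reference while reading the queries above, here is a minimal Python sketch of the textbook Jaro-Winkler similarity with SQL-style NULL propagation. It assumes the function returns NULL whenever either argument is NULL, which is the usual default for SQL scalar functions but is not confirmed by this issue for either engine. The scoring follows the standard definition (prefix scale 0.1, prefix capped at 4 characters) and is not taken from the Presto or Velox source.

```python
from typing import Optional


def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0.0, 1.0]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jarowinkler_similarity(a: Optional[str], b: Optional[str]) -> Optional[float]:
    """Assumed semantics: NULL (None) in, NULL (None) out."""
    if a is None or b is None:
        return None
    score = jaro(a, b)
    # Boost the score for a shared prefix of up to 4 characters.
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return score + prefix * 0.1 * (1.0 - score)


# Mirrors the three columns of the first query above.
print(jarowinkler_similarity(None, None))  # same_col_null
print(jarowinkler_similarity(None, None))  # diff_cols_null
print(jarowinkler_similarity(None, "1"))   # one_null_one_value
```

Under that assumption all three columns come back NULL, so a mismatch in the fuzzer would point at how the NULL row reaches each engine rather than at the similarity arithmetic.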

Error Reproduction

I1209 01:13:12.494881 1328 ExpressionFuzzerVerifier.cpp:389] ==============================> Started iteration 1590 (seed: 2140540856)
I1209 01:13:12.495028 1328 ExpressionVerifier.cpp:247] Executing expression 0 : jarowinkler_similarity("c0","c0")
I1209 01:13:12.495045 1328 ExpressionVerifier.cpp:247] Executing expression 1 : "row_number"
I1209 01:13:12.495137 1328 ExpressionVerifier.cpp:308] Executing test case: 0
I1209 01:13:12.495260 1328 FuzzerToolkit.cpp:157] Two vectors match.
I1209 01:13:12.495271 1328 ExpressionVerifier.cpp:379] Common eval succeeded.
I1209 01:13:12.495308 1328 PrestoQueryRunner.cpp:484] Execute presto sql: DROP TABLE IF EXISTS t_c6712179_4451_4a07_b5de_b0b85895dbfa
I1209 01:13:12.507293 1328 PrestoQueryRunner.cpp:484] Execute presto sql: CREATE TABLE t_c6712179_4451_4a07_b5de_b0b85895dbfa(c0, row_number) WITH (format = 'DWRF') AS SELECT cast(null as VARCHAR), cast(null as BIGINT)
I1209 01:13:12.536800 1328 PrestoQueryRunner.cpp:484] Execute presto sql: SELECT "$path" FROM t_c6712179_4451_4a07_b5de_b0b85895dbfa
I1209 01:13:12.558835 1328 PrestoQueryRunner.cpp:484] Execute presto sql: DELETE FROM t_c6712179_4451_4a07_b5de_b0b85895dbfa
I1209 01:13:12.581321 1328 PrestoQueryRunner.cpp:484] Execute presto sql: SELECT jarowinkler_similarity(c0, c0) as p0, row_number as p1 FROM (SELECT c0 as c0, row_number as row_number FROM (t_c6712179_4451_4a07_b5de_b0b85895dbfa))
I1209 01:13:12.613256 1328 PrestoQueryRunner.cpp:484] Execute presto sql: DROP TABLE IF EXISTS t_c6712179_4451_4a07_b5de_b0b85895dbfa
E1209 01:13:12.625236 1328 Exceptions.h:53] Line: fbcode/velox/expression/tests/ExpressionVerifier.cpp:480, Function:verify, Expression: exec::test::assertEqualResults( referenceEvalResult.value(), projectionPlan->outputType(), {commonEvalResultRow}) Velox and reference DB results don't match, Source: RUNTIME, ErrorCode: INVALID_STATE
I1209 01:13:12.625325 1328 ExpressionVerifier.cpp:628] Skipping persistence because repro path is empty.
2025-12-09 01:13:13,060 - CogwheelHarness.Framework - INFO - Running test-provided tearDown() method
2025-12-09 01:13:13,060 - CogwheelHarness.Framework - INFO - Running test-provided cleanups
2025-12-09 01:13:13,061 - CogwheelHarness.Framework - INFO - Collecting network connections made by []
Test results:
[FAILURE] [Test Suite] (474.45s)
[FAILURE] test_velox_fuzzer (474.44s)
Exception: Presto expression fuzzer with Presto as SOT failed
Traceback (most recent call last):
File "/packages/cogwheel_velox_expression_fuzzer_presto_sot_test_harness/cogwheel_velox_expression_fuzzer_presto_sot-inplace#link-tree/windtunnel/cogwheel/test_case_runner.py", line 141, in _run_test_case
result_or_coro = test_case(self._fixture)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/packages/cogwheel_velox_expression_fuzzer_presto_sot_test_harness/cogwheel_velox_expression_fuzzer_presto_sot-inplace#link-tree/windtunnel/cogwheel/lib/logging.py", line 206, in logging_wrapper_impl
ret = function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/packages/cogwheel_velox_expression_fuzzer_presto_sot_test_harness/cogwheel_velox_expression_fuzzer_presto_sot-inplace#link-tree/windtunnel/cogwheel/base.py", line 272, in impl
test_fn(self)
File "/packages/cogwheel_velox_expression_fuzzer_presto_sot_test_harness/cogwheel_velox_expression_fuzzer_presto_sot-inplace#link-tree/velox/expression/fuzzer/facebook/cogwheel_velox_fuzzers.py", line 205, in test_velox_fuzzer
getattr(self, self.fuzzer_job)()
File "/packages/cogwheel_velox_expression_fuzzer_presto_sot_test_harness/cogwheel_velox_expression_fuzzer_presto_sot-inplace#link-tree/velox/expression/fuzzer/facebook/cogwheel_velox_fuzzers.py", line 137, in expression_fuzzer_presto_sot
raise Exception("Presto expression fuzzer with Presto as SOT failed")
Exception: Presto expression fuzzer with Presto as SOT failed
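
The details a rerun needs (iteration, seed, failing expressions) are buried in the log above. The following is a minimal Python sketch that pulls them out. The function name `summarize_fuzzer_log` and the regular expressions are illustrative assumptions based only on the line shapes shown in this log, not part of Velox or its tooling.

```python
import re

# Patterns modeled on the log lines in this issue (assumed stable only for this sketch).
SEED_RE = re.compile(r"Started iteration (\d+) \(seed: (\d+)\)")
EXPR_RE = re.compile(r"Executing expression \d+ : (.+)$")
MISMATCH_MARKER = "results don't match"


def summarize_fuzzer_log(lines):
    """Return the iteration, seed, and expressions of the last iteration seen,
    plus whether a result mismatch was logged for it."""
    summary = {"iteration": None, "seed": None, "expressions": [], "failed": False}
    for line in lines:
        started = SEED_RE.search(line)
        if started:
            # A new iteration begins: reset the per-iteration state.
            summary = {
                "iteration": int(started.group(1)),
                "seed": int(started.group(2)),
                "expressions": [],
                "failed": False,
            }
            continue
        expr = EXPR_RE.search(line)
        if expr:
            summary["expressions"].append(expr.group(1).strip())
        if MISMATCH_MARKER in line:
            summary["failed"] = True
    return summary


# Abbreviated copies of lines from the log above.
sample = [
    'I1209 01:13:12.494881 1328 ExpressionFuzzerVerifier.cpp:389] ==============================> Started iteration 1590 (seed: 2140540856)',
    'I1209 01:13:12.495028 1328 ExpressionVerifier.cpp:247] Executing expression 0 : jarowinkler_similarity("c0","c0")',
    'I1209 01:13:12.495045 1328 ExpressionVerifier.cpp:247] Executing expression 1 : "row_number"',
    "E1209 01:13:12.625236 1328 Exceptions.h:53] Velox and reference DB results don't match, Source: RUNTIME, ErrorCode: INVALID_STATE",
]

result = summarize_fuzzer_log(sample)
print(result)
```

For this log the summary reports iteration 1590, seed 2140540856, and a mismatch on `jarowinkler_similarity("c0","c0")` over a single all-NULL VARCHAR column.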

Metadata

Assignees

No one assigned

    Labels

    bug: Something isn't working
    fuzzer: Issues related to the Velox fuzzer test components
    fuzzer-found
