test: add PostgreSQL TPC-H integration tests #855

Open
bestbeforetoday wants to merge 5 commits into substrait-io:main from bestbeforetoday:tpch-reference-tests

bestbeforetoday commented Jun 8, 2026
Member

Extends the changes in #700 to generate TPC-H data on demand during test execution and avoid checking in large amounts of test data.

Signed-off-by: Niels Pardon <par@zurich.ibm.com>
bestbeforetoday marked this pull request as ready for review June 8, 2026 18:44
bestbeforetoday changed the title from "feat: add PostgreSQL TPC-H integration tests" to "test: add PostgreSQL TPC-H integration tests" Jun 8, 2026
Signed-off-by: Mark S. Lewis <Mark.S.Lewis@outlook.com>
Signed-off-by: Mark S. Lewis <Mark.S.Lewis@outlook.com>
Signed-off-by: Mark S. Lewis <Mark.S.Lewis@outlook.com>
Comment on lines +41 to +42
// TODO: These queries produce different results when generated from Substrait
private static final List<Integer> EXCLUDED_QUERIES = List.of(14);

Member

Interesting that query 14 is not producing the same result for you, while for my PR with the static data it was query 21 that did not produce the same result.

bestbeforetoday Jun 10, 2026

Member Author

The Calcite version has been bumped up between those two PRs. Possibly that has made a difference.

I notice that with larger scale factors more failures start to appear. I suspect this might be due to resource constraints in the containerized test environment, so I stuck to a small scale factor. It might also be that a larger variety of data exposes edge-case failures.
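To give a rough sense of how little data a small scale factor produces, here is a minimal sketch that scales the nominal TPC-H table cardinalities. The class and method names are hypothetical, the LINEITEM figure is approximate, and the generator used by this PR may produce slightly different counts, especially at very small scale factors.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: approximate TPC-H row counts for a given scale factor. */
class TpchScaleSketch {
    static Map<String, Long> approximateRowCounts(double scaleFactor) {
        // Nominal row counts at scale factor 1; LINEITEM is only approximately 6 million.
        Map<String, Long> base = new LinkedHashMap<>();
        base.put("SUPPLIER", 10_000L);
        base.put("PART", 200_000L);
        base.put("PARTSUPP", 800_000L);
        base.put("CUSTOMER", 150_000L);
        base.put("ORDERS", 1_500_000L);
        base.put("LINEITEM", 6_000_000L);

        Map<String, Long> scaled = new LinkedHashMap<>();
        for (Map.Entry<String, Long> entry : base.entrySet()) {
            scaled.put(entry.getKey(), Math.round(entry.getValue() * scaleFactor));
        }
        // NATION and REGION have a fixed size regardless of scale factor.
        scaled.put("NATION", 25L);
        scaled.put("REGION", 5L);
        return scaled;
    }

    public static void main(String[] args) {
        System.out.println(approximateRowCounts(0.001));
        System.out.println(approximateRowCounts(0.01));
    }
}
```

At scale factor 0.001 this gives on the order of 6,000 LINEITEM rows and only 10 suppliers, so some filter combinations may match few or no rows.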

Member Author

I downgraded the Calcite version (to 1.41.0) and ran this test locally with identical (scale factor 0.001) test data. This gives a failure for TPC-H query 21, just as you were seeing before. Using Calcite 1.42.0 produces a failure for TPC-H query 14 only.

With the scale factor increased to 0.01, query 21 remains the only failure with Calcite 1.41.0, whereas with Calcite 1.42.0 both queries 8 and 14 fail:

PostgreSqlIntegrationTest > testTpcH(int) > [8] 8 FAILED
    org.opentest4j.AssertionFailedError: Reference and generated SQL produce 2 different results.

    Reference SQL:
    select
      "O_YEAR",
      sum(case
        when "NATION" = 'EGYPT' then "VOLUME"
        else 0
      end) / sum("VOLUME") as "MKT_SHARE"
    from
      (
        select
          extract(year from "O"."O_ORDERDATE") as "O_YEAR",
          "L"."L_EXTENDEDPRICE" * (1 - "L"."L_DISCOUNT") as "VOLUME",
          "N2"."N_NAME" as "NATION"
        from
          "PART" "P",
          "SUPPLIER" "S",
          "LINEITEM" "L",
          "ORDERS" "O",
          "CUSTOMER" "C",
          "NATION" "N1",
          "NATION" "N2",
          "REGION" "R"
        where
          "P"."P_PARTKEY" = "L"."L_PARTKEY"
          and "S"."S_SUPPKEY" = "L"."L_SUPPKEY"
          and "L"."L_ORDERKEY" = "O"."O_ORDERKEY"
          and "O"."O_CUSTKEY" = "C"."C_CUSTKEY"
          and "C"."C_NATIONKEY" = "N1"."N_NATIONKEY"
          and "N1"."N_REGIONKEY" = "R"."R_REGIONKEY"
          and "R"."R_NAME" = 'MIDDLE EAST'
          and "S"."S_NATIONKEY" = "N2"."N_NATIONKEY"
          and "O"."O_ORDERDATE" between date '1995-01-01' and date '1996-12-31'
          and "P"."P_TYPE" = 'PROMO BRUSHED COPPER'
      ) as "ALL_NATIONS"
    group by
      "O_YEAR"
    order by
      "O_YEAR"


    Generated SQL:
    SELECT "t3"."$f600" AS "O_YEAR", "t3"."$f4" AS "MKT_SHARE"
    FROM (SELECT EXTRACT(YEAR FROM "ORDERS"."O_ORDERDATE") AS "$f600", SUM(CAST(CASE WHEN CAST("NATION0"."N_NAME" AS VARCHAR(25)) = 'EGYPT' THEN "LINEITEM"."L_EXTENDEDPRICE" * (1 - "LINEITEM"."L_DISCOUNT") ELSE 0 END AS DECIMAL(19, 0))) / SUM("LINEITEM"."L_EXTENDEDPRICE" * (1 - "LINEITEM"."L_DISCOUNT")) AS "$f4"
    FROM "PART",
    "SUPPLIER",
    "LINEITEM",
    "ORDERS",
    "CUSTOMER",
    "NATION",
    "NATION" AS "NATION0",
    "REGION"
    WHERE "PART"."P_PARTKEY" = "LINEITEM"."L_PARTKEY" AND "SUPPLIER"."S_SUPPKEY" = "LINEITEM"."L_SUPPKEY" AND ("LINEITEM"."L_ORDERKEY" = "ORDERS"."O_ORDERKEY" AND ("ORDERS"."O_CUSTKEY" = "CUSTOMER"."C_CUSTKEY" AND "CUSTOMER"."C_NATIONKEY" = "NATION"."N_NATIONKEY")) AND ("NATION"."N_REGIONKEY" = "REGION"."R_REGIONKEY" AND CAST("REGION"."R_NAME" AS VARCHAR(25)) = 'MIDDLE EAST' AND ("SUPPLIER"."S_NATIONKEY" = "NATION0"."N_NATIONKEY" AND ("ORDERS"."O_ORDERDATE" >= DATE '1995-01-01' AND "ORDERS"."O_ORDERDATE" <= DATE '1996-12-31' AND "PART"."P_TYPE" = 'PROMO BRUSHED COPPER')))
    GROUP BY EXTRACT(YEAR FROM "ORDERS"."O_ORDERDATE")
    ORDER BY 1) AS "t3"
PostgreSqlIntegrationTest > testTpcH(int) > [14] 14 FAILED
    org.opentest4j.AssertionFailedError: Reference and generated SQL produce 2 different results.

    Reference SQL:
    select
      100.00 * sum(case
        when "P"."P_TYPE" like 'PROMO%'
          then "L"."L_EXTENDEDPRICE" * (1 - "L"."L_DISCOUNT")
        else 0
      end) / sum("L"."L_EXTENDEDPRICE" * (1 - "L"."L_DISCOUNT")) as "PROMO_REVENUE"
    from
      "LINEITEM" "L",
      "PART" "P"
    where
      "L"."L_PARTKEY" = "P"."P_PARTKEY"
      and "L"."L_SHIPDATE" >= date '1994-08-01'
      and "L"."L_SHIPDATE" < date '1994-08-01' + interval '1 month'


    Generated SQL:
    SELECT 100.00 * SUM(CAST(CASE WHEN "PART"."P_TYPE" LIKE 'PROMO%' THEN "LINEITEM"."L_EXTENDEDPRICE" * (1 - "LINEITEM"."L_DISCOUNT") ELSE 0 END AS DECIMAL(19, 0))) / SUM("LINEITEM"."L_EXTENDEDPRICE" * (1 - "LINEITEM"."L_DISCOUNT")) AS "PROMO_REVENUE"
    FROM "LINEITEM",
    "PART"
    WHERE "LINEITEM"."L_PARTKEY" = "PART"."P_PARTKEY" AND "LINEITEM"."L_SHIPDATE" >= DATE '1994-08-01' AND "LINEITEM"."L_SHIPDATE" < (DATE '1994-08-01' + INTERVAL '0-1' YEAR TO MONTH)
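One visible difference between the reference and generated SQL in both failures is that the generated SQL wraps the CASE expression in CAST(... AS DECIMAL(19, 0)) before summing. The thread does not confirm that this is the cause of the mismatch. The sketch below only illustrates how such a cast could change a sum, using made-up values and assuming the cast rounds each value to zero fractional digits.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.List;

/** Illustrative only: effect of casting each value to DECIMAL(19, 0) before summing. */
class DecimalCastSketch {
    /** Assumed behaviour of CAST(x AS DECIMAL(19, 0)): round to zero fractional digits. */
    static BigDecimal castToScaleZero(BigDecimal value) {
        return value.setScale(0, RoundingMode.HALF_UP);
    }

    /** Sum at full precision, as in the reference SQL. */
    static BigDecimal sumExact(List<BigDecimal> volumes) {
        return volumes.stream().reduce(BigDecimal.ZERO, BigDecimal::add);
    }

    /** Sum after casting each value, as in the generated SQL. */
    static BigDecimal sumAfterCast(List<BigDecimal> volumes) {
        return volumes.stream()
                .map(DecimalCastSketch::castToScaleZero)
                .reduce(BigDecimal.ZERO, BigDecimal::add);
    }

    public static void main(String[] args) {
        // Hypothetical L_EXTENDEDPRICE * (1 - L_DISCOUNT) values, not taken from the test data.
        List<BigDecimal> volumes = List.of(
                new BigDecimal("855.9500"),
                new BigDecimal("120.4000"),
                new BigDecimal("10.2600"));
        System.out.println("exact sum:      " + sumExact(volumes));
        System.out.println("sum after cast: " + sumAfterCast(volumes));
    }
}
```

For these sample values the exact sum is 986.6100 and the sum after casting is 986, so any ratio built on the numerator would differ in its fractional digits.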

Member Author

The failures above demonstrate the value of these tests, since they are not picked up by any of the existing unit tests. This change aims to deliver the tests, not to resolve existing problems that they highlight. That should happen in other PRs.

Assert that expected failures occur to ensure the list of expected
failures is accurate. Also increase the scale factor used to generate
TPC-H test data to increase the chances of detecting edge-case
inconsistencies in query results.

Signed-off-by: Mark S. Lewis <Mark.S.Lewis@outlook.com>
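The idea in the commit message above can be sketched roughly as follows. This is not the code from the PR: the class name, the `check` method, and the `resultsMatch` predicate are hypothetical stand-ins for running the reference and generated SQL and comparing rows, and the list contents are assumed from the failures reported earlier in the thread.

```java
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical sketch: fail when an expected failure unexpectedly passes, and vice versa. */
class ExpectedFailureSketch {
    // Assumed contents; queries 8 and 14 are the failures reported in this thread.
    static final List<Integer> EXPECTED_FAILURES = List.of(8, 14);

    /**
     * Returns normally when the query behaves as expected, otherwise throws.
     * resultsMatch stands in for executing both SQL statements and comparing results.
     */
    static void check(int query, IntPredicate resultsMatch) {
        boolean matches = resultsMatch.test(query);
        boolean expectedToFail = EXPECTED_FAILURES.contains(query);
        if (expectedToFail && matches) {
            // Keeps the list accurate: a fixed query must be removed from it.
            throw new AssertionError(
                    "Query " + query + " now passes; remove it from EXPECTED_FAILURES");
        }
        if (!expectedToFail && !matches) {
            throw new AssertionError(
                    "Query " + query + ": reference and generated SQL produce different results");
        }
    }

    public static void main(String[] args) {
        check(1, q -> true);   // passing query that is not in the list: fine
        check(14, q -> false); // listed query that still fails: fine
        System.out.println("all checks behaved as expected");
    }
}
```

The benefit of this shape over simply skipping excluded queries is that a stale entry in the list makes the test run fail, so the list cannot silently drift out of date.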

Member

I'm not super familiar with PostgreSQL TPC-H, but where do these test files actually come from?

Member Author

I believe Niels created them (for PR #700) by modifying the existing TPC-H queries so that the SQL syntax was acceptable to PostgreSQL. He can speak to that better than I can, though.

I would personally prefer that exactly the same input SQL used to generate the Substrait plan also be used as the reference SQL. I will look at that when I get a chance.
