test: add PostgreSQL TPC-H integration tests #855
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Signed-off-by: Mark S. Lewis <Mark.S.Lewis@outlook.com>
// TODO: These queries produce different results when generated from Substrait
private static final List<Integer> EXCLUDED_QUERIES = List.of(14);
Interesting that query 14 is not producing the same result for you, while for my PR with the static data it was query 21 that did not produce the same result.
The Calcite version has been bumped up between those two PRs. Possibly that has made a difference.
I notice that with larger scale factors more failures start to appear. I suspect this might be due to resource constraints in the containerized test environment, so I stuck to a small scale factor. It might also be that a larger variety of data exposes edge-case failures.
I downgraded the Calcite version (to 1.41.0) and ran this test locally with identical (scale factor 0.001) test data. This gives a failure for TPC-H query 21, just as you were seeing before. Using Calcite 1.42.0 produces a failure only for TPC-H query 14.
With the scale factor increased to 0.01, query 21 remains the only failure with Calcite 1.41.0, whereas with Calcite 1.42.0 both queries 8 and 14 fail:
PostgreSqlIntegrationTest > testTpcH(int) > [8] 8 FAILED
org.opentest4j.AssertionFailedError: Reference and generated SQL produce 2 different results.
Reference SQL:
select
    "O_YEAR",
    sum(case
        when "NATION" = 'EGYPT' then "VOLUME"
        else 0
    end) / sum("VOLUME") as "MKT_SHARE"
from
    (
        select
            extract(year from "O"."O_ORDERDATE") as "O_YEAR",
            "L"."L_EXTENDEDPRICE" * (1 - "L"."L_DISCOUNT") as "VOLUME",
            "N2"."N_NAME" as "NATION"
        from
            "PART" "P",
            "SUPPLIER" "S",
            "LINEITEM" "L",
            "ORDERS" "O",
            "CUSTOMER" "C",
            "NATION" "N1",
            "NATION" "N2",
            "REGION" "R"
        where
            "P"."P_PARTKEY" = "L"."L_PARTKEY"
            and "S"."S_SUPPKEY" = "L"."L_SUPPKEY"
            and "L"."L_ORDERKEY" = "O"."O_ORDERKEY"
            and "O"."O_CUSTKEY" = "C"."C_CUSTKEY"
            and "C"."C_NATIONKEY" = "N1"."N_NATIONKEY"
            and "N1"."N_REGIONKEY" = "R"."R_REGIONKEY"
            and "R"."R_NAME" = 'MIDDLE EAST'
            and "S"."S_NATIONKEY" = "N2"."N_NATIONKEY"
            and "O"."O_ORDERDATE" between date '1995-01-01' and date '1996-12-31'
            and "P"."P_TYPE" = 'PROMO BRUSHED COPPER'
    ) as "ALL_NATIONS"
group by
    "O_YEAR"
order by
    "O_YEAR"
Generated SQL:
SELECT "t3"."$f600" AS "O_YEAR", "t3"."$f4" AS "MKT_SHARE"
FROM (SELECT EXTRACT(YEAR FROM "ORDERS"."O_ORDERDATE") AS "$f600", SUM(CAST(CASE WHEN CAST("NATION0"."N_NAME" AS VARCHAR(25)) = 'EGYPT' THEN "LINEITEM"."L_EXTENDEDPRICE" * (1 - "LINEITEM"."L_DISCOUNT") ELSE 0 END AS DECIMAL(19, 0))) / SUM("LINEITEM"."L_EXTENDEDPRICE" * (1 - "LINEITEM"."L_DISCOUNT")) AS "$f4"
FROM "PART",
"SUPPLIER",
"LINEITEM",
"ORDERS",
"CUSTOMER",
"NATION",
"NATION" AS "NATION0",
"REGION"
WHERE "PART"."P_PARTKEY" = "LINEITEM"."L_PARTKEY" AND "SUPPLIER"."S_SUPPKEY" = "LINEITEM"."L_SUPPKEY" AND ("LINEITEM"."L_ORDERKEY" = "ORDERS"."O_ORDERKEY" AND ("ORDERS"."O_CUSTKEY" = "CUSTOMER"."C_CUSTKEY" AND "CUSTOMER"."C_NATIONKEY" = "NATION"."N_NATIONKEY")) AND ("NATION"."N_REGIONKEY" = "REGION"."R_REGIONKEY" AND CAST("REGION"."R_NAME" AS VARCHAR(25)) = 'MIDDLE EAST' AND ("SUPPLIER"."S_NATIONKEY" = "NATION0"."N_NATIONKEY" AND ("ORDERS"."O_ORDERDATE" >= DATE '1995-01-01' AND "ORDERS"."O_ORDERDATE" <= DATE '1996-12-31' AND "PART"."P_TYPE" = 'PROMO BRUSHED COPPER')))
GROUP BY EXTRACT(YEAR FROM "ORDERS"."O_ORDERDATE")
ORDER BY 1) AS "t3"
PostgreSqlIntegrationTest > testTpcH(int) > [14] 14 FAILED
org.opentest4j.AssertionFailedError: Reference and generated SQL produce 2 different results.
Reference SQL:
select
    100.00 * sum(case
        when "P"."P_TYPE" like 'PROMO%'
            then "L"."L_EXTENDEDPRICE" * (1 - "L"."L_DISCOUNT")
        else 0
    end) / sum("L"."L_EXTENDEDPRICE" * (1 - "L"."L_DISCOUNT")) as "PROMO_REVENUE"
from
    "LINEITEM" "L",
    "PART" "P"
where
    "L"."L_PARTKEY" = "P"."P_PARTKEY"
    and "L"."L_SHIPDATE" >= date '1994-08-01'
    and "L"."L_SHIPDATE" < date '1994-08-01' + interval '1 month'
Generated SQL:
SELECT 100.00 * SUM(CAST(CASE WHEN "PART"."P_TYPE" LIKE 'PROMO%' THEN "LINEITEM"."L_EXTENDEDPRICE" * (1 - "LINEITEM"."L_DISCOUNT") ELSE 0 END AS DECIMAL(19, 0))) / SUM("LINEITEM"."L_EXTENDEDPRICE" * (1 - "LINEITEM"."L_DISCOUNT")) AS "PROMO_REVENUE"
FROM "LINEITEM",
"PART"
WHERE "LINEITEM"."L_PARTKEY" = "PART"."P_PARTKEY" AND "LINEITEM"."L_SHIPDATE" >= DATE '1994-08-01' AND "LINEITEM"."L_SHIPDATE" < (DATE '1994-08-01' + INTERVAL '0-1' YEAR TO MONTH)
The failures above demonstrate the value of these tests, since they are not picked up by any of the existing unit tests. This change aims to deliver the tests, not to resolve existing problems that they highlight. That should happen in other PRs.
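One difference visible in both failing queries above is that the generated SQL wraps the CASE expression in CAST(... AS DECIMAL(19, 0)) before summing, while the reference SQL does not. Whether this is the actual cause of the mismatch is not established in this thread. As a minimal sketch of how such a cast could change an aggregate, the Java below assumes the cast rounds each value to zero decimal places (modelled with RoundingMode.HALF_UP) and uses made-up values rather than TPC-H data. The class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.List;

public class DecimalCastSketch {
    // Sums the values unchanged, mirroring sum(case ... end) in the reference SQL.
    static BigDecimal sumExact(List<BigDecimal> values) {
        return values.stream().reduce(BigDecimal.ZERO, BigDecimal::add);
    }

    // Rounds each value to scale 0 before summing, mirroring
    // SUM(CAST(... AS DECIMAL(19, 0))) in the generated SQL.
    // Assumption: HALF_UP approximates the database's rounding on this cast.
    static BigDecimal sumCastToScaleZero(List<BigDecimal> values) {
        return values.stream()
                .map(v -> v.setScale(0, RoundingMode.HALF_UP))
                .reduce(BigDecimal.ZERO, BigDecimal::add);
    }

    public static void main(String[] args) {
        // Hypothetical L_EXTENDEDPRICE * (1 - L_DISCOUNT) values, not real TPC-H data.
        List<BigDecimal> volumes = List.of(
                new BigDecimal("10.4560"),
                new BigDecimal("20.4440"),
                new BigDecimal("5.3000"));
        System.out.println("exact sum: " + sumExact(volumes));
        System.out.println("cast sum:  " + sumCastToScaleZero(volumes));
    }
}
```

If the two sums differ, any ratio built on the numerator (MKT_SHARE in query 8, PROMO_REVENUE in query 14) differs as well, which would be consistent with more mismatches appearing as the data set grows.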
Assert that expected failures occur to ensure the list of expected failures is accurate. Also increase the scale factor used to generate TPC-H test data to increase the chances of detecting edge-case inconsistencies in query results.
Signed-off-by: Mark S. Lewis <Mark.S.Lewis@outlook.com>
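A minimal sketch of the idea in this commit message: a query on the exclusion list must actually fail, and a query off the list must pass, so the list cannot silently go stale. This is not the PR's code. `ExpectedFailuresSketch`, `findMismatches`, and the `IntPredicate` that stands in for running both SQL variants against PostgreSQL are all hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class ExpectedFailuresSketch {
    // Returns one message per query whose outcome contradicts the exclusion list.
    // resultsMatch is a stand-in for "reference and generated SQL return the same rows".
    static List<String> findMismatches(
            List<Integer> excludedQueries, int queryCount, IntPredicate resultsMatch) {
        List<String> mismatches = new ArrayList<>();
        for (int query = 1; query <= queryCount; query++) {
            boolean matches = resultsMatch.test(query);
            boolean excluded = excludedQueries.contains(query);
            if (excluded && matches) {
                // The known failure has gone away, so the list should be updated.
                mismatches.add("query " + query + " is excluded but now passes");
            } else if (!excluded && !matches) {
                mismatches.add("query " + query + " unexpectedly fails");
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Illustrative exclusion list and fake comparison; TPC-H defines 22 queries.
        List<Integer> excluded = List.of(14);
        IntPredicate fakeComparison = query -> query != 14;
        System.out.println(findMismatches(excluded, 22, fakeComparison));
    }
}
```

In a real test the mismatch list would feed an assertion, so that fixing a query upstream forces its removal from the exclusion list.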
I'm not super familiar with PostgreSQL TPC-H, but where do these test files actually come from?
I believe Niels created them (for PR #700) by modifying the existing TPC-H queries so that the SQL syntax was acceptable to PostgreSQL. He can speak to that better than I can, though.
I would personally prefer that exactly the same input SQL used to generate the Substrait plan also be used as the reference SQL. I will look at that when I get a chance.
Extends the changes in #700 to generate TPC-H data on demand during test execution and avoid checking in large amounts of test data.