The code for the reworked n-distinct estimation in commit 7b504eb282
was written differently in a previous version of the patch, prior to
commit; on rewriting it, we missed updating an initializer. This caused
the code to (mistakenly) apply a fudge factor even in the case where
only a single value is applied, leading to incorrect results.

This means that the 'relvarcount' variable name is now wrong: it counts
the ndistinct values multiplied together, not the variables involved.
Add a comment to try to make the situation clearer, and remove an
incorrect comment I added.

Problem noticed, and code patch, by Tomas Vondra. Additional commentary
by Álvaro.
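
The effect of the initializer can be illustrated with a minimal standalone sketch. This is not the PostgreSQL source: the helper name fudge_factor_applies and its parameters are hypothetical, and it models only the counting, not the fudge factor itself. It assumes, as the comment added in the diff states, that the fudge factor is meant to apply only when more than one ndistinct value was multiplied.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a PostgreSQL function): multiply n ndistinct
 * estimates together, counting each one, and report whether the
 * "more than one value was multiplied" condition would hold.
 * initial_count stands in for the relvarcount initializer.
 */
static bool
fudge_factor_applies(const double *ndistincts, size_t n, int initial_count,
                     double *product)
{
    double reldistinct = 1;
    int    relvarcount = initial_count;

    for (size_t i = 0; i < n; i++)
    {
        reldistinct *= ndistincts[i];
        relvarcount++;          /* one more value multiplied in */
    }

    if (product != NULL)
        *product = reldistinct;

    /* Assumed condition: fudge only when several values were combined. */
    return relvarcount > 1;
}
```

With a single value and an initializer of 1, the counter reaches 2 and the condition holds by mistake; with an initializer of 0 it reaches 1 and the condition does not hold.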
RelOptInfo *rel = varinfo1->rel;
double reldistinct = 1;
double relmaxndistinct = reldistinct;
- int relvarcount = 1;
+ int relvarcount = 0;
List *newvarinfos = NIL;
List *relvarinfos = NIL;
* we multiply them together. Any remaining relvarinfos after
* no more multivariate matches are found are assumed independent too,
* so their individual ndistinct estimates are multiplied also.
+ *
+ * While iterating, count how many separate numdistinct values we
+ * apply. We apply a fudge factor below, but only if we multiplied
+ * more than one such values.
*/
while (relvarinfos)
{
reldistinct *= mvndistinct;
if (relmaxndistinct < mvndistinct)
relmaxndistinct = mvndistinct;
- relvarcount++; /* inaccurate, but doesn't matter */
+ relvarcount++;
}
else
{