granicus.if.org Git - postgresql/commit

author	Tom Lane <tgl@sss.pgh.pa.us>
	Sun, 7 Aug 2016 22:52:02 +0000 (18:52 -0400)
committer	Tom Lane <tgl@sss.pgh.pa.us>
	Sun, 7 Aug 2016 22:52:02 +0000 (18:52 -0400)
commit	95bee941be4c009ebbc29162a0dc9664f40de12f
tree	97557258e66828f2075dbf6281d416c53cde4621
parent	8a8c6b53810026641a1e12f60f873a7bd3cea5e3

Fix misestimation of n_distinct for a nearly-unique column with many nulls.

If ANALYZE found no repeated non-null entries in its sample, it set the
column's stadistinct value to -1.0, intending to indicate that the entries
are all distinct.  But what this value actually means is that the number
of distinct values is 100% of the table's rowcount, and thus it was
overestimating the number of distinct values by however many nulls there
are.  This could lead to very poor selectivity estimates, as for example
in a recent report from Andreas Joseph Krogh.  We should discount the
stadistinct value by whatever we've estimated the nulls fraction to be.
(That is what will happen if we choose to use a negative stadistinct for
a column that does have repeated entries, so this code path was just
inconsistent.)
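
As an illustration of the discounting described above, here is a minimal sketch, not the actual analyze.c code. The helper name all_distinct_stadistinct() is hypothetical; only the stadistinct and stanullfrac names come from pg_statistic.

```c
#include <assert.h>

/*
 * Hypothetical helper: the stadistinct value to store when the sample
 * contained no repeated non-null entries.
 *
 * A negative stadistinct is a multiplier on the table's row count, so
 * storing a flat -1.0 would also count the null rows as distinct values.
 * Scaling by the non-null fraction avoids that.
 */
static double
all_distinct_stadistinct(double stanullfrac)
{
    /* The null fraction is a proportion of the sampled rows. */
    assert(stanullfrac >= 0.0 && stanullfrac <= 1.0);

    return -1.0 * (1.0 - stanullfrac);
}
```

With no nulls this still yields -1.0, so columns without nulls are unaffected. With 90% nulls it yields about -0.1, meaning roughly one distinct value per ten rows.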

In addition to fixing the stadistinct entries stored by several different
ANALYZE code paths, adjust the logic where get_variable_numdistinct()
forces an "all distinct" estimate on the basis of finding a relevant unique
index.  Unique indexes don't reject nulls, so there's no reason to assume
that the null fraction doesn't apply.
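
The planner-side adjustment can be sketched in the same spirit. This is a simplified stand-in for the logic in get_variable_numdistinct(), not a copy of it: the function name estimate_ndistinct(), its parameter list, the rounding, and the use of 0.0 to signal "unknown" are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified estimator of the number of distinct values.
 *
 *   stadistinct > 0   absolute number of distinct values
 *   stadistinct < 0   negated multiplier on the table's row count
 *   stadistinct == 0  unknown (this sketch returns 0.0; the real planner
 *                     falls back to default estimates instead)
 */
double
estimate_ndistinct(double stadistinct, double stanullfrac,
                   double ntuples, bool isunique)
{
    assert(ntuples >= 0.0);
    assert(stanullfrac >= 0.0 && stanullfrac <= 1.0);

    /*
     * A unique index makes all non-null entries distinct, but it does not
     * reject nulls, so the null fraction still has to be discounted.
     */
    if (isunique)
        stadistinct = -1.0 * (1.0 - stanullfrac);

    if (stadistinct > 0.0)
        return stadistinct;

    if (stadistinct < 0.0)
    {
        /* Round to the nearest whole row and never report less than one. */
        double n = (double) (long long) (-stadistinct * ntuples + 0.5);

        return (n < 1.0) ? 1.0 : n;
    }

    return 0.0;                 /* unknown */
}
```

For a 1000-row table whose column is 90% null, a stored stadistinct of -1.0 gives an estimate of 1000 distinct values, while the discounted value of -0.1, or the unique-index path, gives 100.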

Back-patch to all supported branches.  Back-patching is a bit of a judgment
call, but this problem seems to affect only a few users (else we'd have
identified it long ago), and it's bad enough when it does happen that
destabilizing plan choices in a worse direction seems unlikely.

Patch by me, with documentation wording suggested by Dean Rasheed.

Report: <VisenaEmail.26.df42f82acae38a58.156463942b8@tc7-visena>
Discussion: <16143.1470350371@sss.pgh.pa.us>

doc/src/sgml/catalogs.sgml
src/backend/commands/analyze.c
src/backend/tsearch/ts_typanalyze.c
src/backend/utils/adt/rangetypes_typanalyze.c
src/backend/utils/adt/selfuncs.c
src/include/catalog/pg_statistic.h