Normal distributions arise from the `Central Limit Theorem
<https://en.wikipedia.org/wiki/Central_limit_theorem>`_ and have a wide range
of applications in statistics.
.. class:: NormalDist(mu=0.0, sigma=1.0)
.. attribute:: mean
A read-only property for the `arithmetic mean
<https://en.wikipedia.org/wiki/Arithmetic_mean>`_ of a normal
distribution.
.. attribute:: stdev
A read-only property for the `standard deviation
<https://en.wikipedia.org/wiki/Standard_deviation>`_ of a normal
distribution.
.. attribute:: variance
A read-only property for the `variance
<https://en.wikipedia.org/wiki/Variance>`_ of a normal
distribution. Equal to the square of the standard deviation.
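As a quick illustration of these three read-only properties (the parameter
values here are arbitrary, chosen only for this sketch):

```python
from statistics import NormalDist

# Arbitrary parameters chosen for illustration
d = NormalDist(mu=2.5, sigma=3.0)

print(d.mean)      # 2.5
print(d.stdev)     # 3.0
print(d.variance)  # 9.0 (square of the standard deviation)
```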
Dividing a constant by an instance of :class:`NormalDist` is not supported.
Since normal distributions arise from additive effects of independent
variables, it is possible to `add and subtract two independent normally
distributed random variables
<https://en.wikipedia.org/wiki/Sum_of_normally_distributed_random_variables>`_
represented as instances of :class:`NormalDist`. For example:
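A minimal sketch of such a combination (the names and numbers below are made
up for illustration, not taken from the text): adding a baseline distribution
to an independent additive effect:

```python
from statistics import NormalDist

# Hypothetical data: a baseline measurement and an independent additive effect
baseline = NormalDist(100, 12)
effect = NormalDist(5, 9)

# Means add; standard deviations combine in quadrature (hypot(12, 9) == 15)
combined = baseline + effect
print(combined.mean)    # 105.0
print(combined.stdev)   # 15.0

# Subtraction also assumes independence, so sigmas still grow
shifted = baseline - effect
print(shifted.mean)     # 95.0
print(shifted.stdev)    # 15.0
```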
For example, given `historical data for SAT exams
<https://blog.prepscholar.com/sat-standard-deviation>`_ showing that scores
are normally distributed with a mean of 1060 and a standard deviation of 195,
determine the percentage of students with scores between 1100 and 1200:
.. doctest::
>>> sat = NormalDist(1060, 195)
>>> fraction = sat.cdf(1200 + 0.5) - sat.cdf(1100 - 0.5)
>>> f'{fraction * 100 :.1f}% score between 1100 and 1200'
'18.4% score between 1100 and 1200'
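Building on the same SAT distribution, a related query (this extra step is a
sketch, not part of the text above): ``inv_cdf()`` maps a cumulative
probability back to a score, the inverse of ``cdf()``:

```python
from statistics import NormalDist

sat = NormalDist(1060, 195)

# Score needed to reach the 90th percentile (inverse of the CDF)
print(round(sat.inv_cdf(0.90)))   # 1310

# The median of a normal distribution is its mean
print(round(sat.inv_cdf(0.50)))   # 1060
```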
What percentage of men and women will have the same height in two normally
distributed populations with known means and standard deviations?
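One way to sketch an answer (the population parameters below are assumed for
illustration, not given in the text): model the height difference as a single
normal distribution and ask how often it falls within half an inch of zero:

```python
from statistics import NormalDist

# Assumed parameters: men ~ N(70, 4) inches, women ~ N(65, 3.5) inches
men = NormalDist(70, 4)
women = NormalDist(65, 3.5)

# Subtracting two independent NormalDist instances gives the distribution
# of the difference: N(5, hypot(4, 3.5))
diff = men - women

# Probability that the two heights agree to within half an inch
same = diff.cdf(0.5) - diff.cdf(-0.5)
print(f'{same * 100:.1f}%')
```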
Normal distributions commonly arise in machine learning problems.
Wikipedia has a `nice example of a Naive Bayesian Classifier
<https://en.wikipedia.org/wiki/Naive_Bayes_classifier>`_. The challenge is to
predict a person's gender from measurements of normally distributed features
including height, weight, and foot size.
We're given a training dataset with measurements for eight people. The
measurements are assumed to be normally distributed, so we summarize the data
with :class:`NormalDist`:
>>> height_male = NormalDist.from_samples([6, 5.92, 5.58, 5.92])
>>> height_female = NormalDist.from_samples([5, 5.5, 5.42, 5.75])
>>> weight_male = NormalDist.from_samples([180, 190, 170, 165])
>>> weight_female = NormalDist.from_samples([100, 150, 130, 150])
>>> foot_size_male = NormalDist.from_samples([12, 11, 12, 10])
>>> foot_size_female = NormalDist.from_samples([6, 8, 7, 9])
Next, we encounter a new person whose feature measurements are known but whose
gender is unknown:
.. doctest::
>>> ht = 6.0 # height
>>> wt = 130 # weight
>>> fs = 8 # foot size
Starting with a 50% `prior probability
<https://en.wikipedia.org/wiki/Prior_probability>`_ of being male or female,
we compute the posterior as the prior times the product of likelihoods for the
feature measurements given the gender:
.. doctest::
>>> prior_male = 0.5
>>> prior_female = 0.5
>>> posterior_male = (prior_male * height_male.pdf(ht) *
... weight_male.pdf(wt) * foot_size_male.pdf(fs))
>>> posterior_female = (prior_female * height_female.pdf(ht) *
... weight_female.pdf(wt) * foot_size_female.pdf(fs))
The final prediction goes to the largest posterior. This is known as the
`maximum a posteriori
<https://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation>`_ or MAP:
.. doctest::
>>> 'male' if posterior_male > posterior_female else 'female'
'female'
0.3605, 0.3589, 0.3572, 0.3555, 0.3538,
]):
self.assertAlmostEqual(Z.pdf(x / 100.0), px, places=4)
self.assertAlmostEqual(Z.pdf(-x / 100.0), px, places=4)
# Error case: variance is zero
Y = NormalDist(100, 0)
with self.assertRaises(statistics.StatisticsError):
self.assertEqual(X * y, NormalDist(1000, 150)) # __mul__
self.assertEqual(y * X, NormalDist(1000, 150)) # __rmul__
self.assertEqual(X / y, NormalDist(10, 1.5)) # __truediv__
with self.assertRaises(TypeError): # __rtruediv__
y / X
def test_equality(self):