{"version":"https:\/\/jsonfeed.org\/version\/1","title":"mathspp.com feed","home_page_url":"https:\/\/mathspp.com\/blog\/tags\/numerical-analysis","feed_url":"https:\/\/mathspp.com\/blog\/tags\/numerical-analysis.json","description":"Stay up-to-date with the articles on mathematics and programming that get published to mathspp.com.","author":{"name":"Rodrigo Gir\u00e3o Serr\u00e3o"},"items":[{"title":"TIL #053 \u2013 precision of Python floats","date_published":"2022-09-14T00:00:00+02:00","id":"https:\/\/mathspp.com\/blog\/til\/precision-of-python-floats","url":"https:\/\/mathspp.com\/blog\/til\/precision-of-python-floats","content_html":"<p>Today I learned what precision Python floats have.<\/p>\n\n<h1 id=\"precision-of-python-floats\">Precision of Python floats<a href=\"#precision-of-python-floats\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h1>\n<p>Python floats are IEEE 754 double-precision binary floating-point numbers,\ncommonly referred to as \u201cdoubles\u201d, and take up 64 bits.\nOf those:<\/p>\n<ul>\n<li>1 is for the sign of the number;<\/li>\n<li>11 are for the exponent; and<\/li>\n<li>52 are for the fraction.<\/li>\n<\/ul>\n<figure class=\"image-caption\"><img title=\"A diagram representing the 64 bits of doubles. Original by Codekaizen on Wikipedia, licensed under CC BY-SA 4.0.\" alt='\"A diagram representing the 64 bits of doubles.\"' src=\"\/user\/pages\/02.blog\/04.til\/053.precision-of-python-floats\/_float_diagram_light.svg?decoding=auto\"><figcaption class=\"\">A diagram representing the 64 bits of doubles. 
Original by Codekaizen on Wikipedia, licensed under CC BY-SA 4.0.<\/figcaption><\/figure>\n<p>We can verify experimentally that Python floats use 52 bits to store the fraction.\nThe number <code>1 &lt;&lt; 53<\/code> is an exact integer:<\/p>\n<pre><code class=\"language-py\">&gt;&gt;&gt; n = 1 &lt;&lt; 53\n&gt;&gt;&gt; n\n9007199254740992<\/code><\/pre>\n<p>In binary, this number is a <code>1<\/code> followed by 53 <code>0<\/code>:<\/p>\n<pre><code class=\"language-py\">&gt;&gt;&gt; bin(n)\n'0b100000000000000000000000000000000000000000000000000000'\n&gt;&gt;&gt; bin(n)[2:]\n'100000000000000000000000000000000000000000000000000000'\n&gt;&gt;&gt; bin(n)[2:].count(\"0\")\n53<\/code><\/pre>\n<p>Now, if you convert <code>n<\/code> to a float and add <code>1<\/code>, nothing happens;\nwhereas if you add <code>2<\/code>, you get the correct value:<\/p>\n<pre><code class=\"language-py\">&gt;&gt;&gt; float(n)\n9007199254740992.0\n&gt;&gt;&gt; float(n) + 1\n9007199254740992.0\n&gt;&gt;&gt; float(n) + 2\n9007199254740994.0<\/code><\/pre>\n<p>Why?<\/p>\n<p>Well, <code>n + 1<\/code> in binary starts and ends with <code>1<\/code> and has 52 zeroes in the middle:<\/p>\n<pre><code class=\"language-py\">&gt;&gt;&gt; bin(n + 1)\n'0b100000000000000000000000000000000000000000000000000001'<\/code><\/pre>\n<p>Represented in scientific notation (in binary), this number would have 53 digits after the decimal point:<\/p>\n<p class=\"mathjax mathjax--block\">\\[\n1.00000000000000000000000000000000000000000000000000001_2 \\times 2^{53}\\]<\/p>\n<p>However, doubles only have 52 digits after the decimal point, so the final <code>1<\/code> is dropped and the number becomes<\/p>\n<p class=\"mathjax mathjax--block\">\\[\n1.0000000000000000000000000000000000000000000000000000_2 \\times 2^{53}\\]<\/p>\n<p>Which is exactly the same number.<\/p>\n<p>However, if we add <code>2<\/code> instead of just <code>1<\/code>, the final result is<\/p>\n<p class=\"mathjax 
mathjax--block\">\\[\n1.00000000000000000000000000000000000000000000000000010_2 \\times 2^{53}\\]<\/p>\n<p>which looks like<\/p>\n<p class=\"mathjax mathjax--block\">\\[\n1.0000000000000000000000000000000000000000000000000001_2 \\times 2^{53}\\]<\/p>\n<p>if we only use 52 digits after the decimal point, which is the same as the correct result.<\/p>\n<p>The fact that <code>n<\/code>, <code>n + 2<\/code>, <code>n + 4<\/code>, ... give the correct results,\nand <code>n + 1<\/code>, <code>n + 3<\/code>, <code>n + 5<\/code>, ... don't,\ntogether with the fact that this phenomenon started at <code>n = 1 &lt;&lt; 53<\/code>\nand not <code>n = 1 &lt;&lt; 52<\/code> shows that Python floats use 52 bits to store the fraction of a number.<\/p>\n<p>That's it for now! <a href=\"\/subscribe\">Stay tuned<\/a> and I'll see you around!<\/p>","summary":"Today I learned what precision Python floats have.","date_modified":"2024-08-10T20:53:02+02:00","tags":["floats","mathematics","numerical analysis","programming","python"],"image":"\/user\/pages\/02.blog\/04.til\/053.precision-of-python-floats\/thumbnail.webp"},{"title":"TIL #021 \u2013 Spouge&#039;s formula","date_published":"2022-01-10T00:00:00+01:00","id":"https:\/\/mathspp.com\/blog\/til\/021","url":"https:\/\/mathspp.com\/blog\/til\/021","content_html":"<p>Today I learned about Spouge's formula to approximate the factorial.<\/p>\n\n<figure class=\"image-caption\"><img title=\"Photo by Scott Graham on Unsplash (cropped).\" alt=\"\" src=\"\/images\/3\/5\/b\/c\/e\/35bcec4646bc80054d47a77cfc1d0a69b5377379-thumbnail.png\"><figcaption class=\"\">Photo by Scott Graham on Unsplash (cropped).<\/figcaption><\/figure>\n<h1 id=\"spouge-s-formula\">Spouge's formula<a href=\"#spouge-s-formula\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h1>\n<p>Spouge's formula allows one to approximate the value of the gamma function.\nIn case you don't know, the gamma function is like a generalisation of the 
factorial.<\/p>\n<p>In fact, for every non-negative integer <span class=\"mathjax mathjax--inline\">\\(z\\)<\/span>, the following equality holds:<\/p>\n<p class=\"mathjax mathjax--block\">\\[\n\\Gamma(z + 1) = z!\\]<\/p>\n<p>where <span class=\"mathjax mathjax--inline\">\\(\\Gamma\\)<\/span> is the gamma function.<\/p>\n<p>What Spouge's formula tells us is that<\/p>\n<p class=\"mathjax mathjax--block\">\\[\n\\Gamma(z + 1) = (z + a)^{z + \\frac12}e^{-z-a}\\left( c_0 + \\sum_{k=1}^{a-1} \\frac{c_k}{z+k} + \\epsilon_a(z) \\right)\\]<\/p>\n<p>In the equality above, <span class=\"mathjax mathjax--inline\">\\(a\\)<\/span> is an arbitrary positive integer and <span class=\"mathjax mathjax--inline\">\\(\\epsilon_a(z)\\)<\/span> is the error term.\nThus, if we drop <span class=\"mathjax mathjax--inline\">\\(\\epsilon_a(z)\\)<\/span>, we get<\/p>\n<p class=\"mathjax mathjax--block\">\\[\n\\Gamma(z + 1) = z! \\approx (z + a)^{z + \\frac12}e^{-z-a}\\left( c_0 + \\sum_{k=1}^{a-1} \\frac{c_k}{z+k} \\right)\\]<\/p>\n<p>The coefficients <span class=\"mathjax mathjax--inline\">\\(c_k\\)<\/span> are given by:<\/p>\n<p class=\"mathjax mathjax--block\">\\[\n\\begin{cases}\nc_0 = \\sqrt{2\\pi} \\\\\nc_k = \\frac{(-1)^{k-1}}{(k - 1)!}(-k + a)^{k - \\frac12}e^{-k+a}, ~ k \\in \\{1, 2, \\cdots, a-1\\}\n\\end{cases}\\]<\/p>\n<p>By picking a suitable value of <span class=\"mathjax mathjax--inline\">\\(a\\)<\/span>, one can approximate the value of <span class=\"mathjax mathjax--inline\">\\(z!\\)<\/span> up to a desired number of decimal places.\nAlthough we need the factorial function to compute the coefficients <span class=\"mathjax mathjax--inline\">\\(c_k\\)<\/span>,\nthose coefficients only need the factorial of numbers up to <span class=\"mathjax mathjax--inline\">\\(a - 2\\)<\/span>.\nIf we are approximating <span class=\"mathjax mathjax--inline\">\\(z!\\)<\/span>, where <span class=\"mathjax mathjax--inline\">\\(a \\ll z\\)<\/span>, then this approximation saves us some work.<\/p>\n<p>In order to determine the number of correct decimal places of the result,\none 
needs to control the error term <span class=\"mathjax mathjax--inline\">\\(\\epsilon_a(z)\\)<\/span>.\nIf <span class=\"mathjax mathjax--inline\">\\(a &gt; 2\\)<\/span> and <span class=\"mathjax mathjax--inline\">\\(\\operatorname{Re}(z) &gt; 0\\)<\/span> (which is always true if <span class=\"mathjax mathjax--inline\">\\(z\\)<\/span> is a positive integer), then<\/p>\n<p class=\"mathjax mathjax--block\">\\[\n\\epsilon_a(z) \\leq a^{-\\frac12}(2\\pi)^{-a-\\frac12}\\]<\/p>\n<p>By determining the value of <span class=\"mathjax mathjax--inline\">\\(a^{-\\frac12}(2\\pi)^{-a-\\frac12}\\)<\/span>,\nwe can tell how many digits of the result will be correct.\nFor example, with <span class=\"mathjax mathjax--inline\">\\(a = 10\\)<\/span>, we get<\/p>\n<p class=\"mathjax mathjax--block\">\\[\na^{-\\frac12}(2\\pi)^{-a-\\frac12} \\approx 1.31556 \\times 10^{-9} ~ ,\\]<\/p>\n<p>meaning we will get 8 correct digits.<\/p>\n<div class=\"notices yellow\">\n<p>However, notice that the approximating formula must, itself,\nbe computed with enough precision for the final result to hold\nas many correct digits as expected.\nIn other words, if a higher value of <span class=\"mathjax mathjax--inline\">\\(a\\)<\/span> is picked so that the\nfinal result is more accurate, then we need to control the accuracy used when\ncomputing the coefficients <span class=\"mathjax mathjax--inline\">\\(c_k\\)<\/span> and the formula itself.<\/p>\n<\/div>\n<p>I'll leave it as an exercise for you, the reader,\nto implement this approximation in your favourite programming language.<\/p>\n<h1 id=\"spouge-s-formula-in-apl\">Spouge's formula in APL<a href=\"#spouge-s-formula-in-apl\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h1>\n<p>In APL (and disregarding the accuracy issues), it can look something like this:<\/p>\n<pre><code class=\"language-APL\">      \u235d Computes the `c_k` coefficients:\n      Cks \u2190 
{(.5*\u2368\u25cb2),((!ks-1)\u00f7\u2368\u00af1*ks-1)\u00d7((\u2375-ks)*ks-.5)\u00d7*\u2375-ks\u21901+\u2373\u2375-1}\n\n      \u235d Computes the approximation of the gamma function (assumes \u2395IO\u21900):\n      GammaApprox \u2190 {((\u2375+\u237a)*\u2375+.5)\u00d7(*-\u2375+\u237a)\u00d7+\/(\u22a2\u00f71,\u2375+1\u2193\u2373\u2218\u2262)Cks \u237a}\n\n      \u235d Computes an upper bound for the error term:\n      Err \u2190 {(\u2375*\u00af.5)\u00d7(\u25cb2)*-\u2375+.5}\n\n      a \u2190 10\n      Err a\n1.315562187E\u00af9  \u235d Thus, we expect 8 decimal places to be correct.\n      z \u2190 100\n      a GammaApprox z\n9.332621544E157\n      !z\n9.332621544E157<\/code><\/pre>\n<p>That's it for now! <a href=\"\/subscribe\">Stay tuned<\/a> and I'll see you around!<\/p>","summary":"Today I learned about Spouge&#039;s formula to approximate the factorial.","date_modified":"2024-08-10T20:53:02+02:00","tags":["apl","mathematics","numerical analysis"],"image":"\/user\/pages\/02.blog\/04.til\/021.spouges-formula\/thumbnail.png"}]}