{"version":"https:\/\/jsonfeed.org\/version\/1","title":"mathspp.com feed","home_page_url":"https:\/\/mathspp.com\/blog\/tags\/nnfwp","feed_url":"https:\/\/mathspp.com\/blog\/tags\/nnfwp.json","description":"Stay up-to-date with the articles on mathematics and programming that get published to mathspp.com.","author":{"name":"Rodrigo Gir\u00e3o Serr\u00e3o"},"items":[{"title":"Neural networks fundamentals with Python \u2013 student-teacher","date_published":"2021-07-26T00:00:00+02:00","id":"https:\/\/mathspp.com\/blog\/neural-networks-fundamentals-with-python-student-teacher","url":"https:\/\/mathspp.com\/blog\/neural-networks-fundamentals-with-python-student-teacher","content_html":"<p>In this article of the <a href=\"\/blog\/tag:nnfwp\">NNFwP series<\/a> we'll do the\n&ldquo;student-teacher&rdquo; experiment with two neural networks,\nwhere one network will learn directly from the other.<\/p>\n\n<figure class=\"image-caption\"><img title=\"Original photo by JJ Ying on Unsplash.\" alt=\"A nice image with blue and purple lights.\" src=\"\/images\/a\/f\/1\/1\/2\/af1126fa06332065f9becb93d9634afb26fcad5f-thumbnail.webp\"><figcaption class=\"\">Original photo by JJ Ying on Unsplash.<\/figcaption><\/figure><h2 id=\"purpose-of-this-article\">Purpose of this article<a href=\"#purpose-of-this-article\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>The purpose of this article is to try and perform an experiment\nthat is called the &ldquo;student-teacher&rdquo; experiment,\nin which we use one large neural network to train another smaller network directly,\ninstead of using conventional training data.<\/p>\n<p>In the field of machine learning,\nthis is also commonly referred to as &ldquo;knowledge distillation&rdquo;.\nThe &ldquo;student-teacher&rdquo; experiment we will do in this article\nwill be smaller than similar experiments you might find done in the field,\nbut the principles will be more or less the same;\nour experiment will 
only be smaller because we are dealing with a small and simple problem,\nthe MNIST classification problem, and because the networks we are using are fairly small.<\/p>\n<p>If you want to learn more about &ldquo;student-teacher&rdquo; experiments\/knowledge distillation,\nmaybe take a look at <a href=\"https:\/\/arxiv.org\/abs\/2006.05525\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">this survey<\/a>.<\/p>\n<div class=\"notices blue\">\n<p>The code for this article, and for all the articles of the series,\ncan be found in <a href=\"https:\/\/github.com\/mathspp\/NNFwP\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">this GitHub repository<\/a>.\nThis article will build upon <a href=\"https:\/\/github.com\/mathspp\/NNFwP\/tree\/v1.2\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">v1.2<\/a> of that code.<\/p>\n<\/div>\n<h2 id=\"the-layout-of-the-experiment\">The layout of the experiment<a href=\"#the-layout-of-the-experiment\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>Let me explain to you how the &ldquo;student-teacher&rdquo; experiment works,\nand why we bother doing it.<\/p>\n<p>As you have seen, neural networks are composed of several layers\nthat are stacked together.\nThe number of layers is variable, and so is the shape of each layer.\nAs it turns out, for more complex tasks, like speech recognition,\nor object recognition, or text translation, among others,\nwe usually have networks much larger than the network we used for\nthe MNIST dataset.<\/p>\n<p>However, a network being larger brings a couple of downsides with it,\nnamely the computation power it takes to run it and the memory usage\nof having it loaded in memory.\nTo counteract these issues, researchers looked for a way of &ldquo;compressing&rdquo;\nthese large networks, and they came up with a really interesting method.\nThey 
observed that, if you trained a smaller network right from the get-go,\nthe smaller network wouldn't be as good as the larger one.\n<em>But<\/em>, if they started by training the larger one, <em>then<\/em> they could train\nthe smaller one by teaching it to <em>mimic<\/em> the larger one.\nThey realised that the smaller network\nwould work as well as the original one, or sometimes even better!<\/p>\n<p>Here are the detailed steps of how this experiment works:<\/p>\n<ol><li>create, train, and test a large network in the conventional way &ndash; this will be the teacher;<\/li>\n<li>create a smaller network &ndash; this will be the student;<\/li>\n<li>train the student as follows:\n<ol><li>traverse the training data;<\/li>\n<li>feed the input <code>x<\/code> to the teacher and obtain output <code>o<\/code>; and<\/li>\n<li>train the student on the input data <code>x<\/code> and use the teacher output <code>o<\/code> as the target.<\/li>\n<\/ol><\/li>\n<li>test the student.<\/li>\n<\/ol><p>Let us do this experiment together, with the MNIST data.<\/p>\n<h2 id=\"benchmarking-the-student\">Benchmarking the student<a href=\"#benchmarking-the-student\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>Before we begin, let...<\/p>","date_modified":"2025-12-19T21:31:00+01:00","tags":["machine learning","nnfwp","numpy","programming","python"],"image":"\/user\/pages\/02.blog\/neural-networks-fundamentals-with-python-student-teacher\/thumbnail.webp"},{"title":"Neural networks fundamentals with Python \u2013 subtleties","date_published":"2021-04-30T00:00:00+02:00","id":"https:\/\/mathspp.com\/blog\/neural-networks-fundamentals-with-python-subtleties","url":"https:\/\/mathspp.com\/blog\/neural-networks-fundamentals-with-python-subtleties","content_html":"<p>In the fifth article of this short series we will be handling\nsome subtleties that we overlooked in our experiment to classify handwritten\ndigits from the MNIST dataset.<\/p>\n\n<figure 
class=\"image-caption\"><img title=\"Original photo by JJ Ying on Unsplash.\" alt=\"A nice image with blue and purple lights.\" src=\"\/images\/0\/5\/4\/f\/7\/054f7338870635fba959e1a34f403c3527faf85a-thumbnail.webp\"><figcaption class=\"\">Original photo by JJ Ying on Unsplash.<\/figcaption><\/figure><h2 id=\"purpose-of-this-article\">Purpose of this article<a href=\"#purpose-of-this-article\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>The purpose of this article is to go over the code\nwe have written so far and, in particular, the code\nwe wrote in the <a href=\"\/blog\/neural-networks-fundamentals-with-python-mnist\">previous article<\/a>,\nand fix a couple of little inconsistencies that are related\nto some subtleties I overlooked in the previous articles.<\/p>\n<div class=\"notices blue\">\n<p>The code for this article, and for all the articles of the series,\ncan be found in <a href=\"https:\/\/github.com\/mathspp\/NNFwP\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">this GitHub repository<\/a>.\nThis article will build upon <a href=\"https:\/\/github.com\/mathspp\/NNFwP\/tree\/v1.1\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">v1.1<\/a> of that code.<\/p>\n<\/div>\n<h2 id=\"interpreting-outputs-as-probabilities\">Interpreting outputs as probabilities<a href=\"#interpreting-outputs-as-probabilities\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>In the <a href=\"\/blog\/neural-networks-fundamentals-with-python-mnist\">article where we classified handwritten digits<\/a>\nthere was a point in which we had to create the target\ncolumn vector outputs for the images of the digits:\nwe only knew what digit each image referred to, we didn't\nhave the column vector we wanted the network to output.<\/p>\n<p>What we decided to do was create a vector with 0s, except\nin the position of the correct 
digit:<\/p>\n<pre><code class=\"language-py\">&gt;&gt;&gt; digit = 3\n&gt;&gt;&gt; t = np.zeros((10, 1))\n&gt;&gt;&gt; t[digit] = 1\n&gt;&gt;&gt; t\narray([[0.],\n       [0.],\n       [0.],\n       [1.],\n       [0.],\n       [0.],\n       [0.],\n       [0.],\n       [0.],\n       [0.]])<\/code><\/pre>\n<p>I then proceeded to say that we would more or less interpret\nthese numbers as the probabilities that the network assigns to\neach digit; e.g. if the network receives an image of a three,\nwe want the network to say that there's 0% chance that image\nis a 0, 0% chance it is a 1, and 0% chance it is an 8, but we\nwould want the network to say there's 100% chance that that image\nis an image of a 3.\n<em>However<\/em>, the activation function we used in the last\nlayer was the leaky ReLU, and the leaky ReLU can output numbers\noutside the range <span class=\"mathjax mathjax--inline\">\\([0, 1]\\)<\/span>, which is the range where probabilities\nlie.<\/p>\n<p>As we have seen, we can still create something that works if we\noverlook this subtlety, but we should actually be careful to\nmake sure everything is consistent with our assumptions, otherwise\nwe might get a problem later down the road that is caused by\nthese inconsistencies... 
and if you kick this can down the road,\nit becomes really hard to trace back those future problems\nto this present inconsistency.<\/p>\n<p>Fixing this can be as simple as using an activation function\nthat only produces values in the range <span class=\"mathjax mathjax--inline\">\\([0, 1]\\)<\/span> in the last layer.\nThe <a href=\"https:\/\/en.wikipedia.org\/wiki\/Sigmoid_function\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">sigmoid function<\/a> is an appropriate alternative,\nas it takes any real number and always produces a value in <span class=\"mathjax mathjax--inline\">\\([0, 1]\\)<\/span>.\nThe sigmoid function looks like this:<\/p>\n<figure class=\"image-caption\"><img title=\"Graph of the sigmoid function.\" alt=\"\" src=\"\/user\/pages\/02.blog\/neural-networks-fundamentals-with-python-subtleties\/_sigmoid.svg?decoding=auto&amp;fetchpriority=auto\"><figcaption class=\"\">Graph of the sigmoid function.<\/figcaption><\/figure><p>The formula for the sigmoid function is the following:<\/p>\n<p class=\"mathjax mathjax--block\">\\[\nf(x) = \\frac1{1 + e^{-x}}\\]<\/p>\n<p>If we want to implement a <code>Sigmoid<\/code> activation function, all we need\nto do is figure out what its derivative is, and then implement it\nas a class...<\/p>","date_modified":"2025-12-19T21:31:00+01:00","tags":["machine learning","mathematics","nnfwp","numpy","programming","python"],"image":"\/user\/pages\/02.blog\/neural-networks-fundamentals-with-python-subtleties\/thumbnail.webp"},{"title":"Neural networks fundamentals with Python \u2013 MNIST","date_published":"2021-03-13T00:00:00+01:00","id":"https:\/\/mathspp.com\/blog\/neural-networks-fundamentals-with-python-mnist","url":"https:\/\/mathspp.com\/blog\/neural-networks-fundamentals-with-python-mnist","content_html":"<p>In the fourth article of this short series we will apply\nour neural network framework to recognise handwritten digits.<\/p>\n\n<figure class=\"image-caption\"><img 
title=\"Original photo by JJ Ying on Unsplash.\" alt=\"A nice image with blue and purple lights.\" src=\"\/user\/pages\/02.blog\/neural-networks-fundamentals-with-python-mnist\/_thumbnail.png\"><figcaption class=\"\">Original photo by JJ Ying on Unsplash.<\/figcaption><\/figure><h2 id=\"purpose-of-this-article\">Purpose of this article<a href=\"#purpose-of-this-article\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>The purpose of this article is to take the neural network\nframework you built in the previous three articles and apply it\nto an actual machine learning problem.\nIn particular, we will take the MNIST dataset &ndash; a dataset that\ncontains images of handwritten digits &ndash; and train a neural\nnetwork to be able to recognise them.<\/p>\n<p>The images we will be working with are like the ones below:<\/p>\n<figure class=\"image-caption\"><img title=\"Some example images from the MNIST dataset.\" alt=\"\" src=\"\/user\/pages\/02.blog\/neural-networks-fundamentals-with-python-mnist\/_mnist_examples.webp\"><figcaption class=\"\">Some example images from the MNIST dataset.<\/figcaption><\/figure><p>By the time you are done with this article, you will have a neural\nnetwork that is able to recognise the digit in an image\n9 out of 10 times.<\/p>\n<div class=\"notices blue\">\n<p>The code for this article, and for all the articles of the series,\ncan be found in <a href=\"https:\/\/github.com\/mathspp\/NNFwP\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">this GitHub repository<\/a>.\nThis article will build upon <a href=\"https:\/\/github.com\/mathspp\/NNFwP\/tree\/v1.0\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">v1.0<\/a> of that code.<\/p>\n<p>If you need a refresher on what we built last time, have a quick read\n<a href=\"\/blog\/neural-networks-fundamentals-with-python-backpropagation\">at the previous 
article<\/a>.<\/p>\n<\/div>\n<h2 id=\"getting-the-data\">Getting the data<a href=\"#getting-the-data\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>In the world of machine learning, the MNIST dataset is very well-known\nand it is common to play around with this dataset when you are experimenting\nwith a new machine learning model.\nThink of it as the &ldquo;Hello, World!&rdquo; of machine learning.<\/p>\n<p>For your convenience, I compressed the MNIST dataset\nit and made it available for download <a href=\"https:\/\/github.com\/mathspp\/NNFwP\/blob\/main\/examples\/mnistdata.rar\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">here<\/a>\n(<a href=\"https:\/\/github.com\/mathspp\/NNFwP\/blob\/main\/examples\/mnistdata.rar?raw=true\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">direct download link<\/a>).<\/p>\n<p>After you decompress the data folder, you should get two files:<\/p>\n<ul><li>\n<code>mnist_train.csv<\/code>, a CSV file with 60 000 rows to train the network; and<\/li>\n<li>\n<code>mnist_test.csv<\/code>, a CSV file with 10 000 rows to test the network.<\/li>\n<\/ul><div class=\"notices yellow\">\n<p>If you want to follow along with me, be sure to decompress the <code>mnistdata<\/code>\nfolder inside the <code>examples<\/code> folder, so that the two files are\n<code>examples\/mnistdata\/mnist_train.csv<\/code> and <code>examples\/mnistdata\/mnist_test.csv<\/code>.<\/p>\n<\/div>\n<p>Each row of the file is composed of 785 integers:\nthe first one is a number between 0 and 9, inclusive,\nand tells you which digit that row is.\nThe remaining 784 integers go from 0 to 255 and represent\nthe greyscale levels of the original image, which\nwas 28 by 28 pixels.<\/p>\n<h2 id=\"creating-a-new-file\">Creating a new file<a href=\"#creating-a-new-file\" class=\"toc-anchor after\" data-anchor-icon=\"#\" 
aria-label=\"Anchor\"><\/a><\/h2>\n<p>Because you are done with <em>implementing<\/em> the network and are now going\nto <em>use<\/em> it, it makes sense to create a new file for this little\nproject, so go ahead and create a <code>mnist.py<\/code> file next to your file\ncontaining the neural network implementation.<\/p>\n<h2 id=\"reading-the-data-in\">Reading the data in<a href=\"#reading-the-data-in\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>The first thing we want to do is take a look at the data,\nso let's go ahead and create a simple function to read it in\nand build a NumPy array out of the file.<\/p>\n<p>In order to do that, we first use <code>csv<\/code> to import the data\nfrom the CSV file into a list with the rows of the file,\nand then use NumPy to convert that list of rows into an array.\nDo...<\/p>","date_modified":"2025-07-23T16:49:02+02:00","tags":["machine learning","mathematics","nnfwp","numpy","programming","python"],"image":"\/user\/pages\/02.blog\/neural-networks-fundamentals-with-python-mnist\/thumbnail.webp"},{"title":"Neural networks fundamentals with Python \u2013 backpropagation","date_published":"2021-03-06T00:00:00+01:00","id":"https:\/\/mathspp.com\/blog\/neural-networks-fundamentals-with-python-backpropagation","url":"https:\/\/mathspp.com\/blog\/neural-networks-fundamentals-with-python-backpropagation","content_html":"<p>The third article of this short series concerns itself with the\nimplementation of the backpropagation algorithm, the usual choice\nof algorithm used to enable a neural network to learn.<\/p>\n\n<figure class=\"image-caption\"><img title=\"Original photo by JJ Ying on Unsplash.\" alt=\"A nice image with blue and purple lights.\" src=\"\/user\/pages\/02.blog\/neural-networks-fundamentals-with-python-backpropagation\/_thumbnail.png\"><figcaption class=\"\">Original photo by JJ Ying on Unsplash.<\/figcaption><\/figure><h2 id=\"purpose-of-this-article\">Purpose of this 
article<a href=\"#purpose-of-this-article\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>In this article we will be deriving and implementing the backpropagation algorithm.\nThat is, we will be performing the necessary calculations in order to see\nhow the algorithm works, and then we will implement it.\nAfter the algorithm is implemented, we will run a small test to verify\nempirically that it is working.<\/p>\n<div class=\"notices blue\">\n<p>The code for this article, and for all the articles of the series,\ncan be found in <a href=\"https:\/\/github.com\/mathspp\/NNFwP\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">this GitHub repository<\/a>.\nThis article will build upon <a href=\"https:\/\/github.com\/mathspp\/NNFwP\/tree\/v0.2\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">v0.2<\/a> of that code.<\/p>\n<p>If you need a refresher on what we built last time, have a quick read\n<a href=\"\/blog\/neural-networks-fundamentals-with-python-network-loss\">at the previous article<\/a>.<\/p>\n<\/div>\n<h2 id=\"preamble\">Preamble<a href=\"#preamble\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>I will try my best to write an intuitive explanation of what really happens under the hood,\nand at the same time I will include all the mathematics needed to formalise what is really going on.\nIf I succeed, you will be able to grasp what is going on by reading the intuitive explanations\nand you will be able to check I didn't make any mistakes if you check the mathematics.<\/p>\n<div class=\"notices green\">\n<p>The mathematical formalisations will be included inside these notices,\nso feel free to ignore these if you do not care about the mathematics behind everything.<\/p>\n<\/div>\n<p>First things first, we need to understand <em>why<\/em> we need an algorithm like backpropagation.<\/p>\n<h2 
id=\"why-is-backpropagation-needed\">Why is backpropagation needed?<a href=\"#why-is-backpropagation-needed\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>A neural network contains <em>many<\/em> weights and biases which affect the output of the network when you feed it some input.\nThe whole idea of the neural network model is that<\/p>\n<ol><li>you feed some input to the network and get some output;<\/li>\n<li>you compare this output with the expected output;<\/li>\n<li>you tweak the neural network just a <em>tiny<\/em> bit so that this particular input produces something that is slightly closer to the expected output;<\/li>\n<li>you repeat this for various different inputs, in the hopes that these small improvements compound into really good improvements after a lot of iterations.<\/li>\n<\/ol><div class=\"notices green\">\n<p>In order to know how to tweak the neural network, we will use derivatives.\nIf <span class=\"mathjax mathjax--inline\">\\(L\\)<\/span> is your loss function and <span class=\"mathjax mathjax--inline\">\\(w\\)<\/span> is one of the many weights of the neural network,\nthen <span class=\"mathjax mathjax--inline\">\\(\\frac{\\partial L}{\\partial w}\\)<\/span> quantifies how much <span class=\"mathjax mathjax--inline\">\\(L\\)<\/span> changes when you change <span class=\"mathjax mathjax--inline\">\\(w\\)<\/span> slightly,\nand thus you can use that information to know how to tweak <span class=\"mathjax mathjax--inline\">\\(w\\)<\/span>.<\/p>\n<\/div>\n<p>The backpropagation algorithm is an algorithm you can use to make these small improvements <em>in an efficient way<\/em>.<\/p>\n<h2 id=\"setup\">Setup<a href=\"#setup\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>Let us start by defining exactly what we are working with, so that there are no mix-ups.<\/p>\n<p>We will be working with an instance of a <code>NeuralNetwork<\/code>, that we will refer to 
as <code>net<\/code>,\nthat we assume has <code>n<\/code> layers.\nThat is, <code>n = len(net._layers)<\/code>.\nEach layer <code>i<\/code> has a weights matrix, <code>net._layers[i]._W<\/code>, and a <em>column<\/em> vector <code>net._layers[i]._b...<\/code><\/p>","date_modified":"2025-07-23T16:49:02+02:00","tags":["machine learning","mathematics","nnfwp","numpy","programming","python"],"image":"\/user\/pages\/02.blog\/neural-networks-fundamentals-with-python-backpropagation\/thumbnail.webp"},{"title":"Neural networks fundamentals with Python \u2013 network &amp; loss","date_published":"2021-03-03T00:00:00+01:00","id":"https:\/\/mathspp.com\/blog\/neural-networks-fundamentals-with-python-network-loss","url":"https:\/\/mathspp.com\/blog\/neural-networks-fundamentals-with-python-network-loss","content_html":"<p>In the second article of this short series we will create a class\nfor a generic neural network and we will also see how to assess\nthe quality of the output of a network, essentially preparing\nourselves to implement the backpropagation algorithm.<\/p>\n\n<figure class=\"image-caption\"><img title=\"Original photo by JJ Ying on Unsplash.\" alt=\"A nice image with blue and purple lights.\" src=\"\/user\/pages\/02.blog\/neural-networks-fundamentals-with-python-network-loss\/_thumbnail.png\"><figcaption class=\"\">Original photo by JJ Ying on Unsplash.<\/figcaption><\/figure><h2 id=\"purpose-of-this-article\">Purpose of this article<a href=\"#purpose-of-this-article\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>In this article we want to create a class that represents a generic neural network,\nwhich will build upon the <code>Layer<\/code> class we created <a href=\"\/blog\/neural-networks-fundamentals-with-python-intro\">in the first article<\/a>\nof the series: this class should provide some methods that allow us to deal with a whole\nnetwork, like feeding it some input and getting the final network output\n(just like 
the little demo we included in our script from the previous article).<\/p>\n<p>After creating such a representation, we will be dealing with the concept of loss:\nthe way in which we assess how a neural network is performing, and an essential\nconcept we need if we want our neural network to <em>learn<\/em>.<\/p>\n<div class=\"notices blue\">\n<p>The code for this article, and for all the articles of the series,\ncan be found in <a href=\"https:\/\/github.com\/mathspp\/NNFwP\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">this GitHub repository<\/a>.\nWe will build upon <a href=\"https:\/\/github.com\/mathspp\/NNFwP\/tree\/v0.1\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">v0.1<\/a> of that code.<\/p>\n<p>If you need a refresher on what we built last time, have a quick read\n<a href=\"\/blog\/neural-networks-fundamentals-with-python-intro\">at the previous article<\/a>.<\/p>\n<\/div>\n<h2 id=\"neural-network-as-a-chain-of-layers\">Neural network as a chain of layers<a href=\"#neural-network-as-a-chain-of-layers\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>In the <a href=\"\/blog\/neural-networks-fundamentals-with-python-intro\">previous article<\/a> we implemented a <code>Layer<\/code> class and then proceeded\nto show how several layer instances could be chained as long as their input\nand output dimensions matched.\nThis is the main characterisation of a neural network:\na sequence of layers that receives some information as input, processes it\nover its several layers, and then produces some output.<\/p>\n<p>Aggregating these layers as a single object will make it easier for us to reason\nabout the neural network as a single entity, instead of having to constantly\ndeal with several layers.<\/p>\n<p>For that matter, to define a <code>NeuralNetwork<\/code> we need as little as the\nsequence of layers that composes it:<\/p>\n<pre><code 
class=\"language-py\">class NeuralNetwork:\n    \"\"\"A series of connected, compatible layers.\"\"\"\n    def __init__(self, layers):\n        self._layers = layers<\/code><\/pre>\n<p>Of course, it might be a good idea to do a single check at this point,\nto see if the layers are compatible with each other:<\/p>\n<pre><code class=\"language-py\">class NeuralNetwork:\n    \"\"\"A series of connected, compatible layers.\"\"\"\n    def __init__(self, layers):\n        self._layers = layers\n\n        # Check layer compatibility\n        for (from_, to_) in zip(self._layers[:-1], self._layers[1:]):\n            if from_.outs != to_.ins:\n                raise ValueError(\"Layers should have compatible shapes.\")<\/code><\/pre>\n<p>After defining the object that holds all our layers, and ensuring\nthe layers are compatible, we can implement the forward pass method\nof the network: the method that takes network inputs and then propagates\nthat information forward, until the network produces some output.<\/p>\n<p>Because we already have a <code>forward_pass<\/code> method on the <code>Layer<\/code> object,\nall we need to do is feed the output of a layer as the input to the\nnext:<\/p>\n<pre><code class=\"language-py\">class NeuralNetwork:\n    # ...\n\n    def forward_pass(self, x):\n        out = x\n        for layer in self._layers:\n            out = layer.forward_pass(out)\n        return out<\/code><\/pre>\n<p>We can now...<\/p>","date_modified":"2025-07-23T16:49:02+02:00","tags":["machine learning","mathematics","nnfwp","numpy","programming","python"],"image":"\/user\/pages\/02.blog\/neural-networks-fundamentals-with-python-network-loss\/thumbnail.webp"},{"title":"Neural networks fundamentals with Python \u2013 intro","date_published":"2021-03-02T00:00:00+01:00","id":"https:\/\/mathspp.com\/blog\/neural-networks-fundamentals-with-python-intro","url":"https:\/\/mathspp.com\/blog\/neural-networks-fundamentals-with-python-intro","content_html":"<p>This is the first 
article in a series to implement a neural network from <em>scratch<\/em>.\nWe will set things up in terms of software to install, knowledge we need,\nand some code to serve as backbone for the remainder of the series.<\/p>\n\n<figure class=\"image-caption\"><img title=\"Photo by JJ Ying on Unsplash.\" alt=\"A nice image with blue and purple lights.\" src=\"\/images\/2\/4\/3\/4\/b\/2434ba19ce13d55f4c8c9998e7f68ae0c827d0fd-thumbnail.png\"><figcaption class=\"\">Photo by JJ Ying on Unsplash.<\/figcaption><\/figure><h2 id=\"purpose-of-the-series\">Purpose of the series<a href=\"#purpose-of-the-series\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>In this short series I will be guiding you on how to implement a neural network\nfrom scratch, so that you really understand how they work.<\/p>\n<blockquote>\n<p>By the time we are done, your network will be able to read images\nof handwritten digits and identify them correctly, among other things.<\/p>\n<\/blockquote>\n<p>Your network will receive an image like<\/p>\n<figure class=\"image-caption\"><img title=\"A digit 7.\" alt=\"\" src=\"\/user\/pages\/02.blog\/neural-networks-fundamentals-with-python-intro\/_digit_7.webp\"><figcaption class=\"\">A digit 7.<\/figcaption><\/figure><p>and will output a <code>7<\/code>.<\/p>\n<p>Not only that, but we will also do a bunch of other cool things with our networks.<\/p>\n<p>It is incredible that nowadays you can just install tensorflow or pytorch or any\nother machine learning framework, and with just a couple of lines you can train\na neural network on some task you pick.\nThe downside to using those libraries is that they teach you little about how the\nmodels actually work, and one of the best ways to understand how something works\nis by dissecting it, studying it and assembling it yourself.<\/p>\n<h3 id=\"knowledge-pre-requisites\">Knowledge pre-requisites<a href=\"#knowledge-pre-requisites\" class=\"toc-anchor after\" 
data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h3>\n<p>When it comes to programming, I will assume you are comfortable with the basic\nconcepts of Python.\nI will not be using features that are too advanced,\nbut every now and then I might use modules from the Python Standard Library.\nIf you don't know those modules, that's fine.\nReading the description from the documentation should be enough to bring you\nup to speed!\nI also have a <a href=\"\/blog\/pydonts\">series of blog articles<\/a> to help you write better\nPython code, so take a look at that if you feel the need.<\/p>\n<p>Throughout the series I will be assuming you are familiar with the general idea\nof how neural networks work.\nIf you have <em>no idea<\/em> what a neural network is, it is with great pleasure that\nI recommend you watch <a href=\"https:\/\/www.youtube.com\/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">3blue1brown's video(s)<\/a>.<\/p>\n<p>Being familiar with matrix algebra and a bit of calculus (knowing what derivatives are\nand understanding them) will make it easier for you to follow along, but you\ndon't really need to be a master of these subjects in order to implement a neural net:\nthe first time I implemented a neural network I didn't know matrix algebra nor\nderivatives.\nI will be giving intuitive explanations of what is going on at each step, and then\nproceed to justify them with the calculations needed.\nFeel free to skip the formal calculations if you are in a rush, but bear in mind\nthat going through those is one of the things that really helps the knowledge sink in.<\/p>\n<h2 id=\"purpose-of-this-article\">Purpose of this article<a href=\"#purpose-of-this-article\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>In this article we will be setting up for the remainder of the series.\nFirst, I will be covering what needs to be 
installed.\nThen we will be taking a look at some of...<\/p>","date_modified":"2025-07-23T16:49:02+02:00","tags":["machine learning","mathematics","nnfwp","numpy","programming","python"],"image":"\/user\/pages\/02.blog\/neural-networks-fundamentals-with-python-intro\/thumbnail.png"}]}
