This article skips the basic introduction to Convolutional Neural Networks (CNNs) and focuses on the code implementation.
A CNN is a multi-layer neural network. Each layer consists of multiple two-dimensional planes, and each plane consists of multiple independent neurons.
Using MNIST as the dataset, a simple 7-layer CNN is built, modeled after LeNet-5 and tiny-cnn:
Input layer: number of neurons 32*32 = 1024;
C1 layer: convolution window size 5*5, 6 output feature maps, 6 convolution kernels, output feature map size 28*28, trainable parameters (weights + thresholds (biases)) 5*5*6 + 6 = 150 + 6 = 156, number of neurons 28*28*6 = 4704;
S2 layer: subsampling window size 2*2, 6 output downsampled maps, 6 subsampling windows (one per map), output downsampled map size 14*14, trainable parameters 1*6 + 6 = 6 + 6 = 12, number of neurons 14*14*6 = 1176;
C3 layer: convolution window size 5*5, 16 output feature maps, 6*16 = 96 convolution kernels, output feature map size 10*10, trainable parameters 5*5*(6*16) + 16 = 2400 + 16 = 2416, number of neurons 10*10*16 = 1600;
S4 layer: subsampling window size 2*2, 16 output downsampled maps, 16 subsampling windows (one per map), output downsampled map size 5*5, trainable parameters 1*16 + 16 = 16 + 16 = 32, number of neurons 5*5*16 = 400;
C5 layer: convolution window size 5*5, 120 output feature maps, 16*120 = 1920 convolution kernels, output feature map size 1*1, trainable parameters 5*5*(16*120) + 120 = 48000 + 120 = 48120, number of neurons 1*1*120 = 120;
Output layer: convolution window size 1*1, 10 output feature maps, 120*10 = 1200 convolution kernels, output feature map size 1*1, trainable parameters 1*(120*10) + 10 = 1200 + 10 = 1210, number of neurons 1*1*10 = 10.
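The sizes and parameter counts listed above can be checked with a short script. This is only a sketch: the function names (conv_layer, subsample_layer, describe_network) are made up for illustration and are not taken from tiny-cnn. It assumes stride-1 convolution without padding, and that every output map is connected to every input map, which is what the counts above imply.

```python
def conv_layer(in_size, in_maps, out_maps, kernel=5):
    """Stride-1 convolution without padding, fully connected between maps."""
    out_size = in_size - kernel + 1
    weights = kernel * kernel * in_maps * out_maps
    thresholds = out_maps  # one threshold (bias) per output feature map
    return out_size, out_maps, weights, thresholds


def subsample_layer(in_size, in_maps, window=2):
    """Non-overlapping subsampling: one weight and one threshold per map."""
    return in_size // window, in_maps, in_maps, in_maps


def describe_network():
    # (name, kind, number of output maps, window size)
    plan = [("C1", "conv", 6, 5), ("S2", "sub", None, 2),
            ("C3", "conv", 16, 5), ("S4", "sub", None, 2),
            ("C5", "conv", 120, 5), ("Output", "conv", 10, 1)]
    size, maps = 32, 1  # input layer: one 32*32 plane
    layers = []
    for name, kind, out_maps, window in plan:
        if kind == "conv":
            size, maps, weights, thresholds = conv_layer(size, maps, out_maps, window)
        else:
            size, maps, weights, thresholds = subsample_layer(size, maps, window)
        layers.append({"name": name, "size": size, "maps": maps,
                       "weights": weights, "thresholds": thresholds,
                       "neurons": size * size * maps})
    return layers


if __name__ == "__main__":
    for layer in describe_network():
        print(layer)
```

Treating the output layer as a convolution with a 1*1 window over 120 maps of size 1*1 gives the same count as a fully connected layer, which is how the list above describes it.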
The following describes the implementation process:
1. Obtain the training and test samples from the MNIST database:
(1) The original MNIST images are 28*28. Here each image is expanded to 32*32, with the added pixels set to -1, and the data values lie in the range [-1, 1]. This gives a total of 60,000 32*32 training samples and 10,000 32*32 test samples;
(2) The output layer has 10 output nodes. In the training phase, the node at the position corresponding to the sample's label is set to 0.8, and the other nodes are set to -0.8.
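The two preparation steps above might look like the following sketch. The function names are hypothetical. Two details are assumptions rather than facts from the text: the 28*28 image is centered with a 2-pixel border on every side, and pixel values 0..255 are mapped linearly onto [-1, 1].

```python
def pad_and_scale(image28, pad_value=-1.0):
    """Turn a 28*28 image (rows of ints in 0..255) into a 32*32 image in [-1, 1].

    Assumption: the image is centered, leaving a 2-pixel border of pad_value.
    """
    out = [[pad_value] * 32 for _ in range(32)]
    for r in range(28):
        for c in range(28):
            # Assumed linear mapping: 0 -> -1.0, 255 -> 1.0
            out[r + 2][c + 2] = image28[r][c] / 255.0 * 2.0 - 1.0
    return out


def make_target(label, num_classes=10, on=0.8, off=-0.8):
    """Target vector: 0.8 at the label position, -0.8 everywhere else."""
    return [on if i == label else off for i in range(num_classes)]
```

Targets of 0.8 and -0.8 stay inside the open range of tanh, so the output neurons never have to saturate to match them.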
2. Initialize the weights and thresholds (biases): the weights are the convolution kernels. The neurons on each feature map share the same weights and threshold, so the number of thresholds equals the number of feature maps.
(1) The weights are initialized with uniformly distributed random values;
(2) The thresholds are initialized to 0.
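A minimal sketch of this initialization follows. The text does not give the range of the uniform distribution, so the default of [-0.5, 0.5] is purely an assumption, as is the function name init_layer.

```python
import random


def init_layer(num_weights, num_thresholds, low=-0.5, high=0.5, rng=None):
    """Uniform random weights, zero thresholds.

    The range [low, high] is an assumed default; the text only says "uniform".
    """
    if rng is None:
        rng = random.Random()
    weights = [rng.uniform(low, high) for _ in range(num_weights)]
    thresholds = [0.0] * num_thresholds
    return weights, thresholds


# Example: C1 has 5*5*6 = 150 weights and 6 thresholds.
c1_weights, c1_thresholds = init_layer(150, 6, rng=random.Random(0))
```

Passing a seeded random.Random makes a run reproducible, which helps when debugging the later propagation steps.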
3. Forward propagation: calculate the value of each layer's neurons from the weights and thresholds.
(1) Input layer: one 32*32 sample is fed in at a time.
(2) C1 layer: slide each 5*5 convolution kernel over the 32*32 image with a stride of 1. At each position, the corresponding elements are multiplied and the products are summed, so each kernel yields a 28*28 feature map and the six 5*5 kernels yield six 28*28 feature maps. A threshold is then added to each neuron, and finally each neuron is passed through the tanh activation function to obtain its final value.
(3) S2 layer: generate 6 14*14 downsampled maps from the 6 28*28 feature maps in C1. Each group of four adjacent neurons (a 2*2 window) is summed, multiplied by a weight, and divided by 4 to take the average; a threshold is then added, and finally each neuron is passed through the tanh activation function to obtain its final value.
(4) C3 layer: generate 16 10*10 feature maps from the 6 14*14 downsampled maps in S2. Each 10*10 feature map is produced by 6 5*5 convolution kernels, one per downsampled map: each kernel is convolved with its 14*14 downsampled map, the six results are summed at corresponding positions, a threshold is added to each neuron, and finally each neuron is passed through the tanh activation function to obtain its final value.
(5) S4 layer: generate 16 5*5 downsampled maps from the 16 10*10 feature maps in C3. Each group of four adjacent neurons is summed, multiplied by a weight, and divided by 4 to take the average; a threshold is then added, and finally each neuron is passed through the tanh activation function to obtain its final value.
(6) C5 layer: generate 120 1*1 feature maps from the 16 5*5 downsampled maps in S4. Each 1*1 feature map is produced by 16 5*5 convolution kernels: each kernel is multiplied element by element with its 5*5 downsampled map, all the products are summed, a threshold is added to each neuron, and finally each neuron is passed through the tanh activation function to obtain its final value.
(7) Output layer: a fully connected layer. For each neuron in the output layer, the 120 neurons in the C5 layer are multiplied by their corresponding weights and summed; a threshold is then added to each neuron, and finally each neuron is passed through the tanh activation function to obtain its final value.
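The three kinds of forward step described above (convolution, subsampling, full connection) can be sketched in plain Python as follows. The function names and the nested-list data layout are assumptions chosen for readability, not the layout tiny-cnn uses. The convolution is written as a sliding multiply-and-sum without flipping the kernel, which matches the description in the text.

```python
import math


def conv_forward(inputs, kernels, thresholds):
    """inputs: list of square 2-D maps; kernels[o][i] is the k*k kernel that
    links input map i to output map o; thresholds[o] is shared by map o."""
    n = len(inputs[0])
    k = len(kernels[0][0])
    out_n = n - k + 1  # stride 1, no padding
    outputs = []
    for o, per_input in enumerate(kernels):
        fmap = [[0.0] * out_n for _ in range(out_n)]
        for y in range(out_n):
            for x in range(out_n):
                total = thresholds[o]
                for i, kern in enumerate(per_input):
                    for ky in range(k):
                        for kx in range(k):
                            total += kern[ky][kx] * inputs[i][y + ky][x + kx]
                fmap[y][x] = math.tanh(total)
        outputs.append(fmap)
    return outputs


def subsample_forward(inputs, weights, thresholds):
    """Average each 2*2 window, scale by the map's weight, add its threshold."""
    outputs = []
    for m, fmap in enumerate(inputs):
        half = len(fmap) // 2
        out = [[0.0] * half for _ in range(half)]
        for y in range(half):
            for x in range(half):
                s = (fmap[2 * y][2 * x] + fmap[2 * y][2 * x + 1]
                     + fmap[2 * y + 1][2 * x] + fmap[2 * y + 1][2 * x + 1])
                out[y][x] = math.tanh(weights[m] * s / 4.0 + thresholds[m])
        outputs.append(out)
    return outputs


def fc_forward(inputs, weights, thresholds):
    """inputs: flat list of values; weights[o][i] links input i to output o."""
    return [math.tanh(sum(w * v for w, v in zip(row, inputs)) + b)
            for row, b in zip(weights, thresholds)]
```

With this layout, C1 would be called with one 32*32 input map and kernels of shape 6*1*5*5, C3 with six 14*14 maps and kernels of shape 16*6*5*5, and C5 with sixteen 5*5 maps and kernels of shape 120*16*5*5. The 120 resulting 1*1 maps can be flattened into a list and passed to fc_forward for the output layer.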
4. Backpropagation: calculate the errors of each layer's neurons, weights, and thresholds, which are then used to update the weights and thresholds.
(1) Output layer: calculate the output layer neuron error, which is the derivative of the MSE loss function multiplied by the derivative of the tanh activation function.
(2) C5 layer: calculate the C5 layer neuron error, the output layer weight error, and the output layer threshold error. To obtain the error of each C5 neuron, multiply the output layer neuron errors by the corresponding output layer weights, sum the products, and multiply the sum by the derivative of the tanh activation function of that C5 neuron. The output layer weight error is the output layer neuron error multiplied by the C5 layer neuron value. The output layer threshold error is the output layer neuron error itself.
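The two backpropagation steps above can be sketched as follows. This is an illustration under stated assumptions, not the author's code: the loss is taken to be E = 1/2 * sum((y - t)^2), so its derivative with respect to an output y is (y - t), and the derivative of tanh is written in terms of the neuron's output as 1 - y*y. All function names are hypothetical.

```python
import math


def output_forward(c5_out, weights, thresholds):
    """Fully connected output layer; weights[o][i] links C5 neuron i to output o."""
    return [math.tanh(sum(w * a for w, a in zip(row, c5_out)) + b)
            for row, b in zip(weights, thresholds)]


def mse_loss(out, target):
    """Assumed loss: E = 1/2 * sum((y - t)^2)."""
    return 0.5 * sum((y - t) ** 2 for y, t in zip(out, target))


def output_and_c5_errors(c5_out, out, target, weights):
    # Output layer neuron error: dE/dy * tanh'(.), with tanh' = 1 - y^2
    delta_out = [(y - t) * (1.0 - y * y) for y, t in zip(out, target)]
    # C5 neuron error: weighted sum of output errors times tanh' of the C5 neuron
    delta_c5 = []
    for i, a in enumerate(c5_out):
        back = sum(delta_out[o] * weights[o][i] for o in range(len(delta_out)))
        delta_c5.append(back * (1.0 - a * a))
    # Output layer weight error: output neuron error times the C5 neuron value
    grad_w = [[d * a for a in c5_out] for d in delta_out]
    # Output layer threshold error: the output neuron error itself
    grad_b = list(delta_out)
    return delta_out, delta_c5, grad_w, grad_b
```

A useful sanity check for code like this is to compare grad_w and grad_b against finite differences of mse_loss. If the two disagree, the usual culprit is an index swap in the weights.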