There are various loss functions available for different objectives. In this guide, I will take you through some of the most frequently used loss functions, with a set of examples. This guide is written with the Keras and TensorFlow frameworks in mind.
Loss Function | Brief Intro
A loss function helps in optimizing the parameters of a neural network. Our objective is to minimize the loss of the network by optimizing its parameters (weights). The loss is computed by the loss function from the target (actual) value and the value predicted by the network. We then use gradient descent to update the network's weights so that the loss decreases. This is how a neural network is trained.
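To make this loop concrete, here is a minimal sketch of a single training step in TensorFlow; the model, optimizer, and learning rate are illustrative placeholders of my own choosing, not anything prescribed by the original post.

import tensorflow as tf

# A tiny model and optimizer; both are illustrative placeholders.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.MeanSquaredError()

def train_step(x, y_true):
    with tf.GradientTape() as tape:
        y_pred = model(x)               # forward pass: the network's prediction
        loss = loss_fn(y_true, y_pred)  # match target value against prediction
    # Gradient descent: nudge each weight in the direction that lowers the loss.
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss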
Mean Squared Error
When you have a regression task, this is one of the loss functions you can go with. As the name suggests, the loss is calculated by taking the mean of the squared differences between the actual (target) values and the predicted values.
Example
For example, suppose we have a neural network that takes house data and predicts the house price. In this case, you can use the MSE loss. Basically, whenever the output is a real number, this is the loss function to use.
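A minimal sketch of such a house-price model in Keras; the number of input features and the layer sizes are assumptions for illustration, not from the original post.

import tensorflow as tf

# Regression model: one linear output node for the predicted price.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(8,)),  # 8 house features (assumed)
    tf.keras.layers.Dense(1)  # real-valued output, no activation
])
# 'mse' averages (y_true - y_pred)^2 over the batch.
model.compile(optimizer='adam', loss='mse')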
Binary Crossentropy
When we have a binary classification task, this is one of the loss functions you can go with. If you use the BCE loss function, you need just one output node to classify the data into two classes. The output value should be passed through a sigmoid activation function, so the output lies in the range (0, 1).
Example
For example, suppose we have a neural network that takes atmospheric data and predicts whether it will rain or not. If the output is greater than 0.5, the network classifies it as rain; if the output is less than 0.5, the network classifies it as not rain. (It could be the opposite, depending on how you train the network.) The higher the probability score, the higher the chance of rain.
While training the network, the target value fed to the network should be 1 if it is raining, otherwise 0.
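A minimal sketch of such a rain classifier in Keras; the four input features are an assumption for illustration.

import tensorflow as tf

# Binary classifier: one output node squashed into (0, 1) by sigmoid.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(4,)),  # 4 weather features (assumed)
    tf.keras.layers.Dense(1, activation='sigmoid')
])
# Targets: 1 if it rained, 0 otherwise.
model.compile(optimizer='adam', loss='binary_crossentropy')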
Note 1
One important thing: if you are using the BCE loss function, the output of the node should be between (0, 1). That means you have to use a sigmoid activation function on your final output, since sigmoid maps any real value into the range (0, 1).
Note 2
What if you are not using a sigmoid activation on the final layer? Then you can pass an argument called from_logits as true to the loss function, and it will internally apply the sigmoid to the output value.
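For instance, a sketch of the same classifier with the sigmoid moved inside the loss; the layer sizes are again assumptions.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(1)  # raw logit, no sigmoid here
])
# from_logits=True makes the loss apply the sigmoid internally.
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True))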
Categorical Crossentropy
When we have a multi-class classification task, this is one of the loss functions you can go with. If you use the CCE loss function, there must be the same number of output nodes as classes, and the final layer output should be passed through a softmax activation so that each node outputs a probability value between (0, 1).
Example
For example, suppose we have a neural network that takes an image and classifies it as a cat or a dog. If the cat node has the higher probability score, the image is classified as a cat; otherwise, as a dog. Basically, whichever class node has the highest probability score, the image is classified into that class.
For feeding the target value at training time, we have to one-hot encode it. If the image is of a cat, the target vector would be (1, 0); if the image is of a dog, the target vector would be (0, 1). Basically, the target vector is the same size as the number of classes, the index position corresponding to the actual class is 1, and all the others are zero.
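A minimal sketch of such a cat/dog classifier in Keras; the image size and hidden layer are assumptions for illustration.

import tensorflow as tf

# Two output nodes (cat, dog); softmax turns them into a probability distribution.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(64, 64, 3)),  # assumed image size
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(2, activation='softmax')
])
# Targets are one-hot vectors: cat = (1, 0), dog = (0, 1).
model.compile(optimizer='adam', loss='categorical_crossentropy')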
Note
What if we are not using a softmax activation on the final layer? Then you can pass an argument called from_logits as true to the loss function, and it will internally apply the softmax to the output value, same as in the case above.
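Concretely, a sketch assuming the same model as above but with the final softmax removed:

# Final layer emits raw logits; the loss applies softmax internally.
model.compile(optimizer='adam',
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True))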
Sparse Categorical Crossentropy
This loss function is almost identical to CCE, except for one change.
When we use the SCCE loss function, you do not need to one-hot encode the target vector. If the target image is of a cat, you simply pass 0; otherwise 1. Basically, whichever the class is, you just pass the index of that class.
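The same cat/dog setup with integer labels instead of one-hot vectors; a minimal sketch under the same illustrative assumptions.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(64, 64, 3)),  # assumed image size
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(2, activation='softmax')
])
# Targets are plain class indices: 0 for cat, 1 for dog (no one-hot encoding).
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')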
These are the most important loss functions, and you will probably use one of them when training your neural network.
This is the source code for all the loss functions available in Keras.