Hi all

In my last couple of posts I have been exploring neural nets. This post continues the topic with a quick look at objective functions.

In the last two posts I glossed over the choice of objective function. I used the sum of squared error (SSE), but that is a bit of a strange choice since the data being modelled are dichotomous. This post derives a more rigorous objective function and explores the difference between it and the SSE.

Recall that the SSE objective function for a network with one output node is $O(\beta) = \sum_{i=1}^{n} (y_i - f_i(\beta))^2$, where the $y_i$ are the observed data, $\beta$ are the network weights, and $f_i(\beta)$ is the output of the neural network for the $i$'th observation, written as a function of the network weights.
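As a quick sketch, the SSE objective can be computed directly; the observations and outputs below are made-up values for illustration:

```python
import numpy as np

def sse(y, f):
    """Sum of squared errors between observations y and network outputs f."""
    y = np.asarray(y, dtype=float)
    f = np.asarray(f, dtype=float)
    return np.sum((y - f) ** 2)

# Hypothetical dichotomous observations and network outputs:
y = np.array([1.0, 0.0, 1.0])
f = np.array([0.9, 0.2, 0.7])
print(sse(y, f))  # 0.01 + 0.04 + 0.09, i.e. about 0.14
```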

Also recall that the probability of a sample of IID Gaussian random variables with known variance is $\prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i - \mu_i)^2}{2\sigma^2}\right)$. Thus, up to an additive constant and the scale $\sigma^2$, the $-2$ log-probability for a Gaussian distribution is the SSE; this is also called the deviance.
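To see that the Gaussian deviance and the SSE differ only by a constant that does not depend on the fit, here is a small check (with $\sigma^2 = 1$ and made-up values):

```python
import numpy as np

def neg2_log_gaussian(y, mu, sigma2=1.0):
    """-2 log-probability of IID Gaussian observations with known variance."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    n = y.size
    return n * np.log(2 * np.pi * sigma2) + np.sum((y - mu) ** 2) / sigma2

y = np.array([1.0, 0.0, 1.0])
mu = np.array([0.9, 0.2, 0.7])
# Subtracting the constant n*log(2*pi) leaves exactly the SSE:
print(neg2_log_gaussian(y, mu) - 3 * np.log(2 * np.pi))  # about 0.14
```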

Therefore (my favourite pretentious word), a good way to make an objective function is to use the $-2$ log-probability for a distribution that is more representative of dichotomous data. Here I will use the Bernoulli distribution. The probability of a set of observed Bernoulli random variables with $P(y_i = 1) = p_i$ is $\prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}$. Thus, taking $p_i = f_i(\beta)$, the deviance and our objective function is $O(\beta) = -2 \sum_{i=1}^{n} \left[ y_i \log f_i(\beta) + (1 - y_i) \log(1 - f_i(\beta)) \right]$. The partial derivative (used in back propagation) for this objective function is $\frac{\partial O}{\partial f_i(\beta)} = -2 \left[ \frac{y_i}{f_i(\beta)} - \frac{1 - y_i}{1 - f_i(\beta)} \right]$.
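A minimal sketch of the Bernoulli deviance and its partial derivative, with the analytic gradient checked against a finite difference (the data values are made up for illustration):

```python
import numpy as np

def bernoulli_deviance(y, f):
    """-2 log-probability of Bernoulli observations with P(y_i = 1) = f_i."""
    y = np.asarray(y, dtype=float)
    f = np.asarray(f, dtype=float)
    return -2.0 * np.sum(y * np.log(f) + (1 - y) * np.log(1 - f))

def bernoulli_deviance_grad(y, f):
    """Partial derivative of the deviance with respect to each output f_i."""
    y = np.asarray(y, dtype=float)
    f = np.asarray(f, dtype=float)
    return -2.0 * (y / f - (1 - y) / (1 - f))

# Check the analytic gradient against a central finite difference:
y = np.array([1.0, 0.0, 1.0])
f = np.array([0.9, 0.2, 0.7])
eps = 1e-6
bump = np.array([eps, 0.0, 0.0])
numeric = (bernoulli_deviance(y, f + bump)
           - bernoulli_deviance(y, f - bump)) / (2 * eps)
print(numeric, bernoulli_deviance_grad(y, f)[0])  # both about -2/0.9
```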

So what is the difference between the SSE and Bernoulli objective functions? Consider a data set with one observation $y_1 = 1$. Figure 1 shows the two objective functions as a function of $f_1$ in the range $(0, 1)$. As can be seen, the SSE penalizes bad fits (relative to good fits) much less than the Bernoulli objective. In particular, as $f_1 \to 0$ the Bernoulli objective diverges, so it distinguishes sharply between poor fits that the SSE treats as nearly equivalent.

Figure 1: The two objective functions.
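The shape of Figure 1 can be reproduced numerically; the grid below tabulates both objectives for the single observation $y_1 = 1$:

```python
import numpy as np

# For a single observation y = 1, compare the two objectives as the
# fitted value f ranges over (0, 1).
f = np.linspace(0.01, 0.99, 5)
sse_vals = (1.0 - f) ** 2            # SSE for y = 1
bern_vals = -2.0 * np.log(f)         # Bernoulli deviance for y = 1
for fi, s, b in zip(f, sse_vals, bern_vals):
    print(f"f={fi:.2f}  SSE={s:.3f}  Bernoulli={b:.3f}")
# The SSE is bounded by 1, while the Bernoulli deviance blows up near f = 0.
```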

That is it for now.
