Why Do Neural Networks Need An Activation Function?

by Computer Vision Department of NTRLab 

Suppose we are given a set of distinct points P = {(xi, yi) ∈ ℝm × ℝ}i=1,…,n, which we regard as a set of test samples xi ∈ ℝm with known answers yi ∈ ℝ. To avoid non-compactness issues we may assume that P lies in some compact set K; for example, K may be a polytope. Does there exist a continuous function in C(K), the space of all continuous functions on K, whose graph is a good approximation of our set P in some sense?
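For concreteness, here is a minimal NumPy sketch of such a sample set and one possible way to measure how well a candidate continuous function fits it (the data-generating rule, the noise level, and the candidate f are our own illustrative choices, not part of the problem statement):

```python
import numpy as np

# A toy sample set P = {(x_i, y_i)} inside the compact set K = [0, 1]^2;
# the underlying rule and the noise level are chosen only for illustration.
rng = np.random.default_rng(42)
n, m = 100, 2
X = rng.uniform(0.0, 1.0, size=(n, m))                               # points x_i in K
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.05 * rng.normal(size=n)   # answers y_i

# One way to make "good approximation in some sense" precise: the worst-case
# deviation of a candidate continuous function f from the known answers.
def sup_error(f, X, y):
    return np.max(np.abs(f(X) - y))

f = lambda Z: np.sin(3 * Z[:, 0]) + Z[:, 1] ** 2   # a candidate f in C(K)
print(sup_error(f, X, y))                          # small value => f fits P well
```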

From the approximation theory point of view, a neural network is a family of functions {Fθ, θ ∈ Θ} belonging to some functional class. Each particular network architecture defines its own family of functions, and some of these families may be equivalent in some sense. If, in line with the problem above, we restrict ourselves to an MLP with a single intermediate (hidden) layer of N elements, then the corresponding family of functions is

M(σ) = { F(x) = ∑k=1,…,N ck σ(wk·x + bk) : ck, bk ∈ ℝ, wk ∈ ℝm },

and we ask whether, for a given σ, the set M(σ) has good approximation ability, that is, whether it is dense in C(K), so that every continuous function on K can be uniformly approximated by elements of M(σ).
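For concreteness, here is a minimal NumPy sketch of a single member of this family: one hidden layer of N units with weights wk, biases bk, output coefficients ck, and a pluggable activation σ (the function name and the particular numbers are our own illustrative choices):

```python
import numpy as np

def mlp_one_hidden(x, W, b, c, sigma):
    # F(x) = sum_k c_k * sigma(w_k . x + b_k): one hidden layer of N units.
    # Shapes: x is (m,), W is (N, m), b and c are (N,); sigma acts elementwise.
    return c @ sigma(W @ x + b)

# One member of M(sigma) with sigma = tanh, m = 2 inputs, N = 4 hidden units.
rng = np.random.default_rng(0)
m, N = 2, 4
W = rng.normal(size=(N, m))
b = rng.normal(size=N)
c = rng.normal(size=N)
x = np.array([0.3, -0.7])
print(mlp_one_hidden(x, W, b, c, np.tanh))
```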

As we change σ, the approximation ability of our model changes. Taking σ to be the identity or a polynomial lets the model span only a finite-dimensional subspace of the infinite-dimensional space of continuous functions on K: with σ(t) = t, for instance, every member of M(σ) reduces to the affine function (∑k ck wk)·x + ∑k ck bk, and a polynomial σ of degree d keeps us inside the polynomials of degree at most d. There is a theorem that states:

Let σ be bounded and Riemann integrable on every finite interval of ℝ (one may think, for example, of a continuous function from ℝ to ℝ). Then M(σ) is dense in C(K), i.e. it can approximate any continuous function on K, if and only if σ is not a polynomial.
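A quick numerical check of this contrast (the names F, W, b, c and the specific numbers below are our own illustrative choices): with σ = identity the network collapses to a single affine map no matter how many hidden units it has, while with a non-polynomial σ such as tanh no such collapse occurs.

```python
import numpy as np

rng = np.random.default_rng(1)
m, N = 3, 50
W = rng.normal(size=(N, m))
b = rng.normal(size=N)
c = rng.normal(size=N)

def F(X, sigma):
    # Evaluate the one-hidden-layer network on a batch X of shape (n_points, m).
    return sigma(X @ W.T + b) @ c

X = rng.normal(size=(200, m))

# With sigma = identity, F is exactly the affine map x -> a.x + a0,
# where a = W.T @ c and a0 = b @ c, regardless of N.
a, a0 = W.T @ c, b @ c
identity = lambda t: t
print(np.allclose(F(X, identity), X @ a + a0))   # True: only an (m+1)-dim family

# With a non-polynomial sigma such as tanh there is no such collapse,
# and by the theorem the family M(sigma) is dense in C(K).
print(np.allclose(F(X, np.tanh), X @ a + a0))    # False in general
```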

For those who are interested, I recommend looking into the history and further development of the Kolmogorov–Arnold representation theorem and reading A. Pinkus's book on ridge functions.

