Multi-Valued Neurons

The Multi-Valued Neuron (MVN) is a neuron with complex-valued weights whose inputs and output are located on the unit circle. The latter means that the MVN's output depends only on the argument (phase) of its weighted sum and does not depend on its magnitude. This important property distinguishes the MVN from other complex-valued neurons and determines all its advantages. The most important of these advantages is a derivative-free learning algorithm, both for a single neuron and for an MVN-based feedforward neural network.

 

Some Historical Notes and Essentials

 

The term "Multi-Valued Neuron" was suggested in 1992 by Naum Aizenberg and Igor Aizenberg in their paper [1]. However, the MVN story started much earlier: in 1971 Naum Aizenberg et al., in their paper [2], suggested a model of multiple-valued logic over the field of complex numbers and introduced the notion of a multiple-valued threshold function over the field of complex numbers. In that seminal paper they also introduced the first historically known complex-valued activation function. The main idea behind that work was to generalize Boolean threshold logic, and the notion of a Boolean threshold function, to the multiple-valued case. Unlike classical multiple-valued logic, where the values of k-valued logic are encoded by integers from the set K={0, 1, …, k-1}, in multiple-valued logic over the field of complex numbers they are encoded by the kth roots of unity, as suggested in [2]. Thus, the values of k-valued logic are located on the unit circle:

\(\varepsilon = e^{i2\pi/k}\) – the primitive kth root of unity, i is the imaginary unit

 

An important advantage of this model of multiple-valued logic over the traditional one is that all values of k-valued logic encoded by the kth roots of unity are normalized: their absolute values are equal to 1 and they differ only in their arguments (phases). A key definition given in [2], which is the foundation of the multi-valued neuron, is the definition of a multiple-valued threshold function. Let \(E_k = \{1, \varepsilon_{k}, \varepsilon^{2}_{k}, ..., \varepsilon^{k-1}_{k}\}\), where \(\varepsilon_{k} = e^{i2\pi/k}\) is the primitive kth root of unity (i is the imaginary unit and k is some positive integer), be the set of the kth roots of unity. Then a function \(f(x_{1}, ..., x_{n}): E^{n}_{k}\rightarrow E_{k}\) of k-valued logic is called a k-valued threshold function (or a threshold function of k-valued logic) if there exists a complex-valued vector \((w_{0}, w_{1}, ..., w_{n})\) such that for all \((x_{1}, ..., x_{n}) \) from the domain of the function \(f(x_{1}, ..., x_{n}) \)

\(f(x_{1}, ..., x_{n}) = P(w_{0} + w_{1}x_{1} + ... + w_{n}x_{n})\) ,

(1)

where

\(P(z)=e^{i2\pi j/k}, \) if \(\ 2\pi j/k \le arg\ z < 2\pi(j+1)/k\)

(2)

\(j=0,1,...,k-1\) are the values of k-valued logic, i is the imaginary unit, and arg z is the argument of the complex number z. The vector \((w_{0}, w_{1}, ..., w_{n})\) is called a weighting vector of the threshold function f. Function (2) separates the complex plane into k equal sectors, and P(z) depends only on arg z. This is illustrated as follows:

\(\require{color}\) \( \textcolor{BrickRed}{P(z)}=e^{i2\pi j/k}= \textcolor{BrickRed}{\varepsilon^{j}}\),
if \(\ 2\pi j/k \le arg\ z < 2\pi(j+1)/k\)

Function P maps the complex plane into the set of the kth roots of unity
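As an illustration only, a minimal sketch of activation function (2) and of mapping (1) could look as follows (Python with NumPy is used here; the function names are ours and do not come from the original papers):

```python
import numpy as np

def mvn_activation_discrete(z: complex, k: int) -> complex:
    """Discrete MVN activation (2): map the weighted sum z to the kth root
    of unity whose sector contains arg(z)."""
    phi = np.angle(z) % (2 * np.pi)          # arg(z) brought into [0, 2*pi)
    j = int(phi // (2 * np.pi / k)) % k      # index of the sector containing arg(z)
    return np.exp(1j * 2 * np.pi * j / k)

def mvn_output(weights, inputs, k: int) -> complex:
    """Mapping (1): weights = (w0, w1, ..., wn), inputs = (x1, ..., xn) on the unit circle."""
    z = weights[0] + np.dot(weights[1:], inputs)   # weighted sum w0 + w1*x1 + ... + wn*xn
    return mvn_activation_discrete(z, k)
```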

 

Function (2), suggested in [2] in 1971, is the first historically known complex-valued activation function. In [3]-[4], a multi-valued threshold element, which implements a multiple-valued threshold function, and methods of its synthesis were introduced. The theory of multiple-valued threshold logic over the field of complex numbers was developed in depth in [5].

 

The Neuron

 

The discrete Multi-Valued Neuron (MVN) was introduced in [1] in 1992. It is a neuron with n complex-valued inputs located on the unit circle and a single complex-valued output, which is a kth root of unity and is therefore also located on the unit circle. The weights can be arbitrary complex numbers (and the weighted sum, accordingly, can also be an arbitrary complex number). Function (2) is the activation function of the MVN. Let O be the continuous set of points located on the unit circle. Then the discrete MVN implements an input/output mapping described by a function \(f(x_{1}, ..., x_{n}): X^{n} \rightarrow E_{k} \), where \(X=E_{k}\) or \(X=O\):

 

Multi-Valued Neuron

 

Hence, the MVN implements mapping (1) with activation function (2).

The main advantages of the MVN over other neurons are its higher functionality (that is, its ability to learn input/output mappings that other neurons cannot learn) and the simplicity of its learning, which is derivative-free.

The high functionality of the MVN is determined by its activation function. Let us take a look at its 3D graphical representation:

3D graph of the MVN activation function, k=16

 

It looks like a circular staircase where each stair is a sector on the complex plane. The next picture illustrates why the MVN activation function ensures the higher functionality of a single MVN compared to a neuron with a sigmoid activation function. When we train a sigmoidal neuron to produce some exact output, we have to ensure that the weighted sum takes one particular, unique value. This is very difficult (and often impossible) to achieve. In contrast, when we train the MVN to produce some exact output, the weighted sum does not need to take any particular value; it only has to fall into the pre-defined sector corresponding to the desired output. Since that sector is infinite, this makes the MVN much more flexible and, respectively, much more functional than a sigmoidal neuron.

 

MVN vs. Sigmoidal Neuron

 

In [6], a continuous activation function for the MVN was introduced. The concept of the continuous MVN was then developed in [7]. Activation function (2) becomes continuous when k→∞: as k→∞, the angular size of a sector on the complex plane approaches 0, and function (2) turns into the projection of the weighted sum \(z=w_{0} + w_{1}x_{1} + ... + w_{n}x_{n} \) onto the unit circle:

\(P(z)=e^{i\arg z} = z/|z|\) .

(3)

Continuous activation function (3) is illustrated as follows:

Continuous MVN Activation Function

3D Representation of the Continuous MVN Activation Function
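A minimal sketch of the continuous activation function (3), under the same illustrative conventions as the sketch above (note that (3) is undefined at z = 0, so a practical implementation has to treat this case separately):

```python
import numpy as np

def mvn_activation_continuous(z: complex) -> complex:
    """Continuous MVN activation (3): project the weighted sum onto the unit circle."""
    # (3) is undefined at z = 0; a real implementation must guard against this case.
    return z / np.abs(z)
```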

 

The continuous MVN is a very suitable tool for dealing with continuous input/output mappings and for solving regression problems using an MVN-based neural network. There is also another interesting property: the continuous MVN has a direct analogy with a biological neuron. In fact, biological neurons communicate with each other by spike trains. The information contained in a spike train is encoded by the frequency of the spikes, while their magnitude is constant. The information contained in the signals transmitted by continuous MVNs to each other is completely contained in the phases of these signals, while their magnitude is always equal to 1. The correspondence between frequency and phase can easily be established. Let f be the frequency. As is well known from oscillation theory, if t is the time and φ is the phase, then \(\varphi = \theta_{0} + 2\pi \int fdt = \theta_{0} + \theta(t) \). If the frequency f is fixed for some time interval \( \Delta t \), then the last equation may be transformed as follows: \(\varphi = \theta_{0} + 2\pi f \Delta t \). Thus, if the frequency f generated by a biological neuron is known, it is very easy to transform it to the phase φ and to the complex number \( e^{i\varphi} \) located on the unit circle, which encodes the MVN state. The opposite is also true: given any complex number lying on the unit circle, which is the MVN's output, it is possible to transform it to a frequency. This means that all signals generated by biological neurons may be unambiguously transformed into a form acceptable to the MVN, and vice versa, preserving the physical nature of the signals.
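The frequency-to-phase correspondence described above can be sketched as follows (an illustrative sketch only; θ0 and Δt are assumed to be known, and the inverse mapping recovers the frequency only up to the 2π ambiguity of the phase):

```python
import numpy as np

def frequency_to_state(f, dt, theta0=0.0):
    """Map a spike frequency f, fixed over an interval dt, to the MVN state
    e^{i*phi} with phi = theta0 + 2*pi*f*dt."""
    phi = theta0 + 2 * np.pi * f * dt
    return np.exp(1j * phi)

def state_to_frequency(s, dt, theta0=0.0):
    """Inverse mapping: recover a frequency from a point on the unit circle.
    Only the phase modulo 2*pi is observable, so f is recovered up to multiples of 1/dt."""
    phi = (np.angle(s) - theta0) % (2 * np.pi)
    return phi / (2 * np.pi * dt)
```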

 

Derivative-Free Learning Algorithm for MVN

 

MVN learning is derivative-free. It does not need to be considered as an optimization problem; thus, neither the derivative of an error function nor the derivative of the activation function appears in the learning rule. Moreover, both the discrete (2) and the continuous (3) activation functions are not differentiable as functions of a complex variable. Intuitively, it is easy to conclude that since an MVN output is always located on the unit circle, the learning process reduces to movement along the unit circle. Such a circular movement can never lead in an incorrect direction: if we need to move from one point on the circle to another, we will always reach the target, even if we take the longer way instead of the shorter one. However, there exists a learning rule that guarantees taking the shortest way to the target. This is the error-correction learning rule, which is illustrated as follows:

 

Error-Correction Learning Rule

 

If D is the desired output and Y is the actual output, then the error δ is their difference, δ=D-Y. This rule is common to both the discrete and the continuous MVN. The error completely determines the adjustment of the weights, and the error-correction learning rule for the MVN is as follows:

\(W_{r+1} = W_{r} + \frac{\alpha}{n+1}\delta \overline{X} \) ,

(4)

where \(W_{r}\) is the current weighting vector, \(W_{r+1}\) is the following weighting vector (after adjustment), α is the learning rate, which should be equal to 1, X is the vector of inputs (the bar over it means that it is taken complex-conjugated), and n is the number of inputs. A learning algorithm based on the error-correction learning rule (4) is considered in detail in [8], where its convergence is proven for the discrete MVN. For the continuous MVN the convergence is proven in [7]. The learning algorithm consists of consecutively checking, for each learning sample, whether the actual output coincides with the desired one; if not, the weights are adjusted according to (4).
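A minimal sketch of such a learning loop for a single discrete MVN, based on rule (4), might look as follows (it reuses mvn_activation_discrete from the earlier sketch; the stopping criterion, array conventions, and function names are our own illustration, not the exact algorithm of [8]):

```python
import numpy as np

def train_discrete_mvn(X, D, k, alpha=1.0, max_epochs=1000):
    """Error-correction learning (4) for a single discrete MVN.
    X: (m, n) complex array of inputs on the unit circle,
    D: (m,) complex array of desired outputs (kth roots of unity)."""
    n = X.shape[1]
    W = np.zeros(n + 1, dtype=complex)               # weighting vector (w0, w1, ..., wn)
    for _ in range(max_epochs):
        errors = 0
        for x, d in zip(X, D):
            z = W[0] + np.dot(W[1:], x)              # weighted sum
            y = mvn_activation_discrete(z, k)        # actual neuron output
            if not np.isclose(y, d):
                delta = d - y                        # error delta = D - Y
                # rule (4): the constant input 1 corresponds to the free weight w0
                W = W + (alpha / (n + 1)) * delta * np.conj(np.concatenate(([1.0 + 0j], x)))
                errors += 1
        if errors == 0:                              # every learning sample is reproduced exactly
            break
    return W
```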

 

The theory of multi-valued neurons, the essentials of their learning, and the theory of multiple-valued logic over the field of complex numbers are considered in detail in [8]. Contributions made by Dr. Igor Aizenberg and his co-authors can be found here. Multi-valued neurons have been used in various applications: in associative memories, in cellular neural networks, in feedforward neural networks, etc. There are important contributions in this area made by Dr. Jacek Zurada and his co-authors, Dr. Hiroyuki Aoki and his co-authors, Dr. Dong-Liang Lee, and by other scientists.

One of the most successful applications of multi-valued neurons is a multilayer feedforward neural network based on them (Multilayer Neural Network with Multi-Valued Neurons, MLMVN). Its idea was proposed in [6]; it was then considered in detail in [7] and further developed in [9]. This network significantly outperforms a standard backpropagation network and many kernel-based techniques in terms of learning speed and generalization capability.

 

REFERENCES

 

[1]   N. N. Aizenberg and I. N. Aizenberg, "CNN Based on Multi-Valued Neuron as a Model of Associative Memory for Gray-Scale Images", Proceedings of the Second IEEE Int. Workshop on Cellular Neural Networks and their Applications, Technical University Munich, Germany October 1992, pp.36-41.

[2]   N. N. Aizenberg, Yu. L. Ivaskiv, and D. A. Pospelov, "About one generalization of the threshold function" Doklady Akademii Nauk SSSR (The Reports of the Academy of Sciences of the USSR), vol. 196, No 6, 1971, pp. 1287-1290 (in Russian).

[3]   N. N. Aizenberg, Yu. L. Ivas'kiv, D. A. Pospelov, and G. F. Khudyakov, "Multi-valued threshold functions I. Boolean complex-threshold functions and their generalization", Cybernetics and Systems Analysis, 1971, Vol. 7, No 4, pp. 626-635.

[4]   N. N. Aizenberg, Yu. L. Ivas'kiv, D. A. Pospelov, and G. F. Khudyakov, "Multi-valued threshold functions II. Synthesis of Multi-Valued Threshold Element", Cybernetics and Systems Analysis, 1973, Vol. 9, No 1, pp. 61-77.

[5]   N. N. Aizenberg and Yu. L. Ivaskiv, Multiple-Valued Threshold Logic, Naukova Dumka Publisher House, Kiev, 1977 (in Russian).

[6]   I. Aizenberg, C. Moraga, and D. Paliy, "A Feedforward Neural Network based on Multi-Valued Neurons", In Computational Intelligence, Theory and Applications. Advances in Soft Computing, XIV, B. Reusch, Ed., Springer, Berlin, Heidelberg, New York, pp. 599-612, 2005.

[7]   I. Aizenberg and C. Moraga, "Multilayer Feedforward Neural Network based on Multi-Valued Neurons and a Backpropagation Learning Algorithm", Soft Computing, vol. 11, No 2, January 2007, pp. 169-183.

[8]   I. Aizenberg, N. Aizenberg, and J. Vandewalle, Multi-Valued and Universal Binary Neurons: Theory, Learning, Applications, Kluwer Academic Publishers, Boston/Dordrecht/London, 2000.

[9]   I. Aizenberg, D. Paliy, J. Zurada, and J. Astola, "Blur Identification by Multilayer Neural Network based on Multi-Valued Neurons", IEEE Transactions on Neural Networks, vol. 19, No 5, May 2008, pp. 883-898.