Multi-Valued Neurons
The Multi-Valued Neuron (MVN) is a neuron with complex-valued weights and with inputs and an output that are located on the unit circle. The latter means that the MVN's output depends only on the argument (phase) of its weighted sum and does not depend on its magnitude. This important property distinguishes the MVN from other complex-valued neurons and determines all its advantages. The most important of these advantages is a derivative-free learning algorithm for both a single neuron and an MVN-based feedforward neural network.
Some Historical Notes and Essentials
The term "Multi-Valued Neuron" was suggested in 1992 by Naum Aizenberg and Igor Aizenberg in their paper [1]. However, the MVN story started much earlier, in 1971, when Naum Aizenberg et al. in their paper [2] suggested a model of multiple-valued logic over the field of complex numbers and introduced the notion of a multiple-valued threshold function over the field of complex numbers. In that seminal paper they also introduced the first historically known complex-valued activation function. The main idea behind that work was to generalize Boolean threshold logic, and with it the notion of a Boolean threshold function, to the multiple-valued case. Unlike classical multiple-valued logic, where the values of k-valued logic are encoded by integers from the set K={0, 1, …, k-1}, in multiple-valued logic over the field of complex numbers they are encoded by the kth roots of unity, as suggested in [2]. Thus, the values of k-valued logic are located on the unit circle:
Figure: the kth roots of unity on the unit circle, where $\varepsilon = e^{i2\pi/k}$ is the primitive kth root of unity and i is the imaginary unit.
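The encoding can be illustrated with a minimal sketch. The helper name `encode` and the choice k = 4 are assumptions made only for this illustration; the sketch simply maps the integer value j of k-valued logic to the root of unity ε^j and shows that every encoded value has absolute value 1.

```python
import cmath
import math


def encode(index: int, k: int) -> complex:
    """Map the integer value `index` of k-valued logic to the k-th root of unity eps**index."""
    return cmath.exp(1j * 2 * math.pi * index / k)


k = 4  # illustrative choice, not prescribed by the text
values = [encode(index, k) for index in range(k)]

for index, value in enumerate(values):
    # All encoded values lie on the unit circle and differ only by their phase.
    print(index, abs(value), cmath.phase(value))
```

Because the values differ only in phase, any later computation that uses only the argument of a complex number treats all k values uniformly.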
An important advantage of this model of multiple-valued logic over the traditional one is that all values of k-valued logic encoded by the kth roots of unity are normalized: their absolute values are equal to 1 and they differ only by their arguments (phases). A key definition given in [2], which underlies the multi-valued neuron, is the definition of a multiple-valued threshold function. Let $E_k = \{\varepsilon^0, \varepsilon, \varepsilon^2, \ldots, \varepsilon^{k-1}\}$, where $\varepsilon = e^{i2\pi/k}$ is the primitive kth root of unity (i is the imaginary unit and k is some positive integer), be the set of the kth roots of unity. Then a function $f(x_1, \ldots, x_n)$ of k-valued logic is called a k-valued threshold function (or threshold function of k-valued logic) if there exists a complex-valued vector $W = (w_0, w_1, \ldots, w_n)$ such that for all $(x_1, \ldots, x_n)$ from the domain of the function f

$$f(x_1, \ldots, x_n) = P(w_0 + w_1 x_1 + \ldots + w_n x_n), \quad (1)$$

where

$$P(z) = e^{i2\pi j/k} \quad \text{if} \quad 2\pi j/k \le \arg z < 2\pi (j+1)/k, \quad (2)$$

$j = 0, 1, \ldots, k-1$ are values of k-valued logic, i is the imaginary unit, and arg z is the argument of the complex number z. The vector W is called a weighting vector of the threshold function f. Function (2) separates the complex plane into k equal sectors. P(z) depends only on arg z.
This is illustrated as follows:
Figure: Function P maps the complex plane into the set of the kth roots of unity
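A minimal sketch of the function P may help here. It assumes the usual sector definition, P(z) = ε^j when 2πj/k ≤ arg z < 2π(j+1)/k; the name `discrete_activation` is hypothetical, and the treatment of z = 0 (whose argument is undefined) is an assumption of this sketch.

```python
import cmath
import math


def discrete_activation(z: complex, k: int) -> complex:
    """Return the k-th root of unity eps**j for the sector that contains arg z."""
    # cmath.phase returns a value in [-pi, pi]; shift it into [0, 2*pi).
    angle = cmath.phase(z) % (2 * math.pi)
    # Sector index j; the final "% k" guards against rounding at the 2*pi edge.
    sector = int(angle // (2 * math.pi / k)) % k
    # Assumption of this sketch: z = 0 has phase 0.0 here and falls into sector 0.
    return cmath.exp(1j * 2 * math.pi * sector / k)


k = 4
z = -0.3 + 0.6j
# The output depends only on arg z, so rescaling z does not change it.
print(discrete_activation(z, k))
print(discrete_activation(100 * z, k))
print(discrete_activation(0.001 * z, k))
```

All three calls fall into the same sector, which is the property that P(z) depends only on arg z.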
Function (2), suggested in [2] in 1971, is the first historically known complex-valued activation function. In [3]-[4], a multi-valued threshold element, which implements a multiple-valued threshold function, and methods of its synthesis were introduced. The theory of multiple-valued threshold logic over the field of complex numbers was developed in depth in [5].
The Neuron
The discrete Multi-Valued Neuron (MVN) was introduced in [1] in 1992. It is a neuron with n complex-valued inputs that are located on the unit circle and a single complex-valued output, which is a kth root of unity and is therefore also located on the unit circle. The weights can be arbitrary complex numbers (accordingly, the weighted sum can also be an arbitrary complex number). Function (2) is the activation function of MVN. Let O be the continuous set of the points located on the unit circle. Then the discrete MVN implements an input/output mapping described by a function $f(x_1, \ldots, x_n): X^n \to E_k$, where X=Ek or X=O:
Figure: Multi-Valued Neuron
Hence MVN
implements mapping (1) with activation function (2).
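The mapping performed by a single discrete MVN can be sketched as follows. This is an illustration under assumptions rather than a reference implementation: the names `sector_of` and `mvn_output`, the weights, and the inputs are all made up for the example, and the weighted sum is taken as w0 + w1·x1 + … + wn·xn passed through the sector-based activation.

```python
import cmath
import math


def sector_of(z: complex, k: int) -> int:
    """Index j of the sector 2*pi*j/k <= arg z < 2*pi*(j+1)/k."""
    angle = cmath.phase(z) % (2 * math.pi)
    return int(angle // (2 * math.pi / k)) % k


def mvn_output(weights, inputs, k: int) -> complex:
    """Discrete MVN: activation applied to the weighted sum w0 + sum(w_i * x_i)."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return cmath.exp(1j * 2 * math.pi * sector_of(z, k) / k)


k = 4
# Hypothetical inputs taken from E_4: eps**1 = i and eps**2 = -1.
inputs = [cmath.exp(1j * 2 * math.pi * 1 / k), cmath.exp(1j * 2 * math.pi * 2 / k)]
# Hypothetical complex weights (w0, w1, w2); any complex numbers are allowed.
weights = [0.2 - 0.1j, 1.0 + 0.5j, 0.3j]

output = mvn_output(weights, inputs, k)
print(output, abs(output))
```

With these illustrative numbers the weighted sum is approximately -0.3 + 0.6i, which lies in the second sector, so the neuron outputs the root of unity ε^1 = i.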
The main
advantages of MVN over other neurons are its higher functionality (that is, its
ability to learn those input/output mappings that other neurons cannot learn)
and simplicity of its learning, which is derivative-free.
The high functionality of MVN is determined by its activation function. Let us take a look at its 3D graphical representation:
Figure: 3D graph of the MVN activation function, k=16
It looks like a circular staircase, where each stair is a sector on the complex plane. The next picture illustrates why the MVN activation function ensures higher functionality of a single MVN over a neuron with a sigmoid activation function. When we train a sigmoidal neuron to produce some exact output, we have to ensure that the weighted sum takes one specific value, the only one possible. This is very difficult (and often impossible) to achieve. In contrast, when we train MVN to produce some exact output, the weighted sum does not need to take a specific value: it only has to fall into the pre-defined sector corresponding to the desired output. Since this sector is infinite, MVN is much more flexible and, accordingly, much more functional than a sigmoidal neuron.
Figure: MVN vs. Sigmoidal Neuron
In [6], a continuous activation function for MVN was introduced. The concept of the continuous MVN was then developed in [7]. Activation function (2) becomes continuous when k→∞. Indeed, if k→∞, then the angular size of a sector on the complex plane approaches 0 and function (2) turns into the projection of the weighted sum onto the unit circle:

$$P(z) = e^{i \arg z} = \frac{z}{|z|}. \quad (3)$$

Continuous activation function (3) is illustrated as follows:
Figure: Continuous MVN Activation Function
Figure: 3D Representation of the Continuous MVN Activation Function
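The projection onto the unit circle is short enough to sketch directly. The function name `continuous_activation` is an assumption of this sketch, and the case z = 0, where the projection is undefined, is deliberately left unhandled.

```python
import cmath


def continuous_activation(z: complex) -> complex:
    """Project a nonzero weighted sum z onto the unit circle: z / |z| = e^{i arg z}."""
    return z / abs(z)


z = 3 + 4j
y = continuous_activation(z)

# The result keeps the phase of z and has absolute value 1.
print(y, abs(y))
# Both forms of the projection, z / |z| and e^{i arg z}, agree.
print(cmath.isclose(y, cmath.exp(1j * cmath.phase(z))))
```

Only the phase of the weighted sum survives the projection, which is why the continuous MVN output carries its information entirely in its argument.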
The continuous MVN is a very suitable tool for dealing with continuous input/output mappings and for solving regression problems using an MVN-based neural network. There is also another interesting property: the continuous MVN has a direct analogy with a biological neuron. Biological neurons communicate with each other by spike trains. The information contained in a spike train is encoded by the frequency of its spikes, while their magnitude is constant. The information contained in the signals that continuous MVNs transmit to each other is completely contained in the phases of these signals, while their magnitude is always equal to 1. The correspondence between frequency and phase can easily be established. Let f be the frequency. As is commonly known from oscillation theory, if t is the time and φ is the phase, then $\varphi = \theta_0 + 2\pi \int f\,dt$, where $\theta_0$ is the initial phase. If the frequency f is fixed for some time interval $\Delta t$, then the last equation may be transformed as follows: $\varphi = \theta_0 + 2\pi f \Delta t$. Thus, if the frequency f generated by a biological neuron is known, it is very easy to transform it to the phase φ and to the complex number $e^{i\varphi}$ located on the unit circle, which encodes the MVN state. The opposite is also true: any complex number lying on the unit circle, which is the MVN's output, can be transformed to a frequency. This means that all signals generated by biological neurons may be unambiguously transformed into a form acceptable to the MVN, and vice versa, preserving the physical nature of the signals.
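The frequency-to-phase correspondence can be sketched in a few lines. The sketch assumes the fixed-frequency relation φ = θ0 + 2πfΔt, with θ0 an initial phase and Δt the observation interval. The function names and the numbers are hypothetical. It also assumes f·Δt < 1, because a phase is only defined modulo 2π and the inverse conversion is unique only under that condition.

```python
import cmath
import math


def frequency_to_state(f: float, dt: float, theta0: float = 0.0) -> complex:
    """Encode a fixed frequency f over the interval dt as a point on the unit circle."""
    phi = theta0 + 2 * math.pi * f * dt
    return cmath.exp(1j * phi)


def state_to_frequency(state: complex, dt: float, theta0: float = 0.0) -> float:
    """Recover the frequency from a unit-circle state (assumes f * dt < 1)."""
    phi = (cmath.phase(state) - theta0) % (2 * math.pi)
    return phi / (2 * math.pi * dt)


dt = 0.01                                   # illustrative interval, in seconds
state = frequency_to_state(40.0, dt)        # illustrative frequency, in Hz
recovered = state_to_frequency(state, dt)
print(abs(state), recovered)
```

The round trip returns the original frequency up to floating-point error, as long as the product f·Δt stays below 1.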
Derivative-Free Learning Algorithm for MVN
MVN learning is derivative-free. It is not necessary to treat it as an optimization problem. Thus, neither a derivative of the error function nor a derivative of the activation function appears in the learning rule. Moreover, both the discrete (2) and continuous (3) activation functions are not differentiable as functions of a complex variable. Intuitively, since an MVN output is always located on the unit circle, the learning process reduces to movement along the unit circle. Evidently, movement along the unit circle can never lead in an incorrect direction: if we need to move from one point on the circle to another, we will always reach the target, even when taking the longer way instead of the shorter one. However, there exists a learning rule that guarantees taking the shortest way to the target. This is the error-correction learning rule, which is illustrated as follows:
Figure: Error-Correction Learning Rule
If D is the desired output and Y is the actual output, then the error δ is their difference: δ = D - Y. This rule is common to both the discrete and the continuous MVN. The error completely determines the adjustment of the weights, and the error-correction learning rule for MVN is as follows:

$$W_{r+1} = W_r + \frac{\alpha}{n+1}\,\delta \bar{X}, \quad (4)$$

where $W_r$ is the current weighting vector, $W_{r+1}$ is the following weighting vector (after adjustment), α is the learning rate, which should be equal to 1, X is the vector of inputs (the bar over it means that it is complex-conjugated), and n is the number of inputs. A learning algorithm based on the error-correction learning rule (4) is considered in detail in [8], where its convergence is proven for the discrete-valued MVN. For the continuous MVN, the convergence is proven in [7]. The learning algorithm consists of consecutively checking, for each learning sample, whether the actual output coincides with the desired one; if not, the weights are adjusted according to (4).
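The learning procedure can be sketched for the discrete MVN as below. This is a minimal sketch under stated assumptions, not the algorithm as specified in [8]: the rule is taken in the form W ← W + α/(n+1)·δ·conj(X); the input vector is extended with a constant 1 paired with the bias weight w0; desired outputs are given as sector indices; training starts from zero weights and stops after `max_epochs`. All function names are hypothetical.

```python
import cmath
import math


def sector_of(z, k):
    """Index j of the sector 2*pi*j/k <= arg z < 2*pi*(j+1)/k."""
    angle = cmath.phase(z) % (2 * math.pi)
    return int(angle // (2 * math.pi / k)) % k


def root(j, k):
    """The k-th root of unity eps**j."""
    return cmath.exp(1j * 2 * math.pi * j / k)


def weighted_sum(weights, inputs):
    return weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))


def correct(weights, inputs, desired, k, alpha=1.0):
    """One error-correction step: W <- W + alpha / (n + 1) * delta * conj(X)."""
    n = len(inputs)
    actual = root(sector_of(weighted_sum(weights, inputs), k), k)
    delta = root(desired, k) - actual          # error = desired - actual output
    # Assumption: X is extended with a constant input 1 for the bias weight w0.
    extended = [1.0 + 0j] + list(inputs)
    return [w + alpha / (n + 1) * delta * x.conjugate()
            for w, x in zip(weights, extended)]


def train(samples, k, n, max_epochs=100):
    """Check every sample in turn and correct the weights when the output is wrong."""
    weights = [0j] * (n + 1)                   # assumption: start from zero weights
    for _ in range(max_epochs):
        errors = 0
        for inputs, desired in samples:
            if sector_of(weighted_sum(weights, inputs), k) != desired:
                weights = correct(weights, inputs, desired, k)
                errors += 1
        if errors == 0:
            return weights, True               # every sample is reproduced
    return weights, False                      # gave up after max_epochs


# Illustrative single-sample problem: one input on the unit circle, desired sector 1.
samples = [([cmath.exp(0.7j)], 1)]
trained, converged = train(samples, k=4, n=1)
print(converged, sector_of(weighted_sum(trained, samples[0][0]), 4))
```

No derivative appears anywhere in the update. With α = 1 and inputs on the unit circle, a single correction shifts the weighted sum of the corrected sample by exactly δ, because the n+1 products of each input with its own conjugate all equal 1.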
The theory of multi-valued neurons, the essentials of their learning, and the theory of multiple-valued logic over the field of complex numbers are considered in
detail in [8]. Contributions made by Dr. Igor Aizenberg and his co-authors can
be found here.
Multi-valued neurons have been used in various applications: in associative memories,
in cellular neural networks, in feedforward neural networks, etc. There are
important contributions in this area made by Dr. Jacek Zurada and his
co-authors, Dr. Hiroyuki Aoki and his co-authors, Dr. Dong-Liang Lee, and by other
scientists.
One of the
most successful applications of multi-valued neurons is a multilayer
feedforward neural network based on them (Multilayer Neural Network with
Multi-Valued Neurons – MLMVN). Its idea was proposed in [6], and then it was
considered in detail in [7] and further developed in [9]. This network
significantly outperforms a standard backpropagation network and many
kernel-based techniques in terms of learning speed and generalization
capability.
REFERENCES
[1] N. N. Aizenberg and I. N. Aizenberg, "CNN Based on Multi-Valued Neuron as a Model of Associative Memory for Gray-Scale Images", Proceedings of the Second IEEE Int. Workshop on Cellular Neural Networks and their Applications, Technical University Munich, Germany, October 1992, pp. 36-41.
[2] N. N. Aizenberg, Yu. L. Ivaskiv, and D. A. Pospelov, "About one generalization of the threshold function", Doklady Akademii Nauk SSSR (The Reports of the Academy of Sciences of the USSR), vol. 196, No 6, 1971, pp. 1287-1290 (in Russian).
[3] N. N. Aizenberg, Yu. L. Ivas'kiv, D. A. Pospelov, and G. F. Khudyakov, "Multi-valued threshold functions I. Boolean complex-threshold functions and their generalization", Cybernetics and Systems Analysis, vol. 7, No 4, 1971, pp. 626-635.
[4] N. N. Aizenberg, Yu. L. Ivas'kiv, D. A. Pospelov, and G. F. Khudyakov, "Multi-valued threshold functions II. Synthesis of Multi-Valued Threshold Element", Cybernetics and Systems Analysis, vol. 9, No 1, 1973, pp. 61-77.
[5] N. N. Aizenberg and Yu. L. Ivaskiv, Multiple-Valued Threshold Logic, Naukova Dumka Publisher House, Kiev, 1977 (in Russian).
[6] I. Aizenberg, C. Moraga, and D. Paliy, "A Feedforward Neural Network based on Multi-Valued Neurons", in Computational Intelligence, Theory and Applications. Advances in Soft Computing, XIV, B. Reusch, Ed., Springer, Berlin, Heidelberg, New York, 2005, pp. 599-612.
[7] I. Aizenberg and C. Moraga, "Multilayer Feedforward Neural Network based on Multi-Valued Neurons and a Backpropagation Learning Algorithm", Soft Computing, vol. 11, No 2, January 2007, pp. 169-183.
[8] I. Aizenberg, N. Aizenberg, and J. Vandewalle, "Multi-Valued and Universal Binary Neurons: Theory, Learning, Applications", Kluwer Academic Publishers, Boston/Dordrecht/London, 2000.
[9] I. Aizenberg, D. Paliy, J. Zurada, and J. Astola, "Blur Identification by Multilayer Neural Network based on Multi-Valued Neurons", IEEE Transactions on Neural Networks, vol. 19, No 5, May 2008, pp. 883-898.