Development of automatic pattern recognition technology. Overview of existing pattern recognition methods

Modern robots equipped with vision systems must see well enough to work with the real world: they must be able to determine what types of objects are present, what relationships hold between them, and what groups they form.

The essence of the recognition problem is to establish whether the objects under study have a fixed finite set of features that allows them to be attributed to a certain class.

The goals of the science of pattern recognition:

Replacing a human expert or a complex expert system with a simpler system (automation of human activity or simplification of complex systems);

Building learning systems that are able to make decisions without specifying clear rules, namely, systems that are able to synthesize decision-making rules themselves based on some finite number of examples of correct decisions “demonstrated” to the system.

Recognition tasks can be characterized as follows.

1. These are informational tasks, consisting of two main stages: bringing the source data to a form convenient for recognition and recognition itself.

2. In these problems, one can introduce the concept of analogy and similarity of objects and formulate the concept of proximity of objects as a basis for assigning an object to a certain class.

3. In these tasks, it is possible to operate with a set of examples, the classification of which is known and which, in the form of formalized descriptions, can be presented to the recognition algorithm for adjustment to the task in the learning process.

4. For these problems it is difficult to build formal theories and apply classical mathematical methods.

5. In these tasks, "bad" information is possible.

Types of recognition tasks:

Assigning a presented object to one of the classes (supervised learning);

Automatic classification - splitting a set of objects (situations) according to their description into a system of non-overlapping classes;

Selection of a set of informative features for recognition;

Bringing the source data to a form convenient for recognition;

Dynamic recognition and dynamic classification;

Forecasting tasks.

Basic definitions

An image is a structured description of an object or phenomenon, represented by a feature vector, each element of which is the numerical value of one of the features characterizing the given object. In other words, an image is any object for which a set of certain numerical features can be measured. Examples of images: a letter, a picture, a cardiogram, etc.

A numerical feature (or simply a feature) is a formula or other description of a method for assigning a certain numerical characteristic to an object, operating within the framework of a specific pattern recognition problem. For each object, several different features, i.e. several numerical characteristics, can be defined.

The feature space is the N-dimensional space defined for a given recognition task, where N is the fixed number of features measured for any object. The vector from the feature space corresponding to an object of the recognition problem is an N-dimensional vector with components (x1, x2, ..., xN), which are the values of the features of that object.

OBJECT -> N features -> N-dimensional feature vector

A class is a (as a rule, non-formalizable) notion of the possibility of assigning an arbitrary object from the set of objects of the recognition task to a certain group of objects. For objects of the same class, the presence of "similarity" is assumed. For a pattern recognition problem, an arbitrary number of classes greater than 1 can be defined. The number of classes is denoted by S.

In general, the problem of pattern recognition consists of two parts: recognition and learning.

Pattern recognition is the classification of a certain group of objects based on certain requirements. Objects belonging to the same class of images have common properties. The requirements that define the classification may be different, since different situations require different types of classifications.

For example, when recognizing English letters, 26 classes of images are formed. However, in order to distinguish English letters from Chinese characters during recognition, only two classes of images are needed.

The simplest approach to pattern recognition is pattern matching. In this case, a set of images, one from each class of images, is stored in the machine's memory. The input (recognizable) image (of an unknown class) is compared with the standard of each class. The classification is based on a preselected matching or similarity criterion. In other words, if the input image matches the pattern of the i-th class of patterns better than any other pattern, then the input pattern is classified as belonging to the i-th class of patterns.
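A minimal sketch of template matching in Python may make this concrete; the tiny binary "templates" and the pixel-coincidence similarity criterion are illustrative assumptions, not part of the original text:

```python
import numpy as np

# Hypothetical 3x3 binary templates, one reference per class.
templates = {
    "vertical_bar":   np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]]),
    "horizontal_bar": np.array([[0, 0, 0], [1, 1, 1], [0, 0, 0]]),
}

def classify(image, templates):
    """Assign the image to the class whose template it matches best
    (here the criterion is the number of coinciding pixels)."""
    scores = {name: np.sum(image == t) for name, t in templates.items()}
    return max(scores, key=scores.get)

noisy = np.array([[0, 1, 0], [0, 1, 0], [1, 1, 0]])  # vertical bar, one corrupted pixel
print(classify(noisy, templates))  # vertical_bar
```

Even with one corrupted pixel the input still matches the vertical-bar template better than any other, so it is assigned to that class.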

The disadvantage of this approach, i.e. matching with a standard, is that in some cases it is difficult to choose a suitable standard from each class of images and establish the necessary matching criterion.

A more advanced approach is that the classification is based on some set of selected measurements made on the input images. These selected measurements, called "features", are assumed to be invariant or insensitive to commonly encountered changes and distortions and have little redundancy.

Matching with a standard can be regarded as a special case of this second, "feature measurement" approach, in which the standards are stored in the form of measured features and a special classification (comparison) criterion is used in the classifier.

Features are defined by the developers and must be invariant to the orientation, size and shape variations of objects.

Lecture No. 17. PATTERN RECOGNITION METHODS

There are the following groups of recognition methods:

Proximity function methods;

Discriminant function methods;

Statistical recognition methods;

Linguistic methods;

Heuristic methods.

The first three groups of methods are focused on the analysis of features expressed by numbers or vectors with numerical components.

The group of linguistic methods provides pattern recognition based on the analysis of their structure, which is described by the corresponding structural features and relationships between them.

The group of heuristic methods combines the characteristic techniques and logical procedures used by humans in pattern recognition.

Proximity Function Methods

The methods of this group are based on the use of functions that evaluate the measure of proximity between the recognized image, represented by the vector x* = (x*_1, ..., x*_n), and the reference images of the various classes, represented by the vectors x^i = (x^i_1, ..., x^i_n), i = 1, ..., N, where i is the class number.

The recognition procedure under this method consists in computing the distance d_i between the point of the recognized image and each of the points representing the reference images, i.e. in computing all the values d_i, i = 1, ..., N. The image is assigned to the class for which d_i takes the smallest value among all i = 1, ..., N.

A function that assigns to each pair of vectors x^i, x* a real number as a measure of their proximity, i.e. that determines the distance between them, can be chosen fairly arbitrarily. In mathematics, such a function is called a metric on the space. It must satisfy the following axioms:

r(x, y) = r(y, x);

r(x, y) > 0 if x is not equal to y, and r(x, y) = 0 if x = y;

r(x, y) <= r(x, z) + r(z, y).

These axioms are satisfied, in particular, by the following functions

a) d_i = ( sum_j (x^i_j - x*_j)^2 )^(1/2), j = 1, 2, ..., n;

b) d_i = sum_j |x^i_j - x*_j|, j = 1, 2, ..., n;

c) d_i = max_j |x^i_j - x*_j|, j = 1, 2, ..., n.

The first of these functions is called the Euclidean norm of the vector space; accordingly, spaces in which this function is used as the metric are called Euclidean spaces.

Often the proximity function is chosen as the mean-square difference of the coordinates of the recognized image x* and the reference x^i, i.e. the function

d_i = (1/n) sum_j (x^i_j - x*_j)^2, j = 1, 2, ..., n.

The value d_i is geometrically interpreted as the squared distance between points in the feature space, divided by the dimension of the space.
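The nearest-reference rule with the Euclidean, city-block and Chebyshev metrics can be sketched as follows; the reference vectors are illustrative assumptions:

```python
import numpy as np

def euclidean(x, y): return np.sqrt(np.sum((x - y) ** 2))  # metric a)
def cityblock(x, y): return np.sum(np.abs(x - y))          # metric b)
def chebyshev(x, y): return np.max(np.abs(x - y))          # metric c)

def nearest_reference(x_star, references, metric=euclidean):
    """Assign x* to the class i whose reference x^i gives the smallest d_i."""
    d = [metric(x_star, x_i) for x_i in references]
    return int(np.argmin(d))

refs = [np.array([0.0, 0.0]), np.array([3.0, 4.0])]  # hypothetical references
x_star = np.array([2.5, 3.5])
print(nearest_reference(x_star, refs))  # class 1 is closer
```

Weighted features (the w_j below in the text) would simply scale the coordinate differences inside the chosen metric.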

It often turns out that different features are of unequal importance for recognition. To take this into account when computing the proximity function, the coordinate differences for the more important features are multiplied by larger coefficients, and those for the less important features by smaller ones.

In this case d_i = (1/n) sum_j w_j (x^i_j - x*_j)^2, j = 1, 2, ..., n,

where w_j are the weight coefficients.

The introduction of weight coefficients is equivalent to scaling the axes of the feature space and, accordingly, stretching or compressing the space in separate directions.

These deformations of the feature space aim at an arrangement of the reference-image points that gives the most reliable recognition when the images of each class are widely scattered around the reference point.

Groups of image points close to each other in the feature space are called clusters, and the problem of identifying such groups is called the clustering problem.

The task of identifying clusters belongs to the unsupervised pattern recognition tasks, i.e. to recognition problems in which no examples of correct recognition are available.

Discriminant Function Methods

The idea of the methods of this group is to construct functions that define boundaries in the image space, dividing the space into regions corresponding to the image classes. The simplest and most frequently used functions of this kind depend linearly on the feature values. In the feature space they correspond to separating surfaces in the form of hyperplanes; in the case of a two-dimensional feature space, the separating function is a straight line.

The general form of the linear decision function is given by the formula

d(x) = w_1 x_1 + w_2 x_2 + ... + w_n x_n + w_{n+1} = Wx + w_{n+1},

where x is the image vector and W = (w_1, w_2, ..., w_n) is the vector of weight coefficients.

When dividing into two classes X_1 and X_2, the discriminant function d(x) allows recognition according to the rule:

x belongs to X_1 if d(x) > 0;

x belongs to X_2 if d(x) < 0.

If d(x) = 0, the case of uncertainty takes place.
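A minimal sketch of this two-class rule; the weight values defining the separating line are illustrative assumptions:

```python
def d(x, w):
    """Linear decision function d(x) = w1*x1 + ... + wn*xn + w_{n+1}."""
    *weights, bias = w
    return sum(wi * xi for wi, xi in zip(weights, x)) + bias

# Hypothetical weights: the separating line x1 - x2 + 0.5 = 0.
w = [1.0, -1.0, 0.5]

def classify(x, w):
    value = d(x, w)
    if value > 0:
        return "X1"
    if value < 0:
        return "X2"
    return "undecided"  # d(x) = 0: the case of uncertainty

print(classify([2.0, 1.0], w))  # X1, since d = 2 - 1 + 0.5 = 1.5
print(classify([0.0, 2.0], w))  # X2, since d = 0 - 2 + 0.5 = -1.5
```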

In the case of splitting into several classes, several functions are introduced. In this case, each class of images is assigned a certain combination of signs of discriminating functions.

For example, if three discriminant functions are introduced, then the following variant of selecting image classes is possible:

x belongs to X_1 if d_1(x) > 0, d_2(x) < 0, d_3(x) < 0;

x belongs to X_2 if d_1(x) < 0, d_2(x) > 0, d_3(x) < 0;

x belongs to X_3 if d_1(x) < 0, d_2(x) < 0, d_3(x) > 0.

It is assumed that for other combinations of the values d_1(x), d_2(x), d_3(x) the case of uncertainty takes place.

A variation of the method of discriminant functions is the method of decision functions. In it, for m available classes, the existence of m functions d_i(x), called decision functions, is assumed, such that if x belongs to X_i, then d_i(x) > d_j(x) for all j not equal to i; i.e., the decision function d_i(x) has the maximum value among all the functions d_j(x), j = 1, ..., m.

An illustration of this method is a classifier based on the minimum Euclidean distance in the feature space between the image point and the reference. Let us show this.

The Euclidean distance between the feature vector x of the recognized image and the vector x^i of the reference image is determined by the formula

||x^i - x|| = ( sum_j (x^i_j - x_j)^2 )^(1/2), j = 1, 2, ..., n.

The vector x is assigned to the class i for which the value ||x^i - x|| is minimal.

Instead of distance, you can compare the square of the distance, i.e.

||x^i - x||^2 = (x^i - x)(x^i - x)^T = x·x - 2·x·x^i + x^i·x^i

Since the value x·x is the same for all i, the minimum of the function ||x^i - x||^2 coincides with the maximum of the decision function

d_i(x) = 2·x·x^i - x^i·x^i,

that is, x belongs to X_i if d_i(x) > d_j(x) for all j not equal to i.

Thus, the minimum-distance classifying machine is based on linear decision functions. The general structure of such a machine uses decision functions of the form

d_i(x) = w_{i1} x_1 + w_{i2} x_2 + ... + w_{in} x_n + w_{i,n+1}.

It can be visually represented by the appropriate block diagram.

For a machine performing classification by minimum distance, the equalities w_{ij} = 2x^i_j and w_{i,n+1} = -x^i·x^i hold.
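The equivalence between minimizing the Euclidean distance and maximizing the linear decision function d_i(x) = 2·x·x^i - x^i·x^i can be checked in a short sketch; the reference vectors are illustrative assumptions:

```python
import numpy as np

# Hypothetical class references x^i.
references = [np.array([0.0, 0.0]), np.array([4.0, 0.0])]

def decision(i, x):
    """d_i(x) = 2*x.x^i - x^i.x^i: maximizing it is the same as
    minimizing ||x^i - x||^2, since x.x is common to all classes."""
    xi = references[i]
    return 2 * (x @ xi) - xi @ xi

def classify(x):
    return max(range(len(references)), key=lambda i: decision(i, x))

x = np.array([1.0, 1.0])
# Direct minimization of the Euclidean distance gives the same answer:
nearest = min(range(len(references)), key=lambda i: np.linalg.norm(references[i] - x))
print(classify(x), nearest)  # both print 0
```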

Equivalent recognition by the method of discriminant functions can be carried out if the discriminant functions are defined as the differences d_ij(x) = d_i(x) - d_j(x).

The advantage of the method of discriminant functions is the simple structure of the recognition machine, as well as the possibility of implementing it mainly through linear decision blocks.

Another important advantage of the method of discriminant functions is the possibility of automatic training of the machine for the correct recognition of a given (training) sample of images.

At the same time, the automatic learning algorithm turns out to be very simple in comparison with other recognition methods.

For these reasons, the method of discriminant functions has gained wide popularity and is often used in practice.

Pattern recognition self-learning procedures

Consider methods for constructing a discriminant function from a given (training) sample, as applied to the problem of dividing images into two classes. If two sets of images belonging respectively to classes A and B are given, then the solution of the problem of constructing a linear discriminant function is sought in the form of a vector of weight coefficients W = (w_1, w_2, ..., w_n, w_{n+1}) such that for any image the following conditions hold:

x belongs to class A if sum_j w_j x_j + w_{n+1} > 0, j = 1, 2, ..., n;

x belongs to class B if sum_j w_j x_j + w_{n+1} < 0, j = 1, 2, ..., n.

If the training sample consists of N images of both classes, the problem reduces to finding a vector w that ensures the validity of the system of inequalities

x^1_1 w_1 + x^1_2 w_2 + ... + x^1_n w_n + w_{n+1} > 0;

x^2_1 w_1 + x^2_2 w_2 + ... + x^2_n w_n + w_{n+1} < 0;

x^i_1 w_1 + x^i_2 w_2 + ... + x^i_n w_n + w_{n+1} > 0;

................................................

x^N_1 w_1 + x^N_2 w_2 + ... + x^N_n w_n + w_{n+1} > 0;

here x^i = (x^i_1, x^i_2, ..., x^i_n, x^i_{n+1}) is the vector of feature values of the i-th image from the training sample; the sign > corresponds to image vectors x belonging to class A, and the sign < to vectors x belonging to class B.

The desired vector w exists if classes A and B are separable, and does not exist otherwise. The values of the components of w can be found either in advance, at a stage preceding the hardware implementation of the pattern recognition system, or directly by the system itself in the course of its operation. The latter approach provides greater flexibility and autonomy of the system. Let us consider it using the example of a device called the perceptron, invented in 1957 by the American scientist Rosenblatt. A schematic representation of the perceptron, which assigns an image to one of two classes, is shown in the following figure.

[Figure: perceptron schematic. The sensory retina S is randomly connected to the associative retina A, whose outputs x_1, x_2, ..., x_n, x_{n+1} are summed with weights to produce the reaction R.]

The device consists of the sensory elements of retina S, which are randomly connected to the associative elements of retina A. Each element of the second retina produces an output signal only if a sufficient number of the sensory elements connected to its input are in the excited state. The reaction R of the whole system is proportional to the sum of the reactions of the elements of the associative retina, taken with certain weights.

Denoting by x_i the reaction of the i-th associative element and by w_i the weight coefficient of that reaction, the reaction of the system can be written as R = sum_j w_j x_j, j = 1, ..., n. If R > 0, the image presented to the system belongs to class A, and if R < 0, the image belongs to class B. This classification procedure corresponds to the classification principles considered earlier, and the perceptron model of pattern recognition is, apart from the sensory retina, an implementation of a linear discriminant function. The principle adopted in the perceptron for forming the values x_1, x_2, ..., x_n corresponds to a certain algorithm for forming features from the signals of the primary sensors.

In general, there may be several elements R forming the reaction of the perceptron. In this case one speaks of the presence in the perceptron of a retina R of reacting elements.

The perceptron scheme can be extended to the case when the number of classes exceeds two, by increasing the number of elements of retina R up to the number of distinguishable classes and introducing a block for determining the maximum reaction, in accordance with the scheme presented in the figure above. In this case the image is assigned to class number i if R_i > R_j for all j not equal to i.

The learning process of the perceptron consists in selecting the values of the weight coefficients w_j so that the output signal corresponds to the class to which the recognized image belongs.

Let us consider the perceptron's operation using the example of recognizing objects of two classes, A and B. Objects of class A must correspond to the value R = +1, and objects of class B to the value R = -1.

The learning algorithm is as follows.

If the next image x belongs to class A but R < 0 (a recognition error occurs), then the coefficients w_j with indices corresponding to values x_j > 0 are increased by some amount dw, and the remaining coefficients w_j are decreased by dw. In this case the value of the reaction R receives an increment towards positive values, corresponding to the correct classification.

If x belongs to class B but R > 0 (a recognition error occurs), then the coefficients w_j with indices corresponding to x_j < 0 are increased by dw, and the remaining coefficients w_j are decreased by the same amount. In this case the value of the reaction R receives an increment towards negative values, corresponding to the correct classification.

Thus, the algorithm changes the weight vector w if and only if the image presented at the k-th training step was classified incorrectly at this step, and leaves the weight vector w unchanged in the case of correct classification. The proof of convergence of this algorithm is presented in [Tou, Gonzalez]. Such training eventually (with a proper choice of dw and linear separability of the image classes) leads to a vector w providing correct classification.
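The training rule above can be sketched in Python. The two-image training sample and the step dw are illustrative assumptions; images are augmented with the constant feature x_{n+1} = 1, and a reaction of exactly 0 is treated as an error:

```python
def train_perceptron(samples, dw=0.5, epochs=100):
    """Perceptron learning by the rule described in the text.

    samples: list of (x, label), where x is already augmented with the
    constant feature x_{n+1} = 1; label is +1 for class A, -1 for class B.
    """
    n = len(samples[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        errors = 0
        for x, label in samples:
            R = sum(wi * xi for wi, xi in zip(w, x))
            if label == +1 and R <= 0:    # class A image misclassified
                w = [wi + dw if xi > 0 else wi - dw for wi, xi in zip(w, x)]
                errors += 1
            elif label == -1 and R >= 0:  # class B image misclassified
                w = [wi + dw if xi < 0 else wi - dw for wi, xi in zip(w, x)]
                errors += 1
        if errors == 0:
            break                          # whole sample classified correctly
    return w

# Hypothetical one-feature sample, augmented with the constant 1.
samples = [([2.0, 1.0], +1), ([-1.0, 1.0], -1)]
w = train_perceptron(samples)
```

For this linearly separable toy sample the loop converges in a couple of epochs, after which R > 0 for the class-A image and R < 0 for the class-B image.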

Statistical methods of recognition.

Statistical methods are based on minimizing the probability of a classification error. The probability P of incorrect classification of the image received for recognition, described by the feature vector x, is determined by the formula

P = sum_i [ p(i) · prob(D(x) is not equal to i | x belongs to class i) ], i = 1, 2, ..., m,

where m is the number of classes,

p(i) = prob(x belongs to class i) is the a priori probability that an arbitrary image x belongs to the i-th class (the frequency of occurrence of images of the i-th class),

D(x) is the function making the classification decision (it assigns the feature vector x a class number i from the set {1, 2, ..., m}),

prob(D(x) is not equal to i | x belongs to class i) is the probability of the event "D(x) is not equal to i" given that x belongs to class i, i.e. the probability that the function D(x) makes an erroneous decision for a value x belonging to the i-th class.

It can be shown that the probability of misclassification reaches its minimum when D(x) = i if and only if p(x|i)·p(i) > p(x|j)·p(j) for all j not equal to i, where p(x|i) is the distribution density of images of the i-th class in the feature space.

According to the above rule, the point x is assigned to the class corresponding to the maximum value of p(i)·p(x|i), i.e. of the product of the a priori probability (frequency) of appearance of images of the i-th class and the distribution density of images of the i-th class in the feature space. This classification rule is called Bayesian, since it follows from the well-known Bayes formula of probability theory.
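A minimal sketch of the Bayesian rule D(x) = argmax_i p(i)·p(x|i) for two classes with Gaussian class-conditional densities; the priors, means and variance are illustrative assumptions (index 0 plays the role of class No. 1):

```python
import math

def gaussian(x, mean, var):
    """Normal density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical two-class setup: index 0 ~ "ones" (mean 1), index 1 ~ "zeros" (mean 0).
priors = [0.5, 0.5]
means, var = [1.0, 0.0], 0.1

def bayes_decide(x):
    """D(x) = argmax_i p(i) * p(x|i), the Bayesian rule above."""
    scores = [p * gaussian(x, m, var) for p, m in zip(priors, means)]
    return scores.index(max(scores))

print(bayes_decide(0.9), bayes_decide(0.1))  # 0 then 1
```

With equal priors the decision boundary sits halfway between the two class means, which matches the signal example that follows.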

Example. Let it be necessary to recognize discrete signals at the output of an information channel affected by noise.

Each input signal is a 0 or a 1. As a result of signal transmission, the value x appears at the channel output: the signal with superimposed Gaussian noise with zero mean and variance b^2.

For the synthesis of a classifier that performs signal recognition, we will use the Bayesian classification rule.

In class No. 1 we combine the signals representing ones; in class No. 2, the signals representing zeros. Let it be known in advance that, on average, out of every 1000 signals, a signals are ones and b signals are zeros. Then the a priori probabilities of the appearance of signals of the 1st and 2nd classes (ones and zeros), respectively, can be taken equal to

p(1)=a/1000, p(2)=b/1000.

Since the noise is Gaussian, i.e. obeys the normal (Gaussian) distribution law, the distribution density of images of the first class as a function of the value x, or, what is the same, the probability density of obtaining the output value x when signal 1 is applied at the input, is determined by the expression

p(x|1) = (2*pi*b^2)^(-1/2) exp(-(x-1)^2/(2b^2)),

and the distribution density of images of the second class as a function of the value x, i.e. the probability density of obtaining the output value x when signal 0 is applied at the input, is determined by the expression

p(x|2) = (2*pi*b^2)^(-1/2) exp(-x^2/(2b^2)).

The application of the Bayesian decision rule leads to the conclusion that a class-2 signal was transmitted, i.e. that a zero was passed, if

p(2)·p(x|2) > p(1)·p(x|1)

or, more specifically, if

b·exp(-x^2/(2b^2)) > a·exp(-(x-1)^2/(2b^2)).

Dividing the left side of the inequality by the right side, we get

(b/a)·exp((1-2x)/(2b^2)) > 1,

whence, after taking the logarithm, we find

1 - 2x > 2b^2 ln(a/b),

x < 0.5 - b^2 ln(a/b).

It follows from the resulting inequality that for a = b, i.e. with equal a priori probabilities of the signals 0 and 1, the image is assigned the value 0 when x < 0.5 and the value 1 when x > 0.5.

If it is known in advance that one of the signals appears more often, and the other less often, i.e. in case of different values a and b, the classifier response threshold is shifted to one side or the other.

Thus, at a/b = 2.71 (corresponding to 2.71 times more frequent transmission of ones) and b^2 = 0.1, the image is assigned the value 0 if x < 0.4, and the value 1 if x > 0.4. If there is no information about the a priori probabilities, then statistical recognition methods based on classification rules other than the Bayesian one can be used.
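The derived threshold x < 0.5 - b^2·ln(a/b) is easy to check numerically (writing the noise variance as var; parameter values are those of the example):

```python
import math

def threshold(a, b, var):
    """Decision threshold x < 0.5 - var * ln(a/b), with var the noise variance."""
    return 0.5 - var * math.log(a / b)

# Equal priors put the threshold at 0.5.
print(threshold(500, 500, 0.1))      # 0.5
# a/b = e (about 2.71) and variance 0.1 shift it to 0.4, as in the example.
print(threshold(math.e, 1.0, 0.1))   # 0.4
```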

However, in practice, methods based on Bayes' rules are most common due to their greater efficiency, and also due to the fact that in most pattern recognition problems it is possible to set a priori probabilities for the appearance of images of each class.

Linguistic methods of pattern recognition.

Linguistic methods of pattern recognition are based on the analysis of the description of an idealized image, represented as a graph or a string of symbols, which is a phrase or a sentence of a certain language.

Consider the idealized images of letters obtained as a result of the first stage of linguistic recognition described above. These idealized images can be defined by descriptions of graphs, represented, for example, in the form of connection matrices, as was done in the above example. The same description can be represented by a formal language phrase (expression).

Example. Let there be given three images of the letter A obtained as a result of preliminary image processing. Let's designate these images with identifiers A1, A2 and A3.

For the linguistic description of the presented images, we use the PDL (Picture Description Language). The PDL language dictionary includes the following symbols:

1. Names of the simplest images (primitives). As applied to the case under consideration, the primitives and their corresponding names are as follows.

Images in the form of a line segment directed:

up and to the left (left), up (north), up and to the right (right), to the right (east).

Names: L, N, R, E.

2. Symbols of binary operations. (+,*,-) Their meaning corresponds to the sequential connection of primitives (+), the connection of the beginnings and endings of primitives (*), the connection of only the endings of primitives (-).

3. Right and left brackets. ((,)) Parentheses allow you to specify the order in which operations are to be performed in an expression.

The considered images A1, A2 and A3 are described in the PDL language, respectively, by the following expressions.

T(1)=R+((R-(L+N))*E)-L

T(2)=(R+N)+((N+R)-L)*E-L

T(3)=(N+R)+(R-L)*E-(L+N)

After the linguistic description of the image has been built, it is necessary to analyze, using some recognition procedure, whether the given image belongs to the class of interest to us (the class of letters A), i.e. whether or not this image has the corresponding structure. To do this, it is first of all necessary to describe the class of images that have the structure of interest to us.

Obviously, the letter A always contains the following structural elements: the left "leg", the right "leg" and the head. Let's name these elements respectively STL, STR, TR.

Then, in the PDL language, the symbol class A - SIMB A is described by the expression

SIMB A = STL + TR - STR

The left "leg" of the STL is always a chain of elements R and N, which can be written as

STL -> R | N | (STL + R) | (STL + N)

(STL is the character R or N, or a string obtained by adding R or N characters to the source STL string)

The right "leg" of STR is always a chain of elements L and N, which can be written as follows, i.e.

STR ‑> L¦N¦ (STR + L)¦(STR + N)

The head part of the letter - TR is a closed contour, composed of the element E and chains like STL and STR.

In the PDL language, the TR structure is described by the expression

TR ‑> (STL - STR) * E

Finally, we get the following description of the class of letters A:

SIMB A -> (STL + TR - STR),

STL -> R | N | (STL + R) | (STL + N)

STR -> L | N | (STR + L) | (STR + N)

TR -> (STL - STR) * E

The recognition procedure in this case can be implemented as follows.

1. The expression corresponding to the image is compared with the reference structure STL + TR - STR.

2. If possible, i.e. if the description of the image is comparable with the reference, each element of the structure STL, TR, STR is matched with some subexpression of the expression T(A). For example,

for A1: STL=R, STR=L, TR=(R-(L+N))*E

for A2: STL = R + N, STR = L, TR = ((N + R) - L) * E

for A3: STL = N + R, STR = L + N, TR = (R - L) * E

3. The expressions STL, STR, TR are compared with their corresponding reference structures.

4. If the structure of each of the expressions STL, STR, TR corresponds to the reference one, it is concluded that the image belongs to the class of letters A. If at any of the preceding stages a discrepancy between the structure of the analyzed expression and the reference is found, it is concluded that the image does not belong to the class SIMB A. The matching of expression structures can be performed using the algorithmic languages LISP, PLANNER, PROLOG and other similar artificial-intelligence languages.

In the example under consideration, all STL strings are composed of the symbols N and R, and all STR strings of the symbols L and N, which corresponds to the given structure of these strings. The TR structure in the considered images also corresponds to the reference one, since it consists of the "difference" of strings of types STL and STR, "multiplied" by the symbol E.

Thus, we come to the conclusion that the considered images belong to the class SIMB A.
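The structure checks above can be approximated with regular expressions, since the STL and STR rules generate simple '+'-chains; Python is used here purely for illustration (the text names LISP, PLANNER, PROLOG for this role), and the decomposition fed in is the one found for image A2:

```python
import re

# Grammar of the "legs" as regular expressions over the PDL primitives:
# STL -> R | N | (STL + R) | (STL + N)  generates a '+'-chain of R and N;
# STR -> L | N | (STR + L) | (STR + N)  generates a '+'-chain of L and N.
STL = re.compile(r"^[RN](\+[RN])*$")
STR = re.compile(r"^[LN](\+[LN])*$")

def is_simb_a(stl, tr_left, tr_right, strr):
    """SIMB A -> STL + TR - STR, with TR -> (STL - STR) * E.
    tr_left / tr_right are the STL- and STR-type chains inside TR."""
    return all([
        STL.match(stl), STR.match(strr),          # the two legs
        STL.match(tr_left), STR.match(tr_right),  # the chains inside the head TR
    ])

# Decomposition of image A2: STL = R+N, TR = ((N+R)-L)*E, STR = L.
print(is_simb_a("R+N", "N+R", "L", "L"))  # True
```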


Synthesis of a fuzzy DC electric drive controller in the MatLab environment

Synthesis of a fuzzy controller with one input and output.

The problem is to make the drive follow various input signals accurately. The control action is generated by a fuzzy controller, in which the following functional blocks can be structurally distinguished: a fuzzifier, a rule block, and a defuzzifier.

Fig.4 Generalized functional diagram of a system with two linguistic variables.

Fig.5 Schematic diagram of a fuzzy controller with two linguistic variables.

The fuzzy control algorithm in the general case is a transformation of the input variables of the fuzzy controller into its output variables using the following interrelated procedures:

1. transformation of input physical variables obtained from measuring sensors from the control object into input linguistic variables of a fuzzy controller;

2. processing of logical statements, called linguistic rules, regarding the input and output linguistic variables of the controller;

3. transformation of the output linguistic variables of the fuzzy controller into physical control variables.

Let us first consider the simplest case, when only two linguistic variables are introduced to control the servo drive:

"angle" - input variable;

"control action" - output variable.

We will synthesize the controller in the MatLab environment using the Fuzzy Logic Toolbox. It allows creating fuzzy inference and fuzzy classification systems within the MatLab environment, with the possibility of integrating them into Simulink. The basic concept of the Fuzzy Logic Toolbox is the FIS-structure (Fuzzy Inference System). The FIS-structure contains all the data necessary for implementing the functional input-output mapping based on fuzzy logical inference according to the scheme shown in Fig. 6.


Figure 6. Fuzzy inference.

X is the input crisp vector; X~ is the vector of fuzzy sets corresponding to the input vector X; Y~ is the result of logical inference in the form of a vector of fuzzy sets; Y is the output crisp vector.

The fuzzy module allows building fuzzy systems of two types: Mamdani and Sugeno. In Mamdani-type systems, the knowledge base consists of rules of the form "If x1 = low and x2 = medium, then y = high". In Sugeno-type systems, the knowledge base consists of rules of the form "If x1 = low and x2 = medium, then y = a0 + a1·x1 + a2·x2". Thus, the main difference between the Mamdani and Sugeno systems lies in the way the values of the output variable are specified in the rules forming the knowledge base: in Mamdani-type systems they are given by fuzzy terms, while in Sugeno-type systems they are given as a linear combination of the input variables. In our case we will use the Sugeno system, because it lends itself better to optimization.
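As an illustration of the Sugeno scheme, a one-input rule base with constant (zeroth-order) conclusions can be sketched as follows; the triangular terms and the rule outputs are illustrative assumptions, not the controller designed below:

```python
def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical Sugeno rules for an "error" input e:
# if e = negative then u = -1; if e = zero then u = 0; if e = positive then u = 1.
rules = [
    (lambda e: tri(e, -2, -1, 0), -1.0),
    (lambda e: tri(e, -1,  0, 1),  0.0),
    (lambda e: tri(e,  0,  1, 2),  1.0),
]

def sugeno(e):
    """Weighted average of rule outputs: u = sum(mu_i * u_i) / sum(mu_i)."""
    mu = [m(e) for m, _ in rules]
    return sum(m * u for m, (_, u) in zip(mu, rules)) / sum(mu)

print(sugeno(0.5))  # halfway between the "zero" and "positive" rules: 0.5
```

Replacing the constant conclusions with linear functions of the inputs would give a first-order Sugeno system; replacing them with output fuzzy terms would give the Mamdani form.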

To control the servo drive, two linguistic variables are introduced: "error" (by position) and "control action". The first of them is the input, the second is the output. Let's define a term-set for the specified variables.

The main components of fuzzy inference. Fuzzifier.

For each linguistic variable, we define a basic term-set that includes five fuzzy sets, which can be designated: negative high, negative low, zero, positive low, positive high.

First of all, let us subjectively define what is meant by the terms "large error", "small error", etc., by specifying the membership functions for the corresponding fuzzy sets. At this point one can only be guided by the required accuracy, the known parameters of the class of input signals, and common sense: no rigorous algorithm for choosing the parameters of membership functions has yet been proposed. In our case, the linguistic variable "error" will look as follows.

Fig.7. Linguistic variable "error".

It is more convenient to represent the linguistic variable "control" in the form of a table:

Table 1

Rule block.

Consider the sequence of defining several rules that describe some situations:

Suppose, for example, that the output angle is equal to the input signal (i.e., the error is zero). Obviously, this is the desired situation, and therefore we do not have to do anything (the control action is zero).

Now consider another case: the position error is much greater than zero. Naturally, we must compensate for it by generating a large positive control signal.

Thus, two rules have been drawn up, which can be formally written as follows:

if error = zero, then control action = zero;

if error = large positive, then control action = large positive.

Fig.8. Formation of control with a small positive error in position.

Fig.9. Formation of control at zero error by position.

The table below shows all the rules corresponding to all situations for this simple case.

Table 2

In total, for a fuzzy controller with n inputs and 1 output, N = m1*m2*...*mn control rules can be defined, where mi is the number of fuzzy sets for the i-th input; however, for normal functioning of the controller it is not necessary to use all possible rules - a smaller number is often sufficient. In our case, all 5 possible rules are used to form the fuzzy control signal.

Defuzzifier.

Thus, the resulting action U is determined by the rule that fires. If a situation arises in which several rules fire at once, the resulting action U is found from the relation

U = ( sum over n of un * mUn(un) ) / ( sum over n of mUn(un) ), n = 1...N,

where N is the number of triggered rules (defuzzification by the area-center method), un is the physical value of the control signal corresponding to each of the fuzzy sets UBO, UMo, UZ, UMp, UBP, and mUn(u) is the degree of membership of the control signal u in the corresponding fuzzy set Un = {UBO, UMo, UZ, UMp, UBP}. There are also other methods of defuzzification, in which the output is determined by the most strongly (or most weakly) fired rule itself.
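The relation above can be sketched directly. The membership degrees and control values below are illustrative placeholders, not the actual values of the sets UBO...UBP:

```python
# Sketch of defuzzification by the weighted-average (area-center for singleton
# outputs) method. Inputs are (mu, u) pairs for each triggered rule; the
# numbers in the example are hypothetical.

def defuzzify(fired_rules):
    """fired_rules: list of (mu, u) pairs - degree of fulfillment and the
    physical control value of the triggered rule's output set."""
    num = sum(mu * u for mu, u in fired_rules)
    den = sum(mu for mu, _ in fired_rules)
    return num / den if den else 0.0

# Two rules triggered simultaneously, e.g. "zero" and "positive low":
u = defuzzify([(0.7, 0.0), (0.3, 5.0)])   # (0.7*0 + 0.3*5) / 1.0 = 1.5
```

When only one rule fires, the formula degenerates to that rule's control value, as the text notes.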

Let's simulate the process of controlling the electric drive using the fuzzy controller described above.

Fig.10. Block diagram of the system in the Matlab environment.

Fig.11. Structural diagram of a fuzzy controller in the Matlab environment.

Fig.12. Transient process at a single step action.

Fig.13. Transient process under harmonic input for a model with a fuzzy controller containing one input linguistic variable.

An analysis of the characteristics of a drive with the synthesized control algorithm shows that they are far from optimal and worse than in the case of control synthesis by other methods (too long a settling time under a unit step action and too large an error under a harmonic one). This is explained by the fact that the parameters of the membership functions were chosen quite arbitrarily, and only the magnitude of the position error was used as the controller input. Naturally, there can be no talk of any optimality of the obtained controller. Therefore, the task of optimizing the fuzzy controller becomes relevant in order to achieve the highest possible indicators of control quality. That is, the task is to optimize the objective function f(a1, a2, ..., an), where a1, a2, ..., an are the coefficients that determine the type and characteristics of the fuzzy controller. To optimize the fuzzy controller, we use the ANFIS block from the Matlab environment. One more way to improve the characteristics of the controller is to increase the number of its inputs: this makes the regulator more flexible and improves its performance. Let's add one more input linguistic variable - the rate of change of the input signal (its derivative). Accordingly, the number of rules will also increase. The circuit diagram of the regulator then takes the form:

Fig.14. Schematic diagram of a fuzzy controller with three linguistic variables.

Let v denote the speed of the input signal (its derivative). The basic term-set Tn is defined as:

Tn = ("negative (VO)", "zero (Z)", "positive (VR)").

The location of membership functions for all linguistic variables is shown in the figure.

Fig.15. Membership functions of the linguistic variable "error".

Fig.16. Membership functions of the linguistic variable "input signal speed".

Due to the addition of one more linguistic variable, the number of rules will increase to 3x5=15. The principle of their compilation is completely similar to that discussed above. All of them are shown in the following table:

Table 3

Fuzzy control signal: rows - position error, columns - speed.

For example: if error = zero and input signal derivative = large positive, then control action = small negative.
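The firing of such a two-input rule can be sketched as follows; the membership degrees are placeholders standing in for the functions of Fig. 15 and Fig. 16:

```python
# Hedged sketch of how one of the 15 two-input rules fires; the membership
# values below are hypothetical, not read off the actual membership functions.

def rule_fire(mu_error_term, mu_speed_term):
    # Degree of fulfillment of "error = A AND speed = B":
    # the fuzzy AND is commonly realized as min (or as the product).
    return min(mu_error_term, mu_speed_term)

# "if error = zero and derivative = large positive, then control = small negative"
mu = rule_fire(mu_error_term=0.8, mu_speed_term=0.6)   # 0.6
control_term = "small negative" if mu > 0 else None
```

The degree mu then weights the output set "small negative" during defuzzification, exactly as in the one-input case.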

Fig.17. Formation of control under three linguistic variables.

Due to the increase in the number of inputs and, accordingly, the rules themselves, the structure of the fuzzy controller will also become more complicated.

Fig.18. Structural diagram of a fuzzy controller with two inputs.

Fig.19.

Fig.20. Transient process under harmonic input for a model with a fuzzy controller containing two input linguistic variables.

Fig.21. Error signal under harmonic input for a model with a fuzzy controller containing two input linguistic variables.

Let's simulate the operation of the two-input fuzzy controller in the Matlab environment. The block diagram of the model is exactly the same as in Fig. 19. From the graph of the transient process under the harmonic input it can be seen that the accuracy of the system has increased significantly, but at the same time its oscillation has also increased, especially where the derivative of the output coordinate tends to zero. The reasons for this, as mentioned above, are the non-optimal choice of the parameters of the membership functions, both for the input and for the output linguistic variables. Therefore, we optimize the fuzzy controller using the ANFISedit block in the Matlab environment.

Fuzzy controller optimization.

Consider the use of genetic algorithms for fuzzy controller optimization. Genetic algorithms are adaptive search methods that are often used in recent years to solve functional optimization problems. They are based on the similarity to the genetic processes of biological organisms: biological populations develop over several generations, obeying the laws of natural selection and according to the principle of "survival of the fittest", discovered by Charles Darwin. By mimicking this process, genetic algorithms are able to "evolve" solutions to real-world problems if they are appropriately coded.

Genetic algorithms work with a set of "individuals" - a population - each of which represents a possible solution to the problem. Each individual is evaluated by a measure of its "fitness", according to how "good" the corresponding solution of the problem is. The fittest individuals are able to "reproduce" offspring by "cross-breeding" with other individuals in the population. This leads to the emergence of new individuals that combine some of the characteristics inherited from their parents. The least fit individuals are less likely to reproduce, so the traits they possess gradually disappear from the population.

In this way a whole new population of feasible solutions is produced: the best representatives of the previous generation are selected, crossed with one another, and many new individuals are obtained. This new generation contains a higher proportion of the characteristics possessed by the good members of the previous generation. Thus, from generation to generation, good characteristics spread throughout the population. Ultimately, the population converges to the optimal solution of the problem.

There are many ways to implement the idea of biological evolution within the framework of genetic algorithms. The traditional one can be represented in the form of the block diagram shown in Figure 22, where:

1. Initialization of the initial population - generation of a given number of solutions to the problem, from which the optimization process begins;

2. Application of crossover and mutation operators;

3. Stop conditions - usually, the optimization process is continued until a solution to the problem with a given accuracy is found, or until it is revealed that the process has converged (i.e., there has been no improvement in the solution of the problem over the last N generations).
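The three steps above can be sketched as a minimal genetic-algorithm loop. The fitness function (a simple sum of squares to be minimized) and all hyperparameters are placeholders chosen for illustration only:

```python
import random

# Minimal sketch of the traditional genetic-algorithm loop from steps 1-3;
# the fitness function and all hyperparameters are illustrative assumptions.

def fitness(ind):
    return -sum(x * x for x in ind)  # maximize fitness = minimize sum of squares

def evolve(pop_size=30, genes=3, generations=50, mutation=0.1):
    # 1. Initialize the population with random candidate solutions.
    pop = [[random.uniform(-1, 1) for _ in range(genes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # "survival of the fittest"
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            # 2. One-point crossover plus occasional small mutation.
            cut = random.randrange(1, genes)
            child = a[:cut] + b[cut:]
            if random.random() < mutation:
                i = random.randrange(genes)
                child[i] += random.gauss(0, 0.1)
            children.append(child)
        pop = parents + children
    # 3. Stop after a fixed number of generations; return the best individual.
    return max(pop, key=fitness)

best = evolve()
```

In the fuzzy-controller setting, an "individual" would encode the membership-function parameters a1, a2, ..., an, and the fitness would be the control-quality criterion.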

In the Matlab environment, genetic algorithms are represented by a separate toolbox, as is the ANFIS package. ANFIS is an abbreviation for Adaptive-Network-Based Fuzzy Inference System - an adaptive fuzzy inference network. ANFIS is one of the first variants of hybrid neuro-fuzzy networks - a feed-forward neural network of a special type. The architecture of a neuro-fuzzy network is isomorphic to a fuzzy knowledge base. Neuro-fuzzy networks use differentiable implementations of triangular norms (multiplication and probabilistic OR) as well as smooth membership functions. This makes it possible to tune neuro-fuzzy networks with fast algorithms for training neural networks based on error backpropagation, as well as with genetic algorithms. The architecture and operation rules of each layer of the ANFIS network are described below.

ANFIS implements Sugeno's fuzzy inference system as a five-layer feed-forward neural network. The purpose of the layers is as follows: the first layer is the terms of the input variables; the second layer - antecedents (parcels) of fuzzy rules; the third layer is the normalization of the degree of fulfillment of the rules; the fourth layer is the conclusions of the rules; the fifth layer is the aggregation of the result obtained according to different rules.

The network inputs are not allocated to a separate layer. Figure 23 shows an ANFIS network with one input variable (“error”) and five fuzzy rules. For the linguistic evaluation of the input variable "error" 5 terms are used.


Fig.23. Structure of the ANFIS network.

Let us introduce the following notation, necessary for further presentation:

x1, x2, ..., xn - the inputs of the network;

y - the network output;

Rr - the fuzzy rule with ordinal number r;

m - the number of rules (r = 1...m);

Ajr - the fuzzy term with membership function mjr(xj), used for the linguistic evaluation of the variable xj in the r-th rule (r = 1...m, j = 1...n);

b0r, b1r, ..., bnr - the real numbers in the conclusion of the r-th rule (r = 1...m).

ANFIS-network functions as follows.

Layer 1. Each node of the first layer represents one term with a bell-shaped membership function. The inputs of the network are connected only to their own terms. The number of nodes in the first layer equals the sum of the cardinalities of the term-sets of the input variables. The output of a node is the degree of membership of the value of the input variable in the corresponding fuzzy term:

m(x) = 1 / (1 + |(x - c) / a|^(2b)),

where a, b, and c are the configurable parameters of the membership function.

Layer 2. The number of nodes in the second layer is m. Each node of this layer corresponds to one fuzzy rule. A node of the second layer is connected to those nodes of the first layer that form the antecedents of the corresponding rule. Therefore, each node of the second layer can receive from 1 to n input signals. The output of a node is the degree of fulfillment of the rule, which is calculated as the product of its input signals. Denote the outputs of the nodes of this layer by tr, r = 1...m.

Layer 3. The number of nodes in the third layer is also m. Each node of this layer calculates the relative degree of fulfillment of the fuzzy rule:

t*r = tr / (t1 + t2 + ... + tm).

Layer 4. The number of nodes in the fourth layer is also m. Each node is connected to one node of the third layer as well as to all inputs of the network (the connections to the inputs are not shown in Fig. 23). A node of the fourth layer calculates the contribution of one fuzzy rule to the network output:

yr = t*r * (b0r + b1r*x1 + ... + bnr*xn).

Layer 5. The single node of this layer sums the contributions of all the rules:

y = y1 + y2 + ... + ym.
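The five layers can be traced in a minimal forward pass. Here one input and two rules are used (the network in the text has five rules), and all parameters a, b, c and b0r, b1r are assumed values chosen for illustration:

```python
# Sketch of an ANFIS forward pass for one input and two rules; every
# numerical parameter below is a hypothetical placeholder.

def bell(x, a, b, c):
    # Layer 1: generalized bell membership function
    # m(x) = 1 / (1 + |(x - c) / a|^(2b))
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))

def anfis_forward(x, mf_params, conclusions):
    # Layers 1-2: degrees of fulfillment (one term per rule here).
    tau = [bell(x, a, b, c) for (a, b, c) in mf_params]
    # Layer 3: normalization of the degrees of fulfillment.
    s = sum(tau)
    tau_star = [t / s for t in tau]
    # Layer 4: contribution of each rule, yr = t*r * (b0 + b1 * x).
    contrib = [t * (b0 + b1 * x) for t, (b0, b1) in zip(tau_star, conclusions)]
    # Layer 5: aggregation of all rule contributions.
    return sum(contrib)

y = anfis_forward(0.5, mf_params=[(1, 2, 0), (1, 2, 1)],
                  conclusions=[(0.0, 1.0), (1.0, -1.0)])
```

With the symmetric parameters chosen here, both rules fire equally, so the output is the plain average of the two linear conclusions evaluated at x = 0.5.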

Typical neural network training procedures can be applied to tune the ANFIS network, since it uses only differentiable functions. Typically, a combination of gradient descent in the form of error backpropagation and the least-squares method is used. Backpropagation adjusts the parameters of the rule antecedents, i.e. the membership functions. The rule-conclusion coefficients are estimated by the least-squares method, since they are linearly related to the network output. Each iteration of the tuning procedure is performed in two stages. At the first stage, a training sample is fed to the inputs, and the optimal parameters of the fourth-layer nodes are found from the discrepancy between the desired and actual behavior of the network using the iterative least-squares method. At the second stage, the residual discrepancy is propagated from the network output back to the inputs, and the parameters of the first-layer nodes are modified by the error backpropagation method; the rule-conclusion coefficients found at the first stage do not change. The iterative tuning procedure continues while the residual exceeds a predetermined value. To tune the membership functions, other optimization algorithms besides error backpropagation can also be used, for example the Levenberg-Marquardt method.
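The first tuning stage can be sketched as follows: with the antecedents fixed, the network output is linear in the conclusion coefficients, so they can be found by ordinary least squares. The one-rule, synthetic-data setup below is purely illustrative:

```python
# Sketch of the least-squares stage: fit the conclusion coefficients of a
# single rule y = b0 + b1*x to training data. The data are synthetic.

def lse_fit(xs, ys):
    """Solve min sum (b0 + b1*x - y)^2 via the normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b0 = (sy - b1 * sx) / n
    return b0, b1

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]          # generated by y = 1 + 2x
b0, b1 = lse_fit(xs, ys)           # recovers b0 = 1, b1 = 2
```

In the full ANFIS procedure, the same idea is applied jointly to all rules, with each training point weighted by the normalized firing strengths t*r.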

Fig.24. ANFISedit workspace.

Let us now try to optimize the fuzzy controller for a single step action. The desired transient process is approximately the following:

Fig.25. Desired transient process.

From the graph shown in Fig. 25 it follows that most of the time the motor should run at full power to ensure maximum speed of response, and when approaching the desired value it should slow down smoothly. Guided by these simple considerations, we take the following sample of values as the training one, presented below in the form of a table:

Table 4


Error value | Control value (arranged in three such column pairs)


Fig.26. Type of training sample.

Training will be carried out over 100 steps. This is more than enough for the convergence of the method used.

Fig.27. The process of learning a neural network.

In the learning process, the parameters of the membership functions are formed in such a way that, for a given error value, the controller produces the required control. Between the nodal points, the dependence of control on error is an interpolation of the table data; the interpolation method depends on how the neural network is trained. In fact, after training, the fuzzy controller model can be represented as a nonlinear function of one variable, whose graph is presented below.
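The interpolation idea can be sketched as follows. A simple piecewise-linear scheme is used here, and the error/control pairs are placeholders standing in for the values of Table 4:

```python
import bisect

# Sketch: between the nodal points of the training table, control is an
# interpolation of the tabulated data. The table values are hypothetical.

def interp_control(error, nodes):
    """nodes: list of (error, control) pairs, sorted by error."""
    xs = [e for e, _ in nodes]
    if error <= xs[0]:
        return nodes[0][1]
    if error >= xs[-1]:
        return nodes[-1][1]
    i = bisect.bisect_right(xs, error)
    (x0, y0), (x1, y1) = nodes[i - 1], nodes[i]
    t = (error - x0) / (x1 - x0)       # linear blend between the two nodes
    return y0 + t * (y1 - y0)

table = [(-1.0, -10.0), (-0.1, -2.0), (0.0, 0.0), (0.1, 2.0), (1.0, 10.0)]
u = interp_control(0.05, table)        # halfway between nodes -> 1.0
```

A trained network generally produces a smoother curve than this piecewise-linear one, which is what Fig. 28 illustrates.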

Fig.28. Control as a function of position error inside the regulator.

Having saved the found parameters of the membership functions, we simulate the system with an optimized fuzzy controller.


Fig.29. Transient process under harmonic input for a model with an optimized fuzzy controller containing one input linguistic variable.

Fig.30. Error signal under harmonic input for a model with an optimized fuzzy controller containing one input linguistic variable.


It follows from the graphs that the optimization of the fuzzy controller by training the neural network was successful: the oscillation and the magnitude of the error decreased significantly. Therefore, the use of a neural network is quite reasonable for optimizing controllers whose principle of operation is based on fuzzy logic. However, even the optimized controller cannot satisfy the accuracy requirements, so it is advisable to consider another control method, in which the fuzzy controller does not control the object directly, but instead combines several control laws depending on the situation.

The image is understood as a structured description of the object or phenomenon under study, represented by a feature vector, each element of which represents the numerical value of one of the features that characterize the corresponding object.

The general structure of the recognition system is as follows:

The meaning of the recognition problem is to establish whether the studied objects have a fixed finite set of features that allow them to be assigned to a certain class. Recognition tasks have the following characteristic features:

1. These are information tasks consisting of two stages:

a. Bringing the source data to a form convenient for recognition.

b. Recognition itself is an indication of the belonging of an object to a certain class.

2. In these problems, one can introduce the concept of analogy or similarity of objects and formulate the concept of proximity of objects as a basis for assigning objects to the same class or different classes.

3. In these tasks, it is possible to operate with a set of precedents - examples whose classification is known and which, in the form of formalized descriptions, can be presented to the recognition algorithm to adjust to the task in the learning process.

4. For these problems, it is difficult to build formal theories and apply classical mathematical methods: often the information is insufficient for an accurate mathematical model, or the gain from using a model and mathematical methods is incommensurate with the costs.

5. In these tasks, “bad information” is possible - information with gaps, heterogeneous, indirect, fuzzy, ambiguous, probabilistic.

It is advisable to distinguish the following types of recognition tasks:

1. The task of recognition, that is, the assignment of the presented object according to its description to one of the given classes (training with a teacher).

2. The task of automatic classification is the division of a set of objects (situations) according to their descriptions into a system of non-overlapping classes (taxonomy, cluster analysis, unsupervised learning).

3. The problem of choosing an informative set of features in recognition.

4. The problem of reducing the initial data to a form convenient for recognition.

5. Dynamic recognition and dynamic classification - tasks 1 and 2 for dynamic objects.

6. The task of forecasting - task 5, in which the solution must refer to some moment in the future.

The concept of an image.

An image (class) is a classification grouping in a system that unites (singles out) a certain group of objects according to some attribute. Images have a number of characteristic properties, which manifest themselves in the fact that acquaintance with a finite number of phenomena from the same set makes it possible to recognize an arbitrarily large number of its representatives.


As an image, one can also consider a certain set of states of the control object, and this whole set of states is characterized by the fact that the same impact on the object is required to achieve a given goal. Images have characteristic objective properties in the sense that different people who learn from different observational material, for the most part, classify the same objects in the same way and independently of each other.

In general, the problem of pattern recognition consists of two parts: training and recognition.

Training is carried out by presenting individual objects with an indication of their belonging to one image or another. As a result of training, the recognition system must acquire the ability to respond with identical reactions to all objects of the same image and with different reactions to all objects of different images.

It is very important that the learning process should end only by displaying a finite number of objects without any other prompts. The objects of learning can be either visual images, or various phenomena of the external world, and others.

Training is followed by the process of recognizing new objects, which characterizes the operation of the already-trained system. The automation of these procedures constitutes the problem of learning pattern recognition. In the case when a person himself invents and then imposes the classification rules on the computer, the recognition problem is only partially solved, since the main part of the problem (training) is taken over by the person.

The problem of training in pattern recognition is interesting both from an applied and from a fundamental point of view. From an applied point of view, the solution of this problem is important, first of all, because it opens up the possibility of automating many processes that until now have been associated only with the activity of a living brain. The fundamental significance of the problem is connected with the question of what a computer can and cannot do in principle.

When solving problems of managing methods of pattern recognition, the term "state" is used instead of the term "image". State - certain forms of displaying the measured current (instantaneous) characteristics of the observed object, the set of states determines the situation.

A situation is usually called a certain set of states of a complex object, each of which is characterized by the same or similar characteristics of the object. For example, if a certain control object is considered as an object of observation, then the situation combines such states of this object in which the same control actions should be applied. If the object of observation is a game, then the situation unites all states of the game.

The choice of the initial description of objects is one of the central tasks of the problem of learning pattern recognition. With a successful choice of the initial description (feature space), the recognition task may turn out to be trivial. Conversely, an unsuccessfully chosen initial description can lead either to a very difficult further processing of information, or to no solution at all.

Geometric and structural approaches.

Any image that arises as a result of observing an object in the process of learning or exam can be represented as a vector, and hence as a point in some feature space.

If it is asserted that, when images are presented, it is possible to unambiguously attribute them to one of two (or several) patterns, then it is thereby asserted that in some space there exist two or more regions that have no common points, and that the images are points from these regions. Each point of such a region can be assigned a name corresponding to the image.

Let us interpret the process of learning pattern recognition in terms of a geometric picture, limiting ourselves for now to the case of recognizing only two patterns. The only thing known in advance is that it is required to separate two regions in some space and that only points from these regions are shown. These areas themselves are not predetermined, that is, there is no information about the location of their boundaries or rules for determining whether a point belongs to a particular area.

In the course of training, points randomly selected from these areas are presented, and information is reported about which area the presented points belong to. No additional information about these areas, that is, the location of their boundaries during training, is reported.

The goal of learning is either to build a surface that would separate not only the points shown in the learning process, but also all other points belonging to these areas, or to build surfaces that bound these areas so that each of them contains only points of the same image. In other words, the goal of learning is to construct such functions from image vectors that would be, for example, positive at all points of one image and negative at all points of another image.
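A minimal illustration of such a separating function is a perceptron-style linear rule: a function that comes out positive on points of one image and negative on the other. The 2-D points and labels below are hypothetical:

```python
# Sketch of building a separating function f(x) = w1*x1 + w2*x2 + b that is
# positive on points of one image and negative on the other; the training
# points are made-up 2-D examples.

def train_separator(points, labels, epochs=100, lr=0.1):
    """Perceptron rule; labels are +1 / -1. Returns (w1, w2, bias)."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            if y * (w1 * x1 + w2 * x2 + b) <= 0:   # misclassified point
                w1 += lr * y * x1
                w2 += lr * y * x2
                b += lr * y
    return w1, w2, b

points = [(2, 2), (3, 3), (-2, -1), (-3, -2)]
labels = [1, 1, -1, -1]
w1, w2, b = train_separator(points, labels)
# f(x) > 0 on the first image, f(x) < 0 on the second
```

Because the two regions here have no common points, the procedure converges to one of the many possible separating functions, as the following paragraph notes.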

Due to the fact that the regions do not have common points, there is always a whole set of such separating functions, and as a result of learning, one of them must be built. If the presented images belong not to two, but to a larger number of images, then the task is to build, according to the points shown during training, a surface that separates all areas corresponding to these images from each other.

This problem can be solved, for example, by constructing a function that takes the same value over the points of each of the regions, and the value of this function over points from different regions should be different.

It may seem that knowing just a certain number of points from the area is not enough to separate the entire area. Indeed, one can specify an innumerable number of different regions that contain these points, and no matter how the surface that selects the region is built from them, it is always possible to specify another region that intersects the surface and at the same time contains the points shown.

However, it is known that the problem of approximating a function from information about it on a limited set of points, much narrower than the entire set on which the function is defined, is a common mathematical problem of function approximation. Of course, the solution of such problems requires the introduction of certain restrictions on the class of functions under consideration, and the choice of these restrictions depends on the nature of the information that the teacher can add to the learning process.

One such hint is the conjecture about the compactness of images.

Along with the geometric interpretation of the problem of learning to recognize patterns, there is another approach, which is called structural or linguistic. Let's consider the linguistic approach on the example of visual image recognition.

First, a set of initial concepts is distinguished - typical fragments found in the image, and characteristics of the relative position of the fragments (left, bottom, inside, etc.). These initial concepts form a vocabulary that allows you to build various logical statements, sometimes called sentences.

The task is to select from a large number of statements that could be constructed using these concepts, the most significant for this particular case. Further, looking at a finite and, if possible, a small number of objects from each image, it is necessary to construct a description of these images.

The constructed descriptions must be so complete as to resolve the question of which image the given object belongs to. When implementing the linguistic approach, two tasks arise: the task of constructing an initial dictionary, that is, a set of typical fragments, and the task of constructing description rules from the elements of a given dictionary.

Within the framework of linguistic interpretation, an analogy is drawn between the structure of images and the syntax of a language. The desire for this analogy was caused by the possibility of using the apparatus of mathematical linguistics, that is, the methods are syntactic in nature. The use of the apparatus of mathematical linguistics to describe the structure of images can be applied only after the segmentation of images into component parts has been made, that is, words have been developed to describe typical fragments and methods for their search.

After the preliminary work, which ensures the selection of words, linguistic tasks proper arise, consisting of tasks of automatic grammatical parsing of descriptions for image recognition.

The compactness hypothesis.

If we assume that in the learning process, the feature space is formed based on the planned classification, then we can hope that the specification of the feature space itself sets a property, under the influence of which images in this space are easily separated. It is these hopes that, as work in the field of pattern recognition developed, stimulated the emergence of the compactness hypothesis, which states that compact sets in the feature space correspond to patterns.

By a compact set we will understand certain clumps of points in the image space, assuming that there are rarefactions separating them between these clumps. However, this hypothesis has not always been confirmed experimentally. But those problems in which the compactness hypothesis was well fulfilled always found a simple solution, and vice versa, those problems for which the hypothesis was not confirmed were either not solved at all, or were solved with great difficulty and additional information.

The compactness hypothesis itself has turned into a sign of the possibility of satisfactorily solving recognition problems.

The formulation of the compactness hypothesis brings us close to the concept of an abstract image. If the coordinates of space are chosen randomly, then the images in it will be distributed randomly. They will be denser in some parts of the space than in others.

Let's call some randomly chosen space an abstract image. In this abstract space, there will almost certainly be compact sets of points. Therefore, in accordance with the compactness hypothesis, the set of objects to which compact sets of points correspond in an abstract space is usually called abstract images of a given space.

Training and self-training, adaptation and training.

If it were possible to notice a certain universal property that depends neither on the nature of the images nor on their descriptions, but determines only their separability, then, along with the usual task of teaching recognition using information about the belonging of each object from the training sequence to one image or another, one could pose a different classification problem - the so-called problem of learning without a teacher.

A task of this kind at the descriptive level can be formulated as follows: objects are presented to the system simultaneously or sequentially without any indication of their belonging to images. The input device of the system maps a set of objects onto a set of images and, using some property of image separability embedded in it beforehand, makes an independent classification of these objects.

After such a process of self-learning, the system should acquire the ability to recognize not only already familiar objects (objects from the training sequence), but also those that have not been presented before. The process of self-learning of a certain system is such a process, as a result of which this system, without the help of a teacher, acquires the ability to develop the same reactions to images of objects of the same image and different reactions to images of different images.

The role of the teacher in this case consists only in prompting the system of some objective property that is the same for all images and determines the ability to divide a set of objects into images.

It turns out that such an objective property is the property of compactness of images. The mutual arrangement of points in the selected space already contains information about how the set of points should be divided. This information determines the property of pattern separability, which is sufficient for self-learning of the pattern recognition system.

Most of the well-known self-learning algorithms are able to select only abstract images, that is, compact sets in given spaces. The difference between them lies in the formalization of the notion of compactness. However, this does not reduce, and sometimes even increases the value of self-learning algorithms, since often the images themselves are not predetermined by anyone, and the task is to determine which subsets of images in a given space are images.
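A minimal sketch of an algorithm that selects compact sets in a given space is a bare-bones k-means with two clusters; the one-dimensional sample below is made up:

```python
# Sketch of self-learning as the selection of compact sets ("clumps" of
# points separated by rarefactions). Bare-bones 1-D k-means, two clusters;
# the data are hypothetical and assumed to contain two non-empty clumps.

def kmeans_1d(xs, iters=20):
    c1, c2 = min(xs), max(xs)            # initial cluster centers
    for _ in range(iters):
        # Assign each point to the nearer center.
        g1 = [x for x in xs if abs(x - c1) <= abs(x - c2)]
        g2 = [x for x in xs if abs(x - c1) > abs(x - c2)]
        # Move each center to the mean of its group.
        c1 = sum(g1) / len(g1)
        c2 = sum(g2) / len(g2)
    return g1, g2

# Two clumps of points separated by a rarefaction:
data = [0.1, 0.2, 0.3, 5.0, 5.1, 5.2]
cluster_a, cluster_b = kmeans_1d(data)
```

Different self-learning algorithms differ, as the text says, mainly in how the notion of compactness is formalized; k-means formalizes it as proximity to a cluster mean.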

An example of such a statement of the problem is sociological research, when groups of people are singled out according to a set of questions. In this understanding of the problem, self-learning algorithms generate a priori unknown information about the existence in a given space of images that no one had any idea about before.

In addition, the result of self-learning characterizes the suitability of the chosen space for a specific recognition learning task. If the abstract images allocated in the space of self-learning coincide with the real ones, then the space has been chosen successfully. The more abstract images differ from real ones, the more inconvenient the chosen space for a specific task.

Learning is usually called the process of developing in some system a particular reaction to groups of identical external signals by means of repeated corrections applied to the system from outside. The mechanism for generating these corrections almost completely determines the learning algorithm.

Self-learning differs from learning in that here no additional information about the correctness of the reaction is reported to the system.

Adaptation is the process of changing the parameters and structure of the system, and possibly control actions, based on current information in order to achieve a certain state of the system with initial uncertainty and changing operating conditions.

Learning is a process, as a result of which the system gradually acquires the ability to respond with the necessary reactions to certain sets of external influences, and adaptation is the adjustment of the parameters and structure of the system in order to achieve the required quality of control under conditions of continuous changes in external conditions.


Speech recognition systems.

Speech acts as the main means of communication between people and therefore speech communication is considered one of the most important components of the artificial intelligence system. Speech recognition is the process of converting an acoustic signal generated at the output of a microphone or telephone into a sequence of words.

A more difficult task is the task of understanding speech, which is associated with identifying the meaning of the acoustic signal. In this case, the output of the speech recognition subsystem serves as the input of the utterance understanding subsystem. Automatic speech recognition (ASR) is one of the areas of natural language processing technologies.

Automatic speech recognition is used in automating the input of texts into computers, in the formation of oral queries to databases or information retrieval systems, in the formation of oral commands to various intelligent devices.

Basic concepts of speech recognition systems.

Speech recognition systems are characterized by many parameters.

One of the main parameters is the word error rate (WER). This parameter is the ratio of the number of incorrectly recognized words to the total number of spoken words.
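In practice the word error rate is usually computed from the word-level edit distance between the reference transcript and the recognizer output. A minimal sketch (a standard Levenshtein computation, not tied to any particular ASR system; the example phrases are hypothetical):

```python
def word_error_rate(reference, hypothesis):
    """Levenshtein distance over words, divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[-1][-1] / len(ref)

print(word_error_rate("turn the light on", "turn light off"))  # 1 deletion + 1 substitution -> 0.5
```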

Other parameters characterizing automatic speech recognition systems are:

1) dictionary size,

2) speech mode,

3) style of speech,

4) subject area,

5) speaker dependence,

6) the level of acoustic noise,

7) the quality of the input channel.

Depending on the size of the dictionary, ASR systems are divided into three groups:

With a small dictionary size (up to 100 words),

With an average dictionary size (from 100 words to several thousand words),

With a large dictionary size (more than 10,000 words).

Speech mode characterizes the way words and phrases are pronounced. A distinction is made between continuous speech recognition systems and systems that can recognize only isolated words. The isolated word recognition mode requires the speaker to make short pauses between words.

According to the style of speech, ASR systems are divided into two groups: deterministic speech systems and spontaneous speech systems.

In deterministic speech recognition systems, the speaker reproduces speech following the grammatical rules of the language. Spontaneous speech is characterized by violations of grammatical rules and is more difficult to recognize.

Depending on the subject area, there are ASR systems focused on application in highly specialized areas (for example, access to databases) and ASR systems with an unlimited scope. The latter require a large vocabulary and must provide recognition of spontaneous speech.

Many automatic speech recognition systems are speaker dependent. This involves pre-tuning the system to the peculiarities of the pronunciation of a particular speaker.

The complexity of solving the problem of speech recognition is explained by the high variability of acoustic signals. This variability is due to several reasons:

First, the different realizations of phonemes, the basic units of the sound system of the language. The variability in the realization of phonemes is caused by the influence of neighboring sounds in the speech stream. The variants of phoneme realization that are due to the sound environment are called allophones.

Secondly, the position and characteristics of acoustic receivers.

Thirdly, changes in the parameters of the speech of the same speaker, which are due to the different emotional state of the speaker, the pace of his speech.

The figure shows the main components of the speech recognition system:

The digitized speech signal enters the pre-processing unit, where the features necessary for sound recognition are extracted. Sound recognition is often done using artificial neural network models. The selected sound units are subsequently used to search for a sequence of words that best matches the input speech signal.

The search for a sequence of words is performed using acoustic, lexical and language models. The model parameters are determined from the training data based on the respective learning algorithms.

Text-to-speech synthesis. Basic concepts

In many cases, artificial intelligence systems with elements of speech communication require the output of messages in speech form. The figure shows a block diagram of an intelligent question-answer system with a speech interface:

Picture 1.


Consider the features of the empirical approach on the example of recognition of parts of speech. The task is to assign labels to the words of the sentence: noun, verb, preposition, adjective, and the like. In addition, it is necessary to define some additional features of nouns and verbs. For example, for a noun it is a number, and for a verb it is a form. We formalize the task.

Let's represent the sentence as a sequence of words W = w1 w2 … wn, where the wi are random variables, each taking one of the possible values belonging to the language dictionary. The sequence of labels assigned to the words of the sentence can be represented by the sequence X = x1 x2 … xn, where the xi are random variables whose values are defined on the set of possible labels.

Then the problem of part-of-speech recognition is to find the most probable sequence of labels x1, x2, …, xn given the sequence of words w1, w2, …, wn. In other words, it is necessary to find a sequence of labels X* = x1 x2 … xn that maximizes the conditional probability P(x1, x2, …, xn | w1, w2, …, wn).

Let us rewrite the conditional probability P(X|W) as P(X|W) = P(X,W) / P(W). Since P(W) does not depend on X, maximizing the conditional probability over X is equivalent to maximizing the joint probability, so X* = arg max over X of P(X,W). The joint probability P(X,W) can be written as a product of conditional probabilities: P(X,W) = ∏ (i = 1 to n) P(xi | x1, …, xi-1, w1, …, wi-1) P(wi | x1, …, xi, w1, …, wi-1). Direct search for the maximum of this expression is a difficult task, since for large values of n the search space becomes very large. Therefore, the probabilities in this product are approximated by the simpler conditional probabilities P(xi | xi-1) and P(wi | xi). Here it is assumed that the value of the label xi depends only on the previous label xi-1 and not on earlier labels, and that the probability of the word wi is determined only by the current label xi. These assumptions are called Markovian, and the theory of Markov models is used to solve the problem. Taking the Markov assumptions into account, we can write:

X* = arg max over x1, …, xn of ∏ (i = 1 to n) P(xi | xi-1) P(wi | xi),

where the conditional probabilities are estimated from a set of training data.

The search for a sequence of labels X* is carried out using the Viterbi dynamic programming algorithm. The Viterbi algorithm can be considered as a variant of the state graph search algorithm, where the vertices correspond to word labels.

Characteristically, for any current vertex, the set of child labels is always the same. Moreover, for each child vertex, the sets of parent vertices also coincide. This is explained by the fact that transitions are made on the state graph, taking into account all possible combinations of labels. Markov's assumption provides a significant simplification of the problem of recognition of parts of speech while maintaining high accuracy of assigning labels to words.
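The search described above can be sketched as a minimal Viterbi pass under the Markov assumptions. The toy tag set and probability tables below are hypothetical; in a real system they would be estimated from a training corpus:

```python
def viterbi(words, tags, p_trans, p_emit, p_init):
    """Most probable tag sequence under the Markov assumptions above."""
    # best[t] = (probability, path) of the best path ending in tag t
    best = {t: (p_init.get(t, 0.0) * p_emit.get((words[0], t), 0.0), [t])
            for t in tags}
    for w in words[1:]:
        new = {}
        for t in tags:
            # Extend the best predecessor path with tag t, then emit word w.
            prob, path = max(
                ((best[s][0] * p_trans.get((s, t), 0.0), best[s][1]) for s in tags),
                key=lambda x: x[0])
            new[t] = (prob * p_emit.get((w, t), 0.0), path + [t])
        best = new
    return max(best.values(), key=lambda x: x[0])[1]

# Toy model: N = noun, V = verb (hypothetical probabilities).
tags = ["N", "V"]
p_init = {"N": 0.7, "V": 0.3}
p_trans = {("N", "V"): 0.6, ("N", "N"): 0.4, ("V", "N"): 0.7, ("V", "V"): 0.3}
p_emit = {("dogs", "N"): 0.5, ("dogs", "V"): 0.1,
          ("bark", "V"): 0.4, ("bark", "N"): 0.1}
print(viterbi(["dogs", "bark"], tags, p_trans, p_emit, p_init))  # ['N', 'V']
```

Keeping only the best path into each tag at every step is exactly what makes the search polynomial rather than exponential in the sentence length.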

Thus, with 200 tags, the assignment accuracy is approximately 97%. For a long time, empirical analysis was performed using stochastic context-free grammars. However, they have a significant drawback: the same probabilities can be assigned to different parses. This is because the probability of a parse is represented as the product of the probabilities of the rules involved in it. If different rules with the same probabilities are used in the analysis, this gives rise to the indicated problem. The best results are given by a grammar that takes the vocabulary of the language into account.

In this case, the rules include the necessary lexical information, which provides different probability values for the same rule in different lexical environments. Empirical parsing corresponds more closely to pattern recognition than traditional parsing in its classical sense.

Comparative studies have shown that the correctness of empirical parsing of natural language is higher than that of traditional parsing.

Overview of existing pattern recognition methods

L.P. Popova, I.O. Datiev

The ability to "recognize" is considered a fundamental property of human beings, as, indeed, of other living organisms. Pattern recognition is a branch of cybernetics that develops principles and methods for classifying and identifying objects, phenomena, processes, signals, situations - all those objects that can be described by a finite set of features or properties characterizing the object.

An image is a description of an object. Images have a characteristic property, which manifests itself in the fact that acquaintance with a finite number of phenomena from the same set makes it possible to recognize an arbitrarily large number of its representatives.

There are two main directions in the theory of pattern recognition:

    the study of the powers of recognition possessed by human beings and other living organisms;

    development of the theory and methods for constructing devices designed to solve individual problems of pattern recognition in certain application areas.

Further, the article describes the problems, principles and methods for implementing pattern recognition systems related to the development of the second direction. The second part of the article discusses neural network methods of pattern recognition, which can be attributed to the first direction of pattern recognition theory.

Problems of building image recognition systems

The tasks that arise in the construction of automatic pattern recognition systems can usually be classified into several main areas. The first of them is related to the representation of the initial data obtained as measurement results for the object to be recognized. This is the sensing problem. Each measured value is some characteristic of an image or object. Suppose, for example, that the images are alphanumeric characters. In this case, a measuring retina similar to the one shown in Fig. 1 can be successfully used in the sensor. If the retina consists of n elements, the measurement results can be represented as a measurement vector, or image vector, x = (x1, x2, …, xn),

where each element xi takes, for example, the value 1 if the image of the symbol passes through the i-th cell of the retina, and the value 0 otherwise.

Consider Fig. 2(b). In this case, the images are continuous functions (such as sound signals) of the variable t. If the function values are measured at discrete points t1, t2, …, tn, then the image vector can be formed by taking x1 = f(t1), x2 = f(t2), …, xn = f(tn).

Figure 1. Measuring retina
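The formation of an image vector from a continuous signal, as described above, can be sketched as follows; the signal f and the sampling moments are hypothetical:

```python
import math

def image_vector(f, moments):
    """Sample a continuous signal f at the given moments t1, ..., tn
    to form the image vector (f(t1), ..., f(tn))."""
    return [f(t) for t in moments]

# Hypothetical signal: a 5 Hz sine wave sampled at four moments.
x = image_vector(lambda t: math.sin(2 * math.pi * 5 * t),
                 [0.0, 0.05, 0.1, 0.15])
```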

The second problem of pattern recognition is related to the selection of characteristic features or properties from the obtained initial data and the reduction of the dimension of the pattern vectors. This problem is often defined as the problem of preprocessing and feature selection.

Features of a class of images are characteristic properties common to all images of a given class. The features that characterize the differences between individual classes can be interpreted as interclass features. Intraclass features that are common to all the classes under consideration do not carry useful information from the point of view of recognition and may not be taken into account. The choice of features is considered one of the important tasks associated with the construction of recognition systems. If the measurement results make it possible to obtain a complete set of distinguishing features for all classes, the actual recognition and classification of patterns will not cause any particular difficulties. Automatic recognition would then be reduced to a simple matching process or procedures such as table lookups. In most practical recognition problems, however, determining a complete set of distinguishing features is extremely difficult, if not impossible. From the original data, it is usually possible to extract some of the distinguishing features and use them to simplify the process of automatic pattern recognition. In particular, the dimension of the measurement vectors can be reduced using transformations that minimize the loss of information.

The third problem associated with the construction of pattern recognition systems is finding the optimal decision procedures necessary for identification and classification. After the data collected about the patterns to be recognized are represented by points or measurement vectors in the pattern space, the machine must decide which class of patterns these data correspond to. Let the machine be designed to distinguish between M classes, denoted w1, w2, …, wM. In this case, the pattern space can be considered to consist of M regions, each of which contains the points corresponding to images from one class. The recognition problem can then be considered as constructing the boundaries of the decision regions separating the M classes on the basis of the registered measurement vectors. Let these boundaries be defined, for example, by decision functions d1(x), d2(x), …, dM(x). These functions, also called discriminant functions, are scalar and single-valued functions of the image x. If di(x) > dj(x) for all j ≠ i, then the image x belongs to class wi. In other words, the image is assigned to the class whose decision function di(x) has the greatest value. An illustration of such an automatic classification scheme based on the implementation of the decision-making process is shown in Fig. 2 (in the scheme, "GR" is the generator of decision functions).

Figure 2. Scheme of automatic classification.

Decision functions can be obtained in a number of ways. In those cases where complete a priori information is available about the recognizable patterns, the decision functions can be determined exactly on the basis of this information. If only qualitative information is available about the patterns, reasonable assumptions can be made about the form of the decision functions. In the latter case, the boundaries of the decision regions can deviate significantly from the true ones, and therefore it is necessary to create a system capable of arriving at a satisfactory result through a series of successive adjustments.
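The decision rule itself, assignment of x to the class with the largest decision function, can be sketched as follows; the linear functions d1 and d2 are hypothetical examples of the form di(x) = wi · x + bi:

```python
def classify(x, decision_funcs):
    """Assign x to the class whose decision function di(x) is largest."""
    scores = [d(x) for d in decision_funcs]
    return scores.index(max(scores))  # index of the winning class

# Hypothetical linear decision functions for two classes.
d1 = lambda x: 1.0 * x[0] - 0.5 * x[1] + 0.1
d2 = lambda x: -1.0 * x[0] + 0.5 * x[1]
print(classify((2.0, 1.0), [d1, d2]))  # 0  (class w1 wins)
```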

Objects (images) to be recognized and classified using an automatic pattern recognition system must have a set of measurable characteristics. When for a whole group of images the results of the corresponding measurements are similar, it is considered that these objects belong to the same class. The purpose of the pattern recognition system is to determine, on the basis of the collected information, a class of objects with characteristics similar to those measured for recognizable objects. The correctness of recognition depends on the amount of distinguishing information contained in the measured characteristics, and the efficiency of using this information.

      Basic Methods for Implementing Pattern Recognition Systems

Pattern recognition is the task of constructing and applying formal operations on numerical or symbolic representations of objects of the real or ideal world, such that the results of these operations reflect the equivalence relations between the objects. Equivalence relations express the belonging of the evaluated objects to certain classes, considered as independent semantic units.

When constructing recognition algorithms, equivalence classes can be set by a researcher who uses his own meaningful ideas or external additional information about the similarity and difference of objects in the context of the problem being solved. Then one speaks of "supervised recognition" (recognition with a teacher). Otherwise, i.e. when an automated system solves a classification problem without involving external training information, one speaks of automatic classification or "unsupervised recognition". Most pattern recognition algorithms require very significant computing power, which can only be provided by high-performance computer technology.

Various authors (Yu.L. Barabash, V.I. Vasiliev, A.L. Gorelik, V.A. Skripkin, R. Duda, P. Hart, L.T. Kuzin, F.I. Peregudov, F.P. Tarasenko, F.E. Temnikov, V.A. Afonin, V.I. Dmitriev, J. Tou, R. Gonzalez, P. Winston, K. Fu, Ya.Z. Tsypkin and others) give different typologies of pattern recognition methods. Some authors distinguish between parametric, non-parametric, and heuristic methods, while others single out groups of methods based on historical schools and trends in this field.

At the same time, the well-known typologies do not take into account one very significant characteristic, which reflects the specifics of the way knowledge about the subject area is represented using any formal pattern recognition algorithm. D.A. Pospelov identifies two main ways of representing knowledge:

    Intensional representation - in the form of a diagram of relationships between attributes (features).

    Extensional representation - with the help of specific facts (objects, examples).

It should be noted that the existence of these two groups of recognition methods: those operating with features and those operating with objects, is deeply natural. From this point of view, none of these methods, taken separately from the other, makes it possible to form an adequate reflection of the subject area. Between these methods there is a relation of complementarity in the sense of N. Bohr, therefore, promising recognition systems should provide the implementation of both of these methods, and not just any one of them.

Thus, the classification of recognition methods proposed by D.A. Pospelov is based on the fundamental laws that underlie the human way of cognition in general, which puts it in a very special (privileged) position compared to other classifications, which, against this background, look more lightweight and artificial.

Intensional Methods

A distinctive feature of intensional methods is that they use different characteristics of features and their relationships as elements of operations in the construction and application of pattern recognition algorithms. Such elements can be individual values or intervals of feature values, average values and variances, feature relationship matrices, etc., on which actions are performed, expressed in an analytical or constructive form. At the same time, the objects in these methods are not considered as integral information units, but act as indicators for assessing the interaction and behavior of their attributes.

The group of intensional pattern recognition methods is extensive, and its division into subclasses is somewhat arbitrary:

– methods based on estimates of the distribution densities of feature values

– methods based on assumptions about the class of decision functions

– logical methods

– linguistic (structural) methods.

Methods based on estimates of the distribution densities of feature values. These pattern recognition methods are borrowed from the classical theory of statistical decisions, in which the objects of study are considered as realizations of a multidimensional random variable distributed in the feature space according to some law. They are based on the Bayesian decision-making scheme, which appeals to a priori probabilities of objects belonging to one or another recognizable class and conditional distribution densities of the feature vector values. These methods are reduced to determining the likelihood ratio in different areas of the multidimensional feature space.

The group of methods based on the estimation of the distribution densities of feature values is directly related to the methods of discriminant analysis. The Bayesian approach to decision making is one of the most developed of the so-called parametric methods in modern statistics, for which the analytical expression of the distribution law (in this case, the normal law) is considered known and only a small number of parameters (mean vectors and covariance matrices) need to be estimated.

This group also includes a method for calculating the likelihood ratio for independent features. This method, with the exception of the assumption of the independence of features (which is practically never fulfilled in reality), does not imply knowledge of the functional form of the distribution law. It can be attributed to non-parametric methods.
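The likelihood ratio for independent features can be sketched as follows. For binary features it is the product of per-feature probability ratios; the per-feature probabilities and the equal-priors threshold below are hypothetical:

```python
def likelihood_ratio(x, p_feat_given_c1, p_feat_given_c2):
    """Likelihood ratio for independent binary features:
    L(x) = prod_i P(xi | c1) / P(xi | c2)."""
    num = den = 1.0
    for xi, p1, p2 in zip(x, p_feat_given_c1, p_feat_given_c2):
        num *= p1 if xi else (1 - p1)
        den *= p2 if xi else (1 - p2)
    return num / den

# Hypothetical per-feature probabilities P(xi = 1 | class).
p1 = [0.9, 0.8]   # class 1
p2 = [0.2, 0.3]   # class 2
L = likelihood_ratio([1, 1], p1, p2)
# Decide class 1 when L exceeds the threshold P(c2)/P(c1); equal priors give 1.
print(L > 1.0)  # True
```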

Other non-parametric methods, used when the shape of the distribution density curve is unknown and no assumptions can be made about its nature at all, occupy a special position. These include the well-known method of multidimensional histograms, the “k-nearest neighbors” method, the Euclidean distance method, the method of potential functions, etc., the generalization of which is the method called “Parzen estimates”. These methods formally operate with objects as integral structures, but depending on the type of recognition task, they can act both in intensional and extensional hypostases.

Nonparametric methods analyze the relative numbers of objects that fall into the given multidimensional volumes and use various distance functions between the objects of the training sample and the recognized objects. For quantitative features, when their number is much less than the sample size, operations with objects play an intermediate role in estimating local distribution densities of conditional probabilities, and objects do not carry the semantic load of independent information units. At the same time, when the number of features is commensurate or greater than the number of objects under study, and the features are of a qualitative or dichotomous nature, then there can be no talk of any local estimates of the probability distribution densities. In this case, the objects in these non-parametric methods are considered as independent information units (holistic empirical facts) and these methods acquire the meaning of assessments of the similarity and difference of the objects under study.

Thus, the same technological operations of nonparametric methods, depending on the conditions of the problem, serve either as local estimates of the probability distribution densities of feature values or as estimates of the similarity and difference of objects.

In the context of the intensional representation of knowledge, the first aspect of nonparametric methods, their use as estimates of probability distribution densities, is considered here. Many authors note that nonparametric methods such as Parzen estimates work well in practice. The main difficulties in applying these methods are the need to store the entire training sample in order to calculate local probability density estimates, and the high sensitivity to a non-representative training sample.
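A Parzen (window) density estimate with a Gaussian kernel can be sketched as follows, together with its use for comparing class-conditional densities; the one-dimensional samples and the window width h are hypothetical:

```python
import math

def parzen_density(x, sample, h):
    """Parzen window estimate of the density at x: the average of Gaussian
    kernels of width h centered at the training points."""
    n = len(sample)
    return sum(math.exp(-((x - xi) / h) ** 2 / 2) / (h * math.sqrt(2 * math.pi))
               for xi in sample) / n

# Hypothetical training samples for two classes.
class_a = [0.0, 0.2, 0.4]
class_b = [3.0, 3.1, 2.9]

# Classify a new point by comparing the two density estimates.
x = 0.3
label = "a" if parzen_density(x, class_a, 0.5) > parzen_density(x, class_b, 0.5) else "b"
```

Note that the whole training sample appears in the sum, which is exactly the storage cost mentioned above.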

Methods based on assumptions about the class of decision functions. In this group of methods, the general form of the decision function is considered known and its quality functional is given. Based on this functional, the best approximation of the decision function is sought for the training sequence. The most common are representations of decision functions in the form of linear and generalized nonlinear polynomials. The quality functional of the decision rule is usually associated with the classification error.

The main advantage of methods based on assumptions about the class of decision functions is the clarity of the mathematical formulation of the recognition problem as a problem of finding an extremum. The solution to this problem is often achieved using some kind of gradient algorithms. The variety of methods of this group is explained by the wide range of used decision rule quality functionals and extremum search algorithms. A generalization of the considered algorithms, which include, in particular, Newton's algorithm, perceptron-type algorithms, etc., is the method of stochastic approximation. Unlike parametric recognition methods, the success of this group of methods does not depend so much on the mismatch of theoretical ideas about the laws of distribution of objects in the feature space with empirical reality. All operations are subordinated to one main goal - finding the extremum of the quality functional of the decision rule. At the same time, the results of the parametric and considered methods may be similar. As shown above, parametric methods for the case of normal distributions of objects in different classes with equal covariance matrices lead to linear decision functions. We also note that the algorithms for selecting informative features in linear diagnostic models can be interpreted as particular variants of gradient algorithms for searching for an extremum.

The possibilities of gradient algorithms for finding an extremum, especially in the group of linear decision rules, have been studied quite well. The convergence of these algorithms has been proved only for the case when the recognizable classes of objects are displayed in the feature space by compact geometric structures. However, the desire to achieve sufficient quality of the decision rule can often be satisfied with the help of algorithms that do not have a rigorous mathematical proof of the convergence of the solution to the global extremum.
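A perceptron-type error-correction algorithm of the kind mentioned above can be sketched as follows. Convergence is guaranteed only for linearly separable (compact) classes, and the toy data are hypothetical:

```python
def perceptron(samples, labels, epochs=100):
    """Perceptron error correction: adjust w until all samples are classified."""
    w = [0.0] * (len(samples[0]) + 1)          # weights plus a bias term
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):      # y is +1 or -1
            s = w[-1] + sum(wi * xi for wi, xi in zip(w, x))
            if y * s <= 0:                     # misclassified: correct w toward y
                for i, xi in enumerate(x):
                    w[i] += y * xi
                w[-1] += y
                errors += 1
        if errors == 0:
            break                              # converged: classes separated
    return w

# Two linearly separable (compact) classes in the plane.
X = [(0, 0), (1, 0), (3, 3), (4, 3)]
y = [-1, -1, 1, 1]
w = perceptron(X, y)
```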

These algorithms include a large group of heuristic programming procedures representing the direction of evolutionary modeling. Evolutionary modeling is a bionic method borrowed from nature. It is based on the use of known mechanisms of evolution in order to replace the process of meaningful modeling of a complex object with phenomenological modeling of its evolution.

A well-known representative of evolutionary modeling in pattern recognition is the group method of data handling (GMDH). GMDH is based on the principle of self-organization, and GMDH algorithms reproduce the scheme of mass selection. In GMDH algorithms, the members of a generalized polynomial, often called the Kolmogorov-Gabor polynomial, are synthesized and selected in a special way. This synthesis and selection is carried out with increasing complexity, and it is impossible to predict in advance what final form the generalized polynomial will take. First, simple pairwise combinations of the initial features are usually considered, from which equations of decision functions, as a rule not higher than second order, are composed. Each equation is analyzed as an independent decision function, and the values of the parameters of the composed equations are found in one way or another from the training sample. Then, from the resulting set of decision functions, a subset that is best in some sense is selected. The quality of individual decision functions is checked on a control (test) sample, which is sometimes called the principle of external addition. The selected partial decision functions are then treated as intermediate variables that serve as initial arguments for a similar synthesis of new decision functions, and so on. The process of such hierarchical synthesis continues until the extremum of the decision function quality criterion is reached, which in practice manifests itself as a deterioration of this quality when attempting to further increase the order of the polynomial members relative to the original features.

The self-organization principle underlying the GMDH is called heuristic self-organization, since the whole process is based on the introduction of external additions chosen heuristically. The result of the decision can significantly depend on these heuristics. The resulting diagnostic model depends on how the objects are divided into training and testing samples, how the recognition quality criterion is determined, how many variables are skipped in the next selection row, etc.

These features of GMDH algorithms are also characteristic of other approaches to evolutionary modeling. But we note here one more aspect of the methods under consideration: their substantive side. Using methods based on assumptions about the class of decision functions (evolutionary and gradient), it is possible to build diagnostic models of high complexity and obtain practically acceptable results. At the same time, the achievement of practical goals in this case is not accompanied by the extraction of new knowledge about the nature of the recognizable objects. The possibility of extracting this knowledge, in particular knowledge about the mechanisms of interaction of attributes (features), is fundamentally limited here by the given structure of such interaction, fixed in the chosen form of the decision functions. Therefore, the most that can be said after constructing a particular diagnostic model is to list the combinations of features, and the features themselves, that are included in the resulting model. But the meaning of the combinations, which reflects the nature and structure of the distributions of the objects under study, often remains undiscovered within the framework of this approach.

Logical methods. Logical methods of pattern recognition are based on the apparatus of logical algebra and make it possible to operate with information contained not only in individual features but also in combinations of feature values. In these methods, the values of any attribute are considered as elementary events.

In the most general form, logical methods can be characterized as a kind of search for logical patterns in the training sample and the formation of a certain system of logical decision rules (for example, in the form of conjunctions of elementary events), each of which has its own weight. The group of logical methods is diverse and includes methods of varying complexity and depth of analysis. For dichotomous (boolean) features, the so-called tree-like classifiers, the dead-end test method, the Kora algorithm, and others are popular. More complex methods are based on the formalization of J.S. Mill's inductive methods. The formalization is carried out by constructing a quasi-axiomatic theory and is based on many-sorted many-valued logic with quantifiers over variable-length tuples.

The Kora algorithm, like other logical methods of pattern recognition, is quite laborious, since a complete enumeration is necessary when selecting conjunctions. Therefore, when applying logical methods, high requirements are placed on the efficient organization of the computational process, and these methods work well with relatively small dimensions of the feature space and only on powerful computers.
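The search for logical patterns in the form of conjunctions of elementary events can be sketched as follows. This is a simplified illustration of the idea of enumerating conjunctions, not the actual Kora algorithm, and the boolean object descriptions are hypothetical:

```python
from itertools import combinations

def find_conjunctions(pos, neg, max_len=2):
    """Enumerate feature conjunctions (as index tuples) that hold for at least
    one object of the positive class and for no object of the negative class."""
    n = len(pos[0])
    rules = []
    for k in range(1, max_len + 1):
        for idx in combinations(range(n), k):
            holds = lambda obj: all(obj[i] for i in idx)
            if any(holds(o) for o in pos) and not any(holds(o) for o in neg):
                rules.append(idx)
    return rules

# Hypothetical boolean descriptions of objects from two classes.
pos = [(1, 1, 0), (1, 1, 1)]
neg = [(1, 0, 1), (0, 0, 1)]
print(find_conjunctions(pos, neg))  # [(1,), (0, 1), (1, 2)]
```

Even this toy version enumerates all feature subsets up to max_len, which illustrates why the full enumeration in such methods is laborious.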

Linguistic (syntactic or structural) methods. Linguistic methods of pattern recognition are based on the use of special grammars that generate languages with the help of which a set of properties of recognizable objects can be described. A grammar here refers to the rules for constructing objects from non-derivative (primitive) elements.

If the description of images is made with the help of non-derivative elements (sub-images) and their relationships, then a linguistic or syntactic approach is used to build automatic recognition systems based on the principle of commonality of properties. An image can be described using a hierarchical structure of sub-images similar to the syntactic structure of a language. This circumstance makes it possible to apply the theory of formal languages to problems of pattern recognition. It is assumed that the grammar of images contains finite sets of elements called variables, non-derivative elements and substitution rules. The nature of the substitution rules determines the type of grammar. Among the most studied grammars are regular, context-free, and immediate-constituent (phrase-structure) grammars. The key points of this approach are the choice of the non-derivative elements of the image, the combination of these elements and the relations connecting them into a grammar of images, and, finally, the implementation of the processes of analysis and recognition in the corresponding language. This approach is especially useful when working with images that either cannot be described by numerical measurements, or are so complex that their local features cannot be identified and one has to refer to the global properties of objects.
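A toy illustration of the syntactic approach: suppose a class of images is described by chains of two primitives a and b of the form a^n b^n, generated by the hypothetical grammar S -> a S b | a b. Recognition then reduces to checking membership in the generated language:

```python
def generates(chain):
    """Does the grammar S -> a S b | a b generate the given chain of primitives?"""
    if chain == "ab":
        return True                       # base rule S -> a b
    if len(chain) >= 4 and chain[0] == "a" and chain[-1] == "b":
        return generates(chain[1:-1])     # recursive rule S -> a S b
    return False

print(generates("aaabbb"))  # True
print(generates("aabbb"))   # False
```

Note that no finite set of numerical measurements is needed here: the class is defined entirely by the structure of the chain.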

For example, E.A. Butakov, V.I. Ostrovsky, and I.L. Fadeev propose the following structure for an image-processing system (Fig. 3) based on the linguistic approach, in which each functional block is a software (microprogram) module implementing the corresponding function.

Figure 3. Structural diagram of the recognizer

Attempts to apply the methods of mathematical linguistics to image analysis lead to a number of problems connected with mapping the two-dimensional structure of an image onto the one-dimensional strings of a formal language.

Extensional Methods

In the methods of this group, in contrast to the intensional direction, each object under study is given, to a greater or lesser extent, an independent diagnostic value. At their core these methods are close to the clinical approach, which treats people not as objects ranked by one indicator or another but as integral systems, each of which is individual and has a special diagnostic value. Such a careful attitude to the objects of study does not allow information about any individual object to be excluded or lost, as happens in the intensional direction, where objects are used only to detect and fix the patterns of behavior of their attributes.

The main operations in pattern recognition using the discussed methods are the operations of determining the similarity and difference of objects. Objects in the specified group of methods play the role of diagnostic precedents. At the same time, depending on the conditions of a particular task, the role of an individual precedent can vary within the widest limits: from the main and defining to very indirect participation in the recognition process. In turn, the conditions of the problem may require the participation of a different number of diagnostic precedents for a successful solution: from one in each recognizable class to the entire sample size, as well as different ways to calculate the measures of similarity and difference of objects. These requirements explain the further division of extensional methods into subclasses:

    prototype comparison method;

    k-nearest neighbor method;

    teams of decision rules.

Prototype comparison method. This is the simplest extensional recognition method. It is applied, for example, when the recognized classes form compact geometric groupings in the feature space. In that case the center of the class grouping (or the object closest to the center) is usually chosen as the prototype.

To classify an unknown object, the prototype closest to it is found, and the object belongs to the same class as this prototype. Obviously, no generalized class images are formed in this method.

Various types of distances can be used as a measure of proximity. Often for dichotomous features, the Hamming distance is used, which in this case is equal to the square of the Euclidean distance. In this case, the decision rule for classifying objects is equivalent to a linear decision function.

This fact deserves special mention: it clearly demonstrates the connection between the prototype-based and feature-based representations of data structure. For example, any traditional measurement scale that is a linear function of the values of dichotomous features can be regarded as a hypothetical diagnostic prototype. In turn, if analysis of the spatial structure of the recognized classes shows them to be geometrically compact, it suffices to replace each class with a single prototype, which is in fact equivalent to a linear diagnostic model.

In practice, of course, the situation is often different from the idealized example described. A researcher who intends to apply a recognition method based on comparison with the prototypes of diagnostic classes faces difficult problems. This is, first of all, the choice of a proximity measure (metric), which can significantly change the spatial configuration of the distribution of objects. And, secondly, an independent problem is the analysis of multidimensional structures of experimental data. Both of these problems are especially acute for the researcher under conditions of high dimension of the feature space, which is typical for real problems.
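To make the scheme concrete, here is a minimal Python sketch of the prototype comparison rule for dichotomous features. The prototypes and the feature vectors are invented for illustration:

```python
def hamming(a, b):
    """Hamming distance between two dichotomous (0/1) feature vectors;
    for such vectors it equals the squared Euclidean distance."""
    return sum(x != y for x, y in zip(a, b))

def classify_by_prototype(x, prototypes):
    """Assign x to the class whose prototype (class centre) is nearest."""
    return min(prototypes, key=lambda label: hamming(x, prototypes[label]))

# Hypothetical two-class example with binary features
prototypes = {"A": [1, 1, 0, 0], "B": [0, 0, 1, 1]}
print(classify_by_prototype([1, 0, 0, 0], prototypes))  # A
```

Note that no generalized class image is formed: the whole class is represented by its single prototype vector, and the choice of metric (`hamming` here) is exactly the difficult design decision discussed above.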

Method of k-nearest neighbors. The k-nearest neighbor method for solving discriminant analysis problems was first proposed back in 1952. It is as follows.

When classifying an unknown object, a given number (k) of other objects geometrically closest to it in the feature space (nearest neighbors) with already known belonging to recognizable classes is found. The decision to assign an unknown object to a particular diagnostic class is made by analyzing information about this known membership of its nearest neighbors, for example, using a simple vote count.
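A minimal sketch of this voting rule in Python. The training sample and the Euclidean metric are illustrative assumptions; in practice the choice of metric is a separate problem, as discussed below:

```python
from collections import Counter

def knn_classify(x, training, k=3):
    """k-nearest-neighbour vote: training is a list of
    (feature_vector, class_label) pairs; squared Euclidean metric assumed."""
    dist = lambda a, b: sum((p - q) ** 2 for p, q in zip(a, b))
    neighbours = sorted(training, key=lambda pair: dist(x, pair[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical training sample of labelled points
train = [([0, 0], "A"), ([0, 1], "A"), ([5, 5], "B"), ([5, 4], "B"), ([1, 0], "A")]
print(knn_classify([0.5, 0.5], train, k=3))  # A
```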

Initially, the k-nearest neighbor method was considered as a nonparametric method for estimating the likelihood ratio. Theoretical estimates of its effectiveness have been obtained in comparison with the optimal Bayesian classifier: it has been proved that the asymptotic error probability of the k-nearest neighbor method exceeds the error of the Bayes rule by no more than a factor of two.

As noted above, in real problems it is often necessary to operate with objects that are described by a large number of qualitative (dichotomous) features. At the same time, the dimension of the feature space is commensurate with or exceeds the volume of the sample under study. Under such conditions, it is convenient to interpret each object of the training sample as a separate linear classifier. Then this or that diagnostic class is represented not by one prototype, but by a set of linear classifiers. The combined interaction of linear classifiers results in a piecewise linear surface that separates the recognizable classes in the feature space. The type of dividing surface, consisting of pieces of hyperplanes, can be varied and depends on the relative position of the classified aggregates.

Another interpretation of the k-nearest neighbor classification mechanism can also be used. It is based on the idea of the existence of latent variables, either abstract or related to the original feature space by some transformation. If the pairwise distances between objects in the space of latent variables are the same as in the space of initial features, and the number of these variables is much smaller than the number of objects, then the k-nearest neighbors method can be interpreted from the standpoint of comparing nonparametric estimates of conditional probability densities. The concept of latent variables presented here is close to the concept of true dimensionality and to other representations used in various dimensionality reduction methods.

When using the k-nearest neighbors method for pattern recognition, the researcher has to solve the difficult problem of choosing a metric to determine the proximity of the diagnosed objects. In high-dimensional feature spaces this problem becomes especially acute because of the computational complexity of the method, which is significant even for high-performance computers. Therefore, just as in the prototype comparison method, one must analyze the multidimensional structure of the experimental data in order to minimize the number of objects representing the diagnostic classes.

Algorithms for calculating estimates (voting). Estimate calculation algorithms (known by the Russian abbreviation ABO) compute similarity scores characterizing the "proximity" of the recognized object to reference objects over a system of feature ensembles, that is, a system of subsets of the given feature set.

Unlike all the methods considered earlier, estimate calculation algorithms operate with object descriptions in a fundamentally new way: for these algorithms, objects exist simultaneously in very different subspaces of the feature space. The ABO class carries the idea of using features to its logical conclusion: since it is not always known which combinations of features are the most informative, the degree of similarity of objects is computed by comparing all possible (or certain selected) combinations of the features entering the object descriptions.
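The following Python sketch illustrates the idea on an invented example, scoring the similarity of an object to reference objects over a support system consisting of all feature pairs. Real ABO systems use more elaborate ensembles, thresholds, and weighted voting:

```python
from itertools import combinations

def abo_score(x, reference, ensembles):
    """Estimate (vote count): the number of feature ensembles on which
    x agrees with the reference object.  Ensembles are index subsets of
    the feature set; here agreement means exact coincidence on the subset."""
    return sum(all(x[i] == reference[i] for i in e) for e in ensembles)

# Hypothetical support system: all feature pairs of a 4-feature description
ensembles = list(combinations(range(4), 2))
ref_A, ref_B = [1, 1, 0, 0], [0, 0, 1, 1]
x = [1, 1, 0, 1]
scores = {"A": abo_score(x, ref_A, ensembles), "B": abo_score(x, ref_B, ensembles)}
print(max(scores, key=scores.get))  # A
```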

Teams of decision rules. This approach uses a two-level recognition scheme. At the first level, partial recognition algorithms operate; their results are combined at the second level in a synthesis block. The most common combination methods are based on identifying the areas of competence of each particular algorithm. The simplest way of finding areas of competence is to partition the feature space a priori on the basis of the professional considerations of a particular science (for example, stratifying the sample by some feature); a separate recognition algorithm is then built for each selected area. Another method uses formal analysis to determine local areas of the feature space as neighborhoods of recognizable objects for which the success of some particular recognition algorithm has been demonstrated.

The most general approach to constructing the synthesis block treats the resulting indicators of the partial algorithms as initial features for building a new generalized decision rule, for which all the above methods of the intensional and extensional directions in pattern recognition can be used. Logical algorithms of the Kora type and estimate calculation algorithms (ABO) are effective for creating teams of decision rules; they form the basis of the so-called algebraic approach, which provides a study and constructive description of recognition algorithms within which all existing types of algorithms fit.
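The two-level scheme can be sketched as follows; the competence regions and the partial algorithms are purely illustrative:

```python
def committee_classify(x, experts):
    """Two-level scheme: each (region_test, algorithm) pair is a partial
    recogniser competent only in its own region of feature space; the
    synthesis block polls the competent experts and takes a majority vote."""
    votes = [algo(x) for in_region, algo in experts if in_region(x)]
    if not votes:
        return None  # no competent expert: reject
    return max(set(votes), key=votes.count)

# Hypothetical one-dimensional example: one expert per half-line
experts = [
    (lambda x: x < 5,  lambda x: "A" if x < 2 else "B"),
    (lambda x: x >= 5, lambda x: "C"),
]
print(committee_classify(1, experts))  # A
print(committee_classify(7, experts))  # C
```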

Neural network methods

Neural network methods are based on the use of various types of neural networks (NNs). The main areas in which NNs are applied to pattern and image recognition are:

    application for extracting key characteristics or features of given images,

    classification of the images themselves or the characteristics already extracted from them (in the first case, the extraction of key characteristics occurs implicitly within the network),

    solution of optimization problems.

Multilayer neural networks. A multilayer neural network (MNN) consists of sequentially connected layers in which each neuron receives inputs from all neurons of the previous layer and sends its output to the neurons of the next layer.

The simplest application of a single-layer NN (so-called auto-associative memory) is to train the network to reconstruct the images fed to it. Feeding a test image to the input and evaluating the quality of the reconstruction shows how well the network recognized the input image. A positive property of this method is that the network can restore distorted and noisy images, but it is not suitable for more demanding purposes.

MNNs are also used for direct classification of images: the input is either the image itself in some form or a set of previously extracted key characteristics of the image, and at the output the neuron with maximum activity indicates membership in the recognized class (Fig. 4). If this activity is below a certain threshold, the submitted image is considered to belong to none of the known classes. The learning process establishes the correspondence between input images and class membership; this is called supervised learning. This approach is good for access control tasks for a small group of people, since the network directly compares the images themselves, but as the number of classes grows, the training and operation time of the network increases exponentially. Therefore, tasks such as searching for a similar person in a large database require extracting a compact set of key features on which the search is based.
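The output-layer decision rule just described (maximum activity with a rejection threshold) can be sketched as follows. The activation values and class names are invented, and a softmax normalization of the output activities is assumed:

```python
import math

def classify_with_rejection(activations, class_names, threshold=0.5):
    """Output-layer decision rule for an MNN classifier: the most active
    neuron names the class, but if its (softmax-normalized) activity is
    below the threshold the image is rejected as belonging to no known class."""
    exps = [math.exp(a) for a in activations]
    probs = [e / sum(exps) for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best] if probs[best] >= threshold else None

print(classify_with_rejection([4.0, 0.5, 0.1], ["face", "car", "tree"]))  # face
print(classify_with_rejection([1.0, 1.1, 0.9], ["face", "car", "tree"]))  # None
```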

A classification approach using the frequency characteristics of the entire image, with a single-layer NN based on multivalued neurons, has also been described in the literature.

The use of NNs for image classification in which the network input receives the results of decomposing the image by the method of principal components has likewise been described.

In the classical MNN the interlayer connections are fully connected, and the image is represented as a one-dimensional vector even though it is two-dimensional. The convolutional neural network architecture aims to overcome these shortcomings. It uses local receptive fields (providing local two-dimensional connectivity of neurons), shared weights (allowing a feature to be detected anywhere in the image), and a hierarchical organization with spatial subsampling. The convolutional NN (CNN) thus provides partial robustness to changes of scale, shifts, rotations, and distortions.

MNNs are also used to detect objects of a given type. Besides the fact that any trained MNN can to some extent determine whether images belong to "its own" classes, it can be specially trained to reliably detect particular classes; in this case the output classes are "belongs" and "does not belong" to the given image type. One neural network detector was used to find a face in the input image: the image was scanned with a 20x20-pixel window whose contents were fed to the network, which decided whether the given region belongs to the class of faces. Training used both positive examples (various images of faces) and negative examples (images that are not faces). To improve the reliability of detection, a team of NNs trained from different initial weights was used; as a result the networks erred in different ways, and the final decision was taken by a vote of the whole team.

Figure 5. Principal components (eigenfaces) and decomposition of the image into principal components

NNs are also used to extract key characteristics of an image, which are then used for subsequent classification. A neural network implementation of the principal component analysis method has been described in the literature. The essence of principal component analysis is to obtain maximally decorrelated coefficients characterizing the input patterns. These coefficients, called principal components, are used for statistical image compression, in which a small number of coefficients represents the whole image. An NN with one hidden layer of N neurons (N much smaller than the image dimension), trained by error backpropagation to restore the input image at the output, forms the coefficients of the first N principal components at the outputs of its hidden neurons; these coefficients are then used for comparison. Typically 10 to 200 principal components are used; as the component number grows, its representativeness drops sharply, so there is no point in using the higher-numbered components. With nonlinear activation functions of the neural elements, a nonlinear decomposition into principal components is possible; the nonlinearity allows variations in the input data to be reflected more accurately. Applying principal component analysis to the decomposition of face images yields principal components called eigenfaces, which have a useful property: some components mainly reflect such essential characteristics of a face as gender, race, and emotions. When reconstructed, the components have a face-like appearance: the first reflect the most general shape of the face, the later ones various fine differences between faces (Fig. 5). This method is well suited to searching for similar face images in large databases. The possibility of further reducing the dimensionality of the principal components with an NN has also been shown.
By evaluating the quality of the reconstruction of the input image, one can very accurately determine whether it belongs to the class of faces.
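A minimal sketch of the principal component idea, using power iteration instead of a neural network. The two-dimensional "images" are invented; a real system would operate on pixel vectors and keep N components:

```python
def principal_component(data, iters=200):
    """First principal component of mean-centred data via power
    iteration on the covariance matrix (pure-Python sketch)."""
    n, d = len(data), len(data[0])
    mean = [sum(col) / n for col in zip(*data)]
    centred = [[x - m for x, m in zip(row, mean)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centred) / n for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return mean, v

# Hypothetical "images": points almost on the line y = 2x, so a single
# component captures nearly all the variation -- the idea behind
# compressing a face image into a few principal components.
data = [[1, 2.1], [2, 3.9], [3, 6.2], [4, 8.0], [5, 9.9]]
mean, pc = principal_component(data)
coeff = sum((data[0][i] - mean[i]) * pc[i] for i in range(2))  # compressed code
recon = [mean[i] + coeff * pc[i] for i in range(2)]            # reconstruction
print([round(x, 1) for x in recon])
```

Here a point is compressed to a single coefficient along the first principal component and then reconstructed almost exactly; a large reconstruction error would signal that the input does not belong to the class the components were built from.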

Neural networks of high order. High-order neural networks (HNNs) differ from MNNs in that they have only one layer, but the neuron inputs also receive high-order terms: products of two or more components of the input vector. Such networks can also form complex separating surfaces.

Hopfield neural networks. The Hopfield NN is single-layer and fully connected (with no connections of neurons to themselves); its outputs are fed back to its inputs. Unlike the MNN, the Hopfield network is relaxational: set to an initial state, it evolves until it reaches a stable state, which is its output value. Stochastic modifications of the Hopfield network are used to search for a global minimum in optimization problems.

Using the Hopfield network as associative memory allows the images the network was trained on to be restored exactly when a distorted image is fed to the input. The network "recalls" the closest image (in the sense of a local energy minimum) and thereby recognizes it. Such functioning can also be thought of as a sequential application of the auto-associative memory described above, except that the Hopfield network restores the image exactly. Various methods are used to avoid spurious minima and to increase network capacity.
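A minimal sketch of such an associative memory, with Hebbian training and synchronous relaxation; the stored "images" are invented six-component vectors of +1/-1:

```python
def train_hopfield(patterns):
    """Hebbian weights for a Hopfield net storing +/-1 patterns;
    zero diagonal (no self-connections)."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / len(patterns)
    return w

def recall(w, state, steps=10):
    """Relaxation: repeatedly update all neurons until the state settles
    into a stable configuration, i.e. the nearest stored (energy-minimum) image."""
    n = len(state)
    s = list(state)
    for _ in range(steps):
        s = [1 if sum(w[i][j] * s[j] for j in range(n)) >= 0 else -1
             for i in range(n)]
    return s

stored = [[1, 1, 1, -1, -1, -1], [-1, -1, -1, 1, 1, 1]]
w = train_hopfield(stored)
noisy = [1, -1, 1, -1, -1, -1]          # distorted copy of the first image
print(recall(w, noisy) == stored[0])    # True
```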

Kohonen self-organizing neural networks. Kohonen self-organizing networks (self-organizing maps) provide a topological ordering of the input image space. They make possible a topologically continuous mapping of the n-dimensional input space into an m-dimensional output space with m < n. The input image is projected onto some position in the network, encoded as the position of the activated node. Unlike most other classification and clustering methods, topological ordering of classes preserves the similarity of input patterns at the output, which is especially useful when classifying data with a large number of classes.
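A toy one-dimensional Kohonen map can be sketched as follows; the training data, grid size, and learning schedule are illustrative assumptions:

```python
import random

def train_som(data, n_nodes=4, epochs=100, lr=0.5):
    """Tiny 1-D Kohonen map: each node has a weight vector; the winner
    and its grid neighbours are pulled toward each input, so nearby
    nodes end up coding similar inputs (topological ordering)."""
    random.seed(0)
    dim = len(data[0])
    nodes = [[random.random() for _ in range(dim)] for _ in range(n_nodes)]
    for t in range(epochs):
        rate = lr * (1 - t / epochs)          # decaying learning rate
        for x in data:
            win = min(range(n_nodes),
                      key=lambda k: sum((nodes[k][i] - x[i]) ** 2 for i in range(dim)))
            for k in range(n_nodes):
                if abs(k - win) <= 1:          # winner plus grid neighbours
                    for i in range(dim):
                        nodes[k][i] += rate * (x[i] - nodes[k][i])
    return nodes

def project(nodes, x):
    """Position of the activated (winning) node encodes the input."""
    return min(range(len(nodes)),
               key=lambda k: sum((nodes[k][i] - x[i]) ** 2 for i in range(len(x))))

# Two invented clusters of inputs; after training they activate
# different nodes of the map.
data = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
nodes = train_som(data)
print(project(nodes, [0.05, 0.0]) != project(nodes, [0.95, 1.0]))
```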

Cognitron. The cognitron is similar in architecture to the visual cortex: it has a hierarchical multilayer organization in which neurons between layers are connected only locally, and it is trained by competitive learning (without a teacher). Each layer implements a different level of generalization: the input layer is sensitive to simple patterns, such as lines and their orientation in particular areas of the visual field, while the responses of subsequent layers are more complex, abstract, and independent of pattern position. Similar functions are implemented in the cognitron by modeling the organization of the visual cortex.

The neocognitron is a further development of the cognitron idea; it reflects the structure of the visual system more accurately and allows images to be recognized regardless of transformations, rotations, distortions, and changes of scale.

The cognitron is a powerful image recognition tool, but it requires high computational costs that are currently unattainable.

The considered neural network methods provide fast and reliable image recognition, but when using these methods, problems arise in the recognition of three-dimensional objects. However, this approach has many advantages.

Conclusion

Currently, there are a fairly large number of automatic pattern recognition systems for various applied problems.

Pattern recognition by formal methods as a fundamental scientific direction is inexhaustible.

Mathematical methods of image processing have a wide variety of applications: science, technology, medicine, social sphere. In the future, the role of pattern recognition in human life will increase even more.

Neural network methods provide fast and reliable image recognition. This approach has many advantages and is one of the most promising.

Literature

    Brilyuk D.V., Starovoitov V.V. Neural network methods of image recognition // /

    Kuzin L.T. Fundamentals of Cybernetics: Fundamentals of Cybernetic Models. Vol. 2. Moscow: Energiya, 1979. 584 p.

    Peregudov F.I., Tarasenko F.P. Introduction to Systems Analysis: Textbook. Moscow: Vysshaya Shkola, 1997. 389 p.

    Temnikov F.E., Afonin V.A., Dmitriev V.I. Theoretical Foundations of Information Technology. Moscow: Energiya, 1979. 511 p.

    Tu J., Gonzalez R. Principles of Pattern Recognition. Trans. from English. Moscow: Mir, 1978. 410 p.

    Winston P. Artificial Intelligence. Trans. from English. Moscow: Mir, 1980. 520 p.

    Fu K. Structural Methods in Pattern Recognition. Trans. from English. Moscow: Mir, 1977. 320 p.

    Tsypkin Ya.Z. Fundamentals of the Information Theory of Identification. Moscow: Nauka, 1984. 520 p.

    Pospelov G.S. Artificial Intelligence as the Basis of New Information Technology. Moscow: Nauka, 1988. 280 p.

    Lifshits Yu. Statistical methods of pattern recognition ///modern/07modernnote.pdf

    Bohr N. Atomic Physics and Human Knowledge. Trans. from English. Moscow: Mir, 1961. 151 p.

    Butakov E.A., Ostrovsky V.I., Fadeev I.L. Image Processing on a Computer. 1987. 236 p.

    Duda R., Hart P. Pattern Recognition and Scene Analysis. Trans. from English. Moscow: Mir, 1978. 510 p.

    Duke V.A. Computer Psychodiagnostics. St. Petersburg: Bratstvo, 1994. 365 p.

    Aizenberg I.N., Aizenberg N.N., Krivosheev G.A. Multi-valued and universal binary neurons: learning algorithms, applications to image processing and recognition // Lecture Notes in Artificial Intelligence. Machine Learning and Data Mining in Pattern Recognition, 1999, pp. 21-35.

    Ranganath S., Arun K. Face recognition using transform features and neural networks // Pattern Recognition, 1997, Vol. 30, pp. 1615-1622.

    Golovko V.A. Neurointelligence: Theory and Applications. Book 1. Organization and Training of Neural Networks with Feedforward and Feedback Connections. Brest: BPI, 1999. 260 p.

    Vetter T., Poggio T. Linear object classes and image synthesis from a single example image // IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, Vol. 19, pp. 733-742.

    Golovko V.A. Neurointelligence: Theory and Applications. Book 2. Self-Organization, Fault Tolerance and the Use of Neural Networks. Brest: BPI, 1999. 228 p.

    Lawrence S., Giles C.L., Tsoi A.C., Back A.D. Face recognition: a convolutional neural network approach // IEEE Transactions on Neural Networks, Special Issue on Neural Networks and Pattern Recognition, pp. 1-24.

    Wasserman F. Neurocomputer Technology: Theory and Practice. 1992. 184 p.

    Rowley H.A., Baluja S., Kanade T. Neural network-based face detection // IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, Vol. 20, pp. 23-37.

    Valentin D., Abdi H., O'Toole A.J., Cottrell G.W. Connectionist models of face processing: a survey // Pattern Recognition, 1994, Vol. 27, pp. 1209-1230.


Automatic pattern recognition methods and their implementation in optical character recognition (OCR) systems constitute one of the most advanced artificial intelligence technologies. Russian scientists occupy leading positions in the world in the development of this technology.

An OCR system is a system for automatic image recognition that uses special programs to recognize images of characters of printed or handwritten text (entered into a computer, for example, through a scanner) and convert them into a format suitable for processing by word processors, text editors, and so on.

The abbreviation OCR is sometimes deciphered as Optical Character Reader - a device for optical character recognition or automatic text reading. Currently, such devices in industrial use process up to 100,000 documents per day.

Industrial use involves the input of good to medium quality documents - this is the processing of census forms, tax returns, etc.

We list the features of the subject area that are significant from the point of view of OCR systems:

  • font and size variety of characters;
  • distortions in the images of symbols (breaks in the images of symbols);
  • distortions during scanning;
  • foreign inclusions in images;
  • combination of text fragments in different languages;
  • a wide variety of character classes that can only be recognized with additional contextual information.

Automatic reading of printed and handwritten texts is a special case of automatic visual perception of complex images. Numerous studies have shown that in order to fully solve this problem, intellectual recognition, i.e., "recognition with understanding," is necessary.

There are three principles on which all OCR systems are based.

  • 1. The principle of the integrity of the image. In the object under study there are always significant parts between which there are relationships. The results of local operations with parts of the image are interpreted only jointly in the process of interpreting integral fragments and the entire image as a whole.
  • 2. The principle of purposefulness. Recognition is a purposeful process of generating and testing hypotheses (finding out what is expected of an object).
  • 3. The principle of adaptability. The recognition system must be capable of self-learning.

Leading Russian OCR systems: FineReader; FineReader Manuscript; FormReader; CuneiForm (Cognitive Technologies); Cognitive Forms (Cognitive Technologies).

The FineReader system is produced by ABBYY, which was founded in 1989. ABBYY develops in two directions: machine vision and applied linguistics. The strategic direction of scientific research and development is the natural language aspect of technologies in the field of machine vision, artificial intelligence and applied linguistics.

CuneiForm GOLD for Windows is the world's first self-learning intelligent OCR system; it uses adaptive text recognition technology and supports many languages. For each language a dictionary is supplied for contextual checking, which improves the quality of the recognition results. It recognizes any printed or typewritten typefaces and fonts produced by printers, with the exception of decorative and handwritten ones, as well as very low-quality texts.

Characteristics of pattern recognition systems. Among OCR technologies, special technologies for solving certain classes of automatic pattern recognition problems are of great importance:

  • search for people by photos;
  • search for mineral deposits and weather forecasting based on aerial photography and satellite images in various ranges of light radiation;
  • compiling geographic maps based on the initial information used in the previous task;
  • analysis of fingerprints and drawings of the iris in forensics, security and medical systems.

At the stage of preparation and processing of information, especially when computerizing an enterprise, automating accounting, the task arises of entering a large amount of textual and graphic information into a PC. The main devices for entering graphic information are: a scanner, a fax modem, and less often a digital camera. In addition, using optical text recognition programs, you can also enter (digitize) text information into a computer. Modern software and hardware systems make it possible to automate the input of large amounts of information into a computer, using, for example, a network scanner and parallel text recognition on several computers simultaneously.

Most OCR programs work with a bitmap received through a fax modem, scanner, digital camera, or other device. At the first stage, the OCR system must break the page into blocks of text, based on the left and right alignment and the presence of multiple columns. The recognized block is then split into lines. Despite its apparent simplicity, this is not an obvious task, since in practice distortion of the page image or of its fragments is inevitable. Even a slight skew causes the left edge of one line to drop below the right edge of the next, especially at small line spacing, so the problem arises of determining the line to which a given fragment of the image belongs.

The lines are then broken into contiguous regions of the image corresponding to individual characters; the recognition algorithm makes hypotheses about the correspondence of these regions to characters and then selects a character for each, as a result of which the page is restored as text, usually in a specified format. OCR systems can achieve a recognition accuracy of over 99.9% on clean images composed of ordinary fonts. At first glance this accuracy seems perfect, but the error rate is still depressing: at roughly 1500 characters per page, even a 99.9% success rate means one or two errors per page. In such cases a dictionary check is used: if a certain word is not in the system dictionary, the system tries to find a similar one according to special rules. This still does not correct 100% of errors, so human verification of the results is required.
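The dictionary check mentioned above can be sketched as follows; the dictionary and the distance threshold are illustrative assumptions:

```python
def edit_distance(a, b):
    """Levenshtein distance between two words, via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def dictionary_correct(word, dictionary, max_dist=1):
    """Post-recognition check: if the recognised word is not in the
    dictionary, substitute the closest dictionary word when it is close
    enough; otherwise return None to flag the word for human review."""
    if word in dictionary:
        return word
    best = min(dictionary, key=lambda w: edit_distance(word, w))
    return best if edit_distance(word, best) <= max_dist else None

dictionary = {"recognition", "pattern", "image"}
print(dictionary_correct("rec0gnition", dictionary))  # recognition
print(dictionary_correct("xyz", dictionary))          # None
```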

Texts encountered in real life are usually far from perfect, and the percentage of recognition errors for "impure" texts is often unacceptably high. Dirty images are the most obvious problem, because even small smudges can obscure the defining parts of a character or transform one character into another. Inaccurate scanning, associated with the "human factor", is also a problem: the operator sitting at the scanner simply cannot smooth every scanned page and align it exactly with the edges of the scanner. If the document has been photocopied, breaks and merging of characters often occur. Any of these effects can cause the system to err, because some OCR systems assume that a contiguous region of the image must be a single character. An out-of-alignment or skewed page creates slightly distorted character images that can confuse the OCR system.

The OCR system software usually works with a large bitmap of the page received from the scanner. An image of an A4 sheet scanned at a standard resolution occupies about 1 MB of memory.

The main purpose of OCR systems is to analyze raster information (a scanned character) and assign the corresponding character to the image fragment. After recognition is complete, OCR systems must be able to preserve the formatting of the source documents: assign paragraph attributes in the right places and preserve tables, graphics, and so on. Modern recognition programs support all common text, graphic, and spreadsheet formats, as well as HTML and PDF.

Working with OCR systems should not, as a rule, cause any particular difficulty. Most such systems have a simple automatic "scan and recognize" (Scan & Read) mode, and they also support recognition of images from files. However, to get the best possible results from a given system, it is desirable (and often necessary) to tune it manually to a specific type of text, letterhead layout, and paper quality.

When working with an OCR system, it is very important to choose the recognition language and the type of material to be recognized (typewriter, fax, dot-matrix printer, newspaper, etc.); an intuitive user interface also matters. When recognizing texts that mix several languages, recognition efficiency depends on the ability of the OCR system to form language groups; some systems already ship with combinations of the most commonly used languages, such as Russian and English.

At the moment there are a huge number of programs that include text recognition among their capabilities. The leader in this area is the FineReader system. The latest version of the program (6.0) also provides tools for developing new systems based on FineReader 6.0 technology. The FineReader 6.0 family includes FineReader 6.0 Professional, FineReader 6.0 Corporate Edition, FineReader Scripting Edition 6.0, and FineReader Engine 6.0. In addition to supporting a huge number of output formats, including PDF, FineReader 6.0 can recognize text directly from PDF files. The new Intelligent Background Filtering technology filters out the document's texture and background image noise: a gray or colored background is sometimes used to highlight text in a document. This does not prevent a person from reading, but conventional recognition algorithms have serious difficulty with letters placed on top of such a background. FineReader detects zones containing such text and separates the text from the document background by finding dots smaller than a certain size and removing them. The contours of the letters are preserved, so background dots close to those contours do not introduce interference that could degrade recognition quality.
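The small-dot removal described above can be sketched as a connected-component size filter (a minimal illustration of the general technique, not ABBYY's actual implementation; the image format and threshold are assumptions):

```python
# Remove connected groups of black pixels smaller than a size threshold,
# keeping larger components (letter strokes) intact.
from collections import deque

def despeckle(image, min_size=3):
    # image: list of lists, 1 = black pixel, 0 = white background.
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            if image[y][x] and not seen[y][x]:
                # Collect one 4-connected component with BFS.
                comp, queue = [], deque([(y, x)])
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    comp.append((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and image[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(comp) < min_size:          # a speck, not a stroke
                    for cy, cx in comp:
                        out[cy][cx] = 0
    return out

page = [[1, 0, 0, 1],
        [1, 0, 0, 0],
        [1, 0, 1, 0]]   # left column = a stroke; isolated dots = noise
print(despeckle(page))
```

The three-pixel vertical stroke survives, while the two isolated dots fall below the threshold and are erased, which is exactly the trade-off such filtering must make near letter contours.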

Using the capabilities of modern layout programs, designers often create objects of complex shape, such as wrapping multi-column text around a non-rectangular image. FineReader 6.0 supports the recognition of such objects and their saving in MS Word files. Now complex layout documents will be accurately reproduced in this text editor. Even tables are recognized with maximum accuracy, while maintaining all the possibilities for editing.

ABBYY FormReader is one of ABBYY's recognition programs, based on the ABBYY FineReader Engine. It is designed to recognize and process forms that may be filled in by hand. ABBYY FormReader handles forms with a fixed layout just as well as forms whose structure can change; the new ABBYY FlexiForm technology is used for recognition.

Leading software manufacturers have licensed this Russian technology for use with their products. The popular software packages Corel Draw (Corel Corporation), FaxLine/OCR & Business Card Wizard (Inzer Corporation), and many others have the CuneiForm OCR library built in. CuneiForm became the first OCR system in Russia to receive the MS Windows Compatible logo.

Readiris Pro 7 is a professional text-recognition program. According to its manufacturers, this OCR system surpasses its analogues in the accuracy of converting ordinary (everyday) printed documents, such as letters, faxes, magazine articles, and newspaper clippings, into editable objects (including PDF files). The main advantages of the program are: the ability to recognize, with reasonable accuracy, images compressed "to the maximum" (with maximum loss of quality) in JPEG format, support for digital cameras, automatic detection of page orientation, and support for up to 92 languages (including Russian).

OmniPage 11 is a ScanSoft product. A limited version of this program (OmniPage 11 Limited Edition, OmniPage Lite) is usually bundled with new scanners (in Europe and the US). The developers claim that their program recognizes printed documents with almost 100% accuracy, restoring their formatting, including columns, tables, hyphenation (including hyphenation of parts of words), headings, chapter titles, captions, page numbers, footnotes, paragraphs, numbered lists, first-line indents, graphs, and pictures. It can save to Microsoft Office, PDF, and 20 other formats, recognize text from PDF files, and edit in that format. An artificial-intelligence system automatically detects and corrects errors after the first manual correction. A specially developed software module, "Despeckle", allows recognition of documents of degraded quality (faxes, copies, copies of copies, etc.). Further advantages of the program are the ability to recognize colored text and to make corrections by voice. A version of OmniPage also exists for Macintosh computers.

  • See: Bashmakov A. I., Bashmakov I. A. Intelligent information technologies.