01.02.2022
Standard deviation of the mean. Root-mean-square (standard) deviation
Values obtained from experiments inevitably contain errors arising from a wide variety of causes. Among them, systematic and random errors should be distinguished. Systematic errors are caused by factors that act in a definite way, and they can always be eliminated or accounted for quite accurately. Random errors are caused by a very large number of individual causes that cannot be accurately accounted for and that act differently in each individual measurement. These errors cannot be completely excluded; they can only be accounted for on average, which requires knowing the laws that govern random errors.
We will denote the measured quantity by A, and the random error in the measurement by x. Since the error x can take on any value, it is a continuous random variable, which is fully characterized by its distribution law.
The simplest law, and the one that reflects reality most accurately in the vast majority of cases, is the so-called normal error distribution law:
$$\varphi(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}}$$
This distribution law can be derived from various theoretical premises, in particular from the requirement that, for an unknown quantity whose values are obtained by direct measurements of equal accuracy, the most probable value is the arithmetic mean of those values. The quantity σ² is called the variance (dispersion) of this normal law.
Average
Determination of the variance from experimental data. If n values a_i of a quantity A are obtained by direct measurements of equal accuracy, and the errors of A follow the normal distribution law, then the most probable value of A is the arithmetic mean:
$$\bar{a} = \frac{1}{n}\sum_{i=1}^{n} a_i$$
ā - arithmetic mean,
a_i - measured value at the i-th step.
The deviation of each observed value a_i of the quantity A from the arithmetic mean is a_i − ā.
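To determine the variance of the normal error distribution law in this case, use the formula:
$$\sigma^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(a_i - \bar{a}\right)^2$$
σ² - variance (dispersion),
ā - arithmetic mean,
n - number of parameter measurements,
a_i - measured value at the i-th step.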
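The mean and variance formulas above can be sketched in Python as follows. This is a minimal illustration, not part of the source: the function name `mean_and_variance` and the sample readings are hypothetical.

```python
def mean_and_variance(values):
    """Return the arithmetic mean and the variance estimate
    sigma^2 = sum((a_i - mean)^2) / (n - 1) for equally accurate measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two measurements are needed")
    mean = sum(values) / n
    # Squared deviations a_i - mean, summed over all measurements
    squared_deviations = sum((a - mean) ** 2 for a in values)
    variance = squared_deviations / (n - 1)
    return mean, variance


# Hypothetical repeated readings of the same quantity A
readings = [10.1, 9.9, 10.0, 10.2, 9.8]
m, var = mean_and_variance(readings)
print(f"mean = {m:.3f}, variance = {var:.4f}")
```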
Standard deviation
The standard deviation shows the absolute deviation of the measured values from the arithmetic mean. In accordance with the formula for the accuracy measure of a linear combination, the mean square error of the arithmetic mean is determined by the formula:
$$\sigma_{\bar{a}} = \sqrt{\frac{\sum_{i=1}^{n}\left(a_i - \bar{a}\right)^2}{n(n-1)}}$$
where
ā - arithmetic mean,
n - number of parameter measurements,
a_i - measured value at the i-th step.
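A minimal sketch of the mean square error of the arithmetic mean, assuming it is computed as the square root of Σ(a_i − ā)² / (n(n − 1)). That is the same as the n − 1 standard deviation divided by √n, which the code checks against Python's `statistics.stdev`. The helper name is hypothetical.

```python
import math
import statistics


def std_error_of_mean(values):
    """Mean square error of the arithmetic mean:
    sqrt(sum((a_i - mean)^2) / (n * (n - 1)))."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two measurements are needed")
    mean = sum(values) / n
    ss = sum((a - mean) ** 2 for a in values)
    return math.sqrt(ss / (n * (n - 1)))


readings = [10.1, 9.9, 10.0, 10.2, 9.8]  # hypothetical measurements
se = std_error_of_mean(readings)
# Equivalent form: sample standard deviation divided by sqrt(n)
alt = statistics.stdev(readings) / math.sqrt(len(readings))
print(se, alt)
```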
The coefficient of variation
The coefficient of variation characterizes the relative deviation of the measured values from the arithmetic mean:
$$V = \frac{\sigma}{\bar{a}} \cdot 100\%$$
where
V - coefficient of variation,
σ - standard deviation,
ā - arithmetic mean.
The higher the coefficient of variation, the greater the relative scatter and the lower the uniformity of the studied values. If the coefficient of variation is less than 10%, the variability of the variation series is considered insignificant; from 10% to 20%, average; more than 20% and less than 33%, significant. If the coefficient of variation exceeds 33%, this indicates that the data are heterogeneous and that the largest and smallest values need to be excluded.
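The coefficient of variation and the thresholds above can be sketched as follows. Assumptions not in the source: σ is the sample (n − 1) standard deviation, the mean is positive, a value of exactly 33% counts as "significant", and the function names are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """V = sigma / mean * 100 %, with sigma taken as the sample (n - 1) standard deviation."""
    mean = statistics.fmean(values)
    if mean <= 0:
        raise ValueError("this sketch assumes a positive mean")
    return statistics.stdev(values) / mean * 100


def variability_level(v):
    """Classify V (in %) using the 10 %, 20 % and 33 % thresholds quoted in the text."""
    if v < 10:
        return "insignificant"
    if v <= 20:
        return "average"
    if v <= 33:  # exactly 33 % is treated as "significant" here (an assumption)
        return "significant"
    return "heterogeneous: consider excluding extreme values"


readings = [10.1, 9.9, 10.0, 10.2, 9.8]  # hypothetical measurements
v = coefficient_of_variation(readings)
print(f"V = {v:.2f} % -> {variability_level(v)}")
```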
Average linear deviation
One of the indicators of the range and intensity of variation is the average linear deviation (the mean of the absolute deviations) from the arithmetic mean. The average linear deviation is calculated by the formula:
$$\bar{d} = \frac{1}{n}\sum_{i=1}^{n}\left|a_i - \bar{a}\right|$$
where
d̄ - average linear deviation,
ā - arithmetic mean,
n - number of parameter measurements,
a_i - measured value at the i-th step.
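The average linear deviation formula translates directly into code. A minimal sketch with a hypothetical function name, using one of the example sets from later in this article:

```python
def mean_linear_deviation(values):
    """Average linear deviation: sum(|a_i - mean|) / n."""
    n = len(values)
    if n == 0:
        raise ValueError("no values given")
    mean = sum(values) / n
    return sum(abs(a - mean) for a in values) / n


print(mean_linear_deviation([0, 6, 8, 14]))  # mean 7, |deviations| 7, 1, 1, 7; prints 4.0
```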
To check whether the studied values follow the normal distribution law, the ratio of the asymmetry (skewness) indicator to its error and the ratio of the kurtosis indicator to its error are used.
Asymmetry indicator
The asymmetry indicator (A) and its error (m_A) are calculated using the following formulas:
where
A - asymmetry indicator,
σ - standard deviation,
ā - arithmetic mean,
n - number of parameter measurements,
a_i - measured value at the i-th step.
Kurtosis indicator
The kurtosis indicator (E) and its error (m_E) are calculated using the following formulas:
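The exact formulas for A, E and their errors did not survive in this text, so the sketch below rests on assumptions: the common moment-based definitions A = Σ(a_i − ā)³/(nσ³) and E = Σ(a_i − ā)⁴/(nσ⁴) − 3 with σ computed by dividing by n, and the large-sample approximations m_A ≈ √(6/n) and m_E ≈ √(24/n). The source's own formulas may differ; the function name is hypothetical.

```python
import math


def skewness_kurtosis(values):
    """Moment-based skewness A and excess kurtosis E with approximate errors.

    Assumptions (not from the source): sigma uses division by n,
    m_A ~ sqrt(6 / n) and m_E ~ sqrt(24 / n) (large-sample approximations).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((a - mean) ** 2 for a in values) / n
    m3 = sum((a - mean) ** 3 for a in values) / n
    m4 = sum((a - mean) ** 4 for a in values) / n
    sigma = math.sqrt(m2)
    if sigma == 0:
        raise ValueError("all values are equal; A and E are undefined")
    a = m3 / sigma ** 3
    e = m4 / sigma ** 4 - 3
    m_a = math.sqrt(6 / n)
    m_e = math.sqrt(24 / n)
    return a, m_a, e, m_e


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
A, mA, E, mE = skewness_kurtosis(data)
# The ratios A / m_A and E / m_E are what the text compares
print(f"A/m_A = {A / mA:.2f}, E/m_E = {E / mE:.2f}")
```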
18. Standard deviation: calculation method and significance.
An approximate way to assess the variability of a variation series is to determine its limits and amplitude (range), but these do not take into account the values of the variants within the series. The main generally accepted measure of the variability of a quantitative characteristic within a variation series is the standard deviation (σ, sigma). The larger the standard deviation, the greater the fluctuation within the series.
The method for calculating the standard deviation includes the following steps:
1. Find the arithmetic mean (M).
2. Determine the deviations of individual variants from the arithmetic mean (d = V − M). In medical statistics, deviations from the mean are denoted d (deviate). The sum of all deviations is zero.
3. Square each deviation (d²).
4. Multiply the squared deviations by the corresponding frequencies (d²·p).
5. Find the sum of the products, Σd²·p.
6. Calculate the standard deviation using the formula
$$\sigma = \sqrt{\frac{\sum d^2 p}{n}}$$
when n is greater than 30, or
$$\sigma = \sqrt{\frac{\sum d^2 p}{n-1}}$$
when n is less than or equal to 30, where n is the total number of variants.
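Steps 1 to 6 can be sketched for a grouped variation series as follows. The function name and the pulse-rate series are hypothetical; the n > 30 switch between dividing by n and by n − 1 follows the rule stated above.

```python
import math


def sigma_from_series(variants, frequencies):
    """Mean M and standard deviation of a variation series following steps 1-6:
    divide sum(d^2 * p) by n if n > 30, otherwise by n - 1."""
    n = sum(frequencies)  # total number of variants
    if n < 2:
        raise ValueError("need at least two observations")
    # Step 1: weighted arithmetic mean M
    m = sum(v * p for v, p in zip(variants, frequencies)) / n
    # Steps 2-5: deviations d = V - M, squared, weighted by frequency, summed
    sum_d2p = sum((v - m) ** 2 * p for v, p in zip(variants, frequencies))
    # Step 6: choose the denominator by sample size
    denominator = n if n > 30 else n - 1
    return m, math.sqrt(sum_d2p / denominator)


# Hypothetical series: pulse rate (V) and how often it occurred (p)
m, sigma = sigma_from_series([68, 70, 72, 74, 76], [2, 5, 8, 4, 1])
print(f"M = {m:.1f}, sigma = {sigma:.2f}")
```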
Significance of the standard deviation:
1. The standard deviation characterizes the spread of the variants around the mean (i.e., the variability of the variation series). The larger the sigma, the greater the diversity of the series.
2. The standard deviation is used to assess how well the arithmetic mean represents the variation series for which it was calculated.
Variation in mass phenomena generally follows the normal distribution law. The curve representing this distribution is a smooth, bell-shaped, symmetrical curve (the Gaussian curve). According to probability theory, for phenomena that follow the normal distribution there is a strict mathematical relationship between the arithmetic mean and the standard deviation. The theoretical distribution of variants in a homogeneous variation series obeys the three-sigma rule.
If, in a rectangular coordinate system, the values of a quantitative characteristic (variants) are plotted on the x-axis and the frequency of each variant in the series on the y-axis, then variants with larger and smaller values are distributed symmetrically on either side of the arithmetic mean.
It has been established that with a normal distribution of the trait:
68.3% of the variant values lie within M ± 1σ
95.5% of the variant values lie within M ± 2σ
99.7% of the variant values lie within M ± 3σ
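These percentages follow from the normal distribution: the share of values within M ± kσ equals erf(k/√2). A short check using Python's `math.erf` (the helper name is hypothetical):

```python
import math


def share_within(k):
    """Probability that a normal variable lies within M +/- k*sigma: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"M +/- {k} sigma: {share_within(k) * 100:.2f} %")
```

The printed values (about 68.27%, 95.45% and 99.73%) are rounded to one decimal place in the list above.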
3. The standard deviation makes it possible to establish normal values of clinical and biological parameters. In medicine, the interval M ± 1σ is usually taken as the normal range for the phenomenon being studied. A deviation of the estimated value from the arithmetic mean by more than 1σ indicates that the studied parameter deviates from the norm.
4. In medicine, the three-sigma rule is used in pediatrics to assess the physical development of individual children (the sigma deviation method) and to develop standards for children's clothing.
5. The standard deviation is necessary to characterize the degree of diversity of the characteristic being studied and to calculate the error of the arithmetic mean.
The standard deviation is usually used to compare the variability of series of the same type. If two series with different characteristics are compared (height and weight, average length of hospital stay and hospital mortality, etc.), the sigmas cannot be compared directly, because the standard deviation is a named quantity expressed in absolute units. In these cases, the coefficient of variation (Cv) is used; it is a relative value: the standard deviation expressed as a percentage of the arithmetic mean.
The coefficient of variation is calculated using the formula:
$$C_v = \frac{\sigma}{M} \cdot 100\%$$
The higher the coefficient of variation, the greater the variability of the series. A coefficient of variation above 30% is considered to indicate qualitative heterogeneity of the population.
Standard deviation, or root-mean-square deviation (synonyms: mean square deviation, quadratic deviation; related term: standard spread), is, in probability theory and statistics, the most common indicator of the dispersion of the values of a random variable relative to its mathematical expectation. For finite samples, the arithmetic mean of the sample is used instead of the mathematical expectation.
The standard deviation is expressed in the units of measurement of the random variable itself and is used when calculating the standard error of the arithmetic mean, constructing confidence intervals, statistically testing hypotheses, and measuring the linear relationship between random variables. It is defined as the square root of the variance of a random variable.
Root-mean-square deviation:
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$
Note: the names RMSD (root-mean-square deviation) and STD (standard deviation) are often mixed up with their formulas. For example, in the NumPy module of the Python programming language, the std() function is described as "standard deviation", but by default it computes the root-mean-square deviation (division by n under the root). In Excel, the STDEV() function differs: it divides by n − 1.
Standard deviation (an estimate of the standard deviation of a random variable x relative to its mathematical expectation, based on an unbiased estimate of its variance) s:
$$s = \sqrt{\frac{n}{n-1}\sigma^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2},$$
where σ² is the variance; x_i is the i-th element of the sample; n is the sample size; x̄ is the arithmetic mean of the sample:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}\left(x_1 + \ldots + x_n\right).$$
It should be noted that both estimates are biased. In the general case, it is impossible to construct an unbiased estimate. However, the estimate based on the unbiased variance estimate is consistent.
In accordance with GOST R 8.736-2011, the standard deviation is calculated with the n − 1 formula, that is, as the estimate s.
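Python's standard `statistics` module provides both estimates: `pstdev` divides by n (the root-mean-square deviation σ) and `stdev` divides by n − 1 (the estimate s). In NumPy, `numpy.std` divides by n by default and gives s with `ddof=1`. A short sketch with hypothetical data, checking the relation s = √(n/(n − 1))·σ:

```python
import math
import statistics

x = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
n = len(x)

sigma = statistics.pstdev(x)  # divides by n: root-mean-square deviation
s = statistics.stdev(x)       # divides by n - 1: the estimate s

# The two estimates are linked by s = sqrt(n / (n - 1)) * sigma
assert math.isclose(s, math.sqrt(n / (n - 1)) * sigma)
print(sigma, s)
```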
Three sigma rule
Three sigma rule ($3\sigma$): almost all values of a normally distributed random variable lie in the interval $(\bar{x} - 3\sigma;\ \bar{x} + 3\sigma)$. More strictly, with a probability of approximately 0.9973 the value of a normally distributed random variable lies in this interval (provided that $\bar{x}$ is the true value and not one obtained by processing a sample).
If the true value $\bar{x}$ is unknown, s should be used instead of σ. The three-sigma rule then becomes the three-s rule.
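One common use of the three-s rule is flagging suspicious measurements. A minimal sketch, assuming the sample mean and the n − 1 estimate s are used; the function name and readings are hypothetical. With n ≤ 10 no value can lie beyond 3s at all (by Samuelson's inequality, the largest possible |x_i − x̄|/s is (n − 1)/√n), so the check only makes sense for larger samples.

```python
import statistics


def outside_three_s(values):
    """Return the values lying outside mean +/- 3s, where s is the sample
    (n - 1) standard deviation, as the three-s rule suggests."""
    mean = statistics.fmean(values)
    s = statistics.stdev(values)
    low, high = mean - 3 * s, mean + 3 * s
    return [v for v in values if v < low or v > high]


# Hypothetical series of 20 readings with one suspicious value
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.9, 10.1, 10.0, 9.8,
            10.2, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0, 10.2, 9.8, 14.0]
print(outside_three_s(readings))
```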
Interpretation of the standard deviation value
A larger standard deviation indicates a greater spread of the values in the set around its mean; a smaller value indicates that the values are grouped closely around the mean.
For example, consider three numerical sets: (0, 0, 14, 14), (0, 6, 8, 14) and (6, 6, 8, 8). All three sets have a mean of 7, and their standard deviations are 7, 5 and 1, respectively. The last set has a small standard deviation, since its values are grouped around the mean; the first set has the largest standard deviation, since its values diverge greatly from the mean.
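The three example sets can be checked with Python's `statistics` module. The values 7, 5 and 1 correspond to dividing by n (`pstdev`); the n − 1 estimate (`stdev`) would give larger numbers, about 8.08 for the first set.

```python
import statistics

sets = [(0, 0, 14, 14), (0, 6, 8, 14), (6, 6, 8, 8)]
for values in sets:
    # pstdev divides by n, matching the standard deviations quoted in the text
    print(values, statistics.fmean(values), statistics.pstdev(values))
```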
In a general sense, the standard deviation can be considered a measure of uncertainty. For example, in physics the standard deviation is used to determine the error of a series of successive measurements of some quantity. This value is very important for judging the plausibility of the phenomenon under study against the value predicted by theory: if the mean of the measurements differs from the theoretical prediction by a large amount compared with the standard deviation, the obtained values or the method of obtaining them should be rechecked. In finance, the standard deviation of returns is identified with portfolio risk.
Climate
Suppose there are two cities with the same average maximum daily temperature, one located on the coast and the other on a plain. Coastal cities are known to have less variation in maximum daytime temperatures than inland cities. Therefore, the standard deviation of maximum daily temperatures will be smaller for the coastal city than for the inland city, even though the mean of this value is the same. In practice, this means that the probability that the maximum air temperature on any given day of the year differs strongly from the mean is higher for the inland city.
Sport
Suppose there are several football teams assessed on a set of parameters, for example goals scored and conceded, scoring chances, etc. Most likely, the best team in the group will have better values on a larger number of parameters. The smaller a team's standard deviation for each of these parameters, the more predictable its results; such teams are balanced. Conversely, the results of a team with a large standard deviation are difficult to predict, which in turn is explained by an imbalance, for example, a strong defense but a weak attack.
Using the standard deviation of team parameters makes it possible, to some extent, to predict the result of a match between two teams by assessing their strengths and weaknesses, and hence the tactics they are likely to choose.
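The square root of the variance is called the standard deviation from the mean; it is calculated as follows:
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i}\left(x_i - \bar{x}\right)^2 f_i}{\sum_{i} f_i}}$$
where f_i are the frequencies (for ungrouped data, all f_i = 1 and the denominator is n).
An elementary algebraic transformation brings the standard deviation formula to the following form:
$$\sigma = \sqrt{\overline{x^2} - \bar{x}^2}$$
where the first term under the root is the mean of the squared values.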
This formula often turns out to be more convenient in calculation practice.
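The two forms of σ (dividing by n) can be compared directly. In this sketch the function names and the ages are hypothetical; the shortcut accumulates the sum and the sum of squares in a single pass.

```python
import math


def sigma_by_definition(x):
    """sigma = sqrt(mean of (x_i - mean)^2), dividing by n."""
    n = len(x)
    mean = sum(x) / n
    return math.sqrt(sum((v - mean) ** 2 for v in x) / n)


def sigma_shortcut(x):
    """sigma = sqrt(mean of x^2 - (mean of x)^2), computed in one pass."""
    n = len(x)
    total = 0.0
    total_sq = 0.0
    for v in x:
        total += v
        total_sq += v * v
    mean = total / n
    # max() guards against a tiny negative result caused by rounding
    return math.sqrt(max(total_sq / n - mean ** 2, 0.0))


ages = [17, 18, 18, 19, 19, 19, 20, 21, 22]  # hypothetical student ages
print(sigma_by_definition(ages), sigma_shortcut(ages))
```

The shortcut is convenient for hand calculation, but in floating-point arithmetic it subtracts two nearly equal numbers when the values are large and close together, so the definitional form is the safer choice in code.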
The standard deviation, like the average linear deviation, shows how much, on average, specific values of a characteristic deviate from their mean. The standard deviation is never smaller than the average linear deviation. For distributions close to normal, they are related as follows:
$$\sigma \approx 1.25\,\bar{d}$$
Knowing this ratio, you can determine one indicator from the other; for example, calculate σ from d̄ and vice versa. The standard deviation measures the absolute size of the variability of a characteristic and is expressed in the same units as the values of the characteristic (rubles, tons, years, etc.). It is an absolute measure of variation.
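The ratio σ ≈ 1.25·d̄ can be checked numerically: for a normal distribution σ/d̄ tends to √(π/2) ≈ 1.2533. This sketch draws a hypothetical normal sample with a fixed seed; the mean 50 and σ = 4 are arbitrary choices.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible
sample = [random.gauss(50.0, 4.0) for _ in range(50_000)]

mean = statistics.fmean(sample)
sigma = statistics.pstdev(sample)
d_bar = sum(abs(v - mean) for v in sample) / len(sample)

# For a normal distribution sigma / d_bar tends to sqrt(pi / 2), about 1.2533
print(f"sigma / d_bar = {sigma / d_bar:.3f}, sqrt(pi/2) = {math.sqrt(math.pi / 2):.4f}")
```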
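For alternative (binary) characteristics, for example the presence or absence of higher education or insurance, the variance and standard deviation formulas are as follows:
$$\sigma^2 = pq, \qquad \sigma = \sqrt{pq},$$
where p is the share of units possessing the characteristic and q = 1 − p is the share of units lacking it.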
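Coding the alternative characteristic as 1 (present) and 0 (absent) shows that p·q is just the ordinary population variance of those 0/1 values. A sketch with hypothetical survey answers:

```python
import statistics


def binary_variance(p):
    """Variance of an alternative (yes/no) characteristic: p * q, with q = 1 - p."""
    return p * (1 - p)


# Hypothetical group: 1 = has higher education, 0 = does not
answers = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
p = sum(answers) / len(answers)  # share of units having the characteristic
print(binary_variance(p), statistics.pvariance(answers))
```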
Let us show the calculation of the standard deviation according to the data of a discrete series characterizing the distribution of students in one of the university faculties by age (Table 6.2).
Table 6.2.
The results of auxiliary calculations are given in columns 2-5 of Table 6.2.
The average age of a student, in years, is determined by the weighted arithmetic mean formula (column 2):
$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i}$$
The squared deviations of individual students' ages from the average are contained in columns 3-4, and the products of the squared deviations and the corresponding frequencies in column 5.
We find the variance of students' age (in years squared) using formula (6.2): σ² = 3.43.
Then σ = √3.43 ≈ 1.85 years, i.e., each specific value of a student's age deviates from the average by 1.85 years on average.
The coefficient of variation
In absolute terms, the standard deviation depends not only on the degree of variation of the characteristic, but also on the absolute levels of the variants and of the mean. Therefore, standard deviations of variation series with different mean levels cannot be compared directly. To make such a comparison possible, you need to find the ratio of the average deviation (linear or quadratic) to the arithmetic mean, expressed as a percentage, i.e., calculate relative measures of variation.
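The linear coefficient of variation is calculated by the formula
$$V_{\bar{d}} = \frac{\bar{d}}{\bar{x}} \cdot 100\%$$
The coefficient of variation is determined by the following formula:
$$V_{\sigma} = \frac{\sigma}{\bar{x}} \cdot 100\%$$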
Coefficients of variation eliminate not only the incomparability caused by different units of measurement of the characteristic being studied, but also the incomparability caused by differences in the arithmetic means. In addition, coefficients of variation characterize the homogeneity of the population: it is considered homogeneous if the coefficient of variation does not exceed 33%.
Using Table 6.2 and the calculation results obtained above, we determine the coefficient of variation, %, according to formula (6.3):
If the coefficient of variation exceeds 33%, the population under study is heterogeneous. The value obtained in our case indicates that the population of students is homogeneous by age. Thus, an important function of generalizing indicators of variation is to assess the reliability of averages: the smaller σ, σ² and V, the more homogeneous the set of phenomena and the more reliable the resulting average. According to the three-sigma rule of mathematical statistics, in series that are normally or nearly normally distributed, deviations from the arithmetic mean not exceeding ±3σ occur in 997 cases out of 1000. Thus, knowing x̄ and σ, you can get a general initial idea of a variation series. If, for example, the average employee wage in a company is 25,000 rubles and σ = 100 rubles, then it can be stated with near certainty that the wages of the company's employees lie in the range 25,000 ± 3 × 100, i.e., from 24,700 to 25,300 rubles.
One of the main tools of statistical analysis is the calculation of the standard deviation. This indicator lets you estimate the spread of values in a sample or in a whole population. Let's learn how to use the standard deviation formula in Excel.
First, let's define the standard deviation and its formula. It is the square root of the arithmetic mean of the squared differences between all values in the series and their arithmetic mean. The indicator is also called the root-mean-square deviation; both names are fully equivalent.
Naturally, in Excel the user does not have to calculate this by hand, since the program does everything. Let's learn how to calculate the standard deviation in Excel.
Calculation in Excel
You can calculate this value in Excel with two special functions: STDEV.S (СТАНДОТКЛОН.В in the Russian version; based on a sample) and STDEV.P (СТАНДОТКЛОН.Г; based on the entire population). They work in the same way, except that STDEV.S divides by n − 1 and STDEV.P divides by n, and each can be called in three ways, which we discuss below.
Method 1: Function Wizard
Method 2: Formulas Tab
Method 3: Manually entering the formula
There is also a way in which you won't need to call the arguments window at all. To do this, you must enter the formula manually.
As you can see, the mechanism for calculating standard deviation in Excel is very simple. The user only needs to enter numbers from the population or references to the cells that contain them. All calculations are performed by the program itself. It is much more difficult to understand what the calculated indicator is and how the calculation results can be applied in practice. But understanding this already relates more to the field of statistics than to learning to work with software.