
Methods of quantitative analysis: Estimation of confidence intervals

Estimation of confidence intervals

Learning objectives

Statistics deals with two main tasks:

We have some estimate based on sample data and we want to make some probabilistic statement about where the true value of the parameter being estimated is.

We have a specific hypothesis that needs to be tested based on sample data.

In this topic, we consider the first problem. We also introduce the definition of a confidence interval.

A confidence interval is an interval that is built around the estimated value of a parameter and shows where the true value of the estimated parameter lies with an a priori given probability.

After studying the material on this topic, you will:

learn what the confidence interval of an estimate is;

learn to classify statistical problems;

master the technique of constructing confidence intervals, both with statistical formulas and with software tools;

learn to determine the sample sizes required to achieve a given accuracy of statistical estimates.

Distributions of sample characteristics

T-distribution

As discussed above, the distribution of the random variable $(\bar{x} - \mu)/(\sigma/\sqrt{n})$ is close to the standard normal distribution with parameters 0 and 1. Since we do not know the value of σ, we replace it with an estimate s. The quantity $(\bar{x} - \mu)/(s/\sqrt{n})$ has a different distribution, namely the t-distribution, or Student's distribution, which is determined by the parameter n − 1 (the number of degrees of freedom). This distribution is close to the normal distribution (the larger n is, the closer the two distributions are).

Fig. 95 shows Student's distribution with 30 degrees of freedom. As you can see, it is very close to the normal distribution.

Similar to the functions NORMDIST and NORMINV for working with the normal distribution, there are functions for working with the t-distribution: TDIST and TINV. An example of their use can be found in the STUDRIST.XLS file (template and solution) and in Fig. 96.

Distributions of other characteristics

As we already know, to determine the accuracy of the estimate of the expectation, we need the t-distribution. To estimate other parameters, such as the variance, other distributions are required. Two of them are the F-distribution and the χ² (chi-squared) distribution.

Confidence interval for the mean

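A confidence interval is an interval that is built around the estimated value of the parameter and shows where the true value of the estimated parameter lies with an a priori given probability.

A confidence interval for the mean is constructed as follows: $\bar{x} \pm t_{cr} \cdot s/\sqrt{n}$, where $\bar{x}$ is the sample mean, s is the sample standard deviation, n is the sample size, and $t_{cr}$ is the critical value of the t-distribution with n − 1 degrees of freedom for the chosen confidence level.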

Example

The fast food restaurant plans to expand its assortment with a new type of sandwich. To estimate the demand for it, the manager plans to randomly select 40 visitors from among those who have already tried it and ask them to rate the new product on a scale from 1 to 10. The manager wants to estimate the expected number of points that the new product will receive and construct a 95% confidence interval for this estimate. How can this be done? (See the file SANDWICH1.XLS (template and solution).)

Solution

To solve this problem, you can use StatPro/Statistical Inference/One-Sample Analysis. The results are presented in Fig. 97.

Confidence interval for the total value

Sometimes, according to sample data, it is required to estimate not the mathematical expectation, but the total sum of values. For example, in a situation with an auditor, it may be of interest to estimate not the average value of an invoice, but the sum of all invoices.

Let N be the total number of items, n the sample size, $T_s$ the sum of the values in the sample, and $\hat{T}$ the estimate of the sum over the entire population. Then $\hat{T} = N \cdot T_s / n = N\bar{x}$, and the confidence interval is calculated by the formula $N\bar{x} \pm t_{cr} \cdot N s/\sqrt{n}$, where s is the sample estimate of the standard deviation and $\bar{x}$ is the sample mean.

Example

Let's say a tax office wants to estimate the total amount of tax refunds for 10,000 taxpayers. Each taxpayer either receives a refund or pays additional taxes. Find the 95% confidence interval for the refund amount, assuming a sample size of 500 people (see the file REFUND AMOUNT.XLS (template and solution)).

Solution

There is no special procedure in StatPro for this case; however, the bounds can be obtained from the bounds for the mean using the above formulas (Fig. 98).
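As a rough illustration of that scaling step, the sketch below multiplies confidence bounds for the mean by the population size N. The bounds for the mean refund (250 and 310) are hypothetical placeholders, not values from REFUND AMOUNT.XLS; only N = 10,000 comes from the example.

```python
def total_interval(mean_lower, mean_upper, population_size):
    """Scale confidence bounds for the mean to bounds for the population total."""
    return population_size * mean_lower, population_size * mean_upper


# Hypothetical bounds for the mean refund; N = 10,000 is taken from the example.
lower, upper = total_interval(mean_lower=250.0, mean_upper=310.0,
                              population_size=10_000)
print(lower, upper)
```

The interval for the total keeps the same confidence level as the interval for the mean, because it is the same interval expressed in different units.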

Confidence interval for proportion

Let p be the true proportion of customers, and $\hat{p}$ an estimate of this proportion obtained from a sample of size n. It can be shown that for sufficiently large n the distribution of the estimate is close to normal with mean p and standard deviation $\sqrt{p(1-p)/n}$. The standard error of the estimate is then $SE(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n}$, and the confidence interval is $\hat{p} \pm z_{cr} \cdot SE(\hat{p})$.

Example

The fast food restaurant plans to expand its assortment with a new type of sandwich. To estimate the demand for it, the manager randomly selected 40 visitors from among those who had already tried it and asked them to rate the new product on a scale from 1 to 10. The manager wants to estimate the expected proportion of customers who rate the new product at least 6 points (he expects these customers to be the consumers of the new product).

Solution

First, we create a new column, Type, containing 1 if the client's score was at least 6 points and 0 otherwise (see the SANDWICH2.XLS file (template and solution)).

Method 1

Counting the number of ones, we estimate the proportion and then apply the formulas.

The value of $z_{cr}$ is taken from standard normal distribution tables (for example, 1.96 for a 95% confidence interval).

Using this approach and the specific data to construct a 95% interval, we obtain the following results (Fig. 99). The critical value $z_{cr}$ is 1.96. The standard error of the estimate is 0.077. The lower limit of the confidence interval is 0.475. The upper limit of the confidence interval is 0.775. Thus, the manager can assume with 95% confidence that the percentage of customers who rate the new product 6 points or more will be between 47.5% and 77.5%.
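The numbers above can be reproduced with a short Python sketch of the normal-approximation interval $\hat{p} \pm z_{cr}\sqrt{\hat{p}(1-\hat{p})/n}$. The count of 25 favourable ratings out of 40 is an assumption inferred from the reported bounds (their midpoint is 0.625); the source data file is not available here.

```python
from math import sqrt


def proportion_ci(successes, n, z_cr=1.96):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)  # standard error of the estimate
    return p_hat, se, p_hat - z_cr * se, p_hat + z_cr * se


# 25 of 40 is inferred from the reported interval, not given in the text.
p_hat, se, lower, upper = proportion_ci(successes=25, n=40)
print(round(se, 3), round(lower, 3), round(upper, 3))
```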

Method 2

This problem can be solved using standard StatPro tools. To do this, it suffices to note that the share in this case coincides with the average value of the Type column. Next apply StatPro/Statistical Inference/One-Sample Analysis to build a confidence interval for the mean value (expectation estimate) for the Type column. The results obtained in this case will be very close to the result of the 1st method (Fig. 99).

Confidence interval for standard deviation

The sample standard deviation s is used as an estimate of the standard deviation (the formula is given in Section 1). The quantity $(n-1)s^2/\sigma^2$ has a chi-squared distribution, which, like the t-distribution, has n − 1 degrees of freedom. There are special functions for working with this distribution: CHIDIST and CHIINV.

The confidence interval in this case is no longer symmetric. A schematic of the boundaries is shown in Fig. 100.
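For reference, the standard chi-squared interval for σ can be written as below. The notation is an assumption of this sketch: $\chi^2_{q,\,n-1}$ denotes the value that a chi-squared variable with n − 1 degrees of freedom exceeds with probability q (the convention used by CHIINV), and 1 − α is the confidence level.

```latex
\sqrt{\frac{(n-1)\,s^2}{\chi^2_{\alpha/2,\,n-1}}}
\;\le\; \sigma \;\le\;
\sqrt{\frac{(n-1)\,s^2}{\chi^2_{1-\alpha/2,\,n-1}}}
```

Because the chi-squared distribution is skewed, the two bounds lie at different distances from s, which is why the interval is asymmetric.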

Example

The machine should produce parts with a diameter of 10 cm. However, due to various circumstances, errors occur. The quality controller is concerned about two things: first, the average value should be 10 cm; second, even if it is, large deviations will cause many parts to be rejected. Every day he takes a sample of 50 parts (see the file QUALITY CONTROL.XLS (template and solution)). What conclusions can be drawn from such a sample?

Solution

We construct 95% confidence intervals for the mean and for the standard deviation using StatPro/Statistical Inference/One-Sample Analysis (Fig. 101).

Further, assuming a normal distribution of diameters, we calculate the proportion of defective products, setting a maximum deviation of 0.065. Using the capabilities of the lookup table (the case of two parameters), we construct the dependence of the percentage of rejects on the mean value and standard deviation (Fig. 102).
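The reject calculation can be sketched in Python with the standard library's `statistics.NormalDist`. The target of 10 cm and the maximum deviation of 0.065 come from the example; the grid of means and standard deviations is hypothetical and only stands in for the values found in Fig. 101.

```python
from statistics import NormalDist


def reject_fraction(mean, std, target=10.0, max_dev=0.065):
    """Share of parts outside target ± max_dev for normally distributed diameters."""
    dist = NormalDist(mu=mean, sigma=std)
    inside = dist.cdf(target + max_dev) - dist.cdf(target - max_dev)
    return 1.0 - inside


# Hypothetical grid of means and standard deviations (two-parameter table).
for mean in (9.99, 10.00, 10.01):
    for std in (0.03, 0.04):
        print(mean, std, round(reject_fraction(mean, std), 4))
```

Evaluating the function at the ends of the two confidence intervals gives a plausible range for the reject rate rather than a single figure.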

Confidence interval for the difference of two means

This is one of the most important applications of statistical methods. Some example situations follow.

A clothing store manager would like to know how much more or less the average female shopper spends in the store than a male.

The two airlines fly similar routes. A consumer organization would like to compare the difference between the average expected flight delay times for both airlines.

The company sends out coupons for certain types of goods in one city and does not send out in another. Managers want to compare the average purchases of these items over the next two months.

A car dealer often deals with married couples at presentations. To understand their personal reactions to the presentation, couples are often interviewed separately. The manager wants to evaluate the difference in ratings given by men and women.

Case of independent samples

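The difference of the sample means follows a t-distribution with $n_1 + n_2 - 2$ degrees of freedom. The confidence interval for $\mu_1 - \mu_2$ is given by $(\bar{x}_1 - \bar{x}_2) \pm t_{cr} \cdot s_p \sqrt{1/n_1 + 1/n_2}$, where $s_p = \sqrt{((n_1-1)s_1^2 + (n_2-1)s_2^2)/(n_1+n_2-2)}$ is the pooled estimate of the standard deviation.

This problem can be solved not only with the above formulas but also with standard StatPro tools. To do this, it is enough to apply StatPro/Statistical Inference/Two-Sample Analysis.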

Confidence interval for difference between proportions

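Let $p_1$ and $p_2$ be the true proportions, and let $\hat{p}_1$ and $\hat{p}_2$ be their sample estimates built on samples of size $n_1$ and $n_2$, respectively. Then $\hat{p}_1 - \hat{p}_2$ is an estimate of the difference $p_1 - p_2$. Therefore, the confidence interval for this difference is expressed as $(\hat{p}_1 - \hat{p}_2) \pm z_{cr} \cdot SE(\hat{p}_1 - \hat{p}_2)$.

Here $z_{cr}$ is the value obtained from standard normal distribution tables (for example, 1.96 for a 95% confidence interval).

The standard error of the estimate is expressed in this case by the relation $SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$.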

Example

In preparation for a big sale, the store undertook the following marketing research. The top 300 buyers were selected and randomly divided into two groups of 150 members each. All of the selected buyers were sent invitations to participate in the sale, but only the invitations to members of the first group included a coupon for a 5% discount. During the sale, the purchases of all 300 selected buyers were recorded. How can a manager interpret the results and judge the effectiveness of the coupons? (See the COUPONS.XLS file (template and solution).)

Solution

In our case, out of 150 customers who received a discount coupon, 55 made a purchase during the sale, and among the 150 who did not receive a coupon, only 35 made a purchase (Fig. 103). The sample proportions are therefore 0.3667 and 0.2333, and the sample difference between them is 0.1333. For a 95% confidence interval, the normal distribution table gives $z_{cr}$ = 1.96. The standard error of the sample difference is 0.0524. Finally, the lower limit of the 95% confidence interval is 0.0307, and the upper limit is 0.2359. The results can be interpreted as follows: for every 100 customers who receive a discount coupon, we can expect from 3 to 23 additional buyers.

However, it should be kept in mind that this conclusion by itself does not mean that coupons are effective (because by providing a discount, we lose profit). Let's demonstrate this on specific data. Suppose that the average purchase amount is 400 rubles, of which 50 rubles is the store's profit. Then the expected profit per 100 customers who did not receive a coupon is:

50 · 0.2333 · 100 = 1166.50 rubles.

Similar calculations for 100 buyers who received a coupon give:

30 · 0.3667 · 100 = 1100.10 rubles.

The decrease in the average profit to 30 rubles is explained by the fact that, with the 5% discount, buyers who received a coupon will, on average, make a purchase for 380 rubles.

Thus, the final conclusion indicates the inefficiency of using such coupons in this particular situation.
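The coupon calculation can be checked with the following Python sketch. It assumes the unpooled standard error $\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$, which reproduces the reported value of 0.0524. The profit figures differ slightly from those in the text because the code uses unrounded proportions.

```python
from math import sqrt


def diff_proportions_ci(x1, n1, x2, n2, z_cr=1.96):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, se, diff - z_cr * se, diff + z_cr * se


# 55 of 150 coupon holders and 35 of 150 others made a purchase.
diff, se, lower, upper = diff_proportions_ci(55, 150, 35, 150)
print(round(diff, 4), round(se, 4), round(lower, 4), round(upper, 4))

# Expected profit per 100 invited customers (50 or 30 rubles per purchase).
profit_no_coupon = 50 * (35 / 150) * 100
profit_coupon = 30 * (55 / 150) * 100
print(round(profit_no_coupon, 2), round(profit_coupon, 2))
```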

Comment. This problem can also be solved using standard StatPro tools. To do this, it suffices to reduce it to the problem of estimating the difference of two means by coding each customer's outcome as 1 or 0 (as in Method 2 for a single proportion), and then apply StatPro/Statistical Inference/Two-Sample Analysis to build a confidence interval for the difference between the two mean values.

Confidence interval control

The length of the confidence interval depends on the following factors:

the data themselves (the standard deviation);

the significance level;

the sample size.

Sample size for estimating the mean

Let us first consider the problem in the general case. Denote the given half-length of the confidence interval by B (Fig. 104). We know that the confidence interval for the mean of a random variable X is expressed as $\bar{x} \pm t_{cr} \cdot SE(\bar{x})$, where $SE(\bar{x}) = s/\sqrt{n}$. Setting $B = t_{cr} \cdot s/\sqrt{n}$ and solving for n, we get $n = (t_{cr} \cdot s / B)^2$.

Unfortunately, we do not know the exact value of the variance of the random variable X. In addition, we do not know the value of $t_{cr}$, since it depends on n through the number of degrees of freedom. In this situation, we can do the following. Instead of the standard deviation s, we use an estimate $\sigma_{est}$ obtained from whatever realizations of the random variable are available. Instead of $t_{cr}$, we use the value $z_{cr}$ for the normal distribution. This is quite acceptable, since the density functions of the normal and t-distributions are very close (except for small n). Thus, the desired formula takes the form $n = (z_{cr} \cdot \sigma_{est} / B)^2$.

Since the formula generally gives non-integer results, the result rounded up is taken as the required sample size.

Example

The fast food restaurant plans to expand its assortment with a new type of sandwich. To estimate the demand for it, the manager plans to randomly select a number of visitors from among those who have already tried it and ask them to rate the new product on a scale from 1 to 10. The manager wants to estimate the expected number of points that the new product will receive and construct a 95% confidence interval for this estimate. However, he wants the half-width of the confidence interval not to exceed 0.3. How many visitors does he need to poll?

The formula for the sample size when estimating a proportion is as follows: $n = z_{cr}^2 \cdot p_{est}(1 - p_{est}) / B^2$.

Here $p_{est}$ is an estimate of the proportion p, and B is the given half-length of the confidence interval. A conservative (inflated) value for n can be obtained using $p_{est}$ = 0.5. In this case, the half-length of the confidence interval will not exceed the given value B for any true value of p.

Example

Let the manager from the previous example plan to estimate the proportion of customers who prefer a new type of product. He wants to construct a 90% confidence interval whose half length is less than or equal to 0.05. How many clients should be randomly sampled?

Solution

In our case, $z_{cr}$ = 1.645. Therefore, with $p_{est}$ = 0.5, the required sample size is calculated as $n = 1.645^2 \cdot 0.5 \cdot 0.5 / 0.05^2 \approx 270.6$, which rounds up to 271.

If the manager had reason to believe that the desired value of p is, for example, about 0.3, then by substituting this value in the above formula, we would get a smaller value of the random sample, namely 228.
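A short Python sketch of this calculation, assuming the formula $n = z_{cr}^2 \, p_{est}(1 - p_{est}) / B^2$ with the result rounded up. The function name is illustrative; the inputs are those of the example.

```python
from math import ceil


def sample_size_proportion(half_width, z_cr, p_est=0.5):
    """Sample size so that the CI half-length for a proportion is at most half_width."""
    return ceil(z_cr ** 2 * p_est * (1 - p_est) / half_width ** 2)


n_conservative = sample_size_proportion(half_width=0.05, z_cr=1.645)
n_prior = sample_size_proportion(half_width=0.05, z_cr=1.645, p_est=0.3)
print(n_conservative, n_prior)
```

The product p(1 − p) is largest at p = 0.5, which is why that choice always gives the largest, and therefore safest, sample size.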

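The formula for determining the sample sizes in the case of the difference between two means is written as $n = 2\,(z_{cr} \cdot \sigma_{est} / B)^2$, where n is the size of each of the two samples.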

Example

Some computer company has a customer service center. Recently, the number of customer complaints about the poor quality of service has increased. The service center mainly employs two types of employees: those with little experience, but who have completed special training courses, and those with extensive practical experience, but who have not completed special courses. The company wants to analyze customer complaints over the past six months and compare their average numbers per each of the two groups of employees. It is assumed that the numbers in the samples for both groups will be the same. How many employees must be included in the sample to get a 95% interval with a half length of no more than 2?

Solution

Here $\sigma_{est}$ is an estimate of the standard deviation of both random variables, under the assumption that the two standard deviations are close. Thus, in our task, we need to obtain this estimate somehow. This can be done, for example, as follows. Looking at customer complaint data over the past six months, a manager may notice that there are generally between 6 and 36 complaints per employee. Knowing that for a normal distribution practically all values lie within three standard deviations of the mean, he can reasonably believe that $6\sigma_{est} \approx 36 - 6 = 30$.

Whence $\sigma_{est}$ = 5.

Substituting this value into the formula $n = 2\,(z_{cr} \cdot \sigma_{est} / B)^2$ with $z_{cr}$ = 1.96 and B = 2, we get $n = 2 \cdot (1.96 \cdot 5 / 2)^2 \approx 48.02$, which rounds up to 49 employees in each group.
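The substitution can be sketched in Python as follows. The formula itself is not preserved in the text, so the form $n = 2\,(z_{cr}\,\sigma_{est}/B)^2$ used here is an assumption: it follows from setting $B = z_{cr}\,\sigma_{est}\sqrt{2/n}$ for two independent samples of equal size n.

```python
from math import ceil


def sample_size_diff_means(sigma_est, half_width, z_cr=1.96):
    """Per-group sample size for a CI on mu1 - mu2 with equal group sizes."""
    return ceil(2 * (z_cr * sigma_est / half_width) ** 2)


# Range of 6 to 36 complaints treated as six standard deviations.
sigma_est = (36 - 6) / 6
n_per_group = sample_size_diff_means(sigma_est, half_width=2)
print(sigma_est, n_per_group)
```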

The formula for determining the size of each random sample when estimating the difference between two proportions is $n = z_{cr}^2\,[\,p_{1est}(1 - p_{1est}) + p_{2est}(1 - p_{2est})\,] / B^2$.

Example

Some company has two factories for the production of similar products. The manager of a company wants to compare the defect rates of both factories. According to available information, the rejection rate at both factories is from 3 to 5%. It is supposed to build a 99% confidence interval with a half length of no more than 0.005 (or 0.5%). How many products should be selected from each factory?

Solution

Here $p_{1est}$ and $p_{2est}$ are estimates of the two unknown proportions of rejects at the 1st and 2nd factories. If we put $p_{1est} = p_{2est} = 0.5$, we get an overestimated value for n. But since in our case we have some a priori information about these proportions, we take their upper estimate, namely 0.05. With $z_{cr} \approx 2.576$ for a 99% confidence level, we get $n = 2.576^2 \cdot 2 \cdot 0.05 \cdot 0.95 / 0.005^2 \approx 25{,}216$ products from each factory.
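The factory calculation can be sketched in Python under two assumptions that the text does not preserve: the formula $n = z_{cr}^2\,[p_1(1-p_1) + p_2(1-p_2)]/B^2$ and the critical value 2.576 for a 99% interval (a table rounded to 2.58 gives a somewhat larger n).

```python
from math import ceil


def sample_size_diff_proportions(p1_est, p2_est, half_width, z_cr):
    """Per-group sample size for a CI on p1 - p2 with equal group sizes."""
    spread = p1_est * (1 - p1_est) + p2_est * (1 - p2_est)
    return ceil(z_cr ** 2 * spread / half_width ** 2)


# Upper estimate of the reject rate at both factories: 0.05.
n_prior = sample_size_diff_proportions(0.05, 0.05, half_width=0.005, z_cr=2.576)
# Conservative choice without prior information: 0.5 for both proportions.
n_conservative = sample_size_diff_proportions(0.5, 0.5, half_width=0.005, z_cr=2.576)
print(n_prior, n_conservative)
```

The gap between the two results shows how much prior information about the reject rates reduces the required sample.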

When estimating some population parameters from sample data, it is useful to provide not only a point estimate of the parameter, but also a confidence interval that shows where the exact value of the parameter being estimated may lie.

In this chapter, we also got acquainted with quantitative relationships that allow us to build such intervals for various parameters; learned ways to control the length of the confidence interval.

We also note that the problem of estimating the sample size (experiment planning problem) can be solved using standard StatPro tools, namely StatPro/Statistical Inference/Sample Size Selection.

Write down the task. For example: the average weight of a male student at ABC University is claimed to be 90 kg. You will check how accurately the weight of male students at ABC University can be estimated within a given confidence interval.

Make a suitable sample. You will use it to collect data for hypothesis testing. Let's say you have already randomly selected 1000 male students.

Calculate the mean and standard deviation of this sample. Select the statistics (for example, mean and standard deviation) that you want to use to analyze your sample. Here's how to calculate the mean and standard deviation:

To calculate the sample mean, add the weights of the 1,000 sampled men and divide the result by 1,000 (the number of men). Let's say we got an average weight of 93 kg.
To calculate the standard deviation of the sample, you need to find the average value. Then you need to calculate the variance of the data, or the mean of the squared differences from the mean. Once you find that number, just take the square root of it. Let's say in our example the standard deviation is 15 kg (note that sometimes this information can be given along with the condition of the statistical problem).

Select the desired confidence level. The most commonly used confidence levels are 90%, 95% and 99%. It can also be given along with the condition of the problem. Let's say you chose 95%.

Calculate the margin of error. You can find the margin of error using the following formula: Z a/2 * σ/√(n), where Z a/2 is the confidence coefficient (critical value), a = 1 − confidence level, σ is the standard deviation, and n is the sample size. This formula shows that you must multiply the critical value by the standard error. Here is how you can evaluate this formula by breaking it into parts:

Calculate the critical value, Z a/2. The confidence level is 95%. Convert the percentage to a decimal, 0.95, and divide by 2 to get 0.475. Then look in the Z-score table for the value corresponding to 0.475. You will find 1.96 (at the intersection of row 1.9 and column 0.06).
To find the standard error, take the standard deviation, 15, and divide it by the square root of the sample size, 1000. You get 15/31.6, or 0.47 kg.
Multiply 1.96 by 0.47 (critical value times standard error) to get 0.92, the margin of error.

Write down the confidence interval. To formulate a confidence interval, simply write down the mean (93) ± error. Answer: 93 ± 0.92. You can find the upper and lower bounds of the confidence interval by adding and subtracting the error to/from the mean. So the lower limit is 93 - 0.92 or 92.08 and the upper limit is 93 + 0.92 or 93.92.

You can use the following formula to calculate the confidence interval: x̅ ± Z a/2 * σ/√(n), where x̅ is the mean value.
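The same steps can be sketched in Python with the standard library. `NormalDist().inv_cdf` stands in for the Z-score table lookup; the function name is illustrative. Without intermediate rounding the margin comes out as about 0.93 rather than 0.92, because the text rounds the standard error to 0.47 before multiplying.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci_known_sigma(mean, sigma, n, confidence=0.95):
    """Confidence interval for the mean using the normal critical value."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * sigma / sqrt(n)  # critical value times standard error
    return mean - margin, mean + margin, margin


lower, upper, margin = mean_ci_known_sigma(mean=93, sigma=15, n=1000)
print(round(lower, 2), round(upper, 2), round(margin, 2))
```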

In statistics, there are two types of estimates: point and interval. A point estimate is a single sample statistic that is used to estimate a population parameter. For example, the sample mean X̄ is a point estimate of the population mean, and the sample variance S² is a point estimate of the population variance σ². It has been shown that the sample mean is an unbiased estimate of the population expectation. The sample mean is called unbiased because the mean of all sample means (with the same sample size n) is equal to the mathematical expectation of the general population.

For the sample variance S² to be an unbiased estimator of the population variance σ², its denominator must be n − 1 rather than n. In other words, the population variance is the average of all possible sample variances.

When estimating population parameters, it should be kept in mind that sample statistics such as X̄ depend on the specific sample. To take this into account, an interval estimate of the mathematical expectation of the general population is obtained by analyzing the distribution of sample means. The constructed interval is characterized by a certain confidence level, which is the probability that the true parameter of the general population is estimated correctly. Similar confidence intervals can be used to estimate the proportion of a trait p and the main distributed mass of the general population.


Construction of a confidence interval for the mathematical expectation of the general population with a known standard deviation

Building a confidence interval for the proportion of a trait in the general population

In this section, the concept of a confidence interval is extended to categorical data. This allows you to estimate the proportion of a trait in the general population, p, using the sample proportion $p_S = X/n$. As mentioned, if the values np and n(1 − p) exceed 5, the binomial distribution can be approximated by the normal one. Therefore, to estimate the proportion p one can construct an interval whose confidence level is (1 − α) × 100%:

$p_S \pm Z\sqrt{p_S(1 - p_S)/n}$

where $p_S$ is the sample proportion of the trait, equal to X/n, i.e. the number of successes divided by the sample size; p is the proportion of the trait in the general population; Z is the critical value of the standardized normal distribution; and n is the sample size.

Example 3. Suppose a sample of 100 invoices completed during the last month is extracted from the information system, and 10 of these invoices are incorrect. Thus, $p_S$ = 10/100 = 0.1. The 95% confidence level corresponds to the critical value Z = 1.96, so the interval is $0.1 \pm 1.96\sqrt{0.1 \cdot 0.9/100} = 0.1 \pm 0.0588$.

Thus, there is a 95% chance that between 4.12% and 15.88% of invoices contain errors.

For a given sample size, the confidence interval containing the proportion of the trait in the general population seems to be wider than for a continuous random variable. This is because measurements of a continuous random variable contain more information than measurements of categorical data. In other words, categorical data that takes only two values contain insufficient information to estimate the parameters of their distribution.

Calculation of estimates drawn from a finite population

Estimation of mathematical expectation. The finite population correction factor (fpc) is used to reduce the standard error by a factor of $\sqrt{(N - n)/(N - 1)}$. When calculating confidence intervals for population parameter estimates, the correction factor is applied in situations where samples are drawn without replacement. Thus, the confidence interval for the mathematical expectation with a confidence level of (1 − α) × 100% is calculated by the formula:

$\bar{X} \pm t_{n-1} \cdot \frac{S}{\sqrt{n}} \cdot \sqrt{\frac{N - n}{N - 1}}$

Example 4. To illustrate the application of the correction factor for a finite population, let us return to the problem of calculating the confidence interval for the average invoice amount discussed earlier. Suppose that a company issues 5,000 invoices per month, and X̄ = $110.27, S = $28.95, N = 5000, n = 100, α = 0.05, $t_{99}$ = 1.9842. According to formula (6) we get:

$110.27 \pm 1.9842 \cdot \frac{28.95}{\sqrt{100}} \cdot \sqrt{\frac{5000 - 100}{5000 - 1}} = 110.27 \pm 5.69$, that is, $104.58 \le \mu \le 115.96$.
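A minimal Python sketch of this calculation, assuming the correction factor has the standard form $\sqrt{(N - n)/(N - 1)}$. The critical value 1.9842 is taken from the example rather than computed, since the standard library has no t-distribution.

```python
from math import sqrt


def mean_ci_fpc(mean, s, n, population_size, t_cr):
    """CI for the mean with the finite population correction (no replacement)."""
    fpc = sqrt((population_size - n) / (population_size - 1))
    half_width = t_cr * s / sqrt(n) * fpc
    return mean - half_width, mean + half_width, half_width


lower, upper, half_width = mean_ci_fpc(mean=110.27, s=28.95, n=100,
                                       population_size=5000, t_cr=1.9842)
print(round(lower, 2), round(upper, 2), round(half_width, 2))
```

With n = 100 out of N = 5000 the correction factor is about 0.99, so it narrows the interval only slightly; the effect grows as the sample becomes a larger share of the population.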

Estimation of the proportion of the trait. When sampling without replacement, the confidence interval for the proportion of the trait with a confidence level of (1 − α) × 100% is calculated by the formula: $p_S \pm Z\sqrt{p_S(1 - p_S)/n} \cdot \sqrt{(N - n)/(N - 1)}$.

Confidence intervals and ethical issues

When sampling a population and formulating statistical inferences, ethical problems often arise. The main one is how the confidence intervals and point estimates of sample statistics agree. Publishing point estimates without specifying the appropriate confidence intervals (usually at 95% confidence levels) and the sample size from which they are derived can be misleading. This may give the user the impression that a point estimate is exactly what he needs to predict the properties of the entire population. Thus, it is necessary to understand that in any research, not point, but interval estimates should be put at the forefront. In addition, special attention should be paid to the correct choice of sample sizes.

Most often, the objects of statistical manipulation are the results of sociological surveys of the population on various political issues. The results of the survey are placed on the front pages of newspapers, while the sampling error and the methodology of statistical analysis are printed somewhere in the middle. To support the validity of the point estimates obtained, it is necessary to indicate the sample size on which they are based, the boundaries of the confidence interval, and its confidence level.


Based on materials from the book: Levin et al. Statistics for Managers. Moscow: Williams, 2004, pp. 448–462.

The central limit theorem states that, given a sufficiently large sample size, the sampling distribution of the mean can be approximated by a normal distribution. This property does not depend on the type of population distribution.

The calculation of the confidence interval is based on the average (standard) error of the corresponding parameter. The confidence interval shows the limits within which the true value of the estimated parameter lies with probability (1 − a). Here a is the significance level; (1 − a) is also called the confidence level.

In the first chapter, we showed that, for example, for the arithmetic mean, the true population mean lies within 2 mean errors of the sample mean about 95% of the time. Thus, the boundaries of the 95% confidence interval for the mean lie at a distance of about two mean errors from the sample mean, i.e. we multiply the mean error of the mean by a factor that depends on the confidence level. For the mean and the difference of means, the factor is Student's coefficient (the critical value of Student's criterion); for a share and the difference of shares, it is the critical value of the z criterion. The product of the coefficient and the average error can be called the marginal error of the parameter, i.e. the maximum error we can get when estimating it.

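Confidence interval for the arithmetic mean: $\bar{x} \pm t_{a,f} \cdot m_{\bar{x}}$.

Here $\bar{x}$ is the sample mean;

$m_{\bar{x}} = s/\sqrt{n}$ is the average error of the arithmetic mean;

s is the sample standard deviation;

n is the sample size;

$t_{a,f}$ is the critical value of Student's criterion for a given significance level a and the number of degrees of freedom f = n − 1 (Student's coefficient).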

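Confidence interval for the difference of arithmetic means: $(\bar{x}_1 - \bar{x}_2) \pm t_{a,f} \cdot m_{\bar{x}_1 - \bar{x}_2}$.

Here $\bar{x}_1 - \bar{x}_2$ is the difference between the sample means;

$m_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$ is the average error of the difference of arithmetic means;

$s_1, s_2$ are the sample standard deviations;

$n_1, n_2$ are the sample sizes;

$t_{a,f}$ is the critical value of Student's criterion for a given significance level a and the number of degrees of freedom $f = n_1 + n_2 - 2$ (Student's coefficient).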

Confidence interval for a share: $d \pm z_a \cdot m_d$.

Here d is the sample share;

$m_d$ is the average error of the share;

n is the sample size (group size);

Confidence interval for the difference of shares: $(d_1 - d_2) \pm z_a \cdot m_{d_1 - d_2}$.

Here $d_1 - d_2$ is the difference between the sample shares;

$m_{d_1 - d_2}$ is the average error of the difference between the shares;

$n_1, n_2$ are the sample sizes (group sizes);

$z_a$ is the critical value of the z criterion at a given significance level a (for example, 1.96 for a = 0.05).

By calculating the confidence intervals for the difference in indicators, we, firstly, directly see the possible values of the effect, and not just its point estimate. Secondly, we can draw a conclusion about the acceptance or refutation of the null hypothesis and, thirdly, we can draw a conclusion about the power of the criterion.

When testing hypotheses using confidence intervals, the following rule should be followed:

If the 100(1-a)-percent confidence interval of the mean difference does not contain zero, then the differences are statistically significant at the a significance level; on the contrary, if this interval contains zero, then the differences are not statistically significant.

Indeed, if this interval contains zero, then, it means that the compared indicator can be either more or less in one of the groups compared to the other, i.e. the observed differences are random.

The position of zero within the confidence interval indicates the power of the criterion. If zero is close to the lower or upper limit of the interval, then perhaps with larger groups the differences would reach statistical significance. If zero is close to the middle of the interval, then both an increase and a decrease of the indicator in the experimental group are equally probable, and there are probably no real differences.

Examples:

To compare operational lethality when using two different types of anesthesia: 61 people were operated on using the first type of anesthesia, 8 died, using the second - 67 people, 10 died.

d1 = 8/61 = 0.131; d2 = 10/67 = 0.149; d1 − d2 = −0.018.

The difference in lethality of the compared methods will be in the range (−0.018 − 0.122; −0.018 + 0.122), or (−0.14; 0.104), with a probability of 100(1 − a) = 95%. The interval contains zero, i.e. the hypothesis of equal lethality for the two types of anesthesia cannot be rejected.

Thus, with a probability of 95%, lethality may be lower by up to 14 percentage points or higher by up to 10.4 percentage points. Zero is approximately in the middle of the interval, so it can be argued that, most likely, these two methods really do not differ in lethality.

In the example considered earlier, the average tapping time was compared in four groups of students differing in their examination scores. Let's calculate the confidence intervals of the average pressing time for students who passed the exam for 2 and 5 and the confidence interval for the difference between these averages.

Student's coefficients are found from the tables of Student's distribution (see Appendix): for the first group: = t(0.05;48) = 2.011; for the second group: = t(0.05;61) = 2.000. Thus, confidence intervals for the first group: = (162.19-2.011 * 2.18; 162.19 + 2.011 * 2.18) = (157.8; 166.6) , for the second group (156.55- 2.000*1.88 ; 156.55+2.000*1.88) = (152.8 ; 160.3). So, for those who passed the exam for 2, the average pressing time ranges from 157.8 ms to 166.6 ms with a probability of 95%, for those who passed the exam for 5 - from 152.8 ms to 160.3 ms with a probability of 95%.

You can also test the null hypothesis using confidence intervals for the means, and not just for the difference in the means. For example, as in our case, if the confidence intervals for the means overlap, then the null hypothesis cannot be rejected. In order to reject a hypothesis at a chosen significance level, the corresponding confidence intervals must not overlap.

Let's find the confidence interval for the difference in the average pressing time between the groups that received grades of 2 and 5. The difference of the averages: 162.19 - 156.55 = 5.64. Student's coefficient: t(0.05; 49 + 62 - 2) = t(0.05; 109) = 1.982. From the group standard deviations we calculate the standard error of the difference between the means, which equals 2.87. Confidence interval: (5.64 - 1.982 * 2.87; 5.64 + 1.982 * 2.87) = (-0.044; 11.33).
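A hedged sketch of this calculation in Python follows. It assumes that 2.18 and 1.88 are the standard errors of the two group means and that the standard error of the difference is based on a pooled variance; neither formula is shown in the text, and the function name pooled_diff_ci is hypothetical. Under these assumptions the result agrees with the quoted interval up to rounding.

```python
from math import sqrt


def pooled_diff_ci(mean1, se1, n1, mean2, se2, n2, t_crit):
    """CI for a difference of two means using a pooled variance estimate."""
    # Recover the sample variances from the standard errors: s^2 = se^2 * n.
    var1 = se1 ** 2 * n1
    var2 = se2 ** 2 * n2
    # Pooled variance with n1 + n2 - 2 degrees of freedom.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff, se_diff, diff - t_crit * se_diff, diff + t_crit * se_diff


# Group sizes 49 and 62 follow from the degrees of freedom 48 and 61.
diff, se_diff, lower, upper = pooled_diff_ci(162.19, 2.18, 49, 156.55, 1.88, 62, 1.982)
print(f"difference = {diff:.2f}, standard error = {se_diff:.2f}")
print(f"95% CI = ({lower:.3f}; {upper:.2f})")
```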

So, the difference in the average pressing time between the groups that received grades of 2 and 5 lies in the range from -0.044 ms to 11.33 ms. This interval includes zero, i.e. the average pressing time of the students with excellent grades may be either lower or higher than that of the students with unsatisfactory grades, so the null hypothesis cannot be rejected. But zero is very close to the lower limit, so a shorter pressing time for the excellent students is much more plausible than a longer one. It is therefore likely that a difference in the average pressing time between those who received 2 and those who received 5 does exist, and that we simply could not detect it given the size of the difference, the spread of the values and the sample sizes.

The power of a test is the probability of rejecting a false null hypothesis, i.e. of finding differences where they really exist.

The power of the test is determined based on the level of significance, the magnitude of differences between groups, the spread of values in groups, and the sample size.

For Student's t-test and analysis of variance, power (sensitivity) charts can be used.

The power of the criterion can be used for a preliminary determination of the required group sizes.
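How power depends on the listed factors, and how it is used to plan group sizes, can be illustrated with a rough sketch. It relies on the normal approximation for comparing two means with equal group sizes and a known common standard deviation, which is a simplification of Student's t-test. The function names and the input values (a difference of 5 and a standard deviation of 15) are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def approx_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided comparison of two means."""
    se = sigma * sqrt(2 / n_per_group)          # standard error of the difference
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # two-sided critical value
    # The negligible probability of rejecting in the wrong direction is ignored.
    return STD_NORMAL.cdf(abs(delta) / se - z_crit)


def required_group_size(delta, sigma, power=0.80, alpha=0.05):
    """Smallest equal group size that reaches the target power (approximation)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sigma / abs(delta)) ** 2)


# Power grows with the group size for a fixed difference and spread.
for n in (25, 50, 100, 200):
    print(n, round(approx_power(delta=5.0, sigma=15.0, n_per_group=n), 2))

n_needed = required_group_size(delta=5.0, sigma=15.0)
print("group size for 80% power:", n_needed)
```

For small groups an exact calculation based on the noncentral t-distribution would give somewhat larger required sizes than this approximation.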

The confidence interval shows within what limits the true value of the estimated parameter lies with a given probability.

With the help of confidence intervals, you can test statistical hypotheses and draw conclusions about the power (sensitivity) of the criteria.

Questions for self-examination of students.

1. What is the power of the criterion?

2. In what cases is it necessary to evaluate the power of criteria?

3. Methods for calculating power.

6. How to test a statistical hypothesis using a confidence interval?

7. What can be said about the power of the criterion when calculating the confidence interval?

Tasks.

The confidence interval came to us from the field of statistics. This is a defined range that serves to estimate an unknown parameter with a high degree of reliability. The easiest way to explain this is with an example.

Suppose you need to investigate some random variable, for example, the time a server takes to respond to a client request. Each time a user types in the address of a particular site, the server responds in a different amount of time. Thus, the response time under study is random. The confidence interval allows you to determine the boundaries of this parameter, after which it can be asserted that with a probability of 95% the server response time lies in the range we calculated.

Or suppose you need to find out how many people know the company's brand. Once the confidence interval is calculated, it will be possible to say, for example, that with a 95% probability the share of consumers who know the brand is in the range from 27% to 34%.

Closely related to this term is the confidence level. It is the probability that the desired parameter falls within the confidence interval. This value determines how large the range will be: the higher the confidence level, the wider the confidence interval becomes, and vice versa. Usually it is set to 90%, 95% or 99%. The value of 95% is the most popular.
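The effect of the confidence level on the size of the interval can be checked numerically. The sketch below assumes a normally distributed trait with a known standard deviation; the function name half_width and the values sigma = 10 and n = 100 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def half_width(sigma, n, confidence):
    """Half-width of the confidence interval for a mean with known sigma."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return z * sigma / sqrt(n)


# Same data, three commonly used confidence levels.
widths = {level: half_width(sigma=10.0, n=100, confidence=level)
          for level in (0.90, 0.95, 0.99)}

for level, width in widths.items():
    print(f"{level:.0%}: +/- {width:.3f}")
```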

The interval is also influenced by the variance of the observations, and its calculation is based on the assumption that the trait under study obeys the normal distribution law, also known as Gauss's law. A distribution of a continuous random variable is called normal if it is described by the normal (Gaussian) probability density. If the assumption of a normal distribution turns out to be wrong, the estimate may turn out to be wrong as well.

First, let's figure out how to calculate the confidence interval for the mathematical expectation (the mean). Here, two cases are possible: the variance (the degree of spread of a random variable) may or may not be known. If it is known, the confidence interval is calculated using the following formula:

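xsr - t*σ / sqrt(n) <= α <= xsr + t*σ / sqrt(n), where

xsr is the sample mean,

n is the sample size,

α is the estimated parameter (the true mean of the trait),

t is a parameter found from the table of the Laplace function (standard normal distribution),

σ is the square root of the variance (the standard deviation).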
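A minimal sketch of this formula in Python is given below. The sample of server response times and the known standard deviation of 8 ms are hypothetical values chosen only for illustration, and the function name z_interval is not from the text. The parameter t is taken from the standard normal distribution instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_interval(sample, sigma, confidence=0.95):
    """Confidence interval for the mean when the variance is known."""
    n = len(sample)
    xsr = mean(sample)                              # sample mean
    t = NormalDist().inv_cdf(0.5 + confidence / 2)  # standard normal quantile
    margin = t * sigma / sqrt(n)
    return xsr - margin, xsr + margin


# Hypothetical response times in milliseconds.
response_ms = [120, 135, 128, 142, 131, 125, 138, 129]
lower, upper = z_interval(response_ms, sigma=8.0)
print(f"95% CI for the mean: ({lower:.1f}; {upper:.1f}) ms")
```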

If the variance is unknown, then it can be calculated if we know all the values of the desired feature. For this, the following formula is used:

σ^2 = x2sr - (xsr)^2, where

x2sr is the average value of the squares of the trait under study,

(xsr)^2 is the square of the average value of this trait.
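This variance formula can be checked against the Python standard library. The data are hypothetical and the function name variance_from_moments is an assumption of this sketch. Note that the formula gives the population (biased) variance, which matches statistics.pvariance; the unbiased sample variance, statistics.variance, is larger by a factor of n / (n - 1).

```python
from statistics import mean, pvariance, variance


def variance_from_moments(values):
    """Variance as the mean of the squares minus the square of the mean."""
    x2sr = mean(v * v for v in values)  # average of the squares
    xsr = mean(values)                  # average value
    return x2sr - xsr ** 2


data = [120, 135, 128, 142, 131, 125, 138, 129]  # hypothetical measurements
print("from the formula:   ", variance_from_moments(data))
print("statistics.pvariance:", pvariance(data))
print("statistics.variance: ", variance(data))  # unbiased, divides by n - 1
```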

The formula by which the confidence interval is calculated in this case changes slightly:

xsr - t*s / sqrt(n) <= α <= xsr + t*s / sqrt(n), where

xsr is the sample mean,

α is the estimated parameter (the true mean of the trait),

t is a parameter found from the Student's distribution table, t = t(γ; n-1), where γ is the confidence level,

sqrt(n) is the square root of the sample size,

s is the square root of the sample variance.

Consider this example. Assume that, based on the results of 7 measurements, the sample mean of the trait under study was found to be 30 and the sample variance to be 36. It is necessary to find, with a probability of 99%, a confidence interval that contains the true value of the measured parameter.

First, let's determine what t is equal to: t = t(0.99; 7-1) = 3.71. Using the above formula, we get:

xsr - t*s / sqrt(n) <= α <= xsr + t*s / sqrt(n)

With s = sqrt(36) = 6:

30 - 3.71*6 / sqrt(7) <= α <= 30 + 3.71*6 / sqrt(7)

21.587 <= α <= 38.413
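The arithmetic of this example can be verified with a short Python sketch. The function name t_interval is hypothetical, and the Student's coefficient 3.71 is entered by hand from the table. Note that the standard deviation s = sqrt(36) = 6, not the variance, enters the formula.

```python
from math import sqrt


def t_interval(xsr, sample_variance, n, t):
    """Confidence interval for the mean when the variance is estimated."""
    s = sqrt(sample_variance)  # standard deviation from the sample variance
    margin = t * s / sqrt(n)
    return xsr - margin, xsr + margin


# Example from the text: 7 measurements, mean 30, variance 36, t(0.99; 6) = 3.71.
lower, upper = t_interval(xsr=30, sample_variance=36, n=7, t=3.71)
print(f"{lower:.3f} <= alpha <= {upper:.3f}")
```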

The confidence interval for the variance is calculated both in the case of a known mean and when there is no data on the mathematical expectation, and only the value of the unbiased point estimate of the variance is known. We will not give here the formulas for its calculation, since they are quite complex and, if desired, they can always be found on the net.

We only note that it is convenient to determine the confidence interval using Excel or an online confidence interval calculator.