
Chi-square test for independence (two samples)


This test verifies whether the distributions of two or more unrelated samples differ significantly with respect to a given variable.

Conditions for test execution

Exclusively for nominal and ordinal variables;

Preferably for large samples, n > 30;

Independent observations;

Not applicable if more than 20% of the expected frequencies are less than 5;

There cannot be expected frequencies below 1;

In the latter two cases, if such frequencies occur, it is advisable to group categories according to a specific criterion.
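The two frequency conditions can be checked mechanically once the expected frequencies are known. The sketch below assumes the usual reading of those conditions (no more than 20% of expected frequencies below 5, and none below 1). The function names and the two small tables are hypothetical, chosen only for illustration.

```python
def check_expected_frequencies(expected):
    """Return (share of cells with E < 5, True if any cell has E < 1)."""
    cells = [e for row in expected for e in row]
    share_below_5 = sum(1 for e in cells if e < 5) / len(cells)
    any_below_1 = any(e < 1 for e in cells)
    return share_below_5, any_below_1


def chi_square_applicable(expected):
    """Apply both frequency conditions to a table of expected frequencies."""
    share_below_5, any_below_1 = check_expected_frequencies(expected)
    return share_below_5 <= 0.20 and not any_below_1


# Hypothetical expected frequencies, for illustration only.
sparse = [[0.8, 4.2], [6.0, 9.0]]    # half the cells below 5, one below 1
dense = [[12.5, 7.5], [12.5, 7.5]]   # every cell comfortably above 5

print(chi_square_applicable(sparse))  # False
print(chi_square_applicable(dense))   # True
```

When the check fails, the remedy suggested above is to merge sparse categories and recompute the expected frequencies before testing.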

Procedure for Test Execution

Determine H0. The variables are independent, or the variables are not associated;

Establish the significance level (α);

Determine the rejection region of H0. Calculate the degrees of freedom (φ), where φ = (L - 1)(C - 1), L = number of rows in the table and C = number of columns. Then look up the tabulated chi-square value for φ and the chosen significance level;

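Calculate the chi-square statistic using the formula:

χ² = Σ (O - E)² / E, where O is the observed frequency and E the expected frequency of each cell, summed over all cells of the table.

To find the expected value (E) of a cell, use the following formula:

E = (row total × column total) / grand total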

If the calculated chi-square is higher than the tabulated value, reject H0 in favor of H1.

In that case there is dependency, that is, the variables are associated.
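The procedure above can be sketched in plain Python. This is a minimal illustration, not a definitive implementation: the function names are hypothetical, the 2 x 2 table is invented for the example, and the critical value is typed in from a standard chi-square table rather than computed.

```python
def expected_frequencies(observed):
    """E = (row total * column total) / grand total, for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_independence(observed):
    """Return the chi-square statistic and the degrees of freedom."""
    expected = expected_frequencies(observed)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof


# Hypothetical 2 x 2 table, for illustration only.
observed = [[30, 10], [20, 40]]
statistic, dof = chi_square_independence(observed)

# Tabulated chi-square for 1 degree of freedom at the 0.05 level.
critical = 3.841

print(round(statistic, 2), dof)   # 16.67 1
print(statistic > critical)       # True, so H0 would be rejected
```

Keeping the critical value as an explicit input mirrors the table lookup in the procedure. A statistics library could supply it instead.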

Example

A researcher wants to find out whether the consumption of their chocolate flavors depends on the city within their region.

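Observed frequencies by chocolate flavor and Taquari Valley town:

| Chocolate flavor | Lajeado | Holy Cross | Star | Taquari | Total |
|---|---|---|---|---|---|
| Cashew Chocolate | 60 | 30 | 20 | 40 | 150 |
| Peanut Chocolate | 45 | 35 | 20 | 10 | 110 |
| Chocolate with flakes | 55 | 25 | 47 | 13 | 140 |
| Chocolate with raisins | 70 | 35 | 25 | 20 | 150 |
| Total | 230 | 125 | 112 | 83 | 550 |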

H0: The preference for flavors is independent of the city.

H1: The preference for flavors depends on the city.

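α = 0.05

φ = (4 - 1)(4 - 1) = 9, since the table has 4 flavors (rows) and 4 towns (columns). The tabulated chi-square for 9 degrees of freedom at the 0.05 level is 16.92.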

Calculation of expected values (E).

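Expected frequencies by chocolate flavor and Taquari Valley town:

| Chocolate flavor | Lajeado | Holy Cross | Star | Taquari |
|---|---|---|---|---|
| Cashew Chocolate | 62.7 | 34.1 | 30.5 | 22.6 |
| Peanut Chocolate | 46.0 | 25.0 | 22.4 | 16.6 |
| Chocolate with flakes | 58.5 | 31.8 | 28.5 | 21.1 |
| Chocolate with raisins | 62.7 | 34.1 | 30.5 | 22.6 |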
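The expected values can be reproduced from the observed counts of the example. The short sketch below applies the rule "row total times column total, divided by the grand total" to each cell. The variable names are illustrative.

```python
# Observed counts from the example (rows: flavors, columns: towns).
observed = [
    [60, 30, 20, 40],  # Cashew Chocolate
    [45, 35, 20, 10],  # Peanut Chocolate
    [55, 25, 47, 13],  # Chocolate with flakes
    [70, 35, 25, 20],  # Chocolate with raisins
]

row_totals = [sum(row) for row in observed]        # 150, 110, 140, 150
col_totals = [sum(col) for col in zip(*observed)]  # 230, 125, 112, 83
grand_total = sum(row_totals)                      # 550

# Expected frequency of each cell under independence.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 1) for e in row])
```

The first and last rows come out identical because both flavors have the same row total (150).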

χ² = (60 - 62.7)²/62.7 + (30 - 34.1)²/34.1 + … + (20 - 22.6)²/22.6

= 0.12 + 0.49 + 3.64 + 13.32 + 0.02 + 4.00 + 0.26 + 2.62 + 0.21 + 1.46 + 11.99 + 3.13 + 0.84 + 0.02 + 1.01 + 0.31 ≈ 43.45 (each term computed from the unrounded expected values)

The calculated chi-square (43.45) is higher than the tabulated value for 9 degrees of freedom at the 0.05 level (16.92), so H0 is rejected in favor of H1.

Therefore, at the 0.05 level, flavor preference differs significantly between cities.
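To see which cells drive the result, the statistic can be broken down into per-cell contributions. The sketch below recomputes them from the observed counts using unrounded expected values. The critical value for a 4 x 4 table, with (4 - 1)(4 - 1) = 9 degrees of freedom at the 0.05 level, is typed in from a standard chi-square table. Variable names are illustrative.

```python
observed = [
    [60, 30, 20, 40],  # Cashew Chocolate
    [45, 35, 20, 10],  # Peanut Chocolate
    [55, 25, 47, 13],  # Chocolate with flakes
    [70, 35, 25, 20],  # Chocolate with raisins
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Contribution of each cell: (O - E)^2 / E with E = row * column / total.
contributions = []
for row, r in zip(observed, row_totals):
    line = []
    for o, c in zip(row, col_totals):
        e = r * c / grand_total
        line.append((o - e) ** 2 / e)
    contributions.append(line)

statistic = sum(sum(line) for line in contributions)
dof = (len(observed) - 1) * (len(observed[0]) - 1)
critical = 16.919  # tabulated chi-square, 9 degrees of freedom, alpha = 0.05

# Largest single contribution and its (row, column) position.
largest, i, j = max(
    (value, i, j)
    for i, line in enumerate(contributions)
    for j, value in enumerate(line)
)

print(round(statistic, 2), dof, statistic > critical)  # 43.45 9 True
print(round(largest, 2), i, j)                         # 13.32 0 3
```

The largest contribution comes from Cashew Chocolate in Taquari (40 observed against about 22.6 expected), followed by Chocolate with flakes in Star.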

Contingency Coefficient (CC)

CC is an indicator of the degree of association between two variables analyzed by Chi square.

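The coefficient ranges from 0 to 1 (in practice its maximum is below 1 and depends on the size of the table). The closer it is to 1, the stronger the association between the variables.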

In the example given above, Pearson's formula CC = √(χ² / (χ² + N)), with N = 550 observations, gives a coefficient of approximately 0.27.
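The text does not state which formula it uses for the coefficient. The sketch below assumes Pearson's contingency coefficient, CC = sqrt(χ² / (χ² + N)), and computes it directly from the observed counts of the example, so its result need not match a value obtained with a different variant of the coefficient. The function names are hypothetical.

```python
import math


def chi_square_statistic(observed):
    """Chi-square statistic of a contingency table (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(observed, row_totals)
        for o, c in zip(row, col_totals)
    )


def contingency_coefficient(observed):
    """Pearson's contingency coefficient (assumed variant, see lead-in)."""
    n = sum(sum(row) for row in observed)
    chi_square = chi_square_statistic(observed)
    return math.sqrt(chi_square / (chi_square + n))


observed = [
    [60, 30, 20, 40],
    [45, 35, 20, 10],
    [55, 25, 47, 13],
    [70, 35, 25, 20],
]
cc = contingency_coefficient(observed)
print(round(cc, 2))  # 0.27
```

For a square table with k rows and columns, this coefficient cannot exceed sqrt((k - 1) / k), about 0.87 for k = 4, which is worth keeping in mind when judging how close to 1 a value is.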
