# Chi-square test for independence (two samples)

This test checks whether the distributions of two or more unrelated (independent) samples differ significantly with respect to a given categorical variable.

## Conditions for test execution

Exclusively for nominal and ordinal variables;

Preferably for large samples (n > 30);

Independent observations;

Not applicable if more than 20% of the expected frequencies are below 5;

No expected frequency may be below 1;

If either of the latter two conditions is violated, it is advisable to group categories according to a specific criterion.
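The two frequency conditions can be checked before running the test. The following is a minimal sketch, assuming both thresholds apply to the expected frequencies under independence (the usual reading of this rule); the function names and the small 2 × 3 table are hypothetical.

```python
def expected_counts(observed):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def check_conditions(observed):
    """Return (share of expected counts below 5, smallest expected count, ok)."""
    cells = [e for row in expected_counts(observed) for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    smallest = min(cells)
    # Assumed rule: at most 20% of expected counts below 5, and none below 1
    ok = share_below_5 <= 0.20 and smallest >= 1
    return share_below_5, smallest, ok


# Hypothetical table with one sparse column (illustration only)
sparse = [[20, 15, 2], [25, 10, 1]]
share, smallest, ok = check_conditions(sparse)
print(f"{share:.0%} of expected counts below 5, smallest = {smallest:.2f}, ok = {ok}")
```

Here two of the six expected counts fall below 5, so the check fails and the sparse column would be a candidate for merging with a neighbouring category.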

## Procedure for Test Execution

State H0: the variables are independent, that is, not associated;

Establish the significance level (α);

Determine the rejection region of H0. Compute the degrees of freedom (φ), where φ = (L - 1)(C - 1), L = number of rows and C = number of columns in the table, then find the tabulated chi-square value for φ and α;

Calculate the chi-square statistic using the formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is the observed frequency and E the expected frequency of each cell. To find the expected value (E) of a cell, use the following formula:

$$E = \frac{\text{row total} \times \text{column total}}{N}$$

where N is the grand total;

If the calculated chi-square is higher than the tabulated value, reject H0 in favor of H1: there is dependency, that is, the variables are associated.
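The steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a reference implementation: the function name and the 2 × 2 table are hypothetical, and the critical value is read from a chi-square table rather than computed.

```python
def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count of the cell
            stat += (o - e) ** 2 / e
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, dof


# Hypothetical 2x2 table (illustration only)
observed = [[30, 10], [20, 40]]
stat, dof = chi_square_statistic(observed)

critical = 3.841  # tabulated chi-square for 1 degree of freedom at the 0.05 level
reject_h0 = stat > critical
print(f"chi-square = {stat:.2f}, df = {dof}, reject H0: {reject_h0}")
```

Keeping the critical value as an explicit constant mirrors the table lookup in the procedure; a statistics library could supply it instead.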

### Example

A researcher wants to determine whether the consumption of their chocolate flavors depends on the city within their region.

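Taquari Valley Towns

| Chocolate flavor | Lajeado | Holy Cross | Star | Taquari | ∑ |
| --- | --- | --- | --- | --- | --- |
| Cashew Chocolate | 60 | 30 | 20 | 40 | 150 |
| Peanut Chocolate | 45 | 35 | 20 | 10 | 110 |
| Chocolate with flakes | 55 | 25 | 47 | 13 | 140 |
| Chocolate with raisins | 70 | 35 | 25 | 20 | 150 |
| ∑ | 230 | 125 | 112 | 83 | 550 |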

H0: The preference for flavors is independent of the city

H1: The preference for flavors depends on the city.

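α = 0.05

φ = (4 - 1)(4 - 1) = 9, since the table has 4 rows and 4 columns; the tabulated chi-square for 9 degrees of freedom at the 0.05 level is 16.92.

Calculation of expected values (E). Each E is the row total times the column total divided by 550; for example, for Cashew Chocolate in Lajeado, E = 150 × 230 / 550 ≈ 62.7.

| Chocolate flavor | Lajeado | Holy Cross | Star | Taquari |
| --- | --- | --- | --- | --- |
| Cashew Chocolate | 62.7 | 34.1 | 30.5 | 22.6 |
| Peanut Chocolate | 46.0 | 25.0 | 22.4 | 16.6 |
| Chocolate with flakes | 58.5 | 31.8 | 28.5 | 21.1 |
| Chocolate with raisins | 62.7 | 34.1 | 30.5 | 22.6 |

$$\chi^2 = \frac{(60 - 62.7)^2}{62.7} + \frac{(30 - 34.1)^2}{34.1} + \dots + \frac{(20 - 22.6)^2}{22.6}$$

χ² = 0.11 + 0.49 + 3.61 + 13.39 + 0.02 + 4.00 + 0.25 + 2.62 + 0.21 + 1.45 + 12.00 + 3.11 + 0.85 + 0.02 + 0.99 + 0.29 = 43.41

The calculated chi-square (43.41, or about 43.45 when the expected values are not rounded) is higher than the tabulated value (16.92), so H0 is rejected in favor of H1.

Therefore, at the 0.05 level, flavor preference differs significantly among the cities.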
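The example can be recomputed directly from the observed table. This is a minimal sketch in plain Python: the variable names are hypothetical and the flavor labels are shortened. It leaves the expected values unrounded, so the total differs slightly from a hand calculation with one-decimal expected values. The 4 × 4 table gives (4 - 1)(4 - 1) = 9 degrees of freedom, for which the tabulated value at the 0.05 level is 16.919.

```python
flavors = ["Cashew", "Peanut", "Flakes", "Raisins"]
# Observed counts; columns are Lajeado, Holy Cross, Star, Taquari
observed = [
    [60, 30, 20, 40],
    [45, 35, 20, 10],
    [55, 25, 47, 13],
    [70, 35, 25, 20],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Contribution of each cell, (O - E)^2 / E, with E = row total * column total / N
contributions = [
    [(o - r * c / n) ** 2 / (r * c / n) for o, c in zip(row, col_totals)]
    for row, r in zip(observed, row_totals)
]
chi_square = sum(sum(row) for row in contributions)
dof = (len(observed) - 1) * (len(observed[0]) - 1)
critical = 16.919  # tabulated chi-square, 9 degrees of freedom, 0.05 level

for flavor, row in zip(flavors, contributions):
    print(f"{flavor:<8}", "  ".join(f"{x:6.2f}" for x in row))
print(f"chi-square = {chi_square:.2f}, df = {dof}, reject H0: {chi_square > critical}")
```

Printing the per-cell contributions shows which cells drive the result: here the largest come from Cashew Chocolate in Taquari and Chocolate with flakes in Star.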

## Contingency Coefficient (CC)

CC is an indicator of the degree of association between two variables analyzed by Chi square.

CC ranges from 0 to 1; the closer it is to 1, the stronger the association between the variables.

In the example given above the coefficient would be 0.3442.
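The text does not state the formula behind CC. The following is a minimal sketch, assuming Pearson's contingency coefficient, C = sqrt(χ² / (χ² + N)), which is one common definition; the function name is hypothetical. The inputs are the example's N = 550 and a chi-square of about 43.45, the statistic for the example table when the expected values are left unrounded. Other definitions or normalisations give different values, so the result need not match a figure obtained another way.

```python
import math


def contingency_coefficient(chi_square, n):
    """Pearson's contingency coefficient (assumed definition): sqrt(chi2 / (chi2 + n))."""
    return math.sqrt(chi_square / (chi_square + n))


cc = contingency_coefficient(43.45, 550)
print(f"CC = {cc:.4f}")
```

Under this definition the coefficient for the example is roughly 0.27.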