Calculations

\(G_i^*\) is the sum of the values of neighbors (\(j\)) connected to location \(i\) (including the value at \(i\)) divided by the sum of the values at all locations:

\[G_i^* = \frac{\sum_{j=1}^n w_{ij} y_j}{\sum_{j=1}^n y_j}\]

where \(w_{ij}\) is a weight indicating whether a location \(j\) is connected to location \(i\). Weights are typically represented in a pairwise (square) matrix using binary terms (1 = connected, 0 = not connected). In most cases (including in the UDSS Analysis Module), the binary connections matrix is row-standardized (by dividing each element in the matrix by its row sum) to account for potential variation in the number of connections across different locations. The expected value of \(G_i^*\) is:

\[E(G_i^*) = \frac{W_i^*}{n}\]

where \(W_i^*\) is the sum of the weights connected to point \(i\), including \(i\) itself (if not row-standardized, this is simply the number of locations connected to focal location \(i\)).

In words, \(G_i^*\) measures whether the sum of the values at and around a location (\(i\)) is disproportionately high or low, where the expected value is based on the number of locations \(j\) connected to location \(i\) relative to the total number of locations. Locations with more connections are expected to have a higher \(G_i^*\).

The standardized \(G_i^*\) score is calculated as

\[z(G_i^*) = \frac{\sum_{j}w_{ij}y_j-W_i^*\bar{y}}{s\{[(nS_{1i}^*)-W_i^{*2}]/(n-1)\}^{1/2}}\]

where \(S_{1i}^* = \sum_jw_{ij}^2\) (the sum of the squared weights connected to location \(i\)), \(\bar{y}\) is the mean of the values over all locations, and \(s\) is the standard deviation of the values over all locations.

A worked example may be helpful in understanding the theory and calculations underlying a hotspot analysis.

Imagine we have a state that is perfectly square and consists of 25 counties, each of which is also perfectly square, forming a 5x5 grid. Each grid cell (county) has a value representing its rate of diabetes. The grid cells have either 1, 4, 5, 7, or 8 people with diabetes per 100 population, and we can create a choropleth map by using colors to represent the rates.

Hypothetical 5x5 grid of square 'counties' each with measured diabetes prevalence rate.

The spatial distribution of diabetes rates looks to be patchy (or “clustered”) in certain parts of our grid. We want to quantify (and potentially test) whether certain locations on the grid have a disproportionately high (“hotspot”) or low (“coldspot”) diabetes prevalence surrounding them.

We calculate the \(G_i^*\) statistics for each grid cell and then standardize them for ease of interpretation. Using the standard (\(z\)) scores, we can see that hotspots have positive values and coldspots have negative values. Grid cells with \(z\)-scores at or near zero are at or near the expected value (if the rate of diabetes were evenly distributed across the grid). We can plot the \(z\)-scores to get a better sense of the overall pattern, using a color ramp from blue (coldspot) to red (hotspot).

Hotspot analysis of a hypothetical 5x5 grid of diabetes prevalence next to the map of the grid.

To see how the weights are stored in the weights matrix, consider the first (green) cell in the bottom left corner of the grid of diabetes rates (we will index this cell as [1, 1] for row = 1, column = 1). Under queen contiguity, this cell has four connections: the cell to the right [1, 2], the cell above [2, 1], the cell diagonally up and to the right [2, 2], and a connection to itself [1, 1], since we are calculating \(G_i^*\). The weights matrix is 25 x 25 since there are 25 grid cells. The first row of the weights matrix represents the connections for cell [1, 1]. Since this is a pairwise (symmetric) matrix, the values in the first column also represent the connections to cell [1, 1]. Because all cells are connected to themselves for \(G_i^*\), the diagonal of the matrix contains all 1s; this would not be the case if we were calculating \(G_i\) rather than \(G_i^*\).

[1,1] [1,2] [1,3] [1,4] [1,5] [2,1] [2,2] [2,3] [2,4] [2,5] [3,1] [3,2] [3,3] [3,4] [3,5] [4,1] [4,2] [4,3] [4,4] [4,5] [5,1] [5,2] [5,3] [5,4] [5,5]
[1,1] 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[1,2] 1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[1,3] 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[1,4] 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[1,5] 0 0 0 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[2,1] 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
[2,2] 1 1 1 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
[2,3] 0 1 1 1 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0
[2,4] 0 0 1 1 1 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0
[2,5] 0 0 0 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0
[3,1] 0 0 0 0 0 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0
[3,2] 0 0 0 0 0 1 1 1 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 0
[3,3] 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0
[3,4] 0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0
[3,5] 0 0 0 0 0 0 0 0 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 0 0
[4,1] 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0
[4,2] 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 0 1 1 1 0 0
[4,3] 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 0 1 1 1 0
[4,4] 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 0 1 1 1
[4,5] 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 1 0 0 0 1 1
[5,1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 1 0 0 0
[5,2] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 0
[5,3] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0
[5,4] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 1
[5,5] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 1

Because the number of connections per cell may vary, it is common practice to “row standardize” the matrix. Row standardization is accomplished by dividing each value in a row by its row sum. For example, the first grid cell [1, 1] has four connections, so its row sum is four. The second cell (from the bottom left corner) [1, 2] has six connections, so its row sum is six. If we divide each value in the row by the row sum, we get the row-standardized matrix.

[1,1] [1,2] [1,3] [1,4] [1,5] [2,1] [2,2] [2,3] [2,4] [2,5] [3,1] [3,2] [3,3] [3,4] [3,5] [4,1] [4,2] [4,3] [4,4] [4,5] [5,1] [5,2] [5,3] [5,4] [5,5]
[1,1] 0.2500000 0.2500000 0.0000000 0.0000000 0.0000000 0.2500000 0.2500000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[1,2] 0.1666667 0.1666667 0.1666667 0.0000000 0.0000000 0.1666667 0.1666667 0.1666667 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[1,3] 0.0000000 0.1666667 0.1666667 0.1666667 0.0000000 0.0000000 0.1666667 0.1666667 0.1666667 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[1,4] 0.0000000 0.0000000 0.1666667 0.1666667 0.1666667 0.0000000 0.0000000 0.1666667 0.1666667 0.1666667 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[1,5] 0.0000000 0.0000000 0.0000000 0.2500000 0.2500000 0.0000000 0.0000000 0.0000000 0.2500000 0.2500000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[2,1] 0.1666667 0.1666667 0.0000000 0.0000000 0.0000000 0.1666667 0.1666667 0.0000000 0.0000000 0.0000000 0.1666667 0.1666667 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[2,2] 0.1111111 0.1111111 0.1111111 0.0000000 0.0000000 0.1111111 0.1111111 0.1111111 0.0000000 0.0000000 0.1111111 0.1111111 0.1111111 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[2,3] 0.0000000 0.1111111 0.1111111 0.1111111 0.0000000 0.0000000 0.1111111 0.1111111 0.1111111 0.0000000 0.0000000 0.1111111 0.1111111 0.1111111 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[2,4] 0.0000000 0.0000000 0.1111111 0.1111111 0.1111111 0.0000000 0.0000000 0.1111111 0.1111111 0.1111111 0.0000000 0.0000000 0.1111111 0.1111111 0.1111111 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[2,5] 0.0000000 0.0000000 0.0000000 0.1666667 0.1666667 0.0000000 0.0000000 0.0000000 0.1666667 0.1666667 0.0000000 0.0000000 0.0000000 0.1666667 0.1666667 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[3,1] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.1666667 0.1666667 0.0000000 0.0000000 0.0000000 0.1666667 0.1666667 0.0000000 0.0000000 0.0000000 0.1666667 0.1666667 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[3,2] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.1111111 0.1111111 0.1111111 0.0000000 0.0000000 0.1111111 0.1111111 0.1111111 0.0000000 0.0000000 0.1111111 0.1111111 0.1111111 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[3,3] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.1111111 0.1111111 0.1111111 0.0000000 0.0000000 0.1111111 0.1111111 0.1111111 0.0000000 0.0000000 0.1111111 0.1111111 0.1111111 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[3,4] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.1111111 0.1111111 0.1111111 0.0000000 0.0000000 0.1111111 0.1111111 0.1111111 0.0000000 0.0000000 0.1111111 0.1111111 0.1111111 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[3,5] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.1666667 0.1666667 0.0000000 0.0000000 0.0000000 0.1666667 0.1666667 0.0000000 0.0000000 0.0000000 0.1666667 0.1666667 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[4,1] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.1666667 0.1666667 0.0000000 0.0000000 0.0000000 0.1666667 0.1666667 0.0000000 0.0000000 0.0000000 0.1666667 0.1666667 0.0000000 0.0000000 0.0000000
[4,2] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.1111111 0.1111111 0.1111111 0.0000000 0.0000000 0.1111111 0.1111111 0.1111111 0.0000000 0.0000000 0.1111111 0.1111111 0.1111111 0.0000000 0.0000000
[4,3] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.1111111 0.1111111 0.1111111 0.0000000 0.0000000 0.1111111 0.1111111 0.1111111 0.0000000 0.0000000 0.1111111 0.1111111 0.1111111 0.0000000
[4,4] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.1111111 0.1111111 0.1111111 0.0000000 0.0000000 0.1111111 0.1111111 0.1111111 0.0000000 0.0000000 0.1111111 0.1111111 0.1111111
[4,5] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.1666667 0.1666667 0.0000000 0.0000000 0.0000000 0.1666667 0.1666667 0.0000000 0.0000000 0.0000000 0.1666667 0.1666667
[5,1] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.2500000 0.2500000 0.0000000 0.0000000 0.0000000 0.2500000 0.2500000 0.0000000 0.0000000 0.0000000
[5,2] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.1666667 0.1666667 0.1666667 0.0000000 0.0000000 0.1666667 0.1666667 0.1666667 0.0000000 0.0000000
[5,3] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.1666667 0.1666667 0.1666667 0.0000000 0.0000000 0.1666667 0.1666667 0.1666667 0.0000000
[5,4] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.1666667 0.1666667 0.1666667 0.0000000 0.0000000 0.1666667 0.1666667 0.1666667
[5,5] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.2500000 0.2500000 0.0000000 0.0000000 0.0000000 0.2500000 0.2500000
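
As a sketch of these mechanics (pure Python, not the UDSS Analysis Module implementation), the binary queen-contiguity matrix above, including the self-connections used by \(G_i^*\), and its row-standardized version can be built in a few lines:

```python
# Sketch of the weights matrices above. Cells are indexed [row, col] from the
# bottom-left corner, ordered [1,1], [1,2], ..., [5,5] to match the printout.

n_side = 5
cells = [(r, c) for r in range(1, n_side + 1) for c in range(1, n_side + 1)]

def connected(a, b):
    """Queen contiguity, including the self-connection used by G_i*."""
    return abs(a[0] - b[0]) <= 1 and abs(a[1] - b[1]) <= 1

# Binary matrix: 1 = connected, 0 = not connected; the diagonal is all 1s.
W = [[1 if connected(a, b) else 0 for b in cells] for a in cells]

# Row standardization: divide each entry by its row sum.
W_std = [[w / sum(row) for w in row] for row in W]

# Corner cell [1,1] has four connections, so its nonzero weights become 0.25;
# an interior cell such as [2,2] has nine, so its nonzero weights become 1/9.
```

Each row of `W_std` sums to one, which is what makes the local weighted sum \(\sum_j w_{ij} y_j\) a local average of the values around location \(i\).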

The expected value of \(G_i^*\) is:

\[E(G_i^*) = \frac{W_i^*}{n}\]

where \(W_i^*\) is the sum of the weights connected to point \(i\), including \(i\) itself. Hence, for the row-standardized matrix, the expected value of \(G_i^*\) for the first grid cell [1, 1] is:

\[E(G_i^*)=(0.25 + 0.25 + 0.25 + 0.25)/25 = 1/25 = 0.04 \]

If the connections matrix were not row-standardized, the expected value of \(G_i^*\) would be:

\[E(G_i^*)=(1+1+1+1)/25=4/25=0.16\]

This makes intuitive sense: if the rate of diabetes were evenly distributed across our grid, we would expect the proportion of values surrounding a location to be only a function of the number of its connections (relative to the total number of cells). Note that the sum of the rates across our grid equals 75 and that there are 25 grid cells (\(n\)), with a mean rate (\(\bar{y}\)) of 3 per grid cell. Therefore, if the prevalence of diabetes were evenly distributed across the grid, each cell would have a value of 3 and the value of \(G_i^*\) for the first grid cell [1, 1] would be:

\[G_i^* = [(0.25*3) + (0.25*3) + (0.25*3) +(0.25*3)] / 75 = 0.04\]

which is exactly what we expected for the row-standardized matrix based on the number of connections.
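
The expected-value arithmetic can be checked directly; this short sketch (pure Python) uses only the numbers stated in the worked example:

```python
# Expected value of G_i* for corner cell [1,1] (four connections), using the
# worked example's numbers: n = 25 cells, rates summing to 75 (mean rate 3).

n = 25
w_std = [0.25, 0.25, 0.25, 0.25]   # row-standardized weights for [1,1]
w_bin = [1, 1, 1, 1]               # binary weights for [1,1]

E_std = sum(w_std) / n             # W_i*/n = 1/25 = 0.04
E_bin = sum(w_bin) / n             # W_i*/n = 4/25 = 0.16

# G_i* if diabetes were evenly distributed (every cell at the mean rate of 3):
G_even = sum(w * 3 for w in w_std) / 75   # = 0.04, matching E_std
```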

The observed value of \(G_i^*\) for the first grid cell, which has four connections (including itself), is:

\[G_i^*=[(0.25*4) + (0.25*4) + (0.25*4) +(0.25*7)]/75 = 0.063\]

which is higher than the expected value of 0.04. Let’s compare this to the observed \(G_i^*\) statistic for the bottom right corner (grid cell [1, 5]), which also has four connections but has a diabetes rate of only 1 and appears to be surrounded by cells with low rates of diabetes.

\[G_i^*=[(0.25*1)+(0.25*1)+(0.25*1)+(0.25*1)]/75 = 0.013\]

For this cell, we see that the \(G_i^*\) statistic is much less than the expected value of 0.04. While it is possible to compare observed and expected values to get a sense of whether a cell may represent a hotspot or coldspot, such comparisons are more straightforward if we transform the \(G_i^*\) statistics to standard (\(z\)) scores:

\[z(G_i^*)=\frac{\sum_{j}w_{ij}y_j-W_i^*\bar{y}}{s\{[(nS_{1i}^*)-W_i^{*2}]/(n-1)\}^{1/2}}\]

where \(S_{1i}^* = \sum_jw_{ij}^2\) (the sum of the squared weights connected to location \(i\)), \(\bar{y}\) is the mean of the values over all locations, and \(s\) is the standard deviation of the values over all locations. In our example, \(\bar{y}\) is 3 and \(s\) is 2 (assuming we have all the grid cell values and not a sample).

For the grid cell [1, 1] in the bottom left corner, \(S_{1i}^*\) equals 0.25 and, as before, \(W_i^*\) equals 1. Putting this together we have:

\[ z(G_i^*)=\frac{4.75 - (1*3)}{2\{[(25*0.25) - 1^2]/(25-1)\}^{1/2}} = 1.87 \]

For the grid cell [1, 5] in the bottom right corner, we have:

\[ z(G_i^*)=\frac{1 - (1*3)}{2\{[(25*0.25) - 1^2]/(25-1)\}^{1/2}} = -2.14 \]

These values indicate a hotspot in the bottom left corner and a coldspot in the bottom right corner. The larger magnitude of the coldspot's \(z\)-score indicates that its deviation from the expected value is more extreme than that of the hotspot. The two-tailed P-value for a \(z\)-score of -2.14 is 0.032, and for a \(z\)-score of 1.87 it is 0.061.
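
The worked example's corner-cell numbers can be reproduced in a short sketch (pure Python; `statistics.NormalDist` from the standard library supplies the normal CDF for the two-tailed P-values):

```python
from statistics import NormalDist

# Reproduce the corner-cell calculations: n = 25 cells, values summing to 75,
# mean 3, population standard deviation 2. Each corner cell has four
# connections with row-standardized weights of 0.25, so W_i* = 1 and
# S*_1i = 4 * 0.25**2 = 0.25. Neighborhood values come from the text:
# [1,1] -> 4, 4, 4, 7 and [1,5] -> 1, 1, 1, 1.

n, total, ybar, s = 25, 75, 3, 2
w = [0.25] * 4
W = sum(w)                            # W_i*  = 1
S1 = sum(wi ** 2 for wi in w)         # S*_1i = 0.25

def g_star(values):
    return sum(wi * y for wi, y in zip(w, values)) / total

def z_score(values):
    lag = sum(wi * y for wi, y in zip(w, values))   # local weighted sum
    return (lag - W * ybar) / (s * ((n * S1 - W ** 2) / (n - 1)) ** 0.5)

g_hot,  z_hot  = g_star([4, 4, 4, 7]), z_score([4, 4, 4, 7])  # ~0.063, ~1.87
g_cold, z_cold = g_star([1, 1, 1, 1]), z_score([1, 1, 1, 1])  # ~0.013, ~-2.14

def two_tailed_p(z):
    return 2 * (1 - NormalDist().cdf(abs(z)))       # ~0.06 and ~0.03
```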

Inference

For a hotspot analysis, the null hypothesis is that there is no difference between the observed and expected local-\(G\) statistic. For each areal unit (a county), the difference between the observed and expected value is transformed into a \(z\)-score and, given certain simplifying assumptions, the \(z\)-score can be referenced against a standard normal distribution to infer statistical significance. From the standard normal distribution, one can calculate a P-value. The P-value tells us the probability of obtaining a \(z\)-score equal to or more extreme than observed if the null hypothesis were true; the more extreme the \(z\)-score, the lower the probability that a result is due to random chance. Because there are many caveats associated with making statistical inferences in a hotspot analysis, we have chosen not to report P-values directly in the Analysis Module. Nevertheless, for heuristic purposes, the table (below) lists the critical values for \(z\)-scores based on one- and two-tailed tests for different significance levels (\(\alpha\)). In most contexts, two-tailed tests are preferred because we do not typically have an a priori hypothesis as to whether a particular county represents a hotspot or coldspot. Note that the default thresholds for the sliders on the color ribbon found on the hotspot analysis page (i.e., -2 to 2) approximate the thresholds for a two-tailed test at a significance level of 0.05 (i.e., -1.96 to 1.96).

            One-tailed right   One-tailed left   Two-tailed right   Two-tailed left
α = 0.05          1.64              -1.64              1.96              -1.96
α = 0.01          2.33              -2.33              2.58              -2.58
α = 0.001         3.09              -3.09              3.29              -3.29

Whenever a statistical test is conducted, there is some chance that the null hypothesis may be rejected due to chance alone, referred to as a Type I (“type one”) statistical error. The significance level (\(\alpha\)) represents the Type I error rate; a common convention is to use an \(\alpha\) of 0.05, which means that one is willing to accept the possibility that 1 in 20 tests may be the result of a Type I error. The more tests conducted, the higher the chance of committing a Type I error. The family-wise error rate (FWER) represents the probability of committing at least one Type I error in the course of conducting some number (\(n\)) of tests at a particular significance level:

\[FWER = 1-(1-\alpha)^n \]

To demonstrate these calculations, we will use the 2019 data for Newly Diagnosed Diabetes in Georgia.

Map showing a hotspot analysis of diabetes incidence in Georgia in 2019, showing coldspots in northeast Georgia and hotspots in south central Georgia.

There are 159 counties in Georgia. For a significance level of 0.05, the FWER equals:

\[FWER = 1 - (1 - 0.05)^{159} = 1 - (0.95)^{159} = 0.9997129 \approx 100\%\]

In other words, there is practically a 100% chance of committing at least one Type I error in the course of conducting a hotspot analysis in the state of Georgia. To account for this problem, it is advisable to apply a correction for multiple testing, which usually involves adjusting the significance level downward so as to make the test more conservative. However, it is possible to be too conservative in this adjustment and instead commit a Type II (“type two”) error: failure to reject the null hypothesis when it should have been rejected. Simple corrections for multiple testing, such as the strict Bonferroni correction, are commonly viewed as too conservative for local spatial statistics. For example, if a strict Bonferroni correction were applied, the adjusted significance level would be:

\[\alpha^{'} = \frac{\alpha}{n} = \frac{0.05}{159} = 0.00031\]

For a two-tailed test, this implies a \(z\)-score greater than or equal to 3.61 (or less than or equal to -3.61). We can use the sliders to isolate \(z\)-scores greater than 3.61 or less than -3.61. In our example, only Houston, Wilcox, and Macon counties would be considered statistically significant hotspots, and there are no statistically significant coldspots:

Setting the sliders to plus and minus 3.58 for diabetes incidence in Georgia in 2019 will show significant hotspots using a Bonferroni correction
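
The multiple-testing arithmetic for Georgia can be sketched as follows (pure Python; `NormalDist.inv_cdf` from the standard library converts the adjusted significance level into a two-tailed \(z\) threshold, which lands near 3.6, matching the text up to rounding):

```python
from statistics import NormalDist

# Family-wise error rate and strict Bonferroni correction for Georgia's
# 159 counties at a significance level of 0.05.

alpha, n = 0.05, 159

fwer = 1 - (1 - alpha) ** n                       # ~0.9997: a Type I error is near certain
alpha_adj = alpha / n                             # ~0.00031 (Bonferroni-adjusted alpha)
z_crit = NormalDist().inv_cdf(1 - alpha_adj / 2)  # ~3.6 two-tailed z threshold
```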

A less conservative alternative to the strict Bonferroni correction that is often used with local spatial statistics is the False Discovery Rate (FDR). The False Discovery Rate can be determined by first sorting the \(P\)-values in ascending order and then ranking them, where the lowest \(P\)-value gets a rank (\(i\)) of 1 and the largest \(P\)-value gets a rank equal to \(n\). Next, a new variable (FDR) is calculated for each observation based on \(i\), \(\alpha\), and \(n\):

\[FDR = i*(\alpha/n)\]

For example, the observation with the smallest \(P\)-value (Wilcox County) would get an FDR of \(1*(0.05/159) = 0.0003145\), the next smallest \(P\)-value (Macon County) would get an FDR of \(2*(0.05/159)=0.0006289\), and so on. Finally, we compare each \(P\)-value to its FDR and retain only those results where the \(P\)-value is less than or equal to its corresponding FDR. The table (below) indicates that there are seven counties with \(P\)-values less than or equal to their FDR. Those counties are indicated in the map (below the table) using the threshold sliders.

County FIPS   County           State     Rate per 1000   Z_Score   P           i   FDR
13315         Wilcox County    Georgia    6.6             4.626    0.0000019   1   0.0003145
13193         Macon County     Georgia   14.0             4.064    0.0000241   2   0.0006289
13153         Houston County   Georgia   15.4             3.788    0.0000759   3   0.0009434
13311         White County     Georgia    7.3            -3.283    0.0005135   4   0.0012579
13081         Crisp County     Georgia    9.0             3.143    0.0008361   5   0.0015723
13281         Towns County     Georgia    5.5            -3.014    0.0012891   6   0.0018868
13139         Hall County      Georgia    9.0            -2.915    0.0017784   7   0.0022013
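
The comparison in the table can be sketched directly from the listed \(P\)-values (already sorted from smallest to largest); note that rounding of \(0.05/159\) can shift the FDR thresholds slightly without changing which counties are retained:

```python
# False Discovery Rate comparison using the seven county P-values from the
# table above, sorted from smallest to largest.

alpha, n = 0.05, 159
p_values = [
    ("Wilcox",  0.0000019), ("Macon", 0.0000241), ("Houston", 0.0000759),
    ("White",   0.0005135), ("Crisp", 0.0008361), ("Towns",   0.0012891),
    ("Hall",    0.0017784),
]

retained = [
    county
    for i, (county, p) in enumerate(p_values, start=1)
    if p <= i * (alpha / n)      # FDR threshold for rank i
]
# All seven counties have P-values at or below their FDR thresholds.
```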

Setting the sliders to -2.91 and 3.11 for diabetes incidence in Georgia in 2019 will show significant hotspots using an FDR correction

Although statistical inference associated with local spatial statistics may require considering issues such as correction for multiple testing, it may be worthwhile to look at the overall pattern represented in the choropleth map, without thresholding.

Evaluate statistically significant counties alongside the general pattern at the whole state level to avoid missing any interesting general patterns.

While critical values and correction for multiple testing help to avoid Type I errors, more general patterns may be overlooked if too much emphasis is placed on statistical significance (in other words: “do not miss the forest for the trees”). For example, only considering counties that are statistically significant after test correction would obscure the fact that diabetes incidence in 2019 is disproportionately low throughout northeastern Georgia and disproportionately high across a very large swath of south-central Georgia. Randomly permuting incidence values among counties in Georgia and then recalculating the \(z\)-scores (over and over again) may show some clusters of orange or blue counties (due to chance), but the likelihood of orange and blue counties clustering over large regions (as observed) would be extremely low.