Public Health Data Analysis Guide

Scenario: Public Health Data Analysis

You are a data analyst working for a public health organization in the Northland region. The organization is conducting a study to analyze the distribution of measles cases across different districts and demographic groups. The study focuses on three key districts in the Northland region: Far North, Whangarei, and Kaipara.

Study Requirements

Gender Distribution

Age Groups

District Distribution

Measles Cases

Additional Data

1. Excel

a) VBA Script

To use the VBA script, press Alt + F11 to open the VBA editor in Excel. Insert a new module and paste the following code:

Sub GenerateData()
    Dim ws As Worksheet
    Dim i As Integer
    Dim TA2023_code As String
    Dim Age As String
    Dim Gender As String
    Dim MeaslesCases As Integer
    Dim lookupRange As Range
    Dim lookupResult As Variant

    On Error GoTo ErrorHandler

    ' Set the worksheet
    Set ws = ThisWorkbook.Sheets("Sheet1")
    Set lookupRange = ws.Range("J2:M4")

    ' Fill "Northland" in column A and TA2023_code in column B as text with leading zeros
    For i = 2 To 13
        ws.Cells(i, 1).Value = "Northland"

        Select Case i
            Case 2 To 5
                ws.Cells(i, 2).Value = "'001" ' Apostrophe makes it text explicitly
            Case 6 To 9
                ws.Cells(i, 2).Value = "'002" ' Apostrophe makes it text explicitly
            Case 10 To 13
                ws.Cells(i, 2).Value = "'003" ' Apostrophe makes it text explicitly
        End Select
    Next i

    ' Generate data for other columns
    For i = 2 To 13
        On Error GoTo ErrorHandler

        ' Get the TA2023_code from column B as text
        TA2023_code = ws.Cells(i, 2).Text

        ' Debug output
        Debug.Print "Processing row " & i & " with TA2023_code: " & TA2023_code

        ' VLookup to get the TA2023_name
        lookupResult = Application.VLookup(TA2023_code, lookupRange, 2, False)
        If IsError(lookupResult) Then
            MsgBox "VLookup failed for TA2023_code: " & TA2023_code & " in row " & i
            ws.Cells(i, 3).Value = "N/A"
        Else
            ws.Cells(i, 3).Value = lookupResult
        End If

        ' Set Age and Gender based on row number
        If (i - 2) Mod 4 < 2 Then
            Age = "< 25 years old"
            Gender = "Female"
        Else
            Age = "> 25 years old"
            Gender = "Male"
        End If

        ' Generate random MeaslesCases based on TA2023_code
        If TA2023_code = "001" Then
            MeaslesCases = Int((600 - 300 + 1) * Rnd + 300)
        ElseIf TA2023_code = "002" Then
            MeaslesCases = Int((600 - 100 + 1) * Rnd + 100)
        Else
            MeaslesCases = Int((500 - 200 + 1) * Rnd + 200)
        End If

        ' Fill the cells with generated data
        ws.Cells(i, 4).Value = Age
        ws.Cells(i, 5).Value = Gender
        ws.Cells(i, 6).Value = MeaslesCases

        ' VLookup for Area_sqkm and wkt_geom
        lookupResult = Application.VLookup(TA2023_code, lookupRange, 3, False)
        If IsError(lookupResult) Then
            MsgBox "VLookup for Area_sqkm failed for TA2023_code: " & TA2023_code & " in row " & i
            ws.Cells(i, 7).Value = "N/A"
        Else
            ws.Cells(i, 7).Value = lookupResult
        End If

        lookupResult = Application.VLookup(TA2023_code, lookupRange, 4, False)
        If IsError(lookupResult) Then
            MsgBox "VLookup for wkt_geom failed for TA2023_code: " & TA2023_code & " in row " & i
            ws.Cells(i, 8).Value = "N/A"
        Else
            ws.Cells(i, 8).Value = lookupResult
        End If

        On Error GoTo 0
    Next i
    Exit Sub

ErrorHandler:
    MsgBox "Error in row " & i & ": " & Err.Description
    On Error GoTo 0
End Sub

Before running the VBA script, set up your Excel sheet as follows:

A1 B1 C1 D1 E1 F1 G1 H1
DHB TA2023_code TA2023_name Age Gender Measles Cases Area_sqkm wkt_geom

In columns J through M, add the lookup table:

J K L M
TA2023_code TA2023_name Area_sqkm wkt_geom
001 Far North 7319.745 MultiPolygon(...)
002 Whangarei 2855.176 MultiPolygon(...)
003 Kaipara 3116.315 MultiPolygon(...)

b) Excel Functions

Set up your Excel sheet as shown in the tables above. Then, use the following functions:

Under DHB (A1), type "Northland" in A2. Then in A3, add the function:

=IF(ROW(A3:A13) <= 13, $A$2, "")

Under TA2023_code, add the function in B2:

=IF(ROW(A1)<=4, "001", IF(ROW(A1)<=8, "002", "003"))

Under TA2023_name, add the function in C2:

=VLOOKUP(B2, $J$2:$L$4, 2, FALSE)

Under Age, add the function in D2:

=IF(MOD(ROW()-2, 4) < 2, "< 25 years old", "> 25 years old")

Under Gender, add the function in E2:

=IF(MOD(ROW(), 2) = 0, "Male", "Female")

Under Measles Cases, add the function in F2:

=IF(C2="Far North District", RANDBETWEEN(300, 600), IF(C2="Kaipara District", RANDBETWEEN(100, 500), RANDBETWEEN(100, 600)))

Under Area_sqkm, add the function in G2:

=VLOOKUP(B2, $J$2:$M$4, 3, FALSE)

Under wkt_geom, add the function in H2:

=VLOOKUP(B2, $J$2:$M$4, 4, FALSE)

2. R Script

To generate the required dataset using R, follow these steps:

# Load necessary library
library(openxlsx)

# Create a lookup data frame
lookup_df <- data.frame(
  TA2023_code = c("001", "002", "003"),
  TA2023_name = c("Far North", "Whangarei", "Kaipara"),
  Area_sqkm = c(7319.745, 2855.176, 3116.315),
  wkt_geom = c("MultiPolygon((...))", "MultiPolygon((...))", "MultiPolygon((...))"),
  stringsAsFactors = FALSE
)

# Initialize an empty data frame for the generated data
generated_data <- data.frame(
  DHB = character(),
  TA2023_code = character(),
  TA2023_name = character(),
  Age = character(),
  Gender = character(),
  Measles_Cases = integer(),
  Area_sqkm = numeric(),
  wkt_geom = character(),
  stringsAsFactors = FALSE
)

# Generate the data
for (i in 1:12) {
  if (i <= 4) {
    TA2023_code <- "001"
  } else if (i <= 8) {
    TA2023_code <- "002"
  } else {
    TA2023_code <- "003"
  }
  
  TA2023_name <- lookup_df$TA2023_name[lookup_df$TA2023_code == TA2023_code]
  Area_sqkm <- lookup_df$Area_sqkm[lookup_df$TA2023_code == TA2023_code]
  wkt_geom <- lookup_df$wkt_geom[lookup_df$TA2023_code == TA2023_code]
  
  Age <- ifelse((i - 1) %% 4 < 2, "< 25 years old", "> 25 years old")
  Gender <- ifelse((i - 1) %% 4 < 2, "Female", "Male")
  
  if (TA2023_code == "001") {
    Measles_Cases <- sample(300:600, 1)
  } else if (TA2023_code == "002") {
    Measles_Cases <- sample(100:600, 1)
  } else {
    Measles_Cases <- sample(200:500, 1)
  }
  
  generated_data <- rbind(generated_data, data.frame(
    DHB = "Northland",
    TA2023_code = TA2023_code,
    TA2023_name = TA2023_name,
    Age = Age,
    Gender = Gender,
    Measles_Cases = Measles_Cases,
    Area_sqkm = Area_sqkm,
    wkt_geom = wkt_geom,
    stringsAsFactors = FALSE
  ))
}

# Specify the output file path
output_file_path <- "C:/Users/Pawaneet Kaur/Desktop/Excel/generated_data.xlsx"

# Write the data to an Excel file
write.xlsx(generated_data, output_file_path, rowNames = FALSE)

print("Excel file has been created successfully.")

3. Python Script

To generate the required dataset using Python, follow these steps:

import pandas as pd
import numpy as np
import os

# Create a lookup DataFrame
lookup_df = pd.DataFrame({
    'TA2023_code': ['001', '002', '003'],
    'TA2023_name': ['Far North', 'Whangarei', 'Kaipara'],
    'Area_sqkm': [7319.745, 2855.176, 3116.315],
    'wkt_geom': ['MultiPolygon((...))', 'MultiPolygon((...))', 'MultiPolygon((...))']
})

# Initialize an empty DataFrame for the generated data
generated_data = []

# Function to generate random number within a range
def generate_random_cases(code):
    if code == '001':
        return np.random.randint(300, 601)
    elif code == '002':
        return np.random.randint(100, 601)
    elif code == '003':
        return np.random.randint(200, 501)

# Generate the data
for i in range(1, 13):
    if i <= 4:
        TA2023_code = '001'
    elif i <= 8:
        TA2023_code = '002'
    else:
        TA2023_code = '003'
    
    TA2023_name = lookup_df.loc[lookup_df['TA2023_code'] == TA2023_code, 'TA2023_name'].values[0]
    Area_sqkm = lookup_df.loc[lookup_df['TA2023_code'] == TA2023_code, 'Area_sqkm'].values[0]
    wkt_geom = lookup_df.loc[lookup_df['TA2023_code'] == TA2023_code, 'wkt_geom'].values[0]
    
    Age = '< 25 years old' if (i - 1) % 4 < 2 else '> 25 years old'
    Gender = 'Female' if (i - 1) % 4 < 2 else 'Male'
    
    Measles_Cases = generate_random_cases(TA2023_code)
    
    generated_data.append({
        'DHB': 'Northland',
        'TA2023_code': TA2023_code,
        'TA2023_name': TA2023_name,
        'Age': Age,
        'Gender': Gender,
        'Measles_Cases': Measles_Cases,
        'Area_sqkm': Area_sqkm,
        'wkt_geom': wkt_geom
    })

# Convert the list of dictionaries to a DataFrame
generated_data_df = pd.DataFrame(generated_data)

# Display the generated data for debugging
print(generated_data_df)

# Specify the output file path
output_file_path = os.path.join('C:/Users/Pawaneet Kaur/Desktop/Excel', 'generated_data.xlsx')

# Write the data to an Excel file
generated_data_df.to_excel(output_file_path, index=False)

print("Excel file has been created successfully.")

Using the Generated Data

Loading Data into a Data Visualization Tool

After generating the data using one of the methods above, you can load the data into your preferred data visualization tool, such as Excel, Power BI, or Tableau, for further analysis and visualization.

Step-by-Step Guide for Power BI

Step 1: Importing Data

Open Power BI and select "Get Data" from the Home tab.

Choose "Excel" as the data source and navigate to the location of your generated Excel file.

Select the sheet containing your data and click "Load".

Step 2: Creating Visualizations

Once the data is loaded, you can create various visualizations to analyze the distribution of measles cases across different districts and demographic groups.

Step 3: Customizing the Dashboard

Add filters to allow users to interact with the data. For example, you can add slicers for gender, age group, and district.

Format the visuals to improve readability and aesthetics by adjusting colors, labels, and titles.

Additional Resources

For more detailed instructions on using Power BI, refer to the Power BI Guided Learning website.

If you encounter any issues or have questions about the scripts provided, consider reaching out to relevant online communities or forums such as Stack Overflow or the R/Python communities.

Conclusion

This guide provides multiple approaches to generating and analyzing public health data using Excel VBA, R, and Python. By following the steps outlined above, you can create a comprehensive dataset and visualize it in a tool like Power BI to gain valuable insights into the distribution of measles cases across different districts and demographic groups in the Northland region.