Survey Report (Based on Synthetic Data)
1 Executive Summary
Summary
This project is designed to mirror a project I recently worked on: analyzing the results of a survey that was distributed across a college campus. The actual report and data are not public, so I cannot include them here. Instead, I created a synthetic data set to showcase some of the work I did on the project. The synthetic data set was generated with a Python script, which is included below. I also replaced or redacted all text from the report so that none of the conclusions or insights are published here. All charts reproduce the original report's code and aesthetics, but any values or trends they show are based on the synthetic data, not the real results.
The goal of the original project was to survey and understand how different groups in our college community view AI. We sent out a survey that included a mix of checkbox-style questions and open-response questions. I was responsible for exploring the data, cleaning and formatting it, and visualizing the results to help the VP and administrators make decisions about AI policies at the college.
2 Data Cleaning and Preparation
2.1 Generating a Synthetic Data Set
This section was not part of the original report; it is included here to demonstrate how the synthetic data set was generated from the original one.
# Import libraries
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
# Function for randomizing submission dates
def random_march_timestamp(size, rng):
    start = datetime(2025, 3, 1, 0, 0)
    end = datetime(2025, 4, 1, 0, 0)
    span_seconds = int((end - start).total_seconds())
    offsets = rng.integers(0, span_seconds, size=size)
    stamps = [start + timedelta(seconds=int(s)) for s in offsets]
    # Match format of original dataset
    return [dt.strftime("%-m/%-d/%y %-H:%M") for dt in stamps]
# Load real dataset
source = pd.read_csv("AI Task Force Survey Results.csv")
# Extract header row
header_row = source.iloc[[0]]
source = source.iloc[1:]
# Checkbox vs. Open Response Columns
open_cols = ['Q3_7_TEXT', 'Q5_4_TEXT', 'Q7_7_TEXT', 'Q9_18_TEXT', 'Q12_10_TEXT', 'Q14', 'Q15yes_8_TEXT', 'Q17']
model_cols = [c for c in source.columns if c not in open_cols]
# Set up for Random Generation
rng = np.random.default_rng(17)
n_rows = len(source)
synthetic = {}
# Randomly Generate Responses
for col in model_cols:
    if col == 'RecordedDate':
        synthetic[col] = random_march_timestamp(n_rows, rng)
    else:
        freq = source[col].value_counts(normalize=True)
        synthetic[col] = rng.choice(freq.index, size=n_rows, p=freq.values)
# Save as dataframe
synthetic_df = pd.DataFrame(synthetic)
# Blank out open-response answers (free text is not reproduced)
for col in open_cols:
    synthetic_df[col] = ""
# Re-Attach Header
out_df = pd.concat([header_row, synthetic_df], ignore_index=True)
# Export as CSV
out_df.to_csv("survey_synthetic.csv", index=False)
2.2 Data Prep
First, the data set needed to be formatted for analysis. This ranged from “housekeeping” adjustments, such as removing empty responses, to categorizing open responses. The following steps were taken to format the data set:
Detailed Steps
Removed the description row
Removed empty responses (where no questions were answered)
Reformatted the response date (RecordedDate)
Removed responses that were entered before the official release of the survey
Condensed the responses to the Device Access question (Q7 + Q7_7_TEXT)
Bucketed “Once weekly” and “A few times a week” into a single “Weekly” category for Q8 (see Section 4.2 for details)
Condensed the responses to the AI Tools question (Q9 + Q9_18_TEXT)
Categorized open-ended responses for Uses of AI (Q12 + Q12_10_TEXT)
Categorized open-ended responses for Concerns about AI (Q15yes + Q15yes_8_TEXT)
Cleaned up the responses for Q10 and for the Skill Level Ratings (Q11)
Reformatted responses for learning opportunities (Q18)
## Data Cleanup
# Split Description Row
first_row <- data[1, , drop = FALSE]
rest <- data[-1, ]
# Remove empty rows
df <- rest %>%
rowwise() %>%
filter(!all(is.na(c_across(-RecordedDate))) ) %>%
ungroup()
# Remove Pre-release test responses
df <- df %>%
mutate(RecordedDate = mdy_hm(RecordedDate),
RecordedDate = as.Date(RecordedDate))
df <- df %>%
filter(RecordedDate > as.Date("2025-03-16"))
# Check Remaining NAs
na_counts <- df %>%
summarise(across(everything(), ~ sum(is.na(.)))) %>%
pivot_longer(everything(), names_to = "Column", values_to = "NA_Count")
head(df)
# Condense Q7 Responses
df <- df %>%
mutate(Q7_7_TEXT = case_when(
Q7_7_TEXT == "iPhone" ~ "Smartphone",
Q7_7_TEXT == "Android Phone" ~ "Smartphone",
Q7_7_TEXT == "Apple Watch" ~ "Smart Watch",
Q7_7_TEXT == 'Kindle' ~ 'Kindle',
Q7_7_TEXT == "Kindle Reader" ~ "Kindle",
Q7_7_TEXT == 'Amazon Echo' ~ 'Smart Speaker',
Q7_7_TEXT == 'Game Console' ~ 'Game Console',
Q7_7_TEXT == 'I am not sure my devices support AI or want them to' ~ 'Unsure',
Q7_7_TEXT == 'Unfortunately all my devices are trying to force me to use AI.' ~ NA,
TRUE ~ Q7_7_TEXT
)) %>%
mutate(Q7 = if_else(!is.na(Q7_7_TEXT), paste0(Q7, ', ', Q7_7_TEXT), Q7))
# Clean Up / Format Device Access (Q7) Responses
df <- df %>%
mutate(Q7 = Q7 %>%
str_replace_all("Laptop \\(PC or Mac\\)", "Laptop") %>%
str_replace_all("Desktop computer \\(PC or Mac\\)", "Desktop Computer") %>%
str_replace_all("I do not have a device capable of utilizing AI applications.", "None") %>%
str_replace_all("VR headest", "VR Headset"))
# Clean up / Format Usage Frequency (Q8) Responses
df <- df %>%
mutate(Q8 = case_when(
Q8 == "Once weekly" ~ "Weekly",
Q8 == "Frequently, every day" ~ "Frequently / Daily",
Q8 == "A few times a week" ~ "Weekly",
TRUE ~ Q8
))
# Clean Up & Condense Q9 Responses (Regularly Used AI Tools)
df <- df %>%
mutate(Q9_18_TEXT = case_when(
str_detect(Q9_18_TEXT, regex("GROK", ignore_case = TRUE)) ~ "Grok",
str_detect(Q9_18_TEXT, regex("claude", ignore_case = TRUE)) ~ "Claude",
str_detect(Q9_18_TEXT, regex("google", ignore_case = TRUE)) ~ "Gemini",
str_detect(Q9_18_TEXT, regex("co pilot", ignore_case = TRUE)) ~ "Microsoft Copilot",
str_detect(Q9_18_TEXT, regex("llama", ignore_case = TRUE)) ~ "Meta AI",
is.na(Q9_18_TEXT) ~ NA,
TRUE ~ "Other"
)) %>%
rowwise() %>%
mutate(Q9 = Q9 %>%
str_replace_all("I do not use AI tools regularly.", "None") %>%
str_replace("Other \\(please specify\\):", Q9_18_TEXT)) %>%
ungroup() %>%
mutate(cleaned_col = case_when(
str_trim(Q9) == "None" ~ "None", # keep as is
TRUE ~ str_remove_all(Q9, "\\bNone\\b,?\\s*") %>%
str_replace_all(",\\s*$", "") %>% # remove trailing commas
str_trim()
))
# Clean up Q10 Responses
df <- df %>%
mutate(Q10 = if_else(Q10 == "Not currently, but I have in the past.",
"Previously", Q10))
# Clean up Q12 Responses and Categorized Open Response Answers
# Note: These have been redacted for purposes of publishing this work
df <- df %>%
mutate(Q12_10_TEXT = case_when(
is.na(Q12_10_TEXT) ~ "NA",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Fun",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Fun",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Fun",
str_detect(Q12_10_TEXT, regex("<redacted> ", ignore_case = TRUE)) ~ "Writing Assistance",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Navigation",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Writing Assistance",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous, Writing Assistance",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Detect Cheating",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Detect Cheating",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "None",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Translation",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Translation",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Assignment Completion",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Research, Explanation",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Learn AI",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Writing Assistance",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Studying",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Studying",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Navigation, Assignment Completion",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Writing Assistance",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Writing Assistance",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Writing Assistance",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Writing Assistance",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Lesson Planning",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Studying",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Lesson Planning",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous, Brainstorming",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Writing Assistance",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Miscellaneous",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Speech-to-Text",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ NA,
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Lesson Planning, Image Generation",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Writing Assistance",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Explanation, Studying",
str_detect(Q12_10_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Grading",
TRUE ~ "Other"
)) %>%
rowwise() %>%
mutate(Q12 = Q12 %>%
str_replace_all("I do not use AI tools.", "None") %>%
str_replace_all('Language translation', 'Translation') %>%
str_replace_all('Writing assistance', 'Writing Assistance') %>%
str_replace_all('Image generation', 'Image Generation') %>%
str_replace("Other \\(please specify\\):", Q12_10_TEXT),
Q12 = case_when(
str_detect(Q1, "Faculty") & str_detect(Q1, "Students") ~ str_replace(Q12, "Study or lesson planning", "Studying, Lesson Planning"),
str_detect(Q1, "Faculty") ~ str_replace(Q12, "Study or lesson planning", "Lesson Planning"),
str_detect(Q1, "Student") ~ str_replace(Q12, "Study or lesson planning", "Studying"),
TRUE ~ str_replace(Q12, "Study or lesson planning", "Studying, Lesson Planning")
)
) %>%
ungroup()
# Account for potential duplicate values within Q12
df <- df %>%
mutate(Q12 = Q12 %>%
str_split(",\\s*") %>% # Split
map(~ unique(.)) %>% # Remove duplicates
map_chr(~ paste(.x, collapse = ",")) # Recombine
)
# Clean up Responses for Q15yes_8_TEXT
df <- df %>%
mutate(Q15yes_8_TEXT = case_when(
is.na(Q15yes_8_TEXT) ~ "NA",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "NA",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "NA",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Theft",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Tech Over-Dependence",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "NA",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Tech Over-Dependence",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Tech Over-Dependence",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Theft",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "NA",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "NA",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "NA",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "NA",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "NA",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "NA",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "NA",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "NA",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Bias",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "NA",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "NA",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "NA",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "NA",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "NA",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "NA",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "NA",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "NA",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Theft",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "Bias",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "NA",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "NA",
str_detect(Q15yes_8_TEXT, regex("<redacted>", ignore_case = TRUE)) ~ "NA",
TRUE ~ "NA"
)
)
# Clean Up Responses for Q15yes
df <- df %>%
rowwise() %>%
mutate(Q15yes = Q15yes %>%
str_replace_all('Academic dishonesty/plagiarism', 'Academic Dishonesty') %>%
str_replace_all('Being accused of academic dishonesty/plagiarism', 'Accusations of Academic Dishonesty') %>%
str_replace_all('Accuracy of information', 'Accuracy') %>%
str_replace_all('Environmental impact of Ai', 'Environmental Impact') %>%
str_replace_all('Over-dependency on technology', 'Tech Over-Dependence') %>%
str_replace_all('Job displacement', 'Job Displacement') %>%
str_replace_all('Privacy issues', 'Privacy') %>%
str_replace("Other \\(please specify\\):", Q15yes_8_TEXT)
) %>%
ungroup()
# Account for potential duplicate values within Q15yes
df <- df %>%
mutate(Q15yes = Q15yes %>%
str_split(",\\s*") %>% # Split
map(~ unique(.)) %>% # Remove duplicates
map_chr(~ paste(.x, collapse = ",")) # Recombine
)
# Clean Up Text From Q11
df <- df %>%
mutate(Q11 = case_when(
Q11 == 'Very high' ~ 'Very High',
Q11 == 'None' ~ 'No Skill',
TRUE ~ Q11
))
# Format Responses for Q18
df <- df %>%
mutate(Q18 = case_when(
is.na(Q18) ~ 'NA',
str_detect(Q18, regex("AI Center on-campus that provides on-site AI support and computers with AI tools", ignore_case = TRUE)) ~ "AI Center",
str_detect(Q18, regex("Books on AI tools/uses", ignore_case = TRUE)) ~ "Books",
str_detect(Q18, regex("Face-to-face workshop", ignore_case = TRUE)) ~ "FTF Workshop",
str_detect(Q18, regex("Virtual AI support", ignore_case = TRUE)) ~ "Virtual Support",
str_detect(Q18, regex("Virtual/Zoom workshop", ignore_case = TRUE)) ~ "Virtual Workshop",
str_detect(Q18, regex("None", ignore_case = TRUE)) ~ "None",
TRUE ~ Q18
))
# Split into 3 separate dataframes for Student, Faculty, and Staff
df_student <- df %>%
filter(str_detect(Q1, "Student"))
df_faculty <- df %>%
filter(str_detect(Q1, "Faculty"))
df_staff <- df %>%
filter(str_detect(Q1, 'Staff') | str_detect(Q1, 'staff'))
3 Respondent Overview
This section provides a demographic overview of the survey respondents: their roles at the college (student, faculty, staff), race, and gender. Much of the analysis in the remainder of the report examines responses across these categories.
3.1 Breakdown by Role: Students, Faculty, and Staff
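The role-breakdown chart from the original report is omitted here. As a rough, illustrative reconstruction of the summary behind it (not the original plotting code), the three role data frames created in Section 2.2 can be tabulated directly.

library(dplyr)
library(tibble)

# Illustrative sketch: count respondents by role using the data frames
# defined in Section 2.2. Respondents who selected more than one role
# appear in more than one data frame, so the percentages can sum to
# over 100%.
role_counts <- tibble(
  Role = c("Student", "Faculty", "Staff"),
  Count = c(nrow(df_student), nrow(df_faculty), nrow(df_staff))
) %>%
  mutate(Percent = round(100 * Count / nrow(df), 1))

role_counts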
3.2 Academic and Administrative Areas
3.3 Demographics
Note: Numeric count labels are omitted when a category contains fewer than 10 responses.
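As an illustration of that labeling rule (the actual chart code is not reproduced here), a bar chart can simply blank out any count label below 10. The column used below, Q11, is only a stand-in, since the demographic column names are not shown in this excerpt.

library(dplyr)
library(ggplot2)

# Illustrative sketch of the labeling rule: counts under 10 are replaced
# with an empty label so small groups are not annotated.
label_demo <- df %>%
  count(Q11, name = "Count")

ggplot(label_demo, aes(x = Q11, y = Count)) +
  geom_col() +
  geom_text(aes(label = if_else(Count < 10, "", as.character(Count))), vjust = -0.3) +
  theme_minimal()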
4 Technology Access, Usage, and Experience
Note: In the sections that follow, many graphs display percentage values. Because several questions allowed respondents to select multiple options (i.e., checkbox responses), the total percentages in some charts may exceed 100%.
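To make that calculation concrete, the sketch below (illustrative, and not necessarily the exact code behind the published charts) computes option shares for the device-access question Q7 used in Section 4.1: each respondent's comma-separated selections are split into one row per selection, and each option's share is taken over the number of respondents rather than the number of selections, which is why the shares can total more than 100%.

library(dplyr)
library(tidyr)

# Illustrative percentage calculation for a checkbox question (Q7):
# split multi-select answers, count each option, and express counts as a
# share of respondents who answered the question.
n_respondents <- sum(!is.na(df$Q7))

q7_pct <- df %>%
  filter(!is.na(Q7)) %>%
  separate_rows(Q7, sep = ",\\s*") %>%
  count(Q7, name = "Count") %>%
  mutate(Percent = round(100 * Count / n_respondents, 1)) %>%
  arrange(desc(Count))

q7_pct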
4.1 Device Access
Which of the following devices do you have that are capable of utilizing AI applications for your work/schoolwork?
4.2 AI Usage Frequency
Note: Responses of “Once weekly” were relatively few, so they were combined with “A few times a week” into a single “Weekly” category (see the cleaning steps in Section 2.2).
How often do you use AI tools (e.g. chatbots, virtual assistants)?
4.3 AI Tools in Use
Which AI tools do you use regularly?
4.4 Paid Usage
Do you pay for the use of an AI tool?
4.5 Self-Reported Skill Level
How would you rate your skill level using AI tools?
5 Concerns About AI
5.1 Presence of Concern
Do you have any concerns about using AI tools in your work/schoolwork?
5.2 Types of Concerns
What are your main concerns?
6 Application of AI
For what purposes do you use AI tools?
7 Perspectives on AI at QCC
7.1 Should QCC Provide More Tools?
Would you like to see more AI tools available for use at QCC?
7.2 Possible Additional Tools and Resources
8 Learning Opportunities and Support Needs
What kind of AI learning opportunity would you be open to trying?