UW-Madison Division of Extension Master Gardener SWOT Survey Analysis

Background and Data Collection

Data was collected via a Qualtrics survey. Respondents identified their role, years of experience, gender, and age, and described what they perceive as the strengths, weaknesses, threats, and opportunities of the Master Gardener Program.

Basic Information

The dataset contained 881 responses. Not all responses contained data for all questions.

The majority of respondents were female.

The majority of respondents had less than 5 years of experience.

An overwhelming number of respondents were Master Gardener Volunteers.

The majority of respondents were older than 65.

Preparing for Textual Analysis

Two kinds of text analysis were completed: topic modeling and term frequency-inverse document frequency (tf-idf) weighting. To prepare the data for this analysis, the analyst:

converted all of the data to plain lowercase text
removed common words (the, of, is, master, garden, etc.)
identified parts of speech
retained nouns and proper nouns
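The first two preparation steps can be sketched in a few lines of Python. This is an illustration only, not the analyst's actual code: the stop-word list is a small hypothetical stand-in, and the function name `prepare` is an assumption. Part-of-speech tagging and the noun filter are left out because they depend on a trained tagger that the report does not name.

```python
import re

# Hypothetical stop list: a few general English words plus program-specific ones.
STOP_WORDS = {"the", "of", "is", "a", "and", "to", "in", "master", "garden", "gardener"}

def prepare(response):
    """Lowercase a response, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", response.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(prepare("The Master Gardener training is excellent"))
# ['training', 'excellent']
```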

After preparation, we had the following number of responses for each question. Note that overall, people had more to say about the program's strengths than about any of the other topics.

Question	Total Responses
Strengths	720
Weaknesses	673
Opportunities	631
Threats	604

Topic Modeling Results

The analyst created topic models for each question. The results of the models were not definitive. Each model worked best when it defined 5 or 6 individual topics. The best models are visible below:

Finding Meaningful Words

If we treat each SWOT question as a document in its own right, we can uncover the words that are frequent in a single question but not frequent in the other 3 questions. This concept is called term frequency-inverse document frequency (tf-idf). It helps us reveal the words that have the most significance for each question. The series of graphics below identifies the top terms within each question, slicing the data by some of our other factors. Any term that appeared only once was removed after computing the tf-idf scores. Terms with a score of zero (indicating that they appeared in all 4 questions) were also removed.
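A minimal sketch of this weighting follows. It assumes the common formulation tf = (term count) / (tokens in the document) and idf = ln(N / number of documents containing the term); the report does not state which variant was used. It also assumes "appeared only once" means once within a given question. The function name `tf_idf` and the token lists are made up for illustration and are not survey data.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs maps a document name to its list of tokens.
    Returns {name: {term: score}} with tf = count / doc length, idf = ln(N / df)."""
    counts = {name: Counter(tokens) for name, tokens in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter()  # number of documents each term appears in
    for c in counts.values():
        doc_freq.update(c.keys())
    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {}
        for term, n in c.items():
            score = (n / total) * math.log(n_docs / doc_freq[term])
            # Drop terms seen only once in this document and terms scoring zero.
            if n > 1 and score > 0:
                scores[name][term] = score
    return scores

# Made-up tokens, one list per SWOT question.
docs = {
    "Strengths": ["education", "education", "community", "volunteer", "volunteer"],
    "Weaknesses": ["hours", "hours", "communication", "volunteer"],
    "Opportunities": ["youth", "youth", "partnership", "volunteer"],
    "Threats": ["funding", "funding", "aging", "volunteer"],
}
scores = tf_idf(docs)
print(scores["Strengths"])
```

In this toy input, "volunteer" appears in all 4 documents, so its idf is ln(4/4) = 0 and it is dropped everywhere, while "community" is dropped because it appears only once.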

Top Terms Across All Responses

Top Terms Among People Less Than 45 Years Old

Top Terms Among People 45 and Older

Top Terms Among Males

Top Terms Among Females

Top Terms Among Young Men (Under 45)

Top Terms Among Older Women (45 and up)

N-grams

N-grams are N-word phrases. They are identified by a sliding window. For example, the sentence “The cat sat down” has the following 2-word phrases (called bigrams):

the cat
cat sat
sat down

It has the following 3-word phrases (trigrams):

the cat sat
cat sat down
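The sliding window can be sketched as follows. The function name `ngrams` is an assumption for illustration, and splitting on whitespace is a simplification of real tokenization.

```python
def ngrams(text, n):
    """Return the n-word phrases found by sliding a window of size n over the words."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

print(ngrams("The cat sat down", 2))  # ['the cat', 'cat sat', 'sat down']
print(ngrams("The cat sat down", 3))  # ['the cat sat', 'cat sat down']
```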

N-grams can be used as the tokens in tf-idf weighting. They often provide additional context.

The analyst generated these N-grams from the raw data, and then removed any N-gram containing a common word (a “stop word”). Any n-gram that appeared only once was removed after computing tf-idf scores, to avoid excessive data in the visualizations.
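The order of these steps matters, and a small sketch shows why. The stop list and the function name `content_ngrams` are hypothetical, and the sentence is invented rather than taken from the survey.

```python
STOP_WORDS = {"the", "of", "is", "a", "and", "to", "in"}  # hypothetical stop list

def content_ngrams(text, n):
    """Build n-grams from the raw words, then drop any n-gram containing a stop word."""
    words = text.lower().split()
    grams = [words[i:i + n] for i in range(len(words) - n + 1)]
    return [" ".join(g) for g in grams if not any(w in STOP_WORDS for w in g)]

print(content_ngrams("the plant sale is a great community event", 2))
# ['plant sale', 'great community', 'community event']
```

Had the stop words been removed before the window was applied, "sale" and "great" would have become neighbors and produced the bigram "sale great", a phrase that never occurs in the original sentence.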

Top N-grams Across All Responses

Top N-grams Among People Less Than 45 Years Old

The sample size for people under 45 is small. Several categories had no terms that appeared more than once. Those categories are not included in these visualizations.

Deanna Schneider

April 23, 2019