Data was collected via a Qualtrics survey. Respondents identified their role, years of experience, gender, and age, and described what they perceive as the strengths, weaknesses, threats, and opportunities in the Master Gardener Program.
The dataset contained 881 responses. Not all responses contained data for all questions.
The majority of respondents were female.
The majority of respondents had fewer than 5 years of experience.
The overwhelming majority of respondents were Master Gardener Volunteers.
The majority of respondents were older than 65.
Two kinds of text analysis were completed: topic modeling and term frequency-inverse document frequency (tf-idf) weighting. To prepare the data for this analysis, the analyst:
After preparation, we had the following number of responses for each question. Note that overall, people had more to say about the program's strengths than about any of the other topics.
Question | Total Responses |
---|---|
Strengths | 720 |
Weaknesses | 673 |
Opportunities | 631 |
Threats | 604 |
The analyst created topic models for each question. The results of the models were not definitive. Each model worked best when it defined 5 or 6 individual topics. The best models are visible below:
If we treat each SWOT question as a document in its own right, we can uncover the words that are frequent in a single question but not frequent in the other 3 questions. This is a concept called term frequency-inverse document frequency (tf-idf). It helps us reveal the words that have the most significance for each question. The series of graphics below identifies the top terms within each question, slicing the data by some of our other factors. Any term that appeared only once was removed after computing the tf-idf scores. Terms with a score of zero (indicating that they appeared in all 4 questions) were also removed.
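To make the weighting concrete, here is a minimal sketch in Python. It assumes term frequency is a term's share of all tokens in a question and that inverse document frequency uses the natural logarithm; the report does not state which variant the analyst used, and the tokens below are hypothetical toy data, not survey responses.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return {doc_name: {term: score}} for a mapping of doc_name -> list of tokens."""
    n_docs = len(docs)
    counts = {name: Counter(tokens) for name, tokens in docs.items()}

    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())

    scores = {}
    for name, term_counts in counts.items():
        total = sum(term_counts.values())
        scores[name] = {
            # tf = share of the document's tokens; idf = ln(N / document frequency)
            term: (n / total) * math.log(n_docs / doc_freq[term])
            for term, n in term_counts.items()
        }
    return scores

# Hypothetical toy tokens: one "document" per SWOT question.
docs = {
    "Strengths": ["volunteers", "education", "community", "education"],
    "Weaknesses": ["volunteers", "communication", "funding"],
    "Opportunities": ["volunteers", "outreach", "youth"],
    "Threats": ["volunteers", "funding", "funding"],
}
scores = tf_idf(docs)

# Drop zero-score terms (those that occur in every question).
nonzero = {
    name: {term: s for term, s in terms.items() if s > 0}
    for name, terms in scores.items()
}
```

In this toy data, "volunteers" appears in all 4 questions, so its idf is ln(4/4) = 0 and it drops out of `nonzero`, while "education" keeps a positive score because it is specific to one question.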
N-grams are N-word phrases. They are identified by a sliding window. For example, the sentence “The cat sat down” has the following 2-word phrases (bigrams): “The cat”, “cat sat”, and “sat down”.
It has the following 3-word phrases (trigrams): “The cat sat” and “cat sat down”.
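The sliding window can be sketched in a few lines of Python. The function name `ngrams` and the whitespace tokenization are illustrative assumptions, not the analyst's actual code.

```python
def ngrams(text, n):
    """Return every run of n consecutive words in text, in order."""
    words = text.split()
    # Slide a window of width n across the word list, one word at a time.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

bigrams = ngrams("The cat sat down", 2)
trigrams = ngrams("The cat sat down", 3)
print(bigrams)   # ['The cat', 'cat sat', 'sat down']
print(trigrams)  # ['The cat sat', 'cat sat down']
```

A sentence with fewer than n words yields an empty list, which is one reason short responses contribute few or no trigrams.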
N-grams can be used as the tokens in tf-idf weighting. They often provide additional context.
The analyst generated these N-grams from the raw data, and then removed any N-gram containing a common word (a “stop word”). Any N-gram that appeared only once was removed after computing tf-idf scores, to avoid excessive data in the visualizations.
The sample size for people under 45 is small. Several categories of terms had no terms that appeared more than once. Those categories are not included in these visualizations.
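As a rough illustration of the stop-word step, the sketch below builds N-grams from raw text and discards any that contain a stop word. The stop-word list here is a tiny hypothetical one, and lowercasing before matching is an assumption; the report does not say which list or normalization the analyst used.

```python
# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}

def content_ngrams(text, n, stop_words=STOP_WORDS):
    """Return the n-grams of text that contain no stop words."""
    words = text.lower().split()
    grams = [words[i:i + n] for i in range(len(words) - n + 1)]
    # Keep an n-gram only if none of its words is a stop word.
    return [" ".join(g) for g in grams if not any(w in stop_words for w in g)]

result = content_ngrams("The volunteers love teaching in the community garden", 2)
print(result)  # ['volunteers love', 'love teaching', 'community garden']
```

Because the N-grams are built from the raw text first, a stop word removes every window it falls in, rather than letting the words on either side of it join into a phrase that never occurred.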
There is not enough data to produce trigrams for males.
We have very little data for this category. What’s available is shown.