I've been investigating text mining through the recommended book (Kwartler, 2017). One worked example processes Delta Air Lines support tweets: it removes stopwords and punctuation and lowercases the text. From the processed block of tweets, a 'Term Document Matrix' (TDM) is built. This is a table of words and how often each one appears in each document (here, each tweet). Hierarchical clustering on the TDM gives the dendrogram in Fig 1.1, which shows how certain words cluster together. The 'Height' represents the "distance" between words in the matrix: the lower down two branches join, the more alike those words are in how they appear across the tweets.
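To make the cleaning and TDM steps concrete, here is a minimal sketch. The book works in R; I've used plain Python here just to show the idea, so this is not Kwartler's code. The example tweets, the tiny stopword list and the `clean` helper are all made up for illustration.

```python
import string
from collections import Counter

# Hypothetical stand-ins for support tweets (not data from the book).
tweets = [
    "Sorry for the delay, please DM your confirmation number!",
    "Please DM your confirmation number so we can check the delay.",
    "We apologize for the delay with your bags.",
]

# A tiny illustrative stopword list; real lists are much longer.
STOPWORDS = {"for", "the", "your", "so", "we", "can", "with"}


def clean(text):
    """Lowercase, strip punctuation, split into words, drop stopwords."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [word for word in text.split() if word not in STOPWORDS]


docs = [clean(t) for t in tweets]
terms = sorted({word for doc in docs for word in doc})

# Term-document matrix: one row per term, one column per tweet,
# each cell holding how many times the term appears in that tweet.
tdm = {term: [Counter(doc)[term] for doc in docs] for term in terms}

for term, counts in tdm.items():
    print(f"{term:<14}{counts}")
```

Each row of `tdm` is the count vector for one word, and those rows are what get compared when the words are clustered.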
Figure 1.1. Dendrogram from Delta Air Lines tweets
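To see where the 'Height' values come from, here is a rough sketch of the clustering itself, again in plain Python rather than the book's R. I'm assuming Euclidean distance between word rows and complete linkage (the distance between two clusters is the largest distance between any pair of their words); the post doesn't say which settings produced the figure. The counts and the `euclidean`, `link` and `cluster` helpers are made up for illustration.

```python
import math
from itertools import combinations

# Hypothetical term-document counts (rows = terms, columns = tweets).
tdm = {
    "delay":        [1, 1, 1],
    "please":       [1, 1, 0],
    "confirmation": [1, 1, 0],
    "bags":         [0, 0, 1],
}


def euclidean(a, b):
    """Straight-line distance between two count vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def link(a, b):
    """Complete linkage (assumed): largest distance between any two terms."""
    return max(euclidean(tdm[s], tdm[t]) for s in a for t in b)


def cluster(terms):
    """Repeatedly merge the two closest clusters, recording each height."""
    clusters = [(t,) for t in terms]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: link(*pair))
        merges.append((a, b, link(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


merges = cluster(tdm)
for a, b, height in merges:
    print(f"merge {a} + {b} at height {height:.3f}")
```

Words with identical count rows merge at height 0, and a word that shows up in different tweets from everything else joins last, at the top of the tree.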
Another way to read the dendrogram is to colour its branches so that the clusters stand out. Drawing it as a circle gives a different perspective too (Fig 1.2).
Figure 1.2. Circular dendrogram from Delta Air Lines tweets
I should reiterate that whilst these dendrograms formed the basis of my understanding, they are worked examples from Ted Kwartler's book. I just thought they looked cool!
References
Kwartler, T. (2017). Text mining in practice with R. John Wiley & Sons.