library(jsonlite)
library(tidyverse)
library(tidytext)
library(wordcloud2)
library(textdata)
library(ggplot2)
library(gridExtra)
Text Analysis Project- TAME IMPALA LYRICS ACROSS THE ALBUMS
TAME IMPALA: LYRICAL PROGRESSIONS OF LOVE & PSYCHEDELICS
INTRODUCTION:
Tame Impala, which consists of Kevin Parker, is a psychedelic alternative/indie band from Australia. With work dating back to 2008, the band entered the rock scene with Parker’s debut self-titled EP “Tame Impala.” Since then, Tame Impala has been a prominent force in the music industry, known for a unique blend of psychedelic and indie rock music, as well as a distinctive rock sound. Despite this success, Tame Impala’s music has changed quite distinctly between Parker’s first two albums and his latest two. The first two albums are described by music aficionados as a rock-psychedelic landscape with heavy rock influences. The latest two albums, however, take a much deeper dive into R&B, pop, and alternative-rock themes.
Based on this, I created a few main hypotheses to see how Tame Impala’s lyrics have possibly shifted throughout time, and how these progressions may mirror Kevin Parker’s shift in music creation & focus.
Hypothesis:
In terms of the hypothesis for this project, I am predicting that:
The top 10 words overall will be heavily related to psychedelic terms, relating to themes about the mind and time.
The top 10 words in the albums labeled ‘Early’ (AKA Lonerism & Innerspeaker) will contain more words that are ‘psychedelic’ when compared to the albums labeled ‘Late’.
Conversely, the top 10 words in the albums labeled ‘Late’ (AKA Currents & The Slow Rush) will contain more words that are ‘love-related’ when compared to the albums labeled ‘Early’.
The psychedelic lexicon will show an overall decrease in later albums, and the love lexicon will show an overall increase in later albums.
Reasoning:
As for why I expect these hypotheses to hold, I believe that the shift toward an R&B, pop, and alternative-rock sound would be mirrored by a shift toward love-centered lyrics, as much of the music in those genres focuses on romantic relationships and love as its main themes, compared to rock music. Similarly, Parker curated ‘Currents’ as a breakup album, following the breakup with his long-term girlfriend Melody Prochet, who had been with him for the earlier two album releases. Consequently, I believe the albums following his breakup (Currents and The Slow Rush) will contain far fewer time-centric or psychedelic themes, which will be replaced with more stories and sentiments about love.
If you want to check out a great mini-video essay on the album ‘Currents’ as a breakup album, check it out: here!
Load Libraries
We start off this project by loading each of the libraries necessary to visualize and analyze our data for this project.
Import Data Set
Next, we import the data from a JSON file, including joined sets of both lyrics and albums.
ti <- fromJSON('Lyrics_TameImpala.json', flatten = TRUE)
as.data.frame(ti$songs$album.name) -> df2
as.data.frame(ti$songs$lyrics) -> df1
df1 %>%
  cross_join(df2) -> joined
colnames(joined)[1] <- "lyrics"
colnames(joined)[2] <- "album"
joined %>%
  unnest_tokens(word, lyrics) -> joined2
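One thing worth checking at this step: cross_join() pairs every row of its first input with every row of its second, so each lyric ends up matched with every album name rather than only its own. If the goal is one row per song, a row-wise pairing may be closer to the intent. The sketch below uses a made-up three-song data frame (the `songs` object and its contents are hypothetical stand-ins for ti$songs) to show the difference in row counts.

```r
library(dplyr)

# Hypothetical stand-in for ti$songs: one row per song
songs <- data.frame(
  lyrics = c("first song words", "second song words", "third song words"),
  album.name = c("Album A", "Album B", "Album A")
)

# cross_join() returns every lyric paired with every album name: 3 x 3 = 9 rows
crossed <- cross_join(
  data.frame(lyrics = songs$lyrics),
  data.frame(album = songs$album.name)
)

# Building a single data frame keeps each lyric with its own album: 3 rows
paired <- data.frame(lyrics = songs$lyrics, album = songs$album.name)

nrow(crossed)
nrow(paired)
```

Assuming ti$songs holds one row per song, building `joined` the second way would keep each lyric tied to its own album, and the word counts downstream would change accordingly.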
Filter the Data Set by Main Albums
Here we filter the data set based on the four main released albums that Tame Impala has published. Remixes, B-sides, Limited Editions, and Extended Editions were omitted from this filtering as it would duplicate many of the lyrics on songs that appear more than once.
joined2 %>%
  filter(album %in% c("Currents", "Lonerism (iTunes Edition)", "Innerspeaker", "The Slow Rush (Limited Edition Vinyl)")) -> joined3
Group the Albums by Early and Late
To compare the most common words in the first two albums with those in the most recent two, we need to mark the era in our data. We use the mutate function to add a column named “Period” that records the era each lyric comes from, and the case_when function to assign a value for each album: “Early” for lyrics from Lonerism or Innerspeaker, and “Late” for lyrics from Currents or The Slow Rush.
joined3 %>%
  mutate(
    "Period" =
      case_when(
        album %in% c('Lonerism (iTunes Edition)', 'Innerspeaker') ~ "Early",
        album %in% c('Currents', 'The Slow Rush (Limited Edition Vinyl)') ~ "Late",
        TRUE ~ NA
      )
  ) -> joined4
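Because case_when matches album names exactly, a label that is off by even one character falls through to NA without any warning. A quick check like the one below can confirm that every album received a period. This is a minimal sketch on made-up rows (the `toy` data frame and the "Live Versions" label are hypothetical), not the real data.

```r
library(dplyr)

# Hypothetical album labels standing in for the album column of joined3
toy <- data.frame(
  album = c("Currents", "Innerspeaker",
            "The Slow Rush (Limited Edition Vinyl)", "Live Versions")
)

labelled <- toy %>%
  mutate(
    Period = case_when(
      album %in% c("Lonerism (iTunes Edition)", "Innerspeaker") ~ "Early",
      album %in% c("Currents", "The Slow Rush (Limited Edition Vinyl)") ~ "Late",
      TRUE ~ NA_character_  # anything unmatched is left missing
    )
  )

# One row per album/period pair; an NA in Period flags an unmatched label
count(labelled, album, Period)

# Albums that did not receive a period
filter(labelled, is.na(Period))
```

Running the same count(album, Period) on joined4 would show at a glance whether all four albums landed in the intended era.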
Top Words Over All Albums
Before we look at the top words in the early and late albums, I thought it would be worthwhile to investigate the top words across all albums. To do so, we filter out any stop words, as well as other words that are not actually lyrics (e.g., chorus, verse, etc.).
joined4 %>%
  anti_join(stop_words) %>%
  filter(!word %in% c('lyrics', 'chorus', 'verse', 'intro', 'outro', 'bridge',
                      'ah', 'ahhh',
                      '1','2','3','4', 'contributors')) %>%
  count(word, sort = TRUE) %>%
  head(10)
Joining with `by = join_by(word)`
     word    n
1    time 8058
2   gotta 7650
3    feel 6120
4    love 5916
5     day 3570
6    mind 3009
7   close 2958
8   gonna 2856
9    life 2754
10 closer 2652
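Since the same stop-word and non-lyric filter is reused for several counts in this project, it could be wrapped in a small helper. The sketch below is one possible shape for that helper: `top_words`, `non_lyrics`, and the `toy` data frame are hypothetical names and made-up data, not part of the original analysis. Note that unnest_tokens lowercases tokens by default, so the filter terms are written in lowercase.

```r
library(dplyr)
library(tidytext)

# Tokens that are page furniture rather than lyrics (all lowercase)
non_lyrics <- c("lyrics", "chorus", "verse", "intro", "outro", "bridge",
                "ah", "ahhh", "1", "2", "3", "4", "contributors")

# Hypothetical helper: top n words, optionally restricted to one period
top_words <- function(data, period = NULL, n = 10) {
  if (!is.null(period)) {
    data <- filter(data, .data$Period %in% period)
  }
  data %>%
    anti_join(stop_words, by = "word") %>%   # drop common stop words
    filter(!word %in% non_lyrics) %>%        # drop section labels and the like
    count(word, sort = TRUE) %>%
    head(n)
}

# Made-up tokens standing in for joined4
toy <- data.frame(
  word   = c("time", "time", "the", "chorus", "mind", "time", "mind"),
  Period = c("Early", "Early", "Early", "Late", "Late", "Late", "Early")
)

top_words(toy)
top_words(toy, period = "Early", n = 1)
```

With a helper like this, the overall, early, and late tables would differ only in the `period` argument.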
WordCloud of Top Words Over All Albums
joined4 %>%
  anti_join(stop_words) %>%
  filter(!word %in% c('lyrics', 'chorus', 'verse', 'intro', 'outro', 'bridge',
                      'ah', 'ahhh',
                      '1','2','3','4', 'contributors', 'likeembed', 'pre', 'ooh',
                      'instrumental', 'yeah')) %>%
  count(word, sort = TRUE) %>%
  head(30) %>%
  wordcloud2(color='random-light')
Joining with `by = join_by(word)`
As we can see, the top word across all songs is ‘time’, which relates back to the themes of the past and consciousness. ‘Gotta’ is a close second, followed by words like ‘feel’, ‘love’, and ‘mind’. Based on this visualization, we can expect many of the themes in Tame Impala’s music to center on time and love.
This somewhat supports our hypothesis that the main themes would be surrounding psychedelics and time, with words like ‘mind’, ‘time’, and ‘day’, all making it on the list.
Top Words Over First Two Albums
Next, we look at the top words over the first two albums, Lonerism and Innerspeaker. To do so, we restrict the period to “Early” and again filter out stop words and non-lyric tokens. We use the head function to keep the top ten words and store them in an object called earlyalbums.
joined4 %>%
  anti_join(stop_words) %>%
  filter(!word %in% c('lyrics', 'chorus', 'verse', 'intro', 'outro', 'bridge',
                      'ah', 'ahhh',
                      '1','2','3','4', 'contributors') & Period %in% "Early") %>%
  count(word, sort = TRUE) %>%
  head(10) -> earlyalbums
Joining with `by = join_by(word)`
Graph Top Words for First Two Albums
We assign this plot to an object named ‘left’ so we can later combine the graphs into one visualization and compare them more easily.
ggplot(earlyalbums, aes(x = reorder(word, -n), y = n)) +
geom_bar(stat = "identity", fill = "magenta") +
labs(title = "Top 10 Words in First Two Albums",
x = "Word",
y = "Frequency") +
theme_classic() -> left
Top Words Over Last Two Albums
Next, we look at the top words over the last two albums, Currents and The Slow Rush. To do so, we restrict the period to “Late” and, again, filter out stop words and non-lyric tokens. We use the head function to keep the top ten words and store them in an object called latealbums.
joined4 %>%
  anti_join(stop_words) %>%
  filter(!word %in% c('lyrics', 'chorus', 'verse', 'intro', 'outro', 'bridge',
                      'ah', 'ahhh',
                      '1','2','3','4', 'contributors') & Period %in% "Late") %>%
  count(word, sort = TRUE) %>%
  head(10) -> latealbums
Joining with `by = join_by(word)`
Graph Top Words for Last Two Albums
We assign this visualization to ‘right’ so we can later view the two plots side by side.
ggplot(latealbums, aes(x = reorder(word, -n), y = n)) +
geom_bar(stat = "identity", fill = "purple") +
labs(title = "Top 10 Words in Last Two Albums",
x = "Word",
y = "Frequency") +
scale_y_continuous(name="Frequency", limits=c(0, 4000))+
theme_classic() -> right
Let’s Compare!
So, what do we see side-by-side?
grid.arrange(left, right, ncol = 2)
It seems that the rankings of words don’t change at all between the two eras. Surprisingly, the words appear in the same order in both charts, just with lower frequencies in the later albums. This suggests that the differences between these two album eras might not be as drastic as we initially thought.
Psychedelic & Love Lexicons:
Moving on, we seek to investigate how love lyrics and psychedelic lyrics compare.
joined4 %>%
  anti_join(stop_words) %>%
  filter(!word %in% c('lyrics', 'chorus', 'verse', 'intro', 'outro', 'bridge',
                      'ah', 'ahhh',
                      '1','2','3','4', 'contributors') & Period %in% "Early") %>%
  count(word, sort = TRUE) -> allearlyalbums
Joining with `by = join_by(word)`
joined4 %>%
  anti_join(stop_words) %>%
  filter(!word %in% c('lyrics', 'chorus', 'verse', 'intro', 'outro', 'bridge',
                      'ah', 'ahhh',
                      '1','2','3','4', 'contributors') & Period %in% "Late") %>%
  count(word, sort = TRUE) -> alllatealbums
Joining with `by = join_by(word)`
Create a Lexicon of ‘Psychedelic’ Words:
We start by building a lexicon of ‘psychedelic words’ based on frequent words that are associated with this genre of music. We assign this to an object called psychedelic_words so we can later use it to pull out the frequencies in our observations.
psychedelic_words <- c('mind', 'conscience', 'mental', 'trance', 'vision', 'trip',
                       'lucid', 'transcend', 'surreal', 'dream', 'psychedelic',
                       'dreams')
Show Overall Counts of Psychedelic Words Over All Songs:
Next, we want to see what the numbers show for psychedelic words over all songs and albums. We use the wordcloud2 function to visualize this.
joined4 %>%
  filter(word %in% psychedelic_words) %>%
  count(word, sort = TRUE) %>%
  wordcloud2(color='random-light')
The visualization makes it overwhelmingly clear that the most frequently used word from the ‘psychedelic lexicon’ in Tame Impala’s lyrics is ‘mind’, overpowering nearly all other words. We do see some other standouts relating to dreams and visions, but none is mentioned nearly as often as ‘mind’. This can be attributed to the several songs with mind in the title (e.g., “Why Won’t You Make Up Your Mind?” or “Mind Mischief”), as well as several lyrics referring to mental states and awareness.
Show Psychedelic Lyrics by Early Albums
Next, we want to investigate how the psychedelic lexicon differs between early and late albums. To do so, we keep only the words in the psychedelic lexicon, separately for each period, and store the counts in psychearly and psychlate.
joined4 %>%
  filter(word %in% psychedelic_words & Period %in% "Early") %>%
  count(word, sort = TRUE) -> psychearly
Show Psychedelic Lyrics by Late Albums
joined4 %>%
  filter(word %in% psychedelic_words & Period %in% "Late") %>%
  count(word, sort = TRUE) -> psychlate
Graph Both Side-By-Side
Finally, we want to graph both of these visualizations side by side to see what trend psychedelic lyrics have made over Tame Impala’s discography.
ggplot(psychearly, aes(word, n)) +
geom_bar(stat = "identity", fill = "purple") +
labs(title = "Early Albums",
x = "Word",
y = "Frequency") +
theme_classic() -> left2
ggplot(psychlate, aes(word, n)) +
geom_bar(stat = "identity", fill = "purple") +
labs(title = "Late Albums",
x = "Word",
y = "Frequency") +
scale_y_continuous(name="Frequency", limits=c(0, 1500))+
theme_classic() -> right2
grid.arrange(left2, right2, ncol = 2)
Splitting this by early and late albums, we again see a very similar trend in word frequency. ‘Mind’ accounts for a majority of the psychedelic lyrics in both the early and late albums, with the other words following a very similar pattern in each era. Although the two graphs look quite similar, it is worth noting that there was an overall decrease in lyrics from our psychedelic lexicon when comparing the early albums to the later albums. This supports our hypothesis that psychedelic lyrics would be more prevalent in the earlier Tame Impala albums than in the later ones.
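When two plots are placed side by side with grid.arrange, each keeps its own y-axis unless the limits are set by hand. An alternative is to stack the two count tables and let ggplot2 facet them, since facet_wrap uses a shared y-axis by default. The sketch below uses made-up counts (`psychearly_toy` and `psychlate_toy` are hypothetical stand-ins for psychearly and psychlate).

```r
library(dplyr)
library(ggplot2)

# Made-up counts standing in for psychearly and psychlate
psychearly_toy <- data.frame(word = c("mind", "dream"), n = c(10L, 4L))
psychlate_toy  <- data.frame(word = c("mind", "dream"), n = c(6L, 2L))

# Stack both tables and record which era each row came from
both <- bind_rows(list(Early = psychearly_toy, Late = psychlate_toy),
                  .id = "Period")

# One panel per era; the y-axis is shared because facet scales are fixed by default
p <- ggplot(both, aes(x = reorder(word, -n), y = n)) +
  geom_col(fill = "purple") +
  facet_wrap(~ Period) +
  labs(x = "Word", y = "Frequency") +
  theme_classic()

p
```

A shared axis makes differences in bar height between the eras directly comparable without matching limits manually.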
Create a Lexicon of ‘Love’ Words:
Next, we create a lexicon of ‘love words’. These include basic sentiments regarding relationships, love, and romantic content.
love_words <- c('love', 'lover', 'heart', 'she', 'kiss', 'girl', 'babe', 'woman',
                'her')
joined4 %>%
  filter(word %in% love_words) %>%
  count(word, sort = TRUE) %>%
  wordcloud2(color='random-light')
The wordcloud of overall ‘love’ lyrics indicates that a majority of the romantic sentiments in Tame Impala’s music talk about love directly, with direct descriptions of women close behind.
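Both lexicons can also be counted in a single pass by storing them in one lookup table with a theme column and joining it to the tokens. This is a minimal sketch with shortened lexicons and made-up tokens (`lexicon`, `toy`, and `theme_counts` are hypothetical names), not the full word lists used in this project.

```r
library(dplyr)

# Shortened, illustrative lexicons combined into one lookup table
lexicon <- bind_rows(
  data.frame(word = c("mind", "dream", "trip"), theme = "psychedelic"),
  data.frame(word = c("love", "heart", "her"),  theme = "love")
)

# Made-up tokens standing in for joined4
toy <- data.frame(
  word   = c("mind", "love", "love", "time", "dream", "her"),
  Period = c("Early", "Early", "Late", "Late", "Late", "Early")
)

# Keep only lexicon words, then count by era and theme
theme_counts <- toy %>%
  inner_join(lexicon, by = "word") %>%
  count(Period, theme)

theme_counts
```

Words outside both lexicons (here, "time") drop out at the join, so the result has one row per era and theme.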
Show Love Lyrics by Early Albums
Now, we separate the data by early and late albums in order to investigate if there is a difference in the frequency of ‘love’ lyrics between the two eras of music.
joined4 %>%
  filter(word %in% love_words & Period %in% "Early") %>%
  count(word, sort = TRUE)
   word    n
1  love 2900
2   her 1275
3   she 1150
4 heart  550
5 woman  400
6  girl  350
7 lover  200
8  babe  125
Show Love Lyrics by Late Albums
joined4 %>%
  filter(word %in% love_words & Period %in% "Late") %>%
  count(word, sort = TRUE)
   word    n
1  love 1508
2   her  663
3   she  598
4 heart  286
5 woman  208
6  girl  182
7 lover  104
8  babe   65
joined4 %>%
  filter(word %in% love_words & Period %in% "Early") %>%
  count(word, sort = TRUE) -> loveearly
joined4 %>%
  filter(word %in% love_words & Period %in% "Late") %>%
  count(word, sort = TRUE) -> lovelate
ggplot(loveearly, aes(word, n)) +
geom_bar(stat = "identity", fill = "purple") +
labs(title = "Early Albums",
x = "Word",
y = "Frequency") +
theme_classic() -> left3
ggplot(lovelate, aes(word, n)) +
geom_bar(stat = "identity", fill = "purple") +
labs(title = "Late Albums",
x = "Word",
y = "Frequency") +
scale_y_continuous(name="Frequency", limits=c(0, 3000))+
theme_classic() -> right3
grid.arrange(left3, right3, ncol = 2)
As we can see, there’s a much more distinct difference in these numbers. Although ‘love’ is easily the standout regardless of era, we see a much greater prevalence of lyrics from the love lexicon in the early albums, with far fewer in the later albums. This is essentially backwards from what our original hypothesis predicted: the frequency of love words decreased from the early albums to the later albums, just as the psychedelic words did.
Conclusions:
In a sad turn of events, nearly all of our hypotheses were disproven. The exceptions were the prediction that the top 10 words overall would relate to psychedelic themes about the mind and time, and the prediction that there would be more psychedelic terms in the earlier albums. Our hypothesis that love lyrics would be more prevalent in later albums turned out to be exactly backwards: they had a greater presence in the earlier albums!
In a roundabout way, we were still able to see how these albums differ, just not as expected: both love lyrics and psychedelic lyrics showed a decrease in the later albums compared to the earlier ones. Although some of these are not the results we initially expected, they do prompt some questions about why this might be. Do the later albums contain more metaphorical references to love? Do the earlier albums contain more instrumentals? Are there more songs in total in the earlier albums compared to the later ones (or vice versa)? These are all valuable questions in understanding why our results played out this way. Regardless, we found some interesting patterns in the work of Kevin Parker, and in the role of time, the mind, and love in his lyrical content.
Limitations:
In concluding this project, I thought it would be worthwhile to recognize several limitations that may have affected these results, and could possibly explain why we don’t see what we expected to see!
Firstly, the lexicons for both psychedelic words and love words were built simply from words that I associated with psychedelic ideas and love ideas. It is important to note that not every use of these words refers to psychedelic or love themes (e.g., ‘her’ and ‘mind’ are not unique to love or psychedelic themes and can serve other purposes in a song). Without the context of each song, it is difficult to know the true frequency of each lexicon across Tame Impala’s discography. Similarly, there could be more lyrics about love and/or psychedelics that are not in the lexicons I created, meaning they were wrongly omitted from the results.
Next, we should recognize how much data from the original JSON file was omitted, in order to better understand what could be limiting about this lyric analysis. In subsetting this data set, we eliminated many other albums that could include repeat songs or performances, so there was a lower risk of duplicate lyrics. These omissions included live performances, Limited Editions, remasters, and remixes/B-sides. This may have skewed our analysis, and therefore our conclusions, as it was not a true investigation of the entire discography.
Finally, another limitation of note is the difference in the total number of words between eras. The results we see may not be entirely indicative of the lyrical content of the music, but possibly of a greater amount of lyrics overall in some albums. In other words, these results may hold simply because there were more words overall in the earlier albums compared to the later ones. Similarly, there is a chance that the songs were just longer in one era, meaning that the sentiments of the music itself may not differ and one era simply has more words in general.
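One way to account for differing amounts of lyrics is to compare the share of lexicon words within each era rather than the raw counts. The sketch below illustrates the idea on made-up tokens (the `toy` data frame, the shortened `lexicon`, and the `shares` result are hypothetical), with a larger early era so that raw counts and shares can be compared.

```r
library(dplyr)

# Shortened, illustrative love lexicon
lexicon <- c("love", "heart")

# Made-up tokens: six early words, four late words
toy <- data.frame(
  word   = c("love", "heart", "love", "time", "mind", "day",
             "love", "time", "time", "mind"),
  Period = c(rep("Early", 6), rep("Late", 4))
)

# Share of lexicon words per era = lexicon hits / total words in that era
shares <- toy %>%
  group_by(Period) %>%
  summarise(
    total = n(),
    hits  = sum(word %in% lexicon),
    share = hits / total
  )

shares
```

In this toy example the early era has 3 hits out of 6 words (a share of 0.5) and the late era has 1 hit out of 4 (a share of 0.25), so the comparison no longer depends on how many words each era contains.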
Thanks for reading! :)