

International Journal of Innovative Technology and Exploring Engineering (IJITEE)
ISSN: 2278-3075, Volume-8, Issue-2S, December 2018

Deep Rapping: Character Level Neural Models for Automated Rap Lyrics Composition

Aaron Carl T. Fernandez, Ken Jon M. Tarnate, Madhavi Devaraj
Mapua University, Manila, Philippines

Revised Manuscript Received on December 28, 2018.

Abstract: “Dope”, “Twerk”, “YOLO”: these are just some of the words that originated from rap music and made it into the Oxford dictionary. Rap lyrics break the traditional structure of English, making use of shortened and invented words to create rhythmic lines and to inject informality, humor, and attitude into the music. In this paper, we approach this domain from a computational perspective by implementing deep learning models that can forge rap lyrics through unsupervised character prediction. Our work employs recurrent neural networks for the task at hand and shows that they can emulate human creativity in rap lyrics composition, based on qualitative analysis, rhyme density score, and a Turing test performed on computer science students.

Keywords: Gated Recurrent Unit; Long Short-Term Memory; Natural Language Generation; Recurrent Neural Networks.

I. INTRODUCTION

Rap lyrics are stigmatized as offensive, profane, and inappropriate, heavily laced with words that belong to the semantic fields of sex, drugs, and discrimination [1]. To justify our inclination toward this domain for scholarly reasons, consider that William Shakespeare, who has been credited by the Oxford English Dictionary with coining up to 2,000 words, is out-worded by many modern-day rappers such as Aesop Rock, who has 7,392 unique words under his belt compared to the celebrated poet's 5,170 [2]. This suggests that rap lyrics carry a larger vocabulary than canonical English literature and opens the question: "How can we build an artificial intelligence that can come up with a vocabulary of such magnitude?" The question led us to deep learning, a subset of artificial intelligence that can generate music, text, and motion capture [3] using multi-layered artificial neural networks. Artificial neural networks, specifically recurrent neural networks, have an internal memory that maintains previously calculated results, allowing information to persist. There is an abundance of applications in the current literature, both of plain recurrent networks and of their successors, Long Short-Term Memory and the Gated Recurrent Unit. This paper investigates the automatic composition of rap lyrics with different recurrent neural network architectures and assesses their quality through cross-entropy loss, rhyme density score, and human evaluation. Rhyme is considered an important characteristic of rap lyrics [4], but in this study we put more weight on how closely the machines emulate human writing.

II. LITERATURE REVIEW

This work was inspired by DeepBeat [5], an online rap lyrics generator built on the RankSVM algorithm and multi-layered artificial neural networks. It is a prediction model that assembles rap lyrics line by line using intact lines from existing rap songs. The results were promising, showing that it can outperform top human rappers in terms of line length and rhyme frequency by 21%. We differentiate our work by developing a model that does not merely intertwine existing rap lines to generate lyrics, but instead builds them at the character level, aiming to resemble human performance as closely as possible.

Syntactic text generation can be carried out at the sentence, word, or character level. The last is exemplified in the generative language model of Sutskever et al. [6], wherein the authors trained a recurrent neural network on the character sequences of Wikipedia, the New York Times, and machine learning papers from the Conference on Neural Information Processing Systems and the Journal of Machine Learning Research. It was considered the largest implementation of a recurrent neural network at that time, training their model of 500 hidden layers with 1,500 units each for five days on a parallel system of 8 high-end GPUs with 4GB RAM each and data amounting to 300MB. Their work showed that a large vocabulary of words, grammatical rules, and punctuation can be learned at the character level. This was further bolstered by Graves [3], who achieved the same success on relatively smaller data and with a different type of recurrent neural network, the "Long Short-Term Memory" network or LSTM [7], which amends its predecessor's instability when generating sequences. His work showed that character-level LSTMs can outperform word-level LSTMs on discrete sequence generation and argued that predicting one character at a time allows more interesting text generation, as the model is capable of inventing novel words and strings [3]. Writing rap lyrics with LSTM networks had already been explored by Potash et al. [8], whose objective was to generate rap lyrics in the style of a given rapper yet not identical to his existing lyrics, emulating the task of ghostwriting. They compared their work with the model of Barbieri et al. [9], which employed constrained Markov processes on the same task.



To quantify the critical aspect of their experiment, producing lyrics that are similar yet different, they evaluated their models by computing the cosine similarity between existing and generated lyrics using the "Inverse Document Frequency" weighting, as well as by computing the "Rhyme Density" score. Rhyme density is the total number of rhymed syllables over the total number of syllables. It was formulated by Hirjee et al. [10], who defined different methods of racking up potential rhymes based on the phonetic frequencies in rap lyrics. It is worth noting that it has been applied not only in empirical works on rap music [5, 8, 11] but also in lyrical analysis [12] and computational poetry evaluation [13].

While those authors were successful in showing how LSTMs outperform their baseline Markov model in producing rap lyrics with the same rhyming style as the target artists, a more recent recurrent unit, the "Gated Recurrent Unit" [14], has been demonstrated to converge faster and perform better than LSTMs on modelling tasks such as polyphonic music and speech signal generation [15], although those results were noted to be preliminary rather than conclusive. Prospective experiments on these contemporary gated units are therefore a worthwhile supplement to the current literature.

III. METHODOLOGY

3.1. Character Level Language Model

Generating text one character at a time often yields worse results than word-level language models [3]. The latter can make decisions at a coarser granularity and have a lower probability of producing spelling errors, which the former suffer from. However, rap lyrics make heavy use of colloquial vocabulary and slang from the subculture [16], such as "fam", "shawty", "gangsta", "po-po", "OG", etc. Moreover, real-world rap lyrics available on the internet are usually crowdsourced, making them susceptible to inconsistencies, typographical errors, and spelling mistakes. This can significantly increase the size of the vocabulary by introducing several versions of the same word [17], resulting in an extremely large number of output nodes prone to floating-point errors and underflow that could severely deteriorate the quality of the predictions. In addition, the encoding of input words in a word-level language model may not capture the similarity of the written forms of words, which can result in poor representation of infrequent words in the training data [17]. Hence, we opt to model our rap lyrics at the level of characters to overcome these limitations.

Our language model first finds some good default seed strings in the training corpus. This is done heuristically by splitting the data per line, creating a random sample of 200 unique lines, and keeping only the top quartile of this sample by line length, on the view that longer lines make more decent seed strings. Then, each character in the training corpus is tokenized, preserving the original letter case and sorting the tokens in descending order of frequency. The seed strings and character tokens are both preserved as meta-models to be used during rap lyrics generation. Next, the training data is converted into a vector of integers using the character tokens. The resulting data vector is reformatted into input and target sequences, which are what will be fed into the recurrent neural network. The target sequences are offset by one time-step ahead of the input sequences to force the network to make predictions one character at a time. The reformatting starts by taking strips of the input and target data vectors at an interval of 25 characters and cutting the resulting strips into 50-character sequences, after which the resulting sequences are stacked together. The data is then reformatted again so that the first sequence in the nth batch picks up exactly where the corresponding sequence in the (n-1)th batch left off, because a recurrent neural network cell state does not reset between batches in a "stateful" model. Finally, the target data is given an extra axis to work with the sparse categorical cross-entropy loss function, which is what will be used to evaluate the training of the recurrent neural network.
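As an illustration only, the preparation steps above can be sketched in Python/NumPy as follows. The function names, the batch size of 64, and the use of non-overlapping windows (rather than the 25-character stride described above) are illustrative assumptions, not the exact implementation used here:

```python
import random
from collections import Counter
import numpy as np

def default_seed_strings(corpus_text, sample_size=200):
    # Heuristic seed selection: sample unique lines, keep the top quartile by length.
    lines = list(set(corpus_text.splitlines()))
    sample = random.sample(lines, min(sample_size, len(lines)))
    sample.sort(key=len, reverse=True)
    return sample[: max(1, len(sample) // 4)]

def build_char_vocab(corpus_text):
    # Case-preserving character tokens, sorted by descending frequency.
    counts = Counter(corpus_text)
    id_to_char = [c for c, _ in counts.most_common()]
    char_to_id = {c: i for i, c in enumerate(id_to_char)}
    return id_to_char, char_to_id

def make_stateful_batches(ids, seq_len=50, batch_size=64):
    # Arrange an integer-encoded character stream into (input, target) rows such
    # that row b of batch t continues row b of batch t-1, as required when the
    # RNN cell state is carried over between batches ("stateful" training).
    inputs, targets = ids[:-1], ids[1:]            # targets lead the inputs by one step
    n = (len(inputs) // (batch_size * seq_len)) * batch_size * seq_len
    inputs, targets = inputs[:n], targets[:n]
    # Each of the batch_size "lanes" is one long contiguous slice of the corpus...
    lanes_x = inputs.reshape(batch_size, -1)
    lanes_y = targets.reshape(batch_size, -1)
    # ...cut into seq_len windows; window t of lane b continues window t-1 of lane b.
    x = lanes_x.reshape(batch_size, -1, seq_len).transpose(1, 0, 2).reshape(-1, seq_len)
    y = lanes_y.reshape(batch_size, -1, seq_len).transpose(1, 0, 2).reshape(-1, seq_len)
    # Extra trailing axis on the targets for the sparse categorical cross-entropy loss.
    return x, y[..., np.newaxis]

# Example usage:
# corpus_text = open("lyrics.txt", encoding="utf-8").read()
# seeds = default_seed_strings(corpus_text)
# id_to_char, char_to_id = build_char_vocab(corpus_text)
# ids = np.array([char_to_id[c] for c in corpus_text], dtype=np.int64)
# x, y = make_stateful_batches(ids)
```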
Once the neural network is fully trained, an initial seed is picked at random and transformed into a vector of integers using the meta-model preserved earlier. Succeeding character tokens are then predicted from the network's output distribution and appended to the vector one at a time. The character token prediction employs a temperature parameter, whose value is used to divide the natural logarithm of the probability array in order to smoothen or sharpen it. The intuition is that a lower temperature dictates a more conservative rap lyrics generation, while a higher value is more creative and diverse at the cost of more mistakes. The prediction iterates for the length of the rap lyrics to be generated, which is specified by the user. Once finished, the resulting integer vector is de-tokenized, forging a new set of rap lyrics.
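A minimal sketch of this sampling loop is given below, assuming a trained Keras-style model that returns a next-character distribution for every position of its input; the function names and the default lengths are illustrative, not the exact implementation:

```python
import numpy as np

def sample_with_temperature(probs, temperature=0.5):
    # Divide the log-probabilities by the temperature, renormalize, then sample:
    # low temperatures give conservative choices, high temperatures diverse ones.
    logp = np.log(np.clip(probs, 1e-12, 1.0)) / temperature
    p = np.exp(logp - logp.max())
    p /= p.sum()
    return int(np.random.choice(len(p), p=p))

def generate(model, seed_text, char_to_id, id_to_char, length=400, temperature=0.5):
    # One character at a time: feed the running token vector, sample the next
    # character id, append it, and finally de-tokenize back to text.
    ids = [char_to_id[c] for c in seed_text]
    for _ in range(length):
        probs = model.predict(np.array([ids]), verbose=0)[0, -1]  # next-char distribution
        ids.append(sample_with_temperature(probs, temperature))
    return "".join(id_to_char[i] for i in ids)
```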
3.2. Evaluation Metrics

Sparse Categorical Cross Entropy

Guided by Shannon's source coding theorem [18], the minimum space into which the information in song lyrics can be compressed is given by the entropy of the distribution over the space of lyrics. This cannot be computed directly. As an alternative, it can be approximated by using the language model to predict a set of lyrics it has not seen before; this approximation is the cross-entropy measure [19]. Specifically, we use the sparse categorical cross-entropy, a multiclass logarithmic loss that measures the dissimilarity between the target distribution y and the predicted distribution ỹ, formulated as:

L_{\mathrm{cross\text{-}entropy}}(\tilde{y}, y) = -\sum_{i} y_i \log \tilde{y}_i \qquad (1)
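When the targets are integer character ids, i.e. the target distribution is one-hot, Eq. (1) averaged over positions reduces to the mean negative log-probability assigned to the correct character, as in this illustrative NumPy sketch (not taken from the actual implementation):

```python
import numpy as np

def sparse_categorical_cross_entropy(target_ids, predicted_probs):
    # target_ids: shape (N,) integer character ids.
    # predicted_probs: shape (N, vocab_size) predicted next-character distributions.
    # Mean negative log-likelihood of the correct character at each position,
    # i.e. Eq. (1) with one-hot targets, averaged over positions.
    picked = predicted_probs[np.arange(len(target_ids)), target_ids]
    return float(-np.mean(np.log(np.clip(picked, 1e-12, 1.0))))
```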
Rhyme Density

The rhyme density measure was introduced in [10] as a quantitative measure of the technical quality of rap lyrics from a rhyming perspective. It is formulated as the average length of the longest rhyme per word [10] and is the same metric used to evaluate the rap lyrics generation models of [5] and [8]. A tool called the "Rhyme Analyzer" [20], which calculates statistical features of the rhymes detected in the input lyrics, has been developed by the same authors as [10] and is available for download at https://sourceforge.net/projects/rhymeanalyzer/.
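For reference, the two phrasings of the metric above translate directly into the following toy computations; the per-word rhyme lengths and syllable counts themselves must come from a phonetic rhyme detector such as the Rhyme Analyzer, not from this sketch:

```python
def rhyme_density_from_rhyme_lengths(longest_rhyme_per_word):
    # Average length, in syllables, of the longest rhyme each word takes part in.
    if not longest_rhyme_per_word:
        return 0.0
    return sum(longest_rhyme_per_word) / len(longest_rhyme_per_word)

def rhyme_density_from_syllables(rhymed_syllables, total_syllables):
    # Total number of rhymed syllables over the total number of syllables.
    return rhymed_syllables / total_syllables if total_syllables else 0.0
```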


Turing Test

Finding human evaluators who are well versed in the technicalities of rap lyrics can be challenging [5, 21]. However, following Alan Turing's paper "Computing Machinery and Intelligence", wherein a machine is proposed to be tested on how closely it can resemble intelligent human behavior [22], we can obtain an objective judgement of the quality of our rap lyrics by asking human evaluators, without further context, whether a given lyric was written by a human or a machine. Since the primary objective of this experiment is to examine whether a recurrent neural network can produce rap lyrics at the character level that are admissible to humans, we presented our lyrics to 30 computer science students of Mapua University, of whom 2 were at the doctorate level, 7 were at the master's level, and 21 were undergraduates. All participants were informed that the test was for a machine-learning project, to obtain a more conscientious evaluation. We also asked the participants whether they are familiar with rap lyrics, to which 13 responded "yes" and 17 responded "no". For this test, we sampled 10 6-bar verses for each of the RNN models, on both datasets, and for temperatures 0.2, 0.5, and 1.0, to test samples of low, medium, and high diversity respectively. Of the 10 samples generated for each criterion set, we selected the best sample, in terms of fewest spelling mistakes and subjective quality, to be included in the test. The participants were given three sets of tests, corresponding to the low-, medium-, and high-diversity sets of generated lyrics. Each set has eight 6-bar rap lyrics, of which six were generated by our models, one was generated by deepbeat.org, and one was a human-written rap lyric taken from ohhla.com. The participants were asked to make a binary decision (human or machine) about the source of each rap lyric, which scores 1 point for each lyric perceived as "human" by a participant.

IV. EXPERIMENTS

4.1. Data

We collected 4,799 rap lyrics from The Original Hip-Hop (Rap) Lyrics Archive (http://ohhla.com/), the same lyrics source as [8]. The corpus was cleaned by stripping all HTML tags, metadata (artist name, song name, album name, INTRO, CHORUS, Repeat 2x, 4x, etc.), and whitespace between lines. We also manually normalized the data to filter out any malformed metadata and non-English songs, which yielded 696,787 lyric lines amounting to a 27.1MB file. We refer to this corpus as Dataset_large.

We hypothesized that the vast lyrical diversity among the collected rap lyrics may degrade the quality of the generated rap lyrics in terms of rhyme density. We were also interested in examining the performance of the recurrent neural network architectures discussed above on a smaller dataset, and in seeing the disparity compared to a larger dataset. Hence, we created another corpus concentrated on the songs of three rap artists, namely Notorious B.I.G., Fabolous, and Lil' Wayne, as these are the rappers identified in [8, 26] who attained the best rhyme scores. This brought in 42,918 lyric lines in a 1.8MB file, which we refer to as Dataset_small.
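A cleaning pass of the kind described above can be sketched as follows; the regular expressions are illustrative guesses at OHHLA-style metadata lines rather than the exact rules applied to the corpus:

```python
import re

METADATA_PATTERNS = [
    r"^(artist|song|album|typed by)\s*:.*$",                      # assumed OHHLA-style headers
    r"^\s*\[?\s*(intro|chorus|verse\s*\w*|hook|outro)\s*\]?\s*:?\s*$",
    r"^\s*\(?\s*repeat\s+\d+x\s*\)?\s*$",
]

def clean_lyrics(raw_html):
    # Strip HTML tags, obvious metadata lines, and blank lines between verses.
    text = re.sub(r"<[^>]+>", " ", raw_html)
    lines = []
    for line in text.splitlines():
        if any(re.match(p, line.strip(), re.IGNORECASE) for p in METADATA_PATTERNS):
            continue
        if line.strip():
            lines.append(line.rstrip())
    return "\n".join(lines)
```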
4.2. Training Hyperparameters

We tried different hyper-parameter settings on a plain recurrent neural network using Dataset_large, and found that three hidden layers with 512 neurons each, optimized by ADAM [23] with a learning rate of 0.0001, may work best for the task at hand. We applied the same hyper-parameter settings to all the recurrent neural network models and to both Dataset_large and Dataset_small to get a fair judgement of their performance in this experiment. We trained all models on a 12GB Tesla K80 for 400 epochs, with an early stopping rule if there is no loss improvement for 5 epochs on Dataset_small and 3 epochs on Dataset_large.
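In Keras terms, this configuration corresponds roughly to the sketch below. The embedding layer, the batch size, and monitoring the training loss for early stopping are illustrative assumptions; the three 512-unit recurrent layers, Adam with a 0.0001 learning rate, the 400-epoch budget, and the 5- or 3-epoch patience are as stated above:

```python
from tensorflow.keras import layers, models, optimizers, callbacks

def build_model(vocab_size, batch_size=64, seq_len=50, cell=layers.GRU):
    # cell can be layers.SimpleRNN, layers.LSTM, or layers.GRU to reproduce the
    # three architectures compared in this experiment.
    model = models.Sequential()
    model.add(layers.Embedding(vocab_size, 512,
                               batch_input_shape=(batch_size, seq_len)))
    for _ in range(3):                                  # three hidden layers, 512 units each
        model.add(cell(512, return_sequences=True, stateful=True))
    model.add(layers.TimeDistributed(layers.Dense(vocab_size, activation="softmax")))
    model.compile(optimizer=optimizers.Adam(learning_rate=1e-4),
                  loss="sparse_categorical_crossentropy")
    return model

# Early stopping: patience of 5 epochs on Dataset_small, 3 on Dataset_large.
stop = callbacks.EarlyStopping(monitor="loss", patience=5)
# model = build_model(vocab_size=len(id_to_char))
# model.fit(x, y, epochs=400, batch_size=64, shuffle=False, callbacks=[stop])
```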



V. RESULTS AND DISCUSSION

5.1. Training Results

Table 1 presents the actual training time clocked by our RNN models on both datasets. Clearly, the plain recurrent neural network converged the fastest, taking only almost 2 hours to train on the 1.8MB dataset and 15 hours on the larger 27.1MB dataset. Having said that, it was the worst performer on both datasets in terms of cross-entropy loss. The disparity in the runtimes of LSTM and GRU on the two datasets suggests that GRU trains faster than LSTM on a small dataset but can be outrun by the latter on a larger dataset. However, this claim is still refutable, as a more thorough experiment concentrated on this facet would be required to make such a conclusive assertion. In terms of training quality, the shape of the training curves in Figure 1 indicates that the learning rate employed was about right for all RNN models, but scaling it down a little could have improved the training. GRU outperformed LSTM and the plain RNN on both datasets, although the performance of LSTM is not far off that of GRU.

Table 1. Clocked run times (dd:hh:mm:ss) of the plain recurrent neural network (PRNN), long short-term memory, and gated recurrent unit on both datasets.

             PRNN          LSTM          GRU
Dataset_L    00:14:57:32   02:04:08:07   03:01:02:56
Dataset_S    00:01:39:09   00:08:06:02   00:06:14:01

[Figure 1: training loss curves of the three models on each dataset]
Figure 1. Training loss of plain recurrent neural network, long short-term memory, and gated recurrent unit on Dataset_large (left) and Dataset_small (right).

5.2. Qualitative Analysis

Table 2 exhibits the generated rap lyrics when the temperature value was set to 0.5. All models learned how to capitalize proper nouns and the starting letter of each verse. We manually verified that all generated verses are original and entirely invented by our models, except for the first lines, which were the seeds taken from our datasets.

It is hard to judge the coherence of the generated rap lyrics due to the vast grammatical freedom in rap music. This makes it difficult to differentiate the lyrical quality of the models subjectively, as their distinctiveness is imperceptible except for the minor spelling mistakes in the plain RNN samples on both datasets, such as "connerfuckin", "heacher", and "eeling", which do not even exist in our datasets. This, however, reinforces Graves' statement in [3] that character level language models are capable of inventing new words on their own.

For the above reasons, we obtained an evaluation from a less subjective perspective, namely rhyme density calculation and a Turing test, to get a more reliable assessment. These are discussed in the succeeding sections.

Table 2. Generated rap lyrics samples of medium diversity, one per model and dataset (DS = Dataset_small, DL = Dataset_large). Warning: explicit content.

PRNN DS:
Only 2 years old when daddy used to bring them
I don't know she me now cut I'm a flavorin'
It hurtin' and got a gun cold car anything
I got that blow blow, on ya mouth like man
I'm so steel I'm a motherfuckin Cash Money Millionaire
I'm a move ass connerfuckin niggas start shooking

LSTM DS:
I been peeping you too, nigga I see you shining
We ride this singles, hit the parkin' shit and then
I'm talkin' 'bout who, it's no fuckin' in my sky is
Stop playin' what try to fly the biggy down the faster
The niggas throwin splats, around the grave
She tell me what they be something

GRU DS:
But what did you say, you said that you never change
You can call me When I'ma Holly Grove
I know my homies from Hollywood, this ain't pointin
I ain't fuckin' with the bullet that's cool, not too dead
I say fuck nigga pull up, I could use a baby
Girl I did it

PRNN DL:
And I got like ten bitches in the car with me there with that pussy
I'm a heacher day in the hood now
I said I was a youngin' tell 'em I'm straight up
I got a black bags on the beat like a salad blade
I still really get to see a bitch and a nigga eeling in the sky
I can't trust a nigga out you

LSTM DL:
Now your kids gotta deal with this shit cause you do
We goin' out to the moon and we get to the hood
Cookin' gin, continue to me
It's like the moon months saying to the people that they say
What's the deal with these hoes, and they want to see me
Acting all me about where they home nigga

GRU DL:
You stuck dat kush inside dem sickles lot we got high
And we gon' ride on you can see
We gone my niggas gon bang
And we gon' be gettin' outta line
I know that you're lookin' like I ain't got this shit though
I ain't got no signs to the note

5.3. Rhyme Density Scores

We generated 100 16-bar verses of medium diversity for each of the RNN models and on both datasets, as in [5]. We considered one bar as equivalent to one line, following [24], which mentions that a typical 16-bar verse in a rap song is composed of 16 lines. We used the "Rhyme Analyzer" tool [20] developed by Hirjee et al. [10] to analyze the rhyme densities of our generated rap lyrics. The results are presented in Table 3: as shown, our generated rap lyrics suffer from poor rhyme density scores, distant from what DeepBeat achieved and not even reaching the threshold of human rappers such as MC Hammer and Ice Cube, who had the lowest rhyme density of 0.19 [10]. This is expected, as we did not handle rhymes programmatically; we only hypothesized that our generated rap lyrics would inherit the rhyme densities of our corpora, which were 0.28 for Dataset_large and 0.32 for Dataset_small.

Table 3. Average Rhyme Densities of all Trained Models.

Model and Dataset    Rhyme Density Score
PRNN DS              0.14
LSTM DS              0.13
GRU DS               0.12
PRNN DL              0.15
LSTM DL              0.13
GRU DL               0.15


5.4. Turing Test Results

Figure 2 shows that DeepBeat's generated rap lyrics attained the highest score, being perceived as human-written by 71% of the participants not familiar with rap lyrics, even outperforming the real human-written lyrics by one vote. This was almost matched by the LSTM model trained on Dataset_small, which achieved the best human evaluation of all our trained models, deceiving 67% and 53% of the participants who are familiar and unfamiliar with rap lyrics, respectively. The generated rap lyrics of the plain RNN and GRU on both datasets received satisfactory evaluations, deceiving half of the participants on all tests. Overall, the result is positive, as all our generated rap lyrics deceived a significant number of participants into thinking that they were written by human rappers, which is the primary goal of this experiment. We attribute this success to the offbeat structure of rap lyrics and their unorthodox vocabulary. This worked in favor of character level text generation, as mistakes in spelling and grammar, and even incoherence, can be confused with the genre's whimsical nature. Having said that, lyrics for other genres, which rely on lyrical coherence and conventional vocabulary, may be difficult to generate at the character level; for those, generation at a higher level, such as word, phrase, or sentence, would be more appropriate.

[Figure 2: bar chart of human evaluation results per model]
Figure 2. Human Evaluation in Percentage.

VI. CONCLUSION

We have investigated the performance of the plain recurrent neural network, long short-term memory, and gated recurrent unit on the domain of automated rap lyrics composition at the character level, on small and medium-sized datasets. The plain recurrent neural network converged the fastest but obtained the worst cross-entropy loss. GRU outperformed both the plain RNN and LSTM in terms of training quality but was the slowest to train on the medium-sized dataset, taking more than 3 days to converge. Having said that, it trained 25% faster than LSTM on the smaller dataset. This suggests that the relative convergence speed of LSTM and GRU may depend on the dataset size; however, this assertion demands a more thorough and concentrated investigation. All RNN models on both datasets learned the basic syntax and sentence structure of human-written rap lyrics, even when built at the character level. Our machine-generated rap lyrics convinced a significant number of computer science students that they were written by a human rapper, based on the Turing test performed, although they suffered from low rhyme density, as rhyme was not handled programmatically. Finally, a potential extension of this work is the incorporation of rhyme and intelligibility into the algorithm to generate more rhythmic and coherent rap lyrics. A hypothetical solution is to generate rap lines at the character level and develop an algorithm that weaves the generated lines into a rap verse based on their accidence and rhyme density score. The insistence on initially constructing the rap lyrics at the character level is due to their offbeat structure and unorthodox vocabulary, which we assume would be difficult to build at the word level.

REFERENCES

1. M. Escoto and M. B. Torrens. Rap the Language. Publicaciones Didacticas 28 (2012)
2. M. Daniels. The Largest Vocabulary in Hip Hop. Retrieved August 5, 2018 from https://pudding.cool/2017/02/vocabulary/
3. A. Graves. Generating Sequences with Recurrent Neural Networks. arXiv:1308.0850 (2013)
4. K. Addanki and D. Wu. Unsupervised rhyme scheme identification in hip hop lyrics using hidden Markov models. Proceedings of the First International Conference on Statistical Language and Speech Processing (2013)
5. E. Malmi, P. Takala, H. Toivonen, T. Raiko, and A. Gionis. DopeLearning: A Computational Approach to Rap Lyrics Generation. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016)
6. I. Sutskever, J. Martens, and G. Hinton. Generating text with recurrent neural networks. Proceedings of the 28th International Conference on Machine Learning (2011)
7. S. Hochreiter and J. Schmidhuber. Long Short-Term Memory. Neural Computation 9, 8 (1997)
8. P. Potash, A. Romanov, and A. Rumshisky. GhostWriter: Using an LSTM for Automatic Rap Lyric Generation. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (2015)
9. G. Barbieri, F. Pachet, P. Roy, and M. D. Esposti. Markov constraints for generating lyrics with style. Proceedings of the 20th European Conference on Artificial Intelligence (2012)
10. H. Hirjee and D. Brown. Using Automated Rhyme Detection to Characterize Rhyming Style in Rap Music. Empirical Musicology Review 5, 4 (2010)
11. N. Condit-Schultz. MCFlow: A Digital Corpus of Rap Transcriptions. Empirical Musicology Review 11, 2 (2017)
12. M. Fell and C. Sporleder. Lyrics-based Analysis and Classification of Music. Proceedings of the 25th International Conference on Computational Linguistics (2014)
13. E. Lamb, D. G. Brown, and C. Clarke. Can Human Assistance Improve a Computational Poet? Proceedings of Bridges 2015: Mathematics, Music, Art, Architecture, Culture (2015)
14. K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (2014)
15. J. Chung, C. Gulcehre, K. Cho, and Y. Bengio. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. NIPS 2014 Workshop on Deep Learning (2014)
16. D. Wu, K. Addanki, and M. Saers. Freestyle: A Challenge-Response System for Hip Hop Lyrics via Unsupervised Induction of Stochastic Transduction Grammars. Proceedings of the Annual Conference of the International Speech Communication Association (2013)
17. P. Bojanowski, A. Joulin, and T. Mikolov. Alternative structures for character-level RNNs. arXiv:1511.06303 (2016)
18. C. E. Shannon. A mathematical theory of communication. SIGMOBILE Mobile Computing and Communications Review 5, 1 (2001)


19. P. T. de Boer, D. P. Kroese, S. Mannor, and R. Y. Rubinstein. A Tutorial Introduction to the Cross-Entropy Method. Annals of Operations Research 134, 1 (2005)
20. H. Hirjee and D. G. Brown. Rhyme Analyzer: An Analysis Tool for Rap Lyrics. Proceedings of the 11th International Society for Music Information Retrieval Conference (2010)
21. D. Wu, K. Addanki, M. Saers, and M. Beloucif. Learning to Freestyle: Hip Hop Challenge-Response Induction via Transduction Rule Segmentation. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (2013)
22. A. M. Turing. Computing machinery and intelligence. In Computers & Thought (1995)
23. D. P. Kingma and J. Ba. Adam: A Method for Stochastic Optimization. Proceedings of the 3rd International Conference on Learning Representations (2015)
24. P. Edwards. How to Rap: The Art & Science of the Hip-Hop MC. Chicago: Chicago Review Press (2009)

AUTHORS

Aaron Carl T. Fernandez obtained his bachelor's degree in information technology from San Beda College – Alabang, Philippines, in 2011 and is currently working toward an M.S. degree in computer science at Mapua University, Philippines. He is also a full-time Technology Lead specializing in mainframe applications development for Infosys. His current research interests are sequence modelling using artificial neural networks and deep reinforcement learning for playing games.

Ken Jon M. Tarnate obtained his bachelor's degree in information and communication technology education from Philippine Normal University – Manila, Philippines, in 2014. He is currently working toward an M.S. degree in computer science at Mapua University, Philippines. He is also a full-time scholar of Engineering Research and Development for Technology (ERDT) under the Department of Science and Technology of the Philippines.

Dr. Madhavi Devaraj graduated with a PhD in Computer Science from Dr. A.P.J. Abdul Kalam Technical University (formerly Uttar Pradesh Technical University) in Lucknow, Uttar Pradesh, India, in 2016. She took up her Master in Philosophy in Computer Science at Madurai Kamaraj University and her Master of Computer Applications at V.V.Vannaiaperumal College for Women, both in India, in 2004 and 2000 respectively. She finished her Bachelor of Science in Mathematics at Bharathidasan University – Government Arts College for Women in 1997.
