Text Analysis with R for Students of Literature – Book Review

Our ability to access, process, and analyze large quantities of data has been increasing at a dizzying pace over the last few years. This data-driven revolution is fundamentally changing many professional and academic fields. Many people, especially long-term practitioners in the humanities and similar disciplines, find this change worrying, and in many ways exactly contrary to the spirit of these disciplines. Poring over long and demanding texts, while internalizing them and becoming personally immersed in them, seems to be at the very core of what these disciplines are all about. And yet, as both a lover of the humanities and a die-hard techie, I find this latest development incredibly exciting.

The title of this short book makes it eminently clear who the intended audience is: students of literature who are interested in using R for textual analysis. R is a very powerful programming language used for statistical analysis. Textual analysis is a very prominent aspect of modern data science, so there are many well-known and established tools and techniques that can help one with this task. However, the aim of this book is not to teach R or programming in general, but to give literature students just the most basic tools needed to do some relatively straightforward textual analysis. The book jumps straight into the examples almost from the very first page. The obvious virtue of this approach is that you can start doing some interesting work rather quickly, and as long as your own research doesn’t depart dramatically from the examples given in the book, you should be able to use the book as a reference and a primer for your own work. However, if you are working on somewhat more demanding problems, then after finishing this book you might want to turn to a specialized book on R programming that will give you enough of a foundation to tackle a larger variety of problems.

The book takes the freely available text file of “Moby Dick” and runs a variety of textual analyses on it: simple word counts and word frequencies, correlations between various “special” words, context analysis, etc. In the later chapters it moves from a single book to a corpus of books for a more interesting look at themes across many texts. I found the last chapter, on topic modeling, especially fascinating, but way too brief. I guess I will now have to take a look at other sources to learn more about this line of analysis.
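To give a flavor of the simplest analysis mentioned above, here is a minimal sketch of word counts and relative frequencies. It is written in Python rather than R and is not taken from the book; the function name `word_frequencies`, the tokenizing rule, and the sample sentence are all illustrative assumptions.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return (raw counts, relative frequencies in percent) for a text.

    Hypothetical helper for illustration only: words are lowercased and
    taken to be runs of letters and apostrophes.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    # Relative frequency: share of all word tokens, expressed in percent.
    relative = {word: 100 * n / total for word, n in counts.items()}
    return counts, relative


# A tiny stand-in for a full novel; a real run would read the text from a file.
sample = "Call me Ishmael. The whale, the white whale!"
counts, relative = word_frequencies(sample)
print(counts.most_common(2))
print(relative["whale"])
```

On a full novel the same two tables, raw counts and percentages, are the starting point for the comparisons between “special” words that the book goes on to make.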

This book is very pedagogical in its style. The author often presents two different solutions to a particular problem: one using a single compact yet hard-to-understand R command, and another broken down into several self-contained chunks. I find this approach very educational and helpful.

Even though this is primarily a book intended for literature students, I would actually strongly recommend it to anyone interested in text mining, text analysis and natural language processing. It is a very gentle and approachable introduction to the whole world of textual analysis.

**** Electronic version of the book provided for review purposes. ****

 

Bojan Tunguz

Bojan Tunguz was born in Bosnia and Herzegovina, which he and his family fled during the civil war for neighboring Croatia. Over the past two decades he has studied, lived and worked in the United States. He is a theoretical physicist with degrees from Stanford and the University of Illinois. Tunguz has taught physics at several prominent liberal arts colleges and has been writing about physics, science and technology for more than a decade. He also has a wide spectrum of interests, and reads and writes about current events, society, culture, religion and politics. Over the years he has reviewed many of the books that he has read, and posted his reviews on various online outlets. In 2011 he became a top 10 reviewer on Amazon.com, where he continues to be very active. Aside from reading and writing, Tunguz enjoys traveling, digital photography, hiking, and fitness. He resides with his wife in Indiana. You can follow my review updates on the following pages as well: Facebook: http://www.facebook.com/tunguzreview Twitter: http://www.twitter.com/tunguzreviews Google+: https://plus.google.com/u/0/104312842297641697463/posts
