Bookstore Analysis

By Lorraine Sanares, James Herman & Lenni Migios

Aim

This assignment involved analysing datasets containing information about books and user reviews from an online bookstore. The main goal was to extract insights to help bookstore managers decide which books to purchase or avoid for optimal sales, and to recommend additional purchases to customers. For this open-ended task, our team developed the research question: What factors influence the book choices of different age groups? Our findings were shared through a presentation and a written technical report aimed at a managerial audience.

What we found

Our analysis applied three data-processing techniques (imputation, manipulation and tokenisation) and two unsupervised methods (correlation analysis and k-means clustering). This revealed several key insights:

Words like “fire” and “stone” appeal to younger customers, while “dark” appeals to older customers.
Books published between 1920 and the 1970s have higher average ratings.
More users prefer contemporary books.
Books published before the 1950s are favoured by 25-40-year-olds.
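
To illustrate how a word-level finding like the first one can be reached, here is a minimal sketch of tokenising book titles and counting words per age group. It is not the code from the report: the sample titles, the ages, the AGE_CUTOFF boundary and the stopword list are all invented for illustration, and a plain regular expression stands in for a dedicated tokeniser so the example runs without extra downloads.

```python
import re
from collections import Counter

# Hypothetical sample: titles and reader ages are invented for illustration.
ratings = [
    ("Fire and Stone", 15),
    ("The Stone Tower", 17),
    ("Dark Harbour", 52),
    ("A Dark Season", 60),
    ("Fire in the Valley", 16),
]

AGE_CUTOFF = 25  # assumed boundary between "younger" and "older" readers
STOPWORDS = {"a", "and", "the", "in", "of"}  # tiny hand-picked list, an assumption


def tokenise(title):
    """Lowercase a title and split it into alphabetic tokens, dropping stopwords."""
    return [w for w in re.findall(r"[a-z]+", title.lower()) if w not in STOPWORDS]


# Count how often each token appears in titles rated by each age group.
counts = {"younger": Counter(), "older": Counter()}
for title, age in ratings:
    group = "younger" if age < AGE_CUTOFF else "older"
    counts[group].update(tokenise(title))

for group, counter in counts.items():
    print(group, counter.most_common(2))
```

On real data, raw counts would normally be compared against each word's overall frequency, since a common word will rank highly in every group.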

From this information, we derived three key reader types:

Based on these reader types, the following recommendations were made for new book selection:

Purchase dark fiction for older customers and fantasy for younger customers.
Purchase contemporary books published after 1980 because they appeal to the majority of customers.
Create targeted marketing campaigns for each reader type.

The written technical report is 12 pages long and contains approximately 3,600 words. The final grade, covering both the report and the oral presentation, was an H1 (80%+). The full report can be found in my GitHub repo:

Libraries

pandas

Matplotlib

scikit-learn

SciPy

NLTK
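
As a rough sketch of how two of the listed libraries might fit together for the imputation, correlation and k-means steps, the example below uses pandas and scikit-learn on a small made-up table. The column names (user_age, year_published, rating), the values, the choice of median imputation and the choice of three clusters are assumptions for illustration only, not details taken from the report.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: column names and values are invented for illustration.
reviews = pd.DataFrame({
    "user_age": [16, 18, np.nan, 30, 35, 38, 61, 65, np.nan],
    "year_published": [2005, 2010, 2001, 1935, 1948, 1940, 1995, 1988, 1992],
    "rating": [8, 9, 7, 9, 8, 9, 6, 7, 7],
})

# Imputation: fill missing ages with the median age (one common choice).
reviews["user_age"] = reviews["user_age"].fillna(reviews["user_age"].median())

# Correlation: pairwise Pearson correlation between the numeric columns.
correlations = reviews.corr()

# K-means: standardise the features so age and year are on comparable scales,
# then group readers into an assumed three clusters.
features = StandardScaler().fit_transform(reviews[["user_age", "year_published"]])
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
reviews["cluster"] = kmeans.fit_predict(features)

print(correlations.round(2))
print(reviews.groupby("cluster")[["user_age", "year_published", "rating"]].mean())
```

The per-cluster averages printed at the end are the kind of summary from which reader profiles could be described, although the number of clusters would normally be chosen by inspecting the data rather than fixed in advance.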