A Journey Through Time: My First Blog and Data Analysis with Spotify
As a dedicated tech enthusiast with a passion for data science, I have a wealth of digital footprints tracing the evolution of blogging and data analysis. Join me as I share the history of my first blog and walk through the process of analyzing a music-streaming dataset with Python.
The Evolution of My Blog
My first blog is still running! It's one of the world's 12 earliest blogs, launched in 1995 on IBM's own ISP service for its customers. Early updates were a challenge over 28 kbps (or slower) modems. In 1997, I moved it to Yahoo! GeoCities, where it lived until 1999. Over the years, I've hosted it on platforms like OpenDiary, LiveJournal, MSN Spaces, and Windows Live Spaces, eventually settling on its current platform in 2008. Though it has always been hosted for free, it has never been monetized.
Data Analysis with the Spotify Dataset
In my latest technical blog, I've decided to turn my focus to data analysis. I started with a dataset from Kaggle covering the music-streaming company Spotify. This step-by-step guide will walk you through the process, from data exploration to analysis, using Python's powerful libraries such as Pandas and Matplotlib.
Introduction to the Dataset
The dataset includes 586,672 entries and 20 columns. Key columns include:
Id: Unique identifier for each song
Name: Name of the track
Popularity: Ranges from 0–100, indicating how popular the song is
Duration: Length of the track in milliseconds
Explicit: Boolean value indicating whether the song is explicit
Artist: Creator(s) of the song

Let's dive into the initial steps of data exploration using Python's Pandas library.
Data Exploration and Cleaning
First, we import the necessary libraries:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Then, we load the data into a DataFrame:
tracks = pd.read_csv('tracks.csv')
Using the head() function, we can see the first few rows of the dataset:
tracks.head()
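As a quick sanity check (a hedged aside, not part of the original walkthrough), we can confirm the dimensions quoted above:

tracks.shape  # expected: (586672, 20) — 586,672 rows and 20 columns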
Next, we check for null values:
tracks.isnull().sum()
The output will show the count of null values for each column, such as:
name     71
dtype: int64
From the above output, we can see that the 'Name' column has 71 null values. We then sort the DataFrame by popularity to examine the least popular songs:
sorted_df = tracks.sort_values('popularity', ascending=True).head(10)
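Before moving on, an optional cleaning step (a hedged sketch, not part of the original walkthrough) is to drop the 71 rows with missing names; the lowercase column name 'name' is an assumption about this Kaggle dataset:

# Drop rows with a missing track name (assumes the column is named 'name')
tracks.dropna(subset=['name'], inplace=True)
tracks.isnull().sum()  # the name column should now report 0 nulls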
To gain a statistical overview of the dataset, we use the describe() function:
tracks.describe().transpose()
This function provides details such as count, mean, standard deviation, and quartiles:
Count: 586672
Mean: ...
Standard Deviation: ...
25% Quartile: ...
75% Quartile: ...
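Because duration is stored in milliseconds (per the column list above), one convenient follow-up is converting it to seconds so the summary statistics are easier to read. This is a hedged sketch; the exact lowercase column name 'duration' is an assumption:

# Convert track duration from milliseconds to seconds
# (assumes the column is named 'duration' and holds milliseconds)
tracks['duration_s'] = tracks['duration'] / 1000
tracks['duration_s'].describe()  # summary statistics now in seconds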
To filter tracks with high popularity, we apply:
most_popular = tracks.query('popularity > 90').sort_values('popularity', ascending=False)
This line of code retains only the tracks with a popularity score above 90 and sorts them by popularity:
most_popular[:10]
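For readers less familiar with query(), the same filter can be written as an ordinary boolean mask; this is just an equivalent sketch producing the same result:

# Equivalent filter using a boolean mask instead of query()
mask = tracks['popularity'] > 90
most_popular = tracks[mask].sort_values('popularity', ascending=False)
most_popular[:10]  # same top-ten slice as above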
Finally, we change the index to Release Date:
tracks.set_index('release_date', inplace=True)
This step designates the release date as the new index, allowing for more meaningful analysis.
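As an example of what that enables (a hedged sketch; it assumes the release dates parse cleanly as datetimes), we can chart how average popularity varies by release year:

# Parse the release_date index as datetimes; unparseable entries become NaT
tracks.index = pd.to_datetime(tracks.index, errors='coerce')

# Average popularity per release year (rows with unparseable dates are excluded)
yearly = tracks.groupby(tracks.index.year)['popularity'].mean()

yearly.plot(figsize=(10, 4))
plt.xlabel('Release year')
plt.ylabel('Mean popularity')
plt.title('Average track popularity by release year')
plt.show()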
Conclusion: This blog post and dataset analysis serve as a comprehensive guide to getting started with data analysis in Python. Feel free to follow along and apply these techniques to your own projects. Happy coding!