A Customer Churn Prediction Project with PySpark


This project is the Capstone Project of the Udacity Data Scientist Nanodegree. Customer churn prediction is one of the more common business problems and an important part of a data scientist’s role in a business and product setting. The project is set in the music industry, where one of the most important challenges in today’s streaming world is keeping customers engaged with the business by preventing churn.

The project uses data similar to Spotify’s for a hypothetical music streaming service called Sparkify, with different information regarding each session of operation…

…and why it is recommended too!

The steps I took in writing my Master’s Thesis in Economic Analysis can be summed up as: 1. Identifying an economic problem, 2. Identifying the data needed to solve the problem, 3. Gathering the data and cleaning it, 4. Exploring the data, 5. Modelling the data (statistical modelling, predictive modelling, you name it), 6. Communicating the findings and tying them back to how they solve the economic problem posed in step 1.

Very similarly, a data scientist needs a good business understanding, then data understanding, data gathering, EDA (Exploratory Data…

Part 2 — Statistical and Econometric Considerations

How well do you think the middle ice needle represents the average dispersion of the water droplets around it?

A General Trend

Panel — noun. a flat board on which instruments or controls are fixed.

Keeping this definition in mind, a flat board is typically rectangular and, in three dimensions, a cuboid. Understanding panel data is as simple as this. We have exports, the variable whose changes we would like to measure. That is one dimension. The changes have to be measured against some frame of reference: time. So, time periods from t-n to t (n…

Part 1 — International Trade’s most sought-after econometric model

Following the typical procedure for writing a Master’s thesis, I had to develop a theoretical base on which to conduct empirical econometric analysis. The aim is to validate, test, or disprove the theory, or simply to replicate an existing empirical study based on the same theory or theories. But the beauty of the Gravity Model of International Trade is that it is not quite a theory, but an econom(etr)ic model that is empirically successful in explaining trade flows between two countries.
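For readers who have not met the model before, here is a rough sketch of how it is usually written in textbooks. This is the generic form, not necessarily the specification used in the thesis, and the symbols are assumptions introduced for illustration: X_ij is the trade flow from country i to country j, Y_i and Y_j are the two countries' economic sizes (for example GDP), D_ij is the distance between them, G is a constant, and ε_ij is an error term.

```latex
% Generic multiplicative form of the gravity model
% (symbols are illustrative assumptions, not taken from the article)
X_{ij} = G \, \frac{Y_i^{\beta_1} \, Y_j^{\beta_2}}{D_{ij}^{\beta_3}}

% Taking logs gives a linear equation that can be estimated by regression
\ln X_{ij} = \beta_0 + \beta_1 \ln Y_i + \beta_2 \ln Y_j - \beta_3 \ln D_{ij} + \varepsilon_{ij},
\qquad \beta_0 = \ln G
```

Read this way, trade between two countries rises with their economic size and falls with the distance between them, which is where the analogy with Newton's law of gravity comes from.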

To go in-depth into the foundations of the gravity model, and how it is estimated…

A Data Analysis of Singapore’s Airbnb Listings

Udacity’s Data Scientist Nanodegree

This is a brief summary of the results of the Airbnb data analysis, part of the Udacity Data Scientist Nanodegree program.

When I was in Singapore in 2013, I couldn’t help but notice the sheer number of high-rise apartments (South East Asian/South Asian style). Apart from being a financial and entrepôt hub in Asia, Singapore also hosts several tourist destinations. As a tourist, I know that my initial question would be: ‘where am I going to stay?’

In the 21st century, one of the most popular short-term stay options…

We are all well aware of the concept of time. Some say it doesn’t exist and is simply a demon created by the human race, just to become enslaved to it. Others say time is everything, and they’re right, though they don’t really see it as a godly figure or an oppressor that enslaves us. Obviously, time is just a parameter, a standard for measuring the length of each and every human activity, on which the whole framework of life and livelihood on the planet operates. Given that, we are granted the…

Nithin Gopalakrishnan

