Kevin Vo’s portfolio
This is a collection of notebooks that I created for data analysis. All of the project titles are links to the notebooks on GitHub and are sorted from newest to oldest.
Guided Projects
These are guided projects that I have completed from Dataquest.io to learn the fundamental workflow and techniques of data analysis.
Guided Project: Predicting House Sale Prices Using Linear Regression in Python
In this project, I use linear regression in Python to predict the sale prices of houses between 2006 and 2010 in Ames, Iowa.
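As a rough illustration of the core idea, here is a minimal sketch of a single-feature least-squares fit using only the standard library. The feature (living area), the function names `fit_line` and `predict`, and all numbers are hypothetical. They are not taken from the Ames data or the notebook, which likely used more features and a library such as scikit-learn.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) minimizing squared error for y ~ a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted sale price for a house with feature value x."""
    return intercept + slope * x


# Hypothetical training data: living area (sq ft) vs. sale price (USD)
living_area = [1000, 1500, 2000, 2500]
sale_price = [150000, 200000, 250000, 300000]

a, b = fit_line(living_area, sale_price)
print(predict(a, b, 1800))
```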
Predicting Car Prices Using K-Nearest Neighbors
I estimate car prices using K-Nearest Neighbors regression (instance-based learning), fitting a model to a training set and applying it to a test set to make predictions. This project used k-NN regression, error metrics, train/test validation, and k-fold cross-validation.
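The pieces listed above fit together roughly as in this standard-library sketch: k-NN regression averages the targets of the nearest training rows, RMSE measures error, and k-fold cross-validation averages RMSE over several train/test splits. The function names and the car data are hypothetical. The sketch also assumes the features are already normalized to comparable scales.

```python
import math
import random


def knn_predict(train, query, k):
    """Average the targets of the k training rows nearest to `query`.

    `train` is a list of (features, target) pairs; features are tuples of floats.
    """
    by_distance = sorted(train, key=lambda row: math.dist(row[0], query))
    return sum(target for _, target in by_distance[:k]) / k


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def k_fold_rmse(data, k_neighbors, n_folds, seed=1):
    """Average RMSE over n_folds train/test splits of `data`."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, test in enumerate(folds):
        # Every fold except the i-th one becomes training data
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        preds = [knn_predict(train, features, k_neighbors) for features, _ in test]
        scores.append(rmse([target for _, target in test], preds))
    return sum(scores) / n_folds


# Hypothetical cars: (normalized horsepower,) -> price in USD
cars = [((0.1,), 7000.0), ((0.2,), 8500.0), ((0.4,), 12000.0),
        ((0.5,), 13500.0), ((0.7,), 18000.0), ((0.9,), 25000.0)]
print(k_fold_rmse(cars, k_neighbors=2, n_folds=3))
```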
Winning Jeopardy
In this project, I looked through 20,000 past questions from the game show Jeopardy and used probabilities, chi-squared tests, and significance levels to see whether any patterns could give a contestant an edge over their competitors.
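To illustrate the chi-squared step, here is a minimal sketch that compares how often a word appears in high-value versus low-value questions with what we would expect if question value made no difference. The function names, the counts, and the 30% high-value share are hypothetical. The threshold 3.841 is the standard critical value for 1 degree of freedom at the 0.05 significance level.

```python
def chi_squared(observed, expected):
    """Pearson's chi-squared statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def word_chi_squared(high_count, low_count, high_share):
    """Chi-squared statistic for one word's high- vs. low-value counts.

    high_share is the (hypothetical) fraction of all questions that are high-value.
    """
    total = high_count + low_count
    expected = [total * high_share, total * (1 - high_share)]
    return chi_squared([high_count, low_count], expected)


# Critical value for 1 degree of freedom at the 0.05 significance level
CRITICAL_1DF_05 = 3.841

stat = word_chi_squared(high_count=9, low_count=1, high_share=0.3)
print(stat, stat > CRITICAL_1DF_05)
```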
Finding the Best Markets to Advertise In
In this guided project, I assume the role of an analyst at an e-learning company like Dataquest.io and try to find the two best markets in which to advertise our product, using different averages and z-scores. Let’s assume that most of our courses are on web and mobile development, but that we also cover many other domains, such as data science and game development.
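One way to put candidate markets on a common scale is to convert a per-market metric into z-scores, that is, the number of standard deviations each market lies from the mean. This minimal sketch uses hypothetical monthly spending figures and a hypothetical `z_scores` helper. It uses the population standard deviation; a sample standard deviation would also be a reasonable choice.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values: how many standard deviations each is from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]


# Hypothetical average monthly spend on learning (USD) by country
spend = {"US": 50.0, "India": 10.0, "UK": 45.0, "Canada": 40.0}
scores = dict(zip(spend, z_scores(list(spend.values()))))

# Rank markets from most to least attractive on this one metric
print(sorted(scores, key=scores.get, reverse=True))
```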
Investigating Fandango Movie Ratings
In this project, I apply basic statistics to examine the aftermath of Walt Hickey’s investigation into Fandango’s movie ratings, after which Fandango claimed that a glitch in its system had rated movies higher than fans actually did.
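Comparing simple summary statistics is one basic way to check whether a rating distribution shifted. The sketch below assumes two hypothetical samples of star ratings, labeled "before" and "after", and a hypothetical `summarize` helper. It does not use the actual Fandango data.

```python
from statistics import mean, median, mode


def summarize(ratings):
    """Return the mean, median, and mode of a list of star ratings."""
    return {"mean": mean(ratings), "median": median(ratings), "mode": mode(ratings)}


# Hypothetical samples of Fandango star ratings before and after the analysis
before = [4.5, 4.5, 4.0, 5.0, 3.5]
after = [4.0, 4.0, 3.5, 4.5, 3.0]

for label, sample in [("before", before), ("after", after)]:
    print(label, summarize(sample))
```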
Designing and Creating a Database
To fully understand how to write queries that get the data you need, you need an idea of how databases are designed and created.
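A small example of the design side: split data into related tables and let keys enforce the relationships. This sketch uses Python's built-in `sqlite3` with a hypothetical `team`/`game` schema. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` is set on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Hypothetical normalized schema: each game references teams by key
conn.executescript("""
CREATE TABLE team (
    team_id TEXT PRIMARY KEY,
    city    TEXT NOT NULL
);
CREATE TABLE game (
    game_id   INTEGER PRIMARY KEY,
    home_team TEXT NOT NULL REFERENCES team(team_id),
    away_team TEXT NOT NULL REFERENCES team(team_id),
    played_on TEXT NOT NULL
);
""")
conn.execute("INSERT INTO team VALUES ('BOS', 'Boston'), ('NYA', 'New York')")
conn.execute("INSERT INTO game (home_team, away_team, played_on) "
             "VALUES ('BOS', 'NYA', '2016-04-05')")

rejected = False
try:
    # 'XXX' is not a team_id, so the foreign key constraint blocks this row
    conn.execute("INSERT INTO game (home_team, away_team, played_on) "
                 "VALUES ('XXX', 'NYA', '2016-04-06')")
except sqlite3.IntegrityError:
    rejected = True
print("invalid game rejected:", rejected)
```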
Answering Business Questions Using SQL
Databases are a great way to store large amounts of data. Given how much data they hold, you need to know how to join tables and write more complex queries to get the data you need.
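A typical business question combines a join with an aggregate. This minimal sketch runs such a query through `sqlite3` on hypothetical `customer` and `invoice` tables to get total sales per country.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and rows for illustration
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE invoice (invoice_id  INTEGER PRIMARY KEY,
                      customer_id INTEGER REFERENCES customer(customer_id),
                      total       REAL);
INSERT INTO customer VALUES (1, 'USA'), (2, 'Canada'), (3, 'USA');
INSERT INTO invoice VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 8.0), (4, 3, 2.0);
""")

# Join invoices to customers, then aggregate sales per country
query = """
SELECT c.country,
       COUNT(DISTINCT c.customer_id) AS customers,
       SUM(i.total)                  AS total_sales
FROM invoice AS i
JOIN customer AS c ON c.customer_id = i.customer_id
GROUP BY c.country
ORDER BY total_sales DESC;
"""
rows = conn.execute(query).fetchall()
for country, customers, total_sales in rows:
    print(country, customers, total_sales)
```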
Analyzing CIA Factbook Data Using SQLite and Python
In this project, I analyzed CIA Factbook data by writing SQL queries to retrieve summary statistics and then performed additional analysis in Python.
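The workflow splits naturally into two steps: SQL for summary statistics, then Python for anything more involved. In this sketch, the table name `facts`, its columns, and the rows are assumptions made for illustration. The population density calculation is one example of follow-up analysis.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table modeled loosely on a country-facts layout
conn.executescript("""
CREATE TABLE facts (name TEXT, area_land INTEGER, population INTEGER);
INSERT INTO facts VALUES ('A', 100, 5000), ('B', 400, 2000),
                         ('C', 50, 10000), ('D', NULL, 300);
""")

# Summary statistics in SQL (MIN/MAX/AVG ignore NULLs)
min_pop, max_pop, avg_pop = conn.execute(
    "SELECT MIN(population), MAX(population), AVG(population) FROM facts"
).fetchone()

# Further analysis in Python: population density, skipping rows with no land area
density = {
    name: pop / area
    for name, area, pop in conn.execute("SELECT name, area_land, population FROM facts")
    if area
}
densest = max(density, key=density.get)
print(min_pop, max_pop, avg_pop, densest)
```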
Star Wars Survey
In this guided project, I analyze survey data about Star Wars collected by FiveThirtyEight. This project focused on cleaning the data and putting it into a form more suitable for analysis and presentation.
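A common cleaning step for survey data is converting text answers such as "Yes"/"No" into booleans. This pandas sketch uses hypothetical short column names, while the real survey uses full question text as headers. Answers not in the mapping, including missing ones, become NaN.

```python
import pandas as pd

# Hypothetical survey responses
survey = pd.DataFrame({
    "seen_any_film": ["Yes", "No", "Yes", None],
    "fan_of_franchise": ["Yes", "No", None, "No"],
})

yes_no = {"Yes": True, "No": False}
for column in ["seen_any_film", "fan_of_franchise"]:
    # Map text answers to booleans; anything else (e.g. missing) becomes NaN
    survey[column] = survey[column].map(yes_no)

print(survey)
```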
Analyzing NYC High School Data
Real-world data is never clean, so it is important to understand the fundamentals of cleaning and combining data. This project was an exercise in combining different datasets and cleaning the data (converting columns to the correct data type, handling invalid data, etc.).
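The two cleaning tasks named above can be sketched with pandas: coerce a text column to numbers, turning invalid entries into NaN, then merge datasets on a shared key. The school IDs, column names, and values here are hypothetical. The `"s"` entry stands in for any invalid value.

```python
import pandas as pd

# Hypothetical extracts of two datasets keyed by a school ID column ("DBN")
sat = pd.DataFrame({
    "DBN": ["01M292", "01M448", "01M450"],
    "sat_math_avg": ["404", "423", "s"],  # "s" stands in for an invalid entry
})
class_size = pd.DataFrame({
    "DBN": ["01M292", "01M448", "02M047"],
    "avg_class_size": [22.5, 24.0, 20.1],
})

# Convert text scores to numbers; invalid entries like "s" become NaN
sat["sat_math_avg"] = pd.to_numeric(sat["sat_math_avg"], errors="coerce")

# Combine the datasets on the shared key, keeping every SAT row
combined = sat.merge(class_size, on="DBN", how="left")
print(combined)
```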
Visualizing The Gender Gap in College Degrees
This project builds on the previous one, focusing on making visualizations clearer by following guidelines put forth by Edward Tufte, a pioneer in the field of data visualization.
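Here is a minimal Matplotlib sketch of two commonly cited Tufte-style guidelines: remove non-data ink such as chart borders and tick marks, and label lines directly instead of using a legend box. The percentages below are hypothetical, not the project's degree data.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

# Hypothetical share of degrees awarded to women (%) by year
years = [1970, 1980, 1990, 2000, 2010]
women_pct = [40.0, 48.0, 53.0, 57.0, 57.5]

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(years, women_pct, linewidth=3)
ax.plot(years, [100 - p for p in women_pct], linewidth=3)

# Remove non-data ink: chart borders and tick marks (tick labels stay)
for side in ["top", "right", "bottom", "left"]:
    ax.spines[side].set_visible(False)
ax.tick_params(bottom=False, left=False)
ax.set_ylim(0, 100)

# Label lines directly instead of using a legend box
ax.text(2011, women_pct[-1], "Women")
ax.text(2011, 100 - women_pct[-1], "Men")
```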
Visualizing Earnings Based on College Majors
I use the basics of Matplotlib to visualize data about earnings by college major and get a quick sense of the data I am working with.
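Two quick plots are often enough to get a first feel for a dataset: a scatter plot to look for a relationship and a histogram to see a distribution. This sketch uses hypothetical values for the share of women in a major and its median earnings.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

# Hypothetical majors: share of women and median earnings (USD)
share_women = [0.12, 0.35, 0.55, 0.70, 0.85]
median_pay = [62000, 55000, 45000, 38000, 33000]

fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))

# Scatter plot: is there a relationship between the two columns?
left.scatter(share_women, median_pay)
left.set_xlabel("Share of women")
left.set_ylabel("Median earnings (USD)")

# Histogram: how are earnings distributed across majors?
right.hist(median_pay, bins=5)
right.set_xlabel("Median earnings (USD)")
right.set_ylabel("Number of majors")
fig.tight_layout()
```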
Analyzing Thanksgiving Dinners
In this project, I analyzed survey data about Thanksgiving dinners (how far the respondent traveled, what they ate, etc.) using NumPy and pandas.
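The kind of NumPy/pandas work described here usually combines filtering, counting, and grouped aggregation. This sketch uses hypothetical short column names and values. The real survey uses full question text as column names and reports income in ranges.

```python
import pandas as pd

# Hypothetical survey responses
survey = pd.DataFrame({
    "celebrates": ["Yes", "Yes", "No", "Yes", "Yes"],
    "main_dish": ["Turkey", "Turkey", None, "Ham/Pork", "Turkey"],
    "income": [50000, 120000, 30000, 75000, 200000],
})

# Keep only respondents who celebrate, then count main dishes
celebrants = survey[survey["celebrates"] == "Yes"]
dish_counts = celebrants["main_dish"].value_counts()

# Compare a numeric column across groups
income_by_dish = celebrants.groupby("main_dish")["income"].mean()
print(dish_counts, income_by_dish, sep="\n")
```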
Exploring Gun Deaths in the US
This guided project applies more intermediate elements of Python, such as classes, regular expressions, and the datetime module.
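This sketch shows the three elements working together: a class wraps each record, a regular expression validates and parses a date field, and `datetime` objects serve as grouping keys. The `Incident` class, the `"YYYY-MM"` field format, and the records are hypothetical.

```python
import re
from datetime import datetime


class Incident:
    """One record with a hypothetical 'YYYY-MM' date field and an intent label."""

    PATTERN = re.compile(r"^(\d{4})-(\d{2})$")

    def __init__(self, year_month, intent):
        match = self.PATTERN.match(year_month)
        if match is None:
            raise ValueError(f"bad year-month: {year_month!r}")
        year, month = map(int, match.groups())
        self.date = datetime(year=year, month=month, day=1)
        self.intent = intent


records = [Incident("2012-01", "Suicide"), Incident("2012-01", "Homicide"),
           Incident("2013-02", "Suicide")]

# Count incidents per month, using datetime objects as dictionary keys
per_month = {}
for record in records:
    per_month[record.date] = per_month.get(record.date, 0) + 1
print(per_month)
```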
Explore US Births
This is the first guided project on Dataquest, intended to practice the basics of Python (for loops, if statements, functions, etc.).
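The basics listed above are enough to total births by any column. This sketch parses CSV text and uses a loop plus an if statement to accumulate totals per day of the week. The column layout (`year,month,date_of_month,day_of_week,births`, with 1 = Monday) and the birth counts are assumed sample values, not verified data.

```python
# Hypothetical sample: year,month,date_of_month,day_of_week,births
csv_text = """year,month,date_of_month,day_of_week,births
1994,1,1,6,8096
1994,1,2,7,7772
1994,1,3,1,10142
1994,1,8,6,8000"""


def read_rows(text):
    """Parse CSV text (skipping the header) into lists of ints."""
    rows = []
    for line in text.split("\n")[1:]:
        rows.append([int(value) for value in line.split(",")])
    return rows


def calc_counts(rows, column):
    """Total births (last column) for each distinct value of the given column index."""
    totals = {}
    for row in rows:
        key = row[column]
        if key in totals:
            totals[key] = totals[key] + row[4]
        else:
            totals[key] = row[4]
    return totals


births_by_weekday = calc_counts(read_rows(csv_text), 3)
print(births_by_weekday)
```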