Data Science Projects

nSpectr.org: Predicting Restaurant Health Violations in Boston

Can machine learning make us healthier?

As an Insight Data Science Fellow, I built nspectr.org, an app that predicts severe health violations in Boston restaurants. Working with a unique data set from Yelp and the City of Boston, I used natural language processing (NLP) techniques to encode the textual data. I implemented several models, including a random forest and gradient-boosted trees, and visualized their predictions on an interactive map of the city. You can view the code for the models on GitHub.
Tools: Python, R

Skills: Cleaned the text data and implemented tf-idf in Python to encode it, then built several machine learning models to predict severe violations.
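For readers unfamiliar with tf-idf, here is a minimal from-scratch sketch of the idea. It is not the project's actual code (that lives on GitHub). It assumes a simple cleaning step (lowercase, keep alphabetic tokens), raw term frequency normalized by document length, and an unsmoothed idf of log(N / df); the example reviews are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Simple cleaning step: lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = count of the term in the document / number of tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: each term counts at most once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    # Hypothetical review snippets, not real Yelp data.
    reviews = ["Dirty tables and a dirty floor", "Great food, clean kitchen", "Great service"]
    for weights in tf_idf(reviews):
        print(sorted(weights.items(), key=lambda kv: -kv[1])[:3])
```

Terms that appear in every document get an idf of zero, so words common to all reviews drop out, while words concentrated in a few reviews (such as "dirty" above) get high weights. Libraries such as scikit-learn's TfidfVectorizer use smoothed variants of the same formula.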


Kaggle Competitions

Teaching machines to learn

I have competed in several Kaggle competitions, including predicting insurance purchase decisions in the Homesite Quote Conversion Challenge, quantifying property hazards in the Liberty Mutual Group Challenge, and classifying products in the Otto Group Classification Challenge, placing in the top 13%, 15%, and 18%, respectively, out of thousands of competitors.

Tools: R (xgboost, h2o, caret)


Social Media and Fitness

Can a social media site help you go to the gym?

Marketers wonder whether online clicks can lead to offline behavior. To answer this question, we built a social media site as part of a fitness program that enrolled more than 1,000 graduate and professional students at UPenn. We found that social influence from peers on the site was more effective at getting people to show up to fitness classes than traditional online advertising messages. In a follow-up study, we examined which aspects of social influence on the site were most effective. The results have implications for designing online platforms that promote offline behavior change.



Tools: SQL, R

Helsinki talk


Deciding Where to Move

Using data science to inform our moving decision

When we were deciding where to relocate, we wanted to make an informed decision. Weather is really important to me (Seasonal Affective Disorder is a real thing!), but we also wanted information about commute times, housing prices, and taxes. I built a web scraper to gather this information for each candidate location, and then made charts in R to guide our decision.

Tools: Python, R

Skills: Implemented a complete data pipeline: gathering information from the web with Python, storing it, processing it in R, and generating plots and an HTML output document, available here. Code is available on GitHub.
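The gather-and-store half of such a pipeline can be sketched with the Python standard library alone. This is an illustrative sketch, not the project's code: the HTML page, its table layout, the "City A"/"City B" placeholder values, and the choice of SQLite for storage are all assumptions. To stay offline it parses a static string; a real scraper would fetch the page first, for example with urllib.request.urlopen(url).read().decode(). The last step exports CSV that R can load with read.csv().

```python
import csv
import io
import sqlite3
from html.parser import HTMLParser

# Hypothetical page with placeholder values (not real climate data).
SAMPLE_HTML = """
<table>
  <tr><th>City</th><th>Sunny days</th></tr>
  <tr><td>City A</td><td>200</td></tr>
  <tr><td>City B</td><td>150</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of <td> cells, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def scrape(html):
    """Parse the table; header rows use <th>, so they come back empty and are dropped."""
    parser = TableParser()
    parser.feed(html)
    return [row for row in parser.rows if row]


def store(conn, rows):
    """Store the scraped rows in a SQLite table (an assumed storage choice)."""
    conn.execute("CREATE TABLE IF NOT EXISTS climate (city TEXT, sunny_days INTEGER)")
    conn.executemany(
        "INSERT INTO climate VALUES (?, ?)",
        [(city, int(days)) for city, days in rows],
    )
    conn.commit()


def export_csv(conn):
    """Dump the table as CSV text for the R processing and plotting step."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["city", "sunny_days"])
    writer.writerows(conn.execute("SELECT city, sunny_days FROM climate ORDER BY city"))
    return buf.getvalue()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store(conn, scrape(SAMPLE_HTML))
    print(export_csv(conn))
```

Keeping scraping, storage, and export as separate functions means each stage can be rerun on its own, for instance re-exporting without hitting the website again.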


The Tipping Point

When do products take off?

The search for a tipping point has vexed marketers and social movements alike. This project uses several simulations to show how the number of initial adherents to a product or cause influences its adoption rate.

Code for these simulations is available on GitHub.
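One classic way to simulate a tipping point is a Granovetter-style threshold model, sketched below. It is a stand-in for illustration and not necessarily the model used in the project's simulations. Each agent adopts once the share of adopters in the population reaches its personal threshold; the normally distributed thresholds (mean 0.3, standard deviation 0.1) and the population size are assumed, illustrative parameters.

```python
import random


def simulate_adoption(n_agents, n_seeds, mean=0.3, sd=0.1, seed=0, max_steps=1000):
    """Granovetter-style threshold model of adoption.

    Each agent i has a threshold t_i ~ Normal(mean, sd) and adopts once the
    share of adopters in the whole population reaches t_i. The first n_seeds
    agents are initial adherents who adopt unconditionally. Returns the final
    adopted share once no further agent switches.
    """
    rng = random.Random(seed)  # fixed seed for reproducible runs
    thresholds = [rng.gauss(mean, sd) for _ in range(n_agents)]
    adopted = [i < n_seeds for i in range(n_agents)]
    for _ in range(max_steps):
        share = sum(adopted) / n_agents
        updated = [a or t <= share for a, t in zip(adopted, thresholds)]
        if updated == adopted:  # no new adopters: the cascade has stopped
            break
        adopted = updated
    return sum(adopted) / n_agents


if __name__ == "__main__":
    # Sweep the number of initial adherents and look for a jump in final adoption.
    for n_seeds in range(0, 301, 50):
        print(n_seeds, round(simulate_adoption(1000, n_seeds), 3))
```

Under these assumptions, a small seed group stalls because almost no one's threshold is that low, while a larger seed group pulls in enough low-threshold agents to trigger a cascade through the rest of the population. The seed count at which final adoption jumps is the tipping point.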

Tools: Python