Linear Programming - College Football Power Rankings

This tool takes statistics from college football games as input and generates Power Rankings for each team in four categories: Rush, Pass, Returns, and Special Teams. Each team's ranking depends on its performance relative to every opponent it played.

The Python script formulates the rankings as a linearly constrained quadratic optimization problem and solves it with the “minimize” function from the scipy.optimize library. The objective is to minimize the squared error of the combined offense/defense scores for each ranking.
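To make the formulation concrete, here is a minimal sketch of this kind of fit. It is not the actual script: the model (predicted yards = offense rating of one team minus defense rating of its opponent), the zero-sum constraint on defense ratings, the SLSQP method, and the names fit_ratings and games are all assumptions for illustration. The toy data was built from known ratings, so an exact fit exists.

```python
import numpy as np
from scipy.optimize import minimize


def fit_ratings(games, n_teams):
    """Fit offense/defense ratings by constrained least squares.

    games: list of (offense_team, defense_team, yards) tuples (hypothetical format).
    Assumed model: yards ~ offense[offense_team] - defense[defense_team].
    """
    # Design matrix: +1 in the offense column, -1 in the defense column.
    A = np.zeros((len(games), 2 * n_teams))
    y = np.zeros(len(games))
    for row, (off, dfn, yards) in enumerate(games):
        A[row, off] = 1.0
        A[row, n_teams + dfn] = -1.0
        y[row] = yards

    def objective(x):
        residual = A @ x - y
        return residual @ residual  # sum of squared errors

    def gradient(x):
        return 2.0 * A.T @ (A @ x - y)

    # Linear equality constraint: defense ratings sum to zero. Without it,
    # adding the same constant to every rating leaves the predictions unchanged.
    constraint = {"type": "eq", "fun": lambda x: np.sum(x[n_teams:])}

    result = minimize(
        objective,
        np.zeros(2 * n_teams),
        jac=gradient,
        method="SLSQP",
        constraints=[constraint],
    )
    return result.x[:n_teams], result.x[n_teams:]


# Toy data for three teams, generated from offense = [120, 100, 80]
# and defense = [10, 0, -10].
games = [
    (0, 1, 120), (1, 0, 90),
    (0, 2, 130), (2, 0, 70),
    (1, 2, 110), (2, 1, 80),
]
offense, defense = fit_ratings(games, n_teams=3)
print(np.round(offense, 2), np.round(defense, 2))
```

The constraint is what makes the ratings comparable across teams: it pins down a common baseline, so a team's score reflects the strength of the opponents it faced rather than just its raw totals.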

The Python script runs as an API (set up using the Flask library) that listens for data sent from Excel over HTTP, solves the optimization, and returns the Power Rankings to populate the Excel sheet. The Excel sheet uses a VBA script to send the request and receive the response.

Python API script:
Python API

Excel File (this is the front-end user interface to run the optimizer):
Excel Data File

Computer Vision - Identifying Blurry Photos using a CNN

My goal here is to classify good and bad quality photos using a neural network. The idea for a tool like this hit me when I uploaded several hundred photos from my digital SLR camera and needed to sort through them all. In particular, just getting all the blurry, dark, or overexposed photos out of the way would be a big help.

Creative Feature Engineering - Apartment Listings Popularity

This dataset is part of a Kaggle competition hosted in conjunction with renthop.com. The apartment listings are from their site, and the goal was to predict the popularity of a particular listing (High, Medium, or Low) based on a variety of features.

Since each listing had an associated manager_id (see data summary in Line 3 in the Notebook), I wanted to see the impact of the listing manager on popularity. Using the provided data, I created an additional feature called “manager skill” which gave a score to each manager, based on a weighted count of high, medium, and low popularity listings (see Line 70). An important caveat was to split the data into training and test sets BEFORE computing this weighted count, to avoid leaking test set data into the calculated “manager skill”.
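The split-before-scoring idea can be sketched in plain Python. This is an illustration, not the Notebook code: the weights, the interest_level field name, the use of a weighted average rather than a raw count, and the fallback score for managers unseen in training are all assumptions.

```python
from collections import defaultdict

# Hypothetical weights for each popularity level.
WEIGHTS = {"high": 2.0, "medium": 1.0, "low": 0.0}


def manager_skill_table(train):
    """Score each manager using TRAINING listings only."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for listing in train:
        totals[listing["manager_id"]] += WEIGHTS[listing["interest_level"]]
        counts[listing["manager_id"]] += 1
    return {manager: totals[manager] / counts[manager] for manager in totals}


def add_manager_skill(listings, table, default):
    """Attach the precomputed score; unseen managers get the default."""
    return [
        dict(listing, manager_skill=table.get(listing["manager_id"], default))
        for listing in listings
    ]


# The data is already split here; the table is built after the split.
train = [
    {"manager_id": "m1", "interest_level": "high"},
    {"manager_id": "m1", "interest_level": "high"},
    {"manager_id": "m2", "interest_level": "low"},
    {"manager_id": "m2", "interest_level": "medium"},
]
test = [
    {"manager_id": "m1", "interest_level": "high"},
    {"manager_id": "m3", "interest_level": "low"},
]

table = manager_skill_table(train)

# Fallback for managers absent from training: the overall training average.
default_skill = sum(WEIGHTS[l["interest_level"]] for l in train) / len(train)

train_scored = add_manager_skill(train, table, default_skill)
test_scored = add_manager_skill(test, table, default_skill)
```

The test listings' own labels never enter the table, so the score they receive is the same one a genuinely new listing from that manager would get.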

Results from the Random Forest analysis confirmed that “manager skill” was the strongest predictor of apartment listing popularity (See chart in Line 122).

See my iPython Notebook for my analysis:
AptListingsAnalysis.ipynb

Getting started with Heroku

I wanted to make a simple web app using Python and Flask. It turns out launching an app for all the world to see is quick and easy with Heroku. I figured out what I needed at minimum to run my app via Heroku and thought I’d present my files here for future reference.

Exploratory Analysis - New York MTA subway data

In this project I used publicly available MTA subway turnstile data, as well as city data, to recommend the best way to deploy recruitment / promotion teams to target diversity / demographics.