Bio

Eugene Agronin is an empirical economist with a background in finance, statistical modeling and machine learning. He has over 10 years of experience solving problems for clients in multiple industries by generating insight from data and communicating results to both technical and non-technical audiences. Eugene’s projects have involved companies in industries such as real estate, financial services, high-tech, auto, oil & gas, mining, banking, insurance, gaming, dairy and soft drinks.

Eugene works in Python, R, Stata, SQL and NoSQL, applying statistical modeling and machine learning techniques to solve problems and generate insight.

Contact:

eagronin@gmail.com | https://www.linkedin.com/in/eagronin | https://eagronin.github.io/portfolio

Introduction

The projects in this portfolio are organized by phase of the project’s development process (i.e., data acquisition, cleaning, analysis and reporting) as summarized in the diagram below:

The projects apply R, Python, Spark and various query tools in a data science context to provide value in targeted marketing, detection of credit card fraud, investment in real estate and fire prevention. The techniques used include k-nearest neighbors, decision trees, random forests, support vector machines, linear regression, logistic regression, regularization, k-means clustering and principal component analysis.

The projects are summarized in the table below. Click a project name to go to that project, or click any “x” in the columns on the right to go to the respective phase of the project’s development process.

Projects

| No. | Project Name | Description | Tools | Acquire | Prepare | Analyze | Report |
| --- | --- | --- | --- | --- | --- | --- | --- |
| [1] | Predicting Home Values in Denver | Predict home values using location, historical sales prices and home features. | Python (SciKit Learn, Pandas) | x | x | x | x |
| [2] | Market Basket Analysis | Discover the co-occurrence relationships among customers’ purchase activities for targeted marketing in online retail. | R (arules) | x | x | x | x |
| [3] | Catch the Pink Flamingo Online Game | Develop recommendations for increasing revenue from an online game. | Spark, Python, Splunk, KNIME, Neo4j | x | x | x | x |
| [4] | Credit Card Fraud Detection and Model Evaluation | Optimize the trade-off between recall and precision in predicting fraud in credit card transactions. | Python (SciKit Learn, Matplotlib) | x | x | x | x |
| [5] | Classification of Mushrooms | Select the most accurate technique for classifying mushrooms as edible or not edible by applying several classifiers to a feature space reduced to two principal components, and visualize the decision boundary. | Python (SciKit Learn, Matplotlib) | x | x | x | x |
| [6] | Are Housing Prices in University Towns Less Affected by Recessions? | Generate insight into investing in real estate by testing the hypothesis that housing prices in university towns are less affected by recessions (using a means comparison t-test). | Python (Pandas) | x | x | x | - |
| [7] | Weather Patterns in Silicon Valley | Analyze weather patterns in Silicon Valley to determine whether the range of temperatures widened in 2015 compared to the previous 10-year period (2005 to 2014). | Python (Pandas, Matplotlib) | x | x | x | x |
| [8] | Identification of Weather Patterns in San Diego, CA Using Cluster Analysis | Identify distinct weather patterns in data collected from a weather station using k-means clustering with the optimal number of clusters, and compare cluster centers visualized using parallel coordinates plots. | Spark (pyspark SQL, ML, MLlib) | x | x | x | x |
| [9] | Classification of Low Humidity Days in San Diego, CA | Determine the likelihood of wildfires by predicting low humidity days using a decision tree, in order to provide a timely warning to residents and the appropriate authorities. | Spark (pyspark SQL, ML, MLlib) | x | x | x | - |
| [10] | Streaming Weather Data Analysis | Connect to a weather station that transmits streaming data generated by sensors, process these data in real time, and save the transformed output. | Spark (pyspark Streaming) | x | x | - | - |
| [11] | Exploratory Analysis of Yelp Star Ratings | Explore the Yelp dataset and prepare the data for predictive modeling of the number of stars that users assign in their reviews. | SQL (MySQL) | x | x | - | - |
| [12] | Exploratory Analysis of Twitter Data | Run queries in MongoDB to profile and understand a Twitter dataset. | MongoDB | - | x | - | - |
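
As a small illustration of the means comparison t-test mentioned for project [6], the sketch below computes Welch's t statistic for two groups using only the Python standard library. It is a minimal sketch, not the project's actual code: the function name `welch_t` is hypothetical, the numbers are made-up placeholders rather than data from the project, and the project itself may use a different t-test variant or a library routine.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and approximate degrees of freedom.

    Hypothetical helper for illustration; does not assume equal variances.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Sample variances (n - 1 in the denominator)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    # Squared standard error of each group mean
    se2_a, se2_b = var_a / n_a, var_b / n_b
    t_stat = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, dof


# Made-up placeholder values: percentage change in home prices during a recession
university_towns = [-3.1, -2.4, -4.0, -2.8, -3.5]
other_towns = [-6.2, -5.1, -7.4, -4.9, -6.8]

t_stat, dof = welch_t(university_towns, other_towns)
print(f"t = {t_stat:.3f}, approx. degrees of freedom = {dof:.1f}")
```

A large positive t statistic here would suggest that the first group's prices fell less on average; turning it into a p-value requires the t distribution, for which a statistics library is the usual choice.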