Data Acquisition

Overview

This section describes the process of loading Yelp dataset into SQL in order to explore and prepare the data for predictive modeling of the number of stars that users assign in their reviews.

Data exploration and preparation for predictive modeling are discussed in the next section.

This project is based on assignments from SQL for Data Science by University of California, Davis on Coursera.

The analysis for this project was performed in MySQL.

Description of the Yelp Dataset

The Yelp dataset yelp_db.sql has been downloaded from the Yelp Open Dataset website.

The association and structure of the tables is shown below:

Loading Yelp Dataset into SQL

We first change the directory to /usr/local/mysql, log in to mysql from the command line:

cd /usr/local/mysql
bin/mysql -u root -p

Then we create a database yelp_db:

mysql> create database yelp_db

Let’s now exit from mysql by typing exit, then change the directory in the command line to the location of the dataset yelp_db.sql that we want to load, and load the data:

cd /Users/eagronin/Documents/"Data Science"/"SQL for Data Science"
/usr/local/mysql/bin/mysql -u root -p yelp_db < yelp_db.sql

Next step: Data Preparation