Chapter 3 SQL Basics
3.1 Introduction
In this chapter, we will learn to:
- select
- single column
- multiple columns
- distinct values in a column
- limit the number of records returned
- handle NULL values
- and filter columns using the following operators
- WHERE
- AND, or & NOT
- BETWEEN
- IN
- LIKE
We will use the following R packages:
All the data sets used in this chapter can be found here and code can be downloaded from here.
3.2 Set Up
In order to be able to use the ecom
data set for our case study, we will read
it using the readr package and then copy it to an in memory database using the
copy_to()
function from dplyr.
3.3 Select Columns
The SQL SELECT statement is used to fetch the data from a database table.
3.3.1 Syntax
Below is the basic syntax of the SELECT statement.
SELECT column1, column2, columnN FROM table_name;
Here, column1, column2… are the fields of a table whose values you want to fetch. If you want to fetch all the fields, use the following syntax.
SELECT * FROM table_name;
3.3.2 Select Single Column
Let us begin by selecting the device
field from the ecom
table.
device |
---|
laptop |
tablet |
laptop |
tablet |
mobile |
laptop |
mobile |
mobile |
mobile |
mobile |
3.3.3 Select Multiple Columns
Select the following fields from the ecom
table:
referrer
device
purchase
referrer | device | purchase |
---|---|---|
laptop | 0 | |
yahoo | tablet | 0 |
direct | laptop | 0 |
bing | tablet | 1 |
yahoo | mobile | 0 |
yahoo | laptop | 0 |
yahoo | mobile | 0 |
direct | mobile | 0 |
bing | mobile | 0 |
mobile | 0 |
3.3.4 Select All Columns
Select all the fields from the ecom
table.
id | referrer | device | bouncers | n_visit | n_pages | duration | country | purchase | order_items | order_value |
---|---|---|---|---|---|---|---|---|---|---|
1 | laptop | 1 | 10 | 1 | 693 | Czech Republic | 0 | 0 | 0 | |
2 | yahoo | tablet | 1 | 9 | 1 | 459 | Yemen | 0 | 0 | 0 |
3 | direct | laptop | 1 | 0 | 1 | 996 | Brazil | 0 | 0 | 0 |
4 | bing | tablet | 0 | 3 | 18 | 468 | China | 1 | 6 | 434 |
5 | yahoo | mobile | 1 | 9 | 1 | 955 | Poland | 0 | 0 | 0 |
6 | yahoo | laptop | 0 | 5 | 5 | 135 | South Africa | 0 | 0 | 0 |
7 | yahoo | mobile | 1 | 10 | 1 | 75 | Bangladesh | 0 | 0 | 0 |
8 | direct | mobile | 1 | 10 | 1 | 908 | Indonesia | 0 | 0 | 0 |
9 | bing | mobile | 0 | 3 | 19 | 209 | Netherlands | 0 | 0 | 0 |
10 | mobile | 1 | 6 | 1 | 208 | Czech Republic | 0 | 0 | 0 |
3.4 Limit
If you have a large table with thousands of rows, returning all the records will take time. Use LIMIT to specify the number of records to return.
id | referrer | device | bouncers | n_visit | n_pages | duration | country | purchase | order_items | order_value |
---|---|---|---|---|---|---|---|---|---|---|
1 | laptop | 1 | 10 | 1 | 693 | Czech Republic | 0 | 0 | 0 | |
2 | yahoo | tablet | 1 | 9 | 1 | 459 | Yemen | 0 | 0 | 0 |
3 | direct | laptop | 1 | 0 | 1 | 996 | Brazil | 0 | 0 | 0 |
4 | bing | tablet | 0 | 3 | 18 | 468 | China | 1 | 6 | 434 |
5 | yahoo | mobile | 1 | 9 | 1 | 955 | Poland | 0 | 0 | 0 |
6 | yahoo | laptop | 0 | 5 | 5 | 135 | South Africa | 0 | 0 | 0 |
7 | yahoo | mobile | 1 | 10 | 1 | 75 | Bangladesh | 0 | 0 | 0 |
8 | direct | mobile | 1 | 10 | 1 | 908 | Indonesia | 0 | 0 | 0 |
9 | bing | mobile | 0 | 3 | 19 | 209 | Netherlands | 0 | 0 | 0 |
10 | mobile | 1 | 6 | 1 | 208 | Czech Republic | 0 | 0 | 0 |
3.5 Distinct
A column in a table may often contain many duplicate values; and we might be interested only in the distinct/unique values. In such cases, we can use the SELECT DISTINCT statement to return only distinct values.
referrer |
---|
yahoo |
direct |
bing |
social |
3.6 Filter
Now that we know how to select columns, let us focus on filtering data. In SQL, the WHERE keyword is used to extract only those records that fulfill a specified condition. Data filter based on both text and numeric values in a table. Below are a few comparison operators we can use:
=
equal<>
not equal<
less than>
greater than<=
less than or equal to>=
greater than or equal to
The following SQL statement filters all rows from the ecom
table where the duration
field
is greater than 300.
id | referrer | device | bouncers | n_visit | n_pages | duration | country | purchase | order_items | order_value |
---|---|---|---|---|---|---|---|---|---|---|
1 | laptop | 1 | 10 | 1 | 693 | Czech Republic | 0 | 0 | 0 | |
2 | yahoo | tablet | 1 | 9 | 1 | 459 | Yemen | 0 | 0 | 0 |
3 | direct | laptop | 1 | 0 | 1 | 996 | Brazil | 0 | 0 | 0 |
4 | bing | tablet | 0 | 3 | 18 | 468 | China | 1 | 6 | 434 |
5 | yahoo | mobile | 1 | 9 | 1 | 955 | Poland | 0 | 0 | 0 |
8 | direct | mobile | 1 | 10 | 1 | 908 | Indonesia | 0 | 0 | 0 |
11 | direct | laptop | 1 | 9 | 1 | 738 | Jamaica | 0 | 0 | 0 |
13 | direct | mobile | 0 | 9 | 14 | 406 | Ireland | 1 | 3 | 651 |
16 | bing | laptop | 1 | 1 | 1 | 995 | United States | 0 | 0 | 0 |
17 | bing | tablet | 0 | 5 | 16 | 368 | Peru | 1 | 6 | 1049 |
Let us filter data based on a text value. In the following example, we filter all rows
from the ecom
table where the device
used is mobile
.
id | referrer | device | bouncers | n_visit | n_pages | duration | country | purchase | order_items | order_value |
---|---|---|---|---|---|---|---|---|---|---|
5 | yahoo | mobile | 1 | 9 | 1 | 955 | Poland | 0 | 0 | 0 |
7 | yahoo | mobile | 1 | 10 | 1 | 75 | Bangladesh | 0 | 0 | 0 |
8 | direct | mobile | 1 | 10 | 1 | 908 | Indonesia | 0 | 0 | 0 |
9 | bing | mobile | 0 | 3 | 19 | 209 | Netherlands | 0 | 0 | 0 |
10 | mobile | 1 | 6 | 1 | 208 | Czech Republic | 0 | 0 | 0 | |
13 | direct | mobile | 0 | 9 | 14 | 406 | Ireland | 1 | 3 | 651 |
15 | yahoo | mobile | 0 | 7 | 1 | 19 | France | 0 | 7 | 2423 |
22 | mobile | 1 | 5 | 1 | 147 | Brazil | 0 | 0 | 0 | |
23 | bing | mobile | 0 | 0 | 7 | 196 | Russia | 0 | 4 | 237 |
29 | mobile | 1 | 10 | 1 | 338 | Russia | 0 | 0 | 0 |
3.7 And, Or & Not
The WHERE clause can be combined with other operators such as
- AND - displays a record if all the conditions separated by AND is TRUE
- OR - displays a record if any of the conditions separated by OR is TRUE
- NOT - displays a record if the condition(s) is NOT TRUE
to filter data based on more than one condition or to create more complex conditions.
In the following example, we filter all the rows from the ecom
table where
n_visit
(visit count) is greater than 3 and duration
(time spent on the site)
is greater than 100. We use AND to create multiple conditions.
id | referrer | device | bouncers | n_visit | n_pages | duration | country | purchase | order_items | order_value |
---|---|---|---|---|---|---|---|---|---|---|
1 | laptop | 1 | 10 | 1 | 693 | Czech Republic | 0 | 0 | 0 | |
2 | yahoo | tablet | 1 | 9 | 1 | 459 | Yemen | 0 | 0 | 0 |
5 | yahoo | mobile | 1 | 9 | 1 | 955 | Poland | 0 | 0 | 0 |
6 | yahoo | laptop | 0 | 5 | 5 | 135 | South Africa | 0 | 0 | 0 |
8 | direct | mobile | 1 | 10 | 1 | 908 | Indonesia | 0 | 0 | 0 |
10 | mobile | 1 | 6 | 1 | 208 | Czech Republic | 0 | 0 | 0 | |
11 | direct | laptop | 1 | 9 | 1 | 738 | Jamaica | 0 | 0 | 0 |
12 | direct | tablet | 0 | 6 | 12 | 132 | Estonia | 0 | 0 | 0 |
13 | direct | mobile | 0 | 9 | 14 | 406 | Ireland | 1 | 3 | 651 |
17 | bing | tablet | 0 | 5 | 16 | 368 | Peru | 1 | 6 | 1049 |
In the next example, we will use both AND & OR. Our goal is to filter all
rows from the ecom
table that follow the below conditions:
n_visit
(visit count) is either equal to 3 or 5device
used to visit the website is eithermobile
ortablet
SELECT *
FROM ecom WHERE (n_visit == 5 OR n_visit == 3)
AND (device = 'mobile' OR device = 'tablet')
id | referrer | device | bouncers | n_visit | n_pages | duration | country | purchase | order_items | order_value |
---|---|---|---|---|---|---|---|---|---|---|
4 | bing | tablet | 0 | 3 | 18 | 468 | China | 1 | 6 | 434 |
9 | bing | mobile | 0 | 3 | 19 | 209 | Netherlands | 0 | 0 | 0 |
14 | yahoo | tablet | 0 | 5 | 8 | 80 | Philippines | 0 | 2 | 362 |
17 | bing | tablet | 0 | 5 | 16 | 368 | Peru | 1 | 6 | 1049 |
22 | mobile | 1 | 5 | 1 | 147 | Brazil | 0 | 0 | 0 | |
38 | yahoo | mobile | 1 | 3 | 1 | 700 | Canada | 0 | 0 | 0 |
50 | bing | tablet | 1 | 5 | 1 | 831 | Iran | 0 | 0 | 0 |
53 | social | tablet | 0 | 3 | 12 | 324 | China | 0 | 0 | 0 |
64 | yahoo | tablet | 0 | 3 | 5 | 140 | Latvia | 0 | 6 | 2798 |
67 | yahoo | tablet | 1 | 3 | 1 | 332 | France | 0 | 0 | 0 |
3.8 BETWEEN
The BETWEEN operator selects values within a given range and is inclusive: begin and end values are included. The values can be numbers, text, or dates. In the following example, we filter rows from
the ecom
table where the visit count is between 1 and 3, and the device
used to visit the website is
mobile
.
id | referrer | device | bouncers | n_visit | n_pages | duration | country | purchase | order_items | order_value |
---|---|---|---|---|---|---|---|---|---|---|
9 | bing | mobile | 0 | 3 | 19 | 209 | Netherlands | 0 | 0 | 0 |
32 | direct | mobile | 1 | 2 | 1 | 501 | El Salvador | 0 | 0 | 0 |
36 | bing | mobile | 0 | 1 | 1 | 25 | Ireland | 0 | 10 | 1885 |
38 | yahoo | mobile | 1 | 3 | 1 | 700 | Canada | 0 | 0 | 0 |
42 | direct | mobile | 0 | 1 | 13 | 234 | Indonesia | 0 | 0 | 0 |
49 | social | mobile | 0 | 1 | 2 | 44 | France | 0 | 10 | 1515 |
98 | bing | mobile | 0 | 3 | 18 | 288 | Portugal | 1 | 6 | 764 |
99 | yahoo | mobile | 1 | 1 | 1 | 73 | Indonesia | 0 | 0 | 0 |
102 | mobile | 0 | 2 | 3 | 69 | Bahamas | 0 | 3 | 583 | |
105 | direct | mobile | 1 | 1 | 1 | 423 | Mexico | 0 | 0 | 0 |
3.9 IN
The IN operator allows us to specify multiple values in a WHERE clause. It is a shorthand
for multiple OR conditions. In the below example, we filter rows from the ecom
table where
the visit count is either 2 or 4 or 6 or 8 or 10. Instead of using multiple OR conditions,
we use the IN operator.
id | referrer | device | bouncers | n_visit | n_pages | duration | country | purchase | order_items | order_value |
---|---|---|---|---|---|---|---|---|---|---|
1 | laptop | 1 | 10 | 1 | 693 | Czech Republic | 0 | 0 | 0 | |
7 | yahoo | mobile | 1 | 10 | 1 | 75 | Bangladesh | 0 | 0 | 0 |
8 | direct | mobile | 1 | 10 | 1 | 908 | Indonesia | 0 | 0 | 0 |
10 | mobile | 1 | 6 | 1 | 208 | Czech Republic | 0 | 0 | 0 | |
12 | direct | tablet | 0 | 6 | 12 | 132 | Estonia | 0 | 0 | 0 |
20 | social | tablet | 0 | 2 | 1 | 28 | Namibia | 0 | 7 | 2077 |
21 | direct | laptop | 1 | 2 | 1 | 384 | China | 0 | 0 | 0 |
24 | social | laptop | 1 | 4 | 1 | 850 | Ukraine | 0 | 0 | 0 |
27 | direct | tablet | 0 | 2 | 19 | 342 | Japan | 1 | 5 | 622 |
29 | mobile | 1 | 10 | 1 | 338 | Russia | 0 | 0 | 0 |
3.10 IS NULL
A field with a NULL value is a field with no value. If a field in a table is optional, it is possible to insert a new record or update a record without adding a value to this field. Then, the field will be saved with a NULL value. In the next example, we filter all rows from the ecom
table where the device
column
has NULL values.
id | referrer | device | bouncers | n_visit | n_pages | duration | country | purchase | order_items | order_value |
---|
3.11 LIKE
The LIKE operator is used to search for a specific pattern in a column. There are two wildcards used in conjunction with the LIKE operator:
- % : represents zero, one, or multiple characters
- _ : represents a single character
In the following example, we filter all rows from the ecom
table where the name of the country
starts with P
. We use %
after P
to indicate that it can be followed by any number or type of characters.
id | referrer | device | bouncers | n_visit | n_pages | duration | country | purchase | order_items | order_value |
---|---|---|---|---|---|---|---|---|---|---|
5 | yahoo | mobile | 1 | 9 | 1 | 955 | Poland | 0 | 0 | 0 |
14 | yahoo | tablet | 0 | 5 | 8 | 80 | Philippines | 0 | 2 | 362 |
17 | bing | tablet | 0 | 5 | 16 | 368 | Peru | 1 | 6 | 1049 |
43 | bing | laptop | 1 | 0 | 1 | 456 | Portugal | 0 | 0 | 0 |
59 | yahoo | tablet | 1 | 9 | 1 | 706 | Philippines | 0 | 0 | 0 |
62 | direct | tablet | 1 | 8 | 1 | 814 | Philippines | 0 | 0 | 0 |
72 | social | laptop | 0 | 5 | 16 | 176 | Panama | 0 | 5 | 2216 |
77 | yahoo | laptop | 1 | 4 | 1 | 928 | Peru | 0 | 0 | 0 |
88 | yahoo | tablet | 0 | 3 | 11 | 176 | Philippines | 0 | 0 | 0 |
93 | laptop | 1 | 9 | 1 | 869 | Poland | 0 | 0 | 0 |
Let us look at another example where we filter all rows from the ecom
table where the name of the
country should follow the below conditions:
- name can start with any character
- the second character must be
o
- it can have any type or number of characters after the second character
id | referrer | device | bouncers | n_visit | n_pages | duration | country | purchase | order_items | order_value |
---|---|---|---|---|---|---|---|---|---|---|
5 | yahoo | mobile | 1 | 9 | 1 | 955 | Poland | 0 | 0 | 0 |
6 | yahoo | laptop | 0 | 5 | 5 | 135 | South Africa | 0 | 0 | 0 |
19 | social | tablet | 0 | 7 | 10 | 290 | Colombia | 1 | 9 | 1304 |
30 | yahoo | mobile | 0 | 8 | 9 | 225 | Colombia | 0 | 0 | 0 |
33 | direct | laptop | 1 | 8 | 1 | 658 | Colombia | 0 | 0 | 0 |
43 | bing | laptop | 1 | 0 | 1 | 456 | Portugal | 0 | 0 | 0 |
44 | yahoo | tablet | 0 | 7 | 11 | 264 | Somalia | 0 | 0 | 0 |
56 | tablet | 0 | 1 | 19 | 304 | Colombia | 0 | 0 | 0 | |
93 | laptop | 1 | 9 | 1 | 869 | Poland | 0 | 0 | 0 | |
95 | social | tablet | 1 | 1 | 1 | 566 | Portugal | 0 | 0 | 0 |