Chapter 3 SQL Basics

3.1 Introduction

In this chapter, we will learn to:

  • select
    • single column
    • multiple columns
    • distinct values in a column
  • limit the number of records returned
  • handle NULL values
  • and filter columns using the following operators
    • WHERE
    • AND, or & NOT
    • BETWEEN
    • IN
    • LIKE

We will use the following R packages:

All the data sets used in this chapter can be found here and code can be downloaded from here.

3.2 Set Up

In order to be able to use the ecom data set for our case study, we will read it using the readr package and then copy it to an in memory database using the copy_to() function from dplyr.

ecom <- readr::read_csv('https://raw.githubusercontent.com/rsquaredacademy/datasets/master/web.csv')
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, ecom)

3.3 Select Columns

The SQL SELECT statement is used to fetch the data from a database table.

3.3.1 Syntax

Below is the basic syntax of the SELECT statement.

SELECT column1, column2, columnN FROM table_name;

Here, column1, column2… are the fields of a table whose values you want to fetch. If you want to fetch all the fields, use the following syntax.

SELECT * FROM table_name;

3.3.2 Select Single Column

Let us begin by selecting the device field from the ecom table.

SELECT device FROM ecom
Table 3.1: Displaying records 1 - 10
device
laptop
tablet
laptop
tablet
mobile
laptop
mobile
mobile
mobile
mobile

3.3.3 Select Multiple Columns

Select the following fields from the ecom table:

  • referrer
  • device
  • purchase
SELECT referrer, device, purchase  FROM ecom
Table 3.2: Displaying records 1 - 10
referrer device purchase
google laptop 0
yahoo tablet 0
direct laptop 0
bing tablet 1
yahoo mobile 0
yahoo laptop 0
yahoo mobile 0
direct mobile 0
bing mobile 0
google mobile 0

3.3.4 Select All Columns

Select all the fields from the ecom table.

SELECT * FROM ecom
Table 3.3: Displaying records 1 - 10
id referrer device bouncers n_visit n_pages duration country purchase order_items order_value
1 google laptop 1 10 1 693 Czech Republic 0 0 0
2 yahoo tablet 1 9 1 459 Yemen 0 0 0
3 direct laptop 1 0 1 996 Brazil 0 0 0
4 bing tablet 0 3 18 468 China 1 6 434
5 yahoo mobile 1 9 1 955 Poland 0 0 0
6 yahoo laptop 0 5 5 135 South Africa 0 0 0
7 yahoo mobile 1 10 1 75 Bangladesh 0 0 0
8 direct mobile 1 10 1 908 Indonesia 0 0 0
9 bing mobile 0 3 19 209 Netherlands 0 0 0
10 google mobile 1 6 1 208 Czech Republic 0 0 0

3.4 Limit

If you have a large table with thousands of rows, returning all the records will take time. Use LIMIT to specify the number of records to return.

SELECT * FROM ecom limit 10
Table 3.4: Displaying records 1 - 10
id referrer device bouncers n_visit n_pages duration country purchase order_items order_value
1 google laptop 1 10 1 693 Czech Republic 0 0 0
2 yahoo tablet 1 9 1 459 Yemen 0 0 0
3 direct laptop 1 0 1 996 Brazil 0 0 0
4 bing tablet 0 3 18 468 China 1 6 434
5 yahoo mobile 1 9 1 955 Poland 0 0 0
6 yahoo laptop 0 5 5 135 South Africa 0 0 0
7 yahoo mobile 1 10 1 75 Bangladesh 0 0 0
8 direct mobile 1 10 1 908 Indonesia 0 0 0
9 bing mobile 0 3 19 209 Netherlands 0 0 0
10 google mobile 1 6 1 208 Czech Republic 0 0 0

3.5 Distinct

A column in a table may often contain many duplicate values; and we might be interested only in the distinct/unique values. In such cases, we can use the SELECT DISTINCT statement to return only distinct values.

SELECT distinct referrer FROM ecom
Table 3.5: 5 records
referrer
google
yahoo
direct
bing
social

3.6 Filter

Now that we know how to select columns, let us focus on filtering data. In SQL, the WHERE keyword is used to extract only those records that fulfill a specified condition. Data filter based on both text and numeric values in a table. Below are a few comparison operators we can use:

  • = equal
  • <> not equal
  • < less than
  • > greater than
  • <= less than or equal to
  • >= greater than or equal to

The following SQL statement filters all rows from the ecom table where the duration field is greater than 300.

SELECT * 
FROM ecom 
WHERE duration > 300
Table 3.6: Displaying records 1 - 10
id referrer device bouncers n_visit n_pages duration country purchase order_items order_value
1 google laptop 1 10 1 693 Czech Republic 0 0 0
2 yahoo tablet 1 9 1 459 Yemen 0 0 0
3 direct laptop 1 0 1 996 Brazil 0 0 0
4 bing tablet 0 3 18 468 China 1 6 434
5 yahoo mobile 1 9 1 955 Poland 0 0 0
8 direct mobile 1 10 1 908 Indonesia 0 0 0
11 direct laptop 1 9 1 738 Jamaica 0 0 0
13 direct mobile 0 9 14 406 Ireland 1 3 651
16 bing laptop 1 1 1 995 United States 0 0 0
17 bing tablet 0 5 16 368 Peru 1 6 1049

Let us filter data based on a text value. In the following example, we filter all rows from the ecom table where the device used is mobile.

SELECT * 
FROM ecom 
WHERE device == 'mobile'
Table 3.7: Displaying records 1 - 10
id referrer device bouncers n_visit n_pages duration country purchase order_items order_value
5 yahoo mobile 1 9 1 955 Poland 0 0 0
7 yahoo mobile 1 10 1 75 Bangladesh 0 0 0
8 direct mobile 1 10 1 908 Indonesia 0 0 0
9 bing mobile 0 3 19 209 Netherlands 0 0 0
10 google mobile 1 6 1 208 Czech Republic 0 0 0
13 direct mobile 0 9 14 406 Ireland 1 3 651
15 yahoo mobile 0 7 1 19 France 0 7 2423
22 google mobile 1 5 1 147 Brazil 0 0 0
23 bing mobile 0 0 7 196 Russia 0 4 237
29 google mobile 1 10 1 338 Russia 0 0 0

3.7 And, Or & Not

The WHERE clause can be combined with other operators such as

  • AND - displays a record if all the conditions separated by AND is TRUE
  • OR - displays a record if any of the conditions separated by OR is TRUE
  • NOT - displays a record if the condition(s) is NOT TRUE

to filter data based on more than one condition or to create more complex conditions.

In the following example, we filter all the rows from the ecom table where n_visit (visit count) is greater than 3 and duration (time spent on the site) is greater than 100. We use AND to create multiple conditions.

SELECT * 
FROM ecom 
WHERE n_visit > 3 AND duration > 100
Table 3.8: Displaying records 1 - 10
id referrer device bouncers n_visit n_pages duration country purchase order_items order_value
1 google laptop 1 10 1 693 Czech Republic 0 0 0
2 yahoo tablet 1 9 1 459 Yemen 0 0 0
5 yahoo mobile 1 9 1 955 Poland 0 0 0
6 yahoo laptop 0 5 5 135 South Africa 0 0 0
8 direct mobile 1 10 1 908 Indonesia 0 0 0
10 google mobile 1 6 1 208 Czech Republic 0 0 0
11 direct laptop 1 9 1 738 Jamaica 0 0 0
12 direct tablet 0 6 12 132 Estonia 0 0 0
13 direct mobile 0 9 14 406 Ireland 1 3 651
17 bing tablet 0 5 16 368 Peru 1 6 1049

In the next example, we will use both AND & OR. Our goal is to filter all rows from the ecom table that follow the below conditions:

  • n_visit (visit count) is either equal to 3 or 5
  • device used to visit the website is either mobile or tablet
SELECT * 
FROM ecom WHERE (n_visit == 5 OR n_visit == 3)  
AND (device = 'mobile' OR device = 'tablet')
Table 3.9: Displaying records 1 - 10
id referrer device bouncers n_visit n_pages duration country purchase order_items order_value
4 bing tablet 0 3 18 468 China 1 6 434
9 bing mobile 0 3 19 209 Netherlands 0 0 0
14 yahoo tablet 0 5 8 80 Philippines 0 2 362
17 bing tablet 0 5 16 368 Peru 1 6 1049
22 google mobile 1 5 1 147 Brazil 0 0 0
38 yahoo mobile 1 3 1 700 Canada 0 0 0
50 bing tablet 1 5 1 831 Iran 0 0 0
53 social tablet 0 3 12 324 China 0 0 0
64 yahoo tablet 0 3 5 140 Latvia 0 6 2798
67 yahoo tablet 1 3 1 332 France 0 0 0

3.8 BETWEEN

The BETWEEN operator selects values within a given range and is inclusive: begin and end values are included. The values can be numbers, text, or dates. In the following example, we filter rows from the ecom table where the visit count is between 1 and 3, and the device used to visit the website is mobile.

SELECT * 
FROM ecom
WHERE n_visit BETWEEN 1 AND 3 AND device = 'mobile'
Table 3.10: Displaying records 1 - 10
id referrer device bouncers n_visit n_pages duration country purchase order_items order_value
9 bing mobile 0 3 19 209 Netherlands 0 0 0
32 direct mobile 1 2 1 501 El Salvador 0 0 0
36 bing mobile 0 1 1 25 Ireland 0 10 1885
38 yahoo mobile 1 3 1 700 Canada 0 0 0
42 direct mobile 0 1 13 234 Indonesia 0 0 0
49 social mobile 0 1 2 44 France 0 10 1515
98 bing mobile 0 3 18 288 Portugal 1 6 764
99 yahoo mobile 1 1 1 73 Indonesia 0 0 0
102 google mobile 0 2 3 69 Bahamas 0 3 583
105 direct mobile 1 1 1 423 Mexico 0 0 0

3.9 IN

The IN operator allows us to specify multiple values in a WHERE clause. It is a shorthand for multiple OR conditions. In the below example, we filter rows from the ecom table where the visit count is either 2 or 4 or 6 or 8 or 10. Instead of using multiple OR conditions, we use the IN operator.

SELECT * 
FROM ecom 
WHERE n_visit IN (2, 4, 6, 8, 10)
Table 3.11: Displaying records 1 - 10
id referrer device bouncers n_visit n_pages duration country purchase order_items order_value
1 google laptop 1 10 1 693 Czech Republic 0 0 0
7 yahoo mobile 1 10 1 75 Bangladesh 0 0 0
8 direct mobile 1 10 1 908 Indonesia 0 0 0
10 google mobile 1 6 1 208 Czech Republic 0 0 0
12 direct tablet 0 6 12 132 Estonia 0 0 0
20 social tablet 0 2 1 28 Namibia 0 7 2077
21 direct laptop 1 2 1 384 China 0 0 0
24 social laptop 1 4 1 850 Ukraine 0 0 0
27 direct tablet 0 2 19 342 Japan 1 5 622
29 google mobile 1 10 1 338 Russia 0 0 0

3.10 IS NULL

A field with a NULL value is a field with no value. If a field in a table is optional, it is possible to insert a new record or update a record without adding a value to this field. Then, the field will be saved with a NULL value. In the next example, we filter all rows from the ecom table where the device column has NULL values.

SELECT * 
FROM ecom 
WHERE device IS NULL
Table 3.12: 0 records
id referrer device bouncers n_visit n_pages duration country purchase order_items order_value

3.11 LIKE

The LIKE operator is used to search for a specific pattern in a column. There are two wildcards used in conjunction with the LIKE operator:

  • % : represents zero, one, or multiple characters
  • _ : represents a single character

In the following example, we filter all rows from the ecom table where the name of the country starts with P. We use % after P to indicate that it can be followed by any number or type of characters.

SELECT * 
FROM ecom 
WHERE country LIKE 'P%'
Table 3.13: Displaying records 1 - 10
id referrer device bouncers n_visit n_pages duration country purchase order_items order_value
5 yahoo mobile 1 9 1 955 Poland 0 0 0
14 yahoo tablet 0 5 8 80 Philippines 0 2 362
17 bing tablet 0 5 16 368 Peru 1 6 1049
43 bing laptop 1 0 1 456 Portugal 0 0 0
59 yahoo tablet 1 9 1 706 Philippines 0 0 0
62 direct tablet 1 8 1 814 Philippines 0 0 0
72 social laptop 0 5 16 176 Panama 0 5 2216
77 yahoo laptop 1 4 1 928 Peru 0 0 0
88 yahoo tablet 0 3 11 176 Philippines 0 0 0
93 google laptop 1 9 1 869 Poland 0 0 0

Let us look at another example where we filter all rows from the ecom table where the name of the country should follow the below conditions:

  • name can start with any character
  • the second character must be o
  • it can have any type or number of characters after the second character
SELECT * 
FROM ecom 
WHERE country LIKE '_o%'
Table 3.14: Displaying records 1 - 10
id referrer device bouncers n_visit n_pages duration country purchase order_items order_value
5 yahoo mobile 1 9 1 955 Poland 0 0 0
6 yahoo laptop 0 5 5 135 South Africa 0 0 0
19 social tablet 0 7 10 290 Colombia 1 9 1304
30 yahoo mobile 0 8 9 225 Colombia 0 0 0
33 direct laptop 1 8 1 658 Colombia 0 0 0
43 bing laptop 1 0 1 456 Portugal 0 0 0
44 yahoo tablet 0 7 11 264 Somalia 0 0 0
56 google tablet 0 1 19 304 Colombia 0 0 0
93 google laptop 1 9 1 869 Poland 0 0 0
95 social tablet 1 1 1 566 Portugal 0 0 0