Chapter 4 SQL Advanced

4.1 Introduction

In this chapter, we will learn to

  • aggregate data
  • order data
  • and group data

We will use the following R packages:

All the data sets used in this chapter can be found here and code can be downloaded from here.

4.2 Set Up

In order to be able to use the ecom data set for our case study, we will read it using the readr package and then copy it to an in memory database using the copy_to() function from dplyr.

ecom <- readr::read_csv('https://raw.githubusercontent.com/rsquaredacademy/datasets/master/web.csv')
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, ecom)

4.3 Aggregate

Let us combine the aggregate statements with WHERE statement to filter data.

  • SUM() : returns the total sum of a numeric column
SELECT SUM(n_visit) FROM ecom
Table 4.1: 1 records
SUM(n_visit)
4972
SELECT SUM(n_visit) 
FROM ecom 
WHERE n_visit > 5
Table 4.2: 1 records
SUM(n_visit)
3574
  • AVG() : returns the average value of a numeric column
SELECT AVG(n_visit) FROM ecom
Table 4.3: 1 records
AVG(n_visit)
4.972
SELECT AVG(n_visit) 
FROM ecom 
WHERE country LIKE 'P%'
Table 4.4: 1 records
AVG(n_visit)
5.07913669064748
  • MAX() : returns the largest value of the selected column
SELECT MAX(n_visit) FROM ecom
Table 4.5: 1 records
MAX(n_visit)
10
SELECT MAX(n_visit) 
FROM ecom 
WHERE device == 'tablet'
Table 4.6: 1 records
MAX(n_visit)
10
  • MIN() : returns the smallest value of the selected column
SELECT MIN(n_visit) FROM ecom
Table 4.7: 1 records
MIN(n_visit)
0
SELECT MIN(n_visit) 
FROM ecom 
WHERE duration BETWEEN 600 AND 900
Table 4.8: 1 records
MIN(n_visit)
0

4.4 Alias

SQL aliases are used to give a table, or a column in a table, a temporary name. They are often used to make column names more readable. An alias only exists for the duration of the query. Below are a few examples:

SELECT AVG(n_visit) AS avg_mobile 
FROM ecom 
WHERE device == 'mobile'
Table 4.9: 1 records
avg_mobile
5.4796511627907
SELECT MAX(n_visit) AS max_visit 
FROM ecom
Table 4.10: 1 records
max_visit
10
SELECT MIN(duration) AS min_duration 
FROM ecom
Table 4.11: 1 records
min_duration
10

4.5 Order By

The ORDER BY keyword is used to sort the records in ascending or descending order. By default, the records are sorted in ascending order. Use the DESC keyword if you want to sort the records in descending order,

SELECT * 
FROM ecom 
ORDER BY country
Table 4.12: Displaying records 1 - 10
id referrer device bouncers n_visit n_pages duration country purchase order_items order_value
232 social laptop 0 8 2 60 Afghanistan 0 0 0
299 yahoo laptop 0 10 18 180 Afghanistan 0 0 0
570 social laptop 1 2 1 274 Afghanistan 0 0 0
677 direct tablet 1 10 1 682 Afghanistan 0 0 0
682 direct tablet 0 5 18 414 Afghanistan 1 8 2006
853 bing tablet 0 7 2 34 Afghanistan 0 10 2122
169 direct laptop 0 6 6 96 Albania 0 0 0
423 google laptop 1 5 1 304 Albania 0 0 0
456 bing mobile 0 2 2 54 Albania 0 10 2182
492 yahoo laptop 0 3 1 23 Albania 0 0 0
SELECT *  
FROM ecom 
ORDER BY duration
Table 4.13: Displaying records 1 - 10
id referrer device bouncers n_visit n_pages duration country purchase order_items order_value
236 yahoo tablet 1 5 1 10 Poland 0 0 0
615 social laptop 1 1 1 10 Finland 0 0 0
392 yahoo laptop 0 0 1 12 Indonesia 0 0 0
688 social mobile 1 2 1 12 Botswana 0 0 0
468 direct laptop 0 0 1 13 Czech Republic 0 5 1598
63 direct mobile 0 10 1 17 China 0 0 0
951 yahoo tablet 0 4 1 17 Poland 0 3 1478
15 yahoo mobile 0 7 1 19 France 0 7 2423
177 bing laptop 0 5 1 20 Thailand 0 3 1252
432 yahoo mobile 0 6 1 21 Russia 0 0 0
SELECT * 
FROM ecom 
ORDER BY n_visit DESC
Table 4.14: Displaying records 1 - 10
id referrer device bouncers n_visit n_pages duration country purchase order_items order_value
1 google laptop 1 10 1 693 Czech Republic 0 0 0
7 yahoo mobile 1 10 1 75 Bangladesh 0 0 0
8 direct mobile 1 10 1 908 Indonesia 0 0 0
29 google mobile 1 10 1 338 Russia 0 0 0
40 google mobile 1 10 1 479 Cameroon 0 0 0
63 direct mobile 0 10 1 17 China 0 0 0
96 direct tablet 0 10 3 57 China 0 4 722
112 social mobile 0 10 11 242 Argentina 1 4 287
116 direct mobile 1 10 1 675 Indonesia 0 0 0
153 google tablet 1 10 1 86 China 0 0 0

4.6 Group By

The GROUP BY statement is used with aggregate functions (COUNT, MAX, MIN, SUM, AVG) to group the result by one or more columns.

SELECT device, 
count(*) AS visits 
FROM ecom 
GROUP BY device 
ORDER by visits DESC
Table 4.15: 3 records
device visits
mobile 344
tablet 331
laptop 325
SELECT device, MAX(duration) AS max_duration 
FROM ecom 
GROUP BY device 
ORDER by max_duration DESC
Table 4.16: 3 records
device max_duration
tablet 999
laptop 997
mobile 994