Chapter 4 SQL Advanced

4.1 Introduction

In this chapter, we will learn to

  • aggregate data
  • order data
  • and group data

4.2 Libraries, Code & Data

We will use the following libraries in this chapter:

All the data sets used in this chapter can be found here and code can be downloaded from here.

4.4 Aggregate

Let us combine the aggregate statements with WHERE statement to filter data.

  • SUM() : returns the total sum of a numeric column
##   SUM(n_visit)
## 1         4972
##   SUM(n_visit)
## 1         3574
  • AVG() : returns the average value of a numeric column
##   AVG(n_visit)
## 1        4.972
##   AVG(n_visit)
## 1     5.079137
  • MAX() : returns the largest value of the selected column
##   MAX(n_visit)
## 1           10
##   MAX(n_visit)
## 1           10
  • MIN() : returns the smallest value of the selected column
##   MIN(n_visit)
## 1            0
##   MIN(n_visit)
## 1            0

4.5 Alias

SQL aliases are used to give a table, or a column in a table, a temporary name. They are often used to make column names more readable. An alias only exists for the duration of the query. Below are a few examples:

##   avg_mobile
## 1   5.479651
##   max_visit
## 1        10
##   min_duration
## 1           10

4.6 Order By

The ORDER BY keyword is used to sort the records in ascending or descending order. By default, the records are sorted in ascending order. Use the DESC keyword if you want to sort the records in descending order,

##        id referrer device bouncers n_visit n_pages duration
## 1     232   social laptop        0       8       2       60
## 2     299    yahoo laptop        0      10      18      180
## 3     570   social laptop        1       2       1      274
## 4     677   direct tablet        1      10       1      682
##                               country purchase order_items order_value
## 1                         Afghanistan        0           0           0
## 2                         Afghanistan        0           0           0
## 3                         Afghanistan        0           0           0
## 4                         Afghanistan        0           0           0
##  [ reached getOption("max.print") -- omitted 996 rows ]
##        id referrer device bouncers n_visit n_pages duration
## 1     236    yahoo tablet        1       5       1       10
## 2     615   social laptop        1       1       1       10
## 3     392    yahoo laptop        0       0       1       12
## 4     688   social mobile        1       2       1       12
##                               country purchase order_items order_value
## 1                              Poland        0           0           0
## 2                             Finland        0           0           0
## 3                           Indonesia        0           0           0
## 4                            Botswana        0           0           0
##  [ reached getOption("max.print") -- omitted 996 rows ]
##        id referrer device bouncers n_visit n_pages duration
## 1       1   google laptop        1      10       1      693
## 2       7    yahoo mobile        1      10       1       75
## 3       8   direct mobile        1      10       1      908
## 4      29   google mobile        1      10       1      338
##                               country purchase order_items order_value
## 1                      Czech Republic        0           0           0
## 2                          Bangladesh        0           0           0
## 3                           Indonesia        0           0           0
## 4                              Russia        0           0           0
##  [ reached getOption("max.print") -- omitted 996 rows ]

4.7 Group By

The GROUP BY statement is used with aggregate functions (COUNT, MAX, MIN, SUM, AVG) to group the result by one or more columns.

##   device visits
## 1 mobile    344
## 2 tablet    331
## 3 laptop    325
##   device max_duration
## 1 tablet          999
## 2 laptop          997
## 3 mobile          994