All the data sets used in this chapter can be found here and code can be downloaded from here.
4.2 Set Up
In order to be able to use the ecom data set for our case study, we will read
it using the readr package and then copy it to an in memory database using the
copy_to() function from dplyr.
Let us combine the aggregate statements with WHERE statement to filter data.
SUM() : returns the total sum of a numeric column
SELECTSUM(n_visit) FROM ecom
Table 4.1: 1 records
SUM(n_visit)
4972
SELECTSUM(n_visit) FROM ecom WHERE n_visit >5
Table 4.2: 1 records
SUM(n_visit)
3574
AVG() : returns the average value of a numeric column
SELECTAVG(n_visit) FROM ecom
Table 4.3: 1 records
AVG(n_visit)
4.972
SELECTAVG(n_visit) FROM ecom WHERE country LIKE'P%'
Table 4.4: 1 records
AVG(n_visit)
5.07913669064748
MAX() : returns the largest value of the selected column
SELECTMAX(n_visit) FROM ecom
Table 4.5: 1 records
MAX(n_visit)
10
SELECTMAX(n_visit) FROM ecom WHERE device =='tablet'
Table 4.6: 1 records
MAX(n_visit)
10
MIN() : returns the smallest value of the selected column
SELECTMIN(n_visit) FROM ecom
Table 4.7: 1 records
MIN(n_visit)
0
SELECTMIN(n_visit) FROM ecom WHERE duration BETWEEN600AND900
Table 4.8: 1 records
MIN(n_visit)
0
4.4 Alias
SQL aliases are used to give a table, or a column in a table, a temporary name. They are often used to make column names more readable. An alias only exists for the duration of the query. Below are a
few examples:
SELECTAVG(n_visit) AS avg_mobile FROM ecom WHERE device =='mobile'
Table 4.9: 1 records
avg_mobile
5.4796511627907
SELECTMAX(n_visit) AS max_visit FROM ecom
Table 4.10: 1 records
max_visit
10
SELECTMIN(duration) AS min_duration FROM ecom
Table 4.11: 1 records
min_duration
10
4.5 Order By
The ORDER BY keyword is used to sort the records in ascending or descending order. By default, the records are sorted in ascending order. Use the DESC keyword if you want to sort the records in descending order,
SELECT*FROM ecom ORDERBY country
Table 4.12: Displaying records 1 - 10
id
referrer
device
bouncers
n_visit
n_pages
duration
country
purchase
order_items
order_value
232
social
laptop
0
8
2
60
Afghanistan
0
0
0
299
yahoo
laptop
0
10
18
180
Afghanistan
0
0
0
570
social
laptop
1
2
1
274
Afghanistan
0
0
0
677
direct
tablet
1
10
1
682
Afghanistan
0
0
0
682
direct
tablet
0
5
18
414
Afghanistan
1
8
2006
853
bing
tablet
0
7
2
34
Afghanistan
0
10
2122
169
direct
laptop
0
6
6
96
Albania
0
0
0
423
google
laptop
1
5
1
304
Albania
0
0
0
456
bing
mobile
0
2
2
54
Albania
0
10
2182
492
yahoo
laptop
0
3
1
23
Albania
0
0
0
SELECT*FROM ecom ORDERBY duration
Table 4.13: Displaying records 1 - 10
id
referrer
device
bouncers
n_visit
n_pages
duration
country
purchase
order_items
order_value
236
yahoo
tablet
1
5
1
10
Poland
0
0
0
615
social
laptop
1
1
1
10
Finland
0
0
0
392
yahoo
laptop
0
0
1
12
Indonesia
0
0
0
688
social
mobile
1
2
1
12
Botswana
0
0
0
468
direct
laptop
0
0
1
13
Czech Republic
0
5
1598
63
direct
mobile
0
10
1
17
China
0
0
0
951
yahoo
tablet
0
4
1
17
Poland
0
3
1478
15
yahoo
mobile
0
7
1
19
France
0
7
2423
177
bing
laptop
0
5
1
20
Thailand
0
3
1252
432
yahoo
mobile
0
6
1
21
Russia
0
0
0
SELECT*FROM ecom ORDERBY n_visit DESC
Table 4.14: Displaying records 1 - 10
id
referrer
device
bouncers
n_visit
n_pages
duration
country
purchase
order_items
order_value
1
google
laptop
1
10
1
693
Czech Republic
0
0
0
7
yahoo
mobile
1
10
1
75
Bangladesh
0
0
0
8
direct
mobile
1
10
1
908
Indonesia
0
0
0
29
google
mobile
1
10
1
338
Russia
0
0
0
40
google
mobile
1
10
1
479
Cameroon
0
0
0
63
direct
mobile
0
10
1
17
China
0
0
0
96
direct
tablet
0
10
3
57
China
0
4
722
112
social
mobile
0
10
11
242
Argentina
1
4
287
116
direct
mobile
1
10
1
675
Indonesia
0
0
0
153
google
tablet
1
10
1
86
China
0
0
0
4.6 Group By
The GROUP BY statement is used with aggregate functions (COUNT, MAX, MIN, SUM, AVG) to group the result by one or more columns.
SELECT device, count(*) AS visits FROM ecom GROUPBY device ORDERby visits DESC
Table 4.15: 3 records
device
visits
mobile
344
tablet
331
laptop
325
SELECT device, MAX(duration) AS max_duration FROM ecom GROUPBY device ORDERby max_duration DESC