SQL Advanced
Introduction
In this chapter, we will learn to
- aggregate data
- order data
- and group data
We will use the following R packages:
All the data sets used in this chapter can be found here and code can be downloaded from here.
Set Up
In order to be able to use the ecom
data set for our case study, we will read
it using the readr package and then copy it to an in memory database using the
copy_to()
function from dplyr.
ecom <- readr::read_csv('https://raw.githubusercontent.com/rsquaredacademy/datasets/master/web.csv')
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, ecom)
Aggregate
Let us combine the aggregate statements with WHERE statement to filter data.
- SUM() : returns the total sum of a numeric column
SELECT SUM(n_visit) FROM ecom
Table 4.1: 1 records
4972 |
SELECT SUM(n_visit)
FROM ecom
WHERE n_visit > 5
Table 4.2: 1 records
3574 |
- AVG() : returns the average value of a numeric column
SELECT AVG(n_visit) FROM ecom
Table 4.3: 1 records
4.972 |
SELECT AVG(n_visit)
FROM ecom
WHERE country LIKE 'P%'
Table 4.4: 1 records
5.07913669064748 |
- MAX() : returns the largest value of the selected column
SELECT MAX(n_visit) FROM ecom
SELECT MAX(n_visit)
FROM ecom
WHERE device == 'tablet'
- MIN() : returns the smallest value of the selected column
SELECT MIN(n_visit) FROM ecom
SELECT MIN(n_visit)
FROM ecom
WHERE duration BETWEEN 600 AND 900
Alias
SQL aliases are used to give a table, or a column in a table, a temporary name. They are often used to make column names more readable. An alias only exists for the duration of the query. Below are a
few examples:
SELECT AVG(n_visit) AS avg_mobile
FROM ecom
WHERE device == 'mobile'
Table 4.9: 1 records
5.4796511627907 |
SELECT MAX(n_visit) AS max_visit
FROM ecom
SELECT MIN(duration) AS min_duration
FROM ecom
Order By
The ORDER BY keyword is used to sort the records in ascending or descending order. By default, the records are sorted in ascending order. Use the DESC keyword if you want to sort the records in descending order,
SELECT *
FROM ecom
ORDER BY country
Table 4.12: Displaying records 1 - 10
232 |
social |
laptop |
0 |
8 |
2 |
60 |
Afghanistan |
0 |
0 |
0 |
299 |
yahoo |
laptop |
0 |
10 |
18 |
180 |
Afghanistan |
0 |
0 |
0 |
570 |
social |
laptop |
1 |
2 |
1 |
274 |
Afghanistan |
0 |
0 |
0 |
677 |
direct |
tablet |
1 |
10 |
1 |
682 |
Afghanistan |
0 |
0 |
0 |
682 |
direct |
tablet |
0 |
5 |
18 |
414 |
Afghanistan |
1 |
8 |
2006 |
853 |
bing |
tablet |
0 |
7 |
2 |
34 |
Afghanistan |
0 |
10 |
2122 |
169 |
direct |
laptop |
0 |
6 |
6 |
96 |
Albania |
0 |
0 |
0 |
423 |
google |
laptop |
1 |
5 |
1 |
304 |
Albania |
0 |
0 |
0 |
456 |
bing |
mobile |
0 |
2 |
2 |
54 |
Albania |
0 |
10 |
2182 |
492 |
yahoo |
laptop |
0 |
3 |
1 |
23 |
Albania |
0 |
0 |
0 |
SELECT *
FROM ecom
ORDER BY duration
Table 4.13: Displaying records 1 - 10
236 |
yahoo |
tablet |
1 |
5 |
1 |
10 |
Poland |
0 |
0 |
0 |
615 |
social |
laptop |
1 |
1 |
1 |
10 |
Finland |
0 |
0 |
0 |
392 |
yahoo |
laptop |
0 |
0 |
1 |
12 |
Indonesia |
0 |
0 |
0 |
688 |
social |
mobile |
1 |
2 |
1 |
12 |
Botswana |
0 |
0 |
0 |
468 |
direct |
laptop |
0 |
0 |
1 |
13 |
Czech Republic |
0 |
5 |
1598 |
63 |
direct |
mobile |
0 |
10 |
1 |
17 |
China |
0 |
0 |
0 |
951 |
yahoo |
tablet |
0 |
4 |
1 |
17 |
Poland |
0 |
3 |
1478 |
15 |
yahoo |
mobile |
0 |
7 |
1 |
19 |
France |
0 |
7 |
2423 |
177 |
bing |
laptop |
0 |
5 |
1 |
20 |
Thailand |
0 |
3 |
1252 |
432 |
yahoo |
mobile |
0 |
6 |
1 |
21 |
Russia |
0 |
0 |
0 |
SELECT *
FROM ecom
ORDER BY n_visit DESC
Table 4.14: Displaying records 1 - 10
1 |
google |
laptop |
1 |
10 |
1 |
693 |
Czech Republic |
0 |
0 |
0 |
7 |
yahoo |
mobile |
1 |
10 |
1 |
75 |
Bangladesh |
0 |
0 |
0 |
8 |
direct |
mobile |
1 |
10 |
1 |
908 |
Indonesia |
0 |
0 |
0 |
29 |
google |
mobile |
1 |
10 |
1 |
338 |
Russia |
0 |
0 |
0 |
40 |
google |
mobile |
1 |
10 |
1 |
479 |
Cameroon |
0 |
0 |
0 |
63 |
direct |
mobile |
0 |
10 |
1 |
17 |
China |
0 |
0 |
0 |
96 |
direct |
tablet |
0 |
10 |
3 |
57 |
China |
0 |
4 |
722 |
112 |
social |
mobile |
0 |
10 |
11 |
242 |
Argentina |
1 |
4 |
287 |
116 |
direct |
mobile |
1 |
10 |
1 |
675 |
Indonesia |
0 |
0 |
0 |
153 |
google |
tablet |
1 |
10 |
1 |
86 |
China |
0 |
0 |
0 |
Group By
The GROUP BY statement is used with aggregate functions (COUNT, MAX, MIN, SUM, AVG) to group the result by one or more columns.
SELECT device,
count(*) AS visits
FROM ecom
GROUP BY device
ORDER by visits DESC
Table 4.15: 3 records
mobile |
344 |
tablet |
331 |
laptop |
325 |
SELECT device, MAX(duration) AS max_duration
FROM ecom
GROUP BY device
ORDER by max_duration DESC
Table 4.16: 3 records
tablet |
999 |
laptop |
997 |
mobile |
994 |