Chapter 1 DBI

1.1 Introduction

In this chapter, we will learn to:

  • connect to a SQLite database from R
  • display database information
  • list tables in the database
  • query data
    • read entire table
    • read few rows
    • read data in batches
  • create table in database
  • overwrite table in database
  • append data to table in database
  • remove table from database
  • generate SQL query
  • close database connection

We will use the following R packages:

All the data sets used in this chapter can be found here and code can be downloaded from here.

1.2 Connection

The first step is to connect to a database. We will connect to an in memory SQLite databse using dbConnect().

con <- dbConnect(RSQLite::SQLite(), ":memory:")

1.2.1 Connection Summary

We can get the more information about the connection using summary().

summary(con)
##           Length            Class             Mode 
##                1 SQLiteConnection               S4

1.2.2 List Tables

Now that we are connected to a database, let us list all the tables present in it using dbListTables().

dbListTables(con)
## [1] "ecom"         "sqlite_stat1" "sqlite_stat4"

1.2.3 List Fields

Time to explore the ecom table in the database. Use dbListFields() to list all the fields in the table.

dbListFields(con, "ecom")
## [1] "referrer" "device"   "bouncers" "n_visit"  "n_pages"  "duration"

1.3 Querying Data

The main objectives of connecting to a database are to:

  • query data from the tables already present
  • create new tables
  • overwrite existing tables
  • delete existing tables

Let us begin with querying data. We can do this in the following ways:

  • read an entire table at once
  • read few rows from a table
  • read data in batches

1.3.1 Entire Table

We can read an entire table from a database using dbReadTable().

dbReadTable(con, 'ecom')
##    referrer device bouncers n_visit n_pages duration
## 1    google laptop        1      10       1      693
## 2     yahoo tablet        1       9       1      459
## 3    direct laptop        1       0       1      996
## 4      bing tablet        0       3      18      468
## 5     yahoo mobile        1       9       1      955
## 6     yahoo laptop        0       5       5      135
## 7     yahoo mobile        1      10       1       75
## 8    direct mobile        1      10       1      908
## 9      bing mobile        0       3      19      209
## 10   google mobile        1       6       1      208
## 11   direct laptop        1       9       1      738
## 12   direct tablet        0       6      12      132
## 13   direct mobile        0       9      14      406
## 14    yahoo tablet        0       5       8       80
## 15    yahoo mobile        0       7       1       19
## 16     bing laptop        1       1       1      995
## 17     bing tablet        0       5      16      368
## 18   google tablet        1       7       1      406
## 19   social tablet        0       7      10      290
## 20   social tablet        0       2       1       28

In some cases, we may not need the entire table but only a specific number of rows. Use dbGetQuery() and supply a SQL statement specifying the number of rows of data to be read from the table. In the below example, we read ten rows of data from the ecom table.

1.3.2 Few Rows

dbGetQuery(con, "select * from ecom limit 10")
##    referrer device bouncers n_visit n_pages duration
## 1    google laptop        1      10       1      693
## 2     yahoo tablet        1       9       1      459
## 3    direct laptop        1       0       1      996
## 4      bing tablet        0       3      18      468
## 5     yahoo mobile        1       9       1      955
## 6     yahoo laptop        0       5       5      135
## 7     yahoo mobile        1      10       1       75
## 8    direct mobile        1      10       1      908
## 9      bing mobile        0       3      19      209
## 10   google mobile        1       6       1      208

In case of very large table, we can read data in batches using dbSendQuery() and dbFetch(). We can mention the number of rows of data to be read while fetching the data using the query generated by dbGetQuery().

1.3.3 Read Data in Batches

query <- dbSendQuery(con, 'select * from ecom')
result <- dbFetch(query, n = 15)
result
##    referrer device bouncers n_visit n_pages duration
## 1    google laptop        1      10       1      693
## 2     yahoo tablet        1       9       1      459
## 3    direct laptop        1       0       1      996
## 4      bing tablet        0       3      18      468
## 5     yahoo mobile        1       9       1      955
## 6     yahoo laptop        0       5       5      135
## 7     yahoo mobile        1      10       1       75
## 8    direct mobile        1      10       1      908
## 9      bing mobile        0       3      19      209
## 10   google mobile        1       6       1      208
## 11   direct laptop        1       9       1      738
## 12   direct tablet        0       6      12      132
## 13   direct mobile        0       9      14      406
## 14    yahoo tablet        0       5       8       80
## 15    yahoo mobile        0       7       1       19

1.4 Query

1.4.1 Query Status

To know the status of a query, use dbHasCompleted(). It is very useful in cases of queries that might take a long time to complete.

dbHasCompleted(query)
## [1] FALSE

1.4.2 Query Info

dbGetInfo() will return the following:

  • the sql staement
  • number of rows fetched
  • number of rows modified/affected
  • status of the query
dbGetInfo(query)
## $statement
## [1] "select * from ecom"
## 
## $row.count
## [1] 15
## 
## $rows.affected
## [1] 0
## 
## $has.completed
## [1] FALSE

1.4.3 Latest Query

To get the latest query, use dbGetStatement().

dbGetStatement(query)
## [1] "select * from ecom"

1.4.4 Rows Fetched

To check the number of rows of data returned by a query, use dbGetRowCount().

dbGetRowCount(query)
## [1] 15

1.4.5 Rows Affected

To know the number of rows modified or affected in the table, use dbGetRowsAffected().

dbGetRowsAffected(query)
## [1] 0

1.4.6 Column Info

To know the name of the columns and their data types, use dbColumnInfo().

dbColumnInfo(query)
##       name      type
## 1 referrer character
## 2   device character
## 3 bouncers   integer
## 4  n_visit    double
## 5  n_pages    double
## 6 duration    double

1.5 Create Table

So far we have explored querying data from an existing table. Now, let us turn our attention to creating new tables in the database.

1.5.0.1 Introduction

To create a new table, use dbWriteTable(). It takes the following 3 arguments:

  • connection name
  • name of the new table
  • data for the new table
x <- 1:10
y <- letters[1:10]
trial <- tibble::tibble(x, y)
dbWriteTable(con, "trial", trial)
## Warning: Closing open result set, pending rows

Let us check if the new table has been created.

dbListTables(con)
## [1] "ecom"         "sqlite_stat1" "sqlite_stat4" "trial"
dbExistsTable(con, "trial")
## [1] TRUE

Let us query data from the new table.

dbGetQuery(con, "select * from trial limit 5")
##   x y
## 1 1 a
## 2 2 b
## 3 3 c
## 4 4 d
## 5 5 e

1.5.0.2 Overwrite Table

In some cases, you may want to overwrite the data in an existing table. Use the overwrite argument in dbWriteTable() and set it to TRUE.

x <- sample(100, 10)
y <- letters[11:20]
trial2 <- tibble::tibble(x, y)
dbWriteTable(con, "trial", trial2, overwrite = TRUE)

Let us see if the trial table has been overwritten.

dbGetQuery(con, "select * from trial limit 5")
##    x y
## 1 59 k
## 2 72 l
## 3 94 m
## 4 24 n
## 5 30 o

1.6 Append Data

You can append data to an existing table by setting the append argument in dbWriteTable() to TRUE.

x <- sample(100, 10)
y <- letters[5:14]
trial3 <- tibble::tibble(x, y)
dbWriteTable(con, "trial", trial3, append = TRUE)

Let us quickly check if the new data has been appended to the trial table.

dbReadTable(con, "trial")
##     x y
## 1  59 k
## 2  72 l
## 3  94 m
## 4  24 n
## 5  30 o
## 6  29 p
## 7  43 q
## 8  66 r
## 9  84 s
## 10 32 t
## 11 86 e
## 12 89 f
## 13 80 g
## 14 71 h
## 15 25 i
## 16 16 j
## 17 43 k
## 18 10 l
## 19 11 m
## 20 75 n

We can also use sqlAppendTable() to append data to an existing table.

sqlAppendTable(con, "ecom", head(ecom))
## Warning: Do not rely on the default value of the row.names argument for
## sqlAppendTable(), it will change in the future.
## <SQL> INSERT INTO `ecom`
##   (`referrer`, `device`, `bouncers`, `n_visit`, `n_pages`, `duration`)
## VALUES
##   ('google', 'laptop', TRUE, 10, 1, 693),
##   ('yahoo', 'tablet', TRUE, 9, 1, 459),
##   ('direct', 'laptop', TRUE, 0, 1, 996),
##   ('bing', 'tablet', FALSE, 3, 18, 468),
##   ('yahoo', 'mobile', TRUE, 9, 1, 955),
##   ('yahoo', 'laptop', FALSE, 5, 5, 135)

1.7 Insert Rows

1.7.0.1 Introduction

We can insert new rows into existing tables using:

  • dbExecute()
  • dbSendStatement()

Both the function take 2 arguments:

  • connection name
  • sql statement
# use dbExecute
dbExecute(con,
  "INSERT into trial (x, y) VALUES (32, 'c'), (45, 'k'), (61, 'h')"
)
## [1] 3

# use dbSendStatement
dbSendStatement(con,
  "INSERT into trial (x, y) VALUES (25, 'm'), (54, 'l'), (16, 'y')"
)
## <SQLiteResult>
##   SQL  INSERT into trial (x, y) VALUES (25, 'm'), (54, 'l'), (16, 'y')
##   ROWS Fetched: 0 [complete]
##        Changed: 3

1.8 Remove Table

If you want to delete/remove a table from the database, use dbRemoveTable().

dbRemoveTable(con, "trial")
## Warning: Closing open result set, pending rows

1.9 SQLite Data Type

If you want to know the data type, use dbDataType().

dbDataType(RSQLite::SQLite(), "a")
## [1] "TEXT"
dbDataType(RSQLite::SQLite(), 1:5)
## [1] "INTEGER"
dbDataType(RSQLite::SQLite(), 1.5)
## [1] "REAL"

1.10 Close Connection

It is a good practice to close connection to a database when you no longer need to read/write data from/to it. Use dbDisconnect() to close the database connection.

dbDisconnect(con)