---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# duckspatial
<!-- badges: start -->
[CRAN status](https://CRAN.R-project.org/package=duckspatial)
[Lifecycle: experimental](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[Codecov test coverage](https://app.codecov.io/gh/Cidree/duckspatial)
[License: GPL v3](https://www.gnu.org/licenses/gpl-3.0)
[Project Status: Active](https://www.repostatus.org/#active)
<!-- badges: end -->
**duckspatial** is an R package that simplifies the process of reading and writing vector spatial data (e.g., `sf` objects) in a [DuckDB](https://duckdb.org/) database. This package is designed for users working with geospatial data who want to leverage DuckDB’s fast analytical capabilities while maintaining compatibility with R’s spatial data ecosystem.
## Installation
You can install the development version of duckspatial from [GitHub](https://github.com/) with:
``` r
# install.packages("pak")
pak::pak("Cidree/duckspatial")
```
## Example
This basic example shows how to set up DuckDB for spatial data manipulation and how to write and read vector data.
```{r}
library(duckdb)
library(duckspatial)
library(sf)
```
First, we create a connection to a DuckDB database (in this case, an in-memory database), make sure that the spatial extension is installed, and load it:
```{r connect}
## create connection
conn <- dbConnect(duckdb())
## install and load spatial extension
ddbs_install(conn)
ddbs_load(conn)
```
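To confirm that the extension is really available on the connection, one option is to query DuckDB's `duckdb_extensions()` table function. The sketch below assumes a hypothetical helper, `spatial_extension_status()`, which is not part of duckspatial:

```r
# Hypothetical helper (not part of duckspatial): report whether the
# spatial extension is installed and loaded for a DBI connection.
spatial_extension_status <- function(conn) {
  DBI::dbGetQuery(
    conn,
    "SELECT extension_name, installed, loaded
     FROM duckdb_extensions()
     WHERE extension_name = 'spatial'"
  )
}
```

Calling `spatial_extension_status(conn)` after `ddbs_load(conn)` should return a single row in which both `installed` and `loaded` are `TRUE`.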
Next, we need some data to insert into the database. We create 1,000,000 random points:
```{r}
## create n points
n <- 1000000
random_points <- data.frame(
  id = 1:n,
  x = runif(n, min = -180, max = 180), # Random longitude values
  y = runif(n, min = -90, max = 90) # Random latitude values
)
## convert to sf
sf_points <- st_as_sf(random_points, coords = c("x", "y"), crs = 4326)
## view first rows
head(sf_points)
```
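Because `runif()` draws new values on every run, the points differ each time the README is knitted. If reproducible coordinates matter, a minimal sketch is to fix the seed first. The helper name `make_random_points()` and the default seed are assumptions made for illustration only:

```r
# Hypothetical helper: build the same random coordinates on every call
# by fixing the random seed before drawing.
make_random_points <- function(n, seed = 42L) {
  set.seed(seed)  # note: this resets the global RNG state
  data.frame(
    id = seq_len(n),
    x = runif(n, min = -180, max = 180),  # longitude
    y = runif(n, min = -90, max = 90)     # latitude
  )
}

# Two calls with the same seed give identical data frames
identical(make_random_points(10), make_random_points(10))
```

The resulting data frame can be passed to `st_as_sf()` exactly as above.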
Now we can insert the data into the database using the `ddbs_write_vector()` function. We use the `proc.time()` function to measure how long it takes, and we compare it with writing a shapefile with the `write_sf()` function:
```{r}
## write data monitoring processing time
start_time <- proc.time()
ddbs_write_vector(conn, sf_points, "test_points")
end_time <- proc.time()
## print elapsed time
elapsed_duckdb <- end_time["elapsed"] - start_time["elapsed"]
print(elapsed_duckdb)
```
```{r}
## write data monitoring processing time
## (the temporary file name is created before the timer starts)
shpfile <- tempfile(fileext = ".shp")
start_time <- proc.time()
write_sf(sf_points, shpfile)
end_time <- proc.time()
## print elapsed time
elapsed_shp <- end_time["elapsed"] - start_time["elapsed"]
print(elapsed_shp)
```
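The start/stop pattern used in the chunks above can also be wrapped around base R's `system.time()`. The sketch below assumes a hypothetical helper, `time_elapsed()`, that repeats an expression and returns the median elapsed time in seconds, since a single measurement can be noisy:

```r
# Hypothetical helper: evaluate an expression `times` times in the
# caller's environment and return the median elapsed time in seconds.
time_elapsed <- function(expr, times = 3L) {
  expr <- substitute(expr)  # capture the unevaluated expression
  env <- parent.frame()     # evaluate where the helper was called
  runs <- vapply(
    seq_len(times),
    function(i) system.time(eval(expr, env))[["elapsed"]],
    numeric(1)
  )
  median(runs)
}

# Example with a cheap stand-in workload
time_elapsed(sum(sqrt(seq_len(1e6))), times = 3L)
```

For a step such as `ddbs_write_vector(conn, sf_points, "test_points")`, `times = 1L` is the safer choice, because repeating the write may fail once the table already exists.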
In this run, writing to DuckDB was `r round(elapsed_shp / elapsed_duckdb, 1)` times faster than writing the shapefile. Next, we repeat the exercise, this time reading the data back into R:
```{r}
## read data monitoring processing time
start_time <- proc.time()
sf_points_ddbs <- ddbs_read_vector(conn, "test_points", crs = 4326)
end_time <- proc.time()
## print elapsed time
elapsed_duckdb <- end_time["elapsed"] - start_time["elapsed"]
print(elapsed_duckdb)
```
```{r}
## read data monitoring processing time
start_time <- proc.time()
sf_points_shp <- read_sf(shpfile)
end_time <- proc.time()
## print elapsed time
elapsed_shp <- end_time["elapsed"] - start_time["elapsed"]
print(elapsed_shp)
```
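Timing aside, it is worth checking that the data survived the round trip. A minimal sketch, assuming a hypothetical helper `check_round_trip()` and assuming the `id` column is preserved on read, compares row counts and identifiers:

```r
# Hypothetical helper: verify that a table read back from storage has
# the same number of rows and the same set of ids as the original.
check_round_trip <- function(original, restored, id_col = "id") {
  stopifnot(
    nrow(original) == nrow(restored),
    id_col %in% names(original),
    id_col %in% names(restored),
    setequal(original[[id_col]], restored[[id_col]])
  )
  invisible(TRUE)
}

# Example with plain data frames; row order does not matter
a <- data.frame(id = 1:5, value = letters[1:5])
check_round_trip(a, a[5:1, ])
```

Since `sf` objects are data frames, the same call can be applied to `sf_points` and to an object read back above.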
For reading, DuckDB was `r round(elapsed_shp / elapsed_duckdb, 1)` times faster than the shapefile. Finally, don't forget to disconnect from the database:
```{r}
dbDisconnect(conn)
```
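In scripts and functions, the disconnect step is easy to skip when an error occurs midway. One way to guard against that is `on.exit()`. The sketch below assumes a hypothetical wrapper, `with_connection()`, which is not part of duckspatial; its `connect` and `disconnect` arguments are illustrative defaults:

```r
# Hypothetical wrapper: open a connection, run `fn(conn)`, and always
# close the connection afterwards, even if `fn` throws an error.
with_connection <- function(fn,
                            connect = function() DBI::dbConnect(duckdb::duckdb()),
                            disconnect = function(conn) DBI::dbDisconnect(conn)) {
  conn <- connect()
  on.exit(disconnect(conn), add = TRUE)
  fn(conn)
}
```

With the defaults, `with_connection(function(conn) DBI::dbGetQuery(conn, "SELECT 1 AS one"))` opens an in-memory database, runs the query, and disconnects on exit.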