Title: | Wicked-Fast Streaming 'JSON' ('ndjson') Reader |
---|---|
Description: | Streaming 'JSON' ('ndjson') has one 'JSON' record per line, and many modern 'ndjson' files contain large numbers of records. These constructs may not be columnar in nature, but it is often useful to read in these files and "flatten" the structure out to enable working with the data in an R 'data.frame'-like context. Functions are provided that make it possible to read in plain 'ndjson' files or compressed ('gz') 'ndjson' files and either validate the format of the records or create "flat" 'data.table' structures from them. |
Authors: | Bob Rudis [aut, cre], Niels Lohmann [aut] (C++ json parser), Deepak Bandyopadhyay [aut] (C++ gzstream), Lutz Kettner [aut] (C++ gzstream), Neal Fultz [ctb] (Rcpp integration), Maarten Demeyer [ctb] (dtplyr cleanup) |
Maintainer: | Bob Rudis <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.9.0 |
Built: | 2024-12-06 03:09:46 UTC |
Source: | https://github.com/hrbrmstr/ndjson |
Flatten a character vector of individual JSON lines into a data.table
flatten(x, cls = c("dt", "tbl"))
x | character vector of individual JSON lines to flatten |
cls | class of the returned object: "dt" for a data.table or "tbl" for a tbl |
Returns a data.table or tbl.
flatten('{"top":{"next":{"final":1,"end":true},"another":"yes"},"more":"no"}')
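As a further illustration of flattening, here is a minimal sketch that passes several JSON lines with different shapes to `flatten()`. The records and the names `json_lines` and `flat` are made up for this example, and the block only runs if the 'ndjson' package is installed. The exact column names produced for nested keys are not stated in the text above, so the sketch prints them instead of assuming them.

```r
# Sketch only: assumes the 'ndjson' package is installed.
if (requireNamespace("ndjson", quietly = TRUE)) {
  # Two hypothetical records; the second lacks "tags" and adds "note"
  json_lines <- c(
    '{"id":1,"user":{"name":"a","tags":{"admin":true}}}',
    '{"id":2,"user":{"name":"b"},"note":"second"}'
  )

  # One output row per input line; cls = "dt" asks for a data.table
  flat <- ndjson::flatten(json_lines, cls = "dt")

  print(names(flat))  # inspect how the nested keys were flattened
  print(nrow(flat))   # number of rows in the flattened result
}
```

Printing the names first is a cheap way to see how nested fields were mapped to columns before writing code that depends on them.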
Given a file of streaming JSON (ndjson), this function reads in the records and creates a flat data.table or tbl from it.
stream_in(path, cls = c("dt", "tbl"))
path | path to file (supports plain and gz-compressed files) |
cls | class of the returned object: "dt" for a data.table or "tbl" for a tbl |
Returns a data.table or tbl.
f <- system.file("extdata", "test.json", package="ndjson")
nrow(stream_in(f))
gzf <- system.file("extdata", "testgz.json.gz", package="ndjson")
nrow(stream_in(gzf))
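The bundled example files can be swapped for your own data. The sketch below writes two made-up records to a temporary plain file and a temporary gz-compressed file with base R, then reads both back with `stream_in()`. The records, file names, and variable names are illustrative assumptions, and the block is guarded so it only runs when 'ndjson' is installed.

```r
# Sketch only: assumes the 'ndjson' package is installed.
if (requireNamespace("ndjson", quietly = TRUE)) {
  # Hypothetical records, one JSON object per line
  records <- c(
    '{"a":1,"b":{"c":"x"}}',
    '{"a":2,"b":{"c":"y"}}'
  )

  # Plain-text ndjson file
  plain_path <- tempfile(fileext = ".json")
  writeLines(records, plain_path)

  # gz-compressed copy, written through a base R gzfile() connection
  gz_path <- tempfile(fileext = ".json.gz")
  con <- gzfile(gz_path, "w")
  writeLines(records, con)
  close(con)

  plain_dt <- ndjson::stream_in(plain_path)
  gz_dt <- ndjson::stream_in(gz_path)

  # Both files hold the same records, so the row counts should agree
  print(nrow(plain_dt) == nrow(gz_dt))

  unlink(c(plain_path, gz_path))
}
```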
Given a file of streaming JSON (ndjson), this function reads in the records and validates that they are all legal JSON records. If the verbose parameter is TRUE and errors are found, the line numbers of the errant records will be displayed.
validate(path, verbose = FALSE)
path | path to file (supports plain and gz-compressed files) |
verbose | display verbose information (filename and line numbers with bad records) |
Returns a logical value.
f <- system.file("extdata", "test.json", package="ndjson")
validate(f)
gzf <- system.file("extdata", "testgz.json.gz", package="ndjson")
validate(gzf)
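One possible way to use `validate()` is as a gate in front of `stream_in()`. The sketch below is an assumption-laden illustration, not the package's prescribed workflow: it writes a temporary file whose second line is deliberately truncated, validates it with `verbose = TRUE`, and reads the file only if validation succeeds. The records and variable names are hypothetical, and treating a `TRUE` result as "all records valid" is an interpretation of the description above.

```r
# Sketch only: assumes the 'ndjson' package is installed.
if (requireNamespace("ndjson", quietly = TRUE)) {
  # Hypothetical file: line 2 is deliberately malformed (missing closing brace)
  bad_path <- tempfile(fileext = ".json")
  writeLines(c('{"a":1}', '{"a":2', '{"a":3}'), bad_path)

  # verbose = TRUE asks for the file name and bad line numbers to be shown
  is_valid <- ndjson::validate(bad_path, verbose = TRUE)

  # Only read the file when validation reports no problems
  if (isTRUE(is_valid)) {
    dat <- ndjson::stream_in(bad_path)
    print(nrow(dat))
  } else {
    message("Skipping read: file contains invalid records")
  }

  unlink(bad_path)
}
```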