Title: | Tools to Transform and Query Data with Apache Drill |
---|---|
Description: | Apache Drill is a low-latency distributed query engine designed to enable data exploration and analysis on both relational and non-relational data stores, scaling to petabytes of data. Methods are provided that enable working with Apache Drill instances via the REST API, DBI methods and using 'dplyr'/'dbplyr' idioms. Helper functions are included to facilitate using official Drill Docker images/containers. |
Authors: | Bob Rudis [aut, cre], Edward Visel [ctb], Andy Hine [ctb], Scott Came [ctb], David Severski [ctb], James Lamb [ctb] |
Maintainer: | Bob Rudis <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.9.1 |
Built: | 2024-11-10 04:30:44 UTC |
Source: | https://gitlab.com/hrbrmstr/sergeant |
When working with CSV[H] files in Drill 1.15.0+, every column comes back as VARCHAR, since that is the way it should be. The old behaviour of sergeant, which auto-converted types, was wrong. However, it is a royal pain to write CTAS queries from a giant list of VARCHAR fields by hand, so this is a helper function to do that, inspired by David Severski.
ctas_profile(x, new_table_name = "CHANGE____ME")
x | a |
new_table_name | a new Drill data source spec (e.g. |
WIP!
## Not run:
db <- src_drill("localhost")

# Test with bare data source
flt1 <- tbl(db, "dfs.d.`/flights.csvh`")
cat(ctas_profile(flt1))

# Test with SELECT
flt2 <- tbl(db, sql("SELECT `year`, tailnum, time_hour FROM dfs.d.`/flights.csvh`"))
cat(ctas_profile(flt2, "dfs.d.`flights.parquet`"))
## End(Not run)
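The general shape of a CTAS statement that casts VARCHAR columns can be sketched in base R. This is not the implementation of ctas_profile(): the helper name build_ctas() and the column types below are hypothetical and chosen only for illustration.

```r
# Hypothetical helper, not part of sergeant: build a CTAS statement that
# casts each column named in `types` to its target SQL type.
build_ctas <- function(source, new_table_name, types) {
  casts <- sprintf("  CAST(`%s` AS %s) AS `%s`", names(types), types, names(types))
  paste0(
    "CREATE TABLE ", new_table_name, " AS\n",
    "SELECT\n",
    paste(casts, collapse = ",\n"), "\n",
    "FROM ", source
  )
}

# Column names and target types are assumptions for this example.
types <- c(year = "INTEGER", tailnum = "VARCHAR", time_hour = "TIMESTAMP")
stmt <- build_ctas("dfs.d.`/flights.csvh`", "dfs.d.`flights.parquet`", types)
cat(stmt, "\n")
```

The generated text is only a starting point; the target types still have to be reviewed by hand before the statement is run.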
Drill dbDataType
## S4 method for signature 'DrillConnection'
dbDataType(dbObj, obj, ...)
dbObj | A |
obj | Any R object |
... | Extra optional parameters |
Other Drill REST DBI API: DrillConnection-class, DrillDriver-class, DrillResult-class, Drill(), dbUnloadDriver,DrillDriver-method
Metadata about database objects
## S4 method for signature 'DrillDriver'
dbGetInfo(dbObj)

## S4 method for signature 'DrillConnection'
dbGetInfo(dbObj)
dbObj | A |
Unload driver
## S4 method for signature 'DrillDriver'
dbUnloadDriver(drv, ...)
drv | driver |
... | Extra optional parameters |
Other Drill REST DBI API: DrillConnection-class, DrillDriver-class, DrillResult-class, Drill(), dbDataType,DrillConnection-method
Drill
Connect to Drill
Drill()

## S4 method for signature 'DrillDriver'
dbConnect(
  drv,
  host = "localhost",
  port = 8047L,
  ssl = FALSE,
  username = NULL,
  password = NULL,
  ...
)
drv | An object created by |
host | host |
port | port |
ssl | use ssl? |
username, password | credentials |
... | Extra optional parameters |
Other Drill REST DBI API: DrillConnection-class, DrillDriver-class, DrillResult-class, dbDataType,DrillConnection-method, dbUnloadDriver,DrillDriver-method
This is a very simple test (it performs a HEAD / request on the Drill server/cluster).
drill_active(drill_con)
drill_con | drill server connection object setup by |
Other Drill direct REST API Interface: drill_cancel(), drill_connection(), drill_functions(), drill_metrics(), drill_options(), drill_opts(), drill_profiles(), drill_profile(), drill_query(), drill_settings_reset(), drill_set(), drill_stats(), drill_status(), drill_storage(), drill_system_reset(), drill_threads(), drill_version()
## Not run:
drill_connection() %>% drill_active()
## End(Not run)
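A reachability check of this kind can be sketched with base R alone. The helper names drill_base_url() and is_reachable() are hypothetical, and sergeant's own internals may differ; the sketch only shows what a HEAD request against the server root looks like.

```r
# Hypothetical helpers for illustration; sergeant's internals may differ.
drill_base_url <- function(host = "localhost", port = 8047L, ssl = FALSE) {
  sprintf("%s://%s:%s", if (ssl) "https" else "http", host, port)
}

is_reachable <- function(url) {
  # curlGetHeaders() (base R) issues a HEAD request and raises an error if
  # the server cannot be reached, which is treated here as "not active".
  hdrs <- tryCatch(curlGetHeaders(url), error = function(e) NULL)
  !is.null(hdrs) && identical(attr(hdrs, "status"), 200L)
}

url <- drill_base_url()
print(url)
```

Calling is_reachable(url) then returns TRUE only when the server answers the HEAD request with status 200.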
Cancel the query that has the given queryid
drill_cancel(drill_con, query_id)
drill_con | drill server connection object setup by |
query_id | the UUID of the query in standard UUID format that Drill assigns to each query. |
Other Drill direct REST API Interface: drill_active(), drill_connection(), drill_functions(), drill_metrics(), drill_options(), drill_opts(), drill_profiles(), drill_profile(), drill_query(), drill_settings_reset(), drill_set(), drill_stats(), drill_status(), drill_storage(), drill_system_reset(), drill_threads(), drill_version()
Setup a Drill connection
drill_connection(
  host = Sys.getenv("DRILL_HOST", "localhost"),
  port = Sys.getenv("DRILL_PORT", 8047),
  ssl = FALSE,
  user = Sys.getenv("DRILL_USER", ""),
  password = Sys.getenv("DRILL_PASSWORD", "")
)
host | Drill host (will pick up the value from |
port | Drill port (will pick up the value from |
ssl | use ssl? |
user, password | (will pick up the values from |
If user/password are set, this function will make a POST to the REST interface immediately to prime the cookie-jar with the session id.
Other Drill direct REST API Interface: drill_active(), drill_cancel(), drill_functions(), drill_metrics(), drill_options(), drill_opts(), drill_profiles(), drill_profile(), drill_query(), drill_settings_reset(), drill_set(), drill_stats(), drill_status(), drill_storage(), drill_system_reset(), drill_threads(), drill_version()
dc <- drill_connection()
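The defaults in the usage above rely on how Sys.getenv() falls back to its second argument. A minimal base R sketch of that behaviour, using throwaway values rather than a real Drill setup:

```r
# Use throwaway values so the example does not depend on the caller's setup.
Sys.unsetenv("DRILL_HOST")
Sys.setenv(DRILL_PORT = "8048")

host <- Sys.getenv("DRILL_HOST", "localhost")  # variable unset: falls back
port <- Sys.getenv("DRILL_PORT", 8047)         # variable set: returned as a string

print(host)
print(port)

# Environment variables are character strings, so convert explicitly
# before treating the port as a number.
port_int <- as.integer(port)

# Clean up the value set above.
Sys.unsetenv("DRILL_PORT")
```

Setting DRILL_HOST and DRILL_PORT once (for example in .Renviron) therefore lets drill_connection() be called with no arguments.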
dplyr translations
One benefit of dplyr is that it provides a nice DSL over database ops, but that means there needs to be knowledge of the functions supported by the host database and then a translation layer so they can be used in R.
Similarly, there are functions like grepl() in R that don't directly exist in databases. Yet one can create a translation for grepl() that maps to a Drill custom function, so you don't have to think differently or rewrite your pipes when switching between core tidyverse ops and database ops.
Many functions translate on their own, but it's handy to provide explicit ones, especially when you want to use parameters in a different order.
If you want a particular custom function mapped, file a PR or issue request at the link found in the DESCRIPTION file.
as.character(x) : CAST( x AS CHARACTER )
as.integer64(x) : CAST( x AS BIGINT )
as.date(x) : CAST( x AS DATE )
as.logical(x) : CAST( x AS BOOLEAN )
as.numeric(x) : CAST( x AS DOUBLE )
as.posixct(x) : CAST( x AS TIMESTAMP )
binary_string(x) : BINARY_STRING( x )
cbrt(x) : CBRT( x )
char_to_timestamp(x, y) : TO_TIMESTAMP( x, y )
grepl(y, x) : CONTAINS( x, y )
contains(x, y) : CONTAINS( x, y )
convert_to(x, y) : CONVERT_TO( x, y )
convert_from(x, y) : CONVERT_FROM( x, y )
degrees(x) : DEGREES( x )
lshift(x, y) : LSHIFT( x, y )
negative(x) : NEGATIVE( x )
pow(x, y) : POW( x, y )
sql_prefix(x, y) : POW( x, y )
string_binary(x) : STRING_BINARY( x )
radians(x) : RADIANS( x )
rshift(x) : RSHIFT( x )
to_char(x, y) : TO_CHAR( x, y )
to_date(x, y) : TO_DATE( x, y )
to_number(x, y) : TO_NUMBER( x, y )
trunc(x) : TRUNC( x )
double_to_timestamp(x) : TO_TIMESTAMP( x )
char_length(x) : CHAR_LENGTH( x )
flatten(x) : FLATTEN( x )
kvgen(x) : KVGEN( x )
repeated_count(x) : REPEATED_COUNT( x )
repeated_contains(x) : REPEATED_CONTAINS( x )
ilike(x, y) : ILIKE( x, y )
init_cap(x) : INIT_CAP( x )
length(x) : LENGTH( x )
lower(x) : LOWER( x )
tolower(x) : LOWER( x )
ltrim(x, y) : LTRIM( x, y )
nullif(x, y) : NULLIF( x, y )
position(x, y) : POSITION( x IN y )
gsub(x, y, z) : REGEXP_REPLACE( z, x, y )
regexp_replace(x, y, z) : REGEXP_REPLACE( x, y, z )
rtrim(x, y) : RTRIM( x, y )
rpad(x, y) : RPAD( x, y )
rpad_with(x, y, z) : RPAD( x, y, z )
lpad(x, y) : LPAD( x, y )
lpad_with(x, y, z) : LPAD( x, y, z )
strpos(x, y) : STRPOS( x, y )
substr(x, y, z) : SUBSTR( x, y, z )
upper(x) : UPPER( x )
toupper(x) : UPPER( x )
You can also get a compact list of these with:
sql_translate_env(src_drill()$con)
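The idea of a translation layer that reorders arguments can be sketched in a few lines of base R. This is a deliberately simplified, hypothetical translator (to_sql() and translate_call() are not sergeant or dbplyr functions); it covers three of the mappings listed above and does not escape quotes inside string literals.

```r
# Hypothetical, simplified translator; the real dbplyr machinery is richer.
to_sql <- function(arg) {
  # Character constants become single-quoted SQL literals; anything else
  # (for example a bare column name) is passed through as written.
  if (is.character(arg)) paste0("'", arg, "'") else deparse(arg)
}

translate_call <- function(expr) {
  fn <- as.character(expr[[1]])
  args <- vapply(as.list(expr)[-1], to_sql, character(1), USE.NAMES = FALSE)
  switch(
    fn,
    grepl   = sprintf("CONTAINS( %s, %s )", args[2], args[1]),
    gsub    = sprintf("REGEXP_REPLACE( %s, %s, %s )", args[3], args[1], args[2]),
    toupper = sprintf("UPPER( %s )", args[1]),
    stop("no translation defined for ", fn, "()")
  )
}

print(translate_call(quote(grepl("abc", full_name))))
print(translate_call(quote(gsub("a", "b", full_name))))
```

Note how grepl() and gsub() keep their familiar R argument order while the generated SQL puts the column first.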
Other Drill REST API (dplyr): src_drill(), src_tbls.src_drill()
Show all the available Drill built-in functions & UDFs
drill_functions(drill_con, browse = FALSE)
drill_con | drill server connection object setup by |
browse | if |
data frame
You must be using Drill 1.15.0+ to use this function.
Other Drill direct REST API Interface: drill_active(), drill_cancel(), drill_connection(), drill_metrics(), drill_options(), drill_opts(), drill_profiles(), drill_profile(), drill_query(), drill_settings_reset(), drill_set(), drill_stats(), drill_status(), drill_storage(), drill_system_reset(), drill_threads(), drill_version()
## Not run:
drill_connection() %>% drill_functions()
## End(Not run)
Get the current memory metrics
drill_metrics(drill_con)
drill_con | drill server connection object setup by |
Other Drill direct REST API Interface: drill_active(), drill_cancel(), drill_connection(), drill_functions(), drill_options(), drill_opts(), drill_profiles(), drill_profile(), drill_query(), drill_settings_reset(), drill_set(), drill_stats(), drill_status(), drill_storage(), drill_system_reset(), drill_threads(), drill_version()
## Not run:
drill_connection() %>% drill_metrics()
## End(Not run)
List the name, default, and data type of the system and session options
drill_options(drill_con, pattern = NULL)
drill_con | drill server connection object setup by |
pattern | pattern to filter results by |
Other Drill direct REST API Interface: drill_active(), drill_cancel(), drill_connection(), drill_functions(), drill_metrics(), drill_opts(), drill_profiles(), drill_profile(), drill_query(), drill_settings_reset(), drill_set(), drill_stats(), drill_status(), drill_storage(), drill_system_reset(), drill_threads(), drill_version()
## Not run:
drill_connection() %>% drill_options()
## End(Not run)
Show all the available Drill options
drill_opts(drill_con, browse = FALSE)
drill_con | drill server connection object setup by |
browse | if |
data frame
You must be using Drill 1.15.0+ to use this function.
Other Drill direct REST API Interface: drill_active(), drill_cancel(), drill_connection(), drill_functions(), drill_metrics(), drill_options(), drill_profiles(), drill_profile(), drill_query(), drill_settings_reset(), drill_set(), drill_stats(), drill_status(), drill_storage(), drill_system_reset(), drill_threads(), drill_version()
## Not run:
drill_connection() %>% drill_opts()
## End(Not run)
Get the profile of the query that has the given queryid
drill_profile(drill_con, query_id)
drill_con | drill server connection object setup by |
query_id | UUID of the query in standard UUID format that Drill assigns to each query |
Other Drill direct REST API Interface: drill_active(), drill_cancel(), drill_connection(), drill_functions(), drill_metrics(), drill_options(), drill_opts(), drill_profiles(), drill_query(), drill_settings_reset(), drill_set(), drill_stats(), drill_status(), drill_storage(), drill_system_reset(), drill_threads(), drill_version()
Get the profiles of running and completed queries
drill_profiles(drill_con)
drill_con | drill server connection object setup by |
Other Drill direct REST API Interface: drill_active(), drill_cancel(), drill_connection(), drill_functions(), drill_metrics(), drill_options(), drill_opts(), drill_profile(), drill_query(), drill_settings_reset(), drill_set(), drill_stats(), drill_status(), drill_storage(), drill_system_reset(), drill_threads(), drill_version()
## Not run:
drill_connection() %>% drill_profiles()
## End(Not run)
This function can handle REST API connections or JDBC connections. The benefit of calling this function for JDBC connections, versus a straight call to dbGetQuery(), is that the result is a tbl_df rather than a plain data.frame, so you get better default printing (which can be helpful if you accidentally execute a query and the result set is huge).
drill_query(drill_con, query, uplift = TRUE, .progress = interactive())
drill_con | drill server connection object setup by |
query | query to run |
uplift | automatically run |
.progress | if |
Other Drill direct REST API Interface: drill_active(), drill_cancel(), drill_connection(), drill_functions(), drill_metrics(), drill_options(), drill_opts(), drill_profiles(), drill_profile(), drill_settings_reset(), drill_set(), drill_stats(), drill_status(), drill_storage(), drill_system_reset(), drill_threads(), drill_version()
try({
  drill_connection() %>%
    drill_query("SELECT * FROM cp.`employee.json` limit 5")
}, silent=TRUE)
Helper function to make it more R-like to set Drill SESSION or SYSTEM options. It handles the conversion of R types (like TRUE) to SQL types and automatically quotes parameter values (when necessary).
drill_set(drill_con, ..., type = c("session", "system"))
drill_con | drill server connection object setup by |
... | named parameters to be sent to |
type | set the |
If any query errors result, error messages will be presented to the console.
a tbl (invisibly) with the ALTER queries sent and results, including errors.
Other Drill direct REST API Interface: drill_active(), drill_cancel(), drill_connection(), drill_functions(), drill_metrics(), drill_options(), drill_opts(), drill_profiles(), drill_profile(), drill_query(), drill_settings_reset(), drill_stats(), drill_status(), drill_storage(), drill_system_reset(), drill_threads(), drill_version()
## Not run:
drill_connection() %>%
  drill_set(exec.errors.verbose=TRUE, store.format="parquet", web.logs.max_lines=20000)
## End(Not run)
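The type conversion and quoting described for drill_set() can be illustrated with a small base R sketch. alter_statements() is a hypothetical helper, not sergeant's implementation; it only shows how named R values might be turned into ALTER statements.

```r
# Hypothetical helper for illustration; not sergeant's implementation.
alter_statements <- function(..., type = c("session", "system")) {
  type <- toupper(match.arg(type))
  opts <- list(...)
  vals <- vapply(opts, function(v) {
    if (is.logical(v)) {
      tolower(as.character(v))                            # TRUE -> true
    } else if (is.character(v)) {
      paste0("'", gsub("'", "''", v, fixed = TRUE), "'")  # quote strings
    } else {
      format(v, scientific = FALSE)                       # numbers as-is
    }
  }, character(1))
  unname(sprintf("ALTER %s SET `%s` = %s", type, names(opts), vals))
}

stmts <- alter_statements(
  exec.errors.verbose = TRUE,
  store.format = "parquet",
  web.logs.max_lines = 20000
)
print(stmts)
```

Logical values become SQL booleans, strings are single-quoted, and numbers are passed through unquoted, which matches the three kinds of value used in the example above.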
Changes (optionally, all) session settings back to system defaults
drill_settings_reset(drill_con, ...)
drill_con | drill server connection object setup by |
... | bare name of system options to reset |
Other Drill direct REST API Interface: drill_active(), drill_cancel(), drill_connection(), drill_functions(), drill_metrics(), drill_options(), drill_opts(), drill_profiles(), drill_profile(), drill_query(), drill_set(), drill_stats(), drill_status(), drill_storage(), drill_system_reset(), drill_threads(), drill_version()
## Not run:
drill_connection() %>% drill_settings_reset(exec.errors.verbose)
## End(Not run)
Show files in a file system schema.
drill_show_files(drill_con, schema_spec, .progress = interactive())
drill_con | drill server connection object setup by |
schema_spec | properly quoted "filesystem.directory_name" reference path |
.progress | if |
Other Drill direct REST API Interface: drill_show_schemas(), drill_use()
try({
  drill_connection() %>% drill_show_files("dfs.tmp")
}, silent=TRUE)
Returns a list of available schemas.
drill_show_schemas(drill_con, .progress = interactive())
drill_con | drill server connection object setup by |
.progress | if |
Other Drill direct REST API Interface: drill_show_files(), drill_use()
Get Drillbit information, such as port numbers
drill_stats(drill_con)
drill_con | drill server connection object setup by |
Other Drill direct REST API Interface: drill_active(), drill_cancel(), drill_connection(), drill_functions(), drill_metrics(), drill_options(), drill_opts(), drill_profiles(), drill_profile(), drill_query(), drill_settings_reset(), drill_set(), drill_status(), drill_storage(), drill_system_reset(), drill_threads(), drill_version()
## Not run:
drill_connection() %>% drill_stats()
## End(Not run)
Get the status of Drill
drill_status(drill_con)
drill_con | drill server connection object setup by |
The output of this is in a "viewer" window
Other Drill direct REST API Interface: drill_active(), drill_cancel(), drill_connection(), drill_functions(), drill_metrics(), drill_options(), drill_opts(), drill_profiles(), drill_profile(), drill_query(), drill_settings_reset(), drill_set(), drill_stats(), drill_storage(), drill_system_reset(), drill_threads(), drill_version()
## Not run:
drill_connection() %>% drill_status()
## End(Not run)
Retrieve, modify or remove storage plugins from a Drill instance. If you intend to modify an existing configuration, it is suggested that you use the "list" or "raw" values of the as parameter to make it easier to modify them.
drill_storage(drill_con, plugin = NULL, as = c("tbl", "list", "raw"))

drill_mod_storage(drill_con, name, config)

drill_rm_storage(drill_con, name)
drill_con | drill server connection object setup by |
plugin | the assigned name in the storage plugin definition. |
as | one of " |
name | name of the storage plugin configuration to create/update/remove |
config | a raw 1-element character vector containing valid JSON of a complete storage spec |
Other Drill direct REST API Interface: drill_active(), drill_cancel(), drill_connection(), drill_functions(), drill_metrics(), drill_options(), drill_opts(), drill_profiles(), drill_profile(), drill_query(), drill_settings_reset(), drill_set(), drill_stats(), drill_status(), drill_system_reset(), drill_threads(), drill_version()
## Not run:
drill_connection() %>% drill_storage()

drill_connection() %>%
  drill_mod_storage(
    name = "drilldat",
    config = '
{
  "config" : {
    "connection" : "file:///",
    "enabled" : true,
    "formats" : null,
    "type" : "file",
    "workspaces" : {
      "root" : {
        "location" : "/Users/hrbrmstr/drilldat",
        "writable" : true,
        "defaultInputFormat": null
      }
    }
  },
  "name" : "drilldat"
}
')
## End(Not run)
Changes (optionally, all) system settings back to system defaults
drill_system_reset(drill_con, ..., all = FALSE)
drill_con | drill server connection object setup by |
... | bare name of system options to reset |
all | if |
Other Drill direct REST API Interface: drill_active(), drill_cancel(), drill_connection(), drill_functions(), drill_metrics(), drill_options(), drill_opts(), drill_profiles(), drill_profile(), drill_query(), drill_settings_reset(), drill_set(), drill_stats(), drill_status(), drill_storage(), drill_threads(), drill_version()
## Not run:
drill_connection() %>% drill_system_reset(all=TRUE)
## End(Not run)
Get information about threads
drill_threads(drill_con)
drill_con | drill server connection object setup by |
The output of this is in a "viewer" window
Other Drill direct REST API Interface: drill_active(), drill_cancel(), drill_connection(), drill_functions(), drill_metrics(), drill_options(), drill_opts(), drill_profiles(), drill_profile(), drill_query(), drill_settings_reset(), drill_set(), drill_stats(), drill_status(), drill_storage(), drill_system_reset(), drill_version()
## Not run:
drill_connection() %>% drill_threads()
## End(Not run)
This is a "get you up and running quickly" helper function as it only runs a standalone mode Drill instance and is optionally removed after the container is stopped. You should customize your own Drill containers based on the one at Drill's Docker Hub.
drill_up(
  image = "drill/apache-drill:1.16.0",
  container_name = "drill",
  data_dir = getwd(),
  remove = TRUE
)

drill_down(id)
image | Drill image to use. Must be a valid image from Drill's Docker Hub. Defaults to most recent Drill docker image. |
container_name | name for the container. Defaults to " |
data_dir | valid path to a place where your data is stored; defaults to the value of |
remove | remove the Drill container instance after it's stopped? Defaults to |
id | the id of the Drill container |
The path specified in data_dir will be mapped inside the container as /data and a new dfs storage workspace will be created (dfs.d) that maps to /data and is writable.
Use drill_down() to stop a running Drill container by container id (full or partial).
a stevedore docker object (invisibly) which you are responsible for killing with the $stop() function or from the Docker command line (in interactive mode the docker container ID is printed as well).
This requires a working Docker setup on your system, and it is highly suggested that you docker pull the image yourself before running this function.
Other Drill Docker functions: killall_drill(), showall_drill()
## Not run:
drill_up(data_dir = "~/Data")
## End(Not run)
If you know the result of drill_query() will be a data frame, then you can pipe it to this function to pull out rows and automatically type-convert it.
drill_uplift(query_result)
query_result | the result of a call to |
Not really intended to be called directly, but useful if you accidentally ran drill_query() without uplift=TRUE but want to then convert the structure.
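What "automatically type-convert" means for an all-character result can be shown with base R. This sketch uses type.convert() on a made-up data frame; it is an assumption about the general technique, not a description of how drill_uplift() is implemented.

```r
# A stand-in for an untyped result set: every column is character.
raw <- data.frame(
  year = c("2013", "2014"),
  dep_delay = c("2.5", "-1"),
  tailnum = c("N14228", "N24211"),
  stringsAsFactors = FALSE
)

# type.convert() guesses a type per column; as.is = TRUE keeps strings
# as character instead of turning them into factors.
typed <- raw
typed[] <- lapply(raw, type.convert, as.is = TRUE)

str(typed)
```

After conversion the year column is integer, dep_delay is double, and tailnum stays character.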
Change to a particular schema.
drill_use(drill_con, schema_name, .progress = interactive())
drill_con | drill server connection object setup by |
schema_name | A unique name for a Drill schema. A schema in Drill is a configured storage plugin, such as hive, or a storage plugin and workspace. |
.progress | if |
Other Drill direct REST API Interface: drill_show_files(), drill_show_schemas()
Identify the version of Drill running
drill_version(drill_con)
drill_con | drill server connection object setup by |
Other Drill direct REST API Interface: drill_active(), drill_cancel(), drill_connection(), drill_functions(), drill_metrics(), drill_options(), drill_opts(), drill_profiles(), drill_profile(), drill_query(), drill_settings_reset(), drill_set(), drill_stats(), drill_status(), drill_storage(), drill_system_reset(), drill_threads()
## Not run:
drill_connection() %>% drill_version()
## End(Not run)
DrillConnection
A concise character representation (label) for a DrillConnection
## S3 method for class 'DrillConnection'
format(x, ...)
x | a |
... | ignored |
This is a destructive function. It will stop any Docker container that is based on an image matching a runtime command of "bin/drill-embedded". It's best used when you had a session forcefully interrupted and had been using the R helper functions to start/stop the Drill Docker container. You may want to consider using the Docker command-line interface to perform this work manually.
killall_drill()
Other Drill Docker functions: drill_up(), showall_drill()
Print function for drill_conn objects
## S3 method for class 'drill_conn'
print(x, ...)
x | a |
... | unused |
The following functions are imported and then re-exported from the sergeant package to enable use of the magrittr pipe operator with no additional library calls
This function will show all Docker containers that are based on an image matching a runtime command of "bin/drill-embedded".
showall_drill()
Other Drill Docker functions: drill_up(), killall_drill()
Use src_drill() to connect to a Drill cluster and tbl() to connect to a fully-qualified "table reference". The vast majority of Drill SQL functions have also been made available to the dplyr interface. If you have custom Drill SQL functions that need to be implemented, please file an issue on GitHub.
src_drill(
  host = Sys.getenv("DRILL_HOST", "localhost"),
  port = as.integer(Sys.getenv("DRILL_PORT", 8047L)),
  ssl = FALSE,
  username = NULL,
  password = NULL
)

## S3 method for class 'src_drill'
tbl(src, from, ...)
host | Drill host (will pick up the value from |
port | Drill port (will pick up the value from |
ssl | use ssl? |
username, password | if not |
src | A Drill "src" created with |
from | A Drill view or table specification |
... | Extra parameters |
This is a DBI wrapper around the Drill REST API.
Other Drill REST API (dplyr): drill_custom_functions, src_tbls.src_drill()
try({
  db <- src_drill("localhost", 8047L)

  print(db)
  ## src:  DrillConnection
  ## tbls: INFORMATION_SCHEMA, cp.default, dfs.default, dfs.root, dfs.tmp, sys

  emp <- tbl(db, "cp.`employee.json`")

  count(emp, gender, marital_status)
  ## # Source:   lazy query [?? x 3]
  ## # Database: DrillConnection
  ## # Groups:   gender
  ##   marital_status gender     n
  ##   <chr>          <chr>  <int>
  ## 1 S              F        297
  ## 2 M              M        278
  ## 3 S              M        276

  # Drill-specific SQL functions are also available
  select(emp, full_name) %>%
    mutate(
      loc = strpos(full_name, "a"),
      first_three = substr(full_name, 1L, 3L),
      len = length(full_name),
      rx = regexp_replace(full_name, "[aeiouAEIOU]", "*"),
      rnd = rand(),
      pos = position("en", full_name),
      rpd = rpad(full_name, 20L),
      rpdw = rpad_with(full_name, 20L, "*")
    )
  ## # Source:   lazy query [?? x 9]
  ## # Database: DrillConnection
  ##      loc full_name           len rpdw                   pos rx
  ##    <int> <chr>             <int> <chr>                <int> <chr>
  ##  1     0 Sheri Nowmer         12 Sheri Nowmer********     0 Sh*r* N*wm*r
  ##  2     0 Derrick Whelply      15 Derrick Whelply*****     0 D*rr*ck Wh*lply
  ##  3     5 Michael Spence       14 Michael Spence******    11 M*ch**l Sp*nc*
  ##  4     2 Maya Gutierrez       14 Maya Gutierrez******     0 M*y* G*t**rr*z
  ##  5     7 Roberta Damstra      15 Roberta Damstra*****     0 R*b*rt* D*mstr*
  ##  6     7 Rebecca Kanagaki     16 Rebecca Kanagaki****     0 R*b*cc* K*n*g*k*
  ##  7     0 Kim Brunner          11 Kim Brunner*********     0 K*m Br*nn*r
  ##  8     6 Brenda Blumberg      15 Brenda Blumberg*****     3 Br*nd* Bl*mb*rg
  ##  9     2 Darren Stanz         12 Darren Stanz********     5 D*rr*n St*nz
  ## 10     4 Jonathan Murraiin    17 Jonathan Murraiin***     0 J*n*th*n M*rr***n
  ## # ... with more rows, and 3 more variables: rpd <chr>, rnd <dbl>, first_three <chr>
}, silent=TRUE)