Commands are non-SQL statements such as setting a property or adding a resource. They can be used in HiveQL scripts or directly in the CLI or Beeline.
Command
|
Description
|
---|---|
Command
|
Description
|
quit
exit |
Use quit or exit to leave the interactive shell.
|
reset
|
Resets the configuration to the default values (as of Hive 0.10: see HIVE-3202). Any configuration parameters that were set using the set command or -hiveconf parameter in hive commandline will get reset to default value.
Note that this does not apply to configuration parameters that were set in set command using the "hiveconf:" prefix for the key name (for historic reasons).
|
set
|
Sets the value of a particular configuration variable (key).
Note: If you misspell the variable name, the CLI will not show an error. |
set
|
Prints a list of configuration variables that are overridden by the user or Hive.
|
set -v
|
Prints all Hadoop and Hive configuration variables.
|
add FILE[S]
add JAR[S] add ARCHIVE[S] |
Adds one or more files, jars, or archives to the list of resources in the distributed cache. See Hive Resources for more information.
|
add FILE[S] add JAR[S] add ARCHIVE[S] | As of Hive 1.2.0, adds one or more files, jars or archives to the list of resources in the distributed cache using an Ivy URL of the form ivy://group:module:version?query_string. See Hive Resources for more information. |
list FILE[S]
list JAR[S] list ARCHIVE[S] |
Lists the resources already added to the distributed cache. See Hive Resources for more information.
|
list FILE[S]
list JAR[S] list ARCHIVE[S] |
Checks whether the given resources are already added to the distributed cache or not. See Hive Resources for more information.
|
delete FILE[S]
delete JAR[S] delete ARCHIVE[S] |
Removes the resource(s) from the distributed cache.
|
delete FILE[S] delete JAR[S] delete ARCHIVE[S] | As of Hive 1.2.0, removes the resource(s) which were added using the |
!
|
Executes a shell command from the Hive shell.
|
dfs
|
Executes a dfs command from the Hive shell.
|
Executes a Hive query and prints results to standard output.
| |
source FILE
|
Executes a script file inside the CLI.
|
compile `
|
This allows inline Groovy code to be compiled and be used as a UDF (as of Hive 0.13.0). For a usage example, see Nov. 2013 Hive Contributors Meetup Presentations – Using Dynamic Compilation with Hive.
|
Sample Usage:
hive> set mapred.reduce.tasks=32; hive> set; hive> select a.* from tab1; hive> !ls; hive> dfs -ls;
Hive Command Line Options
To get help, run "hive -H
" or "hive --help
". Usage (as it is in Hive 0.9.0):In remote mode HiveServer2 only accepts valid Thrift calls – even in HTTP mode, the message body contains Thrift payloads.usage: hive -d,--defineVariable substitution to apply to Hive commands. e.g. -d A=B or --define A=B -e SQL from command line -f SQL from files -H,--help Print help information -h Connecting to Hive Server on remote host --hiveconf Use value for given property --hivevar Variable substitution to apply to hive commands. e.g. --hivevar A=B -i Initialization SQL file -p Connecting to Hive Server on port number -S,--silent Silent mode in interactive shell -v,--verbose Verbose mode (echo executed SQL to the console) Examples
See Variable Substitution for examples of using thehiveconf
option.
Example of running a query from the command line$HIVE_HOME/bin/hive -e 'select a.col from tab1 a' Example of setting Hive configuration variables$HIVE_HOME/bin/hive -e 'select a.col from tab1 a' --hiveconf hive.exec.scratchdir=/home/my/hive_scratch --hiveconf mapred.reduce.tasks=32 Example of dumping data out from a query into a file using silent mode$HIVE_HOME/bin/hive -S -e 'select a.col from tab1 a' > a.txt Example of running a script non-interactively from local disk$HIVE_HOME/bin/hive -f /home/my/hive-script.sql Example of running a script non-interactively from a Hadoop supported filesystem (starting in Hive 0.14)$HIVE_HOME/bin/hive -f hdfs://: /hive-script.sql $HIVE_HOME/bin/hive -f s3://mys3bucket/s3-script.sql Example of running an initialization script before entering interactive mode$HIVE_HOME/bin/hive -i /home/my/hive-init.sql/tmp/
/hive.log $HIVE_HOME/bin/hive --hiveconf hive.root.logger=INFO,consolehive.root.logger
specifies the logging level as well as the log destination. Specifyingconsole
as the target sends the logs to the standard error (instead of the log file).Hive Batch Mode CommandsWhen$HIVE_HOME/bin/hive
is run with the-e
or-f
option, it executes SQL commands in batch mode.
hive -e '
executes the query string.' hive -f
executes one or more SQL queries from a file.Version 0.14
As of Hive 0.14,can be from one of the Hadoop supported filesystems (HDFS, S3, etc.) as well. $HIVE_HOME/bin/hive -f hdfs://
: /hive-script.sql $HIVE_HOME/bin/hive -f s3://mys3bucket/s3-script.sql
hive> add FILE /tmp/tt.py; hive> list FILES; /tmp/tt.py hive> select from networks a MAP a.networkid USING 'python tt.py' as nn where a.ds = '2009-01-04' limit 10;
% beeline -u jdbc:hive2://localhost:10000/default -n scott -w password_file Hive version 0.11.0-SNAPSHOT by Apache Connecting to jdbc:hive2://localhost:10000/defaultBeeline Command Options
The Beeline CLI supports these command line options:
Option Description Option Description -u The JDBC URL to connect to. Special characters in parameter values should be encoded with URL encoding if needed.Usage:beeline -u
db_URL -r Reconnect to last used URL (if a user has previously used!connect
to a URL and used!save
to a beeline.properties file).Usage:beeline -r
Version: 2.1.0 (HIVE-13670) -n The username to connect as.Usage:beeline -n
valid_user -p The password to connect as.Usage:beeline -p
valid_passwordOptional password mode:Starting Hive 2.2.0 (HIVE-13589) the argument for -p option is optional.Usage : beeline -p [valid_password]If the password is not provided after -p Beeline will prompt for the password while initiating the connection. When password is provided Beeline uses it initiate the connection without prompting. -d The driver class to use.Usage:beeline -d
driver_class -e Query that should be executed. Double or single quotes enclose the query string. This option can be specified multiple times.Usage:beeline -e "
query_string"Support to run multiple SQL statements separated by semicolons in a single query_string: 1.2.0 (HIVE-9877)Bug fix (null pointer exception): 0.13.0 (HIVE-5765)Bug fix (--headerInterval not honored): 0.14.0 (HIVE-7647) Bug fix (running-e
in background): 1.3.0 and 2.0.0 (HIVE-6758); workaround available for earlier versions-f Script file that should be executed.Usage:beeline -f
filepathVersion: 0.12.0 (HIVE-4268) Note: If the script contains tabs, query compilation fails in version 0.12.0. This bug is fixed in version 0.13.0 (HIVE-6359). Bug fix (running-f
in background): 1.3.0 and 2.0.0 (HIVE-6758); workaround available for earlier versions-i (or) --init The init files for initializationUsage:beeline -i /tmp/initfile
Single file:Version: 0.14.0 (HIVE-6561)Multiple files:Version: 2.1.0 (HIVE-11336)-w (or) --password-file The password file to read password from.Version: 1.2.0 (HIVE-7175)-a (or) --authType The authentication type passed to the jdbc as an auth propertyVersion: 0.13.0 (HIVE-5155)--property-file File to read configuration properties fromUsage:beeline --property-file /tmp/a
Version: 2.2.0 (HIVE-13964)--hiveconf property=value Use value for the given configuration property. Properties that are listed in hive.conf.restricted.list cannot be reset with hiveconf (see Restricted List and Whitelist).Usage:beeline --hiveconf
prop1=
value1Version: 0.13.0 (HIVE-6173)--hivevar name=value Hive variable name and value. This is a Hive-specific setting in which variables can be set at the session level and referenced in Hive commands or queries.Usage:beeline --hivevar
var1=
value1--color=[true/false] Control whether color is used for display. Default is false.Usage:beeline --color=true
(Not supported for Separated-Value Output formats. See HIVE-9770)--showHeader=[true/false] Show column names in query results (true) or not (false). Default is true.Usage:beeline --showHeader=false
--headerInterval=ROWS The interval for redisplaying column headers, in number of rows, when outputformat is table. Default is 100.Usage:beeline --headerInterval=50
(Not supported for Separated-Value Output formats. See HIVE-9770)--fastConnect=[true/false] When connecting, skip building a list of all tables and columns for tab-completion of HiveQL statements (true) or build the list (false). Default is true.Usage:beeline --fastConnect=false
--autoCommit=[true/false] Enable/disable automatic transaction commit. Default is false.Usage:beeline --autoCommit=true
--verbose=[true/false] Show verbose error messages and debug information (true) or do not show (false). Default is false.Usage:beeline --verbose=true
--showWarnings=[true/false] Display warnings that are reported on the connection after issuing any HiveQL commands. Default is false.Usage:beeline --showWarnings=true
--showDbInPrompt=[true/false] Display the current database name in prompt. Default is false.Usage:beeline --showDbInPrompt=true
Version: 2.2.0 (HIVE-14123)--showNestedErrs=[true/false] Display nested errors. Default is false.Usage:beeline --showNestedErrs=true
--numberFormat=[pattern] Format numbers using a DecimalFormat pattern.Usage:beeline --numberFormat="#,###,##0.00"
--force=[true/false] Continue running script even after errors (true) or do not continue (false). Default is false.Usage:beeline--force=true
--maxWidth=MAXWIDTH The maximum width to display before truncating data, in characters, when outputformat is table. Default is to query the terminal for current width, then fall back to 80.Usage:beeline --maxWidth=150
--maxColumnWidth=MAXCOLWIDTH The maximum column width, in characters, when outputformat is table. Default is 50 in Hive version 2.2.0+ (see HIVE-14135) or 15 in earlier versions.Usage:beeline --maxColumnWidth=25
--silent=[true/false] Reduce the amount of informational messages displayed (true) or not (false). It also stops displaying the log messages for the query from HiveServer2 (Hive 0.14 and later) and the HiveQL commands (Hive 1.2.0 and later). Default is false.Usage:beeline --silent=true
--autosave=[true/false] Automatically save preferences (true) or do not autosave (false). Default is false.Usage:beeline --autosave=true
--outputformat=[table/vertical/csv/tsv/dsv/csv2/tsv2] Format mode for result display. Default is table. See Separated-Value Output Formats below for description of recommended sv options.Usage:beeline --outputformat=tsv
Version: dsv/csv2/tsv2 added in 0.14.0 (HIVE-8615)--truncateTable=[true/false] If true, truncates table column in the console when it exceeds console length.Version: 0.14.0 (HIVE-6928)--delimiterForDSV= DELIMITER The delimiter for delimiter-separated values output format. Default is '|' character.Version: 0.14.0 (HIVE-7390)--isolation=LEVEL Set the transaction isolation level to TRANSACTION_READ_COMMITTED or TRANSACTION_SERIALIZABLE. See the "Field Detail" section in the Java Connection documentation.Usage:beeline --isolation=TRANSACTION_SERIALIZABLE
--nullemptystring=[true/false] Use historic behavior of printing null as empty string (true) or use current behavior of printing null as NULL (false). Default is false.Usage:beeline --nullemptystring=false
Version: 0.13.0 (HIVE-4485) --incremental=[true/false] Defaults totrue
from Hive 2.3 onwards, before it defaulted tofalse
.
When set tofalse
, the entire result set is fetched and buffered before being displayed, yielding optimal display column sizing. When set totrue
, result rows are displayed immediately as they are fetched, yielding lower latency and memory usage at the price of extra display column padding. Setting--incremental=true
is recommended if you encounter an OutOfMemory on the client side (due to the fetched result set size being large).--incrementalBufferRows=NUMROWS The number of rows to buffer when printing rows on stdout, defaults to 1000; only applicable if--incremental=true
and--outputformat=table
Usage:beeline --incrementalBufferRows=1000
Version: 2.3.0 (HIVE-14170)--maxHistoryRows=NUMROWS The maximum number of rows to store Beeline history.Version: 2.3.0 (HIVE-15166)--delimiter=; Set the delimiter for queries written in Beeline. Multi-char delimiters are allowed, but quotation marks, slashes, and -- are not allowed. Defaults to ;Usage:beeline --delimiter=$$
Version: 3.0.0 (HIVE-10865)--convertBinaryArrayToString=[true/false] Display binary column data as string or as byte array.Usage:beeline --convertBinaryArrayToString=true
Version: 3.0.0 (HIVE-14786) --help Display a usage message.Usage:beeline --help
"1.0"
?>
"text/xsl"
href=
"configuration.xsl"
?>
beeline.hs2.connection.user
hive
beeline.hs2.connection.password
hive
"1.0"
?>
"text/xsl"
href=
"configuration.xsl"
?>
beeline.hs2.connection.hosts
localhost:
10000
beeline.hs2.connection.principal
hive/dummy-hostname
@domain
.com
"1.0"
?>
"text/xsl"
href=
"configuration.xsl"
?>
beeline.hs2.connection.user
hive
beeline.hs2.connection.hiveconf
hive.cli.print.current.db=
true
, hive.cli.print.header=
true
beeline.hs2.connection.hivevar
testVarName1=value1, testVarName2=value2
//localhost:10000/default;user=hive;password=hive |
beeline.hs2.jdbc.url.httpUrl
jdbc:hive2:
//localhost:10000/default;user=hive;password=hive;transportMode=http;httpPath=cliservice
beeline.hs2.jdbc.url.
default
tcpUrl
beeline.hs2.connection.password
hive
Consider the following beeline-site.xml:
beeline.hs2.jdbc.url.httpUrl
jdbc:hive2:
//localhost:10000/default;transportMode=http;httpPath=cliservice
beeline.hs2.jdbc.url.
default
tcpUrl
Using JDBC
You can use JDBC to access data stored in a relational database or other tabular format.
- Load the HiveServer2 JDBC driver. As of 1.2.0 applications no longer need to explicitly load JDBC drivers using Class.forName().
For example:Class.forName("org.apache.hive.jdbc.HiveDriver");
- Connect to the database by creating a
Connection
object with the JDBC driver.
For example:Connection cnct = DriverManager.getConnection("jdbc:hive2://
: ", " ", " "); The default
is 10000. In non-secure configurations, specify a
for the query to run as. The
field value is ignored in non-secure mode.Connection cnct = DriverManager.getConnection("jdbc:hive2://
: ", " ", ""); In Kerberos secure mode, the user information is based on the Kerberos credentials. - Submit SQL to the database by creating a
Statement
object and using itsexecuteQuery()
method.
For example:Statement stmt = cnct.createStatement(); ResultSet rset = stmt.executeQuery("SELECT foo FROM bar");
- Process the result set, if necessary.
JDBC Client Sample Code
Running the JDBC Sample Code
Alternatively, you can run the following bash script, which will seed the data file and build your classpath before invoking the client. The script adds all the additional jars needed for using HiveServer2 in embedded mode as well.
$ a=b
$ hive -e
" describe $a "
$ bin/hive --hiveconf a=b -e 'set a; set hiveconf:a; \
create table
if
not exists b (col
int
); describe ${hiveconf:a}'
$ a=b
$ hive -e
" describe $a "
$ bin/hive --hiveconf a=b -e 'set a; set hiveconf:a; \
create table
if
not exists b (col
int
); describe ${hiveconf:a}'
Hive history file=/tmp/edward/hive_job_log_edward_201011240906_1463048967.txt
a=b
hiveconf:a=b
OK
Time taken:
5.913
seconds
OK
col
int
Time taken:
0.754
seconds
There are three namespaces for variables – hiveconf, system, and env
set x=myvalue
${hiveconf:x}
set zzz=
5
;
-- sets zzz=
5
set zzz;
set system:xxx=
5
;
set system:xxx;
-- sets a system property xxx to
5
set system:yyy=${system:xxx};
set system:yyy;
-- sets yyy with value of xxx
set go=${hiveconf:zzz};
set go;
-- sets go base on value on zzz
set hive.variable.substitute=
false
;
set raw=${hiveconf:zzz};
set raw;
-- disable substitution set a value to the literal
set hive.variable.substitute=
true
;
EXPLAIN SELECT * FROM src where key=${hiveconf:zzz};
SELECT * FROM src where key=${hiveconf:zzz};
--use a variable in a query
set a=
1
;
set b=a;
set c=${hiveconf:${hiveconf:b}};
set c;
--uses nested variables.
set jar=../lib/derby.jar;
add file ${hiveconf:jar};
list file;
delete file ${hiveconf:jar};
list file;
The HCatalog command line interface (CLI) can be invoked as HIVE_HOME=
hive_home hcat_home/bin/hcat
where hive_home is the directory where Hive has been installed and hcat_home is the directory where HCatalog has been installed.
HCatalog CLI
The HCatalog CLI supports these command line options:
Option
|
Usage
|
Description
|
---|---|---|
hcat |
Prints a usage message.
| |
-D
| hcat -D key= value ... |
Passes the key-value pair to HCatalog as a Java System Property.
|
-e
| hcat -e 'create table mytable(a int);' ... |
Tells HCatalog to treat the following string as a DDL command and execute it.
|
-f
| hcat -f myscript.hcatalog ... |
Tells HCatalog that myscript.hcatalog is a file containing DDL commands to execute.
|
-g
| hcat -g mygroup ... |
Tells HCatalog that the table which needs to be created must have group "mygroup".
|
-p
| hcat -p rwxr-xr-x ... |
Tells HCatalog that the table which needs to be created must have permissions "rwxr-xr-x".
|
Note the following:
- The -g and -p options are not mandatory.
- Only one -e or -f option can be provided, not both.
- The order of options is immaterial; you can specify the options in any order.
If no option is provided, then a usage message is printed:
Usage: hcat { -e "" | -f } [-g ] [-p ] [-D=]
- ALTER INDEX ... REBUILD
- CREATE TABLE ... AS SELECT
- ALTER TABLE ... CONCATENATE
- ALTER TABLE ARCHIVE/UNARCHIVE PARTITION
- ANALYZE TABLE ... COMPUTE STATISTICS
- IMPORT FROM ...
- EXPORT TABLE
CREATE TABLE kst
PARTITIONED BY (ds string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
'avro.schema.url'='http://schema_provider/kst.avsc');
CREATE TABLE kst (
string1 string,
string2 string,
int1 int,
boolean1 boolean,
long1 bigint,
float1 float,
double1 double,
inner_record1 struct,
enum1 string,
array1 array,
map1 map,
union1 uniontype,
fixed1 binary,
null1 void,
unionnullint int,
bytes1 binary)
PARTITIONED BY (ds string)
STORED AS AVRO;
CREATE TABLE test_serializer(string1 STRING,
int1 INT,
tinyint1 TINYINT,
smallint1 SMALLINT,
bigint1 BIGINT,
boolean1 BOOLEAN,
float1 FLOAT,
double1 DOUBLE,
list1 ARRAY,
map1 MAP,
struct1 STRUCT,
union1 uniontype,
enum1 STRING,
nullableint INT,
bytes1 BINARY,
fixed1 BINARY)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY ':' MAP KEYS TERMINATED BY '#' LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
{
"namespace": "com.linkedin.haivvreo",
"name": "test_serializer",
"type": "record",
"fields": [
{ "name":"string1", "type":"string" },
{ "name":"int1", "type":"int" },
{ "name":"tinyint1", "type":"int" },
{ "name":"smallint1", "type":"int" },
{ "name":"bigint1", "type":"long" },
{ "name":"boolean1", "type":"boolean" },
{ "name":"float1", "type":"float" },
{ "name":"double1", "type":"double" },
{ "name":"list1", "type":{"type":"array", "items":"string"} },
{ "name":"map1", "type":{"type":"map", "values":"int"} },
{ "name":"struct1", "type":{"type":"record", "name":"struct1_name", "fields": [
{ "name":"sInt", "type":"int" }, { "name":"sBoolean", "type":"boolean" }, { "name":"sString", "type":"string" } ] } },
{ "name":"union1", "type":["float", "boolean", "string"] },
{ "name":"enum1", "type":{"type":"enum", "name":"enum1_values", "symbols":["BLUE","RED", "GREEN"]} },
{ "name":"nullableint", "type":["int", "null"] },
{ "name":"bytes1", "type":"bytes" },
{ "name":"fixed1", "type":{"type":"fixed", "name":"threebytes", "size":3} }
] }
CREATE TABLE as_avro
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED as INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
'avro.schema.url'='file:///path/to/the/schema/test_serializer.avsc');
INSERT OVERWRITE TABLE as_avro SELECT * FROM test_serializer;
CREATE TABLE as_avro(string1 STRING,
int1 INT,
tinyint1 TINYINT,
smallint1 SMALLINT,
bigint1 BIGINT,
boolean1 BOOLEAN,
float1 FLOAT,
double1 DOUBLE,
list1 ARRAY,
map1 MAP,
struct1 STRUCT,
union1 uniontype,
enum1 STRING,
nullableint INT,
bytes1 BINARY,
fixed1 BINARY)
STORED AS AVRO;
INSERT OVERWRITE TABLE as_avro SELECT * FROM test_serializer;
CREATE TABLE embedded
COMMENT "just drop the schema right into the HQL"
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
'avro.schema.literal'='{
"namespace": "com.howdy",
"name": "some_schema",
"type": "record",
"fields": [ { "name":"string1","type":"string"}]
}');
hive --hiveconf schema="${SCHEMA}" -f your_script_file.sql
ORC Files
- a single file as the output of each task, which reduces the NameNode's load
- Hive type support including datetime, decimal, and the complex types (struct, list, map, and union)
- light-weight indexes stored within the file
- skip row groups that don't pass predicate filtering
- seek to a given row
- block-mode compression based on data type
- run-length encoding for integer columns
- dictionary encoding for string columns
- concurrent reads of the same file using separate RecordReaders
- ability to split files without scanning for markers
- bound the amount of memory needed for reading or writing
- metadata stored using Protocol Buffers, which allows addition and removal of fields
An ORC file contains groups of row data called stripes, along with auxiliary information in a file footer. At the end of the file a postscript holds compression parameters and the size of the compressed footer.
The default stripe size is 250 MB. Large stripe sizes enable large, efficient reads from HDFS.
The file footer contains a list of stripes in the file, the number of rows per stripe, and each column's data type. It also contains column-level aggregates count, min, max, and sum.
This diagram illustrates the ORC file structure:
CREATE TABLE ... STORED AS ORC
ALTER TABLE ... [PARTITION partition_spec] SET FILEFORMAT ORC
SET hive.default.fileformat=Orc
/ Hive version 1.1.0 and later:
hive --orcfiledump [-d] [--rowindex ]
// Hive version 1.2.0 and later:
hive --orcfiledump [-d] [-t] [--rowindex ]
// Hive version 1.3.0 and later:
hive --orcfiledump [-j] [-p] [-d] [-t] [--rowindex ] [--recover] [--skip-dump]
[--backup-path ]
Specifying -d in the command will cause it to dump the ORC file data rather than the metadata (Hive 1.1.0 and later).
Specifying --rowindex with a comma separated list of column ids will cause it to print row indexes for the specified columns, where 0 is the top level struct containing all of the columns and 1 is the first column id (Hive 1.1.0 and later).
Specifying -t in the command will print the timezone id of the writer.
Specifying -j in the command will print the ORC file metadata in JSON format. To pretty print the JSON metadata, add -p to the command.
Specifying --recover in the command will recover a corrupted ORC file generated by Hive streaming.
Specifying --skip-dump along with --recover will perform recovery without dumping metadata.
Specifying --backup-path with a new-path will let the recovery tool move corrupted files to the specified backup path (default: /tmp).
is the URI of the ORC file.
is the URI of the ORC file or directory. From Hive 1.3.0 onward, this URI can be a directory containing ORC files.
PARQUET:
CREATE TABLE parquet_test (
id int,
str string,
mp MAP,
lst ARRAY,
strct STRUCT)
PARTITIONED BY (part string)
STORED AS PARQUET;
Compressed Data Storage
CREATE TABLE raw (line STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE raw;
CREATE TABLE raw (line STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';
CREATE TABLE raw_sequence (line STRING)
STORED AS SEQUENCEFILE;
LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE raw;
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK; -- NONE/RECORD/BLOCK (see below)
INSERT OVERWRITE TABLE raw_sequence SELECT * FROM raw;
core-site.xml
core-site.xml
Add the following to your core-site.xml
:
com.hadoop.compression.lzo.LzoCodec
com.hadoop.compression.lzo.LzopCodec
For example:
io.compression.codecs
org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,
com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec
io.compression.codec.lzo.class
com.hadoop.compression.lzo.LzoCodec
Next we run the command to create an LZO index file:
SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec
SET hive.exec.compress.output=true
SET mapreduce.output.fileoutputformat.compress=true
SET hive.exec.compress.output=true
SET mapreduce.output.fileoutputformat.compress=true
Overview
HiveQL DDL statements are documented here, including:
- CREATE DATABASE/SCHEMA, TABLE, VIEW, FUNCTION, INDEX
- DROP DATABASE/SCHEMA, TABLE, VIEW, INDEX
- TRUNCATE TABLE
- ALTER DATABASE/SCHEMA, TABLE, VIEW
- MSCK REPAIR TABLE (or ALTER TABLE RECOVER PARTITIONS)
- SHOW DATABASES/SCHEMAS, TABLES, TBLPROPERTIES, VIEWS, PARTITIONS, FUNCTIONS, INDEX[ES], COLUMNS, CREATE TABLE
- DESCRIBE DATABASE/SCHEMA, table_name, view_name, materialized_view_name
PARTITION statements are usually options of TABLE statements, except for SHOW PARTITIONS.
Keywords, Non-reserved Keywords and Reserved Keywords
All Keywords | ||
---|---|---|
Version | Non-reserved Keywords | Reserved Keywords |
Hive 1.2.0 | ADD, ADMIN, AFTER, ANALYZE, ARCHIVE, ASC, BEFORE, BUCKET, BUCKETS, CASCADE, CHANGE, CLUSTER, CLUSTERED, CLUSTERSTATUS, COLLECTION, COLUMNS, COMMENT, COMPACT, COMPACTIONS, COMPUTE, CONCATENATE, CONTINUE, DATA, DATABASES, DATETIME, DAY, DBPROPERTIES, DEFERRED, DEFINED, DELIMITED, DEPENDENCY, DESC, DIRECTORIES, DIRECTORY, DISABLE, DISTRIBUTE, ELEM_TYPE, ENABLE, ESCAPED, EXCLUSIVE, EXPLAIN, EXPORT, FIELDS, FILE, FILEFORMAT, FIRST, FORMAT, FORMATTED, FUNCTIONS, HOLD_DDLTIME, HOUR, IDXPROPERTIES, IGNORE, INDEX, INDEXES, INPATH, INPUTDRIVER, INPUTFORMAT, ITEMS, JAR, KEYS, KEY_TYPE, LIMIT, LINES, LOAD, LOCATION, LOCK, LOCKS, LOGICAL, LONG, MAPJOIN, MATERIALIZED, METADATA, MINUS, MINUTE, MONTH, MSCK, NOSCAN, NO_DROP, OFFLINE, OPTION, OUTPUTDRIVER, OUTPUTFORMAT, OVERWRITE, OWNER, PARTITIONED, PARTITIONS, PLUS, PRETTY, PRINCIPALS, PROTECTION, PURGE, READ, READONLY, REBUILD, RECORDREADER, RECORDWRITER, REGEXP, RELOAD, RENAME, REPAIR, REPLACE, REPLICATION, RESTRICT, REWRITE, RLIKE, ROLE, ROLES, SCHEMA, SCHEMAS, SECOND, SEMI, SERDE, SERDEPROPERTIES, SERVER, SETS, SHARED, SHOW, SHOW_DATABASE, SKEWED, SORT, SORTED, SSL, STATISTICS, STORED, STREAMTABLE, STRING, STRUCT, TABLES, TBLPROPERTIES, TEMPORARY, TERMINATED, TINYINT, TOUCH, TRANSACTIONS, UNARCHIVE, UNDO, UNIONTYPE, UNLOCK, UNSET, UNSIGNED, URI, USE, UTC, UTCTIMESTAMP, VALUE_TYPE, VIEW, WHILE, YEAR | ALL, ALTER, AND, ARRAY, AS, AUTHORIZATION, BETWEEN, BIGINT, BINARY, BOOLEAN, BOTH, BY, CASE, CAST, CHAR, COLUMN, CONF, CREATE, CROSS, CUBE, CURRENT, CURRENT_DATE, CURRENT_TIMESTAMP, CURSOR, DATABASE, DATE, DECIMAL, DELETE, DESCRIBE, DISTINCT, DOUBLE, DROP, ELSE, END, EXCHANGE, EXISTS, EXTENDED, EXTERNAL, FALSE, FETCH, FLOAT, FOLLOWING, FOR, FROM, FULL, FUNCTION, GRANT, GROUP, GROUPING, HAVING, IF, IMPORT, IN, INNER, INSERT, INT, INTERSECT, INTERVAL, INTO, IS, JOIN, LATERAL, LEFT, LESS, LIKE, LOCAL, MACRO, MAP, MORE, NONE, NOT, NULL, OF, ON, OR, ORDER, OUT, OUTER, OVER, PARTIALSCAN, PARTITION, PERCENT, PRECEDING, PRESERVE, PROCEDURE, RANGE, READS, REDUCE, REVOKE, RIGHT, ROLLUP, ROW, ROWS, SELECT, SET, SMALLINT, TABLE, TABLESAMPLE, THEN, TIMESTAMP, TO, TRANSFORM, TRIGGER, TRUE, TRUNCATE, UNBOUNDED, UNION, UNIQUEJOIN, UPDATE, USER, USING, UTC_TMESTAMP, VALUES, VARCHAR, WHEN, WHERE, WINDOW, WITH |
Hive 2.0.0
|
removed:
REGEXP, RLIKE
added:
AUTOCOMMIT, ISOLATION, LEVEL, OFFSET, SNAPSHOT, TRANSACTION, WORK, WRITE |
added:
COMMIT, ONLY, REGEXP, RLIKE, ROLLBACK, START |
Hive 2.1.0
|
added:
ABORT, KEY, LAST, NORELY, NOVALIDATE, NULLS, RELY, VALIDATE |
added:
CACHE, CONSTRAINT, FOREIGN, PRIMARY, REFERENCES |
Hive 2.2.0
|
added:
DETAIL, DOW, EXPRESSION, OPERATOR, QUARTER, SUMMARY, VECTORIZATION, WEEK, YEARS, MONTHS, WEEKS, DAYS, HOURS, MINUTES, SECONDS |
added:
DAYOFWEEK, EXTRACT, FLOOR, INTEGER, PRECISION, VIEWS |
Hive 3.0.0
| added: TIMESTAMPTZ, ZONE | added: TIME, NUMERIC, SYNC |
Version information
REGEXP and RLIKE are non-reserved keywords prior to Hive 2.0.0 and reserved keywords starting in Hive 2.0.0 (HIVE-11703).
Create Database
Drop Database
Alter Database
Use Database
Create Table
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14 . 0 and later) [(col_name data_type [column_constraint_specification] [COMMENT col_comment], ... [constraint_specification])] [COMMENT table_comment] [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS] [SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10 . 0 and later)] ON ((col_value, col_value, ...), (col_value, col_value, ...), ...) [STORED AS DIRECTORIES] [ [ROW FORMAT row_format] [STORED AS file_format] | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6 . 0 and later) ] [LOCATION hdfs_path] [TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6 . 0 and later) [AS select_statement]; -- (Note: Available in Hive 0.5 . 0 and later; not supported for external tables) CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name LIKE existing_table_or_view_name [LOCATION hdfs_path]; data_type : primitive_type | array_type | map_type | struct_type | union_type -- (Note: Available in Hive 0.7 . 0 and later) primitive_type : TINYINT | SMALLINT | INT | BIGINT | BOOLEAN | FLOAT | DOUBLE | DOUBLE PRECISION -- (Note: Available in Hive 2.2 . 0 and later) | STRING | BINARY -- (Note: Available in Hive 0.8 . 0 and later) | TIMESTAMP -- (Note: Available in Hive 0.8 . 0 and later) | DECIMAL -- (Note: Available in Hive 0.11 . 0 and later) | DECIMAL(precision, scale) -- (Note: Available in Hive 0.13 . 0 and later) | DATE -- (Note: Available in Hive 0.12 . 0 and later) | VARCHAR -- (Note: Available in Hive 0.12 . 0 and later) | CHAR -- (Note: Available in Hive 0.13 . 0 and later) array_type : ARRAY < data_type > map_type : MAP < primitive_type, data_type > struct_type : STRUCT < col_name : data_type [COMMENT col_comment], ...> union_type : UNIONTYPE < data_type, data_type, ... > -- (Note: Available in Hive 0.7 . 0 and later) row_format : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char ]] [COLLECTION ITEMS TERMINATED BY char ] [MAP KEYS TERMINATED BY char ] [LINES TERMINATED BY char ] [NULL DEFINED AS char ] -- (Note: Available in Hive 0.13 and later) | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)] file_format: : SEQUENCEFILE | TEXTFILE -- (Default, depending on hive. default .fileformat configuration) | RCFILE -- (Note: Available in Hive 0.6 . 0 and later) | ORC -- (Note: Available in Hive 0.11 . 0 and later) | PARQUET -- (Note: Available in Hive 0.13 . 0 and later) | AVRO -- (Note: Available in Hive 0.14 . 0 and later) | JSONFILE -- (Note: Available in Hive 4.0 . 0 and later) | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname column_constraint_specification: : [ PRIMARY KEY|UNIQUE|NOT NULL|DEFAULT [default_value]|CHECK [check_expression] ENABLE|DISABLE NOVALIDATE RELY/NORELY ] default_value: : [ LITERAL|CURRENT_USER()|CURRENT_DATE()|CURRENT_TIMESTAMP()|NULL ] constraint_specification: : [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE RELY/NORELY ] [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE RELY/NORELY ] [, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE [, CONSTRAINT constraint_name UNIQUE (col_name, ...) DISABLE NOVALIDATE RELY/NORELY ] [, CONSTRAINT constraint_name CHECK [check_expression] ENABLE|DISABLE NOVALIDATE RELY/NORELY ] |
- TBLPROPERTIES ("comment"="table_comment")
- TBLPROPERTIES ("hbase.table.name"="table_name") – see HBase Integration.
- TBLPROPERTIES ("immutable"="true") or ("immutable"="false") in release 0.13.0+ (HIVE-6406) – see Inserting Data into Hive Tables from Queries.
- TBLPROPERTIES ("orc.compress"="ZLIB") or ("orc.compress"="SNAPPY") or ("orc.compress"="NONE") and other ORC properties – see ORC Files.
- TBLPROPERTIES ("transactional"="true") or ("transactional"="false") in release 0.14.0+, the default is "false" – see Hive Transactions.
- TBLPROPERTIES ("NO_AUTO_COMPACTION"="true") or ("NO_AUTO_COMPACTION"="false"), the default is "false" – see Hive Transactions.
- TBLPROPERTIES ("compactor.mapreduce.map.memory.mb"="mapper_memory") – see Hive Transactions.
- TBLPROPERTIES ("compactorthreshold.hive.compactor.delta.num.threshold"="threshold_num") – see Hive Transactions.
- TBLPROPERTIES ("compactorthreshold.hive.compactor.delta.pct.threshold"="threshold_pct") – see Hive Transactions.
- TBLPROPERTIES ("auto.purge"="true") or ("auto.purge"="false") in release 1.2.0+ (HIVE-9118) – see Drop Table, Drop Partitions, Truncate Table, and Insert Overwrite.
- TBLPROPERTIES ("EXTERNAL"="TRUE") in release 0.6.0+ (HIVE-1329) – Change a managed table to an external table and vice versa for "FALSE".
JSON
ROW FORMAT SERDE
'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
Stored as plain text file in JSON
format.
The JsonSerDe for JSON files is available in Hive 0.12 and later.
In some distributions, a reference to
hive-hcatalog-core.jar is required.
ADD JAR
/usr/lib/hive-hcatalog/lib/hive-hcatalog-core.jar;
CREATE TABLE my_table(a string, b bigint,
...)
ROW FORMAT
SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;
The JsonSerDe was moved to Hive from HCatalog and before it
was in hive-contrib project. It was added to the Hive distribution
by HIVE-4895.
An Amazon SerDe is available at s3://elasticmapreduce/samples/hive-ads/libs/jsonserde.jar for releases prior to 0.12.0.
The JsonSerDe for JSON files is available in Hive 0.12 and later.
Starting in Hive 3.0.0, JsonSerDe
is added to Hive Serde as "org.apache.hadoop.hive.serde2.JsonSerDe"
(HIVE-19211).
CREATE TABLE my_table(a
string, b bigint, ...)
ROW FORMAT
SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'
STORED AS TEXTFILE;
Or STORED
AS JSONFILE is supported starting in Hive
4.0.0 (HIVE-19899), so you can create table as
follows:
CREATE TABLE my_table(a string, b bigint, ...) STORED AS
JSONFILE;
CSV/TSV
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
Stored as plain text file in CSV /
TSV format.
The CSVSerde is available in Hive 0.14 and greater.
The following example creates a TSV
(Tab-separated) file.
CREATE TABLE my_table(a
string, b string, ...)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = "\t",
"quoteChar" = "'",
"escapeChar" = "\\"
)
STORED AS TEXTFILE;
Default properties for SerDe is
Comma-Separated (CSV) file
DEFAULT_ESCAPE_CHARACTER \
DEFAULT_QUOTE_CHARACTER
"
DEFAULT_SEPARATOR
,
This SerDe works for most CSV data,
but does not handle embedded newlines. To use the SerDe, specify the
fully qualified class name org.apache.hadoop.hive.serde2.OpenCSVSerde.
Documentation is based on original documentation
at https://github.com/ogrodnek/csv-serde.
Limitations
This SerDe treats all columns to be
of type String. Even if you create a table with non-string column types using
this SerDe, the DESCRIBE TABLE output would show string column type.
The type information is retrieved from the SerDe.
To convert columns to the desired type in a table, you can create a view over
the table that does the CAST to the desired type.
The CSV SerDe is based on https://github.com/ogrodnek/csv-serde,
and was added to the Hive distribution in HIVE-7777.
The CSVSerde has been built and
tested against Hive 0.14 and later, and uses Open-CSV 2.3
which is bundled with the Hive distribution.
For general information about SerDes, see Hive SerDe in the Developer Guide.
Also see SerDe for details about input
and output processing.
Storage Format
Description
STORED AS TEXTFILE
Stored as plain text files. TEXTFILE
is the default file format, unless the configuration parameter hive.default.fileformat has a different setting.
Use the DELIMITED clause to read delimited files.
Enable escaping for the delimiter characters by
using the 'ESCAPED BY' clause (such as ESCAPED BY '\')
Escaping is needed if you want to work with data that can contain these
delimiter characters.
A custom NULL format can also be specified using the 'NULL DEFINED AS' clause
(default is '\N').
STORED AS SEQUENCEFILE
Stored as compressed Sequence File.
STORED AS ORC
Stored as ORC file format. Supports ACID Transactions
& Cost-based Optimizer (CBO). Stores column-level metadata.
STORED AS PARQUET
Stored as Parquet format for the Parquet columnar storage format in Hive 0.13.0 and later;
Use ROW FORMAT SERDE ... STORED AS INPUTFORMAT ... OUTPUTFORMAT syntax ... in
Hive 0.10, 0.11, or 0.12.
STORED AS AVRO
Stored as Avro format in Hive 0.14.0 and later (see Avro SerDe).
STORED AS RCFILE
Stored as Record
Columnar File format.
STORED AS JSONFILE
Stored as Json file format in Hive
4.0.0 and later.
STORED BY
Stored by a non-native table format.
To create or link to a non-native table, for example a table backed by HBase or Druid or Accumulo.
See StorageHandlers for more information on this
option.
INPUTFORMAT and OUTPUTFORMAT
in the file_format to specify the
name of a corresponding InputFormat and OutputFormat class as a string
literal.
For example,
'org.apache.hadoop.hive.contrib.fileformat.base64.Base64TextInputFormat'.
For LZO compression, the values to use are
'INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"'
(see LZO Compression).
Partitioned Tables
Example:
id int,
date date,
name varchar
Example:
create table table_name (
id int,
dtDontQuery string,
name string
)
partitioned by (date string)
Example:
CREATE TABLE page_view(viewTime INT, userid BIGINT,
page_url STRING, referrer_url STRING,
ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
STORED AS SEQUENCEFILE;
Example:
CREATE TABLE page_view(viewTime INT, userid BIGINT,
page_url STRING, referrer_url STRING,
ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
STORED AS SEQUENCEFILE;
CREATE EXTERNAL TABLE page_view(viewTime INT, userid BIGINT,
page_url STRING, referrer_url STRING,
ip STRING COMMENT 'IP Address of the User',
country STRING COMMENT 'country of origination')
COMMENT 'This is the staging page view table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'
STORED AS TEXTFILE
LOCATION '';
CTAS has these restrictions:
The target table cannot be an external table.
The target table cannot be a list bucketing table.
Example:
CREATE TABLE new_key_value_store
ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
STORED AS RCFile
AS
SELECT (key % 1024) new_key, concat(key, value) key_value_pair
FROM key_value_store
SORT BY new_key, key_value_pair;
CREATE TABLE empty_key_value_store
LIKE key_value_store [TBLPROPERTIES (property_name=property_value, ...)];
Example:
CREATE TABLE page_view(viewTime INT, userid BIGINT,
page_url STRING, referrer_url STRING,
ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
CLUSTERED BY(userid) SORTED BY(viewTime) INTO 32 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
STORED AS SEQUENCEFILE
CREATE TABLE list_bucket_single (key STRING, value STRING)
SKEWED BY (key) ON (1,5,6) [STORED AS DIRECTORIES];
CREATE TABLE list_bucket_multiple (col1 STRING, col2 int, col3 STRING)
SKEWED BY (col1, col2) ON (('s1',1), ('s3',3), ('s13',13), ('s78',78)) [STORED AS DIRECTORIES];
Example:
CREATE TEMPORARY TABLE list_bucket_multiple (col1 STRING, col2 int, col3 STRING);
A table that supports operations with ACID semantics. See this for more details about transactional tables.
Example:
CREATE TRANSACTIONAL TABLE transactional_table_test(key string, value string) PARTITIONED BY(ds string) STORED AS ORC;
create table pk(id1 integer, id2 integer,
primary key(id1, id2) disable novalidate);
create table fk(id1 integer, id2 integer,
constraint c1 foreign key(id1, id2) references pk(id2, id1) disable novalidate);
create table constraints1(id1 integer UNIQUE disable novalidate, id2 integer NOT NULL,
usr string DEFAULT current_user(), price double CHECK (price > 0 AND price <= 1000));
create table constraints2(id1 integer, id2 integer,
constraint c1_unique UNIQUE(id1) disable novalidate);
create table constraints3(id1 integer, id2 integer,
constraint c1_check CHECK(id1 + id2 > 0));
Drop Table
DROP TABLE [IF EXISTS] table_name [PURGE]; -- (Note: PURGE available in Hive 0.14.0 and later)
TRUNCATE TABLE table_name [PARTITION partition_spec];
partition_spec:
: (partition_column = partition_col_value, partition_column = partition_col_value, ...)
No comments:
Post a Comment