Commands are non-SQL statements such as setting a property or adding a resource. They can be used in HiveQL scripts or directly in the CLI or Beeline.
| 
Command | 
Description | 
|---|---|
| 
Command | 
Description | 
| 
quit exit | 
Use quit or exit to leave the interactive shell. | 
| 
reset | 
Resets the configuration to the default values (as of Hive 0.10: see HIVE-3202). Any configuration parameters that were set using the set command or -hiveconf parameter in hive commandline will get reset to default value. 
Note that this does not apply to configuration parameters that were set in set command using the "hiveconf:" prefix for the key name (for historic reasons). | 
| 
set  | 
Sets the value of a particular configuration variable (key). Note: If you misspell the variable name, the CLI will not show an error. | 
| 
set | 
Prints a list of configuration variables that are overridden by the user or Hive. | 
| 
set -v | 
Prints all Hadoop and Hive configuration variables. | 
| 
add FILE[S]  add JAR[S] add ARCHIVE[S] | 
Adds one or more files, jars, or archives to the list of resources in the distributed cache. See Hive Resources for more information. | 
| add FILE[S] add JAR[S] add ARCHIVE[S] | As of Hive 1.2.0, adds one or more files, jars or archives to the list of resources in the distributed cache using an Ivy URL of the form ivy://group:module:version?query_string. See Hive Resources for more information. | 
| 
list FILE[S] list JAR[S] list ARCHIVE[S] | 
Lists the resources already added to the distributed cache. See Hive Resources for more information. | 
| 
list FILE[S]  list JAR[S] list ARCHIVE[S] | 
Checks whether the given resources are already added to the distributed cache or not. See Hive Resources for more information. | 
| 
delete FILE[S]  delete JAR[S] delete ARCHIVE[S] | 
Removes the resource(s) from the distributed cache. | 
| delete FILE[S] delete JAR[S] delete ARCHIVE[S] | As of Hive 1.2.0, removes the resource(s) which were added using the | 
| 
!  | 
Executes a shell command from the Hive shell. | 
| 
dfs  | 
Executes a dfs command from the Hive shell. | 
| 
Executes a Hive query and prints results to standard output. | |
| 
source FILE  | 
Executes a script file inside the CLI. | 
| 
compile ` | 
This allows inline Groovy code to be compiled and be used as a UDF (as of Hive 0.13.0). For a usage example, see Nov. 2013 Hive Contributors Meetup Presentations – Using Dynamic Compilation with Hive. | 
Sample Usage:
hive> set mapred.reduce.tasks=32; hive> set; hive> select a.* from tab1; hive> !ls; hive> dfs -ls;
Hive Command Line Options
To get help, run "hive -H" or "hive --help". Usage (as it is in Hive 0.9.0):In remote mode HiveServer2 only accepts valid Thrift calls – even in HTTP mode, the message body contains Thrift payloads.usage: hive -d,--defineVariable substitution to apply to Hive commands. e.g. -d A=B or --define A=B -e SQL from command line -f SQL from files -H,--help Print help information -h Connecting to Hive Server on remote host --hiveconf Use value for given property --hivevar Variable substitution to apply to hive commands. e.g. --hivevar A=B -i Initialization SQL file -p Connecting to Hive Server on port number -S,--silent Silent mode in interactive shell -v,--verbose Verbose mode (echo executed SQL to the console) Examples
See Variable Substitution for examples of using thehiveconfoption.
Example of running a query from the command line$HIVE_HOME/bin/hive -e 'select a.col from tab1 a'
Example of setting Hive configuration variables$HIVE_HOME/bin/hive -e 'select a.col from tab1 a' --hiveconf hive.exec.scratchdir=/home/my/hive_scratch --hiveconf mapred.reduce.tasks=32
Example of dumping data out from a query into a file using silent mode$HIVE_HOME/bin/hive -S -e 'select a.col from tab1 a' > a.txt
Example of running a script non-interactively from local disk$HIVE_HOME/bin/hive -f /home/my/hive-script.sql
Example of running a script non-interactively from a Hadoop supported filesystem (starting in Hive 0.14)$HIVE_HOME/bin/hive -f hdfs://
: /hive-script.sql $HIVE_HOME/bin/hive -f s3://mys3bucket/s3-script.sql Example of running an initialization script before entering interactive mode$HIVE_HOME/bin/hive -i /home/my/hive-init.sql
/tmp//hive.log $HIVE_HOME/bin/hive --hiveconf hive.root.logger=INFO,consolehive.root.loggerspecifies the logging level as well as the log destination. Specifyingconsoleas the target sends the logs to the standard error (instead of the log file).Hive Batch Mode CommandsWhen$HIVE_HOME/bin/hiveis run with the-eor-foption, it executes SQL commands in batch mode.
hive -e 'executes the query string.' 
hive -fexecutes one or more SQL queries from a file.Version 0.14
As of Hive 0.14,can be from one of the Hadoop supported filesystems (HDFS, S3, etc.) as well. $HIVE_HOME/bin/hive -f hdfs://: /hive-script.sql $HIVE_HOME/bin/hive -f s3://mys3bucket/s3-script.sqlhive> add FILE /tmp/tt.py; hive> list FILES; /tmp/tt.py hive> select from networks a MAP a.networkid USING 'python tt.py' as nn where a.ds = '2009-01-04' limit 10;
% beeline -u jdbc:hive2://localhost:10000/default -n scott -w password_file Hive version 0.11.0-SNAPSHOT by Apache Connecting to jdbc:hive2://localhost:10000/defaultBeeline Command Options
The Beeline CLI supports these command line options:
beeline -udb_URL!connectto a URL and used!saveto a beeline.properties file).Usage:beeline -rVersion: 2.1.0 (HIVE-13670)beeline -nvalid_userbeeline -pvalid_passwordOptional password mode:Starting Hive 2.2.0 (HIVE-13589) the argument for -p option is optional.Usage : beeline -p [valid_password]If the password is not provided after -p Beeline will prompt for the password while initiating the connection. When password is provided Beeline uses it initiate the connection without prompting.beeline -ddriver_classbeeline -e "query_string"Support to run multiple SQL statements separated by semicolons in a single query_string: 1.2.0 (HIVE-9877)Bug fix (null pointer exception): 0.13.0 (HIVE-5765)Bug fix (--headerInterval not honored): 0.14.0 (HIVE-7647) Bug fix (running-ein background): 1.3.0 and 2.0.0 (HIVE-6758); workaround available for earlier versions-f beeline -ffilepathVersion: 0.12.0 (HIVE-4268) Note: If the script contains tabs, query compilation fails in version 0.12.0. This bug is fixed in version 0.13.0 (HIVE-6359). Bug fix (running-fin background): 1.3.0 and 2.0.0 (HIVE-6758); workaround available for earlier versions-i (or) --init beeline -i /tmp/initfileSingle file:Version: 0.14.0 (HIVE-6561)Multiple files:Version: 2.1.0 (HIVE-11336)-w (or) --password-file -a (or) --authType --property-file beeline --property-file /tmp/aVersion: 2.2.0 (HIVE-13964)--hiveconf property=value beeline --hiveconfprop1=value1Version: 0.13.0 (HIVE-6173)--hivevar name=value beeline --hivevarvar1=value1--color=[true/false] beeline --color=true(Not supported for Separated-Value Output formats. See HIVE-9770)--showHeader=[true/false] beeline --showHeader=false--headerInterval=ROWS beeline --headerInterval=50(Not supported for Separated-Value Output formats. See HIVE-9770)--fastConnect=[true/false] beeline --fastConnect=false--autoCommit=[true/false] beeline --autoCommit=true--verbose=[true/false] beeline --verbose=true--showWarnings=[true/false] beeline --showWarnings=true--showDbInPrompt=[true/false] beeline --showDbInPrompt=trueVersion: 2.2.0 (HIVE-14123)--showNestedErrs=[true/false] beeline --showNestedErrs=true--numberFormat=[pattern] beeline --numberFormat="#,###,##0.00"--force=[true/false] beeline--force=true--maxWidth=MAXWIDTH beeline --maxWidth=150--maxColumnWidth=MAXCOLWIDTH beeline --maxColumnWidth=25--silent=[true/false] beeline --silent=true--autosave=[true/false] beeline --autosave=true--outputformat=[table/vertical/csv/tsv/dsv/csv2/tsv2] beeline --outputformat=tsvVersion: dsv/csv2/tsv2 added in 0.14.0 (HIVE-8615)--truncateTable=[true/false] --delimiterForDSV= DELIMITER --isolation=LEVEL beeline --isolation=TRANSACTION_SERIALIZABLE--nullemptystring=[true/false] beeline --nullemptystring=falseVersion: 0.13.0 (HIVE-4485)truefrom Hive 2.3 onwards, before it defaulted tofalse.When set tofalse, the entire result set is fetched and buffered before being displayed, yielding optimal display column sizing. When set totrue, result rows are displayed immediately as they are fetched, yielding lower latency and memory usage at the price of extra display column padding. Setting--incremental=trueis recommended if you encounter an OutOfMemory on the client side (due to the fetched result set size being large).--incrementalBufferRows=NUMROWS --incremental=trueand--outputformat=tableUsage:beeline --incrementalBufferRows=1000Version: 2.3.0 (HIVE-14170)--maxHistoryRows=NUMROWS --delimiter=; beeline --delimiter=$$Version: 3.0.0 (HIVE-10865)--convertBinaryArrayToString=[true/false] beeline --convertBinaryArrayToString=trueVersion: 3.0.0 (HIVE-14786)beeline --help
"1.0"?>"text/xsl" href="configuration.xsl"?>  beeline.hs2.connection.user   hive  
  beeline.hs2.connection.password   hive 
"1.0"?>
"text/xsl" href="configuration.xsl"?>
  beeline.hs2.connection.hosts 
  localhost: 10000
  beeline.hs2.connection.principal   hive/dummy-hostname @domain.com
"1.0"?>
"text/xsl" href="configuration.xsl"?>
  beeline.hs2.connection.user 
  hive 
  beeline.hs2.connection.hiveconf   hive.cli.print.current.db= true, hive.cli.print.header=true  beeline.hs2.connection.hivevar   testVarName1=value1, testVarName2=value2 
|     //localhost:10000/default;user=hive;password=hive | 
  beeline.hs2.jdbc.url.httpUrl   jdbc:hive2: //localhost:10000/default;user=hive;password=hive;transportMode=http;httpPath=cliservice  beeline.hs2.jdbc.url. default  tcpUrl 
  beeline.hs2.connection.password   hive 
Consider the following beeline-site.xml:
  beeline.hs2.jdbc.url.httpUrl   jdbc:hive2: //localhost:10000/default;transportMode=http;httpPath=cliservice  beeline.hs2.jdbc.url. default  tcpUrl 
Using JDBC
You can use JDBC to access data stored in a relational database or other tabular format.
- Load the HiveServer2 JDBC driver. As of 1.2.0 applications no longer need to explicitly load JDBC drivers using Class.forName().
 
 For example:Class.forName("org.apache.hive.jdbc.HiveDriver");
- Connect to the database by creating aConnectionobject with the JDBC driver.
 
 For example:Connection cnct = DriverManager.getConnection("jdbc:hive2://: ", " ", " "); The defaultConnection cnct = DriverManager.getConnection("jdbc:hive2://: ", " ", ""); In Kerberos secure mode, the user information is based on the Kerberos credentials.
- Submit SQL to the database by creating aStatementobject and using itsexecuteQuery()method.
 
 For example:Statement stmt = cnct.createStatement(); ResultSet rset = stmt.executeQuery("SELECT foo FROM bar");
- Process the result set, if necessary.
JDBC Client Sample Code
Running the JDBC Sample Code
Alternatively, you can run the following bash script, which will seed the data file and build your classpath before invoking the client. The script adds all the additional jars needed for using HiveServer2 in embedded mode as well.
$ a=b
$ hive -e " describe $a "
$ bin/hive --hiveconf a=b -e 'set a; set hiveconf:a; \
create table if not exists b (col int); describe ${hiveconf:a}'
$ a=b$ hive -e " describe $a "$ bin/hive --hiveconf a=b -e 'set a; set hiveconf:a; \create table if not exists b (col int); describe ${hiveconf:a}'Hive history file=/tmp/edward/hive_job_log_edward_201011240906_1463048967.txta=bhiveconf:a=bOKTime taken: 5.913 secondsOKcol intTime taken: 0.754 seconds
There are three namespaces for variables – hiveconf, system, and env
set x=myvalue${hiveconf:x}
set zzz=5;--  sets zzz=5set zzz;set system:xxx=5;set system:xxx;-- sets a system property xxx to 5set system:yyy=${system:xxx};set system:yyy;-- sets yyy with value of xxxset go=${hiveconf:zzz};set go;-- sets go base on value on zzzset hive.variable.substitute=false;set raw=${hiveconf:zzz};set raw;-- disable substitution set a value to the literalset hive.variable.substitute=true;EXPLAIN SELECT * FROM src where key=${hiveconf:zzz};SELECT * FROM src where key=${hiveconf:zzz};--use a variable in a queryset a=1;set b=a;set c=${hiveconf:${hiveconf:b}};set c;--uses nested variables. set jar=../lib/derby.jar;add file ${hiveconf:jar};list file;delete file ${hiveconf:jar};list file;
The HCatalog command line interface (CLI) can be invoked as HIVE_HOME=hive_home hcat_home/bin/hcat where hive_home is the directory where Hive has been installed and hcat_home is the directory where HCatalog has been installed.
HCatalog CLI
The HCatalog CLI supports these command line options:| 
Option | 
Usage | 
Description | 
|---|---|---|
| hcat | 
Prints a usage message. | |
| 
-D | hcat -Dkey=value... | 
Passes the key-value pair to HCatalog as a Java System Property. | 
| 
-e | hcat -e 'create table mytable(a int);' ... | 
Tells HCatalog to treat the following string as a DDL command and execute it. | 
| 
-f | hcat -f myscript.hcatalog ... | 
Tells HCatalog that myscript.hcatalog is a file containing DDL commands to execute. | 
| 
-g | hcat -g mygroup ... | 
Tells HCatalog that the table which needs to be created must have group "mygroup". | 
| 
-p | hcat -p rwxr-xr-x ... | 
Tells HCatalog that the table which needs to be created must have permissions "rwxr-xr-x". | 
Note the following:- The -g and -p options are not mandatory.
- Only one -e or -f option can be provided, not both.
- The order of options is immaterial; you can specify the options in any order.
If no option is provided, then a usage message is printed:Usage:  hcat  { -e "" | -f  }  [-g ] [-p ] [-D=]
      - ALTER INDEX ... REBUILD
- CREATE TABLE ... AS SELECT
- ALTER TABLE ... CONCATENATE
- ALTER TABLE ARCHIVE/UNARCHIVE PARTITION
- ANALYZE TABLE ... COMPUTE STATISTICS
- IMPORT FROM ...
- EXPORT TABLE
CREATE TABLE kst
  PARTITIONED BY (ds string)
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES (
    'avro.schema.url'='http://schema_provider/kst.avsc');CREATE TABLE kst (
    string1 string,
    string2 string,
    int1 int,
    boolean1 boolean,
    long1 bigint,
    float1 float,
    double1 double,
    inner_record1 struct,
    enum1 string,
    array1 array,
    map1 map,
    union1 uniontype,
    fixed1 binary,
    null1 void,
    unionnullint int,
    bytes1 binary)
  PARTITIONED BY (ds string)
  STORED AS AVRO;    CREATE TABLE test_serializer(string1 STRING,
                             int1 INT,
                             tinyint1 TINYINT,
                             smallint1 SMALLINT,
                             bigint1 BIGINT,
                             boolean1 BOOLEAN,
                             float1 FLOAT,
                             double1 DOUBLE,
                             list1 ARRAY,
                             map1 MAP,
                             struct1 STRUCT,
                             union1 uniontype,
                             enum1 STRING,
                             nullableint INT,
                             bytes1 BINARY,
                             fixed1 BINARY)
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY ':' MAP KEYS TERMINATED BY '#' LINES TERMINATED BY '\n'
 STORED AS TEXTFILE;    
{
  "namespace": "com.linkedin.haivvreo",
  "name": "test_serializer",
  "type": "record",
  "fields": [
    { "name":"string1", "type":"string" },
    { "name":"int1", "type":"int" },
    { "name":"tinyint1", "type":"int" },
    { "name":"smallint1", "type":"int" },
    { "name":"bigint1", "type":"long" },
    { "name":"boolean1", "type":"boolean" },
    { "name":"float1", "type":"float" },
    { "name":"double1", "type":"double" },
    { "name":"list1", "type":{"type":"array", "items":"string"} },
    { "name":"map1", "type":{"type":"map", "values":"int"} },
    { "name":"struct1", "type":{"type":"record", "name":"struct1_name", "fields": [
          { "name":"sInt", "type":"int" }, { "name":"sBoolean", "type":"boolean" }, { "name":"sString", "type":"string" } ] } },
    { "name":"union1", "type":["float", "boolean", "string"] },
    { "name":"enum1", "type":{"type":"enum", "name":"enum1_values", "symbols":["BLUE","RED", "GREEN"]} },
    { "name":"nullableint", "type":["int", "null"] },
    { "name":"bytes1", "type":"bytes" },
    { "name":"fixed1", "type":{"type":"fixed", "name":"threebytes", "size":3} }
  ] }
CREATE TABLE as_avro
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED as INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES (
    'avro.schema.url'='file:///path/to/the/schema/test_serializer.avsc');
  
INSERT OVERWRITE TABLE as_avro SELECT * FROM test_serializer;
CREATE TABLE as_avro(string1 STRING,
                     int1 INT,
                     tinyint1 TINYINT,
                     smallint1 SMALLINT,
                     bigint1 BIGINT,
                     boolean1 BOOLEAN,
                     float1 FLOAT,
                     double1 DOUBLE,
                     list1 ARRAY,
                     map1 MAP,
                     struct1 STRUCT,
                     union1 uniontype,
                     enum1 STRING,
                     nullableint INT,
                     bytes1 BINARY,
                     fixed1 BINARY)
STORED AS AVRO;
INSERT OVERWRITE TABLE as_avro SELECT * FROM test_serializer;    CREATE TABLE embedded
  COMMENT "just drop the schema right into the HQL"
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES (
    'avro.schema.literal'='{
      "namespace": "com.howdy",
      "name": "some_schema",
      "type": "record",
      "fields": [ { "name":"string1","type":"string"}]
    }');hive --hiveconf schema="${SCHEMA}" -f your_script_file.sql
ORC Files
- a single file as the output of each task, which reduces the NameNode's load
- Hive type support including datetime, decimal, and the complex types (struct, list, map, and union)
- light-weight indexes stored within the file- skip row groups that don't pass predicate filtering
- seek to a given row
 
- block-mode compression based on data type- run-length encoding for integer columns
- dictionary encoding for string columns
 
- concurrent reads of the same file using separate RecordReaders
- ability to split files without scanning for markers
- bound the amount of memory needed for reading or writing
- metadata stored using Protocol Buffers, which allows addition and removal of fields
An ORC file contains groups of row data called stripes, along with auxiliary information in a file footer. At the end of the file a postscript holds compression parameters and the size of the compressed footer.
The default stripe size is 250 MB. Large stripe sizes enable large, efficient reads from HDFS.
The file footer contains a list of stripes in the file, the number of rows per stripe, and each column's data type. It also contains column-level aggregates count, min, max, and sum.
This diagram illustrates the ORC file structure:
- CREATE TABLE ... STORED AS ORC
- ALTER TABLE ... [PARTITION partition_spec] SET FILEFORMAT ORC
- SET hive.default.fileformat=Orc
hive --orcfiledump [-d] [--rowindex ] 
  
// Hive version 1.2.0 and later:
hive --orcfiledump [-d] [-t] [--rowindex ] 
  
// Hive version 1.3.0 and later:
hive --orcfiledump [-j] [-p] [-d] [-t] [--rowindex ] [--recover] [--skip-dump] 
    [--backup-path ] 
Specifying -d in the command will cause it to dump the ORC file data rather than the metadata (Hive 1.1.0 and later).
Specifying --rowindex with a comma separated list of column ids will cause it to print row indexes for the specified columns, where 0 is the top level struct containing all of the columns and 1 is the first column id (Hive 1.1.0 and later).
Specifying -t in the command will print the timezone id of the writer.
Specifying -j in the command will print the ORC file metadata in JSON format. To pretty print the JSON metadata, add -p to the command.
Specifying --recover in the command will recover a corrupted ORC file generated by Hive streaming.
Specifying --skip-dump along with --recover will perform recovery without dumping metadata.
Specifying --backup-path with a new-path will let the recovery tool move corrupted files to the specified backup path (default: /tmp).
 is the URI of the ORC file.
 is the URI of the ORC file or directory. From Hive 1.3.0 onward, this URI can be a directory containing ORC files.      
PARQUET:
CREATE TABLE parquet_test (
 id int,
 str string,
 mp MAP,
 lst ARRAY,
 strct STRUCT) 
PARTITIONED BY (part string)
STORED AS PARQUET;   
Compressed Data StorageCREATE TABLE raw (line STRING)
   ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE raw;CREATE TABLE raw (line STRING)
   ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';
CREATE TABLE raw_sequence (line STRING)
   STORED AS SEQUENCEFILE;
LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE raw;
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK; -- NONE/RECORD/BLOCK (see below)
INSERT OVERWRITE TABLE raw_sequence SELECT * FROM raw;
core-site.xml
core-site.xmlAdd the following to your core-site.xml:- com.hadoop.compression.lzo.LzoCodec
- com.hadoop.compression.lzo.LzopCodec
For example:
io.compression.codecs org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec, 
io.compression.codec.lzo.class 
Next we run the command to create an LZO index file:
SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec
SET hive.exec.compress.output=true
SET mapreduce.output.fileoutputformat.compress=true
SET hive.exec.compress.output=true
SET mapreduce.output.fileoutputformat.compress=true
Overview
HiveQL DDL statements are documented here, including:
- CREATE DATABASE/SCHEMA, TABLE, VIEW, FUNCTION, INDEX
- DROP DATABASE/SCHEMA, TABLE, VIEW, INDEX
- TRUNCATE TABLE
- ALTER DATABASE/SCHEMA, TABLE, VIEW
- MSCK REPAIR TABLE (or ALTER TABLE RECOVER PARTITIONS)
- SHOW DATABASES/SCHEMAS, TABLES, TBLPROPERTIES, VIEWS, PARTITIONS, FUNCTIONS, INDEX[ES], COLUMNS, CREATE TABLE
- DESCRIBE DATABASE/SCHEMA, table_name, view_name, materialized_view_name
PARTITION statements are usually options of TABLE statements, except for SHOW PARTITIONS.
Keywords, Non-reserved Keywords and Reserved Keywords
| All Keywords | ||
|---|---|---|
| Version | Non-reserved Keywords | Reserved Keywords | 
| Hive 1.2.0 | ADD, ADMIN, AFTER, ANALYZE, ARCHIVE, ASC, BEFORE, BUCKET, BUCKETS, CASCADE, CHANGE, CLUSTER, CLUSTERED, CLUSTERSTATUS, COLLECTION, COLUMNS, COMMENT, COMPACT, COMPACTIONS, COMPUTE, CONCATENATE, CONTINUE, DATA, DATABASES, DATETIME, DAY, DBPROPERTIES, DEFERRED, DEFINED, DELIMITED, DEPENDENCY, DESC, DIRECTORIES, DIRECTORY, DISABLE, DISTRIBUTE, ELEM_TYPE, ENABLE, ESCAPED, EXCLUSIVE, EXPLAIN, EXPORT, FIELDS, FILE, FILEFORMAT, FIRST, FORMAT, FORMATTED, FUNCTIONS, HOLD_DDLTIME, HOUR, IDXPROPERTIES, IGNORE, INDEX, INDEXES, INPATH, INPUTDRIVER, INPUTFORMAT, ITEMS, JAR, KEYS, KEY_TYPE, LIMIT, LINES, LOAD, LOCATION, LOCK, LOCKS, LOGICAL, LONG, MAPJOIN, MATERIALIZED, METADATA, MINUS, MINUTE, MONTH, MSCK, NOSCAN, NO_DROP, OFFLINE, OPTION, OUTPUTDRIVER, OUTPUTFORMAT, OVERWRITE, OWNER, PARTITIONED, PARTITIONS, PLUS, PRETTY, PRINCIPALS, PROTECTION, PURGE, READ, READONLY, REBUILD, RECORDREADER, RECORDWRITER, REGEXP, RELOAD, RENAME, REPAIR, REPLACE, REPLICATION, RESTRICT, REWRITE, RLIKE, ROLE, ROLES, SCHEMA, SCHEMAS, SECOND, SEMI, SERDE, SERDEPROPERTIES, SERVER, SETS, SHARED, SHOW, SHOW_DATABASE, SKEWED, SORT, SORTED, SSL, STATISTICS, STORED, STREAMTABLE, STRING, STRUCT, TABLES, TBLPROPERTIES, TEMPORARY, TERMINATED, TINYINT, TOUCH, TRANSACTIONS, UNARCHIVE, UNDO, UNIONTYPE, UNLOCK, UNSET, UNSIGNED, URI, USE, UTC, UTCTIMESTAMP, VALUE_TYPE, VIEW, WHILE, YEAR | ALL, ALTER, AND, ARRAY, AS, AUTHORIZATION, BETWEEN, BIGINT, BINARY, BOOLEAN, BOTH, BY, CASE, CAST, CHAR, COLUMN, CONF, CREATE, CROSS, CUBE, CURRENT, CURRENT_DATE, CURRENT_TIMESTAMP, CURSOR, DATABASE, DATE, DECIMAL, DELETE, DESCRIBE, DISTINCT, DOUBLE, DROP, ELSE, END, EXCHANGE, EXISTS, EXTENDED, EXTERNAL, FALSE, FETCH, FLOAT, FOLLOWING, FOR, FROM, FULL, FUNCTION, GRANT, GROUP, GROUPING, HAVING, IF, IMPORT, IN, INNER, INSERT, INT, INTERSECT, INTERVAL, INTO, IS, JOIN, LATERAL, LEFT, LESS, LIKE, LOCAL, MACRO, MAP, MORE, NONE, NOT, NULL, OF, ON, OR, ORDER, OUT, OUTER, OVER, PARTIALSCAN, PARTITION, PERCENT, PRECEDING, PRESERVE, PROCEDURE, RANGE, READS, REDUCE, REVOKE, RIGHT, ROLLUP, ROW, ROWS, SELECT, SET, SMALLINT, TABLE, TABLESAMPLE, THEN, TIMESTAMP, TO, TRANSFORM, TRIGGER, TRUE, TRUNCATE, UNBOUNDED, UNION, UNIQUEJOIN, UPDATE, USER, USING, UTC_TMESTAMP, VALUES, VARCHAR, WHEN, WHERE, WINDOW, WITH | 
| 
Hive 2.0.0 | 
removed:  REGEXP, RLIKE
added:  AUTOCOMMIT, ISOLATION, LEVEL, OFFSET, SNAPSHOT, TRANSACTION, WORK, WRITE | 
added:  COMMIT, ONLY, REGEXP, RLIKE, ROLLBACK, START | 
| 
Hive 2.1.0 | 
added:  ABORT, KEY, LAST, NORELY, NOVALIDATE, NULLS, RELY, VALIDATE | 
added:  CACHE, CONSTRAINT, FOREIGN, PRIMARY, REFERENCES | 
| 
Hive 2.2.0 | 
added:  DETAIL, DOW, EXPRESSION, OPERATOR, QUARTER, SUMMARY, VECTORIZATION, WEEK, YEARS, MONTHS, WEEKS, DAYS, HOURS, MINUTES, SECONDS | 
added:  DAYOFWEEK, EXTRACT, FLOOR, INTEGER, PRECISION, VIEWS | 
| 
Hive 3.0.0 | added: TIMESTAMPTZ, ZONE  | added: TIME, NUMERIC, SYNC | 
Version information
REGEXP and RLIKE are non-reserved keywords prior to Hive 2.0.0 and reserved keywords starting in Hive 2.0.0 (HIVE-11703).
Create Database
Drop Database
Alter Database
Use Database
Create Table
| CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0and later)  [(col_name data_type [column_constraint_specification] [COMMENT col_comment], ... [constraint_specification])]  [COMMENT table_comment]  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]  [SKEWED BY (col_name, col_name, ...)                  -- (Note: Available in Hive 0.10.0and later)]     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)     [STORED AS DIRECTORIES]  [   [ROW FORMAT row_format]    [STORED AS file_format]     | STORED BY 'storage.handler.class.name'[WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0and later)  ]  [LOCATION hdfs_path]  [TBLPROPERTIES (property_name=property_value, ...)]   -- (Note: Available in Hive 0.6.0and later)  [AS select_statement];   -- (Note: Available in Hive 0.5.0and later; not supported forexternal tables)CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name  LIKE existing_table_or_view_name  [LOCATION hdfs_path];data_type  : primitive_type  | array_type  | map_type  | struct_type  | union_type  -- (Note: Available in Hive 0.7.0and later)primitive_type  : TINYINT  | SMALLINT  | INT  | BIGINT  | BOOLEAN  | FLOAT  | DOUBLE  | DOUBLE PRECISION -- (Note: Available in Hive 2.2.0and later)  | STRING  | BINARY      -- (Note: Available in Hive 0.8.0and later)  | TIMESTAMP   -- (Note: Available in Hive 0.8.0and later)  | DECIMAL     -- (Note: Available in Hive 0.11.0and later)  | DECIMAL(precision, scale)  -- (Note: Available in Hive 0.13.0and later)  | DATE        -- (Note: Available in Hive 0.12.0and later)  | VARCHAR     -- (Note: Available in Hive 0.12.0and later)  | CHAR        -- (Note: Available in Hive 0.13.0and later)array_type  : ARRAY < data_type >map_type  : MAP < primitive_type, data_type >struct_type  : STRUCT < col_name : data_type [COMMENT col_comment], ...>union_type   : UNIONTYPE < data_type, data_type, ... >  -- (Note: Available in Hive 0.7.0and later)row_format  : DELIMITED [FIELDS TERMINATED BY char[ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]        [NULL DEFINED AS char]   -- (Note: Available in Hive 0.13and later)  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]file_format:  : SEQUENCEFILE  | TEXTFILE    -- (Default, depending on hive.default.fileformat configuration)  | RCFILE      -- (Note: Available in Hive 0.6.0and later)  | ORC         -- (Note: Available in Hive 0.11.0and later)  | PARQUET     -- (Note: Available in Hive 0.13.0and later)  | AVRO        -- (Note: Available in Hive 0.14.0and later)  | JSONFILE    -- (Note: Available in Hive 4.0.0and later)  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classnamecolumn_constraint_specification:  : [ PRIMARY KEY|UNIQUE|NOT NULL|DEFAULT [default_value]|CHECK  [check_expression] ENABLE|DISABLE NOVALIDATE RELY/NORELY ]default_value:  : [ LITERAL|CURRENT_USER()|CURRENT_DATE()|CURRENT_TIMESTAMP()|NULL ]  constraint_specification:  : [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE RELY/NORELY ]    [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE RELY/NORELY ]    [, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE     [, CONSTRAINT constraint_name UNIQUE (col_name, ...) DISABLE NOVALIDATE RELY/NORELY ]    [, CONSTRAINT constraint_name CHECK [check_expression] ENABLE|DISABLE NOVALIDATE RELY/NORELY ] | 
- TBLPROPERTIES ("comment"="table_comment")
- TBLPROPERTIES ("hbase.table.name"="table_name") – see HBase Integration.
- TBLPROPERTIES ("immutable"="true") or ("immutable"="false") in release 0.13.0+ (HIVE-6406) – see Inserting Data into Hive Tables from Queries.
- TBLPROPERTIES ("orc.compress"="ZLIB") or ("orc.compress"="SNAPPY") or ("orc.compress"="NONE") and other ORC properties – see ORC Files.
- TBLPROPERTIES ("transactional"="true") or ("transactional"="false") in release 0.14.0+, the default is "false" – see Hive Transactions.
- TBLPROPERTIES ("NO_AUTO_COMPACTION"="true") or ("NO_AUTO_COMPACTION"="false"), the default is "false" – see Hive Transactions.
- TBLPROPERTIES ("compactor.mapreduce.map.memory.mb"="mapper_memory") – see Hive Transactions.
- TBLPROPERTIES ("compactorthreshold.hive.compactor.delta.num.threshold"="threshold_num") – see Hive Transactions.
- TBLPROPERTIES ("compactorthreshold.hive.compactor.delta.pct.threshold"="threshold_pct") – see Hive Transactions.
- TBLPROPERTIES ("auto.purge"="true") or ("auto.purge"="false") in release 1.2.0+ (HIVE-9118) – see Drop Table, Drop Partitions, Truncate Table, and Insert Overwrite.
- TBLPROPERTIES ("EXTERNAL"="TRUE") in release 0.6.0+ (HIVE-1329) – Change a managed table to an external table and vice versa for "FALSE".
 
  
JSON 
  
  ROW FORMAT SERDE 
  'org.apache.hive.hcatalog.data.JsonSerDe' 
  STORED AS TEXTFILE
 
  
Stored as plain text file in JSON
  format.
  The JsonSerDe for JSON files is available in Hive 0.12 and later.
  
  
  
In some distributions, a reference to
  hive-hcatalog-core.jar is required.
ADD JAR
  /usr/lib/hive-hcatalog/lib/hive-hcatalog-core.jar;
  
  CREATE TABLE my_table(a string, b bigint,
  ...)
ROW FORMAT
  SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;
  The JsonSerDe was moved to Hive from HCatalog and before it
  was in hive-contrib project. It was added to the Hive distribution
  by HIVE-4895.
  An Amazon SerDe is available at s3://elasticmapreduce/samples/hive-ads/libs/jsonserde.jar for releases prior to 0.12.0.
  
  The JsonSerDe for JSON files is available in Hive 0.12 and later.
Starting in Hive 3.0.0, JsonSerDe
  is added to Hive Serde as "org.apache.hadoop.hive.serde2.JsonSerDe"
  (HIVE-19211).
CREATE TABLE my_table(a
  string, b bigint, ...)
  ROW FORMAT
  SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'
  STORED AS TEXTFILE;
Or STORED
  AS JSONFILE is supported starting in Hive
  4.0.0 (HIVE-19899), so you can create table as
  follows:
CREATE TABLE my_table(a string, b bigint, ...) STORED AS
  JSONFILE;
 
  
  
CSV/TSV
  
  ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
  STORED AS TEXTFILE
 
  
Stored as plain text file in CSV /
  TSV format.
   
The CSVSerde is available in Hive 0.14 and greater.
The following example creates a TSV
  (Tab-separated) file.
CREATE TABLE my_table(a
  string, b string, ...)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = "\t",
   "quoteChar"     = "'",
   "escapeChar"    = "\\"
)  
STORED AS TEXTFILE;
Default properties for SerDe is
  Comma-Separated (CSV) file
   
DEFAULT_ESCAPE_CHARACTER \
DEFAULT_QUOTE_CHARACTER 
  "
DEFAULT_SEPARATOR       
  ,
This SerDe works for most CSV data,
  but does not handle embedded newlines. To use the SerDe, specify the
  fully qualified class name org.apache.hadoop.hive.serde2.OpenCSVSerde.  
Documentation is based on original documentation
  at https://github.com/ogrodnek/csv-serde.
  
  
Limitations
This SerDe treats all columns to be
  of type String. Even if you create a table with non-string column types using
  this SerDe, the DESCRIBE TABLE output would show string column type. 
  The type information is retrieved from the SerDe. 
  
  To convert columns to the desired type in a table, you can create a view over
  the table that does the CAST to the desired type.
  
  
The CSV SerDe is based on https://github.com/ogrodnek/csv-serde,
  and was added to the Hive distribution in HIVE-7777.
The CSVSerde has been built and
  tested against Hive 0.14 and later, and uses Open-CSV 2.3
  which is bundled with the Hive distribution.
  
  For general information about SerDes, see Hive SerDe in the Developer Guide.
  Also see SerDe for details about input
  and output processing.
 
  
 
   
Storage Format
 
   
Description
 
   
 
  
STORED AS TEXTFILE
 
  
Stored as plain text files. TEXTFILE
  is the default file format, unless the configuration parameter hive.default.fileformat has a different setting.
Use the DELIMITED clause to read delimited files.
Enable escaping for the delimiter characters by
  using the 'ESCAPED BY' clause (such as ESCAPED BY '\') 
  Escaping is needed if you want to work with data that can contain these
  delimiter characters. 
  
  A custom NULL format can also be specified using the 'NULL DEFINED AS' clause
  (default is '\N').
 
  
  
STORED AS SEQUENCEFILE
 
  
Stored as compressed Sequence File. 
 
  
  
STORED AS ORC
 
  
Stored as ORC file format. Supports ACID Transactions
  & Cost-based Optimizer (CBO). Stores column-level metadata. 
 
  
  
STORED AS PARQUET
 
  
Stored as Parquet format for the Parquet columnar storage format in Hive 0.13.0 and later; 
  Use ROW FORMAT SERDE ... STORED AS INPUTFORMAT ... OUTPUTFORMAT syntax ... in
  Hive 0.10, 0.11, or 0.12.
 
  
  
STORED AS AVRO
 
  
Stored as Avro format in Hive 0.14.0 and later (see Avro SerDe).
 
  
  
STORED AS RCFILE
 
  
Stored as Record
  Columnar File format.
 
  
  
STORED AS JSONFILE
 
  
Stored as Json file format in Hive
  4.0.0 and later.
 
  
  
STORED BY 
 
  
Stored by a non-native table format.
  To create or link to a non-native table, for example a table backed by HBase or Druid or Accumulo. 
  See StorageHandlers for more information on this
  option.
 
  
  
INPUTFORMAT and OUTPUTFORMAT
 
  
in the file_format to specify the
  name of a corresponding InputFormat and OutputFormat class as a string
  literal.
  
  For example,
  'org.apache.hadoop.hive.contrib.fileformat.base64.Base64TextInputFormat'. 
  
  For LZO compression, the values to use are 
  'INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat" 
  OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"'
  
  
  (see LZO Compression).
 
  
Partitioned TablesExample:
id     int,
date   date,
name   varchar
Example:
create table table_name (
  id                int,
  dtDontQuery       string,
  name              string
)
partitioned by (date string)Example:
CREATE TABLE page_view(viewTime INT, userid BIGINT,
     page_url STRING, referrer_url STRING,
     ip STRING COMMENT 'IP Address of the User')
 COMMENT 'This is the page view table'
 PARTITIONED BY(dt STRING, country STRING)
 STORED AS SEQUENCEFILE;
Example:
CREATE TABLE page_view(viewTime INT, userid BIGINT,
     page_url STRING, referrer_url STRING,
     ip STRING COMMENT 'IP Address of the User')
 COMMENT 'This is the page view table'
 PARTITIONED BY(dt STRING, country STRING)
 ROW FORMAT DELIMITED
   FIELDS TERMINATED BY '\001'
STORED AS SEQUENCEFILE;
CREATE EXTERNAL TABLE page_view(viewTime INT, userid BIGINT,
     page_url STRING, referrer_url STRING,
     ip STRING COMMENT 'IP Address of the User',
     country STRING COMMENT 'country of origination')
 COMMENT 'This is the staging page view table'
 ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'
 STORED AS TEXTFILE
 LOCATION ''; CTAS has these restrictions:
The target table cannot be an external table.
The target table cannot be a list bucketing table.
Example:
CREATE TABLE new_key_value_store
   ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
   STORED AS RCFile
   AS
SELECT (key % 1024) new_key, concat(key, value) key_value_pair
FROM key_value_store
SORT BY new_key, key_value_pair;CREATE TABLE empty_key_value_store
LIKE key_value_store [TBLPROPERTIES (property_name=property_value, ...)];Example:
CREATE TABLE page_view(viewTime INT, userid BIGINT,
     page_url STRING, referrer_url STRING,
     ip STRING COMMENT 'IP Address of the User')
 COMMENT 'This is the page view table'
 PARTITIONED BY(dt STRING, country STRING)
 CLUSTERED BY(userid) SORTED BY(viewTime) INTO 32 BUCKETS
 ROW FORMAT DELIMITED
   FIELDS TERMINATED BY '\001'
   COLLECTION ITEMS TERMINATED BY '\002'
   MAP KEYS TERMINATED BY '\003'
 STORED AS SEQUENCEFILECREATE TABLE list_bucket_single (key STRING, value STRING)
  SKEWED BY (key) ON (1,5,6) [STORED AS DIRECTORIES];
CREATE TABLE list_bucket_multiple (col1 STRING, col2 int, col3 STRING)
  SKEWED BY (col1, col2) ON (('s1',1), ('s3',3), ('s13',13), ('s78',78)) [STORED AS DIRECTORIES];Example:
CREATE TEMPORARY TABLE list_bucket_multiple (col1 STRING, col2 int, col3 STRING);
A table that supports operations with ACID semantics. See this for more details about transactional tables.
Example:
CREATE TRANSACTIONAL TABLE transactional_table_test(key string, value string) PARTITIONED BY(ds string) STORED AS ORC;
create table pk(id1 integer, id2 integer,
  primary key(id1, id2) disable novalidate);
create table fk(id1 integer, id2 integer,
  constraint c1 foreign key(id1, id2) references pk(id2, id1) disable novalidate);
create table constraints1(id1 integer UNIQUE disable novalidate, id2 integer NOT NULL, 
  usr string DEFAULT current_user(), price double CHECK (price > 0 AND price <= 1000));
create table constraints2(id1 integer, id2 integer,
  constraint c1_unique UNIQUE(id1) disable novalidate);
create table constraints3(id1 integer, id2 integer,
  constraint c1_check CHECK(id1 + id2 > 0));
Drop Table
DROP TABLE [IF EXISTS] table_name [PURGE];     -- (Note: PURGE available in Hive 0.14.0 and later)
TRUNCATE TABLE table_name [PARTITION partition_spec];
partition_spec:
  : (partition_column = partition_col_value, partition_column = partition_col_value, ...)

 
No comments:
Post a Comment