Thursday, March 15, 2012

External tables in Hive are handy

Usually when you create tables in Hive from raw data in HDFS, Hive moves the data to a different location - its warehouse directory, "/user/hive/warehouse". If you create a simple table, its data will be located inside the warehouse. The following Hive command creates a table with its data location at "/user/hive/warehouse/user".
hive>   CREATE TABLE user(id INT, name STRING) ROW FORMAT
              DELIMITED FIELDS TERMINATED BY ','
              LINES TERMINATED BY '\n' STORED AS TEXTFILE;

Consider that the raw data is located at "/home/admin/userdata/data1.txt". If you issue the following Hive command, the file is moved to a new location, "/user/hive/warehouse/user/data1.txt".
hive> LOAD DATA INPATH '/home/admin/userdata/data1.txt' INTO TABLE user;
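You can confirm the move by inspecting the table's metadata and the warehouse directory. A quick sketch, assuming the default warehouse location and a standard Hadoop CLI setup:
hive>   DESCRIBE FORMATTED user;
        -- the "Location:" field should show hdfs://.../user/hive/warehouse/user

    $ hadoop fs -ls /user/hive/warehouse/user
        # data1.txt is now listed here, not at its original path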

If all we want is to run Hive queries, that is fine. But when you drop the table, the raw data is lost, because the directory corresponding to the table in the warehouse is deleted.
You may also not want Hive to delete the raw data, as someone else might use it in map-reduce programs external to Hive analysis. It is far more convenient to retain the data at its original location via "EXTERNAL" tables.
The first step is to point to the location of the data while creating the table. This ensures that the data is not moved into a location inside the warehouse directory.
hive>   CREATE TABLE user(id INT, name STRING) ROW FORMAT
              DELIMITED FIELDS TERMINATED BY ','
              LINES TERMINATED BY '\n' 
              STORED AS TEXTFILE
              LOCATION '/home/admin/userdata';

Now you can happily use both Hive HQL queries and hand-crafted map-reduce programs on the same data. However, the table above is still a managed table: when you drop it, Hive will attempt to delete even the externally located data. This is addressed by explicitly marking the table "EXTERNAL". Try dropping this table, and you will see that the raw data is retained.
hive>   CREATE EXTERNAL TABLE user(id INT, name STRING) ROW FORMAT
              DELIMITED FIELDS TERMINATED BY ','
              LINES TERMINATED BY '\n' 
              STORED AS TEXTFILE
              LOCATION '/home/admin/userdata';
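To convince yourself of the difference, drop the external table and check that the raw data survives. A sketch, using the same LOCATION path as above:
hive>   DROP TABLE user;
        -- only the table metadata is removed

    $ hadoop fs -ls /home/admin/userdata
        # data1.txt is still here, untouched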

There are a few more goodies in Hive that surprised me. You can overlay multiple tables, all pointing to the same raw data. The command below creates a second table with a different schema over the same raw data. This allows you to experiment and create new tables that improve on the previous schema. It also allows you to use different schemas for different Hive queries.
hive>   CREATE EXTERNAL TABLE userline(line STRING) ROW FORMAT
              DELIMITED FIELDS TERMINATED BY ','
              LINES TERMINATED BY '\n' 
              STORED AS TEXTFILE
              LOCATION '/home/admin/userdata';
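Both overlays can now be queried independently over the same file. A sketch; the actual values returned depend on the contents of your data1.txt:
hive>   SELECT name FROM user WHERE id = 1;
hive>   SELECT count(1) FROM userline;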

11 comments:

  1. Thanks a lot for your detailed example.

  2. A bit subawesome that the LOCATION has to point to a dedicated folder and can't point to a single file though...

    1. But this really helps: new files can be added into the folder and are available in the table without any further engineering ...

  3. LOAD DATA INPATH '/home/admin/userdata/data1.txt' INTO TABLE user;
    should be LOAD DATA LOCAL INPATH

  4. Is it possible to load the data into an RDBMS from Hive? If yes, how do I do it, say Hive to Oracle.

  5. I'd say via ETL tool using Hive ODBC.

  6. @Ratheesh use sqoop - http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html

  7. Hi,

    may you please show me how to run the following command in an Oozie workflow:
    hive> LOAD DATA INPATH '/home/admin/userdata/data1.txt' INTO TABLE user;

    my goal is to load data to the external hive table every hour.
