Programmer's notebook: Ingest data from database into Hadoop with Sqoop (1)

Thursday, June 7, 2012

Ingest data from database into Hadoop with Sqoop (1)

Sqoop is an easy tool to import data from databases to HDFS and export data from Hadoop/Hive tables to Databases. Databases has been de-facto standard for storing structured data. Running complex queries on large data in databases can be detrimental to their performance.
It is some times useful to import the data into Hadoop for ad hoc analysis. Tools like hive, raw map-reduce can provide tremendous flexibility in performing various kinds of analysis.
This becomes particularly useful when database has been used mostly as storage device (Ex: Storing XML or unstructured string data as clob data).

Sqoop is very simple on it's face. Internally, it uses map-reduce in parallel data import from Database and utilizes JDBC connection for the purpose.

I am jumping straight into using sqoop with oracle database and will leave installation for some other post.

Sqoop commands are executed from command lines using following structure:

sqoop COMMAND [ARGS]

All available sqoop commands can be listed with: sqoop help

Article focuses on importing from database specifically Oracle DB.
All commands displayed multiline has to be run as a single command.

List database schema present on Oracle server.

User credentials should have have sufficient permission to list the databases.

sqoop list-databases 
             --connect jdbc:oracle:thin:@//HOST:PORT
             --username DBA_USER 
             --password password

Providing password on command line is insecure and we should be using "-P" instead. Sqoop will prompt for the password when you execute the command.

List tables present in a Oracle DB.

sqoop list-tables 
             --connect jdbc:oracle:thin:@//HOST:PORT/DB
             --username DBA_USER 
             -P

Run arbitrary sql command using Sqoop

sqoop eval 
             --connect jdbc:oracle:thin:@//HOST:PORT/DB
             --username DBA_USER 
             -P
             --query "SQL QUERY"

List columns for table in Oracle DB

sqoop eval 
             --connect jdbc:oracle:thin:@//HOST:PORT/DB
             --username DBA_USER 
             -P
             --query "SELECT  t.*  FROM  TABLENAME  t WHERE 1=0"

Importing tables from Oracle DB
There are many destinations from importing data from Oracle DB. You can import data to

HDFS as raw files
Hive table existing or newly created
HBase tables

I have not explored all of them but I definitely prefer storing them as raw files over hive tables. This makes it easy to define different hive table schema using external tables. You also need to ensure that the target directory in hdfs does not exist prior to running the sqoop command.
Following will fetch all the data from the table and store it as text file in target directory in HDFS.

hadoop fs -rmr /target/directory/in/hdfs

sqoop import 
             --connect jdbc:oracle:thin:@//HOST:PORT/DB
             --username DBA_USER 
             -P
             --table TABLENAME
             --as-textfile
             --target-dir /target/directory/in/hdfs

The above tries to execute a query to extract column names from the table. If this is not possible then it will error out with

ERROR tool.ImportTool: Imported Failed: Attempted to generate class with no columns!

In that case, we have to specify the columns manually in the command.

sqoop import 
             --connect jdbc:oracle:thin:@//HOST:PORT/DB
             --username DBA_USER 
             -P
             --table TABLENAME
             --columns "column1,column2,column3,.."
             --as-textfile
             --target-dir /target/directory/in/hdfs

The above command works well when primary key is defined for the table. If this is not specified, you will have to perform sequential import by forcing a single mapper.

sqoop import 
             --connect jdbc:oracle:thin:@//HOST:PORT/DB
             --username DBA_USER 
             -P
             --table TABLENAME
             --columns "column1,column2,column3,.."
             --as-textfile
             --target-dir /target/directory/in/hdfs
             -m 1

or split by column in the table

sqoop import 
             --connect jdbc:oracle:thin:@//HOST:PORT/DB
             --username DBA_USER 
             -P
             --table TABLENAME
             --columns "column1,column2,column3,.."
             --as-textfile
             --target-dir /target/directory/in/hdfs
             --split-by COLUMNNAME

I will explore few other nuances in importing data into HDFS in subsequent article.

28 comments:

Deepak DabiMarch 11, 2013 at 6:25 AM
hey can we use oracle wallet keys in oracle import, export i mean how can i remove this username and password param from command ?
ReplyDelete
Replies
UnknownMarch 26, 2013 at 5:01 PM
Hi, My Table has couple of string columns and one clob. How can I move this data from my oracle table to HBase? Also I am planning to unmarshall the XML object and store the data in HBase.

Thanks
Raj
ReplyDelete
Replies
BaskarApril 9, 2013 at 11:25 AM
Import worked out well as per above commands.. Need your help on exporting csv file from hdfs to a table on oracle DB.
ReplyDelete
Replies
BaskarApril 9, 2013 at 2:34 PM
I need to EXPORT a csv file from hdfs directory to ORACLE DB. Please help me out on this..
sqoop export --options-file oracle_db_connect.txt --table stitem_sls_d --export-dir /user/root/testdata/sqoop_test/ --input-fields-terminated-by ','

above command throws exception error.
ReplyDelete
Replies
AnonymousMay 9, 2013 at 7:28 AM
HI can we append imported data from mysql to existing file in hdfs with sqoop.because I want to append daily data into hdfs, but in sqoop-append every time it creates new file in existing folder like part-m-00000,part-m-00001... etc. I want it should append every time in part-m-00000 file. how Can it done?
ReplyDelete
Replies
venucherryApril 1, 2014 at 2:54 AM
Hi Ashish,

I just tried to load data from Oracle to HDFS

Before that installed Oracle 11g and tried to import data from Oracle..

default user: system,password:Admin123$,servicename:orcl and also i copied ojdbc6.jar into sqoop/lib.

Below are the commands i used and executed but still i am facing the issues

training@BigDataVM:~/
Installations/sqoop-1.4.2.bin__hadoop-1.0.0$ bin/sqoop import --connect jdbc:oracle:thin:@//192.168.1.7:1521/orcl --username system --password Admin123$ --table hr.employees -m 1;

training@BigDataVM:~/Installations/sqoop-1.4.2.bin__hadoop-1.0.0$ bin/sqoop import --connect jdbc:oracle:thin:@//localhost:1521/orcl --username system --password Admin123$ --table hr.employees -m 1;

training@BigDataVM:~/Installations/sqoop-1.4.2.bin__hadoop-1.0.0$ bin/sqoop import --connect jdbc:oracle:thin:@//192.168.1.7:1521/orcl --username system --password Admin123$ --table employees -m 1;

training@BigDataVM:~/Install ations/sqoop-1.4.2.bin__hadoop-1.0.0$ bin/sqoop import --connect jdbc:oracle:thin:@//locahost:1521/orcl --username system --password Admin123$ --table employees -m 1;

Error:

14/04/01 13:48:29 ERROR manager.SqlManager: Error executing statement: java.sql.SQLRecoverableException: IO Error: The Network Adapter could not establish the connection
java.sql.SQLRecoverableException: IO Error: The Network Adapter could not establish the connection

Could you please suggest here how to proceed further.

Thanks
Venu.Ch
ReplyDelete
Replies
Naveen KMay 22, 2014 at 10:21 PM
Hi all,

Will you please send me the query for one hive table to multiple sql server tables in sqoop export?

One hive table to one sql server destination table is working fine. But our requirement is to develop one to many, many to one and many to many relationship is it possible to do it in hadoop?

Please send me the result as soon as possible...
ReplyDelete
Replies
mareddyonlineJuly 22, 2014 at 8:55 PM
This is a great inspiring tutorials on hadoop.I am pretty much pleased with your good work.You put really very helpful information. Keep it up.
Hadoop Training in hyderabad
ReplyDelete
Replies
vignes waranJuly 24, 2016 at 3:35 AM

I regards For all content was really great.I will clearly undestand and then respects to your clear-cut point.I've to very impressed with our writing style.

Hadoop Training
Tableau Training
Informatica Training
Angularjs Training
SAP Success Factor Training
Seo Training
ReplyDelete
Replies
UnknownJuly 27, 2016 at 8:56 AM
you can visit to the below url , they are offering very good videos on Hadoop:

For free videos from previous sessions refer:
http://hadoopbigdatatutorial.com/hadoop-training/big-data-tutorial

ReplyDelete
Replies
UnknownAugust 1, 2016 at 4:48 AM
Very good article to explore in sqoop. I also found some useful commands in this link
www.geoinsyssoft.com/blog
kindly look this blog for more information about hadoop eco systems.
ReplyDelete
Replies
UnknownAugust 25, 2016 at 5:12 AM
Using big data analytics may give the companies many fruitful results, the findings can be implemented in their business decisions so as to minimize their risk and to cut the costs.
Big data Hadoop Training
ReplyDelete
Replies
UnknownMay 10, 2017 at 10:37 PM
Nice blog to read.. This blog having more useful information which really helpful to improve my hadoop knowledge .. thanks a lot for sharing this blog to us

big data training in chennai | best hadoop training in chennai | best big data training in chennai
ReplyDelete
Replies
UnknownMay 11, 2017 at 12:24 AM
Great and helpful blog to everyone.. This blog having more useful information which having clear explanation so easy and interesting to read.. This blog really useful to develop my knowledge in hadoop and cracking interview easily.. thanks a lot for sharing this blog to us...

hadoop training institute in velachery | big data training institute in velachery
ReplyDelete
Replies
vignesjosephJune 11, 2017 at 10:03 PM
This support content blog is very beneficial.Big Data analytics using for my company give that many fruitful results, All kind of findings data can be implemented to business decisions.Hadoop Training in Chennai | Hadoop Training Course in Chennai
ReplyDelete
Replies
Ramya KrishnanJune 16, 2017 at 1:47 AM
Fantastic and great blog, thanks for sharing this, it is more useful for us and my friends.

Hadoop Training in Chennai
ReplyDelete
Replies
TejutejuSeptember 25, 2018 at 12:39 AM
Thank you.Well it was nice post and very helpful information on Big Data Hadoop Online Training Bangalore
ReplyDelete
Replies
Praylin SDecember 3, 2018 at 3:35 AM
Thank you for such an informative post. I'm glad that I found your post.. Thanks for sharing.
C C++ Training in Chennai | C Training in Chennai | C++ Training in Chennai | C++ Training | C Language Training | C++ Programming Course | C and C++ Institute | C C++ Training in Chennai | C Language Training in Chennai
ReplyDelete
Replies
sangeetha sathyanDecember 4, 2018 at 3:48 AM
Excellent and useful blog admin, keep sharing more like this.
RPA Training in Chennai
Robotics Process Automation Training in Chennai
RPA courses in Chennai
Machine Learning Training in Chennai
AWS Training in Chennai
Angularjs Training in Chennai
ReplyDelete
Replies
AppslureFebruary 18, 2019 at 4:03 AM
This comment has been removed by the author.
ReplyDelete
Replies
gauthamSeptember 25, 2019 at 9:29 PM
Really nice post learn sql online through sql course
ReplyDelete
Replies
gbwhatsapp apkMay 30, 2020 at 2:40 AM
putlockerss.net Puxa vida devo agradecer vocês ganharam meu dia que site fantástico cheio de noticias não me canso de Elogiar já é a minha terceira visita por aqui absolutamente fantástico.
ReplyDelete
Replies
MithunMay 31, 2020 at 4:14 AM
Great Articles!!!informative contents...It shows your in-depth knowledge on the content...Big Thanks
Java training in chennai | Java training in annanagar | Java training in omr | Java training in porur | Java training in tambaram | Java training in velachery
ReplyDelete
Replies
Ufa88khMarch 13, 2022 at 8:35 PM
Thank you for sharing us education, please kindly visit mine
wordpress
blogspot
youtube
ប្រដាល់
ReplyDelete
Replies
sravanthiNovember 14, 2025 at 7:56 PM
Thank you for sharing such a great article.Keep posting.Looking forward to read more.

Oracle Time and Labor Training
SAP HCI Online Coaching from Canada
SAP BTP Online Training from USA
Azure Data Engineer Online Training Coaching
Azure Data Factory Training from Pune
Pyspark With Azure Databricks Training Classes
Best Linux Shell Scripting Training Institute In India
ReplyDelete
Replies
NikhilJanuary 19, 2026 at 9:47 AM
A Python Django course online allows flexible backend learning.It focuses on real coding tasks.This Python Django course online supports consistency.It is effective.
ReplyDelete
Replies

Add comment