Sqoop is an easy-to-use tool for importing data from databases into HDFS and exporting data from Hadoop/Hive tables back to databases.
Databases have been the de facto standard for storing structured data, but running complex queries over large datasets can be detrimental to their performance.
It is sometimes useful to import the data into Hadoop for ad hoc analysis. Tools like Hive and raw MapReduce provide tremendous flexibility in performing various kinds of analysis.
This becomes particularly useful when the database has been used mostly as a storage device (e.g., storing XML or unstructured string data in CLOB columns).
Sqoop is very simple on its face. Internally, it uses MapReduce to import data from the database in parallel over a JDBC connection.
I am jumping straight into using Sqoop with an Oracle database and will leave installation for another post.
Sqoop commands are executed from the command line using the following structure:
sqoop COMMAND [ARGS]
All available Sqoop commands can be listed with:
sqoop help
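Help for a specific command can be printed the same way, for example:
sqoop help import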
This article focuses on importing from a database, specifically Oracle DB.
All commands displayed across multiple lines have to be run as a single command.
List database schemas present on the Oracle server.
The user credentials should have sufficient permission to list the databases.
sqoop list-databases --connect jdbc:oracle:thin:@//HOST:PORT --username DBA_USER --password password
Providing the password on the command line is insecure; we should use "-P" instead, and Sqoop will prompt for the password when the command is executed.
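For illustration, here is the same command with "-P", plus a variant using "--password-file" (available in recent Sqoop releases); the file path is only a placeholder and should point to a permission-restricted file containing the password.
sqoop list-databases --connect jdbc:oracle:thin:@//HOST:PORT --username DBA_USER -P
sqoop list-databases --connect jdbc:oracle:thin:@//HOST:PORT --username DBA_USER --password-file /user/DBA_USER/oracle.password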
List tables present in an Oracle DB.
sqoop list-tables --connect jdbc:oracle:thin:@//HOST:PORT/DB --username DBA_USER -P
Run an arbitrary SQL command using Sqoop
sqoop eval --connect jdbc:oracle:thin:@//HOST:PORT/DB --username DBA_USER -P --query "SQL QUERY"
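For example, a quick row count before running a full import (TABLENAME is a placeholder for your own table):
sqoop eval --connect jdbc:oracle:thin:@//HOST:PORT/DB --username DBA_USER -P --query "SELECT COUNT(*) FROM TABLENAME"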
List the columns of a table in an Oracle DB
sqoop eval --connect jdbc:oracle:thin:@//HOST:PORT/DB --username DBA_USER -P --query "SELECT t.* FROM TABLENAME t WHERE 1=0"
Importing tables from Oracle DB
There are many destinations for data imported from an Oracle DB. You can import data to:
- HDFS as raw files
- Hive tables (existing or newly created)
- HBase tables
I have not explored all of them, but I definitely prefer storing data as raw files over Hive tables; this makes it easy to define different Hive table schemas using external tables. You also need to ensure that the target directory in HDFS does not exist prior to running the Sqoop command.
The following will fetch all the data from the table and store it as text files in the target directory in HDFS.
hadoop fs -rmr /target/directory/in/hdfs
sqoop import --connect jdbc:oracle:thin:@//HOST:PORT/DB --username DBA_USER -P --table TABLENAME --as-textfile --target-dir /target/directory/in/hdfs
The above tries to execute a query to extract the column names from the table. If this is not possible, it will error out with:
ERROR tool.ImportTool: Imported Failed: Attempted to generate class with no columns!
In that case, we have to specify the columns manually in the command.
sqoop import --connect jdbc:oracle:thin:@//HOST:PORT/DB --username DBA_USER -P --table TABLENAME --columns "column1,column2,column3,.." --as-textfile --target-dir /target/directory/in/hdfs
The above command works well when a primary key is defined for the table. If no primary key is defined, you will have to perform a sequential import by forcing a single mapper
sqoop import --connect jdbc:oracle:thin:@//HOST:PORT/DB --username DBA_USER -P --table TABLENAME --columns "column1,column2,column3,.." --as-textfile --target-dir /target/directory/in/hdfs -m 1
or split by a column in the table
sqoop import --connect jdbc:oracle:thin:@//HOST:PORT/DB --username DBA_USER -P --table TABLENAME --columns "column1,column2,column3,.." --as-textfile --target-dir /target/directory/in/hdfs --split-by COLUMNNAME
I will explore a few other nuances of importing data into HDFS in a subsequent article.
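To round off the external-table approach mentioned earlier, here is a minimal sketch of what you might run after the import completes; the table and column names are placeholders, and the comma delimiter assumes Sqoop's default text output settings.
hadoop fs -ls /target/directory/in/hdfs
hive -e "CREATE EXTERNAL TABLE tablename_ext (column1 STRING, column2 STRING, column3 STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/target/directory/in/hdfs'"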
Hey, can we use Oracle wallet keys in Oracle import/export? I mean, how can I remove the username and password params from the command?
I have not been able to remove the username and password parameters, but you don't have to give the password on the command line where it could be discovered in the history; the command can ask for it at startup instead.
Hi, my table has a couple of string columns and one CLOB. How can I move this data from my Oracle table to HBase? I am also planning to unmarshal the XML object and store the data in HBase.
Thanks,
Raj
The import worked out well with the above commands. I need your help on exporting a CSV file from HDFS to a table in an Oracle DB.
I need to export a CSV file from an HDFS directory to an Oracle DB. Please help me out with this.
sqoop export --options-file oracle_db_connect.txt --table stitem_sls_d --export-dir /user/root/testdata/sqoop_test/ --input-fields-terminated-by ','
The above command throws an exception.
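For comparison, a minimal export sketch without an options file, assuming the Oracle table already exists with columns matching the CSV layout (host, port, DB, user, table, and directory are placeholders):
sqoop export --connect jdbc:oracle:thin:@//HOST:PORT/DB --username DBA_USER -P --table TABLENAME --export-dir /user/root/testdata/sqoop_test/ --input-fields-terminated-by ','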
Hi, can we append data imported from MySQL to an existing file in HDFS with Sqoop? I want to append daily data into HDFS, but with Sqoop's append mode it creates a new file in the existing folder every time, like part-m-00000, part-m-00001, etc. I want it to append to the part-m-00000 file every time. How can this be done?
Use the sqoop merge tool to merge those files.
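A rough sketch of how the merge tool might be invoked (the directories, merge key column, and the code-generated jar/class names are placeholders; Sqoop generates the jar and class during the import):
sqoop merge --new-data /daily/import/dir --onto /existing/data/dir --target-dir /merged/output/dir --jar-file TABLENAME.jar --class-name TABLENAME --merge-key ID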
Hi Ashish,
I just tried to load data from Oracle to HDFS.
Before that, I installed Oracle 11g and tried to import data from Oracle.
The default user is system, the password is Admin123$, and the service name is orcl; I also copied ojdbc6.jar into sqoop/lib.
Below are the commands I used and executed, but I am still facing issues:
training@BigDataVM:~/Installations/sqoop-1.4.2.bin__hadoop-1.0.0$ bin/sqoop import --connect jdbc:oracle:thin:@//192.168.1.7:1521/orcl --username system --password Admin123$ --table hr.employees -m 1;
training@BigDataVM:~/Installations/sqoop-1.4.2.bin__hadoop-1.0.0$ bin/sqoop import --connect jdbc:oracle:thin:@//localhost:1521/orcl --username system --password Admin123$ --table hr.employees -m 1;
training@BigDataVM:~/Installations/sqoop-1.4.2.bin__hadoop-1.0.0$ bin/sqoop import --connect jdbc:oracle:thin:@//192.168.1.7:1521/orcl --username system --password Admin123$ --table employees -m 1;
training@BigDataVM:~/Installations/sqoop-1.4.2.bin__hadoop-1.0.0$ bin/sqoop import --connect jdbc:oracle:thin:@//locahost:1521/orcl --username system --password Admin123$ --table employees -m 1;
Error:
14/04/01 13:48:29 ERROR manager.SqlManager: Error executing statement: java.sql.SQLRecoverableException: IO Error: The Network Adapter could not establish the connection
java.sql.SQLRecoverableException: IO Error: The Network Adapter could not establish the connection
Could you please suggest how to proceed further?
Thanks
Venu.Ch
Hi all,
Will you please send me the query for exporting one Hive table to multiple SQL Server tables with sqoop export?
Exporting one Hive table to one SQL Server destination table is working fine, but our requirement is to develop one-to-many, many-to-one, and many-to-many relationships. Is it possible to do this in Hadoop?
Please send me the result as soon as possible...