Here, I explore a few other variations for importing data from a database into HDFS. This is a continuation of the previous article.
The Sqoop command listed in the previous article is good for a one-time fetch, when you want to import all the current data for a table in the database.
A more practical workflow is to fetch data regularly and incrementally into HDFS for analysis, without missing new rows or re-importing old ones. For this you mark a column for incremental import (--check-column) and provide an initial value (--last-value). This column is usually a timestamp.
sqoop import \
--connect jdbc:oracle:thin:@//HOST:PORT/DB \
--username DBA_USER \
-P \
--table TABLENAME \
--columns "column1,column2,column3,.." \
--as-textfile \
--target-dir /target/directory/in/hdfs \
-m 1 \
--check-column COLUMN3 \
--incremental lastmodified \
--last-value "LAST VALUE"
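With --incremental lastmodified, Sqoop fetches only the rows whose COLUMN3 value is newer than the supplied --last-value, and prints the new high-water mark at the end of the run so you can pass it as --last-value next time. Note that when the target directory already exists, lastmodified imports also expect --append or --merge-key.

Rather than tracking the last value by hand between runs, you can let Sqoop store it for you with a saved job. Below is a minimal sketch; the job name daily_table_import is hypothetical, and the connection details are the same placeholders as above:

sqoop job --create daily_table_import \
-- import \
--connect jdbc:oracle:thin:@//HOST:PORT/DB \
--username DBA_USER \
-P \
--table TABLENAME \
--as-textfile \
--target-dir /target/directory/in/hdfs \
-m 1 \
--check-column COLUMN3 \
--incremental lastmodified \
--append \
--last-value "LAST VALUE"

Then run it whenever you want to pull in the latest changes:

sqoop job --exec daily_table_import

Each --exec run imports only the new rows and records the latest check-column value in Sqoop's metastore, so the next run picks up exactly where the previous one left off.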