Here, I explore a few other variations for importing data from a database into HDFS. This is a continuation of the previous article.
The sqoop commands listed previously are good for a one-time fetch, when you want to import all of the current data for a database table.
A more practical workflow is to fetch data regularly and incrementally into HDFS for analysis, without skipping any previously imported rows. For this you mark a column for incremental import and also provide an initial value; this column is usually a timestamp.
sqoop import \
  --connect jdbc:oracle:thin:@//HOST:PORT/DB \
  --username DBA_USER -P \
  --table TABLENAME \
  --columns "column1,column2,column3,.." \
  --as-textfile \
  --target-dir /target/directory/in/hdfs \
  -m 1 \
  --check-column COLUMN3 \
  --incremental lastmodified \
  --last-value "LAST VALUE"
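Running this repeatedly means updating --last-value by hand: at the end of each incremental import, Sqoop prints the value to pass as --last-value on the next run. A saved Sqoop job avoids that bookkeeping by storing the last value in its metastore and updating it after every run. Below is a minimal sketch of the same import as a saved job; the job name daily_table_import is arbitrary, and the connection details are the same placeholders as above.

# Create the saved job once (note the "--" separating job options from import options)
sqoop job --create daily_table_import \
  -- import \
  --connect jdbc:oracle:thin:@//HOST:PORT/DB \
  --username DBA_USER -P \
  --table TABLENAME \
  --columns "column1,column2,column3,.." \
  --as-textfile \
  --target-dir /target/directory/in/hdfs \
  -m 1 \
  --check-column COLUMN3 \
  --incremental lastmodified \
  --last-value "LAST VALUE"

# Each subsequent run picks up the stored last value automatically
sqoop job --exec daily_table_import

Note that -P prompts for the database password on every execution; for unattended, scheduled runs you would switch to --password-file.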