Here, I explore a few other variations for importing data from a database into HDFS.
This is a continuation of the previous article.
The sqoop command listed previously was good for a one-time fetch, when you want to import all the current data for a table in the database.
A more practical workflow is to fetch data regularly and incrementally into HDFS for analysis, without re-importing data that has already been fetched. For this you have to mark a column for incremental import and also provide an initial value. This column is usually a timestamp.
sqoop import --connect jdbc:oracle:thin:@//HOST:PORT/DB --username DBA_USER -P --table TABLENAME --columns "column1,column2,column3,.." --as-textfile --target-dir /target/directory/in/hdfs -m 1 --check-column COLUMN3 --incremental lastmodified --last-value "LAST VALUE"
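For instance, if the check column is a timestamp, the first run could use an arbitrary early value so that all existing rows are picked up. A minimal sketch based on the command above; the column name and date here are only illustrative:
sqoop import --connect jdbc:oracle:thin:@//HOST:PORT/DB --username DBA_USER -P --table TABLENAME --columns "column1,column2,column3,.." --as-textfile --target-dir /target/directory/in/hdfs -m 1 --check-column LAST_UPDATED --incremental lastmodified --last-value "2016-01-01 00:00:00"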
At the end of the execution, you will have to note down the last value printed in the output
(--last-value xxxxxxxxx) and use it in the next execution of the command.
You can avoid noting down this value and editing your command for each subsequent execution by saving the command as a "Sqoop job".
Save sqoop commands as jobs
sqoop job --create JOBNAME -- import --connect jdbc:oracle:thin:@//HOST:PORT/DB --username DBA_USER -P --table TABLENAME --columns "column1,column2,column3,.." --as-textfile --target-dir /target/directory/in/hdfs -m 1 --check-column COLUMN3 --incremental lastmodified --last-value "LAST VALUE"
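Once a job has been created, you can verify what Sqoop stored for it with the --show subcommand; for incremental jobs this is a convenient way to check the saved parameters, including the last value recorded so far (JOBNAME as created above):
sqoop job --show JOBNAME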
List and delete sqoop jobs
Now it is pretty easy to list and delete sqoop jobs.
sqoop job --list
sqoop job --delete JOB_NAME
Execute sqoop jobs
There is one caveat, though, while executing sqoop jobs: they will prompt for the database password, so if you are thinking of automation, you will have to make provisions for that.
sqoop job --exec JOB_NAME
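One way to handle this, assuming your Sqoop version supports --password-file, is to create the job with the password read from a protected file on HDFS instead of the interactive -P prompt. This is only a sketch; the file path below is an example:
sqoop job --create JOBNAME -- import --connect jdbc:oracle:thin:@//HOST:PORT/DB --username DBA_USER --password-file /user/dba_user/.oracle_password --table TABLENAME --columns "column1,column2,column3,.." --as-textfile --target-dir /target/directory/in/hdfs -m 1 --check-column COLUMN3 --incremental lastmodified --last-value "LAST VALUE"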
Other useful configurations
There are a few other useful command options which come in handy during data import. Some of the columns in the table might be CLOB data that could contain free-running text, including the default delimiter and field separator. To make sure I can parse the data in HDFS unambiguously, I provide an explicit field separator and an escape character.
sqoop import --fields-terminated-by '\001' --escaped-by \\ --connect jdbc:oracle:thin:@//HOST:PORT/DB --username DBA_USER -P --table TABLENAME --columns "column1,column2,column3,.." --as-textfile --target-dir /target/directory/in/hdfs -m 1 --check-column COLUMN3 --incremental lastmodified --last-value "LAST VALUE"
Output data format
So far I have imported data as text files, as they are easy to process by several means. However, there are other formats available for use.
--as-avrodatafile    Imports data to Avro Data Files
--as-sequencefile    Imports data to SequenceFiles
--as-textfile        Imports data as plain text (default)
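For example, switching the earlier import to Avro is just a matter of swapping the format flag; this is the same command as above with only --as-textfile replaced:
sqoop import --connect jdbc:oracle:thin:@//HOST:PORT/DB --username DBA_USER -P --table TABLENAME --columns "column1,column2,column3,.." --as-avrodatafile --target-dir /target/directory/in/hdfs -m 1 --check-column COLUMN3 --incremental lastmodified --last-value "LAST VALUE"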
Happy sqooping!