Usually, we learn to import RDBMS data into HDFS, but there are times when we need to import data from MySQL directly into Hive. In this tutorial, I will share how to import data from MySQL to Hive using Sqoop in an easy way, so you can bring RDBMS data straight into Hive with a single command.
Import MySQL Data to Hive using Sqoop
Here I am assuming that you have already installed Sqoop, MySQL, and Hive on your system. If not, you can follow our Sqoop Tutorial and HDFS Tutorial for reference.
Just follow the steps below to import a MySQL table into Hive using Sqoop.
I. Check MySQL Table emp
Let's suppose we have a table named "emp" in the MySQL database "empdb".
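To see its contents, you can run a quick SELECT from the MySQL command-line client (this is just a sanity check before the import; adjust the database and table names if yours differ)-

mysql> USE empdb;
mysql> SELECT * FROM emp;

The output looks like below-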
+---------+--------------------+
| dept_no | dept_name          |
+---------+--------------------+
| d009    | Customer Service   |
| d005    | Development        |
| d002    | Finance            |
| d003    | Human Resources    |
| d001    | Marketing          |
| d004    | Production         |
| d006    | Quality Management |
| d008    | Research           |
| d007    | Sales              |
+---------+--------------------+
Our goal is to import this emp table into Hive using a Sqoop command.
II. Write the Sqoop import script to import MySQL data into Hive
sqoop import \
--connect jdbc:mysql://localhost/empdb \
--username userid \
--password pwd \
--table emp \
--direct \
-m 1 \
--hive-import \
--create-hive-table \
--hive-table emp_mysql \
--target-dir /user/hive/warehouse/ \
--enclosed-by '\"' \
--fields-terminated-by ',' \
--escaped-by \\
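Before running the full import, you may first want to confirm that Sqoop can reach the database at all. Here is a minimal sketch using Sqoop's eval tool, assuming the same connection string and credentials as in the import script above-

sqoop eval \
--connect jdbc:mysql://localhost/empdb \
--username userid \
--password pwd \
--query "SELECT COUNT(*) FROM emp"

If this prints a row count, the JDBC URL, username, and password are all working.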
III. Check the file in HDFS
All the imported files get stored in HDFS, so we can check the file at the Hive warehouse location specified in the above command-
/user/hive/warehouse/emp_mysql/part-m-00000
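To confirm the part file is there before reading it, you can also list the directory with a standard HDFS command (the path is the same Hive warehouse location as above)-

$ hadoop fs -ls /user/hive/warehouse/emp_mysql/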
IV. Verify the number of records
We can also verify the number of records like below-
$ hadoop fs -cat /user/hive/warehouse/emp_mysql/part-m-00000 | wc -l
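The count printed here should match the number of rows in the source table. If you want to cross-check against MySQL, a simple COUNT on the source side does the job (run from the MySQL client against the same emp table)-

mysql> SELECT COUNT(*) FROM emp;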
V. Check the imported records in HDFS
If you want to see the records you have imported into HDFS, you can use the below command-
$ hadoop fs -cat /user/hive/warehouse/emp_mysql/part-m-00000 | more
VI. Verify data in Hive
Now that we have imported the data from MySQL into Hive, let's verify in Hive that the whole process was done correctly.
The Sqoop command created the Hive table "emp_mysql", so just run a select on this table in Hive.
hive> select * from emp_mysql;
If the import completed successfully, you will get a result like below-
"d009" "Customer Service"
"d005" "Development"
"d002" "Finance"
"d003" "Human Resources"
"d001" "Marketing"
"d004" "Production"
"d006" "Quality Management"
"d008" "Research"
"d007" "Sales"
Conclusion
This was all about how to import data from MySQL to Hive using Sqoop. I hope this method helps you whenever you need to load RDBMS data directly into Hive using Sqoop.
Want to learn Hadoop? Just subscribe and we will send you our premium courses for free.
How do I write Java code to import data from Oracle/MySQL to Hive using SqoopOptions?
Hi, you have the simple IMPORT option of Sqoop to do this. Simply create a table in Hive (or let Sqoop create it) and import the data. Please follow the below link for the detailed steps. http://hdfstutorial.com/blog/how-to-import-data-from-…
Excellent post.
Clear steps for Sqoop importing. Thank you.
Reading your website is a big pleasure for me buddy.
Thanks for this awesome tutorial.
Thanks for this importing guide.
Hi,
Really liked it. Can you please tell me what difference the number of mappers makes while importing?
Hi,
Thanks for the script. Can you please also explain how the number of mappers impacts the import and export operations?
Hi,
My Sqoop job is copying a huge amount of data (say 100 GB) from the DB to HDFS. After copying 50 GB, the job fails due to connection issues. What am I supposed to do? Is there any alternative solution?
Hi,
Are you using any condition while importing the data? If yes, please check that.
Also, make sure you have enough resources available and stable network connectivity.
If all of that is good and you are working at the prod level, you can also create an Oozie job for the import and schedule it when the load is less.
Hope that helps! If you are still getting the error, please comment back.