Friday, January 3, 2014

HDinsight Emulator – Using Sqoop to import/export data from/to SQL Server Express

This post will walk you through the process of importing/exporting data from/to SQL Server Express via Sqoop .

Create dev box in the cloud (optional)

If you don't want to install the HDInsight Emulator on your development machine, you can create a VM in the cloud via Azure IaaS Virtual Machine Service (given that you have Azure Subscription).

Open the Azure Management Portal, click on the Virtual Machine tab, click ‘New’ and select your preferred version on Visual Studio.

image

If you don’t need Visual Studio, you can install VM with Windows Server.

image

Once your VM is ready, it will appear as ‘Running’ in the Virtual Machine Tab. You can Remote Connect to the VM using the link in the portal.

image

For more details on Azure VM, in this post, Scott Hanselman shows how he’s using the Visual Studio image to get work done from his Surface 2 Tablet.

Install HDinsight Emulator

HDInsight Emulator is a single node HDInsight deployment that allows developers to develop on and debug jobs on their local development box. You can install the HDInsight Emulator via Web Platform Installer from here. All the missing prerequisites will be detected and installed automatically!

image

Once the installation is complete, you will notice that all the supported Hadoop services will be running as Windows Services on your local machine.

HDI.Emulator.Services.

Install and configure SQL Server Express

If you have full SQL Server instance installed on your machine, you can skip to the next step. If you want to use the free SQL Express instance, some extra configurations are needed. Here’s how you install and configure SQL Express to make it work with Sqoop.

  • Open ‘Web Platform Installer’ (install from here)
  • Type ‘sql express ‘ in the search box
  • Click ‘Add’ next to ‘SQL Server Express 2008’ and ‘SQL Server Express 2008 Management Studio’

image

  • Set the password for the admin account (you will need to use it in the next step)

image

  • Open ‘SQL Server Configuration Manager’
image
  • Enable the TCP/IP protocol for SQLExpress.
    • Expend ‘SQL Server Network Configuration’
    • Right click on ‘Protocols for SQLExpress’
    • Right click on ‘TCP/IP’ and select ‘Enable’

image

  • Restart SQL Server service.
    • Click on ‘SQL Server Services’
    • Right click on ‘SQL Server (SQLExpress)’ and select ‘Restart’

image 

  • Start ‘SQL Server Browser’ service
    • Click on ‘SQL Server Services’
    • Right click on ‘SQL Server browser’ and select ‘Properties’
    • Change the ‘Start Mode’ to ‘Automatic’, click apply and close the properties window.
    • Start the service

image

Create Database and Table

  • Open ‘SQL Server Management Studio’ and Connect to ‘SQL Express’ server (default)

image

  • Create ‘New Query’

image

  • Run the following command to create new Database (sqoop) and Table (w3c) schema
CREATE DATABASE sqoop 

Go

Use sqoop

Go

CREATE TABLE [dbo].[w3c]( 

[logdate] nvarchar(150) NULL, [logtime] nvarchar(150) NULL, 

[c_ip] nvarchar(150) NULL, [cs_username] nvarchar(150) NULL, 

[s_ip] nvarchar(150) NULL, [s_port] nvarchar(150) NULL, 

[cs_method] nvarchar(150) NULL, [cs_uri_stem] nvarchar(150) NULL, 

[cs_uri_query] nvarchar(150) NULL, [sc_status] [bigint] NULL, 

[sc_bytes] [bigint] NULL, [cs_bytes] [bigint] NULL, 

[time_taken] [bigint] NULL, [cs_agent] nvarchar(150) NULL, 

[cs_referrer] nvarchar(150) NULL) ON [PRIMARY] 

Upload data to HDFS

  • Download the file w3c_2MB.txt 
  • Upload the file from your local drive to HDFS under the folder ‘sqoop’
    • Open the ‘Hadoop Command Prompt’ window (shortcut should be available on your desktop), type:

   > hadoop fs -copyFromLocal w3c_2MB.txt ./sqoop/w3c_2MB.txt

    • Make sure that the file has been uploaded, type:

   > hadoop fs –lsr

image

Export from HDFS to SQL Express

  • Open the ‘Hadoop Command Prompt’ window (shortcut should be available on your desktop), type:
  • In the script bellow replace:
    • <your-sql-password> with the password that you’ve chosen when you installed SQL Express
    • <your-user-name> with the name of your machine

If you are using full SQL Server: instead of localhost\sqlexpress, use localhost. Make sure that the specified user (sa) is enabled and has write permissions on your SQL Server. Notice that you must use SQL Server Authentication.

   > cd c:\Hadoop\sqoop-1.4.2\bin

   > sqoop export --connect "jdbc:sqlserver://localhost\sqlexpress;databaseName=sqoop;user=sa;password=<your-sql-password>" --table w3c --export-dir /user/<your-user-name>/sqoop -m 1 -input-fields-terminated-by " "

Note:

  • Make sure that the export completed successfully

image

  • Open ‘SQL Server Management Studio’ and Connect to ‘SQL Express’ server (default).
  • Create ‘New Query’ Run:
Use sqoop

Go

Select * from w3c
image 

Import from SQL Express back to HDFS

  • Open the ‘Hadoop Command Prompt’ window (shortcut should be available on your desktop)
  • In the script bellow replace:
    • <your-sql-password> with the password that you’ve chosen when you installed SQL Express
    • <your-user-name> with the name of your machine

   > cd c:\Hadoop\sqoop-1.4.2\bin

   > sqoop import --connect "jdbc:sqlserver://localhost\sqlexpress;databaseName=sqoop;user=sa;password=<your-sql-password>" --table w3c --target-dir /user/<your-user-name>/sqoop/import --fields-terminated-by " " -m 1

  • Make sure that the export completed successfully

    > hadoop fs –ls  /user/<your-user-name>/sqoop/import

image

Import data from SQL Server Express directly to Hive table

  • Open the ‘Hadoop Command Prompt’ window (shortcut should be available on your desktop)
  • In the script bellow replace:
    • <your-sql-password> with the password that you’ve chosen when you installed SQL Express

   > cd c:\Hadoop\sqoop-1.4.2\bin

   > sqoop import --connect "jdbc:sqlserver://localhost\sqlexpress;databaseName=sqoop;user=sa;password=<your-sql-password>" --table w3c --hive-table w3c_sqoop --create-hive-table --hive-import -m 1

image

  • Make sure that the export completed successfully.
  • Open hive CLI:

   > c:\Hadoop\hive-0.9.0\bin\hive

   hive> SELECT COUNT(*) FROM w3c_sqoop

image

4 comments:

  1. Thanks for sharing this blog. It seems very detailed and impressive. I completely leveraged your blog with just a minor change - I am using MSSQLSERVER instead of SQLEXPRESS.

    For the SQL Configuration Manager steps: I didnt see MSSQLSERVER services. Is there any other way to resolve it. Or we cant use MSSQLSERVER services?

    ReplyDelete
  2. Thanks Ankit. the configuration step is required only for SQL Express. If you are using the full MSSQLSERVER instance, configuration is not required.

    ReplyDelete
  3. Hi All,

    As I am having a question while exporting the data from hdfs to sql server using sqoop export.

    Is there any possibility to export data from hdfs to multiple tables in sql server with sqoop export??

    If any one knows it, Drop a query of sqoop export.

    Thanks
    Naveen

    ReplyDelete
  4. awesum sir, I struggled with MySQL and able to do so easiy following ur blog on sql server 2014, just that u need to download the jdbc driver and copy it to sqoop\lib folder , rest worked as expected, also need to set the authentication mode to mixed mode on the server

    ReplyDelete