Drilling into Big Data — Data Ingestion (4)

$ hadoop fs -ls /user
$ hadoop fs -mkdir /user/demo

SQOOP

  • Import individual tables or entire databases to files in HDFS
  • Import from SQL databases directly into your Hive data warehouse
  • Sqoop import
  • Sqoop export
$ sudo cp ojdbc6.jar /usr/lib/sqoop-current/lib
$ sudo cp sqljdbc_6.0.81_enu.tar.gz /usr/lib/sqoop-current/lib
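The capabilities listed above include importing into Hive and exporting back to a database, but no command is shown for either. The following is a minimal sketch, not a tested recipe for this cluster: the JDBC URL, the Hive table name `trip_advisor`, the target table `TRIP_ADVISOR_COPY`, and the `show_or_run` helper are all assumptions made for illustration. By default the script only prints the commands; set `DRY_RUN=0` to execute them on a host where Sqoop is installed.

```shell
#!/usr/bin/env bash
# Placeholders (assumptions): replace with your own connection details.
JDBC_URL="jdbc:oracle:thin:@<host>:<port>:<sid>"
DB_USER="xxx"

# Hypothetical helper: print the command unless DRY_RUN=0, then run it.
show_or_run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Import a database table directly into a Hive table.
hive_import_cmd=(
  sqoop import
  --connect "$JDBC_URL"
  --username "$DB_USER"
  -P                          # prompt for the password
  --table TRIP_ADVISOR
  --hive-import               # load the imported data into Hive
  --hive-table trip_advisor   # hypothetical Hive table name
  -m 1
)

# Export files from an HDFS directory into an existing database table.
export_cmd=(
  sqoop export
  --connect "$JDBC_URL"
  --username "$DB_USER"
  -P
  --table TRIP_ADVISOR_COPY   # hypothetical table; must already exist
  --export-dir /user/demo/sqoop
  -m 1
)

show_or_run "${hive_import_cmd[@]}"
show_or_run "${export_cmd[@]}"
```

Keeping each command in an array makes it easy to review the exact arguments before anything touches the database.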

File Formats

Manage Parallelism

Sqoop Import

sqoop eval --connect jdbc:oracle:thin:@182.156.193.194:1556:ORCL --username xxx --password xxx --query "SELECT * FROM TRIP_ADVISOR WHERE ROWNUM <= 3"
sqoop import --connect jdbc:oracle:thin:@182.156.193.194:1556:ORCL --username xxx -P --table TRIP_ADVISOR --target-dir hdfs://emr-header-1.cluster-88549:9000/user/demo/sqoop -m 1

Troubleshooting Issues

sqoop import --connect jdbc:oracle:thin:@192.168.6.23:1526:xxx --username xxx -P --table xxx --target-dir hdfs://emr-header-1.cluster-88549:9000/user/demo/sqoop -m 1 --driver oracle.jdbc.driver.OracleDriver
  • The Oracle service might not be running on the given host and port
  • A firewall might be blocking client access to the Oracle server
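Both causes above show up the same way from the client side: a TCP connection to the database host and port cannot be opened. A quick check before rerunning the import might look like the sketch below. It assumes bash (for the `/dev/tcp` pseudo-device) and the coreutils `timeout` command; the `check_port` function and the `ORACLE_HOST` and `ORACLE_PORT` variables are hypothetical names, and their defaults are placeholders.

```shell
#!/usr/bin/env bash
# check_port HOST PORT
# Succeeds if a TCP connection to HOST:PORT opens within 5 seconds.
check_port() {
  local host="$1" port="$2"
  # bash opens a TCP socket when redirecting to /dev/tcp/<host>/<port>.
  if timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "reachable: $host:$port"
  else
    echo "NOT reachable: $host:$port"
    return 1
  fi
}

# Example call; set ORACLE_HOST and ORACLE_PORT to the values
# used in your --connect string.
check_port "${ORACLE_HOST:-127.0.0.1}" "${ORACLE_PORT:-1521}" || true
```

A successful connection only shows that something is listening on that port; it does not confirm that the Oracle listener serves the requested SID.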

Best Practices

  • Sqoop does not support some Hadoop file formats, such as ORC and RC
  • Writing schema and table names in uppercase avoids some issues
  • Use --split-by if you need multiple mappers
  • Avoid column names that are reserved keywords in Sqoop
  • If a table has no primary key and --split-by is not provided in the command, the import fails unless the number of mappers is explicitly set to one
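The last two points about mappers can be illustrated with a parallel import. This is a sketch under stated assumptions: the split column `ID`, the mapper count of 4, the target directory, and the JDBC URL are placeholders, not values from the cluster used above. The script prints the command by default; set `DRY_RUN=0` to run it where Sqoop is installed.

```shell
#!/usr/bin/env bash
# Placeholders (assumptions): replace with your own values.
JDBC_URL="jdbc:oracle:thin:@<host>:<port>:<sid>"
DB_USER="xxx"
SPLIT_COLUMN="ID"                        # hypothetical column, ideally evenly distributed
TARGET_DIR="/user/demo/sqoop_parallel"   # hypothetical, must not exist yet

import_cmd=(
  sqoop import
  --connect "$JDBC_URL"
  --username "$DB_USER"
  -P                           # prompt for the password
  --table TRIP_ADVISOR
  --split-by "$SPLIT_COLUMN"   # column used to divide rows among mappers
  -m 4                         # four parallel map tasks
  --target-dir "$TARGET_DIR"
)

if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "+ ${import_cmd[*]}"
else
  "${import_cmd[@]}"
fi
```

Without `--split-by`, the same command would rely on the table's primary key to divide the work, which is why a table with no primary key needs either this option or `-m 1`.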

Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website: https://www.alibabacloud.com
