Wednesday, December 13, 2017

HDPCD - Hortonworks Certification Practice Questions

Problem Scenario 1 : You have been given CSV files which contain Employee and Salary data. You need to accomplish the following.
1. Load these files to HDFS.
2. Create two tables in MySQL named EMPLOYEE and SALARY.
3. Once the files are in HDFS, load them into the RDBMS using Sqoop, into the above two tables.
4. Once the data has landed in the MySQL tables, import it back to HDFS using Sqoop.
Employee.csv
1001,Amit,male,35
1002,Lokesh,male,36
1003,Venkat,male,28
1004,Radha,female,30
1005,Vanita,female,42
Salary.csv
1001,120000
1002,99000
1003,106000
1004,9000
1005,89000
Solution :
Step 1 : Create two CSV files on the local system, named Employee.csv and Salary.csv (under the folder hadoopexam).
-- Create local directory as below.
mkdir hadoopexam
Now create the files as below.
vi /root/hadoopexam/Employee.csv
vi /root/hadoopexam/Salary.csv
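Note: If you prefer not to type the rows in vi, a quick alternative sketch (assuming the files live under /root/hadoopexam, as above) is a shell here-document:
cat > /root/hadoopexam/Employee.csv << 'EOF'
1001,Amit,male,35
1002,Lokesh,male,36
1003,Venkat,male,28
1004,Radha,female,30
1005,Vanita,female,42
EOF
cat > /root/hadoopexam/Salary.csv << 'EOF'
1001,120000
1002,99000
1003,106000
1004,9000
1005,89000
EOF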
Step 2 : Create a directory in HDFS using the shell.
- Create the user space in Hadoop.
hadoop fs -mkdir /user/root
- Go to the file view and check that the user space has been created.
hadoop fs -ls /user/root
- Now create another directory in it.
HORTON
hadoop fs -mkdir module1
YAVA
hadoop fs -mkdir /user/root/module1
Step 3 : Check whether Directory created or not
hadoop fs -ls /user/root/
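Note: if the parent directory did not exist yet, -mkdir -p would create the whole path in a single command (an equivalent alternative to the two mkdir calls above):
hadoop fs -mkdir -p /user/root/module1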
Step 4 : Upload both files to HDFS using the put command.
HORTON
hdfs dfs -put Employee.csv module1/
hdfs dfs -put Salary.csv module1/
YAVA
hdfs dfs -put Employee.csv /user/root/module1/
hdfs dfs -put Salary.csv /user/root/module1/
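Note: -put also accepts several local files in one call when the target is a directory, so the two uploads can be combined (a small shortcut, same target directory as above):
hdfs dfs -put Employee.csv Salary.csv /user/root/module1/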
Step 5 : Check whether both files have been copied or not.
HORTON
hdfs dfs -ls /user/root/module1
hdfs dfs -cat module1/Salary.csv
YAVA
hdfs dfs -ls /user/root/module1
hdfs dfs -cat /user/root/module1/Employee.csv
Step 6 : Create two tables in MySQL db first.
mysql
HORTON
CREATE DATABASE db1;
YAVA
create database coba;
Employee Table :
CREATE TABLE IF NOT EXISTS EMPLOYEE(id int, name char(100), sex char(10), age int ); -- sex needs at least 6 characters to hold "female"
Salary Table :
CREATE TABLE IF NOT EXISTS SALARY(id int, salary int);
Check whether the tables were created or not :
show tables;
HORTON
sqoop list-tables --connect "jdbc:mysql://localhost/db1" --username root -P
YAVA
sqoop list-tables --connect "jdbc:mysql://localhost/coba" --username root
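Optionally, you can run an ad-hoc statement through Sqoop to confirm the tables are still empty before the export. This is only a sketch using sqoop eval, which simply executes the query against the database (shown for the HORTON db1 database; adjust the connect string for YAVA's coba):
sqoop eval --connect "jdbc:mysql://localhost/db1" --username root -P --query "select count(*) from EMPLOYEE"
sqoop eval --connect "jdbc:mysql://localhost/db1" --username root -P --query "select count(*) from SALARY"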
Step 7 : Check the Help for Sqoop export command.
sqoop export --help
Step 8 : Once the files are in HDFS, use Sqoop to load the data into the two MySQL tables.
HORTON
sqoop export --connect "jdbc:mysql://localhost/db1" --username root -P --table EMPLOYEE --direct --export-dir module1/Employee.csv -m 1
sqoop export --connect "jdbc:mysql://localhost/db1" --username root -P --table SALARY --direct --export-dir module1/Salary.csv -m 1
YAVA   
sqoop-export --connect "jdbc:mysql://localhost/coba" --username root --table EMPLOYEE --direct --export-dir /user/root/module1/Employee.csv -m 1
sqoop-export --connect "jdbc:mysql://localhost/coba" --username root --table SALARY --direct --export-dir /user/root/module1/Salary.csv -m 1
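Since the source files are comma-separated, you can also state the delimiters explicitly rather than relying on Sqoop's defaults. A sketch for the HORTON EMPLOYEE export (the YAVA form only differs in the connect string and path); --direct is omitted in this sketch:
sqoop export --connect "jdbc:mysql://localhost/db1" --username root -P \
  --table EMPLOYEE \
  --export-dir module1/Employee.csv \
  --input-fields-terminated-by ',' \
  --input-lines-terminated-by '\n' \
  -m 1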

Step 9 : Check whether the data is loaded or not.
mysql db1   (use mysql coba on YAVA)
select * from EMPLOYEE;
select * from SALARY;
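If the export succeeded, both tables should contain the five rows from the CSV files; a quick count check in the same mysql session (a sketch):
select count(*) from EMPLOYEE; -- should return 5
select count(*) from SALARY;   -- should return 5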
Step 10 : Using Sqoop, import the EMPLOYEE and SALARY data back to HDFS.
- Check all the MySQL connector jar versions (we first need the proper version of the MySQL JDBC connector).
ls -ltr /usr/share/java/mysql*.jar
- First unlink the jar (as it is not the correct version).
unlink /usr/share/java/mysql-connector-java.jar
- Now link to the proper version. (Update: a newer connector such as mysql-connector-java-5.1.45 can also be used.)
ln -s /usr/share/java/mysql-connector-java-5.1.37.jar /usr/share/java/mysql-connector-java.jar
- Now check that it is pointing to mysql-connector-java-5.1.37.jar.
ls -ltr /usr/share/java/mysql*.jar
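To double-check which jar the symlink now resolves to, readlink gives a one-line answer (optional sanity check):
readlink -f /usr/share/java/mysql-connector-java.jar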
- Now import the data from the MySQL db back to HDFS.
sqoop import --connect jdbc:mysql://localhost/coba --username root -P --warehouse-dir /user/root/module1/module_export --table EMPLOYEE --split-by id -m 1
sqoop import --connect jdbc:mysql://localhost/coba --username root -P --warehouse-dir /user/root/module1/module_export --table SALARY --split-by id -m 1
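If you additionally want the extracted data stored as Avro (a format HDPCD questions often ask for), Sqoop supports it through --as-avrodatafile. A sketch only, using a hypothetical separate directory module_export_avro so it does not mix with the text import above; the resulting part files are binary, so inspect them with hdfs dfs -text or avro-tools rather than cat:
sqoop import --connect jdbc:mysql://localhost/coba --username root -P \
  --warehouse-dir /user/root/module1/module_export_avro \
  --table EMPLOYEE \
  --as-avrodatafile \
  --split-by id -m 1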

Step 11 : Check whether the files were imported back or not.
hdfs dfs -ls /user/root/module1/module_export/EMPLOYEE
hdfs dfs -ls /user/root/module1/module_export/SALARY

hdfs dfs -cat /user/root/module1/module_export/EMPLOYEE/part-m-00000
hdfs dfs -cat /user/root/module1/module_export/SALARY/part-m-00000

Problem Scenario 4 : You have been given the following details.
user=hedba
password=hadoopexam
database=hadoopexam_db1
table=hedba.hecategories
jdbc URL = jdbc:mysql://sandbox.hortonworks.com:3306/hadoopexam_db1
Please accomplish the following activities.
Import a single table categories (subset of data) to a Hive managed table, where category_id is between 1 and 22.
Solution :
Step 1 : If the directory already exists then delete it first.
hdfs dfs -rm -R /user/root/hecategories/
Check that the directory was successfully deleted.
hdfs dfs -ls /user/root/
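Note: if HDFS trash is enabled on the sandbox, -skipTrash removes the directory immediately instead of moving it to .Trash (optional):
hdfs dfs -rm -R -skipTrash /user/root/hecategories/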
Step 2 : Import a single table (subset of data).
sqoop import --connect jdbc:mysql://localhost/coba --username root --table EMPLOYEE --where "\`id\` between 1001 and 1003" --hive-import -m 1
Note: Here the ` is the backtick character (on the same key as ~).
This command will create a managed Hive table, and its content will be created in the following directory.
hdfs dfs -ls /apps/hive/warehouse/
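By default --hive-import names the Hive table after the source table (so employee here). If you want the Hive table created under the name from the problem statement instead, --hive-table can be added; a sketch only, reusing the same local coba database:
sqoop import --connect jdbc:mysql://localhost/coba --username root \
  --table EMPLOYEE \
  --where "\`id\` between 1001 and 1003" \
  --hive-import --hive-table hecategories \
  -m 1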
Step 3 : Check whether table is created or not (In Hive)
hive
show tables;
select * from employee;

Problem Scenario 8 : You have been given the following MySQL database details as well as other info.
user=hedba
password=hadoopexam
database=hadoopexam_db1
table=hedba.hecategories
jdbc URL = jdbc:mysql://sandbox.hortonworks.com:3306/hadoopexam_db1
Compression Codec = org.apache.hadoop.io.compress.SnappyCodec
Please accomplish the following.
1. Import the entire database such that it can be used as Hive tables; they must be created in the default schema.
2. Also make sure each table's data is partitioned into 3 files, e.g. part-00000, part-00001, part-00002.
3. Store all the Java files in a directory called java_output for further evaluation.
Solution :
Step 1 : Drop all the tables which we created in the previous problems, before implementing the solution. Log in to hive and execute the following commands.
show tables;
drop table hecategories;
drop table hecustomers;
drop table hedepartments;
drop table heemployee;
drop table heorder_items;
drop table heorders;
drop table heproducts;
drop table employee;
drop table salary;
show tables;
If the directories exist, delete them as well. (Generally they should not be there, unless your previous import failed.)
hdfs dfs -rm -R hecategories
hdfs dfs -rm -R hecustomers
hdfs dfs -rm -R hedepartments
hdfs dfs -rm -R heemployee
hdfs dfs -rm -R heorder_items
hdfs dfs -rm -R heorders
hdfs dfs -rm -R heproducts
hdfs dfs -rm -R employee
hdfs dfs -rm -R salary
Check the warehouse directory. (No directory should exist for the dropped tables.)
hdfs dfs -ls /apps/hive/warehouse
Step 2 : Drop the following tables from the MySQL database.
mysql --user=hedba --password=hadoopexam hadoopexam_db1
drop table EMPLOYEE;
drop table SALARY;
Step 3 : Now that we have a clean database, import the entire database with all the required parameters as the problem statement asks.
sqoop import-all-tables \
-m 3 \
--connect jdbc:mysql://sandbox.hortonworks.com:3306/hadoopexam_db1 \
--username=hedba \
--password=hadoopexam \
--hive-import \
--hive-overwrite \
--create-hive-table \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec \
--outdir java_output
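If the database also contained tables that should not end up in Hive, import-all-tables accepts --exclude-tables with a comma-separated list. A sketch only (EMPLOYEE and SALARY are used purely as example names here; they were already dropped in Step 2):
sqoop import-all-tables \
-m 3 \
--connect jdbc:mysql://sandbox.hortonworks.com:3306/hadoopexam_db1 \
--username=hedba \
--password=hadoopexam \
--exclude-tables EMPLOYEE,SALARY \
--hive-import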
Step 4 : Verify whether the work was accomplished or not.
a. Go to hive and check all the tables.
hive
show tables;
select count(1) from hecustomers;
b. Check the warehouse directory and the number of partitions.
hdfs dfs -ls /apps/hive/warehouse
hdfs dfs -ls /apps/hive/warehouse/hecategories
c. Check the output java directory.
ls -ltr java_output/
Problem Scenario 10 : You have been given following mysql database details as well as other info.
user=hedba
password=hadoopexam
database=hadoopexam_db1
table=hedba.hecategories
jdbc URL = jdbc:mysql://sandbox.hortonworks.com:3306/hadoopexam_db1
Please accomplish following.
1. Import the joined result of the orders and order_items tables, joined on orders.order_id = order_items.order_item_order_id.
2. Also make sure each table's data is partitioned into 2 files, e.g. part-00000, part-00001.
3. Also make sure you use the order_id column for Sqoop's boundary conditions.
Solution :
Step 1 : Clean the HDFS file system; if the directories exist, clean them out.
hadoop fs -rm -R hedepartments
hadoop fs -rm -R hecategories
hadoop fs -rm -R heproducts
hadoop fs -rm -R heorders
hadoop fs -rm -R heorder_items
hadoop fs -rm -R hecustomers
Step 2 : Now import the joined result as per the requirement.
sqoop import \
--connect jdbc:mysql://sandbox.hortonworks.com:3306/hadoopexam_db1 \
--username=hedba \
--password=hadoopexam \
--query="select * from heorders join heorder_items on heorders.order_id = heorder_items.order_item_order_id where \$CONDITIONS" \
--target-dir heorder_join \
--split-by order_id \
--num-mappers 2
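Requirement 3 (order_id for boundary conditions) is covered by --split-by order_id above; if you ever need to control the split range explicitly, Sqoop also accepts --boundary-query. A sketch only, with a hypothetical target directory heorder_join2 so it does not collide with the directory created above:
sqoop import \
--connect jdbc:mysql://sandbox.hortonworks.com:3306/hadoopexam_db1 \
--username=hedba \
--password=hadoopexam \
--query="select * from heorders join heorder_items on heorders.order_id = heorder_items.order_item_order_id where \$CONDITIONS" \
--boundary-query "select min(order_id), max(order_id) from heorders" \
--target-dir heorder_join2 \
--split-by order_id \
--num-mappers 2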
Step 3 : Check the imported data.
hdfs dfs -ls heorder_join
hdfs dfs -cat heorder_join/part-m-00000
hdfs dfs -cat heorder_join/part-m-00001

