Wednesday, April 29, 2020

JRE and JDK

Java JRE

Difference between JVM, JDK, and JRE

  • JVM - The Java Virtual Machine is a virtual machine that runs programs compiled to Java bytecode. The JVM is detailed by a specification that formally describes what is required of a JVM implementation. Having a specification ensures interoperability of Java programs across different implementations, so that program authors using the Java Development Kit (JDK) need not worry about the traits of the underlying hardware platform.
  • JDK - The Java Development Kit is a wrapper around the JRE and additionally contains the compiler, interpreter, debugger, and other tools. It lets users both run and develop Java programs.
  • JRE - The Java Runtime Environment is made up of the class libraries, the JVM, and supporting files; it can run Java programs but not compile them.
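
A quick way to check which of these is installed (assuming the Java bin directory is on the PATH): the java launcher ships with both the JRE and the JDK, while the javac compiler ships only with the JDK.

java -version     # available with a JRE or a JDK
javac -version    # available only with a JDK (the compiler)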

Conversion process: converting a Java .class file back into source code

Step 1: Download the JAD tool from https://bit.ly/2XuRzT5
Step 2: Extract the zip file to get jad.exe.
Step 3: Create a .java file (DecompilerDemo.java) in the same folder in which you extracted the zip file.
Step 4: Compile and run the .java file. It will generate a .class file in that folder.
Step 5: Now move the .class file and the jad executable to another folder.
Step 6: Type the command: jad DecompilerDemo
The above command generates a .jad file in that folder.
Step 7: Type the command: jad -s java DecompilerDemo
This generates the .java file in the same folder.
Open DecompilerDemo.java to view the recovered source code.
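
For reference, here is the same flow as a single shell sketch (shown for a Linux/macOS shell; on Windows use jad.exe as in the steps above). The class body is only an illustrative placeholder, and jad is assumed to be in the current folder:

# Step 3: create a small source file to decompile
cat > DecompilerDemo.java <<'EOF'
public class DecompilerDemo {
    public static void main(String[] args) {
        System.out.println("Hello, decompiler!");
    }
}
EOF

# Step 4: compile and run it to produce DecompilerDemo.class
javac DecompilerDemo.java
java DecompilerDemo

# Step 6: generate DecompilerDemo.jad from the class file
./jad DecompilerDemo.class

# Step 7: write the recovered source out as a .java file instead
./jad -s java DecompilerDemo.class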



javap options
With the javap command, many options are available. Some of them are listed here, with a short usage sketch after the list:
  • -c: prints disassembled code (the bytecode instructions).
  • -p: shows all classes and members, including private ones.
  • -s: prints internal type signatures.
  • -constants: shows static final constants.
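
A minimal sketch of these options in use, assuming a JDK is on the PATH and the DecompilerDemo class from above has been compiled in the current directory:

# Disassemble the bytecode, including private members
javap -c -p DecompilerDemo

# Print internal type signatures and static final constants
javap -s -constants DecompilerDemo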

Sunday, March 22, 2020

Hadoop Arch

Hadoop 1.0
1. The job is first submitted to the Job Tracker.
2. The Job Tracker contacts the Name Node for the data.
3. It then gives the information back to the client.
4. Meanwhile, the Job Tracker sends information to the Task Trackers about the job that is about to arrive.
5. The client reaches the Task Tracker and gives it the job details (jar path, data path).
6. The Task Tracker starts the job (MapReduce).


Hadoop 2.0
1. The client submits the job to the Resource Manager (RM).
2. The RM contacts the Name Node.
3. The Name Node gives the information (input split details) back to the RM.
4. The RM gives the information back to the client.
5. Meanwhile, the RM tells the respective Node Managers to create containers on the nodes where the data is present, and tells one Node Manager to create the Application Master.
6. The data reaches the containers; the Application Master submits the job inside the containers and monitors it.
7. The Node Manager monitors the life of each container.
8. The Application Master keeps track of the MapReduce tasks.
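
To see this flow in action, here is a hedged sketch of submitting the bundled WordCount example to YARN and then watching the application that the Resource Manager schedules (the examples-jar path varies by distribution, and the input/output paths and application id are placeholders):

# Submit a MapReduce job; the client talks to the Resource Manager
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount \
  /user/cloudera/input /user/cloudera/output

# List running applications and their Application Masters
yarn application -list

# Pull container logs once the application finishes (substitute the real id)
yarn logs -applicationId application_1234567890123_0001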

Monday, March 16, 2020

Find the excess amount of blood

select tab.bg as Blood_group, tab.supp_quan - tab.don_quan as Excess_blood
from
(select donotor.bg as bg, sum(donotor.quan) as don_quan, sum(supplier.quan) as supp_quan from
donotor join supplier
on donotor.bg = supplier.bg
group by donotor.bg) tab
where tab.don_quan < tab.supp_quan
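
A minimal, hypothetical harness for the query above; the donotor and supplier table definitions, the bloodbank database name, and the sample rows are all assumptions inferred from the column names, and the truncated WHERE clause is read as "supplied more than donated":

mysql -u root -p <<'SQL'
CREATE DATABASE IF NOT EXISTS bloodbank;   -- hypothetical database name
USE bloodbank;
CREATE TABLE donotor  (bg VARCHAR(3), quan INT);
CREATE TABLE supplier (bg VARCHAR(3), quan INT);
INSERT INTO donotor  VALUES ('A+', 10), ('B+', 5);
INSERT INTO supplier VALUES ('A+', 18), ('B+', 4);

SELECT tab.bg AS Blood_group, tab.supp_quan - tab.don_quan AS Excess_blood
FROM (SELECT donotor.bg AS bg,
             SUM(donotor.quan)  AS don_quan,
             SUM(supplier.quan) AS supp_quan
      FROM donotor JOIN supplier ON donotor.bg = supplier.bg
      GROUP BY donotor.bg) tab
WHERE tab.don_quan < tab.supp_quan;        -- returns A+ with an excess of 8
SQL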

Thursday, March 5, 2020

Hadoop History and Sources

2003 --- Google publishes the GFS (Google File System) paper --- a distributed file system (DFS), very powerful
2004 --- Google publishes the MapReduce paper --- the Lucene/Nutch search-engine team (with NDFS) takes up the MR paper
2005 --- Nutch --- releases processing and storage built on NDFS and MapReduce
2006 --- Doug Cutting --- Hadoop --- HDFS/MapReduce
2008 --- Facebook --- Hive --- SQL-like query language (HQL)
2009-2010 --- Cloudera
2011-2012 --- Hortonworks --- (Spark) 2015

Sources:
---------
Business processes - 1x
OLTP - online transaction processing

Human-generated - 10x
Email
Documents
Social media


Machine-generated - 100x
CCTV
Microphones
Sensors
Satellites (maps, DTH)

Cluster setup:
--------------
The edge node should not be part of the storage (master/slave) nodes; that is not recommended.
Hadoop NameNode
Hadoop DataNode
Hadoop Edge Node

Interface medium:
PuTTY is used to connect to the edge node.

In PuTTY, enter the edge node's IP address [host name],
the username,
and the password.
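
From a Linux or macOS machine the same connection can be made with ssh instead of PuTTY (the user and host below are placeholders):

# Password-based login to the edge node
ssh username@edgenode-hostname

# Key-based login, if a private key has been set up
ssh -i ~/.ssh/id_rsa username@edgenode-hostname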


NameNode:
Cluster administration
 Heartbeat
 Replication
 Balancing
Metadata management

Over-replication
Under-replication
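
A few standard HDFS commands that expose this NameNode bookkeeping (run from an edge node; exact output depends on the cluster):

# Live/dead DataNodes, capacity and per-node usage (heartbeat / balancing view)
hdfs dfsadmin -report

# File-system health report, including under-replicated and over-replicated block counts
hdfs fsck /

# Rebalance blocks across DataNodes (threshold is a percentage of cluster usage)
hdfs balancer -threshold 10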


Wednesday, February 19, 2020

sqoop commands brushup

SQOOP
Chapters in sqoop:
•  List databases
•  List tables
•  EVAL
•  Select query evaluation
•  sqoop Import
•  Import a table from RDBMS
1. Into hdfs by using sqoop
2. Into hive by using sqoop
•  Import all tables from RDBMS
•  Import into a target directory
•  Import a subset of table data
•  Incremental import
•  sqoop export
•  sqoop Job
•  create a sqoop job
•  verify the job
•  Inspect the job
•  execute the job


List databases
$ sqoop list-databases --connect jdbc:mysql://localhost/world --username root -P

List tables
$ sqoop list-tables --connect jdbc:mysql://localhost/world --username root -P

EVAL
•  Select query evaluation
$ sqoop eval --connect jdbc:mysql://localhost/world --username root -P --query "SELECT * FROM city LIMIT 10"


Boundary query
sqoop import \
  --connect "jdbc:mysql://localhost/world" \
  --username=root \
  --password=cloudera \
  --table city \
  --target-dir /user/cloudera/city2 \
  -m 2 \
  --boundary-query "select 1, 4079 from city limit 1"

sqoop Import
•  Import a table from RDBMS (mysql)
$ sqoop import --connect jdbc:mysql://localhost/world --table city -m1 --username root -P

sqoop import --connect jdbc:mysql://localhost/world --table city --username root -P

Other sqoop import operations:

•  Import into a target directory
$ sqoop import --connect jdbc:mysql://localhost/world --table city --username root -P --target-dir /user/cloudera/world_city/ -m1

•  Import all tables from RDBMS
$ sqoop import-all-tables --connect jdbc:mysql://localhost/world -m1 --warehouse-dir /user/cloudera/world_city1/ --username root -P

•  Import  subset of table data
$ sqoop import \
--connect jdbc:mysql://localhost/world \
--table city --where "ID between 1 and 23" \
--target-dir /user/cloudera/city1 \
-m1 \
--username root \
-P

•  delimiter
sqoop import \
--connect jdbc:mysql://localhost/world \
--username=root \
--password=cloudera \
--table city \
--target-dir /user/cloudera/city4 \
--fields-terminated-by \|


•  split-by
sqoop import --connect jdbc:mysql://localhost/world --table city --username root -P --split-by ID --target-dir /user/cloudera/lab2

•  Incremental import (append)

$ sqoop import \
  --connect "jdbc:mysql://localhost/world" \
  --username=root \
  -P \
  --table city \
  --target-dir /user/cloudera/city1 \
  --append \
  --check-column "ID" \
  --incremental append \
  --last-value 23

SQOOP EXPORT
$ mysql -u root -p
Enter password: cloudera
mysql> use world;
mysql> create table citytest like city;
mysql>  select * from citytest;

Local terminal:

cloudera@localhost$ sqoop export --connect jdbc:mysql://localhost/employee --table emp -m1 --export-dir /user/cloudera/emp --input-fields-terminated-by ','

sqoop export --connect "jdbc:mysql://localhost/world" \
--username root \
-P \
--table citytest \
--export-dir /user/cloudera/city1/part* \
-m1


SQOOP JOB:
To create a sqoop job:
$sqoop job --create importtoday -- import --connect jdbc:mysql://localhost/world --table city -m1 --username root -P
Verify the sqoop job:
 $sqoop job --list
Inspect the sqoop job:
 $sqoop job --show importtoday
Execute  the sqoop job:
 $sqoop job --exec importtoday

SQOOP HIVE ACTIONS
Steps:
Mysql terminal:
Mysql> show databases;
Mysql>use world;
Mysql> show tables;
Mysql>select * from City;

Hive terminal:
Linux> hive
Hive> select * from City;
Hive> describe City;

Sqoop terminal: (sqoop import into a Hive table)

$ sqoop import --connect jdbc:mysql://localhost/world --table city -m1 --username root -P --hive-import

Important objectives on sqoop:
DATA INGESTION with sqoop
1. Import data from a table in a relational database into HDFS
2. Import the results of a query from a relational database into HDFS
3. Import a table from a relational database into a new or existing Hive table
4. Insert or update data from HDFS into a table in a relational database

Code:
1. Import data from a table in a relational database into HDFS
Syntax:
$ sqoop import --connect jdbc:rdbmsname://localhost/dbname --table tablename -m 1 --username root -P

2. Import the results of a query from a relational database into HDFS
Syntax:
$ sqoop eval --connect jdbc:rdbmsname://localhost/dbname -e query --username root -P

3. Import a table from a relational database into a new or existing Hive table
Syntax:
$ sqoop import --connect jdbc:rdbmsname://localhost/dbname --table tablename -m 1 --hive-import

4. Insert or update data from HDFS into a table in a relational database
Syntax:
$ sqoop export --connect jdbc:rdbmsname://localhost/dbname --table tablename --export-dir /dbname/tablename --username root -P


                 ---------------------------***practice makes perfect *** do more practice***------------------

sqoop import --connect jdbc:teradata://{host name}/Database=retail \
  --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
  --username dbc --password dbc --table SOURCE_TBL --target-dir /user/hive/incremental_table -m 1 \
  --check-column modified_date --incremental lastmodified --last-value {last_import_date}

sqoop import --connect jdbc:teradata://{host name}/Database=retail \
  --connection-manager org.apache.sqoop.teradata.TeradataConnManager --username dbc \
  --password dbc --target-dir /user/hive/incremental_table -m 1 \
  --query 'select * from SOURCE_TBL where modified_date > {last_import_date} AND $CONDITIONS'

Reference :
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_dataintegration/content/incrementally-updating-hive-table-with-sqoop-and-ext-table.html

Managed table:
sqoop create-hive-table 

https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_create_hive_table_literal



sqoop list-databases \
  --connect "jdbc:mysql://localhost/world" \
  --username root \
  --password cloudera

sqoop list-tables --connect "jdbc:mysql://localhost/world" \
  --username root \
  --password cloudera

sqoop eval \
  --connect "jdbc:mysql://localhost/world" \
  --username root \
  --password cloudera \
  --query "select count(1) from City"

sqoop import \
  -m 12 \
  --connect "jdbc:mysql://localhost/world" \
  --username root \
  --password cloudera \
  --table City \
  --as-avrodatafile \
  --warehouse-dir=/user/hive/warehouse/retail_stage.db

--Default
sqoop import \
  --connect "jdbc:mysql://localhost/world" \
  --username root \
  --password cloudera \
  --table City \
  --as-textfile \
  --target-dir=/user/cloudera/departments

sqoop import \
  --connect "jdbc:mysql://localhost/world" \
  --username root \
  --password cloudera \
  --table City \
  --as-sequencefile \
  --target-dir=/user/cloudera/City

sqoop import \
  --connect "jdbc:mysql://localhost/world" \
  --username root \
  --password cloudera \
  --table City \
  --as-avrodatafile \
  --target-dir=/user/cloudera/departments

-- A file with extension avsc will be created under the directory from which sqoop import is executed
-- Copy avsc file to HDFS location
-- Create hive table with LOCATION to /user/cloudera/departments and TBLPROPERTIES pointing to avsc file
hadoop fs -put sqoop_import_departments.avsc /user/cloudera

CREATE EXTERNAL TABLE departments
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs:///user/cloudera/departments'
TBLPROPERTIES ('avro.schema.url'='hdfs://quickstart.cloudera/user/cloudera/sqoop_import_departments.avsc');


-- It will create tables in default database in hive
-- Using snappy compression
-- As we have imported all tables before make sure you drop the directories
-- Launch hive drop all tables
drop table departments;
drop table categories;
drop table products;
drop table orders;
drop table order_items;
drop table customers;

-- Dropping directories, in case your hive database/tables are in an inconsistent state
hadoop fs -rm -R /user/hive/warehouse/departments
hadoop fs -rm -R /user/hive/warehouse/categories
hadoop fs -rm -R /user/hive/warehouse/products
hadoop fs -rm -R /user/hive/warehouse/orders 
hadoop fs -rm -R /user/hive/warehouse/order_items
hadoop fs -rm -R /user/hive/warehouse/customers

sqoop import-all-tables \
  --num-mappers 1 \
  --connect "jdbc:mysql://localhost/world" \
  --username=root \
  --password=cloudera \
  --hive-import \
  --hive-overwrite \
  --create-hive-table \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
  --outdir java_files

sudo -u hdfs hadoop fs -mkdir /user/cloudera/retail_stage
sudo -u hdfs hadoop fs -chmod +rw /user/cloudera/retail_stage
hadoop fs -copyFromLocal ~/*.avsc /user/cloudera/retail_stage

-- Basic import
sqoop import \
  --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username=retail_dba \
  --password=cloudera \
  --table departments \
  --target-dir /user/cloudera/departments 

-- Boundary Query and columns
sqoop import \
  --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username=retail_dba \
  --password=cloudera \
  --table departments \
  --target-dir /user/cloudera/departments \
  -m 2 \
  --boundary-query "select 2, 8 from departments limit 1" \
  --columns department_id,department_name

-- query and split-by
sqoop import \
  --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username=retail_dba \
  --password=cloudera \
  --query="select * from orders join order_items on orders.order_id = order_items.order_item_order_id where \$CONDITIONS" \
  --target-dir /user/cloudera/order_join \
  --split-by order_id \
  --num-mappers 4

-- Copying into existing table or directory (append)
-- Customizing number of threads (num-mappers)
-- Changing delimiter
sqoop import \
  --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username=retail_dba \
  --password=cloudera \
  --table departments \
  --target-dir /user/hive/warehouse/retail_ods.db/departments \
  --append \
  --fields-terminated-by '|' \
  --lines-terminated-by '\n' \
  --num-mappers 1 \
  --outdir java_files

-- Importing a table without a primary key using multiple threads (split-by)
-- When using split-by, an indexed column is highly desired
-- If the column is not indexed then performance will be bad
-- because of a full table scan by each of the threads
sqoop import \
  --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username=retail_dba \
  --password=cloudera \
  --table departments \
  --target-dir /user/hive/warehouse/retail_ods.db/departments \
  --append \
  --fields-terminated-by '|' \
  --lines-terminated-by '\n' \
  --split-by department_id \
  --outdir java_files

-- Getting delta (--where)
sqoop import \
  --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username=retail_dba \
  --password=cloudera \
  --table departments \
  --target-dir /user/hive/warehouse/retail_ods.db/departments \
  --append \
  --fields-terminated-by '|' \
  --lines-terminated-by '\n' \
  --split-by department_id \
  --where "department_id > 7" \
  --outdir java_files

-- Incremental load
sqoop import \
  --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username=retail_dba \
  --password=cloudera \
  --table departments \
  --target-dir /user/hive/warehouse/retail_ods.db/departments \
  --append \
  --fields-terminated-by '|' \
  --lines-terminated-by '\n' \
  --check-column "department_id" \
  --incremental append \
  --last-value 7 \
  --outdir java_files

sqoop job --create sqoop_job \
  -- import \
  --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username=retail_dba \
  --password=cloudera \
  --table departments \
  --target-dir /user/hive/warehouse/retail_ods.db/departments \
  --append \
  --fields-terminated-by '|' \
  --lines-terminated-by '\n' \
  --check-column "department_id" \
  --incremental append \
  --last-value 7 \
  --outdir java_files

sqoop job --list

sqoop job --show sqoop_job

sqoop job --exec sqoop_job

-- Hive related
-- Overwrite existing data associated with hive table (hive-overwrite)
sqoop import \
  --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username=retail_dba \
  --password=cloudera \
  --table departments \
  --fields-terminated-by '|' \
  --lines-terminated-by '\n' \
  --hive-home /user/hive/warehouse/retail_ods.db \
  --hive-import \
  --hive-overwrite \
  --hive-table departments \
  --outdir java_files

--Create hive table example
sqoop import \
  --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username=retail_dba \
  --password=cloudera \
  --table departments \
  --fields-terminated-by '|' \
  --lines-terminated-by '\n' \
  --hive-home /user/hive/warehouse \
  --hive-import \
  --hive-table departments_test \
  --create-hive-table \
  --outdir java_files

--Connect to mysql and create database for reporting database
--user:root, password:cloudera
mysql -u root -p
create database retail_rpt_db;
grant all on retail_rpt_db.* to retail_dba;
flush privileges;
use retail_rpt_db;
create table departments as select * from retail_db.departments where 1=2;
exit;

--For certification change database name retail_rpt_db to retail_db
sqoop export --connect "jdbc:mysql://quickstart.cloudera:3306/retail_rpt_db" \
       --username retail_dba \
       --password cloudera \
       --table departments \
       --export-dir /user/hive/warehouse/retail_ods.db/departments \
       --input-fields-terminated-by '|' \
       --input-lines-terminated-by '\n' \
       --num-mappers 2 \
       --batch \
       --outdir java_files

sqoop export --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username retail_dba \
  --password cloudera \
  --table departments \
  --export-dir /user/cloudera/sqoop_import/departments_export \
  --batch \
  --outdir java_files \
  -m 1 \
  --update-key department_id \
  --update-mode allowinsert

sqoop export --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username retail_dba \
  --password cloudera \
  --table departments_test \
  --export-dir /user/hive/warehouse/departments_test \
  --input-fields-terminated-by '\001' \
  --input-lines-terminated-by '\n' \
  --num-mappers 2 \
  --batch \
  --outdir java_files \
  --input-null-string nvl \
  --input-null-non-string -1

--Merge process begins
hadoop fs -mkdir /user/cloudera/sqoop_merge

--Initial load
sqoop import \
  --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username=retail_dba \
  --password=cloudera \
  --table departments \
  --as-textfile \
  --target-dir=/user/cloudera/sqoop_merge/departments

--Validate
sqoop eval --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username retail_dba \
  --password cloudera \
  --query "select * from departments" 

hadoop fs -cat /user/cloudera/sqoop_merge/departments/part*

--update
sqoop eval --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username retail_dba \
  --password cloudera \
  --query "update departments set department_name='Testing Merge' where department_id = 9000"

--Insert
sqoop eval --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username retail_dba \
  --password cloudera \
  --query "insert into departments values (10000, 'Inserting for merge')"

sqoop eval --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username retail_dba \
  --password cloudera \
  --query "select * from departments"

--New load
sqoop import \
  --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username=retail_dba \
  --password=cloudera \
  --table departments \
  --as-textfile \
  --target-dir=/user/cloudera/sqoop_merge/departments_delta \
  --where "department_id >= 9000"

hadoop fs -cat /user/cloudera/sqoop_merge/departments_delta/part*

--Merge
sqoop merge --merge-key department_id \
  --new-data /user/cloudera/sqoop_merge/departments_delta \
  --onto /user/cloudera/sqoop_merge/departments \
  --target-dir /user/cloudera/sqoop_merge/departments_stage \
  --class-name departments \
  --jar-file

hadoop fs -cat /user/cloudera/sqoop_merge/departments_stage/part*

--Delete old directory
hadoop fs -rm -R /user/cloudera/sqoop_merge/departments

--Move/rename stage directory to original directory
hadoop fs -mv /user/cloudera/sqoop_merge/departments_stage /user/cloudera/sqoop_merge/departments 

--Validate that original directory have merged data
hadoop fs -cat /user/cloudera/sqoop_merge/departments/part*

--Merge process ends


sqoop import --connect "jdbc:mysql://localhost/world" --username=root --password=cloudera --query="select City.*,CountryLanguage.Percentage,CountryLanguage.IsOfficial from City join CountryLanguage on City.CountryCode = CountryLanguage.CountryCode where City.ID=4079 and \$CONDITIONS" --target-dir /user/cloudera/order_join -m 1


Friday, February 14, 2020

Hbase basic operation

hbase shell

create 'blog', 'info', 'content'
list
put 'blog', '20130320162535', 'info:title', 'Why use HBase?'
put 'blog', '20130320162535', 'info:author', 'Jane Doe'
put 'blog', '20130320162535', 'info:category', 'Persistence'
put 'blog', '20130320162535', 'content:pg', 'HBase is a column-oriented...'
put 'blog', '20130320162535', 'content:pg1', 'HBase is a column-oriented...'


get 'blog', '20130320162535'
scan 'blog', { STARTROW => '20130300', STOPROW => '20130400' }

delete 'blog', '20130320162535', 'info:category'
get 'blog', '20130320162535'

scan 'blog', { STARTROW => '20130300', STOPROW => '20130500', COLUMNS => 'info:title' }


disable 'blog'
drop 'emp'

exists 'emp'

create 'city','cityinfo','otherdetails'

sqoop import \
    --connect jdbc:mysql://localhost/world \
    --username root -P \
    --table City \
    --columns "id,name,countryCode" \
    --hbase-table city \
    --column-family cityinfo \
    --hbase-row-key id -m 1
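
A quick, non-interactive way to spot-check the imported rows (assuming the hbase client is on the PATH):

# Print the first five rows of the 'city' table
echo "scan 'city', {LIMIT => 5}" | hbase shell

# Row count (can be slow on large tables)
echo "count 'city'" | hbase shell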
