Bigdata_Spark_Consultant: Hadoop History and Sources

Thursday, March 5, 2020

Hadoop History and Sources

2003---Google---GFS (Google File System) paper---a DFS (distributed file system), but a very powerful one
2004---Google---MapReduce (MR) paper---the Nutch developers (from the Lucene search engine project)---built NDFS (Nutch Distributed File System) and took up the MR paper
2005---Nutch---released processing and storage built on MapReduce and NDFS
2006---Doug Cutting---Hadoop---HDFS/MapReduce
2008---Facebook---Hive---SQL on Hadoop---HQL (Hive Query Language)
2009-2010---Cloudera
2011-2012---Hortonworks---(Spark) 2015

Sources:
---------
Business processes - 1x (relative data volume)
OLTP - online transaction processing

Human - 10x
Email
Documents
Social

Machine - 100x
CCTV
Microphones
Sensors
Satellite (Maps, DTH)

Cluster setup:
--------------
The edge node should not double as a storage, master, or slave node; combining these roles is not recommended.
Node types:
Hadoop NameNode (master)
Hadoop DataNode (slave, storage)
Hadoop Edge Node (client access)

Interface medium:
PuTTY, to connect to the edge node

In PuTTY, enter the following to connect to the edge node:
IP address (or host name)
Username
Password

NameNode:
Cluster administration
Heartbeat monitoring of the DataNodes
Replication
Balancing
Metadata management
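
The heartbeat duty can be pictured with a small sketch: each DataNode reports in periodically, and a node that stays silent for too long is treated as dead. This is only an illustration, not how Hadoop itself is implemented. The names DataNodeStatus and find_dead_nodes are made up for this example, and the 630-second timeout is an assumed value passed as a parameter; real clusters take their timings from the HDFS configuration.

```python
from dataclasses import dataclass

# Assumed timeout for illustration only; a real cluster derives it
# from its HDFS configuration.
DEAD_AFTER_SECONDS = 630.0


@dataclass
class DataNodeStatus:
    """Hypothetical record the NameNode might keep per DataNode."""
    name: str
    last_heartbeat: float  # seconds since an arbitrary start time


def find_dead_nodes(nodes, now, dead_after=DEAD_AFTER_SECONDS):
    """Return names of DataNodes whose last heartbeat is older than dead_after."""
    return [n.name for n in nodes if now - n.last_heartbeat > dead_after]


# Three hypothetical DataNodes; dn2 has been silent for 900 seconds.
nodes = [
    DataNodeStatus("dn1", last_heartbeat=995.0),
    DataNodeStatus("dn2", last_heartbeat=100.0),
    DataNodeStatus("dn3", last_heartbeat=990.0),
]
dead = find_dead_nodes(nodes, now=1000.0)
print(dead)
```

Blocks stored on a node flagged this way lose a replica, which is what makes them candidates for re-replication.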

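Over-replication: a block has more replicas than the configured replication factor, so the NameNode removes the extras
Under-replication: a block has fewer replicas than the configured replication factor, so the NameNode schedules new copies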
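
A rough sketch of how blocks could be sorted into these two states, assuming a target replication factor of 3 (the usual HDFS default, set through dfs.replication). The function classify_blocks and the block ids are hypothetical and exist only for this example; the real NameNode works from block reports sent by the DataNodes.

```python
TARGET_REPLICATION = 3  # assumed target; HDFS reads it from dfs.replication


def classify_blocks(replica_counts, target=TARGET_REPLICATION):
    """Group block ids by replication state relative to the target."""
    report = {"under": [], "over": [], "ok": []}
    for block_id, count in sorted(replica_counts.items()):
        if count < target:
            report["under"].append(block_id)   # needs more copies
        elif count > target:
            report["over"].append(block_id)    # has extra copies
        else:
            report["ok"].append(block_id)
    return report


# Hypothetical block ids mapped to the number of live replicas.
replica_counts = {"blk_1": 3, "blk_2": 1, "blk_3": 4}
report = classify_blocks(replica_counts)
print(report)
```

Here blk_2 would be a candidate for new copies and blk_3 for replica removal.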
