Thursday, March 5, 2020

Hadoop History and Sources

2003----Google----GFS (Google File System)----a distributed file system (DFS), and a very powerful one
2004----Google----MapReduce paper----the developers behind NDFS and the Lucene search engine took up the MR paper
2005----Nutch----released processing and storage built on NDFS and MapReduce
2006----Doug Cutting----Hadoop----HDFS/MapReduce
2008----Facebook----Hive----SQL-like queries (HQL)
2009-2010----Cloudera--
2011-2012----Hortonworks----(Spark) 2015

Sources:
---------
Business process - 1x (relative data volume)
OLTP - online transaction processing

Human - 10x
Email
Documents
Social


Machine - 100x
CCTV
Microphone
Sensors
Satellite (maps, DTH)

Cluster setup:
--------------
The edge node should not be part of a storage, master, or slave node;
combining them is not recommended.
Hadoop NameNode
Hadoop DataNode
Hadoop Edge Node

Interface medium:
PuTTY, to connect to the edge node

In PuTTY, enter the following to connect to the edge node:
IP address (or host name)
Username
Password
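
Outside PuTTY, the same three details are enough for a plain SSH login. A minimal sketch, assuming an OpenSSH client is installed; the function name, host name, and user name below are placeholders for illustration, not values from these notes.

```python
def build_ssh_command(host, username, port=22):
    """Return the argv list for an OpenSSH login to the edge node."""
    if not host or not username:
        raise ValueError("host and username are both required")
    # ssh asks for the password interactively, so it never goes on the command line.
    return ["ssh", "-p", str(port), f"{username}@{host}"]


if __name__ == "__main__":
    # Hypothetical edge node details, for illustration only.
    print(" ".join(build_ssh_command("edge-node.example.com", "hadoopuser")))
```

The list form can be passed directly to subprocess.run if the login should be started from a script.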


NameNode:
Cluster administration
 Heartbeat
 Replication
 Balancing
Metadata management
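
The heartbeat item can be pictured as the NameNode remembering when each DataNode last reported in and treating a node as dead once it has been silent for too long. A toy sketch of that idea follows; the class name, node names, and the 30-second timeout are all made up for illustration and this is not Hadoop's actual implementation.

```python
class HeartbeatMonitor:
    """Toy model of NameNode-side liveness tracking (not Hadoop's real code)."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # DataNode name -> time of its latest heartbeat

    def record_heartbeat(self, datanode, now):
        """Remember that this DataNode reported in at time `now`."""
        self.last_seen[datanode] = now

    def dead_nodes(self, now):
        """Return DataNodes whose last heartbeat is older than the timeout."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout_seconds=30)  # illustrative value
    monitor.record_heartbeat("datanode-1", now=0)
    monitor.record_heartbeat("datanode-2", now=0)
    monitor.record_heartbeat("datanode-1", now=25)  # datanode-2 stays silent
    print(monitor.dead_nodes(now=40))
```

Once a node is considered dead, the blocks it held have lost a replica, which is where the replication item comes in.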

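Over-replication: a block has more replicas than its replication factor
Under-replication: a block has fewer replicas than its replication factor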
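
The two cases come down to comparing each block's live replica count with its target replication factor. A small sketch under that assumption; the function name and block IDs are hypothetical, and the target of 3 matches the HDFS default replication factor.

```python
def classify_blocks(replica_counts, target=3):
    """Split block IDs into under-replicated and over-replicated lists.

    replica_counts maps a block ID to the number of live replicas it has.
    """
    under = sorted(b for b, n in replica_counts.items() if n < target)
    over = sorted(b for b, n in replica_counts.items() if n > target)
    return under, over


if __name__ == "__main__":
    # Hypothetical block report, for illustration only.
    counts = {"blk_1": 3, "blk_2": 2, "blk_3": 4, "blk_4": 1}
    under, over = classify_blocks(counts)
    print("under-replicated:", under)
    print("over-replicated:", over)
```

In a real cluster the NameNode reacts to these states itself: it schedules extra copies for under-replicated blocks and removes surplus copies of over-replicated ones.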

