Enabling HA on an Ambari-Managed Hadoop Cluster


This post contains a lot of images; be careful if you're on a metered connection.

A Hadoop cluster built with Ambari is not HA by default.

In a non-HA cluster, if the NameNode goes down, the metadata is not lost, but the whole cluster stops serving requests. That level of HDFS availability is clearly unacceptable in real-world deployments.

When enabling HA on an existing Ambari Hadoop cluster, a few questions naturally come up: will the NameNode be re-formatted? Will data be lost? How invasive are the changes?

So this post walks through enabling HA on an Ambari-managed Hadoop cluster.

Ambari version: 2.6.0.3-8

Hadoop version: 2.7.3

Node assignment:

IP            Hostname    Roles
192.168.1.35  sit-sql01   new NameNode, DataNode, new JournalNode
192.168.1.36  sit-usc01   new JournalNode, DataNode
192.168.1.37  sit-app01   new JournalNode, NameNode (existing), DataNode

The overall flow is:

1. Node assignment: pick the NameNode and JournalNode hosts

2. Review: confirm which services and hosts will be added or removed once HA is enabled

3. Backup: put the NameNode into safe mode (read-only) and create a checkpoint

4. Configure: update the configuration files

5. Initialize the JournalNodes

6. Start the components

7. Initialize the HA metadata

8. Finish the HA setup
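The manual steps the wizard prompts for along the way (steps 3, 5, and 7 above) boil down to a handful of hdfs commands. As a reference, here they are collected as shell functions; this is a sketch that assumes the hdfs CLI and the hdfs user exist on each host, and each command must be run on the host the wizard names:

```shell
# Sketch of the manual commands the Enable NameNode HA wizard asks you to run.
# Nothing is executed at definition time; call each on the indicated host.

enter_safemode() {     # on the original NameNode (step 3)
  sudo su hdfs -l -c 'hdfs dfsadmin -safemode enter'
}
save_namespace() {     # on the original NameNode (step 3)
  sudo su hdfs -l -c 'hdfs dfsadmin -saveNamespace'
}
init_shared_edits() {  # on the original NameNode (step 5/6)
  sudo su hdfs -l -c 'hdfs namenode -initializeSharedEdits'
}
format_zkfc() {        # on the original NameNode (step 7)
  sudo su hdfs -l -c 'hdfs zkfc -formatZK'
}
bootstrap_standby() {  # on the NEW NameNode (step 7)
  sudo su hdfs -l -c 'hdfs namenode -bootstrapStandby'
}
```

Each of these is shown in context in the sections below.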


1. First, log in to Ambari, go to the HDFS service, click Service Actions in the upper right, and choose Enable NameNode HA

2. On the next page, assign hosts: choose where the JournalNodes and the additional NameNode will run


3. Review the configuration

The SecondaryNameNode will be removed

An additional NameNode and 3 JournalNodes will be added


4. Create a checkpoint

These steps must be run on the original NameNode host:

  • Put the NameNode into safe mode (i.e. read-only)

[root@sit-app01 tmp]# sudo su hdfs -l -c 'hdfs dfsadmin -safemode enter'
Safe mode is ON

  • Once in safe mode, create the checkpoint

[root@sit-app01 tmp]# sudo su hdfs -l -c 'hdfs dfsadmin -saveNamespace'
Save namespace successful

  • After the checkpoint is created, the wizard verifies it and lets you move on
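Before creating the checkpoint, it doesn't hurt to confirm the NameNode really is read-only. A tiny helper (a sketch) that inspects the output of `hdfs dfsadmin -safemode get`:

```shell
# Returns success only when the dfsadmin output reports safe mode as ON.
is_safemode_on() {
  case "$1" in
    *"Safe mode is ON"*) return 0 ;;
    *)                   return 1 ;;
  esac
}

# Intended usage on the NameNode host (not run here):
#   is_safemode_on "$(hdfs dfsadmin -safemode get)" && echo "safe to checkpoint"
```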



5. Next, the wizard reconfigures the components

  • Stop all HDFS services
  • Install a NameNode on the newly added NameNode host
  • Install the JournalNode services
  • Reconfigure HDFS
  • Start the JournalNodes
  • Stop the Secondary NameNode



6. Next, initialize the JournalNodes

This must be run on the original NameNode host

Run the following command to initialize the JournalNodes and share the edit log:

[root@sit-app01 tmp]# sudo su hdfs -l -c 'hdfs namenode -initializeSharedEdits'

The relevant log output:

18/05/16 11:15:57 INFO namenode.NameNode: createNameNode [-initializeSharedEdits]
18/05/16 11:15:57 INFO namenode.FSEditLog: Edit logging is async:false
18/05/16 11:15:57 INFO namenode.FSNamesystem: No KeyProvider found.
18/05/16 11:15:57 INFO namenode.FSNamesystem: Enabling async auditlog
18/05/16 11:15:57 INFO namenode.FSNamesystem: fsLock is fair:false
18/05/16 11:15:57 INFO blockmanagement.HeartbeatManager: Setting heartbeat recheck interval to 30000 since dfs.namenode.stale.datanode.interval is less than dfs.namenode.heartbeat.recheck-interval
18/05/16 11:15:57 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
18/05/16 11:15:57 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
18/05/16 11:15:57 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:01:00:00.000
18/05/16 11:15:57 INFO blockmanagement.BlockManager: The block deletion will start around 2018 May 16 12:15:57
18/05/16 11:15:57 INFO util.GSet: Computing capacity for map BlocksMap
18/05/16 11:15:57 INFO util.GSet: VM type       = 64-bit
18/05/16 11:15:57 INFO util.GSet: 2.0% max memory 1011.3 MB = 20.2 MB
18/05/16 11:15:57 INFO util.GSet: capacity      = 2^21 = 2097152 entries
18/05/16 11:15:57 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=true
18/05/16 11:15:57 INFO blockmanagement.BlockManager: dfs.block.access.key.update.interval=600 min(s), dfs.block.access.token.lifetime=600 min(s), dfs.encrypt.data.transfer.algorithm=null
18/05/16 11:15:58 INFO blockmanagement.BlockManager: defaultReplication         = 3
18/05/16 11:15:58 INFO blockmanagement.BlockManager: maxReplication             = 50
18/05/16 11:15:58 INFO blockmanagement.BlockManager: minReplication             = 1
18/05/16 11:15:58 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
18/05/16 11:15:58 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
18/05/16 11:15:58 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
18/05/16 11:15:58 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
18/05/16 11:15:58 INFO namenode.FSNamesystem: fsOwner             = hdfs (auth:SIMPLE)
18/05/16 11:15:58 INFO namenode.FSNamesystem: supergroup          = hdfs
18/05/16 11:15:58 INFO namenode.FSNamesystem: isPermissionEnabled = true
18/05/16 11:15:58 INFO namenode.FSNamesystem: Determined nameservice ID: cluster
18/05/16 11:15:58 INFO namenode.FSNamesystem: HA Enabled: true
18/05/16 11:15:58 INFO namenode.FSNamesystem: Append Enabled: true
18/05/16 11:15:58 INFO util.GSet: Computing capacity for map INodeMap
18/05/16 11:15:58 INFO util.GSet: VM type       = 64-bit
18/05/16 11:15:58 INFO util.GSet: 1.0% max memory 1011.3 MB = 10.1 MB
18/05/16 11:15:58 INFO util.GSet: capacity      = 2^20 = 1048576 entries
18/05/16 11:15:58 INFO namenode.FSDirectory: ACLs enabled? false
18/05/16 11:15:58 INFO namenode.FSDirectory: XAttrs enabled? true
18/05/16 11:15:58 INFO namenode.FSDirectory: Maximum size of an xattr: 16384
18/05/16 11:15:58 INFO namenode.NameNode: Caching file names occuring more than 10 times
18/05/16 11:15:58 INFO util.GSet: Computing capacity for map cachedBlocks
18/05/16 11:15:58 INFO util.GSet: VM type       = 64-bit
18/05/16 11:15:58 INFO util.GSet: 0.25% max memory 1011.3 MB = 2.5 MB
18/05/16 11:15:58 INFO util.GSet: capacity      = 2^18 = 262144 entries
18/05/16 11:15:58 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9900000095367432
18/05/16 11:15:58 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
18/05/16 11:15:58 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
18/05/16 11:15:58 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
18/05/16 11:15:58 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
18/05/16 11:15:58 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
18/05/16 11:15:58 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
18/05/16 11:15:58 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
18/05/16 11:15:58 INFO util.GSet: Computing capacity for map NameNodeRetryCache
18/05/16 11:15:58 INFO util.GSet: VM type       = 64-bit
18/05/16 11:15:58 INFO util.GSet: 0.029999999329447746% max memory 1011.3 MB = 310.7 KB
18/05/16 11:15:58 INFO util.GSet: capacity      = 2^15 = 32768 entries
18/05/16 11:15:58 INFO common.Storage: Lock on /data/datafile/hadoop/hdfs/namenode/in_use.lock acquired by nodename 10246@sit-app01.insightcredit
18/05/16 11:15:58 INFO namenode.FSImage: No edit log streams selected.
18/05/16 11:15:58 INFO namenode.FSImage: Planning to load image: FSImageFile(file=/data/datafile/hadoop/hdfs/namenode/current/fsimage_0000000000001353748, cpktTxId=0000000000001353748)
18/05/16 11:15:58 INFO namenode.FSImageFormatPBINode: Loading 3491 INodes.
18/05/16 11:15:58 INFO namenode.FSImageFormatProtobuf: Loaded FSImage in 0 seconds.
18/05/16 11:15:58 INFO namenode.FSImage: Loaded image for txid 1353748 from /data/datafile/hadoop/hdfs/namenode/current/fsimage_0000000000001353748
18/05/16 11:15:58 INFO namenode.FSNamesystem: Need to save fs image? false (staleImage=false, haEnabled=true, isRollingUpgrade=false)
18/05/16 11:15:58 INFO namenode.NameCache: initialized with 0 entries 0 lookups
18/05/16 11:15:58 INFO namenode.FSNamesystem: Finished loading FSImage in 210 msecs
18/05/16 11:15:58 WARN common.Storage: set restore failed storage to true
18/05/16 11:15:58 INFO namenode.FSEditLog: Edit logging is async:false
18/05/16 11:15:58 INFO namenode.FileJournalManager: Recovering unfinalized segments in /data/datafile/hadoop/hdfs/namenode/current
18/05/16 11:15:59 INFO namenode.FileJournalManager: Finalizing edits file /data/datafile/hadoop/hdfs/namenode/current/edits_inprogress_0000000000001353749 -> /data/datafile/hadoop/hdfs/namenode/current/edits_0000000000001353749-0000000000001353749
18/05/16 11:15:59 INFO client.QuorumJournalManager: Starting recovery process for unclosed journal segments...
18/05/16 11:15:59 INFO client.QuorumJournalManager: Successfully started new epoch 1
18/05/16 11:15:59 INFO namenode.RedundantEditLogInputStream: Fast-forwarding stream '/data/datafile/hadoop/hdfs/namenode/current/edits_0000000000001353749-0000000000001353749' to transaction ID 1353749
18/05/16 11:15:59 INFO namenode.FSEditLog: Starting log segment at 1353749
18/05/16 11:15:59 INFO namenode.FSEditLog: Ending log segment 1353749, 1353749
18/05/16 11:15:59 INFO namenode.FSEditLog: logSyncAll toSyncToTxId=1353749 lastSyncedTxid=1353749 mostRecentTxid=1353749
18/05/16 11:15:59 INFO namenode.FSEditLog: Done logSyncAll lastWrittenTxId=1353749 lastSyncedTxid=1353749 mostRecentTxid=1353749
18/05/16 11:15:59 INFO namenode.FSEditLog: Number of transactions: 1 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 1 SyncTimes(ms): 25 
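If this step fails, the usual culprit is a JournalNode that never came up. A small sketch that probes each JournalNode's web endpoint; 8480 is the default port of dfs.journalnode.http-address, and the hostnames are this cluster's:

```shell
# Quick reachability check for the three JournalNodes before (re)running
# -initializeSharedEdits. Every Hadoop daemon serves JMX at /jmx over HTTP.
check_journalnodes() {
  for host in sit-sql01 sit-usc01 sit-app01; do
    if curl -sf -o /dev/null "http://${host}:8480/jmx"; then
      echo "${host}: JournalNode up"
    else
      echo "${host}: JournalNode NOT reachable"
    fi
  done
}
```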



7. Next, initialize the HA metadata

On the original NameNode host:

  • Format the ZKFC znode in ZooKeeper

[root@sit-app01 tmp]# sudo su hdfs -l -c 'hdfs zkfc -formatZK'

The relevant log output:

18/05/16 11:18:27 INFO ha.ActiveStandbyElector: Session connected.
18/05/16 11:18:27 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/cluster in ZK.
18/05/16 11:18:27 INFO ha.ActiveStandbyElector: Terminating ZK connection for elector id=931496835 appData=null cb=Elector callbacks for NameNode at sit-app01.insightcredit/192.168.1.37:8020
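The log above shows /hadoop-ha/cluster being created in ZooKeeper, which you can confirm from any ZooKeeper client. A sketch; the zkCli.sh path below is the usual HDP layout and may differ on your install:

```shell
# Confirm the failover-controller znode exists. The znode path /hadoop-ha/cluster
# comes from the formatZK log above; the zkCli.sh location is install-dependent.
check_ha_znode() {
  /usr/hdp/current/zookeeper-client/bin/zkCli.sh ls /hadoop-ha 2>/dev/null \
    | grep -q cluster
}
```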

Then, on the newly added NameNode host:

[root@sit-sql software]# sudo su hdfs -l -c 'hdfs namenode -bootstrapStandby'

The relevant log output:

=====================================================
About to bootstrap Standby ID nn2 from:
           Nameservice ID: cluster
        Other Namenode ID: nn1
  Other NN's HTTP address: http://sit-app01.insightcredit:50070
  Other NN's IPC  address: sit-app01.insightcredit/192.168.1.37:8020
             Namespace ID: 1765464713
            Block pool ID: BP-1622793233-192.168.1.37-1510050524750
               Cluster ID: CID-e6f8e811-39fc-4951-b4f9-9286335e3684
           Layout version: -63
       isUpgradeFinalized: true
=====================================================
18/05/16 11:19:11 INFO common.Storage: Storage directory /data/datafile/hadoop/hdfs/namenode has been successfully formatted.
18/05/16 11:19:11 INFO namenode.FSEditLog: Edit logging is async:false
18/05/16 11:19:11 INFO namenode.TransferFsImage: Opening connection to http://sit-app01.insightcredit:50070/imagetransfer?getimage=1&txid=1353748&storageInfo=-63:1765464713:0:CID-e6f8e811-39fc-4951-b4f9-9286335e3684
18/05/16 11:19:11 INFO namenode.TransferFsImage: Image Transfer timeout configured to 60000 milliseconds
18/05/16 11:19:11 INFO namenode.TransferFsImage: Combined time for fsimage download and fsync to all disks took 0.02s. The fsimage download took 0.01s at 20857.14 KB/s. Synchronous (fsync) write to disk of /data/datafile/hadoop/hdfs/namenode/current/fsimage.ckpt_0000000000001353748 took 0.00s.
18/05/16 11:19:11 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000001353748 size 299611 bytes.

8. Final step: restart HDFS and start all services



9. Once the services are up, check the NameNode state on both hosts

The original NameNode is now in Standby state


The newly added NameNode is Active, so a failover has effectively already happened once; the HA configuration is complete
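Instead of opening each web UI, the state of both NameNodes can also be read from the command line; nn1 and nn2 are the NameNode IDs shown in the bootstrapStandby output above:

```shell
# Print the HA state (active/standby) of both NameNodes.
# nn1/nn2 are this cluster's NameNode IDs from the bootstrapStandby log.
nn_states() {
  for nn in nn1 nn2; do
    printf '%s: ' "$nn"
    sudo su hdfs -l -c "hdfs haadmin -getServiceState ${nn}"
  done
}
```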

Finally, check whether the files on HDFS are still there

Everything is still in place. Done~


赫墨拉

I'm a rookie who loves big data; this blog is where I share my learning and experiences.
