Enabling NameNode HA on an Ambari-Managed Hadoop Cluster
A Hadoop cluster deployed through Ambari is not HA by default.
In a non-HA cluster, if the NameNode goes down, the metadata survives, but the whole cluster stops serving requests. That level of HDFS availability is clearly not acceptable in real production use.
When enabling HA on an existing Ambari Hadoop cluster, a few questions naturally come up: will the NameNode be re-formatted? Can data be lost? How intrusive are the changes?
This post walks through enabling HA on an Ambari Hadoop cluster and answers those questions along the way.
Ambari version: 2.6.0.3-8
Hadoop version: 2.7.3
Node assignment:
IP | Role
192.168.1.35 (sit-sql01) | new NameNode, DataNode, new JournalNode
192.168.1.36 (sit-usc01) | new JournalNode, DataNode
192.168.1.37 (sit-app01) | new JournalNode, existing NameNode, DataNode
The overall flow:
1. Node assignment: place the NameNode and JournalNode roles
2. Review: confirm which services and nodes will be added or removed once HA is enabled
3. Backup: put the NameNode into safe mode (read-only) and create a checkpoint
4. Reconfigure: update the configuration files
5. Initialize the JournalNodes
6. Start the components
7. Initialize the HA metadata
8. Finish the HA setup
1. First, log in to Ambari, open the HDFS service, click Service Actions in the upper-right corner, and choose Enable NameNode HA.
2. On the next page, assign the nodes: both the JournalNodes and the additional NameNode need to be placed.
3. Review the changes:
- The SecondaryNameNode is removed
- One additional NameNode and three JournalNodes are added
4. Create a checkpoint
This step is performed on the original NameNode node:
- Put the NameNode into safe mode (i.e. read-only):
[root@sit-app01 tmp]# sudo su hdfs -l -c 'hdfs dfsadmin -safemode enter'
Safe mode is ON
- With safe mode on, create the checkpoint:
[root@sit-app01 tmp]# sudo su hdfs -l -c 'hdfs dfsadmin -saveNamespace'
Save namespace successful
- Once the checkpoint exists, Ambari verifies it and lets you move on to the next step.
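Before clicking through, the checkpoint can be double-checked from the same shell: `hdfs dfsadmin -safemode get` reports the current safe-mode state, and the namenode directory (the path below is the one from this cluster's logs; yours may differ) should contain a freshly written fsimage:

```shell
# Confirm safe mode is still on after entering it
sudo su hdfs -l -c 'hdfs dfsadmin -safemode get'
# -> Safe mode is ON

# After saveNamespace, the newest fsimage file should carry a timestamp from just now
ls -lt /data/datafile/hadoop/hdfs/namenode/current/fsimage_* | head -2
```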
5. Ambari then reconfigures the components:
- Stop all HDFS services
- Install the NameNode on the newly added NameNode node
- Install the JournalNode services
- Reconfigure HDFS
- Start the JournalNodes
- Disable the Secondary NameNode
6. Next, initialize the JournalNodes
This is done on the original NameNode node.
Run the following command to initialize the JournalNodes and share the existing edit log with them:
[root@sit-app01 tmp]# sudo su hdfs -l -c 'hdfs namenode -initializeSharedEdits'
The key log lines:
18/05/16 11:15:57 INFO namenode.NameNode: createNameNode [-initializeSharedEdits]
18/05/16 11:15:57 INFO namenode.FSEditLog: Edit logging is async:false
...
18/05/16 11:15:58 INFO namenode.FSNamesystem: Determined nameservice ID: cluster
18/05/16 11:15:58 INFO namenode.FSNamesystem: HA Enabled: true
...
18/05/16 11:15:58 INFO common.Storage: Lock on /data/datafile/hadoop/hdfs/namenode/in_use.lock acquired by nodename 10246@sit-app01.insightcredit
18/05/16 11:15:58 INFO namenode.FSImage: Planning to load image: FSImageFile(file=/data/datafile/hadoop/hdfs/namenode/current/fsimage_0000000000001353748, cpktTxId=0000000000001353748)
18/05/16 11:15:58 INFO namenode.FSImage: Loaded image for txid 1353748 from /data/datafile/hadoop/hdfs/namenode/current/fsimage_0000000000001353748
18/05/16 11:15:58 INFO namenode.FSNamesystem: Finished loading FSImage in 210 msecs
18/05/16 11:15:59 INFO namenode.FileJournalManager: Finalizing edits file /data/datafile/hadoop/hdfs/namenode/current/edits_inprogress_0000000000001353749 -> /data/datafile/hadoop/hdfs/namenode/current/edits_0000000000001353749-0000000000001353749
18/05/16 11:15:59 INFO client.QuorumJournalManager: Starting recovery process for unclosed journal segments...
18/05/16 11:15:59 INFO client.QuorumJournalManager: Successfully started new epoch 1
18/05/16 11:15:59 INFO namenode.FSEditLog: Starting log segment at 1353749
18/05/16 11:15:59 INFO namenode.FSEditLog: Ending log segment 1353749, 1353749
18/05/16 11:15:59 INFO namenode.FSEditLog: Number of transactions: 1 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 1 SyncTimes(ms): 25
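After the initialization succeeds (note the `Successfully started new epoch 1` line), every JournalNode host should hold a directory for the nameservice (`cluster` here). A quick sanity check, assuming the usual HDP JournalNode edits directory and the default JournalNode HTTP port 8480 (verify `dfs.journalnode.edits.dir` in Ambari if the path differs):

```shell
# On any JournalNode host: the nameservice should now have a journal directory
ls /hadoop/hdfs/journal/cluster/current/

# The JournalNode web UI answers on its default HTTP port
curl -s http://localhost:8480/jmx | head -5
```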
7. Next, initialize the HA metadata
Again on the original NameNode node:
- Format the ZKFC znode in ZooKeeper:
[root@sit-app01 tmp]# sudo su hdfs -l -c 'hdfs zkfc -formatZK'
The key log lines:
18/05/16 11:18:27 INFO ha.ActiveStandbyElector: Session connected.
18/05/16 11:18:27 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/cluster in ZK.
18/05/16 11:18:27 INFO ha.ActiveStandbyElector: Terminating ZK connection for elector id=931496835 appData=null cb=Elector callbacks for NameNode at sit-app01.insightcredit/192.168.1.37:8020
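The znode can be confirmed from any ZooKeeper node with `zkCli.sh` (the client path below is the usual HDP install location; adjust for your environment). The path matches the `/hadoop-ha/cluster` line in the log:

```shell
# List the HA parent znode created by formatZK; it should contain the nameservice
/usr/hdp/current/zookeeper-client/bin/zkCli.sh ls /hadoop-ha
```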
Then, on the newly added NameNode node, bootstrap the standby:
[root@sit-sql software]# sudo su hdfs -l -c 'hdfs namenode -bootstrapStandby'
The key log lines:
=====================================================
About to bootstrap Standby ID nn2 from:
   Nameservice ID: cluster
   Other Namenode ID: nn1
   Other NN's HTTP address: http://sit-app01.insightcredit:50070
   Other NN's IPC address: sit-app01.insightcredit/192.168.1.37:8020
   Namespace ID: 1765464713
   Block pool ID: BP-1622793233-192.168.1.37-1510050524750
   Cluster ID: CID-e6f8e811-39fc-4951-b4f9-9286335e3684
   Layout version: -63
   isUpgradeFinalized: true
=====================================================
18/05/16 11:19:11 INFO common.Storage: Storage directory /data/datafile/hadoop/hdfs/namenode has been successfully formatted.
18/05/16 11:19:11 INFO namenode.FSEditLog: Edit logging is async:false
18/05/16 11:19:11 INFO namenode.TransferFsImage: Opening connection to http://sit-app01.insightcredit:50070/imagetransfer?getimage=1&txid=1353748&storageInfo=-63:1765464713:0:CID-e6f8e811-39fc-4951-b4f9-9286335e3684
18/05/16 11:19:11 INFO namenode.TransferFsImage: Image Transfer timeout configured to 60000 milliseconds
18/05/16 11:19:11 INFO namenode.TransferFsImage: Combined time for fsimage download and fsync to all disks took 0.02s. The fsimage download took 0.01s at 20857.14 KB/s. Synchronous (fsync) write to disk of /data/datafile/hadoop/hdfs/namenode/current/fsimage.ckpt_0000000000001353748 took 0.00s.
18/05/16 11:19:11 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000001353748 size 299611 bytes.
8. Finally, restart HDFS and bring all the services back up.
9. Once the services are up, check the NameNode state on both nodes:
- The original NameNode is now Standby
- The newly added NameNode is Active, so one failover has already happened successfully; the HA configuration is complete
Finally, check whether the files on HDFS are still there.
Everything is still in place. Done!
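The state check does not have to go through the web UI: `hdfs haadmin` can query each NameNode directly by its NameNode ID (nn1 and nn2, as reported in the bootstrapStandby log), and a plain listing confirms the data survived:

```shell
# Query each NameNode's HA state by NameNode ID
sudo su hdfs -l -c 'hdfs haadmin -getServiceState nn1'
sudo su hdfs -l -c 'hdfs haadmin -getServiceState nn2'

# Confirm the files are still there
sudo su hdfs -l -c 'hdfs dfs -ls /'
```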