4. Cluster Configuration

4.1 Cluster Deployment Plan

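Plan the nodes before installing. This fully distributed setup uses three nodes in total: one master node (Master) and two slave nodes (Slave).
Deployment requirements:

Do not install the NameNode and the SecondaryNameNode on the same server.
The ResourceManager is also memory-intensive, so do not place it on the same machine as the NameNode or the SecondaryNameNode.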

	hadoop102	hadoop103	hadoop104
HDFS	NameNode, DataNode	DataNode	SecondaryNameNode, DataNode
YARN	NodeManager	ResourceManager, NodeManager	NodeManager
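
The two placement rules can be checked mechanically. The following is a minimal sketch, assuming the plan is written out by hand as a Python dictionary; the names PLAN, HEAVY_ROLES, hosts_running, and check_plan are illustrative and not part of Hadoop.

```python
# Planned role placement, copied from the table above.
PLAN = {
    "hadoop102": {"NameNode", "DataNode", "NodeManager"},
    "hadoop103": {"DataNode", "ResourceManager", "NodeManager"},
    "hadoop104": {"SecondaryNameNode", "DataNode", "NodeManager"},
}

# Roles that should each live on a different host (the two rules above).
HEAVY_ROLES = ("NameNode", "SecondaryNameNode", "ResourceManager")


def hosts_running(plan, role):
    """Return the sorted list of hosts on which `role` is planned."""
    return sorted(host for host, roles in plan.items() if role in roles)


def check_plan(plan):
    """Return a list of rule violations; an empty list means the plan is fine."""
    problems = []
    for host, roles in plan.items():
        heavy = [role for role in HEAVY_ROLES if role in roles]
        if len(heavy) > 1:
            problems.append(f"{host} hosts {' and '.join(heavy)}")
    return problems


if __name__ == "__main__":
    print(check_plan(PLAN))
```

An empty result means no host carries more than one of the three memory-heavy roles.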

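4.2 Configuration Files

Hadoop has two kinds of configuration files: default configuration files and custom (site-specific) configuration files. You only need to edit a custom configuration file when you want to change a default value, by setting the corresponding property there.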

Default configuration files:

Default file	Location inside the Hadoop jar
core-default.xml	hadoop-common-3.1.3.jar/core-default.xml
hdfs-default.xml	hadoop-hdfs-3.1.3.jar/hdfs-default.xml
yarn-default.xml	hadoop-yarn-common-3.1.3.jar/yarn-default.xml
mapred-default.xml	hadoop-mapreduce-client-core-3.1.3.jar/mapred-default.xml

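These files contain many parameters. For details, open the official Hadoop documentation and go to the Configuration section at the bottom of the documentation page.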

Custom configuration files:

The four files core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml are located in $HADOOP_HOME/etc/hadoop. Edit them as your project requires.

[li@hadoop102 ~]$ cd /opt/module/hadoop-3.1.3/etc/hadoop
[li@hadoop102 hadoop]$ ll
total 176
-rw-r--r--. 1 li li  8260 Sep 12  2019 capacity-scheduler.xml
-rw-r--r--. 1 li li  1335 Sep 12  2019 configuration.xsl
-rw-r--r--. 1 li li  1940 Sep 12  2019 container-executor.cfg
-rw-r--r--. 1 li li   774 Sep 12  2019 core-site.xml  # custom configuration file
-rw-r--r--. 1 li li  3999 Sep 12  2019 hadoop-env.cmd
-rw-r--r--. 1 li li 15903 Sep 12  2019 hadoop-env.sh
-rw-r--r--. 1 li li  3323 Sep 12  2019 hadoop-metrics2.properties
-rw-r--r--. 1 li li 11392 Sep 12  2019 hadoop-policy.xml
-rw-r--r--. 1 li li  3414 Sep 12  2019 hadoop-user-functions.sh.example
-rw-r--r--. 1 li li   775 Sep 12  2019 hdfs-site.xml  # custom configuration file
-rw-r--r--. 1 li li  1484 Sep 12  2019 httpfs-env.sh
-rw-r--r--. 1 li li  1657 Sep 12  2019 httpfs-log4j.properties
-rw-r--r--. 1 li li    21 Sep 12  2019 httpfs-signature.secret
-rw-r--r--. 1 li li   620 Sep 12  2019 httpfs-site.xml
-rw-r--r--. 1 li li  3518 Sep 12  2019 kms-acls.xml
-rw-r--r--. 1 li li  1351 Sep 12  2019 kms-env.sh
-rw-r--r--. 1 li li  1747 Sep 12  2019 kms-log4j.properties
-rw-r--r--. 1 li li   682 Sep 12  2019 kms-site.xml
-rw-r--r--. 1 li li 13326 Sep 12  2019 log4j.properties
-rw-r--r--. 1 li li   951 Sep 12  2019 mapred-env.cmd
-rw-r--r--. 1 li li  1764 Sep 12  2019 mapred-env.sh
-rw-r--r--. 1 li li  4113 Sep 12  2019 mapred-queues.xml.template
-rw-r--r--. 1 li li   758 Sep 12  2019 mapred-site.xml  # custom configuration file
drwxr-xr-x. 2 li li  4096 Sep 12  2019 shellprofile.d
-rw-r--r--. 1 li li  2316 Sep 12  2019 ssl-client.xml.example
-rw-r--r--. 1 li li  2697 Sep 12  2019 ssl-server.xml.example
-rw-r--r--. 1 li li  2642 Sep 12  2019 user_ec_policies.xml.template
-rw-r--r--. 1 li li    10 Sep 12  2019 workers
-rw-r--r--. 1 li li  2250 Sep 12  2019 yarn-env.cmd
-rw-r--r--. 1 li li  6056 Sep 12  2019 yarn-env.sh
-rw-r--r--. 1 li li  2591 Sep 12  2019 yarnservice-log4j.properties
-rw-r--r--. 1 li li   690 Sep 12  2019 yarn-site.xml  # custom configuration file
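
The relationship between the two kinds of files can be pictured as a simple overlay: a value in a site file replaces the default of the same name, and every property not mentioned keeps its default. Below is a minimal sketch of that idea using plain dictionaries. It is a simplified model rather than Hadoop's own loader; the function name effective_config and the sample site values are assumptions made for illustration, and the defaults quoted should be checked against the *-default.xml files of your own version.

```python
# A few built-in defaults, as listed in core-default.xml and hdfs-default.xml
# for Hadoop 3.x (verify against your own version).
DEFAULTS = {
    "fs.defaultFS": "file:///",
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "dfs.replication": "3",
}

# Overrides a user might place in the *-site.xml files (illustrative values).
SITE = {
    "fs.defaultFS": "hdfs://hadoop102:8020",
    "hadoop.tmp.dir": "/opt/module/hadoop-3.1.3/data",
}


def effective_config(defaults, site):
    """Overlay site values on the defaults; site entries win on conflict."""
    merged = dict(defaults)  # copy, so the defaults stay untouched
    merged.update(site)
    return merged


config = effective_config(DEFAULTS, SITE)
print(config["fs.defaultFS"])     # overridden by the site file
print(config["dfs.replication"])  # not overridden, keeps its default
```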

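4.3 Configuring the Cluster

To run a Hadoop cluster across several machines, a few configuration files must be edited so that the cluster services work together. Go to /opt/module/hadoop-3.1.3/etc/hadoop and edit five files: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and workers.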

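4.3.1 Configuring core-site.xml

core-site.xml is Hadoop's core configuration file. Here it sets the HDFS address and port and the temporary file directory, through fs.defaultFS and hadoop.tmp.dir. fs.defaultFS gives the URI of the default file system, that is, the host and port of the HDFS NameNode. hadoop.tmp.dir is the base directory for Hadoop's temporary files.
Change to the etc/hadoop directory under the Hadoop installation:

[li@hadoop102 ~]$ cd /opt/module/hadoop-3.1.3/etc/hadoop/

Open the file with vim:

[li@hadoop102 hadoop]$ vim core-site.xml

File contents: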

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <!-- Address of the NameNode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop102:8020</value>
    </property>

    <!-- Directory where Hadoop stores its data -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-3.1.3/data</value>
    </property>

    <!-- Static user for the HDFS web UI: li -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>li</value>
    </property>
</configuration>

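This file sets the host that runs the NameNode, the main HDFS process (the master node of the cluster), the directory where Hadoop stores the data it generates at run time, and the static user for the HDFS web UI.
Save and exit with :wq.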
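
To see what fs.defaultFS encodes, the file can be read back and the URI split into its parts. The sketch below uses only the Python standard library and an inline copy of two properties from the file above; the helper names read_properties and namenode_endpoint are illustrative and not part of any Hadoop tooling.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Inline copy of two properties from core-site.xml, for illustration only.
CORE_SITE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop102:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/hadoop-3.1.3/data</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Map each <name> to its <value> in a Hadoop configuration document."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name").strip(): (prop.findtext("value") or "").strip()
        for prop in root.iter("property")
    }


def namenode_endpoint(props):
    """Split fs.defaultFS into (scheme, host, port)."""
    uri = urlsplit(props["fs.defaultFS"])
    return uri.scheme, uri.hostname, uri.port


props = read_properties(CORE_SITE)
print(namenode_endpoint(props))
```

The host part is the machine that runs the NameNode, and the port is the one HDFS clients connect to.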

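4.3.2 Configuring hdfs-site.xml

hdfs-site.xml holds the HDFS settings for its two main processes, the NameNode and the DataNode. dfs.namenode.name.dir and dfs.datanode.data.dir specify where NameNode metadata and DataNode data are stored; they are not set here, so they keep their defaults, which are subdirectories of hadoop.tmp.dir. dfs.namenode.secondary.http-address sets the address of the SecondaryNameNode. dfs.replication sets the number of replicas per block; the default of 3 is left unchanged.
Open hdfs-site.xml:

[li@hadoop102 hadoop]$ vim hdfs-site.xml

File contents: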

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <!-- NameNode (nn) web UI address -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop102:9870</value>
    </property>

    <!-- SecondaryNameNode (2nn) web UI address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop104:9868</value>
    </property>
</configuration>

This file sets the web address of the NameNode and the HTTP address of the host running the SecondaryNameNode.
Save and exit with :wq.

4.3.3 Configuring mapred-site.xml

mapred-site.xml is the core configuration file for MapReduce and specifies the framework on which MapReduce jobs run. The default value of mapreduce.framework.name is local, so to run jobs on YARN, as Hadoop 3.x clusters do, it must be set to "yarn". mapreduce.jobhistory.address and mapreduce.jobhistory.webapp.address configure the JobHistory Server, the service that exposes the logs of MapReduce jobs.
Open mapred-site.xml:

[li@hadoop102 hadoop]$ vim mapred-site.xml

File contents:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <!-- Run MapReduce jobs on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <!-- JobHistory Server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop102:10020</value>
    </property>

    <!-- JobHistory Server web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop102:19888</value>
    </property>
</configuration>

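This file sets the execution framework to YARN and gives the server and web addresses of the JobHistory Server.
Save and exit with :wq.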

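4.3.4 Configuring yarn-site.xml

yarn-site.xml holds the YARN settings. The yarn.resourcemanager.hostname property names the host that manages the YARN cluster, the ResourceManager. The other ResourceManager address settings refer to this property by default, so they can stay unchanged.
Open yarn-site.xml:

[li@hadoop102 hadoop]$ vim yarn-site.xml

File contents: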

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
    <!-- Site specific YARN configuration properties -->

    <!-- Use the MapReduce shuffle auxiliary service -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <!-- Host of the ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop103</value>
    </property>

    <!-- Environment variables inherited by containers -->
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>

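This file sets hadoop103 as the host of the ResourceManager, the main YARN process, configures mapreduce_shuffle as the NodeManager auxiliary service, and lists the environment variables that are inherited.
Save with :wq.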
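
Setting only yarn.resourcemanager.hostname is enough because several defaults in yarn-default.xml are written in terms of it, for example yarn.resourcemanager.address as ${yarn.resourcemanager.hostname}:8032. The sketch below imitates that substitution in a few lines. It is a simplified model, not Hadoop's actual expansion code, and the names expand and resolve are illustrative.

```python
import re

# Two defaults from yarn-default.xml that reference the hostname property.
YARN_DEFAULTS = {
    "yarn.resourcemanager.address": "${yarn.resourcemanager.hostname}:8032",
    "yarn.resourcemanager.webapp.address": "${yarn.resourcemanager.hostname}:8088",
}

# The single override made in yarn-site.xml.
YARN_SITE = {"yarn.resourcemanager.hostname": "hadoop103"}

_VAR = re.compile(r"\$\{([^}]+)\}")


def expand(value, props):
    """Replace each ${name} with props[name]; unknown names are left as-is."""
    return _VAR.sub(lambda m: props.get(m.group(1), m.group(0)), value)


def resolve(defaults, site):
    """Merge site over defaults, then expand ${...} references once."""
    merged = {**defaults, **site}
    return {key: expand(val, merged) for key, val in merged.items()}


resolved = resolve(YARN_DEFAULTS, YARN_SITE)
print(resolved["yarn.resourcemanager.address"])
print(resolved["yarn.resourcemanager.webapp.address"])
```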

Other settings are described in the official Hadoop documentation: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html

4.3.5 Configuring workers

The workers file lists the worker (slave) nodes, one hostname per line.

[li@hadoop102 ~]$ vim /opt/module/hadoop-3.1.3/etc/hadoop/workers

Add the following lines to the file:

hadoop102
hadoop103
hadoop104

Note: lines in this file must not end with spaces, and the file must not contain blank lines.
Save and exit with :wq.
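
Trailing spaces and blank lines are hard to spot in an editor, so a small check can help before the file is distributed. This is a minimal sketch that works on the file's text; check_workers is a hypothetical helper, not a Hadoop command.

```python
def check_workers(text):
    """Return (line_number, problem) pairs for a workers file's contents."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.strip() == "":
            # Empty line, or a line made of whitespace only.
            problems.append((number, "blank line"))
        elif line != line.rstrip():
            problems.append((number, "trailing whitespace"))
    return problems


good = "hadoop102\nhadoop103\nhadoop104\n"
bad = "hadoop102 \n\nhadoop103\n"
print(check_workers(good))
print(check_workers(bad))
```

To check the real file, read it first, for example with open("/opt/module/hadoop-3.1.3/etc/hadoop/workers").read(), and pass the result to check_workers.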

4.3.6 Configuring Log Aggregation

Log aggregation: after an application finishes, its run logs are uploaded to HDFS.
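Benefit: the details of a run can be viewed in one place, which makes development and debugging easier.
Note: enabling log aggregation requires restarting the NodeManager, ResourceManager, and HistoryServer.
To enable log aggregation:

Edit yarn-site.xml:

[li@hadoop102 hadoop]$ vim yarn-site.xml

Add the following properties inside the <configuration> element: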

<!-- Enable log aggregation -->
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<!-- URL of the log server -->
<property>
    <name>yarn.log.server.url</name>
    <value>http://hadoop102:19888/jobhistory/logs</value>
</property>
<!-- Keep aggregated logs for 7 days -->
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
</property>

Save with :wq.
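
Two of these values are derived from other facts: the retention period is given in seconds, and the log server URL uses the same host and port as mapreduce.jobhistory.webapp.address. A small sketch of both derivations follows; the helper names retain_seconds and log_server_url are assumptions made for illustration.

```python
from datetime import timedelta


def retain_seconds(days):
    """Convert a retention period in days to seconds."""
    return int(timedelta(days=days).total_seconds())


def log_server_url(history_webapp_address):
    """Build the log server URL from the JobHistory web address (host:port)."""
    return f"http://{history_webapp_address}/jobhistory/logs"


print(retain_seconds(7))
print(log_server_url("hadoop102:19888"))
```

If the JobHistory Server is moved to another host or port, the URL should be changed to match.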

4.4 Distributing the Hadoop Configuration Across the Cluster

Distribute the files with the xsync tool:

[li@hadoop102 hadoop]$ xsync /opt/module/hadoop-3.1.3/etc/hadoop/

4.5 Checking the Distributed Files

Check the distributed files on hadoop103 and hadoop104:

[li@hadoop103 hadoop]$ cat /opt/module/hadoop-3.1.3/etc/hadoop/core-site.xml
[li@hadoop104 hadoop]$ cat /opt/module/hadoop-3.1.3/etc/hadoop/core-site.xml

Cluster configuration is complete.