1. Environment and Configuration
Environment: Ubuntu 16.04 LTS AMD64, OpenJDK 9, Hadoop-1.2.1, OpenSSH
Hadoop depends on the JDK. Make sure one is installed; for setup details see the article: Ubuntu安装Oracle JDK8 (Installing Oracle JDK 8 on Ubuntu).
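A quick way to confirm a JDK is visible on the PATH (the exact version string depends on your installation):

```shell
$ java -version
$ readlink -f "$(which java)"   # shows which JVM installation is actually in use
```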
2. Hadoop Installation and Configuration
2.1 Download Hadoop
Download Hadoop from the Tsinghua University mirror with wget and extract it to /opt:
$ cd /opt
$ wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz
$ tar -zxvf hadoop-1.2.1.tar.gz
2.2 Configure Hadoop
2.2.1 Configuration file location
$ cd /opt/hadoop-1.2.1/conf
2.2.2 Configure hadoop-env.sh
Add the installation path of the JDK to the file (OpenJDK here):
$ vim hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-9-openjdk-amd64
2.2.3 Configure core-site.xml
Configuration:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/hadoop/name</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
2.2.4 Configure hdfs-site.xml
Configuration:
<configuration>
<property>
<name>dfs.data.dir</name>
<value>/hadoop/data</value>
</property>
</configuration>
2.2.5 Configure mapred-site.xml
Configuration:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
2.2.6 Configure /etc/profile
Add the following line:
export HADOOP_HOME=/opt/hadoop-1.2.1
In the same file, append $HADOOP_HOME/bin: to the PATH variable, then save, exit, and reload the profile.
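Put together, the /etc/profile additions might look like this (a sketch; the HADOOP_HOME path matches the one used above):

```shell
$ vim /etc/profile
export HADOOP_HOME=/opt/hadoop-1.2.1
export PATH=$HADOOP_HOME/bin:$PATH
$ source /etc/profile
```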
2.3 Format the NameNode
Running the command below starts the format automatically; when it finishes, the NameNode process shuts itself down.
$ hadoop namenode -format
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = mike-virtual-machine/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 9-internal
************************************************************/
16/06/18 11:36:16 INFO util.GSet: Computing capacity for map BlocksMap
16/06/18 11:36:16 INFO util.GSet: VM type = 64-bit
16/06/18 11:36:16 INFO util.GSet: 2.0% max memory = 1048576000
16/06/18 11:36:16 INFO util.GSet: capacity = 2^21 = 2097152 entries
16/06/18 11:36:16 INFO util.GSet: recommended=2097152, actual=2097152
16/06/18 11:36:17 INFO namenode.FSNamesystem: fsOwner=root
16/06/18 11:36:17 INFO namenode.FSNamesystem: supergroup=supergroup
16/06/18 11:36:17 INFO namenode.FSNamesystem: isPermissionEnabled=true
16/06/18 11:36:17 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
16/06/18 11:36:17 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
16/06/18 11:36:17 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
16/06/18 11:36:17 INFO namenode.NameNode: Caching file names occuring more than 10 times
16/06/18 11:36:17 INFO common.Storage: Image file /hadoop/dfs/name/current/fsimage of size 110 bytes saved in 0 seconds.
16/06/18 11:36:17 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/hadoop/dfs/name/current/edits
16/06/18 11:36:17 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/hadoop/dfs/name/current/edits
16/06/18 11:36:17 INFO common.Storage: Storage directory /hadoop/dfs/name has been successfully formatted.
16/06/18 11:36:17 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at mike-virtual-machine/127.0.1.1
************************************************************/
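One way to double-check that the format succeeded is to list the storage directory reported in the log:

```shell
$ ls /hadoop/dfs/name/current
```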
3. Start the Hadoop Services
Enter the Hadoop bin directory and start Hadoop with the startup script:
$ cd /opt/hadoop-1.2.1/bin
$ ./start-all.sh
Output:
starting namenode, logging to /opt/hadoop-1.2.1/libexec/../logs/hadoop-root-namenode-mike-virtual-machine.out
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
root@localhost's password:
localhost:
localhost: starting datanode, logging to /opt/hadoop-1.2.1/libexec/../logs/hadoop-root-datanode-mike-virtual-machine.out
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
root@localhost's password:
localhost:
localhost: starting secondarynamenode, logging to /opt/hadoop-1.2.1/libexec/../logs/hadoop-root-secondarynamenode-mike-virtual-machine.out
starting jobtracker, logging to /opt/hadoop-1.2.1/libexec/../logs/hadoop-root-jobtracker-mike-virtual-machine.out
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
root@localhost's password:
localhost:
localhost: starting tasktracker, logging to /opt/hadoop-1.2.1/libexec/../logs/hadoop-root-tasktracker-mike-virtual-machine.out
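The repeated password prompts above come from start-all.sh launching each daemon over SSH. A common companion step, not part of the original walkthrough, is to set up passwordless key-based login first:

```shell
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
$ ssh localhost    # should no longer prompt for a password
```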
Running jps shows that Hadoop has started:
$ jps
2531 DataNode
2524 NameNode
2678 SecondaryNameNode
2778 JobTracker
2940 TaskTracker
3039 sun.tools.jps.Jps
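If you want to script this check, a small helper (illustrative only; `check_daemons` is not part of Hadoop) can verify that all five Hadoop 1.x daemons appear in the jps output:

```shell
# check_daemons: read jps output on stdin and report whether the five
# Hadoop 1.x daemons (NameNode, DataNode, SecondaryNameNode, JobTracker,
# TaskTracker) are all present.
check_daemons() {
  local out missing=""
  out=$(cat)
  for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # -w avoids e.g. "NameNode" matching inside "SecondaryNameNode"
    echo "$out" | grep -qw "$d" || missing="$missing $d"
  done
  if [ -z "$missing" ]; then
    echo "all daemons running"
  else
    echo "missing:$missing"
  fi
}
```

Usage: `jps | check_daemons`.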
4. Troubleshooting
4.1 /etc/profile settings not taking effect
Add the environment variables to the ~/.bashrc file instead:
$ cd ~
$ vim .bashrc
export HADOOP_HOME_WARN_SUPPRESS=1
export HADOOP_HOME=/opt/hadoop-1.2.1
export JAVA_HOME=/usr/lib/jvm/java-9-openjdk-amd64/
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$PATH
$ source .bashrc
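After sourcing, you can verify that the variables took effect (assuming the paths above):

```shell
$ echo $HADOOP_HOME
$ hadoop version    # should report 1.2.1 if PATH picks up $HADOOP_HOME/bin
```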
4.2 Config file not found
This error may occur when using OpenJDK:
Error: Config file not found: /usr/lib/jvm/java-9-openjdk-amd64/conf/management/management.properties
The cause is a missing symlink; creating the correct symlink manually fixes it:
$ cd /usr/lib/jvm/java-9-openjdk-amd64
$ ln -s lib conf
$ ls -la conf
lrwxrwxrwx 1 root root 3 Jun 16 17:24 conf -> lib
4.3 HADOOP_HOME is deprecated
Warning: $HADOOP_HOME is deprecated
Add an environment variable in ~/.bash_profile to suppress the warning:
export HADOOP_HOME_WARN_SUPPRESS=1
4.4 SSH connection refused
Caused by the SSH server not being installed:
localhost: ssh: connect to host localhost port 22: Connection refused
Install openssh-server:
$ sudo apt-get install openssh-server
Edit ssh_config:
$ vim /etc/ssh/ssh_config
StrictHostKeyChecking no       # change to no
UserKnownHostsFile /dev/null   # add this line
Edit sshd_config:
$ vim /etc/ssh/sshd_config
PermitRootLogin yes            # changed from prohibit-password
PasswordAuthentication yes     # uncommented
Restart the SSH service and check that it works:
$ sudo service ssh restart
$ ssh localhost    # check the ssh service
4.5 dpkg error during apt-get install
This error can appear when apt-get is used in multiple terminals at the same time; avoid running several apt-get sessions concurrently:
.....
.....
E: Sub-process /usr/bin/dpkg returned an error code (1)
If the problem persists, you can try rebuilding the dpkg info directory:
$ cd /var/lib/dpkg
$ sudo mv info infobak
$ sudo mkdir info
$ sudo apt-get update          # regenerates package info files
$ sudo mv ./info/* ./infobak   # merge the regenerated files back
$ sudo rm -rf ./info
$ sudo mv ./infobak ./info