Time to see how this elephant dances. Below is the full process I followed to install the JDK and Hadoop on Ubuntu 12.10.

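Note: at first I searched all over the web for a reliable way to install Hadoop, which was a frustrating process. Only later did I find that the official documentation already gives a dependable procedure. Link: Hadoop Single Node Setup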

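Part 1: Install the Java development environment (Ubuntu ships with OpenJDK: run java -version to check the version, or run sudo apt-get install java, which reports that OpenJDK is already installed)
2. Create a java directory under /usr/: sudo mkdir /usr/java
3. Copy the installer into the new directory: sudo cp /home/baron/Downloads/jdk-6u37-linux-i586.bin /usr/java
4. Make the file executable (from inside /usr/java): sudo chmod u+x jdk-6u37-linux-i586.bin
5. Run the file, also from inside /usr/java: sudo ./jdk-6u37-linux-i586.bin . At this point /usr/java/ contains the installer jdk-6u37-linux-i586.bin and the extracted folder jdk1.6.0_37.
6. Configure the JDK environment variables in profile: run sudo vi /etc/profile and append the following lines. Be very careful not to mistype them, or you may be unable to reach the desktop. If that happens, press Ctrl+Alt+F1 to switch to a text console, log in with your username and password, and correct the file as root with vi /etc/profile: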
export JAVA_HOME=/usr/java/jdk1.6.0_37
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
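
Since a typo in /etc/profile can lock you out of the desktop, one cautious option is to stage these lines in a separate file and syntax-check it first. The sketch below is my own assumption, not part of the original steps: write_java_env is a hypothetical helper name, the target path is whatever you pass in, and bash -n only parses the file without executing it.

```shell
#!/usr/bin/env bash
# Sketch: write the JDK exports to a file and syntax-check it with `bash -n`.
# write_java_env is a hypothetical helper name, not part of the JDK or Ubuntu.
write_java_env() {
    local target="$1"
    local java_home="${2:-/usr/java/jdk1.6.0_37}"
    local tmp
    tmp="$(mktemp)" || return 1

    # Unquoted heredoc: $java_home expands now, the escaped \$ stay literal.
    cat > "$tmp" <<EOF
export JAVA_HOME=$java_home
export PATH=\$JAVA_HOME/bin:\$PATH
export CLASSPATH=.:\$JAVA_HOME/lib/dt.jar:\$JAVA_HOME/lib/tools.jar
EOF

    # bash -n parses the file without running it; give up on a syntax error.
    if ! bash -n "$tmp"; then
        rm -f "$tmp"
        return 1
    fi

    # mktemp creates the file with mode 600; make it world-readable.
    chmod 644 "$tmp"
    mv "$tmp" "$target"
}

# Stage the file in a scratch location first and inspect it.
write_java_env "${TMPDIR:-/tmp}/java-env.sh"
```

Once the staged file looks right, it can be appended to /etc/profile, or copied to /etc/profile.d/ (for example as java.sh), which Ubuntu's /etc/profile sources at login. That keeps /etc/profile itself untouched.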

sudo apt-get install ssh

sudo apt-get install rsync

sudo addgroup hadoop
sudo adduser --ingroup hadoop hadoop
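2. Copy the downloaded Hadoop archive into the new user's home directory: sudo cp /home/baron/Downloads/hadoop-1.0.4-bin.tar.gz /home/hadoop/
3. Change into that directory (cd /home/hadoop/) and unpack the archive: sudo tar xzf hadoop-1.0.4-bin.tar.gz
4. Go to the directory containing hadoop-env.sh (/home/hadoop/hadoop-1.0.4/conf/) and set the following in that file: export JAVA_HOME=/usr/java/jdk1.6.0_37
5. By default Hadoop runs in Standalone Operation mode. You can test it by following the official documentation: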
By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.
The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
$ cat output/*
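
To get a feel for what the grep job computes, here is a rough local approximation using ordinary command-line tools. It is only a sketch under my own assumptions: count_matches is a hypothetical helper, and the sample file is made up for the demonstration. As far as I understand it, the Hadoop example counts each distinct match of the regular expression and lists the most frequent first, which is what this pipeline imitates.

```shell
#!/usr/bin/env bash
# Sketch: approximate the Hadoop grep example with local tools.
# count_matches is a hypothetical helper; it prints "<count> <match>" lines,
# most frequent match first.
count_matches() {
    local pattern="$1"
    shift
    # -o prints only the matched text, -h omits file names, -E is extended regex.
    grep -ohE "$pattern" "$@" | sort | uniq -c | sort -rn | awk '{print $1, $2}'
}

# Demo on a made-up sample file so the script runs without a Hadoop install.
demo_dir="$(mktemp -d)"
printf '%s\n' \
    '<name>dfs.replication</name>' \
    '<name>dfs.replication</name>' \
    '<name>dfs.name.dir</name>' > "$demo_dir/sample.xml"
count_matches 'dfs[a-z.]+' "$demo_dir"/*.xml
rm -rf "$demo_dir"
```

Running count_matches 'dfs[a-z.]+' input/*.xml in the Hadoop directory should give a result comparable to cat output/*, which makes it a quick sanity check on the job's output.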
6. Alternatively, use Pseudo-Distributed Operation mode, following the official documentation:
Pseudo-Distributed Operation
Hadoop can also be run on a single-node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.
Configuration. Use the following:
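
The configuration listings did not survive in this copy of the post. As a hedged reconstruction, the sketch below writes the three files as I recall them from the Hadoop 1.x single-node guide (conf/core-site.xml, conf/hdfs-site.xml, conf/mapred-site.xml). Please verify the property names and values against the official document before relying on them. write_pseudo_conf is a hypothetical helper name, and it writes into whatever directory you pass it rather than touching a real installation.

```shell
#!/usr/bin/env bash
# Sketch: write the three pseudo-distributed config files into a directory.
# write_pseudo_conf is a hypothetical helper; the property values are recalled
# from the Hadoop 1.x single-node guide and should be checked against it.
write_pseudo_conf() {
    local conf_dir="$1"
    mkdir -p "$conf_dir" || return 1

    # Default filesystem: a NameNode on localhost.
    cat > "$conf_dir/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

    # A single node can hold only one replica of each block.
    cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

    # JobTracker address for MapReduce jobs.
    cat > "$conf_dir/mapred-site.xml" <<'EOF'
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF
}

# Write to a scratch directory first and review the result.
write_pseudo_conf "${TMPDIR:-/tmp}/hadoop-conf-sketch"
```

After reviewing the generated files, they can be copied over the ones in /home/hadoop/hadoop-1.0.4/conf/.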






Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Your identification has been saved in /home/hadoop/.ssh/id_dsa.
Your public key has been saved in /home/hadoop/.ssh/id_dsa.pub.
The key fingerprint is:
b3:5d:c4:*** hadoop@Baron-SR25E
The key's randomart image is:
+--[ DSA 1024]----+
|       ...o E... |
|      . ...= ..  |
|       o .. +    |
|      .    *     |
|        S + o    |
|         = = .   |
|        . o o o  |
|           . o . |
|          ... .  |
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
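
If ssh localhost still asks for a password after this, one common cause is file permissions: with its default StrictModes setting, sshd ignores authorized_keys when the file or the .ssh directory is writable by other users. The sketch below tightens both. It is an assumption about a likely failure, not a step from the official guide, and fix_ssh_perms is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: tighten permissions on an .ssh directory so sshd accepts the key.
# fix_ssh_perms is a hypothetical helper name; it defaults to ~/.ssh.
fix_ssh_perms() {
    local ssh_dir="${1:-$HOME/.ssh}"
    [ -d "$ssh_dir" ] || return 1

    # Only the owner may read, write, or enter the directory.
    chmod 700 "$ssh_dir" || return 1

    # Only the owner may read or write the list of authorized keys.
    if [ -f "$ssh_dir/authorized_keys" ]; then
        chmod 600 "$ssh_dir/authorized_keys" || return 1
    fi
}
```

Call it as fix_ssh_perms "$HOME/.ssh" while logged in as the hadoop user, then try ssh localhost again.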
Format a new distributed-filesystem:
$ bin/hadoop namenode -format
12/11/10 16:25:48 INFO namenode.NameNode: STARTUP_MSG:
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = Baron-SR25E/
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.0.4
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct  3 05:13:58 UTC 2012
12/11/10 16:25:49 INFO util.GSet: VM type       = 32-bit
12/11/10 16:25:49 INFO util.GSet: 2% max memory = 17.77875 MB
12/11/10 16:25:49 INFO util.GSet: capacity      = 2^22 = 4194304 entries
12/11/10 16:25:49 INFO util.GSet: recommended=4194304, actual=4194304
12/11/10 16:25:49 INFO namenode.FSNamesystem: fsOwner=root
12/11/10 16:25:49 INFO namenode.FSNamesystem: supergroup=supergroup
12/11/10 16:25:49 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/11/10 16:25:49 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/11/10 16:25:49 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/11/10 16:25:49 INFO namenode.NameNode: Caching file names occuring more than 10 times
12/11/10 16:25:50 INFO common.Storage: Image file of size 110 saved in 0 seconds.
12/11/10 16:25:50 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
12/11/10 16:25:50 INFO namenode.NameNode: SHUTDOWN_MSG:
SHUTDOWN_MSG: Shutting down NameNode at Baron-SR25E/
Start the hadoop daemons:
$ bin/start-all.sh
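1) Switch to superuser mode by entering "su -":
    su -  
You will be prompted for the superuser password; once it is accepted you are in superuser (root) mode. Note the "-" here, which makes this different from plain su. With "su" you only switch to root while keeping the current user's environment variables, whereas "su -" also loads root's environment, just as if root had logged in directly.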
    chmod u+w /etc/sudoers  
    vi /etc/sudoers
Enter edit mode, find the root line shown first below, and add the hadoop line beneath it:
    root ALL=(ALL:ALL) ALL  
    hadoop ALL=(ALL:ALL) ALL  
    chmod u-w /etc/sudoers
Browse the web interface for the NameNode and the JobTracker; by default they are available at:
    NameNode - http://localhost:50070/
    JobTracker - http://localhost:50030/
Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input
Run some of the examples provided:
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*
View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*
When you're done, stop the daemons with:
$ bin/stop-all.sh





export HADOOP=/home/hadoop/hadoop-1.0.4

export PATH=$HADOOP/bin:$PATH


