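• Why compile Hadoop: the release on the official site is a prebuilt binary. Hadoop does a lot of file I/O, and not all of it can be implemented in pure Java; parts rely on native C/C++ code built into shared libraries (.so files). Compiling Hadoop yourself lets it pick up native support for some compression tools.
  • After watching the 尚硅谷 (Atguigu) Hadoop video course, I decided to compile hadoop-3.1.3 myself. After downloading and unpacking the source from the official site, I found a start-build-env.sh script. It builds a Docker image from the Dockerfile under dev-support/docker, and the tools needed to compile Hadoop are installed while that image is built.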
FROM centos:7
WORKDIR /root
ADD jdk-8u291-linux-x64.tar.gz /opt/
# Note: the JDK tarball must be placed in dev-support/docker; adjust the file name to match your download
RUN mv /opt/jdk1.8.0_291 /opt/jdk8
ENV JAVA_HOME /opt/jdk8
ENV CLASSPATH $JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
ENV PATH=$PATH:$JAVA_HOME/bin

RUN yum update -y && \
    yum install -y java-1.8.0-openjdk \
    gcc* \
    make \
    snappy* \
    bzip2* \
    lzo* \
    zlib* \
    openssl* \
    svn \
    ncurses* \
    autoconf \
    automake \
    libtool \
    epel-release \
    *zstd* \
    gcc-c++ \
    bats \
    ShellCheck \
    python3 \
    sudo \
    fuse3 \
    fuse3-devel \
    doxygen \
    git \
    rsync \
    patch \
    vim

# Install cmake 3.20.0
RUN mkdir -p /opt/cmake && \
    curl -L -s -S \
    https://cmake.org/files/v3.20/cmake-3.20.0-linux-x86_64.tar.gz \
    -o /opt/cmake.tar.gz && \
    tar xzf /opt/cmake.tar.gz --strip-components 1 -C /opt/cmake
ENV CMAKE_HOME /opt/cmake
ENV PATH "${PATH}:/opt/cmake/bin"

######
# Install Google Protobuf 2.5.0
RUN mkdir -p /opt/protobuf-src && \
    curl -L -s -S \
    https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz \
    -o /opt/protobuf.tar.gz && \
    tar xzf /opt/protobuf.tar.gz --strip-components 1 -C /opt/protobuf-src
RUN cd /opt/protobuf-src && ./configure --prefix=/opt/protobuf && make install
ENV PROTOBUF_HOME /opt/protobuf
ENV PATH "${PATH}:/opt/protobuf/bin"

# RUN curl -L -s https://rpm.nodesource.com/setup_10.x | bash - && \
# RUN yum install -y nodejs && \
#     npm config set registry https://registry.npm.taobao.org
# RUN npm install -g n && \
#     n lts && PATH="$PATH" && \
#     npm install -g bower && \
#     npm install -g ember-cli
RUN yum install -y wget && \
    mkdir -p /opt/nodejs && \
    wget -O /opt/nodejs.tar.xz https://npm.taobao.org/mirrors/node/v14.17.5/node-v14.17.5-linux-x64.tar.xz && \
    tar xf /opt/nodejs.tar.xz --strip-components 1 -C /opt/nodejs && \
    ln -s /opt/nodejs/bin/npm /usr/local/bin && \
    ln -s /opt/nodejs/bin/node /usr/local/bin && \
    npm config set registry https://registry.npm.taobao.org && \
    npm install -g bower && \
    npm install -g ember-cli

RUN pip3 install pylint==2.6.0 python-dateutil==2.8.1 -i https://pypi.doubanio.com/simple

RUN mkdir -p /opt/maven && \
    curl -L -s -S \
    https://dlcdn.apache.org/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.tar.gz \
    -o /opt/maven.tar.gz && \
    tar xzf /opt/maven.tar.gz --strip-components 1 -C /opt/maven
ENV MAVEN_HOME /opt/maven
ENV PATH "${PATH}:/opt/maven/bin"

RUN mkdir -p /opt/isa-l-src \
    && yum install -y automake yasm libtool \
    && curl -L -s -S \
    https://github.com/intel/isa-l/archive/v2.29.0.tar.gz \
    -o /opt/isa-l.tar.gz \
    && tar xzf /opt/isa-l.tar.gz --strip-components 1 -C /opt/isa-l-src \
    && cd /opt/isa-l-src \
    && ./autogen.sh \
    && ./configure \
    && make "-j$(nproc)" \
    && make install \
    && cd /root \
    && rm -rf /opt/isa-l-src

# Avoid out of memory errors in builds
ENV MAVEN_OPTS -Xms512m -Xmx3072m
# ENV MAVEN_OPTS -Xms256m -Xmx1536m

# Add a welcome message and environment checks.
ADD hadoop_env_checks.sh /root/hadoop_env_checks.sh
RUN chmod 755 /root/hadoop_env_checks.sh
RUN echo '~/hadoop_env_checks.sh' >> /root/.bashrc
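
The Dockerfile above ADDs two local files, so the image build stops early if either one is missing from the build context. A minimal pre-flight sketch, assuming the build context is dev-support/docker as noted in the Dockerfile comment. The function name check_build_context is hypothetical, and the JDK file name should match whatever your ADD line uses:

```shell
# Hypothetical helper: verify that the files the Dockerfile ADDs exist
# in the given build-context directory. Returns non-zero if any is missing.
check_build_context() {
  ctx=$1
  missing=0
  # File names are taken from the ADD lines of the Dockerfile above.
  for f in jdk-8u291-linux-x64.tar.gz hadoop_env_checks.sh; do
    if [ ! -f "$ctx/$f" ]; then
      echo "missing: $ctx/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, from the source root you might run check_build_context dev-support/docker before starting ./start-build-env.sh.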

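I created a source_code folder and put the hadoop-3.1.3-src folder inside it. By default, start-build-env.sh uses the current directory (PWD) to mount hadoop-3.1.3-src into the container. To make the container reusable (for example, for compiling Hive or Spark later, which need a similar environment), replace the default line in the script

  -v "${PWD}:/home/${USER_NAME}/hadoop${V_OPTS:-}" \

with

  -v "/xxx/source_code/:/home/${USER_NAME}/source${V_OPTS:-}" \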
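
The substitution described above can also be scripted. A rough sketch, assuming the script contains the default -v line exactly as quoted. remount_source is a hypothetical helper, and /xxx/source_code/ remains a placeholder for your own path:

```shell
# Hypothetical helper: rewrite the default volume mount in start-build-env.sh
# so that a fixed host directory is mounted at ~/source inside the container.
# Assumption: host_dir contains neither '|' nor '&' (both are special to sed here).
remount_source() {
  script=$1
  host_dir=$2
  tmp=$(mktemp) || return 1
  if sed 's|-v "\${PWD}:/home/\${USER_NAME}/hadoop\${V_OPTS:-}"|-v "'"$host_dir"':/home/${USER_NAME}/source${V_OPTS:-}"|' \
      "$script" > "$tmp"; then
    # Write back through cat so the script keeps its executable bit.
    cat "$tmp" > "$script"
    status=$?
  else
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}
```

A call such as remount_source start-build-env.sh /xxx/source_code/ edits the script in place; keeping a backup copy first is a sensible precaution.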


Building distributions:

Create binary distribution without native code and without documentation:

$ mvn package -Pdist -DskipTests -Dtar -Dmaven.javadoc.skip=true

Create binary distribution with native code and with documentation:

$ mvn package -Pdist,native,docs -DskipTests -Dtar

Create source distribution:

$ mvn package -Psrc -DskipTests

Create source and binary distributions with native code and documentation:

$ mvn package -Pdist,native,docs,src -DskipTests -Dtar

Create a local staging version of the website (in /tmp/hadoop-site)

$ mvn clean site -Preleasedocs; mvn site:stage -DstagingDirectory=/tmp/hadoop-site

Note that the site needs to be built in a second pass after other artifacts.

Inside the container, change to the mounted source directory:

cd ../source
cd hadoop-3.1.3-src

Recommended build command:

$ mvn package -Pdist,native,docs -DskipTests -Dtar


[ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M1:enforce (enforce-banned-dependencies) on project hadoop-client-check-test-invariants: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed. -> [Help 1]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :hadoop-client-check-test-invariants
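
Maven's final hint line names the module to resume from. If the build output was saved to a file (an assumption; the run above may not have been logged), a small helper can pull that name out. failed_module and build.log are hypothetical names:

```shell
# Hypothetical helper: print the module named in Maven's
# "mvn <args> -rf :<module>" resume hint, taken from a saved build log.
# Prints nothing if the log contains no such hint.
failed_module() {
  log=$1
  sed -n 's/.*-rf :\([^[:space:]]*\).*/\1/p' "$log" | tail -n 1
}
```

Once the cause is fixed, the hint translates to something like mvn package -Pdist,native,docs -DskipTests -Dtar -rf :"$(failed_module build.log)", which skips the modules that already built.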

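The hadoop-client-check-test-invariants module named in the error above fails frequently when building various Hadoop versions. For example, with hadoop-3.3.1 (the latest release at the time), the provided script builds the container without any Dockerfile changes, yet this module still fails to compile. The workaround is to comment the module out of the <modules> list (in the Hadoop source tree this list is in hadoop-client-modules/pom.xml):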

  <modules>
    <!-- Left as an empty artifact w/dep for compat -->
    <module>hadoop-client</module>
    <!-- Should be used at compile scope for access to IA.Public classes -->
    <module>hadoop-client-api</module>
    <!-- Should be used at runtime scope for remaining classes necessary for hadoop-client-api to function -->
    <module>hadoop-client-runtime</module>
    <!-- Should be used at test scope for those that need access to mini cluster that works with above api and runtime -->
    <module>hadoop-client-minicluster</module>
    <!-- Checks invariants above -->
    <module>hadoop-client-check-invariants</module>
    <!-- <module>hadoop-client-check-test-invariants</module> -->
    <!-- Attempt to use the created libraries -->
    <module>hadoop-client-integration-tests</module>
  </modules>
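
The same edit can be made from the command line. A sketch under two assumptions: the module list is in hadoop-client-modules/pom.xml, and each <module> element sits on its own line in that file. disable_module is a hypothetical helper:

```shell
# Hypothetical helper: wrap one <module> line of a pom.xml in an XML comment.
# Lines that are already commented out do not match, so re-running is harmless.
disable_module() {
  pom=$1
  module=$2
  tmp=$(mktemp) || return 1
  if sed 's|^\([[:space:]]*\)\(<module>'"$module"'</module>\)[[:space:]]*$|\1<!-- \2 -->|' \
      "$pom" > "$tmp"; then
    cat "$tmp" > "$pom"
    status=$?
  else
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}
```

For instance, disable_module hadoop-client-modules/pom.xml hadoop-client-check-test-invariants leaves the similarly named hadoop-client-check-invariants entry untouched, because the whole element must match.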

Then re-run the build:

$ mvn package -Pdist,native,docs -DskipTests -Dtar
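
To confirm that the resulting distribution actually contains native code, the shared objects can be listed. A sketch, assuming the dist profile places them under hadoop-dist/target/hadoop-3.1.3/lib/native (that path is an assumption; adjust it to your tree). list_native_libs is a hypothetical helper:

```shell
# Hypothetical helper: print the names of shared objects (*.so and versioned
# *.so.N files) found directly inside the given directory.
list_native_libs() {
  dir=$1
  for f in "$dir"/*.so*; do
    # When nothing matches, the glob stays literal; skip that case.
    if [ -e "$f" ]; then
      basename "$f"
    fi
  done
}
```

An empty result from list_native_libs hadoop-dist/target/hadoop-3.1.3/lib/native would suggest the native profile did not take effect. Once the distribution is installed, hadoop checknative -a reports which native libraries were loaded.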



