Building and Running Spark Standalone

Posted by AlstonWilliams on February 17, 2019

You can't read the source code without first building and running it.

The source version I chose is Spark 2.4.0-SNAPSHOT.

Building is simple. Just run the following command from the Spark source directory:

./build/mvn -DskipTests clean package

The build takes a long time and uses a lot of CPU, so I suggest leaving the computer on overnight and letting it finish while you sleep.
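When you come back in the morning, one quick way to confirm the build actually finished is to check that the assembly jars exist. This is a minimal sketch. It assumes you run it from the Spark source root and that a Spark 2.4 build puts its jars under `assembly/target/scala-<version>/jars`, the directory `bin/spark-class` falls back to when there is no top-level `jars` directory. `check_build` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Check that the Maven build produced the assembly jars.
# Assumption: a Spark 2.4 source tree, where the jars land in
# assembly/target/scala-<version>/jars after "mvn package".
check_build() {
  local root="${1:-.}"
  local jars
  # Pick the first scala-* jars directory, if any exists
  jars=$(ls -d "$root"/assembly/target/scala-*/jars 2>/dev/null | head -n 1)
  if [ -z "$jars" ]; then
    echo "no assembly jars found under $root; did the build finish?" >&2
    return 1
  fi
  echo "$jars: $(ls "$jars" | wc -l) jars"
}

# Run against the current directory (the Spark source root)
check_build . || true
```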

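Once the build is done, we can run it. To keep debugging simple, we only run Standalone mode here. That way there is no need to install the Hadoop stack, Mesos, or anything like that.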

Running in Standalone mode is also simple.

First, start the Spark master:

sbin/start-master.sh

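Next, look in the master's log (under the logs directory). It contains the master URL, which has the form spark://<hostname>:7077.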

Remember this URL, because we will need it several more times.
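If the URL has already scrolled past, you can pull it out of the log with grep. This sketch makes a few assumptions. It expects the default log directory `$SPARK_HOME/logs` and the `.out` file naming used by `sbin/spark-daemon.sh`. It also expects the master to log a line of the form `Starting Spark master at spark://host:port`. `master_url_from_log` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print the master URL from the newest Standalone master log file.
# Assumptions: logs are in $SPARK_HOME/logs, and the master logs
# "Starting Spark master at spark://<host>:<port>" on startup.
master_url_from_log() {
  local log_dir="$1"
  local latest
  # ls -t lists newest first; the glob only matches master logs
  latest=$(ls -t "$log_dir"/spark-*-org.apache.spark.deploy.master.Master-*.out 2>/dev/null | head -n 1)
  [ -n "$latest" ] || return 1
  # Keep only the URL part of the most recent startup line
  grep -h 'Starting Spark master at' "$latest" | tail -n 1 | grep -o 'spark://[^ ]*'
}

master_url_from_log "${SPARK_HOME:-.}/logs" || echo "master log not found" >&2
```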

Then start a slave (worker) node:

sbin/start-slave.sh spark://alstonwilliams:7077

The argument to start-slave.sh is the master URL. Replace it with your own.
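If you would rather not copy the URL by hand, you can rebuild the default one the way `sbin/start-master.sh` picks it. The host is `$SPARK_MASTER_HOST` or, if that is unset, the output of `hostname -f`. The port is `$SPARK_MASTER_PORT` or 7077. This is a sketch under that assumption. Unlike the real scripts, it does not read `conf/spark-env.sh`, so the URL in the master log remains the authoritative one. `default_master_url` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Rebuild the default Standalone master URL (assumption: mirrors the
# defaults in sbin/start-master.sh, ignoring conf/spark-env.sh).
default_master_url() {
  local host="${SPARK_MASTER_HOST:-$(hostname -f 2>/dev/null || hostname)}"
  local port="${SPARK_MASTER_PORT:-7077}"
  echo "spark://$host:$port"
}

# Only start a worker when run from the Spark directory
if [ -x sbin/start-slave.sh ]; then
  sbin/start-slave.sh "$(default_master_url)"
fi
```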

Then edit the configuration file spark-defaults.conf (in the conf directory) so that it looks like this:

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.

# Example:
spark.master                     spark://alstonwilliams:7077
spark.eventLog.enabled           true
spark.eventLog.dir               file:///tmp/spark-events
# spark.serializer                 org.apache.spark.serializer.KryoSerializer
# spark.driver.memory              5g
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"

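spark.master tells the application how to find the master. spark.eventLog.enabled and spark.eventLog.dir work together with the HistoryServer. If they are not set, the application writes no event logs, and the applications we have run will not show up in the HistoryServer.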
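If you prefer a stripped-down file, the three active settings can also be written with a heredoc. This is a minimal sketch. `write_spark_defaults` is a hypothetical helper, and it overwrites any existing `spark-defaults.conf` in the target directory, license header and commented examples included.

```shell
#!/usr/bin/env bash
# Write a minimal spark-defaults.conf containing only the settings above.
# WARNING: overwrites <conf_dir>/spark-defaults.conf.
write_spark_defaults() {
  local conf_dir="$1" master_url="$2"
  mkdir -p "$conf_dir"
  cat > "$conf_dir/spark-defaults.conf" <<EOF
spark.master                     $master_url
spark.eventLog.enabled           true
spark.eventLog.dir               file:///tmp/spark-events
EOF
}
```

Usage, from the Spark directory: `write_spark_defaults conf spark://alstonwilliams:7077` (use your own master URL).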

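Also note that the directory given by spark.eventLog.dir must already exist, or the HistoryServer will report an error at startup. Strictly speaking, the HistoryServer reads from spark.history.fs.logDirectory, not spark.eventLog.dir. That setting defaults to file:/tmp/spark-events, which is the same directory configured here.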
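One way to make sure the directory exists is to read it straight from the config file and create it. This sketch assumes the value is a local path, optionally prefixed with `file://`. An HDFS path, for example, would need `hdfs dfs -mkdir` instead. `ensure_event_log_dir` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Create the directory named by spark.eventLog.dir in a spark-defaults.conf.
# Assumption: the value is a local path, possibly prefixed with file://
ensure_event_log_dir() {
  local conf_file="$1"
  local dir
  # Commented lines start with "#", so $1 never matches the key for them
  dir=$(awk '$1 == "spark.eventLog.dir" {print $2}' "$conf_file" | tail -n 1)
  [ -n "$dir" ] || { echo "spark.eventLog.dir not set in $conf_file" >&2; return 1; }
  dir="${dir#file://}"   # file:///tmp/spark-events -> /tmp/spark-events
  mkdir -p "$dir" && echo "$dir"
}

if [ -f conf/spark-defaults.conf ]; then
  ensure_event_log_dir conf/spark-defaults.conf
fi
```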

With all of that done, start a HistoryServer with sbin/start-history-server.sh, and we're ready to have some fun.
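The start script returns before the web UI is ready, so a small wait loop can help when scripting this. This sketch assumes the HistoryServer UI stays on its default port 18080 (`spark.history.ui.port`). It probes the port with bash's `/dev/tcp` pseudo-device, which only works under bash. `port_open` and `wait_for_port` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Start the HistoryServer and wait until its web UI port accepts connections.
# Assumption: spark.history.ui.port is left at its default, 18080.
port_open() {
  # bash's /dev/tcp/<host>/<port>: succeeds only if something is listening
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

wait_for_port() {
  local host="$1" port="$2" tries="${3:-30}"
  for _ in $(seq "$tries"); do
    port_open "$host" "$port" && return 0
    sleep 1
  done
  return 1
}

if [ -x sbin/start-history-server.sh ]; then
  sbin/start-history-server.sh
  wait_for_port localhost 18080 && echo "HistoryServer UI: http://localhost:18080"
fi
```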

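One more thing: the Spark Master web UI uses port 8080 by default, and the Spark Worker web UI uses 8081. If you are also developing web applications, these two ports are quite likely already taken. You can set different ports in conf/spark-env.sh. On my machine, I set them to 9090 and 9091 respectively.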
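The exact lines from my setup are not reproduced here. What follows is a sketch using the two variables documented in `conf/spark-env.sh.template`, `SPARK_MASTER_WEBUI_PORT` and `SPARK_WORKER_WEBUI_PORT`. If `conf/spark-env.sh` doesn't exist yet, copy it from that template first.

```shell
# conf/spark-env.sh: move the web UIs off the default ports 8080/8081
export SPARK_MASTER_WEBUI_PORT=9090   # Spark Master web UI
export SPARK_WORKER_WEBUI_PORT=9091   # Spark Worker web UI
```

The new ports only take effect after a restart. Stop the daemons with sbin/stop-slave.sh and sbin/stop-master.sh, then start them again as above. The Master UI should then be at http://localhost:9090.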