1. Prerequisites
CentOS 7, JDK 1.8, hive-2.3.6, hadoop-2.7.7, spark-2.0.0-bin-hadoop2-without-hive
2. Background
2.1 Compiling Spark manually
Spark download: https://archive.apache.org/dist/spark/spark-2.0.0/
The source package is only about 12 MB. After downloading, extract it and compile it without the Hive module.
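The original post does not show the exact build command. As one possible sketch, Spark's bundled dev/make-distribution.sh script can produce a Hive-free distribution; the profile names and Hadoop version below are assumptions chosen to match the environment listed in the prerequisites:

```shell
# Sketch: build a Spark 2.0.0 distribution without Hive support
# (flags are assumptions; adjust to your environment)
cd spark-2.0.0
./dev/make-distribution.sh \
  --name hadoop2-without-hive \
  --tgz \
  -Pyarn \
  -Phadoop-2.7 \
  -Dhadoop.version=2.7.7 \
  -Pparquet-provided
```

The resulting spark-2.0.0-bin-hadoop2-without-hive.tgz is what gets deployed in the steps below.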
2.2 Pre-built gz package
Link: https://pan.baidu.com/s/15dkfDMc6CB0-oifQUy9OA
Extraction code: 6y4e
3. Switching Hive's engine to Spark
3.1 hive-site.xml
Add the following properties on top of the existing configuration:
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>
<property>
  <name>hive.enable.spark.execution.engine</name>
  <value>true</value>
</property>
<property>
  <name>spark.home</name>
  <value>/opt/software/spark-2.0.0</value>
</property>
<property>
  <name>spark.master</name>
  <value>yarn</value>
</property>
<property>
  <name>spark.eventLog.enabled</name>
  <value>true</value>
</property>
<property>
  <name>spark.eventLog.dir</name>
  <value>hdfs://hacluster:8020/spark-hive-jobhistory</value>
</property>
<property>
  <name>spark.executor.memory</name>
  <value>512m</value>
</property>
<property>
  <name>spark.driver.memory</name>
  <value>512m</value>
</property>
<property>
  <name>spark.serializer</name>
  <value>org.apache.spark.serializer.KryoSerializer</value>
</property>
<property>
  <name>spark.yarn.jars</name>
  <value>hdfs://hacluster:8020/spark-jars/*</value>
</property>
<property>
  <name>hive.spark.client.server.connect.timeout</name>
  <value>300000</value>
</property>
3.2 spark-env.sh
export JAVA_HOME=/opt/moudle/jdk1.8.0_191
export SCALA_HOME=/opt/moudle/scala-2.11.12
export HADOOP_HOME=/opt/software/hadoop-2.7.7
export HADOOP_CONF_DIR=/opt/software/hadoop-2.7.7/etc/hadoop
export HADOOP_YARN_CONF_DIR=/opt/software/hadoop-2.7.7/etc/hadoop
export SPARK_HOME=/opt/software/spark-2.0.0
export SPARK_WORKER_MEMORY=512m
export SPARK_EXECUTOR_MEMORY=512m
export SPARK_DRIVER_MEMORY=512m
export SPARK_DIST_CLASSPATH=$(/opt/software/hadoop-2.7.7/bin/hadoop classpath)
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=hadoop01:2181,hadoop02:2181,hadoop03:2181 -Dspark.deploy.zookeeper.dir=/ha-on-spark"
3.3 slaves
[xiaokang@hadoop01 conf]$ cp slaves.template slaves
hadoop01
hadoop02
hadoop03
3.4 Copy JARs and XML files
Copy the following JARs from Hive's lib directory to Spark's jars directory:
hive-beeline-2.3.6.jar
hive-cli-2.3.6.jar
hive-exec-2.3.6.jar
hive-jdbc-2.3.6.jar
hive-metastore-2.3.6.jar
[xiaokang@hadoop01 lib]$ cp hive-beeline-2.3.6.jar hive-cli-2.3.6.jar hive-exec-2.3.6.jar hive-jdbc-2.3.6.jar hive-metastore-2.3.6.jar /opt/software/spark-2.0.0/jars/
Copy the following JARs from Spark's jars directory to Hive's lib directory:
spark-network-common_2.11-2.0.0.jar
spark-core_2.11-2.0.0.jar
scala-library-2.11.8.jar
chill-java
chill
jackson-module-paranamer
jackson-module-scala
jersey-container-servlet-core
jersey-server
json4s-ast
kryo-shaded
minlog
scala-xml
spark-launcher
spark-network-shuffle
spark-unsafe
xbean-asm5-shaded
[xiaokang@hadoop01 jars]$ cp spark-network-common_2.11-2.0.0.jar spark-core_2.11-2.0.0.jar scala-library-2.11.8.jar chill-java-0.8.0.jar chill_2.11-0.8.0.jar jackson-module-paranamer-2.6.5.jar jackson-module-scala_2.11-2.6.5.jar jersey-container-servlet-core-2.22.2.jar jersey-server-2.22.2.jar json4s-ast_2.11-3.2.11.jar kryo-shaded-3.0.3.jar minlog-1.3.0.jar scala-xml_2.11-1.0.2.jar spark-launcher_2.11-2.0.0.jar spark-network-shuffle_2.11-2.0.0.jar spark-unsafe_2.11-2.0.0.jar xbean-asm5-shaded-4.4.jar /opt/software/hive-2.3.6/lib/
Copy yarn-site.xml and hdfs-site.xml from Hadoop, along with Hive's hive-site.xml, into Spark's conf directory:
[xiaokang@hadoop01 ~]$ cp /opt/software/hadoop-2.7.7/etc/hadoop/hdfs-site.xml /opt/software/hadoop-2.7.7/etc/hadoop/yarn-site.xml /opt/software/hive-2.3.6/conf/hive-site.xml /opt/software/spark-2.0.0/conf/
3.5 Upload to HDFS
So that every node can use the Spark engine for computation, upload all of the dependency JARs in Spark's jars directory to HDFS.
# Create a directory on HDFS to hold the Spark dependency JARs
[xiaokang@hadoop01 ~]$ hdfs dfs -mkdir /spark-jars
# Upload all dependency JARs
[xiaokang@hadoop01 ~]$ hdfs dfs -put /opt/software/spark-2.0.0/jars/*.jar /spark-jars
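To double-check the upload succeeded, you can compare the JAR count on HDFS against the local jars directory:

```shell
# Count the JARs uploaded to HDFS (first line of -ls output is a summary)
hdfs dfs -ls /spark-jars | wc -l
# Count the local JARs for comparison
ls /opt/software/spark-2.0.0/jars/*.jar | wc -l
```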
3.6 Distribute to the other nodes
[xiaokang@hadoop01 ~]$ distribution.sh /opt/software/spark-2.0.0
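Note that distribution.sh is a custom helper script, not something that ships with Hadoop or Spark. A minimal sketch of what such a script might look like, assuming passwordless SSH and the hadoop02/hadoop03 hostnames used above:

```shell
#!/usr/bin/env bash
# distribution.sh (hypothetical sketch): sync a directory to the other cluster nodes
# Usage: distribution.sh /opt/software/spark-2.0.0
set -e
dir="$1"
for host in hadoop02 hadoop03; do
  # Copy the directory to the same parent path on each remote host
  rsync -a "$dir" "$host:$(dirname "$dir")/"
done
```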
3.7 Start the HA Spark cluster
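The original post gives no commands for this step. Assuming ZooKeeper is already running on hadoop01–03 (as configured via SPARK_DAEMON_JAVA_OPTS above), a typical startup for a standalone cluster with a standby master might look like:

```shell
# On hadoop01: start the master plus all workers listed in slaves
/opt/software/spark-2.0.0/sbin/start-all.sh
# On hadoop02: start a standby master; ZooKeeper handles failover
/opt/software/spark-2.0.0/sbin/start-master.sh
```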
4. Testing
# Start the hiveserver2 service
[xiaokang@hadoop01 ~]$ nohup hiveserver2 >/dev/null 2>&1 &
# Count each user's logins per day
select userid,dt,count(*) as loginTimes
from game_login
group by dt,userid;
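With hiveserver2 running, the query above can be submitted through Beeline; the connection URL and user below are assumptions based on the hostnames and account used throughout this post:

```shell
# Connect to hiveserver2 and run the test query on the Spark engine
beeline -u jdbc:hive2://hadoop01:10000 -n xiaokang \
  -e "select userid, dt, count(*) as loginTimes from game_login group by dt, userid;"
```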
The Spark UI is shown below:
The YARN UI is shown below: