Spark 2.3.3: The 14 Ways to Create a DataFrame, with Source Code Analysis (Part 2) [The Most Complete Guide on the Web]

Contents

1. Troubleshooting: Creating a DataFrame from Hive in SparkSQL

Problem 1:
Problem 2:
Problem 3:
Problem 4:
Problem 5:
Problem 6:

2. Code: Creating a DataFrame from Hive in SparkSQL

3. Data and Results



1. Troubleshooting: Creating a DataFrame from Hive in SparkSQL

Problem 1:
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException:
Table or view 'stu' not found in database 'default';

Analysis: indeed, no temporary view named stu exists, and Hive support has not been enabled, so Spark cannot resolve the table.
Solution: enable Hive support when building the SparkSession:
val spark: SparkSession = SparkSession.builder()
  .appName("SparkUtils")
  .master("local[*]")
  .enableHiveSupport() // enable Hive support
  .getOrCreate()
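
As the analysis notes, the name would also resolve without Hive if it matched a temporary view. A minimal sketch of that alternative (my own illustration, not from the original post):

// Minimal sketch (assumption): register a temporary view so 'stu' resolves
// without any Hive metastore at all.
import spark.implicits._

val df = Seq((1, "xiaohui"), (2, "xiaowang"), (3, "xiaoyu")).toDF("id", "name")
df.createOrReplaceTempView("stu")
spark.sql("select * from stu").show()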

Problem 2:
hive> show databases;
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

Analysis and solution: SparkSQL ships with Hive 1.2.0 as its built-in client, so each time Spark connects, the schema version recorded in the metastore is reset to the bundled 1.2.0, while the locally installed Hive is 2.3.1; hence the error.
Fix: change the version recorded in the metastore back to 2.3.1.
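
A complementary workaround (my own sketch, not from the original post) is to stop Spark's bundled Hive 1.2.0 client from verifying or rewriting the metastore schema version. Both properties below are standard Hive settings and can also be placed in hive-site.xml:

import org.apache.spark.sql.SparkSession

// Minimal sketch (assumption): relax schema verification so the Hive 1.2.0
// client bundled with Spark neither rejects the 2.3.1 schema nor writes its
// own version back into the metastore.
val spark: SparkSession = SparkSession.builder()
  .appName("MetastoreVersionWorkaround")
  .master("local[*]")
  .config("hive.metastore.schema.verification", "false")
  .config("hive.metastore.schema.verification.record.version", "false")
  .enableHiveSupport()
  .getOrCreate()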

Problem 3:
Exception in thread "main" java.lang.IllegalArgumentException:
Unable to instantiate SparkSession with Hive support because Hive classes are not found.

Analysis: the jar that lets Spark talk to Hive is not on the classpath.
Solution: add the spark-hive dependency (the _2.11 suffix must match the project's Scala version):
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.3.3</version>
</dependency>

Problem 4:
19/12/31 11:00:53 INFO HiveMetaStore: 0: get_table : db=default tbl=stu
19/12/31 11:00:53 INFO audit: ugi=yuhui ip=unknown-ip-addr cmd=get_table : db=default tbl=stu
Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or view not found: stu; line 2 pos 14

Analysis:
core-site.xml, hdfs-site.xml, and hive-site.xml are missing from the classpath, so Spark falls back to a local, empty embedded metastore and cannot find the table.

Solution: copy the following files into the project's resources directory:
/usr/app/hadoop-2.8.5/etc/hadoop/core-site.xml
/usr/app/hadoop-2.8.5/etc/hadoop/hdfs-site.xml
/usr/app/apache-hive-2.3.1-bin/conf/hive-site.xml
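
To confirm Spark actually picks the files up, a quick check helps (my own sketch, not from the original post):

import org.apache.spark.sql.SparkSession

// Minimal sketch (assumption): with the three XML files in src/main/resources,
// Spark should see the real cluster and metastore instead of local defaults.
val spark: SparkSession = SparkSession.builder()
  .appName("ConfigCheck")
  .master("local[*]")
  .enableHiveSupport()
  .getOrCreate()

// core-site.xml present: prints the cluster URI (hdfs://...), not file:///
println(spark.sparkContext.hadoopConfiguration.get("fs.defaultFS"))

// hive-site.xml present: lists the databases from the real metastore.
spark.sql("show databases").show()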

Problem 5:
Caused by: org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException:
The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH.
Please check your CLASSPATH specification, and the name of the driver.

Solution: add the MySQL JDBC driver to the project:

<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.39</version>
</dependency>
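
Version matters here: the 5.1.x connector provides exactly the com.mysql.jdbc.Driver class named in hive-site.xml, while the 8.x connector deprecates that name in favour of com.mysql.cj.jdbc.Driver. A quick smoke test (my own sketch, not from the original post):

// Minimal sketch (assumption): verify the driver class from the error message
// can now be loaded from the classpath.
Class.forName("com.mysql.jdbc.Driver")
println("MySQL JDBC driver found on the classpath")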

Problem 6:
Exception in thread "main" org.apache.hadoop.security.AccessControlException:
Permission denied: user=yuhui, access=READ_EXECUTE, inode="/user/hive/warehouse/stu":root:supergroup:drwx-wx-wx
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:318)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:225)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:189)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1663)


Method 1: add this line to the code: System.setProperty("HADOOP_USER_NAME", "root") (HDFS usernames are case-sensitive, and the warehouse directory above is owned by root, so use lowercase "root").
Method 2: open up permissions on the table directory: hadoop fs -chmod -R 777 /user/hive/warehouse/stu

2. Code: Creating a DataFrame from Hive in SparkSQL

package blog

import org.apache.spark.sql.{DataFrame, SparkSession}

/**
 * @author: 余辉
 * @create: 2019-12-31 10:31
 * @description: create a DataFrame by querying a Hive table
 **/
object DF03_Create_Hive {

  def main(args: Array[String]): Unit = {

    // Run HDFS operations as the warehouse owner (see Problem 6).
    System.setProperty("HADOOP_USER_NAME", "root")

    val spark: SparkSession = SparkSession.builder()
      .appName("SparkUtils")
      .master("local[*]")
      .enableHiveSupport() // required to query Hive tables (see Problem 1)
      .getOrCreate()

    // Query the Hive table and print the resulting DataFrame.
    spark.sql(
      """
        |select * from stu
        |""".stripMargin).show()
  }
}

3. Data and Results

create table stu(id int, name string)
row format delimited fields terminated by ',';

load data local inpath '/root/data/stu.txt' overwrite into table stu;

1,xiaohui
2,xiaowang
3,xiaoyu
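
With those three rows loaded, spark.sql("select * from stu").show() should print output along these lines (reconstructed by hand, not copied from a run):

+---+--------+
| id|    name|
+---+--------+
|  1| xiaohui|
|  2|xiaowang|
|  3|  xiaoyu|
+---+--------+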
