Writing Hudi from Spark: NoSuchMethodError: org.apache.jetty.server.session.SessionHandler.setHttpOnly(Z)V

Environment

Hudi version : 0.9.0
Spark version : 3.1.2
Hive version : 2.1.1-cdh6.3.2
Hadoop version : 3.0.0-cdh6.3.2

Error log:

scala> df.write.format("hudi").
|   options(getQuickstartWriteConfigs).
|   option(PRECOMBINE_FIELD_OPT_KEY, "ts").
|   option(RECORDKEY_FIELD_OPT_KEY, "uuid").
|   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
|   option(TABLE_NAME, tableName).
|   mode(Overwrite).
|   save(basePath)
warning: there was one deprecation warning; for details, enable `:setting -deprecation' or `:replay -deprecation'
21/09/18 16:52:33 WARN hudi.HoodieSparkSqlWriter$: hoodie table at file:/tmp/hudi_trips_cow already exists. Deleting existing data & overwriting with new data.
java.lang.NoSuchMethodError: org.apache.hudi.org.apache.jetty.server.session.SessionHandler.setHttpOnly(Z)V
at io.javalin.core.util.JettyServerUtil.defaultSessionHandler(JettyServerUtil.kt:50)
at io.javalin.Javalin.<init>(Javalin.java:94)
at io.javalin.Javalin.create(Javalin.java:107)
at org.apache.hudi.timeline.service.TimelineService.startService(TimelineService.java:270)
at org.apache.hudi.client.embedded.EmbeddedTimelineService.startServer(EmbeddedTimelineService.java:94)
at org.apache.hudi.client.embedded.EmbeddedTimelineServerHelper.startTimelineService(EmbeddedTimelineServerHelper.java:71)
at org.apache.hudi.client.embedded.EmbeddedTimelineServerHelper.createEmbeddedTimelineService(EmbeddedTimelineServerHelper.java:58)
at org.apache.hudi.client.AbstractHoodieClient.startEmbeddedServerView(AbstractHoodieClient.java:109)
at org.apache.hudi.client.AbstractHoodieClient.<init>(AbstractHoodieClient.java:77)
at org.apache.hudi.client.AbstractHoodieWriteClient.<init>(AbstractHoodieWriteClient.java:133)
at org.apache.hudi.client.AbstractHoodieWriteClient.<init>(AbstractHoodieWriteClient.java:121)
at org.apache.hudi.client.SparkRDDWriteClient.<init>(SparkRDDWriteClient.java:82)
at org.apache.hudi.DataSourceUtils.createHoodieClient(DataSourceUtils.java:201)
at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$write$8(HoodieSparkSqlWriter.scala:247)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:246)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
... 66 elided

Root cause

This happened because I built Hudi against Hadoop 3.0.0. Hadoop 3.0.0 ships Jetty 9.3, whereas Hudi depends on Jetty 9.4 (SessionHandler.setHttpOnly() does not exist in 9.3). Building against Hadoop 2.7.3, the default bundled with Hudi 0.9.0, does not hit this problem.
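You can confirm the mismatch from the same spark-shell before changing anything. A minimal diagnostic sketch (the relocated class name is copied from the stack trace above): print which jar supplies the shaded SessionHandler and whether it carries the Jetty 9.4-only method.

// Check where the shaded Jetty class was loaded from and whether it has
// setHttpOnly; on a bundle built against Jetty 9.3 the second line
// prints false, matching the NoSuchMethodError above.
val cls = Class.forName("org.apache.hudi.org.apache.jetty.server.session.SessionHandler")
println(cls.getProtectionDomain.getCodeSource.getLocation)
println(cls.getMethods.exists(_.getName == "setHttpOnly"))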

Workaround

hoodie.embed.timeline.server=false

This parameter means that, when set to true, an instance of the timeline server is started on the driver process of each writer (it serves cached file listings and statistics). Setting it to false means the embedded Jetty-based server is never started, so the incompatible SessionHandler is never invoked.

So the write becomes:

scala> df.write.format("hudi").
|   options(getQuickstartWriteConfigs).
|   option(PRECOMBINE_FIELD_OPT_KEY, "ts").
|   option(RECORDKEY_FIELD_OPT_KEY, "uuid").
|   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
|   option(TABLE_NAME, tableName).
|   option("hoodie.embed.timeline.server","false").
|   mode(Overwrite).
|   save(basePath)
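
For reference, the *_OPT_KEY constants used in the shell are aliases for plain string keys, so the workaround can also be kept in a reusable options Map. A minimal sketch, assuming the Hudi 0.9.0 key names:

// Same write with plain string keys; hoodie.embed.timeline.server=false
// prevents the embedded Jetty-based timeline server from starting at all.
val hudiOptions = Map(
  "hoodie.datasource.write.precombine.field"    -> "ts",
  "hoodie.datasource.write.recordkey.field"     -> "uuid",
  "hoodie.datasource.write.partitionpath.field" -> "partitionpath",
  "hoodie.table.name"                           -> tableName,
  "hoodie.embed.timeline.server"                -> "false"
)

df.write.format("hudi").
  options(getQuickstartWriteConfigs).  // quickstart shuffle parallelism, as before
  options(hudiOptions).
  mode("overwrite").
  save(basePath)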

With that, the write succeeds.
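The snapshot queries below assume the freshly written table was loaded and registered as a temporary view, as in the Hudi quickstart (a sketch; the /*/*/*/* glob matches the quickstart's region/country/city partition layout):

// Load the table and register the view queried below.
val tripsSnapshotDF = spark.read.format("hudi").load(basePath + "/*/*/*/*")
tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot")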

scala> spark.sql("select fare, begin_lon, begin_lat, ts from  hudi_trips_snapshot where fare > 20.0").show()
+------------------+-------------------+-------------------+-------------+
|              fare|          begin_lon|          begin_lat|           ts|
+------------------+-------------------+-------------------+-------------+
| 27.79478688582596| 0.6273212202489661|0.11488393157088261|1631541003762|
| 64.27696295884016| 0.4923479652912024| 0.5731835407930634|1631379617014|
| 93.56018115236618|0.14285051259466197|0.21624150367601136|1631871937510|
| 33.92216483948643| 0.9694586417848392| 0.1856488085068272|1631763276616|
| 66.62084366450246|0.03844104444445928| 0.0750588760043035|1631925783792|
|  43.4923811219014| 0.8779402295427752| 0.6100070562136587|1631531656847|
|34.158284716382845|0.46157858450465483| 0.4726905879569653|1631363440080|
| 41.06290929046368| 0.8192868687714224|  0.651058505660742|1631830879614|
+------------------+-------------------+-------------------+-------------+
scala> spark.sql("select _hoodie_commit_time, _hoodie_record_key, _hoodie_partition_path, rider, driver, fare from  hudi_trips_snapshot").show()
+-------------------+--------------------+----------------------+---------+----------+------------------+
|_hoodie_commit_time|  _hoodie_record_key|_hoodie_partition_path|    rider|    driver|              fare|
+-------------------+--------------------+----------------------+---------+----------+------------------+
|     20210918170241|b162f5a9-144e-47e...|  americas/united_s...|rider-213|driver-213| 27.79478688582596|
|     20210918170241|bcfb6b98-e269-488...|  americas/united_s...|rider-213|driver-213| 64.27696295884016|
|     20210918170241|e6d3a988-6985-442...|  americas/united_s...|rider-213|driver-213|19.179139106643607|
|     20210918170241|d09ef0a1-eb39-416...|  americas/united_s...|rider-213|driver-213| 93.56018115236618|
|     20210918170241|c3febd37-1f52-423...|  americas/united_s...|rider-213|driver-213| 33.92216483948643|
|     20210918170241|254c8a0f-1611-45e...|  americas/brazil/s...|rider-213|driver-213| 66.62084366450246|
|     20210918170241|914e6c7f-aafb-496...|  americas/brazil/s...|rider-213|driver-213|  43.4923811219014|
|     20210918170241|56035495-4562-40d...|  americas/brazil/s...|rider-213|driver-213|34.158284716382845|
|     20210918170241|692405c4-e969-460...|    asia/india/chennai|rider-213|driver-213|17.851135255091155|
|     20210918170241|2b25339f-504c-4c3...|    asia/india/chennai|rider-213|driver-213| 41.06290929046368|
+-------------------+--------------------+----------------------+---------+----------+------------------+