
[SPARK-AVRO] Dataproc Serverless runtime 3.0 (Spark 4.0.1): AvroFileFormat V1 shim compiled for Scala 2.12 causes ClassNotFoundException on Scala 2.13 runtime #56385

@krisztiansala

Summary

When using df.write.format("avro").save(path) on Dataproc Serverless runtime 3.0 (Spark 4.0.1, Scala 2.13), every avro write fails with:

java.lang.NoClassDefFoundError: scala/collection/immutable/StringOps
    at org.apache.spark.sql.avro.AvroFileFormat.supportFieldName(AvroFileFormat.scala:163)
    at org.apache.spark.sql.execution.datasources.DataSourceUtils$.$anonfun$checkFieldNames$1(DataSourceUtils.scala:74)
    ...
Caused by: java.lang.ClassNotFoundException: scala.collection.immutable.StringOps

Root cause

scala.collection.immutable.StringOps exists as a class in Scala 2.12 but was moved to scala.collection.StringOps in Scala 2.13; in 2.13, scala.collection.immutable.StringOps is only a type alias (no .class file).

The AvroFileFormat.supportFieldName method referenced in the stack trace is not present in spark-avro_2.13-4.0.0.jar from Maven Central (Spark 4.0 migrated spark-avro to DataSource V2). The class is therefore loaded from the Dataproc Serverless runtime 3.0's internal JAR bundle, which contains an AvroFileFormat compiled against Scala 2.12, while the runtime's standard library is Scala 2.13.

In other words: the runtime ships a Scala 2.12-compiled V1 compatibility shim for AvroFileFormat in a Scala 2.13 environment, causing class loading to fail at the first String operation inside the shim.
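To confirm which JAR actually supplies the V1 class, one option is to scan the runtime's JAR directory. Below is a minimal, stdlib-only sketch, assuming the runtime JARs are readable on local disk (the directory is passed in; no specific Dataproc path is assumed). The helper name find_avro_v1_shims is hypothetical. It reports every JAR containing org/apache/spark/sql/avro/AvroFileFormat.class and whether that class file mentions scala/collection/immutable/StringOps. The byte search is a heuristic, not a class-file parser: JVM class files store referenced class names as UTF-8 strings in the constant pool, so a 2.12-compiled class that uses StringOps should contain that name.

```python
import os
import zipfile

# Internal (slash-separated) entry name of the V1 class from the stack trace.
AVRO_V1_CLASS = "org/apache/spark/sql/avro/AvroFileFormat.class"
# Name that exists as a .class file in Scala 2.12 but not in Scala 2.13.
SCALA212_STRINGOPS = b"scala/collection/immutable/StringOps"


def find_avro_v1_shims(jar_dir):
    """Hypothetical helper: return (jar_path, references_212_stringops) for
    every *.jar under jar_dir that contains AvroFileFormat.class."""
    hits = []
    for root, _dirs, files in os.walk(jar_dir):
        for name in sorted(files):
            if not name.endswith(".jar"):
                continue
            path = os.path.join(root, name)
            try:
                with zipfile.ZipFile(path) as jar:
                    if AVRO_V1_CLASS not in jar.namelist():
                        continue
                    data = jar.read(AVRO_V1_CLASS)
            except zipfile.BadZipFile:
                continue  # skip corrupt or non-zip files named *.jar
            # Heuristic: class references live as UTF-8 strings in the
            # constant pool, so a raw substring search finds them.
            hits.append((path, SCALA212_STRINGOPS in data))
    return hits
```

Calling find_avro_v1_shims on the runtime's JAR directory (path is a placeholder you would supply) and getting a hit flagged True would be consistent with the root cause described above.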

Reproduction

On Dataproc Serverless runtime 3.0 (Spark 4.0.1), submit any PySpark batch that writes a DataFrame in avro format:

df.write.mode("overwrite").format("avro").save("gs://my-bucket/output/")

The write fails immediately with the NoClassDefFoundError (caused by ClassNotFoundException) shown above.

  • Workaround: use runtime 2.3 (Spark 3.5) instead; avro writes succeed there.
  • Supplying an external spark-avro_2.13-4.0.0.jar does not help; the runtime's internal Scala 2.12 AvroFileFormat is still picked up by DataSourceUtils.checkFieldNames.
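Why an extra JAR does not change the outcome can be illustrated with a simplified model of a single URLClassLoader, which searches its URLs in order and returns the first JAR that contains the requested class. This sketch ignores Spark's real parent-first delegation across several class loaders, and the helper name first_jar_providing is hypothetical. Since spark-avro_2.13-4.0.0.jar contains no AvroFileFormat at all, the runtime's copy is the only candidate regardless of classpath order.

```python
import zipfile


def first_jar_providing(classpath, class_name):
    """Hypothetical helper: return the first JAR on an ordered classpath that
    contains class_name (dotted form), or None if no JAR provides it.
    Models a single URLClassLoader; real JVM delegation is parent-first."""
    entry = class_name.replace(".", "/") + ".class"
    for jar_path in classpath:
        with zipfile.ZipFile(jar_path) as jar:
            if entry in jar.namelist():
                return jar_path
    return None
```

In this model, putting the Maven Central JAR first or last makes no difference: the lookup for org.apache.spark.sql.avro.AvroFileFormat still resolves to the runtime JAR.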

Environment

  • Dataproc Serverless runtime: 3.0.13 (latest as of 2026-06)
  • Spark version: 4.0.1
  • Scala runtime: 2.13 (confirmed by runtime 3.0 docs)
  • External spark-avro JAR: spark-avro_2.13-4.0.0 (does NOT contain AvroFileFormat — only V2 classes in org.apache.spark.sql.v2.avro.*)
  • Runtime 2.3 (Spark 3.5, Scala 2.13) with spark-avro_2.13-3.5.5.jar: works correctly

Expected behavior

Avro format writes should work on Dataproc Serverless runtime 3.0 (Spark 4.0.1 + Scala 2.13) without ClassNotFoundException.

Suggested fix

Ensure the AvroFileFormat V1 compatibility shim bundled inside the Spark 4.0 / Dataproc runtime 3.0 distribution is compiled against Scala 2.13 (referencing scala.collection.StringOps, not scala.collection.immutable.StringOps).
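As a regression guard for this fix, a build could fail if any class in the bundled Avro JAR still references the 2.12-only name. A minimal sketch, assuming the bundled JAR is available as a file during the build; the helper name and the one-entry marker list are hypothetical, and the byte search is the same constant-pool heuristic rather than a full class-file parser.

```python
import zipfile

# Names that exist as .class files in Scala 2.12 but not in 2.13.
# Only the one from this stack trace is listed (assumption: extend as needed).
SCALA212_ONLY_REFS = (b"scala/collection/immutable/StringOps",)


def classes_with_scala212_refs(jar_path):
    """Hypothetical helper: return sorted .class entries in jar_path whose
    bytes mention any 2.12-only class name."""
    offenders = []
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if not name.endswith(".class"):
                continue  # resources and manifests are not linked by the JVM
            data = jar.read(name)
            if any(ref in data for ref in SCALA212_ONLY_REFS):
                offenders.append(name)
    return sorted(offenders)
```

A build step could then require classes_with_scala212_refs(path) to return an empty list before the runtime image is published.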
