scala - Convert an array of strings to a byte array in Spark and retrieve it back using UDFs


I am trying to convert an array of strings to a byte array in Spark, and then convert the byte array back to an array of strings.

However, I am not getting the array of strings back the way I want. Here is the code:

// UDFs for converting Array[String] to byte array and get back Array[String] from byte array
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.databind.ObjectMapper 

val mapper: ObjectMapper = new ObjectMapper
mapper.registerModule(DefaultScalaModule)

val convertToByteArray = udf((map: Seq[String]) => mapper.writeValueAsBytes(map))
val convertToString = udf((a: Array[Byte])=> new String(a))

val arrayDF = Seq(
("x100", Array("p1","p2","p3","p4"))
).toDF("id", "myarray")
arrayDF.printSchema()
root
|-- id: string (nullable = true)
|-- myarray: array (nullable = true)
|    |-- element: string (containsNull = true)
arrayDF.show(false)
+----+----------------+
|id  |myarray         |
+----+----------------+
|x100|[p1, p2, p3, p4]|
+----+----------------+

val converted = arrayDF.withColumn("bytearray", convertToByteArray($"myarray")).select($"id",$"bytearray")
converted.printSchema()
root
|-- id: string (nullable = true)
|-- bytearray: binary (nullable = true)
converted.show(false)
+----+----------------------------------------------------------------+
|id  |bytearray                                                       |
+----+----------------------------------------------------------------+
|x100|[5B 22 70 31 22 2C 22 70 32 22 2C 22 70 33 22 2C 22 70 34 22 5D]|
+----+----------------------------------------------------------------+

val getBack = converted.withColumn("getstring", convertToString($"bytearray"))
getBack.printSchema()
root
|-- id: string (nullable = true)
|-- bytearray: binary (nullable = true)
|-- getstring: string (nullable = true)
getBack.show(false)
+----+----------------------------------------------------------------+---------------------+
|id  |bytearray                                                       |getstring            |
+----+----------------------------------------------------------------+---------------------+
|x100|[5B 22 70 31 22 2C 22 70 32 22 2C 22 70 33 22 2C 22 70 34 22 5D]|["p1","p2","p3","p4"]|
+----+----------------------------------------------------------------+---------------------+

However, I want my final result to be:

+----+----------------------------------------------------------------+---------------------+
|id  |bytearray                                                       |getstring            |
+----+----------------------------------------------------------------+---------------------+
|x100|[5B 22 70 31 22 2C 22 70 32 22 2C 22 70 33 22 2C 22 70 34 22 5D]|[p1,p2,p3,p4]        |
+----+----------------------------------------------------------------+---------------------+

Below is the pom.xml dependency used for creating the byte array:

<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>2.9.5</version>
</dependency>

1 Answer

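You are taking a list of strings and serializing it as a single object, then reading the bytes back as one string. If you want a single string of the form [p1,p2,p3,p4] back, you also need to convert the list to a string before turning it into bytes: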

// getBytes avoids the JSON quotes that mapper.writeValueAsBytes would wrap around the joined string
val convertToByteArray = udf((map: Seq[String]) => map.mkString("[", ",", "]").getBytes("UTF-8"))
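
If what you actually want back is an array column rather than a string that looks like one, an alternative is to keep the JSON encoding from the question and decode the bytes with Jackson on the way back. The following is only a sketch under some assumptions: JsonCodec, RoundTripSketch, toBytes and fromBytes are names made up for this example, it runs against a local SparkSession, and it assumes jackson-databind and jackson-module-scala are on the classpath (as the imports in the question imply).

```scala
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

// Hypothetical helper: holds one lazily created mapper per JVM
// and exposes the two directions of the conversion.
object JsonCodec {
  lazy val mapper: ObjectMapper = {
    val m = new ObjectMapper()
    m.registerModule(DefaultScalaModule)
    m
  }

  // Seq[String] -> JSON bytes, e.g. ["p1","p2"]
  def encode(xs: Seq[String]): Array[Byte] = mapper.writeValueAsBytes(xs)

  // JSON bytes -> Seq[String], parsing the JSON array instead of
  // just wrapping the bytes in a String
  def decode(bytes: Array[Byte]): Seq[String] =
    mapper.readValue(bytes, classOf[Array[String]]).toSeq
}

object RoundTripSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session, only for trying the UDFs out
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("byte-array-round-trip")
      .getOrCreate()
    import spark.implicits._

    val toBytes = udf((xs: Seq[String]) => JsonCodec.encode(xs))
    val fromBytes = udf((bytes: Array[Byte]) => JsonCodec.decode(bytes))

    val arrayDF = Seq(("x100", Array("p1", "p2", "p3", "p4"))).toDF("id", "myarray")

    val getBack = arrayDF
      .withColumn("bytearray", toBytes($"myarray"))
      .withColumn("getstring", fromBytes($"bytearray"))
      .select($"id", $"bytearray", $"getstring")

    getBack.printSchema()
    getBack.show(false)

    spark.stop()
  }
}
```

With this variant the getstring column should come back typed as an array of strings, so it can be used with functions such as explode or size. The trade-off is that show() renders it like the original myarray column, with a space after each comma, rather than as the compact [p1,p2,p3,p4] string.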
