UDFs - CSV to List of JSON objects transformation

This sample UDF takes a CSV (string of Text) as input and returns a List of JSON objs.

Given the following input (dataType: Text):

my,csv,headers
this,is,one row
this,is,another row
and this,here is,yet another row

Let’s build a UDF that takes the CSV above and returns a list of JSON objects.

First we need a list of the csv headers, which will be used as the JSON keys.

SPLIT(
  JOIN(
    FROM row IN SPLIT(csv, "\n") WHERE index = 0 SELECT row
  ),
  ","
)

The output should look like:

[
  "my",
  "csv",
  "headers"
]

We’ll use a similar logic to get the rows of data, each row as a list of values to be used as the values of one JSON obj.

FROM row IN SPLIT(csv, "\n") WHERE NOT(index = 0) SELECT SPLIT(row, ",")

The output should look like:

[
  [
    "this",
    "is",
    "one row"
  ],
  [
    "this",
    "is",
    "another row"
  ],
  [
    "and this",
    "here is",
    "yet another row"
  ]
]

The last piece of the puzzle is to create one JSON object per each of the lists of values, using the headers list as the keys, and matching the indexes of header <-> value. In the sample below body is the list of lists of values and headers is the list of header values (json keys).

FROM
  row
IN
  body
SELECT
  MERGE_OBJECTS(
    FROM valuex, index IN row SELECT 
       FROM valuez, indez IN headers WHERE indez = index SELECT "{{valuez}}": valuex
  )

Now group all the above into one UDF’s Function Body:

FROM
  row
IN
  FROM rows IN SPLIT(csv, "\n") WHERE NOT(index = 0) SELECT SPLIT(rows, ",")
SELECT
  MERGE_OBJECTS(
    FROM
      valuex,
      index
    IN
      row
    SELECT
      FROM
        valuez,
        indez
      IN
        SPLIT(
          JOIN(
            FROM header IN SPLIT(csv, "\n") WHERE index = 0 SELECT header
          ),
          ","
        )
      WHERE
        indez = index
      SELECT
        "{{valuez}}": valuex
  )

The output of this is:

[
  {
    "my": "this",
    "csv": "is",
    "headers": "one row"
  },
  {
    "my": "this",
    "csv": "is",
    "headers": "another row"
  },
  {
    "my": "and this",
    "csv": "here is",
    "headers": "yet another row"
  }
]
2 Likes

These are awesome, thanks @juanchax!

1 Like