UDFs - Recursive Processing of pre-selected JSON fields

Sometimes we might have a list of JSON objects for which we’d like to update the values of certain (known) keys.

To understand how to build this UDF we’ll be using the following use-case to exemplify:

I have a list of order details for which I need to display all date fields in YYYY-MM-DD hh:MM:ss (PST) format, but the date fields are:

  • text fields
  • in UTC

This is what the List of JSON looks like:

[
  {
    "purchase_date": "2022-02-02T22:13:27.407Z",
    "order_id": "19553872",
    "item": "Pizza",
    "delivery_date":"2022-02-02T23:35:07.407Z"
  },
{
    "purchase_date": "2022-02-08T19:33:21.407Z",
    "order_id": "93259653",
    "item": "Soup",
    "delivery_date":"2022-02-08T21:03:01.407Z"
  },
  {
    "purchase_date": "2022-02-10T19:33:21.407Z",
    "order_id": "20259384",
    "item": "Noodles",
    "delivery_date":"2022-02-10T20:13:12.507Z"
  }
]

And we need to display the following fields as MM/DD/YYYY HH:mm (PST):

[
  "purchase_date",
  "delivery_date"
]

This is what the body of the UDF will be:

FROM
  i
IN
  json
SELECT
MERGE_OBJECTS(
  i,
  MERGE_OBJECTS(
    FROM
      j
    IN
      date_fields
    SELECT
      FROM
        value,
        key
      IN
        i
      WHERE
        key = j
      SELECT
        "{{j}}": FORMAT_DATETIME(
          DATETIME_FROM_FORMAT(
            value
          ),
          "YYYY-MM-DD hh:MM:ss z",
          "en-EN",
          tz
        )
  )
)

Where the inputs are:

  • json (dataType: List of Any) - the list of JSON objects to transform
  • date_fields (dataType: List of Text) - the list of known JSON keys whose date strings we’ll be transforming
  • tz (dataType: Text) - a string representing the Time Zone to offset the date fields to (for example: America/Los_Angeles)

And the output will be a List of Any (List of JSON objects).

You UDF should look like:

And the output of the UDF will be

[
  {
    "purchase_date": "2022-02-02 02:02:27 PST",
    "order_id": "19553872",
    "item": "Pizza",
    "delivery_date": "2022-02-02 03:02:07 PST"
  },
  {
    "purchase_date": "2022-02-08 11:02:21 PST",
    "order_id": "93259653",
    "item": "Soup",
    "delivery_date": "2022-02-08 01:02:01 PST"
  },
  {
    "purchase_date": "2022-02-10 11:02:21 PST",
    "order_id": "20259384",
    "item": "Noodles",
    "delivery_date": "2022-02-10 12:02:12 PST"
  }
]

BUILDING PROCESS

Let’s break the UDF’s body down to understand how we built it.

  1. The first part of the UDF is iterating over the JSON objects in our list, where i represents a single JSON object:

    FROM
      i
    IN
      json
    SELECT
    
  2. The next bit is there to replace the current key:value pairs of each object with the second argument passed to the MERGE_OBJECTS() function, which will be the offset and transformed date_field

    MERGE_OBJECTS(
        i,
    
  3. The following piece starts with another MERGE_OBJECTS() function which is used to create a single JSON object with all date_fields transformed. If this was about transforming a single field/ key:value pair we wouldn’t need it.
    The second part of it is only iterating over our date_fields list.

    MERGE_OBJECTS(
          FROM
            j
          IN
            date_fields
          SELECT
    
  4. Now onto the actual transformation, the innermost part of our UDF, where j is the name of the date_field we want to transform (for example: "purchase_date"):

    FROM
      value,
      key
    IN
      i
    WHERE
      key = j
    SELECT
      "{{j}}": FORMAT_DATETIME(
        DATETIME_FROM_FORMAT(
          value
            ),
            "YYYY-MM-DD hh:MM:ss z",
            "en-EN",
            tz
          )
    

TIP: Play around with the UDF to build a UDF that transforms:

  • Phone string to an international US phone (dataType: Phone)
  • Number string to a decimal number and back to a string (dataType: Text)

What other useful transformations have you had to make in the past? Share them with the Community below!

1 Like