Sometimes we might have a list of JSON objects for which we’d like to update the values of certain (known) keys.
To understand how to build this UDF we’ll be using the following use-case to exemplify:
I have a list of order details for which I need to display all date fields in
YYYY-MM-DD hh:MM:ss
(PST) format, but the date fields are:
- text fields
- in UTC
This is what the List of JSON looks like:
[
{
"purchase_date": "2022-02-02T22:13:27.407Z",
"order_id": "19553872",
"item": "Pizza",
"delivery_date":"2022-02-02T23:35:07.407Z"
},
{
"purchase_date": "2022-02-08T19:33:21.407Z",
"order_id": "93259653",
"item": "Soup",
"delivery_date":"2022-02-08T21:03:01.407Z"
},
{
"purchase_date": "2022-02-10T19:33:21.407Z",
"order_id": "20259384",
"item": "Noodles",
"delivery_date":"2022-02-10T20:13:12.507Z"
}
]
And we need to display the following fields as MM/DD/YYYY HH:mm (PST):
[
"purchase_date",
"delivery_date"
]
This is what the body of the UDF will be:
FROM
i
IN
json
SELECT
MERGE_OBJECTS(
i,
MERGE_OBJECTS(
FROM
j
IN
date_fields
SELECT
FROM
value,
key
IN
i
WHERE
key = j
SELECT
"{{j}}": FORMAT_DATETIME(
DATETIME_FROM_FORMAT(
value
),
"YYYY-MM-DD hh:MM:ss z",
"en-EN",
tz
)
)
)
Where the inputs are:
-
json
(dataType: List of Any) - the list of JSON objects to transform -
date_fields
(dataType: List of Text) - the list of known JSON keys whose date strings we’ll be transforming -
tz
(dataType: Text) - a string representing the Time Zone to offset the date fields to (for example: America/Los_Angeles)
And the output will be a List of Any (List of JSON objects).
You UDF should look like:
And the output of the UDF will be
[
{
"purchase_date": "2022-02-02 02:02:27 PST",
"order_id": "19553872",
"item": "Pizza",
"delivery_date": "2022-02-02 03:02:07 PST"
},
{
"purchase_date": "2022-02-08 11:02:21 PST",
"order_id": "93259653",
"item": "Soup",
"delivery_date": "2022-02-08 01:02:01 PST"
},
{
"purchase_date": "2022-02-10 11:02:21 PST",
"order_id": "20259384",
"item": "Noodles",
"delivery_date": "2022-02-10 12:02:12 PST"
}
]
BUILDING PROCESS
Let’s break the UDF’s body down to understand how we built it.
-
The first part of the UDF is iterating over the JSON objects in our list, where
i
represents a single JSON object:FROM i IN json SELECT
-
The next bit is there to replace the current key:value pairs of each object with the second argument passed to the
MERGE_OBJECTS()
function, which will be the offset and transformeddate_field
MERGE_OBJECTS( i,
-
The following piece starts with another
MERGE_OBJECTS()
function which is used to create a single JSON object with alldate_fields
transformed. If this was about transforming a single field/ key:value pair we wouldn’t need it.
The second part of it is only iterating over ourdate_fields
list.MERGE_OBJECTS( FROM j IN date_fields SELECT
-
Now onto the actual transformation, the innermost part of our UDF, where
j
is the name of thedate_field
we want to transform (for example:"purchase_date"
):FROM value, key IN i WHERE key = j SELECT "{{j}}": FORMAT_DATETIME( DATETIME_FROM_FORMAT( value ), "YYYY-MM-DD hh:MM:ss z", "en-EN", tz )
TIP: Play around with the UDF to build a UDF that transforms:
- Phone string to an international US phone (dataType: Phone)
- Number string to a decimal number and back to a string (dataType: Text)
What other useful transformations have you had to make in the past? Share them with the Community below!