Generate Array of Values in FOR JSON

There is quite sustainable support for working with JSON format in SQL Server since the 2016 version. But it’s tricky time-to-time to do more complex things, especially building more complex JSON documents from tabular data format. One of these things is how to build an array of values, e.g. we would like to create JSON with tables and for each table list of columns should be available as an array of values.

Let’s try. I will start with the traditional approach first and build a simple nested JSON with two tables and their columns listed. All queries below will work in the msdb system database context so you can try it immediately.

SELECT 
	[t].[object_id] AS 'Id', [t].[name] AS 'Name',
	(
		SELECT [c].[name] AS 'Name'
		FROM [sys].[columns] [c]
		WHERE [c].[object_id] = [t].[object_id]
		ORDER BY [c].[name]
		FOR JSON AUTO
	) AS 'Columns'
FROM [sys].[tables] [t]
WHERE [t].[name] IN ('systaskids', 'syscachedcredentials')
FOR JSON PATH, ROOT ('Tables')

And this is the JSON produced by the query:

{
    "Tables": [
        {
            "Id": 18099105,
            "Name": "syscachedcredentials",
            "Columns": [
                { "Name": "cachedate" },
                { "Name": "has_server_access" },
                { "Name": "is_sysadmin_member" },
                { "Name": "login_name" }
            ]
        },
        {
            "Id": 2099048,
            "Name": "systaskids",
            "Columns": [
                { "Name": "job_id" },
                { "Name": "task_id" }
            ]
        }
    ]
}

We have created nested JSON with the list of columns for each table using a simple subquery with FOR JSON AUTO. There is the Name attribute and you can see that columns are sorted by their original name because the ORDER BY clause is included.

Sure that we can play with it and enrich the nested JSON as needed. E.g. we can add the id of the column, use the FOR JSON PATH and create a JSON part for every single column.

SELECT 
	[t].[object_id] AS 'Id', [t].[name] AS 'Name',
	(
		SELECT [c].[column_id] AS 'Column.Id', [c].[name] AS 'Column.Name'
		FROM [sys].[columns] [c]
		WHERE [c].[object_id] = [t].[object_id]
                ORDER BY [c].[name]
		FOR JSON PATH
	) AS 'Columns'
FROM [sys].[tables] [t]
WHERE [t].[name] IN ('systaskids', 'syscachedcredentials')
FOR JSON PATH, ROOT ('Tables')

JSON created:

{
    "Tables": [
        {
            "Id": 18099105,
            "Name": "syscachedcredentials",
            "Columns": [
                { "Column": { "Id": 4, "Name": "cachedate" } },
                { "Column": { "Id": 2, "Name": "has_server_access" } },
                { "Column": { "Id": 3, "Name": "is_sysadmin_member" } },
                { "Column": { "Id": 1, "Name": "login_name" } }
            ]
        },
        {
            "Id": 2099048,
            "Name": "systaskids",
            "Columns": [
                { "Column": { "Id": 2, "Name": "job_id" } },
                { "Column": { "Id": 1, "Name": "task_id" } }
            ]
        }
    ]
}

But what should we do in a case when we will have just a simple array of values for columns names instead of full inner JSON? This is where things are starting to be a bit more complex.

Let’s try with a STRING_AGG() function and simply aggregate all column names as a separated list of values. You will expect that it will look like this in the JSON:

["cachedate","has_server_access","is_sysadmin_member","login_name"]

This is the output of the STRING_AGG() if we will run it as a standalone statement. But when it’s executed as inner query inside outer query with FOR JSON, then the result is different:

SELECT 
	[t].[object_id] AS 'Id', [t].[name] AS 'Name',
	(
		SELECT CONCAT('["',STRING_AGG([c].[name], '","') WITHIN GROUP(ORDER BY [c].[name]),'"]')
		FROM [sys].[columns] [c]
		WHERE [c].[object_id] = [t].[object_id]
	) AS 'Columns'
FROM [sys].[tables] [t]
WHERE [t].[name] IN ('systaskids', 'syscachedcredentials')
FOR JSON PATH, ROOT ('Tables')

JSON:

{
    "Tables": [
        {
            "Id": 18099105,
            "Name": "syscachedcredentials",
            "Columns": "[\"cachedate\",\"has_server_access\",\"is_sysadmin_member\",\"login_name\"]"
        },
        {
            "Id": 2099048,
            "Name": "systaskids",
            "Columns": "[\"job_id\",\"task_id\"]"
        }
    ]
}

Double quotes are escaped because the whole string is handled like to be a single literal value for the “Columns” property. But this is not what we want. There is a simple, but really not straightforward solution: use JSON_QUERY as a wrapper for the inner query:

SELECT 
	[t].[object_id] AS 'Id', [t].[name] AS 'Name',
	JSON_QUERY(
                     (
			 SELECT CONCAT('["',STRING_AGG([c].[name], '","') WITHIN GROUP(ORDER BY [c].[name]),'"]')
			 FROM [sys].[columns] [c]
			 WHERE [c].[object_id] = [t].[object_id]
		       )
                  ) [Columns]
FROM [sys].[tables] [t]
WHERE [t].[name] IN ('systaskids', 'syscachedcredentials')
FOR JSON PATH, ROOT ('Tables')

JSON:

{
    "Tables": [
        {
            "Id": 18099105,
            "Name": "syscachedcredentials",
            "Columns": [ "cachedate", "has_server_access", "is_sysadmin_member", "login_name" ]
        },
        {
            "Id": 2099048,
            "Name": "systaskids",
            "Columns": [ "job_id", "task_id" ]
        }
    ]
}

It looks great and can be very useful in the case of a large list of values to reduce the overall JSON size.

There is still one small issue we need to take care of. Let’s modify the query with the 1 = 0 condition to simulate that we have a table without columns and the “Columns” array should be empty:

SELECT
	[t].[object_id] AS [Id], [t].[name] AS [Name],
	JSON_QUERY (
			( 
				SELECT	CONCAT ('["', STRING_AGG ([c].[name], '","') WITHIN GROUP(ORDER BY [c].[name]), '"]')
				FROM [sys].[columns] [c]
				WHERE [c].[object_id] = [t].[object_id] 
				      AND 1 = 0
			)
		) AS [Columns]
FROM [sys].[tables] [t]
WHERE [t].[name] IN ( 'systaskids', 'syscachedcredentials' )
FOR JSON PATH, ROOT('Tables')

JSON:

{
    "Tables": [
        {
            "Id": 18099105,
            "Name": "syscachedcredentials",
            "Columns": [""]
        },
        {
            "Id": 2099048,
            "Name": "systaskids",
            "Columns": [""]
        }
    ]
}

This is not really what we need. It’s one column with zero name length instead of an empty array. This is because of the CONCAT() function used which eliminated the NULL values returned from STRING_AGG() and concatenates the ‘[“”]’ string. We can solve it using the CASE:

SELECT
	[t].[object_id] AS [Id], [t].[name] AS [Name],
	JSON_QUERY (
			( 
				SELECT 
					CASE 
						WHEN STRING_AGG ([c].[name], '","') IS NOT NULL 
							THEN CONCAT('["', STRING_AGG ([c].[name], '","') WITHIN GROUP(ORDER BY [c].[name]), '"]')
						ELSE NULL
					END
				FROM [sys].[columns] [c]
				WHERE [c].[object_id] = [t].[object_id] 
				      AND 1 = 0
			)
		) AS [Columns]
FROM [sys].[tables] [t]
WHERE [t].[name] IN ( 'systaskids', 'syscachedcredentials' )
FOR JSON PATH, ROOT('Tables'), INCLUDE_NULL_VALUES

JSON:

{
    "Tables": [
        {
            "Id": 18099105,
            "Name": "syscachedcredentials",
            "Columns": null
        },
        {
            "Id": 2099048,
            "Name": "systaskids",
            "Columns": null
        }
    ]
}

This time “Columns” array isn’t included in the JSON or it is reported as null if the INCLUDE_NULL_VALUES option is used (see the last line marked).

You can solve the same issue with REPLACE() in case you really prefer to have an empty array in the JSON:

SELECT
	[t].[object_id] AS [Id], [t].[name] AS [Name],
	JSON_QUERY (
			REPLACE(	
				( 
					SELECT	CONCAT ('["', STRING_AGG ([c].[name], '","') WITHIN GROUP(ORDER BY [c].[name]), '"]')
					FROM [sys].[columns] [c]
					WHERE [c].[object_id] = [t].[object_id] 
					      AND 1 = 0
				), '[""]', '[]'
			)
		) AS [Columns]
FROM [sys].[tables] [t]
WHERE [t].[name] IN ( 'systaskids', 'syscachedcredentials' )
FOR JSON PATH, ROOT('Tables')

JSON:

{
    "Tables": [
        {
            "Id": 18099105,
            "Name": "syscachedcredentials",
            "Columns": []
        },
        {
            "Id": 2099048,
            "Name": "systaskids",
            "Columns": []
        }
    ]
}

With all that in mind, you can generate nested JSONs as needed quite effectively.

Generate Array of Values in FOR JSON

Leave a Reply Cancel reply

Most popular

Scalar Functions and Parallel Execution Plans

Extracting Metadata from the Power BI Report File

Using Binary Hash as an Alternate Record Key

Run your first U-SQL script in Azure Data Lake (step-by-step tutorial)

Generate Array of Values in FOR JSON

Leave a Reply Cancel reply

Most popular

Scalar Functions and Parallel Execution Plans

Extracting Metadata from the Power BI Report File

Using Binary Hash as an Alternate Record Key

Run your first U-SQL script in Azure Data Lake (step-by-step tutorial)

Tags