Wednesday, November 4, 2020

Unwrapping JSON to SQL Server Tables

If you know the structure and contents of a JSON document, then it is possible to turn it into one or more relational tables, but even then I dare you to claim that it is easy to type in a good OpenJSON SELECT statement to do it. If you don’t know what’s in that JSON file, then you’re faced with sweating over a text editor trying to work it all out. You long to just get the contents into a relational table and take it on from there. Even then, you’ve got several struggles before that table appears in the results pane. You must get the path to the tabular data correct, you have to work out the SQL datatypes, and you need to list the full panoply of keys. Let’s face it: it is a chore. Hopefully, all that is in the past with these helper functions.
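
Just to show what the chore looks like, here is the sort of statement you end up typing by hand for even a trivial document, once you’ve worked out the path, the keys and the datatypes (a minimal sketch; the column sizes are guesses you’d have to make yourself):

DECLARE @json NVARCHAR(MAX) = N'[
    {"name":"Phil", "email":"PhilipFactor@geeMail.com"},
    {"name":"Bob", "email":"bob32@geeMail.com"}
]';
SELECT * FROM OpenJson(@json, '$')
  WITH (name NVARCHAR(30), email NVARCHAR(40));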

What about being able to do this, for example …

EXECUTE TablesFromJSON @TheJSON= N'[
    {"name":"Phil", "email":"PhilipFactor@geeMail.com"},
    {"name":"Geoff", "email":"Geoff2435@geeMail.com"},
    {"name":"Mo", "email":"MoHussain34@geeMail.com"},
    {"name":"Karen", "email":"KarenAlott34@geeMail.com"},
    {"name":"Bob", "email":"bob32@geeMail.com"}
]'

 

…and getting it straight back as a relational result, with a name and an email column and a row for each person?

Or, if there is more than one table in there somewhere…

DECLARE @json NVARCHAR(MAX) =
  N'{
        "id": "0001",
        "type": "donut",
        "name": "Cake",
        "ppu": 0.55,
        "batters":
                {
                        "batter":
                                [
                                        { "id": "1001", "type": "Regular" },
                                        { "id": "1002", "type": "Chocolate" },
                                        { "id": "1003", "type": "Blueberry" },
                                        { "id": "1004", "type": "Devil''s Food" }
                                ]
                },
        "topping":
                [
                        { "id": "5001", "type": "None" },
                        { "id": "5002", "type": "Glazed" },
                        { "id": "5005", "type": "Sugar" },
                        { "id": "5007", "type": "Powdered Sugar" },
                        { "id": "5006", "type": "Chocolate with Sprinkles" },
                        { "id": "5003", "type": "Chocolate" },
                        { "id": "5004", "type": "Maple" }
                ]
}';
EXECUTE TablesFromJSON @json

 

You may not want table results straight out of your JSON (I’m really just showing off), but I’m going to describe some routines that are useful to me for dealing with JSON import. Your requirements may be more subtle.

The first stage is to get a good representation of a JSON document so that you can work on it in SQL. I do this with a multi-statement table-valued function, but you’ll see that it is mostly a pretty simple unwrapping of the JSON document. Built into it is a way of gauging the most appropriate SQL datatype for each value. Unless you adopt JSON Schema, there is no obligation to be consistent in assigning values to keys, so you have to test every simple value (i.e. everything other than arrays or objects).

CREATE OR ALTER FUNCTION [dbo].[UnwrapJson]
/**
summary:   >
  This multi-statement table-valued function takes a JSON string and
  unwraps it into a relational hierarchy table that also retains
  the path to each element in the JSON document, and calculates the
  best-fit SQL datatype for every simple value
Author: Phil Factor
Revision: 1.0
date: 1 Nov 2020
example:
  - SELECT * FROM UnwrapJson (N'[  
    {"name":"Phil", "email":"PhilipFactor@gmail.com"},  
    {"name":"Bob", "email":"bob32@gmail.com"}  
    ]')
returns:   >
  id, level, [key], Value, type, SQLDatatype, parent, path
 
**/    
(
    @JSON NVARCHAR(MAX)
)
RETURNS @Unwrapped TABLE 
  (
  [id] INT IDENTITY, --just used to get a unique reference to each json item
  [level] INT, --the hierarchy level
  [key] NVARCHAR(100), --the key or name of the item
  [Value] NVARCHAR(MAX),--the value, if it is a null, int,binary,numeric or string
  type INT, --0 to 5: the JSON type (null, string, number, boolean, array or object)
  SQLDatatype sysname, --whatever the datatype can be parsed to
  parent INT, --the ID of the parent
  [path] NVARCHAR(4000) --the path as used by OpenJSON
  )
AS begin
INSERT INTO @Unwrapped ([level], [key], Value, type, SQLDatatype, parent,
[path])
VALUES
  (0, --the level
   NULL, --the key,
   @json, --the value,
   CASE WHEN Left(ltrim(@json),1)='[' THEN 4 ELSE 5 END, --the type
   'json', --SQLDataType,
   0 , --no parent
   '$' --base path
  );
DECLARE @ii INT = 0,--the level
@Rowcount INT = -1; --the number of rows from the previous iteration
WHILE @Rowcount <> 0 --while we are still finding levels
  BEGIN
    INSERT INTO @Unwrapped ([level], [key], Value, type, SQLDatatype, parent,
    [path])
      SELECT [level] + 1 AS [level], new.[Key] AS [key],
        new.[Value] AS [value], new.[Type] AS [type],
-- SQL Prompt formatting off
/* In order to determine the datatype of a JSON value, the best approach is to find the most
specific datatype that the value can be parsed to, trying each conversion in turn. In JSON, an
array of objects can contain attributes that aren't consistent either in their name or their
value, so every simple value has to be tested. */
       CASE 
        WHEN new.Type = 0 THEN 'bit null'
                WHEN new.[type] IN (1,2)  then COALESCE(
                  CASE WHEN TRY_CONVERT(INT,new.[value]) IS NOT NULL THEN 'int' END, 
                  CASE WHEN TRY_CONVERT(NUMERIC(14,4),new.[value]) IS NOT NULL THEN 'numeric' END,
                  CASE WHEN TRY_CONVERT(FLOAT,new.[value]) IS NOT NULL THEN 'float' END,
                  CASE WHEN TRY_CONVERT(MONEY,new.[value]) IS NOT NULL THEN 'money' END,
                  CASE WHEN TRY_CONVERT(DateTime,new.[value],126) IS NOT NULL THEN 'Datetime2' END,
                  CASE WHEN TRY_CONVERT(Datetime,new.[value],127) IS NOT NULL THEN 'Datetime2' END,
                  'nvarchar')
           WHEN new.Type = 3 THEN 'bit'
           WHEN new.Type = 5 THEN 'object' ELSE 'array' END AS SQLDatatype,
        old.[id],
        old.[path] + CASE WHEN old.type = 5 THEN '.' + new.[Key] 
                                           ELSE '[' + new.[Key] COLLATE DATABASE_DEFAULT + ']' END AS path
-- SQL Prompt formatting on
      FROM @Unwrapped old
        CROSS APPLY OpenJson(old.[Value]) new
          WHERE old.[level] = @ii AND old.type IN (4, 5);
    SELECT @Rowcount = @@RowCount;
    SELECT @ii = @ii + 1;
  END;
  return
END
go
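
To see what the unwrapped version of a document looks like, you can just select from the function. Here is a minimal sketch, assuming you still have the ‘donut’ document from earlier in the @json variable:

SELECT [id], [level], [key], [Value], [type], SQLDatatype, parent, [path]
  FROM dbo.UnwrapJson(@json)
  ORDER BY [id];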

 

Now that you have the document as a table, you can unpick it in several ways, but we want to get all the tables that are embedded in the JSON. These are generally represented as arrays of objects, with each object representing a ‘document’ or ‘row’. In JSON, you can, of course, have an object or an array as the value of one of the keys within an object, which is the equivalent of storing XML or JSON in a relational column, but we’re after the simple columns.

Because we are looking for the tables, we can get a quick listing of them (in reality, you’d probably only want a sample of each if you’re just browsing).

First, we place the output of dbo.UnwrapJson into a temporary table; I’ve chosen #jsonObject. Once that’s done, the query below shows the rows of any table structures within the JSON.
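
Populating the temporary table is just a matter of a SELECT…INTO (a minimal sketch, assuming the JSON document is in an NVARCHAR(MAX) variable called @json):

DROP TABLE IF EXISTS #jsonObject;
SELECT * INTO #jsonObject FROM dbo.UnwrapJson(@json);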

SELECT Parent.path AS [TheTable], GrandChild.*
      FROM
        (SELECT path, id FROM #jsonObject WHERE type = 4) Parent --start with an array
        INNER JOIN #jsonObject Child
          ON Child.parent = Parent.id AND child.type IN (4, 5) --either array or object
        INNER JOIN #jsonObject GrandChild
          ON GrandChild.parent = Child.id AND GrandChild.type NOT IN (4, 5)

 

Well, that’s fine as far as it goes, but it doesn’t go far enough. What I want is the OpenJSON query that I can execute to get the actual result.

Here is an inline table-valued function that does just that, using the query I’ve just shown you.

CREATE OR ALTER FUNCTION [dbo].[OpenJSONExpressions]
/**
summary:   >
  This inline table-valued function takes a JSON string and
  locates every table structure. Then it creates an OpenJSON
  Statement that can then be executed to create that table
  from the original JSON.
Author: Phil Factor
Revision: 1.0
date: 1 Nov 2020
example:
  - SELECT * FROM OpenJSONExpressions (N'[  
    {"name":"Phil", "email":"PhilipFactor@gmail.com"},  
    {"name":"Bob", "email":"bob32@gmail.com"}  
    ]')
returns:   >
  expression
 
**/    
(
   @JSON NVARCHAR(MAX)
    
)
RETURNS TABLE AS RETURN
(
WITH UnwrappedJSON (id, [level], [key], [Value], [type], SQLDatatype, parent,
                   [path]
                   )
AS (SELECT id, [level], [key], [Value], [type], SQLDatatype, parent, [path]
      FROM dbo.UnwrapJson(@json) )
  SELECT 'Select * from openjson(@json,''' + path + ''')
WITH ('  + String_Agg(
                       [name] + ' ' + datatype + ' ' --the WITH statement
-- SQL Prompt formatting off
   + case when datatype='nvarchar' then '('+length+')' 
     WHEN datatype='numeric' then  '(14,4)' ELSE '' end,', ')
   WITHIN GROUP ( ORDER BY  TheOrder  ASC  )    +')' as expression
-- SQL Prompt formatting on
    FROM
      (
      SELECT Parent.path, GrandChild.[key] AS [name], Min(GrandChild.id) AS TheOrder,
            Max(GrandChild.SQLDatatype) AS datatype,
        Convert(NVARCHAR(100), Max(Len(GrandChild.Value))) AS length
        FROM
          (SELECT path, id FROM UnwrappedJSON WHERE type = 4) Parent
          INNER JOIN UnwrappedJSON Child
            ON Child.parent = Parent.id AND child.type IN (4, 5)
          INNER JOIN UnwrappedJSON GrandChild
            ON GrandChild.parent = Child.id AND GrandChild.type NOT IN (4, 5)
        GROUP BY Parent.path, GrandChild.[key]
      ) TheFields
    GROUP BY path
        )
GO

So we try it out with some JSON that has two tables in it.

SELECT * FROM OpenJSONExpressions (
  N'{"employees":[  
    {"name":"Mo", "email":"mojaiswal@gmail.com", "StartDate":"2012-11-07T18:26:20"},  
    {"name":"Bob", "email":"bob456@gmail.com", "StartDate":"2015-06-20"},  
        {"name":"Phil", "email":"PhilipFactor@gmail.com", "StartDate":"2015-08-01T08:05:20"},
    {"name":"Susan", "email":"Su87@me.com", "StartDate":"2012-13-07"}  
],"Customers":[  
    {"name":"The Kamakaze Laxative Company", "contact":"Karen", "email":"Enquiries@KLCEnterprises.co.uk", "CustomerSince":"2012-11-07"},  
    {"name":"GreenFence Softwear limited", "contact":"Dick", "email":"Bob@GreenFenceSoftwear.com", "CustomerSince":"2015-06-20"},  
        {"name":"Grimm and Grivas, Solicitors",  "contact":"Jaz", "email":"GrimmGrivas@gmail.com", "CustomerSince":"2015-08-01T08:05:20"}
]
}  ')

 

From this we get the result: the following two queries…

Select * from openjson(@json,'$.Customers')
WITH (name nvarchar (29), contact nvarchar (5), email nvarchar (30), CustomerSince Datetime2 )

Select * from openjson(@json,'$.employees')
WITH (name nvarchar (5), email nvarchar (22), StartDate nvarchar (19))
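
Each of these can be run with sp_executesql, passing the original document in as the @json parameter. Here is a minimal sketch for the first of them, assuming the two-table document above is held in a variable called @SourceJSON (a name I’ve made up for the illustration):

EXECUTE sp_executesql N'Select * from openjson(@json,''$.Customers'')
WITH (name nvarchar (29), contact nvarchar (5), email nvarchar (30), CustomerSince Datetime2 )',
  N'@json NVARCHAR(MAX)',
  @json = @SourceJSON; --@SourceJSON holds the same JSON that was passed to OpenJSONExpressions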

Well, those look suspiciously executable, so the following procedure does just that: it collects the expressions and runs each of them in turn with sp_executesql.

go
CREATE PROCEDURE TablesFromJSON @TheJSON NVARCHAR(MAX)
/**
summary:   >
  This procedure returns a result set for every table structure
  found in a JSON string
Author: Phil Factor
Revision: 1.0
date: 1 Nov 2020
example:
  - EXECUTE TablesFromJSON @TheJSON= N'[  
    {"name":"Phil", "email":"PhilipFactor@geeMail.com"},  
    {"name":"Geoff", "email":"Geoff2435@geeMail.com"},
    {"name":"Mo", "email":"MoHussain34@geeMail.com"},
    {"name":"Karen", "email":"KarenAlott34@geeMail.com"},
        {"name":"Bob", "email":"bob32@geeMail.com"}   
    ]'
returns:   >
  one result set for each table structure found in the JSON
 
**/    
AS
DECLARE @expressions TABLE (id INT IDENTITY, TheExpression NVARCHAR(MAX));
INSERT INTO @expressions (TheExpression)
  SELECT expression FROM OpenJSONExpressions(@TheJSON);
DECLARE @RowCount INT = -1, @ii INT = 1, @expressionToExecute NVARCHAR(MAX);
WHILE @RowCount <> 0
  BEGIN
    SELECT @expressionToExecute = TheExpression FROM @expressions WHERE id = @ii;
    SELECT @RowCount = @@RowCount;
    SELECT @ii = @ii + 1;
    IF @RowCount > 0
      EXECUTE sp_executesql @expressionToExecute, N'@JSON NVARCHAR(MAX)',
        @JSON = @TheJSON;
  END;
GO

So we try it out…

EXECUTE TablesFromJSON  
  N'{"employees":[  
    {"name":"Mo", "email":"mojaiswal@gmail.com", "StartDate":"2012-11-07T18:26:20"},  
    {"name":"Bob", "email":"bob456@gmail.com", "StartDate":"2015-06-20"},  
        {"name":"Phil", "email":"PhilipFactor@gmail.com", "StartDate":"2015-08-01T08:05:20"},
    {"name":"Susan", "email":"Su87@me.com", "StartDate":"2012-13-07"}  
],"Customers":[  
    {"name":"The Kamakaze Laxative Company", "contact":"Karen", "email":"Enquiries@KLCEnterprises.co.uk", "CustomerSince":"2012-11-07"},  
    {"name":"GreenFence Softwear limited", "contact":"Dick", "email":"Bob@GreenFenceSoftwear.com", "CustomerSince":"2015-06-20"},  
        {"name":"Grimm and Grivas, Solicitors",  "contact":"Jaz", "email":"GrimmGrivas@gmail.com", "CustomerSince":"2015-08-01T08:05:20"}
]
}  '

 

And voilà! Two result sets.

So there we have it. I have to admit that the TablesFromJSON procedure isn’t quite as practical as I’d like, because it is impossible to consume more than one result set from a stored procedure within SQL itself (no problem from an application, of course). It turned out to be very useful in testing all the code out, though.

I use all three routines. I hope that they’re useful to you. If you spot a bug, then let me know in the comments.

These are stored with my JSON/SQL Server routines here: Phil-Factor/JSONSQLServerRoutines.

 
