Friday, March 12, 2021

Ways to get and deal with invalid node identifiers in SQL Server Edge references

One of the weirder things about graph tables in SQL Server is the mysterious backend implementation. The values you see when working with these objects look like this: {“type”:”node”,”schema”:”dbo”,”table”:”Node1″,”id”:0}, but when you get errors, you don’t see the json, you see what this represents (two integers for the object_id and unique id in the graph table, as I showed in this blog). Since all of this happens in the engine, the values you see in the JSON, you kind of expect would have to correspond to something real. But this is not always the case. There are two ways (that I know of) that this can happen. The first way is when you delete nodes that had edge references but not constraint. The next way is the just insert invalid (or at least, currently invalid data).

For example, take these two node objects, and 2 edges:

CREATE TABLE dbo.Node1(Name varchar(20)) AS NODE;
CREATE TABLE dbo.Node2(Name varchar(20)) AS NODE;

For one edge table, I will create it to allow any nodes to connect:

CREATE TABLE dbo.Edge1 AS EDGE;

And then another edge, but this one will have 2 edge conditions defined to use cascading deletes (so if either node is deleted, the edge is removed).

CREATE TABLE dbo.Edge2
(
        CONSTRAINT EC_Edge2 CONNECTION 
                (dbo.Node2 TO dbo.Node2, 
                 dbo.Node1 TO dbo.Node2) ON DELETE CASCADE
) AS EDGE;

Next I will add a couple of nodes to each node table:

INSERT INTO dbo.Node1(Name)
VALUES('One'),('Two');

INSERT INTO dbo.Node2(Name)
VALUES('Buckle'),('Shoe');

Take a look at the data that has been created:

SELECT *
FROM   dbo.Node1;
SELECT *
FROM   dbo.Node2;

Each row output has the node_id value (with a unique name amongst all other tables.

$node_id_1E3D804E50C142CEA8041BF16862E7AC                Name
-------------------------------------------------------- --------------------
{"type":"node","schema":"dbo","table":"Node1","id":0}    One
{"type":"node","schema":"dbo","table":"Node1","id":1}    Two
$node_id_28039674CFE4498FB0B24CDAD854F145                Name
-------------------------------------------------------- --------------------
{"type":"node","schema":"dbo","table":"Node2","id":0}    Buckle
{"type":"node","schema":"dbo","table":"Node2","id":1}    Shoe

Using those values, we can establish a link in both edges, with the same id values. Edge1:

INSERT INTO dbo.Edge1($from_id, $to_id)
SELECT (SELECT $node_id FROM dbo.Node2 WHERE name = 'Buckle'),
           (SELECT $node_id FROM dbo.Node2 WHERE name = 'Shoe');

INSERT INTO dbo.Edge1($from_id, $to_id)
SELECT (SELECT $node_id FROM dbo.Node1 WHERE name = 'One'),
           (SELECT $node_id FROM dbo.Node2 WHERE name = 'Buckle');

Then edge2:

INSERT INTO dbo.Edge2($from_id, $to_id)
SELECT (SELECT $node_id FROM dbo.Node2 WHERE name = 'Buckle'),
           (SELECT $node_id FROM dbo.Node2 WHERE name = 'Shoe');

INSERT INTO dbo.Edge2($from_id, $to_id)
SELECT (SELECT $node_id FROM dbo.Node1 WHERE name = 'One'),
           (SELECT $node_id FROM dbo.Node2 WHERE name = 'Buckle');

Looking at the data in Edge1:

SELECT Node2.Name AS FromNode, LinkedTo.Name AS ToNode
FROM  dbo.Node2,dbo.Edge1,dbo.Node2 AS LinkedTo
WHERE MATCH(Node2-(Edge1)->LinkedTo);

This returns:

FromNode             ToNode
-------------------- --------------------
Buckle               Shoe

Now we delete a node:

DELETE FROM dbo.Node2
WHERE  Node2.Name = 'Buckle';

Now re-execute the previous query, and you get a seemingly strange result:

FromNode             ToNode
-------------------- --------------------
NULL                 Shoe

If you look at the data in the tables, it is evident what has occurred:

SELECT *
FROM   dbo.Node2;

SELECT *
FROM   dbo.Edge1;

This returns:

$node_id_9163A449BE314C8CAA08E02FF3F1FE3E               Name
------------------------------------------------------- --------------------
{"type":"node","schema":"dbo","table":"Node2","id":1}   Shoe

$edge_id_F8D6E993D86E40A592E94A7E9C08EE99               $from_id_7B55F84E631C4739A4941A6768F1370D              Continued Below
------------------------------------------------------- -------------------------------------------------------
{"type":"edge","schema":"dbo","table":"Edge1","id":0}   {"type":"node","schema":"dbo","table":"Node2","id":0}  
{"type":"edge","schema":"dbo","table":"Edge1","id":1}   {"type":"node","schema":"dbo","table":"Node1","id":0}  
            $to_id_C170E534EA6348B488B4927A955C02CD
            ----------------------------------------------------------
            {"type":"node","schema":"dbo","table":"Node2","id":1}
            {"type":"node","schema":"dbo","table":"Node2","id":0}

You can see in the $from_id and $to_id that there is an id:0 that doesn’t exist in the table. If you do this with the other edge, you will see that because of the CASCADE connection, that the edge is removed. Neither of the following query returns data:

SELECT Node2.Name AS FromNode, LinkedTo.Name AS ToNode
FROM  dbo.Node2,dbo.Edge2,dbo.Node2 AS LinkedTo
WHERE MATCH(Node2-(Edge2)->LinkedTo);

SELECT *
FROM   dbo.Edge2;

Going back to the Edge1 object, we are missing the Buckle node, but if you try to add it back:

INSERT INTO dbo.Node2(Name)
VALUES('Buckle');

But even after you add back the row, the orphaned key value in the $from_id and $to_id values is not added so the following query still returns NULL for the Buckle side of the result:

SELECT Node2.Name AS FromNode, LinkedTo.Name AS ToNode
FROM  dbo.Node2,dbo.Edge1,dbo.Node2 AS LinkedTo
WHERE MATCH(Node2-(Edge1)->LinkedTo);

If you want to add back that data in exactly the form it came as, you have to actually create that value manually by providing the node_id to the insert, using the format that you get from the output of the queries:

DELETE FROM dbo.Node2 WHERE Name = 'Buckle';

INSERT INTO dbo.Node2($Node_id, Name)
VALUES ('{"type":"node","schema":"dbo","table":"Node2","id":0}','Buckle')

Now you see the following:

SELECT *
FROM   dbo.Node2;

This returns:

$node_id_7AEF81800C7C46808238FA0683232FDA              Name
------------------------------------------------------ -------
{"type":"node","schema":"dbo","table":"Node2","id":0}  Buckle
{"type":"node","schema":"dbo","table":"Node2","id":1}  Shoe

And the MATCH works too:

SELECT Node2.Name AS FromNode, LinkedTo.Name AS ToNode
FROM  dbo.Node2,dbo.Edge1,dbo.Node2 AS LinkedTo
WHERE MATCH(Node2-(Edge1)->LinkedTo);

The row is back:

FromNode             ToNode
-------------------- --------------------
Buckle               Shoe

In a future blog, I will extend this concept of inserting your own nodes and show the value\limitation of the process.

The post Ways to get and deal with invalid node identifiers in SQL Server Edge references appeared first on Simple Talk.



from Simple Talk https://ift.tt/2OmgRjG
via

No comments:

Post a Comment