The technique I will use in this blog can be used to load a graph with existing data you have. In my blog Using a Modeling Tool To Draw and Import a Graph into SQL Server, I demonstrated how to use a free (and nicely featured) tool to draw a graph and then get that data into a set of tables using a Powershell script. Now the task is to get this (or any data) into edges and nodes.
To refresh you memory, using this query:
SELECT * FROM NodeStaging.Node WHERE Node.Filename = 'NodeType-DefaultEdgeType-Sample'; SELECT * FROM NodeStaging.Edge WHERE Edge.Filename = 'NodeType-DefaultEdgeType-Sample' ORDER BY Edge.FromNodeId,Edge.ToNodeId; GO
You can see the data I am going to load:
To load this data into a set of SQL Graph tables, I will create the next two tables:
DROP TABLE IF EXISTS NodeType, DefaultEdgeType; CREATE TABLE NodeType ( NodeId int NOT NULL UNIQUE, NodeName nvarchar(30) NOT NULL ) AS NODE; CREATE TABLE DefaultEdgeType AS EDGE
The first method is to just load the data using straightforward joins. The nodes are simple, just take the nodes and insert them. Adding nodes to a set of graph tables is pretty simple. The tables are just tables, and all the graph internals are done for you.
INSERT INTO dbo.NodeType(NodeId, NodeName) SELECT NodeId, Node.Name FROM NodeStaging.Node WHERE FileName = 'NodeType-DefaultEdgeType-Sample' AND Node.NodeType = 'NodeType';
But the edges are where things get “complicated”, because you need to get the node_id values for the from and to nodes.
WITH BaseRows AS ( --note, for this example it is technically not needed --to do this join, but you usually will need other data. SELECT FromNode.NodeId AS FromNodeId, ToNode.NodeId AS ToNodeId FROM NodeStaging.Edge JOIN NodeStaging.Node AS FromNode ON FromNode.NodeId = Edge.FromNodeId JOIN NodeStaging.Node AS ToNode ON ToNode.NodeId = Edge.ToNodeId --extra where stuff to deal with the fact that I am loading from my copy database WHERE Edge.Filename = 'NodeType-DefaultEdgeType-Sample' AND Edge.EdgeType = 'DefaultEdgeType' AND FromNode.FileName = 'NodeType-DefaultEdgeType-Sample' AND FromNode.NodeType = 'NodeType' AND ToNode.FileName = 'NodeType-DefaultEdgeType-Sample' AND ToNode.NodeType = 'NodeType') INSERT INTO dbo.DefaultEdgeType ($From_id, $to_id) SELECT FromNode.$node_id AS FromNode, ToNode.$node_id AS ToNode FROM BaseRows JOIN dbo.NodeType AS FromNode ON FromNode.NodeId = BaseRows.FromNodeId JOIN dbo.NodeType AS ToNode ON ToNode.NodeId = BaseRows.ToNodeId;
Now though, you a set of nodes you can easily query:
SELECT FromNode.NodeName, ToNode.NodeName FROM dbo.NodeType AS FromNode, dbo.DefaultEdgeType, dbo.NodeType AS ToNode WHERE MATCH(FromNode-(DefaultEdgeType)->ToNode)
This returns:
NodeName NodeName ------------------------------ ------------------------------ 3 4 1 4 2 1 2 3 Node Name 1 4 5 6 5 7 6 7 Possesssive's Node Name Possesssive's Node Name 8
While this method is perfectly acceptable, due to the joins it may not be quite as fast if you are loading a lot of data. Luckily there is an easier way that is not super obvious, but actually makes loading a graph from existing data very easy.
The idea lies in the data structure of the values you see in the special columns. The node_id values look like this:
{"type":"node","schema":"dbo","table":"NodeType","id":0}
As I demonstrated here, that value actually doesn’t exist in the database, but is really just shorthand for a few integer values (which is really annoying when error handling, hence the blog). But this shorthand is something you can exploit to load data because you can form the data yourself.
So I will clear the tables:
DELETE FROM NodeType ; DELETE FROM DefaultEdgeType;
To create the string, we need to add several values together in our insert. You can do this with CONCAT and put together the value, but to simplify, I will create a user-defined function that will return the node mapping value that looks just the string you see in the values:
IF NOT EXISTS (SELECT * FROM sys.schemas WHERE name = 'Tools') EXEC ('CREATE SCHEMA Tools') GO CREATE OR ALTER FUNCTION Tools.GraphTable$GenerateId( @Type sysname, --Node or Edge @SchemaName sysname , --Schema of the Node or Edge Table @tableName sysname, --Name of the Node or Edge Table @graphItemId int -- The id of the graph object ) RETURNS nvarchar(2000) AS BEGIN RETURN CONCAT('{"type":"', @Type , '","schema":"', @schemaName, '","table":"',@tableName, '","id":', @graphItemId,'}') END;
For example, say you want a node for the table we are building, like for id =1:
SELECT Tools.[GraphTable$GenerateId] ('node','dbo','NodeType',1);
This returns:
{"type":"node","schema":"dbo","table":"NodeType","id":1}
And luckily, not only can you use this when querying for a value, but you can also create a new row’s graph surrogate value by providing your own. By taking control of the id values, things get a lot easier. Of course, while it will greatly simplify the edge loading, you have one easy step to take when loading a node, providing the id value:
So the insert becomes simply (I am including the id value because we needed it in the previous example and left it in the table):
INSERT INTO dbo.NodeType($Node_id, NodeId, NodeName) SELECT dbo.[GraphTable$GenerateId] ('node','dbo','NodeType',NodeId), NodeId, Node.Name FROM NodeStaging.Node WHERE FileName = 'NodeType-DefaultEdgeType-Sample' AND Node.NodeType = 'NodeType';
Loading the edge, since we have the integer values for the from and to values, and they match the id values we just created. we use the function and output the node values:
INSERT INTO dbo.DefaultEdgeType ($from_id, $to_id) SELECT dbo.[GraphTable$GenerateId] ('node','dbo','NodeType',FromNodeId) AS FromNode, dbo.[GraphTable$GenerateId] ('node','dbo','NodeType',ToNodeId) AS ToNode FROM NodeStaging.Edge WHERE FileName = 'NodeType-DefaultEdgeType-Sample' AND Edge.EdgeType = 'DefaultEdgeType';
Note: There was a bug with inserting your own nodes in SQL Server that was fixed in SQL Server 2019 CU 12.
Run the following code:
SELECT FromNode.NodeName, ToNode.NodeName FROM dbo.NodeType AS FromNode, dbo.DefaultEdgeType, dbo.NodeType AS ToNode WHERE MATCH(FromNode-(DefaultEdgeType)->ToNode);
And you get the same output as before.
What is cool is that you can do this with any of your data that has integer ids. For example, you might load a set of graph tables to analyze your data. For a concrete example, consider the many-to-many relationship between customers and products. There is an example of this in the WideWorldImporters database. It isn’t a straight connection, of course, because the connection goes through a sales order. But once you do the join through the sales order and line items, the process is exactly the same because you have a from and to value ready made:
USE WideWorldImporters GO
Then add the Tools function:
IF NOT EXISTS (SELECT * FROM sys.schemas WHERE name = 'Tools') EXEC ('CREATE SCHEMA Tools') GO CREATE OR ALTER FUNCTION Tools.GraphTable$GenerateId( @Type sysname, --Node or Edge @SchemaName sysname , --Schema of the Node or Edge Table @tableName sysname, --Name of the Node or Edge Table @graphItemId int -- The id of the graph object ) RETURNS nvarchar(2000) AS BEGIN RETURN CONCAT('{"type":"', @Type , '","schema":"', @schemaName, '","table":"',@tableName, '","id":', @graphItemId,'}') END;
Next create a couple of node and edge tables to hold the data:
IF NOT EXISTS (SELECT * FROM sys.schemas WHERE name = 'AnalysisGraph') EXEC ('CREATE SCHEMA AnalysisGraph') GO CREATE TABLE AnalysisGraph.Customer( CustomerId int NOT NULL, CustomerName nvarchar(100) NOT NULL ) AS NODE; CREATE TABLE AnalysisGraph.Product( ProductId int NOT NULL, ProductName nvarchar(100) NOT NULL ) AS NODE; CREATE TABLE AnalysisGraph.Ordered ( OrderedQuantityCount int NOT NULL --included an attribute to see what product was --ordered most by the connected customer ) AS EDGE;
Now load the data:
Clean the tables before starting, as you will want to refresh all the data, or build some ETL to handle changes for a real database..
TRUNCATE TABLE AnalysisGraph.Customer; TRUNCATE TABLE AnalysisGraph.Product; TRUNCATE TABLE AnalysisGraph.Ordered;
Just as before, using keys from the table, load the customer and product tables.
INSERT INTO AnalysisGraph.Customer($Node_id, CustomerId, CustomerName) SELECT Tools.[GraphTable$GenerateId] ('node','AnalysisGraph','Customer',CustomerId), CustomerId, CustomerName FROM Sales.Customers; INSERT INTO AnalysisGraph.Product($Node_id, ProductId, ProductName) SELECT Tools.[GraphTable$GenerateId] ('node','AnalysisGraph','Product',StockItemId), StockItemId, StockItems.StockItemName FROM Warehouse.StockItems;
In the Edge insert, the CTE aggregates all the orders of the database, so we can pick the related customer that has the most ordered items for a related product.
WITH AggregatedRows AS ( SELECT Customers.CustomerID, OrderLines.StockItemID, SUM(OrderLines.Quantity) AS OrderedQuantityCount FROM Warehouse.StockItems JOIN Sales.OrderLines ON OrderLines.StockItemID = StockItems.StockItemID JOIN Sales.Orders ON Orders.OrderID = OrderLines.OrderID JOIN Sales.Customers ON Customers.CustomerID = Orders.CustomerID GROUP BY Customers.CustomerID, OrderLines.StockItemID) INSERT INTO AnalysisGraph.Ordered ($from_id, $to_id,OrderedQuantityCount) SELECT Tools.[GraphTable$GenerateId] ('node','AnalysisGraph','Customer', AggregatedRows.CustomerId) AS FromNode, Tools.[GraphTable$GenerateId] ('node','AnalysisGraph','Product', AggregatedRows.StockItemId) AS ToNode, AggregatedRows.OrderedQuantityCount FROM AggregatedRows;
Loading these tables on my SQL Server 2019 on my local computer takes around 3 or 4 seconds. It is not a very beefy computer, so this is pretty awesome. There are: 663 customers, 227 products, and 115140 edges created, so this is no slouch. Now you can execute the following query:
SELECT TOP 10 Product.ProductName, customer2.CustomerName,Ordered2.OrderedQuantityCount FROM AnalysisGraph.Customer, AnalysisGraph.Ordered, AnalysisGraph.Product, AnalysisGraph.Ordered AS Ordered2, AnalysisGraph.Customer AS Customer2 WHERE MATCH(Customer-(Ordered)->Product<-(Ordered2)-Customer2) AND Customer.CustomerName = 'Wingtip Toys (Mauldin, SC)' ORDER BY Ordered2.OrderedQuantityCount desc;
From this query you get the following result, showing top 10 products that customers had in common with ‘Wingtip Toys (Mauldin, SC)’ that made the same product orders:
ProductName CustomerName OrderedQuantityCount -------------------------------------------------- -------------------------------- -------------------- Black and orange fragile despatch tape 48mmx100m Kumar Naicker 1800 Black and orange fragile despatch tape 48mmx75m Manca Hrastovsek 1584 Black and orange fragile despatch tape 48mmx75m Wingtip Toys (Straughn, IN) 1548 Black and orange fragile despatch tape 48mmx75m Tailspin Toys (Hodgdon, ME) 1476 Black and orange fragile despatch tape 48mmx75m Kertu Sokk 1440 Black and orange fragile despatch tape 48mmx100m Tailspin Toys (Kalvesta, KS) 1440 Shipping carton (Brown) 500x310x310mm Wingtip Toys (Bourneville, OH) 1400 Black and orange fragile despatch tape 48mmx100m Wingtip Toys (Jamison, IA) 1332 Black and orange fragile despatch tape 48mmx75m Anindya Ghatak 1260 Black and orange fragile despatch tape 48mmx100m Tailspin Toys (Larose, LA) 1260
The post Loading Existing Data Into SQL Graph Objects appeared first on Simple Talk.
from Simple Talk https://ift.tt/3iH4Xxt
via
No comments:
Post a Comment