Friday, March 31, 2023

GENERATE_SERIES: My new go-to to build sets

I have come across a lot of use cases for manufacturing rows on the fly, aside from the common goal of populating a large data set such as a numbers or calendar table. A few favorites include building sample data, pivoting an unknown number of columns, data extrapolation, and filling gaps in date or time ranges.

If you are on SQL Server 2022 or Azure SQL Database, or have been reading up on new features, you’ve likely heard about one of the better T-SQL enhancements: a new built-in function called GENERATE_SERIES. The syntax is straightforward – it accepts arguments for start and stop, and an optional argument to indicate step (in case you want to iterate by more than 1, or backwards):

SELECT value FROM GENERATE_SERIES(<start>, <stop> [, <step>]);

A few quick examples:

/* count to 6 */ 
 SELECT [1-6] = value 
   FROM GENERATE_SERIES(1, 6);

 /* count by 5s to 30 */
 SELECT [step 5] = value 
   FROM GENERATE_SERIES(5, 30, 5);

 /* count from 10 to 0, backwards, by 2 */
 SELECT [backward] = value
   FROM GENERATE_SERIES(10, 0, -2);

 /* get all the days in a range, inclusive */
 DECLARE @start date = '20230401',
         @end   date = '20230406';

 SELECT [days in range]  = DATEADD(DAY, value, @start)
   FROM GENERATE_SERIES(0, DATEDIFF(DAY, @start, @end));

Results:

(four result sets, shown side by side)

  1-6    step 5    backward    days in range
  ---    ------    --------    -------------
   1        5         10       2023-04-01
   2       10          8       2023-04-02
   3       15          6       2023-04-03
   4       20          4       2023-04-04
   5       25          2       2023-04-05
   6       30          0       2023-04-06
 

That is some handy syntax that is quite easy to use. I dug into it in more detail during the beta, but…

How would we do this on older versions of SQL Server?

We’ve been generating sets since before SQL Server was SQL Server, so we’ve always found a way. Some approaches are cryptic, and some perform poorly; others are cryptic and perform poorly. I have two that I like: one that works in SQL Server 2016 and above, and one that works all the way back to SQL Server 2008. There are others (even some that will work on SQL Server 2000), but these are the two I want to focus on today.

I’m going to present both techniques as inline table-valued functions, since the logic is complicated enough to justify encapsulation, and that also happens to keep demos nice and tidy. These will be written to accommodate a series of up to 4,000 values – we can certainly go beyond that, but exceeding 8,001 values leads to the first solution requiring LOB support, which can do unpredictable things to performance. The second is capped at 4,096 values because it is the highest power of 4 that is also less than 8,001; you’ll see why that’s important in a moment.

2016+ STRING_SPLIT + REPLICATE

This one is a rather recent addition to my toolbox; I don’t recall where I first came across it, but I like it because it’s concise without being overly opaque. We determine the number of values we want in our sequence, less one – which is the stop minus the start. We use REPLICATE to generate a string that is a sequence of that many commas. Then we split that string using STRING_SPLIT, which results in { stop - start + 1 } empty strings. We then apply a ROW_NUMBER() to the output, which serves as our series. Since our starting value might not be 1, we add it to the row number, and subtract 1.

To get started, I will create a new database named GenSeries to hold the sample code.

CREATE FUNCTION dbo.GenerateSeries_Split
 (
   @start int,
   @stop  int
 )
 RETURNS TABLE WITH SCHEMABINDING
 AS
   RETURN
   (
     SELECT TOP (@stop - @start + 1) 
       value = ROW_NUMBER() OVER (ORDER BY @@SPID) + @start - 1
     FROM STRING_SPLIT(REPLICATE(',', @stop - @start), ',')
     ORDER BY value
   );

To support a range greater than 8,001 values, you can change this line:

FROM STRING_SPLIT(REPLICATE(CONVERT(varchar(max),','), @stop - @start), ',')

…but that’s not the version I’m going to test today.
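As a quick sanity check, calling the function looks just like the built-in examples above. This is only a sketch and assumes the function was created in the current database:

 /* count from 5 to 10 using the 2016+ split-based function */
 SELECT value FROM dbo.GenerateSeries_Split(5, 10);

 /* generate all the days in a range, inclusive */
 DECLARE @start date = '20230401',
         @end   date = '20230406';

 SELECT [days in range] = DATEADD(DAY, value, @start)
   FROM dbo.GenerateSeries_Split(0, DATEDIFF(DAY, @start, @end));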

2008+ Cross-Joined CTEs

This solution reaches further back into most of the unsupported versions of SQL Server you might still be clinging to but, unfortunately, it is a little more cryptic. I remember first using it in this solution after discovering this really efficient implementation by Jonathan Roberts.

CREATE FUNCTION dbo.GenerateSeries_CTEs
 (
   @start int,
   @stop  int
 )
 RETURNS TABLE WITH SCHEMABINDING 
 AS 
   RETURN
   (
     /* could work in 2005 by changing VALUES to a UNION ALL */
     WITH n(n) AS (SELECT 0 FROM (VALUES (0),(0),(0),(0)) n(n)),
      i4096(n) AS (SELECT 0 FROM n a, n b, n c, n d, n e, n f)      
     SELECT TOP (@stop - @start + 1) 
       value = ROW_NUMBER() OVER (ORDER BY @@TRANCOUNT) + @start - 1 
     FROM i4096
     ORDER BY value
   );

This approach uses two CTEs – one that just generates 4 rows using a VALUES constructor; the second cross joins it to itself as many times as necessary to cover the range of values you need to support. (In our case, we want to support 4,000 values.)

Each time you cross join the original set of 4, you produce a Cartesian product of 4^n, where n is bumped by 1 for each new reference. So if you just named it once, you’d have 4^1, which is 4. The second reference is 4^2, which is 16. Then 4^3 = 64, 4^4 = 256, 4^5 = 1,024, and 4^6 = 4,096. I’ll try to illustrate in an image:

(Image: explaining how the cross joins produce powers of 4)
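If the picture isn't enough, a quick way to convince yourself is to count the rows each additional reference produces (a throwaway sketch, not part of the function):

 WITH n(n) AS (SELECT 0 FROM (VALUES (0),(0),(0),(0)) AS v(n))
 SELECT [4^1] = (SELECT COUNT(*) FROM n a),
        [4^2] = (SELECT COUNT(*) FROM n a, n b),
        [4^3] = (SELECT COUNT(*) FROM n a, n b, n c);
 /* returns 4, 16, 64 */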

If you only need to support 256 values, for example, then you could change that second line to stop at the fourth reference to n:

i256(n) AS (SELECT 0 FROM n a, n b, n c, n d)

And if you needed more than 4,096 values – say, up to 16,384 – you would instead just add one additional cross join:

i16K(n) AS (SELECT 0 FROM n a, n b, n c, n d, n e, n f, n g)

And of course you can be more verbose and self-documenting. Technically, I would want to write the following; it's just a lot more to digest at first glance:

i4096(n) AS 
        (
           SELECT 0 FROM n AS n4    CROSS JOIN n AS n16 
              CROSS JOIN n AS n64   CROSS JOIN n AS n256
              CROSS JOIN n AS n1024 CROSS JOIN n AS n4096
           /* ... */
        )

You could also code defensively and change the parameters to smallint or tinyint to prevent surprises when someone passes an int range that is too large and silently gets back fewer rows than they expect. The function won't raise an error in that case unless you add additional handling, say, forcing a divide by 0 somewhere if the range is too large. Keep in mind that someone could try to generate 100 rows by passing in a start parameter of 2,000,000,000 and a stop parameter of 2,000,000,100 – so restricting either input value instead of the difference might be unnecessarily limiting.
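Purely as an illustration of that last idea, the SELECT inside the function could include a hypothetical guard that forces a divide-by-zero error whenever the requested range exceeds what the cross joins can produce (a sketch, not something I'm recommending as-is):

 SELECT TOP (@stop - @start + 1)
   value = ROW_NUMBER() OVER (ORDER BY @@TRANCOUNT) + @start - 1
           /* hypothetical guard: error out instead of silently truncating */
           + CASE WHEN (@stop - @start + 1) > 4096 THEN 1/0 ELSE 0 END
 FROM i4096
 ORDER BY value;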

I often see recursive CTEs suggested for set generation, since they are a little less cryptic than this, and are somewhat self-documenting (if you already understand recursive CTEs, I suppose). I do like recursive CTEs generally, and have offered them up in many posts and answers, but they’re not ideal for broad consumption in this context unless you will never retrieve more than 100 rows (say, generating the days for a monthly report). This is because you will need a MAXRECURSION query hint to produce more than 100 values; since you can’t put that hint inside a function, it means you have to put it on every outer query that references the function. Ick! So much for encapsulation.
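For completeness, a minimal sketch of the recursive approach is below; note that the MAXRECURSION hint has to travel with the outer query, which is exactly the encapsulation problem described above:

 DECLARE @start int = 1, @stop int = 1000;

 WITH r(value) AS
 (
   SELECT @start
   UNION ALL
   SELECT value + 1 FROM r WHERE value < @stop
 )
 SELECT value
 FROM r
 OPTION (MAXRECURSION 0); /* needed once the series exceeds 100 rows; can't be placed inside a function */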

So how do they perform?

I thought about the simplest test I can do to pit different number generation techniques against each other, and the first that came to mind involves pagination. (Note: This is a contrived use case and not intended to be a discussion about the best ways to paginate data.)

In the GenSeries database, I will create a simple table with 4,000 rows:

SELECT TOP (4000) rn = IDENTITY(int,1,1),*
 INTO dbo.things FROM sys.all_columns;

 CREATE UNIQUE CLUSTERED INDEX cix_things ON dbo.things(rn);

Then I created three stored procedures. One that uses the split approach:

CREATE OR ALTER PROCEDURE dbo.PaginateCols_Split
   @PageSize int = 100,
   @PageNum  int = 1
 AS
 BEGIN
   SET NOCOUNT ON;

   DECLARE @s int = (@PageNum-1) * @PageSize + 1;
   DECLARE @e int = @s + @PageSize - 1;

   WITH r(rn) AS
   (
     SELECT TOP (@PageSize) rn = value
     FROM dbo.GenerateSeries_Split(@s, @e)
   )
   SELECT t.* FROM dbo.things AS t 
   INNER JOIN r ON t.rn = r.rn;
 END

One that uses stacked CTEs:

CREATE OR ALTER PROCEDURE dbo.PaginateCols_CTEs
   @PageSize int = 100,
   @PageNum  int = 1
 AS
 BEGIN
   SET NOCOUNT ON;

   DECLARE @s int = (@PageNum-1) * @PageSize + 1;
   DECLARE @e int = @s + @PageSize - 1;

   WITH r(rn) AS
   (
     SELECT TOP (@PageSize) rn = value
     FROM dbo.GenerateSeries_CTEs(@s, @e)
   )
   SELECT t.* FROM dbo.things AS t 
   INNER JOIN r ON t.rn = r.rn;
 END

And one that uses GENERATE_SERIES directly:

CREATE OR ALTER PROCEDURE dbo.PaginateCols_GenSeries
   @PageSize int = 100,
   @PageNum  int = 1
 AS
 BEGIN
   SET NOCOUNT ON;

   DECLARE @s int = (@PageNum-1) * @PageSize + 1;
   DECLARE @e int = @s + @PageSize - 1;

   WITH r(rn) AS
   (
     SELECT TOP (@PageSize) rn = value
     FROM GENERATE_SERIES(@s, @e)
   )
   SELECT t.* FROM dbo.things AS t 
   INNER JOIN r ON t.rn = r.rn;
 END

Then I created a wrapper that will call each of them with a defined page number – this way I could test the beginning, middle, and end of the set (pagination often sees tanking performance as the page number gets higher). This table is hardly a performance nightmare but if I ran the procedures enough times I would hopefully see some variance.

CREATE OR ALTER PROCEDURE dbo.PaginateCols_Wrapper
   @PageNum int = 1
 AS
 BEGIN
   SET NOCOUNT ON;

   EXEC dbo.PaginateCols_Split     @PageNum = @PageNum;
   EXEC dbo.PaginateCols_CTEs      @PageNum = @PageNum;
   EXEC dbo.PaginateCols_GenSeries @PageNum = @PageNum;
 END

If you execute this procedure, you will see 3 output sets that contain rows from sys.all_columns. If you vary the @PageNum parameter value, you will see different pages of data from that set, but all three result sets will contain the same rows. The only difference is the series-generating code.

I turned on Query Store, and always want to remind you that QUERY_CAPTURE_MODE = ALL is not a production-friendly option – but quite handy if you want to make sure you capture every instance of every query:

ALTER DATABASE GenSeries SET QUERY_STORE 
 (
   OPERATION_MODE              = READ_WRITE,
   QUERY_CAPTURE_MODE          = ALL /* Do not do this in production! */
 );

I didn’t want to run the procedures a bunch of times manually; I like using sqlstresscmd because I can run tests hundreds of thousands of times without guilt about overwhelming a poor UI, or waiting for results to render, or battling resource conflicts and poisoning the test as a result. It runs the queries, discards the results, and that’s it.

I configured a JSON file called GenSeries.json like this, to run each procedure 10,000 times across 16 threads. It took about 5 minutes to run on average:

{
   "CollectIoStats": true,
   "CollectTimeStats": true,
   "MainDbConnectionInfo": 
   {
     "Database": "GenSeries",
     "Login": "sa",
     "Password": "$tr0ng_P@$$w0rd",
     "Server": "127.0.0.1,2022"
   },
   "MainQuery": "EXEC dbo.PaginateCols_Wrapper @PageNum = 1;",
   "NumIterations": 10000,
   "NumThreads": 16,
   "ShareDbSettings": true
 }

Then ran it using the following:

sqlstresscmd -s ~/Documents/GenSeries.json

Then I collected the average runtimes from Query Store:

SELECT qt.query_sql_text,
        avg_duration       = AVG(rs.avg_duration/1000.0)
   FROM sys.query_store_query_text AS qt
   INNER JOIN sys.query_store_query AS q 
     ON qt.query_text_id = q.query_text_id
   INNER JOIN sys.query_store_plan  AS p 
     ON q.query_id = p.query_id
   INNER JOIN sys.query_store_runtime_stats AS rs 
     ON p.plan_id = rs.plan_id
   WHERE qt.query_sql_text LIKE N'%dbo.things%'
     AND qt.query_sql_text NOT LIKE N'%sys.query_store%'
   GROUP BY qt.query_sql_text;

When I wanted to switch to the middle or the end of the set, I ran this query to clear Query Store data. (Note: you will need to capture the results from Query Store each time before executing this statement as this clears everything from Query Store):

ALTER DATABASE GenSeries SET QUERY_STORE CLEAR;

Then I changed the MainQuery line appropriately to run tests for the middle and the end. For rows 1,901 – 2,000:

"MainQuery": "EXEC dbo.PaginateCols_Wrapper @PageNum = 20;",

And for rows 3,901 – 4,000:

"MainQuery": "EXEC dbo.PaginateCols_Wrapper @PageNum = 40;",

Here are the timing results in milliseconds:

(Image: line graph showing average duration, in milliseconds, of the three series generation techniques)

In these tests, the split approach was the winner, but the new built-in function is right on its heels. The stacked CTEs, while much more backward-compatible, have become a bit of an outlier.

I would love to see some flat lines in there, of course, since there shouldn’t be any penalty for jumping ahead to any page; but that’s not the point today. I do plan to revisit some of my old pagination techniques in a future article.

Conclusion

As the title suggests, I’m pretty happy with the syntax of GENERATE_SERIES so far, and I hope you get to try it out sooner rather than later! The performance of the split approach is slightly better, but both are still relatively linear and, for the simplicity of the implementation, I’d be inclined to use the newer syntax in most cases. At this scale, we’re talking about single-digit milliseconds anyway, so the results may not be all that telling other than “this is worth testing.”

And to reiterate, this wasn’t meant to show that any of these methods might be better for pagination specifically – it was a completely manufactured scenario where the table just happened to have contiguous row numbers to join to the output. This was more a demonstration of how easy it is to swap GENERATE_SERIES into places where you’re using more convoluted methods today.

Further reading

As far as series generation goes, there are other options out there, too, including some from Paul White, Itzik Ben-Gan, and others in this 6-part Number series generator challenge from 2021. In particular, there is an interesting solution from Paul White (dbo.GetNums_SQLkiwi) in solutions part 4, but it does require a little concentration, and is version-limiting (it requires a table with a clustered columnstore index). You should do more thorough testing with his and other approaches from that series, with your data and workload, especially if your primary objective is squeezing performance. Some solutions will only be options if you are on modern versions and/or have some leeway in implementation (some CLR solutions might be interesting as well).

The post GENERATE_SERIES: My new go-to to build sets appeared first on Simple Talk.




Thursday, March 30, 2023

A Beginners Guide to MySQL Replication Part 1

MySQL Replication is a process where data from one MySQL database known as the source (formerly called “master”) is copied over to one or more other databases called replicas (formerly called “slaves”). Think of this like having a backup buddy that is always in sync and up to date, giving you peace of mind.

It’s important to highlight that in 2020, MySQL took the step to modernize its language by retiring outdated terms such as ‘master’ and ‘slave’ and replacing them with more inclusive language – ‘source’ and ‘replica’. The MySQL team is actively working to update its syntax and documentation with these new terms. According to the official blog post, the old terms have negative origins and do not accurately describe how replication works.

Note that some of the syntax will still reference the old terms.

There are several benefits to using MySQL replication. For starters, it helps to increase the reliability and availability of your data. If the source database goes down, one of the replica databases can step in to take its place, keeping your application running smoothly. Plus, by distributing the load across multiple servers, replication can help to improve performance and prevent downtime. Another bonus? Replication makes it easier to back up and recover your data. So, if anything were to happen, you can rest easy knowing that you have a backup ready to go.

We have different types of replication setups, such as asynchronous, semi-synchronous, and synchronous replication. These are the standard replication types mentioned in the documentation. In this series, we’ll be delving into each of these replication types and examining their requirements, benefits, and limitations. So, hold on tight as we embark on a journey of discovery into setting up and choosing the right replication method.

Types of MySQL Replication

The choice of replication method depends on the specific requirements of your application. There are various replication types available in MySQL, each having its pros and limitations:

Asynchronous replication:

This is a type of replication in MySQL that involves a single source server that receives all write operations and one or more replica servers that replicate the data from the source. This type of replication is useful for scaling read-only operations. The source server does not wait for acknowledgment from the replica servers before committing changes. Here are some of the benefits of asynchronous replication:

  • Scalability: Asynchronous replication allows you to scale out the number of replica servers to meet growing demands for read-only access to data.
  • Performance: Asynchronous replication allows the source server to commit changes to the database immediately, which can provide high performance as the source server does not need to wait for the replica servers to catch up.
  • Flexibility: Asynchronous replication allows you to locate replica servers in different geographic locations, providing access to data from multiple regions.

Potential Limitations: Asynchronous replication is a great solution for businesses looking to scale their read-only operations. However, it may not be the best fit for applications that require real-time updates and immediate consistency, due to some of its downsides, such as:

  • Data consistency: Since the source server does not wait for the replica servers to catch up before committing changes, there may be a temporary loss of data in the event of a failure. This can result in a lack of consistency between the source and replica servers.
  • Recovery time: If a failure occurs on the source server, it may take longer to recover and restore data consistency as the replica servers may have lagged.
  • Lagging: Since the replica servers may not be in real-time sync with the source, there may be a lag in the replication process that can result in outdated data on the replica servers.
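Because of that lag, replicas are usually monitored to see how far behind they are. A minimal check, assuming MySQL 8.0.22 or later (older versions use SHOW SLAVE STATUS and Seconds_Behind_Master), might look like this:

-- run on the replica; Seconds_Behind_Source indicates how far the replica lags
SHOW REPLICA STATUS\G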

Synchronous Replication

This is a method of replicating data where the source server waits for acknowledgment from the replica servers before committing changes to the database. This ensures that data remains consistent across all servers and eliminates the risk of temporary data loss in the event of a failure. There are several benefits to using synchronous replication, some of which are:

  • Data consistency and reliability: One of the key benefits of synchronous replication is the high level of data consistency and reliability it provides. With this method, you can be confident that all servers in your environment have the same up-to-date information, even in the event of a failure.
  • Fast recovery time: It can provide a faster recovery time in the event of a failure, as the replica servers already have the latest data. This can be especially important in high-availability environments where you need to ensure minimal downtime.

Potential Limitations: However, just as with asynchronous replication, there are also some limitations to synchronous replication that you should be aware of, such as:

  • Lower performance: The source server must wait for acknowledgment from the replica servers before committing changes. This can slow down the processing of updates and inserts, especially if the replica servers are located far away from the source server.
  • High complexity: Synchronous replication can be more complex to set up and manage than asynchronous replication, as it requires more coordination between the servers. The high level of consistency and reliability it provides also requires more resources, including network bandwidth and disk space, to ensure smooth operation.

Semi-synchronous replication

This is a replication method that offers a compromise between the high data consistency of synchronous replication and the improved performance of asynchronous replication. With semi-synchronous replication, the source server waits for at least one replica server to acknowledge the receipt of updates before committing changes to the database. Here’s why semi-synchronous replication is a game-changer:

  • Data consistency and performance: It provides a good balance between data consistency and performance. Since the source server only needs to wait for one replica server to acknowledge the receipt of updates, it can complete transactions more quickly than with synchronous replication, which requires all replica servers to acknowledge the receipt of updates.
  • Reduced data loss: It can help reduce the risk of data loss in the event of a failure. With this method, the replica server acts as a backup of the data, providing a level of protection against temporary data loss in the event of a source server failure.

Potential Limitations: Of course, like all good things in life, semi-synchronous replication comes with a few trade-offs.

  • It’s Not for the Faint of Heart: It can still be more complex to set up and manage than asynchronous replication, as it requires coordination between the source and replica servers.
  • Data consistency: If the replica server that acknowledges the receipt of updates experiences a failure, data consistency may be affected, as the source server will still commit changes to the database.
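Semi-synchronous replication ships as a plugin rather than being enabled by default. A minimal enablement sketch is below; the plugin and variable names shown are the MySQL 8.0.26+ spellings (older versions use rpl_semi_sync_master / rpl_semi_sync_slave), so adjust for your version:

-- on the source
INSTALL PLUGIN rpl_semi_sync_source SONAME 'semisync_source.so';
SET GLOBAL rpl_semi_sync_source_enabled = 1;

-- on each replica
INSTALL PLUGIN rpl_semi_sync_replica SONAME 'semisync_replica.so';
SET GLOBAL rpl_semi_sync_replica_enabled = 1;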

With some thorough planning and an evaluation of your data demands and use cases, you can decide whether synchronous, asynchronous, or semi-synchronous replication is the best option for your environment.

Replication formats

MySQL supports two core types of replication formats. Let’s take a closer look at each of these formats to understand how they work and what their pros and cons are.

Statement-based replication: Statement-based replication (SBR) works by recording changes to a database as SQL statements and then replicating those statements to all replicas. The replicas then execute the same statements in the same order as the primary database. SBR is an easy-to-use and efficient replication format that’s ideal for simple, straightforward replication scenarios.

However, it does have some limitations. For example, it may not handle certain types of non-deterministic statements well, which can cause problems during replication.

Row-based replication: Row-based replication (RBR) records changes to a database as changes to individual rows of data. This type of replication is much more flexible and robust than SBR, as it can handle more complex changes to data and can resolve replication conflicts more easily. It’s also better equipped to handle complex data structures, making it a good choice for demanding replication scenarios.

However, RBR is also more resource intensive than SBR, as it requires more network bandwidth and storage space to transmit the additional data.
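The format in use is controlled by the binlog_format system variable (ROW is the default on recent versions). For example, assuming you have sufficient privileges, you could inspect and change it like this:

-- check the current format
SHOW VARIABLES LIKE 'binlog_format';

-- switch to row-based logging for new sessions
SET GLOBAL binlog_format = 'ROW';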

Requirements for setting up MySQL Replication

Replicating data from a source database to a replica database is a great way to improve the performance of your MySQL infrastructure. But before you dive into setting up replication, there are a few key requirements you need to be aware of:

  • MySQL Version: Ensure that both the source and replica servers are running the same version of MySQL. This will guarantee that data can be replicated between the two servers without any compatibility issues.
  • Network Connectivity: The source and replica servers need to be able to communicate with each other over a network. This can be achieved by having both servers on the same network, or by setting up a secure connection between them.
  • User Privileges: A user account is required for replication, and it must have sufficient privileges on both the source and replica servers. The user must have the `REPLICATION SLAVE` privilege on the source and the `REPLICATION CLIENT` privilege on the replica (a minimal setup sketch follows this list).
  • Binary Logging: Binary logging must be enabled on the source server. This is essential for the replica to receive updates to the database. In order to set up binary logging in MySQL replication, you need to modify the MySQL configuration file (my.cnf or my.ini) on the source server to enable binary logging.
  • Unique Server IDs: Each MySQL server must have a unique server ID. This allows you to keep track of which data is being replicated from where.
  • Source Database Backup: To initialize the replica, you need to create a backup of the source database. This can be done using the `mysqldump` utility (Lukas Vileikis recently blogged about this utility here).
  • Storage Space: Make sure that both the source and replica servers have enough storage space to accommodate the replicated data.
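To make the privilege and configuration requirements above more concrete, here is a simplified setup sketch. The host name, user name, and password are placeholders, and the statements shown are the MySQL 8.0.23+ spellings (CHANGE REPLICATION SOURCE TO and START REPLICA replace CHANGE MASTER TO and START SLAVE):

-- on the source: create a dedicated replication account
CREATE USER 'repl'@'%' IDENTIFIED BY 'S0me_Str0ng_Passw0rd';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';

-- on the replica: point it at the source and start replicating
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = 'source-host.example.com',
  SOURCE_USER = 'repl',
  SOURCE_PASSWORD = 'S0me_Str0ng_Passw0rd',
  SOURCE_AUTO_POSITION = 1;  -- assumes GTID-based replication is enabled

START REPLICA;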

Conclusion

Phew, that was a lot to digest! MySQL replication is a complex topic, and it’s essential to have a solid grasp of the fundamentals before diving into the technical details. But don’t worry, because in Part 2 of this series, we’re diving deep into the heart of the matter and exploring the nitty-gritty details of the replication process. Get ready to flex your query-writing muscles and gain a deeper understanding of this powerful tool!

 

The post A Beginners Guide to MySQL Replication Part 1 appeared first on Simple Talk.




Wednesday, March 29, 2023

The role of forethought for a computer engineer

If you have ever played a game of chess against someone good, or perhaps against a computer, you know that a good player is always thinking multiple moves ahead, working through every combination of moves their opponents may possibly make, and forming a plan for their next set of moves. It is a fun game, but very frustrating if you are just a novice, because there are many ways for the game to be over very quickly, leaving you feeling stupid.

Being a DBA and/or a database programmer is like playing several chess games simultaneously. Each opponent is formidable, crafty, and unlike a chess-playing computer, completely unpredictable. This unpredictability has led to software development methodologies that seek to manage this problem. But just like chess, if you have no forethought of where your opponents will move next, it can only exasperate your problems. You know your opponents and are probably good friends with many of them, which can leave you vulnerable to their whims.

Most of these items pertain to anyone working with the process of writing and maintaining software. Even if you are not a dedicated data-oriented programmer, you are going to have to think about the data too!

  • Customers – Without customers, being a DBA would be the same as being unemployed. Luckily too, customers are reasonably predictable with a bit of coaching. They have wants and needs but can only be satiated by working software delivered regularly enough. Just talking to them, documenting their future needs (stuff accomplished well by employing agile methodologies), and giving them results regularly can help you to avoid their unpredictability.
  • Hardware – Any DBA worth their weight in Jelly Babies knows that their hardware will fail someday. So we build servers to allow for minor failures and replace servers nearing the “mean time between failures” zone. We back up resources and store the results somewhere safe like a cloud provider, or at least offsite storage (probably not your mom’s garage, for security reasons!) Don’t forget to test those backups and maintain hardware that can also accept your backups!
  • Software – With every edition of SQL Server, more and more features are placed on the deprecated list, meaning someday that feature is going away (I was personally hit by an archaic form of RAISERROR that I used over 10 years ago in code that hadn’t been touched in years.) Even though deprecated features generally work for years, going through your code (or in SQL Server using Extended Events) to see when deprecated features are being used, and then using newer features will help you avoid future failures.
  • Complex Requirements – You can tell the quality of a database programmer by looking at a complex stored procedure they have written. The novice often says: Ok, I need rows from X WHERE X.I = 1, and do that query into a temp table. Then I need to remove rows where X.J > 0, so they create a temp table of those rows. By the end of the coding task, they have 100 temp tables to return 3 rows, when all of it could have been done with a few subqueries or compound WHERE clause criteria.

    If you are unsure about a piece of complex code, you can generally be 99.9382% sure of its quality by reading the comments (or noting the lack of them).

  • Database State – The most significant difference between database management/coding and other programming disciplines is maintaining state. And when you move from one set of structures to another, you must upgrade that state along the way. So as you are thinking about new features you want, it is essential to realize that whatever you build will not be easy to change without maintaining the existing data.
  • And so much more – Unlike a game of chess, there is more to what you must deal with than just the pieces you can see right in front of you. World economics; your or your family’s health; changes in management; new products; changing technologies. And so on.

In the end, the process of going from newbie to junior, up to senior and architect level, in a technical career requires a lot of forethought to be in constant preparation for what is next. And the same is true for almost any career. Keep yourself firmly grounded in making today’s business run, all while ensuring it runs well tomorrow.

If this all sounds complicated… then you have grokked what I am saying and are ready to start the next phase of your training. Doing your best to make it happen. And don’t be afraid you are going to fail. Because you are. And you will learn as much from your failures as anything you read.

The post The role of forethought for a computer engineer appeared first on Simple Talk.




Monday, March 27, 2023

Using Virtual Network Data Gateways to Secure Access to Azure SQL in Power BI

Power BI and Azure are two related services, but with considerable independence between them. Power BI shares the Azure tenant for security management, but in terms of networking, they are in completely different environments.

Frequently, Power BI needs to access Azure resources, such as Azure SQL. If we use the easiest and simplest configurations, all the network communication between Power BI and Azure services will happen over the public internet, creating a security risk.

Of course, many Azure services have their own firewall, protecting access by IP address. Azure SQL has a firewall as well – two, in fact. However, it’s very difficult, if not impossible, to open a service by IP address for Power BI access. This leads to situations where we need to leave the service publicly accessible at some level.

For example, let’s consider Azure SQL. There are probably many articles on the web with the terrible advice that, in order to access Azure SQL from Power BI, “you should enable Allow Azure Services and Resources to access this Server for this to work“. As I explained in my article Eight Azure SQL configurations you may have missed and in the session I delivered, which you can watch: Azure SQL Networking Secrets, this should not be used in production.

This configuration allows any Azure service to bypass the Azure SQL firewall. It removes one layer of security from Azure SQL.
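If you want to check whether that setting is enabled on one of your servers, it surfaces as a server-level firewall rule spanning 0.0.0.0. A quick sketch, run in the master database of the logical server (the sys.firewall_rules view and the sp_delete_firewall_rule procedure are server-level objects in master; verify the rule name against your environment):

-- run in the master database of the logical server
SELECT name, start_ip_address, end_ip_address
FROM sys.firewall_rules;

-- the "Allow Azure services..." checkbox corresponds to a rule named
-- AllowAllWindowsAzureIps with a 0.0.0.0 range; removing it closes that door
EXECUTE sp_delete_firewall_rule @name = N'AllowAllWindowsAzureIps';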

We will focus on an example using Azure SQL to explain how to ensure that the network communication between Power BI and Azure services happens through a private environment, with no network packets traveling over the public internet.

Using Private Endpoints and a Gateway

Power BI and Azure SQL each provide part of the solution; the pieces just need to be put together.

Azure SQL: When we need Azure SQL to be private – not exposed to the web, not even behind the firewall protection – we use a private endpoint. We create a private endpoint for Azure SQL inside an Azure Virtual Network, after which Azure SQL is accessible through the virtual network and public access can be disabled entirely.

Power BI: When we need to access something behind a network barrier, we use a data gateway. For example, when we need to access a server on premises, we install a data gateway on premises and configure it in the portal.

The solution becomes a matter of connecting the pieces: create a virtual network in Azure, create a private endpoint for Azure SQL, and install a data gateway for Power BI. That last part requires a virtual machine, which makes this what I would call a very “manual” process: managing a data gateway in a virtual machine for Power BI.

Power BI has a new feature that makes this process easier and less “manual”: Virtual Network Data Gateways. This is the focus of this article.

Power BI Virtual Network Data Gateways

Power BI has a feature called the Virtual Network Data Gateway. In summary, it’s a data gateway created and managed automatically by Power BI inside an Azure Virtual Network.

This feature makes the process much more “automated” and less “manual”, including the management of gateway redundancy, for example.

The limitation is the fact that the Power BI tenant and the Azure tenant need to be the same. If the tenants are not the same, you have two options:

  • Revert to the Virtual Machine with a Data Gateway installed
  • Establish a network connectivity across tenants

In addition, this feature requires a Premium workspace. The reports accessing data through Virtual Network Data Gateways need to be in a Premium or Power BI Premium Per User (PPU) workspace.

Steps to build the solution

These are the steps to build this solution:

  1. Build a virtual network in Azure. The virtual network will create a private environment for you, instead of exposing your services to the public internet
  2. Build a private endpoint for Azure SQL. This will place Azure SQL inside your private environment, eliminating internet access to the service.
  3. Build the Power BI Virtual Network Data Gateway to link Power BI with your virtual network
  4. Create your data source on the Virtual Network Data Gateway. The data source will allow Power BI to access your private Azure SQL

I will skip the Virtual Network part, because this is a regular Azure feature. You can discover more about creating a virtual network here.

Creating the Private Endpoint

There are some networking considerations when creating a private endpoint. We will not try to be too specific and tell exactly what to do or not to do. Any company planning this architecture on the enterprise level should carefully plan these configurations.

NSG for Private Endpoints

Azure Virtual Networks can use NSGs, Network Security Groups, to protect communications. Private endpoints, on the other hand, are not protected by NSGs by default. There is a special configuration we can make in a virtual network to define whether the virtual network will use NSGs to protect private endpoints.

This configuration is made at the level of each subnet, and it’s available during the process of creating a private endpoint.

Application Security Group

Application Security Groups enable you to configure application elements in groups to be easily controlled by NSG rules. When you have many application elements which require the same rules, such as private endpoints, virtual machines and more, you can group them using an application security group.

By doing so, you can create single rules for the application security group instead of going to the trouble of creating rules for each individual application service.

Learn more about Application Security Groups

DNS Configuration

Private endpoints are built using DNS name resolution. By default, when we create a private endpoint, a DNS zone is created for us and linked to the virtual network. In enterprise scenarios, you may want to control your own DNS resolution system instead of using many different DNS zones.

Steps to create the private endpoint

  1. Access the Azure SQL Server
  2. On the left tab, access Security->Networking
  3. Click on Private access tab


  4. Click on + Create a private endpoint. A wizard with multiple steps will open.
  5. On the first window, select the resource group
  6. Create the name for the private endpoint
  7. Select the region. It should be the same region as the virtual network and the Azure SQL server


  8. The 2nd window has a confirmation of the target resource. Ensure SQLServer is selected and move forward


  9. The 3rd window contains the details of the virtual network. Choose the virtual network and subnet to create the private endpoint.


  10. You can configure the network policy for private endpoints by clicking the Edit link. Changing this configuration affects the entire subnet, including all private endpoints in the subnet.


  11. You can select or create an application security group for this private endpoint
  12. You can select whether the private endpoint's IP allocation will be static or dynamic
  13. After making your decision about the above configurations, move to the next step
  14. The 4th configuration is about the DNS zone. It will be filled with default values. This is the moment to choose how to manage your DNS name resolution.


The next two steps are tagging and review. These are the usual steps for resource creation; you can proceed and create the private endpoint.

Private endpoint result

After completing these steps, four objects will have been created.


The four items correspond to the following:

Virtual Network: The virtual network was created in the first step, before the private endpoint

Network Interface: The NIC is needed to provide an IP address to the private endpoint. It’s an IP address belonging to the virtual network

DNS Zone: Created for the name resolution of the private endpoint. It’s linked to the virtual network.

Private Endpoint: As the name points out, this is the private endpoint itself.

Creating the Virtual Network Data Gateway

The Virtual Network Data Gateway is a PaaS service, managed by Power BI. We make the initial configuration, and everything else is managed by Power BI.

In order to achieve this, Power BI needs a subnet delegated to it. The subnet delegation feature allows external software, such as Power BI, to make changes to the subnet settings and to the objects inside the subnet.

We also need to register the resource provider Microsoft.PowerPlatform in our Azure subscription. Resource providers are one of the core building blocks of the Azure environment; you can learn more about them here.

We have three steps to execute:

  • Register the resource provider
  • Create the delegated subnet
  • Create the Virtual Network Data Gateway

Registering the Resource Provider

  1. On the portal, use the top icon to open the cloud shell.

If your cloud shell is not configured, you will need to configure it first.

There are other ways to run command-line actions in Azure, which you can learn more about at this link

  2. Use the following statement to register the resource provider:

Register-AzResourceProvider -ProviderNamespace Microsoft.PowerPlatform


  3. It will take a few moments for the resource provider to be registered. You can check whether the registration is complete using the following statement:

Get-AzResourceProvider -ProviderNamespace Microsoft.PowerPlatform

When the registration is complete, the RegistrationState will appear as Registered, as opposed to Registering while the registration is still in progress.

Creating the delegated subnet

  1. On the portal, open the Virtual Network object
  2. On the left tab, under Settings, click Subnets


  3. Click the +Subnet button
  4. Fill in the name you want to use for the subnet, for example, pbisubnet
  5. On the Delegate subnet to a service drop-down, select Microsoft.PowerPlatform/vnetaccesslinks


  6. Click the Save button

Creating the Virtual Network Data Gateway

  1. On the Power BI portal, click the Settings icon and the Manage connections and gateways menu option


  2. Click the Virtual network data gateways tab


  3. Click the New button
  4. On the New virtual network data gateway window, select the Azure subscription

If the subscription doesn’t have the correct resource providers registered, an error message will appear.


  5. On the New virtual network data gateway window, select the resource group
  6. On the New virtual network data gateway window, select the virtual network
  7. On the New virtual network data gateway window, select the subnet. Only subnets with the correct delegation will be listed.
  8. On the New virtual network data gateway window, choose the inactivity time before an auto-pause
  9. On the New virtual network data gateway window, choose the number of gateways to be created
  10. Click the Save button


Create your data source on the Virtual Network Data Gateway

  1. Use Power BI Desktop to create a report using the Azure SQL Database as a source
  2. Create a premium or PPU workspace in Power BI portal
  3. Publish the Power BI report to the created workspace.
  4. Refresh the created dataset. The first refresh always fails, asking you to update the credentials used to connect to the source


  5. On the dataset, click the Settings menu item


  6. Open the Data source credentials item and click the Edit credentials link


  7. After inserting the correct credentials, you will receive an error message: the Power BI IP addresses are not allowed to access the Azure SQL server.


  8. Click the Cancel button
  9. Open the Gateway connection option
  10. Turn on the option Use an On-premises or VNet data gateway

The gateway will appear as Not configured correctly because it’s missing the data source required for this dataset. The data source will be listed, and we will have the option to add it to the gateway.


  11. Click the Add to VNet link
  12. On the New data source window, create a name for the data source, for example, myAzureSQL
  13. Set the authentication to be used. The authentication will be registered on the gateway; the report developers will not need to know the credentials used.

There is an interesting feature available for the data sources: we can use SSO via Azure AD for DirectQuery. This means that when a data source uses DirectQuery, the end user will be authenticated directly using Azure AD. This is a subject for future blogs.


  14. Click the Create button
  15. Click the Apply button to link the dataset’s data source with the new data source created on the gateway.


  16. Refresh the dataset again. This time the refresh will work because it will be using the VNet data gateway.
  17. Go to Settings and click the Manage connections and gateways menu option
  18. On the data sources tab, you will be able to locate the data source created.


Conclusion

In this, as in almost all cases where you are setting up security, it may be complex to set up and maintain a secure environment in your company, but it’s very important not to take shortcuts that will create security breaches.

 

The post Using Virtual Network Data Gateways to Secure Access to Azure SQL in Power BI appeared first on Simple Talk.




Friday, March 24, 2023

The difference between libraries and frameworks

The concept behind frameworks and libraries is to provide reusable code for performing everyday tasks, so that developers don’t have to write all of the code from scratch for every project.

Frameworks and libraries are designed to be modular and reusable, meaning they can be used in multiple projects without needing to rewrite the code. This can save developers a significant amount of time and effort and also help to improve the reliability and maintainability of the code.

In addition to providing reusable code, frameworks and libraries can offer a set of best practices and conventions for structuring and organizing code, which can help developers build more scalable and maintainable applications.

However, there are some differences that you need to understand as you start writing code, which I will cover in this article.

Libraries and frameworks: Basics

A library is a group of operations that programmers can call at a whim and still have complete control over the software application’s flow. The actions that libraries are expected to take are stated carefully and precisely.

A framework, on the other hand, is a structure: the application takes shape as the developer fills in the blanks that describe the substance of each action. The developer specifies the functionality with the end user in mind, while the framework establishes the overall design, using a set of libraries or tools to accomplish the task.

What are libraries?

A library is a collection of pre-written code that you can use to perform specific tasks. Libraries are often used to reduce the amount of code a programmer needs to write by providing reusable functions or classes that can be called upon as needed.

Libraries fall into one of two classifications: all libraries are either static or dynamic.

  • Dynamic libraries: Dynamic libraries are loaded at runtime rather than linked with a program at compile time. This allows programs to use the library without having to be re-compiled if the library is updated. In addition, dynamic libraries are often used to provide platform-specific functionality or to allow multiple programs to share a single copy of the library in memory.
  • Static libraries: These are compiled into a program’s executable code, making it self-contained and independent of external changes. Static libraries are linked with a program at compile time, resulting in a single executable file that includes all of the code from the library. As a result, static libraries are often used to improve the performance of a program by avoiding the overhead of loading a library at runtime.

You can write libraries in most programming languages, and they can be used by programs written in that same language or a different language. For example, a library written in C++ could be used by a program written in Python.

While there are two main classifications of libraries, there are many different types of libraries, including:

  • Standard libraries: These are libraries that are included with a programming language by default. These libraries provide a set of standard functions and utilities that are useful for many applications. For example, the C standard library includes functions for input/output, string manipulation, memory allocation, and mathematical operations.
  • Third-party libraries: These libraries are developed by organizations or individuals other than the creators of the programming language. These libraries can provide additional functionality that is not included in the standard library and may be distributed as open-source or commercial products.

Libraries can perform various tasks, such as file input and output, data manipulation, network communication, and more. They can also provide access to system resources, such as the file system or hardware devices.

What are Frameworks?

A framework is a set of libraries or tools that provide a structure for building applications. It typically includes code libraries, templates, and guidelines for how to use the libraries to develop applications.

Frameworks can be useful for developers because they provide a standardized way to build and organize code, which can help streamline the development process and make it easier to build and maintain complex applications. They can also help developers to build applications that are more reliable and scalable, as the framework provides a set of best practices and conventions to follow.

Frameworks are used in various programming sectors, and there are many different types of frameworks available, ranging from web application frameworks to game development frameworks to mobile application frameworks.

Each framework has its own set of features and capabilities, and developers can choose the one that best meets their needs for a particular project.

Pros and Cons of Libraries and Frameworks

When choosing between frameworks and libraries, it is important to take into account all of their possibilities, advantages, and disadvantages.

Pros of libraries

There are several benefits to using programming libraries:

  • Reusability: Programming libraries allow you to reuse code that has already been written and tested, saving you time and effort. This means you don’t have to reinvent the wheel whenever you want to perform a common task.
  • Quality: Programming libraries are typically well-tested and debugged, so you can be confident that the code is reliable and will work as intended.
  • Maintainability: Programming libraries can make maintaining your code easier over time. Since the library code is separate from your code, you can update or upgrade the library without affecting your codebase.
  • Community support: Many popular programming libraries have large user bases and active communities, so you can often find help or resources online if you have questions or run into issues.
  • Standardization: Programming libraries help ensure that your code is consistent and follows established best practices. This can make it easier for other developers to understand and work with your code.
  • Encapsulation: A library ignores application-wide concerns such as global state management, HTTP handling, and routing, concentrating directly on the functionality it implements.
  • Developer time: Lowers the cost of developing applications by providing pre-built solutions for common tasks.

Cons of libraries

There are equally some potential drawbacks to using programming libraries:

  • Dependencies: Using programming libraries can introduce dependencies into your codebase. This means that if the library changes or is no longer supported, it could break your code or require you to make changes. This is particularly a problem with dynamic libraries.
  • Bloat: Depending on the size and complexity of the library, it could add unnecessary bloat to your codebase, which could impact the performance of your application.
  • Compatibility: Not all libraries are compatible with all programming languages or platforms. This means you may have to choose a different library or find a way to make it work with your technology stack.
  • Lack of control: When you use a programming library, you have to rely on its functionality. This means you may have less control over how a particular task is performed than writing your code from scratch.
  • Learning curve: Using a new programming library can take time to learn and understand. This can be a barrier for some developers, especially if the library has a steep learning curve or needs better documentation.
  • Performance: A wrapper is required when using a library in an unsupported environment, which slows the application.

Pros of Frameworks

There are several benefits to using frameworks in programming:

  • Structure and organization: Frameworks provide structure and organization for your code, making it easier to develop and maintain common types of applications by bringing together the libraries and tools needed. This can be especially helpful for large or complex projects.
  • Reusability: Frameworks often include reusable components or libraries that can save you time and effort. This means you can write less code from scratch and offers quicker solutions for web development.
  • Best practices: Frameworks often embody established best practices and design patterns, which can help ensure that your code is well-designed and maintainable.
  • Community support: Many popular frameworks have large user bases and active communities, so you can often find help or resources online if you have questions or run into issues.
  • Scalability: Frameworks are often designed with scalability in mind, which means they can handle large requests or traffic without performance issues. This can be especially important for applications that are expected to grow or have high usage levels.
  • Encapsulation: The entire application is unaffected by a change to a single component.
  • Compatibility: Encourages the development of cross-platform applications.
  • Quality: Creates rich, dynamic content that offers a better user experience with less skill required.
  • Data binding: Several JavaScript frameworks, like Angular, are built on a common architectural pattern called MVC, which keeps the data model and the view in sync.

Cons of frameworks

There are a few potential drawbacks to using frameworks in programming:

  • Complexity: Some frameworks can be complex and have a steep learning curve, which can be intimidating for beginner developers.
  • Limited flexibility: Frameworks often have a specific way of doing things, which can be inflexible if you want to do something outside the framework’s scope. Also, a framework’s ready-to-use features can keep programmers from fully learning the underlying programming language.
  • Performance overhead: Some frameworks can add a performance overhead, which can impact the speed of your application.
  • Dependency on the framework: If you build your application using a specific framework, you may become reliant on it, making it more challenging to switch to a different one.
  • Lock-in: Using a framework can also lead to lock-in, where you are tied to a specific set of technologies and unable to switch to alternatives easily.
  • Complexity: For example, in MVC, separating the presentation layer from the business logic might be challenging.
  • Compatibility: Only a browser environment that supports JavaScript can use JavaScript frameworks.
  • Unexpected consequences: As an example, if the recommended framework guidelines are not followed while developing, there is a chance of a security breach.

Libraries and frameworks: Similarities and differences

Libraries offer pre-set functions and classes to developers to streamline their work and speed up the development process. In contrast, a framework is akin to the base upon which programmers create apps for specific platforms.

Both libraries and frameworks are collections of pre-written code that developers can utilize to speed up the creation of software programs. In addition, they offer several classes or functions that can carry out typical activities, such as communicating with databases, managing user input, or producing graphics.

Similarities

There are several similarities between libraries and frameworks:

  • Both libraries and frameworks provide functions or classes that developers can use to perform everyday tasks.
  • Both libraries and frameworks are designed to be reusable, so developers do not have to write the same code repeatedly.
  • The process for installing libraries and frameworks is typically the same (e.g., both the NumPy library and the TensorFlow framework can be installed using the pip package installer, by running “pip install numpy” and “pip install tensorflow” respectively at the command line).
  • Both libraries and frameworks can save time and effort when building software applications.

One way to illustrate the similarities between libraries and frameworks is with an example. Suppose you are building a web application that needs to authenticate users and allow them to upload and download files. You can use multiple libraries or a more complete framework to provide the necessary functionality for these tasks.

For example, you could use a library like “Authlib” to handle user authentication. This library provides a set of functions you can call in your code to handle tasks such as creating new user accounts, logging in and out, and verifying user credentials. Then get a different library for handling file transfers, or write your own.

On the other hand, you could use a framework like “Django” to handle both user authentication and file uploads/downloads. Django is a full-stack web framework with many out-of-the-box functionalities, including user authentication and file storage. When using Django, you would write your code within the structure provided by the framework, following its conventions and patterns.

Both libraries and frameworks can be valuable tools for building software applications, and the choice of which to use will depend on the specific needs of your project. However, both libraries and frameworks provide pre-written code that can save time and effort when building software, and they can be easily installed and used in a project.

Differences

There are some key differences between libraries and frameworks, which I will discuss further in the following sections.

Flexibility

Libraries are designed to be flexible and can be used in various ways. Developers can choose how to use a library and incorporate it into their code in any way that makes sense for their project. 

Consider a mathematical library.

Here’s an example of how one can be used for simple mathematical operations, using Python’s built-in math module:

import math
# Calculate the square root of a number
x = 16
y = math.sqrt(x)
print(y) 
# Output: 4.0
# Calculate the logarithm of a number with base 10
z = math.log10(100)
print(z) 
# Output: 2.0
# Calculate the sine of an angle in radians
a = math.sin(math.radians(30))
print(a) 
# Output: 0.49999999999999994

In this example, the Math library provides functions for performing various mathematical operations, such as calculating square roots, logarithms, and trigonometric functions. The developers can use these functions in any way they see fit. They can call any of the functions provided by the library and use the return values of these functions in any way they like. Hence, this example showcases the flexibility of a library.

In contrast to libraries, frameworks tend to impose a specific structure or pattern on the code that the developer writes. This means that developers must follow the structure set by the framework when using it.

Now consider a framework like “Rails.” Rails is a web application framework that provides a range of functionality for building web applications, including routing, database interaction, and rendering views. When using Rails, a developer must follow the structure and conventions of the framework. For example, they might write code like this:

class UsersController < ApplicationController
  def show
    @user = User.find(params[:id])
  end
end

In this example, the developer must define a controller class that inherits from “ApplicationController” and define a “show” action within that class. They must also use instance variables (such as @user) in a specific way, as they will be used to render views.

Inversion of control

This is a key difference between libraries and frameworks. It refers to how control flow is managed in an application that uses a library or framework. With a typical library, the developer writes code that calls the library’s functions or methods. The developer controls the program’s flow and can decide when to call the library’s functions. This is known as “procedural control.”

This example reflects procedural control in a library:

# Import the library
import authlib
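# (These calls are illustrative of a generic authentication library,
# not the actual Authlib API.)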
# Use the library's functions to perform tasks
user = authlib.create_user(username='jane', password='123456')
authlib.login(username='jane', password='123456')

In contrast, in a framework, the framework calls the code that the developer writes. The developer writes code that conforms to the structure and conventions of the framework, and the framework calls this code at specific points in the application’s execution. This is known as “inversion of control.”

Here is an example of inversion of control in a framework:

# Import the piece of the framework we need
from django.urls import path

# Use the framework's structure to build the application
# Define a Django "view" function to handle a request
def handle_request(request):
    # Perform some action based on the request
    ...

# Define a Django URL pattern to map the view function to the site root
urlpatterns = [
    path('', handle_request),
]

In this example, the developer has defined a “view” function called handle_request that Django will call to handle incoming HTTP requests. The developer has written this function within the structure provided by the Django framework, and the framework will call the function at the appropriate time when a request is received. This represents an inversion of control, as the framework calls the code that the developer has written rather than the other way around.

Scope of functionality

The scope of functionality refers to the range of tasks that a library or framework is designed to handle. Libraries are focused on a specific task or set of tasks and provide a narrow range of functionality that can be used to perform these tasks. For example, a library might be designed to handle user input, interact with a database, or perform data parsing.

Here is an example of how a library might be used in code to handle user input:

# Import the library
import inputlib
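# ('inputlib' is a hypothetical input-handling library, used here for illustration.)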
# Use the library's functions to handle user input
name = inputlib.get_string('Enter your name:')
age = inputlib.get_int('Enter your age:')

In contrast, frameworks are designed to provide a wide range of functionality and are often used to build complete applications. A framework will typically include a set of libraries and other tools that can perform a wide range of tasks, such as handling user input, rendering graphics, and interacting with a database.

Here is an example of how a framework might be used in code to build a complete web application:

# Import the pieces of the framework we need
from django.urls import path
from django.db import models

# Use the framework's structure to build the application
# Define a Django "view" function to handle a request
def handle_request(request):
    # Perform some action based on the request
    ...

# Define a Django URL pattern to map the view function to the site root
urlpatterns = [
    path('', handle_request),
]

# Use Django's built-in ORM to interact with a database
class User(models.Model):
    name = models.CharField(max_length=50)
    age = models.IntegerField()

Here, the developer is using Django to build a complete web application. The developer has defined a “view” function to handle incoming HTTP requests and has used Django’s built-in libraries to define a model for storing user data in a database. This demonstrates a framework’s broad scope of functionality, including tools for handling user input, interacting with a database, and building a complete web application.

Ease of use

Libraries are generally easier to use than frameworks, as they impose far less structure or pattern on the code written by the developer. This can make them more flexible and easier to work with, especially for developers who are new to a particular language or platform.

Is it wrong to use Libraries and Frameworks interchangeably?

Although they have similarities, developers shouldn’t confuse libraries for frameworks or use both terms interchangeably. 

Understanding the distinction will make it easier for you to make sense of all the resources that are already accessible for web application development and all the ones that will undoubtedly emerge in the future.

You can also combine different resources in the same application by knowing whether each is a framework or a library. Can you combine, for instance, Angular and Express? React and Rails? React and Angular? Express and Rails?

Some of these tools are intended for the frontend, some for the backend, and you can only combine them sensibly if you follow the guidelines below… 

Mixing Frameworks

Typically, you wouldn’t combine two frameworks, especially two that have overlapping purposes, but there are common reasons why you might.

  • The “MEAN” stack (Express and Angular) is an exception to the rule, because both frameworks are lightweight and concentrate on different aspects of the application.
  • Although one is a framework and the other is a library, Angular and React offer front-end rendering paradigms. Thus, combining the two generally wouldn’t make sense.
  • Since one is a robust framework for full-stack apps and the other is a library for front-end rendering, you can combine Rails and React. 
  • However, since Rails and Express are both back-end frameworks, you shouldn’t combine the two.

However, you may (and typically must) incorporate a framework and a library. Furthermore, you usually need to integrate your framework with many libraries because each library excels at doing one particular task, and applications frequently employ a variety of libraries because they must perform various tasks efficiently.

Conclusion

A framework allows an application to be developed and deployed quickly; by utilizing a framework, we can use ready-made resources to expedite development and improve the user experience. A library is used to add functionality to an application, and we can use the functions from a library in a variety of applications.

Who controls the execution flow is the key technical distinction between a framework and a library. When using a library, you decide when, or whether, to call it. When using a framework, however, you write code to fill in the framework’s gaps, because the framework manages the execution flow. The code you write is then called by the framework.

The post The difference between libraries and frameworks appeared first on Simple Talk.



from Simple Talk https://ift.tt/NyfMgRs
via

Tuesday, March 21, 2023

Paging Data in T-SQL

Sometimes when working with very large sets of data, you may not want to return all of the data at once. I discussed using TOP in my previous article, which allowed you to get only a number of rows from the start of the results of a query. However, if you want to see the rows after that top set, paging takes that further by letting you scroll through the data one page at a time. So, you might want to fetch the first 100 rows, then another 100, then the rest of the rows, etc.

This article will show you how to page through a set of results using the OFFSET and FETCH options of the ORDER BY clause.

Sample Data

Before showing some different paging options, I will create some test data using the code in Listing 1.

-- Create test data
USE tempdb;
GO

DROP TABLE IF EXISTS TestData; 
GO

CREATE TABLE TestData (
ID INT IDENTITY, 
CityName VARCHAR(20),
StateName VARCHAR(20),
Founded SMALLINT);

--Insert rows of test data
INSERT INTO TestData VALUES
('Seattle','Washington',1851),
('Redmond','Washington',1871),
('Bellevue','Washington',1953),
('Spokane','Washington',1881),
('Tacoma','Washington',1872),
('Portland','Oregon',1851),
('Grants Pass','Oregon',1887),
('Salem','Oregon',1842),
('Bend','Oregon',1905);

Listing 1: Creating Sample Data

In Listing 1, a table named TestData was created that contains a list of cities. This data will be used in the different paging examples below. If you want to follow along and run the example code in this article you can create the sample TestData on your test instance of SQL Server.

Page through data in T-SQL

In order to page through a set of data using T-SQL, the OFFSET and FETCH options of the ORDER BY clause are used. SQL Server starts returning data from a specific row based on the OFFSET value and returns a specific number of rows based on the FETCH value.

Using the OFFSET and FETCH options of the ORDER BY clause is a better option for paging than using a server-side cursor.
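
For comparison, here is a rough sketch of what an older, cursor-based approach to the same kind of paging might look like against the TestData table created above (for illustration only; each FETCH returns its own single-row result set, which is part of why cursors are clumsy for paging):

-- Sketch of a server-side cursor approach, shown only for comparison
DECLARE PagingCursor CURSOR SCROLL STATIC FOR
    SELECT ID, CityName, StateName, Founded
    FROM TestData
    ORDER BY ID;

OPEN PagingCursor;

-- Jump to row 4 (skipping the first 3 rows), then walk forward one row at a time
FETCH ABSOLUTE 4 FROM PagingCursor;
FETCH NEXT FROM PagingCursor;
FETCH NEXT FROM PagingCursor;

CLOSE PagingCursor;
DEALLOCATE PagingCursor;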

Syntax for the OFFSET and FETCH

Below is the syntax for the ORDER BY clause as found in the Microsoft Documentation.

ORDER BY order_by_expression 
  [ COLLATE collation_name ]  
  [ ASC | DESC ]  
  [ ,...n ]  
[ <offset_fetch> ] 
<offset_fetch> ::= 
{  
  OFFSET { integer_constant | offset_row_count_expression } 
      { ROW | ROWS } 
  [ 
   FETCH { FIRST | NEXT } {integer_constant | 
      fetch_row_count_expression } { ROW | ROWS } ONLY 
  ] 
}

The <offset_fetch> option is an optional item that is used in conjunction with the ORDER BY clause to page through a set of data. It has two components: OFFSET and FETCH.

The OFFSET option identifies the number of rows in an ordered row set to skip before rows are returned. The FETCH option is optional and identifies the number of rows that will be returned. If the FETCH option is not specified all rows from the OFFSET location to the end of the ordered set are returned.

To show how the OFFSET and FETCH options work, let’s go through a few different examples.

Using the OFFSET option

The OFFSET option of the ORDER BY clause is used to identify the number of rows to skip in a record set before rows are returned. The value can be 0 (zero) or any number up to the number of rows in the set. When zero (0) is used no rows are skipped, as shown when the code in Listing 2 is executed.

SELECT * 
FROM TestData
ORDER BY ID 
OFFSET 0 ROWS;

Listing 2: Skipping zero rows.

When Listing 2 is executed the output shown in Report 1 is created.


Report 1: Output when Listing 2 is run.

As you can see, when Listing 2 is run every row in the TestData table is returned. No rows were skipped because 0 (zero) was used for the OFFSET value, and since the FETCH option was not provided, all rows from the sample data table are returned.

Suppose the first 5 rows based on ID values needed to be skipped when selecting data. To meet that requirement the code in Listing 3 could be executed.

-- Skipping 5 rows
DECLARE @Skip INT = 5;
SELECT * FROM TestData
ORDER BY ID
 OFFSET @Skip ROWS;

Listing 3: Skipping 5 rows

The code in Listing 3 specified, this time by using a variable instead of a constant, that 5 rows would be skipped. When Listing 3 is executed Report 2 is produced.

Report 2: Output created when Listing 3 is run.

By reviewing the output in Report 2, you can see that only the records with an ID value greater than 5 and a StateName value of “Oregon” are displayed this time. That is because the first 5 rows in the TestData table based on the ID value were skipped before the rest of the rows were returned by the SELECT statement.

Each example so far has only shown how to skip rows. If you want to limit the number of rows displayed the FETCH option needs to be used.

Using the FETCH option

Assume you want to skip no rows in the record set and display just the first three rows of data based on the ID column value. If this were the requirement, then the code in Listing 4 could be executed.

-- Display first 3 rows
DECLARE @Skip INT = 0;
DECLARE @Fetch INT = 3;

SELECT * 
FROM TestData
ORDER BY ID
 OFFSET @Skip ROWS
      FETCH NEXT @Fetch ROWS ONLY;

Listing 4: Displaying the first 3 rows

In Listing 4 another variable, @Fetch, was declared to identify the number of rows to return; it was set to the value 3. When the code in Listing 4 is executed, the output in Report 3 is produced.


Report 3: Output created when Listing 4 is run.

By reviewing Report 3 you can see that zero rows were skipped, as identified by the OFFSET value. Plus, only the first 3 rows of the TestData table based on the ID column were displayed, because the FETCH option variable @Fetch was set to 3.

Suppose you wanted to display the first three cities in Oregon based on the ID column value. To accomplish this requirement the code in Listing 5 could be run.

-- Display first 3 Oregon Cities
DECLARE @Skip INT;
DECLARE @Fetch INT = 3;

SELECT TOP (1) @Skip = ID - 1 FROM TestData 
WHERE StateName = 'Oregon'
ORDER BY ID;

SELECT * FROM TestData
ORDER BY ID
 OFFSET @Skip ROWS
      FETCH NEXT @Fetch ROWS ONLY;

Listing 5: Displaying first three cities in Oregon.

When the code in Listing 5 is executed the results in Report 4 are displayed.

Report 4: Results displayed when Listing 5 is run.

In Listing 5 the @Skip variable was set programmatically using a SELECT statement. That statement identified the ID value for the first row that had “Oregon” set as the StateName. By programmatically setting the @Skip variable, all of the Washington state cities were skipped. Only the first 3 Oregon state rows were displayed because the @Fetch variable was set to 3.

Paging through data with a loop

The examples in the prior sections showed how to use the OFFSET and FETCH options to identify the rows to skip and display from the sample data table. By changing the OFFSET and FETCH values between calls to SQL Server, an application can page through a table of data. This is particularly useful when you need to display one page at a time while paging through a table with a large number of rows. Using the OFFSET and FETCH options of the ORDER BY clause minimizes the amount of data transmitted back to the client, by sending only one page of data at a time to the application.
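
In practice, the client would usually send a parameterized query for each page rather than running a T-SQL loop. A minimal sketch of that pattern is shown below; the @Skip and @Fetch values are placeholders that the application would calculate from the requested page number and page size:

-- Request a single page of data (parameter values supplied by the client)
EXEC sys.sp_executesql
    N'SELECT * FROM TestData
      ORDER BY ID
      OFFSET @Skip ROWS
      FETCH NEXT @Fetch ROWS ONLY;',
    N'@Skip INT, @Fetch INT',
    @Skip  = 3,   -- rows to skip: page number (zero-based) * page size
    @Fetch = 3;   -- page size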

To simulate paging through the sample data my example will use a WHILE loop. The code in Listing 6 pages through the sample data displaying 3 rows of data at a time. Keep in mind while you review this example a client application would normally perform the operations of looping through data one page at a time.

-- Paging through sample data
DECLARE @Skip INT = 0 ;
DECLARE @Fetch INT = 3;
DECLARE @LoopCnt INT;

SELECT @LoopCnt = CEILING(COUNT(*) * 1.0 / @Fetch) -- round up so a final partial page is not missed
FROM TestData;

WHILE @LoopCnt > 0
BEGIN
  SET @LoopCnt = @LoopCnt - 1;

  SELECT * FROM TestData
  ORDER BY ID
  OFFSET @Skip ROWS
  FETCH NEXT @Fetch ROWS ONLY;

-- Adjust the rows to skip
  SET @SKIP = @SKIP + @Fetch;
END

Listing 6: Paging through sample data 3 rows at a time

The first time through the WHILE loop the rows in Report 5 are displayed. This is because the @Skip value is set to zero and the @Fetch variable is set to 3.


Report 5: First time through the loop.

Before the second time through the loop the @Skip value is increased by the value of 3 that is contained in the @Fetch variable. The rows in Report 6 are displayed the second time the loop is executed.

Report 6: Second time through loop.

Before the last time through the loop the @Skip is increased again by 3. Report 7 shows the rows displayed for the third time through the loop.

Report 7: Last time through the loop.

As you can see, by adjusting the @Skip variable between each pass through the loop, the next set of 3 rows was displayed.

A caveat: Changes to underlying data

One of the biggest concerns when paging data is that you are not holding any locks or row versions on the results. Each execution of the SELECT statement that fetches rows runs the query again. This could be an issue for a hard-to-optimize query, but there is one more interesting issue: changes to the results of your query.

Any change to the underlying data can cause you to see rows again, or perhaps miss rows. For example, consider the following set of statements in Listing 7:

-- Display first 3 rows
DECLARE @Skip INT = 0;
DECLARE @Fetch INT = 3;

SELECT * FROM TestData
ORDER BY ID OFFSET @Skip ROWS
      FETCH NEXT @Fetch ROWS ONLY;
GO

Listing 7: Showing the effect of changing rowsets

This returns the output you see in report 8:


Report 8: Result from fetching first three rows

Next, in Listing 8, I will delete the row with ID = 3, and then run the statement that an application would execute if paging through these rows. It is the same code as in Listing 7, but I skipped 3 rows instead of 0.

DELETE FROM TestData
WHERE ID = 3;
GO

-- Display the next 3 rows
DECLARE @Skip INT = 3;
DECLARE @Fetch INT = 3;

SELECT * FROM TestData
ORDER BY ID OFFSET @Skip ROWS
      FETCH NEXT @Fetch ROWS ONLY;

Listing 8: Removing an already fetched row, then fetching next rows

In Report 9, you can see that the ID value starts at 5 instead of 4, as you may have expected. Conversely, when rows are inserted, you may end up with the same row returned multiple times.


Report 9: Shows that row with ID=4 has been skipped.

If you need to get absolutely all of the rows from your SELECT statement, it can be useful to store the results in a temporary table and page through that. Another method of handling this is to use the SNAPSHOT isolation level.
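
As a rough sketch of the temporary table approach (the #PagedResults name is arbitrary), you could materialize the query results once and then page through the copy, so later pages are unaffected by changes to the base table:

-- Capture the query results once, then page through the stable copy
SELECT ID, CityName, StateName, Founded
INTO #PagedResults
FROM TestData;

DECLARE @Skip INT = 3;
DECLARE @Fetch INT = 3;

SELECT * FROM #PagedResults
ORDER BY ID
 OFFSET @Skip ROWS
      FETCH NEXT @Fetch ROWS ONLY;

DROP TABLE #PagedResults;

-- Alternatively, if ALLOW_SNAPSHOT_ISOLATION is enabled on the database,
-- running the paging queries inside a transaction after
-- SET TRANSACTION ISOLATION LEVEL SNAPSHOT; gives a consistent view across pages.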

Summary

In this article you learned how to use the OFFSET and FETCH options of the ORDER BY clause to page through an ordered set of records. The OFFSET option was used to skip a specific number of rows in the ordered set, whereas the FETCH option was used to identify the number of rows to return from the record set. By controlling these two options, a client application can programmatically page through the rows of data one page at a time. Next time you need to page through a set of rows in a table, consider whether using the OFFSET and FETCH options of the ORDER BY clause will meet your paging requirements.

 

The post Paging Data in T-SQL appeared first on Simple Talk.



from Simple Talk https://ift.tt/PA7B16N
via