Monday, August 21, 2023

Fabric Lakehouse: Convert to Table feature and Workspace Level Spark Configuration

I have been working as a no-code data engineer, focused on Data Factory ETL and visual tools. In fact, I prefer to use visual resources whenever possible.

On my first contact with Fabric Lakehouse, I discovered that converting Files into Tables required a notebook. I waited a long time for a UI feature that could do the same, considering how simple the task is.

Convert to Table feature is Available in Lakehouses

This feature is finally available in the lakehouse: You can right-click a folder and choose the option “Convert to Table”.

When converting, you can create a new table or add the information to an existing table. This allows you to make an incremental load manually, if needed.

It’s as simple as right-clicking the folder and asking for the conversion.
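For comparison, this is roughly what the same conversion looks like in a notebook. It is a minimal sketch, not the code the UI runs: the function name, the folder `Files/sales`, the table name `sales`, and the assumption that the files are CSV with a header row are all hypothetical. The `spark` argument is the session a Fabric notebook already provides.

```python
def convert_folder_to_table(spark, folder, table, file_format="csv", mode="overwrite"):
    """Load every file under `folder` and save the result as a Delta table.

    mode="overwrite" creates or replaces the table;
    mode="append" adds the rows to an existing table.
    """
    if mode not in ("overwrite", "append"):
        raise ValueError(f"unsupported mode: {mode!r}")

    reader = spark.read.format(file_format)
    if file_format == "csv":
        # Assumption: the CSV files carry a header row.
        reader = reader.option("header", "true")

    df = reader.load(folder)
    df.write.format("delta").mode(mode).saveAsTable(table)
    return df


# Hypothetical usage in a notebook attached to the lakehouse:
# convert_folder_to_table(spark, "Files/sales", "sales", mode="append")
```

Calling it with `mode="append"` corresponds to the manual incremental load described above: the new files are added to the existing table instead of replacing it.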

Table Optimizations in Lakehouses

There are optimizations we should apply when writing Delta tables. We usually set them in the Spark notebooks we create.

For example:

  • spark.sql.parquet.vorder.enabled
  • spark.microsoft.delta.optimizeWrite.enabled
  • spark.microsoft.delta.optimizeWrite.binSize
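In a notebook, these properties are typically set on the session before writing. The sketch below shows one way to do it. The helper name is hypothetical and the values are illustrative assumptions (in particular the bin size, given here as 1 GiB in bytes), so check the Microsoft documentation for the values that suit your workload.

```python
# The three properties listed above, with assumed example values.
WRITE_OPTIMIZATIONS = {
    "spark.sql.parquet.vorder.enabled": "true",
    "spark.microsoft.delta.optimizeWrite.enabled": "true",
    # Assumption: 1 GiB expressed in bytes, used only as an example.
    "spark.microsoft.delta.optimizeWrite.binSize": "1073741824",
}


def apply_write_optimizations(spark, settings=WRITE_OPTIMIZATIONS):
    """Set each property on the current Spark session and return what was applied."""
    for key, value in settings.items():
        spark.conf.set(key, value)
    return dict(settings)


# Hypothetical usage in a notebook, before any write:
# apply_write_optimizations(spark)
```

Settings applied this way only last for the notebook session, which is why they do not help a conversion started from the UI.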


You can read more about these optimizations in this article from Microsoft.

How do we apply these configurations when we use the UI feature?

Workspace Level Spark Configuration

We can set these configurations at the workspace level. They then become the defaults and are applied to every write operation.

  1. On the workspace, click the Workspace settings button.
  2. In the Workspace settings window, click Data Engineering/Science on the left side.
  3. Click the Spark compute option.
  4. Under the Configurations area, add the three properties we need for optimization.
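After saving the workspace settings, a quick way to check them is to read the properties back from a notebook session in that workspace. This is a sketch under the assumption that workspace-level properties are visible through `spark.conf` in a new session; the helper name is hypothetical.

```python
OPTIMIZATION_KEYS = (
    "spark.sql.parquet.vorder.enabled",
    "spark.microsoft.delta.optimizeWrite.enabled",
    "spark.microsoft.delta.optimizeWrite.binSize",
)


def effective_write_settings(spark, keys=OPTIMIZATION_KEYS):
    """Return {property: value} for the session; the value is None when unset."""
    return {key: spark.conf.get(key, None) for key in keys}


# Hypothetical usage in a notebook:
# for key, value in effective_write_settings(spark).items():
#     print(key, "=", value)
```

A property that comes back as `None` has not been set for the session, which suggests the workspace configuration was not saved or the session started before the change.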

Differences between converting using UI or Notebooks

Let’s analyze some differences between using the UI and using a Spark notebook:

| UI Conversion | Spark Notebook |
| --- | --- |
| No write options configuration; it depends on the workspace-level configuration | Custom write options configuration |
| No partitioning configuration; the table can’t be partitioned | Custom partitioning is possible for the tables |
| Manual process; no scheduling possible | Schedulable process |
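To illustrate the partitioning difference, here is a minimal sketch of a partitioned write from a notebook. The function name, the table name, and the partition column are hypothetical; `df` is any DataFrame already loaded from the Files area.

```python
def write_partitioned_table(df, table, partition_columns, mode="overwrite"):
    """Save `df` as a Delta table partitioned by the given columns."""
    if not partition_columns:
        raise ValueError("at least one partition column is required")

    (
        df.write.format("delta")
        .mode(mode)
        .partitionBy(*partition_columns)
        .saveAsTable(table)
    )


# Hypothetical usage in a notebook:
# write_partitioned_table(df, "sales", ["year"])
```

Because this is ordinary notebook code, it can also be scheduled or called from a pipeline, which the UI conversion does not allow.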

Summary

This is an interesting new interactive feature for the lakehouse in Fabric, but when we need to build a scheduled pipeline, we still need to use notebooks or Data Factory.

The workspace-level Spark configuration is also useful: it lets conversions made through the UI use the same write optimizations we would otherwise set in a notebook.

The post Fabric Lakehouse: Convert to Table feature and Workspace Level Spark Configuration appeared first on Simple Talk.



from Simple Talk https://ift.tt/xFh659f