Lake Formation Data Filter Governance and Compliance



Data filters in AWS Lake Formation are a key tool for governing who can see which data in a data lake on Amazon Web Services (AWS). They help ensure that sensitive data is exposed only to the users authorized to see it.

Data filtering in Lake Formation supports column-level, row-level, and cell-level security. Access can be restricted to specific columns, to specific rows, or to a combination of both, so users only see the data they're authorized to access.

To implement data filtering, the tables you want to protect must be defined in the Data Catalog, a metadata store that describes your databases, tables, and columns. Data filters are attached to these catalog tables, which makes it easier to identify sensitive data and apply filters and access controls.

Lake Formation enforces data filters when data is read through an integrated service, such as a query run in Amazon Athena, rather than during ingestion. Controlling access at query time means each user receives only the permitted rows and columns, which reduces the risk of data breaches and unauthorized access.


Filters

Data filters refine Lake Formation permissions so that users can view specific portions of a table rather than the entire dataset. You create a data filter on a table and then grant permissions on the filter instead of on the whole table.


A data filter includes the following information: the filter name, the table and database it applies to, the columns, the column-level access type (include or exclude), and a row filter expression. Together these specify which columns and rows appear in query results.

Data filters can include or exclude specific columns in query results. For example, a data filter on the order_details table can include only the columns c_emailaddress, c_phone, c_dob, c_firstname, c_address, c_country, c_lastname, and tenantid.
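As a rough illustration of the column-include case, the sketch below builds the kind of request body that the Lake Formation CreateDataCellsFilter API accepts. It only constructs a dictionary and sends nothing to AWS. The account ID, database name, and filter name are placeholders invented for this example, and the last column is written as tenantid on the assumption that it matches the tenantid column used later in the article.

```python
# Sketch only: builds the request body; nothing is sent to AWS.
CATALOG_ID = "111122223333"  # placeholder AWS account ID (assumption)
DATABASE = "example_db"      # hypothetical database name (assumption)

# Columns the filter should expose, taken from the example in the text.
included_columns = [
    "c_emailaddress", "c_phone", "c_dob", "c_firstname",
    "c_address", "c_country", "c_lastname", "tenantid",
]

table_data = {
    "TableCatalogId": CATALOG_ID,
    "DatabaseName": DATABASE,
    "TableName": "order_details",
    "Name": "order_details_included_columns",  # hypothetical filter name
    "RowFilter": {"AllRowsWildcard": {}},       # no row restriction
    "ColumnNames": included_columns,            # explicit include list
}

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("lakeformation").create_data_cells_filter(TableData=table_data)
```

Keeping the payload as plain data makes it easy to review or store in version control before anything is applied to a real account.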

Synopsis

A data filter is defined by a small set of fields that narrow down what a user can read from a table.

DatabaseName is a string that names the database containing the table the filter applies to.

FilterExpression is also a string. It is the row filter predicate that determines which rows are returned and which are hidden.

A wildcard with exclusions is a shorthand for column selection, not row selection: it grants access to all columns except the ones explicitly listed.
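To make these fields concrete, here is a minimal sketch of a helper that assembles them into a TableData-style dictionary of the shape used by the CreateDataCellsFilter API. The helper name build_table_data and all example values (account ID, database, filter name, excluded column) are hypothetical. The sketch assumes that an include list and a wildcard with exclusions are alternatives, so it rejects requests that supply both.

```python
from typing import Optional, Sequence


def build_table_data(
    catalog_id: str,
    database: str,
    table: str,
    name: str,
    row_expression: Optional[str] = None,
    include: Optional[Sequence[str]] = None,
    exclude: Optional[Sequence[str]] = None,
) -> dict:
    """Build a TableData-style dict for a data filter (nothing is sent to AWS)."""
    if include is not None and exclude is not None:
        raise ValueError("choose either an include list or an exclude list, not both")

    data = {
        "TableCatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Name": name,
    }
    # Row selection: a predicate string, or a wildcard meaning "all rows".
    if row_expression:
        data["RowFilter"] = {"FilterExpression": row_expression}
    else:
        data["RowFilter"] = {"AllRowsWildcard": {}}
    # Column selection: an explicit list, or a wildcard with exclusions.
    if include is not None:
        data["ColumnNames"] = list(include)
    elif exclude:
        data["ColumnWildcard"] = {"ExcludedColumnNames": list(exclude)}
    else:
        data["ColumnWildcard"] = {}  # all columns
    return data


# Hypothetical values, for illustration only.
example = build_table_data(
    catalog_id="111122223333",
    database="example_db",
    table="order_details",
    name="tenant1_without_dob",
    row_expression="tenantid = 'Tenant1'",
    exclude=["c_dob"],
)
```

Here the example filter returns only Tenant1 rows and hides the c_dob column, which is the combination the article later calls cell-level security.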

Filters


Filters are a crucial aspect of data protection in AWS Lake Formation. They allow you to refine permissions and limit access to specific data portions, rather than granting access to an entire dataset.

A data filter includes information such as the name of the filter, the table and database it applies to, and the columns it includes or excludes. It also specifies the type of access, either include or exclude, for the columns, and a row filter expression that determines which rows to include in query results.

You can create a filter for a specific slice of a table, such as the US marketplace data. To do so, you enter the data filter name, target database, and target table, choose the column-level access, and set the row filter expression to marketplace='US'.

Data filters implement row-level security when they restrict rows, and cell-level security when they restrict rows and columns together. In both cases you create the filter and then grant permissions on it. For instance, to grant permissions to the Japanese data analyst, you would select the filter amazon_reviews_JP and choose Grant.


Here is a table summarizing the equivalent AWS Lake Formation data filter for a given Protect masking type:

Note that the column selection of a data filter can be a wildcard with exclusions, and the row filter expression is written in WHERE clause syntax using the subset of the PartiQL dialect that Lake Formation supports.

The same pattern applies to other marketplaces. For the Japanese marketplace data, you enter the data filter name, target database, and target table, choose the column-level access, and set the row filter expression to marketplace='JP'.

By using data filters, you can refine permissions and limit access to specific data portions, ensuring that sensitive data is protected and only accessible to authorized users.

Tenant1 and Tenant2 Users Run Queries

Tenant1 and Tenant2 users run queries using the SQL editor or a SQL client. They can connect to the database using the query editor with their respective user IDs.


Tenant1_user can only see records where the tenantid value is Tenant1. This is because of the Lake Formation data filters that restrict access to specific data.

To validate the filters, confirm that Tenant1_user cannot see any records for Tenant2. This demonstrates that the data filters isolate data by tenant.

Similarly, Tenant2_user can only see records where the tenantid value is Tenant2. This is also due to the Lake Formation data filters that limit access to certain data.

Tenant2_user also cannot see any records for Tenant1. This confirms that the data filters are working as intended to prevent cross-tenant data access.
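The isolation described above can be modeled in a few lines of plain Python. This is not how Lake Formation is implemented; it is a toy sketch of the visible outcome, using made-up sample rows and an assumed mapping from each user to the tenantid value their filter allows.

```python
# Made-up sample rows; only the tenantid column name comes from the article.
orders = [
    {"order_id": 1, "tenantid": "Tenant1", "c_country": "US"},
    {"order_id": 2, "tenantid": "Tenant2", "c_country": "JP"},
    {"order_id": 3, "tenantid": "Tenant1", "c_country": "DE"},
]

# Assumed mapping from each user to the tenantid value their filter allows.
TENANT_FOR_USER = {"Tenant1_user": "Tenant1", "Tenant2_user": "Tenant2"}


def visible_rows(user: str, rows: list) -> list:
    """Return only the rows whose tenantid matches the user's tenant."""
    tenant = TENANT_FOR_USER[user]
    return [row for row in rows if row["tenantid"] == tenant]


tenant1_view = visible_rows("Tenant1_user", orders)
tenant2_view = visible_rows("Tenant2_user", orders)

# Validation in the same spirit as the text: no cross-tenant rows are visible.
assert all(row["tenantid"] == "Tenant1" for row in tenant1_view)
assert all(row["tenantid"] == "Tenant2" for row in tenant2_view)
```

In a real deployment the equivalent check would be each user running the same SELECT and comparing the tenantid values that come back.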


Access Control

Access Control is a crucial aspect of Lake Formation data filtering. You can create a filter to restrict access to specific data, such as the Japanese marketplace data.

To create a filter, navigate to the Data filters page and choose Create new filter. For Data filter name, enter a name like amazon_reviews_JP. The Target database should be lakeformation_tutorial_row_security, and the Target table should be amazon_reviews.


For Row filter expression, enter marketplace='JP' to restrict access to records belonging to the JP marketplace. This will ensure that only relevant data is accessible to authorized users.
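The console steps above could also be expressed as a request body for the CreateDataCellsFilter API. The sketch below builds that body without calling AWS. The database, table, filter name, and row filter expression come from the walkthrough; the account ID is a placeholder, the helper name marketplace_filter is hypothetical, and granting access to all columns is an assumption, since the walkthrough does not state the column-level access used.

```python
DATABASE = "lakeformation_tutorial_row_security"
TABLE = "amazon_reviews"
CATALOG_ID = "111122223333"  # placeholder AWS account ID (assumption)


def marketplace_filter(code: str) -> dict:
    """Build a data filter body for one marketplace code such as 'JP'."""
    # Accept only two uppercase ASCII letters so the value is safe to embed
    # in the row filter expression.
    if not (len(code) == 2 and code.isascii() and code.isalpha() and code.isupper()):
        raise ValueError(f"unexpected marketplace code: {code!r}")
    return {
        "TableCatalogId": CATALOG_ID,
        "DatabaseName": DATABASE,
        "TableName": TABLE,
        "Name": f"amazon_reviews_{code}",
        "RowFilter": {"FilterExpression": f"marketplace='{code}'"},
        "ColumnWildcard": {},  # assumption: all columns are visible
    }


jp_filter = marketplace_filter("JP")

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("lakeformation").create_data_cells_filter(TableData=jp_filter)
```

Calling marketplace_filter("US") would produce the matching filter for the US marketplace, with a name following the same pattern.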

You can grant permissions to specific users, such as the Japanese data analyst, to access the filtered data. On the Data permissions page, choose Grant and select the user DataAnalystJP. For Policy tags or catalog resources, choose Named data catalog resources, and then select the database lakeformation_tutorial_row_security and the table amazon_reviews.

To grant the necessary permissions, select Select for Table permissions and Advanced cell-level filters for Data permissions. Then, select the filter amazon_reviews_JP to restrict access to the Japanese marketplace data.
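For readers who script their permissions, the grant above roughly corresponds to a GrantPermissions request whose resource is the data filter. The sketch below only builds the request dictionary. The account ID is a placeholder, and the ARN assumes that DataAnalystJP is an IAM user, which the walkthrough does not state explicitly.

```python
ACCOUNT_ID = "111122223333"  # placeholder AWS account ID (assumption)

grant_request = {
    # Assumption: DataAnalystJP is an IAM user in the same account.
    "Principal": {
        "DataLakePrincipalIdentifier": f"arn:aws:iam::{ACCOUNT_ID}:user/DataAnalystJP"
    },
    # The resource is the data filter, not the whole table.
    "Resource": {
        "DataCellsFilter": {
            "TableCatalogId": ACCOUNT_ID,
            "DatabaseName": "lakeformation_tutorial_row_security",
            "TableName": "amazon_reviews",
            "Name": "amazon_reviews_JP",
        }
    },
    "Permissions": ["SELECT"],
}

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("lakeformation").grant_permissions(**grant_request)
```

Because the permission is attached to the filter, the analyst's SELECT is limited to the rows and columns the filter exposes.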


Security

Lake Formation data filters provide a simple way to manage fine-grained permissions by restricting access to specific rows in a table based on column values.

Row-level security restricts access to only specific rows in a table, where the filtering is based on the values of one or more columns.


A salesperson analyzing sales opportunities should only see those opportunities in their assigned territory and not others, thanks to row-level filters that match the assigned territory of the user.

Data filters make it easier to manage permissions by allowing you to specify a row filter expression using the WHERE clause syntax described in the PartiQL dialect.

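To run queries and see how Lake Formation enforces permissions, the preview release of row-level security required creating a special Athena workgroup named AmazonAthenaLakeFormationPreview and switching to it. That workgroup name refers to the preview, so check the current Athena documentation to confirm whether it is still needed in your environment.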

Governance

Governance is crucial in a data lake to prevent it from turning into a "data swamp." A data swamp is characterized by difficulties in locating, comprehending, or placing trust in the data stored within it.

Data governance serves as a structured framework for managing data within an organization. It defines who can access and use the data, how they can do so, and for what purposes.

Effective governance in a data lake involves correctly categorizing data, ensuring accessibility, establishing traceability, and implementing protective measures for the data. This ensures responsible, accurate, and secure data handling.

A well-implemented data governance framework also encompasses maintaining data quality, understanding its origins, and aligning its use with existing policies and regulations.


Lee Mohr

Writer

Lee Mohr is a skilled writer with a passion for technology and innovation. With a keen eye for detail and a knack for explaining complex concepts, Lee has established himself as a trusted voice in the industry. Their writing often focuses on Azure Virtual Machine Management, helping readers navigate the intricacies of cloud computing and virtualization.
