April 8, 2020

ETL Testing Interview Questions and Answers

What is ETL?

In data warehousing architecture, ETL is an important component, which manages the data for any business process. ETL stands for Extract, Transform and Load. Extract does the process of reading data from a database. Transform does the converting of data into a format that could be appropriate for reporting and analysis. While, load does the process of writing the data into the target database.

Explain what are the ETL testing operations includes?

ETL testing includes

  • Verify whether the data is transforming correctly according to business requirements
  • Verify that the projected data is loaded into the data warehouse without any truncation and data loss
  • Make sure that ETL application reports invalid data and replaces with default values
  • Make sure that data loads at expected time frame to improve scalability and performance

Mention what are the types of data warehouse applications and what is the difference between data mining and data warehousing?

The types of data warehouse applications are

  • Info Processing
  • Analytical Processing
  • Data Mining

Data mining can be define as the process of extracting hidden predictive information from large databases and interpret the data while data warehousing may make use of a data mine for analytical processing of the data in a faster way. Data warehousing is the process of aggregating data from multiple sources into one common repository.

What are the various tools used in ETL?

  • Cognos Decision Stream
  • Oracle Warehouse Builder
  • Business Objects XI
  • SAS business warehouse
  • SAS Enterprise ETL server

What is fact? What are the types of facts?

It is a central component of a multi-dimensional model which contains the measures to be analysed. Facts are related to dimensions.

Types of facts are

Explain what are Cubes and OLAP Cubes?

Cubes are data processing units comprised of fact tables and dimensions from the data warehouse. It provides multi-dimensional analysis.

OLAP stands for Online Analytics Processing, and OLAP cube stores large data in muti-dimensional form for reporting purposes. It consists of facts called as measures categorized by dimensions.

What are the names of the layers in ETL?

The first layer in ETL is the source layer, and it is the layer where data lands. The second layer is the integration layer where the data is stored after transformation. The third layer is the dimension layer where the actual presentation layer stands.

What is meant by snapshots?

Snapshots are the copies of the read-only data that is stored in the master table.

What are views?

Views are built using the attributes of one or more tables. View with a single table can be updated, but those with multiple tables cannot be updated. More info ETL Training

 

What is meant by a materialized view log?

A materialized view log is the pre-computed table with aggregated or joined data from the fact tables, as well as the dimension tables.

What is a materialized view?

A materialized view is an aggregate table.

What is the difference between PowerCenter and PowerMart?

PowerCenter processes large volumes of data, whereas Power Mart processes small volumes of data.

What is the exact purpose of an ETL according to you?

It is actually very beneficial for the extracting of data and from the systems that are based on legacy. 

Can you define measures in a simple statement?

Well, they can be called as the number data which is generally based on the columns and is generally present in a fact table by default. Learn more from ETL Testing Certification

When you will make use of the Lookup Transformation?

It is one of the finest and in fact one of the very useful approaches in the ETL. It simply makes sure that the users can get a related value from a table and with the help of a column value that seems useless. In addition to this, it simply makes sure of boosting the performance of a dimensions table which is changing at a very slow rate. Also, there are situations when the records already remain present in the table. Dealing with such issues can be made possible with the help of Lookup transformation. 

What do you understand by Data Purging?

There are needs and situations when the data needs to be deleted from the data warehouse. It is a very daunting task to delete the data in bulk. The Purging is an approach that can delete multiple files at the same time and enable users to maintain speed as well as efficiency. A lot of extra space can be created simply with this.

Can you tell something about Bus Schema?

Dimension identification is something that is very important in the ETL and the same is largely handled by the Bus Schema.

Are your familiar with the Dynamic and the Static Cache?

When it comes to updating the master table, the dynamic cache can be opted. Also, the users are free to use it for changing the dimensions. On the other side, the users can simply manage the flat files through the Static Cache.  It is possible to deploy both the Dynamic and the Static Cache at the same time depending on the task and the overall complexity of the final outcome. 

What do you mean by staging area?

It is an area which is used when it comes to holding the information or the data temporary on the server that controls the data warehouse. There are certain steps that are included and the prime one among them is Surrogate assignments. 

What are the types of Partitioning you are familiar with?

There are two types of Partitioning that is common in ETL and they are:

1.   Hash 

2.   Round-robin

Can you tell a few benefits of using the Data Reader Destination Adapter?

There are ADO record sets which generally consists of columns and records. When it comes to populating them in a simple manner, the Data Reader Destination Adapter is very useful. It simply exposes the data flow and let the users impose various restrictions on the data which is required in many cases.

 

What are joiner and lookup?

The joiner is used to join two or more tables to retrieve the data from tables.

Lookup

 is used to check and compare the source table and the target table.