Overview of DataStage Architecture
DataStage is an ETL tool that uses a graphical notation to design data integration jobs. Several versions of it are available on the market.
DataStage Architecture:
DataStage follows a client-server architecture, and the details of that architecture differ between versions. This overview covers DataStage 7.5. Version 7.5 is a standalone release in which the DataStage engine, the repository (metadata), and the services are all installed on the server, while the client tools are installed on a local PC. Users reach the server through the DataStage client.
Users are created on the Windows or UNIX machine that hosts the DataStage server. To grant access, create a new Windows or UNIX user on the DS server and then add that user to the DataStage group. Group membership lets the user connect to the DataStage server from the client. dsadm is the DataStage administrator user, and dstage is the DataStage group.
Client components:
- DataStage Administrator – Used to create and delete projects and to set environment variables.
- DataStage Designer – Used to design jobs.
- DataStage Director – Used to validate, schedule, and run jobs.
- DataStage Manager – Used to export and import projects.
Server components:
- DS Server – Runs executable server jobs.
- DS Package Installer – Used to install packaged DS jobs.
- Repository or project – A central store that contains all the information for a project.
The DataStage Designer is the primary interface to the metadata repository. It provides a graphical user interface for viewing, editing, and assembling the DataStage objects from the repository that are needed to create an ETL job.
An ETL job should include source and target stages. Additionally, your server job can include transformation stages for data filtering, data validation, data aggregation, data calculations, data splitting for multiple outputs, and usage of user-defined variables or parameters. These stages allow the job design to be more flexible and reusable.
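The source, transformation, and target structure described above can be illustrated with a small Python sketch. This is not DataStage code and does not use any DataStage API; the names `Job`, `make_job`, and `min_amount` are hypothetical, and the in-memory rows stand in for a real source and target. It only shows how a parameter makes the same job design reusable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


@dataclass
class Job:
    """Toy stand-in for a job: one source, a chain of transforms, one target."""
    source: Callable[[], Iterable[Row]]
    transforms: List[Callable[[Iterable[Row]], Iterable[Row]]]
    target: Callable[[Iterable[Row]], int]

    def run(self) -> int:
        rows = self.source()
        for transform in self.transforms:
            rows = transform(rows)
        return self.target(rows)


def make_job(min_amount: float, sink: List[Row]) -> Job:
    """Build a job; min_amount plays the role of a job parameter."""

    def source() -> Iterable[Row]:
        # Hypothetical source data; a real job would read a file or table.
        yield {"id": 1, "amount": "12.50"}
        yield {"id": 2, "amount": "bad"}
        yield {"id": 3, "amount": "3.00"}

    def validate(rows: Iterable[Row]) -> Iterable[Row]:
        # Data validation: drop rows whose amount is not numeric.
        for row in rows:
            try:
                yield {**row, "amount": float(row["amount"])}
            except ValueError:
                continue

    def filter_small(rows: Iterable[Row]) -> Iterable[Row]:
        # Data filtering driven by the job parameter.
        return (row for row in rows if row["amount"] >= min_amount)

    def target(rows: Iterable[Row]) -> int:
        # Load rows into the sink and report how many were written.
        count = 0
        for row in rows:
            sink.append(row)
            count += 1
        return count

    return Job(source, [validate, filter_small], target)


loaded_rows: List[Row] = []
rows_written = make_job(min_amount=5.0, sink=loaded_rows).run()
```

Calling `make_job` again with a different `min_amount` reuses the same design with a different threshold, which is the kind of flexibility job parameters are meant to provide.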
DataStage Designer enables you to:
· Create, edit, and view objects in the repository.
· Create, edit, and view data elements, table definitions, transforms, and routines.
· Import and export DataStage components, such as projects, jobs, and job components.
· Analyze the use of particular items in a project.
· Edit and view user-defined object properties.
· Create jobs, job sequences, containers, and job templates.
· Create and use parameters within jobs.
· Insert and link stages into jobs.
· Set stage and job properties.
· Load and save table definitions.
· Save, compile, and run jobs.
DataStage Designer Window
The DataStage Designer window, which is the graphical user interface used to view, configure, and assemble DataStage objects, contains the following components:
· Repository Window: Displays project objects organized into categories. By default, the Repository window is located in the upper left corner of the Designer window. The project tree is displayed in this pane and contains the repository objects that belong to a project.
· Tool Palette: Contains objects that you add to your job design, such as stage types, file types, database types, and processor objects. You can drag these objects from the Palette into the Diagram window. By default, this window is displayed in the lower left corner of the Designer window. It appears empty until you open or create a job.
· Diagram Window: Serves as the canvas for your job design. You drag, drop, and link stages and processor objects to create jobs, sequencers, and templates.
· Property Browser: Displays the properties of the currently selected stage of the job that is open in the Diagram window. By default, this window is hidden. To open it, select View, Property Browser from the menu bar, and then click a stage to see its properties.
DataStage Jobs
A job is an executable DataStage program. In DataStage, you can design and run jobs that perform many useful data integration tasks, including data extraction, data conversion, data aggregation, and data loading.
DataStage jobs are:
· Designed and built in Designer.
· Scheduled, invoked, and monitored in Director.
DataStage job design approach:
· In Administrator, create the project and set the project properties.
· In Manager, import the metadata that defines the sources and targets required for the project.
· In Designer, add stages that define the data extractions and loads, then add Transformers, Link Partitioners or Link Collectors, and other stages to define the data transformations and the flow of data from source to target.
· Define the job parameters to be used.
· Save and compile the job.
· In Director, validate, run, and monitor the job, and check the logs for any errors that occurred.
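The validate and run steps are described above in terms of the Director client. As a rough sketch, and assuming the `dsjob` command-line tool is available on the DataStage server (the article itself does not cover it), the same steps could be scripted. The helper below only builds the argument list and does not execute anything. The function name, project name, job name, and parameter are hypothetical, and the options should be checked against the documentation for your installation.

```python
from typing import Dict, List, Optional


def build_dsjob_run(
    project: str,
    job: str,
    params: Optional[Dict[str, str]] = None,
    validate_only: bool = False,
) -> List[str]:
    """Build (but do not execute) a dsjob -run argument list."""
    cmd = ["dsjob", "-run"]
    if validate_only:
        # Ask for a validation run instead of a normal run.
        cmd += ["-mode", "VALIDATE"]
    for name, value in (params or {}).items():
        # Each job parameter is passed as name=value.
        cmd += ["-param", f"{name}={value}"]
    # -jobstatus waits for the job and reflects its status in the exit code.
    cmd += ["-jobstatus", project, job]
    return cmd


# Hypothetical project and job names, for illustration only.
validate_cmd = build_dsjob_run("dstage_demo", "LoadCustomers", validate_only=True)
run_cmd = build_dsjob_run("dstage_demo", "LoadCustomers", {"RUN_DATE": "2024-01-31"})
```

The resulting list can be passed to `subprocess.run` on a machine where `dsjob` is on the path. Building the list separately keeps the command easy to inspect and log before anything is run.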
Types of DataStage Jobs:
Server Jobs: These are executed by the DataStage server engine, are compiled into BASIC, and do not support parallel processing.
Parallel Jobs: These are executed by the DataStage parallel engine and have built-in support for pipeline and partition parallelism.
Sequence Jobs: These control the order of execution of server and parallel jobs.
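The two kinds of parallelism mentioned for parallel jobs can be sketched in plain Python. This is a conceptual illustration only, not how the DataStage parallel engine is implemented, and every function name here is hypothetical. Chained generators imitate the row-at-a-time hand-off of a pipeline (a real engine runs the stages concurrently), while the thread pool processes each partition of the data separately before the results are collected.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, int]


def extract(n: int) -> Iterator[Row]:
    # Source stage: emits rows one at a time.
    for i in range(n):
        yield {"key": i, "value": i * 10}


def transform(rows: Iterable[Row]) -> Iterator[Row]:
    # Transform stage: consumes each row as soon as the source emits it,
    # so the full data set is never staged between the two stages.
    for row in rows:
        yield {**row, "value": row["value"] + 1}


def partition(rows: Iterable[Row], parts: int) -> List[List[Row]]:
    # Split rows into buckets by key (a simple modulus partitioner).
    buckets: List[List[Row]] = [[] for _ in range(parts)]
    for row in rows:
        buckets[row["key"] % parts].append(row)
    return buckets


def run_partitioned(n: int, parts: int) -> List[Row]:
    buckets = partition(extract(n), parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        # Each partition goes through the same transform independently.
        results = list(pool.map(lambda bucket: list(transform(bucket)), buckets))
    # Collect the partitions back into a single, ordered output.
    collected = [row for part in results for row in part]
    return sorted(collected, key=lambda row: row["key"])


pipelined = list(transform(extract(4)))
partitioned = run_partitioned(4, parts=2)
```

Both paths produce the same rows. The difference lies in how the work is divided: by stage in the pipelined version and by subset of the data in the partitioned version.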