1) What is DataStage?
DataStage is an IBM tool used to design jobs for Extraction, Transformation and Loading (ETL). It is an ideal tool for data integration projects such as data warehouses and data marts.
QualityStage is also called the Integrity stage. Unlike DataStage, Informatica has no concept of partitioning and parallelism through node configuration. A sequential file cannot be used for lookups, because it is just a flat file with no key column.
Set — sets the Preserve Partitioning flag. Partitioning is needed where we have huge volumes of data to process. Orchestrate dynamically scales your application up or down in response to system configuration changes. A dataset can be saved across nodes using the selected partitioning method. If you are processing very large volumes and need to sort, you will find the Sort stage more flexible than the partition-tab sort. Subsequent operators in the sequence can then perform various processing and analysis tasks.
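DataStage is a GUI tool, so the following is only a Python sketch of the partitioning concept, not DataStage code: hash partitioning sends every row with the same key value to the same node, which is why per-key operations (sort, aggregate, compare) can then run locally on each node.

```python
# Conceptual sketch of hash partitioning (illustrative only).
def hash_partition(rows, key, num_nodes):
    """Distribute rows across num_nodes by hashing the key column;
    rows sharing a key value always land in the same partition."""
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[hash(row[key]) % num_nodes].append(row)
    return parts

rows = [{"k": "x", "v": 1}, {"k": "y", "v": 2}, {"k": "x", "v": 3}]
parts = hash_partition(rows, "k", 2)
```

Because both `"x"` rows hash to the same partition, a downstream aggregate on `k` never needs data from another node.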
The BASIC transformer takes less time to compile than the Normal transformer. Data sets are operating system files. The processing power of Orchestrate derives largely from its ability to execute operators in parallel on multiple processing nodes. The operators in your Orchestrate application pass data records from one operator to the next. The Data Set stage allows you to store the data being operated on in a persistent form. The Sort stage is for use when no stage in your job does partitioning but you still want to sort your data.
In parallel jobs we have specific stage types for performing specialized tasks. Using these operator stages rather than Transformer stages therefore increases the speed of data-processing applications.
The BASIC transformer does not run on multiple nodes, whereas a Normal transformer can, giving better performance. If you are partitioning your data in a stage, you can define the sort at the same time.
Many stages have an optional sort function via the partition tab. Orchestrate operators execute on all processing nodes in your system. Using datasets wisely can be key to good performance in a set of linked jobs. The Sort stage is used to perform more complex sort operations that are not possible using a stage's Advanced-tab properties.
These operators are the basic functional units of an Orchestrate application. Use a Wait For File activity stage between Job Activity stages in a job sequencer. Use the Data Set stage instead of sequential files wherever possible. Use operator stages such as Remove Duplicates. For comparing data, the Change Capture stage takes two input data sets.
The preserve-partitioning flag is set on the change data set.
If two rows have identical key columns, their value columns are compared to detect changes; you can also optionally specify which change values to use. The stage assumes that the incoming data is key-partitioned and sorted in ascending order, and the columns the data is hashed on should be the key columns used for the data compare. By using checkpoint information we can restart a sequence from the point of failure. Filter unwanted records at the beginning of the job flow itself.
Sort the data before sending it to a Change Capture or Remove Duplicates stage. The key column should be hash-partitioned and sorted before an aggregate operation.
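The Change Capture behaviour described above can be sketched in plain Python (a conceptual illustration, not DataStage; the change-code values 1/2/3 here are assumptions for the example, since the real stage lets you configure them):

```python
# Sketch of Change Capture: diff a "before" and "after" dataset on a key.
# change_code: 1 = insert, 2 = delete, 3 = edit (illustrative values).
def change_capture(before, after, key, value_cols):
    """Both inputs are assumed key-partitioned and key-sorted,
    as the real stage requires."""
    before_idx = {row[key]: row for row in before}
    after_idx = {row[key]: row for row in after}
    changes = []
    for k, row in after_idx.items():
        if k not in before_idx:
            changes.append({**row, "change_code": 1})   # new key: insert
        elif any(row[c] != before_idx[k][c] for c in value_cols):
            changes.append({**row, "change_code": 3})   # value changed: edit
    for k, row in before_idx.items():
        if k not in after_idx:
            changes.append({**row, "change_code": 2})   # key gone: delete
    return changes

before = [{"id": 1, "qty": 10}, {"id": 2, "qty": 5}]
after = [{"id": 1, "qty": 12}, {"id": 3, "qty": 7}]
changes = change_capture(before, after, "id", ["qty"])
```

Only changed, inserted or deleted rows appear in the output, which is what makes the stage useful for incremental loads.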
Use a Join stage instead of a Lookup stage when the data is huge. A BASIC transformer should be used in server jobs. The Change Capture stage produces a change data set; the compare is based on a set of key columns. Join performs all four types of joins: inner, left outer, right outer and full outer. You can define part of your schema and specify that the remaining columns be handled at runtime.
If the reference table has a huge amount of data, we go for a Join, whereas if it has a small amount of data, we go for a Lookup. Join uses hash partitioning, whereas Lookup uses entire partitioning. We can use both sequential and parallel modes of execution for the Change Capture stage. The Peek stage can have a single input link and any number of output links. DataStage is flexible about metadata.
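The Join-versus-Lookup trade-off can be illustrated with a Python sketch (not DataStage code): a lookup holds the entire reference table in memory on every node, which is fast but only practical when the reference data is small.

```python
# Sketch of a lookup: the whole reference table becomes an in-memory
# index (analogous to "entire" partitioning on a Lookup reference link).
def lookup_enrich(primary, reference, key):
    ref = {r[key]: r for r in reference}   # memory cost grows with reference size
    return [{**p, **ref.get(p[key], {})} for p in primary]

orders = [{"cust": "A", "amt": 10}, {"cust": "B", "amt": 20}]
customers = [{"cust": "A", "name": "Acme"}, {"cust": "B", "name": "Beta"}]
enriched = lookup_enrich(orders, customers, "cust")
```

A Join avoids that memory cost by hash-partitioning and sorting both inputs so matching keys meet on the same node, which scales to huge reference tables.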
When do you use it? The Peek stage lets you print record column values either to the job log or to a separate output link as the stage copies records from its input data set to one or more output data sets.
The Peek stage can be helpful for monitoring the progress of your application or to diagnose a bug in your application.
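As a rough analogy (plain Python, not DataStage), a Peek behaves like a pass-through that logs the first few records:

```python
# Minimal analogue of the Peek stage: copy records downstream unchanged
# while printing the first n to a log for monitoring or debugging.
def peek(records, n=5, log=print):
    for i, rec in enumerate(records):
        if i < n:
            log(f"peek: {rec}")   # column values go to the "job log"
        yield rec                  # records pass through unmodified

seen = []
out = list(peek([{"a": 1}, {"a": 2}], n=1, log=seen.append))
```

Because the records are forwarded untouched, the stage can be dropped into a flow temporarily and removed later without changing results.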
DataStage can handle columns that are not explicitly defined in the job design; this is known as runtime column propagation (RCP) and can be enabled for a project via the DataStage Administrator. A schema file is a plain text file that contains a record (row) definition. The Row Generator stage produces a set of mock data fitting the specified metadata. A surrogate key is just a unique identifier or number for each row that can be used as the primary key of the table.
RCP is implemented through a schema file. A degenerate dimension reflects a line-item-oriented fact table design. A surrogate key is a substitution for the natural primary key. The Row Generator stage is useful where we want to test a job but have no real data available to process; it is also useful when we want processing stages to execute at least once in the absence of data from the source.
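A schema file, as mentioned above, is plain text holding a record definition; a minimal example follows (the column names and types here are illustrative, not taken from the source):

```
record (
  customer_id: int32;
  name: string[max=50];
  joined_date: date;
)
```

With RCP enabled, columns defined in such a file can flow through stages even when they are not declared in the job design itself.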
The Row Generator stage has no input links. In a Transformer: a stage variable holds an intermediate value usable across derivations; a derivation is an expression that specifies the value to be passed on to the target column; a constraint is a condition, either true or false, that controls the flow of data down a link. You should always ensure that runtime column propagation is turned on when using schema files. The only requirement for a surrogate primary key is that it is unique for each row in the table.
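Surrogate-key generation can be sketched in plain Python (illustrative only; in DataStage this is typically done with the Surrogate Key Generator stage or a counter in a Transformer):

```python
# Sketch: a surrogate key is just a meaningless sequence, unique per row.
from itertools import count

def add_surrogate_key(rows, start=1):
    seq = count(start)
    return [{"sk": next(seq), **row} for row in rows]

dim_rows = add_surrogate_key([{"cust": "A"}, {"cust": "B"}])
```

Because the key carries no business meaning, it stays stable even when the natural key changes, which is why dimension tables favour it.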
Conformed dimension: a dimension table that is connected to more than one fact table. Junk dimension: a dimension table that collects miscellaneous low-cardinality flags and indicators. Monster dimension: a very large dimension that changes rapidly.
Sparse lookup: used when the reference table has more data than the primary input; a normal lookup is used when the reference table has less data than the primary input. We can capture duplicates by using a Sort stage and then, with a Transformer or Filter stage, routing duplicates to one file and non-duplicates to another. Because of its hash-key feature, searching a hash file is faster than searching a sequential file. In DataStage, routines are of two types.
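The duplicate-capture pattern described above (Sort stage plus Transformer/Filter constraints) can be sketched in plain Python for illustration:

```python
# Sketch: route rows whose key repeats to one output, the rest to another
# (in DataStage: Sort stage, then Transformer or Filter constraints).
from collections import Counter

def split_duplicates(rows, key):
    counts = Counter(r[key] for r in rows)
    dups = [r for r in rows if counts[r[key]] > 1]
    uniques = [r for r in rows if counts[r[key]] == 1]
    return dups, uniques

rows = [{"id": 1}, {"id": 1}, {"id": 2}]
dups, uniques = split_duplicates(rows, "id")
```

Sorting on the key first is what makes the repeated-key check cheap in the real stage, since duplicates arrive adjacent to each other.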
We can call a routine from the Transformer stage in DataStage. An ODS (operational data store) can be described as a mini data warehouse.
NLS (National Language Support) can be used to incorporate other languages such as French, German and Spanish, which share the same script as English. To improve performance in DataStage it is recommended not to use more than 20 stages in a job; if more than 20 stages are needed, it is better to move them into another job.
Interviewers often expect hands-on experience with the third-party tools that integrate with DataStage. Whenever we launch the DataStage client, we are asked to connect to a DataStage project. There are two types of hash files in DataStage: static and dynamic. The static hash file is used when a limited amount of data is to be loaded into the target database. In DataStage, MetaStage is used to save metadata, which is helpful for data lineage and data analysis. UNIX knowledge is useful in DataStage because sometimes one has to write UNIX programs, such as batch programs, to invoke batch processing.
Transaction size means the number of rows written before committing the records to a table. In DataStage, we use a surrogate key instead of a unique key; it is mostly used to retrieve data faster and uses an index to perform the retrieval operation. In DataStage, rejected rows are managed through constraints in the Transformer.
We can either set the rejected-rows option in the properties of a Transformer or create temporary storage for rejected rows with the help of the REJECTED command. The Orabulk stage is used to load a large amount of data into one target table of an Oracle database.
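The transaction-size idea above can be sketched in plain Python (illustrative, not DataStage): commit once per batch of N rows instead of once per row.

```python
# Sketch of "transaction size": commit after every batch_size rows.
def write_in_batches(rows, commit, batch_size=1000):
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            commit(batch)          # one transaction per full batch
            batch = []
    if batch:
        commit(batch)              # final partial batch

commits = []
write_in_batches(range(5), commits.append, batch_size=2)
```

A larger batch size reduces commit overhead but means more rows are rolled back if the job fails mid-batch.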
In DataStage, the Link Partitioner is used to divide data into different parts through certain partitioning methods.