Thursday 24 April 2014

Aggregation without using Aggregator Transformation in Informatica



Here the scenario is to find the department-wise sum of salaries.
My source is
Deptno   Sal
20       800
30       1600
30       1250
20       2975
30       1250
30       2850
10       2450
20       3000
10       5000
30       1500
20       1100
30       950
20       3000
10       1300

And my target will be

Deptno   Sal
10       8750
20       10875
30       9400
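
(Check: for department 10, 2450 + 5000 + 1300 = 8750.)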

Solution:
  Algorithm
1. Import the source.
2. If the source is a relational database table, apply sorting in the source qualifier (set the Number of Sorted Ports property, which adds an ORDER BY to the generated query). If the source is a flat file, take a sorter transformation, pass the ports from the source into it, check Deptno as the key, and select Ascending in the Direction option. (Even for a relational source you can use a sorter transformation, but sorting in the source qualifier gives better performance.)
3. Take an expression transformation and drag the Deptno and Sal ports into it from the sorter or the source qualifier, whichever you are using.
4. Create the variable and output ports for a running total of Sal per Deptno, as shown in the sketch after this list.
5. Take another sorter and sort by Deptno ascending and SAL_O descending (see the sketch after this list).
6. Take another expression transformation and flag the first row of each department (see the sketch after this list).
7. Take a filter transformation and set the filter condition to FLAG_O = 1.
8. Pass the ports from the filter into the target.
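
The port logic for steps 4 to 6 is sketched below. The port names (v_sal, v_prev_deptno, SAL_O, v_flag, FLAG_O) are illustrative; any names will do.

The expression transformation in step 4 builds a running total of Sal per Deptno. Variable ports are evaluated top to bottom and keep their values across rows, so v_prev_deptno still holds the previous row's Deptno when v_sal is computed:

    v_sal (variable)         = IIF(Deptno = v_prev_deptno, v_sal + Sal, Sal)
    v_prev_deptno (variable) = Deptno
    SAL_O (output)           = v_sal

The sorter in step 5 sorts by Deptno ascending and SAL_O descending, so the completed department total becomes the first row of each department group.

The expression transformation in step 6 flags that first row:

    v_flag (variable)        = IIF(Deptno = v_prev_deptno, 0, 1)
    v_prev_deptno (variable) = Deptno
    FLAG_O (output)          = v_flag

The filter in step 7 then keeps only the FLAG_O = 1 rows, which carry the department-wise totals.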

That’s it. The final mapping will look like this.

Tracing Level

The tracing level is the level of detail captured in the session log file. It is very important for tracking down the cause of an error, but increasing the amount of detail reduces performance, so we have to take care while choosing the tracing level.

Normal: The Integration Service logs initialization and status information, errors encountered, and rows skipped due to transformation row errors. It summarizes session results, but not at the level of individual rows.

Terse: The Integration Service logs initialization information, error messages, and notification of rejected data.

Verbose Initialization: Normal, plus the Integration Service logs additional initialization details, the names of index and data files used, and detailed transformation statistics.

Verbose Data: Verbose Initialization, plus the Integration Service logs each row that passes into the mapping. It also notes where the Integration Service truncates string data to fit the precision of a column, and provides detailed transformation statistics.
When we configure the tracing level to verbose data, the Integration Service writes row data for all rows in a block when it processes a transformation.

Reject File in Informatica


The Integration Service may create a reject file when we run a session, so that we can identify the reason for the rejection, correct the code or the data, and run the session again. Thanks to the people who developed Informatica PowerCenter: they knew that some of us would come up with the wrong code or the wrong data.

So when and why is a reject file created?

A reject file is created when
        1.       A field contains a NULL value and the target does not accept nulls
        2.       Numeric data exceeds the specified precision or scale for the column
        3.       String data exceeds the specified precision for the column, so the Integration Service truncates it

The Integration Service creates a reject file for each target instance in the mapping. By default the reject file is created in the directory given by the $PMBadFileDir process variable, but we can change the path in the Mapping tab of the session.
One important point: if we run a session with multiple partitions, the Integration Service creates multiple reject files, one for each partition.

What does a reject file look like?

When we open a reject file (we can use any text editor) we will see the rows of data rejected by the writer or the target database. Even if the error occurred in only one column of a row, the Integration Service writes the entire row to the reject file.
But how do we quickly identify which column is responsible for the rejection?
The answer is that the Integration Service adds a row indicator and column indicators, so that we can find the reason for the rejection.
Row Indicator: The first column in each row of the reject file is the row indicator. It indicates whether the row was marked for insert, update, delete, or reject.

Row Indicator   Meaning              Rejected By
0               Insert               Writer or target
1               Update               Writer or target
2               Delete               Writer or target
3               Reject               Writer
4               Rolled-back insert   Writer
5               Rolled-back update   Writer
6               Rolled-back delete   Writer
7               Committed insert     Writer
8               Committed update     Writer
9               Committed delete     Writer

Column Indicator: A column indicator appears after every column of data. It indicates whether the column contains valid, overflow, null, or truncated data.

D (valid data): Good data. The writer passes it to the target database; the target accepts it unless a database error occurs, such as finding a duplicate key.

O (overflow): Numeric data exceeded the specified precision or scale for the column. Bad data, if you configured the mapping target to reject overflow or truncated data.

N (null): The column contains a null value. Good data. The writer passes it to the target, which rejects it if the target database does not accept null values.

T (truncated): String data exceeded the specified precision for the column, so the Integration Service truncated it. Bad data, if you configured the mapping target to reject overflow or truncated data.
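
For example, a hypothetical reject-file line for a two-column Deptno/Sal target, where the row was marked for insert but Sal arrived as NULL, might look like:

    0,10,D,,N

The leading 0 is the row indicator (insert, rejected by the writer or target); 10 followed by D is a valid Deptno; the empty value followed by N is the null Sal that caused the rejection.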


Tuesday 22 April 2014

Difference between Joiner and Source Qualifier

Joiner: Join is possible with heterogeneous sources.
Source Qualifier: Join is possible only with relational database tables within the same schema.

Joiner: Requires input pipelines.
Source Qualifier: Does not require input pipelines.

Joiner: At least one pair of matching ports is required to define the join condition.
Source Qualifier: Does not require a join condition of matching ports; by default it joins on the primary key-foreign key relationship between the tables.
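
As a quick illustration, assuming the classic EMP and DEPT tables: a joiner condition is defined between ports of the two input pipelines, for example DEPTNO1 = DEPTNO, while a source qualifier join is applied in the generated SQL, for example EMP.DEPTNO = DEPT.DEPTNO as a user-defined join.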

Padding using Java Transformation

I have a source
Dname
RESEARCH
SERVICE
HR
IT
SALES
PRODUTION
MARKETING
FINANCE

And the target should be

Dname       Length   Max Length   Right Padding
RESEARCH    8        9            RESEARCH*
SERVICE     7        9            SERVICE**
HR          2        9            HR*******
IT          2        9            IT*******
SALES       5        9            SALES****
PRODUTION   9        9            PRODUTION
MARKETING   9        9            MARKETING
FINANCE     7        9            FINANCE**

Solution
The logic is to find the maximum length of the Dname field and then pad each value with '*' on the right until the Right Padding field equals that maximum length.
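
For example, SERVICE has length 7 and the maximum Dname length is 9, so two '*' characters are appended to give SERVICE**.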


Algorithm
1. Import the source.
2. Pass the ports into an expression transformation.
3. Create an output port dlength with integer data type in the expression transformation.
4. In the expression of the dlength port enter LENGTH(DNAME); it returns the length of the incoming Dname value.
5. Create another output port dummy with integer data type.
6. In the expression of the dummy port give the value 1.
7. Take an aggregator transformation.
8. Pass the ports dummy and dlength into the aggregator transformation.
9. Select the Group By check box for dummy.
10. Create an output port max_length with integer data type.
11. In the expression of the max_length port enter MAX(dlength); it returns the maximum dlength value.
12. Check Sorted Input in the aggregator transformation properties.
13. Take a joiner transformation.
14. Pass the ports dummy, Dname, and dlength from the expression transformation into the joiner transformation, and check Sorted Input in the joiner properties.
15. Pass the ports dummy1 and max_length from the aggregator transformation into the joiner transformation.
16. In the joiner transformation set the join condition to dummy = dummy1.
17. Take a Java transformation.
18. Pass the ports Dname, dlength, and max_length from the joiner transformation into the Java transformation.
19. In the Java transformation create one output port new_dname as string.
20. Go to the Java Code tab.
21. Select the Helper Code tab and put in the helper method (see the sketch after this list).
22. Then go to the On Input Row tab, pad the value, and emit the row (see the sketch after this list).
23. Click Compile.
24. Pass the ports Dname, dlength, max_length, and new_dname into the target.
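
Here is a minimal sketch of the Java code for steps 21 and 22, assuming the port names created above (Dname, max_length, new_dname); the helper method name rightPad is hypothetical.

Helper Code tab (step 21):

    // Hypothetical helper: right-pad a string with '*' up to maxLen characters.
    private String rightPad(String value, int maxLen)
    {
        StringBuilder sb = new StringBuilder(value);
        while (sb.length() < maxLen)
            sb.append('*');
        return sb.toString();
    }

On Input Row tab (step 22):

    // Pad the incoming Dname to the maximum length computed by the aggregator.
    if (!isNull("Dname"))
        new_dname = rightPad(Dname, max_length);
    // generateRow() emits the output row; it is needed if the Java
    // transformation was created as an active transformation.
    generateRow();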

The final mapping will look like this.

Then create a workflow and run it. That's all.