
Wednesday, September 5, 2018

How to find latest record using date column from a CSV File in Talend

Problem Statement:

My CSV is arranged as follows:

ID NAME DATE
1 A 10/7/2018
1 B 10/8/2018
2 C 10/9/2018
3 D 10/10/2018
4 E 10/11/2018
5 A 10/12/2018
5 B 10/13/2018
.
.
.
11 B 10/23/2018
11 C 10/24/2018
12 D 10/25/2018
13 E 10/26/2018

There are duplicate IDs with different dates.
How can I write a Talend job that identifies the earliest and latest records in the CSV file?



Solution:


This can be achieved using the tAggregateRow component.


Job Overview


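  1.  Create a new metadata connection of type File Delimited for the CSV file, then drag tFileInputDelimited from the Palette onto the job.
  2.  Drag in a tAggregateRow component. Connect the main row of tFileInputDelimited to tAggregateRow and apply the following settings:
        • Add the ID column to the Group by section
        • Add the max function for the DATE column, since we want the record with the latest date (use min instead to get the earliest date)
        • Add the last function for the NAME column, since we want the most recently updated record. last returns the final NAME read for each ID, so this relies on the rows for an ID appearing in ascending date order, as they do in the sample (use first together with min for the earliest record)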


tAggregateRow component Settings



  3. Drag in a tLogRow, connect the main row of tAggregateRow to it, and run the job.
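
For readers who want to see what the aggregation does, here is a minimal plain-Java sketch of the same logic: group by ID, keep the max DATE, and keep the last NAME read. It is not the code Talend generates. The class and method names (LatestRecordSketch, Latest, aggregate) are hypothetical, and it assumes whitespace-separated fields and month/day/year dates, based on how the sample is displayed.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch; none of these names come from Talend.
class LatestRecordSketch {

    // One aggregated output row per ID.
    static final class Latest {
        final String name;    // "last" NAME read for the ID
        final LocalDate date; // "max" DATE read for the ID

        Latest(String name, LocalDate date) {
            this.name = name;
            this.date = date;
        }
    }

    // Assumption: dates are month/day/year, as 10/13/2018 in the sample suggests.
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("M/d/yyyy");

    static Map<String, Latest> aggregate(List<String> lines) {
        // LinkedHashMap keeps IDs in the order they were first read.
        Map<String, Latest> byId = new LinkedHashMap<>();
        for (String line : lines) {
            // Assumption: fields are whitespace-separated, as displayed in the post.
            String[] f = line.trim().split("\\s+");
            LocalDate date = LocalDate.parse(f[2], FMT);
            Latest prev = byId.get(f[0]);
            // max(DATE): keep the later of the stored date and the current one.
            LocalDate max = (prev == null || date.isAfter(prev.date)) ? date : prev.date;
            // last(NAME): always overwrite with the NAME from the current row.
            byId.put(f[0], new Latest(f[1], max));
        }
        return byId;
    }

    public static void main(String[] args) {
        List<String> rows = List.of(
                "1 A 10/7/2018",
                "1 B 10/8/2018",
                "2 C 10/9/2018",
                "5 A 10/12/2018",
                "5 B 10/13/2018");
        aggregate(rows).forEach((id, r) ->
                System.out.println(id + "|" + r.name + "|" + r.date.format(FMT)));
    }
}
```

As in the job itself, the NAME and the DATE are aggregated independently, so the sketch pairs them correctly only when the rows for each ID arrive in ascending date order.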

