Problem Statement:
My CSV is arranged as follows:
ID NAME DATE
1 A 10/7/2018
1 B 10/8/2018
2 C 10/9/2018
3 D 10/10/2018
4 E 10/11/2018
5 A 10/12/2018
5 B 10/13/2018
.
.
.
11 B 10/23/2018
11 C 10/24/2018
12 D 10/25/2018
13 E 10/26/2018
There are duplicate IDs present with different dates.
How can I write a Talend job that identifies the earliest and latest records for each ID in the CSV file?
Solution:
This can be achieved using the tAggregateRow component, which groups the rows by ID and aggregates the remaining columns.
[Figure: Job Overview]
1. Create a new metadata connection of type File Delimited for the CSV, then drag a tFileInputDelimited component from the palette.
2. Drag a tAggregateRow component from the palette. Connect the main row of tFileInputDelimited to tAggregateRow and configure it as follows:
   - Add the ID column to the Group by table.
   - Add the max function for the DATE column, since we want the record with the latest date.
   - Add the last function for the NAME column, since we want the most recently updated record. This relies on the rows for each ID arriving in ascending DATE order, as they do in the sample.
[Figure: tAggregateRow component Settings]
3. Drag a tLogRow component, connect the main row of tAggregateRow to it, and run the job.
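
To make the aggregation easier to follow, here is a rough plain-Java sketch of what the tAggregateRow settings above compute: group by ID, keep the max DATE, and keep the last NAME seen for each ID. This is not the code Talend generates. The class name `LatestPerId`, the `Agg` holder, the whitespace delimiter, and the `M/d/yyyy` date pattern are all assumptions made for illustration. The sketch also tracks the min DATE, which is an addition to the steps above, in case the earliest date per ID is needed as well.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LatestPerId {
    // Assumed pattern for sample dates such as 10/7/2018 (month/day/year).
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("M/d/yyyy");

    // Hypothetical holder for one aggregated output row.
    static final class Agg {
        final int id;
        String lastName;   // "last" function on NAME
        LocalDate minDate; // "min" function on DATE (not part of the steps above)
        LocalDate maxDate; // "max" function on DATE
        Agg(int id) { this.id = id; }
    }

    // Groups rows by ID; fields are assumed to be whitespace-separated.
    static Map<Integer, Agg> aggregate(List<String> lines) {
        Map<Integer, Agg> groups = new LinkedHashMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            int id = Integer.parseInt(f[0]);
            LocalDate date = LocalDate.parse(f[2], FMT);
            Agg a = groups.computeIfAbsent(id, Agg::new);
            // "last" keeps whatever row arrived last for this ID, so it only
            // matches the latest date when the input is sorted by DATE.
            a.lastName = f[1];
            if (a.minDate == null || date.isBefore(a.minDate)) a.minDate = date;
            if (a.maxDate == null || date.isAfter(a.maxDate)) a.maxDate = date;
        }
        return groups;
    }

    public static void main(String[] args) {
        // First rows of the sample file from the problem statement.
        List<String> rows = List.of(
            "1 A 10/7/2018", "1 B 10/8/2018", "2 C 10/9/2018",
            "3 D 10/10/2018", "4 E 10/11/2018",
            "5 A 10/12/2018", "5 B 10/13/2018");
        for (Agg a : aggregate(rows).values()) {
            // First line printed: 1|B|2018-10-07|2018-10-08
            System.out.println(a.id + "|" + a.lastName + "|" + a.minDate + "|" + a.maxDate);
        }
    }
}
```

For ID 1 the sketch keeps NAME B with 10/8/2018 as the latest date, which is what the tLogRow output of the job should show for that group, provided DATE is read as a date rather than as a string.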