Question : Unix Shell script or Perl script to merge two csv text files based on a common key column
I have the following two text files, with columns separated by commas, and I need to merge them. Neither file is sorted. Every key (i.e. first-column value) in file1 also appears in file2, so the keys in file1 are a subset of the keys in file2.
file1:
INC00023233,description 1 text,field 1 text,.....,field 1 text
INC00023132,description 2 text,field 2 text,.....,field 2 text
INC00023073,description 3 text,field 3 text,.....,field 3 text
INC00023573,description 4 text,field 4 text,.....,field 4 text
........
file2:
INC00011312,start date fieldA,end date fieldA
......
INC00023233,start date fieldB,end date fieldB
PBI00023232,start date fieldC,end date fieldC
......
INC00023073,start date fieldD,end date fieldD
......
INC00023132,start date fieldE,end date fieldE
.....
INC00023573,start date fieldF,end date fieldF
.....
I would like to merge the above two files so that the resulting merged csv file is as follows:
INC00023233,description 1 text,field 1 text,.....,field 1 text,start date fieldB,end date fieldB
INC00023132,description 2 text,field 2 text,.....,field 2 text,start date fieldE,end date fieldE
INC00023073,description 3 text,field 3 text,.....,field 3 text,start date fieldD,end date fieldD
INC00023573,description 4 text,field 4 text,.....,field 4 text,start date fieldF,end date fieldF
. . . . .
The UNIX I'm running is Red Hat 4, but I also have HP-UX B11.11 servers. A Perl interpreter is installed on the Red Hat servers, but I'm not sure about the HP-UX servers, so the script provided needs to be able to run on both platforms.
Answer : Unix Shell script or Perl script to merge two csv text files based on a common key column
This presumes that none of the fields (description, etc.) contain commas.
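One rough way to check that assumption before merging is to count the comma-separated fields on each line. If a file that should have a fixed number of columns shows more than one distinct count, some field probably contains a comma. This is only a sketch using standard awk, sort and uniq; the function name field_counts is made up for illustration.

```shell
# field_counts FILE: print each distinct number of comma-separated
# fields found in FILE, one count per line, in ascending order.
field_counts() {
    awk -F, '{ print NF }' "$1" | sort -n | uniq
}
```

Running field_counts file2 and getting a single number back means every line of file2 has the same number of columns.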
A simpler method might be to load the first file into MySQL and then issue updates with the data from the second file.
You can store the output of the script in a file and load it later with require. In that case, make sure to add #!/usr/local/bin/perl at the top of the output file and 1; at the end of it, so that you do not get errors when you require the file where the hash is stored.
#!/usr/local/bin/perl
# Merge two comma-separated files on the key held in the first column.
$file1="path to filename1";
$file2="path to filename2";
my $hash={};
open (File1, "<${file1}") || die "Unable to open ${file1} for reading: $!\n";
while (<File1>){
    chomp();
    @array=split(/,/,$_);
    #The below builds a referenced hash of hashes
    $hash->{$array[0]}{'exists'}=1;
    $hash->{$array[0]}{'description 1 test'}=$array[1];
    $hash->{$array[0]}{'field 1 test'}=$array[2];
    $hash->{$array[0]}{'field 2 test'}=$array[3];
    # .
    # .
    # . (store the remaining file1 columns the same way)
}
#done with processing file1.
close(File1);
open (File2,"<${file2}") || die "Unable to open ${file2} for reading: $!\n";
while (<File2>){
    chomp();
    @array=split (/,/,$_);
    #Test the key itself so that keys found only in file2 are not autovivified into the hash
    if (exists $hash->{$array[0]}){
        $hash->{$array[0]}{'first field from second file'}=$array[1];
        $hash->{$array[0]}{'second field'}=$array[2];
        $hash->{$array[0]}{'third field'}=$array[3];
        # .
        # .
        # .
        # . (store the remaining file2 columns the same way)
    }
}
#done with processing files when there are matching keys.
close(File2);
#you now have a $hash that is a reference to hashes.
#Each printed line is a Perl assignment, so the saved output can be loaded with require.
#This assumes the values contain no single quotes.
foreach $key (keys %{$hash}){
    # print "$key :$hash->{$key}\n";
    foreach $index (keys %{$hash->{$key}}) {
        print "\$hash->{'$key'}{'$index'}='$hash->{$key}{$index}';\n";
    }
}
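The script above builds the hash and prints its contents rather than the merged csv lines shown in the question. If only the merged file is wanted, the same lookup idea can be sketched in shell with awk, which should be available even where Perl is not. This is a rough sketch, not a tested solution for either platform: it assumes a POSIX-style awk, no commas inside fields, and no carriage returns at line ends. The function name merge_csv is made up for illustration.

```shell
# merge_csv FILE1 FILE2: for every line of FILE1 (in FILE1 order), print
# it followed by the non-key columns of the FILE2 line with the same key.
merge_csv() {
    awk -F, '
        # First file read (FILE2): remember everything after the key column.
        NR == FNR {
            rest[$1] = substr($0, length($1) + 2)
            next
        }
        # Second file read (FILE1): append the saved columns when the key exists.
        $1 in rest { print $0 "," rest[$1] }
    ' "$2" "$1"
}
```

It could be called as merge_csv file1 file2 > merged.csv. Lines of file1 whose key is missing from file2 are dropped, which should not happen if the keys in file1 really are a subset of those in file2.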