Question : Need to optimize data cleanup routine in VB

Good Morning,

I have a 20MB pipe-delimited file that was exported from Progress 9.1e.  It uses double quotes (") as text qualifiers.  My problem is that some text fields contain line breaks (CRLF), and CRLF is also the end-of-record marker.

I wrote the attached code to strip out double quotes and then remove the CRLFs that are embedded in the text fields.  I am glad to say the code works.

HOWEVER, it appears that it will take about 10 hours to read the file in, some unknown time to process it, and then probably another 10 hours to write the updated file back out.

So I'm looking to optimize the process by oh say a day or two.  I have been looking for better methods for how to read and write but could not figure it out.

The second file is a 20KB segment of the data showing the spurious CRLFs.

Any help would be greatly appreciated.  I am not attached to this solution and can throw the code out.  If there is a way to export from Progress without the additional CRLFs, a better language (although VB is the only thing we have installed), or a better algorithm than using an array, I'm open.

Thank you in advance for your assistance.


Sub ReadChars()

Dim strCharacters(20000000) As String * 1
' Each variable needs its own "As Long"; otherwise the first one becomes a Variant
Dim intCharcount As Long, intMaxChars As Long

Dim booIsQuote As Boolean
Dim i As Long, j As Long

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objfile = objFSO.opentextfile("x:\TCB\Purchasing\08-09\ERP\Data Conversion\o_bp-inspect.txt", 1)


'Input File
intCharcount = 0
strSentence = ""
Do Until objfile.AtEndOfStream
    strCharacters(intCharcount) = objfile.read(1)

    intCharcount = intCharcount + 1

Loop

objfile.Close



'Remove Double Quotes in Text areas,  there are a few spurious instances
i = 0
strPattern = "|" & Chr$(34) & Chr$(34) & "|"
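' Stop early enough that the four-character window stays inside the data read
Do While i < intCharcount - 3

    If strCharacters(i) & strCharacters(i + 1) & strCharacters(i + 2) & strCharacters(i + 3) = strPattern Then
        ' Blank out the two quote characters of the empty field
        strCharacters(i + 1) = Chr$(0)
        strCharacters(i + 2) = Chr$(0)
        ' Land on the trailing pipe so that an adjacent |""| is also matched
        i = i + 3
    Else
        i = i + 1
    End If

Loop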



'Remove CR/LF between quotes
i = 0
j = 0


Do While i < intCharcount

    If strCharacters(i) = Chr$(34) Then
        ' A quote toggles whether we are inside a text field
        booIsQuote = Not booIsQuote
    ElseIf (strCharacters(i) = Chr$(10) Or strCharacters(i) = Chr$(13)) And booIsQuote Then
        ' Line break inside a text field: replace it with a space
        strCharacters(i) = " "
    End If

    i = i + 1

Loop


'Write Array to file

Set outfso = CreateObject("Scripting.FileSystemObject")
Set outfile = outfso.createtextfile("x:\TCB\Purchasing\08-09\ERP\Data Conversion\o_bp-inspect1.txt", True)

i = 0
' Write every character that was read, including the last one
Do While i < intCharcount

    outfile.write strCharacters(i)
    i = i + 1

Loop


outfile.Close

End Sub
0|?|yes|0|"
"|""|?|?|0|0|""|""|0|0|0|""|""|0|0|""|0|0|0|0|""|""|""|?|""
0|?|yes|0|""|""|?|?|0|0|""|""|0|0|0|""|""|0|0|""|0|0|0|0|""|""|""|?|""
0|?|yes|0|""|""|?|?|0|0|""|""|0|0|0|""|""|0|0|""|0|0|0|0|""|""|""|?|""
0|?|yes|0|""|""|?|?|0|0|""|""|0|0|0|""|""|0|0|""|0|0|0|0|""|""|""|?|""
0|?|yes|0|""|""|?|?|0|0|""|""|0|0|0|""|""|0|0|""|0|0|0|0|""|""|""|?|""
0|?|yes|0|""|""|?|?|0|0|""|""|0|0|0|""|""|0|0|""|0|0|0|0|""|""|""|?|""
0|10/31/06|yes|57.25|"ANGEL RIVAS"|""|?|?|0|0|""|"140"|0|20060001|1|"N"|""|0|0|""|0|0|0|0|""|""|""|?|""
0|11/16/06|no|57.25|"ANGEL RIVAS"|"MISSING RISET POP AT SIDE OF BOTTOM TRACK IN TERRACE 6"" O.C"|?|?|0|0|""|"140"|0|20060002|1|"N"|""|0|1|""|0|0|0|0|""|""|""|?|""
0|11/20/06|yes|57.25|"AURELIO RAMOS"|""|?|?|0|0|""|"140"|0|20060002|2|"N"|""|0|0|""|0|0|0|0|""|""|""|?|""
0|10/26/06|yes|57.25|"ANGEL RIVAS"|""|?|?|0|0|""|"140"|0|20060003|1|"N"|""|0|0|""|0|0|0|0|""|""|""|?|""
0|""|0|0|0|0|""|""|""|?|""
0|12/08/06|no|57.25|"AURELIO RAMOS"|"NEED ELECTRICAL GROUND INSPECTION
NEED NOC"|?|?|0|0|""|"140"|0|20060021|1|"N"|""|0|7|""|0|0|0|0|""|""|""|?|""
0|03/07/07|no|57.25|"FELIX POUSA"|"NON CONFORMING TO NEW LOOK ART 680"|?|?|0|0|""|"601"|0|20060021|1|"N"|""|0|4|""|0|0|0|0|""|""|""|?|""
0|03/08/07|no|57.25|"PETER WAGONER"|"NO ONE HOME"|?|?|0|0|""|"140"|0|20060021|2|"N"|""|0|5|""|0|0|0|0|""|""|""|?|""
0|03/19/07|no|57.25|"FELIX POUSA"|"1. NON CONFORMING TO NEC 2005 
90.2A
110.3B
300.5

2. PAY RE-FEE"|?|?|0|0|""|"601"|0|20060021|2|"N"|""|0|2|""|0|0|0|0|""|""|""|?|""
0|02/21/08|yes|57.25|"RAUL RODRIGUEZ"|"SEE PERMIT #2007-3249 FOR ELECT FINAL"|?|?|0|0|""|"140"|0|20060021|3|"N"|""|0|6|""|0|0|0|0|""|""|""|?|""
0|01/05/08|yes|57.25|"CARLOS BERTOT"|""|?|?|0|0|""|"601"|0|20060021|3|"N"|""|0|3|""|0|0|0|0|""|""|""|?|""
0|01/11/07|yes|57.25|"HENRY WILLIS"|""|?|?|0|0|""|"039"|0|20060022|1|"N"|""|0|0|""|0|0|0|0|""|""|""|?|""
0|11/21/06|no|57.25|"ANGEL RIVAS"|"NO ADDRESS  FOUND"|?|?|0|0|""|"140"|0|20060023|1|"N"|""|0|1|""|0|0|0|0|""|""|""|?|""
0|11/22/06|yes|57.25|"ANGEL RIVAS"|""|?|?|0|0|""|"140"|0|20060023|2|"N"|""|0|0|""|0|0|0|0|""|""|""|?|""
0|11/15/06|yes|57.25|"AURELIO RAMOS"|""|?|?|0|0|""|"140"|0|20060024|1|"N"|""|0|1|""|0|0|0|0|""|""|""|?|""
0|11/09/06|yes|57.25|"ANGEL RIVAS"|""|?|?|0|0|""|"001"|0|20060024|1|"N"|""|0|0|""|0|0|0|0|""|""|""|?|""
0|11/20/06|yes|57.25|"AURELIO RAMOS"|""|?|?|0|0|""|"001"|0|20060025|1|"N"|""|0|1|""|0|0|0|0|""|""|""|?|""
0|11/27/06|yes|57.25|"ANGEL RIVAS"|""|?|?|0|0|""|"140"|0|20060025|1|"N"|""|0|0|""|0|0|0|0|""|""|""|?|""
0|?|no|57.5|""|""|?|?|0|0|""|"640"|0|20060026|1|"N"|""|0|1|""|0|0|0|0|""|""|""|?|""
0|11/15/06|no|57.25|"AURELIO RAMOS"|"NO PERMIT"|?|?|0|0|""|"001"|0|20060027|1|"I"|""|0|3|""|0|0|0|0|""|""|""|?|""
0|11/16/06|yes|57.25|"AURELIO RAMOS"|""|?|?|0|0|""|"001"|0|20060027|2|"N"|""|0|0|""|0|0|0|0|""|""|""|?|""
0|11/22/06|no|57.25|"AURELIO RAMOS"|"NO PERMIT"|?|?|0|0|""|"140"|0|20060027|1|"I"|""|0|2|""|0|0|0|0|""|""|""|?|""
0|11/28/06|yes|57.25|"ANGEL RIVAS"|""|?|?|0|0|""|"140"|0|20060027|2|"N"|""|0|1|""|0|0|0|0|""|""|""|?|""
0|?|no|57.5|""|""|?|?|0|0|""|"640"|0|20060028|1|"N"|""|0|1|""|0|0|0|0|""|""|""|?|""
0|11/16/06|no|57.25|"ANGEL RIVAS"|"LOOSE ANCHORS
FRONT DOORS TOP PANEL DOESNT COMPLY WITH MIN EDGE DISTANCE
MISSING WIND LOAD CALC
REAR ACCORDION COULD EXCEED MAX SPAN"|?|?|0|0|""|"140"|0|20060029|1|"N"|""|0|1|""|0|0|0|0|""|""|""|?|""
0|01/17/07|yes|57.25|"HENRY WILLIS"|""|?|?|0|0|""|"140"|0|20060029|2|"N"|""|0|0|""|0|0|0|0|""|""|""|?|""
0|11/28/06|yes|57.25|"ANGEL RIVAS"|""|?|?|0|0|""|"140"|0|20060030|1|"N"|""|0|1|""|0|0|0|0|""|""|""|?|""
0|11/20/06|yes|57.25|"ANGEL RIVAS"|""|?|?|0|0|""|"140"|0|20060031|1|"N"|""|0|0|""|0|0|0|0|""|""|""|?|""
0|11/03/06|yes|57.25|"AURELIO RAMOS"|""|?|?|0|0|""|"140"|0|20060032|1|"N"|""|0|2|""|0|0|0|0|""|""|""|?|""
0|11/14/06|yes|57.25|"ANGEL RIVAS"|""|?|?|0|0|""|"001"|0|20060033|1|"N"|""|0|2|""|0|0|0|0|""|""|""|?|""
0|12/21/06|no|57.25|"PETER WAGONER"|"CAVE-IN IN HOLES, END TREATMENT @ HOUSE INCOMPLETE "|?|?|0|0|""|"001"|0|20060050|1|"N"|""|0|2|""|0|0|0|0|""|""|""|?|""
0|01/09/07|yes|57.25|"PETER WAGONER"|""|?|?|0|0|""|"140"|0|20060050|1|"N"|""|0|3|""|0|0|0|0|""|""|""|?|""
0|11/28/06|yes|57.25|"Angel Rivas"|""|?|?|0|0|""|"001"|0|20060051|1|"N"|""|0|3|""|0|0|0|0|""|""|""|?|""
0|12/06/06|no|57.25|"Aurelio Ramos"|"no permit
"|?|?|0|0|""|"140"|0|20060051|1|"N"|""|0|2|""|0|0|0|0|""|""|""|?|""
0|12/11/06|no|57.25|"Angel Rivas"|""|?|?|0|0|""|"140"|0|20060051|2|"N"|""|0|0|""|0|0|0|0|""|""|""|?|""
0|12/13/06|yes|57.25|"Aurelio Ramos"|""|?|?|0|0|""|"140"|0|20060051|3|"N"|""|0|1|""|0|0|0|0|""|""|""|?|""
0|11/30/06|no|57.25|"angel rivas"|"no address number posted. no holed inspection post spaced more than 4'0"" o.c."|?|?|0|0|""|"140"|0|20060052|1|"N"|""|0|4|""|0|0|0|0|""|""|""|?|""
0|12/05/06|yes|57.25|"angel rivas"|""|?|?|0|0|""|"001"|0|20060052|1|"N"|""|0|1|""|0|0|0|0|""|""|""|?|""
0|12/07/06|yes|57.25|"angel rivas"|""|?|?|0|0|""|"140"|0|20060052|2|"N"|""|0|2|""|0|0|0|0|""|""|""|?|""
0|01/26/07|yes|57.25|"Peter Wagoner"|""|?|?|0|0|""|"001"|0|20060052|2|"N"|""|0|0|""|0|0|0|0|""|""|""|?|""

Answer : Need to optimize data cleanup routine in VB

I have not tried it myself, but Google came up with plenty of options.

Top of the list: http://www.soft32.com/download_194850.html

Freeware; looks simple and effective.

L
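
If installing another tool is not an option: the slow part of the routine in the question is most likely the one-character-at-a-time reads and writes through FileSystemObject. A minimal sketch of an alternative, assuming VB6 or VBA and a file that fits in memory (20MB should), is to load the whole file into a Byte array with one Get, fix the bytes in place, and write them back with one Put. The names CleanBytes and CleanFile are made up for this sketch. It only handles the embedded CR/LF (the quotes are left in place), it assumes a single-byte encoding, and it has not been run against the real export.

```vb
' CleanBytes and CleanFile are hypothetical names invented for this sketch.

' Replaces CR and LF bytes that fall between double quotes with spaces,
' in place. Returns the number of bytes changed.
' A doubled quote ("") inside a field toggles the flag twice, so the
' "inside a field" state is unchanged after it.
Function CleanBytes(ByRef b() As Byte) As Long
    Dim i As Long, n As Long
    Dim inQuote As Boolean

    For i = LBound(b) To UBound(b)
        Select Case b(i)
            Case 34                 ' double quote
                inQuote = Not inQuote
            Case 10, 13             ' LF or CR
                If inQuote Then
                    b(i) = 32       ' space
                    n = n + 1
                End If
        End Select
    Next i

    CleanBytes = n
End Function

' Reads inPath in one call, cleans it, and writes the result to outPath.
Sub CleanFile(ByVal inPath As String, ByVal outPath As String)
    Dim f As Integer
    Dim b() As Byte

    f = FreeFile
    Open inPath For Binary Access Read As #f
    If LOF(f) = 0 Then
        Close #f
        Exit Sub                    ' nothing to do for an empty file
    End If
    ReDim b(0 To LOF(f) - 1)
    Get #f, , b                     ' one read for the whole file
    Close #f

    CleanBytes b

    ' Delete any previous output so no stale bytes are left at the end
    If Len(Dir$(outPath)) > 0 Then Kill outPath

    f = FreeFile
    Open outPath For Binary Access Write As #f
    Put #f, , b                     ' one write for the whole file
    Close #f
End Sub
```

With the paths from the question, the call would look like this:

CleanFile "x:\TCB\Purchasing\08-09\ERP\Data Conversion\o_bp-inspect.txt", "x:\TCB\Purchasing\08-09\ERP\Data Conversion\o_bp-inspect1.txt"

The pass over the array is a single loop of byte comparisons with no string concatenation, so the runtime should be dominated by the two disk operations rather than by the loop.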