When parsing a large file and using DataStreams to push the data into a DataFrame, the code is very slow because it parses one field at a time. Parsing a whole column from the file at a time is more efficient. This is fairly easy with byte-based parsing, but with character-based parsing the lines are not a constant length in bytes, which makes byte indexing across a large file difficult. Figure out a method to do this universally, i.e. one that works for both byte-based and character-based parsing.
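One possible direction, sketched in Python rather than Julia purely for illustration: make a single pass over the raw bytes to record where each line starts, then use that index to pull a whole column at a time, slicing by character only after each line has been decoded. This is a rough sketch under assumptions, not a proposed implementation: it assumes UTF-8 input, `\n` or `\r\n` line endings, and a file small enough to hold in memory (or memory-mapped). The names `index_line_starts` and `read_column` are hypothetical and do not exist in DataStreams.

```python
def index_line_starts(buf: bytes) -> list[int]:
    """Return the byte offset at which each line starts (one pass over buf)."""
    starts = [0]
    pos = buf.find(b"\n")
    while pos != -1:
        starts.append(pos + 1)
        pos = buf.find(b"\n", pos + 1)
    # Drop the trailing empty "line" when the buffer ends with a newline
    # (this also leaves an empty list for an empty buffer).
    if starts[-1] >= len(buf):
        starts.pop()
    return starts


def read_column(buf, starts, char_start, char_end, convert=str):
    """Parse one fixed-width column for every line.

    char_start/char_end are character (not byte) positions within a line,
    so lines whose byte lengths differ are still handled correctly.
    """
    values = []
    for i, begin in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(buf)
        # Decode only this line, then slice by character position.
        line = buf[begin:end].decode("utf-8").rstrip("\r\n")
        values.append(convert(line[char_start:char_end].strip()))
    return values


# Hypothetical two-column layout: name in characters 0-2, number in 3-5.
# The first line is 6 characters but 7 bytes, because "é" takes 2 bytes.
data = "é   10\nabc  7\n".encode("utf-8")
starts = index_line_starts(data)
names = read_column(data, starts, 0, 3)
numbers = read_column(data, starts, 3, 6, convert=int)
```

The sketch decodes each line once per column, which is wasteful. A real implementation would more likely cache per-line character boundaries, or fall back to plain byte offsets whenever a line turns out to be pure ASCII.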