Problem
A text file may contain many fields separated by a delimiter. The delimiter could be a tab, a pipe (|), a comma, etc. The following is an example of such a file. It has 4 fields separated by pipes (|). Some utilities only recognize certain types of delimiters correctly. How do we change the delimiter from one type to another, e.g., from pipe (|) to comma (,)?
$ head ads_log.txt
ADS_ID|DEVICE_OS|NUM_IMPRESSION|NUM_CONVERSION
32| Android| 1| 0
32| Android| 1| 0
32| Android| 1| 0
32| Android| 2| 0
32| Android| 1| 0
32| Android| 2| 0
32| Android| 1| 0
32| Android| 3| 0
32| Android| 2| 0
Solution
We can use an editor such as Notepad to replace the characters. However, if the file is very large, doing this in an editor is very slow. A better way is to use the Unix/Linux command tr, which translates each occurrence of one character into another. On Windows, we can install the free Cygwin, which provides a Unix-like environment.
$ cat ads_log.txt | tr '|' ','
ADS_ID,DEVICE_OS,NUM_IMPRESSION,NUM_CONVERSION
32, Android, 1, 0
32, Android, 1, 0
32, Android, 1, 0
32, Android, 2, 0
32, Android, 1, 0
32, Android, 2, 0
32, Android, 1, 0
32, Android, 3, 0
32, Android, 2, 0

The original file is left unchanged. We can write the output to a new file, ads_log2.txt:
$ cat ads_log.txt | tr '|' ',' > ads_log2.txt
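The same idea works without cat and with other single-character delimiters. The following is a minimal sketch, not part of the original example: the file names sample_log.txt, sample_log.csv and sample_log.tsv are made up for illustration, and the two-line sample only mimics the layout of ads_log.txt. Note that tr maps single characters only, so a multi-character delimiter would need another tool such as sed.

```shell
# Build a tiny pipe-delimited sample (hypothetical file name).
printf 'ADS_ID|DEVICE_OS\n32| Android\n' > sample_log.txt

# Pipe to comma, reading the file through input redirection instead of cat.
tr '|' ',' < sample_log.txt > sample_log.csv

# Pipe to tab; tr understands the \t escape.
tr '|' '\t' < sample_log.txt > sample_log.tsv

# Show the comma-delimited result.
cat sample_log.csv
```

Reading with < avoids starting an extra cat process, which makes little difference for small files but keeps the command shorter. As in the example above, the space after each delimiter is kept, because tr only replaces the delimiter character itself.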