Preventing Excel from messing up CSVs

posted Jan 12, 2011, 8:37 AM by Ezra Kenigsberg   [ updated Mar 26, 2011, 3:09 PM ]

#DataTipOfTheDay 2011-01-12

Q: "Why are some values in my CSV showing up wrong in Excel?"

A: When I open a CSV in Excel, Excel automatically tries to format data as numeric values.  This can mess up data that SHOULDN'T be formatted.

Example 1: An American postal code that starts with a zero, like "01810", will show up in Excel as "1810", with leading zeroes stripped out.

Example 2: A record ID code that reads "12-2" would become "2-Dec" in US versions of Excel, and "12-Feb" in most international versions.

Example 3: A software version field that reads "5.0" will be simplified to "5", making it unclear whether the value was meant to be 5.0 or was incompletely entered by the user.

Solution 1: Download the "Preserve CSV Formatting" Excel add-in.

To install the add-in in Excel 2003 (and prior versions), click "Tools | Add-Ins".  To install the add-in in Excel 2007 (and later versions), click the Office Menu in the upper left-hand corner, then click "Excel Options | Add-Ins".

Note that the add-in takes longer to open CSVs; I don't recommend it for CSVs larger than 5-10 Mb in size.

Solution 2: After opening a file in Excel, use Excel's Number Formats or the TEXT() function to reformat misformatted cells, then save the fixed file.

Solution 3: Don't use Excel.  Open CSVs in other applications like text editors, ETL tools, or Synchronizer.