Recover a Corrupt Excel File (.xls or .xlsx) Using Python

Jeril Kuriakose
2 min read · Jun 17, 2017

In this post I will share how to recover a corrupt Excel file (.xls or .xlsx) using Python. We were analysing a large dataset from a pharmaceutical company that used SAP as its ERP system. The company's sales data were generated automatically by SAP and provided to us, but every one of the files was corrupt, in the sense that we could view them in Excel but could not open them with Python. The following error was displayed when opening the file in Excel:

We tried several ways to open the files, to no avail. The only workaround was to open each file in Excel and manually use Save As to save it in the correct format, but doing that for a large number of files would be tedious. So we asked on Stack Overflow; there were many suggestions and we tried them all, but none of them worked.

So we decided to tackle it ourselves. We tried several approaches, and all of them returned the same error:

XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '\xff\xfe\r\x00\n\x00\r\x00'
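The bytes in this message point at the cause. \xff\xfe is the UTF-16 little-endian byte-order mark, and the bytes after it decode to "\r\n\r", so the file is UTF-16 text rather than a binary .xls (an OLE2 container starting with \xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1) or an .xlsx (a ZIP archive starting with PK\x03\x04). xlrd expects a binary workbook, hence "Expected BOF record". The sketch below is a minimal way to check what a file really is from its first bytes; sniff_format and sniff_file are hypothetical helper names, not part of the original post or of any library.

```python
# Hypothetical helpers: guess a spreadsheet-like file's real format from its first bytes.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy binary .xls (BIFF inside OLE2)
ZIP_MAGIC = b"PK\x03\x04"                          # .xlsx is a ZIP archive of XML parts

# Byte-order marks mapped to the Python codec that reads the text and drops the BOM.
BOMS = {
    b"\xff\xfe": "utf-16",        # UTF-16 little-endian
    b"\xfe\xff": "utf-16",        # UTF-16 big-endian
    b"\xef\xbb\xbf": "utf-8-sig",  # UTF-8 with BOM
}


def sniff_format(head: bytes) -> str:
    """Return 'xls', 'xlsx', 'text:<codec>' or 'unknown' for the given leading bytes."""
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    for bom, codec in BOMS.items():
        if head.startswith(bom):
            return "text:" + codec
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of a file and classify them with sniff_format."""
    with open(path, "rb") as f:
        return sniff_format(f.read(8))
```

Given the bytes in the error above, sniff_file on one of these exports would report text:utf-16, which is why opening the file as text is a sensible first step.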

Finally, we found a solution. These were our steps:

  1. Open the Excel file with Python's standard io.open.
  2. Create a new workbook using Python.
  3. Write the data out as a new Excel workbook with the same file name.
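The three steps above can be sketched roughly as follows. This is not the original listing; it assumes the export is UTF-16 text with tab-separated columns, and it uses the third-party xlwt library to build the new workbook (the post does not name the library it used). All function names are hypothetical. Because step 3 overwrites the file under the same name, repair first checks for a UTF-16 byte-order mark, so running it twice does not clobber a file that is already a real workbook.

```python
import glob
import io


def rows_from_lines(lines):
    """Strip each line's ending and split it into cells on tabs (assumed delimiter)."""
    return [line.rstrip("\r\n").split("\t") for line in lines]


def read_utf16_tsv(path):
    """Step 1: open the 'Excel' file as UTF-16 text with io.open and split it into rows."""
    # The "utf-16" codec uses the BOM to pick the byte order and drops it from the text.
    with io.open(path, "r", encoding="utf-16") as f:
        return rows_from_lines(f)


def looks_like_utf16_text(path):
    """True if the file starts with a UTF-16 byte-order mark (little- or big-endian)."""
    with open(path, "rb") as f:
        return f.read(2) in (b"\xff\xfe", b"\xfe\xff")


def save_as_xls(rows, out_path, sheet_name="Sheet1"):
    """Steps 2 and 3: build a new workbook and save it as a genuine .xls file."""
    import xlwt  # third-party; imported here so the text helpers work without it

    book = xlwt.Workbook()
    sheet = book.add_sheet(sheet_name)
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            sheet.write(r, c, value)  # every cell is written as a text string
    book.save(out_path)


def repair(path):
    """Rebuild one file in place under the same name; skip files that are not UTF-16 text."""
    if not looks_like_utf16_text(path):
        return False
    save_as_xls(read_utf16_tsv(path), path)
    return True


def repair_all(pattern):
    """Run repair() on every file matching a glob pattern; return the paths converted."""
    return [path for path in glob.glob(pattern) if repair(path)]
```

Calling repair_all("exports/*.xls") (a hypothetical folder) would convert a whole batch in one go, which addresses the tedium of saving each file by hand. Two caveats: xlwt writes only the legacy .xls format, limited to 65,536 rows and 256 columns, so a file originally named .xlsx should be saved under a .xls name instead; and every cell is stored as text, so numeric columns need converting afterwards.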

The following is our code:

Now we were able to open and clean the data:
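Once a file has been rebuilt as a genuine .xls workbook, ordinary readers accept it. As an illustration (not the original listing), the sketch below reads it back with the third-party xlrd library, whose 2.x releases still read legacy .xls files, and applies a hypothetical cleanup that trims whitespace around text cells and drops completely blank rows. The cleanup rules are assumptions for illustration; the post does not describe what cleaning was done.

```python
def load_xls_rows(path):
    """Open the rebuilt workbook with xlrd and return the first sheet as lists of values."""
    import xlrd  # third-party; imported here so clean_rows works without it

    book = xlrd.open_workbook(path)
    sheet = book.sheet_by_index(0)
    return [sheet.row_values(r) for r in range(sheet.nrows)]


def clean_rows(rows):
    """Hypothetical cleanup: strip whitespace in text cells and drop rows with no content."""
    cleaned = []
    for row in rows:
        cells = [c.strip() if isinstance(c, str) else c for c in row]
        if any(c != "" for c in cells):  # keep the row only if some cell is non-empty
            cleaned.append(cells)
    return cleaned
```

If you prefer pandas, pandas.read_excel can also load the rebuilt .xls file, provided xlrd is installed.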

Happy coding!!!
