Microsoft Excel File Format
The binary interchange file format (BIFF) is the file format in which Microsoft Excel workbooks are saved on disk. Microsoft Excel versions 5.0 and later use compound files; this is the OLE 2 implementation of the Structured Storage Model standard. For more information on this technology, see the OLE 2 Programmer's Reference, Volume One, and Inside OLE 2.
An OLE 2 compound file is essentially "a file system within a file." The compound file contains a hierarchical system of storages and streams. A storage is analogous to a directory, and a stream is analogous to a file in a directory. Each Microsoft Excel workbook is stored in a compound file, an example of which is shown in the following illustration. This file is a workbook that contains three sheets: a worksheet with a PivotTable, a Visual Basic module, and a chart.
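As a small illustration of the container described above, the following Python sketch checks whether a block of bytes starts with a usable compound file header. The signature bytes and the field offsets follow the published compound file layout; the function name read_cfb_header and the returned dictionary keys are made up for this example and are not part of any Microsoft API or of the repair tool.

```python
import struct

# Signature that opens every OLE 2 compound file.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def read_cfb_header(data: bytes) -> dict:
    """Return a few header fields, or raise ValueError if the header is unusable.

    Hypothetical helper: only the fields needed to locate sectors are decoded.
    """
    if len(data) < 512:
        raise ValueError("too short to hold a compound file header")
    if data[:8] != CFB_SIGNATURE:
        raise ValueError("compound file signature not found")
    # Offsets 24..31: minor version, major version, byte order, sector shift.
    minor, major, byte_order, sector_shift = struct.unpack_from("<4H", data, 24)
    return {
        "major_version": major,
        "minor_version": minor,
        "little_endian": byte_order == 0xFFFE,
        "sector_size": 1 << sector_shift,
    }
```

A header that fails this check is a hint that the storage tree cannot be trusted and that another way of reading the file is needed.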
The Workbook stream consists of substreams, which correspond to the workbook's sheets, charts, and macro sheets. The Workbook stream also contains one substream with global data, such as the table of fonts used in the file and the table of sheets with their offsets in the stream.
Each substream is a set of BIFF records, which are the basic units of data in a BIFF file. Each BIFF record has one of a set of predefined types. For example, a record of type NUMBER contains a numeric cell value, and a record of type LABEL contains a string cell value. Some records also contain links to other records (in practice, one record stores the offset of another record in a substream).
Any of these objects can be damaged by a file system crash, a power failure, or incorrect program termination. Bad record types, bad links, or wrong records in a substream are enough for Excel to refuse to open the file.
Our technology makes it possible to collect records from damaged storages, check them, and repair the file.
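Our main purpose is to find all valid records in the file and to save them in a new repaired file. The repair process consists of several parts: reading the file, extracting BIFF records, checking and correcting the records, and saving the repaired file.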
Reading the file means first reading the structure of the compound file and then reading the information inside its streams. We are interested in data from the Workbook stream and from other streams, for example Summary Information, which contains file properties such as the author and the modification date.
The tree-like structure of streams and storages is also stored in the file and can itself be damaged. So we must provide an alternative method of reading the file that does not rely on the compound file structure. This method, raw scanning, assumes that all of the file data is the Workbook stream.
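One way to picture raw scanning is to treat the whole file as a byte string and look for places where a substream could begin. The Python sketch below searches for BOF (beginning of file) records, which open every BIFF substream. The record number 0x0809 and the data sizes 8 and 16 are the BIFF5 and BIFF8 values; the function name find_bof_offsets is hypothetical, and this is only an assumed approach, not a description of how the repair tool actually scans.

```python
import struct

BOF_HEADER = b"\x09\x08"  # record number 0x0809, stored little-endian
BOF_SIZES = (8, 16)       # BOF data size in BIFF5 and in BIFF8


def find_bof_offsets(raw: bytes) -> list[int]:
    """Return offsets in raw where a plausible BOF record starts."""
    offsets = []
    pos = raw.find(BOF_HEADER)
    while pos != -1 and pos + 4 <= len(raw):
        (size,) = struct.unpack_from("<H", raw, pos + 2)
        # Accept the hit only if the declared size is a known BOF size
        # and the record data fits inside the file.
        if size in BOF_SIZES and pos + 4 + size <= len(raw):
            offsets.append(pos)
        pos = raw.find(BOF_HEADER, pos + 1)
    return offsets
```

Each offset found this way is only a candidate: the byte pair 09 08 can also occur inside ordinary cell data, so the records that follow still have to be checked.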
Extracting BIFF records
The next step of the repair process is extracting BIFF records from the Workbook stream of the compound file.
Each record has a record number, which identifies its class. For example, record number 0x203 means a NUMBER record. Each record with a given record number also has its own format. The following table illustrates a NUMBER record format:
Common to all records
Size of record data
Index to the XF record
Floating-point number value
When reading a record from a stream, we get its record number, its size, and its data. All three can then be processed and corrected.
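The reading step can be sketched in Python as follows. Every BIFF record starts with a 2-byte record number and a 2-byte data size, both little-endian, followed by the data. The NUMBER layout used here (row, column, XF index, 8-byte IEEE floating-point value) is the published BIFF5/BIFF8 layout; the function names iter_records and decode_number are invented for this example.

```python
import struct

NUMBER = 0x0203  # record number of a NUMBER record


def iter_records(stream: bytes):
    """Yield (offset, record number, declared size, data) for each record."""
    pos = 0
    while pos + 4 <= len(stream):
        rec_no, size = struct.unpack_from("<HH", stream, pos)
        # The slice may be shorter than size if the stream is truncated.
        data = stream[pos + 4 : pos + 4 + size]
        yield pos, rec_no, size, data
        pos += 4 + size


def decode_number(data: bytes):
    """Decode the 14 data bytes of a NUMBER record."""
    row, col, xf_index, value = struct.unpack("<HHHd", data)
    return row, col, xf_index, value
```

A caller would typically loop over iter_records and pass the data of each record whose number equals NUMBER to decode_number. Note that iter_records trusts the declared size to find the next record, so a single bad size can throw off everything after it; this is why the records need to be checked.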
Checking records and correcting them
While reading or processing records, we may find, for example, that the record number contains a bad value or that the size of a record exceeds the allowed maximum. We then try to "improve" such a record if possible; otherwise it is discarded. Only records that have been confirmed as valid are written to the new repaired file.
Saving the repaired file
After all records have been checked and improved, it is time to save them to the repaired Excel file. First, the structured storage is created for the new compound document. Next, the corrected Workbook stream and the other streams from the initial file are written into this storage. The result is a new file containing the valid part of the data from the source file.
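The part of this step that turns the validated records back into a Workbook stream can be sketched in Python as below. Creating the structured storage itself is not shown; that would need a compound file writer, which is assumed to exist separately. The function name build_stream is hypothetical.

```python
import struct


def build_stream(records) -> bytes:
    """Serialize (record number, data) pairs into BIFF stream bytes."""
    out = bytearray()
    for rec_no, data in records:
        # The size field is recomputed from the data, so it is always consistent.
        out += struct.pack("<HH", rec_no, len(data))
        out += data
    return bytes(out)
```

Recomputing each size from the actual data, rather than copying the size found in the damaged file, means the rebuilt stream cannot inherit a bad size field.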
TRADEMARKS. Microsoft, Microsoft Office, Microsoft Excel, OLE and/or other Microsoft products and technologies referenced herein are either trademarks or registered trademarks of Microsoft Corporation in the U.S.A. and/or other countries.
Some pictures from Microsoft Developer Network (MSDN) Library are used in this text.