We all know that the only place to keep data is a database. Surprisingly, 150% of our customers firmly believe that data has to be stored in Excel sheets. As expected your customer rejected your first and quick attempt to export to Excel files using COPY TO. Now you're suffering from automation hell trying to keep execution time of a simple export down to less than an hour. There's no reason to!
Ever since Excel XP (Excel 2000 with downloadable plugins) Excel supports XML files in XMLSS (XML Spreadsheet) format. That's almost a decade now! Even your most cost sensitive customer might have now replaced their Office 95 and Office 97 installations, meaning there's no need to chastise yourself with Excel automation.
Creating Excel files
If you need to store data, of course, you store them in a table. At least that’s what every Visual FoxPro developer will tell you. Ask a .NET developer the same question and they’ll wonder why you would even consider something other than XML files.
Our customers are business persons. The only place for data that one can take seriously is – one shouldn’t need to point out the obvious – naturally Excel. Excel drives businesses to a degree that most of us won’t even want to know about. Just recently I came across another one myself. One of the not so small banks (Barclay’s Bank) and a similar well-known airline (British Airways) work together in some countries.
Under some conditions Barclay’s issues a voucher to its customer that they can redeem with British Airways. Employees track these conditions manually, and then send an email to British Airways where the recipient is entered in a large Excel document. To redeem the voucher customers need to call British Airways where a few people have the permissions to look into the file. These are big corporations and the vouchers in questions are for their top customers. And still, Excel wins against a carefully crafted solution to automate data exchange between the two companies.
Excel is everywhere. Excel makes data accessible. When you provider data to users in Excel format, they are usually happy, because now they can work with data. Data in a SQL server or in the application is locked away. They can only do what the program allows them to do. For every little bit they have to ask IT to implement a new feature. Budgets need to be allocated, decisions made, endless meetings, just to do something with data on a server that would be a no-brainer in Excel.
Visual FoxPro offers multiple ways to export data to Excel. The simplest ones of them are the COPY TO and EXPORT commands. They directly support several older versions of Excel. Alternatively, you can create a tab delimited or a CSV file. Name the file with an XLS extension. Excel is smart enough to realize that these files can’t be native XLS files. Rather, Excel automatically converts these files when you open them.
Office 2007 introduced a little change that makes this one tiny bit more difficult. Presumably for security reasons, Excel 2007 and Excel 2010 prompt for confirmation when you open a file where the file extension differs from the content.
The quick and simple approaches are exactly this: quick and simple. There’s not a lot of choice you have regarding formatting or data types. Excel deducts the type from the content. If something looks like a number, it will become a number.
With ODBC and OLEDB you can directly write into Excel sheets. This gives you quite a bit of control over how data is interpreted and where data is placed in the document. There’s zilch control over any visual aspect, though. You can’t control the layout, column widths, worksheet names, and so forth. Pamela Thalacker wrote about using ODBC to control Excel in the November 2007 issue of FoxPro Advisor (http://www.foxproadvisor.com).
For many, many years automation was the only way to transfer data to Excel and control formatting at the same time. There are two kinds of automation. The older one of the two is Dynamic Data Exchange, DDE for short. The current technology is called OLE automation or just automation. Both technologies suffer from the same disadvantages.
They aren’t known for being blazingly fast. None of them will work when Excel is not installed on the machine. In fact, even the version of Excel is often relevant. With automation you can’t ensure that no dialog comes up. This renders automation unusable in web server scenarios.
Excel XP introduced a new storage format that solves all of these problems: XML. It wouldn’t be a true Microsoft format, if the format hadn’t changed over time. Excel XP and Excel 2003 use a syntax that is called XML Spreadsheet (XMLSS). A detailed description of the format is available on the following web site:
Excel 2007 and Excel 2010 open XMLSS files without any problem. The native format of these two versions, though, is Office Open XML. Do not confuse this with the similarly named Open Office XML. The latter is the native format of Open Office, the competition on the office product market. Office Open XML, or OOXML, is a new ISO standard suggested by Microsoft. The standard has been accepted in April 2008. XMLSS is limited to only Excel. You can store just about any Office document as OOXML, though. OOXML supports the full feature set of the Microsoft Office Suite.
XMLSS is much more limited. Not even every feature of Excel supported. Missing are, for instance, pictures, embedded ActiveX controls or OLE objects, and VBA projects which means macro support. Lacking support for pictures is actually the most missed feature of XMLSS. Many companies use their logo in Excel sheets. These logos cannot be stored in XML format.
For us XMLSS has one big advantage over OOXML. It’s dramatically easier to handle. You don’t need to generate dozens of different files and store them in a ZIP archive. XLSX files are in fact ZIP archives. XMLSS exists longer and is easier. This makes it a format that is pretty well supported, even outside the Office product family. There’s an import filter for Office 2000. Unless you have to deal with Office 95 or Office 97, XMLSS is a pretty safe format. The remaining document will only cover the XMLSS format.
As mentioned previously, you don’t have to use XML as the file extension when you create XMLSS files. In the default configuration, opening a file with the XML extension would still open Microsoft Excel when the content is an XMLSS document. Office achieves this by using a small loader application that looks at the content of the XML file and then decides which application to launch. Many users configure the computer so that Windows doesn’t open XML files in Office. The loader doesn’t display the right icon in Explorer, though.
Therefore it’s less confusing and more compatible when you use the familiar XLS extension for you XML files. Excel opens these as XML spreadsheets automatically. The simplest form of such an XML spreadsheet looks like this:
This defines a workbook with a single, empty worksheet. If you would save this document from Excel, you’d notice that Excel adds a real bunch of further information. All of those are optional and would only clutter the file. Should you ever need to debug an XMLSS file it pays off to restrict the content to the absolute minimum. And you will need to debug these files, trust me.
By starting with an empty work book I already broke with the habit of beginning with a „Hello World“ example. Let me provide you with one now:
<Data ss:Type="String">Hello world</Data>
Some attributes are required. For instance, on the <Data> tag you have to specify which type of data you want to store in the cell. This is contrary to Excel’s normal behavior, which is to figure out things based on the content. This is a common theme in XMLSS. You will see that Excel does not attempt to figure out anything on its own. Provide all information as required by Excel, or be doomed. That’s your choice.
If you don’t specify the ss:Type attribute, Excel won’t open the document at all. Instead you get an error dialog. If you used automation to open the XLS file, Excel forwards the error to Visual FoxPro. It’s therefore a good idea to encapsulate opening an XMLSS document into a TRY…CATCH…ENDTRY block.
If you write code that creates Excel files you’ll likely see this error message quite frequently. Excel is very picky about the data format in the XML file. Forget just a single attribute or tag, or misspell one, store data in the wrong format, have too many elements of one kind, or too few, and Excel refuses to open the document.
This pickiness of Excel isn’t the result of a frustrated program manager at Microsoft who decided that developers need to share his pain. If you have worked with XML documents, you most likely used the DOM parser. DOM means Document Object Model. Once you loaded a document into memory, you can walk through an object tree, use XPath to search for nodes, and so on. It’s a comfortable way of accessing XML.
Excel uses a different mode called the SAX parser. .NET makes this parser available as the System.Xml.XmlReader class. This parser reads the entire XML document in a single pass top to bottom. Instead of asking for whatever XML node you need, your code receives the XML nodes in the order they appear in the document. That means you need to write your code around the structure of the XML file, not vice versa.
What you gain is performance. Using this parser and insisting on tags being in a strict order makes it possible for Excel to open a 50 MB XML document within two seconds. With the DOM parser you would likely spend minutes for the same document.
Debugging Excel file is an art of its own. Fortunately, Excel provides some guidance when the file format is wrong. Unfortunately, this might be less than you expect. Excel creates a log file every time you open an invalid document. This log file contains all errors and on Windows XP is stored in
C:\Documents and Settings\<user>\Local Settings\Temporary Internet Files\Content.MSO
On Windows Vista and Windows 7 you’ll find this file in the following directory:
C:\Users\<user>\AppData\Local\Microsoft\Windows\Temporary Internet Files\Content.MSO
As I’ve mentioned above the log file does contain any details about the error, the actual position, etc. The error log only specifies the tag and, if applicable, the attribute or the value that caused the problem. However, if you have hundreds or thousands of cells it’s not particular helpful to know that one of them contains an error.
Right now our sample worksheet doesn’t look particular exciting. Let’s therefore start exploiting some more Excel features step by step.
The content consists of rows and cells. You definitely noticed that this doesn’t leave any room for columns. There’s a separate tag for everything you can change on the column level. The most frequent setting needed for columns is the column width. To provide you with a better reading experience, I omit repeated elements in the following XML samples. You find the complete document for every step in the session material on the CD or the downloadable files. Changes are highlighted in bold font:
Excel expects that you pass all sizes and positions in points, not in pixels. A point is a typographical unit that is 1/72th of an inch. The actual pixel value depends on the resolution of your monitor. A standard resolution for monitors is 96 dpi. Some users might have enabled large fonts on the display properties dialog. In this case the resolution is often 120 dpi or any other value. As Visual FoxPro application always use pixels instead twips (no, this doesn’t have anything to do with Twitter), you can use the following formula to convert between pixels and points:
pixels = points * 96/72
points = pixels * 72/96
The column we defined in the previous sample is therefore 267 pixels wide. Other options available on the column level include visibility, automatic resizing, and so on. To define multiple columns you simply add multiple <Column> tags to the <Table> tag. There is no grouping tag such as <Columns>.
The physical order of the <Column> tags defines the order of the columns in the Excel sheet. Many tags accept the ss:Index attribute. With this tag you can skip one or multiple columns. For instance, if you want to define the size of the first and the 10th column, you would use:
<Column ss:Index="10" ss:Width="200"/>
XMLSS identifies columns by numbers. Numbering starts with 1, not like in .NET with 0. Column number 10 is the same as the J column in Excel. You will likely use the ss:Index attribute quite a bit with the <Cell> tag. You absolutely have to specify the content for every <Cell> tag that you use. Content isn’t optional. To keep a cell empty, you need to turn this cell into a string and pass an empty value.
Just link FoxPro distinguishes between EMPTY() and ISBLANK(), Excel keeps track of whether a cell is used or not, even if the content is empty in both cases. To leave a cell totally empty and undefined, you have to skip this cell in the XML definition using the ss:Index attribute on the following column.
Within a <Cell> tag you use the <Data> tag to define the content of a cell, as well as the type. The three most common data types are strings, numbers and date values.
<Cell><Data ss:Type="String">Hello world</Data></Cell>
There a number of rules regarding the content of a cell that you have to follow. Everything between <Data> and </Data> is the content. First of all, the content must not have any additional blanks or line breaks. For this reason I inserted the line break in the previous example into the <Data> tag itself, not between tags as in the following – invalid (!) – sample:
The line above will cause an error when you attempt to open the document. Excel won’t tell you which line caused the error. At least when the error is in the <Data> tag, you get an error message that actually contains a value:
XML ERROR in Table
REASON: Bad Value
FILE: C:\Users\Christof\AppData\Local\Microsoft\Windows\Temporary Internet Files\Content.MSO\47D0ED28.xls
The content between <Data> and </Data> must be convertible according to the data type. There’s no cleanup performed by Excel before the conversion takes place.
All string values have to be provided in UTF-8 format. For English text UTF-8 and ANSI are identical. However, umlauts used in German texts, accented characters used in every country around the US, require a conversion. Visual FoxPro makes this an easy job with the STRCONV() function:
If you pass a field to STRCONV(), make sure you trim the value with the ALLTRIM() function. Otherwise the cell content might look strange, especially when the cell is not left aligned. Numerical values must use a decimal point and no separators. This is true independent of the language version of Excel and the regional settings of Windows. Many European users would enter “1,23” in Excel. Within the XML document, however, this has to become “1.23”.
If your application uses SET SYSFORMATS ON to respect the users setting, or defines SET POINT, you need to reset this value to “.” for the export process. Even if your application only runs in the US and you don’t have a Spanish version, you should make a point of changing SET POINT (pun intended).
lcPoint = Set("Point")
Set Point To "."
With this setting in place you can convert numerical values with a simple call to the TRANSFORM() function.
Numbers aren’t optional! If you turn a cell into a numeric value, you must provide a number. If you don’t have a value, the zero is a good default value. You can’t leave the tag empty or specify something else. This is mainly an issue with NULL values. In you generic routines you should use the following expression:
TRANSFORM( NVL(nData,0) )
Excel saves date values as integer numbers. This number is the number of days since a particular date. When you store a date value in an XML spreadsheet, though, you have to use the standard XML date format. Fortunately, there’s a function in Visual FoxPro that performs the conversion for you:
Here comes the confusing part: Even though you specified the data type as date and you passed a date value in XML format, what you seen when you open the spreadsheet in Excel is a mere number. Data values are not automatically formatted as date values. Contrary to numbers and strings you have to format the cell explicitly.
Excel uses styles similar to Word or CSS. However, Excel offers significantly less flexibility. Styles are defined in a <Styles> tag on the work book level. All sheets within a work book are using the same style. Every style used by Excel corresponds to one <Style> tag that defines various visual aspects.
Styles in Excel do not support inheritance, at all. Moreover, you can only assign one style to each cell. Imagine, you have a cell with a complex format involving conditions, a background pattern, a border, and so on. For the summary line you need exactly the same style, except that it needs to be bold. The only possibility to do this in XMLSS is to duplicate the first style and adjust the copy. If you make changes to one style you have to remember to make these changes in all places.
Excel names all styles as “S” followed by an incrementing number. You don’t have to follow this example. Just like you don’t name variable lc1 to lc200 in your application, naming styles this way makes them a maintenance nightmare. You can use any text you want for the style name, as long as you remember that the name is case sensitive. Don’t be surprised when your carefully crafted names disappear after you saved the file in Excel.
Every time you save a file, Excel completely regenerates the list of styles. Styles that are not used anywhere in the spread sheet are removed from the file. All others styles are named as Excel would name them, that is, “S” followed by a number. Your style names are gone… forever.
When developers settle on names, those aren’t necessarily the best choices. <NumberFormat> is such a tag where name must have been chosen by a developer without reflecting the choice with other team members. You can think of <NumberFormat> as <CellFormat>, as long as you write it <NumberFormat>. This tag specifies the format of cells, not just numeric cells.
If you want to display the date value from one of our previous samples as “01-Mai-2008” (that’s German, not a spelling error), you have to enter the following code into an English version of Excel when the regional settings for the English version of Windows are German:
Excel is extremely language dependent. Some of this will bite you when you deal with different versions of Excel or different regional settings. But it’s not that time yet. For now you can be lucky, as formats are stored language independently in the XML file. Conveniently, Microsoft settled on the English format instructions:
You can find the correct format string using trial and error. I personally find it more efficient to format the cell in Excel in the way I want it. Then save the file as XML Spreadsheet and take a look at the style definition in the generated file. I then just copy and paste the definition into my XML file.
This becomes a bit easier when you format the cell with a font that you don’t use anywhere else in the Excel sheet. With this you can open the generated XML file and merely search for the font name to quickly identify the style used by this cell.
To format a cell with a certain style you pass the ID of the style to the <Cell> tag’s styleID attribute:
<Cell ss:StyleID="s22"><Data ss:Type="Da…</Data></Cell>
You cannot format a cell without using a style. In HTML and CSS you have the choice of creating a CSS selector with the style information, writing the format into the style attribute or use one of the older HTML tags like <B></B> to format text. In Excel creating a style and assigning it to a cell is your only choice.
Aside from the format string you can also change the font, style and size in an XMLSS file. Within the <Style> tag you use the <Font> tag to alter the font settings:
<Font ss:FontName="Tahoma" ss:Size="13"
Switches such as ss:Bold, ss:Italic, ss:Outline, ss:Shadow, ss:StrikeThrough, have a value of one, when the style is active, or zero when the style is deactivated. Zero is the default value for all of them. An exception to this is the ss:Underline attribute which can have be “Single” or “Double”. If nothing is specified, the text is not underlined at all.
Excel shines when it comes to false friends. There was the <NumberFormat> tag that doesn’t deal with numbers, the date format that doesn’t show up as a date, empty cells that cannot be empty, and so forth. Colors are similar.
Within the XML files, Excel uses a hexadecimal notation that is similar to HTML. For instance, the value #FF0000 denotes red. Behind the #-sign, you specify the values for red, blue and green. Each is a two-digit hexadecimal value. Alternatively, you can use the same color names that Internet Explorer uses for web sites. A list of these names is available online on the Microsoft web site:
Unfortunately, the tool tip names in the color selection dialog of Excel do not match these names, so you can deduct from the user interface what values you have to use in the XML file to obtain the same effect.
The possibility to specify hex values makes Excel look much more flexible than it actually is. FoxPro 2.x used color sets to specify the color for various elements. Excel calls them palettes, but the concept is the same. A palette is part of the excel workbook and consists of 40 colors for cells, eight colors for lines and eight colors for filling charts. In Excel you can view and edit the palette with the Tools > Options > Colors menu item.
Whenever you specify a color in hexadecimal notation in one of the styles, Excel searches the list of all 56 available colors. When there is no exact match, Excel will automatically use the closest color. Before you can use your own colors you have to define your own palette within the XML file:
Index can be a number between 1 and 56. RGB is the desired color. You cannot use more than 56 different colors in one Excel workbook.
There are many more options to format a spread sheet. You can change the background of a cell, define the margin for every side of the cell, define a print area, select the active cell when you the spread sheet, and much more. The easiest way to find out what XML you have to write is to create an Excel document that looks exactly as you want it to look. Then save the spreadsheet as an XML file and look for the style or cell definitions.
XML files are pure text files. Visual FoxPro provides extremely flexible ways to generate these text files. You can create them entirely programmatically from the ground up, or you use TEXTMERGE together with templates. When you started generating XML files, you won’t go back to pure automation.
The following sample generates a formatted Excel sheet from the Northwind Customer table where every second line is differently colored:
Set Point To "."
lcFile = GetEnv("USERPROFILE")+"\Desktop\Customers.xls"
Local lcRows, lnField, luValue, lcStyle, lcData
lcRows = ""
lcRows = m.lcRows + "<Row>"
For lnField = 1 to Fcount()
luValue = Evaluate(Field(m.lnField))
lcStyle = Iif(Recno()%2==0,"even","odd")
lcData = ;
lcData = ;
lcRows = m.lcRows + ;
lcRows = m.lcRows + "</Row>"
Text to m.lcXML Noshow Textmerge
<Font ss:FontName="Tahoma" ss:Size="13" ss:Bold="1" />
<Font ss:FontName="Tahoma" ss:Size="13" ss:Color="red" />
Copyright, 2009-2010, Christof Wollenhaupt.