StarWriter <= 5.x File Format (.sdw)

Overview

The .sdw file format is an OLE2 compound document (a structured storage file). libole2 (available as part of wv) can be used to access the substreams it contains.

All numbers given below are decimal, unless prefixed by 0x, in which case they are hexadecimal.

Note that this documentation is far from complete!

The example code below was compiled with a C++ compiler, but should mostly be valid C as well.

While writing the SDW importer, I found a small utility I wrote, dumpstream.c (GPL; requires libole2), very helpful. It prints the contents of an OLE2 file. Invoked as "dumpstream filename.sdw", it lists the streams in the file. Invoked as "dumpstream filename.sdw StarWriterDocument", it prints the contents of that stream (you might want to pipe the output through xxd to get a hexdump). Don't expect too much from this tool; it's a quick-and-dirty hack I made to be able to see the contents of a stream.

File paths such as sw/source/... refer to the OpenOffice source code, relative to the root of the source tree.

Data Types

Most types should speak for themselves (e.g. uint16 = unsigned 16 bit integer, sint32 = signed 32 bit integer).

There is, however, at least one special type: the Class ID, also known as ClsId. It is a structure defined as follows:

struct ClsId {
   sint32 n1;
   sint16 n2, n3;
   uint8 n4, n5, n6, n7, n8, n9, n10, n11;
};

The elements of the structure are stored in the file without any padding and in the order in which they occur in the above struct definition.
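A minimal C sketch of decoding a ClsId from its 16 raw bytes and printing it in the usual {...} notation. The byte order is my assumption, not something stated above: n1, n2 and n3 are taken to be little-endian, as is normal in OLE2 files. The function names are made up for illustration. Under that assumption, the bytes D1 F9 0C C2 AE 85 D1 11 AA B4 00 60 97 DA 56 1A decode to the StarOffice Class ID listed in the \001CompObj table.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Mirrors struct ClsId above; n4..n11 are kept as an 8-byte array. */
typedef struct {
    int32_t n1;
    int16_t n2, n3;
    uint8_t n4_11[8];
} ClsId;

/* Decode 16 packed bytes (no padding between fields).
   ASSUMPTION: n1, n2 and n3 are stored little-endian. */
static ClsId clsid_read(const uint8_t b[16])
{
    ClsId id;
    id.n1 = (int32_t)((uint32_t)b[0] | ((uint32_t)b[1] << 8) |
                      ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24));
    id.n2 = (int16_t)(b[4] | (b[5] << 8));
    id.n3 = (int16_t)(b[6] | (b[7] << 8));
    memcpy(id.n4_11, b + 8, 8);   /* single bytes: no byte swapping needed */
    return id;
}

/* Format as "{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}" (38 chars + NUL). */
static void clsid_format(const ClsId *id, char out[39])
{
    const uint8_t *t = id->n4_11;
    snprintf(out, 39, "{%08lX-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X}",
             (unsigned long)(uint32_t)id->n1,
             (unsigned)(uint16_t)id->n2, (unsigned)(uint16_t)id->n3,
             t[0], t[1], t[2], t[3], t[4], t[5], t[6], t[7]);
}
```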

The type bool is a one-byte integer where 0 means false and any other value means true, although 1 is what is usually stored.

Another important type is the Bytestring. It consists of a uint16 giving the length of the string in bytes, followed by that many chars. If the string is in the StarWriterDocument stream outside the header, and the document is encrypted, the chars must first be decrypted (see below). Under the same condition, the string is in the character set specified in the document header.
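Reading a Bytestring could look like this. It is a sketch: read_bytestring is a hypothetical helper, and the uint16 length is assumed to be little-endian. Decryption and charset conversion are deliberately left to the caller, since they only apply under the conditions described above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads a Bytestring (uint16 length, ASSUMED little-endian, then that many
   chars) from buf[0..buflen). Copies at most outsz-1 chars into out and
   NUL-terminates it. Returns the number of bytes consumed (2 + length),
   or 0 if the buffer is too short. No decryption or charset conversion. */
static size_t read_bytestring(const uint8_t *buf, size_t buflen,
                              char *out, size_t outsz)
{
    if (buflen < 2 || outsz == 0)
        return 0;
    size_t len = (size_t)buf[0] | ((size_t)buf[1] << 8);
    if (buflen - 2 < len)
        return 0;                         /* string runs past the buffer */
    size_t n = len < outsz - 1 ? len : outsz - 1;
    memcpy(out, buf + 2, n);
    out[n] = '\0';
    return 2 + len;
}
```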

Streams

The file consists of the following streams:
  - SwPageStyleSheets
  - SwNumRules
  - StarWriterDocument: the actual document, and the most important stream
  - SfxWindows: position of windows (?)
  - SfxStyleSheets
  - SfxDocumentInfo: information about the document, such as charset, author, etc.
  - persist elements
  - SummaryInformation
  - \001Ole: ?
  - \001CompObj: "Compatibility Object" (?), contains information about the creator of the document

SwPageStyleSheets

SwNumRules

StarWriterDocument

Offset (hex) | Length | Type    | Default  | Description
-------------|--------|---------|----------|------------
0x00         | 7      | char[]  | "SW5HDR" | Version indicator, null-terminated. Can be "SW3HDR", "SW4HDR" or "SW5HDR".
0x07         | 1      | uint8   | 0x2e (?) | Length of the rest of the header, counted from offset 0x08 (0x08 + 0x2e = 0x36). Includes the block name if present, but not the record sizes (if used).
0x08         | 2      | uint16  | 0x0217   | Document version, increased every time a new feature is added.
0x0A         | 2      | uint16  | n/a      | File flags, see below.
0x0C         | 4      | uint32  | n/a      | Document flags, see below.
0x10         | 4      | uint32  | 0        | nRecSzPos (?)
0x14         | 6      | --      | 0        | Dummy bytes (actually uint32, uint8, uint8).
0x1A         | 1      | uint8   | 0x30     | Redline mode, see below.
0x1B         | 1      | uint8   | 0x00     | Compatibility version. Increased when a change makes the file format incompatible with previous versions.
0x1C         | 16     | uint8[] | n/a      | Password verification data, see below.
0x2C         | 1      | uint8   | depends  | Character encoding of the file. Here is a file which maps StarWriter IDs to iconv names, usable as a C/C++ header file.
0x2D         | 1      | uint8   | 0x00     | cGui (?). Commented as "OLD: eSysType" (?), so probably no longer in use.
0x2E         | 4      | uint32  |          | Current date, used for password verification (see below). Format: YYYYMMDD, e.g. 20020501.
0x32         | 4      | uint32  |          | Current time, also used for password verification. Format: HHMMSS00, e.g. 22034800.
0x36         | 64     | char[]  |          | sBlockName (?), in the document charset. Only present if the SWGF_BLOCKNAME flag is set!

Record sizes follow only if nRecSzPos != 0 && nVersion >= SWG_RECSIZES; see sw/source/core/sw3io/sw3imp.cxx lines 1070ff. I don't know the details yet.
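The header table translates fairly directly into a parser. This is a sketch under assumptions the table does not state: all multi-byte integers are little-endian, and the first section starts at 0x08 plus the header length byte (which fits the default 0x2e). Record sizes are ignored, and the names (SdwHeader, sdw_read_header) are my own.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SWGF_BLOCKNAME 0x0002      /* from the File Flags list */

typedef struct {
    char     magic[7];             /* "SW3HDR", "SW4HDR" or "SW5HDR" */
    uint8_t  hdrLen;               /* header length byte at 0x07 */
    uint16_t version, fileFlags;
    uint32_t docFlags, recSzPos;
    uint8_t  redlineMode, compatVer;
    uint8_t  passwdCheck[16];      /* password verification data */
    uint8_t  charset, gui;
    uint32_t date;                 /* e.g. 20020501 */
    uint32_t time;                 /* e.g. 22034800 */
    char     blockName[65];        /* empty unless SWGF_BLOCKNAME is set */
} SdwHeader;

/* ASSUMPTION: multi-byte integers are little-endian. */
static uint16_t hdr_le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t hdr_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parses b[0..len). Returns the offset where the sections begin
   (ASSUMED to be 0x08 + hdrLen) or 0 on error. Record sizes are skipped. */
static size_t sdw_read_header(const uint8_t *b, size_t len, SdwHeader *h)
{
    if (len < 0x36)
        return 0;
    memcpy(h->magic, b, 7);
    h->magic[6] = '\0';
    if (strcmp(h->magic, "SW3HDR") != 0 && strcmp(h->magic, "SW4HDR") != 0 &&
        strcmp(h->magic, "SW5HDR") != 0)
        return 0;
    h->hdrLen      = b[0x07];
    h->version     = hdr_le16(b + 0x08);
    h->fileFlags   = hdr_le16(b + 0x0A);
    h->docFlags    = hdr_le32(b + 0x0C);
    h->recSzPos    = hdr_le32(b + 0x10);
    /* 0x14..0x19: six dummy bytes, ignored */
    h->redlineMode = b[0x1A];
    h->compatVer   = b[0x1B];
    memcpy(h->passwdCheck, b + 0x1C, 16);
    h->charset     = b[0x2C];
    h->gui         = b[0x2D];
    h->date        = hdr_le32(b + 0x2E);
    h->time        = hdr_le32(b + 0x32);
    h->blockName[0] = '\0';
    if (h->fileFlags & SWGF_BLOCKNAME) {
        if (len < 0x36 + 64)
            return 0;
        memcpy(h->blockName, b + 0x36, 64);
        h->blockName[64] = '\0';
    }
    return 0x08 + (size_t)h->hdrLen;
}
```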

After the header, the file consists of many sections. Each section starts with a character (char) that indicates its type (hereafter called the section id), followed by three bytes giving the length of the section (little-endian). To convert these to a usable integer, use for example (buf[0] | (buf[1] << 8) | (buf[2] << 16)), where buf is an unsigned char pointer to the first of the three bytes. This means that unsupported sections can easily be skipped.
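A sketch of walking the sections with the 24-bit length conversion from the paragraph above. One thing the text does not settle is whether the length counts the four bytes of id and length themselves. The sketch assumes it does, behind the hypothetical SDW_SECTION_LEN_INCLUDES_HEADER macro, so check this against real files before relying on it. SdwSection and sdw_next_section are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: the 24-bit length includes the 4 header bytes (id + length).
   Set to 0 if your files show otherwise. */
#define SDW_SECTION_LEN_INCLUDES_HEADER 1

typedef struct {
    char   id;      /* section id */
    size_t start;   /* offset of the id byte */
    size_t end;     /* offset just past the section */
} SdwSection;

/* Reads the section header at buf[pos]. Returns 1 on success, 0 if the
   buffer is too short or the length is inconsistent. */
static int sdw_next_section(const uint8_t *buf, size_t len, size_t pos,
                            SdwSection *s)
{
    if (pos > len || len - pos < 4)
        return 0;
    const uint8_t *p = buf + pos;
    size_t size = (size_t)p[1] | ((size_t)p[2] << 8) | ((size_t)p[3] << 16);
    s->id = (char)p[0];
    s->start = pos;
    s->end = SDW_SECTION_LEN_INCLUDES_HEADER ? pos + size : pos + 4 + size;
    if (s->end < pos + 4 || s->end > len)
        return 0;
    return 1;
}
```

Skipping an unsupported section is then just pos = s.end; a supported one has its contents between s.start + 4 and s.end.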

See below for a list of section types.

File Flags

(from sw/source/core/sw3io/sw3ids.hxx lines 65ff)
#define SWGF_BLOCKNAME  0x0002
    Header contains a text module (AutoText block) name; see sBlockName in the header.
#define SWGF_HAS_PASSWD 0x0008
    Stream is password protected; see below for details.
#define SWGF_HAS_PGNUMS 0x0100
    Stream has page numbers.
#define SWGF_BAD_FILE   0x8000
    There was an error writing the file; treat it as unusable.
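A small, hypothetical helper showing how an importer might act on the File Flags word before reading anything else: reject files marked SWGF_BAD_FILE, and ask for a password when SWGF_HAS_PASSWD is set. The return codes are my own convention.

```c
#include <stdint.h>

#define SWGF_BLOCKNAME  0x0002
#define SWGF_HAS_PASSWD 0x0008
#define SWGF_HAS_PGNUMS 0x0100
#define SWGF_BAD_FILE   0x8000

/* Returns -1 if the file must be rejected, 1 if a password is needed
   before the StarWriterDocument stream can be read, 0 otherwise. */
static int sdw_check_file_flags(uint16_t fileFlags)
{
    if (fileFlags & SWGF_BAD_FILE)
        return -1;          /* error while writing: treat as unusable */
    if (fileFlags & SWGF_HAS_PASSWD)
        return 1;           /* password protected */
    return 0;
}
```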

Document Flags

#define SWDF_BROWSEMODE1 0x1
    Show document in browse mode?
#define SWDF_BROWSEMODE2 0x2
    Same as SWDF_BROWSEMODE1; only one of the two needs to be set.
#define SWDF_HTMLMODE 0x4
    Document is in HTML mode.
#define SWDF_HEADINBROWSE 0x8
    Show headers in browse mode.
#define SWDF_FOOTINBROWSE 0x10
    Show footers in browse mode.
#define SWDF_GLOBALDOC 0x20
    Is a global document (a global document can contain chapter documents... I think).
#define SWDF_GLOBALDOCSAVELINK 0x40
    Include sections that are linked to the global document when saving.
#define SWDF_LABELDOC 0x80
    Is a label (German: "Etiketten") document.
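Because either of the two browse-mode bits switches browse mode on, a check has to test both. A minimal sketch; the function names are illustrative only.

```c
#include <stdint.h>

#define SWDF_BROWSEMODE1 0x1
#define SWDF_BROWSEMODE2 0x2
#define SWDF_HTMLMODE    0x4

/* Browse mode is on if either browse-mode bit is set. */
static int sdw_is_browse_mode(uint32_t docFlags)
{
    return (docFlags & (SWDF_BROWSEMODE1 | SWDF_BROWSEMODE2)) != 0;
}

static int sdw_is_html_mode(uint32_t docFlags)
{
    return (docFlags & SWDF_HTMLMODE) != 0;
}
```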

Redline Mode

(from sw/inc/redlenum.hxx lines 83ff)

 enum SwRedlineMode
 {
         REDLINE_NONE,                   /* no redline mode */
         REDLINE_ON          = 0x01,     /* redlines are on */
         REDLINE_IGNORE      = 0x02,     /* don't react to redlines */
         REDLINE_SHOW_INSERT = 0x10,     /* show all inserts */
         REDLINE_SHOW_DELETE = 0x20,     /* show all deletes */
         REDLINE_SHOW_MASK   = REDLINE_SHOW_INSERT | REDLINE_SHOW_DELETE  /* the default (0x30) */
 };

Password Protection

(from sw/source/core/sw3io/sw3imp.cxx lines 2721ff and sw/source/core/sw3io/crypter.cxx lines 77ff)

First, to be able to encrypt or decrypt data, the user's password must itself be encrypted in memory (see below for the actual algorithm). This encryption always uses the following fixed key. The password must be exactly 16 characters long; if it is shorter, it is padded with spaces:

static const UT_uint8 gEncode[] =
{ 0xab, 0x9e, 0x43, 0x05, 0x38, 0x12, 0x4d, 0x44,
  0xd5, 0x7e, 0xe3, 0x84, 0x98, 0x23, 0x3f, 0xba };

The result is used as the key for the actual encryption or decryption. (Encryption and decryption use the same algorithm.)

Here's the algorithm:

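void SDWCryptor::Decrypt(const char* aEncrypted, char* aBuffer, UT_uint32 aLen) const {
    size_t nCryptPtr = 0;              // position within the 16-byte key
    UT_uint8 cBuf[maxPWLen];           // working copy of the key, modified as we go
    memcpy(cBuf, mPassword, maxPWLen);
    UT_uint8* p = cBuf;

    if (!aLen)                         // aLen == 0: treat the input as a C string
        aLen = strlen(aEncrypted);

    while (aLen--) {
        // XOR with the current key byte, mixed with cBuf[0] * position
        *aBuffer++ = *aEncrypted++ ^ ( *p ^ (UT_uint8) ( cBuf[ 0 ] * nCryptPtr ) );
        // update the key byte from its neighbour (the last one uses cBuf[0])
        *p += ( nCryptPtr < (maxPWLen-1) ) ? *(p+1) : cBuf[ 0 ];
        if( !*p ) *p += 1;             // a key byte must never become zero
        p++;
        if( ++nCryptPtr >= maxPWLen ) {    // wrap around after 16 bytes
            nCryptPtr = 0;
            p = cBuf;
        }
    }
}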

Where:
maxPWLen = 16
mPassword is a 16-byte array containing the key, i.e. the encrypted password described above
UT_uint8 and UT_uint32 are unsigned 8-bit and 32-bit integers

To verify that the given password is actually correct, these steps should be taken:

Build a new string, say testString, consisting of the date and time from the header written next to each other in hexadecimal, each padded on the left with 0 to 8 characters. This can be achieved, for example, with snprintf(testString, sizeof(testString), "%08lx%08lx", (unsigned long) mDate, (unsigned long) mTime); where testString holds at least 17 chars.

Now encrypt this string with the key derived from the given password, and compare the result with the 16 bytes of password verification data in the header (offset 0x1C). If they are equal, the password is correct.
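Putting the pieces of this section together, here is a self-contained C sketch. sdw_crypt is a free-function port of SDWCryptor::Decrypt above, sdw_make_key pads the password with spaces and encrypts it with gEncode, and sdw_check_password builds the date/time string and compares it with the verification data from header offset 0x1C. The function names are hypothetical, and truncating passwords longer than 16 characters is an assumption; the text above only covers padding shorter ones.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_PW_LEN 16   /* maxPWLen above */

static const uint8_t gEncode[MAX_PW_LEN] =
{ 0xab, 0x9e, 0x43, 0x05, 0x38, 0x12, 0x4d, 0x44,
  0xd5, 0x7e, 0xe3, 0x84, 0x98, 0x23, 0x3f, 0xba };

/* Same algorithm as SDWCryptor::Decrypt; encrypts and decrypts alike. */
static void sdw_crypt(const uint8_t key[MAX_PW_LEN],
                      const uint8_t *in, uint8_t *out, size_t len)
{
    uint8_t cBuf[MAX_PW_LEN];
    uint8_t *p = cBuf;
    size_t nCryptPtr = 0;

    memcpy(cBuf, key, MAX_PW_LEN);
    while (len--) {
        *out++ = *in++ ^ (uint8_t)(*p ^ (uint8_t)(cBuf[0] * nCryptPtr));
        *p += (nCryptPtr < MAX_PW_LEN - 1) ? *(p + 1) : cBuf[0];
        if (!*p)
            *p += 1;
        p++;
        if (++nCryptPtr >= MAX_PW_LEN) {
            nCryptPtr = 0;
            p = cBuf;
        }
    }
}

/* Key = space-padded password encrypted with gEncode.
   ASSUMPTION: passwords longer than 16 chars are truncated. */
static void sdw_make_key(const char *password, uint8_t key[MAX_PW_LEN])
{
    uint8_t pw[MAX_PW_LEN];
    size_t n = strlen(password);

    if (n > MAX_PW_LEN)
        n = MAX_PW_LEN;
    memset(pw, ' ', MAX_PW_LEN);
    memcpy(pw, password, n);
    sdw_crypt(gEncode, pw, key, MAX_PW_LEN);
}

/* Compares the encrypted date/time string with the 16 verification bytes. */
static int sdw_check_password(const char *password, uint32_t date,
                              uint32_t time, const uint8_t verify[MAX_PW_LEN])
{
    uint8_t key[MAX_PW_LEN], enc[MAX_PW_LEN];
    char test[2 * 8 + 1];

    sdw_make_key(password, key);
    snprintf(test, sizeof test, "%08lx%08lx",
             (unsigned long)date, (unsigned long)time);
    sdw_crypt(key, (const uint8_t *)test, enc, MAX_PW_LEN);
    return memcmp(enc, verify, MAX_PW_LEN) == 0;
}
```

Since the keystream depends only on the key and not on the data, calling sdw_crypt twice with the same key returns the original bytes, which is why one function serves both directions.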

Section Types

(the letter in parentheses is the section id)

SfxWindows

SfxStyleSheets

SfxDocumentInfo

(OpenOffice Tree: sfx2/source/doc/docinf.cxx lines 786ff)

Offsets assume a version of 0x0B and default values (for Bytestrings and lengths). Quotes are from the OpenOffice code or comments.

Present if version > | Offset (hex) | Length | Type | Default | Description
---------------------|--------------|--------|------|---------|------------
  | 0x00  | 2    | uint16 | 0x0F | Length of the following string
  | 0x02  | 15   | char[] | "SfxDocumentInfo" | Header string (stored without terminating zero)
  | 0x11  | 2    | uint16 | 0x000B | Version
  | 0x13  | 1    | bool   | 0x00 | True if the document is password protected
  | 0x14  | 2    | uint16 | 0x0016 (on my system) | Charset
  | 0x16  | 1    | bool   | 0x00 | Graphics are saved portable
  | 0x17  | 1    | bool   | 0x01 | Ask the user whether the template should be reloaded
  | 0x18  | 41   | Timestamp |   | Creation timestamp, see below
  | 0x41  | 41   | Timestamp |   | Timestamp of the last modification
  | 0x6a  | 41   | Timestamp |   | Timestamp of the last printing
  | 0x93  | 65   | Bytestring+Padding | "" | Title of the document; padded to 63 chars
  | 0xd4  | 65   | Bytestring+Padding | "" | Theme/subject of the document; padded to 63 chars
  | 0x115 | 257  | Bytestring+Padding | "" | Comment; padded to 255 chars
  | 0x216 | 129  | Bytestring+Padding | "" | Keywords; padded to 127 chars
  | 0x297 | 4*42 |        |      | User-defined fields: the following two fields are repeated 4 times
  |       | 21   | Bytestring+Padding | "Info0" - "Info4" | Name of the user-defined field; padded to 19 chars
  |       | 21   | Bytestring+Padding | "" | Content of the user-defined field; padded to 19 chars
  | 0x33f | --   | Bytestring | "" | Template name

From here on, offsets assume an empty template name and filename.

  | 0x341 | --   | Bytestring | "" | Template filename
  | 0x343 | 4    | uint32 |      | Template date (format as in Timestamp)
  | 0x347 | 4    | uint32 |      | Template time (format as in Timestamp)
  | 0x34b | 2    | uint16 |      | Mail address count. Only present if the stream version (of StarWriterDocument?) is <= SOFFICE_FILEFORMAT_40 (3580). Unused field.

The following two fields are repeated once per mail address and can be ignored:

  |       | --   | Bytestring |  | The address
  |       | 2    | uint16 |      | Flags

The following offsets assume that the stream version is > SOFFICE_FILEFORMAT_40, so that the mail addresses are not present.

  | 0x34b | 4    | int32  | ?    | lTime (?)
4 | 0x34f | 2    | uint16 | 1    | Document number (seems to be the version, i.e. how often the document was saved)
  | 0x351 | 2    | uint16 | 0    | User data size
  | 0x353 | see above | byte[] |  | User data, "e.g. document statistic". The following offsets assume that this is not present.
  | 0x353 | 1    | bool   |      | Template contains configuration
5 | 0x354 | 1    | bool   | false | Reload enabled?
5 | 0x355 | --   | Bytestring | "" | Reload URL
5 | 0x357 | 4    | uint32 | 60   | Reload seconds
5 | 0x35b | --   | Bytestring | "" | Default target frame
6 | 0x35d | 1    | bool   | true | Save graphics compressed (if true, the next field is also true)
7 | 0x35e | 1    | bool   | true | Save original graphics
8 | 0x35f | 1    | bool   | false | Save version on close (?)
8 | 0x360 | --   | Bytestring | "" | Copies to
8 | 0x362 | --   | Bytestring | "" | Original
8 | 0x364 | --   | Bytestring | "" | References
8 | 0x366 | --   | Bytestring | "" | Recipient
8 | 0x368 | --   | Bytestring | "" | Reply to
8 | 0x36a | --   | Bytestring | "" | Blind copies
8 | 0x36c | --   | Bytestring | "" | In reply to
8 | 0x36e | --   | Bytestring | "" | Newsgroups
8 | 0x370 | 2    | uint16 | 0x0000 | Priority
9 | 0x372 | --   | Bytestring | "" | Special MIME type
10 | 0x374 | 1   | bool   |      | Use user data
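The Bytestring+Padding fields occupy a fixed number of bytes regardless of the actual string length, which is what keeps the offsets in the table above fixed. A sketch of reading one; read_padded_bytestring is a hypothetical name, the length is assumed to be little-endian, and the padding bytes are skipped without being checked.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads a "Bytestring+Padding" field: uint16 length (ASSUMED little-endian),
   the chars, then filler up to maxChars bytes of character data.
   Copies at most outsz-1 chars into out. Returns the size of the whole
   field (2 + maxChars), or 0 on error. */
static size_t read_padded_bytestring(const uint8_t *buf, size_t buflen,
                                     size_t maxChars, char *out, size_t outsz)
{
    if (outsz == 0 || buflen < 2 + maxChars)
        return 0;
    size_t len = (size_t)buf[0] | ((size_t)buf[1] << 8);
    if (len > maxChars)
        return 0;                      /* inconsistent field */
    size_t n = len < outsz - 1 ? len : outsz - 1;
    memcpy(out, buf + 2, n);
    out[n] = '\0';
    return 2 + maxChars;               /* padding skipped, not checked */
}
```

For the title you would call it with maxChars = 63 at offset 0x93, for the comment with 255 at 0x115, and so on; the creator name inside a Timestamp works the same way with 31.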

A Timestamp has this structure:

Length | Type | Description
-------|------|------------
2+31   | Bytestring+Padding | Name of the creator/modifier: at most 31 characters, followed by padding bytes (0x20, spaces) up to a total of 31 bytes of character data
4      | uint32 | Date (format: day + month*100 + year*10000)
4      | uint32 | Time (format: centiseconds + seconds*100 + minutes*10000 + hours*1000000)
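Decoding the packed date and time values is plain decimal arithmetic. A sketch (the struct and function names are mine); the same formats appear to apply to the date and time in the StarWriterDocument header, e.g. 20020501 and 22034800.

```c
#include <stdint.h>

typedef struct { unsigned year, month, day; } SdwDate;
typedef struct { unsigned hour, minute, second, centi; } SdwTime;

/* date = day + month*100 + year*10000 */
static SdwDate sdw_decode_date(uint32_t d)
{
    SdwDate r;
    r.year  = (unsigned)(d / 10000);
    r.month = (unsigned)(d / 100 % 100);
    r.day   = (unsigned)(d % 100);
    return r;
}

/* time = centiseconds + seconds*100 + minutes*10000 + hours*1000000 */
static SdwTime sdw_decode_time(uint32_t t)
{
    SdwTime r;
    r.hour   = (unsigned)(t / 1000000);
    r.minute = (unsigned)(t / 10000 % 100);
    r.second = (unsigned)(t / 100 % 100);
    r.centi  = (unsigned)(t % 100);
    return r;
}
```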


persist elements

(In the OpenOffice tree: so3/source/persist/persist.cxx)

This stream is also known as \002OlePress00 or \001Ole10Native.

SummaryInformation

\001Ole

(In the OpenOffice tree: class StgOleStream, sot/source/sdstor/stgole.cxx and .hxx)

Offset (hex) | Length | Type   | Default    | Description
-------------|--------|--------|------------|------------
0x00         | 4      | sint32 | 0x02000001 | Version of this stream
0x04         | 4      | uint32 |            | Object flags
0x08         | 4      | sint32 | 0          | Update options
0x0C         | 4      | sint32 | 0          | Reserved
0x10         | 4      | sint32 | 0          | Moniker 1

(Sorry, I don't know anything about the meaning of these fields.)

\001CompObj

(In the OpenOffice tree: class StgCompObjStream, sot/source/sdstor/stgole.cxx and .hxx)

This stream is, I suppose, the "Compatibility Object". Its format is as follows:

Offset (hex)       | Length | Type   | Default | Description
-------------------|--------|--------|---------|------------
0x00               | 2      | uint16 | 0x0001  | Version of the CompObj
0x02               | 2      | uint16 | 0xFFFE  | Byte order
0x04               | 4      | uint32 | 0x0A03 (= Windows 3.1) | Windows version (?)
0x08               | 4      | sint32 | 0xFFFFFFFF (-1) | Marker. If this is -1, continue reading the stream.
0x0C               | 16     | ClsId  | {C20CF9D1-85AE-11D1-AAB4-006097DA561A} | StarOffice's Class ID?
0x1C               | 4      | uint32 | 5       | Length of the "Username" (len1)
0x20               | len1   | char[] | "Text"  | A string of characters, known as "Username"
0x20 + len1        | 4      | uint32 | 15      | Length of the file format string (len2)
0x24 + len1        | len2   | char[] | "StarWriter 5.0" | File format string. Basically, this describes the version of the file.
0x24 + len1 + len2 | 4      | uint32 | 0x00000000 | Terminator, always zero
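A sketch of parsing the \001CompObj stream according to the table above and the notes that follow it. Assumptions: integers are little-endian, strings longer than 63 characters are truncated, and compobj_parse stops (returning 0) if the marker at 0x08 is not -1. The names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    char     userType[64];    /* the "Username", e.g. "Text" */
    int      hasClipFormat;   /* 1 if the format-string length was -1 */
    uint32_t clipFormat;      /* Windows clipboard format, if hasClipFormat */
    char     formatName[64];  /* e.g. "StarWriter 5.0"; "" if absent */
} CompObj;

/* ASSUMPTION: little-endian integers. */
static uint32_t co_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* uint32 length (including the terminating zero), then the chars.
   Returns the bytes consumed, or 0 on error. */
static size_t co_read_string(const uint8_t *b, size_t len,
                             char *out, size_t outsz)
{
    if (len < 4)
        return 0;
    size_t n = co_le32(b);
    if (n > len - 4)
        return 0;
    size_t c = n > 0 ? n - 1 : 0;     /* drop the terminating zero */
    if (c > outsz - 1)
        c = outsz - 1;                /* truncate overly long strings */
    memcpy(out, b + 4, c);
    out[c] = '\0';
    return 4 + n;
}

/* Returns 1 on success, 0 on a short buffer or if the marker is not -1. */
static int compobj_parse(const uint8_t *b, size_t len, CompObj *co)
{
    size_t pos = 0x1C, k;

    if (len < pos || co_le32(b + 0x08) != 0xFFFFFFFFu)
        return 0;
    k = co_read_string(b + pos, len - pos, co->userType, sizeof co->userType);
    if (k == 0)
        return 0;
    pos += k;
    if (len - pos < 4)
        return 0;
    co->hasClipFormat = 0;
    co->formatName[0] = '\0';
    if (co_le32(b + pos) == 0xFFFFFFFFu) {   /* -1: clipboard format follows */
        if (len - pos < 8)
            return 0;
        co->hasClipFormat = 1;
        co->clipFormat = co_le32(b + pos + 4);
        return 1;
    }
    return co_read_string(b + pos, len - pos, co->formatName,
                          sizeof co->formatName) != 0;
}
```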

Notes

  1. Both strings are stored with a terminating zero, and the length includes this character. If the length is zero, no string is stored in the stream.
  2. The length of the file format string can be -1, zero or a positive value (so, despite the uint32 in the table, it is effectively signed). If it is -1 (0xFFFFFFFF), the next 4 bytes should be interpreted as a Windows clipboard format. If it is zero, see above. If it is greater than zero, the version string follows; see below.
  3. Star/OpenOffice knows about lots of version strings (see sot/source/base/exchange.cxx and tools/inc/solar.h lines 471ff). OpenOffice uses RegisterFormatName (lines 253ff of the first file) to get the version number from the string. (XXX There was another file, but I can't find it right now.)
  4. Common version strings and their numbers:
       StarWriter 3.0    3450
       StarWriter 4.0    3580
       StarWriter 5.0    5050
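Mapping the format string to a version number can be as simple as a lookup table. This sketch only knows the three strings listed above; OpenOffice itself knows many more (see note 3), so unknown strings return 0. The function name is hypothetical.

```c
#include <string.h>

/* Returns the version number for a known file format string, or 0. */
static long sdw_format_version(const char *name)
{
    static const struct { const char *name; long version; } known[] = {
        { "StarWriter 3.0", 3450 },
        { "StarWriter 4.0", 3580 },   /* SOFFICE_FILEFORMAT_40 */
        { "StarWriter 5.0", 5050 },
    };
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
        if (strcmp(name, known[i].name) == 0)
            return known[i].version;
    return 0;
}
```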