An .sdw file is an OLE2 compound file. libole2 (available as part of wv) can be used to access the different substreams contained in the file.
All numbers given below are decimal, unless prefixed by 0x, in which
case they are hexadecimal.
Note that this documentation is far from complete!
The example code given below was used with a C++ Compiler, but should mostly be valid C as well.
While writing the SDW importer, I found a small utility I wrote, dumpstream.c (GPL, requires libole2), very helpful. It prints out the contents of an OLE2 stream. If invoked like "dumpstream filename.sdw", it lists the streams that are part of the file. If invoked like "dumpstream filename.sdw StarWriterDocument", it prints out the contents of that stream (you might want to pipe the output through xxd to get a hexdump). Don't expect too much from this tool; it's a quick-and-dirty hack I made to be able to see the contents of a stream.
Filepaths like sw/source/... are paths to the OpenOffice source code, relative to the root.
Most types should speak for themselves (e.g. uint16 = unsigned 16 bit integer, sint32 = signed 32 bit integer).
There is, however, at least one special type: the Class ID, also known as ClsId. It's a structure defined as follows:
struct ClsId {
    sint32 n1;
    sint16 n2, n3;
    uint8 n4, n5, n6, n7, n8, n9, n10, n11;
};
The elements of the structure are stored in the file without any padding and in the order in which they occur in the above struct definition.
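As an illustration of this layout, the following sketch reads a ClsId from a 16-byte buffer. It assumes that the multi-byte members are stored little-endian (this document only states that explicitly for section lengths); the names sdw_clsid and sdw_read_clsid are made up for this example.

```c
#include <stdint.h>

/* Mirrors the ClsId struct above, using fixed-width types. */
typedef struct {
    int32_t n1;
    int16_t n2, n3;
    uint8_t n4_11[8]; /* n4 .. n11, in file order */
} sdw_clsid;

/* Reads the 16 unpadded bytes at buf. Little-endian storage is an assumption. */
void sdw_read_clsid(const uint8_t *buf, sdw_clsid *out)
{
    uint32_t u1 = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
                  ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
    uint16_t u2 = (uint16_t)(buf[4] | (buf[5] << 8));
    uint16_t u3 = (uint16_t)(buf[6] | (buf[7] << 8));
    int i;

    out->n1 = (int32_t)u1;
    out->n2 = (int16_t)u2;
    out->n3 = (int16_t)u3;
    for (i = 0; i < 8; i++)
        out->n4_11[i] = buf[8 + i];
}
```

Reading byte by byte instead of casting the buffer to a struct avoids depending on the compiler's padding and on the host byte order.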
The type bool is a one-byte integer, where 0 means false and all other values true; though usually 1 is stored.
Another important type is the Bytestring. It looks like this: first, there is a uint16 giving the length in bytes of the following string; a char[] of that length follows. If the string is in the StarWriterDocument stream outside the header and the document is encrypted, it must first be decrypted (see below). Under the same condition, the string is in the character set specified in the document header.
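A Bytestring can be located in a buffer roughly as sketched here. The function name sdw_read_bytestring is hypothetical, the little-endian length is an assumption, and neither decryption nor character set conversion is attempted.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Locates a Bytestring at buf. On success, stores a pointer to the
 * characters (not NUL-terminated) and their count, and returns the number
 * of bytes consumed (2 + length). Returns 0 if the buffer is too short.
 * The little-endian length is an assumption.
 */
size_t sdw_read_bytestring(const uint8_t *buf, size_t avail,
                           const uint8_t **chars, uint16_t *len)
{
    uint16_t n;

    if (avail < 2)
        return 0;
    n = (uint16_t)(buf[0] | (buf[1] << 8));
    if (avail - 2 < n)
        return 0;
    *chars = buf + 2;
    *len = n;
    return (size_t)n + 2;
}
```

Returning the consumed size lets a caller step over consecutive Bytestrings without copying them.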
Offset in Hex | Length | Type | Default Value | Description
0x00 | 7 | char[] | "SW5HDR" | Version indicator, null-terminated. Can be "SW3HDR", "SW4HDR" or "SW5HDR"
0x07 | 1 | uint8 | 0x2e (?) | Length of the header, including Block Name, but not including Record Sizes (if used)
0x08 | 2 | uint16 | 0x0217 | Document version, increased every time a new feature is added.
0x0A | 2 | uint16 | n/a | File Flags, see below
0x0C | 4 | uint32 | n/a | Document Flags, see below
0x10 | 4 | uint32 | 0 | nRecSzPos (?)
0x14 | 6 | -- | 0 | dummy bytes... actually, uint32, uint8, uint8
0x1A | 1 | uint8 | 0x30 | Redline mode, see below
0x1B | 1 | uint8 | 0x00 | Compatibility Version. Is increased when a change makes the file format incompatible with previous versions.
0x1C | 16 | uint8[] | n/a | Password verification data, see below
0x2C | 1 | uint8 | depends | The character coding of the file. A mapping of StarWriter IDs to iconv names is available as a file usable as a C/C++ header.
0x2D | 1 | uint8 | 0x00 | cGui (?) "OLD: eSysType" (?) so not in use anymore?
0x2E | 4 | uint32 | | Current Date, used for password verification (see below). Format: 20020501
0x32 | 4 | uint32 | | Current Time, also for password verification. Format: 22034800 (HHMMSS00)
0x36 | 64 | char[] | | sBlockName (?) (in the document charset) (only read if SWGF_BLOCKNAME flag is set!)
Record sizes follow only if nRecSzPos != 0 && nVersion >= SWG_RECSIZES; see sw/source/core/sw3io/sw3imp.cxx lines 1070ff. I don't know details yet.
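To show how the offsets in the table fit together, here is a minimal sketch that pulls the fixed fields out of a header buffer. It assumes little-endian integers and ignores the block name and the record sizes; sdw_header and sdw_parse_header are names invented for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    char     magic[7];       /* "SW3HDR", "SW4HDR" or "SW5HDR", with NUL */
    uint8_t  header_len;
    uint16_t version;
    uint16_t file_flags;
    uint32_t doc_flags;
    uint8_t  redline_mode;
    uint8_t  compat_version;
    uint8_t  pw_check[16];   /* password verification data */
    uint8_t  charset;
    uint32_t date;
    uint32_t time;
} sdw_header;

/* Little-endian readers; the byte order is an assumption. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is too short or the magic is unknown. */
int sdw_parse_header(const uint8_t *buf, size_t len, sdw_header *h)
{
    if (len < 0x36)
        return -1;
    if (memcmp(buf, "SW3HDR", 7) != 0 && memcmp(buf, "SW4HDR", 7) != 0 &&
        memcmp(buf, "SW5HDR", 7) != 0)
        return -1;
    memcpy(h->magic, buf, 7);
    h->header_len     = buf[0x07];
    h->version        = rd16(buf + 0x08);
    h->file_flags     = rd16(buf + 0x0A);
    h->doc_flags      = rd32(buf + 0x0C);
    h->redline_mode   = buf[0x1A];
    h->compat_version = buf[0x1B];
    memcpy(h->pw_check, buf + 0x1C, 16);
    h->charset        = buf[0x2C];
    h->date           = rd32(buf + 0x2E);
    h->time           = rd32(buf + 0x32);
    return 0;
}
```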
After the header, the file consists of many sections. Each section starts with a character (char) that indicates what type it is (hereafter called the section id). After that, there are three bytes giving the length of the section (little-endian). To convert this to a usable integer, use for example (buf[0] | (buf[1] << 8) | (buf[2] << 16)), where buf points to the first of the three bytes read. This means that unsupported sections can easily be skipped.
See below for a list of section types.
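The section header can be decoded as in the sketch below. The type and function names are hypothetical. Whether the stored length counts the four header bytes themselves is not stated here, so that should be checked against the OpenOffice source before using the value to skip a section.

```c
#include <stdint.h>

typedef struct {
    char     id;      /* section id, e.g. 'N', 'T' or 'A' */
    uint32_t length;  /* 24-bit little-endian section length */
} sdw_section_header;

/* Decodes the section id and the three length bytes at buf. */
sdw_section_header sdw_read_section_header(const uint8_t *buf)
{
    sdw_section_header h;

    h.id = (char)buf[0];
    h.length = (uint32_t)buf[1] | ((uint32_t)buf[2] << 8) |
               ((uint32_t)buf[3] << 16);
    return h;
}
```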
File flags (from sw/source/core/sw3io/sw3ids.hxx lines 65ff):
#define SWGF_BLOCKNAME 0x0002 /* Header has textmodule */
#define SWGF_HAS_PASSWD 0x0008 /* Stream is password protected, see below for details. */
#define SWGF_HAS_PGNUMS 0x0100 /* Stream has pagenumbers */
#define SWGF_BAD_FILE 0x8000 /* There was an error writing the file - treat it as unusable. */
Document flags:
#define SWDF_BROWSEMODE1 0x1 /* Show document in browse mode? */
#define SWDF_BROWSEMODE2 0x2 /* Same as above, only one of them needs to be set */
#define SWDF_HTMLMODE 0x4 /* Document is in HTML Mode */
#define SWDF_HEADINBROWSE 0x8 /* Show headers in Browse Mode */
#define SWDF_FOOTINBROWSE 0x10 /* Show footers in browse mode */
#define SWDF_GLOBALDOC 0x20 /* Is a global document (a global document can contain chapter documents... I think) */
#define SWDF_GLOBALDOCSAVELINK 0x40 /* Include sections that are linked to the global document when saving */
#define SWDF_LABELDOC 0x80 /* Is a label ("etiketten") document */
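A reader would typically test these bits right after parsing the header. The helper names in this sketch are invented; only the flag values come from the list above.

```c
#include <stdint.h>

#define SWGF_HAS_PASSWD  0x0008
#define SWGF_BAD_FILE    0x8000
#define SWDF_BROWSEMODE1 0x1
#define SWDF_BROWSEMODE2 0x2

/* A file written with errors should be treated as unusable. */
int sdw_file_is_usable(uint16_t file_flags)
{
    return (file_flags & SWGF_BAD_FILE) == 0;
}

/* True if the stream is password protected. */
int sdw_needs_password(uint16_t file_flags)
{
    return (file_flags & SWGF_HAS_PASSWD) != 0;
}

/* Browse mode is on if either of the two browse mode bits is set. */
int sdw_is_browse_mode(uint32_t doc_flags)
{
    return (doc_flags & (SWDF_BROWSEMODE1 | SWDF_BROWSEMODE2)) != 0;
}
```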
(from sw/inc/redlenum.hxx lines 83ff)
enum SwRedlineMode
{
    REDLINE_NONE,               /* No Redline mode */
    REDLINE_ON = 0x01,          /* Redlines are on */
    REDLINE_IGNORE = 0x02,      /* Don't react to redlines */
    REDLINE_SHOW_INSERT = 0x10, /* Show all inserts */
    REDLINE_SHOW_DELETE = 0x20, /* Show all deletes */
    REDLINE_SHOW_MASK = REDLINE_SHOW_INSERT | REDLINE_SHOW_DELETE /* The Default */
};
(from sw/source/core/sw3io/sw3imp.cxx lines 2721ff and sw/source/core/sw3io/crypter.cxx lines 77ff)
First, to be able to en- or decrypt data, the password must itself be encrypted in memory (see below for the actual algorithm). For this encryption, the fixed password gEncode shown here is always used. Also, the password to be encrypted needs to be exactly 16 characters long; if it is shorter, it needs to be padded with spaces:
static const UT_uint8 gEncode[] =
    { 0xab, 0x9e, 0x43, 0x05, 0x38, 0x12, 0x4d, 0x44,
      0xd5, 0x7e, 0xe3, 0x84, 0x98, 0x23, 0x3f, 0xba };
The resulting string is used as the password for the actual en- or decryption. (The same algorithm is used for both en- and decryption.)
Here's the algorithm:
void SDWCryptor::Decrypt(const char* aEncrypted, char* aBuffer, UT_uint32 aLen) const {
    size_t nCryptPtr = 0;
    UT_uint8 cBuf[maxPWLen];
    memcpy(cBuf, mPassword, maxPWLen); /* work on a copy; the key changes while running */
    UT_uint8* p = cBuf;
    if (!aLen)
        aLen = strlen(aEncrypted);
    while (aLen--) {
        /* XOR each byte with the current key byte and a position-dependent value */
        *aBuffer++ = *aEncrypted++ ^ ( *p ^ (UT_uint8) ( cBuf[ 0 ] * nCryptPtr ) );
        /* mix the next key byte into the current one; it must never become zero */
        *p += ( nCryptPtr < (maxPWLen-1) ) ? *(p+1) : cBuf[ 0 ];
        if( !*p ) *p += 1;
        p++;
        if( ++nCryptPtr >= maxPWLen ) {
            /* wrap around to the start of the key */
            nCryptPtr = 0;
            p = cBuf;
        }
    }
}
Where:
maxPWLen = 16
mPassword is an array of characters, 16 bytes long, and contains the password which will be used
To verify that the given password is actually correct, these steps should be taken:
A new string, say testString, should be built, consisting of the Date and Time (from the header) next to each other in hex format, each padded with 0 on the left if shorter than 8 characters (this can for example be achieved by snprintf(testString, sizeof(testString), "%08lx%08lx", mDate, mTime);).
This string should now be encrypted with the given password, and the result should be compared to the password verification data mentioned above. If they are equal, the password is correct.
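Putting the pieces together, password preparation and verification might look like the following plain C sketch. The function names are invented for this example, sdw_crypt is a transcription of the Decrypt routine above without its strlen fallback, and truncating passwords longer than 16 characters is an assumption that this document does not confirm.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SDW_PW_LEN 16

/* The fixed key (gEncode) used to encrypt the user's password in memory. */
static const uint8_t sdw_encode_key[SDW_PW_LEN] = {
    0xab, 0x9e, 0x43, 0x05, 0x38, 0x12, 0x4d, 0x44,
    0xd5, 0x7e, 0xe3, 0x84, 0x98, 0x23, 0x3f, 0xba
};

/* Same cipher as Decrypt above; works for both directions. key is not modified. */
void sdw_crypt(const uint8_t key[SDW_PW_LEN], const uint8_t *in,
               uint8_t *out, size_t len)
{
    uint8_t buf[SDW_PW_LEN];
    size_t pos = 0, i;

    memcpy(buf, key, SDW_PW_LEN);
    for (i = 0; i < len; i++) {
        out[i] = (uint8_t)(in[i] ^ buf[pos] ^ (uint8_t)(buf[0] * pos));
        buf[pos] = (uint8_t)(buf[pos] +
                             (pos < SDW_PW_LEN - 1 ? buf[pos + 1] : buf[0]));
        if (buf[pos] == 0)
            buf[pos] = 1;
        if (++pos >= SDW_PW_LEN)
            pos = 0;
    }
}

/* Pads the password with spaces to 16 bytes and encrypts it with the fixed key. */
void sdw_prepare_password(const char *password, uint8_t key[SDW_PW_LEN])
{
    uint8_t padded[SDW_PW_LEN];
    size_t n = strlen(password);

    if (n > SDW_PW_LEN)
        n = SDW_PW_LEN; /* assumption: longer passwords are truncated */
    memset(padded, ' ', SDW_PW_LEN);
    memcpy(padded, password, n);
    sdw_crypt(sdw_encode_key, padded, key, SDW_PW_LEN);
}

/* Returns 1 if key reproduces the verification data from the header, else 0. */
int sdw_check_password(const uint8_t key[SDW_PW_LEN], uint32_t date_val,
                       uint32_t time_val, const uint8_t check[16])
{
    char test[17];
    uint8_t enc[16];

    snprintf(test, sizeof test, "%08lx%08lx",
             (unsigned long)date_val, (unsigned long)time_val);
    sdw_crypt(key, (const uint8_t *)test, enc, 16);
    return memcmp(enc, check, 16) == 0;
}
```

A caller would run sdw_prepare_password once, pass the resulting key to sdw_check_password together with the Date, Time and verification data from the header, and only then use the key with sdw_crypt on the document contents.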
(the letter in parentheses is the section id)
('N')
uint32 giving the number of nodes (?) (sw/source/core/sw3io/sw3sectn.cxx lines 181ff)
uint16 = a dummy section id, can be thrown away (at least that's what OpenOffice does)
uint16, same meaning as the nodes from above, just as a 2-byte integer.
('T') (sw/source/core/sw3io/sw3nodes.cxx lines 788ff)
('A'), which have the following structure:
offset relative to record start | length | type | description
0x00 | 1 | flag | Flag record, as above.
0x01 | 2 | uint16 | Which type of attribute this is.
0x03 | 2 | uint16 | Version of the attribute (I don't know details yet) (seems to be 0 usually)
0x05 | 2 | uint16 | Offset of the first character to which this attribute applies, relative to the start of the text node. Only exists if flag 0x10 is set. Zero-based.
0x07 | 2 | uint16 | Offset of the last character. Only exists if flag 0x20 is set.
0x100a | Italic
0x100d | Underline
0x100e | Bold
('J')
This section contains information about the selected printer and paper. This only reflects the settings made in File|Printer Setup, not Format|Page!
First, two defines:
#define JOBSET_FILE364_SYSTEM (0xFFFF)
#define JOBSET_FILE605_SYSTEM (0xFFFE)
Offset in Hex | Length | Type | Default Value | Description
0x00 | 2 | uint16 | TBD | Length [nLen]
0x02 | 2 | uint16 | TBD | System (?)
0x04 | 64 | char[] | | Printer Name
0x44 | 32 | char[] | | Device Name
0x64 | 32 | char[] | | Port Name
0x84 | 32 | char[] | | Driver Name
All further fields are only used if System is JOBSET_FILE364_SYSTEM or JOBSET_FILE605_SYSTEM
0xA4 | 2 | uint16 | TBD | nSize
0xA6 | 2 | uint16 | TBD | nSystem (again??)
0xA8 | 4 | uint32 | TBD | Driver Data Length
0xAC | 2 | enum [uint16] | | Orientation (0=Portrait, 1=Landscape)
0xAE | 2 | uint16 | | Paper Bin
0xB0 | 2 | enum [uint16] | | Paper Format (0=A3, 1=A4, 2=A5, 3=B4, 4=B5, 5=Letter, 6=Legal, 7=Tabloid, 8=User defined)
0xB2 | 4 | uint32 | | Paper Width
0xB6 | 4 | uint32 | | Paper Height
0xBA | Driver Data Length | ? | | Driver Data (?) (vcl/source/gdi/jobset.cxx lines 383ff). Only if the Driver Data Length is > 0.
 | nLen minus already read data | | | Corresponding Key and Value strings (ByteStrings, in UTF-8 encoding). Only if System == JOBSET_FILE605_SYSTEM
JOBSET_FILE364_SYSTEM, in which case it is the same encoding as the rest of the document.
('Z')
('!')
Offset in Hex | Length | Type | Description
0x00 | 1 | uint8 | Character Set for the strings
0x01 | 2 | uint16 | Number of strings
Length | Type | Description
2 | uint16 | ID of the string
n/a | Bytestring | The string. Not encrypted. If ID == IDX_NOCONV_FF (0xFFFC), then the 0xFF character in the string should be left unconverted; else, a normal conversion can be performed.
ID | New String
RES_POOLCOLL_HTML_LISTING_40 (0x3002) | "LISTING"
RES_POOLCOLL_HTML_XMP_40 (0x3003) | "XMP"
old ID | new ID
RES_POOLCOLL_HTML_LISTING_40 / RES_POOLCOLL_HTML_XMP_40 | must be or'ed with USER_FMT (1 << 15)
RES_POOLCOLL_HTML_HR_40 (0x3004) | RES_POOLCOLL_HTML_HR (0x3002)
RES_POOLCOLL_HTML_H6_40 (0x3005) | RES_POOLCOLL_HEADLINE6 (0x80f)
RES_POOLCOLL_HTML_DD_40 (0x3006) | RES_POOLCOLL_HTML_DD (0x3003)
RES_POOLCOLL_HTML_DT_40 (0x3007) | RES_POOLCOLL_HTML_DT (0x3004)
('Z')
(OpenOffice Tree: sfx2/source/doc/docinf.cxx lines 786ff)
Offsets assume a version of 0x0B and default values (for bytestrings and lengths). Quotes are from OpenOffice code or comments.
present if header version > | Offset in Hex | Length | Type | Default Value | Description
 | 0x00 | 2 | uint16 | 0x0F | Length of the following String
 | 0x02 | 15 | char[] | "SfxDocumentInfo" | Headerstring (stored without terminating zero)
 | 0x11 | 2 | uint16 | 0x000B | Version
 | 0x13 | 1 | bool | 0x00 | True if doc is pw protected
 | 0x14 | 2 | uint16 | 0x0016 (on my system) | Charset, see below.
 | 0x16 | 1 | bool | 0x00 | Graphics are saved portable
 | 0x17 | 1 | bool | 0x01 | Ask the user whether the template should be reloaded
 | 0x18 | 41 | Timestamp | | Creator Timestamp, see below
 | 0x41 | 41 | Timestamp | | Timestamp for last Modification
 | 0x6a | 41 | Timestamp | | Timestamp for last Printing
 | 0x93 | 65 | Bytestring+Padding | "" | Title of the document; pad until 63 chars are read
 | 0xd4 | 65 | Bytestring+Padding | "" | Theme/Subject of the document, pad until 63
 | 0x115 | 257 | Bytestring+Padding | "" | Comment, pad until 255
 | 0x216 | 129 | Bytestring+Padding | "" | Keywords, pad until 127
 | 0x297 | 4*42 | | | The following two fields are repeated 4 times:
 | | 21 | Bytestring+Padding | "Info0" - "Info4" | Name of user-defined field, padded until 19
 | | 21 | Bytestring+Padding | "" | Content of user-defined field, padded until 19
 | 0x33f | -- | Bytestring | "" | Template Name
From here on, offsets assume an empty template name and filename.
 | 0x341 | -- | Bytestring | "" | Template Filename
 | 0x343 | 4 | uint32 | | Template Date (format as in Timestamp)
 | 0x347 | 4 | uint32 | | Template Time (format as in Timestamp)
 | 0x34b | 2 | uint16 | | Mail address count. Only if the stream version (of StarWriterDocument?) is <= SOFFICE_FILEFORMAT_40 (3580). Unused field.
The following two fields are repeated number_of_mail_addresses times and can be ignored.
 | | -- | Bytestring | | the address
 | | 2 | uint16 | | flags
The following offsets assume that the stream version is > SOFFICE_FILEFORMAT_40 and that therefore the mail addresses aren't present.
 | 0x34b | 4 | int32 | ? | lTime (?)
4 | 0x34f | 2 | uint16 | 1 | Document number (seems to be the version, i.e. how often the document was saved)
 | 0x351 | 2 | uint16 | 0 | user data size
 | 0x353 | see above | byte[] | | user data "e.g. document statistic". Following offsets assume that this is not present
 | 0x353 | 1 | bool | | Template contains configuration
5 | 0x354 | 1 | bool | false | Reload enabled?
5 | 0x355 | -- | Bytestring | "" | Reload URL
5 | 0x357 | 4 | uint32 | 60 | Reload seconds
5 | 0x35b | -- | Bytestring | "" | Default Target Frame
6 | 0x35d | 1 | bool | true | Save Graphics compressed (if true, next field is also true)
7 | 0x35e | 1 | bool | true | Save Original Graphics
8 | 0x35f | 1 | bool | false | Save Version on Close (?)
8 | 0x360 | -- | Bytestring | "" | Copies to
8 | 0x362 | -- | Bytestring | "" | Original
8 | 0x364 | -- | Bytestring | "" | References
8 | 0x366 | -- | Bytestring | "" | Recipient
8 | 0x368 | -- | Bytestring | "" | Reply To
8 | 0x36a | -- | Bytestring | "" | Blind Copies
8 | 0x36c | -- | Bytestring | "" | In Reply To
8 | 0x36e | -- | Bytestring | "" | Newsgroups
8 | 0x370 | 2 | uint16 | 0x0000 | Priority
9 | 0x372 | -- | Bytestring | "" | Special Mime-Type
10 | 0x374 | 1 | bool | | Use user data
A Timestamp has this structure:
length | type | desc
-- | ByteString | Name of the creator/modifier. It is at most 31 characters long; after it, padding bytes (0x20, i.e. spaces) follow until the total data length is 31 bytes
4 | uint32 | Modification Date (format: day+month*100+year*10000)
4 | uint32 | Modification Time (format: centiseconds+seconds*100+minutes*10000+hours*1000000)
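The packed date and time values can be taken apart with integer division and modulo, as in this sketch. The struct and function names are hypothetical; the example values 20020501 and 22034800 are the ones shown for the Date and Time fields of the document header, which appear to use the same encoding.

```c
#include <stdint.h>

typedef struct { unsigned year, month, day; } sdw_date;
typedef struct { unsigned hours, minutes, seconds, centiseconds; } sdw_time;

/* Splits day + month*100 + year*10000 into its parts. */
sdw_date sdw_decode_date(uint32_t v)
{
    sdw_date d;

    d.day   = v % 100;
    d.month = (v / 100) % 100;
    d.year  = v / 10000;
    return d;
}

/* Splits centiseconds + seconds*100 + minutes*10000 + hours*1000000. */
sdw_time sdw_decode_time(uint32_t v)
{
    sdw_time t;

    t.centiseconds = v % 100;
    t.seconds      = (v / 100) % 100;
    t.minutes      = (v / 10000) % 100;
    t.hours        = v / 1000000;
    return t;
}
```

With these definitions, 20020501 decodes to 1 May 2002 and 22034800 to 22:03:48.00.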
(In the OpenOffice tree: so3/source/persist/persist.cxx)
\002OlePress00 or \001Ole10Native.
(In the OpenOffice tree: class StgOleStream, sot/source/sdstor/stgole.cxx and .hxx)
Offset in Hex | Length | Type | Default Value | Description
0x00 | 4 | sint32 | 0x2000001 | Version of this stream
0x04 | 4 | uint32 | | Object Flags
0x08 | 4 | sint32 | 0 | Update Options
0x0C | 4 | sint32 | 0 | reserved
0x10 | 4 | sint32 | 0 | Moniker 1
(In the OpenOffice tree: class StgCompObjStream, sot/source/sdstor/stgole.cxx and .hxx)
Offset in Hex | Length | Type | Default Value | Description
0x00 | 2 | uint16 | 0x0001 | Version of the CompObj
0x02 | 2 | uint16 | 0xFFFE | Byte Order
0x04 | 4 | uint32 | 0x0A03 (=Windows 3.1) | Windows version (?)
0x08 | 4 | sint32 | 0xFFFFFFFF (-1) | Marker. If this is -1, continue reading the stream
0x0C | 16 | ClsId | {C20CF9D1-85AE-11D1-AAB4-006097DA561A} | StarOffice's Class ID?
0x1C | 4 | uint32 | 5 | Length of the "Username"
0x20 | | char[] | "Text" | A string of characters, known as "Username"
0x20 + length of username | 4 | uint32 | 15 | Length of the file format string
0x24 + length of username | | char[] | "StarWriter 5.0" | File format string. Basically, this describes the version of the file
0x24 + length1 + length2 | 4 | uint32 | 0x0000 | Terminator, always zero
sot/source/base/exchange.cxx and tools/inc/solar.h lines 471ff). OpenOffice uses RegisterFormatName (line 253ff from the first file) to get the version number from the string. (XXX There was another file, but I can't find it right now)
StarWriter 3.0 | 3450
StarWriter 4.0 | 3580
StarWriter 5.0 | 5050
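An importer that does not link against OpenOffice could map the file format string to its version number with a small lookup table. This is only a sketch built from the three rows above; sdw_format_version is a made-up helper and not how RegisterFormatName works internally.

```c
#include <string.h>

/* Returns the file format version for a CompObj format string, or 0 if unknown. */
unsigned sdw_format_version(const char *format_name)
{
    static const struct {
        const char *name;
        unsigned version;
    } table[] = {
        { "StarWriter 3.0", 3450 },
        { "StarWriter 4.0", 3580 },
        { "StarWriter 5.0", 5050 },
    };
    size_t i;

    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(format_name, table[i].name) == 0)
            return table[i].version;
    return 0;
}
```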