Discussion:
Casting highDatum/lowDatum to type text is incorrect
anicoara
2014-01-20 20:06:15 UTC
Hello,

During my debugging sessions, I realized that highDatum/lowDatum do not
point to an ordinary text struct with a 4-byte length header.

The data that is exposed is truncated:

(gdb)
1005
4: lowp = 0x7f9a6458b74c "rtion"
3: highp = 0x22177fc "rtion's"
2: toReturn = 10
1: cmpLen = 0

Note that there is no word "rtion" in words.txt; the words actually stored
in the index are "abortion" and "abortion's", respectively.
The issue is that the length header packed at the address of
highDatum/lowDatum is only 1 byte long, which is confirmed by:

(gdb) p VARSIZE_ANY_EXHDR(lowText)
$1 = 8
(gdb) p VARSIZE_ANY(lowText)
$2 = 9
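To make the 9 vs. 8 result concrete, here is a small standalone C sketch of how such a header byte can be decoded. It assumes a little-endian build and mirrors, in simplified form, the layout described in postgres.h; the helper names (hdr_is_short, size_any, size_any_exhdr) are hypothetical and are not the real macros. For "abortion", a 1-byte header stores the total size 9 as (9 << 1) | 1 = 0x13.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirrors of the little-endian varlena header checks in
 * postgres.h (not the real macros). Only plain 1-byte and 4-byte headers
 * are handled; compressed and external TOAST values are out of scope. */

/* A set low bit marks a 1-byte ("short") header. */
static int hdr_is_short(const unsigned char *p)
{
    return (p[0] & 0x01) == 0x01;
}

/* Total size including the header, in the spirit of VARSIZE_ANY. */
static size_t size_any(const unsigned char *p)
{
    assert(p[0] != 0x01); /* 0x01 alone would be an external TOAST pointer */
    if (hdr_is_short(p))
        return (size_t) ((p[0] >> 1) & 0x7F); /* upper 7 bits of the byte */
    uint32_t h = (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
                 ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
    return (size_t) ((h >> 2) & 0x3FFFFFFF); /* upper 30 bits of the word */
}

/* Payload size without the header, in the spirit of VARSIZE_ANY_EXHDR. */
static size_t size_any_exhdr(const unsigned char *p)
{
    return size_any(p) - (hdr_is_short(p) ? 1 : 4);
}
```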

Thus, the length header is only 1 byte long, but highp and lowp are
being set assuming a 4-byte length field, so the first three characters
of each word are skipped ("abortion" becomes "rtion").
This corrupts the index as well as the output that you use for marking.
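A minimal sketch of this off-by-three effect, using a hand-built buffer rather than a real index tuple (an assumption for illustration): skipping a fixed 4 bytes, as ->vl_dat does, lands on "rtion", while skipping only the single header byte yields "abortion". The function names are made up for this example.

```c
#include <assert.h>
#include <string.h>

/* A short-header value for "abortion" as it might sit in an index tuple:
 * header byte 0x13 = (total size 9 << 1) | 1, then the 8 payload bytes.
 * The trailing NUL is added here only so the results can be compared as
 * C strings; real varlena payloads are not NUL-terminated. */
static const unsigned char short_val[] = {
    0x13, 'a', 'b', 'o', 'r', 't', 'i', 'o', 'n', '\0'
};

/* Roughly what ->vl_dat does: skip a fixed 4-byte header. */
static const char *data_after_4b_header(const unsigned char *p)
{
    return (const char *) (p + 4); /* skips 0x13 plus "abo" */
}

/* Skip only the 1-byte header that is actually present. */
static const char *data_after_1b_header(const unsigned char *p)
{
    return (const char *) (p + 1);
}
```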
A simple way to get to the data is to use the VARDATA_ANY macro,
but the lines of code that are used for marking would then need to be changed.
Ideally, lowp and highp would be provided correctly to begin with.
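Roughly speaking, VARDATA_ANY picks the payload offset by inspecting the header. The sketch below is a simplified, hypothetical stand-in (little-endian only, no TOAST handling); in real backend code you would use VARDATA_ANY itself, paired with VARSIZE_ANY_EXHDR for the length.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for VARDATA_ANY on a little-endian build (not the
 * real macro): choose the payload start from the header format. */
static const char *vardata_any(const unsigned char *p)
{
    assert(p[0] != 0x01);              /* external TOAST pointer: not handled */
    if ((p[0] & 0x01) == 0x01)
        return (const char *) (p + 1); /* 1-byte header */
    return (const char *) (p + 4);     /* 4-byte header */
}
```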

Adrian
Joseph Mate
2014-01-20 20:26:20 UTC
I have also encountered this issue. We cannot assume the length field is
4 bytes, because PostgreSQL uses a variable-length encoding for the
datum's size header (see postgres.h).
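As far as I can tell from postgres.h and heap_fill_tuple() (which index_form_tuple() also uses), a value with a plain 4-byte header is repacked with a 1-byte header when it is stored, provided its total size still fits in 127 bytes (VARATT_SHORT_MAX); that would explain why short words in the index arrive with 1-byte headers. The sketch below imitates that size check and conversion on a little-endian layout. pack_short is a hypothetical name, and the real code also depends on the column's storage setting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHORT_MAX 0x7F /* largest total size a 1-byte header can describe */

/* Hypothetical sketch (not PostgreSQL code): repack a plain 4-byte-header
 * varlena at src into a 1-byte-header varlena at dst, little-endian layout.
 * Returns the new total size, or 0 if the value is too long to pack. */
static size_t pack_short(const unsigned char *src, unsigned char *dst)
{
    uint32_t h = (uint32_t) src[0] | ((uint32_t) src[1] << 8) |
                 ((uint32_t) src[2] << 16) | ((uint32_t) src[3] << 24);
    assert((h & 0x03) == 0);                   /* uncompressed 4-byte header */

    size_t payload = (size_t) ((h >> 2) & 0x3FFFFFFF) - 4;
    if (payload + 1 > SHORT_MAX)               /* 1 header byte + payload */
        return 0;

    dst[0] = (unsigned char) (((payload + 1) << 1) | 0x01);
    memcpy(dst + 1, src + 4, payload);
    return payload + 1;
}
```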

To get around this, I modified the starter code given to us:
/* and get the C string (i.e. (char *)) out of the text structs */
highp = highText->vl_dat;
lowp = lowText->vl_dat;
to:
/* and get the C string (i.e. (char *)) out of the text structs */
highp = VARDATA_ANY(highText);
lowp = VARDATA_ANY(lowText);
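One caveat with this change: the bytes at VARDATA_ANY are not guaranteed to be NUL-terminated, so treating lowp/highp as C strings may only appear to work. A hedged sketch of copying the payload into a terminated buffer, using the length from the header, is below. payload_to_cstring is a hypothetical helper built on malloc; backend code would more likely call the existing text_to_cstring() (utils/builtins.h), which pallocs its result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a PostgreSQL API): copy the payload of a plain,
 * non-TOAST varlena into a malloc'd, NUL-terminated string.
 * Little-endian header layout assumed. Returns NULL if malloc fails. */
static char *payload_to_cstring(const unsigned char *p)
{
    size_t total, hdr;

    assert(p[0] != 0x01);              /* external TOAST pointer: not handled */
    if ((p[0] & 0x01) == 0x01) {       /* 1-byte header */
        total = (size_t) ((p[0] >> 1) & 0x7F);
        hdr = 1;
    } else {                           /* 4-byte header */
        uint32_t h = (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
                     ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
        total = (size_t) ((h >> 2) & 0x3FFFFFFF);
        hdr = 4;
    }

    size_t len = total - hdr;          /* payload length, like VARSIZE_ANY_EXHDR */
    char *s = malloc(len + 1);
    if (s == NULL)
        return NULL;
    memcpy(s, p + hdr, len);
    s[len] = '\0';                     /* the stored bytes carry no terminator */
    return s;
}
```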

Now I am stuck on how to call SET_VARSIZE(highText, blah + 4). Before, the
structure used fewer than 4 bytes for the length, but SET_VARSIZE is
defined to write 4 bytes (see postgres.h). As a result, it overwrites the
existing header byte plus the first 3 characters of the string
(e.g., abortion's -> rtion's).
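One possible way around this, assuming the shortened value still fits in a 1-byte header, is to write the new length in whatever header format the datum already has. postgres.h provides SET_VARSIZE_SHORT for the 1-byte case (its length argument includes the 1-byte header). The standalone sketch below imitates both cases on a little-endian layout; set_payload_len is a hypothetical name, not a PostgreSQL function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch (not a PostgreSQL function): record a new payload
 * length while keeping the header format the datum already has.
 * Little-endian layout assumed; the payload bytes are left untouched. */
static void set_payload_len(unsigned char *p, size_t new_payload)
{
    if ((p[0] & 0x01) == 0x01) {
        /* 1-byte header: the analogue of SET_VARSIZE_SHORT(p, new_payload + 1) */
        assert(p[0] != 0x01);                  /* not an external TOAST pointer */
        assert(new_payload + 1 <= 0x7F);       /* must still fit in 7 bits */
        p[0] = (unsigned char) (((new_payload + 1) << 1) | 0x01);
    } else {
        /* 4-byte header: the analogue of SET_VARSIZE(p, new_payload + 4) */
        uint32_t h = (uint32_t) (new_payload + 4) << 2;
        p[0] = (unsigned char) (h & 0xFF);
        p[1] = (unsigned char) ((h >> 8) & 0xFF);
        p[2] = (unsigned char) ((h >> 16) & 0xFF);
        p[3] = (unsigned char) ((h >> 24) & 0xFF);
    }
}
```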

I have pasted the output at the end of this post.

Best Regards,
Joseph

Example output:
DEBUG: CS448 Saved 1 chars, compressed to rtion'
DEBUG: CS448 Saved 1 chars, compressed to rrantl
DEBUG: CS448 Saved 0 chars, compressed to rrant
DEBUG: CS448 Saved 7 chars, compressed to c
DEBUG: CS448 Saved 0 chars, compressed to ucts
DEBUG: CS448 Saved 2 chars, compressed to uci
DEBUG: CS448 Saved 0 chars, compressed to ucent
DEBUG: CS448 Saved 1 chars, compressed to reviato
DEBUG: CS448 Saved 1 chars, compressed to ot'
DEBUG: CS448 Saved 1 chars, compressed to to
DEBUG: CS448 Saved 0 chars, compressed to shes

What lowp and highp look like when using ->vl_dat:
DEBUG: AAAA DATA rtion
DEBUG: AAAA DATA 0123456
DEBUG: AAAA DATA rtion's
DEBUG: AAAA DATA rrant
DEBUG: AAAA DATA 0123456
DEBUG: AAAA DATA rrantly
DEBUG: AAAA DATA rrancy
DEBUG: AAAA DATA 012345
DEBUG: AAAA DATA rrant
DEBUG: AAAA DATA am
DEBUG: AAAA DATA 01234567
DEBUG: AAAA DATA cedarian
DEBUG: AAAA DATA uctors
DEBUG: AAAA DATA 012345
DEBUG: AAAA DATA ucts
DEBUG: AAAA DATA ucentes
DEBUG: AAAA DATA 0123456
DEBUG: AAAA DATA ucing
DEBUG: AAAA DATA ucens
DEBUG: AAAA DATA 01234
DEBUG: AAAA DATA ucent
DEBUG: AAAA DATA reviations
DEBUG: AAAA DATA 0123456789
DEBUG: AAAA DATA reviator
DEBUG: AAAA DATA ot
DEBUG: AAAA DATA 0123
DEBUG: AAAA DATA ot's
DEBUG: AAAA DATA tises
DEBUG: AAAA DATA 01234
DEBUG: AAAA DATA tor
DEBUG: AAAA DATA shed
DEBUG: AAAA DATA 0123
DEBUG: AAAA DATA shes

What lowp and highp look like when using
VARDATA_ANY(lowText),VARDATA_ANY(highText):
DEBUG: AAAA DATA abortion
DEBUG: AAAA DATA 0123456789
DEBUG: AAAA DATA abortion's
DEBUG: AAAA DATA aberrant
DEBUG: AAAA DATA 0123456789
DEBUG: AAAA DATA aberrantly
DEBUG: AAAA DATA aberrancy
DEBUG: AAAA DATA 012345678
DEBUG: AAAA DATA aberrant
DEBUG: AAAA DATA abeam
DEBUG: AAAA DATA 01234567890
DEBUG: AAAA DATA abecedarian
DEBUG: AAAA DATA abducts
DEBUG: AAAA DATA 0123456
DEBUG: AAAA DATA abeam
DEBUG: AAAA DATA accuracies
DEBUG: AAAA DATA 0123456789
DEBUG: AAAA DATA accuracy
DEBUG: AAAA DATA accumulativeness
DEBUG: AAAA DATA 0123456789012345
DEBUG: AAAA DATA accumulator
DEBUG: AAAA DATA acculturize
DEBUG: AAAA DATA 01234567890
DEBUG: AAAA DATA accumbency
DEBUG: AAAA DATA accruals
DEBUG: AAAA DATA 01234567
DEBUG: AAAA DATA accrue
