Sack Library Documentation
__cdecl TYPELIB_PROC PTEXT burstEx(PTEXT input DBG_PASS);
#define burst( input ) burstEx( (input) DBG_SRC )
Burst is a simple method of breaking a sentence into its word and phrase parts. It collapses the spaces and tabs before a word into that word's segment: any whitespace is recorded as space preceding the word. Sentences are also broken on any punctuation, for instance "({[<>]})'";;.,/?\!@#$%^&*=". + and - are treated specially if they prefix numbers; otherwise they are also punctuation. Groups of '.' such as '...' are kept together. If a '.' is in a number, it is stored as part of the number. Otherwise, a '.' used in an abbreviation like P.S. becomes a '.' segment with 0 spaces, followed by a segment that also has 0 spaces (unless it is the last one), so initials are encoded badly.
There is an exploitable quirk in the parser: a '.' followed by a number fails to break into separate words. Configuration scripts use this to write binary blocks and read them back in, with the block parsed into a segment correctly.
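The splitting rules described above can be illustrated with a small stand-alone sketch. This is not the library's implementation and does not use PTEXT: the names toy_segment, toy_punct and toy_burst are hypothetical, the punctuation set is an assumption based on the examples listed above (plus + and -), and the '.'-followed-by-a-number quirk is not modelled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical segment type for this sketch; the library's PTEXT differs. */
typedef struct {
    int  spaces;    /* spaces and tabs that preceded this segment */
    char text[64];  /* segment text, truncated if longer than 63 chars */
} toy_segment;

/* Assumed punctuation set, taken from the examples in the description. */
static const char toy_punct[] = "({[<>]})'\";.,/?\\!@#$%^&*=+-";

/* Split text into at most max_out segments; returns the segment count. */
size_t toy_burst(const char *text, toy_segment *out, size_t max_out)
{
    size_t count = 0;
    const char *p = text;

    assert(text != NULL && out != NULL);
    while (*p && count < max_out) {
        int spaces = 0;
        size_t len;
        const char *start;

        /* Leading whitespace is folded into the segment that follows. */
        while (*p == ' ' || *p == '\t') { spaces++; p++; }
        if (!*p)
            break; /* trailing whitespace has no word to attach to */
        start = p;

        if (isdigit((unsigned char)*p) ||
            ((*p == '+' || *p == '-') && isdigit((unsigned char)p[1]))) {
            /* Number: optional sign, digits, and any '.' followed by a digit. */
            p++;
            while (isdigit((unsigned char)*p) ||
                   (*p == '.' && isdigit((unsigned char)p[1])))
                p++;
        } else if (*p == '.') {
            /* Runs of '.' such as "..." stay together. */
            while (*p == '.') p++;
        } else if (strchr(toy_punct, *p)) {
            /* Any other punctuation character is its own segment. */
            p++;
        } else {
            /* Word: runs until whitespace or punctuation. */
            while (*p && *p != ' ' && *p != '\t' && !strchr(toy_punct, *p))
                p++;
        }

        len = (size_t)(p - start);
        if (len >= sizeof out[count].text)
            len = sizeof out[count].text - 1;
        memcpy(out[count].text, start, len);
        out[count].text[len] = '\0';
        out[count].spaces = spaces;
        count++;
    }
    return count;
}
```

Under these assumptions, the input "P.S. hello" yields the segments "P", ".", "S", "." (each with 0 spaces) and "hello" (with 1 space), which shows why initials come out as alternating letter and '.' segments.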
Copyright (c) 2000+. All rights reserved.