Sack Library Documentation
sack::containers::text::burstEx Function
C++
__cdecl TYPELIB_PROC PTEXT burstEx(PTEXT input DBG_PASS);
#define burst( input ) burstEx( (input) DBG_SRC )
Parameters    Description
input         Pointer to a list of PTEXT segments to parse.

normal_punctuation=WIDE("'"\({[<>]}):@%/,;!?=*&$^~#`"); 

Process a line of PTEXT into another line of PTEXT, but with words parsed as appropriate for common language.

Burst is a simple method of breaking a sentence into its word and phrase parts. It collapses spaces and tabs before a word into that word's segment, so whitespace is always represented as the space preceding a word. Sentences are also broken on any punctuation, for instance "({[<>]})'";;.,/?\!@#$%^&*=". + and - are treated specially if they prefix numbers; otherwise they are also punctuation. Groups of '.' such as '...' are kept together. If a '.' is in a number, it is stored as part of the number. Otherwise, a '.' used in an abbreviation like P.S. becomes a '.' segment with 0 spaces followed by a segment also with 0 spaces (unless it is the last one), so initials are encoded badly.
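
The splitting rules described above can be modeled with a small standalone sketch. This is not the library's implementation and does not use PTEXT: the names Segment, isPunct, isDigit and toyBurst are hypothetical, tabs and spaces are assumed to count as one unit each, trailing whitespace is dropped, and the '.'-followed-by-a-number quirk is not modeled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a PTEXT segment: text plus the count of
// whitespace characters that preceded it.
struct Segment {
    int spaces;
    std::string text;
};

// Punctuation set taken from the normal_punctuation string in this topic.
static bool isPunct(char c) {
    static const std::string punct = "'\"\\({[<>]}):@%/,;!?=*&$^~#`";
    return punct.find(c) != std::string::npos;
}

static bool isDigit(char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// Toy model of the burst rules; an assumption-laden sketch only.
std::vector<Segment> toyBurst(const std::string& in) {
    std::vector<Segment> out;
    std::size_t i = 0;
    const std::size_t n = in.size();
    while (i < n) {
        // Collapse leading spaces and tabs into the next segment's count.
        int spaces = 0;
        while (i < n && (in[i] == ' ' || in[i] == '\t')) { ++spaces; ++i; }
        if (i >= n) break;

        const std::size_t start = i;
        const char c = in[i];
        const bool nextIsDigit = (i + 1 < n) && isDigit(in[i + 1]);

        if (c == '.') {
            // Runs of '.' such as "..." stay together.
            while (i < n && in[i] == '.') ++i;
        } else if (isDigit(c) || ((c == '+' || c == '-') && nextIsDigit)) {
            // A number, optionally signed; a '.' between digits stays inside it.
            ++i;
            while (i < n && (isDigit(in[i]) ||
                   (in[i] == '.' && i + 1 < n && isDigit(in[i + 1])))) ++i;
        } else if (isPunct(c) || c == '+' || c == '-') {
            // Every other punctuation character is its own segment.
            ++i;
        } else {
            // A word runs until whitespace, '.', '+', '-' or punctuation.
            while (i < n && in[i] != ' ' && in[i] != '\t' && in[i] != '.' &&
                   in[i] != '+' && in[i] != '-' && !isPunct(in[i])) ++i;
        }
        out.push_back({spaces, in.substr(start, i - start)});
    }
    return out;
}
```

Under these assumptions, toyBurst("P.S.") yields four segments, "P", ".", "S" and ".", each with 0 spaces, which illustrates why initials are awkward to reassemble.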

There is an exploit in the parser such that a '.' followed by a number will fail to break into separate words. Configuration scripts use this to write binary blocks and read them back in, with the block parsed into a segment correctly.

Copyright (c) 2000+. All rights reserved.