Hi Jonathan,

Thanks for the comments.  FYI, I will not be changing the .spml files.  These files will be available as a custom export.  It will take several seconds to create.


On 10/7/11 12:03 PM, Jonathan wrote:
> Hi Steve,
>     I don't remember why you want to use a string in the XML file for the signs.
Speed, portability, and simplicity.  I just completed my proof of concept for fuzzy searching.  From a 3 MB file with over 10 thousand signs, I can get accurate search results in less than 1 second.  I am using Regular Expressions to process ASCII characters.  Next week, I'll write about fuzzy searching, with the appropriate links for the proof of concept.

> Wouldn't building everything out of XML be easier to work with?
Yes and no.  Yes, because XML offers organization and portability.  No, because XML has a lot of overhead and gotchas.  The libraries take time to process text.  Not all libraries work the same or support the same feature set.  I think XML is too robust for simple text processing.

> Many libraries can parse XML back to objects or save to a database to do calculations and searches on.  My feeling is that XML and what's in it should be primarily for transporting data.
Can you show me an example of the type of XML you'd want to use for an individual sign?

> In my personal opinion, information that is one piece in itself shouldn't be concatenated with other data, so that special parsing is then needed to get a specific part of it.
I can understand the logic and agree in part.  For me, sign text should be like regular text.  This means spaces separate words.  For me, each word is a piece unto itself and should be concatenated without spaces or punctuation because it is a unit.
 
> So I don't really like the 6 digits you are proposing below.
You can continue to use the preliminary Unicode strings if you prefer.  I've found that the ASCII version can be processed 4 times faster or more.  The ASCII regular expressions are always consistent, but the Unicode version needs 3 different strings, one for each encoding form: UTF-8, UTF-16, and UTF-32.
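
To illustrate the point about encoding forms, here is a minimal Python sketch. The sample code point U+1D800 is only an assumption chosen because it lies outside the Basic Multilingual Plane; it is not taken from the preliminary Unicode strings discussed here, and `encoded_forms` is a hypothetical helper name.

```python
# Hypothetical sample: one code point outside the Basic Multilingual Plane.
SAMPLE = "\U0001D800"

def encoded_forms(text):
    """Return the hex bytes of text in each of the three encoding forms."""
    forms = ("utf-8", "utf-16-be", "utf-32-be")
    return {form: text.encode(form).hex() for form in forms}

for form, hex_bytes in encoded_forms(SAMPLE).items():
    print(f"{form:10} {hex_bytes}")
```

The same character comes out as a different byte sequence in each form, so a byte-level pattern written for one form will not match the others.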

> But if we are going to have to parse it then at least make it easy to distinguish the parts.  I think that if you are going to keep the string notation then maybe the information should be enclosed within identifying symbols. Something like
>
> for the coordinates (41,60), (-18,-18) and (11,-23)
Commas and parentheses add punctuation to the string, which causes many unusual side effects and increases the possibility of a broken string.

I do agree with your point.  The current coordinate notation is sloppy.  I've employed a simple fix.  I add 500 to each value.  This means coordinates will always be 7 characters long: 3 for the X value, 1 for the separator (the letter x), and 3 for the Y value.

The coordinate (41,60) becomes 541x560.  The coordinate (-18,-18) becomes 482x482. I was not planning to update the preliminary Unicode version with the new coordinate strings unless someone requested it.  So for the .spml files, I'm not planning any changes.
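
As a rough sketch of the add-500 scheme described above, the Python below encodes and decodes the 7-character form. The function names and the regular expression are my own assumptions for illustration, not the code used in SignPuddle.

```python
import re

OFFSET = 500  # stored value = coordinate + 500, as described above
COORD_RE = re.compile(r"(\d{3})x(\d{3})")  # assumed pattern: 3 digits, x, 3 digits

def encode_coord(x, y):
    """Return the 7-character form, e.g. (41, 60) -> '541x560'."""
    for value in (x, y):
        # Only -500..499 fits in three digits after the shift.
        if not -OFFSET <= value < OFFSET:
            raise ValueError(f"coordinate {value} is outside -500..499")
    return f"{x + OFFSET:03d}x{y + OFFSET:03d}"

def decode_coord(text):
    """Return (x, y) from a 7-character coordinate string."""
    match = COORD_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a coordinate string: {text!r}")
    return int(match.group(1)) - OFFSET, int(match.group(2)) - OFFSET

print(encode_coord(41, 60))     # 541x560
print(encode_coord(-18, -18))   # 482x482
print(decode_coord("511x477"))  # (11, -23)
```

Because every coordinate has the same width and contains no punctuation, one fixed pattern can locate it anywhere in a longer string.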

> What about C for coordinate, then the X or Y value + 500 to get the Unicode point value.  One Unicode character for X and one for Y?
Additional Unicode characters are not being considered right now because there is no consensus on the higher-level protocols of SignWriting for Unicode.  Instead of the coordinate style of SignPuddle, they may choose a conceptual design based on deeper structure.

But if the 2nd Unicode proposal did choose to go with coordinates, 1 or 2 rows of negative values and 1 or 2 rows of positive values would be best. 

As per your above preference, there is no reason to concatenate the X and Y values into a single character, although a single character for each point on a 2-dimensional grid of 256 by 256 does have a certain novelty.

> If you do go with what is below, I can make it work for my program.  I don't have any issues with the new limited size of the axes, -500 to +499.
I'm glad you don't mind the size limitation.  This is the biggest change and it is mainly a validation issue.

> I am interested in your thoughts or comments on the above.
Thanks for the comments.
-Steve