Hi Jonathan,

Thanks for the comments.  FYI, I will not be changing the .spml files.  
These files will be available as a custom export.  It will take several 
seconds to create.


On 10/7/11 12:03 PM, Jonathan wrote:
> Hi Steve,
>     I don't remember why you want to use a string in the XML file for 
> the signs. 
Speed, portability, and simplicity.  I just completed my proof of 
concept for fuzzy searching.  From a 3 MB file with over 10 thousand 
signs, I can get accurate search results in less than 1 second.  I am 
using Regular Expressions to process ASCII characters.  Next week, I'll 
write about fuzzy searching, with the appropriate links for the proof of 
concept.

> Wouldn't building everything out of XML be easier to work with? 
Yes and no.  Yes, because XML offers organization and portability.  No, 
because XML has a lot of overhead and gotchas.  The libraries take time 
to process text.  Not all libraries work the same or support the same 
feature set.  I think XML is too robust for simple text processing.

> Many libraries can parse XML back to objects or save to a database to 
> do calculations and searches on.  My feeling is that XML and what's in 
> it should be primarily for transporting data. 
Can you show me an example of the type of XML you'd want to use for an 
individual sign?

> In my personal opinion, information that is one piece in itself 
> shouldn't be concatenated with other data and then have to do special 
> parsing to get a specific part of it.
I can understand the logic and agree in part.  For me, sign text should 
be like regular text.  This means spaces separate words.  For me, each 
word is a piece unto itself and should be concatenated without spaces or 
punctuation because it is a unit.

> So I don't really like the 6 digits you are proposing below. 
You can continue to use the preliminary Unicode strings if you prefer.  
I've found that the ASCII version can be processed 4 times faster or 
more.  The ASCII regular expressions are always consistent, but the 
Unicode version uses 3 different strings depending on the encoding 
form: UTF-8, UTF-16, or UTF-32.
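To illustrate the point about encoding forms, here is a minimal Python 
sketch showing that one character produces a different byte string in 
each of the standard forms UTF-8, UTF-16, and UTF-32, so a byte-level 
pattern written for one form does not match another.  The code point 
U+1D800 and the function name encoded_forms are assumptions chosen for 
illustration; they are not taken from the preliminary Unicode strings.

```python
def encoded_forms(character):
    """Return the hex bytes of one character in three encoding forms.

    Big-endian variants are used so that no byte order mark is added.
    """
    return {
        "utf-8": character.encode("utf-8").hex(),
        "utf-16": character.encode("utf-16-be").hex(),
        "utf-32": character.encode("utf-32-be").hex(),
    }


if __name__ == "__main__":
    # Hypothetical sample: any code point above U+FFFF shows the same effect.
    sample = "\U0001D800"
    for form, hex_bytes in encoded_forms(sample).items():
        print(form, hex_bytes)
```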

> But if we are going to have to parse it then at least make it easy to 
> distinguish the parts.  I think that if you are going to keep the 
> string notation then maybe the information should be enclosed within 
> identifying symbols. Something like
>
> for the coordinates (41,60), (-18,-18) and  (11,-23)
Commas and parentheses add punctuation to the string, causing many 
unusual side effects and increasing the possibility of a broken string.

I do agree with your point.  The current coordinate notation is sloppy.  
I've employed a simple fix.  I add 500 to each value.  This means 
coordinates will always be 7 characters long: 3 digits for the X value, 
1 separator character (x), and 3 digits for the Y value.

The coordinate (41,60) becomes 541x560.  The coordinate (-18,-18) 
becomes 482x482. I was not planning to update the preliminary Unicode 
version with the new coordinate strings unless someone requested it.  So 
for the .spml files, I'm not planning any changes.
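The offset scheme described above can be sketched in a few lines of 
Python.  This is only an illustration under stated assumptions: the 
function names are hypothetical, the -500 to +499 range is taken from 
the axis limit quoted later in this message, and zero padding for small 
values (for example -450 becoming 050) is assumed from the statement 
that coordinates are always 7 characters long.

```python
OFFSET = 500       # added to every X and Y value, as described above
MIN_VALUE = -500   # encodes to 000
MAX_VALUE = 499    # encodes to 999


def encode_coordinate(x, y):
    """Encode a raw (x, y) pair as a fixed-width 7-character string."""
    for value in (x, y):
        if not MIN_VALUE <= value <= MAX_VALUE:
            raise ValueError(f"{value} is outside {MIN_VALUE}..{MAX_VALUE}")
    # Zero padding is an assumption so that the width is always 3 digits.
    return f"{x + OFFSET:03d}x{y + OFFSET:03d}"


def decode_coordinate(text):
    """Reverse encode_coordinate: '541x560' becomes (41, 60)."""
    x_part, y_part = text.split("x")
    return int(x_part) - OFFSET, int(y_part) - OFFSET


if __name__ == "__main__":
    for pair in [(41, 60), (-18, -18), (11, -23)]:
        print(pair, "->", encode_coordinate(*pair))
```

Because every encoded coordinate has the same width and contains only 
digits and the letter x, no commas, signs, or parentheses ever appear 
in the string.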

> What about C for coordinate, then the X or Y value + 500 to get the 
> Unicode point value.  One Unicode character for X and one for Y?
Additional Unicode characters are not being considered right now because 
there is no consensus on the higher level protocols of SignWriting for 
Unicode.  Instead of the coordinate style of SignPuddle, they may choose 
a conceptual design based on deeper structure.

But if the 2nd Unicode proposal did choose to go with coordinates, 1 or 
2 rows of negative values and 1 or 2 rows of positive values would be best.

As per your above preference, there is no reason to concatenate the X 
and Y values into a single character, although a single character for 
each point on a 2-dimensional grid of 256 by 256 does have a certain 
novelty.

> If you do go with what is below, I can make it work for my program.  I 
> don't have any issues with the new limited size of the axis to -500 to 
> +499
I'm glad you don't mind the size limitation.  This is the biggest change 
and it is mainly a validation issue.
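As a rough sketch of what that validation could look like, a single 
ASCII regular expression is enough to check the fixed-width form: 3 
digits, the letter x, then 3 digits.  The pattern and the function name 
below are assumptions for illustration, not the expressions used in the 
proof of concept.

```python
import re

# Three digits, a literal "x", three digits; nothing else is allowed.
COORDINATE_PATTERN = re.compile(r"[0-9]{3}x[0-9]{3}")


def is_valid_coordinate(text):
    """Return True if text is exactly one 7-character coordinate."""
    return COORDINATE_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["541x560", "482x482", "41x60", "541,560"]:
        print(candidate, is_valid_coordinate(candidate))
```

Any raw value outside -500 to +499 would need a 4th digit or a sign 
after the offset is applied, so it fails this check automatically.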

> I am interested in your thoughts or comments on the above
Thanks for the comments.
-Steve