On 08/10/2011 1:45 PM, Steve Slevinski wrote:
> Hi Jonathan,
>
> Thanks for the comments.  FYI, I will not be changing the .spml 
> files.  These files will be available as a custom export.  It will 
> take several seconds to create.

OK that's good to know.
> On 10/7/11 12:03 PM, Jonathan wrote:
>> Hi Steve,
>>     I don't remember why you want to use a string in the XML file for 
>> the signs. 
> Speed, portability, and simplicity.  I just completed my proof of 
> concept for fuzzy searching.  From a 3 MB file with over 10 thousand 
> signs, I can get accurate search results in less than 1 second.  I am 
> using Regular Expressions to process ASCII characters.  Next week, 
> I'll write about fuzzy searching, with the appropriate links for the 
> proof of concept.
>
>> Wouldn't building everything out of XML be easier to work with? 
> Yes and no.  Yes, because XML offers organization and portability.  
> No, because XML has a lot of overhead and gotchas.  The libraries take 
> time to process text.  Not all libraries work the same or support the 
> same feature set.  I think XML is too robust for simple text processing.
>
>> Many libraries can parse XML back to objects or save to a database to 
>> do calculations and searches on.  My feeling is that XML and what's 
>> in it should be primarily for transporting data. 
> Can you show me an example of the type of XML you'd want to use for an 
> individual sign?

I was thinking of something a little like

<entry id="3" cdt="1172438877" mdt="1218173289" usr="admin">
<sign align="left" maxx="23" maxy="37">
<symbol x="1" y="7">???</symbol>
<symbol x="-22" y="7">???</symbol>
<symbol x="-2" y="-38">???</symbol>
<sequence>
<seqsymbol pos="1">???</seqsymbol>
<seqsymbol pos="2">???</seqsymbol>
<seqsymbol pos="3">???</seqsymbol>
</sequence>
</sign>
<term>DELAY</term>
<text>Delay, postpone, move forward in time</text>
<src>So and So</src>
</entry>

This way the only thing that needs special parsing code is the 
3-character Unicode string.  I would have to look into it a little deeper 
before agreeing on a final XML.  I think it would be easier for programmers 
to use, since there is less parsing to do and they can use regular XML 
parsing tools to get at the information.
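
A minimal sketch of what "regular XML parsing tools" could look like for 
the entry above, assuming Python and its standard xml.etree.ElementTree 
module.  The function name parse_entry and the shape of the returned 
dictionary are hypothetical, chosen only for illustration, and the "???" 
values stand in for the symbol characters exactly as they appear above.

```python
import xml.etree.ElementTree as ET

# The sample entry from this message; "???" stands in for each symbol string.
ENTRY_XML = """
<entry id="3" cdt="1172438877" mdt="1218173289" usr="admin">
<sign align="left" maxx="23" maxy="37">
<symbol x="1" y="7">???</symbol>
<symbol x="-22" y="7">???</symbol>
<symbol x="-2" y="-38">???</symbol>
<sequence>
<seqsymbol pos="1">???</seqsymbol>
<seqsymbol pos="2">???</seqsymbol>
<seqsymbol pos="3">???</seqsymbol>
</sequence>
</sign>
<term>DELAY</term>
<text>Delay, postpone, move forward in time</text>
<src>So and So</src>
</entry>
"""


def parse_entry(xml_text):
    """Turn one <entry> element into a plain dictionary (hypothetical layout)."""
    entry = ET.fromstring(xml_text.strip())
    sign = entry.find("sign")

    # Each symbol becomes (symbol string, x, y) with integer coordinates.
    symbols = [
        (sym.text, int(sym.get("x")), int(sym.get("y")))
        for sym in sign.findall("symbol")
    ]

    # Sequence symbols are ordered by their "pos" attribute.
    seq_nodes = sign.find("sequence").findall("seqsymbol")
    sequence = [q.text for q in sorted(seq_nodes, key=lambda q: int(q.get("pos")))]

    return {
        "id": int(entry.get("id")),
        "term": entry.findtext("term"),
        "text": entry.findtext("text"),
        "src": entry.findtext("src"),
        "symbols": symbols,
        "sequence": sequence,
    }


entry = parse_entry(ENTRY_XML)
print(entry["term"])        # DELAY
print(entry["symbols"][1])  # ('???', -22, 7)
```

With this layout only the symbol strings themselves would need custom 
decoding; the coordinates, term, and source come straight out of the parser.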
>
>> In my personal opinion, information that is one piece in itself 
>> shouldn't be concatenated with other data and then have to do special 
>> parsing to get a specific part of it.
> I can understand the logic and agree in part.  For me, sign text 
> should be like regular text.  This means spaces separate words.  For 
> me, each word is a piece unto itself and should be concatenated 
> without spaces or punctuation because it is a unit.
>
>> So I don't really like the 6 digits you are proposing below. 
> You can continue to use the preliminary Unicode strings if you 
> prefer.  I've found that the ASCII version can be processed 4 times 
> faster or more.  The ASCII regular expressions are always consistent, 
> but the Unicode uses 3 different strings based on the encoding form of 
> UTF-8, UTF-16, and UTF-32.
>
>> But if we are going to have to parse it then at least make it easy to 
>> distinguish the parts.  I think that if you are going to keep the 
>> string notation then maybe the information should be enclosed within 
>> identifying symbols. Something like
>>
>> for the coordinates (41,60), (-18,-18) and  (11,-23)
> Commas and parentheses add punctuation to the string, causing many 
> unusual side effects and increasing the possibility of a broken string.
>
> I do agree with your point.  The current coordinate notation is 
> sloppy.  I've employed a simple fix.  I add 500 to each value.  This 
> means coordinates will always be 7 characters long: 3 for the X value, 
> 1 for the separating value, and 3 for the Y value.
>
> The coordinate (41,60) becomes 541x560.  The coordinate (-18,-18) 
> becomes 482x482. I was not planning to update the preliminary Unicode 
> version with the new coordinate strings unless someone requested it.  
> So for the .spml files, I'm not planning any changes.
Yes I like it much better with the x in the middle.
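
The add-500 scheme quoted above can be sketched as follows, assuming 
Python.  The names encode_coord, decode_coord, OFFSET, and COORD_RE are 
hypothetical and are not taken from SignPuddle; the range check reflects 
the -500 to +499 limit mentioned later in this thread.

```python
import re

OFFSET = 500  # added to each value so every coordinate is three digits

# Fixed-width form: three digits, a literal "x", three digits.
COORD_RE = re.compile(r"(\d{3})x(\d{3})")


def encode_coord(x, y):
    """Return the 7-character string for (x, y), e.g. (41, 60) -> '541x560'."""
    for value in (x, y):
        if not -500 <= value <= 499:
            raise ValueError(f"coordinate {value} is outside -500..499")
    return f"{x + OFFSET:03d}x{y + OFFSET:03d}"


def decode_coord(text):
    """Return (x, y) for a 7-character coordinate string."""
    match = COORD_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a coordinate string: {text!r}")
    return int(match.group(1)) - OFFSET, int(match.group(2)) - OFFSET


print(encode_coord(41, 60))     # 541x560
print(encode_coord(-18, -18))   # 482x482
print(decode_coord("511x477"))  # (11, -23)
```

Because every coordinate has the same width and contains no punctuation 
other than the "x", a single fixed pattern is enough to find or validate 
one.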
>
>> What about C for coordinate, then the X or Y value + 500 to get 
>> the Unicode point value.  One Unicode character for X and one for Y?
> Additional Unicode characters are not being considered right now 
> because there is no consensus on the higher level protocols of 
> SignWriting for Unicode.  Instead of the coordinate style of 
> SignPuddle, they may choose a conceptual design based on deeper 
> structure.
>
> But if the 2nd Unicode proposal did choose to go with coordinates, 1 
> or 2 rows of negative values and 1 or 2 rows of positive values would 
> be best.
>
> As per your above preference, there is no reason to concatenate the X 
> and Y values into a single character, although a single character for 
> each point on a 2-dimensional grid of 256 by 256 does have a certain 
> novelty.
I didn't mean both the X and the Y saved within one character, rather, 
one character each.
>
>> If you do go with what is below, I can make it work for my program.  
>> I don't have any issues with the new limited size of the axis to -500 
>> to +499
> I'm glad you don't mind the size limitation.  This is the biggest 
> change and it is mainly a validation issue.
>
>> I am interested in your thoughts or comments on the above
> Thanks for the comments.
Thanks for yours too!!
> -Steve

-- 

email: [log in to unmask] <mailto:[log in to unmask]>
Cel: 9983-1204
Tel: 2213-5285
Skype: yojoduncan

SignWriter Studio <http://www.signwriterstudio.com/>