SW-L Archives



SW-L@LISTSERV.VALENCIACOLLEGE.EDU


Subject:

Re: Signbox size and coordinate strings

From:

Alan Post <[log in to unmask]>

Reply-To:

SignWriting List: Read and Write Sign Languages

Date:

Thu, 6 Oct 2011 11:36:18 -0600


I've seen too much code that uses regular expressions when it should
be doing actual parsing, and the slippery slope of that boundary is
having to do backtracking/lookahead in a regular expression.

The end-game, if you're not going to work on optimizing what you've
got, is to define the full grammar for this data encoding, use a
parser, and pick the data you'd like out of the parse tree, rather
than cherry-picking it out of the incoming data by using a regular
expression.

There is a very large design space between there and the problem
you're trying to solve, though!  I'm not suggesting the above to be
the only route available.  It's just where all the roads lead.

I wonder why you have 0,0 at the origin, rather than at one of the
corners?  It seems computationally more efficient to pick 0,0 to be,
say, the bottom left and get rid of negative numbers entirely.  You
can always normalize your coordinate system by doing what you
describe below, but all the intermediate steps would just use the
always-positive x,y values.
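
A minimal sketch of that shift, in Python.  The canvas size and the
function names here are hypothetical, chosen only for illustration.
With y growing downward, as in the diagram quoted below, the shifted
origin lands at the top-left corner rather than the bottom-left; the
arithmetic is the same for any corner.

```python
# Hypothetical canvas size; nothing in this thread fixes these values.
WIDTH, HEIGHT = 1000, 1000


def to_corner(x, y):
    """Shift a center-origin point so that (0, 0) is a canvas corner.

    Every point inside the canvas then has non-negative coordinates.
    """
    return x + WIDTH // 2, y + HEIGHT // 2


def to_center(cx, cy):
    """Inverse of to_corner: recover the center-origin coordinates."""
    return cx - WIDTH // 2, cy - HEIGHT // 2
```

Intermediate steps would work only with the output of to_corner, and
to_center would be applied once, at the boundary where the centered
form is needed.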

The idea of providing a limit on the coordinate size seems very
sound to me.  TeX did this, many years ago, because paper is only so
large, and you can pick a value that is an order of magnitude larger
and find that no one ever cares.  The same principle applies here:
a sufficiently high limit is essentially infinite, because the
problem domain is defined by limitations of human perception, not
by limitations of current computer hardware.

As a final note, least relevant, there are at least two ways to
build regular expression engines, and they each have different
performance behaviors with different classes of regular expressions.
Are you using a bad RE engine?  Will other users do so if you don't?
(http://swtch.com/~rsc/regexp/regexp1.html)  Even simple things like
compiling the RE before using it can make a difference...
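
As an illustration of the compile-once point, here is a small sketch
in Python, using the coordinate pattern from the quoted message below.
The name find_coords is hypothetical.  Python's re module is a
backtracking engine and caches recently compiled patterns, so the gain
from compiling up front may be modest there; other engines may behave
differently.

```python
import re

# Compile the pattern once, at import time, and reuse the compiled
# object for every search instead of passing the pattern string each time.
COORD = re.compile(r"n?[0-9]+xn?[0-9]+")


def find_coords(text):
    """Return every coordinate-shaped substring found in text."""
    return COORD.findall(text)
```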

Nothing below scares me, from a living-with-it-forever point of
view.

-Alan

On Thu, Oct 06, 2011 at 10:58:47AM -0500, Steve Slevinski wrote:
> Hi list,
> 
> Here is my current design and a technical discussion.  Any feedback
> is appreciated.  Please ignore if you don't want to peek under the
> hood.
> 
> Background material:
> =============
> 1) Regular Expressions
> http://en.wikipedia.org/wiki/Regular_expression
> 
> 2) Cartesian Coordinates.
> http://en.wikipedia.org/wiki/Cartesian_coordinates
> 
> =============
> 
> I use Cartesian Coordinates for the SignPuddle data.  We start with
> a 2-dimensional canvas.  Both the width and the height are divided
> into specific points to create a grid.  The center of the grid is
> point (0,0).  The horizontal position is called the X value.  The
> vertical position is called the Y value.
> 
>          -y|
>            |
>            |
>            |
> -x         |          +x
> -----------+------------
>            |
>            |
>            |
>            |
>          +y|
> 
> 
> 
> In my current design, the x and y values are unlimited.  Negative to
> the top-left.  Positive to the bottom-right.
> 
> In general, the challenge I face is to create a string that
> represents a specific coordinate.  My current string has the form
> "n100x100" for the coordinate (-100,100).  Simply replace the "-"
> minus sign with an "n" and replace the "," comma with an "x".  The
> purpose of these replacements is to enable double click selection.
> The "n" and the "x" continue the string without a character that
> creates a gap.
> 
> Regular Expressions allow for efficient searching and pattern
> matching.  Regular expressions are simple and powerful when used
> correctly.  They can easily become overly complex and difficult to
> understand.
> 
> The current coordinate characters can be described with the regular
> expression pattern:
> "n?[0-9]+xn?[0-9]+"
> 
> This can be understood in parts.
> 
> n? , may or may not have an "n"
> 
> [0-9] , select one value between 0 and 9.
> 
> [0-9]+ , select one or more digits
> 
> x , match the character "x"
> 
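
The string form and pattern described above can be sketched in Python
as follows.  The function names encode and decode are hypothetical,
and the capturing groups are an addition for illustration: the pattern
in the message has none.

```python
import re

# Same pattern as in the message, with groups added around each part
# so that the sign markers and digit runs can be read back out.
COORD = re.compile(r"(n?)([0-9]+)x(n?)([0-9]+)")


def encode(x, y):
    """Build the "n100x100" style string for the coordinate (x, y)."""
    def part(value):
        # "n" stands in for the minus sign.
        return ("n" if value < 0 else "") + str(abs(value))
    return part(x) + "x" + part(y)


def decode(text):
    """Parse a coordinate string back into an (x, y) pair of ints."""
    match = COORD.fullmatch(text)
    if match is None:
        raise ValueError("not a coordinate string: %r" % text)
    neg_x, digits_x, neg_y, digits_y = match.groups()
    x = -int(digits_x) if neg_x else int(digits_x)
    y = -int(digits_y) if neg_y else int(digits_y)
    return x, y
```
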
> I've run into a problem that general searching is inefficient or
> slow.  This is due to Unicode and the current form of the coordinate
> value.  More accurate searching is forcing me to use overly complex
> regular expression features, like negative lookahead.
> 
> I think I need to change the form of my coordinates so that
> searching is efficient and accurate.  I am considering a new form of
> coordinate string that is a simple value 6 digits long.
> 
> The pattern can be described as "[0-9]{6}".   Understood in parts as:
> 
> [0-9] , select one value between 0 and 9.
> [0-9]{6} , select six values between 0 and 9.
> 
> I will limit both the X and Y axis to the values -500 to +499.  The
> center is still (0,0).
> 
> Here is the coordinate string for (0,0): "500500".  The string is
> divided in half.  The first 3 digits are for the X value and the
> last 3 digits are used for the Y value.  Simply subtract 500 from
> the value in the string.  To go in the reverse, simply add 500 to
> the value and combine the X and Y values.  For example, the
> coordinate (111,111) would have a string of "611611" and the
> coordinate (-15,-20) would have the string "485480".
> 
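
A short sketch of the proposed six-digit form, in Python.  The names
OFFSET, encode6 and decode6 are hypothetical; the offset of 500, the
range -500 to +499 and the example values come from the message above.

```python
OFFSET = 500          # added to each axis so the stored value is 000..999
LOW, HIGH = -500, 499  # allowed range on both axes


def encode6(x, y):
    """Return the six-digit string for (x, y): three digits per axis."""
    for value in (x, y):
        if not LOW <= value <= HIGH:
            raise ValueError("coordinate out of range: %d" % value)
    return "%03d%03d" % (x + OFFSET, y + OFFSET)


def decode6(text):
    """Split a six-digit string in half and subtract the offset."""
    if len(text) != 6 or not (text.isascii() and text.isdigit()):
        raise ValueError("expected six ASCII digits: %r" % text)
    return int(text[:3]) - OFFSET, int(text[3:]) - OFFSET
```

Because every value is zero-padded to three digits, the string always
matches "[0-9]{6}" and the split point never moves.
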
> Depending on speed experiments, I may duplicate the SignPuddle XML
> files with ASCII rather than the Preliminary Unicode.  Large files
> have a lot of wasted overhead processing UTF-8 and Unicode values.
> 
> Thoughts? Opinions?
> -Steve
> 

-- 
.i ma'a lo bradi cu penmi gi'e du
