On 10/6/11 12:36 PM, Alan Post wrote:
> I've seen too much code that uses regular expressions when it should
> be doing actual parsing, and the slippery slope of that boundary is
> having to do backtracking/lookahead in a regular expression.
Agreed. I did hit this slippery slope. I started to need lookahead for
more accurate searches. Very quickly, the complexity went up and the
speed went down.
My coordinate redesign will remove the need for lookahead. It will
reduce complexity for accurate searching without sacrifices in speed.
> The end-game, if you're not going to work on optimizing what you've
> got, is to define the full grammar for this data encoding, use a
> parser, and pick the data you'd like out of the parse tree, rather
> than cherry-picking it out of the incoming data by using a regular
Reminds me of a saying. "A programmer had a problem, so he used Regular
Expressions. Now he has two problems."
The end-game is fuzzy searching for signs in a 10+ MB XML file. The
current inaccurate search needs about 5 seconds to process the file.
I'm hoping to reduce the time and increase the accuracy. The coordinate
redesign will help accomplish this.
> I wonder why you have 0,0 at the origin, rather than at one of the
> corners? It seems computationally more efficient to pick 0,0 to be,
> say, the bottom left and get rid of negative numbers entirely.
Each sign is on an individual canvas of variable size. There are 3
coordinates of interest: the minimum, the center, and the maximum. I
decided that the center should always be (0,0) for all signs. This has
2 benefits: I don't need to explicitly state the center, and I don't
need to compute the center. The mathematical formulas work out nicely too.
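
To illustrate the second benefit, here is a minimal sketch in Python. It assumes, purely for illustration, that the center is the midpoint of the bounding box; the function names and the tolerance value are hypothetical. With a corner origin, every point has to be shifted by a computed center before two signs can be compared. With the center fixed at (0,0), that step disappears and positions compare directly.

```python
def center_of(minimum, maximum):
    """Midpoint of a bounding box (assumed definition of the center).
    Only needed when the origin sits at a corner of the canvas."""
    return ((minimum[0] + maximum[0]) // 2, (minimum[1] + maximum[1]) // 2)


def to_centered(point, minimum, maximum):
    """Shift a corner-origin point so that the canvas center becomes (0, 0)."""
    cx, cy = center_of(minimum, maximum)
    return (point[0] - cx, point[1] - cy)


def within(a, b, tolerance):
    """True when two center-relative points differ by at most
    `tolerance` units on each axis."""
    return abs(a[0] - b[0]) <= tolerance and abs(a[1] - b[1]) <= tolerance


# Two canvases of different sizes, both with a corner origin.
p1 = to_centered((60, 40), (0, 0), (100, 60))
p2 = to_centered((30, 30), (0, 0), (40, 40))

# Once both points are relative to the center, they can be compared
# directly, no matter how large each canvas is.
same_place = within(p1, p2, 2)
```

If the data already stores center-relative coordinates, only `within` is needed at search time.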
> As a final note, least relevant, there are at least two ways to
> build regular expression engines, and they each have different
> performance behaviors with different classes of regular expressions.
> Are you using a bad RE engine? Will other users do so if you don't?
> (http://swtch.com/~rsc/regexp/regexp1.html) Even simple things like
> compiling the RE before using it can make a difference...
Interesting link. I'll dive into it tomorrow. I'm using simple Regular
Expressions, so I'm hoping the various RE engines will be okay.
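
On the point about compiling the RE before using it, a minimal Python sketch looks like the following. The pattern is a made-up example (a name, then "@", then signed x,y values), not the real data encoding, and `find_entries` is a hypothetical helper. The pattern is compiled once and reused for every line, and it needs no lookahead.

```python
import re

# Compiled once at import time. The pattern is illustrative only:
# a word, "@", then signed integer x and y separated by a comma.
ENTRY = re.compile(r"(\w+)@(-?\d+),(-?\d+)")


def find_entries(lines):
    """Yield (name, x, y) for every match in an iterable of text lines."""
    for line in lines:
        for m in ENTRY.finditer(line):
            yield m.group(1), int(m.group(2)), int(m.group(3))


entries = list(find_entries(["hand@-10,5 head@0,-20", "no match here"]))
```

Python's `re` module is a backtracking engine and caches recently compiled patterns, so the gain from precompiling there is modest. In other engines the difference may be larger.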
> Nothing below scares me, from a living with it forever point of
I'll share my search results soon. Accurate and quick fuzzy searching
of SignWriting is almost here.