Regular Expressions

A regular expression is text that describes a textual pattern; it can be thought of as a very terse descriptive language. Prospector provides this capability to help with searches that eBay's query language cannot express. Entire books have been written about regular expressions, and comprehensive coverage is beyond the scope of this document. If you are interested, further reading or searching on the web is recommended.

Prospector uses regular expressions as must-match or must-not-match filters for items. Special characters called metacharacters, and character sequences called metacharacter sequences, are mixed in with normal characters to form a pattern (a template) that can be applied to text to see whether it matches. An example is "ST3.*LC". The period and the asterisk are metacharacters: the period means "match any character", and the asterisk is a quantifier on the preceding character (here, the period) that means "zero or more times". Together, they form a pattern that matches, among others, the following:

       ST3LC              - ".*" matches nothing (zero characters) to make this match

       ST318452LC         - ".*" matches "18452" to make this match
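
Prospector runs patterns through the Microsoft .NET regular expression engine. As a minimal sketch, Python's `re` module, which treats these basic metacharacters the same way, can confirm that "ST3.*LC" matches both examples above. The third test string is a made-up LW model number, and using `re.search` (find a match anywhere in the text) is an assumption about how Prospector applies a pattern to a title.

```python
import re

# The example pattern: "ST3", then any characters (zero or more), then "LC".
PATTERN = re.compile(r"ST3.*LC")

examples = ["ST3LC", "ST318452LC", "ST318452LW"]
for text in examples:
    # re.search scans the string for a match starting at any position.
    matched = PATTERN.search(text) is not None
    print(f"{text:12} {'match' if matched else 'no match'}")
```

The last string does not match because it never contains "LC", so there is nothing for the end of the pattern to match.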

 

Because these metacharacters are mixed in with regular characters, patterns can be tough to learn: you first have to pick out the metacharacters and understand what each one means before you can figure out what the pattern is supposed to match.

The "ST3.*LC" pattern actually applies to computer hard drives, specifically to one of Seagate's lines of disk drives, so we'll continue to use it as an example.

Seagate labels its SCSI drives using the following scheme:

       ST3xxxxxCC

where ST3 is the model, xxxxx is the capacity of the drive and CC is the interface code for the drive.
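
As an illustration only, capturing groups (the parentheses described in the metacharacter list on this page, combined with \d and +) can pull the two variable parts out of a model number in this scheme. The sketch uses Python's `re` module rather than Prospector's .NET engine, and the helper `split_model` is hypothetical. It also assumes the xxxxx part is all digits and CC is two capital letters, which may not hold for every Seagate part number.

```python
import re

# ST3, then a run of digits (the xxxxx part), then a two-letter interface code (CC).
# Assumption for illustration: xxxxx is digits only and CC is two capital letters.
MODEL = re.compile(r"ST3(\d+)([A-Z]{2})")

def split_model(model):
    """Hypothetical helper: return (xxxxx, CC) for a model string, or None if it doesn't fit."""
    m = MODEL.fullmatch(model)  # the whole string must follow the scheme
    if m is None:
        return None
    return m.group(1), m.group(2)

print(split_model("ST318452LC"))     # ('18452', 'LC')
print(split_model("ST318452FC"))     # ('18452', 'FC')
print(split_model("Seagate drive"))  # None
```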

 

LC happens to be the interface code for 80-pin LVD, which is a common interface. You can search for ST3* with eBay's query language, but the results will also include other interfaces, such as ST3xxxxxLW and ST3xxxxxFC, that the query language gives you no way to filter out.

With regular expressions, you can add a "Must Not Match" expression such as:

       (LW|FC)

The parentheses and the vertical bar are metacharacters. The parentheses define a group, and inside it the vertical bar '|' acts as an OR operator, so this expression means either "LW" or "FC". When this expression is placed in the Must Not Match box, Prospector filters out every result that matches it.
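
The combined effect of the Must Match and Must Not Match boxes can be sketched as follows. The function `keep_title` and the sample titles are hypothetical, Python's `re` stands in for the .NET engine, and the sketch assumes an item is kept only when its title matches the must-match expression (if any) and does not match the must-not-match expression (if any).

```python
import re

def keep_title(title, must_match=None, must_not_match=None):
    """Hypothetical filter: keep a title only if it passes both optional patterns."""
    # A must-match pattern has to be found somewhere in the title.
    if must_match is not None and re.search(must_match, title) is None:
        return False
    # A must-not-match pattern must not be found anywhere in the title.
    if must_not_match is not None and re.search(must_not_match, title) is not None:
        return False
    return True

# Made-up titles of the kind an "ST3*" eBay search might return.
titles = [
    "Seagate ST318452LC SCSI hard drive",
    "Seagate ST318452LW SCSI hard drive",
    "Seagate ST318452FC Fibre Channel hard drive",
]

kept = [t for t in titles if keep_title(t, must_not_match=r"(LW|FC)")]
for t in kept:
    print(t)  # only the LC drive survives
```

Because `re.search` looks anywhere in the title, a listing that mentions "LW" or "FC" elsewhere in its title would also be dropped. If Prospector behaves the same way, a tighter expression such as ST3\d+(LW|FC) confines the exclusion to the model number itself.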

 

Common metacharacters are:

       Metacharacter            Meaning
       . (period)               Matches any single character except a newline.
       * (asterisk)             Matches the preceding element (a character, a bracketed set, or a group) zero or more times, as many times as possible.
       + (plus)                 Matches the preceding element one or more times, as many times as possible.
       ? (question mark)        Matches the preceding element zero times or one time, matching one time if possible.
       [ccc] (square brackets)  Matches any one character found within the square brackets, so [ace] matches a, c, or e.
                                The '-' (dash) character specifies a range, so [a-e] means a, b, c, d, or e.
                                A '^' (caret) as the first character negates the set, meaning "everything except", so [^a-e] matches any character except a, b, c, d, or e.
       ( ) (parentheses)        Groups a sequence so that a quantifier or | applies to it as a unit, as in (LW|FC).
       | (vertical bar)         Boolean OR operator, used to specify alternatives.
       \d                       Matches any digit; for ordinary text this is equivalent to [0-9] (.NET also accepts digits from other scripts).
       \D                       Matches any non-digit, the opposite of \d ([^0-9] for ordinary text).
       \s                       Matches a whitespace character, such as a space, tab, or newline.
       \S                       Matches a non-whitespace character.
       \w                       Matches any word character (letter, digit, or underscore); for ordinary ASCII text this is [_0-9a-zA-Z].
       \W                       Matches any non-word character, the opposite of \w ([^_0-9a-zA-Z] for ordinary ASCII text).
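
The short script below exercises most of the metacharacters in the table, using Python's `re` module as a stand-in for Prospector's .NET engine. For these constructs the two behave alike, including the fact that \d and \w also accept non-ASCII digits and letters by default, but the engines are not identical in general. The helper `matches` is hypothetical and requires the pattern to cover the entire test string, which makes each metacharacter's effect easy to see.

```python
import re

def matches(pattern, text):
    """True if pattern matches the entire text (re.fullmatch)."""
    return re.fullmatch(pattern, text) is not None

checks = [
    (r"ST3.LC",   "ST3xLC",     True),   # . matches exactly one character
    (r"ST3.LC",   "ST3LC",      False),  # ... so it cannot match zero characters
    (r"ST3\d*LC", "ST3LC",      True),   # * allows zero digits
    (r"ST3\d+LC", "ST3LC",      False),  # + needs at least one digit
    (r"ST3\d+LC", "ST318452LC", True),
    (r"colou?r",  "color",      True),   # ? makes the "u" optional
    (r"colou?r",  "colour",     True),
    (r"[ace]",    "c",          True),   # set of characters
    (r"[a-e]",    "d",          True),   # range
    (r"[^a-e]",   "d",          False),  # negated range
    (r"(LW|FC)+", "LWFC",       True),   # a group repeated by +
    (r"\w+\s\w+", "ST3 drive",  True),   # word chars, one whitespace, word chars
    (r"\D",       "7",          False),  # \D rejects digits
    (r"\S\W\S",   "a-b",        True),   # '-' is not a word character
]

for pattern, text, expected in checks:
    result = matches(pattern, text)
    status = "ok" if result == expected else "UNEXPECTED"
    print(f"{pattern:10} {text!r:14} {result!s:5} {status}")
```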

 

Prospector supports the full Microsoft .NET regular expression language, so many constructs beyond those listed above are also available.

Notes

Prospector performs regular expression matching only on listing titles. Matching within descriptions as well would be desirable, but it would be very costly because of the much larger amount of data that would have to be fetched, hence the limitation.