Wednesday, June 25, 2008

The IGNORE operator

It occurs to me that Google (or, really, any keyword-style search engine) could use a new operator.

At the moment, I can specify that a keyword or phrase either be included or excluded. A search for A -B, for example, returns all pages containing "A" that don't also contain "B." So a page with only "A" gets found, but a page with both "A" and "B" doesn't. So far, so good.
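To make the current behavior concrete, here is a minimal Python sketch of page-level include/exclude filtering. It assumes, for illustration only, that a query has already been parsed into a list of included terms and a list of excluded terms, and that matching is case-insensitive and whole-word. The function names are hypothetical, and real search engines index and rank pages in far more elaborate ways.

```python
import re


def contains(page: str, phrase: str) -> bool:
    # Hypothetical helper: case-insensitive, whole-word match anywhere on the page.
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, page, re.IGNORECASE) is not None


def matches(page: str, include: list[str], exclude: list[str]) -> bool:
    # Page-level AND/NOT: every included term must appear somewhere on the page,
    # and a single occurrence of any excluded term rejects the whole page.
    return all(contains(page, t) for t in include) and not any(
        contains(page, t) for t in exclude
    )


# The query "A -B" expressed as include/exclude lists.
include, exclude = ["A"], ["B"]
for page in ["A", "A, and also A and B", "B"]:
    print(repr(page), matches(page, include, exclude))
```

Only the first page survives. The second is dropped even though it mentions "A" twice, because the exclusion applies to the page as a whole.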

But most web pages are not so simple: they have multiple occurrences of keywords, especially the most relevant results. So a page that says "A, and also A and B" may be exactly what I'm looking for, but that pesky B keeps it off my list.

A more concrete example: I was looking up uses of the word "patriarchy" on blogs I comment on. There's a quite popular (if controversial) feminist blog called "I Blame the Patriarchy" that's on a lot of blogrolls, including Feministing's. If I simply search for jfpbookworm patriarchy, I'm going to get a hit on every single page on Feministing that contains my username, regardless of whether I actually used the word "patriarchy" there. On the other hand, if I search for jfpbookworm patriarchy -"i blame the patriarchy", I'm going to miss all the results on any site that includes the blogroll on article pages, not to mention any article where IBTP was named.

What I want to look for is all pages that contain the terms "jfpbookworm" and "patriarchy", except that I want to ignore instances of "patriarchy" where it occurs only as part of the phrase "I Blame the Patriarchy." I don't think this is possible by stringing together OR, AND and NOT operators, because there's no way to limit the scope to less than the entire page. What's needed, I think, is an IGNORE operator (I'd propose using "!" as the shorthand, because as far as I'm aware the symbol isn't used and it already has a negation connotation), which says "this phrase is not what I'm searching for, but it's not so obviously wrong that its presence connotes irrelevance." Its use would look something like jfpbookworm patriarchy !"i blame the patriarchy", which would take all the hits for "jfpbookworm patriarchy", "de-highlight" instances of "i blame the patriarchy," and then check again to see if all the keywords are highlighted. (There may be a more efficient way to do this; that's left as an exercise for the reader.)
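The matching step proposed above can be sketched in Python for a single page's text. This is a minimal illustration under stated assumptions, not how Google or any real engine works: the function names and sample pages are hypothetical, the query is assumed to be already parsed into keywords and ignored phrases (parsing the "!" syntax is not shown), and "de-highlighting" is modeled by blanking out every ignored phrase before re-checking the keywords.

```python
import re


def phrase_pattern(phrase: str) -> re.Pattern:
    # Hypothetical helper: case-insensitive, whole-word match; any run of
    # whitespace between the phrase's words is allowed to match.
    words = [re.escape(w) for w in phrase.split()]
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)


def matches_with_ignore(page: str, keywords: list[str], ignore: list[str]) -> bool:
    # Step 1: the page must be an ordinary hit for all keywords.
    # (Strictly implied by step 3; kept as a cheap first filter.)
    if not all(phrase_pattern(k).search(page) for k in keywords):
        return False
    # Step 2: "de-highlight" every instance of each ignored phrase by
    # replacing it with a space.
    masked = page
    for phrase in ignore:
        masked = phrase_pattern(phrase).sub(" ", masked)
    # Step 3: check again that every keyword is still present somewhere.
    return all(phrase_pattern(k).search(masked) for k in keywords)


# Hypothetical page texts: a comment that uses the word, and a page where
# the word appears only in the blogroll.
comment_page = "jfpbookworm said: the patriarchy shapes this. Blogroll: I Blame the Patriarchy"
blogroll_page = "jfpbookworm said: great post! Blogroll: I Blame the Patriarchy"

query = (["jfpbookworm", "patriarchy"], ["i blame the patriarchy"])
print(matches_with_ignore(comment_page, *query))   # True
print(matches_with_ignore(blogroll_page, *query))  # False
```

Unlike -"i blame the patriarchy", which would drop both pages, the ignored phrase only stops the blogroll mention from counting as a hit. A real engine would presumably do this with word positions in its index rather than by rewriting page text.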

So what do y'all think? Good idea? Idiosyncratic grumble? Just plain incomprehensible?

3 comments:

Aerik said...

Or you could remove the domain entirely

+jfpbookworm +patriarchy -site:iblamethepatriarchy.com

You know you can do that, right?

Aerik said...

Oh, or you can use the link: operator to slim it down even more

+jfpbookworm intext:patriarchy -link:iblamethepatriarchy.com -site:iblamethepatriarchy.com

jfpbookworm said...

You can do that, but that's not quite the result I'm looking for.

For one thing, it's not just IBTP I'm trying to remove (I don't actually comment there, so it's pretty moot; it's more about places like Feministing that include a link to IBTP on every page). Now I could go through and remove specific sites one by one as they occur, but that's time-consuming, even more so than just skipping past the results.

Second, I *don't* want to remove every result from a site. If I comment using the word "patriarchy" over at Feministing, that should register while a page that simply contains the word as part of the blogroll shouldn't.