Search Engines: biases and problems

A recent post of mine disappeared from the WordPress listings, and shortly afterward it disappeared almost entirely from search engine results as well.  In Google it survived only as a shadow, in the form of indirect links and some cached pages from when WordPress had listed it, but it vanished without a trace from Yahoo.  The last time I checked, it didn’t show up at all in other search engines.  This got me wondering how search engines work.  Both Google and Yahoo had originally shown and cached the direct link to the post, so their web crawlers had already discovered it.  However, when it disappeared from the WordPress listings, the search engines followed suit.  Were the web crawlers no longer able to see my post even though Google and Yahoo previously had the direct link to it?

Also, I’d noticed in the past that the search engines seem to treat the various blogging sites differently.  For a while, I had several blogs going on several hosting sites because I was testing them out.  I was posting the exact same things to each of them, but I noticed that the My Opera blog often showed up higher in search results than my other blogs.  Now, I use only WordPress because I like its functionality the best.  This recent event, however, made me wonder how often my posts might not show up at all in search results.

To test it out, I did a search for the title of a blog post from when I was using all of the blogging sites.  In the Yahoo results, only the My Opera post was given a direct link.  The other copies, such as the one on WordPress, were given only indirect links through the blog’s home link, through tag listings, or through other websites’ hyperlinks.  Google gave very different results: it showed direct links to the postings on all of the blogging sites, but put WordPress as the top result.  Did Google put WordPress on top because it’s the only blog of mine that is active right now?  If so, why did Yahoo give preference to My Opera, which I haven’t used in recent months?  Also, why didn’t Google show direct links to my recently disappeared post on WordPress?

I did another comparison search between Google and Yahoo using a different early post of mine.  This time Google showed the direct links to my posts on all of the blogging sites except that it left out the direct link to the WordPress post.  Yahoo, for some reason, didn’t show a direct link to my post on any of the blogging sites, but did show several indirect links.  As a further experiment, I did a search for the WordPress web address of that post, and it didn’t show up at all in either Google or Yahoo.

Another question that comes to mind is the matter of the biases of search engines.  Do search engines filter their results to fit my past searches?  I’d be fine if they do this as long as they tell me they’re doing it.  And to what degree do advertising and vested interests influence results?  Furthermore, what about the government?  Covert government sites get erased from Google Earth, for example.  It wouldn’t surprise me if they don’t simply erase those sites but even replace them with natural-looking terrain so that no one would realize something was missing.  I have no doubt that the government censors some information on the internet.  The question is what kind of information and how often?

But not everything is nefarious or intentional.  Quite possibly, my disappeared posting was just a glitch.  So, how typical are such technical failures?  If a search engine doesn’t show something as existing, how does someone know it exists?  Even if someone knows it exists and even knows an exact title or phrase, how do they seek it out if search engines aren’t helpful?  Do traces remain of disappeared, removed, and lost information?  How can someone recognize a trace of something that once existed or still exists unseen?  How often can those traces lead someone to the information?

The first example that made me aware of problems with search engines had to do with the fairly popular writer Acharya S.  She comes up a lot on the internet.  She was partly involved with the heavily watched Zeitgeist film, which created more buzz on the internet than any web-released film before it.  She runs a website that has tons of useful info about her field of expertise.  There really is no other website that comes close if you’re interested in researching the subject of astrotheology.  However, when in the past I did a direct “in quotes” Google search for the name of her website, I didn’t find it in the top results.  The direct link to her website only showed up several pages beyond the first page of results.  The first several pages were filled with her detractors and other websites linking to her website.  If I do a Google search for an exact title, why doesn’t it give me the most exact result right at the top?  Why does it give pages of indirect links before showing the direct link itself?

Are there search engines that give you more control instead of feeding you the info they think you want?  Is there a search engine that is upfront and transparent about its biases?

5 thoughts on “Search Engines: biases and problems”

  1. I was just playing around with some other alternative search engines. I discovered Mahalo actually shows a direct link to my original post “Cold War Era: Paranoia and Oppression” even though the major search engines don’t. That is cool.

    The reason might be because Mahalo depends on people to organize information rather than solely relying on web crawlers. Yeah! People are still smarter than computers. A victory for the human race. Maybe or maybe not…

    I’m not actually sure how my blog post managed to make it into the Mahalo results. It could’ve just been a different search engine method. I was reading that it’s based on Google results, which is strange since Google no longer shows my post. Maybe Mahalo gathered the info when Google had originally shown it in results and Mahalo archived it in its own results.

  2. Part of the problem with search engines these days is that there are so many people trying to game the system. Heck, even the search engine companies themselves try to game the system. It can be hard to find honest and useful info about exactly what you want.

    I try to make posts interesting to an extent, but that is as far as I go to attract attention. I’m not trying to make money off my writing or anything. I do, however, want my blog posts visible to the larger world. Otherwise, I’d just go back to journalling. So, it bothers me a bit when a post of mine is almost entirely invisible in search results.

    I think a search engine should be designed for honest people looking for honest info. Now, that would be a technological wonder.

  3. I just tried the Bing search engine. Guess what? It shows my Cold War blog post as the top result. I switched from Yahoo to Google some years ago because I liked the results from the latter somewhat better. However, I may now give up on Google as well. I just can’t trust the results it gives me.

  4. There is another example of search engine difficulties. A while back, I did a search for comparisons between certain products. A large number of the results were advertisements and sponsored “reviews”. I was looking for fair and balanced comparisons, but I was unable to make any clear decision because all of the info was so biased. The advertisers had become so good at gaming the system that the search engines were useless.

  5. There are several ways of improving search engines. Give users more control, more transparency, more info. Make the interface more user-friendly. Prioritize what the user is actually looking for over all other factors. Customize search results to specific users, but also allow users to see how their search is customized and allow users to control that customization. Of course, the best possible improvement would be simply matching computer programs to how people actually think and communicate.

    All of those are good, but the most important thing is not to rely on any single factor or method. Key words can be useful to organize info, but they’re also easy to game. Trying to counter that gaming can just lead to convoluted rules that may not actually help the user. Key words should only be a single factor along with many other factors. There should be user rankings, and the rankings the web crawlers give things need to be more complex and nuanced.

    Furthermore, there needs to be a semi-intelligent dialogue between the search engine and the user. If the search is too general, the search engine should offer clarifying options and questions. There should be categories of searches rather than having to broadly search everything with every search. I want to be able to easily narrow my search down. Search engines should learn from past searches so that they better understand someone’s interests and communication style.

    More importantly, there needs to be a way to distinguish information from advertising. Companies, websites, and bloggers who falsely present advertising as anything other than obvious advertising should be punished by losing their page ranking. I should be able to do a search of information entirely devoid of advertising if I so wish. And I should be able to do a search of nothing but advertisement if I so wish.

    Anyways, the old method of having a search box that you type a few words into just doesn’t even come close to being all that useful except for the most general kinds of searches.
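To make the last comment’s wish list a little more concrete, here is a rough Python sketch of what “many factors, visible to the user, with advertising kept separate” could look like. It is only an illustration and not how Google, Yahoo, or any real search engine works. The `Result` fields, the default weights, and the `ads` switch are all made-up names and numbers chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One hypothetical search hit; every signal is assumed to be scaled 0..1."""
    url: str
    keyword_match: float   # how well the page text matches the query
    user_rating: float     # average rating given by human users
    crawler_rank: float    # whatever score the web crawler assigned
    is_advertising: bool = False


# Illustrative defaults only. The point is that the user can see and change them.
DEFAULT_WEIGHTS = {"keyword_match": 0.4, "user_rating": 0.3, "crawler_rank": 0.3}


def score(result, weights):
    """Weighted average of the chosen signals, so no single factor decides alone."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(getattr(result, name) * w for name, w in weights.items()) / total


def search(results, weights=None, ads="include"):
    """Rank results with user-supplied weights.

    ads is "include", "exclude" (information only), or "only" (advertising only).
    Returns (url, score) pairs, best first, so the user can see why each one ranked.
    """
    weights = weights or DEFAULT_WEIGHTS
    if ads == "include":
        pool = list(results)
    elif ads == "exclude":
        pool = [r for r in results if not r.is_advertising]
    elif ads == "only":
        pool = [r for r in results if r.is_advertising]
    else:
        raise ValueError("ads must be 'include', 'exclude', or 'only'")
    ranked = sorted(pool, key=lambda r: score(r, weights), reverse=True)
    return [(r.url, round(score(r, weights), 3)) for r in ranked]


if __name__ == "__main__":
    hits = [
        Result("blog.example/post", 0.9, 0.8, 0.2),
        Result("shop.example/review", 1.0, 0.1, 0.9, is_advertising=True),
        Result("other.example/link", 0.5, 0.5, 0.5),
    ]
    print(search(hits))                                 # default mix of factors
    print(search(hits, ads="exclude"))                  # information only
    print(search(hits, weights={"user_rating": 1.0}))   # trust people over crawlers
```

With the sample data, the keyword-stuffed sponsored “review” wins under the default weights, but it drops out when advertising is excluded and falls to the bottom when only user ratings count. That is the kind of control and transparency the comment asks for.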
