What do I need to know about the new MSN Search Engine?
search engine optimization (SEO)

Now that MSN has taken the wraps off its new search engine (beta.search.msn.com), which is intended to compete with both Google and Yahoo, the obvious question on the minds of SEO people everywhere is: what algorithm is MSN going to use to rank pages?

Microsoft is being predictably coy. Its FAQ states: "The MSN Search ranking algorithm analyzes factors such as page contents, the number and quality of sites that link to your pages, and the relevance of your site’s content to keywords. The algorithm is complex and never human-mediated."

Nonetheless, MSN offers some useful tips that give a little insight into how it is approaching search. What follows is quoted directly from MSN's site, grouped into categories.

Technical Recommendations

  • Use only well-formed HTML code in your pages. Ensure that all tags are closed, and that all links function properly. If your site contains broken links, MSNBot may not be able to index your site effectively, and people may not be able to reach all of your pages.
  • If you move a page, set up the page's original URL to direct people to the new page, and tell them whether the move is permanent or temporary.
  • Make sure MSNBot is allowed to crawl your site, and is not on your list of web crawlers that are prohibited from indexing your site.
  • Use a robots.txt file or meta tags to control how MSNBot and other web crawlers index your site. The robots.txt file tells web crawlers which files and folders it is not allowed to crawl.
  • Keep your URLs simple and static. Complicated or frequently changed URLs are difficult to use as link destinations. For example, the URL www.example.com/mypage is easier for MSNBot to crawl and for people to type than a long URL with multiple extensions. Also, a URL that doesn't change is easier for people to remember, which makes it a more likely link destination from other sites.

    Dave's comment: In case MSN didn't notice, the majority of traffic to a site comes from search results, so the complexity of a URL doesn't matter as much as they suggest here. It's an interesting insight into their ranking criteria, imo.
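If you want to confirm that MSNBot isn't on your list of prohibited crawlers, Python's standard-library `urllib.robotparser` can read a robots.txt file and answer that question for a given URL. This is a minimal sketch. The robots.txt contents (including the fictional "BadBot" crawler) are made up for illustration, and "msnbot" is assumed to be the user-agent token MSN's crawler identifies itself with, so check MSN's own documentation for the exact name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one made-up crawler is shut out entirely,
# every other crawler may index everything except /private/.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""

def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from a string, no network access
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    for url in ("http://www.example.com/mypage",
                "http://www.example.com/private/notes.html"):
        print(url, crawler_allowed(ROBOTS_TXT, "msnbot", url))
```

Because "msnbot" matches no named entry, it falls through to the `User-agent: *` rules, so it may crawl /mypage but not anything under /private/.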

Content Guidelines

  • In the visible page text, include words users might choose as search query terms to find the information on your site.
  • Limit all pages to a reasonable size. We recommend one topic per page. An HTML page with no pictures should be under 150 KB.
  • Make sure that each page is accessible by at least one static text link.
  • Create a site map that is fairly flat (i.e., each page is only one to three clicks away from the home page). Links embedded in menus, list boxes, and similar elements are not accessible to web crawlers unless they appear in your site map.
  • Keep the text that you want indexed outside of images. For example, if you want your company name or address to be indexed, make sure it is displayed on your page outside of a company logo.
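MSN's "one to three clicks from the home page" guideline is easy to check once you have a list of which pages link to which: a breadth-first search from the home page gives the minimum number of clicks to reach every page. The sketch below assumes a small, hand-written link graph (all page names are hypothetical). On a real site you would build that graph with a crawler or from your own page templates.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
SITE_LINKS = {
    "/": ["/products.html", "/about.html", "/sitemap.html"],
    "/products.html": ["/products/widgets.html"],
    "/products/widgets.html": ["/products/widgets/blue.html"],
    "/products/widgets/blue.html": ["/products/widgets/blue/specs.html"],
    "/products/widgets/blue/specs.html": [],
    "/about.html": [],
    "/sitemap.html": ["/products.html", "/about.html", "/products/widgets.html"],
    "/old-promo.html": [],  # nothing links here
}

def click_depths(links, home="/"):
    """Breadth-first search from the home page; returns the minimum
    number of clicks needed to reach each reachable page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

def too_deep(links, home="/", max_clicks=3):
    """Return (pages more than max_clicks away, pages not reachable at all)."""
    depths = click_depths(links, home)
    deep = sorted(page for page, d in depths.items() if d > max_clicks)
    unreachable = sorted(set(links) - set(depths))
    return deep, unreachable

if __name__ == "__main__":
    print(too_deep(SITE_LINKS))
```

Here the specs page sits four clicks deep and the old promo page is unreachable. Adding a link to the specs page from the site map would bring it to two clicks, which is exactly the flattening role MSN describes for a site map.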

You can learn more at MSN Search Site Owner Help. Competition is always good, so it'll be interesting to see what theories arise about how MSN is ranking and ordering its search results!

What's Acceptable Search Engine "Spam" Technique?
search engine optimization (SEO)

After hearing a lot about it, I went over to SitePoint and read an interesting article by a "search engine optimization expert" in which he lists fifteen of the most egregious techniques companies and individuals use to "spam" the search engines. What's search engine spam, you ask? It's using inappropriate and deceitful methods of manipulating the HTML or other elements of a site to generate a higher ranking than the site would otherwise be granted by a typical search engine relevance calculation.

Now don't get me wrong: I am absolutely against search engine spamming and other forms of 'cheating', so I don't disagree with the premise of the article. What I question is whether the technologies and techniques it singles out really constitute search engine spamming.

Wikis

For example, Wikis are singled out as a bad technology, yet a Wiki is just a minimalist shared whiteboard, a technology that lets a group of people share the maintenance of Web-based content. The most popular is probably Wikipedia, which is a fabulous resource, but even Net-savvy publisher O'Reilly has a Wiki that it uses to manage the interaction between the company, its authors, and user groups.

The article author's argument, though, is that Wikis are dangerous because anyone can -- theoretically -- add content and therefore add bogus links back to a third-party site. Are Wikis therefore bad because people can "spam" them? Of course not.

Just like comments on a weblog or entries in a guestbook, pages on a Wiki should be monitored to ensure that the information thereon is relevant.

Networked Blogs

Another area of complaint: so-called "networked blogs". Again, the article's author arbitrarily decides what is and isn't legitimate content, stating: "some spammers start a blog, plug it full of garbage content such as comments on what they thought at 5:15, along with a link or two and some keyword rich text."

There are undoubtedly some people who exploit that idea, but I am far more hesitant to decide that a weblog where the writer talks about what he or she was thinking at a given time is garbage. As a quick example, perhaps the 5:15pm thoughts were interesting because the weblog writer had just gotten off work and knew that something frightening (or wonderful) was going to happen at 5:50.

Writing weblog entries or articles so that certain keywords or key phrases are repeated with some frequency seems to me more a smart way to ensure that search engines rate your musings as relevant than a spam technique. After all, in typical prose you might mention the subject once, then just refer to "it", "the problem", "the company" or similar, making it impossible for a search engine to know what the article is about in the first place. (And that's a good argument for crafting good titles for your entries, too.)

Who Makes the Judgment Call?

What bothers me about the two techniques highlighted here is that we're sliding from the overt spam techniques cogently discussed in the SitePoint article (techniques including invisible text and link farms, both discussed in detail in Three Ways to Adversely Impact your Google Pagerank) to techniques that are really more of a judgment call. I imagine, for example, that the author of the original article would consider it spam if someone added a comment to this weblog entry pointing to another site that offers ten smart ways to improve your search ranking, even though I wouldn't necessarily agree.

What I'm trying to say is that I agree when we're talking about objective search engine spam techniques, but when we move into subjective ones, I'm a lot less comfortable with the entire topic, and I'm confident that what I consider search engine spam (SES) is going to differ from what you, the reader, consider SES.

So my closing thought on this topic is that it's always important to take what anyone says about search engine optimization -- including what I say -- with a grain of salt. Think for yourself, read the published criteria from important search engine sites like Google, and make your own decisions on how to approach the problem and what kind of results you seek.

The article on SitePoint: Latest Search Engine Spam Techniques by Gord Collins.

The Right Way to Link To Pages On Your Site
search engine optimization (SEO)

Here's a topic that should be obvious, but isn't: how should you best code links on your site from page to page? Should you use link text like "page two", "continued..." or "more", pointing to the page filenames? Should you use absolute links that always begin with a leading / (as in "/reviews.html"), should you always use relative links (as in "../reviews.html") or should you use fully qualified links (as in "http://www.free-web-money.com/reviews.html")?

The answer to this question might surprise you! First off, rumor and speculation aside, Google and other search engines do not care how your links are coded. Some SEO sites suggest that Google "spiders" your site faster if you have absolute or even fully qualified URLs, but as far as I can ascertain, that's just not true.

So this facet of the question boils down to what's easiest for you to maintain on your site: a link that lets you move pages around as you reorganize things, or a link that forces you to always live with a specific domain name and directory name? My strong preference is to use relative links as much as possible, and to always use absolute (though not fully qualified) links on 404 error pages and other content that can appear at different locations on your site. A 404 page, for instance, is displayed at whatever URL the visitor requested, so any relative links on it resolve against that missing path.

The only area where full, absolute URLs are a necessity is weblog entries. Your weblog entries should be generating an RSS feed (learn more about RSS feeds at this RSS info page) that subscribers read in their own applications, away from your site, so relative links almost always fail there. This makes it a bit trickier to add links to, say, this entry, since this Web site -- free web money -- is built around the Movable Type weblog content management system, but having clickable links in the RSS feed makes the trade-off worthwhile.
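To see why relative links break outside your site, it helps to watch how a browser resolves them. Python's `urllib.parse.urljoin` applies the same standard resolution rules. This minimal sketch assumes a hypothetical entry URL on this site. It shows the three link styles from the start of this post and also the 404-page problem, where the same relative link is resolved against a different, missing URL.

```python
from urllib.parse import urljoin

# Hypothetical URL of a weblog entry; substitute your own.
ENTRY_URL = "http://www.free-web-money.com/archives/links.html"

def absolutize(page_url: str, href: str) -> str:
    """Turn a link found on page_url into a fully qualified URL,
    the way a browser resolves it when the page is viewed on your site."""
    return urljoin(page_url, href)

if __name__ == "__main__":
    for href in ("../reviews.html", "/reviews.html", "reviews.html"):
        print(f"{href:18} -> {absolutize(ENTRY_URL, href)}")

    # A 404 page served at a missing, deeper URL: the relative link now
    # points somewhere else entirely, while the leading-slash form does not.
    missing = "http://www.free-web-money.com/old/deep/missing.html"
    print(absolutize(missing, "reviews.html"))   # .../old/deep/reviews.html
    print(absolutize(missing, "/reviews.html"))  # .../reviews.html
```

A feed reader has no page URL to resolve against, which is why a relative link in an RSS item usually goes nowhere; running your entry's links through something like `absolutize` before publishing is one way to sidestep that.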

Let's get back to the main question, then: How should you structure the links between pages on your site?

Well, I used to have links like "home", but on reflection I realized they were empty links: the words used in a link matter, and "home" is almost as useless as "welcome". Ideally, all of your interpage links should reinforce the keywords and key phrases that you want to identify your site (also see Understanding Keyword Density for more about keywords). Instead of a link like:

<a href="index.html">home</a>
therefore, you'll find that you get more value out of simply replacing that link with a link that has the name of the site, the key concept, or similar:
<a href="index.html">free web money</a>
If you really want some extra credit, think about your filenames too: "index.html" is generic and meaningless, yet you could easily configure your site so the linked file has a descriptive name such as "affiliate-secrets.html" and link to that instead:
<a href="affiliate-secrets.html">free web money</a>
Now you're really rocking.

Whether or not you want to think about filenames, it's certainly useful to think about the words you use to link your pages together. A few minor changes can have a significant impact on your findability, and isn't that worth the effort, after all?
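If you'd like to audit your own pages for empty link text, Python's built-in `html.parser` can collect every link and its text. This is a sketch: the list of "empty" phrases is an illustrative assumption drawn from the examples in this post, so extend it with whatever filler your site tends to use.

```python
from html.parser import HTMLParser

# Link text that says nothing about the destination (illustrative list).
EMPTY_LINK_TEXT = {"home", "welcome", "more", "continued...", "click here", "page two"}

class LinkTextCollector(HTMLParser):
    """Collects (href, link text) pairs from <a href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the link text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None

def weak_links(html):
    """Return the links whose text is on the EMPTY_LINK_TEXT list."""
    collector = LinkTextCollector()
    collector.feed(html)
    collector.close()
    return [(href, text) for href, text in collector.links
            if text.lower() in EMPTY_LINK_TEXT]

if __name__ == "__main__":
    nav = ('<p><a href="index.html">home</a> | '
           '<a href="affiliate-secrets.html">free web money</a> | '
           '<a href="page2.html">more</a></p>')
    print(weak_links(nav))
```

On the sample navigation bar, the "home" and "more" links are flagged, while the keyword-rich "free web money" link passes.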

The hidden importance of your page TITLE
search engine optimization (SEO)

Here's a search engine optimization concept that most people don't think about: make sure you have keywords and key phrases in your TITLE tag. You know what the TITLE tag is: it's the tag that gives the page its name in the title bar of your browser window, and it's remarkable how few sites pay any attention to what's in that critical search engine optimization (SEO) field.

Let's take a quick tour of some big sites and have a look, shall we? HBO.com has a title tag of "HBO Online". ESPN.com has "ESPN.com" as its title. No kidding. NYTimes.com does better, with "The New York Times > Breaking News, World News & Multimedia". Microsoft, though, has "Microsoft Corporation", and, finally, BMW.com has "BMW International Website".

What's wrong with these? The problem is that search engines (e.g., Google) consider each and every word in a TITLE tag quite important when they figure out what your page is about and how relevant a given topic is on the page. Keyword density is definitely important in this regard, but one of the easiest ways to become more relevant to a given search is to ensure that the keywords or key phrases you want to match are in the TITLE tag.

The downside is that sometimes the TITLES look a bit weird - as is demonstrated on this very site - but the upside is that if you want to have a site that Google thinks is an excellent match for, say, "acupuncture information", then having a TITLE like "Acupuncture Information for Everyone" will yield a definite improvement.

If nothing else, please, do me a favor and don't use "Welcome to", "Home Page", "Website" or any other empty words in your TITLE. After all, with all due respect to BMW, I think it's pretty obvious that if I'm looking at their information on the Web with a Web browser that it's a Website. So why bother saying so in the TITLE?

Frankly, for BMW, I'd suggest a TITLE more like "BMW: Luxury Automobiles and Sports Cars from Germany for over 80 Years", which is still readable and friendly, but now includes other keywords that can help with searches, making it a more relevant match for "luxury cars", "luxury automobiles", "sports cars", "German cars", etc. See how that works? Simple, but surprisingly effective.

So take five minutes and think about your TITLE tag. Is it doing the job you want? And keep in mind that Google and other search engines look at pages, not sites, so you need to ensure that the TITLE on every page of your site is helping your relevance with search engines.
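Checking every page's TITLE by hand gets tedious, so here is a small sketch using Python's built-in `html.parser` that pulls out the TITLE and reports two things: any of the "empty" words mentioned above, and any target key phrases that are missing. The function name and the keyword lists are illustrative assumptions, not part of any SEO tool.

```python
from html.parser import HTMLParser

# The filler words this post recommends keeping out of a TITLE.
EMPTY_TITLE_WORDS = ("welcome to", "home page", "website")

class TitleExtractor(HTMLParser):
    """Captures the text inside <title>...</title>."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def audit_title(html, keywords):
    """Report the page TITLE, empty words it contains, and keywords it lacks."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    title = " ".join(parser.title.split())
    lowered = title.lower()
    return {
        "title": title,
        "empty_words": [w for w in EMPTY_TITLE_WORDS if w in lowered],
        "missing_keywords": [k for k in keywords if k.lower() not in lowered],
    }

if __name__ == "__main__":
    page = "<html><head><title>BMW International Website</title></head></html>"
    print(audit_title(page, ["luxury cars", "sports cars"]))
```

Run against each page on your site (with that page's own target phrases), this flags the "Website" filler in BMW's current TITLE and shows that neither "luxury cars" nor "sports cars" appears in it.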

This is still just search engine optimization (SEO) 101, but it's important.

Understanding Keyword Density
search engine optimization (SEO)

Search engine optimization, or "SEO" in the biz, isn't only for people trying to turn their Web site into a revenue machine and make money online; it can be useful for everyone building Web sites. There are lots of different facets to writing, designing and adjusting your Web pages to maximize the chance of them being a top result for search terms, but one of the best - and easiest - is to work with keyword density.

What is keyword density? It's basically a measurement of how relevant a given keyword "topic" is to a page of material. For example, this page is quite relevant to the word 'keyword' and the phrase 'keyword density' because both occur many times. More importantly, the ratio of their occurrences to the total number of words or phrases on the page is reasonably high because, well, they occur a bunch of times.

That's what keyword density is about. The keyword density of the word "keyword" is calculated by counting the total number of words on the page, counting how many of them are "keyword", and dividing the second number by the first: 10 occurrences on a 500-word page, for example, is a 2% density. Typical highly-ranked sites have at least a 2-3% keyword density for the key search word or search phrase.
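Here's that calculation as a short Python sketch. Counting a single keyword follows the description above exactly. For multi-word phrases, analyzers differ, so this sketch assumes one common convention: each phrase occurrence counts as as many words as the phrase contains. The simple word-splitting regex is also an assumption; real analyzers strip HTML and handle punctuation in their own ways.

```python
import re

def keyword_density(text: str, phrase: str) -> float:
    """Percentage of the words in `text` accounted for by `phrase`.

    Single keyword: occurrences / total words * 100.
    Multi-word phrase: each occurrence counts as len(phrase) words
    (an assumed convention; analyzers differ on this).
    """
    # Lower-case and split into words; apostrophes stay inside words.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    target = re.findall(r"[a-z0-9']+", phrase.lower())
    if not tokens or not target:
        return 0.0
    n = len(target)
    # Slide a window of n words across the page and count exact matches.
    hits = sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target)
    return 100.0 * hits * n / len(tokens)

if __name__ == "__main__":
    page = ("Keyword density measures how often a keyword appears. "
            "Keyword tools count words.")
    print(keyword_density(page, "keyword"))                    # 25.0
    print(round(keyword_density(page, "keyword density"), 1))  # 16.7
```

The toy page has only 12 words, so "keyword" (3 of them) comes out at 25%; on a real page with hundreds of words, the same few mentions would land much closer to the 2-3% range.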

But don't take my word for it. Check out the keyword density of your favorite Web page at Search Engine World with their terrific - free - keyword density analyzer. To keep your sanity, I suggest that you set it to ignore words of five letters or less.

Of course, SEOs will tell you that keyword density isn't the only factor to consider when building your page. Among the other important search engine optimization topics is so-called keyword prominence, that is, where on your page the keyword or keywords appear. A title tag is considerably more prominent than the alt text of an image, for example. :-)
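Keyword prominence is about *where* a term appears rather than how often. The sketch below uses Python's `html.parser` on a hand-written sample page and only reports which locations contain a keyword: the TITLE, a heading, image alt text, or ordinary body text. It deliberately assigns no weights, because how much each location counts is up to each search engine and isn't published. The class and function names are illustrative.

```python
from html.parser import HTMLParser

class KeywordLocator(HTMLParser):
    """Records where a keyword shows up: <title>, <h1>-<h6>,
    image alt text, or ordinary body text."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self, keyword):
        super().__init__()
        self.keyword = keyword.lower()
        self.stack = []      # currently open tags
        self.found = set()

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt") or ""
            if self.keyword in alt.lower():
                self.found.add("img alt")
            return  # <img> has no closing tag, so don't track it as open
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to the matching open tag (tolerates unclosed tags).
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if self.keyword not in data.lower():
            return
        if "title" in self.stack:
            self.found.add("title")
        elif self.HEADINGS & set(self.stack):
            self.found.add("heading")
        else:
            self.found.add("body text")

def keyword_locations(html, keyword):
    locator = KeywordLocator(keyword)
    locator.feed(html)
    locator.close()
    return sorted(locator.found)

if __name__ == "__main__":
    sample = ('<html><head><title>Acupuncture Information</title></head>'
              '<body><h1>Acupuncture basics</h1><p>Some text.</p>'
              '<img src="a.jpg" alt="acupuncture needles"></body></html>')
    print(keyword_locations(sample, "acupuncture"))
```

On the sample page, "acupuncture" turns up in the title, a heading and an image's alt text, but never in the body paragraph, which is the sort of gap worth fixing before rechecking your density.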

Nonetheless, it's quite informative to search for a key phrase that you would like to have match your own site and then use the keyword density analyzer to compare the density of the top-matched pages with your own. Then add the phrase a few more times on your page, perhaps in the title or an h1 tag or similar, and try again.

And don't be surprised if this change all by itself helps boost your site ranking on the search results.