Tim Converse, Yahoo
The perfect site for crawlers:
All pages reachable from the root page, tree structure. Link to a sitemap.
Links should be extractable from the plain HTML: view source, or try an old-school web browser.
Ideally every distinct URL matches up with distinct content; multiple URLs pointing to the same content look like dupes.
Limit dynamic parameters; sessions and cookies should not determine content.
They won't crawl anything that is blocked in robots.txt.
They do pay attention to internal anchor text.
Dynamic URLs: constructed on the fly, almost an obsolete meaning now. The URL has arguments after the question mark. SEs look at the URL; if there are lots of parameters, the content is likely to be duplicated.
Stay away from session-ID URLs.
Map non-? URLs to dynamic content, or provide a session-ID-free way to navigate the site.
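One way to serve dynamic content from parameter-free URLs is an Apache mod_rewrite rule; this is a hedged sketch, and the path pattern, script name, and parameter are hypothetical:

```apache
# .htaccess sketch: serve /product/123 from product.php?id=123
# so crawlers see a static-looking, session-free URL.
RewriteEngine On
RewriteRule ^product/([0-9]+)$ /product.php?id=$1 [L,QSA]
```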
Soft-404 traps are bad: if the URL is bogus, send a real 404.
The worst case is a status 200 on a bad URL, on a page linking to URLs that don't exist; a soft-404 trap makes it hard for the crawler to get at the real content.
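A minimal sketch of the right behavior, using Python's standard-library HTTP server (the paths and page bodies are hypothetical): unknown paths get a real 404 status instead of a 200 "not found" page.

```python
# Sketch: send a real 404 for unknown paths instead of a soft-404
# (a 200 response carrying a "page not found" message).
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site content keyed by path.
KNOWN_PAGES = {
    "/": b"<html><body>Home</body></html>",
    "/about": b"<html><body>About</body></html>",
}

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = KNOWN_PAGES.get(self.path)
        if body is None:
            # Real 404: the crawler drops the bogus URL instead of
            # indexing an endless space of "not found" pages.
            self.send_error(404, "Not Found")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet
```

Run it with `HTTPServer(("", 8000), Handler).serve_forever()` and request a bogus path to confirm the 404 status.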
Evil bots may not obey robots.txt, but all major SEs will.
Don't accidentally screen good bots out; often a robots.txt will screen out every bot but a single one.
Yahoo added extensions to robots.txt: wildcard/regexp syntax. Google also supports the same syntax; use it to screen out dupes.
When the rules disagree, the longest pattern wins.
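A sketch of that extended syntax (the paths here are hypothetical); under longest-match, the more specific Allow line overrides the broader Disallow:

```text
User-agent: *
# Screen out session-ID URLs and printer-friendly duplicates.
Disallow: /*?sessid=
Disallow: /print/
# Longest matching pattern wins, so this more specific rule
# re-allows one directory under /print/.
Allow: /print/whitepapers/
```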
301 to the new site and hang on to the old domain as long as possible; as much as they can, they will migrate ranking to the new site (this is being reworked). Map old paths to the new paths rather than just redirecting everything to the root.
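A minimal sketch of path-to-path 301s on the old domain, rather than a blanket redirect to the new root (the domain and paths are hypothetical):

```apache
# Old domain's .htaccess: map each old path to its new equivalent.
RewriteEngine On
RewriteRule ^articles/(.*)$ http://www.newdomain.example/blog/$1 [R=301,L]
# Anything unmapped falls back to the new homepage as a last resort.
RewriteRule ^(.*)$ http://www.newdomain.example/ [R=301,L]
```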
Site Explorer shows your links, when the last crawl was, and what your representation in the index is.
Authenticate your site at Site Explorer and specify a feed of URLs to crawl.
He’s now showing a list of resources and help pages.
Brett says there will be a big announcement from the SEs tomorrow at the super session about this stuff!
Vanessa Fox, Google
Whatever works out well for visitors will work out well for the SEs. How well can a visitor navigate your site? Are all the pages crawlable via links? How accessible is your site, e.g. with extras turned off or in a mobile browser?
Take an objective look using a text browser; have someone look at your site and see how easily they can find things.
Get the most out of your links: use anchor-text links, minimize redirects, and make sure every page on your site is accessible by static text links. Make sure you have an HTML site map and link to it from your main page.
Webmaster Central: crawl errors, bot activity, robots.txt problems, use a sitemap file.
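A sitemap file uses the standard sitemaps.org XML format; a minimal sketch with hypothetical URLs:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2006-12-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://www.example.com/products/</loc>
    <lastmod>2006-11-15</lastmod>
  </url>
</urlset>
```

Only `<loc>` is required per URL; the other tags are optional hints.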
Check your site in the SERPs: she's using an example where a site: command on a site shows redirect pages, bad title tags, and incorrectly optimized Flash pages.
If only the URL shows in the SERPs, that means you may be blocking access to the crawler.
If all titles and descriptions are the same, that's bad; have a unique description and title tag on every page.
If your description is "loading… loading… loading", you've got problems.
She’s showing webmaster central.
Don't haphazardly jump into a redesign. He's showing a picture of Michael Jackson young and old; it says "Be Careful".
Think about SEO early in the process, from beginning to end. Make sure you have the content. Use keyword research for information architecture. Balance cool design with strategy; no Flash-only sites. Assign keywords to each page and each URL, about 3 phrases per page. Validate the site and make it Section 508 compliant. Copy should complement the keywords. Use 301s.
Authority sites are very deep. Write good content for each product; put keywords in descriptions. Keep old content, like archived newsletters. Study analytics; current pages could be ranking.
Size matters: Wikipedia is a very, very deep site, and it's an authority site.
SE friendly doesn't have to be ugly (he shows an ugly site).
He's using an example of a site with a Flash popup on the homepage. You should have keywords on your homepage and have navigation.
Avoid image, Flash, or JavaScript navigation; it is better to use CSS/text navigation. Use keywords in anchor text; name pages using keywords.
He's using an example of Cookies by Design, where they cut the page size from 120k to 26k and put keywords in the left-hand navigation.
Don't use spacer.gif. The site ranks pretty well (#19 on gift baskets?).
250 words of content for interior pages, 400 on the homepage. No keyword stuffing; make it readable. Link to relevant pages; internal linking is very important. Don't use "Click Here" as anchor text.
He's using an example (conferencecall.com): they use "web conference" in internal links and they now rank at #25.
Watch URL structure and the canonical domain: pick www or non-www and stick to it.
Use URL rewriting.
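A sketch of picking the www form and 301-ing everything else to it with mod_rewrite (the domain is hypothetical):

```apache
# .htaccess: canonicalize non-www to www with a 301.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```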
Hyphens or underscores? Very unimportant; use either.
Use a unique title, description, and keywords; put the most important words first.
Don't stuff the keywords tag, but do use a call to action in your description tag.
Now he's using an example, palmharbor.com: they put modular homes / manufactured homes / mobile homes in the title tag and they rank well now. Wikipedia beats them in some places (size matters).
Use good IA, use keywords, keep URLs static, and use 301s.
He's talking about WebmasterWorld. They have about 2.5 million pages. The content can be rolled up a lot of different ways. They also have a mobile version of the entire site, and a printer-friendly version: something like 20 million pages the SEs could possibly see. About two years ago the rankings totally disappeared because a bot got into the printer-friendly version and caused duplicate-content problems.
They just reworked the entire site's URL structure, the most challenging programming he's ever done. They didn't want to redirect the old URLs; they just used the new URLs for new content. The new keyword URLs worked better for SEs; they're turning up for more keywords. Users' bookmarks were a big cause of confusion. They're using a dozen different variations. The SEs haven't hassled them; they're indexed better than ever before.
The site is all custom software Brett wrote. He set up a big network of sites for John Deere; they got a lot of questions about how to structure a site. Try to find the sweet spot between SEs and users: the SEs are digital and the users are analog. Just before Infoseek was sold, they were working on a theme engine, where a site would be indexed as a whole for 20-50 keywords. He brings up the theme pyramid table and the WebmasterWorld table. The structure is simple and based on keywords. They call it the "longer tail".
Someone suggests making your site so that a blind person can use it.
Someone is asking about hidden divs and saying their Google traffic has tailed off.
If you use a Google sitemap, the sitemap only augments the natural links; it's "in addition to the free crawl," says Yahoo.
Google seems to be case sensitive; Vanessa suggests redirecting to your chosen uppercase or lowercase version. Yahoo agrees.
Someone is asking about page load time.
Yahoo says page load time and page size don't matter for the crawler, but slow, heavy pages are not good for users. Yahoo doesn't penalize pages that take a long time to load or that are large. Google agrees: there is a timeout, and they'll only use up a certain amount of bandwidth on a site. Google won't penalize for long load times or large size; Matt did a post about it. It's about sales, not traffic.
Someone is asking about multiple URLs pointing to the same site.
There will be a session about duplicate content after lunch.