There’s No Such Thing as a Site Migration

Posted by jonoalderson

Websites, like the businesses who operate them, are often deceptively complicated machines.

They’re fragile systems, and changing or replacing any one of the parts can easily affect (or even break) the whole setup — often in ways not immediately obvious to stakeholders or developers.

Even seemingly simple sites are often powered by complex technology, like content management systems, databases, and templating engines. There’s much more going on behind the scenes — technically and organizationally — than you can easily observe by crawling a site or viewing the source code.

When you change a website and remove or add elements, it’s not uncommon to introduce new errors, flaws, or faults.

That’s why I get extremely nervous whenever I hear a client or business announce that they’re intending to undergo a "site migration."

Chances are, and experience suggests, that something’s going to go wrong.

Migrations vary wildly in scope

As an SEO consultant and practitioner, I've been involved in more "site migrations" than I can remember or count — for charities, startups, international e-commerce sites, and even global household brands. Every one has been uniquely challenging and stressful.

In each case, the businesses involved have underestimated (and in some cases, increased) the complexity, the risk, and the details involved in successfully executing their "migration."

As a result, many of these projects negatively impacted performance and potential in ways that could have been easily avoided.

This isn’t a case of the scope of the "migration" being too big, but rather, a misalignment of understanding, objectives, methods, and priorities, resulting in stakeholders working on entirely different scopes.

The migrations I’ve experienced have varied from simple domain transfers to complete overhauls of server infrastructure, content management frameworks, templates, and pages — sometimes even scaling up to include the consolidation (or fragmentation) of multiple websites and brands.

In the minds of each organization, however, these have all been "migration" projects despite their significantly varying (and poorly defined) scopes. In each case, the definition and understanding of the word "migration" has varied wildly.

We suck at definitions

As an industry, we’re used to struggling with labels. We’re still not sure if we’re SEOs, inbound marketers, digital marketers, or just… marketers. The problem is that, when we speak to each other (and those outside of our industry), these words can carry different meaning and expectations.

Even amongst ourselves, a conversation between two digital marketers, analysts, or SEOs about their fields of expertise is likely to reveal that they have surprisingly different definitions of their roles, responsibilities, and remits. To them, words like "content" or "platform" might mean different things.

In the same way, "site migrations" vary wildly, in form, function, and execution — and when we discuss them, we’re not necessarily talking about the same thing. If we don’t clarify our meanings and have shared definitions, we risk misunderstandings, errors, or even offense.

Ambiguity creates risk

Poorly managed migrations can have a number of consequences beyond just drops in rankings, traffic, and performance. There are secondary impacts, too. They can also inadvertently:

  • Provide a poor user experience (e.g., old URLs now 404, or error states are confusing to users, or a user reaches a page different from what they expected).
  • Break or omit tracking and/or analytics implementations, resulting in loss of business intelligence.
  • Limit the size, shape, or scalability of a site, resulting in static, stagnant, or inflexible templates and content (e.g., omitting the ability to add or edit pages, content, and/or sections in a CMS), and a site which struggles to compete as a result.
  • Miss opportunities to benefit from what SEOs do best: blending an understanding of consumer demand and behavior, the market and competitors, and the brand in question to create more effective strategies, functionality and content.
  • Create conflict between stakeholders, when we need to "hustle" at the last minute to retrofit our requirements into an already complex project (“I know it’s about to go live, but PLEASE can we add analytics conversion tracking?”) — often at the cost of our reputation.
  • Waste future resource, where mistakes require that future resource is spent recouping equity resulting from faults or omissions in the process, rather than building on and enhancing performance.

I should point out that there’s nothing wrong with hustle in this case; that, in fact, begging, borrowing, and stealing can often be a viable solution in these kinds of scenarios. There’s been more than one occasion when, late at night before a site migration, I’ve averted disaster by literally begging developers to include template review processes, to implement redirects, or to stall deployments.

But this isn’t a sensible or sustainable or reliable way of working.

Mistakes will inevitably be made. Resources, favors, and patience are finite. Too much reliance on "hustle" from individuals (or multiple individuals) may in fact further widen the gap in understanding and scope, and positions the hustler as a single point of failure.

More importantly, hustle may only fix the symptoms, not the cause of these issues. That means that we remain stuck in a role as the disruptive outsiders who constantly squeeze in extra unscoped requirements at the eleventh hour.

Where things go wrong

If we’re to begin to address some of these challenges, we need to understand when, where, and why migration projects go wrong.

The root cause of all less-than-perfect migrations can be traced to at least one of the following scenarios:

  • The migration project occurs without consultation.
  • Consultation is sought too late in the process, and/or after the migration.
  • There is insufficient planned resource/time/budget to add requirements (or processes)/make recommended changes to the brief.
  • The scope is changed mid-project, without consultation, or in a way which de-prioritizes requirements.
  • Requirements and/or recommended changes are axed at the eleventh hour (due to resource/time/budget limitations, or educational/political conflicts).

There’s a common theme in each of these cases. We’re not involved early enough in the process, or our opinions and priorities don’t carry sufficient weight to impact timelines and resources.

Chances are, these mistakes are rarely the product of spite or of intentional omission; rather, they’re born of gaps in the education and experience of the stakeholders and decision-makers involved.

We can address this, to a degree, by elevating ourselves to senior stakeholders in these kinds of projects, and by being consulted much earlier in the timeline.

Let’s be more specific

I think that it’s our responsibility to help the organizations we work for to avoid these mistakes. One of the easiest opportunities to do that is to make sure that we’re talking about the same thing, as early in the process as possible.

Otherwise, migrations will continue to go wrong, and we will continue to spend far too much of our collective time fixing broken links, recommending changes or improvements to templates, and holding together bruised-and-broken websites — all at the expense of doing meaningful, impactful work.

Perhaps we can begin to answer to some of these challenges by creating better definitions and helping to clarify exactly what’s involved in a "site migration" process.

Unfortunately, I suspect that we’re stuck with the word "migration," at least for now. It’s a term which is already widely used, which people think is a correct and appropriate definition. It’s unrealistic to try to change everybody else’s language when we’re already too late to the conversation.

Our next best opportunity to reduce ambiguity and risk is to codify the types of migration. This gives us a chance to prompt further exploration and better definitions.

For example, if we can say “This sounds like it’s actually a domain migration paired with a template migration,” we can steer the conversation a little and rely on a much better shared frame of reference.

If we can raise a challenge that, e.g., the "translation project" a different part of the business is working on is actually a whole bunch of interwoven migration types, then we can raise our concerns earlier and pursue more appropriate resource, budget, and authority (e.g., “This project actually consists of a series of migrations involving templates, content, and domains. Therefore, it’s imperative that we also consider X and Y as part of the project scope.”).

By persisting in labelling this way, stakeholders may gradually come to understand that, e.g., changing the design typically also involves changing the templates, and so the SEO folks should really be involved earlier in the process. By challenging the language, we can challenge the thinking.

Let’s codify migration types

I’ve identified at least seven distinct types of migration. Next time you encounter a "migration" project, you can investigate the proposed changes, map them back to these types, and flag any gaps in understanding, expectations, and resource.

You could argue that some of these aren’t strictly "migrations" in a technical sense (i.e., changing something isn’t the same as moving it), but grouping them this way is intentional.

Remember, our goal here isn’t to neatly categorize all of the requirements for any possible type of migration. There are plenty of resources, guides, and lists which already try do that.

Instead, we’re trying to provide neat, universal labels which help us (the SEO folks) and them (the business stakeholders) to have shared definitions and to remove unknown unknowns.

They’re a set of shared definitions which we can use to trigger early warning signals, and to help us better manage stakeholder expectations.

Feel free to suggest your own, to grow, shrink, combine, or bin any of these to fit your own experience and requirements!

1. Hosting migrations

A broad bundling of infrastructure, hardware, and server considerations (while these are each broad categories in their own right, it makes sense to bundle them together in this context).

If your migration project contains any of the following changes, you’re talking about a hosting migration, and you’ll need to explore the SEO implications (and development resource requirements) to make sure that changes to the underlying platform don’t impact front-end performance or visibility.

  • You’re changing hosting provider.
  • You’re changing, adding, or removing server locations.
  • You’re altering the specifications of your physical (or virtual) servers (e.g., RAM, CPU, storage, hardware types, etc).
  • You’re changing your server technology stack (e.g., moving from Apache to Nginx).*
  • You’re implementing or removing load balancing, mirroring, or extra server environments.
  • You’re implementing or altering caching systems (database, static page caches, varnish, object, memcached, etc).
  • You’re altering the physical or server security protocols and features.**
  • You’re changing, adding or removing CDNs.***

*Might overlap into a software migration if the changes affect the configuration or behavior of any front-end components (e.g., the CMS).

**Might overlap into other migrations, depending on how this manifests (e.g., template, software, domain).

***Might overlap into a domain migration if the CDN is presented as/on a distinct hostname (e.g., AWS), rather than invisibly (e.g., Cloudflare).

2. Software migrations

Unless your website is comprised of purely static HTML files, chances are that it’s running some kind of software to serve the right pages, behaviors, and content to users.

If your migration project contains any of the following changes, you’re talking about a software migration, and you’ll need to understand (and input into) how things like managing error codes, site functionality, and back-end behavior work.

  • You’re changing CMS.
  • You’re adding or removing plugins/modules/add-ons in your CMS.
  • You’re upgrading or downgrading the CMS, or plugins/modules/addons (by a significant degree/major release) .
  • You’re changing the language used to render the website (e.g., adopting Angular2 or NodeJS).
  • You’re developing new functionality on the website (forms, processes, widgets, tools).
  • You’re merging platforms; e.g., a blog which operated on a separate domain and system is being integrated into a single CMS.*

*Might overlap into a domain migration if you’re absorbing software which was previously located/accessed on a different domain.

3. Domain migrations

Domain migrations can be pleasantly straightforward if executed in isolation, but this is rarely the case. Changes to domains are often paired with (or the result of) other structural and functional changes.

If your migration project alters the URL(s) by which users are able to reach your website, contains any of the following changes, then you’re talking about a domain migration, and you need to consider how redirects, protocols (e.g., HTTP/S), hostnames (e.g., www/non-www), and branding are impacted.

  • You’re changing the main domain of your website.
  • You’re buying/adding new domains to your ecosystem.
  • You’re adding or removing subdomains (e.g., removing domain sharding following a migration to HTTP2).
  • You’re moving a website, or part of a website, between domains (e.g., moving a blog on a subdomain into a subfolder, or vice-versa).
  • You’re intentionally allowing an active domain to expire.
  • You’re purchasing an expired/dropped domain.

4. Template migrations

Chances are that your website uses a number of HTML templates, which control the structure, layout, and peripheral content of your pages. The logic which controls how your content looks, feels, and behaves (as well as the behavior of hidden/meta elements like descriptions or canonical URLs) tends to live here.

If your migration project alters elements like your internal navigation (e.g., the header or footer), elements in your <head>, or otherwise changes the page structure around your content in the ways I’ve outlined, then you’re talking about a template migration. You’ll need to consider how users and search engines perceive and engage with your pages, how context, relevance, and authority flow through internal linking structures, and how well-structured your HTML (and JS/CSS) code is.

  • You’re making changes to internal navigation.
  • You’re changing the layout and structure of important pages/templates (e.g., homepage, product pages).
  • You’re adding or removing template components (e.g., sidebars, interstitials).
  • You’re changing elements in your <head> code, like title, canonical, or hreflang tags.
  • You’re adding or removing specific templates (e.g., a template which shows all the blog posts by a specific author).
  • You’re changing the URL pattern used by one or more templates.
  • You’re making changes to how device-specific rendering works*

*Might involve domain, software, and/or hosting migrations, depending on implementation mechanics.

5. Content migrations

Your content is everything which attracts, engages with, and convinces users that you’re the best brand to answer their questions and meet their needs. That includes the words you use to describe your products and services, the things you talk about on your blog, and every image and video you produce or use.

If your migration project significantly changes the tone (including language, demographic targeting, etc), format, or quantity/quality of your content in the ways I’ve outlined, then you’re talking about a content migration. You’ll need to consider the needs of your market and audience, and how the words and media on your website answer to that — and how well it does so in comparison with your competitors.

  • You significantly increase or reduce the number of pages on your website.
  • You significantly change the tone, targeting, or focus of your content.
  • You begin to produce content on/about a new topic.
  • You translate and/or internationalize your content.*
  • You change the categorization, tagging, or other classification system on your blog or product content.**
  • You use tools like canonical tags, meta robots indexation directives, or robots.txt files to control how search engines (and other bots) access and attribute value to a content piece (individually or at scale).

*Might involve domain, software and/or hosting, and template migrations, depending on implementation mechanics.

**May overlap into a template migration if the layout and/or URL structure changes as a result.

6. Design migrations

The look and feel of your website doesn’t necessarily directly impact your performance (though user signals like engagement and trust certainly do). However, simple changes to design components can often have unintended knock-on effects and consequences.

If your migration project contains any of the following changes, you’re talking about a design migration, and you’ll need to clarify whether changes are purely cosmetic or whether they go deeper and impact other areas.

  • You’re changing the look and feel of key pages (like your homepage).*
  • You’re adding or removing interaction layers, e.g. conditionally hiding content based on device or state.*
  • You’re making design/creative changes which change the HTML (as opposed to just images or CSS files) of specific elements.*
  • You’re changing key messaging, like logos and brand slogans.
  • You’re altering the look and feel to react to changing strategies or monetization models (e.g., introducing space for ads in a sidebar, or removing ads in favor of using interstitial popups/states).
  • You’re changing images and media.**

*All template migrations.

**Don’t forget to 301 redirect these, unless you’re replacing like-for-like filenames (which isn’t always best practice if you wish to invalidate local or remote caches).

7. Strategy migrations

A change in organizational or marketing strategy might not directly impact the website, but a widening gap between a brand’s audience, objectives, and platform can have a significant impact on performance.

If your market or audience (or your understanding of it) changes significantly, or if your mission, your reputation, or the way in which you describe your products/services/purpose changes, then you’re talking about a strategy migration. You’ll need to consider how you structure your website, how you target your audiences, how you write content, and how you campaign (all of which might trigger a set of new migration projects!).

  • You change the company mission statement.
  • You change the website’s key objectives, goals, or metrics.
  • You enter a new marketplace (or leave one).
  • Your channel focus (and/or your audience’s) changes significantly.
  • A competitor disrupts the market and/or takes a significant amount of your market share.
  • Responsibility for the website/its performance/SEO/digital changes.
  • You appoint a new agency or team responsible for the website’s performance.
  • Senior/C-level stakeholders leave or join.
  • Changes in legal frameworks (e.g. privacy compliance or new/changing content restrictions in prescriptive sectors) constrain your publishing/content capabilities.

Let’s get in earlier

Armed with better definitions, we can begin to force a more considered conversation around what a "migration" project actually involves. We can use a shared language and ensure that stakeholders understand the risks and opportunities of the changes they intend to make.

Unfortunately, however, we don’t always hear about proposed changes until they’ve already been decided and signed off.

People don’t know that they need to tell us that they’re changing domain, templates, hosting, etc. So it’s often too late when — or if — we finally get involved. Decisions have already been made before they trickle down into our awareness.

That’s still a problem.

By the time you’re aware of a project, it’s usually too late to impact it.

While our new-and-improved definitions are a great starting place to catch risks as you encounter them, avoiding those risks altogether requires us to develop a much better understanding of how, where, and when migrations are planned, managed, and start to go wrong.

Let’s identify trigger points

I’ve identified four common scenarios which lead to organizations deciding to undergo a migration project.

If you can keep your ears to the ground and spot these types of events unfolding, you have an opportunity to give yourself permission to insert yourself into the conversation, and to interrogate to find out exactly which types of migrations might be looming.

It’s worth finding ways to get added to deployment lists and notifications, internal project management tools, and other systems so that you can look for early warning signs (without creating unnecessary overhead and comms processes).

1. Mergers, acquisitions, and closures

When brands are bought, sold, or merged, this almost universally triggers changes to their websites. These requirements are often dictated from on-high, and there’s limited (or no) opportunity to impact the brief.

Migration strategies in these situations are rarely comfortable, and almost always defensive by nature (focusing on minimizing impact/cost rather than capitalizing upon opportunity).

Typically, these kinds of scenarios manifest in a small number of ways:

  • The "parent" brand absorbs the website of the purchased brand into their own website; either by "bolting it on" to their existing architecture, moving it to a subdomain/folder, or by distributing salvageable content throughout their existing site and killing the old one (often triggering most, if not every type of migration).
  • The purchased brand website remains where it is, but undergoes a design migration and possibly template migrations to align it with the parent brand.
  • A brand website is retired and redirected (a domain migration).

2. Rebrands

All sorts of pressures and opportunities lead to rebranding activity. Pressures to remain relevant, to reposition within marketplaces, or change how the brand represents itself can trigger migration requirements — though these activities are often led by brand and creative teams who don’t necessarily understand the implications.

Often, the outcome of branding processes and initiatives creates new a or alternate understanding of markets and consumers, and/or creates new guidelines/collateral/creative which must be reflected on the website(s). Typically, this can result in:

  • Changes to core/target audiences, and the content or language/phrasing used to communicate with them (strategy and content migrations -—more if this involves, for example, opening up to international audiences).
  • New collateral, replacing or adding to existing media, content, and messaging (content and design migrations).
  • Changes to website structure and domain names (template and domain migrations) to align to new branding requirements.

3. C-level vision

It’s not uncommon for senior stakeholders to decide that the strategy to save a struggling business, to grow into new markets, or to make their mark on an organization is to launch a brand-new, shiny website.

These kinds of decisions often involve a scorched-earth approach, tearing down the work of their predecessors or of previously under-performing strategies. And the more senior the decision-maker, the less likely they’ll understand the implications of their decisions.

In these kinds of scenarios, your best opportunity to avert disaster is to watch for warning signs and to make yourself heard before it’s too late. In particular, you can watch out for:

  • Senior stakeholders with marketing, IT, or C-level responsibilities joining, leaving, or being replaced (in particular if in relation to poor performance).
  • Boards of directors, investors, or similar pressuring web/digital teams for unrealistic performance goals (based on current performance/constraints).
  • Gradual reduction in budget and resource for day-to-day management and improvements to the website (as a likely prelude to a big strategy migration).
  • New agencies being brought on board to optimize website performance, who’re hindered by the current framework/constraints.
  • The adoption of new martech and marketing automation software.*

*Integrations of solutions like SalesForce, Marketo, and similar sometimes rely on utilizing proxied subdomains, embedded forms/content, and other mechanics which will need careful consideration as part of a template migration.

4. Technical or financial necessity

The current website is in such a poor, restrictive, or cost-ineffective condition that it makes it impossible to adopt new-and-required improvements (such as compliance with new standards, an integration of new martech stacks, changes following a brand purchase/merger, etc).

Generally, like the kinds of C-level “new website” initiatives I’ve outlined above, these result in scorched earth solutions.

Particularly frustrating, these are the kinds of migration projects which you yourself may well argue and fight for, for years on end, only to then find that they’ve been scoped (and maybe even begun or completed) without your input or awareness.

Here are some danger signs to watch out for which might mean that your migration project is imminent (or, at least, definitely required):

  • Licensing costs for parts or the whole platform become cost-prohibitive (e.g., enterprise CMS platforms, user seats, developer training, etc).
  • The software or hardware skill set required to maintain the site becomes rarer or more expensive (e.g., outdated technologies).
  • Minor-but-urgent technical changes take more than six months to implement.
  • New technical implementations/integrations are agreed upon in principle, budgeted for, but not implemented.
  • The technical backlog of tasks grows faster than it shrinks as it fills with breakages and fixes rather than new features, initiatives, and improvements.
  • The website ecosystem doesn’t support the organization’s ways of working (e.g., the organization adopts agile methodologies, but the website only supports waterfall-style codebase releases).
  • Key technology which underpins the site is being deprecated, and there’s no easy upgrade path.*

*Will likely trigger hosting or software migrations.

Let’s not count on this

While this kind of labelling undoubtedly goes some way to helping us spot and better manage migrations, it’s far from a perfect or complete system.

In fact, I suspect it may be far too ambitious, and unrealistic in its aspiration. Accessing conversations early enough — and being listened to and empowered in those conversations — relies on the goodwill and openness of companies who aren’t always completely bought into or enamored with SEO.

This will only work in an organization which is open to this kind of thinking and internal challenging — and chances are, they’re not the kinds of organizations who are routinely breaking their websites. The very people who need our help and this kind of system are fundamentally unsuited to receive it.

I suspect, then, it might be impossible in many cases to make the kinds of changes required to shift behaviors and catch these problems earlier. In most organizations, at least.

Avoiding disasters resulting from ambiguous migration projects relies heavily on broad education. Everything else aside, people tend to change companies faster than you can build deep enough tribal knowledge.

That doesn’t mean that the structure isn’t still valuable, however. The types of changes and triggers I’ve outlined can still be used as alarm bells and direction for your own use.

Let’s get real

If you can’t effectively educate stakeholders on the complexities and impact of them making changes to their website, there are more "lightweight" solutions.

At the very least, you can turn these kinds of items (and expand with your own, and in more detail) into simple lists which can be printed off, laminated, and stuck to a wall. At the very least, perhaps you'll remind somebody to pick up the phone to the SEO team when they recognize an issue.

In a more pragmatic world, stakeholders don’t necessarily have to understand the nuance or the detail if they at least understand that they’re meant to ask for help when they’re changing domain, for example, or adding new templates to their website.

Whilst this doesn’t solve the underlying problems, it does provide a mechanism through which the damage can be systematically avoided or limited. You can identify problems earlier and be part of the conversation.

If it’s still too late and things do go wrong, you'll have something you can point to and say “I told you so,” or, more constructively perhaps, “Here’s the resource you need to avoid this happening next time.”

And in your moment of self-righteous vindication, having successfully made it through this post and now armed to save your company from a botched migration project, you can migrate over to the bar. Good work, you.


Thanks to…

This turned into a monster of a post, and its scope meant that it almost never made it to print. Thanks to a few folks in particular for helping me to shape, form, and ship it. In particular:

  • Hannah Thorpe, for help in exploring and structuring the initial concept.
  • Greg Mitchell, for a heavy dose of pragmatism in the conclusion.
  • Gerry White, for some insightful additions and the removal of dozens of typos.
  • Sam Simpson for putting up with me spending hours rambling and ranting at her about failed site migrations.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!

The State of Links: Yesterday’s Ranking Factor?

Posted by Tom.Capper

Back in September last year, I was lucky enough to see Rand speak at MozCon. His talk was about link building and the main types of strategy that he saw as still being relevant and effective today. During his introduction, he said something that really got me thinking, about how the whole purpose of links and PageRank had been to approximate traffic.

Source

Essentially, back in the late '90s, links were a much bigger part of how we experienced the web — think of hubs like Excite, AOL, and Yahoo. Google’s big innovation was to realize that, because people navigated the web by clicking on links, they could approximate the relative popularity of pages by looking at those links.

So many links, such little time.

Rand pointed out that, given all the information at their disposal in the present day — as an Internet Service Provider, a search engine, a browser, an operating system, and so on — Google could now far more accurately model whether a link drives traffic, so you shouldn’t aim to build links that don’t drive traffic. This is a pretty big step forward from the link-building tactics of old, but it occurred to me that it it probably doesn’t go far enough.

If Google has enough data to figure out which links are genuinely driving traffic, why bother with links at all? The whole point was to figure out which sites and pages were popular, and they can now answer that question directly. (It’s worth noting that there’s a dichotomy between “popular” and “trustworthy” that I don’t want to get too stuck into, but which isn’t too big a deal here given that both can be inferred from either link-based data sources, or from non-link-based data sources — for example, SERP click-through rate might correlate well with “trustworthy,” while “search volume” might correlate well with “popular”).

However, there’s plenty of evidence out there suggesting that Google is in fact still making significant use of links as a ranking factor, so I decided to set out to challenge the data on both sides of that argument. The end result of that research is this post.

The horse's mouth

One reasonably authoritative source on matters relating to Google is Google themselves. Google has been fairly unequivocal, even in recent times, that links are still a big deal. For example:

  • March 2016: Google Senior Search Quality Strategist Andrey Lipattsev confirms that content and links are the first and second greatest ranking factors. (The full quote is: “Yes; I can tell you what they [the number 1 and 2 ranking factors] are. It’s content, and links pointing to your site.”)
  • April 2014: Matt Cutts confirms that Google has tested search quality without links, and found it to be inferior.
  • October 2016: Gary Illyes implies that text links continue to be valuable while playing down the concept of Domain Authority.

Then, of course, there’s their continued focus on unnatural backlinks and so on — none of which would be necessary in a world where links are not a ranking factor.

However, I’d argue that this doesn’t indicate the end of our discussion before it’s even begun. Firstly, Google has a great track record of giving out dodgy SEO advice. Consider HTTPS migrations pre-2016. Will Critchlow talked at SearchLove San Diego about how Google’s algorithms are at a level of complexity and opaqueness where they’re no longer even trying to understand them themselves — and of course there are numerous stories of unintentional behaviors from machine learning algorithms out in the wild.

Third-party correlation studies

It’s not difficult to put together your own data and show a correlation between link-based metrics and rankings. Take, for example:

  • Moz’s most recent study in 2015, showing strong relationships between link-based factors and rankings across the board.
  • This more recent study by Stone Temple Consulting.

However, these studies fall into significant issues with correlation vs. causation.

There are three main mechanisms which could explain the relationships that they show:

  1. Getting more links causes sites to rank higher (yay!)
  2. Ranking higher causes sites to get more links
  3. Some third factor, such as brand awareness, is related to both links and rankings, causing them to be correlated with each other despite the absence of a direct causal relationship

I’ve yet to see any correlation study that addresses these very serious shortcomings, or even particularly acknowledges them. Indeed, I’m not sure that it would even be possible to do so given the available data, but this does show that as an industry we need to apply some critical thinking to the advice that we’re consuming.

However, earlier this year I did write up some research of my own here on the Moz Blog, demonstrating that brand awareness could in fact be a more useful factor than links for predicting rankings.

Source

The problem with this study was that it showed a relationship that was concrete (i.e. extremely statistically significant), but that was surprisingly lacking in explanatory power. Indeed, I discussed in that post how I’d ended up with a correlation that was far lower than Moz’s for Domain Authority.

Fortunately, Malcolm Slade recently discussed some of his very similar research at BrightonSEO, in which he finds similar broad correlations to myself between brand factors and rankings, but far, far stronger correlations for certain types of query, and especially big, high-volume, highly competitive head terms.

So what can we conclude overall from these third-party studies? Two main things:

  1. We should take with a large pinch of salt any study that does not address the possibilities of reverse causation, or a jointly-causing third factor.
  2. Links can add very little explanatory power to a rankings prediction model based on branded search volume, at least at a domain level.

The real world: Why do rankings change?

At the end of the day, we’re interested in whether links are a ranking factor because we’re interested in whether we should be trying to use them to improve the rankings of our sites, or our clients’ sites.

Fluctuation

The first example I want to look at here is this graph, showing UK rankings for the keyword “flowers” from May to December last year:

The fact is that our traditional understanding of ranking changes — which breaks down into links, on-site, and algorithm changes — cannot explain this degree of rapid fluctuation. If you don’t believe me, the above data is available publicly through platforms like SEMRush and Searchmetrics, so try to dig into it yourself and see if there’s any external explanation.

This level and frequency of fluctuation is increasingly common for hotly contested terms, and it shows a tendency by Google to continuously iterate and optimize — just as marketers do when they’re optimizing a paid search advert, or a landing page, or an email campaign.

What is Google optimizing for?

Source

The above slide is from Larry Kim’s presentation at SearchLove San Diego, and it shows how the highest SERP positions are gaining click-through rate over time, despite all the changes in Google Search (such as increased non-organic results) that ought to drive the opposite.

Larry’s suggestion is that this is a symptom of Google’s procedural optimization — not of the algorithm, but by the algorithm and of results. This certainly fits in with everything we’ve seen.

Successful link building

However, at the other end of the scale, we get examples like this:

Picture1.png

The above graph (courtesy of STAT) shows rankings for the commercial keywords for Fleximize.com during a Distilled creative campaign. This is a particularly interesting example for two reasons:

  • Fleximize started off as a domain with relatively little equity, meaning that changes were measurable, and that there were fairly easy gains to be made
  • Nothing happened with the first two pieces (1, 2), even though they scored high-quality coverage and were seemingly very comparable to the third (3).

It seems that links did eventually move the needle here, and massively so, but the mechanisms at work are highly opaque.

The above two examples — “Flowers” and Fleximize — are just two real-world examples of ranking changes. I’ve picked one that seems obviously link-driven but a little strange, and one that shows how volatile things are for more competitive terms. I’m sure there are countless massive folders out there full of case studies that show links moving rankings — but the point is that it can happen, yet it isn’t always as simple as it seems.

How do we explain all of this?

A lot of the evidence I’ve gone through above is contradictory. Links are correlated with rankings, and Google says they’re important, and sometimes they clearly move the needle, but on the other hand brand awareness seems to explain away most of their statistical usefulness, and Google’s operating with more subtle methods in the data-rich top end.

My favored explanation right now to explain how this fit together is this:

  • There are two tiers — probably fuzzily separated.
  • At the top end, user signals — and factors that Google’s algorithms associate with user signals — are everything. For competitive queries with lots of search volume, links don’t tell Google anything it couldn’t figure out anyway, and links don’t help with the final refinement of fine-grained ordering.
  • However, links may still be a big part of how you qualify for that competition in the top end.

This is very much a work in progress, however, and I’d love to see other people’s thoughts, and especially their fresh research. Let me know what you think in the comments below.


Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!

Half of Page-1 Google Results Are Now HTTPS

Posted by Dr-Pete

Just over 9 months ago, I wrote that 30% of page-1 Google results in our 10,000-keyword tracking set were secure (HTTPS). As of earlier this week, that number topped 50%:

While there haven't been any big jumps recently – suggesting this change is due to steady adoption of HTTPS and not a major algorithm update – the end result of a year of small changes is dramatic. More and more Google results are secure.

MozCast is, of course, just one data set, so I asked the folks at Rank Ranger, who operate a similar (but entirely different) tracking system, if they thought I was crazy...

Could we both be crazy? Absolutely. However, we operate completely independent systems with no shared data, so I think the consistency in these numbers suggests that we're not wildly off.

What about the future?

Projecting the fairly stable trend line forward, the data suggests that HTTPS could hit about 65% of page-1 results by the end of 2017. The trend line is, of course, an educated guess at best, and many events could change the adoption rate of HTTPS pages.

I've speculated previously that, as the adoption rate increased, Google would have more freedom to bump up the algorithmic (i.e. ranking) boost for HTTPS pages. I asked Gary Illyes if such a plan was in the works, and he said "no":

As with any Google statement, some of you will take this as gospel truth and some will take it as devilish lies. While he isn't promising that Google will never boost the ranking benefits of HTTPS, I believe Gary on this one. I think Google is happy with the current adoption rate and wary of the collateral damage that an aggressive HTTPS ranking boost (or penalty) could cause. It makes sense that they would bide their time..

Who hasn't converted?

One of the reasons Google may be proceeding with caution on another HTTPS boost (or penalty) is that not all of the big players have made the switch. Here are the Top 20 subdomains in the MozCast dataset, along with the percentage of ranking URLs that use HTTPS:

(1) en.wikipedia.org — 100.0%
(2) www. amazon.com — 99.9%
(3) www. facebook.com — 100.0%
(4) www. yelp.com — 99.7%
(5) www. youtube.com — 99.6%
(6) www. pinterest.com — 100.0%
(7) www. walmart.com — 100.0%
(8) www. tripadvisor.com — 99.7%
(9) www. webmd.com — 0.2%
(10) allrecipes.com — 0.0%
(11) www. target.com — 0.0%
(12) www. foodnetwork.com — 0.0%
(13) www. ebay.com — 0.0%
(14) play.google.com — 100.0%
(15) www. bestbuy.com — 0.0%
(16) www. mayoclinic.org — 0.0%
(17) www. homedepot.com — 0.0%
(18) www. indeed.com — 0.0%
(19) www. zillow.com — 100.0%
(20) shop.nordstrom.com – 0.0%

Of the Top 20, exactly half have switched to HTTPS, although most of the Top 10 have converted. Not surprisingly, switching is, with only minor exceptions, nearly all-or-none. Most sites naturally opt for a site-wide switch, at least after initial testing.

What should you do?

Even if Google doesn't turn up the reward or penalty for HTTPS, other changes are in play, such as Chrome warning visitors about non-secure pages when those pages collect sensitive data. As the adoption rate increases, you can expect pressure to switch to increase.

For new sites, I'd recommend jumping in as soon as possible. Security certificates are inexpensive these days (some are free), and the risks are low. For existing sites, it's a lot tougher. Any site-wide change carries risks, and there have certainly been a few horror stories this past year. At minimum, make sure to secure pages that collect sensitive information or process transactions, and keep your eyes open for more changes.


Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!

Why Net Neutrality Matters for SEO and Web Marketing – Whiteboard Friday

Posted by randfish

Net neutrality is a hot-button issue lately, and whether it's upheld or not could have real ramifications for the online marketing industry. In this Whiteboard Friday, Rand covers the potential consequences and fallout of losing net neutrality. Be sure to join the ensuing discussion in the comments!


Why net neutrality matter for SEO and web marketing

Click on the whiteboard image above to open a high-resolution version in a new tab!

Video Transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week, we're taking a departure from our usual SEO tactics and marketing tactics to talk for a minute about net neutrality. Net neutrality is actually something that is hugely critical and massively important to web marketers, especially those of us who help small and medium businesses, local businesses, and websites that aren't in the top 100 most popular sites and wealthiest sites on the web.

The reason that we're going to talk net neutrality is because, for the first time in a while, it's actually at high risk and there are some things that we might be able to do about it. By protecting net neutrality, especially here in the United States, although this is true all over the world, wherever you might be, we can actually help to preserve our jobs and our roles as marketers. I'll talk you through it in just a sec.

What is net neutrality?

So, to start off, you might be asking yourself, "Okay, Rand, I might have heard of net neutrality, but explain to me what it is." I'm going to give you a very basic introduction, and then I'll invite you to dig deeper into it.

But essentially, net neutrality is this idea that as a user of the Internet, through my Internet service provider — that might be through my cellphone network, that might be through my home Internet provider, through my Internet provider at work, these ISPs, a Verizon or a Comcast or a Cox or a T-Mobile or AT&T, those are all here in the United States and there are plenty of others overseas — you essentially can get access to the whole web equally, meaning that these ISPs are not regulating download speed based on someone paying them more or less or based on a website being favored by them or owned by them or invested in by them. Essentially, when you get access to the web, you get access to it equally. There's equality and neutrality for the entire Internet.

Non-neutrality

In a non-neutrality scenario, you can see my little fellow here is very unhappy, because his ISP is essentially regulating and saying, "Hey, if you want to pay us $50 a month, you can have access to Google, Facebook, Twitter, and MSN. Then if you want to pay a little bit more, $100 a month, you can get access to all these second-tier sites, and we'll let you visit those and use those services. If you want to pay us $150 a month, you can get access to all websites."

This is just one model of how a non-neutrality situation might work. There are a bunch of other ones. This is probably not the most realistic one, and it might be slightly hyperbolic, but the idea behind it is always the same — that essentially the ISP can work however they'd like. They are not bound by government rules and regulations requiring them to serve the entire web equally.

Now, if you're an ISP, you can imagine that this is a wonderful scenario. If I'm AT&T or I'm Verizon, I might be maxing out how much money I can make from consumers, and I'm constantly having to be competitive against other ISPs. But if I can do this, I can then have a bunch more vectors (a) to get money from all these different websites and web services, and (b) to charge consumers much more based on tiering their access.

So this is wonderful for me, which is why ISPs like Comcast and Verizon and AT&T and Cox and all these others have put a lot of money towards lobbyists to try and change the opinions of the federal government, and that's mostly, well, in the United States right now, it's the Republican administration and the folks in Congress and the Federal Communications Chair, who is Ajit Pai, recently selected by Trump as the new FCC Chair.

Why should marketers care?

Reasons that you should care about this as a web market are:

1. Equal footing for web access creates a more even playing field.

  • Greater competition. It allows websites to compete with each other without having to pay and without having to only serve different consumers who may be paying different rates to their ISPs.
  • It also means more players, because anyone can enter the field. Simply by registering a website and hosting it, you're now on an even playing field technically with everyone, with Google, with Facebook, with Microsoft, with Amazon. You get the same access, at least at the fundamental ISP level, with everyone else on the Internet. That means there's a lot more demand for competitive web marketing services, because there are many more businesses who need to compete and who can compete.
  • Also much less of an inherent advantage for these big, established, rich players. If you're a very, very rich website, you have tons of money, you have tons of resources, lots of influence, it's easy to say, "Hey, I'm not going to worry about this because I know I can always be in this lowest tier or whatever they're providing for free because I can pay the ISP, and I can influence government rules and regulations, and I can connect with all the different ISPs out there and make sure I'm always accessible." But for a small website, that's a nightmare scenario, incredibly hard, and it makes a huge competitive advantage by being big and established already, which means it's much tougher to get started.

2. The costs of getting started online are much lower under net neutrality.

Currently, if you register your website and you start doing your hosting:

  • You don't need to pay off any ISPs. You don't need to go approach Comcast or Verizon or AT&T or Cox or anybody like this and say, "Hey, we would like to get on to your high-speed, fastest-tier, best-access plan." You don't have to do that, because net neutrality, the law of the land means that you are automatically guaranteed that.
  • There's also no separate process. So it's not just the cost, it's also the work involved to go to these ISPs. There are several hundred ISPs with hundreds of thousands of active customers in the United States today. That number has generally been shrinking as that industry has been consolidating a little bit more. But still, that's a very, very challenging thing to have to do. If you are a big insurance provider, it's one thing to have someone go manage that task, but if you're a brand-new startup website, that's entirely another to try and do that.

3. The talent, the strategy, the quality of product and services and marketing that a new company, a new website has are going to create winners and losers in their field today versus this potential non-neutrality situation, where it's not quite a rigged system, but I'm calling it a rigged system a little bit because of this built-in advantage that you have for money and influence.

I think we would all generally agree that, in 2017, in late-stage capitalist societies, that, generally speaking, there's already a huge advantage by having a lot of money and influence. I'm not sure those with money and influence necessarily need another leg up on entrepreneurs and startups and folks who are trying to compete on the web.

What might happen?

Now, maybe you'll disagree, but I think that these together make a very compelling case scenario. Here's what might actually happen.

  • "Fast lanes" for some sites - Most observers of the space think that fast lanes would become a default. So fast lanes, meaning that certain tiers, certain parts of the web, certain web services and companies would get fast access. Others would be slower or potentially even disallowed on certain ISPs. That would create some real challenges.
  • Free vs. paid access by some ISPs - There would probably be some free and some paid services. You can see T-Mobile actually tried to do this recently, where they basically said, "Hey, if you are on a T-Mobile device, even if you're paying us the smallest amount, we're going to allow you to access these certain things." I think it was a video service for free. Essentially, that currently is against the rules.

You might say, "Rand, that seems unfair. Why shouldn't T-Mobile be able to offer some access to the web for free and then you just have to pay for the rest of it?" I hear you. I think unfortunately that's a bit of a red herring, because that particular implementation of a non-neutral situation is not that bad. It's not particularly bad for consumers. It's not particularly bad for businesses.

If T-Mobile just charged their normal rate, and then they happen to have this, "Oh, by the way, here you get this little portion of the web for free," no one's going to complain about that. It's not particularly terrible. But it does violate net neutrality, and it is a very slippery slope to a world like this, a very painful world for a lot of people. That's why we're willing to sort of take the sacrifices of saying, "Hey, we don't want to allow this because it violates the principle and the law of net neutrality."

  • Payoffs required for access or speed - Then the third thing that would almost certainly happen is that there would be payoffs. There would be payoffs on both sides. You have to pay more as a consumer, to your ISP, in order to access the web in the way that you do today, and as a website, in order to reach consumers who maybe can't afford or choose not to pay more, you have to pay off the ISP to get that full access.

What's the risk?

Why am I bringing this up now?

  • Higher than ever... Why is the risk so high? Well, it turns out the new American administration has basically come out against net neutrality in a very aggressive fashion.
  • The new FCC Chair, Ajit Pai, has been fighting this pretty hard. He's made a bunch of statements just in the last few days. He actually overturned some rulings from years past that asked smaller ISPs to be transparent about their net neutrality practices, and that's been overturned. He's arguing this basically under what I'm going to call a guise of free markets and competitive marketplaces. I think that is a total misnomer.

This creates a truly equal marketplace for everyone. While it is somewhat restrictive, I think one of the most interesting things to observe about this is that this is a non-political issue or at least not a very politicized issue for most of American voters. Actually, 81% of Democrats in a Gallup survey said that they support net neutrality, and an even greater percent of Republicans, 85%, said they support net neutrality.* So, really, you have virtually an overwhelming swath of voters in the United States who are saying this should be the law of the land.

The reason that this is generally being fought against by both Congress and the FCC is because these big ISPs have a lot of money, and they've paid a lot of lobbying dollars to try and influence politics. For those of you outside the United States, I know that sounds like it should be illegal. It's not in our country. I know it's illegal in most democracies, but it's sort of how democracy in the United States works.

*Editor's note: This poll was conducted by the University of Delaware.

What can we do?

If you want to take some action on this and fight back and tell your Congress person, your senator, your representatives locally and federally that you are against this, I would check out SaveTheInternet.com for those folks who are in the United States. For whatever country you're in, I would urge you to search for "support net neutrality" and check out the initiatives that may be available in your country or your geography locally so that you can take some action.

This is something that we've fought against as Internet users in the past and as businesses on the web before, and I think we're going to have to renew that fight in order to maintain the status quo and keep equal footing with each other. This will help us preserve our careers in web marketing, but it will also help preserve an open, free, competitive Internet. I think that's something we can all agree is very important.

All right. Thanks, everyone. Look forward to your comments. Certainly open to your critiques. Please try and keep them as kosher and as kind as you can. I know when it gets into political territory, it can be a little frustrating. And we will see you again next week for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com


Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!

Large Site SEO Basics: Faceted Navigation

Posted by sergeystefoglo

If you work on an enterprise site — particularly in e-commerce or listings (such as a job board site) — you probably use some sort of faceted navigation structure. Why wouldn’t you? It helps users filter down to their desired set of results fairly painlessly.

While helpful to users, it’s no secret that faceted navigation can be a nightmare for SEO. At Distilled, it’s not uncommon for us to get a client that has tens of millions of URLs that are live and indexable when they shouldn’t be. More often than not, this is due to their faceted nav setup.

There are a number of great posts out there that discuss what faceted navigation is and why it can be a problem for search engines, so I won’t go into much detail on this. A great place to start is this post from 2011.

What I want to focus on instead is narrowing this problem down to a simple question, and then provide the possible solutions to that question. The question we need to answer is, “What options do we have to decide what Google crawls/indexes, and what are their pros/cons?”

Brief overview of faceted navigation

As a quick refresher, we can define faceted navigation as any way to filter and/or sort results on a webpage by specific attributes that aren’t necessarily related. For example, the color, processor type, and screen resolution of a laptop. Here is an example:

Because every possible combination of facets is typically (at least one) unique URL, faceted navigation can create a few problems for SEO:

  1. It creates a lot of duplicate content, which is bad for various reasons.
  2. It eats up valuable crawl budget and can send Google incorrect signals.
  3. It dilutes link equity and passes equity to pages that we don’t even want indexed.

But first… some quick examples

It’s worth taking a few minutes and looking at some examples of faceted navigation that are probably hurting SEO. These are simple examples that illustrate how faceted navigation can (and usually does) become an issue.

Macy’s

First up, we have Macy’s. I’ve done a simple site:search for the domain and added “black dresses” as a keyword to see what would appear. At the time of writing this post, Macy’s has 1,991 products that fit under “black dresses” — so why are over 12,000 pages indexed for this keyword? The answer could have something to do with how their faceted navigation is set up. As SEOs, we can remedy this.

Home Depot

Let’s take Home Depot as another example. Again, doing a simple site:search we find 8,930 pages on left-hand/inswing front exterior doors. Is there a reason to have that many pages in the index targeting similar products? Probably not. The good news is this can be fixed with the proper combinations of tags (which we’ll explore below).

I’ll leave the examples at that. You can go on most large-scale e-commerce websites and find issues with their navigation. The points is, many large websites that use faceted navigation could be doing better for SEO purposes.

Faceted navigation solutions

When deciding a faceted navigation solution, you will have to decide what you want in the index, what can go, and then how to make that happen. Let’s take a look at what the options are.

"Noindex, follow"

Probably the first solution that comes to mind would be using noindex tags. A noindex tag is used for the sole purpose of letting bots know to not include a specific page in the index. So, if we just wanted to remove pages from the index, this solution would make a lot of sense.

The issue here is that while you can reduce the amount of duplicate content that’s in the index, you will still be wasting crawl budget on pages. Also, these pages are receiving link equity, which is a waste (since it doesn’t benefit any indexed page).

Example: If we wanted to include our page for “black dresses” in the index, but we didn’t want to have “black dresses under $100” in the index, adding a noindex tag to the latter would exclude it. However, bots would still be coming to the page (which wastes crawl budget), and the page(s) would still be receiving link equity (which would be a waste).

Canonicalization

Many sites approach this issue by using canonical tags. With a canonical tag, you can let Google know that in a collection of similar pages, you have a preferred version that should get credit. Since canonical tags were designed as a solution to duplicate content, it would seem that this is a reasonable solution. Additionally, link equity will be consolidated to the canonical page (the one you deem most important).

However, Google will still be wasting crawl budget on pages.

Example: /black-dresses?under-100/ would have the canonical URL set to /black-dresses/. In this instance, Google would give the canonical page the authority and link equity. Additionally, Google wouldn’t see the “under $100” page as a duplicate of the canonical version.

Disallow via robots.txt

Disallowing sections of the site (such as certain parameters) could be a great solution. It’s quick, easy, and is customizable. But, it does come with some downsides. Namely, link equity will be trapped and unable to move anywhere on your website (even if it’s coming from an external source). Another downside here is even if you tell Google not to visit a certain page (or section) on your site, Google can still index it.

Example: We could disallow *?under-100* in our robots.txt file. This would tell Google to not visit any page with that parameter. However, if there were any "follow" links pointing to any URL with that parameter in it, Google could still index it.

"Nofollow" internal links to undesirable facets

An option for solving the crawl budget issue is to "nofollow" all internal links to facets that aren’t important for bots to crawl. Unfortunately, "nofollow" tags don’t solve the issue entirely. Duplicate content can still be indexed, and link equity will still get trapped.

Example: If we didn’t want Google to visit any page that had two or more facets indexed, adding a "nofollow" tag to all internal links pointing to those pages would help us get there.

Avoiding the issue altogether

Obviously, if we could avoid this issue altogether, we should just do that. If you are currently in the process of building or rebuilding your navigation or website, I would highly recommend considering building your faceted navigation in a way that limits the URL being changed (this is commonly done with JavaScript). The reason is simple: it provides the ease of browsing and filtering products, while potentially only generating a single URL. However, this can go too far in the opposite direction — you will need to manually ensure that you have indexable landing pages for key facet combinations (e.g. black dresses).

Here’s a table outlining what I wrote above in a more digestible way.

Options:

Solves duplicate content?

Solves crawl budget?

Recycles link equity?

Passes equity from external links?

Allows internal link equity flow?

Other notes

“Noindex, follow”

Yes

No

No

Yes

Yes


Canonicalization

Yes

No

Yes

Yes

Yes

Can only be used on pages that are similar.

Robots.txt

Yes

Yes

No

No

No

Technically, pages that are blocked in robots.txt can still be indexed.

Nofollow internal links to undesirable facets

No

Yes

No

Yes

No


JavaScript setup

Yes

Yes

Yes

Yes

Yes

Requires more work to set up in most cases.

But what’s the ideal setup?

First off, it’s important to understand there is no “one-size-fits-all solution.” In order to get to your ideal setup, you will most likely need to use a combination of the above options. I’m going to highlight an example fix below that should work for most sites, but it’s important to understand that your solution might vary based on how your site is built, how your URLs are structured, etc.

Fortunately, we can break down how we get to an ideal solution by asking ourselves one question. “Do we care more about our crawl budget, or our link equity?” By answering this question, we're able to get closer to an ideal solution.

Consider this: You have a website that has a faceted navigation that allows the indexation and discovery of every single facet and facet combination. You aren’t concerned about link equity, but clearly Google is spending valuable time crawling millions of pages that don’t need to be crawled. What we care about in this scenario is crawl budget.

In this specific scenario, I would recommend the following solution.

  1. Category, subcategory, and sub-subcategory pages should remain discoverable and indexable. (e.g. /clothing/, /clothing/womens/, /clothing/womens/dresses/)
  2. For each category page, only allow versions with 1 facet selected to be indexed.
    1. On pages that have one or more facets selected, all facet links become “nofollow” links (e.g. /clothing/womens/dresses?color=black/)
    2. On pages that have two or more facets selected, a “noindex” tag is added as well (e.g. /clothing/womens/dresses?color=black?brand=express?/)
  3. Determine which facets could have an SEO benefit (for example, “color” and “brand”) and whitelist them. Essentially, throw them back in the index for SEO purposes.
  4. Ensure your canonical tags and rel=prev/next tags are setup appropriately.

This solution will (in time) start to solve our issues with unnecessary pages being in the index due to the navigation of the site. Also, notice how in this scenario we used a combination of the possible solutions. We used “nofollow,” “noindex, nofollow,” and proper canonicalization to achieve a more desirable result.

Other things to consider

There are many more variables to consider on this topic — I want to address two that I believe are the most important.

Breadcrumbs (and markup) helps a lot

If you don't have breadcrumbs on each category/subcategory page on your website, you’re doing yourself a disservice. Please go implement them! Furthermore, if you have breadcrumbs on your website but aren’t marking them up with microdata, you’re missing out on a huge win.

The reason why is simple: You have a complicated site navigation, and bots that visit your site might not be reading the hierarchy correctly. By adding accurate breadcrumbs (and marking them up), we’re effectively telling Google, “Hey, I know this navigation is confusing, but please consider crawling our site in this manner.”

Enforcing a URL order for facet combinations

In extreme situations, you can come across a site that has a unique URL for every facet combination. For example, if you are on a laptop page and choose “red” and “SSD” (in that order) from the filters, the URL could be /laptops?color=red?SSD/. Now imagine if you chose the filters in the opposite order (first “SSD” then “red”) and the URL that’s generated is /laptops?SSD?color=red/.

This is really bad because it exponentially increases the amount of URLs you have. Avoid this by enforcing a specific order for URLs!

Conclusions

My hope is that you feel more equipped (and have some ideas) on how to tackle controlling your faceted navigation in a way that benefits your search presence.

To summarize, here are the main takeaways:

  1. Faceted navigation can be great for users, but is usually setup in a way that negatively impacts SEO.
  2. There are many reasons why faceted navigation can negatively impact SEO, but the top three are:
    1. Duplicate content
    2. Crawl budget being wasted
    3. Link equity not being used as effectively as it should be
  3. Boiled down further, the question we want to answer to begin approaching a solution is, “What are the ways we can control what Google crawls and indexes?”
  4. When it comes to a solution, there is no “one-size-fits-all” solution. There are numerous fixes (and combinations) that can be used. Most commonly:
    1. Noindex, follow
    2. Canonicalization
    3. Robots.txt
    4. Nofollow internal links to undesirable facets
    5. Avoiding the problem with an AJAX/JavaScript solution
  5. When trying to think of an ideal solution, the most important question you can ask yourself is, “What’s more important to our website: link equity, or crawl budget?” This can help focus your possible solutions.

I would love to hear any example setups. What have you found that’s worked well? Anything you’ve tried that has impacted your site negatively? Let’s discuss in the comments or feel free to shoot me a tweet.


Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!