Jun 18, 2024

The Google API Documentation Leak: 10 Insights for Digital Marketers

In a major development for the Search Engine Optimization (SEO) community, an extensive leak of Google’s internal documentation has provided unprecedented insights into the search giant’s ranking mechanisms. 

The leak, which included more than 2,500 pages of sensitive information, shed light on Google's ranking factors and processes, offering SEO specialists and managers a rare glimpse into how the dominant search engine really works.

The documents revealed several significant details, many of which contradict public statements Google has made to the SEO industry:

  • Google recognized early on the need for full clickstream data, which is the record of every website visited by a browser. This led to the creation of the Chrome browser to improve search result quality.
  • A system called “NavBoost” collects extensive user interaction data, such as click patterns and trending search terms, to refine search results.
  • Click and engagement data are used to evaluate site quality, significantly influencing rankings.
  • Google employs whitelists during critical events like the Covid-19 pandemic and democratic elections to control which websites appear prominently in search results.

This article aims to break down the implications of this leak for non-technical marketers by answering three key questions: 

  1. What is the Google API documentation leak?
  2. Why does it matter for marketers? 
  3. 10 things every marketer should know, and do

What is the Google API documentation leak?

The story was broken by Rand Fishkin (ex-CEO of Moz, co-founder of SparkToro and a recent DMI podcast guest), who received the initial documents and reported on them in his article here. 

An anonymous source, later revealed to be Erfan Azimi, contacted Fishkin to discuss documents that were accidentally made public on GitHub between March and May 2024. 

This leak exposed more than 2,500 pages of Google’s API documentation, which are technical manuals that describe how software components interact (you can read a technical breakdown of the leak here). These documents contained 14,014 attributes (specific features or variables) used in Google's internal “Content API Warehouse.” 

The documents do not reveal the "weights" applied to the attributes, meaning they do not specify the relative importance of each attribute within Google’s search algorithm. Nonetheless, this leak provides overdue clarity on the individual attributes, offering valuable insights for the marketing industry.

Screenshot of details of documents

Why does it matter for marketers?

It’s worth taking a step back to put the leak in its wider context. What is SEO?

Well, SEO is the practice of increasing traffic to websites from the non-paid listings on search engines. Every time someone searches on Google, which has 91% of the global search engine market, Google’s automated systems must “decide” which websites to show on the results page. 

There is also a huge global industry of SEO managers and professionals, all of whom spend their days working to be rewarded by Google with better ranking positions for their websites. Naturally, if these SEO professionals knew the components of Google’s decision-making system, it would help channel their energy more productively. 

Google has good reason to keep its ranking factors secret. Not only does it want to keep this sensitive information away from competitors, it wants to prevent some of the less ethical marketers from gaming the system. Google needs high-quality organic results to keep users engaged, all the better to serve paid listings against their future search queries. 

The tantalizing glimpse behind the curtain that this story provides has implications for every marketer. We have rounded up the ten most important findings below. 

10 things every marketer should know, and do

1. ‘Domain Authority’ exists - sort of

Google evaluates the overall authority of a website using a metric called "siteAuthority." This metric appears to assess the trustworthiness and credibility of a website, which can influence the rankings of its individual pages. A higher siteAuthority suggests that a website is more likely to rank well in search results, as it is seen as a reliable and authoritative source.

Actions: 

  • Build a credible site with high-quality, valuable content. 
  • Attract reputable backlinks from industry-leading websites. 
  • Ensure your site is technically sound, with a user-friendly design and regular updates. 
  • Use content marketing to establish your expertise and reliability, feeding into Google’s E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) quality guidelines, which can help determine where your content ranks on a search page.

2. User clicks influence rankings

Google uses click data, such as click-through rates (CTR) and user interactions, to adjust search rankings. This means that user behavior directly impacts how pages are ranked. When users frequently click on a result, it signals to Google that the page is relevant and useful, thereby potentially improving its ranking.

Actions: 

  • Optimize your page titles and meta descriptions to be compelling and relevant, encouraging users to click on your links. 
  • Create content that directly addresses user queries and provides immediate value. 
  • Monitor and improve your CTR and reduce bounce rates by ensuring your content is relevant, informative, and easy to navigate.
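Google's internal click metrics are not visible to site owners, but you can track your own CTR. As a minimal sketch, assuming you have page-level clicks and impressions exported from a tool such as Google Search Console, the Python below computes CTR (clicks ÷ impressions) and flags pages whose CTR falls below a threshold you choose. The `PageStats` type, the function names, the 2% threshold, the 100-impression minimum and the sample figures are all hypothetical; this illustrates your own reporting, not how Google scores clicks.

```python
from dataclasses import dataclass

# Hypothetical page-level stats, shaped like a Search Console performance export.
# The figures below are illustrative only.
@dataclass
class PageStats:
    page: str
    clicks: int
    impressions: int

def ctr(stats: PageStats) -> float:
    """Click-through rate = clicks / impressions (0.0 if the page had no impressions)."""
    if stats.impressions == 0:
        return 0.0
    return stats.clicks / stats.impressions

def low_ctr_pages(pages: list[PageStats], threshold: float, min_impressions: int = 100) -> list[str]:
    """Return pages with enough impressions whose CTR is below the chosen threshold,
    i.e. candidates for rewriting titles and meta descriptions.
    The threshold and min_impressions values are your own assumptions, not Google's."""
    return [
        p.page
        for p in pages
        if p.impressions >= min_impressions and ctr(p) < threshold
    ]

pages = [
    PageStats("/guide-to-seo", clicks=120, impressions=2000),  # 6.0% CTR
    PageStats("/pricing", clicks=15, impressions=1500),        # 1.0% CTR
    PageStats("/new-post", clicks=2, impressions=40),          # 5.0% CTR, but too few impressions to judge
]

for p in pages:
    print(f"{p.page}: {ctr(p):.1%}")
print(low_ctr_pages(pages, threshold=0.02))  # ['/pricing']
```

The minimum-impressions filter keeps low-traffic pages from being flagged on too little data; adjust both numbers to suit your site.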

3. There is a sandbox effect for new sites

New websites or those suspected of being spammy are temporarily placed in a "sandbox," limiting their visibility in search results until they establish credibility and reliability. This practice helps Google filter out low-quality or untrustworthy sites, helping to ensure that reputable sites are the ones prominently displayed.

Actions: 

  • Be patient with new sites!
  • Prioritize creating high-quality, authoritative content and focus on acquiring organic backlinks from reputable sites. 
  • Engage in ethical SEO practices and avoid shortcuts that could harm your site's reputation.

4. User-generated content can help rankings

Google uses various signals to evaluate content quality, including user engagement metrics, the effort behind user-generated content, and the quality of reviews. High-quality content that meets user needs is more likely to rank well, as it provides value and enhances the user experience.

Actions:

  • Encourage high-quality user-generated content like reviews and comments, and manage them to maintain relevance and quality. 
  • Invest in creating well-researched, informative content that addresses user needs. 
  • Regularly update and improve your content to keep it current and valuable.

5. Personalization is bigger than many marketers thought

Google personalizes search results based on individual user preferences and behaviors, tailoring the search engine results pages (SERPs) to provide the most relevant content for each user. This means that two users searching for the same term might see different results based on their past behavior and preferences.

Actions: 

  • Understand your audience’s preferences and behaviors. 
  • Use analytics and SEO reporting tools to track user behavior and tailor your content strategy. 
  • Segment your audience to deliver more targeted, relevant content, improving user satisfaction and engagement.

6. Machine learning is everywhere

Google uses machine learning models to rank content, continuously adapting its algorithms based on vast amounts of data and user behavior. These models help Google understand the context and relevance of content, making the search results more accurate and useful.

Actions: 

  • Ensure your content is both user-friendly and algorithm-friendly. 
  • Use structured data and semantic HTML to help Google’s machine learning models understand your content. 
  • Focus on high-quality, relevant content that meets user intent. 
  • Stay updated with SEO best practices and algorithm changes to adapt your strategy as needed.

7. Video really matters!

The leaked documents underscore the importance of video content for search rankings. Google uses an attribute called "isVideoFocusedSite" to determine whether a site primarily features video content. Sites where more than 50% of URLs contain video are classified as video-focused, which may boost their search presence. This reflects the growing inclusion of video results in SERPs across various industries.

Actions:

  • Transform your website by adding relevant video content. 
  • For small local businesses, create simple, informative videos about your services. 
  • Ensure your videos are self-hosted or on non-mainstream video platforms. 
  • Include geolocation metadata in your videos to improve local relevance.
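To see where your own site sits relative to the 50% figure, you could run a simple audit of your URLs. The sketch below is a hypothetical illustration, assuming you already know which URLs contain video (for example, from a site crawl). The function names and sample URLs are invented, and the plain URL-share calculation is our assumption: the leaked documents name the isVideoFocusedSite attribute but do not show how Google computes it.

```python
# Hypothetical self-audit of how video-led a site is.
# ASSUMPTION: the "over 50% of URLs" rule is modelled as a simple share of URLs
# that contain video; this is not Google's actual implementation.

def video_url_share(url_has_video: dict[str, bool]) -> float:
    """Fraction of URLs that contain video (0.0 for an empty URL list)."""
    if not url_has_video:
        return 0.0
    # True counts as 1 and False as 0 when summed.
    return sum(url_has_video.values()) / len(url_has_video)

def looks_video_focused(url_has_video: dict[str, bool], threshold: float = 0.5) -> bool:
    """True only if strictly more than `threshold` of URLs contain video."""
    return video_url_share(url_has_video) > threshold

site = {
    "/": True,
    "/services": True,
    "/about": False,
    "/contact": False,
}
print(f"{video_url_share(site):.0%} of URLs have video")  # 50% of URLs have video
print(looks_video_focused(site))  # False: exactly 50% is not *over* 50%
```

In this example, adding one more video page (or removing one non-video page) would push the site over the line.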

8. Google has lots of user engagement metrics

Metrics such as dwell time (how long a user stays on a page) and long clicks (when a user does not quickly return to the search results page) appear to matter for ranking. These metrics suggest that users find the content valuable and relevant, which can improve the page’s ranking.

Actions: 

  • Create content that provides immediate value and keeps users on your site (this includes content that excites and entertains). 
  • Use multimedia elements like videos and infographics to enhance user experience. 
  • Improve site speed and navigation to reduce bounce rates. 
  • Regularly update your content to maintain its relevance and engagement.

9. Google has deprecated some well-known ranking factors

Some ranking signals mentioned in the leaked documents, such as the "Link Juice" concept and PageRank sculpting, are no longer in use. This indicates that Google continuously updates its algorithms to improve search result accuracy and quality.

Actions: 

  • Stay informed about the latest SEO best practices and algorithm updates. 
  • Regularly audit your SEO strategy to ensure it aligns with current guidelines. 
  • Be flexible and ready to adapt your approach as Google’s algorithms evolve, focusing on sustainable, long-term SEO tactics.

10. Google might not always tell SEOs the truth…

The leak reveals discrepancies between Google’s public statements and its internal practices, highlighting a lack of transparency. This disparity can lead to misunderstandings and misaligned SEO strategies. For example, Google’s Gary Illyes has repeatedly stated that Google does not use clicks to affect rankings. As we saw in takeaway number two, this does not appear to be entirely true.


Some of these contradictions will be familiar to anyone who has been following the Google antitrust case in the US. For example, the Google VP of Search, Pandu Nayak, has testified about “NavBoost.” This system, which initially gathered data from the Google Toolbar, motivated the creation of the Chrome browser to collect more comprehensive clickstream data. This data, which tracks every website visited by users, is crucial for improving the quality of Google's search results. 

Actions: 

  • Conduct independent research and validation of SEO tactics. 
  • Rely on multiple, reputable sources for SEO insights and avoid solely depending on Google’s statements. 
  • Stay active in the SEO community to share knowledge and learn from others’ experiences, ensuring a well-rounded and informed approach to SEO.

Conclusion

It would be impossible to quantify the number of hours SEO professionals have spent debating the ranking factors of Google's top-secret algorithm. This leak offers us all a rare glimpse into the intricacies of Google's ranking algorithms, providing invaluable insights for SEO professionals and putting a few debates to bed. 

By understanding these factors, non-technical SEOs can refine their strategies to align more closely with Google's actual practices. Embracing these insights and actions can lead to more effective SEO practices, better website performance, and ultimately, greater success in the ever-evolving landscape of search engine optimization.

Use SEO to boost brand awareness & generate quality leads

To leverage Google’s algorithm, it’s important to understand the fundamentals of search marketing. Our certified Search Marketing Course developed in collaboration with Neil Patel will introduce you to search marketing and explore search analytics, search strategy, demand generation, data visualization and much more. Get started today to get your content and brand noticed! 


Clark Boyd

Clark Boyd is CEO and founder of marketing simulations company Novela. He is also a digital strategy consultant, author, and trainer. Over the last 12 years, he has devised and implemented international marketing strategies for brands including American Express, Adidas, and General Motors.

Today, Clark works with business schools at the University of Cambridge, Imperial College London, and Columbia University to design and deliver their executive-education courses on data analytics and digital marketing. 

Clark is a certified Google trainer and runs Google workshops across Europe and the Middle East. This year, he has delivered keynote speeches at leadership events in Latin America, Europe, and the US. You can find him on X (formerly Twitter), LinkedIn, and Slideshare. He writes regularly on Medium, and you can subscribe to his email newsletter, hi, tech.
