Do you have your coffee? Because we’re going to talk about Caffeine!
In Google’s most recent Search Off the Record podcast episode, Gary Illyes, Google’s Webmaster Trends Analyst, shared how Google uses Caffeine to index webpages in search.
What is Google Caffeine?
Google Caffeine is Google’s indexing system. It acts as the bridge between Google’s crawler (Googlebot) and your website. While it has a multitude of functions, its main purpose is to read your website and turn it into a uniform HTML format, which it then indexes.
Think of it like the translators at the United Nations. Say the delegate from Denmark was speaking on stage to an audience from Thailand. The translator (Caffeine) would be turning the Danish language into Thai so that communication between the two parties can happen.
How does Google Caffeine work?
The first step is for Googlebot to pick up the information on your website and produce a protocol buffer.
What is a protocol buffer?
A protocol buffer, developed by Google, is a method of serializing structured data into a normalized, compact format. Here, it packages what Googlebot fetched from your page into one consistent structure that Caffeine can read. This is done to streamline indexing. A protocol buffer doesn’t make any changes to a website. It simply records and passes along what was fetched. Google describes protocol buffers as “language-neutral” and “platform-neutral”.
After Googlebot produces the protocol buffer, Caffeine will pick up the HTML and start to process it. “Processing” here means that Caffeine reads through the HTML, which is why it’s important to have clean, functioning HTML on your website.
As it reads through the HTML, it will begin working through the structure that you’ve worked into your website – namely your header tags.
Header tags create structure. As Illyes notes: “We try to understand the styling that was applied on the h tags, so we can determine the relative importance of the h tags compared to each other.” That is why it’s important to use them appropriately. If you build a page made entirely of H4 tags, Google, and Caffeine, will read everything on that page as equally important.
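To see what your own header tag structure looks like to a parser, here is a minimal Python sketch that lists every h1 to h6 tag on a page and flags places where the outline skips a level. It is only an illustration: the names HeadingOutline, heading_outline, and skipped_levels are made up for this example, and it looks at tag levels only, not at the styling Illyes mentions. It is not how Caffeine works internally.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingOutline(HTMLParser):
    """Collect (level, text) pairs for every h1-h6 element on a page."""

    def __init__(self):
        super().__init__()
        self.headings = []   # list of (level, text) in document order
        self._level = None   # level of the heading currently open, if any
        self._buffer = []    # text fragments of the heading currently open

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._buffer = []

    def handle_data(self, data):
        if self._level is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            self.headings.append((self._level, "".join(self._buffer).strip()))
            self._level = None


def heading_outline(html):
    """Return the page's headings as a list of (level, text) tuples."""
    parser = HeadingOutline()
    parser.feed(html)
    parser.close()
    return parser.headings


def skipped_levels(headings):
    """Return headings that sit more than one level deeper than the one before."""
    problems = []
    previous = 0
    for level, text in headings:
        if level > previous + 1:
            problems.append((level, text))
        previous = level
    return problems


page = "<h1>Guide</h1><p>Intro</p><h2>Setup</h2><h4>Details</h4>"
outline = heading_outline(page)
print(outline)                  # [(1, 'Guide'), (2, 'Setup'), (4, 'Details')]
print(skipped_levels(outline))  # [(4, 'Details')]
```

A page built entirely of H4 tags would show up here as a flat list of level-4 entries, which is a quick way to spot the problem described above.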
Can Google index PDFs?
Yes! Illyes revealed that Google can index a variety of formats including PDFs, spreadsheets, Word documents, and more. Caffeine translates these file types into HTML.
It seems like, to streamline the process and increase the rate of indexability, it may make sense to publish content in a text-based non-PDF format and also offer a downloadable PDF version for users. Illyes acknowledged that PDF, as a binary format, is not easy to process.
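Do robots.txt files matter to Google Caffeine?
Yes! Illyes remarked that robots.txt files are something that “we deeply care about.” Separately, if Caffeine finds a “noindex” directive (typically set in a robots meta tag or an X-Robots-Tag HTTP header rather than in robots.txt itself), it will automatically stop reading the page and won’t index it.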
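If you want to check both signals on your own pages, the sketch below uses Python's standard library. The function names is_crawl_allowed and has_noindex are made up for this example, and the rules and URLs are placeholders. Python's robots.txt parser does not match rules exactly the way Googlebot does, so treat the result as a rough check rather than Google's verdict.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_txt, url, user_agent="Googlebot"):
    """Check a URL against robots.txt rules passed in as text (no network call)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in {"robots", "googlebot"}:
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html):
    """Return True if the page carries a noindex directive in a robots meta tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives


# Placeholder rules and URLs, for illustration only.
rules = "User-agent: *\nDisallow: /private/\n"
print(is_crawl_allowed(rules, "https://example.com/blog/post.html"))     # True
print(is_crawl_allowed(rules, "https://example.com/private/page.html"))  # False
print(has_noindex('<head><meta name="robots" content="noindex, nofollow"></head>'))  # True
```

Note that this only inspects the HTML. A noindex sent through the X-Robots-Tag HTTP header would have to be checked on the server response instead.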
Does HTML in the <head> section affect indexing?
Yes again! Illyes revealed that when the HTML reader finds tags in the <head> section that don’t belong there, it will “close the head, right before those tags, and starts the body from there on.” Keeping only valid elements in the <head>, and using an appropriate header tag structure, increases indexability.
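As a rough way to spot this on your own pages, the Python sketch below reports the first tag inside <head> that is not one of the metadata elements the HTML standard allows there. The names HeadChecker and first_head_breaker are hypothetical, and this is a simplified check written for this article, not a reproduction of Google's HTML reader.

```python
from html.parser import HTMLParser

# Elements the HTML standard allows inside <head>.
ALLOWED_IN_HEAD = {
    "title", "meta", "link", "style", "script", "base", "noscript", "template",
}


class HeadChecker(HTMLParser):
    """Find the first tag inside <head> that does not belong there."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.breaker = None  # first tag that would force the head to close early

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif self.in_head and tag not in ALLOWED_IN_HEAD and self.breaker is None:
            self.breaker = tag
            # Mirror the behavior described above: the head ends here.
            self.in_head = False

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def first_head_breaker(html):
    """Return the name of the first out-of-place tag in <head>, or None."""
    checker = HeadChecker()
    checker.feed(html)
    checker.close()
    return checker.breaker


broken = (
    "<html><head><title>Shop</title><div>Banner</div>"
    '<meta name="description" content="Our shop"></head><body></body></html>'
)
print(first_head_breaker(broken))  # div
```

In the example, the meta description comes after the stray div, so a reader that closes the head at the div would treat that meta tag as part of the body.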
How can you make Google crawl your site faster?
Google is relatively transparent about what works and what doesn’t. This latest podcast highlights the need for using header tags and HTML appropriately, favoring on-page content over PDFs, not misusing “noindex” codes, and being mindful of creating helpful content.
How do you know if Google crawled your site?
This information is available in Google Search Console via the URL Inspection tool. This nifty tool will also let you know if your page was indexed. If it isn’t, it’ll also tell you why.
Other takeaways from Gary Illyes:
Do meta keywords matter?
Nope! Google does not care about meta keywords! As Illyes says: “We don’t care about the meta keywords at all. At all.”
What are meta keywords?
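Meta keywords are keywords listed in a meta tag (<meta name="keywords" content="...">) in the head of a page’s HTML. They are not visible to visitors on the page itself.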
Do Out-of-Stock pages affect SEO?
Yes, Illyes revealed that they can impact indexability. Either remove the page from your website or edit it to include a “subscribe for updates” option. If you have too many on-page “error” notifications, you run the risk of Google reading the page as a soft 404. Illyes hinted that staying helpful to users, and not being misleading, is what’s important.
What is a soft 404 page?
A soft 404 page is a page that returns a 2xx status code but that Google thinks should be an error page. Illyes noted that it is a Google error. He shared, for example, that if “you are writing an article about error pages in general, and you can’t… get it indexed…. That’s sometimes because our error page handling systems misdetect your article, based on the keywords that you use.”
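To make the idea concrete, here is a toy Python sketch of keyword-based soft 404 detection. Google has not published how its detection works, so the phrase list, the threshold, and the function name looks_like_soft_404 are all assumptions made up for this example. It mainly shows why an honest article about error pages could be flagged by mistake.

```python
# Hypothetical phrases that might suggest an error page. Not Google's list.
ERROR_PHRASES = ("page not found", "no longer available", "out of stock", "404")


def looks_like_soft_404(status_code, body_text, min_hits=2):
    """Guess whether a page that returned a success code reads like an error page."""
    if not 200 <= status_code < 300:
        return False  # a real error status is not a *soft* 404
    text = body_text.lower()
    hits = sum(1 for phrase in ERROR_PHRASES if phrase in text)
    return hits >= min_hits


# A product page that returns 200 but only shows error-style messages.
print(looks_like_soft_404(200, "Sorry, page not found. This item is no longer available."))  # True

# The same text served with a real 404 status is not a soft 404.
print(looks_like_soft_404(404, "Sorry, page not found. This item is no longer available."))  # False

# An article about error pages trips the same keywords: a false positive.
print(looks_like_soft_404(200, "A guide to 404 errors and the classic Page Not Found message"))  # True
```

The last call is the situation Illyes describes: the page is perfectly valid, but the words it uses look like an error page to a keyword-based check.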
Ready to keep reading? Learn how Google uses backlinks to determine rankings in search results.