I am generating the sitemap with PHP because I want to control which URLs get indexed, but I have a question: I didn't use friendly URLs before, and now I do.
For example, I have this structure:
<url>
<loc>http://www.miweb.com/contacto.php</loc>
<priority>0.8</priority>
</url>
Should I change it to this one?
<url>
<loc>http://www.miweb.com/contacto/</loc>
<priority>0.8</priority>
</url>
That is, change contacto.php to /contacto/. I was planning to switch everything over to friendly URLs, since I understand that will make it easier for Google's robots to understand what each URL contains.
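Because the sitemap is generated by a script, switching to friendly URLs only changes how each <loc> value is built. The following is a minimal sketch in Python rather than PHP. It assumes a hypothetical list of page slugs (here just contacto) and the site root http://www.miweb.com, and it emits one <url> entry per page in the /slug/ form. It illustrates the idea only and is not the asker's actual generator.

```python
import xml.etree.ElementTree as ET

# Hypothetical site root and page list; replace with your own data source.
BASE_URL = "http://www.miweb.com"
PAGES = [
    ("contacto", "0.8"),  # (slug, priority)
]

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url, pages):
    """Return the sitemap XML as a string, using friendly /slug/ URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for slug, priority in pages:
        url = ET.SubElement(urlset, "url")
        # Friendly form: /contacto/ instead of /contacto.php
        ET.SubElement(url, "loc").text = f"{base_url}/{slug}/"
        ET.SubElement(url, "priority").text = priority
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap(BASE_URL, PAGES))
```

If you need the <?xml ...?> declaration at the top of the file, you can build the same tree and save it with ElementTree.write(..., xml_declaration=True) instead of returning a string.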
This is a misconception. Friendly URLs are friendly to humans. Google's bots don't care whether a URL is human-readable and "see no difference" between /contacto.php and /contacto/. If the meta information on the landing page itself is correct, search engines won't care which URL you use in the sitemap*; all they want is for you to link to pages that exist so they can be indexed.
If a page can be reached through several different URLs, what you need to do is specify which of them is the canonical one. That way, no matter which version of the URL Googlebot visits, the page will be indexed under the URL specified as canonical.
For example, if you have both /contacto.php and /contacto/ but want Google to index the page as /contacto/, add the following code to your page header:
<link rel="canonical" href="http://www.miweb.com/contacto/" />
This way, when your website is crawled, Google will index the page as /contacto/ regardless of the URL you have in the sitemap. In this Google article you can find more information about how it works.
*If a canonical URL is not specified, the search engine may index the page under the URL linked in the sitemap. In fact, the page linked above recommends, in its section about sitemaps, listing the URL you intend to use as canonical. But there is no guarantee that they will use that URL.
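Because every variant of the page should carry the same canonical tag, the tag can be derived from whatever path was requested. The following is a minimal sketch in Python. It assumes a hypothetical rule where a trailing .php is dropped and a trailing slash is added, so /contacto.php, /contacto and /contacto/ all map to /contacto/. canonical_path and canonical_link_tag are illustrative names, not part of any library. In PHP, the page header would echo the same string.

```python
from html import escape

# Hypothetical site root, reused from the example above.
BASE_URL = "http://www.miweb.com"


def canonical_path(path):
    """Hypothetical rule: drop a trailing .php and ensure a trailing slash."""
    if path.endswith(".php"):
        path = path[: -len(".php")]
    return path.rstrip("/") + "/"


def canonical_link_tag(path, base_url=BASE_URL):
    """Return the <link rel="canonical"> tag to place in the page's <head>."""
    href = base_url + canonical_path(path)
    return f'<link rel="canonical" href="{escape(href, quote=True)}" />'


if __name__ == "__main__":
    # All three variants produce the same tag.
    for p in ("/contacto.php", "/contacto", "/contacto/"):
        print(canonical_link_tag(p))
```

The rule is deliberately simplistic. A real site would more likely map legacy paths explicitly, for example so that /index.php becomes / and not /index/.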
In addition to what the other answer already covered, note that the sitemap gives you no control whatsoever over what Google indexes or stops indexing on a site, because Google crawls URLs whether or not they are in the sitemap unless told otherwise. If you want to keep certain pages from being crawled and manage your crawl budget, do it through robots.txt.
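Python's standard urllib.robotparser applies robots.txt rules the way a well-behaved crawler does, so it is a convenient way to check a rule before publishing it. The robots.txt content and the /privado/ section below are hypothetical. Keep in mind that Disallow controls crawling. A blocked URL that is linked from elsewhere can still end up in the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep every crawler out of a private section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /privado/
"""


def can_crawl(url, user_agent="Googlebot", robots_txt=ROBOTS_TXT):
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(can_crawl("http://www.miweb.com/contacto/"))
    print(can_crawl("http://www.miweb.com/privado/datos"))
```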