Well-Known Files

This article gives a brief overview of the best-known files on a web server. For some of these files, their advantages and disadvantages are also listed. For particularly critical files, the article also explains how to handle their contents safely.

/ads.txt

This file is used in connection with the sale of digital advertising. Authorized Digital Sellers for Web (ADS) aims to improve transparency in automated (programmatic) online advertising. The ads.txt file lists the companies that are authorized to sell advertising space on the website. Its content can be generated with tools such as Google Ad Manager.
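The following minimal Python sketch parses such a file, which shows its structure more concretely. It assumes the record format of the IAB Tech Lab ads.txt specification as I understand it: one seller per line with comma-separated fields (domain of the advertising system, publisher account ID, relationship DIRECT or RESELLER, optional certification authority ID), `#` for comments, and `variable=value` lines such as `contact=`. The function name `parse_ads_txt` and the sample entries are hypothetical and for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class AdsRecord:
    ad_system_domain: str                    # domain of the advertising system
    publisher_id: str                        # the publisher's account ID in that system
    relationship: str                        # DIRECT or RESELLER
    cert_authority_id: Optional[str] = None  # optional fourth field


def parse_ads_txt(text: str) -> Tuple[List[AdsRecord], Dict[str, List[str]]]:
    """Split an ads.txt body into seller records and variable declarations."""
    records: List[AdsRecord] = []
    variables: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # everything after '#' is a comment
        if not line:
            continue
        if "," in line:
            fields = [field.strip() for field in line.split(",")]
            if len(fields) < 3 or fields[2].upper() not in ("DIRECT", "RESELLER"):
                continue  # malformed record: skip it instead of guessing
            cert = fields[3] if len(fields) > 3 and fields[3] else None
            records.append(AdsRecord(fields[0].lower(), fields[1], fields[2].upper(), cert))
        elif "=" in line:
            name, _, value = line.partition("=")
            variables.setdefault(name.strip().lower(), []).append(value.strip())
    return records, variables


SAMPLE = """\
# ads.txt for example.com (illustrative values)
google.com, pub-0000000000000000, DIRECT, f08c47fec0942fa0
adnetwork.example, 12345, RESELLER
contact=ads@example.com
"""

records, variables = parse_ads_txt(SAMPLE)
for record in records:
    print(record)
print(variables)
```

The sketch skips malformed records rather than guessing at their meaning. Advertising systems that check ads.txt generally treat unlisted sellers as unauthorized, so silently dropped lines are worth logging in real use.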

/humans.txt

The idea behind this file was to name the authors and other contributors to an online project. However, the file never caught on. It does exist on google.com, for example, but its content is meaningless.

The website of the "inventors" of this file (humanstxt.org) has an expired certificate, and their own humans.txt file is a treasure trove for spammers and hackers: besides names, email addresses, and places of residence, it shows that the file, or the information in it, was last updated in 2012.

Advantages:
Disadvantages:

/robots.txt

The file "robots.txt" is located in the root directory of a domain and contains instructions for bots such as search engine crawlers, bots of SEO or statistics services, AI crawlers, etc. It defines which files and directories bots may or may not crawl. Depending on the bot, additional settings such as a delay between requests (Crawl-delay) can also be specified.

However, the rules in the robots.txt file should only be regarded as guidelines. Each bot interprets the entries and paths in its own way, and not every bot supports every directive. Dubious bots in particular ignore these rules altogether. Attackers can also read the file to discover paths, such as login or admin areas, that the operator would prefer to keep out of view.
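Python's standard library includes a robots.txt parser, `urllib.robotparser`, which shows how one particular client interprets a file. The sketch below feeds it inline example rules instead of downloading a live file. The rules and the bot names `SomeCrawler` and `ExampleBot` are made up. This parser is only one interpretation: as far as I can tell, it applies the first matching rule within a group, whereas RFC 9309 prefers the most specific (longest) match. Its results can therefore differ from those of a search engine crawler.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules; a real crawler would download https://<domain>/robots.txt
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 10

User-agent: ExampleBot
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
print(parser.can_fetch("SomeCrawler", "https://example.com/admin/login"))  # False
print(parser.can_fetch("SomeCrawler", "https://example.com/blog/"))        # True
print(parser.can_fetch("ExampleBot", "https://example.com/blog/"))         # False
print(parser.crawl_delay("SomeCrawler"))                                   # 10
```

The check happens entirely on the client side. Only a bot that actually runs such a check respects the file; nothing on the server enforces it.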

Advantages:
Disadvantages:
Conclusion:

Useful for instructing the crawlers of well-known search engines to include or exclude pages or directories.

Further information can be found at https://www.robotstxt.org and https://datatracker.ietf.org/doc/html/rfc9309.

/sitemap.xml and /sitemap.txt

Sitemaps are intended to inform search engines about the URLs on a website that are available for crawling. The Sitemap protocol allows webmasters to add further information to each URL, such as when it was last updated, how often it changes, and how important it is relative to other URLs on the website. This helps search engines crawl the website more efficiently and find URLs that may be isolated from the rest of the website's content.

In addition to the XML format, the Sitemap protocol also allows a text file containing a simple list of URLs, one URL per line. In both formats, however, each URL must be absolute, including the scheme (protocol) and domain. Relative paths are not allowed.
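The following Python sketch produces both variants for a hypothetical site at example.com and rejects relative paths, as the protocol requires. It assumes the standard Sitemap namespace `http://www.sitemaps.org/schemas/sitemap/0.9`. The optional `lastmod` value is passed through unchanged and would need to be in W3C Datetime format (e.g. `2024-01-15`). The function names are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional, Tuple
from urllib.parse import urlsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def require_absolute(url: str) -> str:
    """Reject relative paths such as '/about'; the protocol needs scheme and host."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"sitemap URLs must be absolute: {url!r}")
    return url


def build_sitemap_txt(urls: Iterable[str]) -> str:
    """Text variant: one absolute URL per line."""
    return "".join(require_absolute(url) + "\n" for url in urls)


def build_sitemap_xml(entries: Iterable[Tuple[str, Optional[str]]]) -> str:
    """XML variant: entries are (loc, lastmod) pairs; lastmod may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url_element = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as '&' automatically
        ET.SubElement(url_element, "loc").text = require_absolute(loc)
        if lastmod:
            ET.SubElement(url_element, "lastmod").text = lastmod
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


print(build_sitemap_txt(["https://example.com/", "https://example.com/about"]), end="")
print(build_sitemap_xml([("https://example.com/", "2024-01-15"),
                         ("https://example.com/about", None)]))
```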

/.well-known/

This directory can contain many different files for various purposes. They can be registered with IANA and standardized, or chosen freely for your own projects. The files are used by specific services, projects, or apps; some serve security technologies or the exchange of specific information. The data is usually plain text, either unstructured or in a specific format such as JSON or XML.

Besides the information specific to the respective service, these files may also contain URLs, email addresses, or other data of interest to attackers.

/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/

This path is used to check whether the web server returns correct status codes. A request to it must never be answered with status 200 (OK); a correctly configured server responds with an error status such as 404 (Not Found). This resource should therefore never exist.
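Such a check can be sketched in Python using only the standard library. `fetch_status` sends a HEAD request. Because `urllib` raises `HTTPError` for 4xx and 5xx responses, the status code is taken from the exception in that case. Redirects are followed automatically, so a redirect to a page that returns 200 (for example the home page) is also flagged. The origin `https://example.com` and the function names are placeholders. Some servers answer HEAD differently from GET, so treat the result as an indication only.

```python
import urllib.error
import urllib.request

PROBE_PATH = "/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/"


def probe_url(origin: str) -> str:
    """Build the probe URL for an origin such as https://example.com."""
    return origin.rstrip("/") + PROBE_PATH


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status of a HEAD request (redirects are followed).

    Connection problems (urllib.error.URLError) are left to the caller.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx/5xx responses arrive as exceptions in urllib


def is_misconfigured(status: int) -> bool:
    """A 200 here means the server reports success for a resource that cannot exist."""
    return status == 200


# Usage (performs a network request):
# status = fetch_status(probe_url("https://example.com"))
# print("misconfigured" if is_misconfigured(status) else "ok", status)
```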

/.well-known/dnt-policy.txt

In this file, a website can publish the EFF's DNT policy, thereby unilaterally committing to a meaningful version of "Do Not Track" in a way that other software can detect.

/.well-known/trust.txt

This file is intended for websites with journalistic content that want to disclose their connections to one another, for example to agencies or other journalists. In addition to URLs of other websites, the file may also contain social media URLs and email addresses. It can also state whether AI bots are permitted to collect training data.
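The following minimal Python sketch reads such a file. It assumes the `attribute=value` line format with `#` comment lines that trust.txt uses, as far as I know. The attribute names in the sample (`belongto`, `control`, `social`, `contact`, `datatrainingallowed`) reflect my reading of the trust.txt specification and should be checked against it. The function names and all values are hypothetical.

```python
from typing import Dict, List, Optional


def parse_trust_txt(text: str) -> Dict[str, List[str]]:
    """Collect attribute=value lines; an attribute may occur several times."""
    attributes: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blank lines and comment lines
            continue
        name, sep, value = line.partition("=")  # split at the first '=' only
        if sep:
            attributes.setdefault(name.strip().lower(), []).append(value.strip())
    return attributes


def ai_training_allowed(attributes: Dict[str, List[str]]) -> Optional[bool]:
    """True/False if declared (assumed yes/no values), None if not stated."""
    values = attributes.get("datatrainingallowed")
    if not values:
        return None
    return values[-1].lower() == "yes"


SAMPLE = """\
# trust.txt for example-news.com (illustrative values)
belongto=https://www.example-association.org/
control=https://www.example-local.com/
social=https://social.example/@examplenews
contact=https://www.example-news.com/contact
datatrainingallowed=no
"""

attrs = parse_trust_txt(SAMPLE)
print(attrs["social"], ai_training_allowed(attrs))
```

Splitting only at the first `=` keeps URLs with query strings intact.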

/.well-known/security.txt

The intention behind this file is to make it easier for security researchers to contact a company when they find a vulnerability. At least one contact must be provided (Contact field), typically as an email address or website URL. Further fields include a PGP key (Encryption), a URL of the security policy (Policy), and an expiration date (Expires). The expiration date is mandatory under the standard (RFC 9116), but it is often forgotten or intentionally ignored.
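The following minimal Python sketch writes the two mandatory fields, Contact and Expires, plus a few optional ones. It also checks whether an existing file has expired. The field names follow RFC 9116. The function names, the example addresses, and the choice to treat a missing Expires field as expired are assumptions made for illustration. RFC 9116 also recommends keeping Expires less than a year in the future, which this sketch does not enforce.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional


def build_security_txt(contacts: Iterable[str], expires: datetime,
                       policy: Optional[str] = None,
                       encryption: Optional[str] = None,
                       languages: Iterable[str] = ()) -> str:
    """Return the body of a /.well-known/security.txt file."""
    contacts = list(contacts)
    if not contacts:
        raise ValueError("at least one Contact field is required")
    if expires.tzinfo is None:
        raise ValueError("Expires must be a timezone-aware datetime")
    lines = [f"Contact: {contact}" for contact in contacts]
    # RFC 3339 timestamp in UTC, e.g. 2030-01-01T00:00:00Z
    lines.append("Expires: " + expires.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"))
    if encryption:
        lines.append(f"Encryption: {encryption}")  # URI of the PGP key
    if policy:
        lines.append(f"Policy: {policy}")          # URI of the security policy
    languages = list(languages)
    if languages:
        lines.append("Preferred-Languages: " + ", ".join(languages))
    return "\n".join(lines) + "\n"


def is_expired(text: str, now: Optional[datetime] = None) -> bool:
    """True if the Expires date lies in the past or the field is missing."""
    now = now or datetime.now(timezone.utc)
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "expires":
            expires = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
            return expires <= now
    return True  # assumption: a file without Expires is treated as expired


txt = build_security_txt(["mailto:security@example.com"],
                         datetime(2030, 1, 1, tzinfo=timezone.utc),
                         policy="https://example.com/security-policy")
print(txt, end="")
print(is_expired(txt))
```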

In a security policy, the most important points should be addressed, such as:

Advantages:
Disadvantages:
Conclusion:

For large, well-known companies or if a bug bounty program exists, the use of the security.txt file makes sense. Without a bug bounty program, only the disadvantages remain.

404.php / 404 Error Page

Attackers often request the file "404.php" to trigger the error page for 404 errors (page not found). The file name "404.php" is popular, but it does not matter for this purpose: any nonexistent path usually leads to the same error page. 404 error pages, especially the web server's default error page, often display the web server version, the PHP version, or information about the site operator. This information is useful for attackers. The default error page should therefore be replaced or adjusted so that it no longer discloses any technical information.
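To check whether an error page still discloses such details, you can search the response headers and body for version strings. The Python sketch below recognizes only a few common patterns: Apache, nginx, Microsoft-IIS, LiteSpeed, and PHP, each followed by a version number. This pattern list is an assumption and far from exhaustive. Fetching the page is left out, so the function works with the output of any HTTP client. The sample values are made up.

```python
import re
from typing import List, Mapping

# Illustrative patterns: product name followed by "/" and a version number
VERSION_PATTERNS = [
    re.compile(r"\b(?:Apache|nginx|Microsoft-IIS|LiteSpeed)/\d+(?:\.\d+)*", re.IGNORECASE),
    re.compile(r"\bPHP/\d+(?:\.\d+)*", re.IGNORECASE),
]
# Response headers that commonly carry product and version information
LEAKY_HEADERS = ("Server", "X-Powered-By")


def find_version_leaks(headers: Mapping[str, str], body: str) -> List[str]:
    """List version strings disclosed by an error response.

    A plain dict is case-sensitive; http.client's header objects are not.
    """
    findings: List[str] = []
    for name in LEAKY_HEADERS:
        value = headers.get(name, "")
        for pattern in VERSION_PATTERNS:
            findings.extend(f"{name} header: {m.group(0)}" for m in pattern.finditer(value))
    for pattern in VERSION_PATTERNS:
        findings.extend(f"body: {m.group(0)}" for m in pattern.finditer(body))
    return findings


# Sample resembling a default Apache 404 page
sample_headers = {"Server": "Apache/2.4.41 (Ubuntu)", "X-Powered-By": "PHP/8.1.2"}
sample_body = "<address>Apache/2.4.41 (Ubuntu) Server at example.com Port 80</address>"
for finding in find_version_leaks(sample_headers, sample_body):
    print(finding)
```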

phpinfo.php

A file with this filename is often used to display information about PHP and the web server. It is usually located in the root directory, but can also be located anywhere else. The contents of this file usually consist of just these lines:

<?php
phpinfo();
?>

The function "phpinfo" outputs extensive information such as:

Because it reveals so much detail about the server, this file should only be uploaded when needed, and under a different name. Once the information has been retrieved, the file should be deleted immediately. Attackers would benefit greatly from this information.
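To make sure no forgotten copy remains, you can search the document root for PHP files that call `phpinfo`. The following minimal Python sketch does this. The function name and the restriction to `*.php` files are assumptions, so files with other extensions such as `.phtml` would be missed. A call that appears only in a comment is also reported. PHP function names are case-insensitive, so the search ignores case. The demo uses a throwaway directory with hypothetical file names in place of a real document root.

```python
import re
import tempfile
from pathlib import Path
from typing import List

# PHP function names are case-insensitive: phpinfo(), PhpInfo (), ...
PHPINFO_CALL = re.compile(r"\bphpinfo\s*\(", re.IGNORECASE)


def find_phpinfo_files(docroot) -> List[Path]:
    """Return all *.php files below docroot that contain a phpinfo call."""
    hits = []
    for path in Path(docroot).rglob("*.php"):
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it
        if PHPINFO_CALL.search(text):
            hits.append(path)
    return sorted(hits)


# Demo with a throwaway directory standing in for the document root
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "info-7f3a.php").write_text("<?php\nphpinfo();\n?>\n")
    (root / "index.php").write_text("<?php echo 'Hello'; ?>\n")
    (root / "tools").mkdir()
    (root / "tools" / "check.php").write_text("<?php PhpInfo (); ?>\n")
    found_names = sorted(path.name for path in find_phpinfo_files(root))
print(found_names)  # ['check.php', 'info-7f3a.php']
```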
