Well-Known Files
Contents
- /ads.txt
- /humans.txt
- /robots.txt
- /sitemap.xml and /sitemap.txt
- /.well-known/
- 404.php / 404 Error Page
- phpinfo.php
This article provides a brief overview of the most well-known files on a web server. For some of these files, advantages and disadvantages are listed; for the particularly critical ones, instructions on how to handle their contents safely are included.
/ads.txt
This file is used in the context of selling digital advertising. Authorized Digital Sellers (ads.txt), a standard of the IAB Tech Lab, aims to improve transparency in automated online advertising. The ads.txt file lists the companies that are authorized to sell advertising on the website. The content can be generated, for example, with Google Ad Manager.
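A minimal ads.txt could look like this. The publisher IDs and the second domain are placeholders; the exact values come from the respective advertising systems (the certification authority ID in the first line is the one commonly seen for Google):

```
# <domain>, <publisher account ID>, <DIRECT|RESELLER>, <cert authority ID (optional)>
google.com, pub-0000000000000000, DIRECT, f08c47fec0942fa0
adexchange.example, 12345, RESELLER
```

Each line names one authorized seller; comment lines start with a hash.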
/humans.txt
The idea behind this file was to name the authors and other contributors to an online project. However, the file never caught on. It exists on google.com, but its content is of little relevance.
The website of the "inventors" of this file (humanstxt.org) has an expired certificate, and the content of its humans.txt file is a treasure trove for spammers and hackers. In addition to names, email addresses, and places of residence, it reveals that the file, and the information in it, was last updated in 2012.
Advantages:
- No recognizable benefits. Most information is usually listed on pages like "Imprint," "Team," "About Us," etc. These pages are also much easier for people to recognize, access, and read than a text file without formatting.
Disadvantages:
- Names, email addresses, and places of residence are interesting for attackers.
- If the software used (e.g., CMS or server software) is listed, this can point to known vulnerabilities.
- An old change date may suggest a neglected site with unresolved vulnerabilities.
/robots.txt
The file "robots.txt" is located in the root directory of a domain and contains instructions for bots such as search engine crawlers, bots of SEO or statistics services, AI crawlers, etc. It defines which files and directories bots may or may not crawl. Depending on the bot, additional settings such as a delay between requests can also be specified.
However, the entries in the robots.txt file should only be viewed as guidelines. Bots interpret the entries and path specifications differently, and not every directive is supported by every bot. Dubious bots in particular do not adhere to these guidelines. Attackers can also evaluate the entries to discover interesting paths.
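A simple robots.txt could look like this (the paths and the bot name are examples; note that Crawl-delay is not part of the standard and is only honored by some bots):

```
User-agent: *
Disallow: /intern/
Disallow: /tmp/

User-agent: ExampleBot
Crawl-delay: 10

Sitemap: https://www.example.com/sitemap.xml
```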
Advantages:
- Bots of the well-known search engines largely adhere to the specifications.
- Easy setup as it's just a text file with a simple format.
Disadvantages:
- Not all legitimate bots interpret the data in the same way.
- Not all bots adhere to the guidelines.
- Attackers can use the information to target certain pages, for example via entries like: Disallow: /login.php or Disallow: /users/.
Conclusion:
Useful to instruct the crawlers of well-known search engines to include or exclude pages or directories.
Further information can be found at https://www.robotstxt.org and https://datatracker.ietf.org/doc/html/rfc9309.
/sitemap.xml and /sitemap.txt
Sitemaps are intended to inform search engines about the URLs of a website that are available for crawling. They allow webmasters to add information to each URL, such as when it was last updated, how often it changes, and how important it is relative to other URLs on the site. This allows search engines to crawl the website more efficiently and to find URLs that may be isolated from the rest of the site's content.
In addition to the XML format, the Sitemap protocol also allows a plain text file with a simple list of URLs, one URL per line. In both formats, each URL must be specified as an absolute URL, including scheme and domain; relative paths are not allowed.
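A minimal sitemap.xml with a single URL could look like this (URL, date, and the optional values are placeholders):

```
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

The text variant of the same sitemap would simply contain the line https://www.example.com/ (and further absolute URLs, one per line) with no markup at all.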
/.well-known/
This directory can contain many different files for various purposes. They can be registered with IANA and standardized, or freely chosen for your own projects. The files are used by certain services, projects, or apps - some for security technologies, others to exchange specific information. The data is usually plain text, either unstructured or in a specific format such as JSON or XML.
In addition to service-specific information, these files may also contain URLs, email addresses, or other data of interest to attackers.
/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/
This resource is used to check that the web server returns correct status codes: since this directory should never exist, a request for it must never be answered with the status 200 (OK). If the server nevertheless responds with 200, it is misconfigured.
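Such a check can be sketched in Python, for example. The split into a pure status check and a network wrapper is my own choice here, and the base URL passed in would be the site to test:

```python
import secrets
import urllib.error
import urllib.request

def status_is_correct(status_code: int) -> bool:
    """A request for a random, non-existent .well-known resource
    must not be answered with 200 (OK)."""
    return status_code != 200

def check_well_known(base_url: str) -> bool:
    """Request a random resource below /.well-known/ and report
    whether the server responds correctly (i.e., not with 200)."""
    url = f"{base_url}/.well-known/{secrets.token_hex(16)}/"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return status_is_correct(resp.status)
    except urllib.error.HTTPError as exc:
        # 404, 403, etc. are "not 200" and therefore correct behavior.
        return status_is_correct(exc.code)
```

A call like check_well_known("https://www.example.com") would then return True for a correctly configured server.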
/.well-known/dnt-policy.txt
This file can contain the EFF's DNT policy, with which a site unilaterally commits to a meaningful version of "Do Not Track" in a way that other software can detect.
/.well-known/trust.txt
This file is intended for websites with journalistic content that want to disclose their affiliations - e.g., with other agencies or journalists. In addition to URLs of other websites, the file may also contain social media URLs and email addresses. Whether AI robots are permitted to collect training data can also be specified.
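A trust.txt file consists of simple key=value lines. The following entries are illustrative placeholders only; the exact field names should be checked against the trust.txt specification:

```
member=https://news-association.example/
social=https://mastodon.example/@newsroom
contact=mailto:newsroom@example.com
datatrainingallowed=no
```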
/.well-known/security.txt
The intention behind this file is to make it easier for security researchers to contact a company when they find a vulnerability. A contact is typically provided as an email address or website URL. Additional information such as a PGP key, a URL to the security policy, and an expiration date can be entered. The latter is required by the standard, but is often forgotten or intentionally ignored.
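A typical security.txt could look like this (the addresses and URLs are placeholders; the field names follow the security.txt standard, RFC 9116):

```
Contact: mailto:security@example.com
Contact: https://example.com/security-contact
Expires: 2026-06-30T00:00:00.000Z
Encryption: https://example.com/pgp-key.txt
Policy: https://example.com/security-policy
Preferred-Languages: en, de
```

Only Contact and Expires are mandatory; the remaining fields are optional.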
In a security policy, the most important points should be addressed, such as:
- How to report a security problem (form, required information).
- What security researchers are and are not allowed to do - especially regarding types of attacks and the number of accesses.
- What steps the site operator will take, and within what time frame.
- Whether there is an entitlement to a reward or not.
Advantages:
- Easier to find contact information for security researchers, especially at large companies with many contact persons for different purposes.
- Reference to a security policy to make it easier to find.
- A useful addition if a bug bounty program exists.
Disadvantages:
- The contact information may be of interest to attackers.
- Security researchers expect the contact information to be correct and messages to be answered. This should therefore be ensured and checked regularly.
- The existence of the security.txt file can be understood as an invitation for security researchers to perform excessive tests. This point should definitely be addressed in the security policy. Clear restrictions or the specific permission of certain tests avoid misunderstandings.
- Security researchers may be tempted to report non-existent or exaggerated security issues in order to receive a reward or recognition. A bug bounty program should therefore contain very clear rules - especially regarding the reward.
Conclusion:
For large, well-known companies or if a bug bounty program exists, the use of the security.txt file makes sense. Without a bug bounty program, only the disadvantages remain.
404.php / 404 Error Page
The file "404.php" is often requested by attackers to retrieve the error page for 404 errors (page not found); the file name "404.php" is simply popular and irrelevant to the actual purpose. On 404 error pages - especially on the web server's default error page - the web server version, the PHP version, or information about the site operator is often displayed. This information is useful for attackers. The default error page should therefore be replaced or edited so that no technical information is disclosed.
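With Apache, for example, version information can be suppressed and a custom error page configured. The error page file name is an example; the directives belong in the main server configuration:

```
# Send only "Apache" in the Server header, without version details
ServerTokens Prod
# Do not append version info to server-generated pages (e.g., error pages)
ServerSignature Off
# Serve a custom page for 404 errors
ErrorDocument 404 /error-404.html
```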
phpinfo.php
A file with this name is often used to display information about PHP and the web server. It is usually located in the root directory, but can also be located anywhere else. Its contents usually consist of just this single line:
<?php phpinfo(); ?>
The function "phpinfo" outputs extensive information such as:
- PHP version and PHP configuration
- Web server version and modules used
- Environment variables
- HTTP headers
- MySQL information
Because of the large amount of detailed information about the server, this file should only be uploaded when needed, and under a different name. Once the information has been viewed, the file should be deleted immediately. Attackers would benefit greatly from this information.
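If the file has to remain on the server temporarily, access can additionally be restricted to your own IP address, for example with Apache 2.4 (the file name and IP address are placeholders):

```
# Allow access to the renamed phpinfo file from one admin IP only
<Files "info-a1b2c3.php">
    Require ip 203.0.113.10
</Files>
```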