The question of what exactly a banner is was recently posed on Stack Overflow, and I tried to answer it as follows:
A banner is simply metadata about a service. It can contain whatever information you decide it should contain. Shodan decided that for VNC it includes information about whether the service has authentication disabled. For HTTP it means the headers; for FTP it means the welcome string plus the results of running a few commands, etc. You can generate a banner for any service, but its content will differ depending on who generated the banner and on the type of service.
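At its simplest, generating a banner means opening a TCP connection and recording whatever the service sends first. The minimal Python sketch below assumes a service that speaks first (FTP, SMTP, SSH and Telnet servers typically do; an HTTP server waits for a request, so an HTTP crawler has to send one before it can read the headers). The helper names `read_banner` and `grab_banner` are hypothetical, and this is an illustration of the idea rather than Shodan's crawler code.

```python
import socket


def read_banner(sock: socket.socket, max_bytes: int = 4096) -> str:
    """Return whatever the service sends first, decoded as text.

    Assumes the service speaks first; a single recv() is usually enough
    for a short greeting, though a real crawler would loop and time out.
    """
    data = sock.recv(max_bytes)
    # Banners are not guaranteed to be valid UTF-8, so decode leniently.
    return data.decode("utf-8", errors="replace").strip()


def grab_banner(host: str, port: int, timeout: float = 5.0) -> str:
    """Connect to host:port and use the raw greeting as the banner."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return read_banner(sock)


if __name__ == "__main__":
    # Offline demo: a connected socket pair stands in for a real FTP server.
    server, client = socket.socketpair()
    server.sendall(b"220 Example FTP server ready\r\n")
    print(read_banner(client))  # 220 Example FTP server ready
    server.close()
    client.close()
```

Everything beyond this raw greeting (extra commands, parsed fields, screenshots) is the additional context that distinguishes one banner generator from another.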
A banner is the fundamental unit of data for the Shodan crawlers, and over the past decade we've worked hard to make banners as descriptive as possible. Everybody has their own idea of what a banner should contain, though, so I wanted to share my thoughts and explain how banners at Shodan have changed over the years.
In the early days of Shodan, I collected data by simply connecting to a service and using whatever information it returned as the banner. For HTTP that meant the headers; for Telnet, the welcome/login message. When I added FTP crawling, I noticed that it isn't always easy to tell what software is running based solely on the welcome message, so I started looking for additional context I could provide. In the case of FTP, I realized that we could send a HELP request to get the list of commands the server supports. The order and availability of those commands created a unique fingerprint that could help identify the FTP software (e.g. using k-means clustering). It could also be used to detect honeypots that support the same commands as a regular FTP server but list them in a slightly different order.
The same idea of using the unique order of the FTP HELP commands was eventually applied to HTTP headers and a few other protocols to help Shodan develop fingerprints. This sort of pattern recognition using dynamic programming was already familiar to me from my background in bioinformatics, and I applied it to Internet crawling data to gain further insight into product usage. Keep in mind that the original use case for Shodan was market intelligence.
I still think that using the unique order of data is a clever technique for grouping results and identifying outliers, but over time we expanded the metadata collection, and the exact order of commands/headers became less important as a result. For example, if you look at a modern HTTP banner in Shodan, you'll find the following information:
- HTTP headers
- HTML
- Robots.txt
- Sitemap.xml
- Security.txt
- Favicon
- Screenshot
- List of web technologies that the website uses (e.g. Angular, Bootstrap)
- Intermediate data from redirects (also supports META redirects)
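The redirect item above mentions META redirects, where the page itself tells the browser to navigate via a `<meta http-equiv="refresh" content="0; url=...">` tag instead of returning an HTTP 3xx status. The sketch below shows one way a crawler might detect such a redirect and resolve the next hop, using only Python's standard `html.parser` and `urllib.parse`. The helper names are hypothetical, and this is not necessarily how Shodan implements it.

```python
import re
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin

# Matches the "url=..." part of a refresh value such as "0; URL='/login'".
REFRESH_URL = re.compile(r"url\s*=\s*['\"]?([^'\";]+)", re.IGNORECASE)


class MetaRefreshParser(HTMLParser):
    """Record the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.target: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta" or self.target is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        match = REFRESH_URL.search(attr.get("content", ""))
        if match:
            self.target = match.group(1).strip()


def find_meta_refresh(html: str) -> Optional[str]:
    """Return the raw META refresh target, or None if the page has none."""
    parser = MetaRefreshParser()
    parser.feed(html)
    parser.close()
    return parser.target


def next_hop(base_url: str, html: str) -> Optional[str]:
    """Resolve the META refresh target against the URL that served the page."""
    target = find_meta_refresh(html)
    return urljoin(base_url, target) if target else None


if __name__ == "__main__":
    page = '<meta http-equiv="Refresh" content="0; URL=login.html">'
    print(next_hop("http://example.com/app/", page))
    # http://example.com/app/login.html
```

A crawler that records each hop, whether it came from a 3xx response or a META tag, ends up with the kind of intermediate redirect data listed above.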
The Shodan crawlers are also hostname-aware, so they send the proper hostname when necessary. If you're not sure what a banner in Shodan contains, please check out the Raw Data section on our new beta website:
It gives you a full breakdown of all the properties that are available for the various protocols. The website still shows information similar to what we gathered 10 years ago, but behind the scenes we've made significant improvements in understanding protocols and providing you with greater context.
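To make the FTP HELP ordering idea from earlier more concrete, here is a minimal Python sketch. It assumes a multi-line `214` HELP reply whose command names are 3 to 4 uppercase letters, and it compares orderings with a longest common subsequence (LCS) score, which is one simple dynamic-programming sequence comparison. Shodan's actual fingerprinting method isn't described here, so the helper names, the reference list, and the 0.9 threshold are all hypothetical.

```python
import re
from typing import List, Sequence

STATUS_LINE = re.compile(r"^\d{3}[ -]")  # e.g. "214-..." header, "214 ..." footer
COMMAND = re.compile(r"\b[A-Z]{3,4}\b")  # assumed shape of FTP command names


def parse_help_reply(reply: str) -> List[str]:
    """Extract command names, in order, from a multi-line FTP HELP reply."""
    commands: List[str] = []
    for line in reply.splitlines():
        if STATUS_LINE.match(line):
            continue  # skip the status header/footer lines
        commands.extend(COMMAND.findall(line))
    return commands


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, O(len(a) * len(b)) DP."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def order_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """1.0 when both lists are in the same order, lower as the order diverges."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0


def looks_reordered(reference: Sequence[str], observed: Sequence[str],
                    threshold: float = 0.9) -> bool:
    """Same command set as the reference, but listed in a noticeably different order."""
    same_set = set(reference) == set(observed)
    return same_set and order_similarity(reference, observed) < threshold


if __name__ == "__main__":
    reference = parse_help_reply(
        "214-The following commands are recognized.\r\n"
        " USER PASS CWD  LIST\r\n"
        " RETR QUIT\r\n"
        "214 Help OK.\r\n"
    )
    observed = ["USER", "PASS", "LIST", "CWD", "QUIT", "RETR"]
    print(looks_reordered(reference, observed))  # True
```

The set comparison captures command availability, while the LCS score captures ordering; a server that matches on the first but not the second is the kind of outlier the honeypot check described earlier looks for.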