ExoticSilicon.com - Design studio part three

Use C.

Seriously, we wouldn’t even consider writing back-end code for a website CMS in anything other than C, and we certainly wouldn’t use any of the interpreted languages that have sprung up in the last couple of decades and positioned themselves as the de facto go-to standards for web development.

The reasons are simple, and basically the same reasons that we do all of our other software development in C. We want the best performance from the finished code, and we want the highest level of control over how it works. This will always come from using a compiled language with manual memory management.

We don’t want to rely on third party interpreters in our software stack that could include as-yet-undiscovered exploits. This is true for all of our software development, and we see absolutely no reason why back-end website code should be a special case. If anything, code which is actively exposed to operation from a public interface is even more vulnerable to such attacks.

We have more experience writing C than just about any other language, so why wouldn't we use it?

Even with the best website developers on your team, if the code is developed and delivered in a high-level interpreted language, your project simply won’t be as fast or efficient as a highly optimised implementation in pure C could be. Your hardware requirements for any particular workload will likely be higher, causing an ongoing business expense.

Of course, writing good C code has a much steeper learning curve than a typical web-oriented scripting language. An enthusiastic amateur with little or no programming experience can pick up the basics of web development in PHP within a week or two, and pass themselves off as a freelancer to earn a bit of extra income - but is that who you want to trust your business reputation to?

Web development in C is harder, but it can be done and there are professionals doing it with great results.

By now, all web traffic on the public internet should be delivered over an encrypted connection using https. No excuses.

The days when it was acceptable to serve everything except payment and billing pages over plain http and switch to https for the collection of sensitive information are long gone.

It’s not even just about privacy. Unencrypted connections can not only be observed by a third party, but modified too. Almost all modern web browsers have a long history of vulnerabilities, and so by serving your content over plain unencrypted http you’re giving any third party who has access to the intermediate network an easy opportunity to inject malicious code into it. By not embracing https, you're opening a door for third parties to compromise your customers' computers.

Domain validated certificates can now be obtained at no cost, and are perfectly adequate for the vast majority of web serving applications. Configuring HSTS will further ensure that returning visitors fetch your pages directly over https even if they enter a plain http address. HTTP Strict Transport Security is widely supported and trivial to set up, so there is little reason not to use it.
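As a rough illustration of how little is involved, here is a minimal sketch in C that formats the HSTS response header for a back-end program to send. The function name format_hsts_header and its parameters are hypothetical, invented for this example, and the one-year max-age used in the usage note is simply a common choice rather than a requirement. Bear in mind that browsers only honour this header when it arrives over https.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format an HSTS response header line (without line ending) into buf.
 * max_age_seconds:    how long browsers should insist on https.
 * include_subdomains: non-zero to add the includeSubDomains directive.
 * Returns the number of characters written, or -1 if buf is too small.
 * (Hypothetical helper for illustration only.)
 */
int format_hsts_header(char *buf, size_t buflen, long max_age_seconds,
                       int include_subdomains)
{
    int n;

    assert(buf != NULL && buflen > 0);

    n = snprintf(buf, buflen, "Strict-Transport-Security: max-age=%ld%s",
                 max_age_seconds,
                 include_subdomains ? "; includeSubDomains" : "");

    /* snprintf reports the length it wanted; treat truncation as failure. */
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return n;
}
```

A CGI-style program might call format_hsts_header(buf, sizeof buf, 31536000L, 1) and print the result, followed by a line ending, along with its other response headers. In many deployments the header is set once in the webserver configuration instead, which achieves the same thing.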

Be organised and keep your webserver certificates up to date. An expired certificate looks very unprofessional.

Although not strictly a web design issue, don’t forget to use https links in HTML e-mails, even if you do have HSTS configured. Small details like this help to demonstrate that you are a forward-thinking business with a grip on IT issues.

If you have good IPv4 connectivity, supporting IPv6 might seem completely unnecessary. After all, very few users indeed are on IPv6-only connections worldwide.

However, IPv6 can give significant speed and latency improvements, especially for clients who are connecting via mobile and cellular networks which are often IPv6 native.

If you’re already dual-stacked and think that you can smugly skip over this advice, quickly check whether your webserver is really accessible purely via IPv6 connectivity. Don’t be surprised if it isn’t! We’ve seen many instances where the authoritative DNS for a domain hosted on an IPv6-capable server was badly configured and only reliably serving records over IPv4. This won't prevent dual-stacked users from connecting to your origin server over IPv6, as the DNS reply will still contain the necessary AAAA records, but why not fix the problem now and future-proof your system?
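A first sanity check can be sketched in C with the standard getaddrinfo() resolver call, restricted to IPv6 results. The helper name count_ipv6_addresses is hypothetical, and port 443 is assumed only because the site is expected to be served over https. This merely confirms that a name resolves to AAAA records through whatever resolver the local machine uses. It does not prove that the authoritative name servers themselves answer over IPv6, which is best tested from an IPv6-only host or by querying those servers directly.

```c
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Count the IPv6 addresses that the resolver returns for host.
 * Returns the number of addresses found, or -1 if the lookup failed
 * (for example because the name has no AAAA records).
 * (Hypothetical helper for illustration only.)
 */
int count_ipv6_addresses(const char *host)
{
    struct addrinfo hints, *res, *p;
    int count = 0;

    assert(host != NULL);

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET6;      /* ask for IPv6 results only */
    hints.ai_socktype = SOCK_STREAM; /* one entry per address, for TCP */

    if (getaddrinfo(host, "443", &hints, &res) != 0)
        return -1;

    for (p = res; p != NULL; p = p->ai_next)
        count++;

    freeaddrinfo(res);
    return count;
}
```

Calling count_ipv6_addresses() with your own domain name and getting -1 or 0 back suggests that the AAAA records are missing or not being served. A positive result is only the first step, and a real connection attempt over IPv6 is still worth making.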

Designing a website to appeal to a machine, an algorithm, rather than your intended audience is a dubious strategy. It comes at a price, and re-investing the resources you’re using to chase that high placement in the search results into simply making your webpages more appealing for your target audience might be much more rewarding than you realise.

Back in the early days of the world wide web, when the number of websites on-line globally was in the tens or perhaps hundreds of thousands, SEO was very easy indeed. You could literally just stuff seemingly random words at the ends of your pages that you thought might be in users’ search queries. Search engine technology was very primitive by today’s standards, and did little more than a simple tally of the number of times that the search terms matched your page content.

Times have obviously changed. Not only has the technology changed, but critically, business models have changed too. Running a search engine is no longer about technical prowess, or a project you might set up on a university server in your spare time. It’s driven by profit.

It would be very naïve to think that the advice you receive about webpage design from the search engines is primarily in your interest.

To take just one example, we hear all the time about speed, speed, and more speed, that you need to make your pages load more quickly to ensure that people don’t click away before they’ve seen your content.

Of course, there is definitely an element of truth to this - after all nobody likes a slow website. But the way that you go about achieving this can have a big bearing on the browsing experience for real users.

One of the best ways that we can think of to approach it would be to strip out all of the bloat caused by banner-adverts, analytics code, and captchas, especially considering that these page elements are usually hosted on other domains and therefore require extra DNS lookups, possibly blocking the rendering of your page until they’ve been resolved.

Instead, we see countless examples of website owners compressing their graphical content to, and often beyond, the point where the visual quality suffers. Product images that previously gave a clear view of the item in question are now fuzzy and full of artifacts. To a user looking for a specific item amongst several similar products, it might be important to zoom in and have a look at what is written on the box, but instead we see webmasters focused on delivering the page content as quickly as possible.

Of course, this does matter very much for a search engine that needs to connect to hundreds of thousands of other servers every day, downloading and indexing the content. A few bytes saved on each transfer really add up, and make their business more efficient, but does it benefit you?

One of the most crass examples of putting the website owners’ interests in second place that we’ve ever seen is having the option to download so-called ‘optimised’ image resources automatically after your site has been ‘analysed’ by a popular search engine.

Anybody who has worked seriously with digital media will know that the best trade-off between image quality and filesize comes from re-compressing the original image files. Put simply, if you have an image on your website which is, say, 100 KB compressed, and you want to reduce that to 50 KB to increase the speed of delivery, then for the same 50 KB filesize you will get better visual quality by preparing it directly from the original, rather than from the intermediate 100 KB compressed version.

Accepting the download of the so-called automatically ‘optimised’ images, whether out of laziness or ignorance, leaves you with a lower quality image on your website than you needed to have for the same eventual filesize. Who exactly is this optimal for? The user browsing your site, who sees a fuzzy blur instead of a sharp photograph? The website owner who now has potential customers who can’t look at the products in as much detail? Or the search engine, which has little or no interest in the image quality, as long as it can be downloaded, indexed, archived, analysed and used how they see fit, as quickly as possible?

Does a search engine listing really matter much at all these days? Other ways exist for users to find your content. If a good ranking on your search engine of choice comes for free (which it may well do if you follow our other advice), then it's fine as an additional benefit. But if you are absolutely relying on a high percentage or even all of your users coming to your site from a search engine, then you’re taking a risk. Even if you hit that elusive top spot today, it could be gone tomorrow, along with your website traffic.

For a long time, we used to say, "However you arrived here at exoticsilicon.com reading this page, it probably wasn’t from a search engine, as we don’t encourage or even permit our site to be crawled and indexed". This is no longer true, as we changed our policy due to various search engines indexing our pages without even crawling them or seeing their content (unless they did so using a non-standard user-agent string that we are unaware of). This was likely because of the popularity of the material we publish and the large number of inbound links to us. We also noticed that unauthorised copies of our content hosted elsewhere were being crawled and indexed by some search engines, so we changed our policy to allow certain crawlers. However, our robots.txt still contains a deny-by-default rule intended for the vast majority of them.
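To illustrate what a deny-by-default policy with a short allow list can look like, here is a minimal sketch in C that builds such a robots.txt in memory. It is not a copy of any real robots.txt: the function name build_robots_txt is hypothetical, and so is the crawler name ExampleBot used in the usage note. The layout relies on the standard behaviour that a crawler follows the group naming it specifically, and falls back to the "*" group otherwise.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a robots.txt that allows the named crawlers and denies all others.
 * allowed:  array of user-agent tokens to permit (hypothetical names).
 * nallowed: number of entries in allowed.
 * Returns the length of the text written to buf, or -1 on error.
 */
int build_robots_txt(char *buf, size_t buflen,
                     const char *const *allowed, size_t nallowed)
{
    size_t used = 0, i;
    int n;

    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';

    for (i = 0; i < nallowed; i++) {
        /* Refuse names containing line breaks, which would inject rules. */
        if (strpbrk(allowed[i], "\r\n") != NULL)
            return -1;
        n = snprintf(buf + used, buflen - used,
                     "User-agent: %s\nAllow: /\n\n", allowed[i]);
        if (n < 0 || (size_t)n >= buflen - used)
            return -1;
        used += (size_t)n;
    }

    /* The catch-all group: every crawler not named above is denied. */
    n = snprintf(buf + used, buflen - used, "User-agent: *\nDisallow: /\n");
    if (n < 0 || (size_t)n >= buflen - used)
        return -1;
    used += (size_t)n;

    return (int)used;
}
```

With a single allowed entry of "ExampleBot", the generated file contains one group permitting that crawler, followed by the catch-all group that disallows everything. Remember that robots.txt is only a request, so crawlers that ignore it have to be dealt with at the webserver.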

Ideally, your website should be sufficiently clear and well organised that your visitors are able to go directly to the information they’re looking for, without relying on a search-within-website function. However, in some cases, especially if you have a large site or a site that contains a lot of technical terms and part numbers, a website search option might be beneficial.

In this case, absolutely don’t just create a link to an external search engine with pre-set options to search your domain! Especially, don’t lead the user into performing a form submission which looks as if it’s internal to your site, but which really just submits their query to an external search engine.

A sizable minority of users might have an aversion to the particular search engine that you’ve chosen, or simply have a preference for a different one if they are going to use an external search facility at all. It can be a very jolting experience for a user to click on a submit button and suddenly have the consistent style and branding of the website they were just looking at disappear, replaced with the generic white background of your search engine of choice.

Aside from this, by redirecting your users to an external search engine, you’ve just pushed them closer to your competitors. Most likely, the option to search ‘just this domain’ or ‘the whole web’ is now right in front of their eyes. The search engine doesn’t care if they click away from your site, but you probably do.

Above all, though, implementing your own site search facility simply looks more professional. It also gives you the chance to add functionality to the search algorithm that is specific to the data you’re hosting. Imagine for a moment that you have an on-line catalogue of parts with model numbers. These may be in a mixture of old and new formats, with a determined way to convert between the two, or alternatively there may be optional prefixes and suffixes indicating products that differ only in color or region of sale. By implementing your own website search code, you can allow users to search intelligently for specific subsets of a part number and get appropriate results. A generic external search engine won’t have the knowledge of what the individual characters of the part number mean, so it will be limited, at best, to returning what it thinks are close matches. Unlike English words, where a difference of one character is often a simple typo, a part number ending in an ‘S’ instead of a ‘Z’ might be a completely different item.
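The idea can be sketched in a few lines of C. Everything about the part number format here is an assumption made up for illustration: the region prefixes ("EU-", "US-", "JP-"), the colour suffixes ("-BK", "-WH", "-RD"), and the function names part_core and same_base_part. The point is simply that optional decorations are stripped using knowledge of the format, while the remaining core is compared exactly rather than fuzzily.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical region prefixes and colour suffixes, invented for this sketch. */
static const char *const region_prefixes[] = { "EU-", "US-", "JP-" };
static const char *const colour_suffixes[] = { "-BK", "-WH", "-RD" };

#define COUNT(a) (sizeof (a) / sizeof (a)[0])

/*
 * Copy the core of a part number into core, with any known region prefix
 * and colour suffix removed. Returns 0 on success, -1 if core is too small.
 */
int part_core(const char *part, char *core, size_t corelen)
{
    size_t i, len;

    assert(part != NULL && core != NULL);

    /* Skip at most one recognised region prefix. */
    for (i = 0; i < COUNT(region_prefixes); i++) {
        size_t plen = strlen(region_prefixes[i]);
        if (strncmp(part, region_prefixes[i], plen) == 0) {
            part += plen;
            break;
        }
    }

    /* Drop at most one recognised colour suffix. */
    len = strlen(part);
    for (i = 0; i < COUNT(colour_suffixes); i++) {
        size_t slen = strlen(colour_suffixes[i]);
        if (len >= slen && strcmp(part + len - slen, colour_suffixes[i]) == 0) {
            len -= slen;
            break;
        }
    }

    if (len + 1 > corelen)
        return -1;
    memcpy(core, part, len);
    core[len] = '\0';
    return 0;
}

/*
 * Return 1 if two part numbers name the same base item, ignoring region
 * and colour, 0 if they differ, or -1 if either is too long to compare.
 */
int same_base_part(const char *a, const char *b)
{
    char ca[64], cb[64];

    if (part_core(a, ca, sizeof ca) != 0 || part_core(b, cb, sizeof cb) != 0)
        return -1;

    /* The core is compared exactly: a trailing 'S' and 'Z' are different items. */
    return strcmp(ca, cb) == 0;
}
```

Under these assumptions, "EU-AB1234S-BK" and "US-AB1234S-WH" are treated as the same base part, whereas "AB1234S" and "AB1234Z" are not. A real catalogue would replace the two tables with its own rules, including any conversion between old and new numbering formats.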

Unless you take a moment to block unwanted archiving webcrawlers, sooner or later the content on your website will be automatically crawled, archived, and made available to anybody who wants to see what it looked like at some random time in the past.

Presumably, having spent significant time, effort, and resources creating your material, you would prefer your potential website visitors to get it directly from you, rather than via a third party. If you allow your webpages to be crawled and archived in this way, it’s much more difficult to ensure that this happens, especially once other websites start to place links to your content as it is in the third party archive, rather than to your actual website. Some potential customers following such a link might even be completely unaware that they are not visiting you directly.

To make matters worse, you have little or no control over exactly when your content is archived. It could be just before a major website update or, even worse, right in the middle of it. Unless your website consists entirely of static pages, the value of such an archived copy as a form of off-site backup is also very doubtful. However, if you ever allow your domain to expire, either intentionally or unintentionally, or somehow lose control of it, these archived copies of your content can be used by anybody to at least partially re-create it how it used to be, but now under their full control.

Create your own archives section for old material, and be diligent about backing up your data yourself. Block third party archiving crawlers to the maximum extent possible.

Unfortunately, some material that appears without authorisation in so-called 'archives' of web content is scraped without any respect for exclusion directives in robots.txt, using techniques such as spreading requests across diverse IP addresses in various subnets. Whilst this complicates the process of identifying and excluding such requests from your webserver, doing so is usually still possible and worthwhile.
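One building block for excluding such requests is a check of the client address against a list of subnets. The following is a minimal sketch in C, assuming IPv4 only. The function name is_blocked_ipv4 and the struct layout are hypothetical, and the two subnets listed are reserved documentation ranges used purely as placeholders, not real crawler addresses.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

struct subnet {
    const char *network; /* dotted-quad network address */
    int prefix_len;      /* number of leading bits that must match, 0-32 */
};

/* Placeholder block list using documentation-only address ranges. */
static const struct subnet blocked[] = {
    { "192.0.2.0", 24 },
    { "198.51.100.0", 24 },
};

/*
 * Return 1 if client falls inside a blocked subnet, 0 if it does not,
 * or -1 if client is not a valid IPv4 address.
 */
int is_blocked_ipv4(const char *client)
{
    struct in_addr addr, net;
    size_t i;

    assert(client != NULL);

    if (inet_pton(AF_INET, client, &addr) != 1)
        return -1;

    for (i = 0; i < sizeof blocked / sizeof blocked[0]; i++) {
        uint32_t mask;

        if (inet_pton(AF_INET, blocked[i].network, &net) != 1)
            continue; /* skip malformed table entries */

        /* Build the netmask in network byte order; avoid a 32-bit shift. */
        mask = blocked[i].prefix_len == 0
            ? 0
            : htonl(0xFFFFFFFFu << (32 - blocked[i].prefix_len));

        if ((addr.s_addr & mask) == (net.s_addr & mask))
            return 1;
    }
    return 0;
}
```

A request handler could call is_blocked_ipv4() with the client address and refuse to serve the page when it returns 1. Working out which subnets belong in the table is the harder part, and usually means studying your own webserver logs.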