The status quo in User Experience (UX) is to test a website with only 5 users. We keep getting asked why, if this is true, you would need a tool like Webnographer to test with more users. The received wisdom is that 5 users will find 85% of a site’s usability issues. Setting aside questions such as how the lab setting affects users’ behavior, here is our answer: we argue that the idea that you only need 5 users is very old and needs updating.
The underlying assumption
Twenty-one years ago, usability guru Jakob Nielsen co-wrote a paper stating that you only need to test with 5 users to discover 85% of your usability issues. The claim is based on the formula $N\left(1-(1-L)^{n}\right)$, where $N$ is the total number of usability issues in the design, $n$ is the number of test users, and $L = 0.31$ is the proportion of issues a single user uncovers. The formula is simple, and nearly every UX consultant quotes it. But it makes a huge assumption: that usability issues have a visibility of 31%, meaning each issue affects at least 31% of your users. The question is whether this still holds up after so many years. Since Nielsen introduced the “formula”, both the technology and the people using software have changed dramatically. I will provide evidence that we need to revisit the constants used in the formula, and show that we need to test with more than 5 users.
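To make the arithmetic concrete, here is a minimal Python sketch of the formula. It assumes, as the formula does, that every issue has the same visibility L and that users encounter issues independently of one another; the function names are our own illustrative choices, not part of any library or of Nielsen’s paper.

```python
def proportion_found(n_users: int, visibility: float = 0.31) -> float:
    """Expected share of issues found by n_users: 1 - (1 - L)^n."""
    # Chance that one given issue is missed by every one of the n users
    p_missed_by_all = (1 - visibility) ** n_users
    return 1 - p_missed_by_all


def issues_found(total_issues: int, n_users: int, visibility: float = 0.31) -> float:
    """Expected number of issues found: N * (1 - (1 - L)^n)."""
    return total_issues * proportion_found(n_users, visibility)


if __name__ == "__main__":
    # Print the expected share of issues found for 1 to 5 users
    for n in range(1, 6):
        print(f"{n} users: {proportion_found(n):.0%} of issues found")
```

With the default L = 0.31, five users come out at about 84% of issues found, the figure usually rounded to 85%.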
Back in 1993, when the formula was first introduced, the World Wide Web was only one year old. Nowadays, we take for granted that if you need to book a flight, you just visit Expedia, Kayak, or the airline’s website. Back then, if you needed to book a flight, you had to go to a travel agent who had spent months training to use the Computer Reservation System. Windows 3.1, the first popular Windows operating system, had also just been released. Before then, most people used DOS and interacted with the computer through a black screen. Just to copy a file, you needed to remember and type a series of commands.
In the past 21 years, the profession of interaction designer has emerged. In 1994, Carnegie Mellon University began offering a degree in Interaction Design, followed by other universities around the world. Over this period, we have learned a lot more about what does and does not work in interface design. Computer systems have become easier to use, and people have become better at using them. This means that the obvious, big issues have probably already been discovered: designers know about the obvious mistakes.
Since the journal article was written, usage of computer systems has exploded. Only 3 million people bought Windows 3.1 in the first 3 months in 1993; recently, Windows 8 sold 60 million copies in 2 months. Back then, a popular application would sell in the thousands, mainly in the USA or Europe. Now it is common for a website to be used by millions of people around the world. It is easy to dismiss the idea that 5 users in Texas can predict how users in England will use a website. In fact, on his website describing the method, Nielsen goes on to say that “You need to test additional users when a website has several highly distinct groups of users”, and he recommends testing 3 users from each category. How many distinct groups does a modern website have? Is it easier just to take a sample of users from the site, or to try to categorize users into different groups? Even small websites get tens of thousands of users a month, so even an issue that affects 10% or 20% of users impacts thousands of people. Though the issues are getting smaller, the number of people experiencing them is increasing.
The discoverability gap
Nielsen’s assumption is that even the smallest issue will affect at least 31% of users. Using his formula, the chart below shows the likelihood of an issue being discovered when testing with 5 or with 80 people, and there is a large discoverability gap between the two. A test with 5 users only discovers 85% of issues that affect at least 31% of your users. This means that 5 users only identify major issues, which leads to smaller but still important issues being overlooked. In contrast, testing with 80 users will discover 98% of issues with a visibility as low as 5%.
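The discoverability gap can be reproduced numerically. This sketch keeps the formula’s assumptions (users act independently, and each issue has a fixed visibility p) and computes the chance that at least one tester encounters an issue, for 5 and for 80 users. The helper names are hypothetical, chosen for illustration.

```python
def discovery_probability(visibility: float, n_users: int) -> float:
    """Probability that at least one of n_users encounters an issue
    affecting a fraction `visibility` of users: 1 - (1 - p)^n."""
    return 1 - (1 - visibility) ** n_users


def gap_table(visibilities, sample_sizes=(5, 80)):
    """Rows of (visibility, discovery probability for each sample size)."""
    return [
        (p, *(discovery_probability(p, n) for n in sample_sizes))
        for p in visibilities
    ]


if __name__ == "__main__":
    print("visibility   5 users   80 users")
    for p, five, eighty in gap_table([0.05, 0.10, 0.20, 0.31]):
        print(f"{p:>10.0%} {five:>9.0%} {eighty:>10.0%}")
```

For an issue with 5% visibility, five users have only about a 23% chance of encountering it, whereas 80 users have about a 98% chance.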
It may be tempting to think that a 5% issue is not that important. However, a 5% issue on a website with 100,000 users means 5,000 users struggling to find information or being unable to complete a purchase, which ultimately adds up to a large amount of lost revenue. Data collected over the last 2 years through remote usability testing also backs up the importance of fixing smaller issues. As the chart below shows, most issues carry a low probability of a user failing at any one place, and issues are widely distributed across the sites. 56% of all issues have a visibility of less than 31% (they affect fewer than 31% of visitors). In practical terms, because 5 users can only reliably find issues with a visibility above 31%, the 56% of issues below that threshold are at risk of going undiscovered.
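Running the formula in reverse shows where the 31% threshold comes from. The sketch below is our own illustration under the same independence assumption: it solves for the smallest visibility that n users will catch with a chosen probability, and converts a visibility into a count of affected users. Both function names are hypothetical.

```python
def min_reliable_visibility(n_users: int, confidence: float) -> float:
    """Smallest visibility p for which 1 - (1 - p)^n >= confidence.

    Solving 1 - (1 - p)^n = confidence for p gives
    p = 1 - (1 - confidence)^(1 / n).
    """
    return 1 - (1 - confidence) ** (1 / n_users)


def users_affected(visibility: float, monthly_users: int) -> int:
    """Approximate number of users who hit an issue of a given visibility."""
    return round(visibility * monthly_users)


if __name__ == "__main__":
    print(f"5 users, 85% confidence:  {min_reliable_visibility(5, 0.85):.3f}")
    print(f"80 users, 98% confidence: {min_reliable_visibility(80, 0.98):.3f}")
    print(f"5% of 100,000 users: {users_affected(0.05, 100_000)}")
```

`min_reliable_visibility(5, 0.85)` comes out at roughly 0.32, close to Nielsen’s 31%, while `min_reliable_visibility(80, 0.98)` is roughly 0.048, in line with the 80-user figure above. `users_affected(0.05, 100_000)` gives the 5,000 users from the example.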
Occurrence versus frequency
It is worth pointing out that Nielsen’s formula is about the occurrence of an issue: whether an issue exists or not. For example, does a haystack have a needle in it? It does not estimate the frequency of an issue, that is, how many needles the haystack contains. We will deal with estimating frequency in another article. Far more users are needed to estimate the frequency of an issue with a low margin of error than to discover whether the issue exists. Quantifying issues matters because, as we are faced with more of them, we need help prioritizing them.
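To illustrate the gap between detecting an issue and measuring its frequency, this sketch compares the two sample sizes. For frequency it uses the textbook normal-approximation formula n = z²·p(1 − p)/E², a standard choice for illustration rather than a method taken from this article; p is an assumed expected rate and E the desired margin of error. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_frequency(expected_rate: float, margin: float,
                              confidence: float = 0.95) -> int:
    """Users needed to estimate an issue's frequency within +/- margin,
    using the normal approximation n = z^2 * p * (1 - p) / E^2."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z ** 2 * expected_rate * (1 - expected_rate) / margin ** 2
    return math.ceil(n)


def users_to_detect(visibility: float, confidence: float = 0.85) -> int:
    """Smallest n with 1 - (1 - p)^n >= confidence, i.e. enough users
    for at least one of them to hit the issue with that probability."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - visibility))


if __name__ == "__main__":
    p = 0.05  # assumed: an issue affecting about 5% of users
    print("detect (85% chance):", users_to_detect(p, 0.85))
    print("estimate within +/-2 points (95%):", sample_size_for_frequency(p, 0.02))
```

For an issue affecting about 5% of users, around 37 users give an 85% chance of seeing it at least once, but pinning its frequency down to within ±2 percentage points at 95% confidence takes roughly 457 users.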
For finding big issues early in the development process, methods like Guerrilla UX testing can make sense, but for production, other methods need to be considered. When Nielsen came up with the formula, testing with many users was expensive; nowadays, with technology like Webnographer, testing with more users is cheaper than testing in the lab. More people are using technology and experiencing more varied issues than 20 years ago. We now have the tools for finding smaller issues, so we should use them. Do get in touch if you have any questions on how to test with five or five thousand users.
» 23 Remote Usability Methods (Webnographer)
» Discovering WHY from numbers (Webnographer)
» 5 Reasons You Should And Should Not Test With 5 Users (MeasuringUsability)
Nielsen, Jakob, and Landauer, Thomas K.: “A mathematical model of the finding of usability problems,” Proceedings of ACM INTERCHI’93 Conference (Amsterdam, The Netherlands, 24-29 April 1993), pp. 206-213.