I like to hone my testing skills by trying different techniques. Sometimes the project I happen to be working on serves well as a sandbox for this, but not always. I also like to write about testing techniques using examples that other people can try. So it’s convenient to have an easily accessible application that I can write about.
I’ve been working on generating test data like long strings and large numbers with the venerable perlclip tool and a partial perlclip port to Ruby that I call testclip. I’m curious what you think about the ethics of testing in each of these real situations below.
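A counterstring, the core trick in perlclip, is a string that labels its own positions: each `*` sits at the character position spelled out by the digits just before it, so when an input field silently truncates your data, the tail of what survives tells you exactly how many characters got through. Here's a minimal Ruby sketch of the generator in the spirit of testclip (my illustration, not the actual testclip code):

```ruby
# Generate a counterstring of exactly `length` characters.
# Each marker ('*' by default) lands at the position given by the
# digits immediately before it, e.g. counterstring(10) => "*3*5*7*10*".
def counterstring(length, marker = '*')
  chunks = []
  pos = length
  while pos > 0
    chunk = "#{pos}#{marker}"                     # e.g. "35*"
    chunk = chunk[-pos..-1] if chunk.length > pos # trim the head if it won't fit
    chunks << chunk
    pos -= chunk.length
  end
  chunks.reverse.join
end
```

If a form cuts the string off and you see it end in `…23*26*2`, you know roughly 28 characters survived, without counting anything by hand.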
1) Sorry, Wikipedia
I was having a discussion with a contact at Wikipedia, and I wanted to illustrate how I use bisection with long strings to isolate a bug. I wanted to find a bug on Wikipedia itself, so I tested its search feature. I considered the risks of testing on their production system – though long strings are fairly likely to find a bug, I couldn’t remember ever seeing them cause a catastrophic failure. So I judged that it was appropriate to continue. I think my contact was aware that I was testing it, but I didn’t explain the risks and he didn’t grant explicit permission.
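The bisection itself is ordinary binary search over string length. A sketch in Ruby, assuming a hypothetical `fails` predicate that submits a string of a given length and reports whether the failure appears, and assuming failures are monotonic (if length n fails, longer strings fail too):

```ruby
# Binary-search for the smallest input length that triggers a failure.
# `fails` is any predicate you supply, e.g. one that submits a
# counterstring of that length to a search form and checks the response.
# Returns nil if even the longest length passes.
def bisect_failing_length(low, high, &fails)
  return nil unless fails.call(high)
  while low < high
    mid = (low + high) / 2
    if fails.call(mid)
      high = mid        # failure at mid: the threshold is mid or lower
    else
      low = mid + 1     # passes at mid: the threshold is above mid
    end
  end
  low
end
```

With a range of 1 to 10,000 this needs only about fourteen probes to pin down the exact threshold, which also keeps the number of requests sent to someone else's server small.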
Wikipedia gave me an ideal example, with a minor failure on a moderately long search string and a more severe error on a much longer one (I went up to about 10,000 characters). I started writing up my analysis. As I went back to reproduce a few of the failures, I noticed a new failure mode I hadn’t seen before. Rather than isolate this new failure, I decided to stop testing. It seemed unlikely that my testing was related to it, but I wanted to make sure.
When I got in touch with my contact at Wikipedia, I found out that I had caused a major worldwide outage in their search feature. I did a lot of reflection after that – I really regretted causing this damage to a production system.
Was it ethical for me to run these tests?
2) Please test my site
I listened in to the virtual STAR East 2016 conference, which had a Test Labs activity that was accessible to virtual participants. I didn’t really understand what the activity was, but I did see that we were invited to test a particular open source application, CrisisCheckin, and report bugs on GitHub. An instance of the server was set up for testing. I used this as motivation to add a feature to testclip: bisecting on an integer value in addition to the length of a counterstring.
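Bisecting on a value works the same way as bisecting on a length, just searching from the other side: instead of the shortest string that fails, you find the largest number the system still handles. A sketch of the idea (my illustration, not testclip's actual code), assuming a hypothetical `accepts` predicate that submits the number and reports whether it was handled correctly:

```ruby
# Find the largest integer a field accepts, bisecting on the value itself
# rather than on string length. Assumes acceptance is monotonic:
# if a value is rejected, every larger value is rejected too.
# Returns nil if even the smallest value is rejected.
def largest_accepted(low, high, &accepts)
  return nil unless accepts.call(low)
  while low < high
    mid = (low + high + 1) / 2  # round up so the loop terminates
    if accepts.call(mid)
      low = mid                 # mid accepted: the answer is mid or higher
    else
      high = mid - 1            # mid rejected: the answer is below mid
    end
  end
  low
end
```

Hitting a boundary like 2,147,483,647 this way is a strong hint that a 32-bit integer is involved somewhere in the stack.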
It was nice to have a test instance of the system. I still considered the possibility that my testing could cause an outage that would affect the other people who were using the test instance. I decided to take the risk. The long strings I tested with made all similar types of data slightly more difficult for all users to read on the page, and in some cases the user interface didn’t provide a way to delete the data, so I did have a small impact on the shared system. I didn’t cause any outages that I was aware of.
There were instructions on GitHub for setting up a local instance of the software, which would be ideal in terms of not interfering with anyone else’s use of the site, but I chose not to take the time to do that.
Would you agree that my testing in this case was ethical?
3) It’s popular, so I’m picking on it
I’m working on writing an example usage of perlclip now, where I chose to pick on the main Google search field. I tested with a search string up to 1000 characters long, which finds a minor bug, but doesn’t seem to affect the availability of the system.
Is it ethical for me to do this testing, and publish something that encourages others to do the same?
A common reaction I’ve heard to these questions is that it’s the responsibility of the owners of the web site to make the site robust, so it’s not my fault if I’m able to do something through the user interface that breaks it. I don’t think it’s that simple.
I perused the Code of Ethics for the Association for Software Testing, and I didn’t see anything that directly addresses this question, though it’s clear on what to do when we do cause harm. At least for examples 1 and 3 here, I’m not using these services for the purposes they were intended for. The Terms of Service for Google don’t actually say that I have to use it for the intended purpose. The Wikipedia Terms of Use, though, do talk about testing directly, which is expressly allowed in some situations. This testing is not allowed if it would “…unduly abuse or disrupt our technical systems or networks.” The terms also don’t allow disrupting the site by “placing an undue burden on a Project website.” So it’s clearly bad to cause an outage, but it’s difficult to assess the risk of an outage before one happens.
It’s much clearer that it’s not okay to conduct security testing without explicit permission. Security testing includes looking for denial of service vulnerabilities. But my intention in doing long string testing generally isn’t to find vectors for a denial of service attack, even if that’s what happened in one case.
So how much caution is warranted to mitigate the risks of long string testing on production servers?
If the conclusion is that we should never test with long strings in production (at least without permission), then we have to look for safe places to practice our testing skills. Running a personal instance of an application server is one option, but that isn’t easy for everyone to do. Another option is having a public sandbox that we can access, as we have with CrisisCheckin. There are several cases of servers set up for educational purposes, either associated with exercises in a book or with a training class. Many of those, though, are only intended for customers who bought the book or the class. I think I’ll shift my focus to native applications that run locally and are easy to install. My head is in the web so much, I forget that there is such a thing as a local application. 🙂
Danny, I salute you for your concern about ethics, but I think your concern on this issue is misplaced. The server is a public service, and if you provide something in public, you are automatically subjecting it to testing.
I test public services all the time. Like any sensible user, before I make use of a public service, I want to do “acceptance testing.”
I only wish more people would do it so these services would become more reliable. In the past few years, I have turned up perhaps a hundred failures in public services. I had another example just a few days ago.
I was trying to use Hootsuite, an app that gives me access to Twitter and a few other sites. I like their service, and have had few troubles with it, but that day, I couldn’t sign on with Safari. I tried the usual workarounds, restarting Safari, then the entire OS. Those didn’t work, so I tried a different browser. Everything went fine with Firefox, so I did my twittering and could have let things go right there. That’s probably what most users would do, but that’s not my style.
I asked myself, “What’s different between Firefox and Safari?” On Safari, I couldn’t log in because the login page showed no buttons, unlike Firefox. Carefully observing Safari’s screen when I accessed Hootsuite, I saw a millisecond flash of the missing buttons, which then disappeared before I could click them. I realized that something was removing the buttons, which was similar to the behavior of an ad-blocking add-on I use on Safari, but not on Firefox. To test that idea, I disabled the ad-blocker. Voila! Now the buttons appeared on the Hootsuite logon page. Something Hootsuite was doing made its buttons look like ads to the ad-blocker.
I now had a solution to my problem, a perfect workaround. I could have stopped there, but as a grateful user, I decided to notify Hootsuite so they could correct the situation for their thousands of other users. I searched for a link for reporting problems, but couldn’t find one. Well, that’s a second problem with Hootsuite, so I wanted to report that, too.
I decided to report these failures, so I sent my report to the Support link, asking them to see that the right people received it. Shortly thereafter, I received an extremely polite and grateful reply, saying the support people would pass on the report to the right people. This time, my test showed a feature, not a failure. It was not a feature of the software, but a feature of the Hootsuite organization. Their website might have a flaw or two, but their organization was delightfully responsive, and I reported that feature, too.
So, in the end, like any good tester, I had tested both their software and their organization, finding and reporting both favorable and unfavorable information. If they choose to, Hootsuite can now use that information to improve their service for thousands of other users, but they don’t have to do anything. I fail to see any ethical problem in what I did.
Ah, but what if my tests had somehow crashed their service? That might have affected current users, but it was not my testing, but their system, that made that crash possible. Presumably, it would have been in their power to prevent that crash, and to prevent it in the future now that they knew about it.
What if my tests were “unreasonable,” such as entering a 10K-long password? Well, as you know, if your system is public, open to millions of potential users, there’s no such thing as “unreasonable.” Somewhere, someone will do almost anything you can imagine. If you’re going to play in the public arena, you’d better be prepared for such things. That’s just part of the cost of doing business with the public.
Jerry, I agree that acceptance testing is reasonable, as is troubleshooting when something isn’t working properly.
And I agree, from the point of view of the service providers, that they need to be prepared for any kind of input. The example I like to use is a user who copies a large amount of data to the clipboard (like a long Word document), then attempts to copy some short string but fails, then accidentally pastes the long document into an input field, perhaps pressing Enter before noticing the mistake.
But one could argue that I violated the Wikipedia terms of service. While it served my interests to do the testing I was doing, I wasn’t using the site for the purpose it was designed for, nor was I troubleshooting anything I actually needed the software to do. I still think I was in a gray area between normal usage and penetration testing.