How to Create a Search Script for Load Testing


Many websites offer a search capability. Whether it is a product search, a search over site-specific data, or a find-my-page search across the site, this functionality is typically highly visible. Search generally relies on specific back-end components that have distinct scaling properties versus most other parts of a website. For example, a full-text search might rely on Apache Solr or Elasticsearch, while a product search might rely on a properly indexed database table.



To isolate search functionality, a search script should touch as few other pages as possible. If any extraneous pages must be included, they should be the least impactful from a resource perspective. Because these back-end components are unique to search, search should first be scripted in isolation, as its scaling properties are likely to differ from those of the rest of the website.

By the same token, scripts other than the login script should touch the login functionality as seldom as possible, and should not log in and out with each iteration. Scripts that log in and out on every iteration overweight and overstress the authentication components. If the search functionality can be tested without authenticating at all, that is preferred, as it provides the most targeted way to test search functionality.

Isolating the authentication functionality this way keeps the load on the authentication components realistic and keeps the measured results attributable to the search components alone.

For many sites, the search script (in pseudocode) is simply:


[only if required] authenticate
visit page_with_search

start loop
    post search_terms
    verify page_items
end loop
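The pseudocode above can be sketched in Python using only the standard library. This is a sketch, not a definitive implementation: the base URL, the /login and /search paths, the "q" form field, and the credentials are all hypothetical placeholders for whatever the real site uses.

```python
import http.cookiejar
import urllib.parse
import urllib.request

BASE_URL = "https://shop.example.com"  # hypothetical site


def encode_search_form(term):
    """Build the POST body for one search; the 'q' field name is an
    assumption -- substitute the real form field for your site."""
    return urllib.parse.urlencode({"q": term}).encode()


def run_search_script(terms, login=None):
    """Authenticate at most once (only if required), visit the search
    page, then loop: post each term and verify the results page."""
    # A cookie-aware opener keeps any session cookie across requests,
    # so the script never logs in and out per iteration.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))
    if login:  # [only if required] authenticate
        opener.open(f"{BASE_URL}/login",
                    data=urllib.parse.urlencode(login).encode())
    opener.open(f"{BASE_URL}/search")  # visit page_with_search
    for term in terms:  # start loop
        with opener.open(f"{BASE_URL}/search",
                         data=encode_search_form(term)) as resp:
            page = resp.read().decode("utf-8", "replace")
        # verify page_items: the searched term should appear in the results
        if term.lower() not in page.lower():
            raise AssertionError(f"no results found for {term!r}")
```

In a load-testing tool, the body of run_search_script would map onto the tool's own transaction and verification steps; the structure (authenticate once, then loop on the search POST) is the part that carries over.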



A search script should also verify the returned search results. This confirms that results actually come back rather than an error page. The easiest way to verify search results is to check the page for an expected product name or description.
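A minimal verification helper might look like the following; it assumes the results page is plain HTML text and that a known product name or description fragment should appear in it:

```python
def verify_search_results(page_html, expected_markers):
    """Return True when at least one expected marker (a product name or
    description fragment) appears in the results page, case-insensitively.
    Hypothetical helper; a real script might also assert that known
    error strings are absent."""
    lowered = page_html.lower()
    return any(marker.lower() in lowered for marker in expected_markers)
```

Pairing a positive check (the product name is present) with a negative check (no known error text) catches the case where the server returns HTTP 200 but renders an error page.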


Databanking and Caching

At the application level, search results are often cached. To make a search script's load characteristics realistic, it therefore needs to search with many different terms. A search script must be databanked (that is, data-driven) with enough records to simulate normal usage and to prevent the application from serving everything from cache, which would skew your results with a false impression of scalability.
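One common way to databank the script is to keep the search terms in a CSV file and cycle through them so that consecutive iterations rarely repeat a term. The file path and one-term-per-row layout below are assumptions; adapt them to your databank format.

```python
import csv
import itertools


def load_search_terms(path):
    """Read one search term per row from the databank CSV file."""
    with open(path, newline="") as f:
        return [row[0] for row in csv.reader(f) if row]


def term_feeder(terms):
    """Endlessly cycle through the databank so each iteration posts a
    different term instead of re-sending one easily cached query."""
    return itertools.cycle(terms)
```

With a large databank (say, thousands of distinct terms), the application cannot survive off cached records, and the measured throughput reflects real search work rather than cache hits.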