This article describes how to run a load test on a Katta cluster.
To execute a load test you will need to do the following:
- start a (search) Katta cluster (Katta master and n Katta nodes)
- start a second (loadtest) Katta cluster
- deploy one or multiple indices on the search cluster
- create a query file (one query per line) for the indices and copy it to the loadtest master
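The last step can be sketched as follows. The field name text is taken from the sample queries shown in the query log later in this article. The other query terms, the file name and the host name are made up for illustration.

```shell
#!/bin/sh
QUERY_FILE=queries.txt    # hypothetical file name

# Write one query per line (the terms are placeholders for real queries).
printf '%s\n' 'text:katta' 'text:search' 'text:cluster' > "$QUERY_FILE"

# Flag empty lines, since every line is expected to hold exactly one query.
if grep -q '^[[:space:]]*$' "$QUERY_FILE"; then
  echo "warning: $QUERY_FILE contains empty lines" >&2
fi

echo "$(wc -l < "$QUERY_FILE" | tr -d ' ') queries written to $QUERY_FILE"

# Copy the file to the loadtest master (hypothetical host, uncomment to use):
# scp "$QUERY_FILE" loadtest-master.example.com:/tmp/
```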
Starting the Search Katta Cluster
Starting the Loadtest Katta Cluster
The load test cluster is very similar to the ordinary Katta search cluster. Both clusters should share the same ZooKeeper but use different ZooKeeper root paths. To achieve that, configure katta.zk.properties of the load test cluster as follows:
- set zookeeper.embedded to false
- set zookeeper.servers to the same ZooKeeper server that the search cluster uses
- change zookeeper.root-path, for example to /katta-loadtest
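The settings above can be checked with a small script before starting the second cluster. This is only a sketch: the host name zk1.example.com, the port 2181, the root path /katta for the search cluster and the file names are assumptions, and the prop helper handles plain key=value lines only.

```shell
#!/bin/sh
# Read one key from a properties file (plain key=value lines only).
prop() {
  sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# Hypothetical sample configurations for both clusters.
WORK=$(mktemp -d)
cat > "$WORK/search.zk.properties" <<'EOF'
zookeeper.servers=zk1.example.com:2181
zookeeper.root-path=/katta
EOF
cat > "$WORK/loadtest.zk.properties" <<'EOF'
zookeeper.embedded=false
zookeeper.servers=zk1.example.com:2181
zookeeper.root-path=/katta-loadtest
EOF

# Usage: check_clusters <search properties> <loadtest properties>
check_clusters() {
  [ "$(prop "$2" zookeeper.embedded)" = "false" ] \
    || { echo "loadtest cluster must not use an embedded ZooKeeper"; return 1; }
  [ "$(prop "$1" zookeeper.servers)" = "$(prop "$2" zookeeper.servers)" ] \
    || { echo "clusters use different zookeeper.servers"; return 1; }
  [ "$(prop "$1" zookeeper.root-path)" != "$(prop "$2" zookeeper.root-path)" ] \
    || { echo "clusters share the same zookeeper.root-path"; return 1; }
  echo "ok"
}

check_clusters "$WORK/search.zk.properties" "$WORK/loadtest.zk.properties"
```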
Deploy Test Index On The Search Cluster
See How to create a katta index for more information.
Run Load Test From The Loadtest Cluster
Go to the master of the Katta loadtest cluster and run the following command:
sh bin/katta loadtest <zkRootPath> <nodeCount> <startQueryRate> <endQueryRate> <rateStep> <durationPerIteration> <indexName> <queryFile> <resultFolder> <typeWithParameters>
- zkRootPath - The zookeeper root path of the search cluster
- nodeCount - The number of test search clients that you want to use for this test (usually the number of test search clients that you have started). If this number is greater than the number of running test search clients, starting the load test will be deferred until the requested number of test search clients is available.
- startQueryRate - The query rate (in queries per second) that will be used at the beginning of the load test.
- endQueryRate - The query rate (in queries per second) that will be used at the end of the load test.
- rateStep - The value by which the query rate is increased from iteration to iteration.
- durationPerIteration - The period of time (in milliseconds) during which the query rate is kept constant. After this time the query rate is increased by the step value and again kept constant for the same period. This is repeated until the end query rate has been reached.
- indexName - The name of the index that should be searched. Can be “*” for searching all deployed indices.
- queryFile - A text file with one query per line. Queries will be picked from this file during load tests.
- resultFolder - The folder in which the 2 result files are stored. This is a local folder on the host of the loadtest master.
- typeWithParameters - The search implementation that should be tested (lucene <maxHits> or mapfile).
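As an illustration, the sketch below assembles a load test invocation and estimates how long the run will take. The root path /katta, the index name, the paths, the maxHits value and all rates are hypothetical placeholders. The iteration count assumes that both the start rate and the end rate are tested.

```shell
#!/bin/sh
# All values below are hypothetical placeholders; adjust them to your clusters.
ZK_ROOT_PATH=/katta                  # assumed root path of the search cluster
NODE_COUNT=2
START_RATE=20                        # queries per second
END_RATE=40                          # queries per second
RATE_STEP=10
DURATION_MS=60000                    # time per iteration in milliseconds
INDEX_NAME=testIndex
QUERY_FILE=/tmp/queries.txt
RESULT_FOLDER=/tmp/loadtest-results

# Number of iterations, assuming start and end rate are both included.
iterations() {
  echo $(( ($2 - $1) / $3 + 1 ))
}

# Estimated total run time in milliseconds (iterations * duration per iteration).
total_ms() {
  echo $(( $(iterations "$1" "$2" "$3") * $4 ))
}

echo "iterations: $(iterations "$START_RATE" "$END_RATE" "$RATE_STEP")"
echo "estimated duration: $(total_ms "$START_RATE" "$END_RATE" "$RATE_STEP" "$DURATION_MS") ms"

# Print the command instead of running it, so that it can be reviewed first.
echo sh bin/katta loadtest "$ZK_ROOT_PATH" "$NODE_COUNT" "$START_RATE" "$END_RATE" \
  "$RATE_STEP" "$DURATION_MS" "$INDEX_NAME" "$QUERY_FILE" "$RESULT_FOLDER" lucene 100
```

With these values the query rate would step through 20, 30 and 40 queries per second.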
After the test has run, 2 result files are available in the specified result folder on the load test master: a summary of the combined results and a log of every executed query.
Running a Load Test On EC2
See running-katta-on-ec2. You simply have to boot 2 clusters.
The summary result file contains the combined results from all test search clients. It looks something like this:
A B C D E F
20 12.4 17 0 25.10 77.04
30 31.5 62 0 5.45 2.45
40 41.0 41 0 5.03 4.52
A <requested query rate>
B <achieved query rate>
C <successful queries>
D <failed queries>
E <average query time in ms>
F <standard deviation of query time>
Note that in this example the first row has the largest standard deviation of query times. This is probably due to the warm-up time of the Katta cluster, which affects the first queries. We will work on this later.
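A quick way to read such a file is to look for iterations in which the achieved query rate (column B) stayed clearly below the requested rate (column A). The following sketch does that with awk on the sample rows from above. The 90% threshold and the function name are arbitrary choices, and the file is assumed to be whitespace-separated; lines that do not start with a number are skipped.

```shell
#!/bin/sh
# Usage: shortfall_rows <summary file> [minimum achieved/requested ratio]
# Prints requested and achieved rate for every row below the ratio.
shortfall_rows() {
  awk -v min="${2:-0.9}" \
    '$1 ~ /^[0-9]+$/ && $2 < $1 * min { print $1, $2 }' "$1"
}

# Sample data copied from the example above.
SAMPLE=$(mktemp)
cat > "$SAMPLE" <<'EOF'
20 12.4 17 0 25.10 77.04
30 31.5 62 0 5.45 2.45
40 41.0 41 0 5.03 4.52
EOF

shortfall_rows "$SAMPLE"
```

For the sample data only the first iteration is reported, which matches the warm-up effect described above.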
The other result file logs each query that was executed. It looks like this:
A B C D E F
20 host:20001 1244469344067 1244469344263 196 text:katta
20 host:20001 1244469343971 1244469344265 294 text:katta
20 host:20001 1244469343892 1244469344268 376 text:katta
20 host:20001 1244469343873 1244469344271 398 text:katta
20 host:20001 1244469343929 1244469344271 342 text:katta
20 host:20001 1244469343814 1244469344272 458 text:katta
20 host:20001 1244469344050 1244469344273 223 text:katta
20 host:20001 1244469344277 1244469344289 12 text:katta
20 host:20001 1244469344285 1244469344294 9 text:katta
20 host:20001 1244469344376 1244469344391 15 text:katta
A <requested query rate>
B <test search client id>
C <start time of query>
D <end time of query>
E <elapsed time in ms>
F <query>
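The query log can be summarized per requested query rate with a few lines of awk. This is a sketch on the sample rows from above, assuming whitespace-separated columns and queries without spaces before column E; the function name is made up. It prints the requested rate, the number of queries, the average elapsed time and the maximum elapsed time.

```shell
#!/bin/sh
# Usage: elapsed_stats <query log file>
# Output per requested rate: <rate> <count> <average elapsed> <max elapsed>
elapsed_stats() {
  awk '$1 ~ /^[0-9]+$/ {
         n[$1]++; sum[$1] += $5
         if ($5 > max[$1]) max[$1] = $5
       }
       END {
         for (r in n) printf "%d %d %.1f %d\n", r, n[r], sum[r] / n[r], max[r]
       }' "$1"
}

# Sample data copied from the example above.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
20 host:20001 1244469344067 1244469344263 196 text:katta
20 host:20001 1244469343971 1244469344265 294 text:katta
20 host:20001 1244469343892 1244469344268 376 text:katta
20 host:20001 1244469343873 1244469344271 398 text:katta
20 host:20001 1244469343929 1244469344271 342 text:katta
20 host:20001 1244469343814 1244469344272 458 text:katta
20 host:20001 1244469344050 1244469344273 223 text:katta
20 host:20001 1244469344277 1244469344289 12 text:katta
20 host:20001 1244469344285 1244469344294 9 text:katta
20 host:20001 1244469344376 1244469344391 15 text:katta
EOF

elapsed_stats "$LOG"
```

When the log contains several query rates, the order of the output lines is not defined by awk, so pipe the result through sort -n if a fixed order is needed.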