This guide covers installing and configuring Katta on a cluster of several servers. To try out Katta on a local machine, see Getting started.

Since all Katta nodes need access to an added index, you should have a shared or distributed file system such as the Hadoop DFS ready. The same user must also exist on each machine.
To ease configuration and management of the Katta cluster, setting up passphraseless ssh between the servers (node->master, master->node) is helpful, although not required.
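
A minimal sketch of the passphraseless ssh setup, assuming OpenSSH (ssh-keygen, ssh-copy-id), an RSA key and the example host names used later in this guide. It covers one direction only (this machine to the listed hosts), so it would be run on the master for master->node and repeated on each node with the master as target. The DRY_RUN switch and the run helper are illustrative and not part of Katta; by default the commands are only printed.

```shell
#!/bin/sh
# Sketch: passphraseless ssh from this machine to a list of hosts.
# Assumptions: OpenSSH tools are installed, the same user exists on all hosts.
DRY_RUN="${DRY_RUN:-1}"            # 1 = only print the commands (default)
HOSTS="server1 server2 server3"    # example host names from this guide
KEY="$HOME/.ssh/id_rsa"

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create a key pair with an empty passphrase if none exists yet.
if [ ! -f "$KEY" ]; then
  run ssh-keygen -t rsa -N "" -f "$KEY"
fi

# Install the public key on every host (asks for the password once per host).
for host in $HOSTS; do
  run ssh-copy-id -i "$KEY.pub" "$host"
done
```

Run it with DRY_RUN=0 to execute the commands instead of printing them.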

Install Katta

Either download the latest katta distribution or build a distribution from source.
Unpack the distribution on all of your servers into the same directory.

Configure Katta on the master

Log in to the server that should be the Katta master. There are a bunch of configuration files in the conf directory of your katta master installation.
Most of these files have default values which should be ok for you, but a little configuration is necessary.

conf/masters:
(required) configure the master (replace 'localhost' with a hostname that is reachable from the nodes):
server0
conf/nodes:
(required) configure all nodes (every line in this file should be the hostname of a node):
server1
server2
server3
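
The two files can also be written from a script. The following is a sketch for the example cluster above; CONF_DIR is a hypothetical variable that would point at the conf directory of the master installation, and a temporary directory stands in for it by default so the snippet can be tried safely.

```shell
#!/bin/sh
# Sketch: write conf/masters and conf/nodes for the example cluster.
# CONF_DIR is hypothetical; point it at <katta distribution>/conf for real use.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
mkdir -p "$CONF_DIR"

# One master host name, reachable from every node.
printf '%s\n' server0 > "$CONF_DIR/masters"

# One node host name per line.
printf '%s\n' server1 server2 server3 > "$CONF_DIR/nodes"

echo "master: $(cat "$CONF_DIR/masters")"
echo "nodes:  $(tr '\n' ' ' < "$CONF_DIR/nodes")"
```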

For Zookeeper-related properties, edit conf/katta.zk.properties:
Zookeeper can be started embedded in a Katta master or run standalone. We strongly recommend running an external Zookeeper; otherwise there is no Katta master failover.

# comma separated list of host:port that should run a zookeeper server,
# make sure you use hostnames and not ip addresses
zookeeper.servers=<server0>:2181
zookeeper.embedded=true
...

Note: If passphraseless ssh is set up and KATTA_MASTER is set in the conf/katta-env.sh script, changes to this file are propagated to the nodes automatically. Otherwise you have to change this file on each node.
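
The comment in the properties file asks for hostnames rather than IP addresses in zookeeper.servers. A small check along these lines could catch a slip before Katta is started. It is only a sketch: the function name check_zk_servers is made up for illustration, it only recognizes IPv4-style entries (digits and dots), and the sample file is generated on the fly.

```shell
#!/bin/sh
# Sketch: report zookeeper.servers entries that look like IPv4 addresses.
# $1: path to a katta.zk.properties file. Returns 1 if any entry is flagged.
check_zk_servers() (
  servers=$(sed -n 's/^zookeeper\.servers=//p' "$1" | tr -d ' ')
  status=0
  IFS=,
  for entry in $servers; do
    host=${entry%%:*}              # strip the :port suffix
    case "$host" in
      "") ;;                       # empty entry, ignore
      *[!0-9.]*) ;;                # has a non-digit, non-dot character: hostname
      *) echo "IP address used: $entry"; status=1 ;;
    esac
  done
  exit "$status"
)

# Demo on a generated sample file.
sample=$(mktemp)
printf 'zookeeper.servers=server0:2181,10.0.0.5:2181\n' > "$sample"
check_zk_servers "$sample" || echo "fix zookeeper.servers before starting Katta"
rm -f "$sample"
```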

conf/katta-env.sh:
(required) configure JAVA_HOME
(optional) configure KATTA_MASTER in order to propagate node configuration changes to the nodes
# Set Katta-specific environment variables here.
...
# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/j2sdk1.5-sun
...
# host:path where katta code should be rsync'd from. Unset by default.
export KATTA_MASTER=server0:/home/$USER/katta-distribution
...

Note: If passphraseless ssh is set up and KATTA_MASTER is set in the conf/katta-env.sh script, changes to this file are propagated to the nodes automatically. Otherwise you have to change this file on each node.
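
Because JAVA_HOME is required, it can be worth verifying the value before starting anything. The helper below is a sketch and not part of the Katta scripts; check_java_home is a hypothetical name, and the test it performs (an executable bin/java under the given directory) is an assumption about what a usable Java installation looks like.

```shell
#!/bin/sh
# Sketch: verify that a JAVA_HOME value points at a Java installation.
# $1: the directory to check. Returns 1 if it is empty or has no bin/java.
check_java_home() {
  if [ -z "$1" ]; then
    echo "JAVA_HOME is not set"
    return 1
  fi
  if [ ! -x "$1/bin/java" ]; then
    echo "no executable found at $1/bin/java"
    return 1
  fi
  echo "JAVA_HOME ok: $1"
}

# Check the value of the current environment without aborting the script.
check_java_home "${JAVA_HOME:-}" || true
```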

conf/katta.node.properties:
(no change required)

...

Note: If passphraseless ssh is set up and KATTA_MASTER is set in the conf/katta-env.sh script, changes to this file are propagated to the nodes automatically. Otherwise you have to change this file on each node.

Starting and Stopping Katta

To start Katta, change to the Katta distribution directory and execute bin/start-all.sh.
To stop Katta, execute bin/stop-all.sh.
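
As a sketch, starting the cluster and then listing the nodes can be wrapped in one function. KATTA_DIR and start_katta are hypothetical names; the function only relies on bin/start-all.sh and the listNodes command of bin/katta. Nodes may need a moment to register, so the list can be incomplete right after startup.

```shell
#!/bin/sh
# Sketch: start Katta from its distribution directory, then list the nodes.
# KATTA_DIR is hypothetical; it defaults to the current directory.
KATTA_DIR="${KATTA_DIR:-.}"

start_katta() {
  if [ ! -x "$1/bin/start-all.sh" ]; then
    echo "bin/start-all.sh not found in $1" >&2
    return 1
  fi
  # The subshell keeps the caller's working directory unchanged.
  ( cd "$1" && bin/start-all.sh && bin/katta listNodes )
}

start_katta "$KATTA_DIR" || echo "Katta was not started"
```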

Steering Katta

Besides the Java API, the main way to control Katta is the script bin/katta.
This script provides the following commands (among others):

Command                                                      What it does
listNodes                                                    Prints out a list of all nodes.
listIndices                                                  Prints out a list of all indexes.
showStructure                                                Prints out a hierarchical view of the Katta system.
check                                                        Prints out useful deployment information.
version                                                      Prints the Katta version.
addIndex <index name> <path to index> [<replication level>]  Deploys an index.
removeIndex <index name>                                     Undeploys an index.
redeployIndex <index name>                                   Redeploys an index.
listErrors <index name>                                      Prints all errors of an index.
search <index name> <query>                                  Searches in the specified index with the specified Lucene query.

The next sections describe how to deploy and search an index with this script.

Deploy an index

To deploy an index, execute addIndex.

bin/katta addIndex <name of index> [file:///<path to index>|hdfs://<server name>/<path to index>] <replication level>

An example is:

bin/katta addIndex demo hdfs://server0:9000/testIndexes/demo_large 3
This deploys an index called demo from a Hadoop file system with a replication level of 3.
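
A deployment can be followed directly by listErrors to see whether anything went wrong. The wrapper below is a sketch: deploy_index and the KATTA variable are hypothetical, and the check for a file:// or hdfs:// prefix only mirrors the two forms shown in the usage line, so it may be stricter than Katta itself.

```shell
#!/bin/sh
# Sketch: deploy an index and print its errors afterwards.
# KATTA is hypothetical; it defaults to bin/katta in the current directory.
KATTA="${KATTA:-bin/katta}"

# $1: index name, $2: index URI, $3: replication level
deploy_index() {
  case "$2" in
    file://*|hdfs://*) ;;
    *)
      echo "index path must start with file:// or hdfs://: $2" >&2
      return 1
      ;;
  esac
  "$KATTA" addIndex "$1" "$2" "$3" && "$KATTA" listErrors "$1"
}

# Only run the example from this guide if the katta script is present.
if [ -x "$KATTA" ]; then
  deploy_index demo hdfs://server0:9000/testIndexes/demo_large 3
fi
```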

Search an index

To search an index, execute search.

bin/katta search <index name>[,<index name>,...] "<query>" [count]

An example is:

bin/katta search demo "hadoop" 20

The index demo is searched with the query hadoop. At most 20 results are returned.
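
According to the usage line, several indexes can be searched at once by passing their names as a comma-separated list. A sketch of building that argument in a script follows; join_indexes is a hypothetical helper and demo2 a made-up second index name. The snippet only prints the resulting command.

```shell
#!/bin/sh
# Sketch: build the comma-separated index list for bin/katta search.
# Joins all arguments with commas, e.g. "a b c" becomes "a,b,c".
join_indexes() (
  IFS=,
  printf '%s\n' "$*"
)

indexes=$(join_indexes demo demo2)   # demo2 is a made-up index name
echo "bin/katta search $indexes \"hadoop\" 20"
# prints: bin/katta search demo,demo2 "hadoop" 20
```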

Set up a second Katta master

Go to any node or copy the katta distribution from one node to another host.
Set zookeeper.embedded to false in conf/katta.zk.properties.
Call bin/katta startMaster or bin/katta-daemon.sh start startMaster.
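
These steps can be sketched as a short script that would be run from the Katta distribution directory on the second master host. set_external_zookeeper and ZK_PROPS are hypothetical names; the function rewrites the zookeeper.embedded line with sed and keeps a .bak copy of the original file.

```shell
#!/bin/sh
# Sketch: switch to an external Zookeeper and start a second master.
# $1: path to katta.zk.properties; the original is kept as <file>.bak.
set_external_zookeeper() {
  cp "$1" "$1.bak" &&
    sed 's/^zookeeper\.embedded=.*/zookeeper.embedded=false/' "$1.bak" > "$1"
}

ZK_PROPS="conf/katta.zk.properties"

# Only act when run inside a Katta distribution directory.
if [ -f "$ZK_PROPS" ] && [ -x bin/katta ]; then
  set_external_zookeeper "$ZK_PROPS" && bin/katta startMaster
fi
```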