Index JanusGraph Database

2 minute read

Lifecycle

States (SchemaStatus)

States Description
INSTALLED The index is installed in the system but not yet registered with all instances in the cluster
REGISTERED The index is registered with all instances in the cluster but not (yet) enabled
ENABLED The index is enabled and in use
DISABLED The index is disabled and no longer in use

Actions (SchemaAction)

SchemaAction Description
REGISTER_INDEX Registers the index with all instances in the graph cluster. After an index is installed, it must be registered with all graph instances
REINDEX Re-builds the index from the graph
ENABLE_INDEX Enables the index so that it can be used by the query processing engine. An index must be registered before it can be enabled.
DISABLE_INDEX Disables the index in the graph so that it is no longer used.
REMOVE_INDEX Removes the index from the graph (optional operation). Only on composite index.

Scripts to index the graph database

First you have to kill the gremlin server and change some parameters in the server script

ps aux | grep gremlin # finds the gramlin process id
sudo kill -9 8532

vi conf/gremlin-server/gremlin-server.yaml
# then change scriptEvaluationTimeout -> scriptEvaluationTimeout: 300000 
# save and exit

# run the gramlin server again
cd janusgraph-0.4.0-hadoop2 # change directory to JanusGraph download location
sudo bin/gremlin-server.sh conf/gremlin-server/gremlin-server.yaml &
# make sure to run the command with & sign so that you can exit from the the command without killing it

Login to the gremlin console after restarting the server

sudo ./bin/gremlin.sh

# run inside the console to connect with the remote gremlin server
:remote connect tinkerpop.server conf/remote.yaml session
:remote console

Let’s assume the database is empty. Now we have to first create a node for indexing. Let’s create an account node

g.addV('Account').property(single, 'id', '11111111').next()
g.tx().commit()

Check for open transactions and close them

graph.getOpenTransactions()
// to remove any open transactions <- remove all open transactions by repeating the command
graph.getOpenTransactions().getAt(0).rollback()

Check for any open sessions and close them

mgmt = graph.openManagement()
mgmt.getOpenInstances()
// close all instances except the current instance
mgmt.forceCloseInstance('<_id_>')
mgmt.commit()

Printing schemas – this helps to view the internal organization of the database

mgmt = graph.openManagement()
mgmt.printSchema()
mgmt.commit()

Now let’s register the index with the graph. This commands adds an index with both the vertex label and a key property

graph.tx().rollback()  //Never create new indexes while a transaction is active
mgmt = graph.openManagement()
accountId = mgmt.getPropertyKey('id')
account = mgmt.getVertexLabel('Account')
mgmt.buildIndex('byAccountIdAndLabel', Vertex.class).addKey(accountId).indexOnly(account).buildCompositeIndex()
mgmt.commit()

Now if you look at the schemas, you will see a Vertex Index Name by ‘byAccountIdAndLabel’ in the REGISTERED state

//Wait for the index to become available
ManagementSystem.awaitGraphIndexStatus(graph, 'byAccountIdAndLabel').call()

As the final step reindex the data in the graph

//Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("byAccountIdAndLabel"), SchemaAction.REINDEX).get()
mgmt.commit()

Now if you do a print schema command, you will see that the ‘byAccountIdAndLabel’ index status is turned to ENABLED

Troubleshooting

If you are unsuccessful in executing this procedure due to some interruption and the index gets stuck at the INSTALLED state there is a workaround to ENABLE the index

//Step1: Clear all transactions
graph.getOpenTransactions()
graph.getOpenTransactions().getAt(0).rollback()

//Step2: Clear all management instances
mgmt=graph.openManagement()
mgmt.getOpenInstances() 

//After doing step 1 and 2
//Step3: Force change from Installed to Registered
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("byAccountIdAndLabel"), SchemaAction.REGISTER_INDEX).get()
mgmt.commit()

//Wait for the index to become available
ManagementSystem.awaitGraphIndexStatus(graph, 'byAccountIdAndLabel').call()

//Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("byAccountIdAndLabel"), SchemaAction.REINDEX).get()
mgmt.commit()

References

  • https://docs.janusgraph.org/index-management/index-performance/
  • https://groups.google.com/forum/#!topic/janusgraph-users/E0speKxetgM

Categories:

Updated: