General

G.V() 3.6.26: Cypher Support for Amazon Neptune & PuppyGraph, and more!

Introduction

G.V() 3.6.26 is out and brings massive changes to our software. For the first time since we’ve released G.V(), we are now expanding our support to new graph database query languages, starting with openCypher for Amazon Neptune & PuppyGraph. This new version is the first of a series of releases that will expand the reach of our graph database client beyond the Apache TinkerPop ecosystem.

We’ll also cover some recent 3.x release improvements that were not covered in previous announcements.

A quick recap of the 3.x releases

G.V() is changing rapidly – to support that evolution, we’ve undertaken substantial behind the scene work which was released over the Summer as part of our 3.0 update. You currently know our software to be a desktop-only executable. We will maintain this deployment model, but also introduce a new Docker version in the near future. Such has been the purpose of the 3.0 release.

That’s not all that’s changed however, so here’s a quick fire round of improvements we’ve added:

Improved application performance and reduced memory footprint
Improved query editor look and feel, and autocomplete engine accuracy
Updated JSON output format to be more inline with various database provider output formats

JSON output display of a query run on Azure Cosmos DB with G.V() — The Azure Cosmos DB JSON Output now matches the output returned by the database itself for better consistency

Cypher support for Amazon Neptune & PuppyGraph

Cypher is one of the most popular graph query languages out there. It’s a declarative query language initially developed by Neo4j that shares strong similarities with SQL, arguably the best known database query language of all. Cypher has branched off to an open-source implementation called openCypher which many graph database providers have adopted either as their primary language or to complement their main language.

We’ve set out to bring openCypher support to G.V(), giving users more flexibility in how they can query their database. Today, we offer this support for Amazon Neptune & PuppyGraph which both offer an openCypher API concurrent to their Gremlin API.

Simply put, you can now write and run Cypher queries on G.V() just the same way we’ve supported it so far for the Gremlin query language, without any additional configuration required. Download the latest version of G.V() and you’re good to go!

Our query editor now offers the option to choose between Cypher and Gremlin for databases that support it. You can seamlessly switch between both languages, access the same advanced auto completion features we offer for Gremlin, and visualize your query results the same way you’ve been able to so far for Gremlin.

opencypher query support on G.V() with autocomplete — Running a Cypher query on Amazon Neptune using G.V(), with autocomplete and advanced results visualization

Graph view improvements

We’ve added some quality of life features and improvements to our graph view, to provide a better, more versatile experience to users.

Centering of the camera to the layout applied on the graph is now more accurate, and ensures that the optimal zoom level is applied no matter the size of the graph. The animations applied to reposition nodes after layout are smoother, and the graph can be rotated 90 degrees in any direction to give you more flexibility on how you want it displayed.
We’ve also introduced a new horizontal tree layout which is best suited to hierarchical data structures.

running graph layouts on G.V() — Running layouts and positioning the graph now runs much smoother

Query Editor changes

Aside from the ability to switch between Cypher and Gremlin query languages whilst writing queries, you can now also specify a query timeout to ensure that your query does not exceed a threshold of your choice.

We’ve also updated the look and feel of the UI to provide the same useful information in a more compact format, creating more space on screen to write complex queries.

What’s next for G.V()?

With G.V() 3.x, we’re embarking on the next stage of growth for our software. We will continue to expand support further to new graph database providers, starting with Neo4j’s AuraDB, Desktop and self-hosted editions. Once that support is released, we will be turning our attention to ISO GQL (Graph Query Language) with the aim to provide the first fully featured graph database client for GQL.

Other Cypher-enabled graph database providers will be progressively added to the roster of available technologies in G.V(), such as Memgraph. If there’s a graph database provider you’re specifically interested in seeing in G.V() (or if you work on a database you’d like to see us support!) give us a holler at support@gdotv.com. Our ultimate goal is for G.V() to be the only graph database client you’ll ever need.

We’re not just looking to expand compatibility to other databases – a crucial goal as part of the 3.x release was to make G.V() deployable not just as a desktop executable, but also as a fully fledged web application using Docker. We will initially launch the web version of our software on AWS Marketplace in the coming months so that you and your team can collaborate directly on a single deployment of our software. Stay tuned for more news early next year.

Turning Relational Data Into Graph Visualizations with PuppyGraph and G.V()

Arthur General, PuppyGraph

In this article we’ll showcase a first of its kind Graph analytics engine that transform and unify your relational data stores into a highly scalable and low-latency graph. I present to you: PuppyGraph!

Introduction

This is going to be a part-tutorial, part technical deep dive into this unique technology. By the end of this article you will have your own PuppyGraph Docker container running with a sample set of data loaded for you to explore and interact with using G.V(), or PuppyGraph’s own querying tools. Best part is, this is all free to use and will only take a few minutes to setup. Let’s go!

What is PuppyGraph?

PuppyGraph is a deployable Graph Analytic Engine that aggregates disparate relational data stores into a queryable graph, with zero ETL (Extract, Transform, Load) requirements. It’s plug and play by nature, requiring very little setup or learning: deploy it as a Docker container or AWS AMI, configure your data source and data schema, and you’ve got yourself a fully functional Graph Analytics Engine.

The list of supported data sources is long and growing. At time of writing PuppyGraph supports all of the below sources:

Data Sources supported by puppygraph

PuppyGraph’s unique selling point is to deliver all the benefits of a traditional graph database deployments without any of the challenges:

Complex ETL: Graph databases require building time-consuming ETL pipelines with specialized knowledge, delaying data readiness and posing failure risks.
Scaling challenges: Increasing nodes and edges complicate scaling due to higher computational demands and challenges in horizontal scaling. The interconnected nature of graph data means that adding more hardware does not always translate to linear performance improvements. In fact, it often necessitates a rethinking of the graph model or using more sophisticated scaling techniques.
Performance difficulties: Traditional graph databases can take hours to run multi-hop queries and struggle beyond 100GB of data.
Specialized graph-modeling knowledge requirements: Using graph databases demands a foundational understanding of mapping graph theory and logical modeling to an optimal physical data layouts or index. Given that graph databases are less commonly encountered for many engineers compared to relational databases, this lower exposure can act as a considerable barrier to implementing an optimal solution with a traditional graph database.
Interoperability issues: Tool compatibility between graph databases and SQL is largely lacking. Existing tools for an organization’s databases may not work well with graph databases, leading to the need for new investments in tools and training for integration and usage.

Because a picture speaks a thousand words, PuppyGraph illustrates these pain-points and how they’re with a simple side-by-side comparison of how you would aggregate your relational data without PuppyGraph versus using PuppyGraph, and it says it all:

Puppygraph vs traditional ETL based graph architecture

Why does PuppyGraph exist and why is it more performant than a traditional graph database?

So PuppyGraph suggests that more than 90% of Graph use cases involve analytics, rather than transactional workloads. And the data leveraged in these analytical use cases tend to already exist in an organisation in some form of column-based storage, typically SQL. This is simply due to the fact that SQL systems are ubiquitous, thanks to their long history in the database and data warehouse markets.

With that data already in place and accessible, leveraging it directly at the source with no ETL means that you’re no longer copying the data into a graph, instead merely wrapping your data sources with a graph query engine.

Aside from the obvious zero ETL factor, there is another considerable performance optimisation being leveraged directly as part of your graph analytics. In graph, accessing a single node or edge requires loading all of their attributes in memory due to their placement on the same disk page, which leads to a higher memory consumption. By leveraging column-based storage, graph queries run by PuppyGraph can restrict their access to just the necessary attributes, which optimizes in turn the disk-access and memory storage required to evaluate a query. And therein lies the secret sauce.

Under the hood

So how does it work? You may think that PuppyGraph is merely translating your graph queries into SQL queries for the underline data sources – but it doesn’t. Instead, PuppyGraph performs all optimisations directly within its own query engine, restricting its SQL footprint to simple SELECT queries, e.g. SELECT name, age FROM person WHERE filter1 AND filter2.

You do of course need to tell PuppyGraph how to access your data sources, what tables you’re interested in accessing and what relationships between those tables are going to become the edges of your graph. This is done via a Schema configuration file, in which you’ll need to configure 3 sections:

catalogs: This is going to be your list of data sources. A data source consists of a name, credentials, database driver class and jdbc URI
vertices: this is the translation layer between your database tables and your vertices. Each vertex is mapped from a catalog, a schema and a table. Simply put, a table should map to a vertex, and its columns to vertex properties, with a name and a type. In other words, your columns ARE your vertex properties, and you can pick which ones to include as part of your vertex.
edges: this is translation layer that leverages the relationships of your relational data, and maps them into edges. Think simple: its (mostly) going to be foreign keys. You can even map attributes to your edges from columns of your related tables.

To illustrate this, see below a simple schema mapping two PostgreSQL tables into two vertices and an edge:

{
  "catalogs": [
    {
      "name": "postgres_data",
      "type": "postgresql",
      "jdbc": {
        "username": "postgres",
        "password": "postgres123",
        "jdbcUri": "jdbc:postgresql://postgres:5432/postgres",
        "driverClass": "org.postgresql.Driver"
      }
    }
  ],
  "vertices": [
    {
      "label": "Location",
      "mappedTableSource": {
        "catalog": "postgres_data",
        "schema": "supply",
        "table": "locations",
        "metaFields": {
          "id": "id"
        }
      },
      "attributes": [
        {
          "name": "address",
          "type": "String"
        },
        {
          "name": "city",
          "type": "String"
        },
        {
          "name": "country",
          "type": "String"
        },
        {
          "name": "lat",
          "type": "Double"
        },
        {
          "name": "lng",
          "type": "Double"
        }
      ]
    },
    {
      "label": "Customer",
      "mappedTableSource": {
        "catalog": "postgres_data",
        "schema": "supply",
        "table": "customers",
        "metaFields": {
          "id": "id"
        }
      },
      "attributes": [
        {
          "name": "customername",
          "type": "String"
        },
        {
          "name": "city",
          "type": "String"
        }
      ]
    }
  ],
  "edges": [
    {
      "label": "CustomerLocation",
      "mappedTableSource": {
        "catalog": "postgres_data",
        "schema": "supply",
        "table": "customers",
        "metaFields": {
          "id": "id",
          "from": "id",
          "to": "location_id"
        }
      },
      "from": "Customer",
      "to": "Location"
    }
  ]
}

And there you have it! The schema file below would result in the following Graph data schema:

A simple graph showing the relationship between customers and a location

Now that we’ve covered the theory, let’s jump to practice with a step by step guide to create, configure and query your first Graph Analytics Engine using PuppyGraph and G.V().

Setting up your first PuppyGraph container

For simplicity, we’ll run a local instance of PuppyGraph together with a PostgreSQL database using Docker Compose. If you haven’t already, install Docker. Once installed, create a docker-compose.yaml file with the following contents (or download it here):

version: "3"
services:
  puppygraph:
    image: puppygraph/puppygraph:stable
    pull_policy: always
    container_name: puppygraph
    environment:
      - PUPPYGRAPH_USERNAME=puppygraph
      - PUPPYGRAPH_PASSWORD=puppygraph123
    networks:
      postgres_net:
    ports:
      - "8081:8081"
      - "8182:8182"
      - "7687:7687"
  postgres:
    image: postgres:14.1-alpine
    container_name: postgres
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres123
    networks:
      postgres_net:
    ports:
      - "5432:5432"
    volumes:
      - ./postgres-data:/var/lib/postgresql/data
      - ./csv_data:/tmp/csv_data:ro
      - ./postgres-schema.sql:/tmp/postgres-schema.sql
networks:
  postgres_net:
    name: puppy-postgres

You’ll also need to create a couple folders with sample data and a Postgres Schema file to create your Postgres table. These files will be mounted to your Postgres Docker container.

Create a new postgres-schema.sql file in the same folder as your docker-compose-puppygraph.yaml file with the following contents (or download it here):

create schema supply;
create table supply.customers (id bigint, customername text, city text, state text, location_id bigint);
COPY supply.customers FROM '/tmp/csv_data/customers.csv' delimiter ',' CSV HEADER;

create table supply.distance (id bigint, from_loc_id bigint, to_loc_id bigint, distance double precision);
COPY supply.distances FROM '/tmp/csv_data/distance.csv' delimiter ',' CSV HEADER;

create table supply.factory (id bigint, factoryname text, locationid bigint);
COPY supply.factory FROM '/tmp/csv_data/factory.csv' delimiter ',' CSV HEADER;

create table supply.inventory (id bigint, productid bigint, locationid bigint, quantity bigint, lastupdated timestamp);
COPY supply.inventory FROM '/tmp/csv_data/inventory.csv' delimiter ',' CSV HEADER;

create table supply.locations (id bigint, address text, city text, country text, lat double precision, lng double precision);
COPY supply.locations FROM '/tmp/csv_data/locations.csv' delimiter ',' CSV HEADER;

create table supply.materialfactory (id bigint, material_id bigint, factory_id bigint);
COPY supply.materialfactory FROM '/tmp/csv_data/materialfactory.csv' delimiter ',' CSV HEADER;

create table supply.materialinventory (id bigint, materialid bigint, locationid bigint, quantity bigint, lastupdated timestamp);
COPY supply.materialinventory FROM '/tmp/csv_data/materialinventory.csv' delimiter ',' CSV HEADER;

create table supply.materialorders (id bigint, materialid bigint, factoryid bigint, quantity bigint, orderdate timestamp,

expectedarrivaldate timestamp, status text);
COPY supply.materialorders FROM '/tmp/csv_data/materialorders.csv' delimiter ',' CSV HEADER;

create table supply.materials (id bigint, materialname text);
COPY supply.materials FROM '/tmp/csv_data/materials.csv' delimiter ',' CSV HEADER;

create table supply.productcomposition (id bigint, productid bigint, materialid bigint, quantity bigint);
COPY supply.productcomposition FROM '/tmp/csv_data/productcomposition.csv' delimiter ',' CSV HEADER;

create table supply.products (id bigint, productname text, price double precision);
COPY supply.products FROM '/tmp/csv_data/products.csv' delimiter ',' CSV HEADER;

create table supply.productsales (id bigint, productid bigint, customerid bigint, quantity bigint,

saledate timestamp, totalprice double precision);
COPY supply.productsales FROM '/tmp/csv_data/productsales.csv' delimiter ',' CSV HEADER;

create table supply.productshipment (id bigint, productid bigint, fromlocationid bigint, tolocationid bigint,

quantity bigint, shipmentdate timestamp, expectedarrivaldate timestamp, status text);
COPY supply.productshipment FROM '/tmp/csv_data/productshipment.csv' delimiter ',' CSV HEADER;

Create a new csv_data folder, download the CSV data archive containing our sample data, and unzip it under csv_data.

You should now have the following file structure:

/csv_data
            customers.csv
            distance.csv
            factory.csv
            inventory.csv
            locations.csv
            materialfactory.csv
            materialorders.csv
            materials.csv
            productcomposition.csv
            products.csv
            productsales.csv
            productshipment.csv
docker-compose-puppygraph.yaml
postgres-schema.sql

We’re now ready to start the engine! On your command line prompt, at the folder location of your docker-compose-puppygraph.yaml file, run the following command:

docker compose -f puppygraph/docker-compose-puppygraph.yaml up

Give Docker a few minutes to pull the images and create your containers, and you’ll have the following running on your device:

Loading Relational Data and Turning it into a Graph

Next, we need to load data in our PostgreSQL database and tell PuppyGraph about it. To load the data, run the following commands:

docker exec -it postgres psql -h postgres -U postgres

\i /tmp/postgres-schema.sql

Then, head on over to localhost:8081 to access the PuppyGraph console. You’ll be prompted to sign in. Enter the following credentials and click Sign In:

Username: puppygraph

Password: puppygraph123

After that, you’ll be presented with a screen with an option to upload your Graph Data Schema. Download our pre-made graph data schema configuration file here, click Choose File, then Upload. PuppyGraph will perform some checks and in just a minute you should be presented with the following on your screen:

A fully loaded PuppyGraph data schema

Your PuppyGraph instance is now ready to be queried with G.V() (or using PuppyGraph’s internal tooling)!

Connecting G.V() to PuppyGraph

So first off, make sure to download and install G.V(), which will only take a minute. Open G.V() and click on “New Database Connection”. Select PuppyGraph as the Graph Technology Type, and enter localhost as the Hostname/IP Address, then click on Test Connection. Next, you’ll be prompted for your PuppyGraph credentials, which are the same as earlier (puppygraph/puppygraph123). Click on Test Connection again, and you’re good to go! Click on Submit to Create the Database Connection.

You’ll be prompted to sign up for a 2 weeks trial – enter your details, get your validation code via email, and then we’re ready to start. If you’d rather not share your details, click on the close button for the application and you’ll be offered to get an anonymous trial instead, which will apply immediately. With that done, you’re all set!

Getting insights from your shiny new PuppyGraph instance with G.V()

With all that hard work done, we’re ready to write some cool Gremlin queries to apply the benefits of your PuppyGraph Analytics Engine to relational data.

You’ll first notice a query tab opened with a simple query running, g.E().limit(100), and corresponding graph display, as shown below:

Your first puppygraph gremlin query using G.V()

There’s a lot going on in this screen and we’ll come back to that. For now, let’s check out the Entity Relationships diagram G.V() has created for your PuppyGraph data schema. On the left handside, click on View Graph Data Model, and you’ll be presented with the following:

An entity relationship diagram showing the PuppyGraph data schema

The Entity Relationship diagram G.V() provides gives you an easy way to inspect the structure of your data. This becomes especially useful when mixing multiple data sources in your PuppyGraph data schema as the resulting schema would be different from the individual data models of your data sources. Anyway, the added benefit of G.V() knowing your data schema is that it can also use it to power a whole bunch of features, such as smart autocomplete suggestions when writing queries, or graph stylesheets to customise the look and feel or your displays.

What’s important here is to realise what huge benefits a graph structure brings to your relational data. Let’s take a real life example applied to this dataset and compare how a graph query would perform against a normal SQL query. The dataset we’re using here is a supply chain use case. Unfortunately sometimes in a supply chain, a material can be faulty and lead to downstream impact to our customers.

Let’s say as an example that a Factory has been producing faulty materials and that we need to inform impacted customers of a product recall. To visualise how we might solve this querying problem, let’s filter down our data model to display the relevant entities and relationships we should leverage to get the right query running:

a stripped down ER diagram of a supply chain data model

Using this view allows use to see the path to follow from Factory to Customer. This concept of traversing a path in our data from a point A (a factory putting out faulting materials) to point B (our impacted customers) is fundamental in a graph database. Crucially, this is exactly the type of problems graph analytics engine are built to solve. In an SQL world, this would be a very convoluted query: a Factory joins to a Material which joins to a Product which joins to a ProductOrder which joins to a Customer. Yeesh.

Using the Gremlin querying language however, this becomes a much simpler query. Remember that unlike relational databases, where we select and aggregate the data to get to an answer, here we are merely traversing our data. Think of it as tracing the steps of our Materials from Factory all the way to Customer. To write our query, we will pick “Factory 46” as our culprit, and design our query step by step back to our customers.

In Gremlin, we are therefore picking the vertex with label “Factory” and factoryname “Factory 46”, as follows:

g.V().has("Factory", "name", "Factory 46")

This is our starting point in the query, our “Point A”. Next, we simply follow the relationships displayed in our Entity Relationship diagram leading to our unlucky Customers.

To get the materials produced by the factory, represented as the MatFactory relationship going out of Material into Factory, we simply add the following step to our query:

g.V().has("Factory", "name", "Factory 46".in("MatFactory")

You should start seeing where this is going. Following this logic, let’s get all the way to our Customers:

g.V().has("Factory", "name", "Factory 46").in("MatFactory").in("ProductComposition").in("ProOrderToPro").out("ProOrderToCus")

And there you have it! This query will return the Customer vertices that have bought products made up of materials manufactured in Factory 46. Best of all, it fits in just one line!

Let’s punch it in G.V() – this will be an opportunity to demonstrate how our query editor’s autocomplete helps you write queries quick and easy:

A demonstration of a Gremlin query being typed and displaying autocomplete suggestions as well as documentation help using G.V()

We can of course create more complex queries to answer more detailed scenarios – for instance, in our example above, we could narrow down to a single faulty material or only recall orders made at a specific date.

The Gremlin querying language offers advanced filtering capabilities and a whole host of features to fit just about any querying scenario. G.V() is there to help you with the process of designing queries by offering smart suggestions, embedded Gremlin documentation, query debugging tools and a whole host of data visualisation options. If you’re interested in a more in depth view of G.V(), check out our documentation, our blog and our website. We also regularly post on upcoming and current developments in the software on Twitter/X and LinkedIn!

Conclusion

PuppyGraph has built an amazing solution to transform your relational data stores into a unified graph model in just minutes. It’s scalable to petabytes of data and capable of executing 10-hop queries in seconds. Their graph analytics engine is trusted by industry leaders such as Coinbase, Clarivate, Alchemy Pay and Protocol Labs. If you’ve got this far, you’ve now got a working setup combining PuppyGraph and G.V() – go ahead and try it on your own data!

Local Amazon Neptune Development with G.V() and LocalStack, the AWS Cloud Emulator

Arthur Amazon Neptune, General

This article will cover how to connect your locally running Amazon Neptune database powered by LocalStack using G.V() – Gremlin IDE. To support this, we’ll use the AWS CLI to create a Neptune database on your local machine and start a connection while loading and querying data interactively on G.V().

Introduction

Before we start, let’s quickly introduce LocalStack, Amazon Neptune, and G.V().

LocalStack is a cloud development framework which powers a core cloud emulator that allows you to run your cloud & serverless applications locally. It helps developers work faster by supporting them to build, test, and launch applications locally — while reducing costs and improving agility. The emulator supports various AWS services like S3, Lambda, DynamoDB, ECS, and Kinesis. LocalStack also works with tools and frameworks like AWS CLI, CDK, and Terraform, making it easy for users to connect to the emulator when building and testing cloud apps.

Amazon Neptune is a managed graph database service designed to handle complex datasets with many connections. It’s schema-free and uses the Neptune Analytics engine to quickly analyze large amounts of graph data, providing insights and trends with minimal latency. Users can control access using AWS IAM and query data using languages like TinkerPop Gremlin and RDF 1.1 / SPARQL 1.1.

LocalStack supports Amazon Neptune as part of its core cloud emulator. Using LocalStack, you can use Neptune APIs in your local environment supporting both property graphs and RDF graph models.

G.V() is a Gremlin IDE – its purpose is to complement the Apache TinkerPop database ecosystem with software that is easy to use and install, and provides essential facilities to query, visualize, and model the graph data. If you want to find out more about G.V(), check out From Gremlin Console to Gremlin IDE with G.V().

Prerequisites

gdotv and LocalStack have partnered to offer a free trial of both LocalStack’s core cloud emulation and G.V() that you can take advantage of now if you haven’t already!

To get started, you’ll need the following:

Install LocalStack: Have a running instance of LocalStack as described in LocalStack’s Getting Started documentation with a LOCALSTACK_AUTH_TOKEN added to enable Pro features, including the Neptune emulation.
Install G.V(): Download and install for free from https://gdotv.com.
Install AWS CLI and awslocal: Download the AWS CLI as described in the AWS documentation, and install the awslocal wrapper script to re-direct AWS API calls to LocalStack.

Once you’ve done all the above, you’ll be ready to connect G.V() to your database and run queries.

Connecting G.V() to your LocalStack Neptune Database

Connecting G.V() to your LocalStack Neptune Graph database is quick and easy.

To create a LocalStack Neptune Graph database, follow these steps:

Start your LocalStack instance using either localstack CLI or a Docker/Docker-Compose setup.
Create a LocalStack Neptune cluster using Amazon’s CreateDBCl3uster API with the AWS CLI:

awslocal neptune create-db-cluster \ 
--engine neptune \ 
--db-cluster-identifier my-neptune-db

Add a LocalStack Neptune instance using Amazon’s CreateDBInstance API with the AWS CLI:

awslocal neptune create-db-instance \ 
--db-cluster-identifier my-neptune-db \ 
--db-instance-identifier my-neptune-instance \ 
--engine neptune \ 
--db-instance-class db.t3.medium

After starting the LocalStack Neptune database, you can see the Address and Port in the Endpoint field. Navigate to the G.V() IDE and follow the instructions:

Click on New Database Connection.
Choose the Graph Technology Type as LocalStack.
Enter localhost.localstack.cloud as the hostname and 4510 as the port. Customize the values if you have a different hostname and port.
Click on Test Connection. G.V() will make sure it can connect to your LocalStack Neptune database. It will then present a final screen summarizing your connection details, which you can now save by clicking Submit.

This will transition to a new query window. Now that your LocalStack Neptune database is up and running in G.V(), let’s run some Gremlin queries:

g.addV('person').property('name', 'Alice').property('age', 30)

Once complete, you can see the summary, results, vertices, graph, and more in the G.V() IDE.

For a more in depth look at using G.V() and LocalStack Neptune together, check out LocalStack’s announcement blog post and this LocalStack Neptune development presentation.

Conclusion

We’re only beginning to see the potential of LocalStack Neptune and G.V() when used together. This post shows how you can easily start working with setting up LocalStack Neptune on G.V() and running basic Gremlin queries. LocalStack also supports other AWS services, which allows you to test integrations supported by Neptune and shift left your database development without maintaining additional dependencies or mocks.

G.V() 2.1.2 Release Showcase

Arthur General

Today I’m very proud to announce the release, at long last, of G.V() 2.1.2. This is our most important update yet, and is full of essential improvements and changes to take our software to the next level.

Major version change and major performance improvements

The first thing you’ll notice is that we’re going from 1.70.92 to 2.1.2, which looks like a big leap (and it is). However there are no breaking change as part of this release – our compatibility remains the same as ever!

The main reason behind the shift from version 1.x to version 2.x is a major upgrade of the technology stack that G.V() runs on. When we first started developing G.V(), it was running on a Vue 2 + Vuetify 2 + Webpack stack which was just about to give way to a newer, better Vue 3 + Vuetify 3 + Vite stack. For a number of technical reasons over the years we’ve been unable to perform that upgrade, up until recently, which leads us to today.

The upgrade work itself was quite significant both in scope and reward. One of the most immediately noticeable improvement in G.V() 2.x is performance: thanks to the benefits of the Vue 3/Vite ecosystem, G.V() now runs much faster overall.

The performance improvements we’re seeing today aren’t the result of a deep dive into our application’s optimisation either – and so we will continue delivering faster, more resource efficient versions of G.V() throughout the year.

Whilst the bulk of the work we’ve done on this release is behind the scenes, we also have a number of new features and user experience improvements to show for it.

User Experience Improvements

First and foremost, if you’ve been using G.V() on a macOS or Linux based device, you’ll have likely found the auto update experience clunky at best. We have finally resolved this issue and all users across all operating systems will now receive the same one-click auto update experience, which we’re hoping will help you adopt newer (and as always, better) versions of G.V() more easily.

We’ve also reworked the layout and resizing features of the application, and whilst this may not be immediately visible to the eye, resizing of the Gremlin Query Assistant or Query Output is now much faster and much better looking.

Finally, we’ve improved the handling of presenting Query results such that when running consecutive queries, the Query Output will now automatically update itself without closing then re-opening, as shown below:

Repeating queries on G.V()

Exposing G.V() Playgrounds over localhost

When we sunset our free G.V() Basic tier in favor of G.V() Lite, one valid concern that our user base expressed was losing the ability to use G.V() for local development against a Gremlin Server, for instance.

To respond to those concerns, we’ve now made our in-memory graph, G.V() Playground, optionally available to connect to a configured port on localhost. Currently this feature is limited to wrapping G.V() Playground with a Gremlin Server, though we will be investigating other embedded server technologies, such as JanusGraph.

This means that from this release, you can use G.V() to quickly stand up and manage Gremlin Servers as well as query them directly from your development environment, for instance.

What’s next for G.V()?

Parallel to this 2.0 rewrite, we’ve been busy planning for our upcoming work for the year. Last year, we’d commissioned a number of improvements to SigmaJS, the WebGL based graph visualization framework that you’ll recognize as the “Graph (Advanced)” view in the Query Output, as well as our Graph Data Explorer. SigmaJS is a high-performance, open source library developed by OuestWare, a fantastic data analysis solutions company responsible for plenty open source gems built around SigmaJS, and many more.

The 3.0 release of SigmaJS is coming soon and will be integrated in G.V() in the near future. This release focuses on performance improvements allowing rendering of more complex graphs, as well as a few cosmetic improvements such as the availability of curved edges, at long last!

G.V() 1.64.87 Release Showcase

Arthur General

Well hello there! It’s another month (October 31st so we technically made the cut on our monthly feature release) and with that we’ve got a bunch of new cool functionalities out in G.V().

Let’s go over them!

Working as a team: Remote Gremlin Queries and Folders

One big issue with the Apache TinkerPop framework and its implementations is the lack of a standard mechanism to store reporting queries directly within the graph – much like you would for instance in SQL using stored procedures. This was partially addressed by G.V() allowing you to save your queries locally on your device and organize into folders.

But what if you have 15 people in your team all connecting to the same graph database and wanting to run the same queries? What if you have hundreds of users looking to do this? You get the point – having each and everyone copy those queries over on their own G.V() client is not gonna cut it.

This is why we’ve introduced a new feature in this release allowing your G.V() Queries to be saved directly against your graph database so that they can be fetched automatically in G.V() whenever anyone connects to your graph using G.V().

The idea is simple: if you have reports that you want to centrally engineer and deploy to users that can connect to your database, design them in G.V() and save them remotely on your graph database in just a click and all your users will have access to them via G.V(). What’s more, you can also centrally update them and remove them, users will receive those updates automatically.

But here’s the best part: there’s no additional configuration required on your end! We’re keeping it simple by having all this information saved as vertices directly on your graph so that you don’t need any additional infrastructure to store and manage these remote queries (and folders).

We’ve put some documentation together on all of this that you can check out at https://gdotv.com/docs/query-editor/#save-a-query. This document goes through the details of how to use feature and how G.V() stores this metadata against your database.

In the future we plan to extend this further by allowing stylesheets to also be saved on your graph database so will soon be able to manage graph visualization configurations centrally too.

Gremlin Query Variables and Reporting

So you’ve got common Gremlin queries you’d like to deploy using Remote Gremlin queries and folders but you don’t want folks to have to write any Gremlin to run them? We got you covered!

In conjunction with the above feature, we’ve also added the ability to create variables in your saved Gremlin Queries along with a new “Run Query” option for saved queries that allows you to get your query’s results in full screen without having to go through the Gremlin Query Editor.

Once again we’ve documented all this in details at https://gdotv.com/docs/query-editor/#query-variables, but for a quick visual of what this looks like, check out our demo below:

gremlin stored procedures with G.V()

TL;DR: Think stored procedures for SQL but applied to Apache TinkerPop with a rich UI to prompt for the query’s parameters and display its results in a variety of ways!

There will be further customisation options introduced in future updates to allow creating even easier to run reports for your users, such as the ability to provide a dropdown of options for Query Variables or that ability to use boolean toggles.

Query Editor and Graph Size Settings Improvements

We’ve slightly improved the Query Editor’s suggestion engine to handle more complex scenarios (such as remembering property keys that have already been used in a step when generating suggestions).

Along that we’ve added a new Default Output Tab option allowing you to select which Result visualization G.V() should go to by default on the query.

The Graph Size Settings shown on the Large Graph View can now also be (partially) saved against your stylesheets so that you can easily apply defaults that meet your criteria on your visualization. Currently the sizing setting rules for Vertex and Edge labels cannot be saved against your stylesheet but this will be available in an upcoming release. The min/max vertex size, and apply custom vertex/edge sizes can all be saved on the stylesheet.

Goodbye G.V() Basic, hello G.V() Lite

We’ve covered this topic in a lot more detail in a separate blog post but G.V() Basic is going away and being renamed to G.V() Lite, along with a few changes to what the tier offers.

First of all (and most important), G.V() Basic is no longer going to be available to new users. Existing users will continue to have full access to it until February 5th, 2024, after which all G.V() Basic licenses will automatically expire.

The G.V() Lite tier now offers free access to our Gremlin Query Debugging feature as well as our OpenAI Text To Gremlin functionality. It will however now be restricted to only G.V() Playgrounds (our in-memory graph).

To find out all the details about this change, head over to this blog post.

Our October TinkerPop Wide presentation

We’ve held a presentation over at the Apache TinkerPop Discord Server on October 23rd covering upcoming features, roadmap and important G.V() related announcements.

You can check out the replay of the presentation on YouTube below:

Getting started on Aerospike Graph with G.V() – Gremlin IDE

Arthur General

In this article, we’ll cover how to visualize and query your Aerospike Graph database using G.V() – Gremlin IDE. To support this, we’ll use a sample movies dataset that we’ll load on our Aerospike Graph database and discover interactively on G.V(). We’ll also write and explain some Gremlin queries to extract valuable information for the dataset via Aerospike Graph.

Before we start, let’s quickly introduce Aerospike Graph and G.V().

Aerospike Graph is a new massively scalable, high-performance graph database launched on June 23 2023 as part of Aerospike’s multi-model NoSQL database. It uses Gremlin as its main querying language and reports < 5ms latency for multihop queries, even for graphs comprising of billions of elements. It was also recently made available on the Google Cloud Marketplace.

gdotv and Aerospike have partnered to offer a 60 days free trial of both Aerospike Graph and G.V() that you can take advantage of now if you haven’t already!

To get started, you’ll need the following:

Download Aerospike Graph: Have a running instance of Aerospike Graph as described in Aerospike’s Getting Started documentation with a folder of your choice mounted to the /etc/default-data folder of your container.
Install G.V(): Download and install for free.
A dataset: This movies dataset can help you get you get started.

Once you’ve done all the above, you’ll be ready to connect G.V() to your database and visualize your data.

Connecting G.V() to your Aerospike Graph Database

Connecting G.V() to your Aerospike Graph database is quick and easy.

If you’re running your Aerospike Graph database from a networked device, ensure that the machine you’re running G.V() from can connect to the device. Refer to the demo below for connecting to an Aerospike Graph database run locally on the same device as G.V():

Follow these step-by-step instructions:

Click on New Database Connection.
Enter the hostname of your Aerospike Graph database; if running on your local machine, this will just be localhost.
Click on Test Connection. G.V() will make sure it can connect to your Aerospike Graph container. It will then present a final screen summarizing your connection details, which you can now save by clicking Submit.

Once you’ve created the connection on G.V(), you’ll first be prompted to sign up for your 60 days free, no obligation trial of G.V(). Pop your details in there, enter your validation code, and you’re all set.

This will transition to a new query window fetching the first 100 edges of your database, which should result in an empty array as we’ve not yet loaded data in our database.

Loading the Movies dataset in Aerospike Graph

Now that your Aerospike Graph database is up and running in G.V(), let’s load some data. Make sure you’ve mounted the volume to your Aerospike Graph Service Docker container, pointing either to a folder with the Movies dataset or to the dataset file itself.

For instance, in our setup, we’ve mounted our local default-data folder containing movies.xml to /etc/default-data. To load the movies.xml dataset in our database, let’s run the following query:

g.with("evaluationTimeout", 200000).io("/etc/default-data/movies.xml").read().iterate()

Give it a minute to run. Once complete, our dataset is loaded, and we’re ready to play with the data!

Styling the graph visualization

Let’s run a query to quickly visualize our data and get a good overview of the graph’s data model. In your G.V() query editor, run the following query:

g.E().limit(250)

Nothing fancy here – we’re just loading the first 250 edges in the database to generate a little display of your graph database. This is just to give you a taste of what G.V() can do!

Before we move on to our next steps, let’s quickly stylize the graph to make sure we’ve got the best display. To speed this along, we’ve created a stylesheet that you can import in G.V(). Download it here and follow the instructions below.

On the Graph view, click on Graph Styles as highlighted below:

Next, click on “Import Stylesheet”:

This will open a file explorer in which you need to select the “movies-aerospike.json” file you’ve just downloaded.

Once loaded, click on “Save New Stylesheet.” After the stylesheet is saved, toggle it to be the default stylesheet by clicking on “Set As Default Stylesheet” – Done!

You’ll see that the graph now displays the relevant information directly on screen, as shown below:

There are a lot of other things you can do in the graph view, so feel free to play around with the graph display. For reference, these are the graph controls and how to display them:

Exploring our graph’s data model

Once you’ve had a little interactive browse of your data, head over the Data Model Explorer view so you can examine the data structure:

As shown in the Data Model Explorer, our graph contains the following vertices:

Movie
Genre
Actor
Director
ActorDirector
User

Relationships in this graph are as follows:

Movies are in a genre (IN_GENRE)
Users have rated movies (RATED)
Directors have directed movies (DIRECTED)
Actors have acted in movies (ACTED_IN)
ActorDirectors have acted and directed in movies (ACTED_IN and DIRECTED)

It’s all pretty self-explanatory (and that’s the beauty of graphs!). We’ll not enumerate all the properties here, but let’s just go over the main ones of relevance:

All vertices have a name property
The RATED edge has a rating indicating the rating a user gave to a movie
All vertices but Genre and User have a poster property containing an image URL and a URL property pointing to their IMDB page

Querying, analyzing and visualizing the graph

There’s a lot of useful information that we can leverage to query our graph and get some insights. Let’s give it a go:

Our first query is going to be simple. I just want to see the graph surrounding the Titanic movie:

g.V().has("Movie", "title", "Titanic").bothE()

Quick breakdown:

g.V().has(“Movie”, “title”, “Titanic”) finds any vertices with a Movie label and a title property that equals “Titanic” – makes sense so far.

The .bothE() bit at the end there says, “fetch all incoming and outgoing edges to the vertices”, in other words, it will fetch all relationships to the Titanic movie.

To run the query, first, enter it in the query editor as shown below, then click on the green play button.

Quick note: If you click on the individual steps in the query, you’ll be able to see the official Gremlin documentation in the Gremlin Query Assistant on the right side of the editor. Great way to learn or remind yourself of the various steps and how they work:

Anyway, once you’ve run the query, you’ll be presented with a graph display of the resulting data, and you should notice something odd: there are two Titanic movies!

(Now of course there’s nothing odd here – there are indeed two Titanic movies but I for one was born in the 90s and I have missed the release night for the first one by just about 40 years)

The graph display also visually indicates that one of these Titanic movies has many more reviews than the other. Unsurprisingly, it is James Cameron’s version, as highlighted by the DIRECTED_BY relationship between Titanic and James Cameron.

Well, it’s simple: it just turned out James Cameron’s Titanic wasn’t the only one or even the first to come out!

If you click on the Titanic nodes, you’ll also be able to check out their posters or open their IMDB movie page, as demonstrated below:

Let’s try a more complex query. What are the top 10 movies with the most user ratings?

g.V().
  hasLabel("Movie").
  order().by(inE("RATED").count(), desc). values("title")
  limit(10)

Here’s the breakdown of the query:

First of all, we’re only interested in movies, so filter the vertices accordingly with g.V().hasLabel(“Movie”).

Next, order these movies by a metric, in descending order, and get the first 10 results, which happens at:

order().by(…, desc)

Now, for the order itself, use the count of incoming RATED edges as the main metric , which is done via inE(“RATED”).count().

Finally, to display the title of the matched movies and limit this to a top 10, add values(“title”).limit(10)

The final result is (drum roll please):

==>Forrest Gump
==>Pulp Fiction
==>Shawshank Redemption, The
==>Silence of the Lambs, The
==>Star Wars: Episode IV - A New Hope
==>Jurassic Park
==>Matrix, The
==>Toy Story
==>Schindler's List
==>Terminator 2: Judgement Day

Now, these are all pretty good movies, but they’re not the highest rated films .

In this database, there are two types of ratings available:

User ratings from 0 to 5 as shown in the RATED relationship from User to Movie
IMDB ratings based on votes which are aggregated into the rating property of movies

Let’s start with the easy one: IMDB ratings.

g.V().hasLabel("Movie").order().by("imdbRating", desc).values("title").limit(10)

We get the following results:

==>Band of Brothers
==>Civil War, The
==>Shawshank Redemption, The
==>Cosmos
==>Godfather, The
==>Decalogue, The (Dekalog)
==>Frozen Planet
==>Pride and Prejudice
==>Godfather: Part II, The
==>Power of Nightmares, The: The Rise of the Politics of Fear

Okay, I don’t know half of these, but then again I’m no movie critic. Let’s change the query slightly to also include the rating of these movies:

g.V().
  hasLabel("Movie").
  order().by("imdbRating", desc).
  project("title", "imdbRating").by("title").by("imdbRating").
  limit(10)

This now results in the following:

==>{title=Band of Brothers, imdbRating=9.6}
==>{title=Civil War, The, imdbRating=9.5}
==>{title=Shawshank Redemption, The, imdbRating=9.3}
==>{title=Cosmos, imdbRating=9.3}
==>{title=Godfather, The, imdbRating=9.2}
==>{title=Decalogue, The (Dekalog), imdbRating=9.2}

==>{title=Frozen Planet, imdbRating=9.1}
==>{title=Pride and Prejudice, imdbRating=9.1}
==>{title=Godfather: Part II, The, imdbRating=9.0}
==>{title=Power of Nightmares, The: The Rise of the Politics of Fear, imdbRating=9.0}

Let’s enhance the query a bit further by including the director and actors in these movies.

g.V().
  hasLabel("Movie").
  order().by("imdbRating", desc).
  project("title", "imdbRating", “directedBy”, “actors”).by("title").by("imdbRating").by().by().
  limit(10)

We get the following visualization:

At a glance, we can see that Morgan Freeman and Al Pacino both acted in two top-rated IMDB movies and that Francis Ford Coppola directed two of them, as well. This kind of makes sense, given that for the latter, both movies are part of the same trilogy, The Godfather.

Just to wrap things up in this first introductory post, you’ll notice that some of the actor relationships are missing information; for instance, Morgan Freeman’s role in The Civil War is not stated. Using our graph visualization, we can easily update the graph interactively to fix any missing data. I will also add myself as an actor on Pride and Prejudice in this dataset, just because I can!

Try Aerospike Graph and G.V() free for 60 days

We’re just barely scratching the surface of what Aerospike Graph and G.V() can deliver together. This post simply demonstrates how you can quickly get up and running with a sample dataset on Aerospike Graph to explore graph data interactively or via Gremlin queries.

For instance, Aerospike Graph offers a Bulk Data Loader that can load your semi-structure CSV data into a graph database, enabling fast access to visual insights and interactive editing via G.V().

And as a reminder, here’s the best part: you can try this all out for 60 days for free! So what are you waiting for? Get your Aerospike Graph 60-day trial now and explore your shiny new graph database with G.V()!

G.V() Basic is going away February 5th, 2024 – what you need to know and what’s next.

Arthur General

What will change with G.V() Basic going away?

G.V() Basic is our free tier. It allows you to use most of our software’s functionality without restrictions for the following Apache TinkerPop Graph Systems:

Gremlin Server
JanusGraph
ArcadeDB
Azure Cosmos DB Emulator

On February 5th 2024, this tier will be removed and all G.V() Basic Licenses that have been issued will expire. Continuing to use G.V() with the above Graph Systems will require a paid G.V() Pro license.

A new free tier, G.V() Lite, has already been introduced for new users and allows (mostly) unrestricted use of G.V() with our in-memory graph, G.V() Playground.

Continuing to support the developer experience with G.V() Lite

This free tier is designed to support users learning graph databases or Gremlin and will receive new specific functionality before G.V() Basic’s expiry allowing for in-memory graphs to be exposed over a network to connect your development environment (or any other application).

Initially G.V() Playground will be restricted to running TinkerGraph (Apache TinkerPop’s own in-memory graph) but we plant to extend this to support JanusGraph’s as well.

As part of this, we’re also opening up access to our Gremlin Query Debugging functionality and our OpenAI Text To Gremlin feature to all users of G.V() Basic and G.V() Lite.

Our goal is to give everyone a free to use software that allows them to stand up and manage Apache TinkerPop in-memory graphs effortlessly, and to expose them over a local network so that they can be accessed from their development environment.

This will allow developers and engineers to continue to develop their graph project locally using G.V() without the burden of having to stand up and manage Apache TinkerPop graph systems by themselves. When the time comes for these graphs to be deployed in production using one of the many Apache TinkerPop systems G.V() support, they can then choose to start using G.V() Pro to continue supporting their graph use cases.

Why is this happening?

G.V() is a passion project that I’ve now embarked on three years ago at time of writing. G.V() saw it’s first beta release after a year of work and remained 100% free to use until January 2023, when we’ve introduced G.V() Pro for Amazon Neptune and Azure Cosmos DB. We’ve received a lot of great feedback that helped us improve the software over the years.

To continue delivering quality updates at a monthly cadence we need financial support from our user base. With your support we can work on more ambitious features that keep pushing the boundaries of tooling in the Apache TinkerPop ecosystem.

We hope that you will understand and support this decision – it’s essential to sustaining the growth of our software.

Will there be exemptions/exceptions?

We offer a 30% discount on G.V() Pro for small companies like us that you can apply for here.

If you are a student or an academic looking to use G.V() for learning or educational purposes, contact us at support@gdotv.com via your academic email address and we’ll provide you with a G.V() Pro license free of charge.

If you’re a committer to one of the open source Apache TinkerPop systems listed above, we’ll also be happy to send you a free G.V() Pro license. Just contact us at support@gdotv.com from an email address that can be tied back to any of your contributions in the last 12 months.

So I’ll (likely) need a G.V() Pro license – where do I start?

To purchase a G.V() Pro license, head on over to our Pricing page. We’ve included a whole lot of FAQs to help you through the process. If you need to raise a Purchase order with us simply get in touch at support@gdotv.com and we’ll get this all over to you in no time – same goes if you have vendor onboarding questionnaires or any similar due diligence.

What’s next for G.V()?

G.V() is going to continue receiving monthly feature updates – and with all your support it will just keep getting better and better, so stay tuned!

Learning Graph Databases with G.V() and Practical Gremlin by Kelvin Lawrence

Arthur General Apache TinkerPop

First steps with Apache TinkerPop and Gremlin

Are you new to Apache TinkerPop or Graph Databases in general? Are you looking for directions on how to get started with graph data and the Gremlin query language?

Then look no further than Kelvin Lawrence’s Practical Gremlin: An Apache TinkerPop Tutorial. It’s a free ebook that you can read right here from your browser. It covers the Apache TinkerPop framework and its querying language, Gremlin in great depth. Or as Kelvin describes it:

This book introduces the Apache TinkerPop 3 Gremlin graph query and traversal language via real examples featuring real-world graph data. That data along with sample code and example applications is available for download from the GitHub project as well as many other items. The graph, air-routes, is a model of the world airline route network between 3,373 airports including 43,400 routes

Practical Gremlin is packed full of example queries that will give you a comprehensive overview of the capabilities of the framework. It also discusses very concept behind graph databases and their advantages over traditional relational databases such as SQL. Each query presented in the book corresponds to a genuine use case The query description shows what the data is needed for, its format, and how it is written using Gremlin.

Practical Gremlin is much more than a book too; its Github repository contains not just the book but code samples, as well as the air routes dataset in a variety of formats, compatible for instance with Amazon Neptune.

The Air Routes dataset is simple to understand and intuitive. It contains the relationship between Airports, Countries and Continents. A continent contains countries which contain airports which have air routes allowing travel between cities and countries across the world.

For reference, this is the structure of the Air Routes dataset represented as an Entity Relationship diagram using G.V()’s Data Model Explorer tool:

air routes data set model

The graph use case is clear too, especially over the use of relational data. In order to calculate possible itineraries based on existing routes between Airports in different countries, a lot of joining operations are required. For instance when calculating all routes from Austin, Texas, USA to Paris, France, all relationships going out of Austin, into Paris, and potentially in between have to be evaluated as well. This approach would scale very poorly in SQL due to the immense amount of joining operations required. Additionally, the more hops are required in the data, the worse performance (and query structure) would get!

This works in graph because the relationships between entities are first-class entities too, the same way vertices are – you can query entities in a relational database, but you can’t directly query relationships without having to go through its entities (tables in this case, where relationships are the foreign keys).

We’ve covered the basics of our favorite Gremlin learning resource – now let’s use the best available tool (G.V() of course :D) to practice the contents of the book!

Getting started with G.V()

The entire Air Routes Dataset display in G.V()’s Graph View – Now is it just me or does this look a bit like our world map?

Let’s get to work! First things first, you need to download and install G.V() – and don’t worry, this is all 100% free.

Once you’ve got G.V() installed, you’ll be presented with a Welcome Screen. We love Kelvin’s work so much that we’ve put it front and center on our application. Scroll down (if you’re on a small screen) and you’ll notice a Learn Gremlin With Kelvin Lawrence’s PRACTICAL GREMLIN section. Click on Open Practical Gremlin to open the book, if you haven’t already, and click on Create Air Routes Graph to create an in-memory TinkerGraph instance on G.V() with the air routes data set pre-loaded.

Creating air routes dataset on gdotv

Quick editor’s note: you’ll be prompted to sign up for a G.V() Basic License – it’s free and permanent. We’re just asking for some basic details and we’ll not bother you unless you agree to be contacted by us!

Once you’ve created your air routes graph and signed up for a free G.V() Basic License, you’ll be presented with a query screen with the following query run for you:

g.E().limit(100)

This is a simple query that simply says “fetch me the first 100 edges in the database” – this allows G.V() to produce a nice little initial graph visualisation for you.

G.V() is a Gremlin IDE – long story short, it can do a lot. For the purposes of this blog post you can mostly just stick to this newly opened query tab to run queries from the book. Have a little click around the various result tabs showing (Query Output, Vertices, Edges) to get a better idea of what’s being displayed. You can also click on elements in the graph to view their details, modify their properties, etc.

Additionally, you can view your database’s graph schema by clicking “Open Data Model Explorer” on the left navigation pane. If you just want to explore the data in free form or query it without writing any actual query, you can also open a Graph Data Explorer, also in the left pane. Finally, you can open as many queries as you like, so you can easily compare them. Go ahead and click “New Query” on the left navigation pane under your Air Routes connection to create another query tab.

There’s a lot more functionality available but as far as this follow along with the book exercise is concerned, this should about do you! Feel free to have a play around though.

We’re all setup to follow along with the book now. Note that since we’ve already loaded Air Routes in a G.V() in-memory graph, you can just go ahead and skip section 2.6 and 2.7, but we recommend you have a read and give them a try at some point anyway! There’ll be plenty queries for you try from section 3 through to 5. Further sections of the book will give you a great start on developing a graph application, deploying a graph database, and what your options are!

Get reading, pop those queries in G.V() and give them a whirl! Try and type them manually in G.V() too so you can see our smart autocomplete feature in action!

Graph Visualization Options

Here’s a few advanced configurations you can do in G.V() to improve your graph visualization. Open a graph data explorer for your Air Routes connection, as explained before, and click on Graph Styles as shown below:

You can change the styles of your graph visualization. For instance, you can select what text to display on the vertex and edge labels. This is really useful to get quick and effective visual of your graph data. For your vertices, set the Label Display Rule value to “desc” (the name of the property in the graph containing the object’s name, e.g. country name or airport name) and click on “Save All”:

You can also set styles for your edges. Click on Edges, and for the “route” edge, select “dist” as the label display rule to display the route’s length (in miles) on the graph visualization.

Change the label display rule and hit Save Changes again. The graph visualization will update in real time as you change these configurations. You can also create multiple stylesheets and swap between them. There’s a lot of options available which we’ll not list here, but have a play. You can even set images for your vertex backgrounds!

You can also change the graph layout algorithm displaying your data. This can be useful to de-clutter the visual and is typically highly dependent on the volume of the data on your graph as well as how interconnected it is, check it out:

Generating Gremlin queries on your dataset with OpenAI

If you’re not much in the mood for writing queries today, guess what? You can just ask an OpenAI Model to do it for you. Open a new query editor and click on “Text To Gremlin” in the toolbar. First off, you’ll need to configure your OpenAI key – check out the documentation linked in the popup on your screen for more information.

Once you’ve got your OpenAI API Key configured within G.V(), you’re ready to go. Note that you can also choose with GPT model to use – GPT3.5 Turbo , GPT3.5 Turbo 16k (useful for more complex data schemas) or GPT4.

Let’s pop some prompts in and let OpenAI work its magic:

We’ll start with the following prompt, using GPT4:

find all the routes between airports in London, Munich and Paris

It outputs the following query:

g.V().
has(‘airport’, ‘city’, within(‘London’, ‘Munich’, ‘Paris’)).as(‘a’).
out(‘route’).
has(‘airport’, ‘city’, within(‘London’, ‘Munich’, ‘Paris’)).as(‘b’).
path().
dedup().
toList()

And we get the following graph display:

Comparing it to the query recommended in section 5.2.8 of Kelvin’s book, we get a similar result:

g.V().has('city',within('London','Munich','Paris')).aggregate('a').out().
where(within('a')).path()

Editor's note: taking out the by('code') from the query to allow it to output a graph visualization rather than

plaintext Airport codes

And we get the following visual:

It’s the same result! Uncanny….G.V() does a little bit of magic behind the scenes to ensure that the GPT prompt submitted to OpenAI contains the essential information it needs to generate a Gremlin query that is aligned to your graph data schema. Give it a try! It’s a great way to query your data without ever having to write a Gremlin query.

Everything else

G.V() is the most feature rich Gremlin IDE available – so we’re not covering everything it can do here. But have a look at our documentation to find out more and make sure to check our Blog regularly too for new content demonstrating the capabilities we have in store. We have monthly new feature releases so you can be sure there’ll always be more for you to do on G.V()!

Conclusion time

Graph databases are great – they’re still relatively new compared to titans such as SQL and that means it can be hard to unlearn years or relational data reflexes. Kelvin Lawrence’s Practical Gremlin: An Apache TinkerPop Tutorial is a great way to learn more about Graph Databases, Apache TinkerPop and the Gremlin query language. With G.V(), you can enhance your learning experience and start seeing the concrete benefits of using a graph database with cool visuals and features to match with your data, at no cost.

There are other learning resources available too of course – many of them just as great, for instance, Apache TinkerPop’s own Getting Started guide!

Who knows, with a bit of practice and learning you might find yourself developing and deploying your graph database to one of our many supported graph systems (Aerospike Graph, Amazon Neptune, Azure Cosmos DB, JanusGraph, and more)!

Did you like this post? Share it around and tell us your thoughts below!

From Gremlin Console To Gremlin IDE with G.V()

Arthur General Apache TinkerPop

What do Neo4j, MySQL, Microsoft SQL, and many other databases have that Apache TinkerPop doesn’t?

You probably read the title – an Integrated Development Environment (IDE)!

SQL in particular has a lot of options available – MySQL Workbench, Microsoft SQL Server, DBeaver, the list goes on. Neo4j has an entire suite of tools to cover the various aspects of data management (Neo4j Bloom, Neo4j Desktop).

It’s an essential part of a database’s toolkit in helping developers, data designers and analysts work with their data day to day as well as presenting it to stakeholders.

So what does Apache TinkerPop have to offer users out of the box? Why of course, the Gremlin Console!

Gremlin Console is an interactive terminal that ships with all releases of Apache TinkerPop and allows users to quickly run queries against their database without the need for writing any code.

It’s a command line interface – that means it’s somewhat limited in what it can display but it has a number of customisable options called “Console Preferences” which can improve the quality of the display. For instance, they allow introducing intuitive color schemes to visually disseminate the different types of results output when running a query.

*Running a Gremlin query and viewing results from the Gremlin Console*

There’s been a lot of effort put behind the Gremlin Console and there’s no better way to learn about what it is and what it can do than to check out its main tutorial.

Now we’re not here to bad mouth the Gremlin Console by any means – it’s a great tool to get started with Gremlin. In fact the best and most popular Gremlin learning resource (in my humble opinion), Kelvin Lawrence’s Practical Gremlin – An Apache TinkerPop Tutorial, uses the Gremlin Console to demonstrate the many querying use cases of Gremlin.

But if you want to get serious with Apache TinkerPop Graph databases, you’ll need a Gremlin IDE. You need G.V().

Meet G.V(), your Gremlin IDE

Let’s jump right into it – G.V() (pronounced g dot v) is the answer to all your Gremlin headaches. To put it shortly:

G.V() is an all-in-one Gremlin IDE to write, debug, test and analyze results for your Gremlin graph database. It offers a rich UI with smart autocomplete, graph visualization, editing and connection management.

Okay, cool. But what does that look like? Well, G.V() has a lot to offer, but in keeping with our Gremlin Console comparison, let’s see what running the same query on the same data looks like in G.V() versus on the Gremlin Console:

Can you spot the 7 differences?

There’s a lot to look at on the above but let’s just summarize on what we’re seeing:

– On the top of the screen is your query editor, with our query, in this case, “g.V()”.

– On the rest of the screen is our results view, which shows a graph visualization of the data returned by the query.

– Everywhere else on the screen is many tools and controls around editing the query and viewing its results – but we’ll come back to these later.

So we’ve gone from g.V() returning this on the Gremlin Console:

==>v[1]
==>v[2]
==>v[3]
==>v[4]
==>v[5]
==>v[6]

To this on G.V():

How does the above map with our graph? Where did the edges come from?

Well, v[1] in the Gremlin Console really means “vertex with id 1” – a vertex in a graph is essentially the dot, and edges are the lines that connect them.

So in our graph representation, each of the dot corresponds to a vertex in the database – and they’re annotated with information that best describes what they are. In this case, our graph contains “person” and “software” vertices, all of which have a “name” property.

Much better!

As for the edges, they strictly speaking don’t belong there – after all, the query we’ve run was g.V(), not g.E(). What G.V() does by default is attempt to include meaningful relationships between any vertices run in a query – this allows producing a more useful representation of your data, and can easily be opted out of.

Even though a graph is a useful representation, particularly in the context of graph databases, sometimes, you may want to rely on more traditional formats inherited from the likes of SQL. Not to worry – G.V() has you covered! In a property graph, it can be convenient to visualize the data as tables too. After all, a vertex label can be thought of as a table name, and it’s properties as its columns. We can also apply a similar logic to edge labels.

For that reason, G.V() offers various results views tailored to your needs:

Vertex View

Edge View

Query Output View

Now that we’ve done a quick comparison of G.V() and the Gremlin Console, let’s look at some of the common features other database IDEs offer, and how G.V() compares.

Gremlin IDE Wishlist

This is by no means exhaustive but here are some of the features commonly found in other database IDEs, such as Neo4J Bloom or Oracle’s MySQL Workbench:

Ability to add, remove and update connection configurations to different database endpoints potentially using slightly different implementations and version of the Apache TinkerPop framework
Ability to query the database and benefit from code completion during the writing of queries
Ability to visualize, compare and modify data interactively on the database
Data analysis and reporting
Database schema management and visualisation
Debugging, profiling, and other optimisation tooling to get the best performance out of our queries
Easy and secure to install/deploy

These features are essential to the ecosystem of any database, and Apache TinkerPop should be no exception – and this is exactly why G.V() was created in the first place. So how does G.V() currently measure against these requirements? Quite well actually! Let’s have a closer look:

Connection Management

G.V() is officially compatible with a wide variety of Apache TinkerPop implementations. Generally speaking, any implementation of the framework that runs over a websocket channel has a fairly high chance of working out of the box with G.V(). Where official support from us is important however is that we can also implement additional functionality based on the “extras” that each of these implementation support – for instance, G.V() has official built-in support for Amazon Neptune’s IAM authentication mechanism, its Profile and Explain API endpoints, as well as many other Amazon Neptune specific features.

At time of writing, this is the exhaustive list of all the Apache TinkerPop-Enabled Graph Systems that we support, in alphabetical order:

Aerospike Graph
Alibaba Graph Database
Amazon Neptune
ArcadeDB
Azure Cosmos DB
DataStax Enterprise Graph
JanusGraph
and of course, last but not least, Gremlin Server!

We also provide the ability to quickly spin up in-memory graph’s using G.V()’s Playground feature, which uses TinkerGraph instances behind the scenes. Perfect to get a quick start!

You can have as many connections to each and any of these graph systems stored in G.V() concurrently and used in parallel unlike the Gremlin Console for instance that would require one per database instance. We have a simple connection setup wizard that guides you through the steps to connecting to your database based on the requirements it returns to us (e.g. credentials required, graph traversal source name required, authentication key, etc) and an advanced setup mode where you can fine tune how you want to connect to your database, down to which serializer to use.

Querying the database

This is a lot of our users’ favourite feature of G.V() – our query editor. It is quite simply the most advanced and feature complete code editor for Gremlin you’ll find out there. It comes bundled with an advanced autocompletion engine that provides suggestions based not just on steps and predicates of the language but also vertex labels, edge labels and property keys of your database, as inferred from its data model.

It also offers syntax error reporting and highlighting, query formatting using Gremlint, Gremlin Language Variant translation (e.g. Java, Python, Go, Javascript) based on the official implementation of the framework, embedded Gremlin reference documentation, and much more! Just have a look for yourself:

gdotv query editor demo

Visualizing and Editing data interactively

There’s a lot to cover here as far as G.V() is concerned and this deserves a separate article to really delve into the various options available and all the customization that can be configured directly. A couple small visualization examples were shown earlier that should give you a good idea of what to expect when starting to use our software.

One really important feature in G.V() that can sometimes be missing in other database IDEs is the ability to create, update and delete vertices and edges interactively. It’s a huge time saver to be able to maintain individual records without having to rely on writing entire queries. Despite most Apache TinkerPop Graph Systems having no data-schema constraints, G.V() will once again use its knowledge of your data to accelerate this operation. A picture (actually a GIF) speak a thousand words so once again, check out this quick demo:

Visually exploring graph data and modifying it interactively

Data Analysis and Reporting

G.V() bundles a number of features to help with Data Analysis, particularly leveraging our graph visualization engine. At time of writing we don’t have general reporting and graph analytics functionality to offer other than those you can run directly on your graph database. We do however have an upcoming graph analytics feature coming up before the end of the year providing access to various useful algorithms that can be run directly within the user interface – so keep an eye out on this space!

Graph Data Schema visualization and modelling

A core aspect of the G.V() software is how it builds an internal representation of your data schema that is then leveraged to power a number of UI features. Being able to quickly see the structure of your data is essential to understanding it and presenting it to others. That’s why we also offer a number of handy views to visualize your data model directly, such as our Data Model Editor, shown below:

An Entity Relationship visualization of one our graph database’s data schema

At time of writing most Apache TinkerPop Graph systems are schema-less, meaning that there is no form of data schema enforcement available (except for a rare few such as JanusGraph and DataStax Enterprise Graph). We’re keeping our eyes peeled for more and we’ll be keen to introduce more data model management functionalities in the future to support existing APIs such as JanusGraph’s as well as potential new ones!

Query Debugging and Profiling

Okay, now we’re getting to some truly unique functionality in G.V(). Typically when we think about debugging database queries, we mostly refer to profiling and query planning – not so much ACTUALLY debugging the query step by step and thread by thread.

So first of all and before anything else, G.V() does offer a lot of convenience features for Gremlin Query Profiling and Traversal Explanation generation, allowing you to get this information about your query in just one click. Additionally, we fully support provider specific functionality in that area such as Amazon Neptune’s Explain and Profile APIs.

But this is where things get really interesting – and you’ll not find this anywhere else – we provide real debugging tooling to simulate individual Gremlin traversals at any step of the query.

This feature deserves its own little deep dive post and we’ll not cover it here in too much detail but here’s a visual of it just to give you an idea of what it provides:

Gremlin Query Debugging using G.V() - Gremlin IDE

Stepping through a query with G.V()’s debugger and inspecting individual traversals step by step

We believe it is a feature unique to the Gremlin language itself and its ability to be broken down into clear steps, both within the query and for each traversal in the query. To put it short, you can’t break down a SQL statement such as:

SELECT * from person GROUP BY age

in SQL into multiple steps but looking at the Gremlin equivalent,

g.V().hasLabel(“person”).group().by(“age”)

you can clearly see the distinct steps of the query leading to its final result (person records grouped by age).

This allows to really dig deep into how Gremlin traversal works as well as troubleshoot queries that aren’t behaving as expected, say for instance due to a missing edge or property.

Secure and easy to install

G.V() is not a SaaS (Software as a Service) or PaaS (Platform as a Service) solution. That may sound like a step backwards in the evolution of software delivery. After all, we’re used to doing more and more directly on websites either deployed internally to our organizations or offered as an online service.

Here’s the thing – we’re connecting to databases, which will likely contain sensitive data that is owned by your organisation alone. We don’t want that going anywhere we don’t want to! Additionally, what’s more frustrating than wanting to get started with a solution but having to figure out how to deploy it, maintain it and monitor it before anything else?

G.V() is keeping things simple – it’s a software executable compatible with Windows, MacOS and Linux that you can simply install and get started with right away. The software runs locally on your device and network and is therefore not requiring your databases to be accessible outside of your network. Everything stays in your network and in your organisation without the need to navigate complex deployment scenarios or data privacy concerns.

Just download it for free, install it and you’re good to go!

In conclusion

G.V() is a continuously evolving software – we’ve put the Apache TinkerPop community’s feedback and interests at the centre of our solution’s design to help us shape it into a product that answers YOUR needs. Our aim is to deliver the best possible product to support and enhance the growing ecosystem of Apache TinkerPop Graph Systems. We believe we have the most comprehensive Gremlin IDE to date, and we’re going to keep adding more and more awesome features to help you make the best use of your time working with your graph database.

Whether you’re just getting started or already fully deployed with Amazon Neptune, Azure Cosmos DB, JanusGraph, and the many other graph databases we support, you must give G.V() a try!

We offer a free tier for our product and a no obligation trial for our more advanced features allowing you to get started right away with no overhead or complication. So what are you waiting for? Install G.V() now!

Did you find our article interesting? Have you got any thoughts? Give us a comment below!

If you wanna chat you can come find us over on Twitter, by email or on the Apache TinkerPop Discord Server (seriously, check it out, it’s great).