Product Bulletin

Cypher for People Who Think in SQL

A standalone field guide to reading graph queries when SQL is your native tongue. One mental model, a translation table, and an honest answer to "why a graph and not Postgres?"

By Ambro Quach

CypherSQLNeo4jData Modeling8 min read

Part 5 of a 5-part series on the engineering behind Arc Radius, a platform that tracks US state legislation affecting LGBTQ+ youth. The first four posts were a system retrospective. This one's a change of pace: a standalone guide to reading graph queries if SQL is your native tongue — drawn from the query layer that powers the whole platform.


Why this post exists

Arc Radius is unusual in that it carries both database worlds in one codebase. The source data from LegiScan arrives as a normalized relational schema — about 40 tables, the kind of thing any SQL person reads on sight. The serving layer is a Neo4j property graph, queried in Cypher. And the ingestion script that connects them is, quite literally, a relational-to-graph migration: it reads the relational exports and writes them out as nodes and relationships.

So the same data is modeled two ways, and we spent a lot of time translating between them. If you already think in SQL, none of Cypher is hard — it's just unfamiliar notation for ideas you already have. This post is the translation table we wished we'd had on day one.

The one mental model

Here's the single idea that makes Cypher click:

A join in SQL is a relationship traversal in Cypher.

SQL is a set-algebra language. You name tables and the predicates that connect their rows, and the engine computes the resulting set. Cypher is a pattern-matching language. You draw the shape of the data you want using ASCII art — (node)-[:REL]->(node) — and the engine finds every subgraph matching that shape.

That's the whole conceptual leap. Where SQL says "these two tables, joined on this key," Cypher says "this node, connected by this kind of arrow, to that node." Same relationship, different way of expressing it. Once you read (b)-[:IN_STATE]->(s) as "bill joined to state," you can read most Cypher.

The cost model differs in a way worth holding onto, though: in a relational database a join is computed at query time against indexes, while in a graph the relationship is a stored edge you walk directly. That difference is exactly why some queries are dramatically faster as a graph and others aren't — but that's a topic for the end.

The translation table

Here's the core vocabulary. Most of it is one-to-one:

Cypher (graph)SQL (relational)
Node label (:Bill)Table ls_bill
A nodeA row
Property b.statusColumn ls_bill.status_id
Relationship (b)-[:IN_STATE]->(s)Foreign key, or a join table
Relationship property [s:SPONSORS {position}]An extra column on a join table
MERGE (n {key})INSERT ... ON CONFLICT (key) DO UPDATE (upsert)
Uniqueness constraintPRIMARY KEY / UNIQUE and its backing index
Label/property indexA B-tree index on a column
Vector index (HNSW)a pgvector index (USING hnsw)
Fulltext index (Lucene)tsvector + GIN, or FULLTEXT
OPTIONAL MATCHLEFT JOIN
Variable-length path -[:R*1..3]->Recursive CTE (WITH RECURSIVE)
WITH (pipe one stage into the next)A CTE (WITH x AS (...))
UNWIND $rowsunnest() / a multi-row INSERT
collect()array_agg() / json_agg()

If you internalize just the top half of that table, you can read the query layer of most graph applications.

A few translations in full

The table tells you the words; seeing them in sentences is what makes it stick. Here are the patterns that come up most.

A join with a left-outer fallback. The bill-detail query attaches a bill to its committee:

The arrow (b)-[:REFERRED_TO]->(cm) is the join. OPTIONAL MATCH is LEFT JOIN — a bill with no committee still comes back, with the committee fields null. The one thing to notice is the modeling difference: in SQL the link is a foreign-key column on the bill row; in the graph it's a stored edge between two nodes.

An optional filter. The paged bill list filters by state and, conditionally, by stance:

b.label IN stances is label = ANY(:stances). And size(stances) = 0 OR ... is the Cypher idiom for an optional filter — "if no stances were requested, don't filter on stance." It's the exact pattern you build in SQL with cardinality(...) = 0 OR col = .... Different syntax, identical intent.

An upsert. The ingestion writes bills idempotently:

MERGE is match-or-create on the key pattern, and SET applies on both the create and the match path — which is precisely upsert semantics. This is what makes re-running the pipeline idempotent: no duplicate Bill nodes. And there's a critical performance footnote that translates directly: MERGE is only fast because the schema declares bill_pk unique. Without the backing constraint, MERGE does a full label scan per row and can even race into duplicates — exactly the way a SQL upsert needs a unique index on its conflict target to be safe and fast. The lesson is identical in both worlds: the constraint isn't optional, it's what makes the write correct.

A many-to-many with an attribute. This is where the two models visibly diverge. The "sponsors" relationship carries a position:

A many-to-many relationship that carries data is the textbook case where SQL forces you into an association table — ls_bill_sponsor, with a composite primary key and an extra column for the attribute. The graph just stores the attribute on the edge itself. This is the one place where the graph model is genuinely more natural to write. (Note the two MATCH clauses before the MERGE: the edge can only exist if both endpoints already do — which is exactly why the ingest loads all nodes before any relationships, the same way you load parent rows before enforcing foreign keys.)

Batching. Every ingest query opens with UNWIND $rows AS row, fed 500 rows at a time:

UNWIND turns one list parameter into N rows inside a single statement — the graph analog of a multi-row INSERT or INSERT ... SELECT FROM unnest(...). And the reason for the 500-row batch size is the reason you batch SQL inserts too: amortize per-statement and network overhead, bound transaction size. One round trip per 500 rows instead of one per row is the difference between minutes and hours on tens of thousands of records. The optimization instinct ports over unchanged.

When the graph actually earns its place

The fair question a SQL person asks is: if it all translates, why use a graph at all? The honest answer is that for a lot of these queries — paged lists, filtered lookups, simple joins — it genuinely doesn't matter much. Relational would serve them just as well.

The graph earns its keep in two specific places. The first is deep or variable-length traversal — questions shaped like "everything within N hops of this node." In SQL that's a recursive CTE that gets unwieldy fast; in Cypher it's -[:R*1..3]->, and the engine walks stored edges rather than recomputing joins at each level. The second is relationships as first-class data — the sponsors-with-a-position case above, where the connection itself carries attributes and you query over those connections constantly.

For Arc Radius, the deciding factor was actually a third thing: the graph also hosts the vector index that powers semantic search (the subject of Part 1). Keeping the bill relationships and the chunk embeddings in one store, queryable together, is what made the retrieval layer possible without stitching two databases together at query time.

So the real takeaway isn't "graphs beat relational" — it's that they're the same ideas in different clothes, and the right choice is about which shape your hardest queries take. If your hard queries are aggregations and reports, relational. If they're traversals and richly-attributed relationships — especially alongside vector search — a graph stops being exotic and starts being the obvious fit. Either way, if you can read SQL, you can read the graph. It was never a different way of thinking; just a different way of writing it down.


That's the series. Five posts, one system — from the retrieval layer that answers questions, through the pipeline that keeps it fed, the model that reads the law, the API that serves it all, and the query language underneath. Thanks for reading.

More From Product Bulletin