What is GIN in PostgreSQL?
GIN, or Generalized Inverted Index, is one of the most powerful index types in PostgreSQL. It is best suited to indexing values that contain multiple elements, such as arrays, JSONB documents, and full-text search data. In this article, we will learn what GIN is and how it works, along with its syntax and examples.
What is GIN in PostgreSQL?
GIN stands for Generalized Inverted Index. GIN indexes speed up search operations on data types whose values contain multiple elements, such as arrays, JSONB, and full-text search data.
Unlike a B-tree index, which indexes each value as a whole, a GIN index indexes the individual components of a value. This makes it suitable for queries that search within those components.
Syntax of GIN in PostgreSQL
You can create a GIN index using the CREATE INDEX command, specifying the GIN index method with USING GIN and the column you want to index.
Below is the basic syntax:
CREATE INDEX index_name ON table_name USING GIN (column_name);
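Alternatively, you can name an operator class after the column to control which operators the index supports. For example, gin_trgm_ops (provided by the pg_trgm extension) indexes the trigrams of a text column: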
CREATE INDEX index_name ON table_name USING GIN (column_name gin_trgm_ops);
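As a minimal sketch of the operator-class form: gin_trgm_ops is not part of the core index methods but comes with the pg_trgm extension, so the extension has to be enabled first. The table notes, its column body, and the index name are hypothetical and used only for illustration.

```sql
-- gin_trgm_ops ships with the pg_trgm extension, so enable it first.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical table used only for this illustration.
CREATE TABLE notes (
    id serial PRIMARY KEY,
    body text
);

-- GIN index using the trigram operator class on a text column.
CREATE INDEX idx_gin_notes_body ON notes USING GIN (body gin_trgm_ops);

-- A trigram index can support pattern searches such as LIKE and ILIKE.
SELECT * FROM notes WHERE body ILIKE '%index%';
```

Whether the planner actually uses the index depends on table size and statistics, so treat this as a starting point rather than a guarantee.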
How GIN Works
A GIN index keeps track of the associations between an indexed value and its individual components, whether they are elements of an array or words appearing in text. It is therefore an inverted index: each indexed component points back to the rows that contain it.
For example, take the array [1, 2, 3]. GIN indexes each of these elements separately and maps each one to the row where the array is stored. This mapping makes it easy to look up any of the elements.
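The element-to-row mapping can be imitated with an ordinary query. The sketch below is only a conceptual illustration, not how GIN stores data internally; the inline sample_rows data is assumed for the example.

```sql
-- Conceptual illustration only: this query imitates the element -> rows
-- mapping of an inverted index. GIN's on-disk structure is different.
WITH sample_rows (id, tags) AS (
    VALUES (1, ARRAY[1, 2, 3]),
           (2, ARRAY[2, 4])
)
SELECT t.tag,
       array_agg(s.id ORDER BY s.id) AS row_ids
FROM sample_rows AS s
CROSS JOIN LATERAL unnest(s.tags) AS t(tag)
GROUP BY t.tag
ORDER BY t.tag;
```

Each distinct element appears once in the result, together with the ids of the rows that contain it. The element 2, for instance, maps to both sample rows.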
Examples of Using GIN Indexes
Example 1: GIN on an Array Column
Consider the following table where a column contains an array of integers:
CREATE TABLE items (
id serial PRIMARY KEY,
tags int[]
);
To speed up queries on this array column, you can create a GIN index:
CREATE INDEX idx_gin_tags ON items USING GIN (tags);
Now, if you query the table for rows that contain a specific tag, the GIN index will improve performance:
SELECT * FROM items WHERE tags @> ARRAY[2];
Output:
id | tags |
---|---|
1 | {1,2,3} |
4 | {2,4,5} |
The query searches for rows where the tags array contains 2, and the GIN index makes this search faster.
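To try this out, the table needs some rows. The sketch below inserts assumed sample data (chosen so that, on a freshly created table, rows 1 and 4 contain the value 2) and uses EXPLAIN to show which plan PostgreSQL picks.

```sql
-- Assumed sample data for illustration.
INSERT INTO items (tags) VALUES
    ('{1,2,3}'),
    ('{4,5}'),
    ('{6}'),
    ('{2,4,5}');

-- Show the plan chosen for the containment query.
EXPLAIN SELECT * FROM items WHERE tags @> ARRAY[2];

-- GIN array indexes also support the overlap operator (&&):
-- rows whose tags share at least one element with the given array.
SELECT * FROM items WHERE tags && ARRAY[3, 6];
```

On a table this small, the planner may still prefer a sequential scan; the index typically starts to pay off as the table grows.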
Example 2: GIN for Full-text Search
Let’s assume you have a table for storing articles with a text field:
CREATE TABLE articles (
id serial PRIMARY KEY,
content text
);
You can create a GIN index for full-text search by converting the text to a tsvector:
CREATE INDEX idx_gin_content ON articles USING GIN (to_tsvector('english', content));
Now, full-text search queries will be much faster:
SELECT * FROM articles WHERE to_tsvector('english', content) @@ to_tsquery('english', 'PostgreSQL');
Output:
id | content |
---|---|
3 | Learn GIN indexing in PostgreSQL |
In this query, the GIN index speeds up the search for articles containing the word "PostgreSQL."
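Because idx_gin_content is an expression index, a query can only use it when its WHERE clause contains the same expression, including the 'english' configuration. The following sketch uses an assumed sample row and combines two search terms with the & (AND) operator.

```sql
-- Assumed sample row for illustration.
INSERT INTO articles (content)
VALUES ('Learn GIN indexing in PostgreSQL');

-- The WHERE clause repeats the indexed expression, including the
-- 'english' configuration, so the planner can match it to idx_gin_content.
SELECT id, content
FROM articles
WHERE to_tsvector('english', content)
      @@ to_tsquery('english', 'PostgreSQL & indexing');
```

The & operator requires both terms to be present; | (OR) and ! (NOT) can be used in to_tsquery in the same way.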
Example 3: GIN on JSONB Column
For a table containing JSONB data:
CREATE TABLE products (
id serial PRIMARY KEY,
details jsonb
);
Create a GIN index on the JSONB column:
CREATE INDEX idx_gin_details ON products USING GIN (details);
Now you can efficiently query for specific key-value pairs within the JSON:
SELECT * FROM products WHERE details @> '{"color": "red"}';
Output:
id | details |
---|---|
2 | {"color": "red", "size": "medium"} |
The GIN index significantly improves performance when querying for specific JSON elements.
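The default JSONB operator class (jsonb_ops) supports more than containment. As a hedged sketch, the statements below show a key-existence query and an alternative index built with the jsonb_path_ops operator class, which supports containment (@>) but not the key-existence operators. The index name idx_gin_details_path is hypothetical.

```sql
-- With the default operator class, key-existence tests can use the index.
SELECT * FROM products WHERE details ? 'color';

-- Alternative operator class: jsonb_path_ops supports containment (@>)
-- but not the key-existence operators (?, ?|, ?&).
CREATE INDEX idx_gin_details_path ON products
    USING GIN (details jsonb_path_ops);

SELECT * FROM products WHERE details @> '{"size": "medium"}';
```

A jsonb_path_ops index is usually smaller than a default one, so it can be a reasonable choice when the workload consists only of containment queries.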
GIN vs. Other Index Types
GIN is one of several index types in PostgreSQL, and it has some unique advantages and disadvantages compared to the others:
- GIN vs. B-tree: GIN is much more efficient for multi-value data such as arrays and full-text search documents, while B-tree is better for scalar values (numbers, strings).
- GIN vs. GiST: GIN lookups are generally faster, while GiST indexes are generally quicker to build and update. GiST also supports a wider variety of data types and queries, such as geometric and range data.
GIN is typically used for:
- Full-text searches
- JSONB fields
- Arrays and other multi-value types
Disadvantages of GIN Indexes
Although GIN indexes are powerful, they have some disadvantages as well:
- Slower Write Operations. Because a GIN index stores an entry for every component of a value, inserting or updating a single row can require many index changes. This takes more time and resources than maintaining a B-tree index.
- Higher Storage Use. Depending on how many components are indexed, a GIN index may consume more memory and disk space than other index types.
- Not Suitable for Range Queries. The GIN operator classes shown in this article do not support range comparisons such as BETWEEN or >. For such queries on scalar columns, a B-tree index is usually the better choice.
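Two of these points can be checked directly. The sketch below measures the size of the GIN index from Example 1 and then creates a B-tree index for a range query. The price column and the index name idx_btree_price are hypothetical additions to the products table, used only for illustration.

```sql
-- Compare the on-disk size of the GIN index from Example 1
-- with the size of its table.
SELECT pg_size_pretty(pg_relation_size('idx_gin_tags'::regclass)) AS gin_index_size,
       pg_size_pretty(pg_relation_size('items'::regclass))        AS table_size;

-- Range predicates fit a B-tree index. The "price" column is
-- hypothetical and added here only for illustration.
ALTER TABLE products ADD COLUMN price numeric;
CREATE INDEX idx_btree_price ON products (price);

SELECT * FROM products WHERE price BETWEEN 10 AND 50;
```

CREATE INDEX builds a B-tree by default when no USING clause is given, which is why the second index needs no explicit method.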
Conclusion
The Generalized Inverted Index gives PostgreSQL powerful indexing for multi-value data such as arrays, JSONB, and full-text search documents. Once you understand its use cases and performance characteristics, you can use GIN to optimize query performance in PostgreSQL applications. GIN is not the best choice for every scenario, but its ability to index complex data structures makes it extremely valuable for specific kinds of searches.