Skip to content

#272: Extensions for DuckDB

DuckDB offers us a fast access to analyse our data. A lot of the flexibility in DuckDB comes from extensions, that we can load when we need them. Let us explore how we can work with those extensions to solve common problems.

What extensions are active?

We can use this query to see what extensions we have in our database and which ones are turned on:

1
2
3
import duckdb

duckdb.sql('SELECT extension_name, installed, description FROM duckdb_extensions();').show()

We can see the 23 core extensions and which ones are turned on.

As we can see in the output of our command, the core extensions cover a wide range of use cases: from support for various data formats, to connections for database systems like PostgreSQL or MySQL, up to abstractions over AWS or Azure blob storage.

Activate an extension

We can activate an extension with the INSTALL and LOAD statement:

duckdb.sql("INSTALL spatial; LOAD spatial;")

If we now check the state of the spatial extension, we see that it is now installed:

The spatial extension is now active.

Use the extension

To use the newly activated spatial extension, we can check the documentation to see what features it offers. All core extensions come with a good documentation that should answer our questions.

With the spatial extension we get a GEOMETRY type that lets us create points and functions like st_distance to find the distance between two points:

1
2
3
4
5
6
7
8
9
duckdb.sql("""
    CREATE TABLE points (geom GEOMETRY);

    INSERT INTO points VALUES (ST_Point(1, 2)), (ST_Point(3, 4));

    SELECT ST_Collect(list(geom)) FROM points;
           """).show()

duckdb.sql("SELECT st_distance('POINT(0 0)'::GEOMETRY, 'POINT(1 1)'::GEOMETRY)").show()
We get a list of points for the first SELECT the second one shows the distance between the two points.

Implicit activation

Many of the core extensions offer an autoload feature, that will activate the extension when we run a command that requires it. We can try this with the httpfs extension that gets activated if we try to access data through an URL:

duckdb.sql("SELECT * FROM 'https://raw.githubusercontent.com/duckdb/duckdb-web/main/data/weather.csv'").show()
We get the data from a Git repository.

The output we see is from the data on a remote server. If we now check the state of the httpfs extension, we see that DuckDB activated it: Now httpfs is installed.

Install community extensions

If we need some special functionality that is not covered with a core extension, we can check the community extensions for something that could solve our problem.

To install the community extensions, we need a slight modification of our INSTALL and LOAD syntax and add FROM community:

1
2
3
duckdb.sql("INSTALL quack FROM community; LOAD quack;")

duckdb.sql("SELECT quack('world');").show()
We got now the quack extension that responses with Quack world 🐥

Update extensions

We can update the installed extensions with this command:

duckdb.sql("UPDATE EXTENSIONS;")

For the community extensions we need to run this command from time to time. Otherwise, we stay at the version that we initially loaded.

For the core extensions we usually get the updates with a new version of DuckDB. But there may be an update available before a new release of DuckDB is published. In this case it can be helpful to explicitly update the extensions.

Next

With extensions we get a powerful tool to add more functionality to DuckDB. We can install what we need and if we do not need the extension, it does not slow us down. That way we get speed and flexibility.

Next week we use the Excel extension to work with *.xlsx files.