Setting up a postgres docker container
Assuming you already have Docker installed and a Docker Hub account created, we’ll start by creating a project folder with a pgdata folder inside it:
mkdir -p data_pipeline/pgdata/
I’ve named my project folder “data_pipeline” because I’m going to turn this into a data pipeline project. The pgdata folder will be used to store data from the postgres container. Like the project name, “pgdata” is arbitrary, but the folder itself is needed: data written inside a container is tied to that container, so if the container is removed (say, to recreate or upgrade it), we’d effectively lose everything we put into our database unless we tell Docker to store it somewhere we control. We do this through volumes, as you’ll see below.
The -p flag tells mkdir to create every folder/subfolder needed to make the path exist (and not to complain if some of them already exist).
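If you want to see the difference -p makes, here’s a quick sketch you can run in a throwaway temp directory. The tmp variable and the echo messages are just for illustration:

```shell
# Work in a throwaway temp directory so nothing in your real project is touched
tmp=$(mktemp -d)
cd "$tmp"

# Without -p, mkdir fails because the parent folder data_pipeline/ doesn't exist yet
if ! mkdir data_pipeline/pgdata 2>/dev/null; then
  echo "plain mkdir failed: parent folder missing"
fi

# With -p, every missing folder along the path gets created
mkdir -p data_pipeline/pgdata/

# Running it again is harmless: -p also skips folders that already exist
mkdir -p data_pipeline/pgdata/ && echo "second mkdir -p succeeded"

ls -d data_pipeline/pgdata
```

That idempotence is handy if you ever turn these setup steps into a script you rerun.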
Now we’ll create a network for this container to use. That way, when the project grows and we pull in more containers, they can all communicate with each other over this network; on a user-defined network like this one, containers can even reach each other by container name.
docker network create networkdp
The above command creates a network called networkdp. You can confirm it was created with:
docker network ls
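Running docker network create a second time fails because the network already exists. If you expect to rerun your setup steps, a small helper like the one below only creates the network when docker network inspect can’t find it. Note that ensure_network is a name I made up for this sketch, not a Docker command:

```shell
# Hypothetical helper (not part of Docker): create a network only if it's missing.
# docker network inspect exits non-zero when no network with that name exists.
ensure_network() {
  local name="$1"
  if docker network inspect "$name" >/dev/null 2>&1; then
    echo "network $name already exists"
  else
    docker network create "$name"
  fi
}
```

Calling ensure_network networkdp is then safe to repeat, and docker network ls --filter name=networkdp narrows the listing down to just that network.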
Now navigate to the pgdata folder and run the following command. An explanation of each part follows.
docker run -d \
--name postgres \
-e POSTGRES_PASSWORD=postgres \
-e POSTGRES_DB=raw \
--network networkdp \
-v "$(pwd)":/var/lib/postgresql/data \
postgres
docker run
The docker run command is used to create and start a new container.
-d
Starts the container in detached mode, i.e. in the background. This way we get our terminal back instead of having it tied up with the container’s log output.
--name postgres
This is how we name the container. Here we’ve named it postgres.
-e POSTGRES_PASSWORD=postgres
The -e option sets environment variables in the container. The postgres image requires at least a password variable (POSTGRES_PASSWORD) to be set, so we do that here. The “postgres” password is fine for now, but there’s a better way to handle sensitive data altogether that will be covered in later posts.
-e POSTGRES_DB=raw
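This creates a database called raw, which will be used as the project grows to store raw incoming data. Like the password, it is only applied the first time the container starts with an empty data folder; if pgdata already holds a database, the variable is ignored.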
--network networkdp
This attaches the container to the networkdp network we created earlier.
-v "$(pwd)":/var/lib/postgresql/data
This sets up a bind mount between a local directory and a directory in the container, i.e. -v "local directory":"container directory". Since we navigated to the pgdata folder before running the command, $(pwd) expands to the absolute path of the pgdata folder. You don’t have to do it that way, however; you could also type in the absolute path to pgdata yourself from whatever folder you’re in.
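The /var/lib/postgresql/data folder is the folder inside the container where postgres stores its data. You can read up on that via the docker postgres docs. This is the default for image versions up to 17; starting with version 18, the image keeps its data under /var/lib/postgresql instead, so check the docs for the tag you use.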
postgres
This is the name of the image to run. Without a tag, Docker uses postgres:latest, pulling it first if it isn’t already on your machine. You could instead use one of the tags listed in the docker postgres docs mentioned above to get a specific version. For example, postgres:14 gets you version 14.
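Putting the flags together, here’s a sketch that wraps the docker run command in a shell function so the version tag becomes a parameter. The function name run_postgres and the default tag of 17 are my own choices for illustration. Like the original command, it assumes you call it from inside the pgdata folder, and it uses the /var/lib/postgresql/data mount path that applies to images up to version 17:

```shell
# Hypothetical wrapper around this post's docker run command.
# First argument: image tag (17 is just an illustrative default).
# Assumes it's called from inside the pgdata folder, since the mount uses $(pwd).
run_postgres() {
  local tag="${1:-17}"
  docker run -d \
    --name postgres \
    -e POSTGRES_PASSWORD=postgres \
    -e POSTGRES_DB=raw \
    --network networkdp \
    -v "$(pwd)":/var/lib/postgresql/data \
    "postgres:${tag}"
}
```

For example, run_postgres 14 starts version 14. Two caveats: the name postgres has to be free (remove an existing container first with docker rm -f postgres), and you can’t point a different major version at a pgdata folder initialized by another one, since Postgres data directories aren’t compatible across major versions.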
Now if you run
docker ps
You should see the postgres container running. And that’s that. We now have a container running postgresql that we can store data in later.
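docker ps lists every running container, which gets noisy as the project grows. If you’d rather check one container from a script, a minimal sketch is to ask docker inspect for the container’s State.Running field. The helper name is_running is hypothetical:

```shell
# Hypothetical helper: succeed (exit 0) only if the named container is running.
# docker inspect prints "true" or "false" for .State.Running; if the container
# doesn't exist, nothing reaches stdout and the comparison fails.
is_running() {
  [ "$(docker inspect -f '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}
```

Usage: is_running postgres && echo "postgres is up". For a quick look by eye, docker ps --filter name=postgres also narrows the listing.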
Extra
If you want to connect to the container and explore the databases there, you can do so via psql. Run the following:
docker exec -it postgres psql -U postgres
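This drops you into a psql session inside the container, connected as the user postgres (that’s what the -U is for). If you omit -U, psql defaults to a role named after the current OS user, which inside the container is root, and you’ll get an error because no such role exists.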
Use \? to read through the list of commands.
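You don’t have to open an interactive session every time. psql’s -c flag runs a single command and exits, which is handy for quick checks or scripts. Here’s a sketch of a helper (pg_query is a hypothetical name) that runs one statement against the raw database; -A and -t strip psql’s table formatting so only the values come back:

```shell
# Hypothetical helper: run one SQL statement against the raw database
# inside the postgres container and print only the result values.
#   -U postgres : connect as the postgres role (a root role doesn't exist)
#   -d raw      : use the raw database created via POSTGRES_DB
#   -A -t       : unaligned output without header/footer rows
#   -c          : run the given command, then exit
pg_query() {
  docker exec postgres psql -U postgres -d raw -A -t -c "$1"
}
```

For example, pg_query 'SELECT current_database();' should print raw. It also gives you a cheap way to check that the volume works: create a table with pg_query 'CREATE TABLE ping (id int);', remove the container with docker rm -f postgres, rerun the docker run command from the pgdata folder, and once Postgres is up again pg_query '\dt' should still list the table.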
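Alternatively, you can hook up a GUI program to it (DBeaver, DBVis, pgAdmin 4, etc.) through localhost:5432, but only if the container’s port is published to your machine, which the docker run command above doesn’t do. To allow it, add -p 5432:5432 to that command (you’ll need to remove and recreate the container for this to take effect). If you want to view the IP of the container, run docker inspect postgres and look under “NetworkSettings” > “Networks” > “networkdp” > “IPAddress”. On Linux you can usually connect to that IP directly from the host; Docker Desktop on macOS and Windows generally doesn’t route traffic to container IPs.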
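Rather than scrolling through the full JSON, docker inspect can print just the IP through a Go template passed with -f. A sketch, assuming the container and network names used in this post (pg_ip is a hypothetical helper name):

```shell
# Hypothetical helper: print the postgres container's IP on the networkdp network.
# -f applies a Go template to the inspect data instead of printing the full JSON.
pg_ip() {
  docker inspect -f '{{.NetworkSettings.Networks.networkdp.IPAddress}}' postgres
}
```

The address you get depends on the subnet Docker assigned to networkdp, so it will differ from machine to machine.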