I had a report today from somebody unable to log into our Grafana instance. Super weird, because this has been running fine for months and we haven't touched it. So I jumped onto the machine to see what was up. First up was just looking at the logs from Grafana.
docker logs -f --tail 10 5efd3ee0074a
There in the logs was the culprit: No space left on device. Uh oh, what's going on here? Sure enough, the disk was full.
df -h
Filesystem Size Used Avail Use% Mounted on
This stuff is always annoying because you get to a point where you can't run any commands because there is no space left. I started by cleaning up some small parts of Docker:
docker system prune -a
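Then I cleaned up the Docker logs. The glob is wrapped in sh -c so that it expands as root, since /var/lib/docker usually isn't readable by a normal user:
sudo sh -c 'truncate -s 0 /var/lib/docker/containers/*/*-json.log'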
This then gave me enough space to run docker system df and see where the space was being used. Containers were the culprit. So next was to run
docker ps --size
Which showed me the web scraper container had gone off the rails and was using over 100GB of space.
7cf14084c56a webscraper:latest "wsce start -v -y" 7 weeks ago Up 20 minutes 0.0.0.0:7002->9924/tcp, [::]:7002->9924/tcp webscraper 123GB (virtual 125GB)
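Rather than eyeballing that output, the same check could be scripted. This is only a rough sketch: the function names `size_to_bytes` and `large_containers` and the 10 GB limit in the usage line are my own placeholders, and it assumes Docker prints sizes with decimal units (kB, MB, GB, TB) as it does in the output above.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Docker.

# Convert a Docker-style size such as "123GB" or "1.09kB" to bytes.
# Assumes decimal (power-of-1000) units.
size_to_bytes() {
  printf '%s\n' "$1" | awk '
    {
      n = $0; sub(/[A-Za-z]+$/, "", n)   # numeric part, e.g. "123"
      u = $0; sub(/^[0-9.]+/, "", u)     # unit part, e.g. "GB"
      m = 1
      if (u == "kB") m = 1000
      else if (u == "MB") m = 1000 ^ 2
      else if (u == "GB") m = 1000 ^ 3
      else if (u == "TB") m = 1000 ^ 4
      printf "%.0f\n", n * m
    }'
}

# Read "SIZE<TAB>NAME" lines on stdin and print the containers whose
# writable layer is larger than the given limit in bytes.
large_containers() {
  limit=$1
  while IFS="$(printf '\t')" read -r size name; do
    writable=${size%% *}                 # drop the " (virtual ...)" part
    bytes=$(size_to_bytes "$writable")
    if [ "$bytes" -gt "$limit" ]; then
      printf '%s %s\n' "$name" "$writable"
    fi
  done
}
```

With those defined, something like `docker ps --size --format '{{.Size}}\t{{.Names}}' | large_containers 10000000000` should list any container whose writable layer has grown past roughly 10 GB.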
This thing is supposed to be stateless so I just killed and removed it.
docker kill 7cf14084c56a
docker rm 7cf14084c56a
After a few minutes these completed and all was good again. So we'll keep an eye on that service and perhaps reboot it every few months to keep it in check.
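For the "keep an eye on it" part, a small disk usage check run from cron would probably have caught this before Grafana fell over. A minimal sketch, assuming POSIX `df -P` output; the function names `usage_percent` and `disk_is_filling` and the 80% threshold are placeholders I made up, not anything we actually run:

```shell
#!/bin/sh
# Hypothetical helpers for a cron-driven disk check.

# Read `df -P <path>` output on stdin and print the Capacity column
# of the first filesystem row without the trailing % sign.
usage_percent() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed (exit 0) when usage is at or above the given threshold.
disk_is_filling() {
  threshold=$1
  used=$(usage_percent)
  [ "$used" -ge "$threshold" ]
}
```

Usage would be along the lines of `df -P / | disk_is_filling 80 && echo "disk usage high on $(hostname)"`, with the echo swapped for whatever alerting is already in place.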