Serve Mayan EDMS over HTTPS using Traefik reverse proxy, LetsEncrypt and docker-compose

rssfed23
Moderator
Posts: 185
Joined: Mon Oct 14, 2019 1:18 pm
Location: United Kingdom

Post by rssfed23 »

By default Mayan EDMS only listens on port 8000. When we use docker-compose, the "ports" option forwards port 80 on your Docker host to port 8000 inside the container where Mayan is running.

It is commonplace to secure HTTP services with a reverse proxy so that the application (Mayan) doesn't need to configure HTTPS itself. This is where Traefik comes in.

Traefik is a cloud-native reverse proxy written in Go. It is fast, easily configurable and highly scalable, and has a lot of features beyond adding HTTPS to an existing web service. Advantages over using a "traditional" Nginx reverse proxy include:
- Simpler configuration through environment variables, supported natively
- Lower resource (RAM/CPU) usage thanks to fast Go libraries
- Built for the cloud-native world, so it integrates seamlessly with Docker and Kubernetes. In fact, I personally find it difficult to deploy outside of a container environment
- Integrated metrics, tracing, and logging
- The native integration means we can do automatic service discovery of containers and automatically expose them over HTTPS
- Can proxy any type of TCP traffic as well as HTTP, and supports a load of other awesome features
- It has a lovely web dashboard to help figure out what's going on in your reverse proxy environment:

Image

This guide will walk you through modifying your existing Mayan docker-compose file to include Traefik.

Mayan/gunicorn will no longer be exposed to the outside world at all. All web traffic to Mayan will go through Traefik and be encrypted with HTTPS using a free SSL certificate from LetsEncrypt. Traefik will also renew your certificate automatically before it expires.
We will use the LetsEncrypt TLS challenge, as it does not require you to also open port 80 to renew certificates.

Requirements:
- Docker
- Docker Compose
- An external DNS name you want to use. This process will NOT work with local/self-signed certificates (although Traefik can do that easily and I'll write a follow up guide for that soon)
- Port 443 forwarded from the internet
- A bit of patience

Setup:
Open your Mayan docker-compose file and add the following under services (but before mayan):

Code: Select all

  traefik:
    image: "traefik:v2.0.0-rc3"
    container_name: "traefik"
    command:
      - "--api.insecure=true"   #Change this to false if you want to disable the unsecured web dashboard
      - "--providers.docker=true"  #Enable the docker provider
      - "--providers.docker.exposedbydefault=false"  #Setting this to true will expose all containers automatically. We don't want this for Mayan
      - "--entrypoints.websecure.address=:443"  #What port Traefik should listen on. Must have a ports mapping further below
      - "--certificatesresolvers.mytlschallenge.acme.tlschallenge=true"  #Use the TLS challenge, which only needs port 443
      - "--certificatesresolvers.mytlschallenge.acme.email=MYEMAILADDRESS@ME.COM"  #Put your email address here
      - "--certificatesresolvers.mytlschallenge.acme.storage=/letsencrypt/acme.json"
    ports:
      - "443:443"  #Exposes entrypoints.websecure.address from above so Traefik can be reached
      - "8080:8080" #remove this line if you want to disable the web dashboard
    volumes:
      - "./letsencrypt:/letsencrypt"  #Where your certificate is stored. Do NOT remove this
      - "/var/run/docker.sock:/var/run/docker.sock:ro"  #How Traefik talks to Docker. Do NOT remove this
Warning: Indentation is important with YAML, so ensure that you copy all the blank spaces before the block, otherwise the file may not be indented enough. The "t" in "traefik" should be preceded by 2 spaces. There is a complete compose example at the end of the guide for you to review.

In the section you pasted, you now need to change MYEMAILADDRESS@ME.COM to your own email address. This is the address LetsEncrypt will send certificate-related information to when needed. It should be a valid email address so you can be notified of any problems with the certificate further down the line.

Also note the volumes section: Traefik will create a folder called letsencrypt in the location you run docker-compose from. This is where Traefik stores your certificate data while it's running. Traefik will automatically re-generate the file if it's deleted, so you are free to replace the Docker bind mount with a standard Docker volume.
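
As a rough sketch of the Docker volume alternative mentioned above, the Traefik volumes section could look something like this. The volume name "letsencrypt" is just a name I picked for illustration, and only the lines relevant to the change are shown:

Code: Select all

  traefik:
    volumes:
      - "letsencrypt:/letsencrypt"  #Named volume (hypothetical name) instead of the ./letsencrypt bind mount
      - "/var/run/docker.sock:/var/run/docker.sock:ro"  #Unchanged

volumes:
  letsencrypt:  #Top-level declaration so docker-compose creates and manages the volume

The top-level volumes key goes at the same indentation level as "services", not inside the traefik service.
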
We need to tell Mayan that it has a dependency on Traefik. In the Mayan section add a dependency to Traefik:

Code: Select all

  mayan:
    depends_on:
      - redis
      - traefik
Next we need to tell docker to no longer expose mayan on port 80 (as Traefik will handle 443 for us).
In the Mayan service section, find and remove your port definition lines:

Code: Select all

    ports:
      - "80:8000"
Delete those 2 lines from your compose file.
After this, there should be no "ports" entries anywhere in the compose file below the Traefik section we pasted in earlier (see the bottom of the guide for an example full compose file).

Next, we need to add some labels to our Mayan service definition.
Traefik will read these labels when the Mayan container starts and then use the information provided to figure out what to do. Add the following below your existing volumes section (included here for reference):

Code: Select all

    volumes:
      - /docker-volumes/mayan-edms/media:/var/lib/mayan   #This line or similar is already in your docker-compose file
    labels:
      - "traefik.enable=true"  #Tells Traefik that it should proxy this container
      - "traefik.http.routers.mayan.rule=Host(`YOURDOMAINGOESHERE`)"  #Put the domain people will use to reach Mayan here
      - "traefik.http.routers.mayan.entrypoints=websecure"
      - "traefik.http.routers.mayan.tls.certresolver=mytlschallenge"
You need to update YOURDOMAINGOESHERE to whatever domain name you want to access mayan through. For example:

Code: Select all

      - "traefik.http.routers.mayan.rule=Host(`mayan.myawesomecompanyname.com`)"
Ensure you don't remove the backtick quotes or brackets in the process of doing this. Don't include any / afterwards either: it's a domain name only.

If you want to access Mayan under a different context (such as myawesomecompanyname.com/services/mayan), that is not covered yet. We will edit the guide later on with how to do contexts in Traefik, as domain-only routing handles the majority of use cases that Mayan supports.

You're now all set to run "docker-compose up -d" and wait for Docker to start all the containers.
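
While you are still experimenting, it may be worth pointing Traefik at the LetsEncrypt staging environment so repeated attempts don't count against the production rate limits. A minimal sketch, assuming the same resolver name (mytlschallenge) used in this guide, is to add one extra line to the Traefik command section:

Code: Select all

    command:
      - "--certificatesresolvers.mytlschallenge.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory"  #Staging CA for testing only; remove this line once everything works

Staging certificates are not trusted by browsers, so expect a certificate warning while this line is present. After removing it you will most likely also need to delete the stored acme.json so that Traefik requests a real certificate.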

The Traefik dashboard can be accessed on http://nodeipaddress:8080.

Note: The Traefik dashboard is deployed by default with no security. For that reason this guide does not expose it to the outside world the way port 443 is. If you're on a local LAN with no firewall in front of your Docker host, other users on that LAN will be able to access the dashboard. The dashboard is read-only and settings can't be changed through it, but you still want to avoid exposing it to the public internet. As long as you don't forward port 8080 from the public internet to your Docker host, it won't be reachable from outside.
You can remove the port 8080 line from the Traefik section in your docker-compose file if you want to completely remove access to the dashboard.
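
A middle ground between leaving the dashboard open to the LAN and removing it entirely is to publish port 8080 on the loopback address only. This is only a sketch of the ports section, using standard docker-compose port syntax and assuming you are happy to reach the dashboard from the Docker host itself (for example through an SSH tunnel):

Code: Select all

    ports:
      - "443:443"  #Unchanged: Mayan over HTTPS
      - "127.0.0.1:8080:8080"  #Dashboard only reachable from the Docker host itself

Bear in mind that anything else using port 8080 from another machine would stop working with this change.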


Once Traefik comes up you will likely get an expiry email from LetsEncrypt right away.
This LetsEncrypt warning is harmless and can be ignored. It says the certificates will expire in 10 days because, by default, Traefik requests short-lived certificates and renews often. Traefik will automatically manage the certificate renewal/replacement for you and you shouldn't get subsequent emails.

That's it. You're now using Traefik as a reverse proxy for Mayan. You may notice Mayan feels significantly faster (especially over the internet) even though you've done nothing traditionally associated with performance improvements. That's one of the benefits of an optimised reverse proxy over traditional middleware servers.

Please reply to this thread if you have any questions or run into issues or have other feedback. We'll move this to the official documentation in a more concise form soon.
Enjoy!

For reference, below is my full docker compose for Mayan showing a complete example. It's missing postgresql because I'm using Amazon RDS for that:

Code: Select all

version: '3.7'

networks:
  mayan-bridge:
    driver: bridge

services:
  traefik:
    image: "traefik:v2.0.0-rc3"
    container_name: "traefik"
    networks:
      - mayan-bridge
      - default
    command:
      - "--api.insecure=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.mytlschallenge.acme.tlschallenge=true"
      - "--certificatesresolvers.mytlschallenge.acme.email=rob.knight@mayan-edms.com"
      - "--certificatesresolvers.mytlschallenge.acme.storage=/letsencrypt/acme.json"
    ports:
      - "443:443"
      - "8080:8080"
    volumes:
      - "./letsencrypt:/letsencrypt"
      - "/var/run/docker.sock:/var/run/docker.sock:ro"

  mayan:
    depends_on:
      - redis
      - traefik
    environment: &mayan_env
      MAYAN_CELERY_BROKER_URL: redis://:mayanredispassword@redis:6379/0
      MAYAN_CELERY_RESULT_BACKEND: redis://:mayanredispassword@redis:6379/1
      MAYAN_DATABASES: "{'default':{'ENGINE':'django.db.backends.postgresql','NAME':'dbmayan','PASSWORD':'nicetry','USER':'mayandbuser','HOST':'areallylongurl.eu-west-2.rds.amazonaws.com'}}"
    image: mayanedms/mayanedms:3.3.7
    networks:
      - mayan-bridge
    restart: unless-stopped
    volumes:
      - /docker-volumes/mayan-edms/media:/var/lib/mayan
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.mayan.rule=Host(`mayan.app`)"
      - "traefik.http.routers.mayan.entrypoints=websecure"
      - "traefik.http.routers.mayan.tls.certresolver=mytlschallenge"

  redis:
    command:
      - redis-server
      - --databases
      - "2"
      - --maxmemory-policy
      - allkeys-lru
      - --save
      - ""
      - --requirepass mayanredispassword
    image: redis:5.0-alpine
    networks:
      - mayan-bridge
    restart: always
    volumes:
      - /docker-volumes/mayan-edms/redis:/data
Please don't PM for general support; start a new thread with your issue instead.


Re: Serve Mayan EDMS over HTTPS using Traefik reverse proxy, LetsEncrypt and docker-compose

Post by rssfed23 »

Addendum:

One of my favourite things about Traefik is the built-in metrics. These can be integrated easily into your overall monitoring solution. I use Prometheus for general Mayan EDMS monitoring and wanted to share the config used to achieve that (there will be a more detailed guide on monitoring Mayan with Prometheus soon). Traefik also supports other metrics backends.

If you change the Traefik services section from earlier in the guide to:

Code: Select all

services:
  traefik:
    image: "traefik:v2.0.0-rc3"
    container_name: "traefik"
    networks:
      - mayan-bridge
      - default
    command:
      - "--api.insecure=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.websecure.address=:443" #Says what port Traefik should listen on. Must have a ports mapping further below
      - "--certificatesresolvers.mytlschallenge.acme.tlschallenge=true"
      - "--certificatesresolvers.mytlschallenge.acme.email=rssfed23@myemail.com" # Put your email address here
      - "--certificatesresolvers.mytlschallenge.acme.storage=/letsencrypt/acme.json"
      - "--metrics.prometheus=true" #Enables the metrics backend
      - "--metrics.prometheus.addEntryPointsLabels=true" #Enables labelling the metric per entrypoint
      - "--metrics.prometheus.addServicesLabels=true" #Enables labelling metrics per service
      - "--metrics.prometheus.buckets=0.1,0.3,1.2,5.0" #Time intervals so averages within these time ranges (in seconds) are stored
Then restart your docker compose stack. With the above configuration a /metrics endpoint will appear on the same port as the web UI (8080). This can then be scraped with a standard Prometheus scrape_config:

Code: Select all

    - job_name: traefik_mayan
      static_configs:
          - targets: ['IP address of your server:8080']
It's important to use the IP address here, as we haven't enabled name-based routing for the web dashboard. That is why going to your server's domain name on port 8080 gives a 404 instead of the web UI, which also prevents ordinary visitors from stumbling across the endpoint.
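
If you don't have a Prometheus configuration yet, the fragment above sits under scrape_configs in prometheus.yml. The following is only a minimal sketch of a whole file: the 15s interval is an arbitrary choice of mine and 192.0.2.10 is a documentation placeholder address, not a real server.

Code: Select all

global:
  scrape_interval: 15s  #Arbitrary example value; how often Prometheus scrapes its targets

scrape_configs:
  - job_name: traefik_mayan
    metrics_path: /metrics  #The Prometheus default, spelled out here for clarity
    static_configs:
      - targets: ['192.0.2.10:8080']  #Placeholder; replace with the IP address of your Docker host
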

You can then graph these metrics using Grafana. For example, this is the same scrape config with a concrete IP address filled in:

Code: Select all

    - job_name: traefik_mayan
      static_configs:
          - targets: ['3.11.82.255:8080']

I've made a Grafana dashboard (click here to import it) based on a couple of the upstream Traefik ones. It shows the stats relevant to Mayan. I removed things like the dedicated HTTP 404 boxes, because Mayan redirects to the home page every time, so Traefik will never return a 404:


Image


Leave feedback below :)

-----
This is the first post in a whole series that will cover production Mayan EDMS monitoring. Here is a sneak peek of what I'll be typing up over the coming weeks, in addition to the above:

Image

Image

Image

Image

mcarlosro
Posts: 11
Joined: Fri Jan 10, 2020 3:23 pm

Re: Serve Mayan EDMS over HTTPS using Traefik reverse proxy, LetsEncrypt and docker-compose

Post by mcarlosro »

Hi,

Is mayan monitoring only related to Traefik data (HTTP requests)? I would like to see some metrics like number of documents, indexes, cabinets, ...

Also, I have 3 containers on my Mayan EDMS installation: mayan-edms, Redis and Database. Where in the picture are RabbitMQ and Celery?

Thanks,


Re: Serve Mayan EDMS over HTTPS using Traefik reverse proxy, LetsEncrypt and docker-compose

Post by rssfed23 »

Is mayan monitoring only related to Traefik data (HTTP requests)?
With what's been posted above about monitoring, yes: only the Traefik data is exported and graphed.
There's currently no way to monitor the number of documents and so on, as no metrics exporter exists for Mayan. Until someone writes a Prometheus exporter for it, there's nowhere to get that data from.
It wouldn't be massively hard to write one given the data is already there in the API. If there's someone with a bit of Go experience out there who wants to give it a shot, be my guest. I'll happily help test!

In terms of the other screenshots, those are a preview of other areas that can be monitored.
A default installation only has Mayan, Redis and Postgres. Celery runs inside the Mayan container (it's what talks to Redis in your setup). Celery handles all the tasks that Mayan needs to run. For example, when a document needs to be OCR'd, Mayan adds an entry into Redis saying "this document needs OCR". Celery is also connected to Redis, sees that new message appear, retrieves it and then actually executes the OCR process.
If you're using docker-compose to launch Mayan, you'll see a reference to Flower in the example compose file. Flower is a monitoring tool for Celery tasks.
That dashboard can tell you how many documents have been OCR'd, previews generated and things indexed. There are loads of tasks that happen when you upload a bunch of documents to Mayan, and Celery is what handles them all by distributing them to various workers.

It's this framework that allows Mayan to run as a distributed app across multiple nodes. A large production environment will be spread across multiple nodes and those nodes will be running Celery to pick up tasks from the queue and execute them.

The reason I also have RabbitMQ in mine is that it's a more scalable, production-ready queuing system. It's used instead of Redis for larger environments. It also has persistence, so I can reboot nodes without worrying about them losing tasks. If you're using the default Redis configuration and upload 1000 documents, you'll have 1000 documents in the OCR queue. If you then reboot that node after it has only finished OCR on 2 documents, Redis will lose the messages telling Celery to OCR the other 998, and they won't get OCR'd unless you tell Mayan to OCR them again manually. RabbitMQ doesn't have that problem as it's a persistent message store (Redis is an in-memory database, although it can be configured to persist some data to storage).
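
To illustrate that last remark about Redis persistence, here is a rough sketch of how the Redis service from the full compose example could be changed to write an append-only file to its /data volume. It uses the standard redis-server --appendonly option in place of --save "". I'm assuming nothing else about your setup, and this has not been tested against Mayan, so treat it as a starting point and not a recommendation:

Code: Select all

  redis:
    command:
      - redis-server
      - --databases
      - "2"
      - --appendonly
      - "yes"  #Log every write to disk so queued messages can be reloaded after a restart
      - --requirepass mayanredispassword
    image: redis:5.0-alpine
    networks:
      - mayan-bridge
    restart: always
    volumes:
      - /docker-volumes/mayan-edms/redis:/data  #The append-only file is written here

The allkeys-lru eviction policy from the original example is left out on purpose, since it lets Redis discard keys under memory pressure and that would undermine the point of persisting the queue.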

One of these days a diagram will appear walking through all these various components and what they do. Most likely in the book.