Version 4.5 (final)

Version 4.5

Released: September 12, 2023

Status: Stable

Changes

Converter

The PyPDF2 package was replaced with the original pypdf package and updated to the latest version. Several PDF parsing and page introspection issues were solved due to this change.

Transformation layers now support icon classes and don’t expect a FontAwesome identifier anymore.

Credentials

The commercial credentials app was generalized and added to the core apps. This app provides a centralized location to store and protect external authentication credentials. By default two credential backends are provided: token, username and password. The credential backend
system is extensible and other credential systems can be added.

Apps that use external authentication, like the mailer and sources, were updated to use credentials in their setup forms. In the case of features that use optional external credentials or where the credentials are the result of a template, like the HTTP workflow action, staging storage source, and watch storage source, the credential is selected and passed as a variable to the template.

Dependencies

Update Python dependencies version:

  • PIP from 22.2 to 23.2.1
  • Redis from 4.2.0 to 4.6.0
  • Wheel from 0.37.0 to 0.41.0
  • Bleach from 4.1.0 to 6.0.0
  • django-auth-ldap from 4.0.0 to 4.4.0
  • PyYAML from 6.0 to 6.0.1
  • importlib-metadata from 5.0.0 to 6.8.0
  • requests from 1.14.3 to 2.0.4
  • django-extensions from 3.1.5 to 3.2.3
  • django-rosetta from 0.9.8 to 0.9.9
  • django-silk from 4.3.0 to 5.0.3
  • flake8 from 4.0.1 to 6.1.0
  • ipython from 7.32.0 to 8.14.0
  • twine from 3.8.0 to 4.0.2
  • Pillow from 9.4.0 to 10.0.0
  • dateparser from 1.1.1 to 1.1.8
  • elasticsearch from 7.17.1 to 7.17.9
  • elasticsearch-dsl from 7.4.0 to 7.4.1
  • python-magic from 0.4.26 to 0.4.27
  • gunicorn from 20.1.0 to 21.2.0
  • sentry-sdk from 1.12.1 to 1.29.0
  • whitenoise from 6.2.0 to 6.5.0
  • django-cors-headers from 3.10.0 to 4.2.0
  • drf-yasg from 1.21.4 to 1.21.7
  • jsonschema from 4.4.0 to 4.18.0
  • swagger-spec-validator from 2.7.4 to 3.0.3
  • boto3 from 1.24.70 to 1.28.16
  • django-storages from 1.13.1 to 1.13.2
  • extract-msg from 0.36.4 to 0.37.1
  • pycryptodome from 3.10.4 to 3.18.0
  • celery from 5.2.7 to 5.3.1
  • django-celery-beat from 2.2.1 to 2.3.0
  • django-formtools from 2.3 to 2.4.1
  • psycopg2 from 2.9.3 to 2.9.6

The Mayan EDMS version checking was improved.

Support was added for comparing versions. This is relevant for custom versions of Mayan EDMS when deciding when to merge back from upstream.

When reporting version mismatches the different versions numbers are shown.

Improved how dependencies copyright and license information is extracted.

Development

Support was added for PEP-0440 local versions.

The release exporter was updated to use pathlib for internal path computations. Support for bbcode was removed.

The release exporter no longer requires Mayan EDMS or Django to work.

Support for configurable release directory location was added.

Custom version parsing code was replaced with a wrapper for the Python packaging library.

Support to extract and manipulate more parts of the version string like the pre-release and post release parts was added.

Docker

The Debian Docker base image from version 11.5-slim to 12.1 “Bookworm”.

Environment variables were added to expand configuration of services and function of the Docker Compose file. The environment variable MAYAN_DOCKER_RABBITMQ_HOSTNAME was added to control the the default host for the Celery broker URL.

The environment variable MAYAN_DOCKER_RABBITMQ_HOSTNAME also sets the hostname of the RabbitMQ service container which enables persistent queue content even after shutting down the Docker Compose stack.

The Redis Celery result database was made configurable via MAYAN_REDIS_RESULT_DATABASE and defaults to 1.

The Mayan EDMS Redis lock database was made configurable via MAYAN_REDIS_LOCK_MANAGER_DATABASE and defaults to 2.

A note regarding opening up RabbitMQ data port was added.

Docker image versions were updated:

  • mysql from 8.0.32 to 8.0.34
  • docker from version 20.10.21-dind to 23.10.6-dind
  • postgresql from 13.10-alpine to 13.11-alpine
  • python 3.10.11-slim to 3.11.4-slim
  • rabbitmq from 3.11.13-alpine to 3.12.2-alpine
  • redis from 7.0.10-alpine 7.0.12-alpine

Add container dependency was added to ensure containers are started only after the setup_or_upgrade containers finishes which properly initializes the database and does migrations on subsequent startups. This results is more predictable and repeatable deployments at the expense of an small increase in startup time.

Support for Django’s CONN_MAX_AGE was added via the new environment variable MAYAN_DATABASE_CONN_MAX_AGE.

The the PostgreSQL container command arguments were updated to increase the performance and scalability of the out-of-the-box installation.

Added a maximum Docker logging size for all Mayan EDMS containers. The values are 3 log files of a maximum 100MB each.

Added the multi_container profile. This allows easy switching from a single all-in-one container Docker Compose deployment to a multi container deployment. The multi container profile uses substantially more resources and is not meant for basic installations, or for installations
running on NAS or other small appliances.

The Keycloak database name is now ensured to be the same as the Keycloak PostreSQL one.

Renamed all environment variables containing POSTGRES to use the full name POSTGRESQL. These are: MAYAN_DOCKER_KEYCLOAK_POSTGRES_TAG, MAYAN_KEYCLOAK_POSTGRES_VOLUME, MAYAN_DOCKER_POSTGRES_IMAGE, MAYAN_POSTGRES_VOLUME.

Added support for changing the worker log level via the new environment variable MAYAN_WORKER_LOG_LEVEL which defaults to ERROR.

Documentation

Support per app documentation. Each app can include its own documentation which is collected at build time.

Documents

The document file deletion operation was updated to now be a background task. Requesting a document file deletion will no longer return a 204 status but instead code 202 (Accepted).

Updated the type of the document file size field to a PositiveBigIntegerField to allow tracking document files bigger than 2GB in quota queries.

Added support for per document type document stub pruning. This change adds the document type fields document_stub_pruning_enabled, document_stub_expiration_interval, and removes the setting
DOCUMENTS_STUB_EXPIRATION_INTERVAL which is now configured per document type. By default pruning of document stubs is enabled to preserve the existing behavior. Disabling document stub pruning can be used to support document archiving where the document files are deleted but the
document database information is kept for reference.

All references to document type deletion policies were renamed to document type retention policies.

The active version and latest file attributes of documents were updated to be stored fields instead of computed values. This increases performance is large installations by a significant margin. Migrations
for large installation will take longer in order to populate these fields, account for theses extended times during the upgrade.

Added a new document file introspection link and view. This view re-scans the document file and populates the size, checksum, and mimetype of files. It also updates the document file page count and creates a new document version linking all discovered file pages. This view
replaces the previous document file page count update link and view.

Deleting a document file page will now also delete any document version pages that link to it. This avoid left over invalid page references.

New document versions create manually will no longer become active by default. Only new document versions created as a result of a document file upload will become active by default.

The new document file pages and document version pages code was updated to create the objects as a single bulk operation. The documents queues where also split into more smaller queues that can be spread over more workers increasing scalability.

Removed the unused signal signal_post_document_created. Use the signal_post_document_file_upload instead or Django’s default post_save
signal for the Document sender.

The document tasks callback interface was updated to accept a single dictionary for all the task and subtasks callbacks instead of having to pass different structures for each callback.

Document downloads and document exports were moved to their own queues.

Downloads

The API for the download file area was added. This API allows listing, deleting, and download actions.

Support was added for document download message templating. This allows customizing the message users receive when their document or document bundle is ready for download.

The document and document file creation code paths were updated to occur in smaller units that can be spread over multiple workers easier to increase parallel upload speeds and scalability. This is very important when performing multi million initial document imports or migrations
from existing systems.

Duplicates

Update duplicate bulk creation was updated to work in batches of 100 entries in order to avoid overwhelming the database cache when a large set of documents is uploaded.

Duplicates task queue was moved to the C worker.

Events

The EventManager classes were moved to their own module. Update your custom app imports to handle this change.

The event system was updated to work in asynchronous mode. This increase the performance of the event system by several orders of magnitude but has the disadvantage that events might not be processed in the same order they occurred. Update any automation or workflow to account for
this change. Future improvement are planned to provide order-of-execution support.

To allow disabling asynchronous mode and reverting to the previous behavior the setting EVENTS_DISABLE_ASYNCHRONOUS_MODE was added.

The events app queues were split into two to process fast and slow operations separately.

File caching

Updated the file cache maximum_size field from a BigIntegerField to a PositiveBigIntegerField. This removes the need to do validation for negative numbers.

The cache model field maximum_size was promoted to a database index to speed up cache calculations.

Widgets for file cache were added to the administrator dashboard.

Updated the cache and cache partition purge loop to continue executing even when there are files that cannot be purged. Cache partition files that could not be deleted will be skipped and retried on the next purge execution.

Indexing

The logic of the indexes was updated to remove documents even if the index is disabled.

Another wave of indexing algorithm optimization updates was merged for increased update performance for centi million document installations and for installations under heavy load.

The size of the document indexing value field was updated from 128 to 255 characters to allow for large template results. This field is indexed and the increase does not affect performance.

Search

Several redundant fields were removed from the search index. This includes document level fields from document files, document file pages, document version, and document version pages.

The advanced search form was improve to use fieldsets. It will also now sort the search fields.

A new worker E was added and devoted to search tasks. This isolates the search index uploads and prevents them from slowing down other tasks.

Settings

Added a new setting named CONFIGURATION_FILE_IGNORE which cause the setting system to not load settings from the config.yml file or save the current configuration to the config_backup.yml file. This is used on immutable installations and now during testing.

Custom setting cache implementation was removed in favor of Python’s functools.cache.

A set_value method was added to allow overriding a bootstrap setting’s value.

Support was added for passing a global_symbol_table argument when updating the setting namespace global symbol table. This is used by enterprise installations that require setting partitioning based on deployment or platform logic.

Sources

When documents are upload, information about their source, like where they came from and who sent them is now stored in a new structure that continues to be accessible as long as the document exists. This helps access this information easily via the new {{ document.source_metadata_value_of.source_id }} template helper.

By default all uploads store the ID of the source used. Other backends like the email sources store more information like the sender, receiver, subject, message ID.

This feature replaces the metadata assignment in the email sources, works for all source types, and allows for more values to be stored other than just the previous sender, subject, and message ID.

The next wave of sources app refactor patches were merged. These split the single sources app into separate apps per source type. This allows adding custom sources much easier. Individual source apps can also be disabled independently.

Two new source were added: staging storage and watch storage. These work like the staging folder and watch folder but use a configure storage system that can access storages like object storage or remote storages. This allows the staging storage and watch storage to monitor files in
remote location without having to first mount NFS or SMB directories as the OS level.

Specific source backend functionality was consolidated into reusable mixins. These work in a modular fashion and can be used to add capabilities to any built in or custom defined source very easily.

Fieldsets support was added to the source backend setup forms.

Support was added for single or multiple document API uploads. This feature is called immediate mode and allow receiving a document ID immediately after a file is submitted to a source via the API. This
document ID allows working with the document while it is being processed. Since only one document ID can be returned per upload, using immediate mode will disable processing compressed files.

The entire source system was updated to work in asynchronous manner. Submitting large files via the staging or watch sources will work very fast regardless of the file being submitted even if the file is a multi
gigabyte file. This change combined with object storage support for sources allows for uploading very large files and bypass HTTP and reverse proxy upload limits.

A new source actions system was merged. This change unifies and exposes the capabilities of each source via the API so that they can be inspected and used via the API or via Python code exactly as they are used via the HTTP user interface. Each action defines what interface they support along with the arguments that each interface requires. In effect this is a minimal custom MVC framework for sources.

Reduces the size of the secondary icon of the FontAwesomeDualClassesDriver drive to make the source metadata icon more readable.

Updated the source backend’s get_upload_form_class attribute to be an instance method and allow backends to dynamically change the form fields. Custom source can now enable or disable fields based on user access or other system state.

Staging folder file image cache errors are now ignored if the image cache is not already generated when deleting the staging folder file.

The SANE scanner source was updated to perform the scan as a background task. This means that the scan might not happen immediately when the scan form is submitted but won't block the user interface and allows scanning documents of unlimited pages that would normally be interrupted by the HTTP timeout.

The document upload wizard was updated to support user access filtering. This feature is used to filter cabinets, metadata and tags during upload based on the access of the logged user.

Support was added to disable the wizard next button conditionally. This is used to stop uploads when a required metadata type is not available to the user due to missing access.

Storage

The stale shared uploaded file and download file deletion loop was updated to continue executing even when there are files that cannot be deleted. Remaining files are skipped and retried on the next iteration.

Apps that handled shared upload files can now offload deletion of these files after use to the storage app. The documents app makes use of this new capability resulting in faster document processing during large document batch uploads.

Storage task queue was moved to the B worker.

Task manager

The shared “Tools” queue was removed. Each app is now responsible of defining its own queue for slow tools tasks.

Tasks queues were balanced among workers to avoid slower tasks from blocking faster tasks.

The task manager was updated to provide more information. It will now list workers, queue, and task types along with help information for each.

Increase default maximum worker tasks by 10x.

The default maximum memory per Celery worker child was increased from 300000 to 400000.

The options --without-gossip and --without-heartbeat were removed from the run_worker script. This undo a this previous Celery optimization to ensure task queue processing does not freeze under some situations when using RabbitMQ. The tasks queue processing is now more resilient to communication failures, sudden shutdowns, and high latency network situations.

Templating

Improved template initialization to support custom tag loading.

Testing

Added a system check named check_app_tests to ensure Mayan apps tests flag matches the actual state of the app’s tests existence.

Add improved test case tag inheritance to workaround a limitation in Django’s test tag inheritance that prevents tags from test mixins to propagate to the child classes.

The test system will now create a temporary MEDIA_ROOT folder when running tests. This change allows further isolation of testing artifacts.

Support was added for running test label/tag from the make file with make test TAG=.

User interface

On small screens, the main menu will now when a link is clicked.

Return links were added to the “Tools” and “Setup” areas to speed up navigation.

The AJAX loading spinner is shown in mobile devices.

The way the project title setting works was updated. The code was updated to reflect the actual purpose of the setting which is to identify an installation and not to do rebranding.

The way Tools and Setup view buttons are rendered was improved to ensure they do so with consistent heights always.

The Dropzone.js usage was improved and turned into to a Django widget for cleaner form integration.

The HTML widgets modules were split into HTML and column widget modules.

User management

All usage of the term “superuser” was updated to “super user” or “super_user”.

A warning message will now be displayed informing that it is not possible to change password of staff or super user accounts from the user interface.

Super user column was added to user list view.

Workflows

Workflow previews were updated to change the symbol to identify transitions, actions, and escalations with conditions from the math arrow to a math symbol for function (fn of).

Escalations are now presented in the workflow previews. Escalation hashes are now used to invalidate workflow previews caches.

Ensure the workflow state action column is not shown for the workflow state runtime proxies where is does not make sense to show.

An escalation list column was added to workflow states list view.

Transition are now shown in the workflow template state escalation list view.

Only the correct transitions can now be select for the corresponding workflow template state escalation in the user interface and the API.

Other

  • Add ContentType API detail view.
  • PropertyHelper updates:
    • Move all PropertyHelper usage to their own modules.
    • Add property helper file_metadata_value_of to document files.
    • Formalize PropertyHelper behaviors and testing.
    • Tag all PropertyHelper with classes_property_helper.

Removals

Backward incompatible changes

Deprecations

Issues closed

2 Likes