E-Rechnung or X-Rechnung in Mayan EDMS

weckwerth · November 22, 2024, 1:35pm

Hi,

as 2025 approaches, it will become mandatory in Germany to receive and archive electronic invoices in two formats (E-Rechnung and X-Rechnung). The term ZUGFeRD is also used in connection with X-Rechnung; in France the standard is called Factur-X.

Will it be possible to archive those documents in Mayan EDMS, do I need some extensions, updates or is the system already capable of doing this?

Many regards
Guido

DrRSatzteil · November 23, 2024, 5:04pm

I don’t see how Mayan would support native e/x-Rechnung documents, and I doubt that it makes much sense to support this directly. The electronic invoices are in XML format and are not meant to be human readable. The usual workflow should be to feed the invoice directly into your enterprise workflow for automatic processing. If you need a human readable representation in your document management system for whatever reason, you would probably generate it in the software you use for your invoice processing.

In practice a lot of companies will probably do it the other way round: first generate a human readable document and then process it like any other invoice received as a paper document. However in my opinion this contradicts the whole idea behind the e-Rechnung.

weckwerth · November 25, 2024, 3:20pm

I can see your point. On the other hand, the legislation demands that the invoice be stored in its original form as received. So I clearly see Mayan as the storage engine, and it would be very convenient to have some sort of preview for those XML files.

DrRSatzteil · November 25, 2024, 4:59pm

Yeah I see your point, too. I never tried to import an xml document, it probably works. You could then maybe create a workflow to extract the relevant data as document metadata. This way you don’t necessarily need a visual representation of the document. However this would only work with some key elements of the invoice, not for single line items.
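As a rough illustration of pulling a few key elements out of such an invoice, here is a minimal sketch using only the Python standard library. It assumes the invoice uses the UBL syntax (X-Rechnung can also arrive in CII syntax, which has different element names). The field selection, the `extract_key_fields` helper and the sample invoice are made up for this example; this is not how Mayan does it internally.

```python
import xml.etree.ElementTree as ET

# Standard UBL 2.x namespaces. The sample below is hand-written for
# illustration, it is not an official test document.
NS = {
    "cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
    "cac": "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2",
}

# Hypothetical metadata names mapped to paths relative to the <Invoice> root.
KEY_FIELDS = {
    "invoice_number": "cbc:ID",
    "issue_date": "cbc:IssueDate",
    "currency": "cbc:DocumentCurrencyCode",
    "seller_name": "cac:AccountingSupplierParty/cac:Party/"
                   "cac:PartyLegalEntity/cbc:RegistrationName",
    "payable_amount": "cac:LegalMonetaryTotal/cbc:PayableAmount",
}


def extract_key_fields(xml_text):
    """Return a dict of key invoice values; missing elements map to None."""
    root = ET.fromstring(xml_text)
    result = {}
    for name, path in KEY_FIELDS.items():
        node = root.find(path, NS)
        result[name] = node.text.strip() if node is not None and node.text else None
    return result


UBL_SAMPLE = """
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
         xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
         xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2">
  <cbc:ID>RE-2024-001</cbc:ID>
  <cbc:IssueDate>2024-12-11</cbc:IssueDate>
  <cbc:DocumentCurrencyCode>EUR</cbc:DocumentCurrencyCode>
  <cac:AccountingSupplierParty>
    <cac:Party>
      <cac:PartyLegalEntity>
        <cbc:RegistrationName>Seller GmbH</cbc:RegistrationName>
      </cac:PartyLegalEntity>
    </cac:Party>
  </cac:AccountingSupplierParty>
  <cac:LegalMonetaryTotal>
    <cbc:PayableAmount currencyID="EUR">1200.00</cbc:PayableAmount>
  </cac:LegalMonetaryTotal>
</Invoice>
"""

if __name__ == "__main__":
    for key, value in extract_key_fields(UBL_SAMPLE).items():
        print(f"{key}: {value}")
```

Line items are deliberately left out here, which matches the limitation mentioned above: a flat set of metadata fields works for header values, not for repeating rows.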

roberto.rosario · November 25, 2024, 5:32pm

Hi,

There have been requests similar to this, but the process has not been articulated in a way we’ve been able to understand.

Based on our limited understanding, we added nested XML metadata extraction, which makes it possible to manage the XML document by any piece or value using the normal Mayan features like search or indexing. Querying the XML is therefore already supported, which fulfills the data requirement.
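To make the idea of addressing "any piece or value" of a nested XML document more concrete, here is a small stand-alone sketch that flattens an XML tree into path/value pairs. It only illustrates the concept and makes no claim about how Mayan's extraction is actually implemented; the `flatten` helper, the dotted path format and the sample document are assumptions of this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def flatten(element, path=None):
    """Return {dotted.path: text} for every leaf element below `element`.

    Repeated siblings (for example invoice lines) get a 1-based index so
    that their paths do not overwrite each other.
    """
    path = path or local_name(element.tag)
    children = list(element)
    if not children:
        return {path: (element.text or "").strip()}

    totals = Counter(local_name(child.tag) for child in children)
    seen = Counter()
    flat = {}
    for child in children:
        name = local_name(child.tag)
        seen[name] += 1
        suffix = f"[{seen[name]}]" if totals[name] > 1 else ""
        flat.update(flatten(child, f"{path}.{name}{suffix}"))
    return flat


NESTED_SAMPLE = """
<Invoice>
  <ID>RE-2024-001</ID>
  <Seller><Name>Seller GmbH</Name></Seller>
  <Line><Amount>100.00</Amount></Line>
  <Line><Amount>20.00</Amount></Line>
</Invoice>
"""

if __name__ == "__main__":
    for key, value in flatten(ET.fromstring(NESTED_SAMPLE)).items():
        print(f"{key} = {value}")
```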

However, the presentation requirement is not met. Rendering the XML into a full PDF is not implemented because it requires the corresponding XSL file, which is not bundled with the XML.

Rendering a file by merging a local file with another one fetched from the web is not something the converter is designed to do, or that it should even be doing. Mayan presents an image of the actual document, whereas rendering the XML would mean creating a PDF out of it. That requires rethinking the role and code of the converter, and whether this should be done at all: each render of the same XML would create a new PDF file, and two users on different Mayan installations could even get completely different PDFs depending on the XSL file used.

Are there details or a website explaining this requirement that will go into effect in 2025?

We’ll need sample source documents that contain no private information (official test documents would be best), including a sample rendering of the XML, technical details about how the XML is rendered, and sample queries expected to be fulfilled against the source XML.

weckwerth · November 25, 2024, 5:54pm

Unfortunately most of this is in German (I might help understanding this, if you wish):

A summary of all specs as a download link:

X-Rechnung Specs

And this is a document defining the XML format:

X-Rechnung definition

Hope this helps for the beginning.

From my understanding, it is not necessary to create a PDF file from the XML file. The XML file is the original version of the invoice and therefore needs to be archived unaltered. It would just require some reader to show the contents of the invoice in human readable form, with no need for special styles or anything similar. Plain content in a more readable form would be enough.

Here is a sample of such an X-Rechnung:

X-Rechnung Sample

This would transform into something like this:

Might be some more intelligent formatting possible but this representation of the XML file would be absolutely sufficient.
I’d like to stress the fact that no formatting of fonts whatsoever was done.
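A plain, unstyled view along these lines could be produced without any XSL file. The following is only a sketch with the Python standard library: it walks the XML tree and prints one indented line per element. The `render_plain` function and the tiny sample are invented for illustration; a real X-Rechnung has many more elements and would probably benefit from friendlier labels.

```python
import xml.etree.ElementTree as ET


def render_plain(xml_text, indent="  "):
    """Return an indented 'label: value' text view of an XML document."""
    root = ET.fromstring(xml_text)
    lines = []

    def walk(element, depth):
        # Drop the '{namespace}' prefix so only the element name is shown.
        label = element.tag.rsplit("}", 1)[-1]
        text = (element.text or "").strip()
        pad = indent * depth
        lines.append(f"{pad}{label}: {text}" if text else f"{pad}{label}")
        for child in element:
            walk(child, depth + 1)

    walk(root, 0)
    return "\n".join(lines)


PLAIN_SAMPLE = (
    "<Invoice><ID>RE-2024-001</ID>"
    "<Seller><Name>Seller GmbH</Name></Seller></Invoice>"
)

if __name__ == "__main__":
    print(render_plain(PLAIN_SAMPLE))
```

Because the output depends only on the XML itself, every render of the same file gives the same text, and the archived original stays untouched.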

(Edit: Formatting reworked)

DrRSatzteil · November 28, 2024, 8:57am

Also as far as I know there is no such thing as an „official“ xsl transform to create a human readable document. In my understanding this is left intentionally to the software developers that handle such documents in their applications.

jecasc · December 10, 2024, 2:32pm

I doubt that many companies will use the pure XML form. Most e-invoices will be a PDF with the XML attached.

Here is a sample file:

If you open the file in Adobe Acrobat you will see it has an XML attachment: factur-x.xml

So it would be great to be able to store the XML attachment in the OCR Text instead of doing OCR on the main PDF.

This would actually be a perfect use case for Mayan - you would have the PDF for viewing and a structured format for easy automated extraction of invoice data like company name, invoice number, invoice date, total amount. Finally 100% automation without errors!

If there is no direct support in Mayan, we will probably handle the extraction outside of Mayan somehow and then overwrite the OCR with the XML using the API.
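A minimal sketch of that external extraction step, using PyMuPDF to read the embedded file from the PDF. The list of attachment names is an assumption based on the names commonly used by Factur-X/ZUGFeRD files (`factur-x.xml` is the one from the sample above); the two helper functions are hypothetical, and the upload to Mayan through the API is intentionally left out.

```python
# Attachment names to look for, compared case-insensitively.
# Assumption: these cover Factur-X / ZUGFeRD hybrid PDFs; extend as needed.
KNOWN_XML_NAMES = ("factur-x.xml", "xrechnung.xml", "zugferd-invoice.xml")


def pick_invoice_attachment(names):
    """Return the first embedded file name that looks like an invoice XML."""
    by_lower = {name.lower(): name for name in names}
    for candidate in KNOWN_XML_NAMES:
        if candidate in by_lower:
            return by_lower[candidate]
    return None


def extract_invoice_xml(pdf_path):
    """Return the embedded invoice XML as bytes, or None if there is none."""
    # PyMuPDF is a third-party package; it is imported here so that the
    # name-picking helper above can be used and tested without it.
    import fitz

    with fitz.open(pdf_path) as doc:
        name = pick_invoice_attachment(doc.embfile_names())
        if name is None:
            return None
        return doc.embfile_get(name)


if __name__ == "__main__":
    import sys

    xml_bytes = extract_invoice_xml(sys.argv[1])
    if xml_bytes is None:
        print("No invoice XML attachment found")
    else:
        print(xml_bytes.decode("utf-8", errors="replace"))
```

The PDF itself is only read, never rewritten, so the document stored in Mayan stays exactly as received.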

weckwerth · December 10, 2024, 3:10pm

If there is no direct support in Mayan, we will probably handle the extraction outside of Mayan somehow and then overwrite the OCR with the XML using the API.

Could be a solution for viewing but probably not for storage. It is required to store the unaltered version of the document - so to have the export function working, your proposal of storing the complete file but only preview the PDF component seems the right way to go.

I really hope that Roberto and his team are considering some solution, as this whole electronic invoice thing will eventually be an EU-wide requirement, and in Germany already in 2025.

jecasc · December 10, 2024, 3:34pm

Could be a solution for viewing but probably not for storage

The document is stored unaltered by Mayan, so that is not an issue. What we would do is access new documents through the Mayan API, extract the XML attachment with a script, and then overwrite the OCR data in Mayan with the content of the XML attachment through the API.

Then we can use the XML in workflows and extract data with regex.
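For illustration, this is roughly what such regex extraction could look like in Python once the XML text is available. The patterns assume the CII syntax used in `factur-x.xml` with the usual `rsm`/`ram`/`udt` prefixes; other files may use different prefixes, so treat the patterns, the `extract_with_regex` helper and the hand-written fragment as assumptions rather than a robust parser.

```python
import re

# Hypothetical field names mapped to patterns with one capture group each.
PATTERNS = {
    "invoice_number": r"<rsm:ExchangedDocument>\s*<ram:ID>([^<]+)</ram:ID>",
    "issue_date": r'<ram:IssueDateTime>\s*<udt:DateTimeString format="102">'
                  r"(\d{8})</udt:DateTimeString>",
    "grand_total": r"<ram:GrandTotalAmount[^>]*>([^<]+)</ram:GrandTotalAmount>",
}


def extract_with_regex(xml_text):
    """Return the first match for each pattern, or None where nothing matches."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, xml_text)
        found[name] = match.group(1).strip() if match else None
    return found


# Hand-written fragment in CII style, not a complete or official invoice.
CII_FRAGMENT = """
<rsm:ExchangedDocument>
  <ram:ID>RE-2024-001</ram:ID>
  <ram:IssueDateTime>
    <udt:DateTimeString format="102">20241211</udt:DateTimeString>
  </ram:IssueDateTime>
</rsm:ExchangedDocument>
<ram:SpecifiedTradeSettlementHeaderMonetarySummation>
  <ram:GrandTotalAmount>1200.00</ram:GrandTotalAmount>
</ram:SpecifiedTradeSettlementHeaderMonetarySummation>
"""

if __name__ == "__main__":
    print(extract_with_regex(CII_FRAGMENT))
```

Regex on XML is brittle (prefixes, whitespace and attribute order can vary between senders), which is one more argument for a separate structured field in the long run.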

It is doable but I would very much prefer a solution where instead of activating “automatic OCR” for a new document I could simply click on “fill OCR data with XML attachment data”.

In the long run it would of course be better to have a separate storage field for the XML, but simply filling the existing OCR with the XML content would be a simple short-term solution.

george · December 10, 2024, 4:17pm

Guido, in kivitendo you can find a perfect open source implementation. You can use this code and logic with Mayan API. https://www.kivitendo.de/

weckwerth · December 10, 2024, 4:37pm

That might be an option …
On the other hand, installing (and maintaining and understanding) a complete ERP system plus developing the API was not exactly the solution I am looking for.

george · December 10, 2024, 4:41pm

As long as there is no Mayan plugin, just use: https://pypi.org/project/factur-x/

george · December 10, 2024, 4:47pm

By the way, I am near Planegg. If you publish your requirements specification, I will check how much effort is involved in providing you with a simple API.

weckwerth · December 10, 2024, 6:04pm

Sounds extremely cool. I’ll prepare the stuff …

george · December 11, 2024, 8:10am

My first idea: get your invoices from an IMAP mailbox.
Convert them by adding the ZUGFeRD XML and rename them.
Save the files into the Mayan watch folder.


#!/usr/bin/python3
import imaplib
import email
import os
import re
import fitz  # PyMuPDF

def save_attachment(msg, download_folder="/opt/watchfolder"):
    for part in msg.walk():
        if part.get_content_maintype() == 'multipart':
            continue
        if part.get('Content-Disposition') is None:
            continue
        filename = part.get_filename()
        if filename and filename.lower().endswith('.pdf'):
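            # Drop any directory components so a crafted file name
            # cannot escape the watch folder
            filepath = os.path.join(download_folder, os.path.basename(filename))
            payload = part.get_payload(decode=True)
            if payload is None:
                continue
            with open(filepath, 'wb') as f:
                f.write(payload)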
            print(f"Saved attachment: {filepath}")
            search_german_dates(filepath)

def search_german_dates(pdf_path):
    german_date_pattern = r'\b(0[1-9]|[12][0-9]|3[01])\.(0[1-9]|1[0-2])\.\d{4}\b'
    doc = fitz.open(pdf_path)
    for page_num in range(len(doc)):
        page = doc[page_num]
        text = page.get_text()
        lines = text.split('\n')
        for line in lines:
            matches = re.finditer(german_date_pattern, line)
            for match in matches:
                date = match.group()
                print(f"German date found on page {page_num + 1}: {date}")
                print(f"Full line: {line.strip()}")
                print("---")
    doc.close()

def fetch_unseen_emails():
    imap_host = 'mx2f59.netcup.net'
    username = 'dms@aufgewecktes_bürschchen.de'
    password = 'yourpassword'

    imap = imaplib.IMAP4_SSL(imap_host)
    imap.login(username, password)
    imap.select('INBOX')

    _, messages = imap.search(None, 'UNSEEN')
    for num in messages[0].split():
        _, msg = imap.fetch(num, '(RFC822)')
        email_body = msg[0][1]
        email_message = email.message_from_bytes(email_body)
        save_attachment(email_message)

    imap.close()
    imap.logout()

if __name__ == "__main__":
    fetch_unseen_emails()

george · December 11, 2024, 12:27pm

Maybe you can implement that inside Mayan as a plugin:

from facturx import FacturX
import json

# Load the original PDF invoice
input_pdf = "invoice.pdf"
output_pdf = "invoice_facturx.pdf"

# Create a FacturX object
fx = FacturX(input_pdf)

# Set invoice data (replace with actual invoice details)
invoice_data = {
    "seller": {"name": "Seller Company"},
    "buyer": {"name": "Buyer Company"},
    "date": "2024-12-11",
    "due_date": "2024-12-25",
    "amount_tax_excluded": 1000.00,
    "amount_tax_included": 1200.00,
    "currency": "EUR"
}

# Update the FacturX object with invoice data
fx.update(invoice_data)

# Validate the data
if fx.is_valid():
    # Generate the Factur-X PDF
    fx.write_pdf(output_pdf)
    print(f"Factur-X PDF generated: {output_pdf}")
else:
    print("Invalid invoice data")

roberto.rosario · December 11, 2024, 1:11pm

We are, but it has to be done in a way that fits the philosophy, the existing paradigms, and the apps. This is not a single issue but a composition of challenges that need to be addressed individually.

Once the current set of patches being reviewed and merged is complete and we move to alpha2/beta1 in the following weeks, I’ll open a topic to show how we are addressing the support for these e-invoices.