Improvement
Resolution: Fixed
Minor
3.8.5, 3.11, 4.0
MOODLE_311_STABLE, MOODLE_38_STABLE, MOODLE_400_STABLE
MOODLE_311_STABLE
MDL-70038-master
This issue is somewhat related to MDL-57202.
Everything I describe here concerns the file mod/assign/feedback/editpdf/classes/pdf.php and the gs command it builds to extract a single page as a PNG image.
Context:
Linux distributions provide a package called poppler-utils or poppler, depending on the distribution, which includes a tool named pdftoppm. This tool can convert single pages (or all pages at once) orders of magnitude faster than Ghostscript.
Why is this process slow with Ghostscript (gs)? I have read that gs processes the whole document again (converting it to PDF) and only then extracts the requested pages. The reason is that a PDF can be complex enough that content on other pages affects what is finally rendered on a specific page.
Why is pdftoppm so much faster? Honestly, I don't know; I searched for the reason without success. In any case, pdftoppm and the rest of the Poppler tools are open source too, from the [Poppler project|https://poppler.freedesktop.org/].
Proposal:
Here is my proposal, and I have attached a patch implementing it:
- Add a setting for the pdftoppm path, in the same way we do for /usr/bin/gs.
- Use pdftoppm to convert PDF to PNG when that setting is defined. If the setting is empty, Moodle keeps converting with gs as it does today.
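The proposed fallback can be sketched in shell as a small function that only prints the command it would run. This is an illustration, not Moodle's implementation: the real patch is PHP in mod/assign/feedback/editpdf/classes/pdf.php. The function name `build_page_command` and its argument order are hypothetical, and the flags are copied from the two commands timed in this issue. One assumption: the gs output file is named `<prefix>.png` so that both branches produce the same filename as pdftoppm's `-singlefile`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (does not run) the command that would render
# one PDF page to PNG. An empty pdftoppm path means "fall back to gs".
build_page_command() {
  local pdftoppm_path="$1" gs_path="$2" input="$3" page="$4" out_prefix="$5"
  local -a cmd
  if [[ -n "$pdftoppm_path" ]]; then
    # -f/-l select the first/last page; -singlefile writes <prefix>.png
    cmd=("$pdftoppm_path" -q -f "$page" -l "$page" -png -singlefile "$input" "$out_prefix")
  else
    # Same gs options as the gs command timed in this issue
    cmd=("$gs_path" -q -sDEVICE=png16m -dSAFER -dBATCH -dNOPAUSE -r100
         "-dFirstPage=$page" "-dLastPage=$page" -dDOINTERPOLATE
         -dGraphicsAlphaBits=4 -dTextAlphaBits=4
         "-sOutputFile=${out_prefix}.png" "$input")
  fi
  # Join the arguments with spaces for display
  printf '%s\n' "${cmd[*]}"
}
```

For example, `build_page_command /usr/bin/pdftoppm /usr/bin/gs doc.pdf 46 pag46` prints `/usr/bin/pdftoppm -q -f 46 -l 46 -png -singlefile doc.pdf pag46`. Passing an empty first argument prints the gs variant instead.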
Performance analysis:
Our Moodle site hosts Architecture studies, where students' final projects are very large and full of high-quality images, producing PDF documents of more than 250 MB. As a result, the queue of submissions waiting for PNG conversion easily grows past 2,000 within a couple of days on our site, even in normal periods without exams or deadlines.
Here is a performance comparison of both tools on my local computer, extracting just a single page (page 46):
jordi@jpax360:~/pdf_conversion $ time pdftoppm -q -f 46 -l 46 -png -singlefile 20200623_combined.pdf pag46
real 0m1,400s
user 0m1,343s
sys 0m0,024s
jordi@jpax360:~/SREd/pdf_conversion $ time gs -q -sDEVICE=png16m -dSAFER -dBATCH -dNOPAUSE -r100 -dFirstPage=46 -dLastPage=46 -dDOINTERPOLATE -dGraphicsAlphaBits=4 -dTextAlphaBits=4 -sOutputFile=pag46_gs.png 20200623_combined.pdf
real 3m59,952s
user 3m57,231s
sys 0m2,287s
jordi@jpax360:~/pdf_conversion $
In this example, pdftoppm is about 171 times faster than gs for this single page (1.4 s versus almost 4 minutes).
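As a quick check of the 171x figure, the `real` times from the transcript can be converted to seconds and divided. The helper names `to_seconds` and `ratio` are hypothetical. The parser only handles the `XmY,ZZZs` format seen in the transcript, with a comma as the decimal separator.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the speed-up quoted in this issue.

# Convert a bash `time` value such as "3m59,952s" (comma as decimal
# separator, as in the transcript) into seconds with three decimals.
to_seconds() {
  local t="${1//,/.}"          # "3m59,952s" -> "3m59.952s"
  t="${t%s}"                   # drop trailing "s" -> "3m59.952"
  local min="${t%%m*}"         # minutes part -> "3"
  local sec="${t#*m}"          # seconds part -> "59.952"
  LC_ALL=C awk -v m="$min" -v s="$sec" 'BEGIN { printf "%.3f\n", m * 60 + s }'
}

# Divide two durations in seconds, rounded to one decimal.
ratio() {
  LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

pdftoppm_s=$(to_seconds "0m1,400s")   # real time for pdftoppm
gs_s=$(to_seconds "3m59,952s")        # real time for gs
ratio "$gs_s" "$pdftoppm_s"           # prints 171.4
```

`LC_ALL=C` keeps awk's output using a dot as the decimal point regardless of the shell's locale.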
Another result from our patch:
Running the scheduled task by hand with "php admin/tool/task/cli/schedule_task.php --execute='\assignfeedback_editpdf\task\convert_submissions'" and pdftoppm configured, a 207 MB document with 43 pages of high-quality images and fine detail is converted in 28m17,105s. That is, all 43 pages become PNG images in about 28 minutes.
By contrast, the same scheduled task using gs took 19m1,642s to convert just the first page of the document. Only the first page. For this run, I simply left the pdftoppm path setting empty on my test Moodle site.
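A rough per-page comparison of the two scheduled-task runs can be computed from these timings. This assumes both are wall-clock times for the same 207 MB document. It also assumes pdftoppm spent about the same time on each page, which is a simplification because page complexity varies. The variable names are illustrative.

```shell
#!/usr/bin/env bash
# Wall-clock times reported in this issue, converted to seconds by hand.
pdftoppm_total=1697.105   # 28m17,105s: all 43 pages with pdftoppm
gs_first_page=1141.642    # 19m1,642s: first page only with gs
pages=43

# Average time per page with pdftoppm (assumes pages cost roughly the same).
per_page=$(LC_ALL=C awk -v t="$pdftoppm_total" -v n="$pages" \
  'BEGIN { printf "%.1f", t / n }')

# How many average pdftoppm pages fit into gs's time for a single page.
factor=$(LC_ALL=C awk -v g="$gs_first_page" -v t="$pdftoppm_total" -v n="$pages" \
  'BEGIN { printf "%.1f", g / (t / n) }')

echo "pdftoppm: ${per_page} s per page on average"
echo "gs, first page only: ${factor} times the pdftoppm average"
```

With these inputs, pdftoppm averages about 39.5 s per page. gs needed roughly 28.9 times that for its first page alone.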
The resulting PNG files are the same in both cases, whether produced by pdftoppm or by gs.
We are already running this on our production site.
I hope this helps improve this part of Moodle. For us it is a critical component and, at the same time, a headache: every week we have to check alerts for long-running cron.php processes.
- has a non-specific relationship to:
  - MDL-57202 Large docs/PDFs cripple grading interface (Open)
  - MDL-64585 High resolution PDF causes endless GhostScript processing (Open)
  - MDL-75295 Produce all mod_assign editpdf images from pages in a single shell call instead of per page (Closed)
- has a QA test:
  - MDLQA-16045 An admin can set either ghostscript or pdftoppm to be used for converting PDF to PNG files (Open)
  - MDLQA-16046 CLONE - An admin can set either ghostscript or pdftoppm to be used for converting PDF to PNG files (Passed)
  - MDLQA-16672 CLONE - An admin can set either ghostscript or pdftoppm to be used for converting PDF to PNG files (Passed)
  - MDLQA-17288 CLONE - An admin can set either ghostscript or pdftoppm to be used for converting PDF to PNG files (Passed)
  - MDLQA-17866 CLONE - An admin can set either ghostscript or pdftoppm to be used for converting PDF to PNG files (Passed)
  - MDLQA-18369 CLONE - An admin can set either ghostscript or pdftoppm to be used for converting PDF to PNG files (Passed)
  - MDLQA-18856 CLONE - An admin can set either ghostscript or pdftoppm to be used for converting PDF to PNG files (Passed)
  - MDLQA-19316 CLONE - An admin can set either ghostscript or pdftoppm to be used for converting PDF to PNG files (Passed)
  - MDLQA-20198 CLONE - An admin can set either ghostscript or pdftoppm to be used for converting PDF to PNG files (Passed)
- has been marked as being related by:
  - MDL-64431 Layers missing from PDFs in grading view (Closed)
- will help resolve:
  - MDL-64402 PDF conversion hangs for large files (Open)
  - MDL-71431 PDF conversion hangs for PDFs with lots of vector graphics (Reopened)