# Debugging a failed conversion
We may get a request to find out why a specific document failed to convert. This short doc shows possible ways to retrieve that information; the steps don't necessarily need to be tried in this order.
## Initial request
A document named 2441836-2281035-News_from_ISOLDE.pptx failed to convert during a certain time interval; in this example, about 3 days ago.
## Database check
From a machine that has psql.exe available, e.g. doconv01-test, run something similar to:
```
doconverter=> select a.taskid, a.uploadedfile, b.error, b.logdate, a.server
              from taskdb a, results_conversion b
              where a.taskid = b.taskid
                and duration = -1
                and a.uploadedfile = '2441836-2281035-News_from_ISOLDE.pptx'
                and b.logdate > current_date - interval '4' day
              order by a.logdate desc;
   taskid   |             uploadedfile              |                               error                               |          logdate           |    server
------------+---------------------------------------+-------------------------------------------------------------------+----------------------------+---------------
 1510078054 | 2441836-2281035-News_from_ISOLDE.pptx | FileNotFoundError(2, 'The system cannot find the file specified')  | 2017-11-06 19:29:24.362224 | doconverter01
(1 row)
```
From here you know which worker node was involved, the time, and the converter's internal taskid.
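If the exact file name is not known, the same query can be broadened; for example (a sketch using the same tables, matching on a substring of the uploaded file name):

```
doconverter=> select a.taskid, a.uploadedfile, b.error, b.logdate, a.server
              from taskdb a, results_conversion b
              where a.taskid = b.taskid
                and b.duration = -1
                and a.uploadedfile ilike '%News_from_ISOLDE%'
                and b.logdate > current_date - interval '4' day
              order by a.logdate desc;
```

ILIKE makes the match case-insensitive, which helps when the request only quotes an approximate file name.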
If required, you can check the tables' fields:
```
doconverter=> \dt
                  List of relations
   Schema    |        Name        | Type  |    Owner
-------------+--------------------+-------+-------------
 doconverter | results_conversion | table | doconverter
 doconverter | taskdb             | table | doconverter
(2 rows)

doconverter=> \d results_conversion
                          Table "doconverter.results_conversion"
      Column      |            Type             |                            Modifiers
------------------+-----------------------------+------------------------------------------------------------------
 id               | integer                     | not null default nextval('results_conversion_id_seq'::regclass)
 from_ext         | character varying(64)       | not null
 to_ext           | character varying(64)       | not null
 taskid           | integer                     | not null
 uploadedfilehash | character varying(128)      | not null
 converter        | character varying(64)       | not null
 server           | character varying(64)       | not null
 error            | character varying(2048)     |
 size_from        | integer                     | not null
 size_to          | integer                     | not null
 duration         | integer                     | not null
 logdate          | timestamp without time zone | not null
 remotehost       | inet                        | not null
 hashurl          | character varying(1024)     |
Indexes:
    "results_conversion_pkey" PRIMARY KEY, btree (id)
    "duration_col" btree (duration)
    "fk_constraint" btree (server, taskid, uploadedfilehash)
    "fromtoextention_col" btree (from_ext, to_ext)
    "size_from_col" btree (size_from)
    "size_to_col" btree (size_to)
    "time_col" btree (logdate)
Foreign-key constraints:
    "results_conversion_server_fkey" FOREIGN KEY (server, taskid, uploadedfilehash) REFERENCES taskdb(server, taskid, uploadedfilehash) ON UPDATE CASCADE ON DELETE CASCADE
```
## Logs at the worker node
Using the server and time obtained from the database, connect to that worker node; the logs are located at c:\doconverter\logs:

```
cd /d c:\doconverter\logs
```
```
...
[2017-11-07 10:06:50 converter_daemon.py:147 - <module>() ] processing task: 1510078054
[2017-11-07 10:06:50 converter_daemon.py:152 - <module>() ] list of processes [1510078054, 1510078850, 1510079015, 1510079239]
[2017-11-07 10:06:50 converter_daemon.py:156 - <module>() ] task being added [1510078054, 1510078850, 1510079015, 1510079239]
[2017-11-07 10:06:50 Utils.py:158 - logmessage() ] 1510078054 reading json file.
[2017-11-07 10:06:50 Task.py:68 - __init__() ] 1510078054 newfilename is 2441836-2281035-News_from_ISOLDE.pdf
[2017-11-07 10:06:50 Task.py:86 - __createFileTask() ] file already present C:\Users\cdsconv\cernboxprod\doconverter01\var\tasks\1510078054
[2017-11-07 10:06:50 ConverterManager.py:42 - __init__() ] Working on taskid 1510078054 from remote_host: 10.76.46.1 ext_from: pptx ext_to: pdf
[2017-11-07 10:06:50 converter_daemon.py:156 - <module>() ] task being added [1510078054, 1510078850, 1510079015, 1510079239]
[2017-11-07 10:06:50 converter_daemon.py:165 - <module>() ] JOB START 1510078054
[2017-11-07 10:06:50 converter_daemon.py:168 - <module>() ] JOB JOIN 1510078054
[2017-11-07 10:06:51 Utils.py:158 - logmessage() ] 1510078054 reading json file.
[2017-11-07 10:06:51 Task.py:68 - __init__() ] 1510078054 newfilename is 2441836-2281035-News_from_ISOLDE.pdf
[2017-11-07 10:06:51 Task.py:86 - __createFileTask() ] file already present C:\Users\cdsconv\cernboxprod\doconverter01\var\tasks\1510078054
[2017-11-07 10:06:51 Neevia.py:74 - convert() ] 1510078054 conversion started
...
[2017-11-07 10:07:21 ConverterManager.py:143 - run() ] Exception got [WinError 2] The system cannot find the file specified: 'C:\\Users\\cdsconv\\cernboxprod\\doconverter01\\var\\uploadsresults\\1510078054\\2441836-2281035-News_from_ISOLDE.pptx'. Stack trace: None
[2017-11-07 10:07:21 ConverterManager.py:149 - run() ] task 1510078054 in server doconverter01, remote host: 10.76.46.1 failed: file: 2441836-2281035-News_from_ISOLDE.pptx size: 0 KB from: pptx to: pdf size: -1 KB in -1 secs
[2017-11-07 10:07:22 ConverterManager.py:160 - run() ] Results for task 1510078054 were logged.
[2017-11-07 10:07:22 Utils.py:158 - logmessage() ] 1510078054 moving from C:\Users\cdsconv\cernboxprod\doconverter01\var\tasks\1510078054 to C:\Users\cdsconv\cernboxprod\doconverter01\var\error\1510078054
[2017-11-07 10:07:22 ConverterManager.py:174 - run() ] task 1510078054 in server doconverter01 remote host: 10.76.46.1 failed: file: 2441836-2281035-News_from_ISOLDE.pptx size: 0 KB from: pptx to: pdf size: -1 KB in -1 secs
[2017-11-07 10:07:23 Utils.py:158 - logmessage() ] 1510078054 success sending file None to https://indico.cern.ch/conversion/finished
```
## From ES cluster
On the ES collaborativeapps cluster we first need to select the right dashboard: doconverter_prod for production or doconverter_qa for test. In our example we will use doconverter_prod, as this was a real case from Indico.
First, adjust the time range to speed up the search, and enter the search string, e.g. *News_from_Isolde*, in the search box:
As you can see, the document was processed twice: the first attempt ended with an error and the second one completed successfully. You can inspect the detailed log entries in the bottom visualization, and the other visualizations provide further information, such as which worker node processed the file, the input and output sizes, and which conversion was performed, e.g. from pptx to pdf.
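The double processing seen in the dashboard can also be cross-checked in the database; for example (a sketch using only the columns shown earlier), by listing every conversion attempt recorded for that uploaded file:

```
doconverter=> select b.taskid, b.server, b.duration, b.error, b.logdate
              from taskdb a, results_conversion b
              where a.taskid = b.taskid
                and a.uploadedfile = '2441836-2281035-News_from_ISOLDE.pptx'
              order by b.logdate desc;
```

Rows with duration = -1 correspond to failed attempts, so the later successful retry should show up with a positive duration.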