I need to run two external processes (on Linux):
pdftotext -tsv one.pdf
pdftotext -tsv two.pdf
For each one I need to acquire the output and post-process it.
Both are completely independent.
(However, once I've finished post-processing I then do some work on
both sets of post-processed data together.)
Each external process takes about 3 secs so it takes just over 6 secs
to acquire the data from both processes.
When I've done something similar in Python I've used the multiprocessing module and this has got my runtime close to the 3 secs.
In my experiments with Tcl's threading I've found the threading startup overhead to be rather large.
What is the fastest way to run two independent processes concurrently
and acquire their outputs using Tcl?
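
For readers who want a starting point using only core Tcl, here is a minimal sketch of one possible approach: open each command as a pipeline, read it with a non-blocking fileevent handler, and wait in the event loop until both reach EOF. It is not the code anyone posted in this thread; run_all and on_readable are hypothetical helper names, and it assumes the commands write their results to stdout.

```tcl
# Sketch only: run_all and on_readable are made-up names for illustration.

proc on_readable {chan idx} {
    global pending output
    # Non-blocking read: take whatever data is available right now.
    append output($idx) [read $chan]
    if {[eof $chan]} {
        # Switch back to blocking so that close waits for the child and
        # reports a failure (non-zero exit, stderr output) as an error.
        fconfigure $chan -blocking 1
        if {[catch {close $chan} err]} {
            puts stderr "command $idx failed: $err"
        }
        incr pending -1
    }
}

proc run_all {cmds} {
    global pending output
    array unset output
    set pending 0
    set idx 0
    foreach cmd $cmds {
        # "open |..." starts the child immediately, so all children run
        # in parallel while this process sits in the event loop.
        set chan [open |$cmd r]
        fconfigure $chan -blocking 0
        set output($idx) ""
        fileevent $chan readable [list on_readable $chan $idx]
        incr pending
        incr idx
    }
    # Process events until every pipeline has reached EOF.
    while {$pending > 0} {
        vwait pending
    }
    set results {}
    for {set i 0} {$i < $idx} {incr i} {
        lappend results $output($i)
    }
    return $results
}
```

Assuming pdftotext accepts "-" as the output file to mean stdout, usage would look something like `lassign [run_all [list {pdftotext -tsv one.pdf -} {pdftotext -tsv two.pdf -}]] tsv1 tsv2`. The handler is registered once per channel and the accumulated text lives in an array rather than in the handler script itself.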
On 29.04.2026 at 10:51, Mark Summerfield wrote:
This takes ~1 sec. So it's 2x faster, as expected.
What's the issue?
In my experiments with Tcl's threading I've found the threading
startup overhead to be rather large.
* Mark Summerfield <m.n.summerfield@gmail.com>
| I created a tiny test program (65 LOC; shown at the end) to
| compare timings. I did multiple timings and here're the averages:
| serial (2 LOC) 2.020 sec
| multiprocess (19 LOC) 1.055 sec
| threaded (13 LOC) 1.061 sec
| Since the difference between the multiprocess and threaded
| approaches is so small and that the threaded code is simpler
| and more appealing, I'm going to use the threaded version in
| my programs (which only ever work with two PDFs at a time)
| — so thank you "abu"!
I wonder: you stated in your initial message
Message-ID: <10sschf$3nvs2$1@dont-email.me>
In my experiments with Tcl's threading I've found the threading
startup overhead to be rather large.
Can you tell what is/was the difference to the current solution which obviously has no "startup overhead"?
R'
Shameless plug...
Bit late to the topic, but the simplest way to parallelize multiple
processes or threads and wait for completion is promises, if you do not
mind an external package. Bit of a learning curve however.
lappend promises [promise::pexec pdftotext pdf1.pdf pdf1.txt]
lappend promises [promise::pexec pdftotext pdf2.pdf pdf2.txt]
set waiter [promise::all $promises]
# Assumes eventloop not running!
promise::eventloop $waiter
Timing:
% time {demo} <- using promises
2606403 microseconds per iteration
% time {demo2} <- sequential exec's
4762417 microseconds per iteration
https://wiki.tcl-lang.org/page/promise
https://tcl-promise.magicsplat.com/
https://www.magicsplat.com/blog/tags/promises/
* Mark Summerfield <m.n.summerfield@gmail.com>
| > https://wiki.tcl-lang.org/page/promise
| > https://tcl-promise.magicsplat.com/
| > https://www.magicsplat.com/blog/tags/promises/
| I tried it but hit a problem. Here's the code I used:
| proc promised {pdf1 pdf2} {
| set p1 [promise::pexec $::PDFTOTEXT $::OPT $pdf1 - 2>@1]
| set p2 [promise::pexec $::PDFTOTEXT $::OPT $pdf2 - 2>@1]
| set waiter [promise::all [list $p1 $p2]]
| # Assumes eventloop not running!
| promise::eventloop $waiter
| set tsv1 [$p1 getdata]
| set tsv2 [$p2 getdata]
promise::eventloop already returns the result of the 'waiter' promise
(i.e. those registered in promise::all).
So change those two 'getdata' calls to
lassign [promise::eventloop $waiter] tsv1 tsv2
| puts " tsv1=[string length $tsv1] tsv2=[string length $tsv2]"
HTH
R'
Thanks, I've now done that. Here are the new timings (each is the best
of several):
sec method
2.010 serial
1.052 multiprocess
1.065 thread_pool
1.067 threaded
8.366 promised
Ashok,
since coroutines are already part of TCL, any chance of getting promises
into the core? It would seem to me as a 'natural' addition for async features in TCL, and the package looks quite mature...
R'
I'm surprised by the promises result below (not that I doubt it). I'll
have to take a look when I have some time.
In my tests that I posted earlier, the promise version took about the
same time as the multiprocess one.
The difference between my example and yours is that in my example,
pdftotext was writing to a file and not to its stdout. In your example,
it is writing back to the pipe and read directly in Tcl.
I wonder if the difference stems from your code essentially doing a busy
loop reading data while the promise version goes through the event loop,
though I cannot explain why that would make that much difference.
Worth investigating further when I have time...
/Ashok
On 5/13/2026 2:24 PM, Mark Summerfield wrote:
Thanks, I've now done that. Here are the new timings (each is the best
of several):
sec method
2.010 serial
1.052 multiprocess
1.065 thread_pool
1.067 threaded
8.366 promised
In the hope it helps, below is the full source for the example I used.
Thanks, having a benchmark source helped. However I cannot reproduce
I ran it on Tcl/Tk 9.0.3 (64-bit), Debian GNU/Linux 12 (bookworm)
Linux 6.1.0-44-amd64 (x86_64), 12th Gen Intel Core i7-12700 20 cores.
I used two PDF files both of 647 pages and did several runs of each
method to find the best time.
The most obvious difference is that I am running Linux on the hardware
and (I think) you are running Linux on Windows. All I can suggest is
trying the same test on Linux that's running directly on the hardware?
* Ashok <apnmbx-public@yahoo.com>
--<snip-snip>--
| time -p ~/tcl/9.0.3/x64/bin/tclsh9.0 bench.tcl
| threaded tsv1=3559848 tsv2=3559848 real 3.17 user 0.90 sys 0.41
| promise tsv1=3559849 tsv2=3559849 real 3.33 user 2.68 sys 0.42
* Mark Summerfield <m.n.summerfield@gmail.com>
--<snip-snip>--
| time -p ~/opt/tcl9/bin/tclsh9.0 \
|   ~/app/tcl/xmisc/concurrent.tcl
| threaded tsv1=14809914 tsv2=14816735 real 1.08 user 2.09 sys 0.09
| promised tsv1=14809915 tsv2=14816736 real 8.53 user 10.39 sys 0.26
I bet there is some significant difference between bench.tcl and
concurrent.tcl which explains the huge difference between the 'threaded'
and 'promised' versions of Ashok and Mark.
Could both of you post the current version of your scripts,
so one could retry?
R'
* saito <saitology9@gmail.com>
| On 5/21/2026 8:21 AM, Ralf Fassel wrote:
| > I bet there is some significant difference between bench.tcl and
| > concurrent.tcl which explains the huge difference between the 'threaded'
| > and 'promised' versions of Ashok and Mark.
| >
| Yes, the difference is significant. I suspect the difference is either
| in pdftotext itself or how it is spawned:
| - It is called with different arguments: "pdftotext pdf1.pdf"
| vs. "pdftotext pdf1.pdf pdf1.txt" so perhaps pdftotext behaves
| differently internally to handle output to stdout or to a file.
No, I don't mean the difference between Ashok's and Mark's results (this
is to be expected because of different hardware), but why Mark's
'promised' version is so much slower than his 'threaded' version (since
'promised' is only threaded in disguise). Both execute the same
pdftotext program, so this should be the same overhead in both cases.
I'll try Mark's version soon.
R'
* Mark Summerfield <m.n.summerfield@gmail.com>
--<snip-snip>--
| It is possible that the difference is in different versions of pdftotext.
| For example, Ashok said his didn't support the -tsv option and mine does.
Again: it is the difference between the Threaded and Promised versions
in *your* report:
1.067 threaded
8.366 promised
where you use (hopefully?) the *same* pdftotext which puzzles me.
The timings should be identical for these two.
With your code, I don't see that difference - see my other post
Message-ID: <yga5x4fohy5.fsf@akutech.de>.
R'
Everything as expected, no difference between Promised and the Threaded
or Multiprocess version, slightly higher execution time for Serial.
So it really puzzles me why you see such a huge timing difference in the 'Promised' version...
R'
On 5/18/2026 12:41 PM, Mark Summerfield wrote:
[snip]
The most obvious difference is that I am running Linux on the hardware
and (I think) you are running Linux on Windows. All I can suggest is
trying the same test on Linux that's running directly on the hardware?
Don't have such a beast :-)
On Fri, 22 May 2026 12:35:22 +0200, Ralf Fassel wrote:
[snip]
Again: it is the difference between the Threaded and Promised versions
in *your* report:
1.067 threaded
8.366 promised
where you use (hopefully?) the *same* pdftotext which puzzles me.
The timings should be identical for these two.
With your code, I don't see that difference - see my other post
Message-ID: <yga5x4fohy5.fsf@akutech.de>.
R'
Sorry, you're quite right: I use only the system pdftotext in all cases.
* et99 <et99@rocketship1.me>
--<snip-snip>--
| I was able to reproduce the timings where promises was using much more
| real time. My setup is a vmware ubuntu running on windows 10 on a 4
| core machine.
| If I ran with a smallish pdf file pair which reported about 1mb output
| size of the text, the timings were as expected. When I ran with a very
| large pdf pair, and the output was 78mb, promises would slow down
| dramatically.
| In addition, with top running in a window, when promises ran with the
| large file, it would show that tclsh was pegged at 100% with each of
| the two pdftotext only about 10%
| When running the several versions of threading, it would show the two
| pdftotext at 100% with tclsh at 5%
Ok, now we're getting somewhere. With 75MB of text output, I see the
same as et99: the s/m/t/p variants behave as expected (pdftotext at
2x 100% CPU, tclsh waiting for the result), but P (promised) has tclsh
at 100% CPU and both pdftotext processes at 10%.
Looking at the promise 'pexec' code, I see that I was wrong in assuming
it is "threads-in-disguise". Instead it sets up two pipes and reads
them until EOF, causing many reallocations of the huge strings. Somehow
this is not as effective as 'exec' itself, which also needs to read from
the pipes, but obviously does it more efficiently.
Perhaps using promise::ptask or promise::pworker would give better
results.
R'
Thanks for the analysis, all. I just haven't had time to follow up on
this regularly.
As to the explanation,
- the multi-process read is more efficient because it blocks for 1
second so reads larger chunks of data with fewer appends. pexec could
be modified similarly. The drawback of this approach, besides blocking
for 1 second, is the potential of the forced 1 second latencies (execs
that actually finished in 1.01 secs would take 2 seconds, 2.01 -> 3
secs etc.).
- multi-thread execs read in one big chunk. ptask or pthread would give
similar results, I think, as Ralf suggested, though I have not actually
tried it.
I'll take a further look at some later point, when I have some time, at
optimizing pexec, but it is not clear how since I'm not in favor of
blocking or polling as in the multi-process case. Perhaps in 9.0, the
new process commands will allow waiting till the process completes
instead of reading via events.
/Ashok
[snip]
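
The latency arithmetic above (1.01 secs becomes 2 secs, 2.01 becomes 3) can be written down as a tiny sketch. observed_finish is a hypothetical helper, and it assumes a poller that only looks at the child once per fixed interval, which is a simplification of whatever the multi-process code actually does.

```tcl
# Hypothetical helper: the moment at which a poller that wakes up every
# $interval seconds first notices a job that really finished after
# $elapsed seconds. The result is $elapsed rounded up to a whole
# number of intervals.
proc observed_finish {elapsed interval} {
    expr {ceil(double($elapsed) / $interval) * $interval}
}

foreach t {1.01 2.01} {
    puts "$t -> [observed_finish $t 1.0]"
}
```

A shorter polling interval shrinks the worst-case wait but means more wakeups, which is the trade-off being weighed against event-driven reads.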
* Ralf Fassel <ralfixx@gmx.de>
| However, Tcl_FileEventObjCmd() in generic/tclIO.c contains this section
| of code:
| /*
| * If we are supposed to delete a stored script, do so.
| */
| if (*(TclGetString(objv[3])) == '\0') {
| DeleteScriptRecord(interp, chanPtr, mask);
| return TCL_OK;
| }
| I.e. the string rep of the fileevent script is checked to be empty, and
| this probably is what causes the slowdown (shimmering from list to
| string after each read).
See https://core.tcl-lang.org/tcl/tktview?name=7da6c2d04c219e8e
R'
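
To illustrate the shimmering described above, here is a small sketch. It assumes the handler script is a list that carries a large value; whether that matches what pexec really does is an assumption here, not something verified against its source. has_string_rep is a hypothetical helper built on tcl::unsupported::representation, which is a debugging aid and not a stable API.

```tcl
# Hypothetical helper: report whether a value currently carries a
# string representation (debugging aid only, not for production code).
proc has_string_rep {value} {
    set info [tcl::unsupported::representation $value]
    expr {![string match "*no string representation*" $info]}
}

set data   [string repeat x 1000000]
set script [list handler $data]      ;# pure list, built without a string

puts "after list:          [has_string_rep $script]"

# A list-based emptiness test does not need the string form.
set empty [expr {[llength $script] == 0}]
puts "after llength:       [has_string_rep $script]"

# Anything that looks at the characters forces the whole list, including
# the 1 MB element, to be rendered as a string.
set n [string length $script]
puts "after string length: [has_string_rep $script]"
```

If a fresh list of this kind is handed to fileevent after every read, each registration would pay for rendering the whole accumulated value again, which would fit the CPU pattern reported earlier in the thread.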
On 26.05.2026 at 19:59, Ralf Fassel wrote:
[snip]
Wow, Ralf, Wizard level analysis! We are all impressed!
Kudos,
Harald
The thread led to this Tcl ticket: https://core.tcl-lang.org/tcl/info/7da6c2d04c
with a set of solutions.
Thanks for all,
Harald