Forum: War Ensemble BBS

Isn't that beauty ?

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Thu Mar 12 07:24:59 2026

From Newsgroup: comp.lang.c

There was a programming-"contest" on comp.lang.c and I wanted to show
the simpler code in C, here it is:

#include <iostream>
#include <sstream>
#include <iomanip>
#include <optional>
#include <algorithm>

using namespace std;

static optional<size_t> parse( const char *str );

int main( int argc, char **argv )
{
if( argc < 3 )
return EXIT_FAILURE;
optional<size_t>
pRows = parse( argv[1] ),
pCols = parse( argv[2] );
if( !pRows || !pCols )
return EXIT_FAILURE;
size_t rows = *pRows, cols = *pCols;
optional<size_t> pClip( rows * cols );
if( argc >= 4 && !(pClip = parse( argv[3] )) )
return EXIT_FAILURE;
size_t clip = min( *pClip, rows * cols );
streamsize width = (ostringstream() << clip).str().length();
for( size_t row = 1; row <= min( rows, clip ); ++row )
{
bool head = true;
for( size_t value = row; value <= clip; value += rows, head = false )
cout << " "sv.substr( head, !head ) << right << setw( width ) << value;
cout << endl;
}
}

static optional<size_t> parse( const char *str )
{
istringstream iss( str );
size_t ret;
iss >> ret;
if( !iss || !iss.eof() )
return nullopt;
return ret;
}

C++ really rocks since you've to deal with much less details than in C.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Thu Mar 12 07:26:56 2026

From Newsgroup: comp.lang.c

Am 12.03.2026 um 07:24 schrieb Bonita Montero:

There was a programming-"contest" on comp.lang.c and I wanted to show
the simpler code in C, here it is:

^^^^^^^^^^^^^^^^^^^^^ C++

#include <iostream>
#include <sstream>
#include <iomanip>
#include <optional>
#include <algorithm>

using namespace std;

static optional<size_t> parse( const char *str );

int main( int argc, char **argv )
{
    if( argc < 3 )
        return EXIT_FAILURE;
    optional<size_t>
        pRows = parse( argv[1] ),
        pCols = parse( argv[2] );
    if( !pRows || !pCols )
        return EXIT_FAILURE;
    size_t rows = *pRows, cols = *pCols;
    optional<size_t> pClip( rows * cols );
    if( argc >= 4 && !(pClip = parse( argv[3] )) )
        return EXIT_FAILURE;
    size_t clip = min( *pClip, rows * cols );
    streamsize width = (ostringstream() << clip).str().length();
    for( size_t row = 1; row <= min( rows, clip ); ++row )
    {
        bool head = true;
        for( size_t value = row; value <= clip; value += rows, head =
false )
            cout << " "sv.substr( head, !head ) << right << setw( width ) << value;
        cout << endl;
    }
}

static optional<size_t> parse( const char *str )
{
    istringstream iss( str );
    size_t ret;
    iss >> ret;
    if( !iss || !iss.eof() )
        return nullopt;
    return ret;
}

C++ really rocks since you've to deal with much less details than in C.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Thu Mar 12 09:32:38 2026

From Newsgroup: comp.lang.c

On 3/12/26 07:24, Bonita Montero wrote:

There was a programming-"contest" on comp.lang.c and I wanted to show
the simpler code in C[++], here it is:
[...]

C++ really rocks since you've to deal with much less details than in C.

Concerning your subject; it wouldn't appear to me to use the term
"beauty" in context of C++. C++ inherited so much ugly syntax and
concepts (from "C") and it added yet more syntactic infelicities.
(And I'm saying that as someone who liked to program in C++, also professionally, for many many years.)

You're right in the "details" aspect. C++ adds so much per default
that you just don't have in "C"; that's why I consider it pointless
to compare it in the first place, let alone to argument in a "C"
newsgroup about it. Don't expect missionaries to be welcome here.

C++ has also other benefits beyond the libraries; first of all the
OO concepts and whatnot. But all uninteresting for this group; "C"
won't ever come close to C++. (And "beauty" should be searched in
other places.)

Janis

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c,comp.lang.c++ on Thu Mar 12 11:36:35 2026

From Newsgroup: comp.lang.c

Am 12.03.2026 um 09:32 schrieb Janis Papanagnou:

Concerning your subject; it wouldn't appear to me to use the term
"beauty" in context of C++. C++ inherited so much ugly syntax and
concepts (from "C") and it added yet more syntactic infelicities.
(And I'm saying that as someone who liked to program in C++, also professionally, for many many years.)

My perception of beauty is just as subjective as your perception of the
ugly aspects of C++. I like the span of lowleven means in C++ that are inherited from C to higher level abstractions C++ has inherited from
other more moderrn language.
And if you don't the subjektive part of beauty or ugly parts of this
language C++ is several times more effective than C, i.e. you need
only a fraction of the code size while mostly maintaining the same
performance.
--- Synchronet 3.21d-Linux NewsLink 1.2

From DFS@nospam@dfs.com to comp.lang.c on Thu Mar 12 10:00:30 2026

From Newsgroup: comp.lang.c

On 3/12/2026 2:24 AM, Bonita Montero wrote:

There was a programming-"contest" on comp.lang.c and I wanted to show
the simpler code in C, here it is:

#include <iostream>
#include <sstream>
#include <iomanip>
#include <optional>
#include <algorithm>

using namespace std;

static optional<size_t> parse( const char *str );

int main( int argc, char **argv )
{
    if( argc < 3 )
        return EXIT_FAILURE;
    optional<size_t>
        pRows = parse( argv[1] ),
        pCols = parse( argv[2] );
    if( !pRows || !pCols )
        return EXIT_FAILURE;
    size_t rows = *pRows, cols = *pCols;
    optional<size_t> pClip( rows * cols );
    if( argc >= 4 && !(pClip = parse( argv[3] )) )
        return EXIT_FAILURE;
    size_t clip = min( *pClip, rows * cols );
    streamsize width = (ostringstream() << clip).str().length();
    for( size_t row = 1; row <= min( rows, clip ); ++row )
    {
        bool head = true;
        for( size_t value = row; value <= clip; value += rows, head =
false )
            cout << " "sv.substr( head, !head ) << right << setw( width
) << value;
        cout << endl;
    }
}

static optional<size_t> parse( const char *str )
{
    istringstream iss( str );
    size_t ret;
    iss >> ret;
    if( !iss || !iss.eof() )
        return nullopt;
    return ret;
}

C++ really rocks since you've to deal with much less details than in C.

Is your strategy to just ignore reality, and keep making bogus claims
that - for this challenge at least - you can't support?

#include <stdlib.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
if (argc < 3 || argc > 4) {
printf("Enter 2 or 3 arguments:\n$./prog rows columns [stop]\n");
return 0;
}

int rows = atoi(argv[1]);
int cols = atoi(argv[2]);
int max = (argc == 4) ? (atoi(argv[3])) : (rows * cols);
char cw[12];
int colwidth = sprintf(cw,"%d ",rows * cols);

for (int r = 1; r <= rows; r++) {
if (r <= max) {
int nbr = r;
printf("%*d", colwidth, nbr);
for (int i = 0; i < (cols - 1); i++) {
((nbr += rows) <= max) ? (printf("%*d", colwidth, nbr)) : (0) ;
}
printf("\n");
}
}

return 0;

}
--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c,comp.lang.c++ on Thu Mar 12 15:03:22 2026

From Newsgroup: comp.lang.c

Am 12.03.2026 um 15:00 schrieb DFS:

On 3/12/2026 2:24 AM, Bonita Montero wrote:

There was a programming-"contest" on comp.lang.c and I wanted to show
the simpler code in C, here it is:

#include <iostream>
#include <sstream>
#include <iomanip>
#include <optional>
#include <algorithm>

using namespace std;

static optional<size_t> parse( const char *str );

int main( int argc, char **argv )
{
     if( argc < 3 )
         return EXIT_FAILURE;
     optional<size_t>
         pRows = parse( argv[1] ),
         pCols = parse( argv[2] );
     if( !pRows || !pCols )
         return EXIT_FAILURE;
     size_t rows = *pRows, cols = *pCols;
     optional<size_t> pClip( rows * cols );
     if( argc >= 4 && !(pClip = parse( argv[3] )) )
         return EXIT_FAILURE;
     size_t clip = min( *pClip, rows * cols );
     streamsize width = (ostringstream() << clip).str().length();
     for( size_t row = 1; row <= min( rows, clip ); ++row )
     {
         bool head = true;
         for( size_t value = row; value <= clip; value += rows, head =
false )
             cout << " "sv.substr( head, !head ) << right <<
setw( width ) << value;
         cout << endl;
     }
}

static optional<size_t> parse( const char *str )
{
     istringstream iss( str );
     size_t ret;
     iss >> ret;
     if( !iss || !iss.eof() )
         return nullopt;
     return ret;
}

C++ really rocks since you've to deal with much less details than in C.

Is your strategy to just ignore reality, and keep making bogus claims
that - for this challenge at least - you can't support?

Where's your check for parsing errors ? Your code would be longer with
that.
But real C++ code is usually multiple times shorter, mostly because of
generic programming. C really sucks since you have to flip every bit
on your own.

#include <stdlib.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    if (argc < 3 || argc > 4) {
        printf("Enter 2 or 3 arguments:\n$./prog rows columns [stop]\n");
        return 0;
    }

    int rows = atoi(argv[1]);
    int cols = atoi(argv[2]);
    int max = (argc == 4) ? (atoi(argv[3])) : (rows * cols);
    char cw[12];
    int colwidth = sprintf(cw,"%d ",rows * cols);

    for (int r = 1; r <= rows; r++) {
        if (r <= max) {
            int nbr = r;
            printf("%*d", colwidth, nbr);
            for (int i = 0; i < (cols - 1); i++) {
                ((nbr += rows) <= max) ? (printf("%*d", colwidth,
nbr)) : (0) ;
            }
            printf("\n");
        }
    }

    return 0;

}

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Thu Mar 12 15:13:27 2026

From Newsgroup: comp.lang.c

Here, do that in C:

static optional<size_t> parse( const char *str )
{
size_t ret;
if( from_chars_result fcr = from_chars( str, str + strlen( str ), ret ); (bool)fcr.ec || *fcr.ptr )
return nullopt;
return ret;
}

--- Synchronet 3.21d-Linux NewsLink 1.2

From tTh@tth@none.invalid to comp.lang.c,comp.lang.c++ on Thu Mar 12 15:27:50 2026

From Newsgroup: comp.lang.c

On 3/12/26 15:03, Bonita Montero wrote:

But real C++ code is usually multiple times shorter, mostly because of generic programming. C really sucks since you have to flip every bit
on your own.

C++ really sucks because you can't know who flip the bits.
--
** **
* tTh des Bourtoulots *
* http://maison.tth.netlib.re/ *
** **
--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c,comp.lang.c++ on Thu Mar 12 15:34:23 2026

From Newsgroup: comp.lang.c

Am 12.03.2026 um 15:27 schrieb tTh:

On 3/12/26 15:03, Bonita Montero wrote:

But real C++ code is usually multiple times shorter, mostly because of
generic programming. C really sucks since you have to flip every bit
on your own.

C++ really sucks because you can't know who flip the bits.

Absolutely not, C++ can anyhing you want if you need that. But in C++
you often doesn't need that. Bad programmers which don't understand
that are rare since C++ has very high skill demands.

--- Synchronet 3.21d-Linux NewsLink 1.2

From DFS@nospam@dfs.com to comp.lang.c on Thu Mar 12 10:43:18 2026

From Newsgroup: comp.lang.c

On 3/12/2026 10:13 AM, Bonita Montero wrote:

Here, do that in C:

static optional<size_t> parse( const char *str )
{
    size_t ret;
    if( from_chars_result fcr = from_chars( str, str + strlen( str ), ret ); (bool)fcr.ec || *fcr.ptr )
        return nullopt;
    return ret;
}

Explain in detail what it does.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c,comp.lang.c++ on Thu Mar 12 16:10:41 2026

From Newsgroup: comp.lang.c

Am 12.03.2026 um 15:43 schrieb DFS:

On 3/12/2026 10:13 AM, Bonita Montero wrote:

Here, do that in C:

static optional<size_t> parse( const char *str )
{
     size_t ret;
     if( from_chars_result fcr = from_chars( str, str + strlen( str ), >> ret ); (bool)fcr.ec || *fcr.ptr )
         return nullopt;
     return ret;
}

Explain in detail what it does.

If that isn't self-explanatory stick with C.

I've got a task for you: Do the same in C:

#include <iostream>
#include <regex>
#include <fstream>
#include <vector>
#include <algorithm>
#include <vector>

using namespace std;

int main( int argc, char **argv )
{
if( argc < 2 )
return EXIT_FAILURE;
ifstream ifs( argv[1] );
static regex rxNameTel( "^\\s*\"([^\"]*)\"\\s*\"([^\"]*)\"\\s*$" );
struct name_tel { string name, tel; };
vector<name_tel> phoneList;
while( !ifs.eof() )
{
string line;
getline( ifs, line );
match_results<string::const_iterator> sm;
if( regex_match( line, sm, rxNameTel ) )
phoneList.emplace_back( string( sm[1].first, sm[1].second ), string(
sm[2].first, sm[2].second ) );
}
sort( phoneList.begin(), phoneList.end(),
[]( const name_tel &left, const name_tel &right ) { return left.name <
right.name; } );
for( name_tel &phone : phoneList )
cout << "\"" << phone.name << "\"\t\"" << phone.tel << "\"" << endl;
}

1. Read a file and parse it with ne mentioned regex-pattern.
2. Split both parts of every line in two strings.
3. Sort the "vector" according to the first string.
4. Print it.

I guess you don't manage to do that with less than five times the work.
Every external lib allowed.

--- Synchronet 3.21d-Linux NewsLink 1.2

From DFS@nospam@dfs.com to comp.lang.c on Thu Mar 12 11:22:24 2026

From Newsgroup: comp.lang.c

On 3/12/2026 11:10 AM, Bonita Montero wrote:

Am 12.03.2026 um 15:43 schrieb DFS:

On 3/12/2026 10:13 AM, Bonita Montero wrote:

Here, do that in C:

static optional<size_t> parse( const char *str )
{
     size_t ret;
     if( from_chars_result fcr = from_chars( str, str + strlen( str >>> ), ret ); (bool)fcr.ec || *fcr.ptr )
         return nullopt;
     return ret;
}

Explain in detail what it does.

If that isn't self-explanatory stick with C.

I've got a task for you: Do the same in C:

#include <iostream>
#include <regex>
#include <fstream>
#include <vector>
#include <algorithm>
#include <vector>

using namespace std;

int main( int argc, char **argv )
{
    if( argc < 2 )
        return EXIT_FAILURE;
    ifstream ifs( argv[1] );
    static regex rxNameTel( "^\\s*\"([^\"]*)\"\\s*\"([^\"]*)\"\\s*$" );
    struct name_tel { string name, tel; };
    vector<name_tel> phoneList;
    while( !ifs.eof() )
    {
        string line;
        getline( ifs, line );
        match_results<string::const_iterator> sm;
        if( regex_match( line, sm, rxNameTel ) )
            phoneList.emplace_back( string( sm[1].first, sm[1].second
), string( sm[2].first, sm[2].second ) );
    }
    sort( phoneList.begin(), phoneList.end(),
        []( const name_tel &left, const name_tel &right ) { return left.name < right.name; } );
    for( name_tel &phone : phoneList )
        cout << "\"" << phone.name << "\"\t\"" << phone.tel << "\"" <<
endl;
}

1. Read a file and parse it with ne mentioned regex-pattern.
2. Split both parts of every line in two strings.
3. Sort the "vector" according to the first string.
4. Print it.

I guess you don't manage to do that with less than five times the work.
Every external lib allowed.

Give me the file you used.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Thu Mar 12 16:25:09 2026

From Newsgroup: comp.lang.c

Am 12.03.2026 um 16:22 schrieb DFS:

Give me the file you used.

The input file looks like this:

"White House" "001 202 456 1414"
"Mother" "0049 211 151395"
"Mickey Mouse" "001 123 456 7890"

The output shout look sorted:

"White House" "001 202 456 1414"
"Mickey Mouse" "001 123 456 7890"
"Mother" "0049 211 151395"

Input with additional spaces, output without.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Thu Mar 12 16:25:39 2026

From Newsgroup: comp.lang.c

Am 12.03.2026 um 16:25 schrieb Bonita Montero:

Am 12.03.2026 um 16:22 schrieb DFS:

Give me the file you used.

The input file looks like this:

"White House" "001 202 456 1414"
"Mother" "0049 211 151395"
"Mickey Mouse" "001 123 456 7890"

The output shout look sorted:

"White House" "001 202 456 1414"
"Mickey Mouse" "001 123 456 7890"
"Mother" "0049 211 151395"

Sorry,

"Mickey Mouse" "001 123 456 7890"
"Mother" "0049 211 151395"
"White House" "001 202 456 1414"

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c,comp.lang.c++ on Thu Mar 12 16:44:01 2026

From Newsgroup: comp.lang.c

Am 12.03.2026 um 16:10 schrieb Bonita Montero:

static regex rxNameTel( "^\\s*\"([^\"]*)\"\\s*\"([^\"]*)\"\\s*$" );

Easier to read as a C++ raw string: R"~(^\s*"([^"]*)"\s*"([^"]*)"\s*$)~"
The regex is between ~( and )~.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Thu Mar 12 16:48:39 2026

From Newsgroup: comp.lang.c

Am 12.03.2026 um 16:22 schrieb DFS:

Give me the file you used.

Better take this:

"Max Mustermann" "0123-4567890"
"Anna Müller" "0987-6543210"
"Peter Schmidt" "030-1234567"
"Laura Fischer" "040-9876543"
"Tim Becker" "0151-1112223"
"Julia Neumann" "0221-3344556"
"Michael Braun" "0170-9988776"
"Sophie Wagner" "089-2233445"
"Felix Hoffmann" "0711-5566778"
"Lea Richter" "0341-1122334"
"Jonas Klein" "030-4455667"
"Emma Wolf" "040-5566778"
"Lukas König" "069-7788990"
"Clara Hofmann" "0157-2233445"
"Paul Schäfer" "0228-3344556"
"Mia Keller" "089-6677889"
"Leon Zimmermann" "0711-1122445"
"Nina Krause" "0341-4455667"
"David Schulz" "030-9988776"
"Sarah Lehmann" "040-2233445"
"Ben Richter" "069-3344556"
"Hannah Wagner" "0221-5566778"
"Tom Bauer" "0171-1122334"
"Lena Fischer" "0151-6677889"
"Simon Meier" "030-7788990"
"Marie Becker" "040-4455667"
"Jan Hoffmann" "0711-3344556"
"Leonie Klein" "0341-5566778"
"Philipp König" "089-9988776"
"Laura Schulze" "0228-1122334"
"Moritz Wolf" "0157-4455667"
"Jana Zimmer" "030-6677889"
"Felix Neumann" "0170-2233445"
"Sarah Braun" "040-7788990"
"Tim Schäfer" "0711-5566778"
"Anna Fischer" "0341-9988776"
"Maximilian Keller" "030-1122334"
"Lea Wagner" "0151-3344556"
"Lukas Hofmann" "089-6677889"
"Marie Richter" "0221-7788990"
"Jonas Klein" "0171-4455667"
"Clara Zimmermann" "040-5566778"
"Paul Wolf" "030-2233445"
"Sophie Neumann" "0711-3344556"
"Ben Meier" "0341-5566778"
"Emma Bauer" "0157-9988776"
"Leon Krause" "089-1122334"
"Julia Schulz" "0228-4455667"
"Tim Richter" "030-6677889"
"Anna Becker" "0170-2233445"

The output should be this:

"Anna Becker" "0170-2233445"
"Anna Fischer" "0341-9988776"
"Anna M├╝ller" "0987-6543210"
"Ben Meier" "0341-5566778"
"Ben Richter" "069-3344556"
"Clara Hofmann" "0157-2233445"
"Clara Zimmermann" "040-5566778"
"David Schulz" "030-9988776"
"Emma Bauer" "0157-9988776"
"Emma Wolf" "040-5566778"
"Felix Hoffmann" "0711-5566778"
"Felix Neumann" "0170-2233445"
"Hannah Wagner" "0221-5566778"
"Jan Hoffmann" "0711-3344556"
"Jana Zimmer" "030-6677889"
"Jonas Klein" "0171-4455667"
"Jonas Klein" "030-4455667"
"Julia Neumann" "0221-3344556"
"Julia Schulz" "0228-4455667"
"Laura Fischer" "040-9876543"
"Laura Schulze" "0228-1122334"
"Lea Richter" "0341-1122334"
"Lea Wagner" "0151-3344556"
"Lena Fischer" "0151-6677889"
"Leon Krause" "089-1122334"
"Leon Zimmermann" "0711-1122445"
"Leonie Klein" "0341-5566778"
"Lukas Hofmann" "089-6677889"
"Lukas K├Ânig" "069-7788990"
"Marie Becker" "040-4455667"
"Marie Richter" "0221-7788990"
"Max Mustermann" "0123-4567890"
"Maximilian Keller" "030-1122334"
"Mia Keller" "089-6677889"
"Michael Braun" "0170-9988776"
"Moritz Wolf" "0157-4455667"
"Nina Krause" "0341-4455667"
"Paul Sch├ñfer" "0228-3344556"
"Paul Wolf" "030-2233445"
"Peter Schmidt" "030-1234567"
"Philipp K├Ânig" "089-9988776"
"Sarah Braun" "040-7788990"
"Sarah Lehmann" "040-2233445"
"Simon Meier" "030-7788990"
"Sophie Neumann" "0711-3344556"
"Sophie Wagner" "089-2233445"
"Tim Becker" "0151-1112223"
"Tim Richter" "030-6677889"
"Tim Sch├ñfer" "0711-5566778"
"Tom Bauer" "0171-1122334"
--- Synchronet 3.21d-Linux NewsLink 1.2

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Thu Mar 12 16:56:55 2026

From Newsgroup: comp.lang.c

Bonita Montero <Bonita.Montero@gmail.com> writes:

Here, do that in C:

static optional<size_t> parse( const char *str )
{
size_t ret;
if( from_chars_result fcr = from_chars( str, str + strlen( str ), ret
); (bool)fcr.ec || *fcr.ptr )
return nullopt;
return ret;
}

bool parse(const char *str, size_t *rval)
{
char *cp;
*rval = strtoul(str, &cp, 0);
if ((cp == str) || (*cp != '\0')) {
return false;
}
return true;
}

--- Synchronet 3.21d-Linux NewsLink 1.2

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c,comp.lang.c++ on Thu Mar 12 17:32:08 2026

From Newsgroup: comp.lang.c

Bonita Montero <Bonita.Montero@gmail.com> writes:

Am 12.03.2026 um 15:43 schrieb DFS:

On 3/12/2026 10:13 AM, Bonita Montero wrote:

Here, do that in C:

static optional<size_t> parse( const char *str )
{
     size_t ret;
     if( from_chars_result fcr = from_chars( str, str + strlen( str ),
ret ); (bool)fcr.ec || *fcr.ptr )
         return nullopt;
     return ret;
}

Explain in detail what it does.

If that isn't self-explanatory stick with C.

I've got a task for you: Do the same in C:

#include <iostream>
#include <regex>
#include <fstream>
#include <vector>
#include <algorithm>
#include <vector>

using namespace std;

int main( int argc, char **argv )
{
if( argc < 2 )
return EXIT_FAILURE;

A silent failure implies to the interactive user
that the command succeeded. You should print
a responsive message before returning a failure code.

cerr << "Usage: " << argv[0] << " " <phonelist>" << eol;

(although I would prefer fprintf(stderr, ...)).

ifstream ifs( argv[1] );

You didn't bother to check if the file was opened.
You should print a responsive message and exit if the streams
failbit is set.

static regex rxNameTel( "^\\s*\"([^\"]*)\"\\s*\"([^\"]*)\"\\s*$" );
struct name_tel { string name, tel; };
vector<name_tel> phoneList;
while( !ifs.eof() )
{
string line;
getline( ifs, line );
match_results<string::const_iterator> sm;
if( regex_match( line, sm, rxNameTel ) )
phoneList.emplace_back( string( sm[1].first, sm[1].second ), string(
sm[2].first, sm[2].second ) );
}
sort( phoneList.begin(), phoneList.end(),
[]( const name_tel &left, const name_tel &right ) { return left.name <
right.name; } );
for( name_tel &phone : phoneList )
cout << "\"" << phone.name << "\"\t\"" << phone.tel << "\"" << endl;
}

1. Read a file and parse it with ne mentioned regex-pattern.
2. Split both parts of every line in two strings.
3. Sort the "vector" according to the first string.
4. Print it.

This is certainly a task that can be done simply using a shell
script and the standard POSIX utility set. awk(1) would be
a good starting point; it's possible that it can be done with
a single invocation of the posix 'sort' utility.

Of course, the lack of any in-line documentation (e.g. comments) is
a typical defect in your C++ code.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou@hotmail.com to comp.lang.c,comp.lang.c++ on Thu Mar 12 20:15:29 2026

From Newsgroup: comp.lang.c

On 12.03.26 11:36, Bonita Montero wrote:

Am 12.03.2026 um 09:32 schrieb Janis Papanagnou:

Concerning your subject; it wouldn't appear to me to use the term
"beauty" in context of C++. C++ inherited so much ugly syntax and
concepts (from "C") and it added yet more syntactic infelicities.
(And I'm saying that as someone who liked to program in C++, also
professionally, for many many years.)

My perception of beauty is just as subjective as your perception of the
ugly aspects of C++. I like the span of lowleven means in C++ that are inherited from C to higher level abstractions C++ has inherited from
other more moderrn language.

Or even from older languages, like classes and inheritance from Simula, type-safety (string type checking) as provided by many older langauges, functional principles (cf. STL) from functional languages, and whatnot.

And if you don't the subjektive part of beauty or ugly parts of this
language C++ is several times more effective than C, i.e. you need
only a fraction of the code size while mostly maintaining the same performance.

Yes, I have agreed on that part, what you functionally get with C++.

Concerning the terms beauty and ugliness being subjective is obvious.

I was particularly speaking about syntax. - If you happen to know a
sufficient set of (modern or historic) languages - and I have no
reason to believe you haven't - you can observe that there are many
languages that are _a lot_ clearer designed than C++, and much safer
to use. Languages where the time span from design to a running system
is much shorter than in a language where it's easy to make subtle or
blatant errors.

You mentioned an IMO noteworthy specific aspect of C++; you seem to
like the low-to-high-level option. - Myself I also like both levels
to program in, depending on the application area. - But I consider
it a problem if the syntax and semantics of the inherited low-level
language taints the abstract concepts of a language. Personally I'd
prefer here a readable and powerful high level language with an
*interface* to connect arbitrary code from low-level languages. IMO
there's no need to do all levels in one language. And there are also
existing paragons that support that language design principle.

Janis

--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou@hotmail.com to comp.lang.c on Thu Mar 12 20:25:49 2026

From Newsgroup: comp.lang.c

On 12.03.26 16:48, Bonita Montero wrote:

Am 12.03.2026 um 16:22 schrieb DFS:

Give me the file you used.

Better take this:
[snip list of names of people and telephone numbers]

The output should be this:
[snip list of names of people and telephone numbers]

I see there are real existing people in that list.

Do you have the consent of these persons to disseminate
their names and telephone numbers?

I'm just curious.

Janis

--- Synchronet 3.21d-Linux NewsLink 1.2

From DFS@nospam@dfs.com to comp.lang.c on Thu Mar 12 16:18:52 2026

From Newsgroup: comp.lang.c

On 3/12/2026 11:10 AM, Bonita Montero wrote:

I've got a task for you: Do the same in C:

#include <iostream>
#include <regex>
#include <fstream>
#include <vector>
#include <algorithm>
#include <vector>

using namespace std;

int main( int argc, char **argv )
{
    if( argc < 2 )
        return EXIT_FAILURE;
    ifstream ifs( argv[1] );
    static regex rxNameTel( "^\\s*\"([^\"]*)\"\\s*\"([^\"]*)\"\\s*$" );
    struct name_tel { string name, tel; };
    vector<name_tel> phoneList;
    while( !ifs.eof() )
    {
        string line;
        getline( ifs, line );
        match_results<string::const_iterator> sm;
        if( regex_match( line, sm, rxNameTel ) )
            phoneList.emplace_back( string( sm[1].first, sm[1].second
), string( sm[2].first, sm[2].second ) );
    }
    sort( phoneList.begin(), phoneList.end(),
        []( const name_tel &left, const name_tel &right ) { return left.name < right.name; } );
    for( name_tel &phone : phoneList )
        cout << "\"" << phone.name << "\"\t\"" << phone.tel << "\"" <<
endl;
}

1. Read a file and parse it with ne mentioned regex-pattern.
2. Split both parts of every line in two strings.
3. Sort the "vector" according to the first string.
4. Print it.

I guess you don't manage to do that with less than five times the work.
Every external lib allowed.

I'll try that in C if you commit to trying the following in C++. Deal?

--------------------------------------------------------------------------
* read in a list of words from a file here: https://people.sc.fsu.edu/~jburkardt/datasets/words/words.html

* pick N random words from that list and put them in an array (OK if
there are a few dupes - if you can get no dupes that's better)

* sort and print the array of randoms, adding a blank line each time the
1st letter changes

(note: I just now wrote this. It's not old code.)

usage is
$./randwords filename N
$./randwords special_english.txt 200

-------------------------------------------------------------------------- //public domain code to read a list of words in and print N
//random words in sorted order

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int compare(const void *a, const void *b) {
const char* ca = *(const char**)a;
const char* cb = *(const char**)b;
return strcmp(ca,cb);
}

int main(int argc, char *argv[]) {

if (argc != 3) {
printf("Enter file name and a number:\n$./prog file.txt 200\n");
return 0;
}

int i = 0;
int lines = 0;
char line[35] = "";
int N = atoi(argv[2]);
int randnbr, randnbr2;

FILE *fin = fopen(argv[1],"r");
while (fgets(line,sizeof line, fin) != NULL) {lines++;}
if (N > lines) {
printf("Enter a value < %d\n", lines);
exit(0);
}

rewind(fin);
char **wordlist = malloc(sizeof(char*) * lines);
while (fgets(line,sizeof line, fin) != NULL) {
int datalen = strlen(line);
wordlist[i] = malloc(datalen + 1);
strncpy(wordlist[i], line, datalen);
wordlist[i][datalen] = '\0';
i++;
}

char **wordrand = malloc(sizeof(char*) * N);
srand(time(NULL) + getpid());
for (i = 0; i < N; i++) {
int randnbr = (rand() % lines) + 1;
int datalen = strlen(wordlist[randnbr]);
wordrand[i] = malloc(datalen + 1);
strncpy(wordrand[i], wordlist[randnbr], datalen);
wordrand[i][datalen] = '\0';
}

qsort(wordrand, N, sizeof(char*), compare);

printf("%d words read in\n", lines);
printf("%d random words extracted\n",N);

int currletter = wordrand[0][0];
for (i = 0; i < N; i++) {
if ((wordrand[i][0] != currletter) && (i > 0)) {
printf("\n");
}
currletter = wordrand[i][0];
printf("%d. %s",i+1, wordrand[i]);
}

printf("\n");

free(wordrand);
free(wordlist);
return 0;
}
--------------------------------------------------------------------------

output is something like:
$ ./randwords special_english.txt 200
1477 words read in
200 random words extracted
1. above
2. accept
3. after
4. against
5. agency
6. ammunition
7. anger
8. anniversary
9. army
10. arrive
11. art
12. artillery
13. automobile
14. autumn

15. bed
16. below
17. bleed
18. blow
19. blue
20. boat
21. boil
22. bread
23. bridge
24. brown
25. business

26. cancer
27. claim
28. clean
29. cloud
30. cloud
31. cloud
32. combine
33. compare
34. conflict
35. consider
36. contain
37. correct
38. credit
39. criticize
40. cross
41. crowd
42. customs

43. decide
44. demand
...
--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c,comp.lang.c++ on Fri Mar 13 00:56:21 2026

From Newsgroup: comp.lang.c

Am 12.03.2026 um 18:32 schrieb Scott Lurndal:

You didn't bother to check if the file was opened.

That's not necessary to compare it against a equal solution in C.

This is certainly a task that can be done simply using a shell
script and the standard POSIX utility set. awk(1) would be
a good starting point; it's possible that it can be done with
a single invocation of the posix 'sort' utility.

It's comparison of C against C++.

Of course, the lack of any in-line documentation (e.g. comments) is
a typical defect in your C++ code.

Complete idiot.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Fri Mar 13 00:57:11 2026

From Newsgroup: comp.lang.c

Am 12.03.2026 um 20:25 schrieb Janis Papanagnou:

On 12.03.26 16:48, Bonita Montero wrote:

Am 12.03.2026 um 16:22 schrieb DFS:

Give me the file you used.

Better take this:
[snip list of names of people and telephone numbers]
The output should be this:
[snip list of names of people and telephone numbers]

I see there are real existing people in that list.

Do you have the consent of these persons to disseminate
their names and telephone numbers?

These are no real people.
--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c,de.comp.lang.c++ on Fri Mar 13 01:06:02 2026

From Newsgroup: comp.lang.c

Failed ...
You're parsing a completely different type of file.
And with 2.5 times the source length.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c,comp.lang.c++ on Fri Mar 13 01:14:00 2026

From Newsgroup: comp.lang.c

Am 12.03.2026 um 18:32 schrieb Scott Lurndal:

You didn't bother to check if the file was opened.

If the file couldn't been opened nothing happens.
No difference to a check.
--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c,comp.lang.c++ on Fri Mar 13 01:27:52 2026

From Newsgroup: comp.lang.c

I shortened my code a bit.
Do that:

#include <iostream>
#include <regex>
#include <fstream>
#include <vector>
#include <algorithm>

using namespace std;

int main( int argc, char **argv )
{
if( argc < 2 )
return EXIT_FAILURE;
ifstream ifs( argv[1] );
static regex rxNameTel( R"~(^\s*"([^"]*)"\s*"([^"]*)"\s*$)~" );
struct name_tel { string name, tel; };
vector<name_tel> phoneList;
smatch sm;
for( string line; getline( ifs, line ); )
if( regex_match( line, sm, rxNameTel ) )
phoneList.emplace_back( sm[1].str(), sm[2].str() );
sort( phoneList.begin(), phoneList.end(),
[]( name_tel &left, name_tel &right ) { return left.name < right.name;
} );
for( name_tel &phone : phoneList )
cout << "\"" << phone.name << "\"\t\"" << phone.tel << "\"" << endl;
}

Have a closer look at the regex.
--- Synchronet 3.21d-Linux NewsLink 1.2

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Fri Mar 13 00:54:08 2026

From Newsgroup: comp.lang.c

DFS <nospam@dfs.com> writes:

I'll try that in C if you commit to trying the following in C++. Deal?

--------------------------------------------------------------------------
* read in a list of words from a file here: >https://people.sc.fsu.edu/~jburkardt/datasets/words/words.html

* pick N random words from that list and put them in an array (OK if
there are a few dupes - if you can get no dupes that's better)

* sort and print the array of randoms, adding a blank line each time the
1st letter changes

(note: I just now wrote this. It's not old code.)

usage is
$./randwords filename N
$./randwords special_english.txt 200

--------------------------------------------------------------------------

A POSIX version:

#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#include <sys/mman.h>
#include <sys/stat.h>

int
qsortcompare(const void *a, const void *b)
{
const char* ca = *(const char**)a;
const char* cb = *(const char**)b;
return strcmp(ca,cb);
}

int
main(int argc, const char **argv)
{
int fd;
struct stat st;
char *words;
char *cp;
char *end;
char **wordlist;
char **sorted;
char ch = '\0';
size_t wordcount = 0u;
size_t i;
size_t n;

if (argc < 3) {
fprintf(stderr, "Usage: %s <wordlist> <N>\n", argv[0]);
return 1;
}

n = strtoul(argv[2], &cp, 0);
if ((cp == argv[2]) || (*cp != '\0')) {
fprintf(stderr, "%s: <N> argument '%s' must be fully numeric in base 8, 10 or 16\n", argv[0], argv[2]);
return 1;
}

if (stat(argv[1], &st) == -1) {
fprintf(stderr, "%s: Unable to open '%s': %s\n", argv[0], argv[1], strerror(errno));
return 2;
}

fd = open(argv[1], O_RDONLY, 0);
if (fd == -1) {
fprintf(stderr, "%s: Unable to open '%s': %s\n", argv[0], argv[1], strerror(errno));
return 2;
}

words = mmap(NULL, st.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0ul);
if (words == MAP_FAILED) {
fprintf(stderr, "%s: Unable to mmap '%s': %s\n", argv[0], argv[1], strerror(errno));
return 2;
}
close(fd);

cp = words;
for(i = 0ul; i < st.st_size; i++, cp++) {
if (*cp == '\n') wordcount++;
}

wordlist = malloc(wordcount * sizeof(const char *));
if (wordlist == NULL) {
fprintf(stderr, "%s: Unable to allocate %zu bytes\n", argv[0], wordcount * sizeof(const char *));
return 3;
}

cp = words;
end = cp + st.st_size;
i = 0u;
while (cp < end) {
wordlist[i++] = cp;
for(; (cp < end) && (*cp != '\n'); cp++) {}
*cp++ = '\0';
}

sorted = malloc(n * sizeof(const char *));
if (sorted == NULL) {
fprintf(stderr, "%s: Unable to allocate %zu bytes\n", argv[0], n * sizeof(const char *));
return 3;
}

srand(time(NULL));
for(i = 0ul; i < n; i++) {
sorted[i] = wordlist[rand() % wordcount];
}
qsort(sorted, n, sizeof(sorted[0]), qsortcompare);

for(size_t w = 0ul; w < n; w++) {
if (ch && (ch != sorted[w][0])) fputc('\n', stdout);
fprintf(stdout, "%s\n", sorted[w]);
ch = sorted[w][0];
}

free(wordlist);
munmap(words, st.st_size);

return 0;
}
--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou@hotmail.com to comp.lang.c on Fri Mar 13 02:19:13 2026

From Newsgroup: comp.lang.c

On 13.03.26 00:57, Bonita Montero wrote:

Am 12.03.2026 um 20:25 schrieb Janis Papanagnou:

On 12.03.26 16:48, Bonita Montero wrote:

Am 12.03.2026 um 16:22 schrieb DFS:

Give me the file you used.

Better take this:
[snip list of names of people and telephone numbers]
The output should be this:
[snip list of names of people and telephone numbers]

I see there are real existing people in that list.

Do you have the consent of these persons to disseminate
their names and telephone numbers?

These are no real people.

This is an interesting statement. - I first saw the typical
"Max Mustermann" that is in our country the prototype of an
artificial test entry with obvious test number 0123-4567890.
Then I picked an arbitrary entry and looked it up; I found
that person with exactly the associated telephone number. -
Where did you get these real names from?

Janis

--- Synchronet 3.21d-Linux NewsLink 1.2

From DFS@nospam@dfs.com to comp.lang.c on Thu Mar 12 21:22:45 2026

From Newsgroup: comp.lang.c

On 3/12/2026 11:48 AM, Bonita Montero wrote:

Am 12.03.2026 um 16:22 schrieb DFS:

Give me the file you used.

Better take this:

That's 50 in and 50 out.

So what's the RegEx for?

--- Synchronet 3.21d-Linux NewsLink 1.2

From DFS@nospam@dfs.com to comp.lang.c on Fri Mar 13 00:31:29 2026

From Newsgroup: comp.lang.c

On 3/12/2026 8:54 PM, Scott Lurndal wrote:

DFS <nospam@dfs.com> writes:

I'll try that in C if you commit to trying the following in C++. Deal?

-------------------------------------------------------------------------- >> * read in a list of words from a file here:
https://people.sc.fsu.edu/~jburkardt/datasets/words/words.html

* pick N random words from that list and put them in an array (OK if
there are a few dupes - if you can get no dupes that's better)

* sort and print the array of randoms, adding a blank line each time the
1st letter changes

(note: I just now wrote this. It's not old code.)

usage is
$./randwords filename N
$./randwords special_english.txt 200

--------------------------------------------------------------------------

A POSIX version:

I took out all 7 instances of error trapping, and it throws:
Floating point exception (core dumped)
at line 61 or 62

======================================================================== #include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int
qsortcompare(const void *a, const void *b)
{
const char* ca = *(const char**)a;
const char* cb = *(const char**)b;
return strcmp(ca,cb);
}

int
main(int argc, const char **argv)
{
int fd;
struct stat st;
char *words;
char *cp;
char *end;
char **wordlist;
char **sorted;
char ch = '\0';
size_t wordcount = 0u;
size_t i;
size_t n;

n = strtoul(argv[2], &cp, 0);
fd = open(argv[1], O_RDONLY, 0);
words = mmap(NULL, st.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE,
fd, 0ul);
close(fd);

cp = words;
for(i = 0ul; i < st.st_size; i++, cp++) {
if (*cp == '\n') wordcount++;
}
wordlist = malloc(wordcount * sizeof(const char *));

cp = words;
end = cp + st.st_size;
i = 0u;
while (cp < end) {
wordlist[i++] = cp;
for(; (cp < end) && (*cp != '\n'); cp++) {}
*cp++ = '\0';
}

sorted = malloc(n * sizeof(const char *));

srand(time(NULL));

//Floating point exception (core dumped)
for(i = 0ul; i < n; i++) {
sorted[i] = wordlist[rand() % wordcount];
}

qsort(sorted, n, sizeof(sorted[0]), qsortcompare);

for(size_t w = 0ul; w < n; w++) {
if (ch && (ch != sorted[w][0])) fputc('\n', stdout);
fprintf(stdout, "%s\n", sorted[w]);
ch = sorted[w][0];
}

free(wordlist);
munmap(words, st.st_size);

return 0;
}
========================================================================

#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#include <sys/mman.h>
#include <sys/stat.h>

int
qsortcompare(const void *a, const void *b)
{
const char* ca = *(const char**)a;
const char* cb = *(const char**)b;
return strcmp(ca,cb);
}

int
main(int argc, const char **argv)
{
int fd;
struct stat st;
char *words;
char *cp;
char *end;
char **wordlist;
char **sorted;
char ch = '\0';
size_t wordcount = 0u;
size_t i;
size_t n;

if (argc < 3) {
fprintf(stderr, "Usage: %s <wordlist> <N>\n", argv[0]);
return 1;
}

n = strtoul(argv[2], &cp, 0);
if ((cp == argv[2]) || (*cp != '\0')) {
fprintf(stderr, "%s: <N> argument '%s' must be fully numeric in base 8, 10 or 16\n", argv[0], argv[2]);
return 1;
}

if (stat(argv[1], &st) == -1) {
fprintf(stderr, "%s: Unable to open '%s': %s\n", argv[0], argv[1], strerror(errno));
return 2;
}

fd = open(argv[1], O_RDONLY, 0);
if (fd == -1) {
fprintf(stderr, "%s: Unable to open '%s': %s\n", argv[0], argv[1], strerror(errno));
return 2;
}

words = mmap(NULL, st.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0ul);
if (words == MAP_FAILED) {
fprintf(stderr, "%s: Unable to mmap '%s': %s\n", argv[0], argv[1], strerror(errno));
return 2;
}
close(fd);

cp = words;
for(i = 0ul; i < st.st_size; i++, cp++) {
if (*cp == '\n') wordcount++;
}

wordlist = malloc(wordcount * sizeof(const char *));
if (wordlist == NULL) {
fprintf(stderr, "%s: Unable to allocate %zu bytes\n", argv[0], wordcount * sizeof(const char *));
return 3;
}

cp = words;
end = cp + st.st_size;
i = 0u;
while (cp < end) {
wordlist[i++] = cp;
for(; (cp < end) && (*cp != '\n'); cp++) {}
*cp++ = '\0';
}

sorted = malloc(n * sizeof(const char *));
if (sorted == NULL) {
fprintf(stderr, "%s: Unable to allocate %zu bytes\n", argv[0], n * sizeof(const char *));
return 3;
}

srand(time(NULL));
for(i = 0ul; i < n; i++) {
sorted[i] = wordlist[rand() % wordcount];
}
qsort(sorted, n, sizeof(sorted[0]), qsortcompare);

for(size_t w = 0ul; w < n; w++) {
if (ch && (ch != sorted[w][0])) fputc('\n', stdout);
fprintf(stdout, "%s\n", sorted[w]);
ch = sorted[w][0];
}

free(wordlist);
munmap(words, st.st_size);

return 0;
}

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Fri Mar 13 06:14:29 2026

From Newsgroup: comp.lang.c

Am 13.03.2026 um 02:19 schrieb Janis Papanagnou:

This is an interesting statement. - I first saw the typical
"Max Mustermann" that is in our country the prototype of an
artificial test entry with obvious test number 0123-4567890.
Then I picked an arbitrary entry and looked it up; I found
that person with exactly the associated telephone number. -
Where did you get these real names from?

The list was generated with ChatGpt.
--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Fri Mar 13 06:15:54 2026

From Newsgroup: comp.lang.c

Am 13.03.2026 um 02:22 schrieb DFS:

That's 50 in and 50 out.

Yes, but sorted.

So what's the RegEx for?

Learn Regex !
--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Fri Mar 13 06:24:16 2026

From Newsgroup: comp.lang.c

Am 13.03.2026 um 05:31 schrieb DFS:

    //Floating point exception (core dumped)
    for(i = 0ul; i < n; i++) {
        sorted[i] = wordlist[rand() % wordcount];
    }

...

    free(wordlist);
    munmap(words, st.st_size);

BIG mistake: If you crash in the first part the memory in
the second part isn't freed, at least under Windows 3.11. ;-)
--- Synchronet 3.21d-Linux NewsLink 1.2

From DFS@nospam@dfs.com to comp.lang.c on Fri Mar 13 01:34:19 2026

From Newsgroup: comp.lang.c

On 3/13/2026 12:31 AM, DFS wrote:

On 3/12/2026 8:54 PM, Scott Lurndal wrote:

DFS <nospam@dfs.com> writes:

I took out all 7 instances of error trapping, and it throws:
Floating point exception (core dumped)
at line 61 or 62

Put back in just the stat() call

stat(argv[1], &st);

and the program works again.

I see what happened.
Without the stat() call, st.st_size is 1.
With the stat() call, st.st_size is 9836.

The mmap-ing executes either way. With no stat() call, the words array
still ended up with 606 words (using a file with 1477 words).

But after that wordcount was set to 0:

for(i = 0ul; i < st.st_size;

and then the call to

wordlist[rand() % wordcount]

threw the floating point exception.

Question: wordlist was malloced with a size of 0:

wordlist = malloc(wordcount * sizeof(const char *));

Why are you allowed to malloc size 0?

After malloc, you populate it and it ended up containing only the first
letter of the first word:

wordlist[0] = 'a'
wordlist[1 and up] = (null)

I used file special_english.txt from that link I gave.

======================================================================== #include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int
qsortcompare(const void *a, const void *b)
{
     const char* ca = *(const char**)a;
     const char* cb = *(const char**)b;
     return strcmp(ca,cb);
}

int
main(int argc, const char **argv)
{
    int fd;
    struct stat st;
    char *words;
    char *cp;
    char *end;
    char **wordlist;
    char **sorted;
    char ch = '\0';
    size_t wordcount = 0u;
    size_t i;
    size_t n;

    n = strtoul(argv[2], &cp, 0);
    fd = open(argv[1], O_RDONLY, 0);
    words = mmap(NULL, st.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0ul);
    close(fd);

    cp = words;
    for(i = 0ul; i < st.st_size; i++, cp++) {
        if (*cp == '\n') wordcount++;
    }
    wordlist = malloc(wordcount * sizeof(const char *));

    cp = words;
    end = cp + st.st_size;
    i = 0u;
    while (cp < end) {
        wordlist[i++] = cp;
    for(; (cp < end) && (*cp != '\n'); cp++) {}
    *cp++ = '\0';
    }

    sorted = malloc(n * sizeof(const char *));

    srand(time(NULL));

    //Floating point exception (core dumped)
    for(i = 0ul; i < n; i++) {
        sorted[i] = wordlist[rand() % wordcount];
    }

    qsort(sorted, n, sizeof(sorted[0]), qsortcompare);

    for(size_t w = 0ul; w < n; w++) {
    if (ch && (ch != sorted[w][0])) fputc('\n', stdout);
        fprintf(stdout, "%s\n", sorted[w]);
    ch = sorted[w][0];
    }

    free(wordlist);
    munmap(words, st.st_size);

    return 0;
}
========================================================================

#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#include <sys/mman.h>
#include <sys/stat.h>

int
qsortcompare(const void *a, const void *b)
{
      const char* ca = *(const char**)a;
      const char* cb = *(const char**)b;
      return strcmp(ca,cb);
}

int
main(int argc, const char **argv)
{
     int fd;
     struct stat st;
     char *words;
     char *cp;
     char *end;
     char **wordlist;
     char **sorted;
     char ch = '\0';
     size_t wordcount = 0u;
     size_t i;
     size_t n;

     if (argc < 3) {
         fprintf(stderr, "Usage: %s <wordlist> <N>\n", argv[0]);
         return 1;
     }

     n = strtoul(argv[2], &cp, 0);
     if ((cp == argv[2]) || (*cp != '\0')) {
         fprintf(stderr, "%s: <N> argument '%s' must be fully numeric
in base 8, 10 or 16\n", argv[0], argv[2]);
         return 1;
     }

     if (stat(argv[1], &st) == -1) {
         fprintf(stderr, "%s: Unable to open '%s': %s\n", argv[0], >> argv[1], strerror(errno));
         return 2;
     }

     fd = open(argv[1], O_RDONLY, 0);
     if (fd == -1) {
         fprintf(stderr, "%s: Unable to open '%s': %s\n", argv[0], >> argv[1], strerror(errno));
      return 2;
     }

     words = mmap(NULL, st.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE, >> fd, 0ul);
     if (words == MAP_FAILED) {
         fprintf(stderr, "%s: Unable to mmap '%s': %s\n", argv[0], >> argv[1], strerror(errno));
         return 2;
     }
     close(fd);

     cp = words;
     for(i = 0ul; i < st.st_size; i++, cp++) {
    if (*cp == '\n') wordcount++;
     }

     wordlist = malloc(wordcount * sizeof(const char *));
     if (wordlist == NULL) {
         fprintf(stderr, "%s: Unable to allocate %zu bytes\n",
argv[0], wordcount * sizeof(const char *));
         return 3;
     }

     cp = words;
     end = cp + st.st_size;
     i = 0u;
     while (cp < end) {
         wordlist[i++] = cp;
    for(; (cp < end) && (*cp != '\n'); cp++) {}
    *cp++ = '\0';
     }

     sorted = malloc(n * sizeof(const char *));
     if (sorted == NULL) {
         fprintf(stderr, "%s: Unable to allocate %zu bytes\n",
argv[0], n * sizeof(const char *));
         return 3;
     }

     srand(time(NULL));
     for(i = 0ul; i < n; i++) {
         sorted[i] = wordlist[rand() % wordcount];
     }
     qsort(sorted, n, sizeof(sorted[0]), qsortcompare);

     for(size_t w = 0ul; w < n; w++) {
    if (ch && (ch != sorted[w][0])) fputc('\n', stdout);
         fprintf(stdout, "%s\n", sorted[w]);
    ch = sorted[w][0];
     }

     free(wordlist);
     munmap(words, st.st_size);

     return 0;
}

--- Synchronet 3.21d-Linux NewsLink 1.2

From DFS@nospam@dfs.com to comp.lang.c on Fri Mar 13 01:40:21 2026

From Newsgroup: comp.lang.c

On 3/13/2026 1:24 AM, Bonita Montero wrote:

Am 13.03.2026 um 05:31 schrieb DFS:

     //Floating point exception (core dumped)
     for(i = 0ul; i < n; i++) {
         sorted[i] = wordlist[rand() % wordcount];
     }

...

     free(wordlist);
     munmap(words, st.st_size);

BIG mistake: If you crash in the first part the memory in
the second part isn't freed, at least under Windows 3.11. ;-)

Lucky I'm running Win95 then...

(actually using Ubuntu on WSL on Win11 Pro 25H2)

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Fri Mar 13 07:24:10 2026

From Newsgroup: comp.lang.c

Am 13.03.2026 um 05:31 schrieb DFS:

I took out all 7 instances of error trapping, and it throws:
Floating point exception (core dumped)
at line 61 or 62

Take my signal_scope<> class, it makes signals thread-specific:

thread_local jmp_buf jb;
signal_scope<SIGFPE> fpeExc( +[]( int, siginfo_t *, void * )
{
static const char PrintThis[] = "caught!\n";
(void)write( 1, PrintThis, sizeof PrintThis - 1 );
siglongjmp( jb, 1 );
} );
if( int ret = sigsetjmp( jb, 1 ); !ret )
...
else
...

Here's the source of the signal_scope class:

#pragma once
#include <cstdlib>
#include <variant>
#include <mutex>
#include <cassert>
#include <unistd.h>
#include <signal.h>
#include <setjmp.h>

template<int SigNo>
struct signal_scope
{
static_assert(SigNo == SIGILL || SigNo == SIGFPE || SigNo == SIGSEGV ||
SigNo == SIGBUS || SigNo == SIGTRAP, "only sychronous signals");
using handler_fn = bool (*)( int );
using siginfo_handler_fn = bool (*)( int, siginfo_t *, void * );
using handler_variant = std::variant<int, handler_fn, siginfo_handler_fn>;
signal_scope( handler_variant handler = handler_variant() ) noexcept;
~signal_scope();
void operator =( handler_variant handler ) noexcept;
static void fallback( handler_variant handler ) noexcept;
static void re_init( const sigset_t *pSet, int flags );
private:
inline static std::mutex g_mtxFallback;
inline static handler_variant g_fallback = handler_variant();
inline static thread_local handler_variant t_handler = handler_variant();
handler_variant m_handlerBefore;
inline static struct init
{
init();
~init();
void reset( const sigset_t *pSet, int flags, bool old );
void dummy() {}
struct sigaction m_saBefore;
} g_init;
static void action( int sig, siginfo_t *info, void *uContext ) noexcept;
static bool callHandler( const handler_variant &handler, int sig, siginfo_t *info, void *uContext );
};

template<int SigNo>
signal_scope<SigNo>::signal_scope( handler_variant handler ) noexcept :
m_handlerBefore( t_handler )
{
(void)g_init;
t_handler = handler;
}

template<int SigNo>
inline signal_scope<SigNo>::~signal_scope()
{
t_handler = m_handlerBefore;
}

template<int SigNo>
inline void signal_scope<SigNo>::operator =( handler_variant handler ) noexcept
{
t_handler = handler;
}

template<int SigNo>
void signal_scope<SigNo>::fallback( handler_variant handler ) noexcept
{
using namespace std;
lock_guard lock( g_mtxFallback );
g_fallback = handler;
}

template<int SigNo>
void signal_scope<SigNo>::re_init( const sigset_t *pSet, int flags )
{
g_init.reset( pSet, flags, false );
}

template<int SigNo>
signal_scope<SigNo>::init::init()
{
reset( nullptr, 0, true );
}

template<int SigNo>
signal_scope<SigNo>::init::~init()
{
sigaction( SigNo, &m_saBefore, nullptr );
}

template<int SigNo>
void signal_scope<SigNo>::init::reset( const sigset_t *pSet, int flags,
bool old )
{
struct sigaction sa;
sa.sa_sigaction = action;
if( pSet )
sa.sa_mask = *pSet;
else
sigfillset( &sa.sa_mask );
sa.sa_flags = flags | SA_SIGINFO;
sigaction( SigNo, &sa, old ? &m_saBefore : nullptr );
}

template<int SigNo>
void signal_scope<SigNo>::action( int sig, siginfo_t *info, void
*uContext ) noexcept
{
using namespace std;
if( callHandler( t_handler, sig, info, uContext ) ) [[likely]]
return;
handler_variant fallback;
{
lock_guard lock( g_mtxFallback );
fallback = g_fallback;
}
callHandler( fallback, sig, info, uContext );
}

template<int SigNo>
bool signal_scope<SigNo>::callHandler( const handler_variant &handler,
int sig, siginfo_t *info, void *uContext )
{
if( holds_alternative<handler_fn>( handler ) ) [[likely]]
return get<handler_fn>( handler )( sig );
if( holds_alternative<siginfo_handler_fn>( handler ) ) [[likely]]
return get<siginfo_handler_fn>( handler )( sig, info, uContext );
return false;
}

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Fri Mar 13 08:08:48 2026

From Newsgroup: comp.lang.c

Am 12.03.2026 um 21:18 schrieb DFS:

output is something like:
$ ./randwords special_english.txt 200
1477 words read in
200 random words extracted
1. above
2. accept
3. after
4. against
5. agency
6. ammunition
7. anger
8. anniversary
9. army
10. arrive
11. art
12. artillery
13. automobile
14. autumn

15. bed
16. below
17. bleed
18. blow
19. blue
20. boat
21. boil
22. bread
23. bridge
24. brown
25. business

26. cancer
27. claim
28. clean
29. cloud
30. cloud
31. cloud
32. combine
33. compare
34. conflict
35. consider
36. contain
37. correct
38. credit
39. criticize
40. cross
41. crowd
42. customs

43. decide
44. demand
...

What's the format of the input file ? Also numbered lines ?
--- Synchronet 3.21d-Linux NewsLink 1.2

From DFS@nospam@dfs.com to comp.lang.c on Fri Mar 13 04:19:52 2026

From Newsgroup: comp.lang.c

On 3/13/2026 3:08 AM, Bonita Montero wrote:

Am 12.03.2026 um 21:18 schrieb DFS:

output is something like:
$ ./randwords special_english.txt 200
1477 words read in
200 random words extracted
1. above
2. accept
3. after
...

What's the format of the input file ? Also numbered lines ?

not numbered

* read in a list of words from a file here: https://people.sc.fsu.edu/~jburkardt/datasets/words/words.html

--- Synchronet 3.21d-Linux NewsLink 1.2

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Fri Mar 13 01:48:13 2026

From Newsgroup: comp.lang.c

On 3/12/2026 10:14 PM, Bonita Montero wrote:

Am 13.03.2026 um 02:19 schrieb Janis Papanagnou:

This is an interesting statement. - I first saw the typical
"Max Mustermann" that is in our country the prototype of an
artificial test entry with obvious test number 0123-4567890.
Then I picked an arbitrary entry and looked it up; I found
that person with exactly the associated telephone number. -
Where did you get these real names from?

The list was generated with ChatGpt.

Oh my. Beware, and always double, and triple, check its reams of code.
--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Fri Mar 13 09:49:53 2026

From Newsgroup: comp.lang.c

Am 13.03.2026 um 09:48 schrieb Chris M. Thomasson:

On 3/12/2026 10:14 PM, Bonita Montero wrote:

Am 13.03.2026 um 02:19 schrieb Janis Papanagnou:

This is an interesting statement. - I first saw the typical
"Max Mustermann" that is in our country the prototype of an
artificial test entry with obvious test number 0123-4567890.
Then I picked an arbitrary entry and looked it up; I found
that person with exactly the associated telephone number. -
Where did you get these real names from?

The list was generated with ChatGpt.

Oh my. Beware, and always double, and triple, check its reams of code.

It's only word / telephone number tuples as .txt.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Fri Mar 13 01:51:31 2026

From Newsgroup: comp.lang.c

On 3/12/2026 11:24 PM, Bonita Montero wrote:

Am 13.03.2026 um 05:31 schrieb DFS:

I took out all 7 instances of error trapping, and it throws:
Floating point exception (core dumped)
at line 61 or 62

Take my signal_scope<> class, it makes signals thread-specific:

thread_local jmp_buf jb;
signal_scope<SIGFPE> fpeExc( +[]( int, siginfo_t *, void * )
    {
        static const char PrintThis[] = "caught!\n";
        (void)write( 1, PrintThis, sizeof PrintThis - 1 );
        siglongjmp( jb, 1 );
    } );
if( int ret = sigsetjmp( jb, 1 ); !ret )
    ...
else
    ...

Here's the source of the signal_scope class:

#pragma once
#include <cstdlib>
#include <variant>
#include <mutex>
#include <cassert>
#include <unistd.h>
#include <signal.h>
#include <setjmp.h>

template<int SigNo>
struct signal_scope
{
    static_assert(SigNo == SIGILL || SigNo == SIGFPE || SigNo == SIGSEGV || SigNo == SIGBUS || SigNo == SIGTRAP, "only sychronous signals");
    using handler_fn = bool (*)( int );
    using siginfo_handler_fn = bool (*)( int, siginfo_t *, void * );
    using handler_variant = std::variant<int, handler_fn, siginfo_handler_fn>;
    signal_scope( handler_variant handler = handler_variant() ) noexcept;
    ~signal_scope();
    void operator =( handler_variant handler ) noexcept;
    static void fallback( handler_variant handler ) noexcept;
    static void re_init( const sigset_t *pSet, int flags );
private:
    inline static std::mutex g_mtxFallback;
    inline static handler_variant g_fallback = handler_variant();
    inline static thread_local handler_variant t_handler = handler_variant();
    handler_variant m_handlerBefore;
    inline static struct init
    {
        init();
        ~init();
        void reset( const sigset_t *pSet, int flags, bool old );
        void dummy() {}
        struct sigaction m_saBefore;
    } g_init;
    static void action( int sig, siginfo_t *info, void *uContext ) noexcept;
    static bool callHandler( const handler_variant &handler, int sig, siginfo_t *info, void *uContext );
};

template<int SigNo>
signal_scope<SigNo>::signal_scope( handler_variant handler ) noexcept :
    m_handlerBefore( t_handler )
{
    (void)g_init;
    t_handler = handler;
}

template<int SigNo>
inline signal_scope<SigNo>::~signal_scope()
{
    t_handler = m_handlerBefore;
}

template<int SigNo>
inline void signal_scope<SigNo>::operator =( handler_variant handler ) noexcept
{
    t_handler = handler;
}

template<int SigNo>
void signal_scope<SigNo>::fallback( handler_variant handler ) noexcept
{
    using namespace std;
    lock_guard lock( g_mtxFallback );
    g_fallback = handler;
}

template<int SigNo>
void signal_scope<SigNo>::re_init( const sigset_t *pSet, int flags )
{
    g_init.reset( pSet, flags, false );
}

template<int SigNo>
signal_scope<SigNo>::init::init()
{
    reset( nullptr, 0, true );
}

template<int SigNo>
signal_scope<SigNo>::init::~init()
{
    sigaction( SigNo, &m_saBefore, nullptr );
}

template<int SigNo>
void signal_scope<SigNo>::init::reset( const sigset_t *pSet, int flags,
bool old )
{
    struct sigaction sa;
    sa.sa_sigaction = action;
    if( pSet )
        sa.sa_mask = *pSet;
    else
        sigfillset( &sa.sa_mask );
    sa.sa_flags = flags | SA_SIGINFO;
    sigaction( SigNo, &sa, old ? &m_saBefore : nullptr );
}

template<int SigNo>
void signal_scope<SigNo>::action( int sig, siginfo_t *info, void
*uContext ) noexcept
{
    using namespace std;
    if( callHandler( t_handler, sig, info, uContext ) ) [[likely]]
        return;
    handler_variant fallback;
    {
        lock_guard lock( g_mtxFallback );
        fallback = g_fallback;
    }
    callHandler( fallback, sig, info, uContext );
}

template<int SigNo>
bool signal_scope<SigNo>::callHandler( const handler_variant &handler,
int sig, siginfo_t *info, void *uContext )
{
    if( holds_alternative<handler_fn>( handler ) ) [[likely]]
        return get<handler_fn>( handler )( sig );
    if( holds_alternative<siginfo_handler_fn>( handler ) ) [[likely]]
        return get<siginfo_handler_fn>( handler )( sig, info, uContext );
    return false;
}

well, is that your code? Or the AI's code? Need to kick that AI to the
curb from time to time. Right?
--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Fri Mar 13 09:53:24 2026

From Newsgroup: comp.lang.c

Am 12.03.2026 um 21:18 schrieb DFS:

* read in a list of words from a file here: https://people.sc.fsu.edu/~jburkardt/datasets/words/words.html
* pick N random words from that list and put them in an array (OK if
there are a few dupes - if you can get no dupes that's better)
* sort and print the array of randoms, adding a blank line each time the
1st letter changes

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <sstream>
#include <algorithm>
#include <random>

using namespace std;

int main( int argc, char **argv )
{
if( argc < 2 )
return EXIT_FAILURE;
size_t nLines;
if( argc < 3 || !(istringstream( argv[2] ) >> nLines) )
nLines = 200;
ifstream ifs( argv[1] );
vector<string> lines;
size_t iLine = 0;
for( string line; iLine < nLines && !ifs.eof(); ++iLine )
if( getline( ifs, line ) )
lines.emplace_back( line );
vector<string> rndLines;
mt19937_64 mt;
for( size_t n = 0; n < nLines; ++n )
if( size_t i = mt() % nLines; lines[i].size() )
rndLines.emplace_back( move( lines[i] ) );
sort( rndLines.begin(), rndLines.end() );
iLine = 0;
string *pPrev = nullptr;
for( string &rndLine : rndLines )
{
if( pPrev && tolower( rndLine[0] ) != tolower( pPrev->front() ) )
cout << endl;
cout << ++iLine << ". " << rndLine << endl;
pPrev = &rndLine;
}
}

39 lines vs. 72 lines, and much more readability on my side.
And your code may have duplicates in the random list which
is sorted afterwards.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Fri Mar 13 09:54:23 2026

From Newsgroup: comp.lang.c

Am 13.03.2026 um 09:51 schrieb Chris M. Thomasson:

well, is that your code? Or the AI's code? Need to kick that AI to the
curb from time to time. Right?

AI never generates such dodgy ideas.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Fri Mar 13 09:56:24 2026

From Newsgroup: comp.lang.c

Am 13.03.2026 um 09:53 schrieb Bonita Montero:

Am 12.03.2026 um 21:18 schrieb DFS:

* read in a list of words from a file here:
https://people.sc.fsu.edu/~jburkardt/datasets/words/words.html
* pick N random words from that list and put them in an array (OK if
   there are a few dupes - if you can get no dupes that's better)
* sort and print the array of randoms, adding a blank line each time the
   1st letter changes

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <sstream>
#include <algorithm>
#include <random>

using namespace std;

int main( int argc, char **argv )
{
    if( argc < 2 )
        return EXIT_FAILURE;
    size_t nLines;
    if( argc < 3 || !(istringstream( argv[2] ) >> nLines) )
        nLines = 200;
    ifstream ifs( argv[1] );
    vector<string> lines;
    size_t iLine = 0;
    for( string line; iLine < nLines && !ifs.eof(); ++iLine )
        if( getline( ifs, line ) )
            lines.emplace_back( line );
    vector<string> rndLines;
    mt19937_64 mt;
    for( size_t n = 0; n < nLines; ++n )
        if( size_t i = mt() % nLines; lines[i].size() )

if( size_t i = mt() % lines.size(); lines[i].size() )

            rndLines.emplace_back( move( lines[i] ) );
    sort( rndLines.begin(), rndLines.end() );
    iLine = 0;
    string *pPrev = nullptr;
    for( string &rndLine : rndLines )
    {
        if( pPrev && tolower( rndLine[0] ) != tolower( pPrev->front() ) )
            cout << endl;
        cout << ++iLine << ". " << rndLine << endl;
        pPrev = &rndLine;
    }
}

39 lines vs. 72 lines, and much more readability on my side.
And your code may have duplicates in the random list which
is sorted afterwards.

--- Synchronet 3.21d-Linux NewsLink 1.2

From DFS@nospam@dfs.com to comp.lang.c on Fri Mar 13 05:10:18 2026

From Newsgroup: comp.lang.c

On 3/13/2026 4:53 AM, Bonita Montero wrote:

Am 12.03.2026 um 21:18 schrieb DFS:

* read in a list of words from a file here:
https://people.sc.fsu.edu/~jburkardt/datasets/words/words.html
* pick N random words from that list and put them in an array (OK if
   there are a few dupes - if you can get no dupes that's better)
* sort and print the array of randoms, adding a blank line each time the
   1st letter changes

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <sstream>
#include <algorithm>
#include <random>

using namespace std;

int main( int argc, char **argv )
{
    if( argc < 2 )
        return EXIT_FAILURE;
    size_t nLines;
    if( argc < 3 || !(istringstream( argv[2] ) >> nLines) )
        nLines = 200;
    ifstream ifs( argv[1] );
    vector<string> lines;
    size_t iLine = 0;
    for( string line; iLine < nLines && !ifs.eof(); ++iLine )
        if( getline( ifs, line ) )
            lines.emplace_back( line );
    vector<string> rndLines;
    mt19937_64 mt;
    for( size_t n = 0; n < nLines; ++n )
        if( size_t i = mt() % nLines; lines[i].size() )
            rndLines.emplace_back( move( lines[i] ) );
    sort( rndLines.begin(), rndLines.end() );
    iLine = 0;
    string *pPrev = nullptr;
    for( string &rndLine : rndLines )
    {
        if( pPrev && tolower( rndLine[0] ) != tolower( pPrev->front() ) )
            cout << endl;
        cout << ++iLine << ". " << rndLine << endl;
        pPrev = &rndLine;
    }
}

39 lines vs. 72 lines,

Taking out blank lines
Yours is 37 LOC
Mine is 61 LOC

But your code doesn't work right.
./montero special-english.txt 50

Outputs 35 words, and they're not random. They all begin with a.

and much more readability on my side.

C++ = gag

And your code may have duplicates in the random list which
is sorted afterwards.

My word output definitely does have dupes. C rand() produces tons of
dupe random numbers. I'm working on a method to use only unique randoms.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Fri Mar 13 10:14:34 2026

From Newsgroup: comp.lang.c

I won't test my before code again.

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <sstream>
#include <algorithm>
#include <random>

using namespace std;

int main( int argc, char **argv )
{
if( argc < 2 )
return EXIT_FAILURE;
size_t nLines;
if( argc < 3 || !(istringstream( argv[2] ) >> nLines) )
nLines = 200;
ifstream ifs( argv[1] );
vector<string> lines;
for( string line; !ifs.eof(); )
if( getline( ifs, line ) )
lines.emplace_back( move( line ) );
vector<string> rndLines;
mt19937_64 mt;
for( size_t n = 0; n < nLines; ++n )
{
size_t i = mt() % lines.size();
rndLines.emplace_back( move( lines[i] ) );
lines.erase( lines.begin() + i );
}
sort( rndLines.begin(), rndLines.end() );
size_t iLine = 1;
string *pPrev = nullptr;
for( string &rndLine : rndLines )
{
if( pPrev && tolower( rndLine[0] ) != tolower( pPrev->front() ) )
cout << endl;
cout << iLine++ << ". " << rndLine << endl;
pPrev = &rndLine;
}
}

This is the current version.
If I call it with "test.exe anagram_dictionary.txt 10" it prints:

1. deceived
2. desorption

3. moronic

4. quantitativeness

5. reimpose

6. sleigh

7. tabooing

8. unreconciled

9. waxing
10. woody

C ia a joke, not C++.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Fri Mar 13 10:27:23 2026

From Newsgroup: comp.lang.c

On 3/13/26 09:49, Bonita Montero wrote:

Am 13.03.2026 um 09:48 schrieb Chris M. Thomasson:

On 3/12/2026 10:14 PM, Bonita Montero wrote:

The list was generated with ChatGpt.

Oh my. Beware, and always double, and triple, check its reams of code.

It's only word / telephone number tuples as .txt.

The AI strikes back!

First, my apologies; I was mistakenly assuming real people.

What I did was to search for name and number (without area code).
That was answered by the search engine by "missing area code".
So I added it and got a complete record of some doctor of medicine.

But then I noticed too late that I had inadvertently got into some
AI query mode of the search engine.

So I repeated my query in ordinary mode and got nothing as result.

Then I repeated my query once again in that AI mode with the same
data and got again a complete data record, but of *another* doctor.

These inquiries just made-up those entries on the fly. - Scary!

There's a small note at the bottom that AI-results may contain
errors. But these replies are not "errors"; they are completely
made-up statements out of thin air!

Do AI inquiries for *random* test data create not only the list of
test data but also "create" the data entries as *real* in the AI's
memory for later responses to other queries?

Where AI-dreams come true...

You can of course then ask that AI whether that answer is correct,
whether that person is actually existing. - And then the AI will
willingly tell you that there's many indicators that her response
is most likely wrong, and she is enumerating all the issues (that
her original reply had)!

(I thought it's bad. But it's worse than I thought.)

Janis

--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Fri Mar 13 10:29:45 2026

From Newsgroup: comp.lang.c

On 3/13/26 09:54, Bonita Montero wrote:

Am 13.03.2026 um 09:51 schrieb Chris M. Thomasson:

well, is that your code? Or the AI's code? Need to kick that AI to the
curb from time to time. Right?

AI never generates such dodgy ideas.

It does. Even worse. See my most recent post.

Janis

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Fri Mar 13 10:33:30 2026

From Newsgroup: comp.lang.c

Am 13.03.2026 um 10:29 schrieb Janis Papanagnou:

On 3/13/26 09:54, Bonita Montero wrote:

Am 13.03.2026 um 09:51 schrieb Chris M. Thomasson:

well, is that your code? Or the AI's code? Need to kick that AI to
the curb from time to time. Right?

AI never generates such dodgy ideas.

It does. ...

No, AI doesn't even understand what I did there so that I had to correct
the code while reviewing it through AI. Redirecting signals to threads
is really uncommon.
--- Synchronet 3.21d-Linux NewsLink 1.2

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Fri Mar 13 14:29:44 2026

From Newsgroup: comp.lang.c

DFS <nospam@dfs.com> writes:

On 3/12/2026 8:54 PM, Scott Lurndal wrote:

DFS <nospam@dfs.com> writes:

I'll try that in C if you commit to trying the following in C++. Deal?

-------------------------------------------------------------------------- >>> * read in a list of words from a file here:
https://people.sc.fsu.edu/~jburkardt/datasets/words/words.html

* pick N random words from that list and put them in an array (OK if
there are a few dupes - if you can get no dupes that's better)

* sort and print the array of randoms, adding a blank line each time the >>> 1st letter changes

(note: I just now wrote this. It's not old code.)

usage is
$./randwords filename N
$./randwords special_english.txt 200

-------------------------------------------------------------------------- >>

A POSIX version:

I took out all 7 instances of error trapping, and it throws:

Why on earth did you remove the error checks? They're
there for a reason.

You don't say how you invoked the executable.

--- Synchronet 3.21d-Linux NewsLink 1.2

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Fri Mar 13 14:38:51 2026

From Newsgroup: comp.lang.c

DFS <nospam@dfs.com> writes:

On 3/13/2026 12:31 AM, DFS wrote:

On 3/12/2026 8:54 PM, Scott Lurndal wrote:

DFS <nospam@dfs.com> writes:

I took out all 7 instances of error trapping, and it throws:
Floating point exception (core dumped)
at line 61 or 62

Put back in just the stat() call

stat(argv[1], &st);

and the program works again.

Yes, you can't willy-nilly remove lines from a program.

Question: wordlist was malloced with a size of 0:

wordlist = malloc(wordcount * sizeof(const char *));

Why are you allowed to malloc size 0?

"If the size of the space requested is 0, the behavior is
implementation-defined: either a null pointer shall be returned,
or the behavior shall be as if the size were some non-zero value,
except that the behavior is undefined if the returned pointer is
used to access an object."

https://pubs.opengroup.org/onlinepubs/9799919799/functions/malloc.html

--- Synchronet 3.21d-Linux NewsLink 1.2

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Fri Mar 13 15:31:22 2026

From Newsgroup: comp.lang.c

scott@slp53.sl.home (Scott Lurndal) writes:

DFS <nospam@dfs.com> writes:

On 3/13/2026 12:31 AM, DFS wrote:

On 3/12/2026 8:54 PM, Scott Lurndal wrote:

DFS <nospam@dfs.com> writes:

I took out all 7 instances of error trapping, and it throws:
Floating point exception (core dumped)
at line 61 or 62

Put back in just the stat() call

stat(argv[1], &st);

and the program works again.

Yes, you can't willy-nilly remove lines from a program.

Question: wordlist was malloced with a size of 0:

wordlist = malloc(wordcount * sizeof(const char *));

Why are you allowed to malloc size 0?

"If the size of the space requested is 0, the behavior is
implementation-defined: either a null pointer shall be returned,
or the behavior shall be as if the size were some non-zero value,
except that the behavior is undefined if the returned pointer is
used to access an object."

https://pubs.opengroup.org/onlinepubs/9799919799/functions/malloc.html

IIRC, this caveat was added due to differences in the malloc(3)
implementations for System V Unix and BSD Unix.
--- Synchronet 3.21d-Linux NewsLink 1.2

From Bart@bc@freeuk.com to comp.lang.c on Fri Mar 13 17:32:12 2026

From Newsgroup: comp.lang.c

On 12/03/2026 15:10, Bonita Montero wrote:

Am 12.03.2026 um 15:43 schrieb DFS:

On 3/12/2026 10:13 AM, Bonita Montero wrote:

Here, do that in C:

static optional<size_t> parse( const char *str )
{
     size_t ret;
     if( from_chars_result fcr = from_chars( str, str +
strlen( str ), ret ); (bool)fcr.ec || *fcr.ptr )
         return nullopt;
     return ret;
}

Explain in detail what it does.

If that isn't self-explanatory stick with C.

I've got a task for you: Do the same in C:

#include <iostream>
#include <regex>
#include <fstream>
#include <vector>
#include <algorithm>
#include <vector>

using namespace std;

int main( int argc, char **argv )
{
    if( argc < 2 )
        return EXIT_FAILURE;
    ifstream ifs( argv[1] );
    static regex rxNameTel( "^\\s*\"([^\"]*)\"\\s*\"([^\"]*)\"\\s*$" );
    struct name_tel { string name, tel; };
    vector<name_tel> phoneList;
    while( !ifs.eof() )
    {
        string line;
        getline( ifs, line );
        match_results<string::const_iterator> sm;
        if( regex_match( line, sm, rxNameTel ) )
            phoneList.emplace_back( string( sm[1].first, sm[1].second ), string( sm[2].first, sm[2].second ) );
    }
    sort( phoneList.begin(), phoneList.end(),
        []( const name_tel &left, const name_tel &right ) { return left.name < right.name; } );
    for( name_tel &phone : phoneList )
        cout << "\"" << phone.name << "\"\t\"" << phone.tel << "\"" <<
endl;
}

1. Read a file and parse it with ne mentioned regex-pattern.
2. Split both parts of every line in two strings.
3. Sort the "vector" according to the first string.
4. Print it.

I guess you don't manage to do that with less than five times the work.

This stuff is child's play with any scripting language. The code will
also be cleaner and simpler.

Here you're trying to write C++ as though it was a scripting language,
by utilising its many bundled libraries, but it fails badly.

There is too much excruciating detail that you still have to write. Even
the easy bit, printing the sorted list, is a mess of punctuation.

In mine, if I use a similar record type and table name, the loop is:

for x in phonelist do
println x.name:"q15jl", x.tel:"q"
end

('q' adds quotes; '15' is field width; 'jl' is left-justified. Note your output uses hard tabs, which can give unpredictable output.)

While your compare function:

[]( const name_tel &left, const name_tel &right ) { return left.name
< right.name; }

would be '{l,r: l.name < r.name}'

I guess you don't manage to do that with less than five times the work.

Every external lib allowed.

If external libraries are allowed then C wouldn't be five times the line count. They just don't come as standard.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Fri Mar 13 18:44:08 2026

From Newsgroup: comp.lang.c

Am 13.03.2026 um 18:32 schrieb Bart:

This stuff is child's play with any scripting language. The code will
also be cleaner and simpler.

Also the C programms here that do similar things. But if your require-
ment is systems programming or performance you won't chose a scripting language.
You're really narrow-minded if you see it so superficially.

Here you're trying to write C++ as though it was a scripting language,
by utilising its many bundled libraries, but it fails badly.

No, the code works correctly according to the requirements.

There is too much excruciating detail that you still have to write.
Even the easy bit, printing the sorted list, is a mess of punctuation.

You're focussed on details without seeing the concept.

While your compare function:

[]( const name_tel &left, const name_tel &right ) { return left.name
< right.name; }
would be '{l,r: l.name < r.name}'

Yes, with one percent of the performance.

If external libraries are allowed then C wouldn't be five times the line count. They just don't come as standard.

The handling of the external libraries is much more complex.
In C++ you won't have to deal with error handling (exceptions)
and there is no explicit memory management if you use containers.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c,comp.lang.c++ on Fri Mar 13 11:54:23 2026

From Newsgroup: comp.lang.c

On 3/12/2026 4:56 PM, Bonita Montero wrote:

Am 12.03.2026 um 18:32 schrieb Scott Lurndal:

You didn't bother to check if the file was opened.

That's not necessary to compare it against a equal solution in C.

This is certainly a task that can be done simply using a shell
script and the standard POSIX utility set. awk(1) would be
a good starting point; it's possible that it can be done with
a single invocation of the posix 'sort' utility.

It's comparison of C against C++.

Of course, the lack of any in-line documentation (e.g. comments) is
a typical defect in your C++ code.

Complete idiot.

Scott is the opposite of idiot.
--- Synchronet 3.21d-Linux NewsLink 1.2

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Fri Mar 13 11:57:10 2026

From Newsgroup: comp.lang.c

On 3/13/2026 2:33 AM, Bonita Montero wrote:

Am 13.03.2026 um 10:29 schrieb Janis Papanagnou:

On 3/13/26 09:54, Bonita Montero wrote:

Am 13.03.2026 um 09:51 schrieb Chris M. Thomasson:

well, is that your code? Or the AI's code? Need to kick that AI to
the curb from time to time. Right?

AI never generates such dodgy ideas.

It does. ...

No, AI doesn't even understand what I did there so that I had to correct
the code while reviewing it through AI. Redirecting signals to threads
is really uncommon.

At least you can take a look at the AI reams o' code, and correct it.
--- Synchronet 3.21d-Linux NewsLink 1.2

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Fri Mar 13 11:58:31 2026

From Newsgroup: comp.lang.c

On 3/13/2026 11:57 AM, Chris M. Thomasson wrote:

On 3/13/2026 2:33 AM, Bonita Montero wrote:

Am 13.03.2026 um 10:29 schrieb Janis Papanagnou:

On 3/13/26 09:54, Bonita Montero wrote:

Am 13.03.2026 um 09:51 schrieb Chris M. Thomasson:

well, is that your code? Or the AI's code? Need to kick that AI to
the curb from time to time. Right?

AI never generates such dodgy ideas.

It does. ...

No, AI doesn't even understand what I did there so that I had to correct
the code while reviewing it through AI. Redirecting signals to threads
is really uncommon.

At least you can take a look at the AI reams o' code, and correct it.

Also, be weary of correcting the AI, the little shit will learn and then
you made it smarter? Uggg...
--- Synchronet 3.21d-Linux NewsLink 1.2

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Fri Mar 13 11:59:47 2026

From Newsgroup: comp.lang.c

On 3/13/2026 1:49 AM, Bonita Montero wrote:

Am 13.03.2026 um 09:48 schrieb Chris M. Thomasson:

On 3/12/2026 10:14 PM, Bonita Montero wrote:

Am 13.03.2026 um 02:19 schrieb Janis Papanagnou:

This is an interesting statement. - I first saw the typical
"Max Mustermann" that is in our country the prototype of an
artificial test entry with obvious test number 0123-4567890.
Then I picked an arbitrary entry and looked it up; I found
that person with exactly the associated telephone number. -
Where did you get these real names from?

The list was generated with ChatGpt.

Oh my. Beware, and always double, and triple, check its reams of code.

It's only word / telephone number tuples as .txt.

Fine. But, think if real names leaked? Oh shit, sorry. But shit can
happen. ;^)
--- Synchronet 3.21d-Linux NewsLink 1.2

From Bart@bc@freeuk.com to comp.lang.c on Fri Mar 13 19:36:41 2026

From Newsgroup: comp.lang.c

On 13/03/2026 17:44, Bonita Montero wrote:

Am 13.03.2026 um 18:32 schrieb Bart:

This stuff is child's play with any scripting language. The code will
also be cleaner and simpler.

Also the C programms here that do similar things. But if your require-
ment is systems programming or performance you won't chose a scripting language.
You're really narrow-minded if you see it so superficially.

Here you're trying to write C++ as though it was a scripting language,
by utilising its many bundled libraries, but it fails badly.

No, the code works correctly according to the requirements.

There is too much excruciating detail that you still have to write.
Even the easy bit, printing the sorted list, is a mess of punctuation.

You're focussed on details without seeing the concept.

We're not interested in those details! The task is what's important.

While your compare function:

[]( const name_tel &left, const name_tel &right ) { return
left.name < right.name; }
would be '{l,r: l.name < r.name}'

Yes, with one percent of the performance.

It's not that bad. An example of sorting millions of such records by
name, was 13x slower with my scripting language than g++-03.

That's with the sort function itself written in that language. With
CPython where it is built-in, it was only 3x as slow. This is before any acceleration is applied.

There are also languages that are statically typed but have cleaner
syntax with much less boilerplate, and compile to native code.

Obviously that is a far more preferable direction to go in.

BTW that C++ test, based on your small example, took 3 seconds to
compile. It must be pulling in a huge amount of stuff.

--- Synchronet 3.21d-Linux NewsLink 1.2

From DFS@nospam@dfs.com to comp.lang.c on Fri Mar 13 16:11:39 2026

From Newsgroup: comp.lang.c

On 3/12/2026 8:27 PM, Bonita Montero wrote:

I shortened my code a bit.
Do that:

#include <iostream>
#include <regex>
#include <fstream>
#include <vector>
#include <algorithm>

using namespace std;

int main( int argc, char **argv )
{
    if( argc < 2 )
        return EXIT_FAILURE;
    ifstream ifs( argv[1] );
    static regex rxNameTel( R"~(^\s*"([^"]*)"\s*"([^"]*)"\s*$)~" );
    struct name_tel { string name, tel; };
    vector<name_tel> phoneList;
    smatch sm;
    for( string line; getline( ifs, line ); )
        if( regex_match( line, sm, rxNameTel ) )
            phoneList.emplace_back( sm[1].str(), sm[2].str() );
    sort( phoneList.begin(), phoneList.end(),
        []( name_tel &left, name_tel &right ) { return left.name < right.name; } );
    for( name_tel &phone : phoneList )
        cout << "\"" << phone.name << "\"\t\"" << phone.tel << "\"" <<
endl;
}

Have a closer look at the regex.

Got a massive set of compile errors:

$ g++ montero-sort2.cpp -o montero
In file included from /usr/include/x86_64-linux-gnu/c++/11/bits/c++allocator.h:33,
from /usr/include/c++/11/bits/allocator.h:46,
from /usr/include/c++/11/string:41,
from /usr/include/c++/11/bits/locale_classes.h:40,
from /usr/include/c++/11/bits/ios_base.h:41,
from /usr/include/c++/11/ios:42,
from /usr/include/c++/11/ostream:38,
from /usr/include/c++/11/iostream:39,
from montero-sort2.cpp:1: /usr/include/c++/11/ext/new_allocator.h: In instantiation of ‘void __gnu_cxx::new_allocator<_Tp>::construct(_Up*, _Args&& ...) [with _Up = main(int, char**)::name_tel; _Args = {std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> >}; _Tp = main(int, char**)::name_tel]’: /usr/include/c++/11/bits/alloc_traits.h:516:17: required from ‘static
void std::allocator_traits<std::allocator<_CharT>

::construct(std::allocator_traits<std::allocator<_CharT>
::allocator_type&, _Up*, _Args&& ...) [with _Up = main(int, char**)::name_tel; _Args = {std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>,

std::allocator<char> >}; _Tp = main(int, char**)::name_tel; std::allocator_traits<std::allocator<_CharT> >::allocator_type = std::allocator<main(int, char**)::name_tel>]’ /usr/include/c++/11/bits/vector.tcc:115:30: required from ‘std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::emplace_back(_Args&& ...) [with _Args = {std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >}; _Tp = main(int, char**)::name_tel; _Alloc = std::allocator<main(int, char**)::name_tel>; std::vector<_Tp, _Alloc>::reference = main(int, char**)::name_tel&]’ montero-sort2.cpp:20:35: required from here /usr/include/c++/11/ext/new_allocator.h:162:11: error: no matching
function for call to ‘main(int, char**)::name_tel::name_tel(std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char>)’
162 | { ::new((void *)__p) _Up(std::forward<_Args>(__args)...); }
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ montero-sort2.cpp:15:16: note: candidate: ‘main(int, char**)::name_tel::name_tel()’
15 | struct name_tel { string name, tel; };
| ^~~~~~~~
montero-sort2.cpp:15:16: note: candidate expects 0 arguments, 2 provided montero-sort2.cpp:15:16: note: candidate: ‘main(int, char**)::name_tel::name_tel(const main(int, char**)::name_tel&)’ montero-sort2.cpp:15:16: note: candidate expects 1 argument, 2 provided montero-sort2.cpp:15:16: note: candidate: ‘main(int, char**)::name_tel::name_tel(main(int, char**)::name_tel&&)’ montero-sort2.cpp:15:16: note: candidate expects 1 argument, 2 provided

--- Synchronet 3.21d-Linux NewsLink 1.2

From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Fri Mar 13 18:47:56 2026

From Newsgroup: comp.lang.c

DFS <nospam@dfs.com> writes:
...

Question: wordlist was malloced with a size of 0:

wordlist = malloc(wordcount * sizeof(const char *));

Why are you allowed to malloc size 0?

Because, in some contexts, it's convenient to allow a mixture of 0-sized
and non-zero sized allocations, depending upon the value of a variable,
so some pre-standard versions of malloc() supported malloc(0) returning
a unique pointer to memory that could not be accessed. The uniqueness of
the pointer allowed the value of the pointer to be used as an identifier
for the thing that might or might not have been allocated.
This was a sufficiently common feature that the C committee decided to
allow it, but sufficiently rare that the C committee decided not to
mandate it.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c,comp.lang.c++ on Sat Mar 14 06:01:53 2026

From Newsgroup: comp.lang.c

Am 13.03.2026 um 21:11 schrieb DFS:

On 3/12/2026 8:27 PM, Bonita Montero wrote:

I shortened my code a bit.
Do that:

#include <iostream>
#include <regex>
#include <fstream>
#include <vector>
#include <algorithm>

using namespace std;

int main( int argc, char **argv )
{
     if( argc < 2 )
         return EXIT_FAILURE;
     ifstream ifs( argv[1] );
     static regex rxNameTel( R"~(^\s*"([^"]*)"\s*"([^"]*)"\s*$)~" );
     struct name_tel { string name, tel; };
     vector<name_tel> phoneList;
     smatch sm;
     for( string line; getline( ifs, line ); )
         if( regex_match( line, sm, rxNameTel ) )
             phoneList.emplace_back( sm[1].str(), sm[2].str() ); >>      sort( phoneList.begin(), phoneList.end(),
         []( name_tel &left, name_tel &right ) { return left.name < >> right.name; } );
     for( name_tel &phone : phoneList )
         cout << "\"" << phone.name << "\"\t\"" << phone.tel << "\""
<< endl;
}

Have a closer look at the regex.

Got a massive set of compile errors:

= -std=c++20

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Sat Mar 14 06:03:40 2026

From Newsgroup: comp.lang.c

Am 13.03.2026 um 20:36 schrieb Bart:

BTW that C++ test, based on your small example, took 3 seconds
to compile. It must be pulling in a huge amount of stuff.

With C++20 modules that's much faster, but I don't use them since
I've got a fast computer.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Sat Mar 14 06:48:55 2026

From Newsgroup: comp.lang.c

Am 13.03.2026 um 02:22 schrieb DFS:

On 3/12/2026 11:48 AM, Bonita Montero wrote:

Am 12.03.2026 um 16:22 schrieb DFS:

Give me the file you used.

Better take this:

That's 50 in and 50 out.

So what's the RegEx for?

For each line:
1. Skip as many whitespace as possible.
2. Match '"'.
3. Read name until a '"' comes.
4. Match '"'.
5. Skip as many whitespace as possible.
6. Repeat step 1 to 5 for the telephone number.
7. Match line end.
7. Store name and telephone number in a list.
8. Sort the list according to the name.
9. Print each entry.

--- Synchronet 3.21d-Linux NewsLink 1.2

From DFS@nospam@dfs.com to comp.lang.c on Sat Mar 14 01:49:09 2026

From Newsgroup: comp.lang.c

On 3/14/2026 1:01 AM, Bonita Montero wrote:

Am 13.03.2026 um 21:11 schrieb DFS:

On 3/12/2026 8:27 PM, Bonita Montero wrote:

I shortened my code a bit.
Do that:

#include <iostream>
#include <regex>
#include <fstream>
#include <vector>
#include <algorithm>

using namespace std;

int main( int argc, char **argv )
{
     if( argc < 2 )
         return EXIT_FAILURE;
     ifstream ifs( argv[1] );
     static regex rxNameTel( R"~(^\s*"([^"]*)"\s*"([^"]*)"\s*$)~" ); >>>      struct name_tel { string name, tel; };
     vector<name_tel> phoneList;
     smatch sm;
     for( string line; getline( ifs, line ); )
         if( regex_match( line, sm, rxNameTel ) )
             phoneList.emplace_back( sm[1].str(), sm[2].str() );
     sort( phoneList.begin(), phoneList.end(),
         []( name_tel &left, name_tel &right ) { return left.name <
right.name; } );
     for( name_tel &phone : phoneList )
         cout << "\"" << phone.name << "\"\t\"" << phone.tel << "\""
<< endl;
}

Have a closer look at the regex.

Got a massive set of compile errors:

= -std=c++20

Got it.

Your output is messy. What you want to do is iterate the data and find
the longest name, then pad spaces after the name so the phone numbers
line up.

See at the bottom how it should look.

===========================================================
// read in a Montero file of names and phone numbers
// trim spaces, align data, sort, print

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <wchar.h> //to handle multibyte characters

//compare function for qsort
int compare(const void *a, const void *b) {
return strcmp(*(const char**)a, *(const char**)b);
}

//replace character in a string
void replace(char *str, char from, char to) {
int i=0;
while(str[i] != '\0'){
if(str[i] == from){
str[i] = to;
} i++;
}
}

//remove leading and trailing spaces
void trim(char *str) {
int i=0, j=0, start=0, end=strlen(str)-1;
for (i = 0; i < strlen(str); i++) {if (isspace(str[i])) {start++;} else {break;}}
for (i = strlen(str) - 1; i > 0; i--) {if (isspace(str[i])) {end--; } else {break;}}
for (i = start; i <= end; i++) {str[j++] = str[i];}
str[j] = '\0';
}

//align name and phone, and handle multibyte characters like ü
char setspacing(char *str, int longname)
{
char *token;
char name[50] = {0};
char spaces[] = " ";
token = strtok(str," ");
int i = 0;
while(token != NULL && i < 3) {
strncat(name, token, strlen(token));
if (i==0) {strncat(name, spaces, 1);}
int namesize = strlen(name);
int mbsize = mbstowcs(NULL, name, 0);
if (mbsize < 0) {namesize += mbsize;}
if (i==1) {strncat(name, spaces, longname - namesize + 3);}
token = strtok(NULL, " ");
i++;
}

strncpy(str,name,strlen(name));
str[strlen(str)] = '\0';
}

int main(int argc, char *argv[]) {
int i = 0;
int names = 0;
char name[100] = "";
int longname = 0;

//count names, and determine longest name
FILE *fin = fopen(argv[1],"r");
while (fgets(name,sizeof name, fin) != NULL) {
names++;
trim(name);
for (i = 1; i < strlen(name); i++) {
if (name[i] == '"') {
if (i > longname) {
longname = i;
}
break;
}
}
}

//add names to array, also trim spaces, remove
//quote marks and align name/phone
rewind(fin);
i = 0;
char **namelist = malloc(sizeof(char*) * names);
while (fgets(name,sizeof name, fin) != NULL) {
trim(name);
replace(name,'"',' ');
char *dupename = strdup(name);
setspacing(dupename, longname);
int datalen = strlen(dupename);
namelist[i] = malloc(datalen + 1);
strncpy(namelist[i], dupename, datalen);
i++;
}
fclose(fin);

//sort and print names
qsort(namelist, names, sizeof(char*), compare);
for (i = 0; i < names; i++) {
printf("%2d. %s\n",i+1, namelist[i]);
}

printf("\n");
free(namelist);
return 0;
}
===========================================================

88 LOC
16.84MB executable
perfect output

$ ./dfs names-numbers-unsorted-montero.txt
1. Anna Becker 0170-2233445
2. Anna Fischer 0341-9988776
3. Anna Müller 0987-6543210
4. Ben Meier 0341-5566778
5. Ben Richter 069-3344556
6. Clara Hofmann 0157-2233445
7. Clara Zimmermann 040-55667788
8. David Schulz 030-9988776
9. Emma Bauer 0157-9988776
10. Emma Wolf 040-5566778
11. Felix Hoffmann 0711-5566778
12. Felix Neumann 0170-2233445
13. Hannah Wagner 0221-5566778
14. Jan Hoffmann 0711-3344556
15. Jana Zimmer 030-6677889
16. Jonas Klein 0171-4455667
17. Jonas Klein 030-4455667
18. Julia Neumann 0221-3344556
19. Julia Schulz 0228-4455667
20. Laura Fischer 040-98765433
21. Laura Schulze 0228-1122334
22. Lea Richter 0341-1122334
23. Lea Wagner 0151-3344556
24. Lena Fischer 0151-6677889
25. Leon Krause 089-1122334
26. Leon Zimmermann 0711-1122445
27. Leonie Klein 0341-5566778
28. Lukas Hofmann 089-6677889
29. Lukas König 069-7788990
30. Marie Becker 040-4455667
31. Marie Richter 0221-7788990
32. Max Mustermann 0123-4567890
33. Maximilian Keller 030-1122334
34. Mia Keller 089-6677889
35. Michael Braun 0170-9988776
36. Moritz Wolf 0157-4455667
37. Nina Krause 0341-4455667
38. Paul Schäfer 0228-3344556
39. Paul Wolf 030-2233445
40. Peter Schmidt 030-1234567
41. Philipp König 089-9988776
42. Sarah Braun 040-7788990
43. Sarah Lehmann 040-2233445
44. Simon Meier 030-7788990
45. Sophie Neumann 0711-3344556
46. Sophie Wagner 089-2233445
47. Tim Becker 0151-1112223
48. Tim Richter 030-6677889
49. Tim Schäfer 0711-5566778
50. Tom Bauer 0171-1122334
--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c,comp.lang.c++ on Sat Mar 14 07:23:45 2026

From Newsgroup: comp.lang.c

Am 14.03.2026 um 06:49 schrieb DFS:

Your output is messy. What you want to do is iterate the data and find
the longest name, then pad spaces after the name so the phone numbers
line up.

No, alphabetically ordered. My code exactly does that.

88 LOC
16.84MB executable
perfect output

$ ./dfs names-numbers-unsorted-montero.txt
1. Anna Becker 0170-2233445
...

My output is the same without line numbers:

"Anna Becker" "0170-2233445"
"Anna Fischer" "0341-9988776"
"Anna Müller" "0987-6543210"
"Ben Meier" "0341-5566778"
"Ben Richter" "069-3344556"
"Clara Hofmann" "0157-2233445"
"Clara Zimmermann" "040-5566778"
"David Schulz" "030-9988776"
"Emma Bauer" "0157-9988776"
"Emma Wolf" "040-5566778"
"Felix Hoffmann" "0711-5566778"
"Felix Neumann" "0170-2233445"
"Hannah Wagner" "0221-5566778"
"Jan Hoffmann" "0711-3344556"
"Jana Zimmer" "030-6677889"
"Jonas Klein" "0171-4455667"
"Jonas Klein" "030-4455667"
"Julia Neumann" "0221-3344556"
"Julia Schulz" "0228-4455667"
"Laura Fischer" "040-9876543"
"Laura Schulze" "0228-1122334"
"Lea Richter" "0341-1122334"
"Lea Wagner" "0151-3344556"
"Lena Fischer" "0151-6677889"
"Leon Krause" "089-1122334"
"Leon Zimmermann" "0711-1122445"
"Leonie Klein" "0341-5566778"
"Lukas Hofmann" "089-6677889"
"Lukas König" "069-7788990"
"Marie Becker" "040-4455667"
"Marie Richter" "0221-7788990"
"Max Mustermann" "0123-4567890"
"Maximilian Keller" "030-1122334"
"Mia Keller" "089-6677889"
"Michael Braun" "0170-9988776"
"Moritz Wolf" "0157-4455667"
"Nina Krause" "0341-4455667"
"Paul Schäfer" "0228-3344556"
"Paul Wolf" "030-2233445"
"Peter Schmidt" "030-1234567"
"Philipp König" "089-9988776"
"Sarah Braun" "040-7788990"
"Sarah Lehmann" "040-2233445"
"Simon Meier" "030-7788990"
"Sophie Neumann" "0711-3344556"
"Sophie Wagner" "089-2233445"
"Tim Becker" "0151-1112223"
"Tim Richter" "030-6677889"
"Tim Schäfer" "0711-5566778"
"Tom Bauer" "0171-1122334"

But your code is 101 lines of code, my 25. C really sucks.
--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Sat Mar 14 07:52:19 2026

From Newsgroup: comp.lang.c

Now my output is exactly the same as yours with 38 lines instead of 101.

#include <iostream>
#include <regex>
#include <fstream>
#include <vector>
#include <algorithm>
#include <sstream>
#include <iomanip>

using namespace std;

int main( int argc, char **argv )
{
if( argc < 2 )
return EXIT_FAILURE;
ifstream ifs( argv[1] );
static regex rxNameTel( R"~(^\s*"([^"]*)"\s*"([^"]*)"\s*$)~" );
struct name_tel { string name, tel; };
vector<name_tel> phoneList;
smatch sm;
for( string line; !ifs.eof(); )
if( getline( ifs, line ) && regex_match( line, sm, rxNameTel ) )
phoneList.emplace_back( sm[1].str(), sm[2].str() );
if( !phoneList.size() )
return EXIT_FAILURE;
sort( phoneList.begin(), phoneList.end(),
[]( name_tel &left, name_tel &right ) { return left.name < right.name;
} );
size_t
maxName = max_element( phoneList.begin(), phoneList.end(),
[]( name_tel &a, name_tel &b ) { return a.name.length() <
b.name.length(); } )->name.length(),
maxLineNo = (ostringstream() << phoneList.size()).rdbuf()->view().size();
string totalNamePad( maxName, ' ' );
size_t iLine = 1;
for( name_tel &phone : phoneList )
{
auto namePad = string_view( totalNamePad.begin(), totalNamePad.begin()
+ (maxName - phone.name.length()) );
cout << setw( maxLineNo ) << right << iLine++ << ". " << setw( maxName
) << left << phone.name << " " << phone.tel << endl;
}
}
--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Sat Mar 14 07:53:16 2026

From Newsgroup: comp.lang.c

Eh, 35 lines. I forget to remove namePad.

#include <iostream>
#include <regex>
#include <fstream>
#include <vector>
#include <algorithm>
#include <sstream>
#include <iomanip>

using namespace std;

int main( int argc, char **argv )
{
if( argc < 2 )
return EXIT_FAILURE;
ifstream ifs( argv[1] );
static regex rxNameTel( R"~(^\s*"([^"]*)"\s*"([^"]*)"\s*$)~" );
struct name_tel { string name, tel; };
vector<name_tel> phoneList;
smatch sm;
for( string line; !ifs.eof(); )
if( getline( ifs, line ) && regex_match( line, sm, rxNameTel ) )
phoneList.emplace_back( sm[1].str(), sm[2].str() );
if( !phoneList.size() )
return EXIT_FAILURE;
sort( phoneList.begin(), phoneList.end(),
[]( name_tel &left, name_tel &right ) { return left.name < right.name;
} );
size_t
maxName = max_element( phoneList.begin(), phoneList.end(),
[]( name_tel &a, name_tel &b ) { return a.name.length() <
b.name.length(); } )->name.length(),
maxLineNo = (ostringstream() << phoneList.size()).rdbuf()->view().size();
string totalNamePad( maxName, ' ' );
size_t iLine = 1;
for( name_tel &phone : phoneList )
cout << setw( maxLineNo ) << right << iLine++ << ". " << setw( maxName
) << left << phone.name << " " << phone.tel << endl;
}
--- Synchronet 3.21d-Linux NewsLink 1.2

From DFS@nospam@dfs.com to comp.lang.c on Sat Mar 14 02:58:31 2026

From Newsgroup: comp.lang.c

On 3/14/2026 2:23 AM, Bonita Montero wrote:

Am 14.03.2026 um 06:49 schrieb DFS:

Your output is messy. What you want to do is iterate the data and
find the longest name, then pad spaces after the name so the phone
numbers line up.

No, alphabetically ordered. My code exactly does that.

88 LOC
16.84MB executable
perfect output

$ ./dfs names-numbers-unsorted-montero.txt
  1. Anna Becker          0170-2233445
...

My output is the same without line numbers:

What? Your output is FUBAR. You don't give anyone a list looking like
this:
> "Anna Becker"    "0170-2233445"

"Anna Fischer"    "0341-9988776"
"Anna Müller"    "0987-6543210"
"Ben Meier"    "0341-5566778"
"Ben Richter"    "069-3344556"
"Clara Hofmann"    "0157-2233445"
"Clara Zimmermann"    "040-5566778"
"Tom Bauer"    "0171-1122334"

....
> But your code is 101 lines of code, my 25. C really sucks.

88 LOC, and C is great.

Did you notice your 25 lines of C++ turns into a ridiculous 665MB
executable. wtf? 2/3 of a GB for that little bit of processing?
Something's wrong with that language.

But C and C++ are the wrong languages for text processing and formatting anyway. You want python (or maybe perl, or other text processing
languages).

-------------------------------------------------------------------
import sys
names = []
with open(sys.argv[1],'r') as f:
    for line in f:
if line != '\n':
line = line.replace('"','')
names.append(line.strip())
for j, name in enumerate(sorted(names)):
    e = [i.strip() for i in name.split(' ') if i]
    print("%2d. %-20s %s" % (j+1, e[0] + ' ' + e[1],e[2])) -------------------------------------------------------------------

$ python3 sortnames.py names-numbers-unsorted.txt
1. Anna Becker 0170-2233445
2. Anna Fischer 0341-9988776
3. Anna Müller 0987-6543210
4. Ben Meier 0341-5566778
5. Ben Richter 069-3344556
6. Clara Hofmann 0157-2233445
7. Clara Zimmermann 040-5566778
8. David Schulz 030-9988776
9. Emma Bauer 0157-9988776
10. Emma Wolf 040-5566778
11. Felix Hoffmann 0711-5566778
12. Felix Neumann 0170-2233445
13. Hannah Wagner 0221-5566778
14. Jan Hoffmann 0711-3344556
15. Jana Zimmer 030-6677889
16. Jonas Klein 030-4455667
17. Jonas Klein 0171-4455667
18. Julia Neumann 0221-3344556
19. Julia Schulz 0228-4455667
20. Laura Fischer 040-9876543
21. Laura Schulze 0228-1122334
22. Lea Richter 0341-1122334
23. Lea Wagner 0151-3344556
24. Lena Fischer 0151-6677889
25. Leon Krause 089-1122334
26. Leon Zimmermann 0711-1122445
27. Leonie Klein 0341-5566778
28. Lukas Hofmann 089-6677889
29. Lukas König 069-7788990
30. Marie Becker 040-4455667
31. Marie Richter 0221-7788990
32. Max Mustermann 0123-4567890
33. Maximilian Keller 030-1122334
34. Mia Keller 089-6677889
35. Michael Braun 0170-9988776
36. Moritz Wolf 0157-4455667
37. Nina Krause 0341-4455667
38. Paul Schäfer 0228-3344556
39. Paul Wolf 030-2233445
40. Peter Schmidt 030-1234567
41. Philipp König 089-9988776
42. Sarah Braun 040-7788990
43. Sarah Lehmann 040-2233445
44. Simon Meier 030-7788990
45. Sophie Neumann 0711-3344556
46. Sophie Wagner 089-2233445
47. Tim Becker 0151-1112223
48. Tim Richter 030-6677889
49. Tim Schäfer 0711-5566778
50. Tom Bauer 0171-1122334

--- Synchronet 3.21d-Linux NewsLink 1.2

From DFS@nospam@dfs.com to comp.lang.c on Sat Mar 14 03:05:07 2026

From Newsgroup: comp.lang.c

On 3/14/2026 2:53 AM, Bonita Montero wrote:

Eh, 35 lines. I forget to remove namePad.

#include <iostream>
#include <regex>
#include <fstream>
#include <vector>
#include <algorithm>
#include <sstream>
#include <iomanip>

using namespace std;

int main( int argc, char **argv )
{
    if( argc < 2 )
        return EXIT_FAILURE;
    ifstream ifs( argv[1] );
    static regex rxNameTel( R"~(^\s*"([^"]*)"\s*"([^"]*)"\s*$)~" );
    struct name_tel { string name, tel; };
    vector<name_tel> phoneList;
    smatch sm;
    for( string line; !ifs.eof(); )
        if( getline( ifs, line ) && regex_match( line, sm, rxNameTel ) )
            phoneList.emplace_back( sm[1].str(), sm[2].str() );
    if( !phoneList.size() )
        return EXIT_FAILURE;
    sort( phoneList.begin(), phoneList.end(),
        []( name_tel &left, name_tel &right ) { return left.name < right.name; } );
    size_t
        maxName = max_element( phoneList.begin(), phoneList.end(),
            []( name_tel &a, name_tel &b ) { return a.name.length() <
b.name.length(); } )->name.length(),
        maxLineNo = (ostringstream() << phoneList.size()).rdbuf()->view().size();
    string totalNamePad( maxName, ' ' );
    size_t iLine = 1;
    for( name_tel &phone : phoneList )
        cout << setw( maxLineNo ) << right << iLine++ << ". " << setw(
maxName ) << left << phone.name << " " << phone.tel << endl;
}

You didn't handle multibyte characters like ä and ö and ü, so the phone numbers aren't lined up.

1. Anna Becker 0170-2233445
2. Anna Fischer 0341-9988776
3. Anna Müller 0987-6543210
4. Ben Meier 0341-5566778
5. Ben Richter 069-3344556
6. Clara Hofmann 0157-2233445
7. Clara Zimmermann 040-5566778
8. David Schulz 030-9988776
9. Emma Bauer 0157-9988776
10. Emma Wolf 040-5566778
11. Felix Hoffmann 0711-5566778
12. Felix Neumann 0170-2233445
13. Hannah Wagner 0221-5566778
14. Jan Hoffmann 0711-3344556
15. Jana Zimmer 030-6677889
16. Jonas Klein 030-4455667
17. Jonas Klein 0171-4455667
18. Julia Neumann 0221-3344556
19. Julia Schulz 0228-4455667
20. Laura Fischer 040-9876543
21. Laura Schulze 0228-1122334
22. Lea Richter 0341-1122334
23. Lea Wagner 0151-3344556
24. Lena Fischer 0151-6677889
25. Leon Krause 089-1122334
26. Leon Zimmermann 0711-1122445
27. Leonie Klein 0341-5566778
28. Lukas Hofmann 089-6677889
29. Lukas König 069-7788990
30. Marie Becker 040-4455667
31. Marie Richter 0221-7788990
32. Max Mustermann 0123-4567890
33. Maximilian Keller 030-1122334
34. Mia Keller 089-6677889
35. Michael Braun 0170-9988776
36. Moritz Wolf 0157-4455667
37. Nina Krause 0341-4455667
38. Paul Schäfer 0228-3344556
39. Paul Wolf 030-2233445
40. Peter Schmidt 030-1234567
41. Philipp König 089-9988776
42. Sarah Braun 040-7788990
43. Sarah Lehmann 040-2233445
44. Simon Meier 030-7788990
45. Sophie Neumann 0711-3344556
46. Sophie Wagner 089-2233445
47. Tim Becker 0151-1112223
48. Tim Richter 030-6677889
49. Tim Schäfer 0711-5566778
50. Tom Bauer 0171-1122334

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Sat Mar 14 08:10:58 2026

From Newsgroup: comp.lang.c

Am 14.03.2026 um 08:05 schrieb DFS:

You didn't handle multibyte characters like ä and ö and ü, so the phone numbers aren't lined up.

No, my output is correct. Don't trust the console,
print everyhting into a file with "xxx yyy > filename".

--- Synchronet 3.21d-Linux NewsLink 1.2

From DFS@nospam@dfs.com to comp.lang.c on Sat Mar 14 03:17:59 2026

From Newsgroup: comp.lang.c

On 3/14/2026 3:10 AM, Bonita Montero wrote:

Am 14.03.2026 um 08:05 schrieb DFS:

You didn't handle multibyte characters like ä and ö and ü, so the
phone numbers aren't lined up.

No, my output is correct. Don't trust the console,
print everyhting into a file with "xxx yyy > filename".

Your output is most definitely NOT correct.

Mine is, though. See the setspacing() function to see how I did it.

--- Synchronet 3.21d-Linux NewsLink 1.2

From DFS@nospam@dfs.com to comp.lang.c on Sat Mar 14 03:29:50 2026

From Newsgroup: comp.lang.c

On 3/14/2026 1:48 AM, Bonita Montero wrote:

Am 13.03.2026 um 02:22 schrieb DFS:

On 3/12/2026 11:48 AM, Bonita Montero wrote:

Am 12.03.2026 um 16:22 schrieb DFS:

Give me the file you used.

Better take this:

That's 50 in and 50 out.

So what's the RegEx for?

For each line:
1. Skip as many whitespace as possible.
2. Match '"'.
3. Read name until a '"' comes.
4. Match '"'.
5. Skip as many whitespace as possible.
6. Repeat step 1 to 5 for the telephone number.
7. Match line end.
7. Store name and telephone number in a list.
8. Sort the list according to the name.
9. Print each entry.

Thanks.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Sat Mar 14 08:59:11 2026

From Newsgroup: comp.lang.c

Am 14.03.2026 um 08:17 schrieb DFS:

Your output is most definitely NOT correct.

The problem is that setw( xxx ) on cout doesn't support UTF-8 strings
with Visual C++ even if I imbue cout to "de_DE.utf8". I'm looking for
a solution without counting the characters myself.
--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Sat Mar 14 09:12:59 2026

From Newsgroup: comp.lang.c

Am 14.03.2026 um 08:17 schrieb DFS:

On 3/14/2026 3:10 AM, Bonita Montero wrote:

Am 14.03.2026 um 08:05 schrieb DFS:

You didn't handle multibyte characters like ä and ö and ü, so the
phone numbers aren't lined up.

No, my output is correct. Don't trust the console,
print everyhting into a file with "xxx yyy > filename".

Your output is most definitely NOT correct.

Mine is, though. See the setspacing() function to see how I did it.

Now it works without imbuing cout to de_DE.utf8:

#include <iostream>
#include <regex>
#include <fstream>
#include <vector>
#include <algorithm>
#include <sstream>
#include <iomanip>
#include <locale>

using namespace std;

int main( int argc, char **argv )
{
if( argc < 2 )
return EXIT_FAILURE;
ifstream ifs( argv[1] );
static regex rxNameTel( R"~(^\s*"([^"]*)"\s*"([^"]*)"\s*$)~" );
struct name_tel { string name, tel; };
vector<name_tel> phoneList;
smatch sm;
for( string line; !ifs.eof(); )
if( getline( ifs, line ) && regex_match( line, sm, rxNameTel ) )
phoneList.emplace_back( sm[1].str(), sm[2].str() );
if( !phoneList.size() )
return EXIT_FAILURE;
sort( phoneList.begin(), phoneList.end(),
[]( name_tel &left, name_tel &right ) { return left.name < right.name;
} );
static auto length = []( string_view sv ) { return count_if( sv.begin(), sv.end(), []( char c ) { return (c & 0xC0) != 0x80; } ); };
static auto extents = []( string_view sv ) { return count_if( sv.begin(), sv.end(), []( char c ) { return (c & 0xC0) == 0x80; } ); };
size_t
maxName = length( max_element( phoneList.begin(), phoneList.end(),
[&]( name_tel &a, name_tel &b ) { return length( a.name ) < length(
b.name ); } )->name ),
maxLineNo = (ostringstream() << phoneList.size()).rdbuf()->view().size();
size_t iLine = 1;
for( name_tel &phone : phoneList )
cout << setw( maxLineNo ) << right << iLine++ << ". " << setw( maxName
+ extents( phone.name ) ) << left << phone.name << " " << phone.tel << endl;
}
--- Synchronet 3.21d-Linux NewsLink 1.2

From Bart@bc@freeuk.com to comp.lang.c on Sat Mar 14 12:15:20 2026

From Newsgroup: comp.lang.c

On 14/03/2026 06:53, Bonita Montero wrote:

Eh, 35 lines. I forget to remove namePad.

#include <iostream>
#include <regex>
#include <fstream>
#include <vector>
#include <algorithm>
#include <sstream>
#include <iomanip>

using namespace std;

int main( int argc, char **argv )
{
    if( argc < 2 )
        return EXIT_FAILURE;
    ifstream ifs( argv[1] );
    static regex rxNameTel( R"~(^\s*"([^"]*)"\s*"([^"]*)"\s*$)~" );
    struct name_tel { string name, tel; };
    vector<name_tel> phoneList;
    smatch sm;
    for( string line; !ifs.eof(); )
        if( getline( ifs, line ) && regex_match( line, sm, rxNameTel ) )
            phoneList.emplace_back( sm[1].str(), sm[2].str() );
    if( !phoneList.size() )
        return EXIT_FAILURE;
    sort( phoneList.begin(), phoneList.end(),
        []( name_tel &left, name_tel &right ) { return left.name < right.name; } );
    size_t
        maxName = max_element( phoneList.begin(), phoneList.end(),
            []( name_tel &a, name_tel &b ) { return a.name.length() <
b.name.length(); } )->name.length(),
        maxLineNo = (ostringstream() << phoneList.size()).rdbuf()-

view().size();

    string totalNamePad( maxName, ' ' );
    size_t iLine = 1;
    for( name_tel &phone : phoneList )
        cout << setw( maxLineNo ) << right << iLine++ << ". " << setw( maxName ) << left << phone.name << " " << phone.tel << endl;
}

I got it down to 32 lines (you left in 2 blank lines, and there was one
{ on its own line).

Doing the same to DFS's 88 line solution (that is, 88 lines without
comments), it was down to around 80 lines.

In terms of file size, it's about 1.9:1. Both use spaced indents. If I
get rid of leading white space (and some trailing white space in DFS
version), then difference is 1.7:1.

However, even at 1.9:1, the readability of DFS version is far superior.
It's also easier to maintain, and to port.

Your C++ always looks like total gobbledygook. It's also impossible to
port, even to C++ (eg. to older versions).

As for binary sizes, those aren't so interesting: using -Os -s:

BM: 87KB
DFS: 50KB

Compilation time however was very different:

BM g++-Os -s 4.9 seconds, or 6.5 lps
DFS gcc-Os -s 0.3 seconds, or 270 lps

Both are poor frankly, but the C++ was still significantly slower.

When I don't care about size or performance and need a quick build-cycle:

BM g++ -s 4.0 seconds (producing a 280KB file)
DFS tcc 0.05 seconds (producing a 5KB file)

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Sat Mar 14 14:00:00 2026

From Newsgroup: comp.lang.c

Am 14.03.2026 um 13:15 schrieb Bart:

In terms of file size, it's about 1.9:1. Both use spaced indents. If I
get rid of leading white space (and some trailing white space in DFS version), then difference is 1.7:1.

Ridiculous, a rough estimate of the relation is sufficient.

Your C++ always looks like total gobbledygook.

If you can handle C++ it's easier.

As for binary sizes, those aren't so interesting: using -Os -s:
BM:    87KB
DFS:   50KB

My PC has 128GiB of memory.

BM g++-Os -s    4.9 seconds, or 6.5 lps
DFS gcc-Os -s    0.3 seconds, or 270 lps

This is worth the less development times.

Both are poor frankly, but the C++ was still significantly slower.

It's for sure as slow as most of the time is taken by the kernel
flushes to cout / stdout.
--- Synchronet 3.21d-Linux NewsLink 1.2

From Bart@bc@freeuk.com to comp.lang.c on Sat Mar 14 16:22:47 2026

From Newsgroup: comp.lang.c

On 14/03/2026 06:53, Bonita Montero wrote:

Eh, 35 lines. I forget to remove namePad.

#include <iostream>
#include <regex>
#include <fstream>
#include <vector>
#include <algorithm>
#include <sstream>
#include <iomanip>

using namespace std;

int main( int argc, char **argv )
{
    if( argc < 2 )
        return EXIT_FAILURE;
    ifstream ifs( argv[1] );
    static regex rxNameTel( R"~(^\s*"([^"]*)"\s*"([^"]*)"\s*$)~" );
    struct name_tel { string name, tel; };
    vector<name_tel> phoneList;
    smatch sm;
    for( string line; !ifs.eof(); )
        if( getline( ifs, line ) && regex_match( line, sm, rxNameTel ) )
            phoneList.emplace_back( sm[1].str(), sm[2].str() );
    if( !phoneList.size() )
        return EXIT_FAILURE;
    sort( phoneList.begin(), phoneList.end(),
        []( name_tel &left, name_tel &right ) { return left.name < right.name; } );
    size_t
        maxName = max_element( phoneList.begin(), phoneList.end(),
            []( name_tel &a, name_tel &b ) { return a.name.length() <
b.name.length(); } )->name.length(),
        maxLineNo = (ostringstream() << phoneList.size()).rdbuf()-

view().size();

    string totalNamePad( maxName, ' ' );
    size_t iLine = 1;

    for( name_tel &phone : phoneList )
        cout << setw( maxLineNo ) << right << iLine++ << ". " << setw( maxName ) << left << phone.name << " " << phone.tel << endl;

In C it would be:

printf("%*d. %-*s %s\n", maxLineNo, iLine++, maxName, phone.name, phone.tel);

(I assume left/right set justification; a weird way of doing it.)

For that matter, this can be done in C++ too, but you chose the most long-winded syntax possible.

You seem incapable of using a simple approach when a more complicated
exists! You don't /have/ to use every possible feature you know.
--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Sat Mar 14 18:04:08 2026

From Newsgroup: comp.lang.c

Am 14.03.2026 um 17:22 schrieb Bart:

cout << setw( maxLineNo ) << right << iLine++ << ". " <<
setw( maxName ) << left << phone.name << " " << phone.tel << endl;

In C it would be:
printf("%*d. %-*s %s\n", maxLineNo, iLine++, maxName, phone.name, phone.tel);

With dynamic widths according to the maximum length of name and tel ?

For that matter, this can be done in C++ too, but you chose the most long-winded syntax possible.

With additional flexibility as I said, and type-safe;
printf() is sick in that sense.

You seem incapable of using a simple approach when a more complicated exists! You don't /have/ to use every possible feature you know.

My approach is simpler since you didn't check the constraints.
--- Synchronet 3.21d-Linux NewsLink 1.2

From Bart@bc@freeuk.com to comp.lang.c on Sat Mar 14 17:39:11 2026

From Newsgroup: comp.lang.c

On 14/03/2026 17:04, Bonita Montero wrote:

Am 14.03.2026 um 17:22 schrieb Bart:

cout << setw( maxLineNo ) << right << iLine++ << ". " <<
setw( maxName ) << left << phone.name << " " << phone.tel << endl;

In C it would be:
printf("%*d. %-*s %s\n", maxLineNo, iLine++, maxName, phone.name,
phone.tel);

With dynamic widths according to the maximum length of name and tel ?

That's the purpose of those '*' characters; each is replaced by the next
print item. While that '-' sign will left-justify the field.

For that matter, this can be done in C++ too, but you chose the most
long-winded syntax possible.

With additional flexibility as I said, and type-safe;
printf() is sick in that sense.

Yes that's one of my many criticisms of it: you have to get those %d and
%s formats right. But given that, here it does the job.

(In my languages, 'print' is generally simpler and also type-safe. With complex formatting needs, it becomes fiddly too. But it still uses conventional syntax. For this it would be:

println maxlineno:"v", iline++:"*", maxname:"v", phone.name:"*jl", phone.tel

However formatting info can be set up beforehand, so it can also be:

println iline++:lfmt, phone.name:namefmt, phone.tel

(You'd set the 'lfmt' variable instead of 'maxLineNo' for example.)

That 'iline++' would be different, as even a loop like this has an
accessible in index:

for i, phone in phonelist

Does 'for (name_tel &phone : phoneList)' not expose the internal index?)

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Sat Mar 14 19:25:55 2026

From Newsgroup: comp.lang.c

Am 14.03.2026 um 18:39 schrieb Bart:

Yes that's one of my many criticisms of it: you have to get those %d and
%s formats right. But given that, here it does the job.

Maybe some things are shorter in C, but that's mostly not typical, and
the not typesafe C code you've shown is a no-go. The C++ in this case
is more verbose and I think it's much more redable.
Most of the higher productivity in C++ comes from generic components.
Imagine you would write the containers C++ has for every template type yourself; with C++ that's a one-liner. RAII is another productivity
-enhancer and it's much easier to write safe code with that.
You've chosen a small part of my code and you're losing the overall
view. The whole program is shorter at last and doesn't have that much
lowlevel details which are hard to control in larger programs.
C is an outdated language, but it is the choice for kernel-hackers since
they don't fit with abstractions. Rust would be an improvement the Rust integration in the Linux-kernel with several thousand lines. It's just
because the kernel hackers want to flip every bit themselfes, even if
the abstractions in Rust or C++ have mostly no additional cost over a
manual implementation.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sun Mar 15 13:15:56 2026

From Newsgroup: comp.lang.c

scott@slp53.sl.home (Scott Lurndal) writes:

scott@slp53.sl.home (Scott Lurndal) writes:

DFS <nospam@dfs.com> writes:

On 3/13/2026 12:31 AM, DFS wrote:

On 3/12/2026 8:54 PM, Scott Lurndal wrote:

DFS <nospam@dfs.com> writes:

I took out all 7 instances of error trapping, and it throws:
Floating point exception (core dumped)
at line 61 or 62

Put back in just the stat() call

stat(argv[1], &st);

and the program works again.

Yes, you can't willy-nilly remove lines from a program.

Question: wordlist was malloced with a size of 0:

wordlist = malloc(wordcount * sizeof(const char *));

Why are you allowed to malloc size 0?

"If the size of the space requested is 0, the behavior is
implementation-defined: either a null pointer shall be returned,
or the behavior shall be as if the size were some non-zero value,
except that the behavior is undefined if the returned pointer is
used to access an object."

https://pubs.opengroup.org/onlinepubs/9799919799/functions/malloc.html

IIRC, this caveat was added due to differences in the malloc(3) implementations for System V Unix and BSD Unix.

That sounds right, except I might say "put in" rather than "added"
since as best I can determine that allowance was present in the
earliest drafts of the C standard and POSIX discussions.

Note that malloc() is not mentioned in K&R, and apparently was
added to AT&T Unix in Unix V7. The timing on that was about the
same time as the first BSD Unix.
--- Synchronet 3.21d-Linux NewsLink 1.2

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sun Mar 15 14:38:06 2026

From Newsgroup: comp.lang.c

James Kuyper <jameskuyper@alumni.caltech.edu> writes:

DFS <nospam@dfs.com> writes:
...

Question: wordlist was malloced with a size of 0:

wordlist = malloc(wordcount * sizeof(const char *));

Why are you allowed to malloc size 0?

Because, in some contexts, it's convenient to allow a mixture of
0-sized and non-zero sized allocations, depending upon the value of
a variable, so some pre-standard versions of malloc() supported
malloc(0) returning a unique pointer to memory that could not be
accessed. The uniqueness of the pointer allowed the value of the
pointer to be used as an identifier for the thing that might or
might not have been allocated.
This was a sufficiently common feature that the C committee decided
to allow it, but sufficiently rare that the C committee decided not
to mandate it.

The explanation offered by Scott Lurndal is more convincing. The
rule was put in to accommodate both AT&T Unix and BSD Unix. There
is no reason in evidence to suggest there was any evaluation by
the C committee; they allowed both behaviors because the two main
branches of Unix at the time had made different decisions.
--- Synchronet 3.21d-Linux NewsLink 1.2

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Mon Mar 16 15:18:46 2026

From Newsgroup: comp.lang.c

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

scott@slp53.sl.home (Scott Lurndal) writes:

scott@slp53.sl.home (Scott Lurndal) writes:

DFS <nospam@dfs.com> writes:

On 3/13/2026 12:31 AM, DFS wrote:

On 3/12/2026 8:54 PM, Scott Lurndal wrote:

DFS <nospam@dfs.com> writes:

I took out all 7 instances of error trapping, and it throws:
Floating point exception (core dumped)
at line 61 or 62

Put back in just the stat() call

stat(argv[1], &st);

and the program works again.

Yes, you can't willy-nilly remove lines from a program.

Question: wordlist was malloced with a size of 0:

wordlist = malloc(wordcount * sizeof(const char *));

Why are you allowed to malloc size 0?

"If the size of the space requested is 0, the behavior is
implementation-defined: either a null pointer shall be returned,
or the behavior shall be as if the size were some non-zero value,
except that the behavior is undefined if the returned pointer is
used to access an object."

https://pubs.opengroup.org/onlinepubs/9799919799/functions/malloc.html

IIRC, this caveat was added due to differences in the malloc(3)
implementations for System V Unix and BSD Unix.

That sounds right, except I might say "put in" rather than "added"
since as best I can determine that allowance was present in the
earliest drafts of the C standard and POSIX discussions.

Note that malloc() is not mentioned in K&R, and apparently was
added to AT&T Unix in Unix V7. The timing on that was about the
same time as the first BSD Unix.

Right. K&R 1st Ed. defined a 'morecore' function based on sbrk.

K&R 2nd Ed. (ANSI) added malloc.

--- Synchronet 3.21d-Linux NewsLink 1.2

From DFS@nospam@dfs.com to comp.lang.c on Mon Mar 16 16:43:00 2026

From Newsgroup: comp.lang.c

On 3/14/2026 8:15 AM, Bart wrote:

On 14/03/2026 06:53, Bonita Montero wrote:

Eh, 35 lines. I forget to remove namePad.

<snip>

Your C++ always looks like total gobbledygook.

Not sure you can fix it - it's just how C++ looks.

Without using or looking back at Montero's code, I did my own research
and replicated this little bit of functionality using recommended C++
style and libraries and functions:

read file in
clean data (remove whitespace, remove quote marks)
sort
print

and this is what it looks like:

=================================================================================
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <regex>
#include <iomanip>

using namespace std;

int main(int argc, char* argv[]) {
vector<string> lines; // store all lines
string line; // store one line

regex re_strip ("^\\s+|\\s+$"); // regex to strip whitespace
regex re_quotes ("\""); // regex to remove quote marks

ifstream file(argv[1]); // Open the file
while (getline(file, line)) { // read lines, clean, add to array
line = regex_replace(line, re_strip , "");
line = regex_replace(line, re_quotes, "");
lines.push_back(line);
}
file.close();

sort(lines.begin(), lines.end());
int i = 0;
for (const string& stored_line : lines) {
cout << right << setw(2) << ++i << ". " << stored_line << endl;
}
return 0;
} =================================================================================

fairly hideous to look at, and 6 includes required?

Even with the -Os compile flag you mentioned, the executable is 101MB.
That's crazy for that tiny bit of functionality.

But for this little code, C++ does some nice things for you: memory management, easier regex usage, and easier sorting.

But for my money, python is the best tradeoff of ease of use,
readability, speed and functionality.

--- Synchronet 3.21d-Linux NewsLink 1.2

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Mon Mar 16 20:57:25 2026

From Newsgroup: comp.lang.c

DFS <nospam@dfs.com> writes:

On 3/14/2026 8:15 AM, Bart wrote:

On 14/03/2026 06:53, Bonita Montero wrote:

Eh, 35 lines. I forget to remove namePad.

<snip>

Your C++ always looks like total gobbledygook.

Not sure you can fix it - it's just how C++ looks.

Without using or looking back at Montero's code, I did my own research
and replicated this little bit of functionality using recommended C++
style and libraries and functions:

read file in
clean data (remove whitespace, remove quote marks)
sort
print

Technically, your example removes _leading_ and _trailing_
whitespace, no?

$ sed -e 's/^[ \t]*//;s/[ \t]*$//' -e 's/"//' < inputfile | sort

But for my money, python is the best tradeoff of ease of use,
readability, speed and functionality.

For functionality that can be composed from standard command
line utilities, even python loses.
--- Synchronet 3.21d-Linux NewsLink 1.2

From Richard Harnden@richard.nospam@gmail.invalid to comp.lang.c on Mon Mar 16 22:26:13 2026

From Newsgroup: comp.lang.c

On 16/03/2026 20:43, DFS wrote:

read file in
clean data (remove whitespace, remove quote marks)
sort
print

---
#!/bin/ksh

while read
do
echo ${REPLY//[ \"]/}
done | sort
---

[After Scott's sed]

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bart@bc@freeuk.com to comp.lang.c on Mon Mar 16 22:26:36 2026

From Newsgroup: comp.lang.c

On 16/03/2026 20:43, DFS wrote:

On 3/14/2026 8:15 AM, Bart wrote:

On 14/03/2026 06:53, Bonita Montero wrote:

Eh, 35 lines. I forget to remove namePad.

<snip>

Your C++ always looks like total gobbledygook.

Not sure you can fix it - it's just how C++ looks.

Without using or looking back at Montero's code, I did my own research
and replicated this little bit of functionality using recommended C++
style and libraries and functions:

read file in
clean data (remove whitespace, remove quote marks)
sort
print

and this is what it looks like:

=================================================================================
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <regex>
#include <iomanip>

using namespace std;

int main(int argc, char* argv[]) {
    vector<string> lines;             // store all lines
    string line;                      // store one line

    regex re_strip ("^\\s+|\\s+$"); // regex to strip whitespace
    regex re_quotes ("\"");             // regex to remove quote marks

    ifstream file(argv[1]);            // Open the file
    while (getline(file, line)) {     // read lines, clean, add to array
        line = regex_replace(line, re_strip , "");
        line = regex_replace(line, re_quotes, "");
        lines.push_back(line);
    }
    file.close();

    sort(lines.begin(), lines.end());
    int i = 0;
    for (const string& stored_line : lines) {
        cout << right << setw(2) << ++i << ". " << stored_line << endl;
    }
    return 0;
}

This works very differently:

* The two bits of info on each line (name and telno) are combined into
one string

* Extra white space between name and teleno, even outside the quotes, is
retained, and contributes to sorting

* Sorting is done on both name and telephone number

* The output doesn't put the telephone number into its own tabulated
column

* It doesn't ignore blank lines in the input

With this test input:

---------------------
"White House" "001 202 456 1414"
"White House" "001 202 456 1413"
"White House" "001 202 456 1415"
"Mother" "0049 211 151395"
"Mickey Mouse" "001 123 456 7890"

---------------------

(Note this has an extra blank line I hadn't noticed.)

BM's version produces:

1. Mickey Mouse 001 123 456 7890
2. Mother 0049 211 151395
3. White House 001 202 456 1414
4. White House 001 202 456 1413
5. White House 001 202 456 1415

Notice the three White House numbers stay in the same order. With yours:

1.
2. Mickey Mouse 001 123 456 7890
3. Mother 0049 211 151395
4. White House 001 202 456 1415
5. White House 001 202 456 1413
6. White House 001 202 456 1414

The blank line has not been ignored. The numbers aren't lined up. The
internal white space has been retained. The WH entries have moved
around, even the two without the extra spacing.

Perhaps it's not as simple as it seemed! Did your Python version do the
same as this?

It's a rather fiddly spec for me to bother doing it properly, but I
threw something together in my scripting language which is shown below.
It shows the equivalent of BM's output.

But it cuts some corners: the column widths of the output are hardcoded
(easy to fix but untidy), and it uses a custom sort routine (not shown,
but is a bubble sort), as I don't have a ready-made library routine that
takes a compare function.

=================================================================================

fairly hideous to look at, and 6 includes required?

Even with the -Os compile flag you mentioned, the executable is 101MB. That's crazy for that tiny bit of functionality.

But for this little code, C++ does some nice things for you: memory management,

Yeah. But in a language like C (or my static one), I'd just use a
slightly different approach. Maybe an extra pass over the data to
establish the size of the table, then you just allocate it all at once.

(My compiler project never actually free any memory!)

easier regex usage, and easier sorting.

I've never used regex. As you can see from my example, decent i/o
routines can eliminate the need.

Sorting though can get complex. You'd need a lot more than Sort with a
compare function to work with real phone data.

-------------------------
record rec =
var name, tel
end

f:=openfile(sread("n"))

phonelist::=()

while not eof(f) do
readln @f, name:"s", tel:"s"
nextloop when name=""

phonelist &:= rec(name, tel)
end

closefile(f)

sort(phonelist)

for i,x in phonelist do
fprintln "#. # #", i:"3", x.name:"15jl", x.tel
end

------------------

(The "s" input format reads case-preserved names or files. Inputs can be quoted to allow embedded separators within context. Quotes are
discarded. White space around items is skipped anyway.

Here also, names can have embedded quotes, but they need to be doubled up:

"White ""House""" -> White "House")

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bart@bc@freeuk.com to comp.lang.c on Mon Mar 16 22:35:31 2026

From Newsgroup: comp.lang.c

On 16/03/2026 22:26, Richard Harnden wrote:

On 16/03/2026 20:43, DFS wrote:

read file in
clean data (remove whitespace, remove quote marks)
sort
print

---
#!/bin/ksh

while read
do
echo ${REPLY//[ \"]/}
done | sort
---

[After Scott's sed]

If I change 'ksh' to 'bash' then this gives me this output (for the
input shown in my last post):

-----------------

MickeyMouse0011234567890
Mother0049211151395
WhiteHouse0012024561413
WhiteHouse0012024561414
WhiteHouse0012024561415
-----------------

(As I explained there, the sorting needs to be on the name field only,
and the two fields displayed in separate columns.)
--- Synchronet 3.21d-Linux NewsLink 1.2

From DFS@nospam@dfs.com to comp.lang.c on Mon Mar 16 19:07:41 2026

From Newsgroup: comp.lang.c

On 3/16/2026 4:57 PM, Scott Lurndal wrote:

DFS <nospam@dfs.com> writes:

On 3/14/2026 8:15 AM, Bart wrote:

On 14/03/2026 06:53, Bonita Montero wrote:

Eh, 35 lines. I forget to remove namePad.

<snip>

Your C++ always looks like total gobbledygook.

Not sure you can fix it - it's just how C++ looks.

Without using or looking back at Montero's code, I did my own research
and replicated this little bit of functionality using recommended C++
style and libraries and functions:

read file in
clean data (remove whitespace, remove quote marks)
sort
print

Technically, your example removes _leading_ and _trailing_
whitespace, no?

Yes.

$ sed -e 's/^[ \t]*//;s/[ \t]*$//' -e 's/"//' < inputfile | sort

This is what I got from that:

Anna Becker" "0170-2233445"
Anna Fischer" "0341-9988776"
Anna Müller" "0987-6543210"
Ben Meier" "0341-5566778"
Ben Richter" "069-3344556"
...

Not what we're looking for.

But for my money, python is the best tradeoff of ease of use,
readability, speed and functionality.

For functionality that can be composed from standard command
line utilities, even python loses.

Sure, but sed and regex are inscrutable.

original Montero file

"Max Mustermann" "0123-4567890"
"Anna Müller" "0987-6543210"
"Peter Schmidt" "030-1234567"
"Laura Fischer" "040-9876543"
"Tim Becker" "0151-1112223"
"Julia Neumann" "0221-3344556"
"Michael Braun" "0170-9988776"
"Sophie Wagner" "089-2233445"
"Felix Hoffmann" "0711-5566778"
"Lea Richter" "0341-1122334"
"Jonas Klein" "030-4455667"
"Emma Wolf" "040-5566778"
"Lukas König" "069-7788990"
"Clara Hofmann" "0157-2233445"
"Paul Schäfer" "0228-3344556"
"Mia Keller" "089-6677889"
"Leon Zimmermann" "0711-1122445"
"Nina Krause" "0341-4455667"
"David Schulz" "030-9988776"
"Sarah Lehmann" "040-2233445"
"Ben Richter" "069-3344556"
"Hannah Wagner" "0221-5566778"
"Tom Bauer" "0171-1122334"
"Lena Fischer" "0151-6677889"
"Simon Meier" "030-7788990"
"Marie Becker" "040-4455667"
"Jan Hoffmann" "0711-3344556"
"Leonie Klein" "0341-5566778"
"Philipp König" "089-9988776"
"Laura Schulze" "0228-1122334"
"Moritz Wolf" "0157-4455667"
"Jana Zimmer" "030-6677889"
"Felix Neumann" "0170-2233445"
"Sarah Braun" "040-7788990"
"Tim Schäfer" "0711-5566778"
"Anna Fischer" "0341-9988776"
"Maximilian Keller" "030-1122334"
"Lea Wagner" "0151-3344556"
"Lukas Hofmann" "089-6677889"
"Marie Richter" "0221-7788990"
"Jonas Klein" "0171-4455667"
"Clara Zimmermann" "040-5566778"
"Paul Wolf" "030-2233445"
"Sophie Neumann" "0711-3344556"
"Ben Meier" "0341-5566778"
"Emma Bauer" "0157-9988776"
"Leon Krause" "089-1122334"
"Julia Schulz" "0228-4455667"
"Tim Richter" "030-6677889"
"Anna Becker" "0170-2233445"

This 12-line python reads the file, strips leading/trailing spaces,
removes quote marks, determines the longest name for spacing (if new
longer names are added it still pretty-prints), and outputs
first-last-phone numbered and aligned and sorted by last-first.

import sys
names = []
longname = 0
with open(sys.argv[1],'r') as f:
for line in f:
line = line.replace('"','')
e = [i.strip() for i in line.split(' ') if i]
ln = len(e[0]) + len(e[1])
if ln > longname: longname = ln
names.append((e[1], e[0], e[2]))
for i,n in enumerate(sorted(names)):
print("%2d. %-*s %s" % (i+1, longname+2, n[1]+' '+n[0], n[2]))

1. Emma Bauer 0157-9988776
2. Tom Bauer 0171-1122334
3. Anna Becker 0170-2233445
4. Marie Becker 040-4455667
5. Tim Becker 0151-1112223
6. Michael Braun 0170-9988776
7. Sarah Braun 040-7788990
8. Anna Fischer 0341-9988776
9. Laura Fischer 040-9876543
10. Lena Fischer 0151-6677889
11. Felix Hoffmann 0711-5566778
12. Jan Hoffmann 0711-3344556
13. Clara Hofmann 0157-2233445
14. Lukas Hofmann 089-6677889
15. Maximilian Keller 030-1122334
16. Mia Keller 089-6677889
17. Jonas Klein 0171-4455667
18. Jonas Klein 030-4455667
19. Leonie Klein 0341-5566778
20. Leon Krause 089-1122334
21. Nina Krause 0341-4455667
22. Lukas König 069-7788990
23. Philipp König 089-9988776
24. Sarah Lehmann 040-2233445
25. Ben Meier 0341-5566778
26. Simon Meier 030-7788990
27. Max Mustermann 0123-4567890
28. Anna Müller 0987-6543210
29. Felix Neumann 0170-2233445
30. Julia Neumann 0221-3344556
31. Sophie Neumann 0711-3344556
32. Ben Richter 069-3344556
33. Lea Richter 0341-1122334
34. Marie Richter 0221-7788990
35. Tim Richter 030-6677889
36. Peter Schmidt 030-1234567
37. David Schulz 030-9988776
38. Julia Schulz 0228-4455667
39. Laura Schulze 0228-1122334
40. Paul Schäfer 0228-3344556
41. Tim Schäfer 0711-5566778
42. Hannah Wagner 0221-5566778
43. Lea Wagner 0151-3344556
44. Sophie Wagner 089-2233445
45. Emma Wolf 040-5566778
46. Moritz Wolf 0157-4455667
47. Paul Wolf 030-2233445
48. Jana Zimmer 030-6677889
49. Clara Zimmermann 040-5566778
50. Leon Zimmermann 0711-1122445

If you can do that with sed and regex and pipes in 1 or 2 lines, I'm
gonna hurl.

--- Synchronet 3.21d-Linux NewsLink 1.2

From DFS@nospam@dfs.com to comp.lang.c on Mon Mar 16 19:09:18 2026

From Newsgroup: comp.lang.c

On 3/16/2026 6:26 PM, Richard Harnden wrote:

On 16/03/2026 20:43, DFS wrote:

read file in
clean data (remove whitespace, remove quote marks)
sort
print

---
#!/bin/ksh

while read
do
echo ${REPLY//[ \"]/}
done | sort
---

[After Scott's sed]

output:

AnnaBecker0170-2233445
AnnaFischer0341-9988776
AnnaMüller0987-6543210
BenMeier0341-5566778
BenRichter069-3344556
ClaraHofmann0157-2233445

Me know that not right!

--- Synchronet 3.21d-Linux NewsLink 1.2

From Richard Harnden@richard.nospam@gmail.invalid to comp.lang.c on Mon Mar 16 23:17:10 2026

From Newsgroup: comp.lang.c

On 16/03/2026 23:09, DFS wrote:

On 3/16/2026 6:26 PM, Richard Harnden wrote:

On 16/03/2026 20:43, DFS wrote:

read file in
clean data (remove whitespace, remove quote marks)
sort
print

---
#!/bin/ksh

while read
do
echo ${REPLY//[ \"]/}
done | sort
---

[After Scott's sed]

output:

AnnaBecker0170-2233445
AnnaFischer0341-9988776
AnnaMüller0987-6543210
BenMeier0341-5566778
BenRichter069-3344556
ClaraHofmann0157-2233445

Me know that not right!

Yes. You and Bart are right.

--- Synchronet 3.21d-Linux NewsLink 1.2

From DFS@nospam@dfs.com to comp.lang.c on Mon Mar 16 19:21:22 2026

From Newsgroup: comp.lang.c

On 3/16/2026 7:17 PM, Richard Harnden wrote:

On 16/03/2026 23:09, DFS wrote:

On 3/16/2026 6:26 PM, Richard Harnden wrote:

On 16/03/2026 20:43, DFS wrote:

read file in
clean data (remove whitespace, remove quote marks)
sort
print

---
#!/bin/ksh

while read
do
echo ${REPLY//[ \"]/}
done | sort
---

[After Scott's sed]

output:

AnnaBecker0170-2233445
AnnaFischer0341-9988776
AnnaMüller0987-6543210
BenMeier0341-5566778
BenRichter069-3344556
ClaraHofmann0157-2233445

Me know that not right!

Yes. You and Bart are right.

Can you make it look like:

Anna Becker 0170-2233445
Anna Fischer 0341-9988776
Anna Müller 0987-6543210
Ben Meier 0341-5566778
Ben Richter 069-3344556
Clara Hofmann 0157-2233445

--- Synchronet 3.21d-Linux NewsLink 1.2

From Richard Harnden@richard.nospam@gmail.invalid to comp.lang.c on Mon Mar 16 23:34:57 2026

From Newsgroup: comp.lang.c

On 16/03/2026 23:21, DFS wrote:

On 3/16/2026 7:17 PM, Richard Harnden wrote:

On 16/03/2026 23:09, DFS wrote:

On 3/16/2026 6:26 PM, Richard Harnden wrote:

On 16/03/2026 20:43, DFS wrote:

read file in
clean data (remove whitespace, remove quote marks)
sort
print

---
#!/bin/ksh

while read
do
     echo ${REPLY//[ \"]/}
done | sort
---

[After Scott's sed]

output:

AnnaBecker0170-2233445
AnnaFischer0341-9988776
AnnaMüller0987-6543210
BenMeier0341-5566778
BenRichter069-3344556
ClaraHofmann0157-2233445

Me know that not right!

Yes. You and Bart are right.

Can you make it look like:

Anna Becker     0170-2233445
Anna Fischer    0341-9988776
Anna Müller     0987-6543210
Ben Meier       0341-5566778
Ben Richter     069-3344556
Clara Hofmann   0157-2233445

Kinda ...

$ awk -F\" '{printf("%-20s|%s\n", $2, $4)}' input.txt \ |
sort -t\| -k1 \ |
tr -d "|" \ |
head
Anna Becker 0170-2233445
Anna Fischer 0341-9988776
Anna Müller 0987-6543210
Ben Meier 0341-5566778
Ben Richter 069-3344556
Clara Hofmann 0157-2233445
Clara Zimmermann 040-5566778
David Schulz 030-9988776
Emma Bauer 0157-9988776
Emma Wolf 040-5566778

... it's not pretty.

--- Synchronet 3.21d-Linux NewsLink 1.2

From DFS@nospam@dfs.com to comp.lang.c on Mon Mar 16 19:41:07 2026

From Newsgroup: comp.lang.c

On 3/16/2026 6:26 PM, Bart wrote:

On 16/03/2026 20:43, DFS wrote:

On 3/14/2026 8:15 AM, Bart wrote:

On 14/03/2026 06:53, Bonita Montero wrote:

Eh, 35 lines. I forget to remove namePad.

<snip>

Your C++ always looks like total gobbledygook.

Not sure you can fix it - it's just how C++ looks.

Without using or looking back at Montero's code, I did my own research
and replicated this little bit of functionality using recommended C++
style and libraries and functions:

read file in
clean data (remove whitespace, remove quote marks)
sort
print

and this is what it looks like:

=================================================================================
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <regex>
#include <iomanip>

using namespace std;

int main(int argc, char* argv[]) {
     vector<string> lines;             // store all lines >>      string line;                      // store one line

     regex re_strip ("^\\s+|\\s+$"); // regex to strip whitespace
     regex re_quotes ("\"");             // regex to remove quote marks

     ifstream file(argv[1]);            // Open the file
     while (getline(file, line)) {     // read lines, clean, add to array
         line = regex_replace(line, re_strip , "");
         line = regex_replace(line, re_quotes, "");
         lines.push_back(line);
     }
     file.close();

     sort(lines.begin(), lines.end());
     int i = 0;
     for (const string& stored_line : lines) {
         cout << right << setw(2) << ++i << ". " << stored_line << endl;
     }
     return 0;
}

This works very differently:

It wasn't meant to duplicate Montero's.

It was meant as a demo of how C++ looks by default.

Perhaps it's not as simple as it seemed! Did your Python version do the
same as this?

My Python version:

1. Anna Becker 0170-2233445
2. Anna Fischer 0341-9988776
3. Anna Müller 0987-6543210
4. Ben Meier 0341-5566778
5. Ben Richter 069-3344556
6. Clara Hofmann 0157-2233445
7. Clara Zimmermann 040-5566778
...

it handled blank lines, but no other malformed data checks.

It's a rather fiddly spec for me to bother doing it properly, but I
threw something together in my scripting language which is shown below.
It shows the equivalent of BM's output.

But it cuts some corners: the column widths of the output are hardcoded (easy to fix but untidy), and it uses a custom sort routine (not shown,
but is a bubble sort), as I don't have a ready-made library routine that takes a compare function.

This is a python implementation of the 'quicksort' algorithm I found online.

# Simplified in-place QuickSort in Python
def quick_sort(array, low, high):
if low < high:
pi = partition(array, low, high)
quick_sort(array, low, pi - 1)
quick_sort(array, pi + 1, high)

def partition(array, low, high):
pivot = array[high]
i = low - 1
for j in range(low, high):
if array[j] <= pivot:
i += 1
array[i], array[j] = array[j], array[i]
array[i + 1], array[high] = array[high], array[i + 1]
return i + 1

I assume you could easily port it to BartScript?

Note: Python now uses PowerSort - a recent change from the long-time
TimSort.

=================================================================================

fairly hideous to look at, and 6 includes required?

Even with the -Os compile flag you mentioned, the executable is 101MB.
That's crazy for that tiny bit of functionality.

But for this little code, C++ does some nice things for you: memory
management,

Yeah. But in a language like C (or my static one), I'd just use a
slightly different approach. Maybe an extra pass over the data to
establish the size of the table, then you just allocate it all at once.

(My compiler project never actually free any memory!)

easier regex usage, and easier sorting.

I've never used regex. As you can see from my example, decent i/o
routines can eliminate the need.

Sorting though can get complex. You'd need a lot more than Sort with a compare function to work with real phone data.

-------------------------
record rec =
    var name, tel
end

f:=openfile(sread("n"))

phonelist::=()

while not eof(f) do
    readln @f, name:"s", tel:"s"
    nextloop when name=""

    phonelist &:= rec(name, tel)
end

closefile(f)

sort(phonelist)

for i,x in phonelist do
    fprintln "#. # #", i:"3", x.name:"15jl", x.tel
end

------------------

(The "s" input format reads case-preserved names or files. Inputs can be quoted to allow embedded separators within context. Quotes are
discarded. White space around items is skipped anyway.

Here also, names can have embedded quotes, but they need to be doubled up:

"White ""House""" -> White "House")

How can we run a BartScript program?

I assume you rarely reply with C code because of the coding time?

Python takes me 1/5th to 1/3rd the time of C to write. Fantastic
language in many ways.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Richard Harnden@richard.nospam@gmail.invalid to comp.lang.c on Tue Mar 17 00:09:22 2026

From Newsgroup: comp.lang.c

On 16/03/2026 23:34, Richard Harnden wrote:

On 16/03/2026 23:21, DFS wrote:

On 3/16/2026 7:17 PM, Richard Harnden wrote:

On 16/03/2026 23:09, DFS wrote:

On 3/16/2026 6:26 PM, Richard Harnden wrote:

On 16/03/2026 20:43, DFS wrote:

read file in
clean data (remove whitespace, remove quote marks)
sort
print

---
#!/bin/ksh

while read
do
     echo ${REPLY//[ \"]/}
done | sort
---

[After Scott's sed]

output:

AnnaBecker0170-2233445
AnnaFischer0341-9988776
AnnaMüller0987-6543210
BenMeier0341-5566778
BenRichter069-3344556
ClaraHofmann0157-2233445

Me know that not right!

Yes. You and Bart are right.

Can you make it look like:

Anna Becker     0170-2233445
Anna Fischer    0341-9988776
Anna Müller     0987-6543210
Ben Meier       0341-5566778
Ben Richter     069-3344556
Clara Hofmann   0157-2233445

Kinda ...

$ awk -F\" '{printf("%-20s|%s\n", $2, $4)}' input.txt \ |
sort -t\| -k1 \ |
tr -d "|" \ |
head
Anna Becker         0170-2233445
Anna Fischer        0341-9988776
Anna Müller         0987-6543210
Ben Meier           0341-5566778
Ben Richter         069-3344556
Clara Hofmann       0157-2233445
Clara Zimmermann    040-5566778
David Schulz        030-9988776
Emma Bauer          0157-9988776
Emma Wolf           040-5566778

... it's not pretty.

I split the long line and got the '| \' backwards :(

Second attempt, this time with line numbers ...

$ awk -F\" '{printf("%-20s|%s\n", $2, $4)}' input.txt | \
sort -t\| -k1 | \
awk -F\| 'BEGIN {i=1} {printf("%2d. %-20s|%s\n", i,$1,$2); i++}' | \
tr -d "|"

1. Anna Becker 0170-2233445
2. Anna Fischer 0341-9988776
3. Anna Müller 0987-6543210
4. Ben Meier 0341-5566778
5. Ben Richter 069-3344556
[...]
47. Tim Becker 0151-1112223
48. Tim Richter 030-6677889
49. Tim Schäfer 0711-5566778
50. Tom Bauer 0171-1122334

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bart@bc@freeuk.com to comp.lang.c on Tue Mar 17 00:29:11 2026

From Newsgroup: comp.lang.c

On 16/03/2026 23:41, DFS wrote:

On 3/16/2026 6:26 PM, Bart wrote:

This works very differently:

It wasn't meant to duplicate Montero's.

It was meant as a demo of how C++ looks by default.

OK.

This is a python implementation of the 'quicksort' algorithm I found
online.

# Simplified in-place QuickSort in Python
def quick_sort(array, low, high):
    if low < high:
        pi = partition(array, low, high)
        quick_sort(array, low, pi - 1)
        quick_sort(array, pi + 1, high)

def partition(array, low, high):
    pivot = array[high]
    i = low - 1
    for j in range(low, high):
        if array[j] <= pivot:
            i += 1
            array[i], array[j] = array[j], array[i]
    array[i + 1], array[high] = array[high], array[i + 1]
    return i + 1

I assume you could easily port it to BartScript?

Well, I have quicksort routines, they could have been adapted, but for
this task the tricky bits were input, string processing, and formatted
output.

(Also I quite like using bubble sort because so many deride it!)

   "White ""House""" -> White "House")

How can we run a BartScript program?

I assume you're on Linux? Then download this C file:

https://github.com/sal55/langs/blob/master/qq.c

(Note: this is 2MB and is some 75Kloc of machine-generated C.)

Build it like this (instructions are at the top too):

gcc -O2 qq.c -o qq -lm -ldl -fno-strict-aliasing

At the same link you will also see hello.q and test.q (the full version
of the program that I'd posted). Note the language is actually called
'Q' and source files have .q extensions.

Test like this (file extensions are optional):

./qq hello

Run the telno program 'test.q' like this:

./qq test input

(Replace 'input' with the name of the data-file.)

(The qq.c file was generated on Windows by a special version of my
systems compiler, like this:

c:\qx>mc -linux qq
Compiling qq.m to qq.c

This translates the intermediate representation of this 35-module 40Kloc application into a single C source file, but it is very poor quality
code. The -linux option makes it include the right OS-specific module.)

I assume you rarely reply with C code because of the coding time?

I've been posting here on and off for some decades. I've posted plenty
of C code! But I'm interested in PL design and ergonomics, so take a
wider view of things than most here.

Python takes me 1/5th to 1/3rd the time of C to write. Fantastic
language in many ways.

Yeah, but it is rather on the big scale compared to how my own tools
work. Here is how I've just run that same 'test.q' file, but with a
couple of twists:

c:\qapps>tim mm8 -r \mx\mm -r \qx\qq test abc
Compiling \mx\mm.m to \mx\mm.(run)
Compiling \qx\qq.m to \qx\qq.(run)
1. Mickey Mouse 001 123 456 7890
2. Mother 0049 211 151395
3. White House 001 202 456 1414
4. White House 001 202 456 1413
5. White House 001 202 456 1415
Time: 0.157

'tim' is a timing tool. 'mm8.exe' is a production compiler for v8 of my systems language. It has the ability to run programs from source (-r).
'qq' is the interpreter for 'Bartscript'.

Here, I built a new version of this compiler from source, ran it
immediately to build and immediately run the Q interpreter from source,
which then runs this test. It took 1/6th second in total.

This is the equivalent of first using gcc to build gcc from source; then
using that to build CPython from source; then using that to run test.py.

That's not going to happen in 1/6th of a second! Not unless you have a supercomputer.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Tristan Wibberley@tristan.wibberley+netnews2@alumni.manchester.ac.uk to comp.lang.c on Tue Mar 17 00:49:42 2026

From Newsgroup: comp.lang.c

On 16/03/2026 23:07, DFS wrote:

On 3/16/2026 4:57 PM, Scott Lurndal wrote:

DFS <nospam@dfs.com> writes:

On 3/14/2026 8:15 AM, Bart wrote:

On 14/03/2026 06:53, Bonita Montero wrote:

Eh, 35 lines. I forget to remove namePad.

<snip>

Your C++ always looks like total gobbledygook.

Not sure you can fix it - it's just how C++ looks.

Without using or looking back at Montero's code, I did my own research
and replicated this little bit of functionality using recommended C++
style and libraries and functions:

read file in
clean data (remove whitespace, remove quote marks)
sort
print

Technically, your example removes _leading_ and _trailing_
whitespace, no?

Yes.

$ sed -e 's/^[ \t]*//;s/[ \t]*$//' -e 's/"//' < inputfile | sort

This is what I got from that:

Anna Becker" "0170-2233445"
Anna Fischer"   "0341-9988776"
Anna Müller"   "0987-6543210"
Ben Meier" "0341-5566778"
Ben Richter" "069-3344556"
...

Not what we're looking for.

But for my money, python is the best tradeoff of ease of use,
readability, speed and functionality.

For functionality that can be composed from standard command
line utilities, even python loses.

Sure, but sed and regex are inscrutable.

original Montero file

   "Max Mustermann"   "0123-4567890"
"Anna Müller"   "0987-6543210"
    "Peter Schmidt" "030-1234567"
"Laura Fischer"      "040-9876543"
"Tim Becker" "0151-1112223"
    "Julia Neumann" "0221-3344556"
"Michael Braun"   "0170-9988776"
"Sophie Wagner"   "089-2233445"
"Felix Hoffmann" "0711-5566778"
    "Lea Richter"   "0341-1122334"
"Jonas Klein"   "030-4455667"
"Emma Wolf" "040-5566778"
   "Lukas König" "069-7788990"
"Clara Hofmann"     "0157-2233445"
"Paul Schäfer" "0228-3344556"
    "Mia Keller" "089-6677889"
"Leon Zimmermann"   "0711-1122445"
"Nina Krause" "0341-4455667"
"David Schulz" "030-9988776"
    "Sarah Lehmann"   "040-2233445"
"Ben Richter" "069-3344556"
"Hannah Wagner" "0221-5566778"
   "Tom Bauer"   "0171-1122334"
"Lena Fischer" "0151-6677889"
   "Simon Meier" "030-7788990"
"Marie Becker"   "040-4455667"
"Jan Hoffmann"   "0711-3344556"
"Leonie Klein" "0341-5566778"
    "Philipp König" "089-9988776"
"Laura Schulze" "0228-1122334"
"Moritz Wolf" "0157-4455667"
"Jana Zimmer" "030-6677889"
    "Felix Neumann" "0170-2233445"
"Sarah Braun" "040-7788990"
"Tim Schäfer" "0711-5566778"
"Anna Fischer"   "0341-9988776"
"Maximilian Keller" "030-1122334"
"Lea Wagner" "0151-3344556"
   "Lukas Hofmann" "089-6677889"
"Marie Richter" "0221-7788990"
     "Jonas Klein" "0171-4455667"
"Clara Zimmermann"   "040-5566778"
"Paul Wolf" "030-2233445"
"Sophie Neumann" "0711-3344556"
    "Ben Meier" "0341-5566778"
"Emma Bauer" "0157-9988776"
"Leon Krause" "089-1122334"
"Julia Schulz"   "0228-4455667"
"Tim Richter" "030-6677889"
"Anna Becker" "0170-2233445"

This 12-line python reads the file, strips leading/trailing spaces,
removes quote marks, determines the longest name for spacing (if new
longer names are added it still pretty-prints), and outputs
first-last-phone numbered and aligned and sorted by last-first.

import sys
names = []
longname = 0
with open(sys.argv[1],'r') as f:
    for line in f:
        line = line.replace('"','')
        e = [i.strip() for i in line.split(' ') if i]
        ln = len(e[0]) + len(e[1])
        if ln > longname: longname = ln
        names.append((e[1], e[0], e[2]))
for i,n in enumerate(sorted(names)):
    print("%2d. %-*s %s" % (i+1, longname+2, n[1]+' '+n[0], n[2]))

1. Emma Bauer         0157-9988776
2. Tom Bauer          0171-1122334
3. Anna Becker        0170-2233445
4. Marie Becker       040-4455667
5. Tim Becker         0151-1112223
6. Michael Braun      0170-9988776
7. Sarah Braun        040-7788990
8. Anna Fischer       0341-9988776
9. Laura Fischer      040-9876543
10. Lena Fischer       0151-6677889
11. Felix Hoffmann     0711-5566778
12. Jan Hoffmann       0711-3344556
13. Clara Hofmann      0157-2233445
14. Lukas Hofmann      089-6677889
15. Maximilian Keller 030-1122334
16. Mia Keller         089-6677889
17. Jonas Klein        0171-4455667
18. Jonas Klein        030-4455667
19. Leonie Klein       0341-5566778
20. Leon Krause        089-1122334
21. Nina Krause        0341-4455667
22. Lukas König        069-7788990
23. Philipp König      089-9988776
24. Sarah Lehmann      040-2233445
25. Ben Meier          0341-5566778
26. Simon Meier        030-7788990
27. Max Mustermann     0123-4567890
28. Anna Müller        0987-6543210
29. Felix Neumann      0170-2233445
30. Julia Neumann      0221-3344556
31. Sophie Neumann     0711-3344556
32. Ben Richter        069-3344556
33. Lea Richter        0341-1122334
34. Marie Richter      0221-7788990
35. Tim Richter        030-6677889
36. Peter Schmidt      030-1234567
37. David Schulz       030-9988776
38. Julia Schulz       0228-4455667
39. Laura Schulze      0228-1122334
40. Paul Schäfer       0228-3344556
41. Tim Schäfer        0711-5566778
42. Hannah Wagner      0221-5566778
43. Lea Wagner         0151-3344556
44. Sophie Wagner      089-2233445
45. Emma Wolf          040-5566778
46. Moritz Wolf        0157-4455667
47. Paul Wolf          030-2233445
48. Jana Zimmer        030-6677889
49. Clara Zimmermann   040-5566778
50. Leon Zimmermann    0711-1122445

If you can do that with sed and regex and pipes in 1 or 2 lines, I'm
gonna hurl.

I can do something that works with the provided data with util-linux,
sed, awk, and coreutils:

{ sed 's/"//g' | awk '{print $1 " " $2 ":" $3}' | sort -k2 | column -t
-s: | nl -s". " ; } < data

where the file "data" contains the original list. Although there will be
some differences in behaviour for some other data and I think your
python and the above will do unintuitive things with other data.

The pipeline goes as follows:
sed strips quotes
awk makes two delimited columns
sort sorts on surnames, and where surnames are identical on telno
column presents columns with a visual model
nl adds number prefixes

If I really try and use tee and fifos I can use many fewer tools.

None of these are really good languages/toolsets for the task. A DSL
with a data schema would be best.
--
Tristan Wibberley

The message body is Copyright (C) 2026 Tristan Wibberley except
citations and quotations noted. All Rights Reserved except that you may,
of course, cite it academically giving credit to me, distribute it
verbatim as part of a usenet system or its archives, and use it to
promote my greatness and general superiority without misrepresentation
of my opinions other than my opinion of my greatness and general
superiority which you _may_ misrepresent. You definitely MAY NOT train
any production AI system with it but you may train experimental AI that
will only be used for evaluation of the AI methods it implements.

--- Synchronet 3.21d-Linux NewsLink 1.2

From DFS@nospam@dfs.com to comp.lang.c on Mon Mar 16 21:27:59 2026

From Newsgroup: comp.lang.c

On 3/16/2026 7:34 PM, Richard Harnden wrote:

On 16/03/2026 23:21, DFS wrote:

On 3/16/2026 7:17 PM, Richard Harnden wrote:

On 16/03/2026 23:09, DFS wrote:

On 3/16/2026 6:26 PM, Richard Harnden wrote:

On 16/03/2026 20:43, DFS wrote:

read file in
clean data (remove whitespace, remove quote marks)
sort
print

---
#!/bin/ksh

while read
do
     echo ${REPLY//[ \"]/}
done | sort
---

[After Scott's sed]

output:

AnnaBecker0170-2233445
AnnaFischer0341-9988776
AnnaMüller0987-6543210
BenMeier0341-5566778
BenRichter069-3344556
ClaraHofmann0157-2233445

Me know that not right!

Yes. You and Bart are right.

Can you make it look like:

Anna Becker     0170-2233445
Anna Fischer    0341-9988776
Anna Müller     0987-6543210
Ben Meier       0341-5566778
Ben Richter     069-3344556
Clara Hofmann   0157-2233445

Kinda ...

$ awk -F\" '{printf("%-20s|%s\n", $2, $4)}' input.txt \ |
sort -t\| -k1 \ |
tr -d "|" \ |
head
Anna Becker         0170-2233445
Anna Fischer        0341-9988776
Anna Müller         0987-6543210
Ben Meier           0341-5566778
Ben Richter         069-3344556
Clara Hofmann       0157-2233445
Clara Zimmermann    040-5566778
David Schulz        030-9988776
Emma Bauer          0157-9988776
Emma Wolf           040-5566778

... it's not pretty.

But you cranked it out in a few minutes.

I never learned those old Unix utilities, like sed and awk. They're
powerful, but odd to look at.

I dropped your awk in ChatGPT and it gave a nice analysis.

Then it suggested this one-line "improvement" in awk:

awk -F\" '{a[$2]=$4} END {for (i in a) printf("%-20s%s\n", i, a[i]) |
"sort"}' input.txt

and it worked perfectly.

It's a new day!

--- Synchronet 3.21d-Linux NewsLink 1.2

From DFS@nospam@dfs.com to comp.lang.c on Mon Mar 16 21:45:36 2026

From Newsgroup: comp.lang.c

On 3/16/2026 8:09 PM, Richard Harnden wrote:

I split the long line and got the '| \' backwards :(

Second attempt, this time with line numbers ...

$ awk -F\" '{printf("%-20s|%s\n", $2, $4)}' input.txt | \
    sort -t\| -k1 | \
    awk -F\| 'BEGIN {i=1} {printf("%2d. %-20s|%s\n", i,$1,$2); i++}' | \
    tr -d "|"

1. Anna Becker         0170-2233445
2. Anna Fischer        0341-9988776
3. Anna Müller         0987-6543210
4. Ben Meier           0341-5566778
5. Ben Richter         069-3344556
[...]
47. Tim Becker          0151-1112223
48. Tim Richter         030-6677889
49. Tim Schäfer         0711-5566778
50. Tom Bauer           0171-1122334

Nice.

Next things to try are:

1) continue to output first-last name, but sort by last name-first name
2) instead of hard-coding 20, have it determine the longest name in the
input, and adjust the spacing so it always pretty-prints with no name
collision into the phone number column.

The python 12-liner for that is:

import sys
names = []
longname = 0
with open(sys.argv[1],'r') as f:
    for line in f:
line = line.replace('"','')
e = [i.strip() for i in line.split(' ') if i]
ln = len(e[0]) + len(e[1])
if ln > longname: longname = ln
names.append((e[1], e[0], e[2]))
for i,n in enumerate(sorted(names)):
    print("%2d. %-*s %s" % (i+1, longname+2, n[1]+' '+n[0], n[2]))

1. Emma Bauer 0157-9988776
2. Tom Bauer 0171-1122334
3. Anna Becker 0170-2233445
4. Marie Becker 040-4455667
5. Tim Becker 0151-1112223
6. Michael Braun 0170-9988776
7. Sarah Braun 040-7788990
8. Anna Fischer 0341-9988776
9. Laura Fischer 040-9876543
10. Lena Fischer 0151-6677889
11. Felix Hoffmann 0711-5566778
12. Jan Hoffmann 0711-3344556
13. Clara Hofmann 0157-2233445
14. Lukas Hofmann 089-6677889
15. Maximilian Keller 030-1122334
16. Mia Keller 089-6677889
17. Jonas Klein 0171-4455667
18. Jonas Klein 030-4455667
19. Leonie Klein 0341-5566778
20. Leon Krause 089-1122334
21. Nina Krause 0341-4455667
22. Lukas König 069-7788990
23. Philipp König 089-9988776
24. Sarah Lehmann 040-2233445
25. Ben Meier 0341-5566778
26. Simon Meier 030-7788990
27. Max Mustermann 0123-4567890
28. Anna Müller 0987-6543210
29. Felix Neumann 0170-2233445
30. Julia Neumann 0221-3344556
31. Sophie Neumann 0711-3344556
32. Ben Richter 069-3344556
33. Lea Richter 0341-1122334
34. Marie Richter 0221-7788990
35. Tim Richter 030-6677889
36. Peter Schmidt 030-1234567
37. David Schulz 030-9988776
38. Julia Schulz 0228-4455667
39. Laura Schulze 0228-1122334
40. Paul Schäfer 0228-3344556
41. Tim Schäfer 0711-5566778
42. Hannah Wagner 0221-5566778
43. Lea Wagner 0151-3344556
44. Sophie Wagner 089-2233445
45. Emma Wolf 040-5566778
46. Moritz Wolf 0157-4455667
47. Paul Wolf 030-2233445
48. Jana Zimmer 030-6677889
49. Clara Zimmermann 040-5566778
50. Leon Zimmermann 0711-1122445
--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Tue Mar 17 05:21:11 2026

From Newsgroup: comp.lang.c

On 2026-03-17 01:49, Tristan Wibberley wrote:

On 16/03/2026 23:07, DFS wrote:

On 3/16/2026 4:57 PM, Scott Lurndal wrote:

DFS <nospam@dfs.com> writes:

On 3/14/2026 8:15 AM, Bart wrote:

On 14/03/2026 06:53, Bonita Montero wrote:

Eh, 35 lines. I forget to remove namePad.

<snip>

Your C++ always looks like total gobbledygook.

Not sure you can fix it - it's just how C++ looks.

Without using or looking back at Montero's code, I did my own research >>>> and replicated this little bit of functionality using recommended C++
style and libraries and functions:

read file in
clean data (remove whitespace, remove quote marks)
sort
print

Technically, your example removes _leading_ and _trailing_
whitespace, no?

Yes.

$ sed -e 's/^[ \t]*//;s/[ \t]*$//' -e 's/"//' < inputfile | sort

This is what I got from that:

Anna Becker" "0170-2233445"
Anna Fischer"   "0341-9988776"
Anna Müller"   "0987-6543210"
Ben Meier" "0341-5566778"
Ben Richter" "069-3344556"
...

Not what we're looking for.

But for my money, python is the best tradeoff of ease of use,
readability, speed and functionality.

For functionality that can be composed from standard command
line utilities, even python loses.

Sure, but sed and regex are inscrutable.

original Montero file

[...]

This 12-line python reads the file, strips leading/trailing spaces,
removes quote marks, determines the longest name for spacing (if new
longer names are added it still pretty-prints), and outputs
first-last-phone numbered and aligned and sorted by last-first.

import sys
names = []
longname = 0
with open(sys.argv[1],'r') as f:
    for line in f:
        line = line.replace('"','')
        e = [i.strip() for i in line.split(' ') if i]
        ln = len(e[0]) + len(e[1])
        if ln > longname: longname = ln
        names.append((e[1], e[0], e[2]))
for i,n in enumerate(sorted(names)):
    print("%2d. %-*s %s" % (i+1, longname+2, n[1]+' '+n[0], n[2]))

[...]

If you can do that with sed and regex and pipes in 1 or 2 lines, I'm
gonna hurl.

I can do something that works with the provided data with util-linux,
sed, awk, and coreutils:

{ sed 's/"//g' | awk '{print $1 " " $2 ":" $3}' | sort -k2 | column -t
-s: | nl -s". " ; } < data

If quote-stripping is the only thing that 'sed' is used for I'd
suggest to use 'tr -d' instead (and avoid 'sed'; but see below).

The way Awk is used here may work for the given data with names
of exactly two name components but fail otherwise (say for names
like "Charles Antony Richard Hoare").

The key tasks is sensibly identifying the actual payload data.

where the file "data" contains the original list. Although there will be
some differences in behaviour for some other data and I think your
python and the above will do unintuitive things with other data.

The pipeline goes as follows:
sed strips quotes
awk makes two delimited columns
sort sorts on surnames, and where surnames are identical on telno
column presents columns with a visual model
nl adds number prefixes

If I really try and use tee and fifos I can use many fewer tools.

None of these are really good languages/toolsets for the task. A DSL
with a data schema would be best.

I've lost track of what the actual requirements meanwhile are.
Double-quoted names and numbers are the payload to extract?
Result sorted by surname part of the name data?
Equal width columns? Numbered entries?
As Tristan already did; just compose tools/parameters as needed.

I usually don't like 'sed' too much (prefer 'awk' for example),
but 'sed' can do the payload extraction, and the rest can be
composed by other standard Unix tools as desired. For example,

sed -e 's/[^"]*"$[^"]*$"[ \t]*"$[^"]*$"/\1:\2/' < data |
sort -k2 | column -ts: | nl -s ". "

The regexp needs some attention; it basically just extracts the
two data parts between the quotes and produces the data separated
by a colon. The rest does (as borrowed from Tristan's suggestion)
the sorting, the "columnation", and numbering.

This was designed to separate the key elements of the requirements
with clear responsibilities between the tools.

(You could of course do all that also with GNU Awk alone. But it
would then look more clumsy, a bit like that Python script above.)

Janis

--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Tue Mar 17 05:38:23 2026

From Newsgroup: comp.lang.c

On 2026-03-17 01:29, Bart wrote:

(Also I quite like using bubble sort because so many deride it!)

How miserable! (I feel so sorry for you.)

Janis

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Tue Mar 17 06:25:13 2026

From Newsgroup: comp.lang.c

Am 16.03.2026 um 21:43 schrieb DFS:

Not sure you can fix it - it's just how C++ looks.

It depends on you but not on the language.

fairly hideous to look at, and 6 includes required?

If this is really a problem for you don't program at all.

Even with the -Os compile flag you mentioned, the executable is 101MB.

Windows / MSCVC: ~260kiB.
Linux / clang++: ~190kiB.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Richard Harnden@richard.nospam@gmail.invalid to comp.lang.c on Tue Mar 17 10:42:54 2026

From Newsgroup: comp.lang.c

On 17/03/2026 01:45, DFS wrote:

On 3/16/2026 8:09 PM, Richard Harnden wrote:

I split the long line and got the '| \' backwards :(

Second attempt, this time with line numbers ...

$ awk -F\" '{printf("%-20s|%s\n", $2, $4)}' input.txt | \
     sort -t\| -k1 | \
     awk -F\| 'BEGIN {i=1} {printf("%2d. %-20s|%s\n", i,$1,$2); i++}' | \
     tr -d "|"

  1. Anna Becker         0170-2233445
  2. Anna Fischer        0341-9988776
  3. Anna Müller         0987-6543210
  4. Ben Meier           0341-5566778
  5. Ben Richter         069-3344556
[...]
47. Tim Becker          0151-1112223
48. Tim Richter         030-6677889
49. Tim Schäfer         0711-5566778
50. Tom Bauer           0171-1122334

Nice.

Next things to try are:

1) continue to output first-last name, but sort by last name-first name
2) instead of hard-coding 20, have it determine the longest name in the
   input, and adjust the spacing so it always pretty-prints with no name
   collision into the phone number column.

The python 12-liner for that is:

import sys
names = []
longname = 0
with open(sys.argv[1],'r') as f:
    for line in f:
        line = line.replace('"','')
        e = [i.strip() for i in line.split(' ') if i]
        ln = len(e[0]) + len(e[1])
        if ln > longname: longname = ln
        names.append((e[1], e[0], e[2]))
for i,n in enumerate(sorted(names)):
    print("%2d. %-*s %s" % (i+1, longname+2, n[1]+' '+n[0], n[2]))

1. Emma Bauer         0157-9988776
2. Tom Bauer          0171-1122334
[...]
49. Clara Zimmermann   040-5566778
50. Leon Zimmermann    0711-1122445

18 line ksh ...

#!/bin/ksh

typeset -i MAX=$(
while read FIRST LAST PHONE
do
NAME="${FIRST} ${LAST}"
echo ${#NAME}
done <input.txt |\
sort -nr |\
head -1
)

tr -d \" <input.txt |\
awk -v MAX=${MAX} '{printf("%-*s%s\n", MAX, $1" "$2, $3)}' |\
sort -k2 -k1 |\
nl -w2 -s". "

return 0

Output:
$ ./foo.ksh
1. Emma Bauer 0157-9988776
2. Tom Bauer 0171-1122334
3. Tim Becker 0151-1112223
4. Anna Becker 0170-2233445
5. Marie Becker 040-4455667
6. Michael Braun 0170-9988776
7. Sarah Braun 040-7788990
8. Lena Fischer 0151-6677889
9. Anna Fischer 0341-9988776
10. Laura Fischer 040-9876543
11. Jan Hoffmann 0711-3344556
12. Felix Hoffmann 0711-5566778
13. Clara Hofmann 0157-2233445
14. Lukas Hofmann 089-6677889
15. Maximilian Keller 030-1122334
16. Mia Keller 089-6677889
17. Jonas Klein 0171-4455667
18. Jonas Klein 030-4455667
19. Leonie Klein 0341-5566778
20. Lukas König 069-7788990
21. Philipp König 089-9988776
22. Nina Krause 0341-4455667
23. Leon Krause 089-1122334
24. Sarah Lehmann 040-2233445
25. Simon Meier 030-7788990
26. Ben Meier 0341-5566778
27. Anna Müller 0987-6543210
28. Max Mustermann 0123-4567890
29. Felix Neumann 0170-2233445
30. Julia Neumann 0221-3344556
31. Sophie Neumann 0711-3344556
32. Marie Richter 0221-7788990
33. Tim Richter 030-6677889
34. Lea Richter 0341-1122334
35. Ben Richter 069-3344556
36. Paul Schäfer 0228-3344556
37. Tim Schäfer 0711-5566778
38. Peter Schmidt 030-1234567
39. Julia Schulz 0228-4455667
40. David Schulz 030-9988776
41. Laura Schulze 0228-1122334
42. Lea Wagner 0151-3344556
43. Hannah Wagner 0221-5566778
44. Sophie Wagner 089-2233445
45. Moritz Wolf 0157-4455667
46. Paul Wolf 030-2233445
47. Emma Wolf 040-5566778
48. Jana Zimmer 030-6677889
49. Clara Zimmermann 040-5566778
50. Leon Zimmermann 0711-1122445

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bart@bc@freeuk.com to comp.lang.c on Tue Mar 17 11:47:38 2026

From Newsgroup: comp.lang.c

On 17/03/2026 04:38, Janis Papanagnou wrote:

On 2026-03-17 01:29, Bart wrote:

(Also I quite like using bubble sort because so many deride it!)

How miserable! (I feel so sorry for you.)

Why would that be miserable?

Bubble sort is perfectly fine for smallish values of N and where it is
not frequently used. I also like it because I can easily write custom
sort routines from memory.

This function is used to sort the export tables for Windows's DLL files:

https://github.com/sal55/langs/blob/master/bubble.m

Such a table might have from tens of exported functions to perhaps hundreds.

Bubble-sorting even 1000 strings takes about 10ms, which is small
fraction of overall build-time of a library exporting 1000 functions.

For a mere 100 functions, sort time is negligible.
--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Tue Mar 17 13:04:41 2026

From Newsgroup: comp.lang.c

On 2026-03-17 11:42, Richard Harnden wrote:

On 17/03/2026 01:45, DFS wrote:

On 3/16/2026 8:09 PM, Richard Harnden wrote:

I split the long line and got the '| \' backwards :(

Second attempt, this time with line numbers ...

$ awk -F\" '{printf("%-20s|%s\n", $2, $4)}' input.txt | \
     sort -t\| -k1 | \
     awk -F\| 'BEGIN {i=1} {printf("%2d. %-20s|%s\n", i,$1,$2); i++}' >>> | \
     tr -d "|"

  1. Anna Becker         0170-2233445
  2. Anna Fischer        0341-9988776
  3. Anna Müller         0987-6543210
  4. Ben Meier           0341-5566778
  5. Ben Richter         069-3344556
[...]
47. Tim Becker          0151-1112223
48. Tim Richter         030-6677889
49. Tim Schäfer         0711-5566778
50. Tom Bauer           0171-1122334

Nice.

Next things to try are:

1) continue to output first-last name, but sort by last name-first name
2) instead of hard-coding 20, have it determine the longest name in the
    input, and adjust the spacing so it always pretty-prints with no name >>     collision into the phone number column.

The python 12-liner for that is:

import sys
names = []
longname = 0
with open(sys.argv[1],'r') as f:
     for line in f:
         line = line.replace('"','')
         e = [i.strip() for i in line.split(' ') if i]
         ln = len(e[0]) + len(e[1])
         if ln > longname: longname = ln
         names.append((e[1], e[0], e[2]))
for i,n in enumerate(sorted(names)):
     print("%2d. %-*s %s" % (i+1, longname+2, n[1]+' '+n[0], n[2]))

  1. Emma Bauer         0157-9988776
  2. Tom Bauer          0171-1122334
  [...]
49. Clara Zimmermann   040-5566778
50. Leon Zimmermann    0711-1122445

18 line ksh ...

(I would count that as 11 "net lines", ignoring non-functional lines.)

#!/bin/ksh

typeset -i MAX=$(
    while read FIRST LAST PHONE
    do
        NAME="${FIRST} ${LAST}"
        echo ${#NAME}
    done <input.txt |\
    sort -nr |\
    head -1
)

tr -d \" <input.txt |\
    awk -v MAX=${MAX} '{printf("%-*s%s\n", MAX, $1" "$2, $3)}' |\
    sort -k2 -k1 |\
    nl -w2 -s". "

return 0

Output:
[...]

A version in GNU Awk (this time sorted on given name for simplicity):

{ line = gensub (/[^"]*"([^"]*)"[ \t]*"([^"]*)"/, "\\1:\\2", "g")
split (line, data, ":")
if ((len = length (data[1])) > max) max = len
list[data[1]] = data[2]
}
END { PROCINFO["sorted_in"] = "@ind_str_asc"
w = length (NR)
for (d in list)
printf "%*d. %-*s %s\n", w, ++c, max, d, list[d]
}

(with width of the sequence number also computed, not hard-coded,
and single-pass).

Janis

--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Tue Mar 17 13:08:47 2026

From Newsgroup: comp.lang.c

On 2026-03-17 12:47, Bart wrote:

On 17/03/2026 04:38, Janis Papanagnou wrote:

On 2026-03-17 01:29, Bart wrote:

(Also I quite like using bubble sort because so many deride it!)

How miserable! (I feel so sorry for you.)

Why would that be miserable?

Because of the reason you pretend for using it.

[...]

Bubble-sorting even 1000 strings takes about 10ms, which is small
fraction of overall build-time of a library exporting 1000 functions.

For a mere 100 functions, sort time is negligible.

And because of reason here based on arbitrary absolute times instead
of actual algorithmic complexity.

Janis

--- Synchronet 3.21d-Linux NewsLink 1.2

From Richard Harnden@richard.nospam@gmail.invalid to comp.lang.c on Tue Mar 17 12:17:21 2026

From Newsgroup: comp.lang.c

On 17/03/2026 12:04, Janis Papanagnou wrote:

On 2026-03-17 11:42, Richard Harnden wrote:

On 17/03/2026 01:45, DFS wrote:

On 3/16/2026 8:09 PM, Richard Harnden wrote:

I split the long line and got the '| \' backwards :(

Second attempt, this time with line numbers ...

$ awk -F\" '{printf("%-20s|%s\n", $2, $4)}' input.txt | \
     sort -t\| -k1 | \
     awk -F\| 'BEGIN {i=1} {printf("%2d. %-20s|%s\n", i,$1,$2); i+ >>>> +}' | \
     tr -d "|"

  1. Anna Becker         0170-2233445
  2. Anna Fischer        0341-9988776
  3. Anna Müller         0987-6543210
  4. Ben Meier           0341-5566778
  5. Ben Richter         069-3344556
[...]
47. Tim Becker          0151-1112223
48. Tim Richter         030-6677889
49. Tim Schäfer         0711-5566778
50. Tom Bauer           0171-1122334

Nice.

Next things to try are:

1) continue to output first-last name, but sort by last name-first name
2) instead of hard-coding 20, have it determine the longest name in the
    input, and adjust the spacing so it always pretty-prints with no >>> name
    collision into the phone number column.

The python 12-liner for that is:

import sys
names = []
longname = 0
with open(sys.argv[1],'r') as f:
     for line in f:
         line = line.replace('"','')
         e = [i.strip() for i in line.split(' ') if i]
         ln = len(e[0]) + len(e[1])
         if ln > longname: longname = ln
         names.append((e[1], e[0], e[2]))
for i,n in enumerate(sorted(names)):
     print("%2d. %-*s %s" % (i+1, longname+2, n[1]+' '+n[0], n[2])) >>>

  1. Emma Bauer         0157-9988776
  2. Tom Bauer          0171-1122334
  [...]
49. Clara Zimmermann   040-5566778
50. Leon Zimmermann    0711-1122445

18 line ksh ...

(I would count that as 11 "net lines", ignoring non-functional lines.)

#!/bin/ksh

typeset -i MAX=$(
     while read FIRST LAST PHONE
     do
         NAME="${FIRST} ${LAST}"
         echo ${#NAME}
     done <input.txt |\
     sort -nr |\
     head -1
)

tr -d \" <input.txt |\
     awk -v MAX=${MAX} '{printf("%-*s%s\n", MAX, $1" "$2, $3)}' |\
     sort -k2 -k1 |\
     nl -w2 -s". "

return 0

Output:
[...]

A version in GNU Awk (this time sorted on given name for simplicity):

    { line = gensub (/[^"]*"([^"]*)"[ \t]*"([^"]*)"/, "\\1:\\2", "g")
      split (line, data, ":")
      if ((len = length (data[1])) > max) max = len
      list[data[1]] = data[2]
    }
    END { PROCINFO["sorted_in"] = "@ind_str_asc"
      w = length (NR)
      for (d in list)
        printf "%*d. %-*s %s\n", w, ++c, max, d, list[d]
    }

(with width of the sequence number also computed, not hard-coded,
and single-pass).

Janis

That's very nice. I wasn't happy with my parsing the file twice.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bart@bc@freeuk.com to comp.lang.c on Tue Mar 17 12:31:04 2026

From Newsgroup: comp.lang.c

On 17/03/2026 01:45, DFS wrote:

On 3/16/2026 8:09 PM, Richard Harnden wrote:

I split the long line and got the '| \' backwards :(

Second attempt, this time with line numbers ...

$ awk -F\" '{printf("%-20s|%s\n", $2, $4)}' input.txt | \
     sort -t\| -k1 | \
     awk -F\| 'BEGIN {i=1} {printf("%2d. %-20s|%s\n", i,$1,$2); i++}' | \
     tr -d "|"

  1. Anna Becker         0170-2233445
  2. Anna Fischer        0341-9988776
  3. Anna Müller         0987-6543210
  4. Ben Meier           0341-5566778
  5. Ben Richter         069-3344556
[...]
47. Tim Becker          0151-1112223
48. Tim Richter         030-6677889
49. Tim Schäfer         0711-5566778
50. Tom Bauer           0171-1122334

Nice.

Next things to try are:

1) continue to output first-last name, but sort by last name-first name
2) instead of hard-coding 20, have it determine the longest name in the
   input, and adjust the spacing so it always pretty-prints with no name
   collision into the phone number column.

The python 12-liner for that is:

import sys
names = []
longname = 0
with open(sys.argv[1],'r') as f:
    for line in f:
        line = line.replace('"','')
        e = [i.strip() for i in line.split(' ') if i]
        ln = len(e[0]) + len(e[1])
        if ln > longname: longname = ln
        names.append((e[1], e[0], e[2]))
for i,n in enumerate(sorted(names)):
    print("%2d. %-*s %s" % (i+1, longname+2, n[1]+' '+n[0], n[2]))

Huh. I can just about do it in 11 lines:

maxlen:=0
names::=()
for x in readtextfile("abc") do
readln @x, name:"s", tel:"s"
(c, s):=splitstring(name, " ")
maxlen max:= name.len
names &:=(s, c, tel)
end
for i, x in sort(names) do
fprintln "#. # #", i:"2", x[2]+" "+x[1]:tostr(maxlen)+"jl", x[3]
end

Python has an advantage in not needing those 'end' lines! Otherwise my solution would be 9 lines (obviously still a long way from a one-liner
like Awk).

However mine is missing a couple of things: 'name.len' needs to be
replaced by a function that counts UTF8 characters. And 'sort' currently
is another custom routine that sorts only the surname.

(Python's 'sorted' presumably compares all elements LTR.)

But these lie outside the main program which remains at 11 lines.

(It's not clear how the Python would work when there is more than one
first name.)

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bart@bc@freeuk.com to comp.lang.c on Tue Mar 17 12:37:33 2026

From Newsgroup: comp.lang.c

On 17/03/2026 12:08, Janis Papanagnou wrote:

On 2026-03-17 12:47, Bart wrote:

On 17/03/2026 04:38, Janis Papanagnou wrote:

On 2026-03-17 01:29, Bart wrote:

(Also I quite like using bubble sort because so many deride it!)

How miserable! (I feel so sorry for you.)

Why would that be miserable?

Because of the reason you pretend for using it.

[...]

Bubble-sorting even 1000 strings takes about 10ms, which is small
fraction of overall build-time of a library exporting 1000 functions.

For a mere 100 functions, sort time is negligible.

And because of reason here based on arbitrary absolute times instead
of actual algorithmic complexity.

No, timing is considered relative to the rest of the task.

If this ever became a bottleneck then it is a trivial upgrade to a
better routine.

(The real problem however is why MS require programs that generate DLL
files to do their own sorting, since it doesn't specify the sort
criteria. If it doesn't exactly match MS's compare function for looking
up symbol names, then it won't work.)
--- Synchronet 3.21d-Linux NewsLink 1.2

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Tue Mar 17 14:46:29 2026

From Newsgroup: comp.lang.c

DFS <nospam@dfs.com> writes:

On 3/16/2026 4:57 PM, Scott Lurndal wrote:

DFS <nospam@dfs.com> writes:

On 3/14/2026 8:15 AM, Bart wrote:

On 14/03/2026 06:53, Bonita Montero wrote:

Eh, 35 lines. I forget to remove namePad.

<snip>

Your C++ always looks like total gobbledygook.

Not sure you can fix it - it's just how C++ looks.

Without using or looking back at Montero's code, I did my own research
and replicated this little bit of functionality using recommended C++
style and libraries and functions:

read file in
clean data (remove whitespace, remove quote marks)
sort
print

Technically, your example removes _leading_ and _trailing_
whitespace, no?

Yes.

$ sed -e 's/^[ \t]*//;s/[ \t]*$//' -e 's/"//' < inputfile | sort

This is what I got from that:

Anna Becker" "0170-2233445"
Anna Fischer" "0341-9988776"
Anna Müller" "0987-6543210"
Ben Meier" "0341-5566778"
Ben Richter" "069-3344556"
...

Not what we're looking for.

There's a missing 'g' in the second sed expression. My bad.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Wed Mar 18 02:40:50 2026

From Newsgroup: comp.lang.c

On 2026-03-17 13:37, Bart wrote:

On 17/03/2026 12:08, Janis Papanagnou wrote:

On 2026-03-17 12:47, Bart wrote:

On 17/03/2026 04:38, Janis Papanagnou wrote:

On 2026-03-17 01:29, Bart wrote:

(Also I quite like using bubble sort because so many deride it!)

How miserable! (I feel so sorry for you.)

Why would that be miserable?

Because of the reason you pretend for using it.

[...]

Bubble-sorting even 1000 strings takes about 10ms, which is small
fraction of overall build-time of a library exporting 1000 functions.

For a mere 100 functions, sort time is negligible.

And because of reason here based on arbitrary absolute times instead
of actual algorithmic complexity.

No, timing is considered relative to the rest of the task.

If this ever became a bottleneck then it is a trivial upgrade to a
better routine.

If there are simple better algorithms there's no reason but
ignorance to not use them in the first place! - It's really
stupid to deliberately use inferior, or (as here) even the
worst existing algorithms. - What do professionals? - Let's
have a look at a Quicksort implementation on a CDC mainframe
back in the 1980's. They implemented it in a way that scales
according to the CS state-of-the-art; divide-et-impera for
the Quicksort, pick pivot element from a set of three, and
sort divided runs of length smaller than 10 with a straight
insertion sort algorithm. - No educated computer scientist
would have decided to choose Bubblesort in any place here.
Let alone for your "reason" that this algorithm is "derided".
There's a reason why it's derided; that's called scientific
analysis of its quality.

Janis

[...]

--- Synchronet 3.21d-Linux NewsLink 1.2

From Michael S@already5chosen@yahoo.com to comp.lang.c on Wed Mar 18 11:21:51 2026

From Newsgroup: comp.lang.c

On Wed, 18 Mar 2026 02:40:50 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 2026-03-17 13:37, Bart wrote:

On 17/03/2026 12:08, Janis Papanagnou wrote:

On 2026-03-17 12:47, Bart wrote:

On 17/03/2026 04:38, Janis Papanagnou wrote:

On 2026-03-17 01:29, Bart wrote:

(Also I quite like using bubble sort because so many deride
it!)

How miserable!� (I feel so sorry for you.)

Why would that be miserable?

Because of the reason you pretend for using it.

[...]

Bubble-sorting even 1000 strings takes about 10ms, which is small
fraction of overall build-time of a library exporting 1000
functions.

For a mere 100 functions, sort time is negligible.

And because of reason here based on arbitrary absolute times
instead of actual algorithmic complexity.

No, timing is considered relative to the rest of the task.

If this ever became a bottleneck then it is a trivial upgrade to a
better routine.

If there are simple better algorithms there's no reason but
ignorance to not use them in the first place! - It's really
stupid to deliberately use inferior, or (as here) even the
worst existing algorithms.

Bubble sort is not the worst existing, far from it. It is O(n**2).
Some algos, which names I don't remember are O(n**3).
The problem with bubble sort is that relatively to other popular
O(n**2) sorting algorithms, i.e. using Knuth's nomenclature, Straight
Insertion and Straight Selection, it is not just measurably slower,
but also a little more complicated to code.
However it has one pro point to it: it is very fast when applied to
almost sorted data sets.

- What do professionals? - Let's
have a look at a Quicksort implementation on a CDC mainframe
back in the 1980's. They implemented it in a way that scales
according to the CS state-of-the-art; divide-et-impera for
the Quicksort, pick pivot element from a set of three, and
sort divided runs of length smaller than 10 with a straight
insertion sort algorithm. -

All that is good, except of use of CDC mainframe in 1980s, which I'd
consider more sub-optimal choice than bubble sort algorithm.

No educated computer scientist
would have decided to choose Bubblesort in any place here.
Let alone for your "reason" that this algorithm is "derided".
There's a reason why it's derided; that's called scientific
analysis of its quality.

Janis

[...]

--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Wed Mar 18 10:49:45 2026

From Newsgroup: comp.lang.c

On 2026-03-18 10:21, Michael S wrote:

Bubble sort is not the worst existing, far from it. It is O(n**2).

It is the worst. It's used in academia to show the most primitive
algorithm and compare all others against it (as the lowest bound).

Some algos, which names I don't remember are O(n**3).

If you don't put deliberately unnecessary code into the algorithm
you will hardly find one - the question is; why would you do that!

A commonly known algorithm with exponential complexity was/is the
Shellsort (with n**1.2, IIRC). This is a bit of an outlier since
most good algorithms are O(N log N), and the O(N²) used just for
special cases. The only case where Bubblesort is acceptable is in
the corner case that your data is anyway already or almost sorted.

[...]

All that is good, except of use of CDC mainframe in 1980s,

What was wrong with the CDC 175/176 supercomputers? (These were
amongst the fastest back then! - Don't you remember one the first
chess matches against grand masters during the 1980's?)

which I'd
consider more sub-optimal choice than bubble sort algorithm.

There's hardly a reason to use suboptimal algorithms when there's
plenty good ones available (even if you have to invest a few more
bytes to program the standard algorithms).

Janis

--- Synchronet 3.21d-Linux NewsLink 1.2

From David Brown@david.brown@hesbynett.no to comp.lang.c on Wed Mar 18 11:20:03 2026

From Newsgroup: comp.lang.c

On 18/03/2026 10:21, Michael S wrote:

On Wed, 18 Mar 2026 02:40:50 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 2026-03-17 13:37, Bart wrote:

On 17/03/2026 12:08, Janis Papanagnou wrote:

On 2026-03-17 12:47, Bart wrote:

On 17/03/2026 04:38, Janis Papanagnou wrote:

On 2026-03-17 01:29, Bart wrote:

(Also I quite like using bubble sort because so many deride
it!)

How miserable! (I feel so sorry for you.)

Why would that be miserable?

Because of the reason you pretend for using it.

[...]

Bubble-sorting even 1000 strings takes about 10ms, which is small
fraction of overall build-time of a library exporting 1000
functions.

For a mere 100 functions, sort time is negligible.

And because of reason here based on arbitrary absolute times
instead of actual algorithmic complexity.

No, timing is considered relative to the rest of the task.

If this ever became a bottleneck then it is a trivial upgrade to a
better routine.

If there are simple better algorithms there's no reason but
ignorance to not use them in the first place! - It's really
stupid to deliberately use inferior, or (as here) even the
worst existing algorithms.

Bubble sort is not the worst existing, far from it. It is O(n**2).
Some algos, which names I don't remember are O(n**3).

Bogosort is O(n * n!) on average - keep rearranging the elements at
random, then check if they are sorted. (The worst case is only bounded complexity if your random generator is pseduo-random rather than truly random.)

O(n²) worst-case is not uncommon, even amongst sorting algorithms that
are quite smart, like quicksort.

The problem with bubble sort is that relatively to other popular
O(n**2) sorting algorithms, i.e. using Knuth's nomenclature, Straight Insertion and Straight Selection, it is not just measurably slower,
but also a little more complicated to code.

I am not convinced on the complexity point - but it will vary by
programming language, and what you consider "complicated".

However it has one pro point to it: it is very fast when applied to
almost sorted data sets.

It is also stable, has no memory overhead, exchanges pairs of data
in-place, and only ever exchanges adjacent data. These can also be
advantages in some situations. (Of course there are alternative sorting algorithms that have these same advantages and are, at least usually,
more efficient.)

I don't think pure bubblesort has much serious use outside educational purposes, but it is easy to understand, easy to implement correctly in
pretty much any language, and can do a perfectly good job if you don't
need efficient sorting of large datasets (for small enough datasets, a hand-written bubblesort in C will be faster than calling qsort).

But I also think people sometimes jump to simplistic views of
efficiency, as though a sorting algorithm can be boiled down to just a
single "O(f(n))" complexity. With enough data, things like cache
coherency matter more than operation counts - and with little data,
efficiency often doesn't matter at all.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bart@bc@freeuk.com to comp.lang.c on Wed Mar 18 15:10:49 2026

From Newsgroup: comp.lang.c

On 18/03/2026 09:49, Janis Papanagnou wrote:

On 2026-03-18 10:21, Michael S wrote:

Bubble sort is not the worst existing, far from it. It is O(n**2).

It is the worst. It's used in academia to show the most primitive
algorithm and compare all others against it (as the lowest bound).

My use of it is within a compiler that can approach a million lines per
second throughput.

Yet when I point out that most other compilers are far slower, even for
the same standard of generated code, then that's perfectly fine!

c:\mx>tim .\mm -time -dll big\fann4x
Compiling big\fann4x.m to big\fann4x.dll
...
-----------------------------
Total: 745 ms 100.0 % # internal compile time
Time: 0.817 # overall

Test input is 740Kloc, 10,000 functions, of which 1,000 are exported to
the DLL. DLL binary is 5.6MB.

Those 1000 function names are sorted via bubble-sort, which will be
something over 10ms of total compile-time.

This is the equivalent in C, using gcc:

c:\cx\big>tim gcc -shared fann4x.c -s -o fann4x.dll
Time: 44.067

1000 of the 10000 functions are not static, which here is sufficient to
export them from the DLL. DLL binary is 9.6MB.

(I couldn't test with TCC; it won't export unless
'__declspec(dllexport)' is applied, but this generated errors. However
the time for a version for 100% static functions was 0.86s, slower than
mine, although the C code has a higher line count.)

It might be hard to believe, but perhaps I know what I'm doing!

I like to use the simplest possible algorithms unless there's a pressing reason to use something more elaborate.

--- Synchronet 3.21d-Linux NewsLink 1.2

From DFS@nospam@dfs.com to comp.lang.c on Wed Mar 18 12:40:36 2026

From Newsgroup: comp.lang.c

On 3/17/2026 12:21 AM, Janis Papanagnou wrote:

On 2026-03-17 01:49, Tristan Wibberley wrote:

On 16/03/2026 23:07, DFS wrote:

import sys
names = []
longname = 0
with open(sys.argv[1],'r') as f:
     for line in f:
         line = line.replace('"','')
         e = [i.strip() for i in line.split(' ') if i]
         ln = len(e[0]) + len(e[1])
         if ln > longname: longname = ln
         names.append((e[1], e[0], e[2]))
for i,n in enumerate(sorted(names)):
     print("%2d. %-*s %s" % (i+1, longname+2, n[1]+' '+n[0], n[2])) >>>

look more clumsy, a bit like that Python script above.)

You misspelled beautiful.

You can put makeup on most code by using more descriptive names and
lining things up:

import sys
namelist = []
longname = 0
with open(sys.argv[1],'r') as file:
for line in file:
line = line.replace('"','')
elements = [parts.strip() for parts in line.split(' ') if parts]
thisname = len(elements[0]) + len(elements[1])
if thisname > longname: longname = thisname
namelist.append((elements[1], elements[0], elements[2]))
for count, name in enumerate(sorted(namelist)):
print("%2d. %-*s %s" % (count+1, longname+2, name[1] +' '+ name[0], name[2]))

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bart@bc@freeuk.com to comp.lang.c on Wed Mar 18 17:06:47 2026

From Newsgroup: comp.lang.c

On 18/03/2026 16:40, DFS wrote:

On 3/17/2026 12:21 AM, Janis Papanagnou wrote:

On 2026-03-17 01:49, Tristan Wibberley wrote:

On 16/03/2026 23:07, DFS wrote:

import sys
names = []
longname = 0
with open(sys.argv[1],'r') as f:
     for line in f:
         line = line.replace('"','')
         e = [i.strip() for i in line.split(' ') if i]
         ln = len(e[0]) + len(e[1])
         if ln > longname: longname = ln
         names.append((e[1], e[0], e[2]))
for i,n in enumerate(sorted(names)):
     print("%2d. %-*s %s" % (i+1, longname+2, n[1]+' '+n[0], n[2])) >>>>

look more clumsy, a bit like that Python script above.)

You misspelled beautiful.

You can put makeup on most code by using more descriptive names and
lining things up:

import sys
namelist = []
longname = 0
with open(sys.argv[1],'r') as file:
    for line in file:
        line     = line.replace('"','')
        elements = [parts.strip() for parts in line.split(' ') if parts]
        thisname = len(elements[0]) + len(elements[1])
        if thisname > longname: longname = thisname
        namelist.append((elements[1], elements[0], elements[2]))
for count, name in enumerate(sorted(namelist)):
    print("%2d. %-*s %s" % (count+1, longname+2, name[1] +' '+ name[0], name[2]))

I don't think that helps! It's exactly the same, somewhat indigestible
lump of code; the longer names just obscure things more.

Some blank lines wouldn't have gone amiss; those don't contribute to line-count.

However, the approach used could be different. The input is something
like this, bounded by <>:

<..."abc def"..."ghi"...>

the ... represent unwanted white space.

Your first step is to get rid of those ", to end up with:

<...abc def...ghi...>

But now that white space, which had been neatly excluded by the quotes, because part of the content and needs dealing with.

Maybe also, the tel no contain embedded spaces, or maybe the name is
only a first or last name, or there is a middle name. So removing the
quotes also removed the demarcation between name and tel no.

In my scripted version, the 'read' process uses the quotes to delimit
each of the two items; it will be read <abc def> and <ghi>.
--- Synchronet 3.21d-Linux NewsLink 1.2

From DFS@nospam@dfs.com to comp.lang.c on Wed Mar 18 15:46:39 2026

From Newsgroup: comp.lang.c

On 3/18/2026 1:06 PM, Bart wrote:

On 18/03/2026 16:40, DFS wrote:

On 3/17/2026 12:21 AM, Janis Papanagnou wrote:

On 2026-03-17 01:49, Tristan Wibberley wrote:

On 16/03/2026 23:07, DFS wrote:

import sys
names = []
longname = 0
with open(sys.argv[1],'r') as f:
     for line in f:
         line = line.replace('"','')
         e = [i.strip() for i in line.split(' ') if i]
         ln = len(e[0]) + len(e[1])
         if ln > longname: longname = ln
         names.append((e[1], e[0], e[2]))
for i,n in enumerate(sorted(names)):
     print("%2d. %-*s %s" % (i+1, longname+2, n[1]+' '+n[0], n[2])) >>>>>

look more clumsy, a bit like that Python script above.)

You misspelled beautiful.

You can put makeup on most code by using more descriptive names and
lining things up:

import sys
namelist = []
longname = 0
with open(sys.argv[1],'r') as file:
     for line in file:
         line     = line.replace('"','')
         elements = [parts.strip() for parts in line.split(' ') if parts]
         thisname = len(elements[0]) + len(elements[1])
         if thisname > longname: longname = thisname
         namelist.append((elements[1], elements[0], elements[2]))
for count, name in enumerate(sorted(namelist)):
     print("%2d. %-*s %s" % (count+1, longname+2, name[1] +' '+
name[0], name[2]))

I don't think that helps!

It helps a lot!

It goes from above avg python/pseudocode to great python/pseudocode.

It really can't be improved. I'm not even sure it can be shortened and
still maintain the same functionality.

Maybe I could combine the .replace() and the .split() to save one line,
but it's not worth it.

elements = [parts.strip() for parts in line.replace('"','').split(' ')
if parts]

Didn't try it.

You can construct fantastic one-liners in python, but like a long regex they're more trouble to write and read than they're worth.

It's exactly the same, somewhat indigestible lump of code;

eh? It's rare that python code is labeled "indigestible".

If it was perl or Montero C++ you'd be correct.

I notice you didn't deliver any Q code. Why not?

the longer names just obscure things more.

The longer names make it even better. I should use long names all the time.

But maxcolumnwidth is too long, and mcw is too short and
non-descriptive, so I'll typically use maxcolwidth, etc. It's a tradeoff.

Some blank lines wouldn't have gone amiss; those don't contribute to line-count.

Here I didn't test for blank lines. They didn't exist in the input, and rarely exist in real life.

However, the approach used could be different. The input is something
like this, bounded by <>:

<..."abc def"..."ghi"...>

the ... represent unwanted white space.

Your first step is to get rid of those ", to end up with:

<...abc def...ghi...>

Yes.

But now that white space, which had been neatly excluded by the quotes, because part of the content and needs dealing with.

Now that the quotes are gone, the python strip() command inside the list comprehension works right:

elements = [parts.strip() for parts in line.split(' ') if parts]

That gives me the 3 elements - first, last, phone - stripped of white space

If the quote marks are still in place they become part of the names and
phone.

Maybe also, the tel no contain embedded spaces, or maybe the name is
only a first or last name, or there is a middle name. So removing the
quotes also removed the demarcation between name and tel no.

Now you're changing the input format or the transformation spec or both.

In my scripted version, the 'read' process uses the quotes to delimit
each of the two items; it will be read <abc def> and <ghi>.

Then you can't sort by last-first, but output by first-last.

So to match what I did here your read process needs to parse it into:
<abc> <def> <ghi>.

I did it exactly right, in the right order, and I bet you will too.

--- Synchronet 3.21d-Linux NewsLink 1.2

From DFS@nospam@dfs.com to comp.lang.c on Wed Mar 18 16:14:07 2026

From Newsgroup: comp.lang.c

On 3/16/2026 8:49 PM, Tristan Wibberley wrote:

I can do something that works with the provided data with util-linux,
sed, awk, and coreutils:

{ sed 's/"//g' | awk '{print $1 " " $2 ":" $3}' | sort -k2 | column -t
-s: | nl -s". " ; } < data

where the file "data" contains the original list. Although there will be
some differences in behaviour for some other data and I think your
python and the above will do unintuitive things with other data.

Emma Bauer:0157-9988776
Tom Bauer:0171-1122334
Tim Becker:0151-1112223
Anna Becker:0170-2233445
Marie Becker:040-4455667
Michael Braun:0170-9988776
Sarah Braun:040-7788990

You need numbering, and the names should remain together, and the phones left-aligned.

1. Emma Bauer 0157-9988776
2. Tom Bauer 0171-1122334
3. Tim Becker 0151-1112223
4. Anna Becker 0170-2233445
5. Marie Becker 040-4455667
6. Michael Braun 0170-9988776
7. Sarah Braun 040-7788990

The pipeline goes as follows:
sed strips quotes
awk makes two delimited columns
sort sorts on surnames, and where surnames are identical on telno
column presents columns with a visual model
nl adds number prefixes

If I really try and use tee and fifos I can use many fewer tools.

None of these are really good languages/toolsets for the task. A DSL
with a data schema would be best.

Perl? ugh.

Python ftw: easy, readable, and short enough.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Wed Mar 18 21:57:38 2026

From Newsgroup: comp.lang.c

On 2026-03-18 11:20, David Brown wrote:

On 18/03/2026 10:21, Michael S wrote:

Bubble sort is not the worst existing, far from it. It is O(n**2).
Some algos, which names I don't remember are O(n**3).

Bogosort is O(n * n!) on average - keep rearranging the elements at
random, then check if they are sorted. (The worst case is only bounded complexity if your random generator is pseduo-random rather than truly random.)

Just to make clear; the key point is that it makes no sense to
implement sorting algorithms of complexities worse that O(N²).
With that complexity you can compare and move every element to
be sorted with any other element.[*] So Michael's hypothetical
O(N**3) algorithm would be just nonsense (or ignorance) for
any sensible application in practice.

[*] It would otherwise be like implementing some operations on
linear lists with an algorithm worse than O(N). Depending on
the actual function you usually strive for O(log N) or O(1) by
organizing your data appropriately, but never worse than O(N).

O(n²) worst-case is not uncommon, even amongst sorting algorithms that
are quite smart, like quicksort.

Yes. It's indeed not widely known that one of the [practically]
fastest algorithm is that "bad" (concerning actual complexity).
The point is, for one, that it's on average much better; just
in very special cases it's getting quadratic.[**] And the other
point is that you can manage its complexity by means mentioned
in my previous post (like using a medium-of-three pivot element,
for example). (Quicksort has in that respect been thoroughly
examined already in the 1980's and various optimizations have
been developed.)

[**] As opposed to Bubblesort that shows acceptable behavior
just in the very rare special case of already sorted elements.
But then one should consider to not *sort* the data in the first
place but *insert* a new element at the right place to keep the
sorting order.

The problem with bubble sort is that relatively to other popular
O(n**2) sorting algorithms, i.e. using Knuth's nomenclature, Straight
Insertion and Straight Selection, it is not just measurably slower,
but also a little more complicated to code.

I am not convinced on the complexity point - but it will vary by
programming language, and what you consider "complicated".

If you're lucky you use a programming language where you don't
have to implement basic things like the sorting algorithm.[***]
My expectation is that standard library functions would support
sophisticated efficient O(N log N) algorithms out of the box.

[***] Here Bonita Montero with his enthusiasm for C++ has a point
that cannot be derived. I would also not be surprised if qsort(3)
from the standard "C" lib would be based on Quicksort or Quicksort
hybrids (but I've never checked). In C++'s STL you get complexities
of algorithms guaranteed at least; that's sensible CS/IT software
design!

Janis

[...]

--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Wed Mar 18 22:01:13 2026

From Newsgroup: comp.lang.c

On 2026-03-18 21:57, Janis Papanagnou wrote:

[***] Here Bonita Montero with his enthusiasm for C++ has a point
that cannot be derived.

...that cannot be derided. [And should rather be esteemed.]

Janis

[...]

--- Synchronet 3.21d-Linux NewsLink 1.2

From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Wed Mar 18 21:20:21 2026

From Newsgroup: comp.lang.c

Bart <bc@freeuk.com> wrote:

On 18/03/2026 09:49, Janis Papanagnou wrote:

On 2026-03-18 10:21, Michael S wrote:

Bubble sort is not the worst existing, far from it. It is O(n**2).

It is the worst. It's used in academia to show the most primitive
algorithm and compare all others against it (as the lowest bound).

My use of it is within a compiler that can approach a million lines per second throughput.

Yet when I point out that most other compilers are far slower, even for
the same standard of generated code, then that's perfectly fine!

c:\mx>tim .\mm -time -dll big\fann4x
Compiling big\fann4x.m to big\fann4x.dll
...
-----------------------------
Total: 745 ms 100.0 % # internal compile time
Time: 0.817 # overall

Test input is 740Kloc, 10,000 functions, of which 1,000 are exported to
the DLL. DLL binary is 5.6MB.

Those 1000 function names are sorted via bubble-sort, which will be something over 10ms of total compile-time.

This is the equivalent in C, using gcc:

c:\cx\big>tim gcc -shared fann4x.c -s -o fann4x.dll
Time: 44.067

1000 of the 10000 functions are not static, which here is sufficient to export them from the DLL. DLL binary is 9.6MB.

Well, some day will come Bart^2 and feed your compiler with different
data. One on my codebases is about 210 k wc lines (about 120 k sloc).
There is about 15000 functions there, of which about 8000 is exported.
So, the thing is much smaller than your test file, but has much
more exported functions. If you scale it keeping the same
average function size and proportion of exported functions you
will quickly arrive at cases when this single sort dominates.

FYI, I have an old Modula-2 to C translater (written in late eighties
by a team in Karlsruche). It is reasonably fast handling 10000 lines,
it is hard to exactly measure execution time, but speed is
somewhere between 150 klps and 750 klps (not as fast as your
compiler, but quite respectable IMO). On smaller files (and possibly
this one) its execution time seem to be dominated by startup time.
But it slows down on bigger files. For curiosity I tracked the
problem and it is in handling of symbol table. While most
things use asymptoticaly fast algorithms parts that looks up
symbols is quadratic and this single thing dominates runtime.

I like to use the simplest possible algorithms unless there's a pressing reason to use something more elaborate.

Heapsort is only marginaly more complex than bubble sort and has N*log(N)
worst case complexity. On average heapsort is not as fast as quicksort
(main factor seem to by much worse cache behaviour), but IMO is pretty
good choice is average speed is not critical but you want to avoid
catastrophic blow up of execution time on some data.

Merge sort sort is pretty simple too. There is a variation
of shell sort and buble sort called "comb sort". It performs
predetermined number of passes like bubble sort, but with
bigger and decreasing steps, once step is 1 it works as
usual bubble sort. Authors of "comb sort" claimed that
its performance is quite good, but of course, like quicksort
it may be quadratic on some data.
--
Waldek Hebisch
--- Synchronet 3.21d-Linux NewsLink 1.2

From Bart@bc@freeuk.com to comp.lang.c on Wed Mar 18 22:14:22 2026

From Newsgroup: comp.lang.c

On 18/03/2026 19:46, DFS wrote:

On 3/18/2026 1:06 PM, Bart wrote:

It helps a lot!

It goes from above avg python/pseudocode to great python/pseudocode.

It really can't be improved. I'm not even sure it can be shortened and still maintain the same functionality.

Maybe I could combine the .replace() and the .split() to save one line,
but it's not worth it.

elements = [parts.strip() for parts in line.replace('"','').split(' ')
if parts]

Didn't try it.

You can construct fantastic one-liners in python, but like a long regex they're more trouble to write and read than they're worth.

It's exactly the same, somewhat indigestible lump of code;

eh? It's rare that python code is labeled "indigestible".

If it was perl or Montero C++ you'd be correct.

I notice you didn't deliver any Q code. Why not?

I thought I did, maybe you didn't see it, so I've repeated my post
below. It was in response to your 12-line Python.

(Note that I'm not saying that is much more digestible than the Python.
Only a little!)

Some blank lines wouldn't have gone amiss; those don't contribute to
line-count.

Here I didn't test for blank lines. They didn't exist in the input, and rarely exist in real life.

I meant in the Python code.

In my scripted version, the 'read' process uses the quotes to delimit
each of the two items; it will be read <abc def> and <ghi>.

Then you can't sort by last-first, but output by first-last.

Your Python seems to assume a format like this:

..."bart simpson"..."1-800-123-4567"

That is, exactly two parts to the name (as in, separated with white
space), and one part to the phone number.

In the UK, telephone numbers have spaces. But that shouldn't matter:
those quotes exactly indicate which part is the name, and which part is
the number. But you discard that useful data too early.

----------------------

Here is my post first sent 17-Mar-26 at 12:31 GMT; I've added some white space:

--------------------------------------------------

Huh. I can just about do it in 11 lines:

maxlen:=0
names::=()

for x in readtextfile("abc") do
readln @x, name:"s", tel:"s"
(c, s) := splitstring(name, " ")
maxlen max:= name.len
names &:= (s, c, tel)
end

for i, x in sort(names) do
fprintln "#. # #", i:"2", x[2]+" "+x[1]:tostr(maxlen)+"jl", x[3]
end

Python has an advantage in not needing those 'end' lines! Otherwise my solution would be 9 lines (obviously still a long way from a one-liner
like Awk).

However mine is missing a couple of things: 'name.len' needs to be
replaced by a function that counts UTF8 characters. And 'sort' currently
is another custom routine that sorts only the surname.

(Python's 'sorted' presumably compares all elements LTR.)

But these lie outside the main program which remains at 11 lines.

(It's not clear how the Python would work when there is more than one
first name.)

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bart@bc@freeuk.com to comp.lang.c on Wed Mar 18 23:13:43 2026

From Newsgroup: comp.lang.c

On 18/03/2026 21:20, Waldek Hebisch wrote:

Bart <bc@freeuk.com> wrote:

On 18/03/2026 09:49, Janis Papanagnou wrote:

On 2026-03-18 10:21, Michael S wrote:

Bubble sort is not the worst existing, far from it. It is O(n**2).

It is the worst. It's used in academia to show the most primitive
algorithm and compare all others against it (as the lowest bound).

My use of it is within a compiler that can approach a million lines per
second throughput.

Yet when I point out that most other compilers are far slower, even for
the same standard of generated code, then that's perfectly fine!

c:\mx>tim .\mm -time -dll big\fann4x
Compiling big\fann4x.m to big\fann4x.dll
...
-----------------------------
Total: 745 ms 100.0 % # internal compile time
Time: 0.817 # overall

Test input is 740Kloc, 10,000 functions, of which 1,000 are exported to
the DLL. DLL binary is 5.6MB.

Those 1000 function names are sorted via bubble-sort, which will be
something over 10ms of total compile-time.

This is the equivalent in C, using gcc:

c:\cx\big>tim gcc -shared fann4x.c -s -o fann4x.dll
Time: 44.067

1000 of the 10000 functions are not static, which here is sufficient to
export them from the DLL. DLL binary is 9.6MB.

Well, some day will come Bart^2 and feed your compiler with different
data. One on my codebases is about 210 k wc lines (about 120 k sloc).
There is about 15000 functions there, of which about 8000 is exported.
So, the thing is much smaller than your test file, but has much
more exported functions. If you scale it keeping the same
average function size and proportion of exported functions you
will quickly arrive at cases when this single sort dominates.

As I say, I'd just update that. But I did try this test program to see
what would happen:

#include <stdio.h>
void yjrsrh(void) {puts("yjrsrh");}
....
void ofcpxg(void) {puts("ofcpxg");}

This is 8001 lines and exports 8000 functions.

I used my C compiler (which uses the same backend as my other one), and
sure enough it was very slow: it took 2 seconds to produce the DLL,
nearly all spent sorting.

However, 50% of sort-time is spent repeatedly extracting the base-names
of my 'decorated' names, used for comparisons. This could just done once
at the start, but since this is C and names are not decorated, that
conversion is not needed anyway and can be disabled.

So it takes 1 second for this test, using bubble-sort.

I then tried it with gcc:

gcc -s c.c -shared -o c.dll

This took 4.5 seconds! It was still slower despite my slow sort.

Further, the DLL produced was 650KB compared with my 360KB.

I can reduce the size using -Os, but only down to 430KB, and it now took
9.6 seconds.

FYI, I have an old Modula-2 to C translater (written in late eighties
by a team in Karlsruche). It is reasonably fast handling 10000 lines,
it is hard to exactly measure execution time, but speed is
somewhere between 150 klps and 750 klps (not as fast as your
compiler, but quite respectable IMO).

It's pretty good.

On smaller files (and possibly
this one) its execution time seem to be dominated by startup time.
But it slows down on bigger files. For curiosity I tracked the
problem and it is in handling of symbol table. While most
things use asymptoticaly fast algorithms parts that looks up
symbols is quadratic and this single thing dominates runtime.

The NASM assembler has something wrong like that. You only see it on
large files though.

If I create a c. 270Kloc ASM file (from sql.c) in different formats,
then these are the assembly times I get from various assemblers:

AA 0.12 seconds (my assembler; unoptimised)
AS 0.87 seconds
YASM 1.15 seconds (mostly NASM-compatible)
NASM 232 seconds (using -O0)
NASM 375 seconds (defaults)

There's about 3000 times difference between fastest and slowest. It's obviously not right!

I like to use the simplest possible algorithms unless there's a pressing
reason to use something more elaborate.

Heapsort is only marginaly more complex than bubble sort

The thing is I can write a bubble sort without needing to think about
it. I need to look up those other sorts, or modify one I have lying around.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Michael S@already5chosen@yahoo.com to comp.lang.c on Thu Mar 19 11:19:59 2026

From Newsgroup: comp.lang.c

On Wed, 18 Mar 2026 11:20:03 +0100
David Brown <david.brown@hesbynett.no> wrote:

On 18/03/2026 10:21, Michael S wrote:

On Wed, 18 Mar 2026 02:40:50 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 2026-03-17 13:37, Bart wrote:

On 17/03/2026 12:08, Janis Papanagnou wrote:

On 2026-03-17 12:47, Bart wrote:

On 17/03/2026 04:38, Janis Papanagnou wrote:

On 2026-03-17 01:29, Bart wrote:

(Also I quite like using bubble sort because so many deride
it!)

How miserable!� (I feel so sorry for you.)

Why would that be miserable?

Because of the reason you pretend for using it.

[...]

Bubble-sorting even 1000 strings takes about 10ms, which is
small fraction of overall build-time of a library exporting 1000
functions.

For a mere 100 functions, sort time is negligible.

And because of reason here based on arbitrary absolute times
instead of actual algorithmic complexity.

No, timing is considered relative to the rest of the task.

If this ever became a bottleneck then it is a trivial upgrade to a
better routine.

If there are simple better algorithms there's no reason but
ignorance to not use them in the first place! - It's really
stupid to deliberately use inferior, or (as here) even the
worst existing algorithms.

Bubble sort is not the worst existing, far from it. It is O(n**2).
Some algos, which names I don't remember are O(n**3).

Bogosort is O(n * n!) on average - keep rearranging the elements at
random, then check if they are sorted. (The worst case is only
bounded complexity if your random generator is pseduo-random rather
than truly random.)

I didn't mean something intentionally crippled.
I remember seeing examples of worse than O(n**2) algorithms that at
first glance look reasonable. I don't remember details.

O(n�) worst-case is not uncommon, even amongst sorting algorithms
that are quite smart, like quicksort.

The problem with bubble sort is that relatively to other popular
O(n**2) sorting algorithms, i.e. using Knuth's nomenclature,
Straight Insertion and Straight Selection, it is not just
measurably slower, but also a little more complicated to code.

I am not convinced on the complexity point - but it will vary by
programming language, and what you consider "complicated".

The language is 'C'.
Here are implementations of Straight Insertion and Straight Select.
Show me implementation of Bubble sort that is not at least a little
more complicated.
Remember that it has to be "real" bubble sort, not a simplified bubble
sort that does unnecessary work by starting each time from the
beginning. Your variant shall avoid obviously unnecessary work and
shall be opportunistic, i.e. quick at handling almost sorted cases.
#include <stddef.h>
#include <string.h>
void straight_insertion_sort(const char** buffer, size_t n)
{
for (size_t i = 1; i < n; ++i) {
const char* a = buffer[i];
size_t k;
for (k = i; k != 0; --k) {
if (!(strcmp(a, buffer[k-1]) < 0))
break;
buffer[k] = buffer[k-1];
}
buffer[k] = a;
}
}
void straight_select_sort(const char** buffer, size_t n)
{
for (size_t i = 1; i < n; ++i) {
const char* mn = buffer[i];
size_t mn_k = i;
for (size_t k = i+1; k < n; ++k) {
if (strcmp(buffer[k], mn) < 0) {
mn = buffer[k];
mn_k = k;
}
}
buffer[mn_k] = buffer[i];
buffer[i] = mn;
}
}

However it has one pro point to it: it is very fast when applied to
almost sorted data sets.

It is also stable, has no memory overhead, exchanges pairs of data
in-place, and only ever exchanges adjacent data. These can also be advantages in some situations. (Of course there are alternative
sorting algorithms that have these same advantages and are, at least
usually, more efficient.)

I don't think pure bubblesort has much serious use outside
educational purposes, but it is easy to understand, easy to implement correctly in pretty much any language, and can do a perfectly good
job if you don't need efficient sorting of large datasets (for small
enough datasets, a hand-written bubblesort in C will be faster than
calling qsort).

IMHO, Bubble Sort is harder to understand then Straight Select Sort.

But I also think people sometimes jump to simplistic views of
efficiency, as though a sorting algorithm can be boiled down to just
a single "O(f(n))" complexity. With enough data, things like cache coherency matter more than operation counts - and with little data, efficiency often doesn't matter at all.

Cache coherency does not matter until you start to parallelize.
Things that even at moderate N could matter more than operation counts
are predictability of branches and locality of of array access.
But it applies only to cases when key comparison is quick and records
are small. When either of two conditions does not hold you are back at operation count (either # of comparisons or # of moves) as a main
bottleneck.
--- Synchronet 3.21d-Linux NewsLink 1.2

From David Brown@david.brown@hesbynett.no to comp.lang.c on Thu Mar 19 10:43:08 2026

From Newsgroup: comp.lang.c

On 18/03/2026 21:57, Janis Papanagnou wrote:

On 2026-03-18 11:20, David Brown wrote:

On 18/03/2026 10:21, Michael S wrote:

Bubble sort is not the worst existing, far from it. It is O(n**2).
Some algos, which names I don't remember are O(n**3).

Bogosort is O(n * n!) on average - keep rearranging the elements at
random, then check if they are sorted. (The worst case is only
bounded complexity if your random generator is pseduo-random rather
than truly random.)

Just to make clear; the key point is that it makes no sense to
implement sorting algorithms of complexities worse that O(N²).
With that complexity you can compare and move every element to
be sorted with any other element.[*] So Michael's hypothetical
O(N**3) algorithm would be just nonsense (or ignorance) for
any sensible application in practice.

Yes, that's fair enough. You have to go out of your way to make a
sorting algorithm that is worse than O(n²), so they will not be found in
real code. But they certainly do exist. (It is not inconceivable that dedicated hardware solutions doing sorts with many comparisons in
parallel do more than O(n²) comparisons, in much less than O(n²) time.
But that's a different kind of system.)

[*] It would otherwise be like implementing some operations on
linear lists with an algorithm worse than O(N). Depending on
the actual function you usually strive for O(log N) or O(1) by
organizing your data appropriately, but never worse than O(N).

O(n²) worst-case is not uncommon, even amongst sorting algorithms that
are quite smart, like quicksort.

Yes. It's indeed not widely known that one of the [practically]
fastest algorithm is that "bad" (concerning actual complexity).
The point is, for one, that it's on average much better; just
in very special cases it's getting quadratic.[**] And the other
point is that you can manage its complexity by means mentioned
in my previous post (like using a medium-of-three pivot element,
for example). (Quicksort has in that respect been thoroughly
examined already in the 1980's and various optimizations have
been developed.)

Yes. "Pure" quicksort is O(n²) worst case, and it hits its worst case
when the list is already sorted, so it did not take long for variations
to be developed.

[**] As opposed to Bubblesort that shows acceptable behavior
just in the very rare special case of already sorted elements.

In many applications, already sorted or nearly sorted can be the usual
case, not a rarity. If you want an efficient sorting algorithm for a particular task, you need to know the details of the task.

But then one should consider to not *sort* the data in the first
place but *insert* a new element at the right place to keep the
sorting order.

That can sometimes be the best strategy.

The problem with bubble sort is that relatively to other popular
O(n**2) sorting algorithms, i.e. using Knuth's nomenclature, Straight
Insertion and Straight Selection, it is not just measurably slower,
but also a little more complicated to code.

I am not convinced on the complexity point - but it will vary by
programming language, and what you consider "complicated".

If you're lucky you use a programming language where you don't
have to implement basic things like the sorting algorithm.[***]
My expectation is that standard library functions would support
sophisticated efficient O(N log N) algorithms out of the box.

Usually, yes. But sometimes that is not the fastest tool for the job.

I've had occasion to need to sort arrays of numbers (ints or floats)
with a compile-time fixed small size - such as 4 or 6 entries. While I
did not use bubblesort, a bubblesort would have beaten standard C
library qsort by an order of magnitude.

The fun of sorting is that there is no single perfect algorithm that is
always the best choice.

[***] Here Bonita Montero with his enthusiasm for C++ has a point
that cannot be derived. I would also not be surprised if qsort(3)
from the standard "C" lib would be based on Quicksort or Quicksort
hybrids (but I've never checked). In C++'s STL you get complexities
of algorithms guaranteed at least; that's sensible CS/IT software
design!

C++'s sorts have the advantage of being template based, and thus can be
more efficient than generic memcpy() and comparison functions for C's
qsort. Usually, at least for larger datasets, sorts are hybrid
algorithms, switching between things like quicksort, insertion sort and
heap sort at different stages in order to maximise cache hits and to get
the balance between the "O" complexity and the constants in the O
function. (By that I mean that if you have two algorithms with timing
of the form "a.n² + b.n + c", they are both O(n²) and for big n then the factor "a" is all-important. But for small n, b and c can be dominant.)

--- Synchronet 3.21d-Linux NewsLink 1.2

From David Brown@david.brown@hesbynett.no to comp.lang.c on Thu Mar 19 10:49:40 2026

From Newsgroup: comp.lang.c

On 19/03/2026 10:19, Michael S wrote:

On Wed, 18 Mar 2026 11:20:03 +0100
David Brown <david.brown@hesbynett.no> wrote:

On 18/03/2026 10:21, Michael S wrote:

On Wed, 18 Mar 2026 02:40:50 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 2026-03-17 13:37, Bart wrote:

On 17/03/2026 12:08, Janis Papanagnou wrote:

On 2026-03-17 12:47, Bart wrote:

On 17/03/2026 04:38, Janis Papanagnou wrote:

On 2026-03-17 01:29, Bart wrote:

(Also I quite like using bubble sort because so many deride
it!)

How miserable! (I feel so sorry for you.)

Why would that be miserable?

Because of the reason you pretend for using it.

[...]

Bubble-sorting even 1000 strings takes about 10ms, which is
small fraction of overall build-time of a library exporting 1000 >>>>>>> functions.

For a mere 100 functions, sort time is negligible.

And because of reason here based on arbitrary absolute times
instead of actual algorithmic complexity.

No, timing is considered relative to the rest of the task.

If this ever became a bottleneck then it is a trivial upgrade to a
better routine.

If there are simple better algorithms there's no reason but
ignorance to not use them in the first place! - It's really
stupid to deliberately use inferior, or (as here) even the
worst existing algorithms.

Bubble sort is not the worst existing, far from it. It is O(n**2).
Some algos, which names I don't remember are O(n**3).

Bogosort is O(n * n!) on average - keep rearranging the elements at
random, then check if they are sorted. (The worst case is only
bounded complexity if your random generator is pseduo-random rather
than truly random.)

I didn't mean something intentionally crippled.
I remember seeing examples of worse than O(n**2) algorithms that at
first glance look reasonable. I don't remember details.

Now I am curious! Perhaps it was dependent on what you were counting as operations - some algorithms aim to minimise swaps or moves, even if it
costs more in comparisons or memory usage. Without having the details
in my head, I could believe that an algorithm that aimed to move each
element at most once could require more than O(n²) comparisons.

O(n²) worst-case is not uncommon, even amongst sorting algorithms
that are quite smart, like quicksort.

The problem with bubble sort is that relatively to other popular
O(n**2) sorting algorithms, i.e. using Knuth's nomenclature,
Straight Insertion and Straight Selection, it is not just
measurably slower, but also a little more complicated to code.

I am not convinced on the complexity point - but it will vary by
programming language, and what you consider "complicated".

The language is 'C'.
Here are implementations of Straight Insertion and Straight Select.
Show me implementation of Bubble sort that is not at least a little
more complicated.
Remember that it has to be "real" bubble sort, not a simplified bubble
sort that does unnecessary work by starting each time from the
beginning. Your variant shall avoid obviously unnecessary work and
shall be opportunistic, i.e. quick at handling almost sorted cases.

#include <stddef.h>
#include <string.h>

void straight_insertion_sort(const char** buffer, size_t n)
{
for (size_t i = 1; i < n; ++i) {
const char* a = buffer[i];
size_t k;
for (k = i; k != 0; --k) {
if (!(strcmp(a, buffer[k-1]) < 0))
break;
buffer[k] = buffer[k-1];
}
buffer[k] = a;
}
}

void straight_select_sort(const char** buffer, size_t n)
{
for (size_t i = 1; i < n; ++i) {
const char* mn = buffer[i];
size_t mn_k = i;
for (size_t k = i+1; k < n; ++k) {
if (strcmp(buffer[k], mn) < 0) {
mn = buffer[k];
mn_k = k;
}
}
buffer[mn_k] = buffer[i];
buffer[i] = mn;
}
}

However it has one pro point to it: it is very fast when applied to
almost sorted data sets.

It is also stable, has no memory overhead, exchanges pairs of data
in-place, and only ever exchanges adjacent data. These can also be
advantages in some situations. (Of course there are alternative
sorting algorithms that have these same advantages and are, at least
usually, more efficient.)

I don't think pure bubblesort has much serious use outside
educational purposes, but it is easy to understand, easy to implement
correctly in pretty much any language, and can do a perfectly good
job if you don't need efficient sorting of large datasets (for small
enough datasets, a hand-written bubblesort in C will be faster than
calling qsort).

IMHO, Bubble Sort is harder to understand then Straight Select Sort.

But I also think people sometimes jump to simplistic views of
efficiency, as though a sorting algorithm can be boiled down to just
a single "O(f(n))" complexity. With enough data, things like cache
coherency matter more than operation counts - and with little data,
efficiency often doesn't matter at all.

Cache coherency does not matter until you start to parallelize.

Sorry, I did not mean to write "cache coherency" - I meant to write
"cache locality". I must have been thinking of other things while
typing. Over a certain size, pretty much all algorithm efficiency is
about cache usage patterns.

Things that even at moderate N could matter more than operation counts
are predictability of branches and locality of of array access.
But it applies only to cases when key comparison is quick and records
are small. When either of two conditions does not hold you are back at operation count (either # of comparisons or # of moves) as a main
bottleneck.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Michael S@already5chosen@yahoo.com to comp.lang.c on Thu Mar 19 12:23:41 2026

From Newsgroup: comp.lang.c

On Thu, 19 Mar 2026 10:43:08 +0100
David Brown <david.brown@hesbynett.no> wrote:

C++'s sorts have the advantage of being template based, and thus can
be more efficient than generic memcpy() and comparison functions for
C's qsort.

The problem with C qsort is not just lack of efficiency. It's also
lacke of flexibilty at API level. In my pratice, in about 30% of the
cases sorting, C qsort will either didn't do what I want at all or at
very least would not do it in thread-safe manner. qsort_r() (POSIX ?)
is a more flexible API that should have been part of C Standard at least
since C11. But it still is not.

--- Synchronet 3.21d-Linux NewsLink 1.2

From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Thu Mar 19 13:28:54 2026

From Newsgroup: comp.lang.c

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 2026-03-18 11:20, David Brown wrote:

On 18/03/2026 10:21, Michael S wrote:

Bubble sort is not the worst existing, far from it. It is O(n**2).
Some algos, which names I don't remember are O(n**3).

Bogosort is O(n * n!) on average - keep rearranging the elements at
random, then check if they are sorted. (The worst case is only bounded
complexity if your random generator is pseduo-random rather than truly
random.)

Just to make clear; the key point is that it makes no sense to
implement sorting algorithms of complexities worse that O(N²).

Well, in Prolog it is natural to implement sort in a way that
is equvalent to recursive search of ordering permuations.
And when you are sorting say 5 element lists it can work
quite well.

With that complexity you can compare and move every element to
be sorted with any other element.[*] So Michael's hypothetical
O(N**3) algorithm would be just nonsense (or ignorance) for
any sensible application in practice.

You assume O(1) access to elements. Take any O(N^2) array
sort and replace array by a list. Now your O(N^2) suddenly
becomes O(N^3). Note that this can easily happen in dynamicaly
typed languages with true lists: such languages tend to have
"element access" operation which is O(1) for arrays but O(N)
for lists. It can happen in C if you write a "generic" sort
which takes accessors as arguments (eiter function pointer
arguments or via appropriate macrology) and plug in list
operations as arguemnts.

[*] It would otherwise be like implementing some operations on
linear lists with an algorithm worse than O(N). Depending on
the actual function you usually strive for O(log N) or O(1) by
organizing your data appropriately, but never worse than O(N).

You know about lists, but somewhat ignore possibility of
combining lists with O(N^2) sort.
--
Waldek Hebisch
--- Synchronet 3.21d-Linux NewsLink 1.2

From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Thu Mar 19 14:09:10 2026

From Newsgroup: comp.lang.c

David Brown <david.brown@hesbynett.no> wrote:

On 19/03/2026 10:19, Michael S wrote:

I didn't mean something intentionally crippled.
I remember seeing examples of worse than O(n**2) algorithms that at
first glance look reasonable. I don't remember details.

Now I am curious! Perhaps it was dependent on what you were counting as operations - some algorithms aim to minimise swaps or moves, even if it costs more in comparisons or memory usage. Without having the details
in my head, I could believe that an algorithm that aimed to move each element at most once could require more than O(n²) comparisons.

Depends on what you consider "move each element at most once". If
you allow only assignments of form "a[i] = a[j]" for element of
array that should be sorted, then AFAICS sort is impossible (first
move will destroy some element). It you allow 2 temporaries
to avoid destroying elements + a few administrative variables, then
simple O(N^2) sort will work. You can find desired position of
an element in O(N) comparisons without moving anything. You
go trough array looking for elements that are in wrong places.
If you have such an element you move it to right place, first
moving element occupying its position to a temporary. Then
you keep moving element in the temporary to right place
(and save displaced to the temporary). At some moment you
will move element to the slot freed by fist moved element,
then you stop movement loop and resume scanning for wrongly
placed element. Note that scanning loop looks once at each
element for total cost O(N^2). Each iteration of movement
loop moves one element to correct place, so there will be
at most N iteration, each of cost O(N). So total cost is O(N^2).

I allowed 2 temporaries because they get used in turns:
at most one temporary is used during search for position.
Then element at destination is stored in the free temporary.
Next, element from previousy used temporary is stored in the
destination freeing that temporary for subsequent use.
Given element participates in at most two assignments. More
precisely, element that are in right place need no assignment,
other need 2 assignments. I think that this is close to minimal
amount if you want fixed size extra storage. If you allow storage
for permutation (say size N array of pointers), you can first
produce needed permutation without moving anything in orignal
array. Then for each cycle you move one element of the cycle
to the temporary and then a single assigment for each other
element of the cycle will move it to the right place.

If you consider swaps as a single move you can implement fixed
storage version using just one temporary. But usual implementation
of swap needs 3 assignments and a temporary, so it does not look
like an improvement.
--
Waldek Hebisch
--- Synchronet 3.21d-Linux NewsLink 1.2

From David Brown@david.brown@hesbynett.no to comp.lang.c on Thu Mar 19 15:22:47 2026

From Newsgroup: comp.lang.c

On 19/03/2026 11:23, Michael S wrote:

On Thu, 19 Mar 2026 10:43:08 +0100
David Brown <david.brown@hesbynett.no> wrote:

C++'s sorts have the advantage of being template based, and thus can
be more efficient than generic memcpy() and comparison functions for
C's qsort.

The problem with C qsort is not just lack of efficiency. It's also
lacke of flexibilty at API level. In my pratice, in about 30% of the
cases sorting, C qsort will either didn't do what I want at all or at
very least would not do it in thread-safe manner. qsort_r() (POSIX ?)
is a more flexible API that should have been part of C Standard at least since C11. But it still is not.

Yes, standard C qsort() has a lot of limitations. A thread-safe version
would fix one of these, but miss out on many others.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bart@bc@freeuk.com to comp.lang.c on Thu Mar 19 14:49:16 2026

From Newsgroup: comp.lang.c

On 19/03/2026 09:19, Michael S wrote:

On Wed, 18 Mar 2026 11:20:03 +0100
David Brown <david.brown@hesbynett.no> wrote:

The problem with bubble sort is that relatively to other popular
O(n**2) sorting algorithms, i.e. using Knuth's nomenclature,
Straight Insertion and Straight Selection, it is not just
measurably slower, but also a little more complicated to code.

I am not convinced on the complexity point - but it will vary by
programming language, and what you consider "complicated".

The language is 'C'.
Here are implementations of Straight Insertion and Straight Select.
Show me implementation of Bubble sort that is not at least a little
more complicated.
Remember that it has to be "real" bubble sort, not a simplified bubble
sort that does unnecessary work by starting each time from the
beginning. Your variant shall avoid obviously unnecessary work and
shall be opportunistic, i.e. quick at handling almost sorted cases.

#include <stddef.h>
#include <string.h>

void straight_insertion_sort(const char** buffer, size_t n)
{
for (size_t i = 1; i < n; ++i) {
const char* a = buffer[i];
size_t k;
for (k = i; k != 0; --k) {
if (!(strcmp(a, buffer[k-1]) < 0))
break;
buffer[k] = buffer[k-1];
}
buffer[k] = a;
}
}

void straight_select_sort(const char** buffer, size_t n)
{
for (size_t i = 1; i < n; ++i) {

I think this needs to start from 0.

const char* mn = buffer[i];
size_t mn_k = i;
for (size_t k = i+1; k < n; ++k) {
if (strcmp(buffer[k], mn) < 0) {
mn = buffer[k];
mn_k = k;
}
}
buffer[mn_k] = buffer[i];
buffer[i] = mn;
}
}

I ported both of these to scripting code, then compared against the
simplest version of bubble-sort.

They were faster, by 2.7x and 3.6x respectively. But both still had O(n-squared) behaviour.

Now that I have copies, I may possible use them next time I might have
written a bubble-sort. But I would only have done that anyway when I
knew its performance was tolerable or negligible.

Normally however (and again in scripting code) I'd use my built-in
sort() based on quicksort, which is nearly 1000 times faster than
bubble-sort for my test (sort 20K random strings), and some 300x faster
than your routines. It's not O(n-squared) either.

So basically, if I don't care about speed or its not needed, I might as
well use bubble-sort.
--- Synchronet 3.21d-Linux NewsLink 1.2

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Thu Mar 19 15:07:20 2026

From Newsgroup: comp.lang.c

David Brown <david.brown@hesbynett.no> writes:

On 19/03/2026 11:23, Michael S wrote:

On Thu, 19 Mar 2026 10:43:08 +0100
David Brown <david.brown@hesbynett.no> wrote:

C++'s sorts have the advantage of being template based, and thus can
be more efficient than generic memcpy() and comparison functions for
C's qsort.

The problem with C qsort is not just lack of efficiency. It's also
lacke of flexibilty at API level. In my pratice, in about 30% of the
cases sorting, C qsort will either didn't do what I want at all or at
very least would not do it in thread-safe manner. qsort_r() (POSIX ?)
is a more flexible API that should have been part of C Standard at least
since C11. But it still is not.

Yes, standard C qsort() has a lot of limitations. A thread-safe version >would fix one of these, but miss out on many others.

On the other hand, qsort() works just fine for relatively small
data sets which I suspect is the common use for qsort (aside from
introductory programming exercises).
--- Synchronet 3.21d-Linux NewsLink 1.2

From Michael S@already5chosen@yahoo.com to comp.lang.c on Thu Mar 19 17:09:43 2026

From Newsgroup: comp.lang.c

On Thu, 19 Mar 2026 14:49:16 +0000
Bart <bc@freeuk.com> wrote:

On 19/03/2026 09:19, Michael S wrote:

On Wed, 18 Mar 2026 11:20:03 +0100
David Brown <david.brown@hesbynett.no> wrote:

The problem with bubble sort is that relatively to other popular
O(n**2) sorting algorithms, i.e. using Knuth's nomenclature,
Straight Insertion and Straight Selection, it is not just
measurably slower, but also a little more complicated to code.

I am not convinced on the complexity point - but it will vary by
programming language, and what you consider "complicated".

The language is 'C'.
Here are implementations of Straight Insertion and Straight Select.
Show me implementation of Bubble sort that is not at least a little
more complicated.
Remember that it has to be "real" bubble sort, not a simplified
bubble sort that does unnecessary work by starting each time from
the beginning. Your variant shall avoid obviously unnecessary work
and shall be opportunistic, i.e. quick at handling almost sorted
cases.

#include <stddef.h>
#include <string.h>

void straight_insertion_sort(const char** buffer, size_t n)
{
for (size_t i = 1; i < n; ++i) {
const char* a = buffer[i];
size_t k;
for (k = i; k != 0; --k) {
if (!(strcmp(a, buffer[k-1]) < 0))
break;
buffer[k] = buffer[k-1];
}
buffer[k] = a;
}
}

void straight_select_sort(const char** buffer, size_t n)
{
for (size_t i = 1; i < n; ++i) {

I think this needs to start from 0.

Yes, sorry. My copy&past mistake.

const char* mn = buffer[i];
size_t mn_k = i;
for (size_t k = i+1; k < n; ++k) {
if (strcmp(buffer[k], mn) < 0) {
mn = buffer[k];
mn_k = k;
}
}
buffer[mn_k] = buffer[i];
buffer[i] = mn;
}
}

I ported both of these to scripting code, then compared against the
simplest version of bubble-sort.

They were faster, by 2.7x and 3.6x respectively. But both still had O(n-squared) behaviour.

For 'C', I'd expect the 1st method to be faster.
The 2nd method is preferable for sorting of big records by simple
keys.

Now that I have copies, I may possible use them next time I might
have written a bubble-sort. But I would only have done that anyway
when I knew its performance was tolerable or negligible.

Normally however (and again in scripting code) I'd use my built-in
sort() based on quicksort, which is nearly 1000 times faster than bubble-sort for my test (sort 20K random strings), and some 300x
faster than your routines. It's not O(n-squared) either.

So basically, if I don't care about speed or its not needed, I might
as well use bubble-sort.

The point is: both methods are simpler to code than bubble sort. In
case of 2nd method, also simpler to understand.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Thu Mar 19 16:13:17 2026

From Newsgroup: comp.lang.c

Am 19.03.2026 um 11:23 schrieb Michael S:

The problem with C qsort is not just lack of efficiency. It's also
lacke of flexibilty at API level. In my pratice, in about 30% of the
cases sorting, C qsort will either didn't do what I want at all or at
very least would not do it in thread-safe manner. qsort_r() (POSIX ?)
is a more flexible API that should have been part of C Standard at least since C11. But it still is not.

If you're not allowed to use global variables you can use thread
-local variables. These also have fixed addresses. So at the end
qsort() is thread-safe.
--- Synchronet 3.21d-Linux NewsLink 1.2

From Michael S@already5chosen@yahoo.com to comp.lang.c on Thu Mar 19 17:29:55 2026

From Newsgroup: comp.lang.c

On Thu, 19 Mar 2026 14:49:16 +0000
Bart <bc@freeuk.com> wrote:

On 19/03/2026 09:19, Michael S wrote:

On Wed, 18 Mar 2026 11:20:03 +0100
David Brown <david.brown@hesbynett.no> wrote:

Normally however (and again in scripting code) I'd use my built-in
sort() based on quicksort, which is nearly 1000 times faster than bubble-sort for my test (sort 20K random strings), and some 300x
faster than your routines. It's not O(n-squared) either.

For lexicographic sort of 20K random strings, plain quicksort is
probably quite sub-optimal.
If performance is important, I'd consider combined method: first
pre-sort by 3-char or 4-char prefix with Radix sort ('by LS character
first' variation of algorithm), then use quicksprt to sort sections
with the same prefix. For string taken from the real world it will not
work as well as for artificial random strings, but should still
significantly outperform plain quicksort.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Michael S@already5chosen@yahoo.com to comp.lang.c on Thu Mar 19 17:41:37 2026

From Newsgroup: comp.lang.c

On Thu, 19 Mar 2026 16:13:17 +0100
Bonita Montero <Bonita.Montero@gmail.com> wrote:

Am 19.03.2026 um 11:23 schrieb Michael S:

The problem with C qsort is not just lack of efficiency. It's also
lacke of flexibilty at API level. In my pratice, in about 30% of the
cases sorting, C qsort will either didn't do what I want at all or
at very least would not do it in thread-safe manner. qsort_r()
(POSIX ?) is a more flexible API that should have been part of C
Standard at least since C11. But it still is not.

If you're not allowed to use global variables you can use thread
-local variables. These also have fixed addresses. So at the end
qsort() is thread-safe.

In practice that is likely true. In theory it is not, because C Standard
does not guarantee that qsort() is not multi-threaded internally.
Of course, with existing API multi-threaded implementation of qsort is impractical, but I'm now wearing my Language Lawyer's hat. It (hat)
looks rather new, because I don't wear it often.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bart@bc@freeuk.com to comp.lang.c on Thu Mar 19 18:33:13 2026

From Newsgroup: comp.lang.c

On 19/03/2026 15:29, Michael S wrote:

On Thu, 19 Mar 2026 14:49:16 +0000
Bart <bc@freeuk.com> wrote:

On 19/03/2026 09:19, Michael S wrote:

On Wed, 18 Mar 2026 11:20:03 +0100
David Brown <david.brown@hesbynett.no> wrote:

Normally however (and again in scripting code) I'd use my built-in
sort() based on quicksort, which is nearly 1000 times faster than
bubble-sort for my test (sort 20K random strings), and some 300x
faster than your routines. It's not O(n-squared) either.

For lexicographic sort of 20K random strings, plain quicksort is
probably quite sub-optimal.
If performance is important, I'd consider combined method: first
pre-sort by 3-char or 4-char prefix with Radix sort ('by LS character
first' variation of algorithm), then use quicksprt to sort sections
with the same prefix. For string taken from the real world it will not
work as well as for artificial random strings, but should still
significantly outperform plain quicksort.

What do you think the slow-down would be? I set up another test, sorting
1M random strings each exactly 16 characters long.

This is how long it took to sort via various means:

WSL shell 'sort': 2.3/3.5 seconds (real/user, from/to a file)
Windows 'sort': 4.2 seconds (from/to a file)
C's qsort: 0.5 seconds (gcc; initialised char*[]/inplace)
0.6 seconds (tcc)
My script lang: 2.3/2.8 seconds (sort only/all, file to in-memory)

(That last timing is somewhat remarkable given (1) that the sort routine itself runs as interpreted, dynamic bytecode; (2) each string compare
involves calling C's 'strcmp' /after/ converting args to ensure strings
are zero-terminated.)

So how much faster ought it to be?

--- Synchronet 3.21d-Linux NewsLink 1.2

From Michael S@already5chosen@yahoo.com to comp.lang.c on Thu Mar 19 21:40:46 2026

From Newsgroup: comp.lang.c

On Thu, 19 Mar 2026 18:33:13 +0000
Bart <bc@freeuk.com> wrote:

On 19/03/2026 15:29, Michael S wrote:

On Thu, 19 Mar 2026 14:49:16 +0000
Bart <bc@freeuk.com> wrote:

On 19/03/2026 09:19, Michael S wrote:

On Wed, 18 Mar 2026 11:20:03 +0100
David Brown <david.brown@hesbynett.no> wrote:

Normally however (and again in scripting code) I'd use my built-in
sort() based on quicksort, which is nearly 1000 times faster than
bubble-sort for my test (sort 20K random strings), and some 300x
faster than your routines. It's not O(n-squared) either.

For lexicographic sort of 20K random strings, plain quicksort is
probably quite sub-optimal.
If performance is important, I'd consider combined method: first
pre-sort by 3-char or 4-char prefix with Radix sort ('by LS
character first' variation of algorithm), then use quicksprt to
sort sections with the same prefix. For string taken from the real
world it will not work as well as for artificial random strings,
but should still significantly outperform plain quicksort.

What do you think the slow-down would be? I set up another test,
sorting 1M random strings each exactly 16 characters long.

This is how long it took to sort via various means:

WSL shell 'sort': 2.3/3.5 seconds (real/user, from/to a file)
Windows 'sort': 4.2 seconds (from/to a file)
C's qsort: 0.5 seconds (gcc; initialised char*[]/inplace)
0.6 seconds (tcc)
My script lang: 2.3/2.8 seconds (sort only/all, file to in-memory)

(That last timing is somewhat remarkable given (1) that the sort
routine itself runs as interpreted, dynamic bytecode; (2) each string
compare involves calling C's 'strcmp' /after/ converting args to
ensure strings are zero-terminated.)

So how much faster ought it to be?

I don't understand the question. What answer could there possibly be
except for "There are no limits to perfection!" ?

--- Synchronet 3.21d-Linux NewsLink 1.2

From Tristan Wibberley@tristan.wibberley+netnews2@alumni.manchester.ac.uk to comp.lang.c on Thu Mar 19 22:39:11 2026

From Newsgroup: comp.lang.c

On 17/03/2026 04:21, Janis Papanagnou wrote:

I've lost track of what the actual requirements meanwhile are.
Double-quoted names and numbers are the payload to extract?

Yes, the python code didn't seem to dequote to objects but assumed the
quotes were noise and there would be two name parts.
--
Tristan Wibberley

The message body is Copyright (C) 2026 Tristan Wibberley except
citations and quotations noted. All Rights Reserved except that you may,
of course, cite it academically giving credit to me, distribute it
verbatim as part of a usenet system or its archives, and use it to
promote my greatness and general superiority without misrepresentation
of my opinions other than my opinion of my greatness and general
superiority which you _may_ misrepresent. You definitely MAY NOT train
any production AI system with it but you may train experimental AI that
will only be used for evaluation of the AI methods it implements.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Tristan Wibberley@tristan.wibberley+netnews2@alumni.manchester.ac.uk to comp.lang.c on Thu Mar 19 22:42:01 2026

From Newsgroup: comp.lang.c

On 18/03/2026 20:14, DFS wrote:

On 3/16/2026 8:49 PM, Tristan Wibberley wrote:

I can do something that works with the provided data with util-linux,
sed, awk, and coreutils:

{ sed 's/"//g' | awk '{print $1 " " $2 ":" $3}' | sort -k2 | column -t
-s: | nl -s". " ; } < data

where the file "data" contains the original list. Although there will be
some differences in behaviour for some other data and I think your
python and the above will do unintuitive things with other data.

Emma        Bauer:0157-9988776
Tom         Bauer:0171-1122334
Tim         Becker:0151-1112223
Anna        Becker:0170-2233445
Marie       Becker:040-4455667
Michael     Braun:0170-9988776
Sarah       Braun:040-7788990

You need numbering, and the names should remain together, and the phones left-aligned.

1. Emma Bauer     0157-9988776
2. Tom Bauer      0171-1122334
3. Tim Becker     0151-1112223
4. Anna Becker    0170-2233445
5. Marie Becker   040-4455667
6. Michael Braun 0170-9988776
7. Sarah Braun    040-7788990

I got the second result, maybe a LC_ALL=C { ... } is needed at the start
to get the same result everywhere. I always forget the LC_ALL=C.
--
Tristan Wibberley

The message body is Copyright (C) 2026 Tristan Wibberley except
citations and quotations noted. All Rights Reserved except that you may,
of course, cite it academically giving credit to me, distribute it
verbatim as part of a usenet system or its archives, and use it to
promote my greatness and general superiority without misrepresentation
of my opinions other than my opinion of my greatness and general
superiority which you _may_ misrepresent. You definitely MAY NOT train
any production AI system with it but you may train experimental AI that
will only be used for evaluation of the AI methods it implements.

--- Synchronet 3.21d-Linux NewsLink 1.2

From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Thu Mar 19 23:21:43 2026

From Newsgroup: comp.lang.c

Bart <bc@freeuk.com> wrote:

On 19/03/2026 15:29, Michael S wrote:

On Thu, 19 Mar 2026 14:49:16 +0000
Bart <bc@freeuk.com> wrote:

On 19/03/2026 09:19, Michael S wrote:

On Wed, 18 Mar 2026 11:20:03 +0100
David Brown <david.brown@hesbynett.no> wrote:

Normally however (and again in scripting code) I'd use my built-in
sort() based on quicksort, which is nearly 1000 times faster than
bubble-sort for my test (sort 20K random strings), and some 300x
faster than your routines. It's not O(n-squared) either.

For lexicographic sort of 20K random strings, plain quicksort is
probably quite sub-optimal.
If performance is important, I'd consider combined method: first
pre-sort by 3-char or 4-char prefix with Radix sort ('by LS character
first' variation of algorithm), then use quicksprt to sort sections
with the same prefix. For string taken from the real world it will not
work as well as for artificial random strings, but should still
significantly outperform plain quicksort.

What do you think the slow-down would be? I set up another test, sorting
1M random strings each exactly 16 characters long.

This is how long it took to sort via various means:

WSL shell 'sort': 2.3/3.5 seconds (real/user, from/to a file)
Windows 'sort': 4.2 seconds (from/to a file)
C's qsort: 0.5 seconds (gcc; initialised char*[]/inplace)
0.6 seconds (tcc)
My script lang: 2.3/2.8 seconds (sort only/all, file to in-memory)

(That last timing is somewhat remarkable given (1) that the sort routine itself runs as interpreted, dynamic bytecode; (2) each string compare involves calling C's 'strcmp' /after/ converting args to ensure strings
are zero-terminated.)

So how much faster ought it to be?

quicksort needs 20-30 passes, on several passes you are likely to get
cache miss per access to string. Assuming nominally 100 clocks
per cache miss, miss in each pass and 25 passes we get 2.5e9 clocks
which for 2.4GHz machine would give 1.2s. Hardware may be able
to handle more than one miss in parallel, for quicksort it is
tricky to get more than 2 misses in parallel. 0.5s means that your
hardware is doing reasonably good job Radix sort should be able to
do work in 10-15 passes with similar cost per pass, so there is
potential for speeding this 2 times. Single pass of radix sort will
give modest speedup.

AFAICS more speedup is possible by working directly on strings,
that is treating each string as 16 byte memory area and comparing
them with 8 byte operations. In such case single pass of quicksort
should take 2-3 ms so the is possibility of 5 time speedup.
but working directly with strings gets more complicated when
strings are of variable size. As a compromise one could pad
each string to multiple of 8 bytes, work with pointers but
also copy strings. That will ensure sequential access to
string data.

There is a tradeoff: copying means more instructions, but it
allows sequential access in the next pass, so means less
cache misses. Copying only pointers means more cache misses.
If strings are long one could do some number of passes working
on copy of prefix and orignal pointer, once this is sorted
one can work in subsequent characters. That way one can
reduce amount of data that needs copying (but somewhat
decrease locality).

Knuth claimed that tape sort was of comparable cost to copy:
that is even if you had perfect knowledge where each piece
of data should go it would not help much: you would need to
do something like sort to ensure sequential access.
--
Waldek Hebisch
--- Synchronet 3.21d-Linux NewsLink 1.2

From Bart@bc@freeuk.com to comp.lang.c on Thu Mar 19 23:53:35 2026

From Newsgroup: comp.lang.c

On 19/03/2026 19:40, Michael S wrote:

On Thu, 19 Mar 2026 18:33:13 +0000
Bart <bc@freeuk.com> wrote:

On 19/03/2026 15:29, Michael S wrote:

On Thu, 19 Mar 2026 14:49:16 +0000
Bart <bc@freeuk.com> wrote:

On 19/03/2026 09:19, Michael S wrote:

On Wed, 18 Mar 2026 11:20:03 +0100
David Brown <david.brown@hesbynett.no> wrote:

Normally however (and again in scripting code) I'd use my built-in
sort() based on quicksort, which is nearly 1000 times faster than
bubble-sort for my test (sort 20K random strings), and some 300x
faster than your routines. It's not O(n-squared) either.

For lexicographic sort of 20K random strings, plain quicksort is
probably quite sub-optimal.
If performance is important, I'd consider combined method: first
pre-sort by 3-char or 4-char prefix with Radix sort ('by LS
character first' variation of algorithm), then use quicksprt to
sort sections with the same prefix. For string taken from the real
world it will not work as well as for artificial random strings,
but should still significantly outperform plain quicksort.

What do you think the slow-down would be? I set up another test,
sorting 1M random strings each exactly 16 characters long.

This is how long it took to sort via various means:

WSL shell 'sort': 2.3/3.5 seconds (real/user, from/to a file)
Windows 'sort': 4.2 seconds (from/to a file)
C's qsort: 0.5 seconds (gcc; initialised char*[]/inplace)
0.6 seconds (tcc)
My script lang: 2.3/2.8 seconds (sort only/all, file to in-memory)

(That last timing is somewhat remarkable given (1) that the sort
routine itself runs as interpreted, dynamic bytecode; (2) each string
compare involves calling C's 'strcmp' /after/ converting args to
ensure strings are zero-terminated.)

So how much faster ought it to be?

I don't understand the question. What answer could there possibly be
except for "There are no limits to perfection!" ?

You said that plain quicksort (I guess that is what I used) is likely 'sub-optimal' and that your complex approach will 'significantly
outperform' it.

So I'm just asking 'by how much'?

My figures suggested it was fast enough. However I don't know what kind
of sort routines are used in that table except for mine, which is not
native code.

So I ported my sort to C, but the figures are pretty much the same as
C's qsort(): 0.5 seconds for both tcc/gcc, even though it uses dedicated compare code.

That routine is given below (not my algorithm; I adapted it long ago).

That it is 5-8 times as fast as shell methods of sorting (even allowing
for those to load and write files) seems good enough for me.

However, in my programs, I would look askance at anything that required
such a sorting step anyway. It's something I try and avoid.

-----------------------
void isort(char** data, int ll, int rr) {
char* temp;
int i = ll, j = rr;
char* pivot = data[(ll + rr) / 2];

do {
while (strcmp(pivot, data[i]) > 0 && i < rr) ++i;
while (strcmp(pivot, data[j]) < 0 && j > ll) --j;

if (i <= j) {
temp = data[i]; data[i] = data[j]; data[j] = temp;
++i;
--j;
}
} while (i <= j);

if (ll < j) isort(data, ll, j);
if (i < rr) isort(data, i, rr);
}

//For a char* array A of N elements, call as 'isort(A, 0, n-1)'.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bart@bc@freeuk.com to comp.lang.c on Fri Mar 20 00:15:50 2026

From Newsgroup: comp.lang.c

On 19/03/2026 23:53, Bart wrote:

That routine is given below (not my algorithm; I adapted it long ago).

-----------------------
void isort(char** data, int ll, int rr) {
    char* temp;
    int i = ll, j = rr;
    char* pivot = data[(ll + rr) / 2];

    do {
        while (strcmp(pivot, data[i]) > 0 && i < rr) ++i;
        while (strcmp(pivot, data[j]) < 0 && j > ll) --j;

        if (i <= j) {
            temp = data[i]; data[i] = data[j]; data[j] = temp;
            ++i;
            --j;
        }
    } while (i <= j);

    if (ll < j) isort(data, ll, j);
    if (i < rr) isort(data, i, rr);
}

//For a char* array A of N elements, call as 'isort(A, 0, n-1)'.

BTW input data was embedded in the program like this:

char* data[] = {
"yjrsrhzkgupmbfyd", // 0
....
"chkjogsemrusystx" // 999999
};

(999998 lines elided.)

(This gave problems in the shared backend of all my own tools because of limitations, so couldn't test there. I'll have to fix that.

But if read from a file - a better idea anyway - then sort time was the
same: between 0.4s and 0.45s in all cases when the sort is isolated.)
--- Synchronet 3.21d-Linux NewsLink 1.2

From Tristan Wibberley@tristan.wibberley+netnews2@alumni.manchester.ac.uk to comp.lang.c on Fri Mar 20 01:33:38 2026

From Newsgroup: comp.lang.c

On 17/03/2026 05:25, Bonita Montero wrote:

Am 16.03.2026 um 21:43 schrieb DFS:

Even with the -Os compile flag you mentioned, the executable is 101MB.

...
Linux / clang++: ~190kiB.

That seems pretty crazy itself, what causes it to be more than 1kiB?
template instantiations not in a .so even though they're really really
common?

How does one fix it without U/B due to not having the same definition of
an explicit instantiation throughout, i.e. via toolchain direction to
drop instantiations at link time because they're already available in a .so?
--
Tristan Wibberley

The message body is Copyright (C) 2026 Tristan Wibberley except
citations and quotations noted. All Rights Reserved except that you may,
of course, cite it academically giving credit to me, distribute it
verbatim as part of a usenet system or its archives, and use it to
promote my greatness and general superiority without misrepresentation
of my opinions other than my opinion of my greatness and general
superiority which you _may_ misrepresent. You definitely MAY NOT train
any production AI system with it but you may train experimental AI that
will only be used for evaluation of the AI methods it implements.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Fri Mar 20 03:45:12 2026

From Newsgroup: comp.lang.c

On 2026-03-19 14:28, Waldek Hebisch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 2026-03-18 11:20, David Brown wrote:

On 18/03/2026 10:21, Michael S wrote:

Bubble sort is not the worst existing, far from it. It is O(n**2).
Some algos, which names I don't remember are O(n**3).

Bogosort is O(n * n!) on average - keep rearranging the elements at
random, then check if they are sorted. (The worst case is only bounded >>> complexity if your random generator is pseduo-random rather than truly
random.)

Just to make clear; the key point is that it makes no sense to
implement sorting algorithms of complexities worse that O(N²).

Well, in Prolog it is natural to implement sort in a way that
is equvalent to recursive search of ordering permuations.
And when you are sorting say 5 element lists it can work
quite well.

With that complexity you can compare and move every element to
be sorted with any other element.[*] So Michael's hypothetical
O(N**3) algorithm would be just nonsense (or ignorance) for
any sensible application in practice.

You assume O(1) access to elements. Take any O(N^2) array
sort and replace array by a list. Now your O(N^2) suddenly
becomes O(N^3).

I disagree with that view. You can convert any list data to
an in memory array or external linear file in O(N). And the
sorting algorithm is still O(N log N) because the transform
is just O(N). (Don't mix data array dimension "N" with the
"N" from the complexity of sorting algorithms; a matrix with
dimension of N containing M=N*N elements can be sorted with
O(M log M).)

Note that this can easily happen in dynamicaly
typed languages with true lists: such languages tend to have
"element access" operation which is O(1) for arrays but O(N)
for lists. It can happen in C if you write a "generic" sort
which takes accessors as arguments (eiter function pointer
arguments or via appropriate macrology) and plug in list
operations as arguemnts.

You may have worse runtime behavior because of convoluted or
inappropriate data representation but you cannot "blame" the
sorting algorithm for badly structured data; the complexity
of the sorting algorithm is still at most O(N^2) and can be
done with typical sophisticated sorting algorithms (e.g. with
Heapsort in memory, or Mergesort with external linear stores,
guaranteed to be O(N log N)).

[*] It would otherwise be like implementing some operations on
linear lists with an algorithm worse than O(N). Depending on
the actual function you usually strive for O(log N) or O(1) by
organizing your data appropriately, but never worse than O(N).

You know about lists, but somewhat ignore possibility of
combining lists with O(N^2) sort.

Not sure what you have in mind here.

Janis

--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Fri Mar 20 04:01:18 2026

From Newsgroup: comp.lang.c

On 2026-03-19 10:43, David Brown wrote:

Yes, that's fair enough. You have to go out of your way to make a
sorting algorithm that is worse than O(n²), so they will not be found in real code. But they certainly do exist. [...]

Well, that is the not so uncommonly seen effect of bad software.

[...] But sometimes that is not the fastest tool for the job.

I've had occasion to need to sort arrays of numbers (ints or floats)
with a compile-time fixed small size - such as 4 or 6 entries. While I
did not use bubblesort, a bubblesort would have beaten standard C
library qsort by an order of magnitude.

I can't tell about the 'qsort'; I've never had a need to use it.
(And therefore I've never inspected it, as said.)

But the characteristic of a good sorting algorithm implementation
is that it's often a hybrid. - I mentioned the CDC implementation
of Quicksort [in Pascal] that had Straight Insertion Sort for runs
smaller than 10. - So even for small datasets you wouldn't have a
significant degradation.

The fun of sorting is that there is no single perfect algorithm that is always the best choice.

There's sets of (sophisticated) algorithms available to choose from.
And typical algorithms provided are sensibly hybrids. (Myself I've
implemented a Mergesort with a Heapsort to create long initial runs
decades ago, for example. And I read from current documentation that
this is quite common for libraries written sensibly; e.g. C++/STL.)

C++'s sorts have the advantage of being template based, and thus can be
more efficient than generic memcpy() and comparison functions for C's qsort. Usually, at least for larger datasets, sorts are hybrid
algorithms, switching between things like quicksort, insertion sort and
heap sort at different stages in order to maximise cache hits and to get
the balance between the "O" complexity and the constants in the O function. [...]

Ah, I now see you know all that already, so I could have spared some
writing. :-)

Janis

--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Fri Mar 20 04:16:02 2026

From Newsgroup: comp.lang.c

On 2026-03-19 16:07, Scott Lurndal wrote:

David Brown <david.brown@hesbynett.no> writes:

On 19/03/2026 11:23, Michael S wrote:

On Thu, 19 Mar 2026 10:43:08 +0100
David Brown <david.brown@hesbynett.no> wrote:

C++'s sorts have the advantage of being template based, and thus can
be more efficient than generic memcpy() and comparison functions for
C's qsort.

The problem with C qsort is not just lack of efficiency. It's also
lacke of flexibilty at API level. [...]

I think this is in practice indeed a crucial point. - How
flexible it's usable without compromising the basic quality
of any underlying algorithm.

On the other hand, qsort() works just fine for relatively small
data sets which I suspect is the common use for qsort (aside from introductory programming exercises).

Au contraire; given it's outstanding quality concerning the
complexity measure it's especially usable for large datasets,
and for small sets the hybrid mechanisms should pop in. (You
shouldn't notice a difference. - I repeat my disclaimer of
not knowing how C's qsort() function is actually implemented,
whether it's a Quicksort at all, whether it's hybrid, etc.)

Quicksort (as opposed to qsort() - where the details are not
obvious) is certainly a case for computer science lectures;
I'd only have hoped that these CS basics are broadly known,
also with non-CS educated or the many amateur programmers.

Janis

--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Fri Mar 20 04:33:10 2026

From Newsgroup: comp.lang.c

On 2026-03-19 10:19, Michael S wrote:

On Wed, 18 Mar 2026 11:20:03 +0100
David Brown <david.brown@hesbynett.no> wrote:

Bogosort is O(n * n!) on average - keep rearranging the elements at
random, then check if they are sorted. (The worst case is only
bounded complexity if your random generator is pseduo-random rather
than truly random.)

I didn't mean something intentionally crippled.
I remember seeing examples of worse than O(n**2) algorithms that at
first glance look reasonable. I don't remember details.

Then it's best to abandon imagined examples. (Or search for it
and then report about them so that we have something substantial
to talk about.)

[...]

The language is 'C'.
Here are implementations of Straight Insertion and Straight Select.
Show me implementation of Bubble sort that is not at least a little
more complicated.

I don't know about you, but Bubblesort is trivial, and the other
two O(N^2) methods you mention are close to trivial. Certainly
they play in the same "league". Compare these algorithms to the
other sophisticated algorithms whose principles usually cannot be
*obviously* understood (e.g. Quicksort, Shellsort, even Heapsort.

(There's a point (but negligible) if some other poster previously
said that it's easier for him to program Bubblesort than something
even slightly more sophisticated.)

Of course you can tweak any algorithm to make it better. But if
you're starting with a bad choice of an algorithm you won't fix
the inherent issues.

Remember that it has to be "real" bubble sort, not a simplified bubble
sort that does unnecessary work by starting each time from the
beginning. [...]

(There's Bubblesort. There's not "real" Bubblesort. Such phrases
neither explain anything nor are they helpful for discussions.)

Janis

[...]

--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Fri Mar 20 05:05:57 2026

From Newsgroup: comp.lang.c

On 2026-03-20 00:53, Bart wrote:

On 19/03/2026 19:40, Michael S wrote:

[...]

You said that plain quicksort (I guess that is what I used) is likely 'sub-optimal' and that your complex approach will 'significantly
outperform' it.

I don't recall a statement "[quicksort] is *likely* 'sub-optimal'"
(emphasis by me); but if that has been said - and if the algorithm
was meant - it's of course nonsense.

Quicksort is, for significantly large data sets, more likely faster
even in its basic form if compared other plain O(N log N) algorithms
and especially if compared to any O(N^2) algorithms. Even if it
exhibits in rare corner cases O(N^2) it typically outperforms other
algorithms of O(N log N) class in practice on average random data.

So I'm just asking 'by how much'?

Concrete numbers are secondary here, they hide the basic comprehension.
The complexity classes show how the algorithms scale with increasing
data sets (there's various complexity measures, BTW, with other bound
criteria; which may be of interest to not get repelled by the O(N^2)
"threat" of Quicksort's worst case bound in its primitive form).

Janis

Anecdotally; Google is known to ask in job interviews about complexity
of algorithms knowledge. A friend of mine who applied for a job had
been specifically asked about the big-O complexity of Quicksort.

[...]

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Fri Mar 20 07:42:46 2026

From Newsgroup: comp.lang.c

Am 20.03.2026 um 02:33 schrieb Tristan Wibberley:

That seems pretty crazy itself, what causes it to be more than 1kiB?
template instantiations not in a .so even though they're really really common?

How does one fix it without U/B due to not having the same definition of
an explicit instantiation throughout, i.e. via toolchain direction to
drop instantiations at link time because they're already available in a .so?

Where's the problem ?
--- Synchronet 3.21d-Linux NewsLink 1.2

From David Brown@david.brown@hesbynett.no to comp.lang.c on Fri Mar 20 08:35:05 2026

From Newsgroup: comp.lang.c

On 20/03/2026 04:01, Janis Papanagnou wrote:

Ah, I now see you know all that already, so I could have spared some
writing. :-)

It's still nice to see we are on the same page. We all have daft ideas
or unexpected misunderstandings at times - things we've "always known"
that are actually completely wrong. So it's nice to get conformation
that others, thinking and writing independently, reach the same conclusions.

(I'd expect that all the regulars here will know these things about
sorting algorithms, but there's always a chance that someone learns
something from the posts.)

--- Synchronet 3.21d-Linux NewsLink 1.2

From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Fri Mar 20 02:14:12 2026

From Newsgroup: comp.lang.c

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
[...]

Quicksort (as opposed to qsort() - where the details are not
obvious) is certainly a case for computer science lectures;
I'd only have hoped that these CS basics are broadly known,
also with non-CS educated or the many amateur programmers.

[...]

C doesn't specify which algorithm qsort() uses. The name is almost
certainly derived from the name of the Quicksort algorithm, but the
C standard only says that the contents of the array are sorted.
A conforming but perverse implemention could use Bubblesort or
Bogosort.

Even the GNU libc documentation doesn't say what algorithm that
implementation uses. (In the source code, I see references to
mergesort and heapsort.)
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
--- Synchronet 3.21d-Linux NewsLink 1.2

From Michael S@already5chosen@yahoo.com to comp.lang.c on Fri Mar 20 12:58:24 2026

From Newsgroup: comp.lang.c

On Fri, 20 Mar 2026 05:05:57 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 2026-03-20 00:53, Bart wrote:

On 19/03/2026 19:40, Michael S wrote:

[...]

You said that plain quicksort (I guess that is what I used) is
likely 'sub-optimal' and that your complex approach will
'significantly outperform' it.

I don't recall a statement "[quicksort] is *likely* 'sub-optimal'"
(emphasis by me); but if that has been said - and if the algorithm
was meant - it's of course nonsense.

Your ignorance shows.

Quicksort is, for significantly large data sets, more likely faster
even in its basic form if compared other plain O(N log N) algorithms
and especially if compared to any O(N^2) algorithms.
Even if it
exhibits in rare corner cases O(N^2) it typically outperforms other algorithms of O(N log N) class in practice on average random data.

Correct.
But Radix sort (or Counting sort, if you want; at the end it is the
same thing) is not O(N*logN). It is O(N*m) where m is the length of the
key. In variation that I proposed to Bart, Radix sort is used to
pre-sort by short prefix, 3 or 4 characters long. So, m is 3 or 4,
which is several times smaller than logN.
BTW, quicksort is O(N*logN) only as long as comparison is O(1). Which
does not hold in the general case of lexicographic sort.

So I'm just asking 'by how much'?

Concrete numbers are secondary here, they hide the basic
comprehension. The complexity classes show how the algorithms scale
with increasing data sets (there's various complexity measures, BTW,
with other bound criteria; which may be of interest to not get
repelled by the O(N^2) "threat" of Quicksort's worst case bound in
its primitive form).

Janis

Anecdotally; Google is known to ask in job interviews about complexity
of algorithms knowledge. A friend of mine who applied for a job had
been specifically asked about the big-O complexity of Quicksort.

[...]

--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Fri Mar 20 12:38:07 2026

From Newsgroup: comp.lang.c

On 2026-03-20 10:14, Keith Thompson wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
[...]

Quicksort (as opposed to qsort() - where the details are not
obvious) is certainly a case for computer science lectures;
I'd only have hoped that these CS basics are broadly known,
also with non-CS educated or the many amateur programmers.

[...]

C doesn't specify which algorithm qsort() uses. The name is almost
certainly derived from the name of the Quicksort algorithm, but the
C standard only says that the contents of the array are sorted.
A conforming but perverse implemention could use Bubblesort or
Bogosort.

Yes, there's no [obvious] information; that's the dilemma![*]

Even the GNU libc documentation doesn't say what algorithm that implementation uses. (In the source code, I see references to
mergesort and heapsort.)

Though the use of Mergesort is somewhat irritating (to me).

But both are O(N log N), which is good to know. (And, frankly,
I wouldn't have expected any worse O(N^2) algorithm here.)

Janis

[*] C++/STL has at least guarantees for the complexities.
For me that would basically suffice. I don't necessarily need
to know whether it's the concrete algorithm A or B.
--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Fri Mar 20 12:47:39 2026

From Newsgroup: comp.lang.c

On 2026-03-20 08:35, David Brown wrote:

On 20/03/2026 04:01, Janis Papanagnou wrote:

Ah, I now see you know all that already, so I could have spared some
writing. :-)

It's still nice to see we are on the same page. We all have daft ideas
or unexpected misunderstandings at times - things we've "always known"
that are actually completely wrong. So it's nice to get conformation
that others, thinking and writing independently, reach the same
conclusions.

Yes, you're absolutely right. - Actually if there's some statement
and it's neither opposed nor confirmed, but just silence, we cannot
really value or if necessary correct a view.

(I'd expect that all the regulars here will know these things about
sorting algorithms, but there's always a chance that someone learns something from the posts.)

I probably would have shared that expectation, but just recently
there were a couple individuals that had, umm.., interesting views
and opinions about such CS basics (or even the role of CS for IT).

Janis

--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Fri Mar 20 12:53:57 2026

From Newsgroup: comp.lang.c

On 2026-03-20 11:58, Michael S wrote:

On Fri, 20 Mar 2026 05:05:57 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 2026-03-20 00:53, Bart wrote:

On 19/03/2026 19:40, Michael S wrote:

[...]

You said that plain quicksort (I guess that is what I used) is
likely 'sub-optimal' and that your complex approach will
'significantly outperform' it.

I don't recall a statement "[quicksort] is *likely* 'sub-optimal'"
(emphasis by me); but if that has been said - and if the algorithm
was meant - it's of course nonsense.

Your ignorance shows.

(Not worth a reply.)

Quicksort is, for significantly large data sets, more likely faster
even in its basic form if compared other plain O(N log N) algorithms
and especially if compared to any O(N^2) algorithms.
Even if it
exhibits in rare corner cases O(N^2) it typically outperforms other
algorithms of O(N log N) class in practice on average random data.

Correct.

Glad you understand it.

Janis

--- Synchronet 3.21d-Linux NewsLink 1.2

From Michael S@already5chosen@yahoo.com to comp.lang.c on Fri Mar 20 14:01:10 2026

From Newsgroup: comp.lang.c

On Thu, 19 Mar 2026 23:53:35 +0000
Bart <bc@freeuk.com> wrote:

On 19/03/2026 19:40, Michael S wrote:

On Thu, 19 Mar 2026 18:33:13 +0000
Bart <bc@freeuk.com> wrote:

On 19/03/2026 15:29, Michael S wrote:

On Thu, 19 Mar 2026 14:49:16 +0000
Bart <bc@freeuk.com> wrote:

On 19/03/2026 09:19, Michael S wrote:

On Wed, 18 Mar 2026 11:20:03 +0100
David Brown <david.brown@hesbynett.no> wrote:

Normally however (and again in scripting code) I'd use my
built-in sort() based on quicksort, which is nearly 1000 times
faster than bubble-sort for my test (sort 20K random strings),
and some 300x faster than your routines. It's not O(n-squared)
either.

For lexicographic sort of 20K random strings, plain quicksort is
probably quite sub-optimal.
If performance is important, I'd consider combined method: first
pre-sort by 3-char or 4-char prefix with Radix sort ('by LS
character first' variation of algorithm), then use quicksprt to
sort sections with the same prefix. For string taken from the real
world it will not work as well as for artificial random strings,
but should still significantly outperform plain quicksort.

What do you think the slow-down would be? I set up another test,
sorting 1M random strings each exactly 16 characters long.

This is how long it took to sort via various means:

WSL shell 'sort': 2.3/3.5 seconds (real/user, from/to a file)
Windows 'sort': 4.2 seconds (from/to a file)
C's qsort: 0.5 seconds (gcc; initialised
char*[]/inplace) 0.6 seconds (tcc)
My script lang: 2.3/2.8 seconds (sort only/all, file to
in-memory)

(That last timing is somewhat remarkable given (1) that the sort
routine itself runs as interpreted, dynamic bytecode; (2) each
string compare involves calling C's 'strcmp' /after/ converting
args to ensure strings are zero-terminated.)

So how much faster ought it to be?

I don't understand the question. What answer could there possibly be
except for "There are no limits to perfection!" ?

You said that plain quicksort (I guess that is what I used) is likely 'sub-optimal' and that your complex approach will 'significantly
outperform' it.

So I'm just asking 'by how much'?

How could I possibly know?
It depends on dataset (size and distribution) and on specifics of your
compute engine (CPU core, caches, memory speed).
The only thing I can tell that for the range of N you are talking
about, i.e. 20k to 1M, the speed up of the sort part would be
significant. I am sure about 2x, but not about 10x. For the rest, only measurements can tell.

Of course, if your file system is slow, e.g. because of antivirus or
because of ancient storage hardware, then the speedup of the whole
job of reading-sorting-writing would be lost in the noise.
But you know that already.

Ifyou want data set to compare your results with other people, then I
propose to use bible.txt from Canterbury Corpus: https://corpus.canterbury.ac.nz/resources/large.zip

Or, if you found it too short, here is the text that considered the
longest novel ever written: https://standardebooks.org/ebooks/marcel-proust/in-search-of-lost-time/c-k-scott-moncrieff/text/single-page

It is rather short, too, only 100K lines, but I can not think about
anything longer and not artificial.

My figures suggested it was fast enough. However I don't know what
kind of sort routines are used in that table except for mine, which
is not native code.

So I ported my sort to C, but the figures are pretty much the same as
C's qsort(): 0.5 seconds for both tcc/gcc, even though it uses
dedicated compare code.

That routine is given below (not my algorithm; I adapted it long ago).

That it is 5-8 times as fast as shell methods of sorting (even
allowing for those to load and write files) seems good enough for me.

However, in my programs, I would look askance at anything that
required such a sorting step anyway. It's something I try and avoid.

-----------------------
void isort(char** data, int ll, int rr) {
char* temp;
int i = ll, j = rr;
char* pivot = data[(ll + rr) / 2];

do {
while (strcmp(pivot, data[i]) > 0 && i < rr) ++i;
while (strcmp(pivot, data[j]) < 0 && j > ll) --j;

if (i <= j) {
temp = data[i]; data[i] = data[j]; data[j] = temp;
++i;
--j;
}
} while (i <= j);

if (ll < j) isort(data, ll, j);
if (i < rr) isort(data, i, rr);
}

//For a char* array A of N elements, call as 'isort(A, 0, n-1)'.

Looks o.k. speed wise.
It is not ideomatic C, probably a literal translation from Fortran or
Pascal, but with modern compilers it is not supposed to make a lot of difference. With tcc, it could pay of to re-write in more ideomatic
manner

In professional practice people normally do two or three modifications
to the basic algorithm:
1. pivot is selected as median of 3 {data[0], data[n/2], data[n-1])
2. after split, shorter section is sorted before longer section
3. short sections, say, under 10 entries, are sorted by straight
insertion algorithm

(1) make worst case O(N*N) behavior extremely unlikely
(2) assures that even when worst case happens the depth of recursion do
not exceed log2(N). Which can be important on systems with small
default stack size.
(3) is believed to provide some speed up. The belief not always found
true, but it persists.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Fri Mar 20 13:06:21 2026

From Newsgroup: comp.lang.c

Am 20.03.2026 um 12:38 schrieb Janis Papanagnou:

But both are O(N log N), which is good to know. (And, frankly,
I wouldn't have expected any worse O(N^2) algorithm here.)

Mergesort is always N(log N). Database server's don't use
quicksort since it doesn't perform well it the items being
sorted have a variable size; so they use quicksort, also
because of it has very linear accesses, which is good for
I/O.

[*] C++/STL has at least guarantees for the complexities.
For me that would basically suffice. I don't necessarily
need to know whether it's the concrete algorithm A or B.

You're too compulsive.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Fri Mar 20 13:13:41 2026

From Newsgroup: comp.lang.c

On 2026-03-20 11:58, Michael S wrote:

[...]
BTW, quicksort is O(N*logN) only as long as comparison is O(1). Which
does not hold in the general case of lexicographic sort.

You have to differentiate the complexity of an _algorithm_ here;
we talk about how many comparisons and how many data swaps are
necessary.

If your _comparisons_ are costly, or your _data swaps_ are costly,
we don't blame the algorithm. If you happen to have a comparison
function of some O(X) the _sorting algorithm_ is still of its own
inherent algorithmic complexity. If you happen to swap BLOB data
by copying every byte (instead of sorting an index, for example)
you also cannot blame the sorting algorithm, its complexity still
holds.

If it were different any complexity measure for algorithms would
be meaningless.

Janis

--- Synchronet 3.21d-Linux NewsLink 1.2

From Tristan Wibberley@tristan.wibberley+netnews2@alumni.manchester.ac.uk to comp.lang.c on Fri Mar 20 12:16:17 2026

From Newsgroup: comp.lang.c

On 20/03/2026 06:42, Bonita Montero wrote:

Am 20.03.2026 um 02:33 schrieb Tristan Wibberley:

That seems pretty crazy itself, what causes it to be more than 1kiB?
template instantiations not in a .so even though they're really really
common?

How does one fix it without U/B due to not having the same definition of
an explicit instantiation throughout, i.e. via toolchain direction to
drop instantiations at link time because they're already available in
a .so?

Where's the problem ?

I said.
--
Tristan Wibberley

The message body is Copyright (C) 2026 Tristan Wibberley except
citations and quotations noted. All Rights Reserved except that you may,
of course, cite it academically giving credit to me, distribute it
verbatim as part of a usenet system or its archives, and use it to
promote my greatness and general superiority without misrepresentation
of my opinions other than my opinion of my greatness and general
superiority which you _may_ misrepresent. You definitely MAY NOT train
any production AI system with it but you may train experimental AI that
will only be used for evaluation of the AI methods it implements.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Michael S@already5chosen@yahoo.com to comp.lang.c on Fri Mar 20 14:24:21 2026

From Newsgroup: comp.lang.c

On Fri, 20 Mar 2026 04:33:10 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 2026-03-19 10:19, Michael S wrote:

On Wed, 18 Mar 2026 11:20:03 +0100
David Brown <david.brown@hesbynett.no> wrote:

Bogosort is O(n * n!) on average - keep rearranging the elements at
random, then check if they are sorted. (The worst case is only
bounded complexity if your random generator is pseduo-random rather
than truly random.)

I didn't mean something intentionally crippled.
I remember seeing examples of worse than O(n**2) algorithms that at
first glance look reasonable. I don't remember details.

Then it's best to abandon imagined examples. (Or search for it
and then report about them so that we have something substantial
to talk about.)

[...]

The language is 'C'.
Here are implementations of Straight Insertion and Straight Select.
Show me implementation of Bubble sort that is not at least a little
more complicated.

I don't know about you, but Bubblesort is trivial, and the other
two O(N^2) methods you mention are close to trivial. Certainly
they play in the same "league". Compare these algorithms to the
other sophisticated algorithms whose principles usually cannot be
*obviously* understood (e.g. Quicksort, Shellsort, even Heapsort.

(There's a point (but negligible) if some other poster previously
said that it's easier for him to program Bubblesort than something
even slightly more sophisticated.)

Of course you can tweak any algorithm to make it better. But if
you're starting with a bad choice of an algorithm you won't fix
the inherent issues.

Remember that it has to be "real" bubble sort, not a simplified
bubble sort that does unnecessary work by starting each time from
the beginning. [...]

(There's Bubblesort. There's not "real" Bubblesort. Such phrases
neither explain anything nor are they helpful for discussions.)

Janis

[...]

The challenge was issued for David Brown and for Bart.
I never expected that you will give constructive reply.
Thank you for confirming my expectations.

--- Synchronet 3.21d-Linux NewsLink 1.2

From David Brown@david.brown@hesbynett.no to comp.lang.c on Fri Mar 20 13:26:34 2026

From Newsgroup: comp.lang.c

On 20/03/2026 13:13, Janis Papanagnou wrote:

On 2026-03-20 11:58, Michael S wrote:

[...]
BTW, quicksort is O(N*logN) only as long as comparison is O(1). Which
does not hold in the general case of lexicographic sort.

You have to differentiate the complexity of an _algorithm_ here;
we talk about how many comparisons and how many data swaps are
necessary.

If your _comparisons_ are costly, or your _data swaps_ are costly,
we don't blame the algorithm. If you happen to have a comparison
function of some O(X) the _sorting algorithm_ is still of its own
inherent algorithmic complexity. If you happen to swap BLOB data
by copying every byte (instead of sorting an index, for example)
you also cannot blame the sorting algorithm, its complexity still
holds.

If it were different any complexity measure for algorithms would
be meaningless.

That's all true.

If you have a situation where swaps are particularly costly, or
comparisons are particularly costly, or other factors (such as memory
usage) are particularly costly, then you might have to be more nuanced
in the complexity measurements you use when comparing algorithms. You
might find that an algorithm that is O(n²) in comparisons but O(n) in
swaps is better for that purpose than one that is O(n.log n) in both.

And don't forget that the constant factors in the O can be relevant.
There's a known algorithm for multiplication of big numbers that is
O(n.log n), which is (probably) the optimal order of complexity for the
task. But the constant factors mean it is only better than other
algorithms once you are using truly absurdly big numbers (so that you
are saving a few minutes in your million-year calculations).

Sometimes a single O number is not nearly enough to tell you what you
need to know in comparisons between algorithms.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Fri Mar 20 13:27:06 2026

From Newsgroup: comp.lang.c

On 2026-03-20 13:06, Bonita Montero wrote:

Am 20.03.2026 um 12:38 schrieb Janis Papanagnou:

But both are O(N log N), which is good to know. (And, frankly,
I wouldn't have expected any worse O(N^2) algorithm here.)

Mergesort is always N(log N).

Sure. - But you certainly meant O(N log N).

But Mergesort was originally targeted to sort external data
on sequential media (like tapes). For memory-internal sorting
there's more advantageous algorithms.

Database server's don't use
quicksort since it doesn't perform well it the items being
sorted have a variable size; so they use quicksort,

"don't use QS so they use QS"? - I think there's a typo in
there. And I'm guessing you probably intended to say about
the same what I said above....

also
because of it has very linear accesses, which is good for
I/O.

[*] C++/STL has at least guarantees for the complexities.
For me that would basically suffice. I don't necessarily
need to know whether it's the concrete algorithm A or B.

You're too compulsive.

You think it's "compulsive" to *ignore* the internally used
algorithm in libraries beyond knowing its complexity? - This
sounds completely crude, but if you want to explain what you
mean I'm interested.

Janis

--- Synchronet 3.21d-Linux NewsLink 1.2

From Michael S@already5chosen@yahoo.com to comp.lang.c on Fri Mar 20 14:42:10 2026

From Newsgroup: comp.lang.c

On Fri, 20 Mar 2026 08:35:05 +0100
David Brown <david.brown@hesbynett.no> wrote:

On 20/03/2026 04:01, Janis Papanagnou wrote:

Ah, I now see you know all that already, so I could have spared some writing. :-)

It's still nice to see we are on the same page. We all have daft
ideas or unexpected misunderstandings at times - things we've "always
known" that are actually completely wrong. So it's nice to get
conformation that others, thinking and writing independently, reach
the same conclusions.

(I'd expect that all the regulars here will know these things about
sorting algorithms, but there's always a chance that someone learns something from the posts.)

Pay attention that while it is not codified in C++ Standard,
implementations of std::sort are expected to be "in-place", which
practically means that extra storage should be O(logN) or at worst
O(sqrt(N)). It means that merge sort is out of question. Radix/Count
sort is out of question both for this reason and because std::sort API
does not provide sufficient guarantees about the structure of key.
Heapsort is o.k in that regard.
I think that most real-world STL implementations have heapsort as a
back up for extremely rare case of primary algorithm (quicksort with median-of-3 pivot) misbehaving.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Michael S@already5chosen@yahoo.com to comp.lang.c on Fri Mar 20 15:08:40 2026

From Newsgroup: comp.lang.c

On Fri, 20 Mar 2026 13:26:34 +0100
David Brown <david.brown@hesbynett.no> wrote:

On 20/03/2026 13:13, Janis Papanagnou wrote:

On 2026-03-20 11:58, Michael S wrote:

[...]
BTW, quicksort is O(N*logN) only as long as comparison is O(1).
Which does not hold in the general case of lexicographic sort.

You have to differentiate the complexity of an _algorithm_ here;
we talk about how many comparisons and how many data swaps are
necessary.

If your _comparisons_ are costly, or your _data swaps_ are costly,
we don't blame the algorithm. If you happen to have a comparison
function of some O(X) the _sorting algorithm_ is still of its own
inherent algorithmic complexity. If you happen to swap BLOB data
by copying every byte (instead of sorting an index, for example)
you also cannot blame the sorting algorithm, its complexity still
holds.

If it were different any complexity measure for algorithms would
be meaningless.

That's all true.

No, it is not. That's all nonsense that demonstrates a shallow thinking. Comparison method one uses in lexicographic sort is very much
variable part of algorithm rather than something fixed.
Many good methods start sorting by using just few (sometimes one)
leading characters and only in later passes that typically operate on
much much shorter sections, they use full string comparison.
One example where doing it smart is particularly beneficial is a sort
at core of Burrows-Wheeler Transform.

If you have a situation where swaps are particularly costly, or
comparisons are particularly costly, or other factors (such as memory
usage) are particularly costly, then you might have to be more
nuanced in the complexity measurements you use when comparing
algorithms. You might find that an algorithm that is O(n�) in
comparisons but O(n) in swaps is better for that purpose than one
that is O(n.log n) in both.

And don't forget that the constant factors in the O can be relevant.
There's a known algorithm for multiplication of big numbers that is
O(n.log n), which is (probably) the optimal order of complexity for
the task. But the constant factors mean it is only better than other algorithms once you are using truly absurdly big numbers (so that you
are saving a few minutes in your million-year calculations).

Sometimes a single O number is not nearly enough to tell you what you
need to know in comparisons between algorithms.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bart@bc@freeuk.com to comp.lang.c on Fri Mar 20 13:43:50 2026

From Newsgroup: comp.lang.c

On 20/03/2026 13:08, Michael S wrote:

On Fri, 20 Mar 2026 13:26:34 +0100
David Brown <david.brown@hesbynett.no> wrote:

On 20/03/2026 13:13, Janis Papanagnou wrote:

On 2026-03-20 11:58, Michael S wrote:

[...]
BTW, quicksort is O(N*logN) only as long as comparison is O(1).
Which does not hold in the general case of lexicographic sort.

You have to differentiate the complexity of an _algorithm_ here;
we talk about how many comparisons and how many data swaps are
necessary.

If your _comparisons_ are costly, or your _data swaps_ are costly,
we don't blame the algorithm. If you happen to have a comparison
function of some O(X) the _sorting algorithm_ is still of its own
inherent algorithmic complexity. If you happen to swap BLOB data
by copying every byte (instead of sorting an index, for example)
you also cannot blame the sorting algorithm, its complexity still
holds.

If it were different any complexity measure for algorithms would
be meaningless.

That's all true.

No, it is not. That's all nonsense that demonstrates a shallow thinking. Comparison method one uses in lexicographic sort is very much
variable part of algorithm rather than something fixed.
Many good methods start sorting by using just few (sometimes one)
leading characters and only in later passes that typically operate on
much much shorter sections, they use full string comparison.

I just tried something like that with bubble-sort: I split the data into
26 arrays each only containing strings that start with 'a', 'b', 'c' and
so on.

The arrays were sorted separately then concatenated (so no longer
in-place unless I take the extra step of overwriting the original).

Sorting 20K random strings reduced from 25 seconds to 0.8 seconds
(interpreted code).

Of course, being random strings (and also all lower case), there was a near-perfect distribution of strings starting with each letter!

It also requires extra storage and copying.

--- Synchronet 3.21d-Linux NewsLink 1.2

From David Brown@david.brown@hesbynett.no to comp.lang.c on Fri Mar 20 14:47:18 2026

From Newsgroup: comp.lang.c

On 20/03/2026 14:08, Michael S wrote:

On Fri, 20 Mar 2026 13:26:34 +0100
David Brown <david.brown@hesbynett.no> wrote:

On 20/03/2026 13:13, Janis Papanagnou wrote:

On 2026-03-20 11:58, Michael S wrote:

[...]
BTW, quicksort is O(N*logN) only as long as comparison is O(1).
Which does not hold in the general case of lexicographic sort.

You have to differentiate the complexity of an _algorithm_ here;
we talk about how many comparisons and how many data swaps are
necessary.

If your _comparisons_ are costly, or your _data swaps_ are costly,
we don't blame the algorithm. If you happen to have a comparison
function of some O(X) the _sorting algorithm_ is still of its own
inherent algorithmic complexity. If you happen to swap BLOB data
by copying every byte (instead of sorting an index, for example)
you also cannot blame the sorting algorithm, its complexity still
holds.

If it were different any complexity measure for algorithms would
be meaningless.

That's all true.

No, it is not. That's all nonsense that demonstrates a shallow thinking. Comparison method one uses in lexicographic sort is very much
variable part of algorithm rather than something fixed.
Many good methods start sorting by using just few (sometimes one)
leading characters and only in later passes that typically operate on
much much shorter sections, they use full string comparison.
One example where doing it smart is particularly beneficial is a sort
at core of Burrows-Wheeler Transform.

Yes, I realise that various kinds of radix or bucket sorts are often
good for large data sets, either in whole or as the first steps. But I
am not clear about how it contradicts what Janis has been saying - I
feel the two of you are talking somewhat past each other. The cost of
doing a comparison does not affect the complexity of an algorithm - but
it can certainly affect the best choice of algorithm for the task in hand.

If you have a situation where swaps are particularly costly, or
comparisons are particularly costly, or other factors (such as memory
usage) are particularly costly, then you might have to be more
nuanced in the complexity measurements you use when comparing
algorithms. You might find that an algorithm that is O(n²) in
comparisons but O(n) in swaps is better for that purpose than one
that is O(n.log n) in both.

And don't forget that the constant factors in the O can be relevant.
There's a known algorithm for multiplication of big numbers that is
O(n.log n), which is (probably) the optimal order of complexity for
the task. But the constant factors mean it is only better than other
algorithms once you are using truly absurdly big numbers (so that you
are saving a few minutes in your million-year calculations).

Sometimes a single O number is not nearly enough to tell you what you
need to know in comparisons between algorithms.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Michael S@already5chosen@yahoo.com to comp.lang.c on Fri Mar 20 15:51:58 2026

From Newsgroup: comp.lang.c

On Fri, 20 Mar 2026 13:43:50 +0000
Bart <bc@freeuk.com> wrote:

On 20/03/2026 13:08, Michael S wrote:

On Fri, 20 Mar 2026 13:26:34 +0100
David Brown <david.brown@hesbynett.no> wrote:

On 20/03/2026 13:13, Janis Papanagnou wrote:

On 2026-03-20 11:58, Michael S wrote:

[...]
BTW, quicksort is O(N*logN) only as long as comparison is O(1).
Which does not hold in the general case of lexicographic sort.

You have to differentiate the complexity of an _algorithm_ here;
we talk about how many comparisons and how many data swaps are
necessary.

If your _comparisons_ are costly, or your _data swaps_ are costly,
we don't blame the algorithm. If you happen to have a comparison
function of some O(X) the _sorting algorithm_ is still of its own
inherent algorithmic complexity. If you happen to swap BLOB data
by copying every byte (instead of sorting an index, for example)
you also cannot blame the sorting algorithm, its complexity still
holds.

If it were different any complexity measure for algorithms would
be meaningless.

That's all true.

No, it is not. That's all nonsense that demonstrates a shallow
thinking. Comparison method one uses in lexicographic sort is very
much variable part of algorithm rather than something fixed.
Many good methods start sorting by using just few (sometimes one)
leading characters and only in later passes that typically operate
on much much shorter sections, they use full string comparison.

I just tried something like that with bubble-sort: I split the data
into 26 arrays each only containing strings that start with 'a', 'b',
'c' and so on.

The arrays were sorted separately then concatenated (so no longer
in-place unless I take the extra step of overwriting the original).

Sorting 20K random strings reduced from 25 seconds to 0.8 seconds (interpreted code).

You wouldn't see anything near that speed up with O(N*logN) algorithm.

Of course, being random strings (and also all lower case), there was
a near-perfect distribution of strings starting with each letter!

On the other hand, being random string means that strcmp() is very
close to O(1). So, may be, it illustrates related point, but it can not illustate my original point.

It also requires extra storage and copying.

You win some, you lose some. So it goes.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Fri Mar 20 13:22:18 2026

From Newsgroup: comp.lang.c

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 2026-03-20 10:14, Keith Thompson wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
[...]

Quicksort (as opposed to qsort() - where the details are not
obvious) is certainly a case for computer science lectures;
I'd only have hoped that these CS basics are broadly known,
also with non-CS educated or the many amateur programmers.

[...]
C doesn't specify which algorithm qsort() uses. The name is almost
certainly derived from the name of the Quicksort algorithm, but the
C standard only says that the contents of the array are sorted.
A conforming but perverse implemention could use Bubblesort or
Bogosort.

Yes, there's no [obvious] information; that's the dilemma![*]

I don't see a dilemma. qsort() sorts. No serious implementation is
going to use an algorithm with worse than O(n log n) performance.

Even the GNU libc documentation doesn't say what algorithm that
implementation uses. (In the source code, I see references to
mergesort and heapsort.)

Though the use of Mergesort is somewhat irritating (to me).

But both are O(N log N), which is good to know. (And, frankly,
I wouldn't have expected any worse O(N^2) algorithm here.)

In the glibc sources, stdlib/qsort.c has functions heapsort_r() and qsort_r_mergesort(). I haven't examined it closely enough to know when
and how they're used.

Janis

[*] C++/STL has at least guarantees for the complexities.
For me that would basically suffice. I don't necessarily need
to know whether it's the concrete algorithm A or B.

You effectively have the same guarantee for qsort(), even though it's
not spelled out in the standard.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
--- Synchronet 3.21d-Linux NewsLink 1.2

From DFS@nospam@dfs.com to comp.lang.c on Fri Mar 20 17:10:44 2026

From Newsgroup: comp.lang.c

On 3/20/2026 3:35 AM, David Brown wrote:

(I'd expect that all the regulars here will know these things about
sorting algorithms, but there's always a chance that someone learns something from the posts.)

This.

Sometimes listening to you clc guys is like listening to a room full of
CS professors (which I suspect some of you are or were).

--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Sat Mar 21 02:25:06 2026

From Newsgroup: comp.lang.c

On 2026-03-20 21:22, Keith Thompson wrote:

[...]

I don't see a dilemma. qsort() sorts. No serious implementation is
going to use an algorithm with worse than O(n log n) performance.

Yes, I also said upthread that my *expectation* is exactly that;
that professional libraries use sophisticated algorithms.

Even the GNU libc documentation doesn't say what algorithm that
implementation uses. (In the source code, I see references to
mergesort and heapsort.)

Though the use of Mergesort is somewhat irritating (to me).

But both are O(N log N), which is good to know. (And, frankly,
I wouldn't have expected any worse O(N^2) algorithm here.)

[...]

[*] C++/STL has at least guarantees for the complexities.
For me that would basically suffice. I don't necessarily need
to know whether it's the concrete algorithm A or B.

You effectively have the same guarantee for qsort(), even though it's
not spelled out in the standard.

But that was my point here; that there's a difference if that's
an assured/documented property or not. And I pointed to C++/STL
for the right approach.

YMMV, but back in my C++ days I had looked into the complexities
of the C++/STL algorithms when (or rather: before) I used them;
I needed guarantees in their runtime complexity or otherwise we'd
have had to implement own variants tailored for the applications
we implemented. Now C++/STL is a well documented sophisticatedly
designed library.

But, sadly, generally you cannot rely on sophisticated implemented
libraries in the IT-world. Take for example Regular Expressions;
if you're using Regexps you are (already by the theory of Regular
Expressions!) _guaranteed_ that you have a linear O(N) complexity.
Alas, some "Regexp"-libraries implemented extensions, and not only
syntactic sugar (details that stay within that Chomsky(3)-class),
but also extensions that go beyond. The consequence is that these
libraries implemented non-regular (backtracking) algorithms that
can handle such extensions, but you won't have the O(N) guarantee
any more. There were, even for the real Regular Expression subset
tremendous run-times (many magnitudes!) to observe.

Janis

--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Sat Mar 21 02:53:34 2026

From Newsgroup: comp.lang.c

On 2026-03-20 22:10, DFS wrote:

Sometimes listening to you clc guys is like listening to a room full of
CS professors (which I suspect some of you are or were).

CS professors sitting in their ivory tower and without practical
experiences, and programmers without a substantial CS background;
both of these extreme characters can be problematic (if fanatic).
Many folks here seem to have a good mix of necessary practical
IT, Project, and CS knowledge and proficiencies, as I'd value it.
And a clear mind to discuss topics. Sometimes it gets heated and
personal, though; certainly nothing to foster.

Janis

--- Synchronet 3.21d-Linux NewsLink 1.2

From DFS@nospam@dfs.com to comp.lang.c on Fri Mar 20 22:35:06 2026

From Newsgroup: comp.lang.c

On 3/20/2026 9:53 PM, Janis Papanagnou wrote:

On 2026-03-20 22:10, DFS wrote:

Sometimes listening to you clc guys is like listening to a room full
of CS professors (which I suspect some of you are or were).

CS professors sitting in their ivory tower and without practical
experiences, and programmers without a substantial CS background;
both of these extreme characters can be problematic (if fanatic).

It was meant as a compliment. Plenty of CS professors have good
practical and industry experience, too, which you'll see on their bios
and cv's.

It's got to be tough to find good CS teachers that stick around, given
they can probably make much more money in private industry.

Many folks here seem to have a good mix of necessary practical
IT, Project, and CS knowledge and proficiencies, as I'd value it.
And a clear mind to discuss topics. Sometimes it gets heated and
personal, though; certainly nothing to foster.

I've definitely seen Bart take some heat for continuing to post code in
his scripting language, rather than C.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Fri Mar 20 23:07:01 2026

From Newsgroup: comp.lang.c

DFS <nospam@dfs.com> writes:

On 3/12/2026 2:24 AM, Bonita Montero wrote:

There was a programming-"contest" on comp.lang.c and I wanted to show
the simpler code in C, here it is:

[C++ code]

C++ really rocks since you've to deal with much less details than in C.

Is your strategy to just ignore reality, and keep making bogus claims
that - for this challenge at least - you can't support?

Please don't encourage people who insist on using comp.lang.c
for other languages. It's better to just ignore them.
--- Synchronet 3.21d-Linux NewsLink 1.2

From David Brown@david.brown@hesbynett.no to comp.lang.c on Sat Mar 21 15:39:54 2026

From Newsgroup: comp.lang.c

On 20/03/2026 22:10, DFS wrote:

On 3/20/2026 3:35 AM, David Brown wrote:

(I'd expect that all the regulars here will know these things about
sorting algorithms, but there's always a chance that someone learns
something from the posts.)

This.

Sometimes listening to you clc guys is like listening to a room full of
CS professors (which I suspect some of you are or were).

I can't speak for others, but I am not an academic (though my university degree was quite theoretical - maths and computation).

And sometimes the people that learn from these kinds of posts are the
people making the posts. I know I have learned things as a result of
making posts (though I don't think I have learned anything new in this particular discussion).

After all, the fastest way to get an answer to a question on the
internet is to post an incorrect claim!

--- Synchronet 3.21d-Linux NewsLink 1.2

From Bart@bc@freeuk.com to comp.lang.c on Sat Mar 21 14:42:38 2026

From Newsgroup: comp.lang.c

On 21/03/2026 02:35, DFS wrote:

On 3/20/2026 9:53 PM, Janis Papanagnou wrote:

On 2026-03-20 22:10, DFS wrote:

Sometimes listening to you clc guys is like listening to a room full
of CS professors (which I suspect some of you are or were).

CS professors sitting in their ivory tower and without practical
experiences, and programmers without a substantial CS background;
both of these extreme characters can be problematic (if fanatic).

It was meant as a compliment. Plenty of CS professors have good
practical and industry experience, too, which you'll see on their bios
and cv's.

It's got to be tough to find good CS teachers that stick around, given
they can probably make much more money in private industry.

Many folks here seem to have a good mix of necessary practical
IT, Project, and CS knowledge and proficiencies, as I'd value it.
And a clear mind to discuss topics. Sometimes it gets heated and
personal, though; certainly nothing to foster.

I've definitely seen Bart take some heat for continuing to post code in
his scripting language, rather than C

People post bits of code in all sorts of languages when it suits them.
It's happened in this thread (Bash, AWK, Python as well as C++), and in
many others.

But I get a lot of heat because I use my personal languages, especially
my systems language which is probably the closest to C in type system
and capabilities (if not in syntax) that I know of.

Not so much posting code in it, but using my experience and perspective
to talk about aspects of language design, or about how compilers should
work.

Apparently such first-hand experience makes your opinion worth less, not
more!

Actually, to be strictly topical, nobody should even be talking about
any C extensions, or C compilers, or build systems, or programs that
happen to be written in C - only Standard C, The Language. Yet there
have been plenty of discussions about all those and a lot more.

I once made a post about my own C-subset compiler:

https://groups.google.com/g/comp.lang.c/c/0lLNz9lathE/m/Lt4Jh0qqAwAJ

and it was deemed off-topic by Tim Rentsch:

"Please confine your postings in comp.lang.c to topics and subjects
relevant to the C language. None of what you say in your posting
is topical in comp.lang.c. An obvious suggestion is the newsgroup comp.compilers instead."

--- Synchronet 3.21d-Linux NewsLink 1.2

From Michael S@already5chosen@yahoo.com to comp.lang.c on Sun Mar 22 02:03:23 2026

From Newsgroup: comp.lang.c

On Fri, 20 Mar 2026 14:47:18 +0100
David Brown <david.brown@hesbynett.no> wrote:

On 20/03/2026 14:08, Michael S wrote:

On Fri, 20 Mar 2026 13:26:34 +0100
David Brown <david.brown@hesbynett.no> wrote:

On 20/03/2026 13:13, Janis Papanagnou wrote:

On 2026-03-20 11:58, Michael S wrote:

[...]
BTW, quicksort is O(N*logN) only as long as comparison is O(1).
Which does not hold in the general case of lexicographic sort.

You have to differentiate the complexity of an _algorithm_ here;
we talk about how many comparisons and how many data swaps are
necessary.

If your _comparisons_ are costly, or your _data swaps_ are costly,
we don't blame the algorithm. If you happen to have a comparison
function of some O(X) the _sorting algorithm_ is still of its own
inherent algorithmic complexity. If you happen to swap BLOB data
by copying every byte (instead of sorting an index, for example)
you also cannot blame the sorting algorithm, its complexity still
holds.

If it were different any complexity measure for algorithms would
be meaningless.

That's all true.

No, it is not. That's all nonsense that demonstrates a shallow
thinking. Comparison method one uses in lexicographic sort is very
much variable part of algorithm rather than something fixed.
Many good methods start sorting by using just few (sometimes one)
leading characters and only in later passes that typically operate
on much much shorter sections, they use full string comparison.
One example where doing it smart is particularly beneficial is a
sort at core of Burrows-Wheeler Transform.

Yes, I realise that various kinds of radix or bucket sorts are often
good for large data sets, either in whole or as the first steps. But
I am not clear about how it contradicts what Janis has been saying -
I feel the two of you are talking somewhat past each other.

Read the thread. We were talking with Bart and understanding each other
well enough. Then came Janis with ignorant statement that in effect was
saying that plain quicksort algorithm with strcmp() as comparison
routine is optimal or close to optimal for lexicographic sorting
similar to one done by Linux or Windows sort commands applied to text
files with 20K to 1M lines of average length of few dozens characters.

The cost
of doing a comparison does not affect the complexity of an algorithm
- but it can certainly affect the best choice of algorithm for the
task in hand.

In case of lexicographic sort comparison most certainly affects BigO.
BigO is about counting elementary operations or, at least, about
counting non-elementary operations that have complexity independent of
N. In out specific case (described in the previous paragraph) an average
number of elementary comparisons within strcmp() does depend on N. One
can expect that it grows with N, because for longer N quicksort spends relatively more time comparing strings with longer and longer common
prefixes.

I did a couple of experiments.
Here is a number of character comparisons during sorting by quicksort
with strcmp-alike comparison as a function number of sorted strings.
Inputs used for experiments were first N lines of two long books (those
that I suggested to Bart in other post in this thread).

Book 1:
1000 40367
2000 93404
3000 156995
4000 238062
5000 322275
6000 401332
7000 483294
8000 588128
9000 707559
10000 811159
11000 969934
12000 1061037
13000 1172776
14000 1327489
15000 1456356
16000 1565355
17000 1691351
18000 1811836
19000 1971700
20000 2101477
21000 2237132
22000 2348410
23000 2492149
24000 2618858
25000 2726828
26000 2881609
27000 3017903
28000 3183088
29000 3320716
30000 3523695
31000 3719782
32000 3852290
33000 4007476
34000 4149704
35000 4228143
36000 4399583
37000 4581223
38000 4761085
39000 4919182
40000 5110223
41000 5463271
42000 5605860
43000 5830282
44000 6096857
45000 6464758
46000 6860786
47000 7101972
48000 7434769
49000 7736464
50000 8028681
51000 8334358
52000 8560822
53000 8814096
54000 8986343
55000 9162604
56000 9358948
57000 9468853
58000 9638512
59000 9863370
60000 10000734
61000 10242187
62000 10500148
63000 10654166
64000 10882232
65000 11187624
66000 11290778
67000 11481047
68000 11664433
69000 11858529
70000 12059970
71000 12173240
72000 12232160
73000 12418818
74000 12577954
75000 12786122
76000 12939916
77000 13083515
78000 13275432
79000 13616563
80000 13797048
81000 14137362
82000 14388124
83000 14667272
84000 14914600
85000 15174195
86000 15227048
87000 15414838
88000 15526278
89000 15670034
90000 15966396
91000 16120859
92000 16267285
93000 16402168
94000 16718506
95000 16968960
96000 17169618
97000 17362159
98000 17486179
99000 17944209
100000 18085095
101000 18455071
102000 18733130
102100 18769521

Book 2:
1000 104846
2000 243080
3000 399564
4000 661436
5000 855870
6000 1017042
7000 1205936
8000 1341884
9000 1511727
10000 1681973
11000 1820114
12000 1978196
13000 2081813
14000 2162091
15000 2280537
16000 2375272
17000 2500672
18000 2617364
19000 2787782
20000 2960678
21000 3149483
22000 3286894
23000 3481047
24000 3638352
25000 3842186
26000 4001105
27000 4083466
28000 4239273
29000 4399250
30000 4505316
30384 4598719

It is easy to see that at the beginning # of comparisons grows faster
than N*log(N). For the 1st book the trend continues throughout all
chart. For the 2nd book it does not.
I'd guess that the reasons behind it is that the 1st book is relatively homogeneous, while the 2nd book consists of rather distinct parts,
likely with different characteristics of beginnings of lines.
At parts the 2nd book is full of periods, of rhythmical prose close to
poetry and at other parts it is not. I didn't have time to dig deeper.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Sun Mar 22 04:03:56 2026

From Newsgroup: comp.lang.c

On 2026-03-22 01:03, Michael S wrote:

On Fri, 20 Mar 2026 14:47:18 +0100
David Brown <david.brown@hesbynett.no> wrote:

On 20/03/2026 14:08, Michael S wrote:

On Fri, 20 Mar 2026 13:26:34 +0100
David Brown <david.brown@hesbynett.no> wrote:

On 20/03/2026 13:13, Janis Papanagnou wrote:

On 2026-03-20 11:58, Michael S wrote:

[...]
BTW, quicksort is O(N*logN) only as long as comparison is O(1).
Which does not hold in the general case of lexicographic sort.

You have to differentiate the complexity of an _algorithm_ here;
we talk about how many comparisons and how many data swaps are
necessary.

If your _comparisons_ are costly, or your _data swaps_ are costly,
we don't blame the algorithm. If you happen to have a comparison
function of some O(X) the _sorting algorithm_ is still of its own
inherent algorithmic complexity. If you happen to swap BLOB data
by copying every byte (instead of sorting an index, for example)
you also cannot blame the sorting algorithm, its complexity still
holds.

If it were different any complexity measure for algorithms would
be meaningless.

That's all true.

No, it is not. That's all nonsense that demonstrates a shallow
thinking. Comparison method one uses in lexicographic sort is very
much variable part of algorithm rather than something fixed.
Many good methods start sorting by using just few (sometimes one)
leading characters and only in later passes that typically operate
on much much shorter sections, they use full string comparison.
One example where doing it smart is particularly beneficial is a
sort at core of Burrows-Wheeler Transform.

Yes, I realise that various kinds of radix or bucket sorts are often
good for large data sets, either in whole or as the first steps. But
I am not clear about how it contradicts what Janis has been saying -
I feel the two of you are talking somewhat past each other.

Read the thread. We were talking with Bart and understanding each other
well enough. Then came Janis with ignorant statement that in effect was saying that plain quicksort algorithm with strcmp() as comparison
routine is optimal or close to optimal for lexicographic sorting
similar to one done by Linux or Windows sort commands applied to text
files with 20K to 1M lines of average length of few dozens characters.

What I said were various things; correcting mainly misinformation (or inaccuracies) you spread! - Basically that the O-complexity measures
of sorting algorithms do not include the effort you spend in the
comparison function or in the data-move function. If they did they
would be *arbitrary* and no _reliable quality measure_ of algorithms!
(Then you could compare e.g. climatic state of two country entries
by doing the complex hydrodynamic calculations and the whole sorting
function would become of combinatorial complexity. This is an extreme
example but makes your wrong thinking obvious. Less extreme examples
are the already mentioned physical move-operations of BLOBs, instead
of working with indexes/references/pointers.)

The main point with Bart was his "reasoning" for his use of Bubblesort
which was just ridiculous.

The cost
of doing a comparison does not affect the complexity of an algorithm
- but it can certainly affect the best choice of algorithm for the
task in hand.

Yes, indeed. - The program designer has still the responsibility to
choose, besides the appropriate sorting algorithms, also the way how
to implement the whole function efficiently.

Janis

[...]

--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Sun Mar 22 04:39:07 2026

From Newsgroup: comp.lang.c

On 2026-03-20 13:42, Michael S wrote:

On Fri, 20 Mar 2026 08:35:05 +0100
David Brown <david.brown@hesbynett.no> wrote:

On 20/03/2026 04:01, Janis Papanagnou wrote:

Ah, I now see you know all that already, so I could have spared some
writing. :-)

It's still nice to see we are on the same page. We all have daft
ideas or unexpected misunderstandings at times - things we've "always
known" that are actually completely wrong. So it's nice to get
conformation that others, thinking and writing independently, reach
the same conclusions.

(I'd expect that all the regulars here will know these things about
sorting algorithms, but there's always a chance that someone learns
something from the posts.)

Pay attention that while it is not codified in C++ Standard,
implementations of std::sort are expected to be "in-place", which
practically means that extra storage should be O(logN) or at worst O(sqrt(N)). It means that merge sort is out of question. Radix/Count
sort is out of question both for this reason and because std::sort API
does not provide sufficient guarantees about the structure of key.
Heapsort is o.k in that regard.
I think that most real-world STL implementations have heapsort as a
back up for extremely rare case of primary algorithm (quicksort with median-of-3 pivot) misbehaving.

I had found these statements:

* By default, std::sort() uses Introsort, a hybrid algorithm
combining Quick Sort, Heap Sort, and Insertion Sort.
* Its time complexity is O(Nlog(N)) in the average and worst
cases.

The Insertion Sort function had ever been inherent part of Quicksort implementations for data sub-ranges once they become of small sizes.

I'm positive, though, that your statement of "misbehaving" Quicksort
makes absolutely no sense (at least as you formulated it). How could
a call to sort() decide to use a "backup" algorithm; the very "rare"
O(N^2) corner case that you spoke about is depending on the _actual_
_data_ and cannot be characterized a priori!

What the Introsort algorithm actually does is dynamically depending
on the _recursion depth_, and to control that. To quote:
"Introsort begins with quicksort and if the recursion depth
goes more than a particular limit it switches to Heapsort".

But then you should also know that you can also natively in Quicksort
control the recursive calls to not exceed recursive calls or the stack
size by log(N).

Janis

--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Sun Mar 22 04:50:52 2026

From Newsgroup: comp.lang.c

On 2026-03-21 03:35, DFS wrote:

On 3/20/2026 9:53 PM, Janis Papanagnou wrote:

On 2026-03-20 22:10, DFS wrote:

Sometimes listening to you clc guys is like listening to a room full
of CS professors (which I suspect some of you are or were).

CS professors sitting in their ivory tower and without practical
experiences, and programmers without a substantial CS background;
both of these extreme characters can be problematic (if fanatic).

It was meant as a compliment. Plenty of CS professors have good
practical and industry experience, too, which you'll see on their bios
and cv's.

This is certainly correct. - But I've met all sorts of "CV"s. Titles
alone doesn't mean anything! Some proficiencies are how good they are
teaching, doing relevant research, having actually spent significant
time in relevant industrial areas (as you say), and whatnot.

It's got to be tough to find good CS teachers that stick around, given
they can probably make much more money in private industry.

Depends on the mindset, I suppose; whether their primary goal is to
make money or to disseminate their valuable academic knowledge.

I've met the whole range from excellent to lousy CS teachers.

Janis

--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Sun Mar 22 04:57:22 2026

From Newsgroup: comp.lang.c

On 2026-03-21 15:42, Bart wrote:

[...]

Actually, to be strictly topical, nobody should even be talking about
any C extensions, or C compilers, or build systems, or programs that
happen to be written in C - only Standard C, The Language. Yet there
have been plenty of discussions about all those and a lot more.

I once made a post about my own C-subset compiler:

https://groups.google.com/g/comp.lang.c/c/0lLNz9lathE/m/Lt4Jh0qqAwAJ

and it was deemed off-topic by Tim Rentsch:

"Please confine your postings in comp.lang.c to topics and subjects
relevant to the C language. None of what you say in your posting
is topical in comp.lang.c. An obvious suggestion is the newsgroup comp.compilers instead."

He's (as you or others here) an individual. If you think he's valuing situations or your contents (and on a regular basis) inappropriately
I suggest to just ignore him. It make little sense to whine about it.

Janis

--- Synchronet 3.21d-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Sun Mar 22 05:06:46 2026

From Newsgroup: comp.lang.c

On 2026-03-20 13:24, Michael S wrote:

The challenge was issued for David Brown and for Bart.

If you think that Usenet is for private communication you've a
fundamental misconception about that.

I never expected that you will give constructive reply.

Your perception may be severely impaired, but I don't care much.

I don't know about you, but I find requests for clarification a
sensible demand. You didn't answer to that but preferred keeping
the discussion muddy with your phrases; the context was you saying:

Remember that it has to be "real" bubble sort, not a simplified bubble
sort that does unnecessary work by starting each time from the
beginning. [...]

And I noted:

(There's Bubblesort. There's not "real" Bubblesort. Such phrases
neither explain anything nor are they helpful for discussions.)

You could have clarified that fuzzy statement instead of rambling.

Thank you for confirming my expectations.

I hope you feel good in your mental bubble.[*]

Janis

[*] No pun with Bubblesort intended.

--- Synchronet 3.21d-Linux NewsLink 1.2

From Michael S@already5chosen@yahoo.com to comp.lang.c on Sun Mar 22 08:33:37 2026

From Newsgroup: comp.lang.c

On Sun, 22 Mar 2026 04:39:07 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 2026-03-20 13:42, Michael S wrote:

On Fri, 20 Mar 2026 08:35:05 +0100
David Brown <david.brown@hesbynett.no> wrote:

On 20/03/2026 04:01, Janis Papanagnou wrote:

Ah, I now see you know all that already, so I could have spared
some writing. :-)

It's still nice to see we are on the same page. We all have daft
ideas or unexpected misunderstandings at times - things we've
"always known" that are actually completely wrong. So it's nice
to get conformation that others, thinking and writing
independently, reach the same conclusions.

(I'd expect that all the regulars here will know these things about
sorting algorithms, but there's always a chance that someone learns
something from the posts.)

Pay attention that while it is not codified in C++ Standard, implementations of std::sort are expected to be "in-place", which practically means that extra storage should be O(logN) or at worst O(sqrt(N)). It means that merge sort is out of question. Radix/Count
sort is out of question both for this reason and because std::sort
API does not provide sufficient guarantees about the structure of
key. Heapsort is o.k in that regard.
I think that most real-world STL implementations have heapsort as a
back up for extremely rare case of primary algorithm (quicksort with median-of-3 pivot) misbehaving.

I had found these statements:

* By default, std::sort() uses Introsort, a hybrid algorithm
combining Quick Sort, Heap Sort, and Insertion Sort.
* Its time complexity is O(Nlog(N)) in the average and worst
cases.

The Insertion Sort function had ever been inherent part of Quicksort implementations for data sub-ranges once they become of small sizes.

May be, it is true for STL. But not generally.
There are plenty of implementations of quicksort that switch to other
algorithm at short sections or don't switch at all.
For example, when one sorts array of big records by small by simple key (expensive move, cheap comparison) then staright select sort would be
faster than straight insertion sort.
STL, being generic, does not know relative cost of move vs comparison,
so it has to peek one or another algo without such knowledge. It peeks insertion sort, which is reasonable, because in practice people rarely
sort arrays of big records, more commonly they sort arrays of pointers
to big records.

I'm positive, though, that your statement of "misbehaving" Quicksort
makes absolutely no sense (at least as you formulated it). How could
a call to sort() decide to use a "backup" algorithm; the very "rare"
O(N^2) corner case that you spoke about is depending on the _actual_
_data_ and cannot be characterized a priori!

If you decided to misunderstand my statement, which of course always
was about dynamic swithing then I can not do anything about it.
IMHO, my statement was hard to misunderstand.

What the Introsort algorithm actually does is dynamically depending
on the _recursion depth_, and to control that. To quote:
"Introsort begins with quicksort and if the recursion depth
goes more than a particular limit it switches to Heapsort".

But then you should also know that you can also natively in Quicksort
control the recursive calls to not exceed recursive calls or the stack
size by log(N).

Yes. Just always sort shorter of two sub-sections after split ahead
of longer sub-section and you are guaranteed to never exceed recursion
depth of log2(N).
The same can happen even without one's intention during O(N**2)
behavior. And it means that recursion depth alone can't be used for
robust switching from quicksort to heapsort.
It is easy to invent few more robust criteria for switching. I don't
know what criterion is chosen by "default" (what it means, BTW?) by
this or that STL author.

Janis

--- Synchronet 3.21d-Linux NewsLink 1.2

From Michael S@already5chosen@yahoo.com to comp.lang.c on Sun Mar 22 09:30:16 2026

From Newsgroup: comp.lang.c

On Sun, 22 Mar 2026 05:06:46 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 2026-03-20 13:24, Michael S wrote:

The challenge was issued for David Brown and for Bart.

If you think that Usenet is for private communication you've a
fundamental misconception about that.

I never expected that you will give constructive reply.

Your perception may be severely impaired, but I don't care much.

Probably I had to be more polite.
But you were not involved in this particular argument where Bart and
David were claiming that good implementation of bublesort is of the
same complexity of coding as Staright Insertion and Straigh Select sorts
and I was claiming that the lattter two are simpler, even if not by
much.

So, even if you provide your variant of bublesort coded in 'C' it would
not be conclusive, because you are not a party interstead in the proof
of his point.

I don't know about you, but I find requests for clarification a
sensible demand. You didn't answer to that but preferred keeping
the discussion muddy with your phrases; the context was you saying:

Remember that it has to be "real" bubble sort, not a simplified
bubble sort that does unnecessary work by starting each time from
the beginning. [...]

And I noted:

(There's Bubblesort. There's not "real" Bubblesort. Such phrases
neither explain anything nor are they helpful for discussions.)

You could have clarified that fuzzy statement instead of rambling.

Thank you for confirming my expectations.

I hope you feel good in your mental bubble.[*]

Janis

[*] No pun with Bubblesort intended.

--- Synchronet 3.21d-Linux NewsLink 1.2

Who's Online
Recent Visitors
- Microbot
  Sun Mar 22 08:05:36 2026
  from Moore, Ok via Telnet
- Pixelrez
  Sat Mar 21 16:03:42 2026
  from Lenexa,ks via Telnet
- Pixelrez
  Sat Mar 21 15:57:39 2026
  from Lenexa,ks via Telnet
- Pixelrez
  Sat Mar 21 15:57:11 2026
  from Lenexa,ks via Telnet

System Info

Sysop:	DaiTengu
Location:	Appleton, WI
Users:	1,104
Nodes:	10 (1 / 9)
Uptime:	492386:38:07
Calls:	14,150
Calls today:	1
Files:	186,281
D/L today:	2,214 files (841M bytes)
Messages:	2,501,137

Isn't that beauty ?

Who's Online

Recent Visitors

System Info