• Bare Metal C vs. libc: Is the overhead worth it on small MCUs?,08:52

    From Oguz Kaan Ocal@oguzkaanocal3169@hotmail.com to comp.lang.c on Tue Mar 17 10:09:39 2026
    From Newsgroup: comp.lang.c

    Hey guys,

    I've been debating this with myself for a while now. When you're working
    on resource-constrained hardware, do you guys actually use the standard
    C library (libc) or do you go full Bare Metal for everything?

    I'm talking about things like using sprintf() vs. writing your own
    itoa(), or malloc() vs. static buffers. Even a simple printf() can bloat
    the binary by several KB and eat up the stack like crazy.

    A few things I'm curious about:

    Does anyone here still use the full standard library on chips with <32KB Flash? Or is it an immediate "no-go" for you?

    For string manipulation (memcpy, memset, etc.), do you trust the
    compiler's built-in optimizations or do you write your own
    assembly/manual loops to save those extra cycles?

    How do you handle things like dynamic memory? Is malloc() ever
    acceptable in a mission-critical embedded loop, or is it always static allocation only?

    I feel like using the standard library is "cheating" and adds too much
    hidden overhead, but rewriting every basic utility feels like
    reinventing the wheel.

    What's your take? Do you prefer the portability of standard C or the lean-and-mean performance of custom bare-metal implementations?
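
    To make the sprintf()-vs-itoa() trade-off concrete, a hand-rolled
    conversion of the kind meant above can be quite small. This is only an
    illustrative sketch - the name u32toa and its interface are made up,
    not taken from any libc:

```c
/* Illustrative hand-rolled integer-to-string conversion, the kind of
   sprintf() replacement discussed above. Name and interface are made up. */
static char *u32toa(char *buf, unsigned long v)
{
    char tmp[21];                        /* enough digits even for 64 bits */
    int  n = 0;
    do {                                 /* produce digits in reverse order */
        tmp[n++] = (char)('0' + v % 10u);
        v /= 10u;
    } while (v != 0);
    for (int i = 0; i < n; i++)          /* reverse into the caller's buffer */
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return buf;
}
```

    No format-string parsing, no locale, no heap, and a fixed, known stack cost.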
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Tue Mar 17 09:55:21 2026
    From Newsgroup: comp.lang.c

    Am 17.03.2026 um 08:09 schrieb Oguz Kaan Ocal:

    For string manipulation (memcpy, memset, etc.), do you trust the
    compiler's built-in optimizations or do you write your own assembly/
    manual loops to save those extra cycles?

    Tell me which application you have where the memory availability is
    like that (32 kiB) and you have constraints on the performance of
    string processing?

    How do you handle things like dynamic memory? Is malloc() ever
    acceptable in a mission-critical embedded loop, or is it always
    static allocation only?

    I guess on systems which have 32 kiB all data is allocated
    statically; maybe with small object pools.

    What's your take? Do you prefer the portability of standard C or
    the lean-and-mean performance of custom bare-metal implementations?

    I don't like C, but when it comes to such small applications it is
    the right choice, since in another language you would also have the
    extra work of reinventing everything from a standard lib yourself.

    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Tue Mar 17 11:08:55 2026
    From Newsgroup: comp.lang.c

    On Tue, 17 Mar 2026 10:09:39 +0300
    Oguz Kaan Ocal <oguzkaanocal3169@hotmail.com> wrote:

    Hey guys,

    <snip>

    How do you handle things like dynamic memory? Is malloc() ever
    acceptable in a mission-critical embedded loop, or is it always
    static allocation only?


    malloc() at initialization is o.k.
    free() - no, in my opinion it is too much.

    I feel like using the standard library is "cheating" and adds too
    much hidden overhead, but rewriting every basic utility feels like reinventing the wheel.

    What's your take? Do you prefer the portability of standard C or the lean-and-mean performance of custom bare-metal implementations?

    I haven't done really small stuff recently (a decade or more).
    For the sizes I do work with, an "almost standard clib with a few
    corners cut" in the form of newlib-nano is sufficiently small.


    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Oguz Kaan Ocal@oguzkaanocal3169@hotmail.com to comp.lang.c on Tue Mar 17 12:30:14 2026
    From Newsgroup: comp.lang.c

    On 17.03.2026 11:55, Bonita Montero wrote:

    Tell me which application you have where the memory availability is
    like that (32 kiB) and you have constraints on the performance of
    string processing?

    It’s not just about printing 'Hello World'. Think about high-frequency
    NMEA parsing or handling AT commands on a low-power MCU (clocks at 1-4
    MHz for energy efficiency). In those cases, sscanf or sprintf are not
    just bloated in terms of Flash, but they also introduce unacceptable
    latency.

    Also, when driving an OLED display over SPI within a tight loop, every
    cycle spent on string formatting is a cycle lost for UI responsiveness.
    In 32KB systems, I'd rather spend my cycles on the application logic
    than on a generic, 'one-size-fits-all' libc function.
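
    For illustration, the kind of hand-rolled field extraction meant here,
    instead of sscanf(), is only a few lines. parse_uint is a hypothetical
    name, and real NMEA handling would also need checksum and error
    handling:

```c
/* Illustrative sscanf() replacement for comma-separated numeric fields
   (NMEA/AT style). parse_uint is a made-up name; overflow and error
   handling are omitted to keep the sketch short. */
static unsigned parse_uint(const char **p)
{
    unsigned v = 0;
    while (**p >= '0' && **p <= '9') {  /* accumulate decimal digits */
        v = v * 10u + (unsigned)(**p - '0');
        (*p)++;
    }
    if (**p == ',')                     /* consume the field separator */
        (*p)++;
    return v;
}
```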

    I guess on systems which have 32kiB every data is allocated
    statically; maybe with little object pools.

    Exactly. On such tight constraints, determinism is king. You don't want
    your mission-critical loop to fail after 48 hours just because of heap fragmentation. Using static buffers or fixed-size object pools gives you
    a clear picture of your memory map at compile-time. If it fits in the
    linker script, it fits in the chip.
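
    A minimal fixed-size pool of the sort described looks like this -
    everything static, so the whole footprint is visible in the map file.
    The names (msg_t, POOL_LEN, msg_alloc/msg_free) are illustrative:

```c
#include <stddef.h>
#include <stdbool.h>

/* Illustrative fixed-size object pool: all storage is static, so memory
   use is decided at link time. Names (msg_t, POOL_LEN) are made up. */
typedef struct { unsigned char payload[32]; } msg_t;

#define POOL_LEN 8

static msg_t msg_pool[POOL_LEN];
static bool  msg_in_use[POOL_LEN];

static msg_t *msg_alloc(void)
{
    for (size_t i = 0; i < POOL_LEN; i++) {
        if (!msg_in_use[i]) {
            msg_in_use[i] = true;
            return &msg_pool[i];
        }
    }
    return NULL;   /* pool exhausted: a deterministic failure, no fragmentation */
}

static void msg_free(msg_t *m)
{
    msg_in_use[m - msg_pool] = false;   /* caller must pass a pool pointer */
}
```

    Allocation is O(POOL_LEN) worst case, and exhaustion is an explicit,
    testable condition rather than slow heap decay.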

    I don't like C, but when it comes to such little applications it is
    the right choice since in other language you would have the extra
    work to invent everything of a standard lib yourself also.

    It's a love-hate relationship for many of us. C's biggest strength isn't
    just the language itself, but the fact that it's the lingua franca of
    embedded systems.

    Even if you want to switch to something like Rust or Zig for better
    safety, the 'extra work' to integrate with existing vendor HALs and
    headers can be a huge deterrent on small chips. At 32KB, you want to
    spend your time fighting the memory constraints, not the toolchain. That
    said, do you think C's dominance is strictly due to its standard library support, or is it just the lack of better 'bare-metal' alternatives for
    these tiny targets?


    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Tue Mar 17 10:37:39 2026
    From Newsgroup: comp.lang.c

    On 17/03/2026 08:09, Oguz Kaan Ocal wrote:
    Hey guys,


    You might also be interested in the comp.arch.embedded group. It is not
    as popular as comp.lang.c, but there are a number of folks who quietly
    follow the group in case interesting questions turn up.

    I've been debating this with myself for a while now. When you're working
    on resource-constrained hardware, do you guys actually use the standard
    C library (libc) or do you go full Bare Metal for everything?


    I use some of the standard library in my bare-metal (or small RTOS)
    coding. The freestanding, or function-free headers are of course in
    heavy use - I have <stdint.h> <stdbool.h> in pretty much every file.
    (The latter is now unnecessary with C23.) Small utility functions, like memcpy and string functions, are also fine in almost every system.
    Typically a good compiler will inline many of those uses directly rather
    than calling a library function.

    Like most serious small-system embedded programmers, I avoid any use of malloc/free except in very specific and controlled circumstances. Even
    then, I usually use specially written dynamic memory functions.

    From the printf family, snprintf is the most common choice. For
    smaller systems I'll often have specifically written conversion routines
    for output (typically a debug UART connection). For bigger systems,
    I'll often use snprintf to do the formatting for convenience. Yes, the
    printf family has overhead in code size and runtime. If that overhead
    is too big for the target, I don't use it - if it is small enough not to matter, I /do/ use it. There is no fixed rule. (For many toolchains,
    you can choose between a (sn)printf that has floating point support, and
    one that does not - if you don't need floating point output, especially
    if the target does not have floating point hardware, picking the
    non-floating point version can save a large amount of code space.)

    I'm talking about things like using sprintf() vs. writing your own
    itoa(), or malloc() vs. static buffers. Even a simple printf() can bloat
    the binary by several KB and eat up the stack like crazy.

    A few things I'm curious about:

    Does anyone here still use the full standard library on chips with <32KB Flash? Or is it an immediate "no-go" for you?


    Nobody /ever/ uses the full standard library, in any code, on any
    system. You only ever use the parts of the library that make sense to
    the code you are writing. That applies whether you are programming for
    a data centre or for a tiny microcontroller.

    Basically, if the C library has a function that does the job you need,
    and the overhead (due to features you don't need) is not too high, use
    that function. There are no fixed rules. And in embedded systems your
    code typically does not need to be highly portable, so feel free to use additional functions provided by your toolchain and its library.

    There are parts of the C library that rarely make any sense on small
    systems - the file I/O functions, signals, wide characters,
    most maths functions, etc. There are parts that are generally fine (but
    read the small print!) such as the mem* and str* functions. Memory
    management is always a big concern, so generic malloc/free are often not
    a good idea. The big thing to be wary of, however, are the atomics and
    thread functions in C11 - these are often either useless or far worse
    than useless in embedded toolchains.

    For string manipulation (memcpy, memset, etc.), do you trust the
    compiler's built-in optimizations or do you write your own
    assembly/manual loops to save those extra cycles?


    It is a /long/ time since I have felt my hand-written assembly does
    better than the compiler. But it can depend on your target. If you
    still have to deal with brain-dead 8-bit CISC devices, manual assembly
    can sometimes be faster (but "faster" is not the only meaning of
    "better"). For decent microcontrollers - 32-bit ARM Cortex-M being
    massively dominant in the market - you are unlikely to do better than
    the compiler in any meaningful way. Of course, you do have to have a
    basic understanding of the compiler and how to work with it to generate efficient results. If you don't enable optimisations, or use
    appropriate target flags, or write the code in a way that can lead to
    good generated code, then your end results will be poor.

    Note that it is fully possible to view the generated assembly code. In situations where speed is critical and each cycle counts, I always
    examine the generated assembly, and if necessary adjust my C code or use additional compiler features until I am happy with the assembly. That
    is a more flexible and maintainable solution than writing the assembly directly. If I do need to write assembly, I use absolutely minimal inline assembly code that can be optimised within the C code (at least
    with gcc). In fact, most inline assembly code I use is actually empty,
    but exists to force particular effects in gcc.

    How do you handle things like dynamic memory? Is malloc() ever
    acceptable in a mission-critical embedded loop, or is it always static allocation only?

    Malloc is rarely acceptable in real-time embedded systems - at least for
    the important stuff. /Maybe/ you can use it in connection with logging
    or debugging code. The real problem is not malloc, but free - it is not uncommon to see embedded systems where the "malloc" implementation is
    just a stack, and "free" does not exist. If you really need dynamic
    memory (and you often do for things using Ethernet or Wifi networks),
    you will often want a specialised setup with allocation pools for
    different purposes.


    I feel like using the standard library is "cheating" and adds too much hidden overhead, but rewriting every basic utility feels like
    reinventing the wheel.

    What's your take? Do you prefer the portability of standard C or the lean-and-mean performance of custom bare-metal implementations?

    I use what does the job in a way that I know is correct and efficient
    enough for the task at hand. There is no single universal answer. As
    for portability, I don't make code unnecessarily non-portable (within
    reason - there's usually no need to be obsessive about code that works
    on /every/ system). But I have no qualms about making code non-portable
    if there are advantages in doing so, and if attempts to compile the code
    on unsuitable systems will give a compile-time error.


    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Oguz Kaan Ocal@oguzkaanocal3169@hotmail.com to comp.lang.c on Tue Mar 17 12:49:55 2026
    From Newsgroup: comp.lang.c

    On 17.03.2026 12:08, Michael S wrote:
    malloc() at initialization is o.k.
    free() - no, in my opinion it is too much.

    I fully agree with the "allocate once, never free" approach. It’s a
    solid middle ground that gives you the flexibility of dynamic setup at
    boot while keeping the main loop safe from the nightmare of heap fragmentation. It's actually a pattern I see even in high-reliability
    projects where a runtime crash is not an option.

    For the sizes I do work with, an "almost standard clib with a few corners cut" in the form of newlib-nano is sufficiently small.

    Good point on newlib-nano. It really solves that "cheating" vs
    "reinventing" dilemma by stripping out the heavy features (like complex
    float support) while keeping the familiar API. At 32KB, it seems like
    the sweet spot where you don't have to fight the toolchain, but you also
    don't waste precious Flash on things you'll never use.
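
    For anyone following along: with the GNU Arm Embedded toolchain,
    newlib-nano is typically selected via a specs file. The flags below are
    the common arm-none-eabi ones, but check your own toolchain's
    documentation:

```shell
# Link against newlib-nano and drop unreferenced sections.
arm-none-eabi-gcc main.o -o app.elf --specs=nano.specs -Wl,--gc-sections

# nano's printf omits float formatting by default; if you do need it,
# the float formatter must be pulled in explicitly:
arm-none-eabi-gcc main.o -o app.elf --specs=nano.specs -u _printf_float
```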

    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Oguz Kaan Ocal@oguzkaanocal3169@hotmail.com to comp.lang.c on Tue Mar 17 12:55:14 2026
    From Newsgroup: comp.lang.c

    On 17.03.2026 12:37, David Brown wrote:
    It is a /long/ time since I have felt my hand-written assembly does
    better than the compiler. [...] In situations where speed is critical,
    I always examine the generated assembly, and if necessary adjust my C
    code.

    I think this is the most professional way to handle it. Trusting the
    compiler but verifying the output. As you said, modern compilers
    (especially for Cortex-M) are incredibly good at inlining memcpy or
    memset into a few store instructions. I’ve found that fighting the
    compiler with manual assembly often just breaks the optimizer's ability
    to schedule instructions around it.

    The real problem is not malloc, but free - it is not
    uncommon to see embedded systems where the "malloc" implementation is
    just a stack, and "free" does not exist.

    This is a great distinction. A "bump allocator" (or stack-based malloc)
    is essentially harmless because it doesn't suffer from fragmentation.
    It's the non-deterministic nature of free() and the heap-rebuilding
    overhead that kills real-time performance.
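
    A sketch of such an allocate-only "bump" scheme, for concreteness -
    the pool size and 4-byte alignment here are arbitrary choices, not
    from anyone's actual system:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative bump allocator: malloc-like allocation from a static
   pool, with no free(). Pool size and alignment are arbitrary choices. */
#define POOL_SIZE 1024u

static uint8_t pool[POOL_SIZE];
static size_t  pool_used;

static void *bump_alloc(size_t n)
{
    size_t aligned = (n + 3u) & ~(size_t)3u;   /* round up to 4 bytes */
    if (aligned > POOL_SIZE - pool_used)
        return NULL;               /* refuse rather than fragment or fail later */
    void *p = &pool[pool_used];
    pool_used += aligned;
    return p;
}
```

    Every allocation is a handful of instructions with a constant worst
    case, which is exactly why it fits real-time loops where free() doesn't.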

    Nobody /ever/ uses the full standard library, in any code, on any
    system. You only ever use the parts of the library that make sense to
    the code you are writing.

    That’s a very grounding perspective. I think I fell into the trap of
    viewing libc as a monolithic "all-or-nothing" package. Using <stdint.h>
    and <stdbool.h> (or moving to C23 as you mentioned to avoid the header)
    is just common sense.

    From the printf family, snprintf is the most common choice. [...]
    if you don't need floating point output, picking the
    non-floating point version can save a large amount of code space.

    I’ve definitely seen the "printf-float" flag make or break a 32KB flash limit.

    Thanks for the tip about comp.arch.embedded; I’ll definitely check that group out for more of these "quietly interesting" discussions. It seems
    the consensus here is that "cheating" isn't a thing—efficiency and correctness are the only metrics that matter.
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Tue Mar 17 11:15:24 2026
    From Newsgroup: comp.lang.c

    On 2026-03-17 08:09, Oguz Kaan Ocal wrote:

    I've been debating this with myself for a while now. When you're working
    on resource-constrained hardware, do you guys actually use the standard
    C library (libc) or do you go full Bare Metal for everything?

    I'm talking about things like using sprintf() vs. writing your own
    itoa(), or malloc() vs. static buffers. Even a simple printf() can bloat
    the binary by several KB and eat up the stack like crazy.

    If I didn't need any complex formatting from the 'printf' family
    and the required functionality was comparatively primitive, I wrote
    my own functions (for various things); that was in the (farther) past.

    A few things I'm curious about:

    Does anyone here still use the full standard library on chips with <32KB Flash? Or is it an immediate "no-go" for you?

    The "smallest" system I have written software for was a DSP, and
    I've done it in assembler, heavily optimized for memory and speed.


    For string manipulation (memcpy, memset, etc.), do you trust the
    compiler's built-in optimizations or do you write your own assembly/
    manual loops to save those extra cycles?

    For a C++ library - back then when there was no 'string' available
    in C++ - we wrote a string library based on memcpy, memcmp, etc.
    (note: not strcpy, strcmp, etc. because we supported '\0' as part
    of the string alphabet). So, yes, I trust the standard libraries.


    How do you handle things like dynamic memory? Is malloc() ever
    acceptable in a mission-critical embedded loop, or is it always static allocation only?

    For the DSP project not at all dynamic; all allocation was static.


    I feel like using the standard library is "cheating" and adds too much hidden overhead, but rewriting every basic utility feels like
    reinventing the wheel.

    I suggest replacing the "feeling" with measurement. - What are the
    requirements, and what does your system provide to fulfill them?
    Those are the sensible questions!

    But I also wouldn't reinvent the wheel unnecessarily, or write my
    own libraries, or my own version of the "C" language, etc.


    What's your take? Do you prefer the portability of standard C or the lean-and-mean performance of custom bare-metal implementations?

    If you don't use portable libraries and implement everything
    yourself, you also deliver your own code with your product. So there's
    nothing wrong with that in principle. I would avoid proprietary libraries.

    (My guess would be that nowadays you'd also have larger and better
    offerings than back in the 1980's.)

    Janis

    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Tue Mar 17 14:53:30 2026
    From Newsgroup: comp.lang.c

    Oguz Kaan Ocal <oguzkaanocal3169@hotmail.com> writes:
    Hey guys,

    I've been debating this with myself for a while now. When you're working
    on resource-constrained hardware, do you guys actually use the standard
    C library (libc) or do you go full Bare Metal for everything?

    I'm talking about things like using sprintf() vs. writing your own
    itoa(), or malloc() vs. static buffers. Even a simple printf() can bloat
    the binary by several KB and eat up the stack like crazy.


    For bare metal code like Operating Systems and Hypervisors,
    we never used libc directly, but rather re-implemented
    the portions necessary including the necessary parts of
    the crt (C Run-time), such as calling C++ static constructors,
    implementations of the default 'new' and 'delete' operators,
    atexit stub, itoa, atoi, a printf formatter, str* functions,
    et alia.

    It amounted to about 1000 lines of C + assembler.

    Other boot-time functionality includes setting up the hardware
    (e.g. page tables, MTRR and other privileged registers, etc.)
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Tue Mar 17 16:56:04 2026
    From Newsgroup: comp.lang.c

    On 17/03/2026 10:55, Oguz Kaan Ocal wrote:
    On 17.03.2026 12:37, David Brown wrote:
    It is a /long/ time since I have felt my hand-written assembly does
    better than the compiler. [...] In situations where speed is critical,
    I always examine the generated assembly, and if necessary adjust my C
    code.

    I think this is the most professional way to handle it. Trusting the compiler but verifying the output. As you said, modern compilers
    (especially for Cortex-M) are incredibly good at inlining memcpy or
    memset into a few store instructions. I’ve found that fighting the compiler with manual assembly often just breaks the optimizer's ability
    to schedule instructions around it.


    I've often seen people write manual assembly that is, at best, very
    fragile because they don't understand the interaction with the compiler.

    Most of the gcc "asm" statements I use in code these days are empty -
    it's the interaction with the compiler that matters. (A common example
    is a memory barrier - asm volatile ("" ::: "memory"); )
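
    For readers unfamiliar with the idiom: that statement emits no
    instructions at all, but the "memory" clobber tells gcc/clang that
    memory may change across it, so stores cannot be merged or reordered
    past that point. A trivial compilable sketch (gcc/clang syntax):

```c
/* The empty asm is purely a compiler barrier: it generates no code, but
   the "memory" clobber forbids the optimiser from eliminating or moving
   memory accesses across it (GNU asm syntax, gcc/clang only). */
static int shared;

static int demo(void)
{
    shared = 1;                          /* without the barrier, the compiler
                                            could drop this as a dead store  */
    __asm__ volatile ("" ::: "memory");  /* forces the store to happen here  */
    shared = 2;
    return shared;
}
```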

    The real problem is not malloc, but free - it is not
    uncommon to see embedded systems where the "malloc" implementation is
    just a stack, and "free" does not exist.

    This is a great distinction. A "bump allocator" (or stack-based malloc)
    is essentially harmless because it doesn't suffer from fragmentation.
    It's the non-deterministic nature of free() and the heap-rebuilding
    overhead that kills real-time performance.


    It is still preferable, where possible, to use static allocation rather
    than such allocators. Static allocation makes things clearer in map
    files, gives you consistent addresses to aid during debugging, can be marginally more efficient, and - most importantly - overuse of memory is
    a build-time failure rather than a run-time failure.

    Nobody /ever/ uses the full standard library, in any code, on any
    system. You only ever use the parts of the library that make sense to
    the code you are writing.

    That’s a very grounding perspective. I think I fell into the trap of viewing libc as a monolithic "all-or-nothing" package. Using <stdint.h>
    and <stdbool.h> (or moving to C23 as you mentioned to avoid the header)
    is just common sense.

    From the printf family, snprintf is the most common choice. [...]
    if you don't need floating point output, picking the
    non-floating point version can save a large amount of code space.

    I’ve definitely seen the "printf-float" flag make or break a 32KB flash limit.

    Thanks for the tip about comp.arch.embedded; I’ll definitely check that group out for more of these "quietly interesting" discussions. It seems
    the consensus here is that "cheating" isn't a thing—efficiency and correctness are the only metrics that matter.

    Correctness /is/ the only metric that matters until the code is correct.
    Only when the code is correct does it matter how efficient it is. (If
    timing or code size is essential, then that is part of correctness.)
    But for many real-world programs (not just embedded ones), strict
    conformance to particular C standard versions or relying purely on standard-compliant features of the language is not part of correctness.
    I am happy to use compiler-specific features when documented - whether
    they are labelled as implementation-defined in the C standards, or as documented extensions in the compiler manual. But I am not happy to
    rely on undocumented features or assumptions. Such code is often, by my interpretation of the word, incorrect - even if it happens to work.


    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Tue Mar 17 16:59:45 2026
    From Newsgroup: comp.lang.c

    On 17/03/2026 15:53, Scott Lurndal wrote:
    Oguz Kaan Ocal <oguzkaanocal3169@hotmail.com> writes:
    Hey guys,

    I've been debating this with myself for a while now. When you're working
    on resource-constrained hardware, do you guys actually use the standard
    C library (libc) or do you go full Bare Metal for everything?

    I'm talking about things like using sprintf() vs. writing your own
    itoa(), or malloc() vs. static buffers. Even a simple printf() can bloat
    the binary by several KB and eat up the stack like crazy.


    For bare metal code like Operating Systems and Hypervisors,
    we never used libc directly, but rather re-implemented
    the portions necessary including the necessary parts of
    the crt (C Run-time), such as calling C++ static constructors, implementations of the default 'new' and 'delete' operators,
    atexit stub, itoa, atoi, a printf formatter, str* functions,
    et alia.

    Amounted about a 1000 lines of C + assembler.

    Other boot-time functionality includes setting up the hardware
    (e.g. page tables, MTRR and other privileged registers, etc.)

    I had a project (on a 68k microcontroller) where I wrote the C startup
    code myself rather than using the toolchain's version. The toolchain's version was hand-written assembly - I wrote it in C (bar a couple of
    assembly instructions), compiled with the same compiler, and the result
    was a lot smaller and faster.

    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Tue Mar 17 17:03:24 2026
    From Newsgroup: comp.lang.c

    On 17/03/2026 11:15, Janis Papanagnou wrote:
    On 2026-03-17 08:09, Oguz Kaan Ocal wrote:

    Does anyone here still use the full standard library on chips with
    <32KB Flash? Or is it an immediate "no-go" for you?

    The "smallest" system I have written software for was a DSP, and
    I've done it in assembler, heavily optimized for memory and speed.


    The smallest target I used had 2 KB of flash and no ram at all - just 32
    8-bit registers. I programmed it in C, using gcc and a couple of
    assembly instructions to skip any use of the C startup code. I didn't
    use any of the C standard library (except for <stdint.h> and
    <stdbool.h>), because I did not need any. It was quite a simple program!

    I have happily used both C and C++ on devices with 32 KB flash or less.

    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Tue Mar 17 19:05:21 2026
    From Newsgroup: comp.lang.c

    Am 17.03.2026 um 16:59 schrieb David Brown:

    I had a project (on a 68k microcontroller) where I wrote the C startup
    code myself rather than using the toolchain's version.  The toolchain's version was hand-written assembly - I wrote it in C (bar a couple of assembly instructions), compiled with the same compiler, and the result
    was a lot smaller and faster.

    Performance of startup code ?
    What have you been smoking ?

    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Tue Mar 17 18:34:31 2026
    From Newsgroup: comp.lang.c

    Bonita Montero <Bonita.Montero@gmail.com> writes:
    Am 17.03.2026 um 16:59 schrieb David Brown:

    I had a project (on a 68k microcontroller) where I wrote the C startup
    code myself rather than using the toolchain's version.  The toolchain's
    version was hand-written assembly - I wrote it in C (bar a couple of
    assembly instructions), compiled with the same compiler, and the result
    was a lot smaller and faster.

    Performance of startup code ?
    What have you been smoking ?

    You clearly have absolutely _zero_ experience in the real world.

    Startup performance (think BIOS and OS boot time, for example),
    is very important to real customers. It is particularly important
    for industrial and commercial microcontrollers in appliances and
    industrial applications.
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Tue Mar 17 19:48:04 2026
    From Newsgroup: comp.lang.c

    On 17/03/2026 19:34, Scott Lurndal wrote:
    Bonita Montero <Bonita.Montero@gmail.com> writes:
    Am 17.03.2026 um 16:59 schrieb David Brown:

    I had a project (on a 68k microcontroller) where I wrote the C startup
    code myself rather than using the toolchain's version.  The toolchain's version was hand-written assembly - I wrote it in C (bar a couple of
    assembly instructions), compiled with the same compiler, and the result
    was a lot smaller and faster.

    Performance of startup code ?
    What have you been smoking ?

    You clearly have absolutely _zero_ experience in the real world.

    Startup performance (think BIOS and OS boot time, for example),
    is very important to real customers. It is particularly important
    for industrial and commercial microcontrollers in appliances and
    industrial applications.

    While all that is true, and I have worked with systems where startup performance was very important (because the system had to use an
    absolute minimum of energy, and was only powered on by the events it had
    to handle), performance of the startup code was not important in this
    case. There were other reasons for writing my own startup code on that project - there were several bits of hardware that needed to be
    initialised before the usual "clear bss" and similar pre-main code could
    run. The greater efficiency of the C code was a side-effect, not a
    goal. The point was that the C code was significantly (a factor of four
    or more) faster than the hand-written assembly that was written by the toolchain vendor themselves.


    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Tue Mar 17 12:32:31 2026
    From Newsgroup: comp.lang.c

    On 3/17/2026 12:09 AM, Oguz Kaan Ocal wrote:
    Hey guys,

    I've been debating this with myself for a while now. When you're working
    on resource-constrained hardware, do you guys actually use the standard
    C library (libc) or do you go full Bare Metal for everything?

    I'm talking about things like using sprintf() vs. writing your own
    itoa(), or malloc() vs. static buffers. Even a simple printf() can bloat
    the binary by several KB and eat up the stack like crazy.

    Well, I used one of my region allocators for a system that had no
    malloc; it's an older version of quadros.

    https://pastebin.com/raw/f37a23918

    https://groups.google.com/g/comp.lang.c/c/7oaJFWKVCTw/m/sSWYU9BUS_QJ





    A few things I'm curious about:

    Does anyone here still use the full standard library on chips with <32KB Flash? Or is it an immediate "no-go" for you?

    For string manipulation (memcpy, memset, etc.), do you trust the
    compiler's built-in optimizations or do you write your own assembly/
    manual loops to save those extra cycles?

    How do you handle things like dynamic memory? Is malloc() ever
    acceptable in a mission-critical embedded loop, or is it always static allocation only?

    I feel like using the standard library is "cheating" and adds too much hidden overhead, but rewriting every basic utility feels like
    reinventing the wheel.

    What's your take? Do you prefer the portability of standard C or the lean-and-mean performance of custom bare-metal implementations?

    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Tue Mar 17 21:04:55 2026
    From Newsgroup: comp.lang.c

    Oguz Kaan Ocal <oguzkaanocal3169@hotmail.com> wrote:
    Hey guys,

    I've been debating this with myself for a while now. When you're working
    on resource-constrained hardware, do you guys actually use the standard
    C library (libc) or do you go full Bare Metal for everything?

    I'm talking about things like using sprintf() vs. writing your own
    itoa(), or malloc() vs. static buffers. Even a simple printf() can bloat
    the binary by several KB and eat up the stack like crazy.

    A few things I'm curious about:

    Does anyone here still use the full standard library on chips with <32KB Flash? Or is it an immediate "no-go" for you?

    I have the following in my Makefiles:

    CFLAGS = -Os -fno-builtin ...

    LD_FLAGS = ... -Wl,--gc-sections -nostartfiles -nostdlib

    Which means nothing from the toolchain's C library gets included
    in my program. The -nostartfiles is partly because in the past
    the toolchain startup called library functions, and partly
    because I use custom startup code.

    My embedded programs are small and most do only a little.
    Frequently binaries are in the 1-2 kB range. Strictly speaking
    I could tolerate much bigger binaries, as my smallest targets
    have 8 kB flash and more typical ones have 32 kB to 256 kB flash.
    But for development I find it convenient to compile to RAM,
    and that means much smaller size. Also, parts of my programs
    are intended to form a kind of library that could be used in
    a bigger program, so I want to keep each part small.

    For string manipulation (memcpy, memset, etc.), do you trust the
    compiler's built-in optimizations or do you write your own
    assembly/manual loops to save those extra cycles?

    Normally I consider compiler optimizations to be good enough.
    I do see various inefficiencies, and by coding in assembly I
    could probably gain 5% in speed and size. But that would
    require a large effort, so in most cases the possible gain in
    speed and code size does not justify assembly coding.

    If you mean library functions for string handling, I rarely
    use them. In non-embedded code I use 'memcpy', 'strlen'
    and similar when needed. But if I need slightly more
    "interesting" string handling I do it in my own code.
    Frequently a single loop can do what would otherwise
    require several library calls. This may be partly
    because I am used to writing such loops.

    How do you handle things like dynamic memory? Is malloc() ever
    acceptable in a mission-critical embedded loop, or is it always static allocation only?

    I feel like using the standard library is "cheating" and adds too much hidden overhead, but rewriting every basic utility feels like
    reinventing the wheel.

    My feeling is that I do not need most of the standard library.
    Writing simple functions to do what I need, like
    converting an integer to a string, is not a big deal. My functions
    typically have a simpler interface than the C library functions,
    and the whole implementation is comparable in size to a wrapper
    that offers the new interface, so I am not worried about "reinventing
    the wheel".

    For example, my typical output device is a 2x16 alphanumeric LCD
    display. Connecting it to an output stream gains almost
    nothing. I have functions to drive the display; they are
    needed because the C library does not provide device drivers. I also
    have a function to convert an integer into a string, and that
    plus the driver are enough for comfortable use of the display.

    What's your take? Do you prefer the portability of standard C or the lean-and-mean performance of custom bare-metal implementations?

    As I wrote, I have found little use for the standard C library, so
    working without a C library is no problem for me. But if you need
    the C library then use it.

    Concerning portability, note that the C library encapsulates inherently
    unportable things like the interface to the OS. Those unportable aspects
    vary quite a lot in embedded systems, and a C library implementation
    will make choices that may or may not be good for you. If a
    third party makes a complete toolchain and you are satisfied
    with the choices made by its makers, then good. But if not, you
    will need to assemble your own toolchain. Which means that you
    may be forced to maintain the nonportable parts of the C library that
    you use. In particular, you may be forced to maintain parts that
    you do not use, simply to be able to compile the library (and
    use what you need).

    Anyway, once you choose a toolchain most portability is
    gone. Your program will depend on choices made by the toolchain
    and will require work to port to a different toolchain.
    Of course, "portable" C functions will hopefully be
    compatible enough across different toolchains. But portable C
    code is likely to be at least as portable. The trouble is
    in the device-dependent parts, and there different toolchains show
    significant differences.
    --
    Waldek Hebisch
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Tue Mar 17 22:36:48 2026
    From Newsgroup: comp.lang.c

    Am 17.03.2026 um 19:34 schrieb Scott Lurndal:

    Startup performance (think BIOS and OS boot time, for example),
    is very important to real customers. It is particularly important
    for industrial and commercial microcontrollers in appliances and
    industrial applications.

    Yes, 1us vs 1ms.

    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Wed Mar 18 02:14:20 2026
    From Newsgroup: comp.lang.c

    On 2026-03-17 17:03, David Brown wrote:
    On 17/03/2026 11:15, Janis Papanagnou wrote:
    On 2026-03-17 08:09, Oguz Kaan Ocal wrote:

    Does anyone here still use the full standard library on chips with
    <32KB Flash? Or is it an immediate "no-go" for you?

    The "smallest" system I have written software for was a DSP, and
    I've done it in assembler, heavily optimized for memory and speed.


    The smallest target I used had 2 KB of flash and no ram at all - just 32 8-bit registers.  I programmed it in C, using gcc and a couple of
    assembly instructions to skip any use of the C startup code.  I didn't
    use any of the C standard library (except for <stdint.h> and
    <stdbool.h>), because I did not need any.  It was quite a simple program!

    I have happily used both C and C++ on devices with 32 KB flash or less.

    I'm sure the decisions heavily depend on what you intend to do.

    The DSP thing I mentioned was actually a comparatively complex system;
    the most demanding part was a Viterbi decoder (based on code-path
    searches in large graphs). But the system also had a couple of other
    (less demanding) functions implemented, like the convolutional-code
    encoder, CRC codec, interleaver, noise generator, channel I/O, etc.
    I managed to encode all that in 1k (16-bit) words of assembler code,
    with the data organized entirely in the CPU-internal cache memory (512 words
    available, IIRC), and, as in your case, with no external data memory.

    The tailoring was necessary. The algorithms were optimized on both
    levels, the algorithmic and the technical. - I don't think that using a C
    or other compiler would have helped in any way to fulfill the memory
    and time requirements. (Also, C++ compilers weren't mature back then.)

    Janis

    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Wed Mar 18 08:18:05 2026
    From Newsgroup: comp.lang.c

    On 18/03/2026 02:14, Janis Papanagnou wrote:
    On 2026-03-17 17:03, David Brown wrote:
    On 17/03/2026 11:15, Janis Papanagnou wrote:
    On 2026-03-17 08:09, Oguz Kaan Ocal wrote:

    Does anyone here still use the full standard library on chips with
    <32KB Flash? Or is it an immediate "no-go" for you?

    The "smallest" system I have written software for was a DSP, and
    I've done it in assembler, heavily optimized for memory and speed.


    The smallest target I used had 2 KB of flash and no ram at all - just
    32 8-bit registers.  I programmed it in C, using gcc and a couple of
    assembly instructions to skip any use of the C startup code.  I didn't
    use any of the C standard library (except for <stdint.h> and
    <stdbool.h>), because I did not need any.  It was quite a simple program!
    I have happily used both C and C++ on devices with 32 KB flash or less.

    I'm sure the decisions heavily depend on what you intend to do.


    Of course.

    The DSP-thing I mentioned was actually a comparably complex system;
    the most demanding part was a Viterbi decoder (based on code path
    searches in large graphs). But the system had also a couple other
    (less demanding) functions implemented; like the Convolutional Code
    encoder, CRC codec, interleaver, noise generator, channel I/O, etc.
    I managed to encode all that in 1k (16 bit-)words of assembler code,
    the data organized all in the CPU-internal cache memory (512 words
    available, IIRC), and, as in your case, with no external data memory.


    DSP programming is a special art. You often need to write the kernels
    in assembly - or in C with such particular specialised intrinsic
    functions that it might as well be assembly. C compilers for DSPs are,
    IME, rarely particularly good, and have no chance of generating the
    perfect instruction sequence.

    The tailoring was necessary. The algorithms optimized on both levels,
    the algorithmic and technical level. - I don't think that using a "C"
    or other compiler would have helped in any way to fulfill the memory
    and time requirements. (Also C++ compilers weren't mature back then.)


    I am not at all surprised.

    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Wed Mar 18 15:20:29 2026
    From Newsgroup: comp.lang.c

    David Brown <david.brown@hesbynett.no> writes:

    <snip>

    DSP programming is a special art. You often need to write the kernels
    in assembly - or in C with such particular specialised intrinsic
    functions that it might as well be assembly. C compilers for DSPs are,
    IME, rarely particularly good, and have no chance of generating the
    perfect instruction sequence.

    Have you used Ceva's DSPs?
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Wed Mar 18 20:31:14 2026
    From Newsgroup: comp.lang.c

    On 18/03/2026 16:20, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:

    <snip>

    DSP programming is a special art. You often need to write the kernels
    in assembly - or in C with such particular specialised intrinsic
    functions that it might as well be assembly. C compilers for DSPs are,
    IME, rarely particularly good, and have no chance of generating the
    perfect instruction sequence.

    Have you used Ceva's DSPs?

    No. I've used perhaps three or four different DSP architectures, but
    there is a /very/ large number of them that I have not used.

    --- Synchronet 3.21d-Linux NewsLink 1.2