Tell me which application you have where the memory availability is
that limited (32 KiB) and you also have constraints on string-processing
performance. I'd guess that on systems with 32 KiB all data is allocated
statically, perhaps with small object pools.
I don't like C, but for such small applications it is the right choice,
since in any other language you would have the extra work of reinventing
everything in the standard library yourself anyway.
Hey guys,
I've been debating this with myself for a while now. When you're working
on resource-constrained hardware, do you guys actually use the standard
C library (libc) or do you go full Bare Metal for everything?
I'm talking about things like using sprintf() vs. writing your own
itoa(), or malloc() vs. static buffers. Even a simple printf() can bloat
the binary by several KB and eat up the stack like crazy.
A few things I'm curious about:
Does anyone here still use the full standard library on chips with <32KB Flash? Or is it an immediate "no-go" for you?
For string manipulation (memcpy, memset, etc.), do you trust the
compiler's built-in optimizations or do you write your own
assembly/manual loops to save those extra cycles?
How do you handle things like dynamic memory? Is malloc() ever
acceptable in a mission-critical embedded loop, or is it always static allocation only?
I feel like using the standard library is "cheating" and adds too much hidden overhead, but rewriting every basic utility feels like
reinventing the wheel.
What's your take? Do you prefer the portability of standard C or the lean-and-mean performance of custom bare-metal implementations?
malloc() at initialization is o.k.
free() - no, in my opinion that is too much.
For my sizes, an "almost standard clib with a few corners cut" in the
form of newlib-nano is sufficiently small.
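For reference, selecting newlib-nano with the GNU Arm toolchain is a link-time option; the flag names below are as commonly documented, but check your toolchain's manual:

```shell
# Link against newlib-nano instead of full newlib (illustrative):
#   --specs=nano.specs     use the size-reduced newlib-nano
#   -u _printf_float       pull the float formatter back in, only if %f is needed
arm-none-eabi-gcc app.o -o app.elf --specs=nano.specs
```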
It is a /long/ time since I have felt my hand-written assembly does
better than the compiler. [...] In situations where speed is critical,
I always examine the generated assembly, and if necessary adjust my C
code.
The real problem is not malloc, but free - it is not
uncommon to see embedded systems where the "malloc" implementation is
just a stack, and "free" does not exist.
Nobody /ever/ uses the full standard library, in any code, on any
system. You only ever use the parts of the library that make sense to
the code you are writing.
From the printf family, snprintf is the most common choice. [...]
if you don't need floating point output, picking the
non-floating point version can save a large amount of code space.
On 17.03.2026 12:37, David Brown wrote:
It is a /long/ time since I have felt my hand-written assembly does
better than the compiler. [...] In situations where speed is critical,
I always examine the generated assembly, and if necessary adjust my C
code.
I think this is the most professional way to handle it: trust the
compiler, but verify the output. As you said, modern compilers
(especially for Cortex-M) are incredibly good at inlining memcpy or
memset into a few store instructions. I’ve found that fighting the compiler with manual assembly often just breaks the optimizer's ability
to schedule instructions around it.
The real problem is not malloc, but free - it is not
uncommon to see embedded systems where the "malloc" implementation is
just a stack, and "free" does not exist.
This is a great distinction. A "bump allocator" (or stack-based malloc)
is essentially harmless because it doesn't suffer from fragmentation.
It's the non-deterministic nature of free() and the heap-rebuilding
overhead that kills real-time performance.
Nobody /ever/ uses the full standard library, in any code, on any
system. You only ever use the parts of the library that make sense to
the code you are writing.
That’s a very grounding perspective. I think I fell into the trap of viewing libc as a monolithic "all-or-nothing" package. Using <stdint.h>
and <stdbool.h> (or moving to C23 as you mentioned to avoid the header)
is just common sense.
From the printf family, snprintf is the most common choice. [...]
if you don't need floating point output, picking the
non-floating point version can save a large amount of code space.
I’ve definitely seen the "printf-float" flag make or break a 32KB flash limit.
Thanks for the tip about comp.arch.embedded; I’ll definitely check that group out for more of these "quietly interesting" discussions. It seems
the consensus here is that "cheating" isn't a thing—efficiency and correctness are the only metrics that matter.
Oguz Kaan Ocal <oguzkaanocal3169@hotmail.com> writes:
Hey guys,
I've been debating this with myself for a while now. When you're working
on resource-constrained hardware, do you guys actually use the standard
C library (libc) or do you go full Bare Metal for everything?
I'm talking about things like using sprintf() vs. writing your own
itoa(), or malloc() vs. static buffers. Even a simple printf() can bloat
the binary by several KB and eat up the stack like crazy.
For bare metal code like Operating Systems and Hypervisors,
we never used libc directly, but rather re-implemented
the portions necessary including the necessary parts of
the crt (C Run-time), such as calling C++ static constructors, implementations of the default 'new' and 'delete' operators,
atexit stub, itoa, atoi, a printf formatter, str* functions,
et alia.
Amounted to about 1000 lines of C plus assembler.
Other boot-time functionality includes setting up the hardware
(e.g. page tables, MTRR and other privileged registers, etc.)
On 2026-03-17 08:09, Oguz Kaan Ocal wrote:
Does anyone here still use the full standard library on chips with
<32KB Flash? Or is it an immediate "no-go" for you?
The "smallest" system I have written software for was a DSP, and
I've done it in assembler, heavily optimized for memory and speed.
I had a project (on a 68k microcontroller) where I wrote the C startup
code myself rather than using the toolchain's version. The toolchain's version was hand-written assembly - I wrote it in C (bar a couple of assembly instructions), compiled with the same compiler, and the result
was a lot smaller and faster.
Am 17.03.2026 um 16:59 schrieb David Brown:
I had a project (on a 68k microcontroller) where I wrote the C startup
code myself rather than using the toolchain's version. The toolchain's
version was hand-written assembly - I wrote it in C (bar a couple of
assembly instructions), compiled with the same compiler, and the result
was a lot smaller and faster.
Performance of startup code ?
What have you been smoking ?
Bonita Montero <Bonita.Montero@gmail.com> writes:
Am 17.03.2026 um 16:59 schrieb David Brown:
I had a project (on a 68k microcontroller) where I wrote the C startup
code myself rather than using the toolchain's version. The toolchain's
version was hand-written assembly - I wrote it in C (bar a couple of
assembly instructions), compiled with the same compiler, and the result
was a lot smaller and faster.
Performance of startup code ?
What have you been smoking ?
You clearly have absolutely _zero_ experience in the real world.
Startup performance (think BIOS and OS boot time, for example),
is very important to real customers. It is particularly important
for industrial and commercial microcontrollers in appliances and
industrial applications.
On 17/03/2026 11:15, Janis Papanagnou wrote:
On 2026-03-17 08:09, Oguz Kaan Ocal wrote:
Does anyone here still use the full standard library on chips with
<32KB Flash? Or is it an immediate "no-go" for you?
The "smallest" system I have written software for was a DSP, and
I've done it in assembler, heavily optimized for memory and speed.
The smallest target I used had 2 KB of flash and no ram at all - just 32 8-bit registers. I programmed it in C, using gcc and a couple of
assembly instructions to skip any use of the C startup code. I didn't
use any of the C standard library (except for <stdint.h> and
<stdbool.h>), because I did not need any. It was quite a simple program!
I have happily used both C and C++ on devices with 32 KB flash or less.
On 2026-03-17 17:03, David Brown wrote:
On 17/03/2026 11:15, Janis Papanagnou wrote:
On 2026-03-17 08:09, Oguz Kaan Ocal wrote:
Does anyone here still use the full standard library on chips with
<32KB Flash? Or is it an immediate "no-go" for you?
The "smallest" system I have written software for was a DSP, and
I've done it in assembler, heavily optimized for memory and speed.
The smallest target I used had 2 KB of flash and no ram at all - just
32 8-bit registers. I programmed it in C, using gcc and a couple of
assembly instructions to skip any use of the C startup code. I didn't
use any of the C standard library (except for <stdint.h> and
<stdbool.h>), because I did not need any. It was quite a simple program!
I have happily used both C and C++ on devices with 32 KB flash or less.
I'm sure the decisions heavily depend on what you intend to do.
The DSP-thing I mentioned was actually a comparably complex system;
the most demanding part was a Viterbi decoder (based on code path
searches in large graphs). But the system had also a couple other
(less demanding) functions implemented; like the Convolutional Code
encoder, CRC codec, interleaver, noise generator, channel I/O, etc.
I managed to encode all that in 1k (16-bit) words of assembler code,
with the data organized entirely in the CPU-internal cache memory (512
words available, IIRC), and, as in your case, with no external data
memory. The tailoring was necessary: the algorithms were optimized on
both the algorithmic and the technical level. I don't think a C (or
other) compiler would have helped in any way to meet the memory and
timing requirements. (C++ compilers weren't mature back then, either.)
DSP programming is a special art. You often need to write the kernels
in assembly - or in C with such particular specialised intrinsic
functions that it might as well be assembly. C compilers for DSPs are,
IME, rarely particularly good, and have no chance of generating the
perfect instruction sequence.
David Brown <david.brown@hesbynett.no> writes:
<snip>
DSP programming is a special art. You often need to write the kernels
in assembly - or in C with such particular specialised intrinsic
functions that it might as well be assembly. C compilers for DSPs are,
IME, rarely particularly good, and have no chance of generating the
perfect instruction sequence.
Have you used Ceva's DSPs?