Tell me which application you have where the memory availability is
that limited (32 KiB) and you also have constraints on string-processing
performance. I'd guess that on systems with 32 KiB all data is allocated
statically, perhaps with small object pools.
I don't like C, but for such small applications it is the right choice,
since in any other language you would have the extra work of reinventing
everything in the standard library yourself anyway.
Hey guys,
I've been debating this with myself for a while now. When you're working
on resource-constrained hardware, do you guys actually use the standard
C library (libc) or do you go full Bare Metal for everything?
I'm talking about things like using sprintf() vs. writing your own
itoa(), or malloc() vs. static buffers. Even a simple printf() can bloat
the binary by several KB and eat up the stack like crazy.
A few things I'm curious about:
Does anyone here still use the full standard library on chips with <32KB Flash? Or is it an immediate "no-go" for you?
For string manipulation (memcpy, memset, etc.), do you trust the
compiler's built-in optimizations or do you write your own
assembly/manual loops to save those extra cycles?
How do you handle things like dynamic memory? Is malloc() ever
acceptable in a mission-critical embedded loop, or is it always static allocation only?
I feel like using the standard library is "cheating" and adds too much hidden overhead, but rewriting every basic utility feels like
reinventing the wheel.
What's your take? Do you prefer the portability of standard C or the lean-and-mean performance of custom bare-metal implementations?
malloc() at initialization is o.k.
free() - no, in my opinion that is too much.
For my sizes, an "almost standard clib with a few corners cut" in the
form of newlib-nano is sufficiently small.
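For reference, selecting newlib-nano with the GNU Arm toolchain is a link-time option; the flag names below are as commonly documented, but check your toolchain's manual:

```shell
# Link against newlib-nano instead of full newlib (illustrative):
#   --specs=nano.specs     use the size-reduced newlib-nano
#   -u _printf_float       pull the float formatter back in, only if %f is needed
arm-none-eabi-gcc app.o -o app.elf --specs=nano.specs
```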
It is a /long/ time since I have felt my hand-written assembly does
better than the compiler. [...] In situations where speed is critical,
I always examine the generated assembly, and if necessary adjust my C
code.
The real problem is not malloc, but free - it is not
uncommon to see embedded systems where the "malloc" implementation is
just a stack, and "free" does not exist.
Nobody /ever/ uses the full standard library, in any code, on any
system. You only ever use the parts of the library that make sense to
the code you are writing.
From the printf family, snprintf is the most common choice. [...]
if you don't need floating point output, picking the
non-floating point version can save a large amount of code space.
On 17.03.2026 12:37, David Brown wrote:
It is a /long/ time since I have felt my hand-written assembly does
better than the compiler. [...] In situations where speed is critical,
I always examine the generated assembly, and if necessary adjust my C
code.
I think this is the most professional way to handle it: trust the
compiler, but verify the output. As you said, modern compilers
(especially for Cortex-M) are incredibly good at inlining memcpy or
memset into a few store instructions. I’ve found that fighting the compiler with manual assembly often just breaks the optimizer's ability
to schedule instructions around it.
The real problem is not malloc, but free - it is not
uncommon to see embedded systems where the "malloc" implementation is
just a stack, and "free" does not exist.
This is a great distinction. A "bump allocator" (or stack-based malloc)
is essentially harmless because it doesn't suffer from fragmentation.
It's the non-deterministic nature of free() and the heap-rebuilding
overhead that kills real-time performance.
Nobody /ever/ uses the full standard library, in any code, on any
system. You only ever use the parts of the library that make sense to
the code you are writing.
That’s a very grounding perspective. I think I fell into the trap of viewing libc as a monolithic "all-or-nothing" package. Using <stdint.h>
and <stdbool.h> (or moving to C23 as you mentioned to avoid the header)
is just common sense.
From the printf family, snprintf is the most common choice. [...]
if you don't need floating point output, picking the
non-floating point version can save a large amount of code space.
I’ve definitely seen the "printf-float" flag make or break a 32KB flash limit.
Thanks for the tip about comp.arch.embedded; I’ll definitely check that group out for more of these "quietly interesting" discussions. It seems
the consensus here is that "cheating" isn't a thing—efficiency and correctness are the only metrics that matter.
Oguz Kaan Ocal <oguzkaanocal3169@hotmail.com> writes:
Hey guys,
I've been debating this with myself for a while now. When you're working
on resource-constrained hardware, do you guys actually use the standard
C library (libc) or do you go full Bare Metal for everything?
I'm talking about things like using sprintf() vs. writing your own
itoa(), or malloc() vs. static buffers. Even a simple printf() can bloat
the binary by several KB and eat up the stack like crazy.
For bare metal code like Operating Systems and Hypervisors,
we never used libc directly, but rather re-implemented
the portions necessary including the necessary parts of
the crt (C Run-time), such as calling C++ static constructors, implementations of the default 'new' and 'delete' operators,
atexit stub, itoa, atoi, a printf formatter, str* functions,
et alia.
Amounted to about 1000 lines of C plus assembler.
Other boot-time functionality includes setting up the hardware
(e.g. page tables, MTRR and other privileged registers, etc.)
On 2026-03-17 08:09, Oguz Kaan Ocal wrote:
Does anyone here still use the full standard library on chips with
<32KB Flash? Or is it an immediate "no-go" for you?
The "smallest" system I have written software for was a DSP, and
I've done it in assembler, heavily optimized for memory and speed.
I had a project (on a 68k microcontroller) where I wrote the C startup
code myself rather than using the toolchain's version. The toolchain's version was hand-written assembly - I wrote it in C (bar a couple of assembly instructions), compiled with the same compiler, and the result
was a lot smaller and faster.
Am 17.03.2026 um 16:59 schrieb David Brown:
I had a project (on a 68k microcontroller) where I wrote the C startup
code myself rather than using the toolchain's version. The toolchain's
version was hand-written assembly - I wrote it in C (bar a couple of
assembly instructions), compiled with the same compiler, and the result
was a lot smaller and faster.
Performance of startup code ?
What have you been smoking ?
Bonita Montero <Bonita.Montero@gmail.com> writes:
Am 17.03.2026 um 16:59 schrieb David Brown:
I had a project (on a 68k microcontroller) where I wrote the C startup
code myself rather than using the toolchain's version. The toolchain's
version was hand-written assembly - I wrote it in C (bar a couple of
assembly instructions), compiled with the same compiler, and the result
was a lot smaller and faster.
Performance of startup code ?
What have you been smoking ?
You clearly have absolutely _zero_ experience in the real world.
Startup performance (think BIOS and OS boot time, for example),
is very important to real customers. It is particularly important
for industrial and commercial microcontrollers in appliances and
industrial applications.
On 17/03/2026 11:15, Janis Papanagnou wrote:
On 2026-03-17 08:09, Oguz Kaan Ocal wrote:
Does anyone here still use the full standard library on chips with
<32KB Flash? Or is it an immediate "no-go" for you?
The "smallest" system I have written software for was a DSP, and
I've done it in assembler, heavily optimized for memory and speed.
The smallest target I used had 2 KB of flash and no ram at all - just 32 8-bit registers. I programmed it in C, using gcc and a couple of
assembly instructions to skip any use of the C startup code. I didn't
use any of the C standard library (except for <stdint.h> and
<stdbool.h>), because I did not need any. It was quite a simple program!
I have happily used both C and C++ on devices with 32 KB flash or less.
On 2026-03-17 17:03, David Brown wrote:
On 17/03/2026 11:15, Janis Papanagnou wrote:
On 2026-03-17 08:09, Oguz Kaan Ocal wrote:
Does anyone here still use the full standard library on chips with
<32KB Flash? Or is it an immediate "no-go" for you?
The "smallest" system I have written software for was a DSP, and
I've done it in assembler, heavily optimized for memory and speed.
The smallest target I used had 2 KB of flash and no ram at all - just
32 8-bit registers. I programmed it in C, using gcc and a couple of
assembly instructions to skip any use of the C startup code. I didn't
use any of the C standard library (except for <stdint.h> and
<stdbool.h>), because I did not need any. It was quite a simple program!
I have happily used both C and C++ on devices with 32 KB flash or less.
I'm sure the decisions heavily depend on what you intend to do.
The DSP-thing I mentioned was actually a comparably complex system;
the most demanding part was a Viterbi decoder (based on code path
searches in large graphs). But the system had also a couple other
(less demanding) functions implemented; like the Convolutional Code
encoder, CRC codec, interleaver, noise generator, channel I/O, etc.
I managed to encode all that in 1k (16-bit) words of assembler code,
with the data organized entirely in the CPU-internal cache memory (512
words available, IIRC), and, as in your case, with no external data
memory. The tailoring was necessary: the algorithms were optimized on
both the algorithmic and the technical level. I don't think a C (or
other) compiler would have helped in any way to meet the memory and
timing requirements. (C++ compilers weren't mature back then, either.)
DSP programming is a special art. You often need to write the kernels
in assembly - or in C with such particular specialised intrinsic
functions that it might as well be assembly. C compilers for DSPs are,
IME, rarely particularly good, and have no chance of generating the
perfect instruction sequence.
David Brown <david.brown@hesbynett.no> writes:
<snip>
DSP programming is a special art. You often need to write the kernels
in assembly - or in C with such particular specialised intrinsic
functions that it might as well be assembly. C compilers for DSPs are,
IME, rarely particularly good, and have no chance of generating the
perfect instruction sequence.
Have you used Ceva's DSPs?