arp242 16 hours ago

> There are also many ficticious names for 64-bit x86, which you should avoid unless you want the younger generation to make fun of you. amd64 refers to AMD’s original implementation of long mode in their K8 microarchitecture, first shipped in their Athlon 64 product. Calling it amd64 is silly and also looks a lot like arm64, and I am honestly kinda annoyed at how much Go code I’ve seen with files named fast_arm64.s and fast_amd64.s. Debian also uses amd64/arm64, which makes browsing packages kind of annoying.

I prefer amd64 as it's so much easier to type and scans so much easier. x86_64 is so awkward.

Bikeshed I guess and in the abstract I can see how x86_64 is better, but pragmatism > purity and you'll take my amd64 from my cold dead hands.

As for Go, you can get the GOARCH/GOOS combinations from "go tool dist list". Can be useful at times if you want to ensure your code cross-compiles in CI.
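
If you want CI to actually check this, a minimal sketch (assuming a plain Go module with no cgo) is just building with GOOS/GOARCH set per target:

  # illustrative only: build for a few targets without running anything
  GOOS=linux   GOARCH=arm64 go build ./...
  GOOS=windows GOARCH=amd64 go build ./...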

peterldowns 18 hours ago

Some other sources of target triples (some mentioned in the article, some not):

rustc: `rustc --print target-list`

golang: `go tool dist list`

zig: `zig targets`

As the article points out, the complete lack of standardization and consistency in what constitutes a "triple" (sometimes actually a quad!) is kind of hellishly hilarious.

  • lifthrasiir 15 hours ago

    > what constitutes a "triple" (sometimes actually a quad!)

    It is actually a quintuple at most, because the first part, the architecture, may contain a version for e.g. ARM. And yet it doesn't fully describe the actual target because it may require an additional OS version for e.g. macOS. Doubly silly.

  • ycombinatrix 16 hours ago

    at least we don't have to deal with --build, --host, --target nonsense anymore

    • rendaw 11 hours ago

      You do on Nix. And it's as inconsistently implemented there as anywhere.

ComputerGuru 16 hours ago

Great article, but I was really put off by this bit, which, aside from being very condescending, simply isn't true and reveals a lack of appreciation for an innovation that I would have thought someone writing about target triples and compilers would value:

> Why the Windows people invented a whole other ABI instead of making things clean and simple like Apple did with Rosetta on ARM MacBooks? I have no idea, but http://www.emulators.com/docs/abc_arm64ec_explained.htm contains various excuses, none of which I am impressed by. My read is that their compiler org was just worse at life than Apple’s, which is not surprising, since Apple does compilers better than anyone else in the business.

I was already familiar with ARM64EC from reading about its development from Microsoft over the past years, but had not come across the emulators.com link before - it's a stupendous (long) read and well worth the time if you are interested in lower-level shenanigans. The truth is that Microsoft's ARM64EC solution is a hundred times more brilliant and a thousand times better for backwards (and forwards) compatibility than Rosetta on macOS, which gave the user a far inferior experience to native code, executed (sometimes far) slower, prevented interop between legacy and modern code, left app devs having to do a full port to move to newer tech (or even just to have a UI that matched the rest of the system), and was always intended as a merely transitional bit of tech to last the few years it took for native x86 apps to be developed and take the place of (usurp) the old ppc ones.

Microsoft's solution has none of these drawbacks (except the noted lack of AVX support), doesn't require every app to be 2x or 3x as large as a sacrifice to the fat binaries hack, offers a much more elegant solution for developers to migrate their code (piecemeal or otherwise) to a new platform where they don't know if it will be worth their time/money to invest in a full rewrite, lets users use all the apps they love, and maintains Microsoft's very much well-earned reputation for backwards compatibility.

When you run an app for Windows 2000 on Windows 11 (x86 or ARM), you don't see the old Windows 2000 aesthetic (and if you do, there's an easy way for users to opt into newer theming rather than requiring the developer to do something about it) and you aren't stuck with bugs from 30 years ago that were long since patched by the vendor many OS releases ago.

  • plorkyeran 14 hours ago

    The thing named Rosetta (actually Rosetta 2) for the x86_64 -> ARM transition is technologically completely unrelated to the PPC -> x86 Rosetta, and has none of the problems you mention. There's no user-observable difference between a program using Rosetta and a native program in modern macOS, and porting programs which didn't have any assembly or other CPU-arch-specific code was generally just a matter of wrangling your build system.

  • Zamiel_Snawley 16 hours ago

    Do those criticisms of Rosetta hold for Rosetta 2?

    I assumed the author was talking about the x86 emulator released for the arm migration a few years ago, not the powerpc one.

  • Philpax 16 hours ago

    This author has a tendency to be condescending about things they find disagreeable. It's why I stopped reading them.

  • juped 16 hours ago

    You have neglected to consider that Microsoft bad; consider how they once did something differently from a Linux distribution I use. (This sentiment is alive and well among otherwise intelligent people; it's embarrassing to read.)

jcranmer 19 hours ago

I did start trying to take clang's TargetInfo code (https://github.com/llvm/llvm-project/blob/main/clang/lib/Bas...) and port it over to TableGen, primarily so somebody could actually extract useful auto-generated documentation out of it, like "What are all the targets available?"

I actually do have working code for the triple-to-TargetInfo instantiation portion (which is fun because there's one or two cases that juuuust aren't quite like all of the others, and I'm not sure if that's a bad copy-paste job or actually intentional). But I never got around to working out how to actually integrate the actual bodies of TargetInfo implementations--which provide things like the properties of C/C++ fundamental types or default macros--into the TableGen easily, so that patch is still merely languishing somewhere on my computer.

psanford 17 hours ago

As a Go developer, I certainly find the complaints about the go conventions amusing. I guess if you have really invested so much into understanding all the details in the rest of this article you might be annoyed that it doesn't translate 1 to 1 to Go.

But for the rest of us, I'm so glad that I can just cross compile things in Go without thinking about it. The annoying thing with setting up cross compilation in GCC is not learning the naming conventions, it is getting the correct toolchains installed and wired up correctly in your build system. Go just ships that out of the box and it is so much more pleasant.

It's also one thing that is great about zig. Using Go+zig when I need to cross compile something that includes cgo in it is so much better than trying to get GCC toolchains set up properly.
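
For reference, the Go+zig combo usually just means pointing cgo at zig's clang-based cross compiler; a rough sketch (target names are examples, adjust to taste):

  # cgo cross-compile for linux/arm64, with zig providing the C toolchain
  CGO_ENABLED=1 CC="zig cc -target aarch64-linux-gnu" \
    GOOS=linux GOARCH=arm64 go build ./...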

cbmuser 18 hours ago

»32-bit x86 is extremely not called “x32”; this is what Linux used to call its x86 ILP32 variant before it was removed.«

x32 support has not been removed from the Linux kernel. In fact, we're still maintaining Debian for x32 in Debian Ports.

vient 19 hours ago

> Kalimba, VE

> No idea what this is, and Google won’t help me.

Seems that Kalimba is a DSP, originally by CSR and now by Qualcomm. CSR8640 is using it, for example https://www.qualcomm.com/products/internet-of-things/consume...

VE is harder to find with such short name.

  • AKSF_Ackermann 18 hours ago

    NEC Vector Engine. Basically not a thing outside supercomputers.

    • fc417fc802 5 hours ago

      $800 for the 20B-P model on ebay. More memory bandwidth than a 4090. I wonder if llama.cpp could be made to run on it?

      I see rumors they charge for the compiler though.

IshKebab 19 hours ago

Funny thing I found when I gave up trying to find documentation and read the LLVM source code (seems to be what happened to the author too!): there are actually five components of the triple, not four.

I can't remember what the fifth one is, but yeah... insane system.

Thanks for writing this up! I wonder if anyone will ever come up with something more sensible.

  • o11c 17 hours ago

    There are up to 7 components in a triple, but not all are used at once; the general format is:

      <machine>-<vendor>-<kernel>-<libc?><abi?><fabi?>
    
    But there's also <obj>, see below.

    Note that there are both canonical and non-canonical triples in use. Canonical triples are output by `config.guess` or `config.sub`; non-canonical triples are input to `config.sub` and used as prefixes for commands.

    The <machine> field (1st) is what you're running on, and on some systems it includes a version number of sorts. Most 64-bit vs 32-bit differences go here, except if the runtime differs from what is natural (commonly "32-bit pointers even though the CPU is in 64-bit mode"), which goes in <abi> instead. Historically, "arm" and "mips" have been a mess here, but that has largely been fixed, in large part as a side-effect of Debian multiarch (whose triples only have to differ from GNU triples in that they canonicalize i[34567]86 to i386, but you should use dpkg-architecture to do the conversion for sanity).

    The <vendor> field (2nd) is not very useful these days. It defaults to "unknown" but as of a few years ago "pc" is used instead on x86 (this means that the canonical triple can change, but this hasn't been catastrophic since you should almost always use the non-canonical triple except when pattern-matching, and when pattern-matching you should usually ignore this field anyway).

    The <kernel> field (3rd) is pretty obvious when it's called that, but it's often called <os> instead since "linux" is an oddity for regularly having a <libc> component that differs. On many systems it includes version data (again, Linux is the oddity for having a stable syscall API/ABI). One notable exception: if a GNU userland is used on BSD/Solaris system, a "k" is prepended. "none" is often used for freestanding/embedded compilation, but see <obj>.

    The <libc> field (main part of the 4th) is usually absent on non-Linux systems, but mandatory for "linux". If it is absent, the dash after the kernel is usually removed, except if there are ABI components. Note that "gnu" can be both a kernel (Hurd) and a libc (glibc). Android uses "android" here, so maybe <libc> is a bit of a misnomer (it's not "bionic") - maybe <userland>?

    <abi>, if present, means you aren't doing the historical default for the platform specified by the main fields. Other than "eabi" for ARM, most of this is for "use 32-bit pointers but 64-bit registers".

    <fabi> can be "hf" for 32-bit ARM systems that actually support floats in hardware. I don't think I've seen anything else, though I admit the main reason I separately document this from <abi> is because of how Debian's architecture puts it elsewhere.

    <obj> is the object file format, usually "aout", "coff", or "elf". It can be appended to the kernel field (but before the kernel version number), or replace it if "none", or it can go in the <abi> field.
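
    For a concrete feel, `config.sub` (mentioned above) canonicalizes whatever shorthand you feed it. The outputs below are what I'd expect from a current copy of the script, so treat them as illustrative:

      $ ./config.sub x86_64-linux-gnu
      x86_64-pc-linux-gnu
      $ ./config.sub arm-linux-gnueabihf
      arm-unknown-linux-gnueabihf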

    • IshKebab 16 hours ago

      Nah I dunno where you're getting your information from but LLVM only supports 5 components.

      See the code starting at line 1144 here: https://llvm.org/doxygen/Triple_8cpp_source.html

      The components are arch-vendor-os-environment-objectformat.

      It's absolutely full of special cases and hacks. Really at this point I think the only sane option is an explicit list of fixed strings. I think Rust does that.

      • jcranmer 14 hours ago

        You're not really contradicting o11c here; what LLVM calls "environment" is a mixture of what they called libc/abi/fabi. There's also what LLVM calls "subarch" to distinguish between different architectures that may be relevant (e.g., i386 is not the same as i686, although LLVM doesn't record this difference since it's generally less interested in targeting old hardware), and there's also OS version numbers that may or may not be relevant.

        The underlying problem with target triples is that architecture-vendor-system isn't sufficient to uniquely describe the relevant details for specifying a toolchain, so the necessary extra information has been somewhat haphazardly added to the format. On top of that, since the relevance of some of the information is questionable for some tasks (especially the vendor field), different projects have chosen not to care about subtle differences, so the normalization of a triple is different between different projects.

        LLVM's definition is not more or less correct than gcc's here, nor are these the only definitions floating around.

        • o11c 9 hours ago

          Hm, looking to see if the vendor field is actually meaningful ... I see some stuff for m68k and mips and sysv targets ... some of it working around pre-standard vendor C implementations

          Ah, I found a modern one:

            i[3456789]86-w64-mingw* does not use winsup
            i[3456789]86-*-mingw* with other vendors does use winsup
          
          There are probably more; this is embedded in all sorts of random configure scripts and it is very not-greppable.
      • o11c 16 hours ago

        LLVM didn't invent the scheme; why should we pay attention to their copy and not look at the original?

        The GNU Config project is the original.

        • IshKebab 6 hours ago

          The article goes into this a bit. But basically because LLVM is extremely popular and used as a backend by lots of other languages, e.g. Rust.

          Frankly being the originators of this deranged scheme is a good reason not to listen to GNU!

jkelleyrtp 19 hours ago

The author's blog is a FANTASTIC source of information. I recommend checking out some of their other posts:

- https://mcyoung.xyz/2021/06/01/linker-script/

- https://mcyoung.xyz/2023/08/09/yarns/

- https://mcyoung.xyz/2023/08/01/llvm-ir/

  • eqvinox 18 hours ago

    Given TFA's bias against GCC, I'm not so sure. e.g. looking at the linker script article… it's also missing the __start_XYZ and __stop_XYZ symbols automatically created by the linker.

    • sramsay 15 hours ago

      I was really struck by the antipathy toward GCC. I'm not sure I quite understand where it's coming from.

    • matheusmoreira 16 hours ago

      It also focuses exclusively on sections. I wish it had at least mentioned segments, also known as program headers. Linux kernel's ELF loader does not care about sections, it only cares about segments.

      Sections and segments are more or less the same concept: metadata that tells the loader how to map each part of the file into the correct memory regions with the correct memory protection attributes. Biggest difference is segments don't have names. Also they aren't neatly organized into logical blocks like sections are, they're just big file extents. The segments table is essentially a table of arguments for the mmap system call.
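
      A quick way to see that table for yourself (this is just binutils' readelf, any ELF binary will do):

        # list the program headers (segments) of an ELF binary
        readelf -l /bin/ls
        # each PT_LOAD row carries offset, virtual address, file/memory sizes and
        # R/W/E flags -- essentially the argument list handed to mmap at load time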

      Learning this stuff from scratch was pretty tough. Linker script has commands to manipulate the program header table but I couldn't figure those out. In the end I asked developers to add command line options instead and the maintainer of mold actually obliged.

      Looks like very few people know about stuff like this. One can use it to do some heavy wizardry though. I leveraged this machinery into a cool mechanism for embedding arbitrary data into ELF files. The kernel just memory maps the data in before the program has even begun execution. Typical solutions involve the program finding its own executable on the file system, reading it into memory and then finding some embedded data section. I made the kernel do almost all of that automatically.

      https://www.matheusmoreira.com/articles/self-contained-lone-...

      • o11c 14 hours ago

        I wouldn't call them "same concept" at all. Segments (program headers) are all about the runtime (executables and shared libraries) and are low-cost. Sections are all about development (.o files) and are detailed.

        Generally there are many sections combined into a single segment, other than special-purpose ones. Unless you are reimplementing ld.so, you almost certainly don't want to touch segments; sections are far easier to work with.

        Also, normally you just call `getauxval`, but if needed the type is already named `ElfW(auxv_t)*`.

        • matheusmoreira 14 hours ago

          > I wouldn't call them "same concept" at all.

          They are both metadata about file extents and their memory images.

          > sections are far easier to work with

          Yes. They are not, however, loaded into memory by default. Linkers do not generate LOAD segments for section metadata since they are not needed for execution. Thus it's impossible for a program to introspect its own sections without additional logic and I/O to read them into memory.

          > Also, normally you just just call `getauxval`, but if needed the type is already named `ElfW(auxv_t)*`.

          True. I didn't use it because it was not available. I wrote my article in the context of a freestanding nolibc program.

          • o11c 14 hours ago

            Right, but you can just use the section start/end symbols for a section that already goes into a mapped segment.

            • matheusmoreira 13 hours ago

              Can you show me how that would work?

              It's trivial to put arbitrary files into sections:

                objcopy --add-section program.files.1=file.1.dat \
                        --add-section program.files.2=file.2.dat \
                        program program+files
              
              The problem is the program.files.* sections do not get mapped in by a LOAD segment. I ended up having to write my own tool to patch in a LOAD segment into the segments table because objcopy does not have the ability to do it.

              Even asked a Stack Overflow question about this two years ago:

              https://stackoverflow.com/q/77468641

              The only answer I got told me to simply read the sections into memory via /proc/self/exe or edit the segments table and make it so that the LOAD segments cover the whole file. I eventually figured out ways to add LOAD segments to the table. By that point I didn't need sections anymore, just a custom segment type.

              • o11c 13 hours ago

                The whole point of section names is that they mean something. If you give it a name that matches `.rodata.*` it will be part of the existing read-only LOADed segments, or `.data.*` for (private) read-write.

                Use `ld --verbose` to see what sections are mapped by default (it is impossible for a linker to work without having such a linker script; we're just lucky that GNU ld exposes it in a sane form rather than hard-coding it as C code). In modern versions of the linker (there is still old documentation found by search engines), you can specify multiple SECTIONS commands (likely from multiple scripts, i.e. just files passed on the command line), but why would you when you can conform to the default one?

                You should pick a section name that won't collide with the section names generated by `-fdata-sections` (or `-ffunction-sections` if that's ever relevant for you).
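
                A sketch of the data-file variant of that (commands from memory, so double-check the flags): wrap the blob in an object file, rename its section to something under .rodata.*, and the default linker script will fold it into the read-only LOAD segment, with start/end symbols generated for free:

                  # wrap file.dat into an object; defines _binary_file_dat_start/_end
                  ld -r -b binary -o file.o file.dat
                  # give the payload a .rodata.* name so it lands in the RO LOAD segment
                  objcopy --rename-section .data=.rodata.embedded file.o
                  # then link file.o into the program as usual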

                • matheusmoreira 11 hours ago

                  That requires relinking the executable. That is not always desirable or possible. Unless the dynamic linker ignores the segments table in favor of doing this on the fly... Even if that's the case, it won't work for statically linked executables. Only the dynamic linker can assign meaning to section names at runtime and the dynamic linker isn't involved at all in the case of statically linked programs.

      • eqvinox 5 hours ago

        Absolutely agree. Had my own fun dealings with ELF, and to be clear, on plain mainline shipping products (amd64 Linux), not toys/exercise/funky embedded. (Wouldn't have known about section start/stop symbols otherwise)

IAmLiterallyAB 11 hours ago

> However, due to the runaway popularity of LLVM, virtually all compilers now use target triples.

That's a wild take. I think it's pretty universally accepted that GCC and the GNU toolchain are what made this ubiquitous.

Also, the x32 ABI is still around and support is still around; I don't know where the author got that notion.

throw0101d 18 hours ago

Noticed endians listed in the table. It seems like little-endian has basically taken over the world in 2025:

* https://en.wikipedia.org/wiki/Endianness#Hardware

Is there anything that is used a lot that is not little? IBM's stuff?

Network byte order is BE:

* https://en.wikipedia.org/wiki/Endianness#Networking

  • dharmab 18 hours ago

    LEON, used by the European Space Agency, is big endian.

    • naruhodo 8 hours ago

      Should have been called BEON.

  • Palomides 18 hours ago

    IBM's Power chips can run in either little or big modes, but "used a lot" is a stretch

    • inferiorhuman 17 hours ago

      Most PowerPC related stuff (e.g. Freescale MPC5xx found in a bunch of automotive applications) can run in either big or little endian mode, as can most ARM and MIPS (routers, IP cameras) stuff. Can't think of the last time I've seen any of them configured to run in big endian mode tho.

      • classichasclass 16 hours ago

        For the large Power ISA machines, it's most commonly when running AIX or IBM i these days, though the BSDs generally run big too.

  • rv3392 16 hours ago

    Apart from IBM Power/AIX systems, SPARC/Solaris is another one. I wouldn't say either of these are used a lot, but there's a reasonable amount of legacy systems out there that are still being supported by IBM and Oracle.

  • forrestthewoods 18 hours ago

    BE isn’t technically dead but it’s practically dead for almost all projects. You can static_assert byte order and then never think about BE ever again.

    All of my custom network serialization formats use LE because there’s literally no reason to use BE for network byte order. It’s pure legacy cruft.

    • f_devd 9 minutes ago

      ...Until you find yourself having to work around legacy code to support some weird target that does still use BE. Speaking from experience (tbf usually lower level than anything actually networked, more like RS485 and friends).

  • formerly_proven 18 hours ago

    10 years ago the fastest BE machines that were practical were then-ten-year-old PowerMacs. This hasn’t really changed. I guess they’re more expensive now.

    • eqvinox 18 hours ago

      e6500/T4240 are faster than powermacs. Not sure how rare they are nowadays, we didn't have any trouble buying some (on eBay). 12×2 cores, 48GB RAM, for BE that's essentially heaven…

  • thro3838484848 18 hours ago

    Java VM is BE.

    • kbolino 18 hours ago

      This is misleading at best. The JVM only exposes multibyte values to ordinary applications in such a way that byte order doesn't matter. You can't break out a pointer and step through the bytes of a long field to see what order it's in, at least not without the unsafe memory APIs.

      In practice, any real JVM implementation will simply use native byte order as much as possible. While bytecode and other data in class files is serialized in big endian order, it will be converted to native order whenever it's actually used. If you do pull out the unsafe APIs, you can see that e.g. values are little endian on x86(-64). The JVM would suffer from major performances issues if it tried to impose a byte order different from the underlying platform.

      • PhilipRoman 16 hours ago

        One relatively commonly used class which exposes this is ByteBuffer and its Int/Long variants, but there you can specify the endianness explicitly (or set it to match the native one).

cwood-sdf 18 hours ago

"And no, a “target quadruple” is not a thing and if I catch you saying that I’m gonna bonk you with an Intel optimization manual. "

https://github.com/ziglang/zig/issues/20690

  • debugnik 16 hours ago

    The argument is that they're called triples even when they've got more or fewer than 3 components. They should have simply been called target tuples or target monikers.

    • o11c 14 hours ago

      "gnu tuple" and "gnu type" are also common names.

      The comments in `config.guess` and `config.sub`, which are the origin of triples, use a large variety of terms, at least the following:

        configuration name
        configuration type
        [machine] specification
        system name
        triplet
        tuple

pie_flavor 16 hours ago

Sorry, going to keep typing x64. Unlike the article's recommendation of x86, literally everyone knows exactly what it means at all times.

  • qu4z-2 15 hours ago

    If someone tells me x86, I am certainly thinking 32-bit protected mode not 64-bit long mode... Granted I'm in the weird space where I know enough to be dangerous but not enough to keep me up-to-date with idiomatic naming conventions.

ycombinatrix 16 hours ago

>There’s a few variants. wasm32-unknown-unknown (here using unknown instead of none as the system, oops)

Why isn't it called wasm32-none-none?

  • pie_flavor 16 hours ago

    As far as I can tell, it's because libstd exists (but is full of do-nothing stubs). There is another `wasm32-none` target which is no_std.

fweimer 19 hours ago

I think GCC's more-or-less equivalent to Clang's --target is called -B: https://gcc.gnu.org/onlinedocs/gcc/Directory-Options.html#in...

I assume it works with an all-targets binutils build. I haven't seen anyone building their cross-compilers in this way (at least not in recent memory).

  • JoshTriplett 19 hours ago

    I haven't either, probably because it would require building once per target and installing all the individual binaries.

    This is one of the biggest differences between clang and GCC: clang has one binary that supports multiple targets, while a GCC build is always target-specific.

  • o11c 14 hours ago

    Old versions of GCC used to provide `-b <machine>` (and also `-V <version>`), but they were removed a long time ago in favor of expecting people to just use and set `CC` correctly.

    It looks like gcc 3.3 through 4.5 just forwards to an external driver; prior to that it seems like it used the same driver for different paths, and after that it is removed.

dvektor 13 hours ago

Great read. Love those articles where you go in thinking that you have a pretty solid understanding of the topic and then proceed to learn much more than you thought you would.

psyclobe 18 hours ago

Sounds like what we use with vcpkg to define the system's tooling; still trying to make sense of it all these years later, but we define things like x64-linux-static to imply target architecture, platform, and the linkage style of the runtime.

matheusmoreira 18 hours ago

> Go originally wanted to not have to link any system libraries, something that does not actually work

It does work on Linux, the only kernel that promises a stable binary interface to user space.

https://www.matheusmoreira.com/articles/linux-system-calls

  • lonjil 14 hours ago

    FreeBSD does as well, but old ABI versions aren't kept forever.

  • damagednoob 11 hours ago

    When developing a small program for my Synology NAS in Go, I'm sure I had to target a specific version of glibc.

  • guipsp 17 hours ago

    Does it really tho? I've had address resolution break more than once in go programs.

    • matheusmoreira 17 hours ago

      That's because on Linux systems it's typical for domain name resolution to be provided by glibc. As a result, people ended up depending on glibc. They were writing GNU/Linux software, not Linux software.

      https://wiki.archlinux.org/title/Domain_name_resolution

      https://en.wikipedia.org/wiki/Name_Service_Switch

      https://man.archlinux.org/man/getaddrinfo.3

      This is user space stuff. You can trash all of this and roll your own mechanism to resolve the names however you want. Go probably did so. Linux will not complain in any way whatsoever.
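
      Concretely, glibc's getaddrinfo is steered by the hosts line in /etc/nsswitch.conf; a typical default looks something like this (the exact set of modules varies by distro):

        hosts: files dns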

      Linux is the only kernel that lets you do this. Other kernels will break your software if you bypass their system libraries.

      • guipsp 16 hours ago

        I mean, that is fine and all, but it doesn't really matter for making the software run correctly on systems that currently exist.

        • matheusmoreira 15 hours ago

          It works fine on current Linux systems. We can have freestanding executables that talk to Linux directly and link against zero system libraries.

          It's just that those executables are going to have to resolve names all by themselves. Chances are they aren't going to do it exactly like glibc does. That may or may not be a problem.

          • o11c 14 hours ago

            Historically, when DNS breaks in a not-glibc environment, it's very often found to in fact be a violation of some standard by the not-glibc, rather than a program that fails to document a glibc dependency.

            • fc417fc802 6 hours ago

              Just connect to the service running on localhost ...

              I'm curious. Why isn't getaddrinfo implemented in a similar manner to the loaders that graphics APIs use? Shouldn't that functionality be the responsibility of whatever resolver has been installed?

therein 18 hours ago

I like the code editor style preview on the right. Enough to forgive the slightly clunky scroll.

  • SrslyJosh 17 hours ago

    It looks nice, but I find the choppy scrolling (on an M1 MBP, no less!) to be distracting.

    It also doesn't really tell me anything about the content, except where I'm going to see tables or code blocks, so I'm not sure what the benefit is.

    Given the really janky scrolling, I'd like to have a way to hide it.

  • tiffanyh 18 hours ago

    FYI - to see this you need to have your browser at least 1435px wide.

  • Starlevel004 17 hours ago

    Unfortunately the text in the preview shows up in ctrl+f.

forrestthewoods 18 hours ago

What a great article.

Every time I deal with target triples I get confused and have to refresh my memory. This article makes me feel better in knowing that target triples are an unmitigated cluster fuck of cruft and bad design.

> Go does the correct thing and distributes a cross compiler.

Yes but also no. AFAIK Zig is the only toolchain to provide native cross compiling out of the box without bullshit.

Missing from this discussion is the ability to specify and target different versions of glibc. Something that I think only Zig even attempts to do because Linux’s philosophy of building against local system globals is an incomprehensibly bad choice. So all these target triples are woefully underspecified.
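
Zig handles that by letting you append a glibc version to the target; a sketch from memory (check `zig targets` for the exact spelling):

  # cross-compile a C file against a specific glibc version
  zig cc -target x86_64-linux-gnu.2.28 -o hello hello.c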

I like that at least Rust defines its own clear list of target triples that are more rational than LLVM’s. At this point I feel like the whole concept of a target triples needs to be thrown away. Everything about it is bad.

theoperagoer 17 hours ago

Great content. Also, this website is gorgeous!

o11c 18 hours ago

This article should be ignored, since it disregards the canonical origin of target triples (and the fact that it's linked to `configure`):

https://git.savannah.gnu.org/cgit/config.git/tree/

The `testsuite/` directory contains some data files with a fairly extensive list of known targets. The vendor field should be considered fully extensible, and new combinations of known machine/kernel/libc shouldn't be considered invalid, but anything else should have a patch submitted.

  • jcranmer 17 hours ago

    This article is a very LLVM-centric view, and it does ignore the GNU idea of a target triple, which is essentially $(uname -m)-vendor-$(uname -s), with vendor determined (so far as I can tell) entirely from uname -s, the system name undergoing some amount of butchering, and version numbers sometimes being included and sometimes not, and Linux getting a LIBC tacked on.

    But that doesn't mean the article should be ignored in its entirety. LLVM's target triple parsing is more relevant for several projects (especially given that the GNU target triple scheme doesn't include native Windows, which is one of the most common targets in practice!). Part of the problem is that for many people "what is a target triple" is actually a lead-in to the question "what are the valid targets?", and trying to read config.guess is not a good vehicle to discover the answer. config.guess also isn't a good way to find out about target triples for systems that aren't designed for general-purpose computing, like if you're trying to compile for a GPU architecture, or even a weird x86 context like UEFI.

    • o11c 17 hours ago

      The GNU scheme does in fact have support for various windows targets. It's just that the GNU compilers don't support them all.

    • pjc50 16 hours ago

      There's MinGW.

AceJohnny2 18 hours ago

Offtopic, but I'm distracted by the opening example:

> After all, you don’t want to be building your iPhone app on literal iPhone hardware.

iPhones are impressively powerful, but you wouldn't know it from the software lockdown that Apple holds on them.

Example: https://www.tomsguide.com/phones/iphones/iphone-16-is-actual...

There's a reason people were clamoring for Apple to make ARM laptops/desktops for years before Apple finally committed.

  • AceJohnny2 18 hours ago

    I do not think I like this author...

    > A critical piece of history here is to understand the really stupid way in which GCC does cross compiling. Traditionally, each GCC binary would be built for one target triple. [...] Nobody with a brain does this ^2

    You're doing GCC a great disservice by ignoring its storied and essential history. It's over 40 years old, and was created at a time where there were no free/libre compilers. Computers were small and slow. Of course you wouldn't bundle multiple targets in one distribution.

    LLVM benefitted from a completely different architecture and starting from a blank slate when computers were already faster and much larger, and was heavily sponsored by a vendor that was innately interested in cross-compiling: Apple. (Guess where LLVM's creator worked for years and led the development tools team.)

    • jaymzcampbell 17 hours ago

      The older I get, the more this kind of commentary (the OP, not you!) is a total turn-off. Systems evolve and there's usually, though not always, a reason why "things are the way they are". It's typically arrogance to have this kind of tone. That said, I was a bit like that when I was younger, and it took a few knockings down to realise the world is complex.

    • FitCodIa 16 hours ago

      > and was heavily sponsored by a vendor that was innately interested in cross-compiling

      and innately disinterested in Free Software, too

    • steveklabnik 17 hours ago

      "This was the right way to do it forty years ago, so that's why the experience is worse" isn't a compelling reason for a user to suffer today.

      Also, in this specific case, this ignores the history around LLVM offering itself up to the FSF. gcc could have benefitted from this fresh start too. But purely by accident, it did not.

      • FitCodIa 16 hours ago

        > "This was the right way to do it forty years ago, so that's why the experience is worse" isn't a compelling reason for a user to suffer today.

        On my system, "dnf repoquery --whatrequires cross-gcc-common" lists 26 gcc-*-linux-gnu packages (that is, kernel / firmware cross compilers for 26 architectures). The command "dnf repoquery --whatrequires cross-binutils-common" lists 31 binutils-*-linux-gnu packages.

        The author writes, "LLVM and all cross compilers that follow it instead put all of the backends in one binary". Do those compilers support 25+ back-ends? And if they do, is it good design to install back-ends for (say) 23 such target architectures that you're never going to cross-compile for, in practice? Does that benefit the user?

        My impression is that the author does not understand the modularity of gcc cross compilers / packages because he's unaware of (or doesn't care for) the scale that gcc aims at.

        • steveklabnik 16 hours ago

          > And if they do, is it good design to install back-ends for (say) 23 such target architectures that you're never going to cross-compile for, in practice? Does that benefit the user?

             rustc --print target-list | wc -l
            287
          
          I'm kinda surprised at how large that is, actually. But yeah, I don't mind if I have the capability to cross-compile to x86_64-wrs-vxworks that I'm never going to use.

          I am not an expert on all of these details in clang specifically, but with rustc, we take advantage of llvm's target specifications, so that you can even configure a backend that the compiler doesn't yet know about by simply giving it a json file with a description. https://doc.rust-lang.org/nightly/nightly-rustc/rustc_target...

          While these built-in ones aren't defined as JSON, you can ask the compiler to print one for you:

               rustc +nightly -Z unstable-options --target=x86_64-unknown-linux-gnu --print target-spec-json
          
          It's lengthy so instead of pasting here, I've put this in a gist: https://gist.github.com/steveklabnik/a25cdefda1aef25d7b40df3...

          Anyway, it is true that gcc supports more targets than llvm, at least in theory. https://blog.yossarian.net/2021/02/28/Weird-architectures-we...

      • AceJohnny2 17 hours ago

        I'd love to learn what accident you're referring to, Steve!

        I vaguely recall the FSF (or maybe only Stallman) arguing against the modular nature of LLVM because a monolithic structure (like GCC's) makes it harder for anti-GPL actors (Apple!) to undermine it. Was this related?

  • boricj 18 hours ago

    A more pertinent (if dated) example would be "you don't want to be building your GBA game on literal Game Boy Advance hardware".

  • plorkyeran 13 hours ago

    iPhones have terrible heat dispersion compared to even a fanless computer like a macbook air. You get a few minutes at full load before thermal throttling kicks in, so you could do the occasional build of your iPhone app on an iPhone but it'd be pretty terrible as a development platform.

    At work we had some benchmarking suites that ran on physical devices and even with significant effort put into cooling them they spent more time sleeping waiting to cool off than actually running the benchmarks.

bruce343434 18 hours ago

Why does this person have such negative views of GCC and positive bias towards LLVM?

  • nemothekid 18 hours ago

    If OP is above 30, it's probably due to the frustration of trying to modularize GCC that led to the creation of LLVM in the first place. If OP is below 30, it's probably because he grew up in a world where most compiler research and design is done on LLVM and GCC is for grandpa.

  • steveklabnik 17 hours ago

    I have intense respect for the history of gcc, but everything about using it screams that it's stuck in the past.

    LLVM has a lot of problems, but it feels significantly more modern.

    I do wish we had a "new LLVM" doing to LLVM what it did to gcc. Just because it's better doesn't mean it's perfect.

    Basically, you can respect history while also being honest about the current state of things. But also, doing so requires you to care primarily about things like ease of use, rather than things like licenses. For some people, they care about licenses first, usability second.

    • tialaramex 15 hours ago

      Their IR is a mess. So a "new LLVM" ought to start by nailing down the IR.

      And as a bonus, it seems to me a nailed-down IR actually is that portable assembly language the C people keep telling us they wanted. Most of them don't actually want that and won't thank you - but if even 1% of the "I need a portable assembler" crowd actually did want a portable assembler, they'd be a large volume of customers from day one.

      • o11c 14 hours ago

        Having tried writing plugins for both, I very much prefer GCC's codebase. You have to adapt to its quirks, but at least it won't pull the rug from under your feet gratuitously. There's a reason every major project ends up embedding a years-old copy of LLVM rather than just using the system version.

        If you're ignoring the API and writing IR directly there are advantages to LLVM though.

    • flkenosad 17 hours ago

      Honestly, I love that both exist with their respective world views.

      • steveklabnik 17 hours ago

        I for sure don't want to suggest that anyone who loves gcc shouldn't be working on what they love. More compilers are a good thing, generally. Just trying to say why I have a preference.

  • Skywalker13 18 hours ago

    It is unfortunate. GCC has enabled the compilation of countless lines of source code for nearly 40 years and has served millions of users. Regardless of whether its design is considered good or bad today, GCC has played an essential role and has enabled the emergence of many projects and new compilers. GCC deserves deep respect.

  • matheusmoreira 18 hours ago

    Good question. Author is incredibly hostile to one of the most important pieces of software ever developed because of the way they approached the problem nearly 40 years ago. Then he criticizes Go for trying to redesign the system instead of just using target triples...

    • FitCodIa 16 hours ago

      The author writes: "really stupid way in which GCC does cross compiling [...] Nobody with a brain does this [...]", and then admits in the footnote, "I’m not sure why GCC does this".

      Immature to the point of alienating.

  • xyst 18 hours ago

    Seems to have a decent amount of knowledge in this domain in education and professional work. Author is from MIT so maybe professors had a lot of influence here.

    Also, gcc is relatively old and comes with a lot of baggage. LLVM is sort of the de facto standard now, with improvements in performance.

    • rlpb 18 hours ago

      > LLVM is sort of the defacto standard now...

      Distributions, and therefore virtually all the software used by a distribution user, still generally use gcc. LLVM is only the de facto standard when doing something new, and for JIT.

    • bruce343434 17 hours ago

      As someone who uses both Clang and GCC to cover each other's weaknesses, as far as I can tell both LLVM and GCC are hopelessly beastly codebases in terms of raw size and complexity. I think that's just what happens when people desire to build an "everything compiler".

      From what I gathered, LLVM has a lot of C++ specific design choices in its IR language anyway. I think I'd count that as baggage.

      I personally don't think one is better than the other. Sometimes clang produces faster code, sometimes gcc. I haven't really dealt with compiler bugs from either. They compile my projects at the same speed. Clang is better at certain analyses, gcc better at certain others.

      • ahartmetz 15 hours ago

        Clang used to compile much faster than GCC. I was excited. Now there is barely any difference, so I keep using GCC and occasionally some Clang-based tools such as iwyu, ClangBuildAnalyzer or sanitizer options (rare, Valgrind is easier and more powerful though sanitizers also have unique features).

  • flkenosad 17 hours ago

    It's the new anti-woke mind virus going around attacking anything "communist" such as copyleft, Stallman, GCC, GNU, etc.

Joker_vD 11 hours ago

> no one calls it x64 except for Microsoft. And even though it is fairly prevalent on Windows, I absolutely give my gamedev friends a hard time when they write x64.

So, it turns out, a lot of people actually call it x64 — including the author's own friends! — it's just that the author dislikes it. Disliking something is fine, but why claim an outright falsehood which you know first-hand is false?

Also, the actual proper name for this ISA is, of course, EM64T. /s

> The fourth entry of the triple (and I repeat myself, yes, it’s still a triple)

Any actual justification, other than bald assertions of personal preference? Just call it a "tuple", or something...

kridsdale1 18 hours ago

I really appreciate the angular tilt of the heading type on that blog.

Retr0id 19 hours ago

Note to author, I'm not sure the word "anachronism" is being used correctly in the intro.

  • kupiakos 19 hours ago

    It's being used correctly: something that is conspicuously old-fashioned for its environment is an anachronism. A toolchain that only supports native builds fits.

    • Retr0id 17 hours ago

      The article does not place any given toolchain within an incorrect environment, though.

      If someone said "old compilers were usually cross-compilers", that would be an ahistoric statement (somewhat).

      If someone used clang in a movie set in the 90s, that would be anachronistic.

  • bqmjjx0kac 19 hours ago

    It's technically correct, but feels a bit forced.

  • compyman 19 hours ago

    I think the meaning is that the idea that compilers can only compile for their host machine is an anachronism, since that was historically the case but is no longer true.

    • bregma 18 hours ago

      Heck, it hasn't been true since the 1950s. Consider it as "has never been true".

      Oh, sure, there have been plenty of native-host-only compilers. It was never a property of all compilers, though. Most system brings-ups, from the mainframes of the 1960s through the minis of the 1970s to the micros and embeddeds of the 1980s and onwards have required cross compilers.

      I think what he means is that a single-target toolchain is an anachronism. That's also not true, since even clang doesn't target everything under the sun in one binary. A toolchain needs far more than a compiler, for a start; it needs the headers and libraries and it needs a linker. To go from source to executable (or a herd of dynamic shared objects) requires a whole lot more than installing the clang (or whatever front-end) binary and choosing a nifty target triple. Most builds of clang don't even support all the interesting target triples and you need to build it yourself, which requires a lot more computer than I can afford.

      Target triples are not even something limited to toolchains. I maintain software that gets cross-built to all kinds of targets all the time and that requires target triples for the same reasons compilers do. Target triples are just a basic tool of the trade if you deal with anything other than scripting the browser, and they're a solved problem rediscovered every now and then by people who haven't studied their history.

    • stefan_ 19 hours ago

      Telling people that "Clang can compile for any architecture you like!" tends to confuse them more than it helps. I suppose it sets up unrealistic assumptions because of course outputting assembly for some architecture is a very long way from making working userland binaries for a system based on that architecture, which is what people actually want.

      And ironically in all of this, building a full toolchain based on GCC is still easier than with LLVM.