Every time I deal with target triples I get confused and have to refresh my memory. This article makes me feel better in knowing that target triples are an unmitigated cluster fuck of cruft and bad design.
> Go does the correct thing and distributes a cross compiler.
Yes but also no. AFAIK Zig is the only toolchain to provide native cross compiling out of the box without bullshit.
Missing from this discussion is the ability to specify and target different versions of glibc. Something that I think only Zig even attempts to do, because Linux’s philosophy of building against whatever libraries the local system happens to have is an incomprehensibly bad choice. So all these target triples are woefully underspecified.
I like that at least Rust defines its own clear list of target triples that are more rational than LLVM’s. At this point I feel like the whole concept of a target triple needs to be thrown away. Everything about it is bad.
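On the glibc-version point above: zig is the one toolchain I know of that encodes it right in the target. A minimal sketch, assuming a reasonably recent zig (the set of selectable glibc versions depends on the zig release):

    zig cc -target x86_64-linux-gnu.2.28 -o hello hello.c

The trailing .2.28 builds against glibc 2.28 symbol versions, so the resulting binary should run on any system with that glibc or newer.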
The `testsuite/` directory contains some data files with a fairly extensive list of known targets. The vendor field should be considered fully extensible, and new combinations of known machine/kernel/libc shouldn't be considered invalid, but anything else should have a patch submitted.
This article is a very LLVM-centric view, and it does ignore the GNU idea of a target triple, which is essentially $(uname -m)-vendor-$(uname -s), with vendor determined (so far as I can tell) entirely from uname -s, the system name undergoing some amount of butchering, and version numbers sometimes being included and sometimes not, and Linux getting a LIBC tacked on.
But that doesn't mean the article should be ignored in its entirety. LLVM's target triple parsing is more relevant for several projects (especially given that the GNU target triple scheme doesn't include native Windows, which is one of the most common targets in practice!). Part of the problem is that for many people "what is a target triple" is actually a lead-in to the question "what are the valid targets?", and trying to read config.guess is not a good vehicle to discover the answer. config.guess also isn't a good way to find out about target triples for systems that aren't designed for general-purpose computing, like if you're trying to compile for a GPU architecture, or even a weird x86 context like UEFI.
> A critical piece of history here is to understand the really stupid way in which GCC does cross compiling. Traditionally, each GCC binary would be built for one target triple. [...] Nobody with a brain does this ^2
You're doing GCC a great disservice by ignoring its storied and essential history. It's over 40 years old, and was created at a time when there were no free/libre compilers. Computers were small and slow. Of course you wouldn't bundle multiple targets in one distribution.
LLVM benefitted from a completely different architecture and starting from a blank slate when computers were already faster and much larger, and was heavily sponsored by a vendor that was innately interested in cross-compiling: Apple. (Guess where LLVM's creator worked for years and led the development tools team)
The older I get the more this kind of commentary (the OP, not you!) is a total turn off. Systems evolve and there's usually, not always, a reason for why "things are the way they are". It's typically arrogance to have this kind of tone. That said I was a bit like that when I was younger, and it took a few knockings down to realise the world is complex.
"This was the right way to do it forty years ago, so that's why the experience is worse" isn't a compelling reason for a user to suffer today.
Also, in this specific case, this ignores the history around LLVM offering itself up to the FSF. gcc could have benefitted from this fresh start too. But purely by accident, it did not.
> "This was the right way to do it forty years ago, so that's why the experience is worse" isn't a compelling reason for a user to suffer today.
On my system, "dnf repoquery --whatrequires cross-gcc-common" lists 26 gcc-*-linux-gnu packages (that is, kernel / firmware cross compilers for 26 architectures). The command "dnf repoquery --whatrequires cross-binutils-common" lists 31 binutils-*-linux-gnu packages.
The author writes, "LLVM and all cross compilers that follow it instead put all of the backends in one binary". Do those compilers support 25+ back-ends? And if they do, is it good design to install back-ends for (say) 23 such target architectures that you're never going to cross-compile for, in practice? Does that benefit the user?
My impression is that the author does not understand the modularity of gcc cross compilers / packages because he's unaware of (or doesn't care for) the scale that gcc aims at.
> And if they do, is it good design to install back-ends for (say) 23 such target architectures that you're never going to cross-compile for, in practice? Does that benefit the user?
rustc --print target-list | wc -l
287
I'm kinda surprised at how large that is, actually. But yeah, I don't mind if I have the capability to cross-compile to x86_64-wrs-vxworks that I'm never going to use.
I am not an expert on all of these details in clang specifically, but with rustc, we take advantage of llvm's target specifications, so that you can even configure a backend that the compiler doesn't yet know about by simply giving it a json file with a description. https://doc.rust-lang.org/nightly/nightly-rustc/rustc_target...
While these built-in ones aren't defined as JSON, you can ask the compiler to print one for you:
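Presumably the command meant here is the nightly-only print flag, something like:

    rustc +nightly -Z unstable-options --print target-spec-json --target x86_64-unknown-linux-gnu

which dumps the built-in spec for that triple as JSON.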
I'd love to learn what accident you're referring to, Steve!
I vaguely recall the FSF (or maybe only Stallman) arguing against the modular nature of LLVM because a monolithic structure (like GCC's) makes it harder for anti-GPL actors (Apple!) to undermine it. Was this related?
iPhones have terrible heat dispersion compared to even a fanless computer like a macbook air. You get a few minutes at full load before thermal throttling kicks in, so you could do the occasional build of your iPhone app on an iPhone but it'd be pretty terrible as a development platform.
At work we had some benchmarking suites that ran on physical devices and even with significant effort put into cooling them they spent more time sleeping waiting to cool off than actually running the benchmarks.
If OP is above 30 - it's probably due to the frustration of trying to modularize GCC that led to the creation of LLVM in the first place. If OP is below 30, it's probably because he grew up in a world where most compiler research and design is done on LLVM and GCC is for grandpa.
I have intense respect for the history of gcc, but everything about using it screams that it's stuck in the past.
LLVM has a lot of problems, but it feels significantly more modern.
I do wish we had a "new LLVM" doing to LLVM what it did to gcc. Just because it's better doesn't mean it's perfect.
Basically, you can respect history while also being honest about the current state of things. But also, doing so requires you to care primarily about things like ease of use, rather than things like licenses. For some people, they care about licenses first, usability second.
Their IR is a mess. So a "new LLVM" ought to start by nailing down the IR.
And as a bonus, seems to me a nailed down IR actually is that portable assembly language the C people keep telling us is what they wanted. Most of them don't actually want that and won't thank you - but if even 1% of the "I need a portable assembler" crowd actually did want a portable assembler they're a large volume of customers from day one.
Having tried writing plugins for both, I very much prefer GCC's codebase. You have to adapt to its quirks, but at least it won't pull the rug from under your feet gratuitously. There's a reason every major project ends up embedding a years-old copy of LLVM rather than just using the system version.
If you're ignoring the API and writing IR directly there are advantages to LLVM though.
I for sure don't want to suggest that anyone who loves gcc shouldn't be working on what they love. More compilers are a good thing, generally. Just trying to say why I have a preference.
It is unfortunate. GCC has enabled the compilation of countless lines of source code for nearly 40 years and has served millions of users. Regardless of whether its design is considered good or bad today, GCC has played an essential role and has enabled the emergence of many projects and new compilers. GCC deserves deep respect.
Good question. Author is incredibly hostile to one of the most important pieces of software ever developed because of the way they approached the problem nearly 40 years ago. Then he criticizes Go for trying to redesign the system instead of just using target triples...
The author writes: "really stupid way in which GCC does cross compiling [...] Nobody with a brain does this [...]", and then admits in the footnote, "I’m not sure why GCC does this".
The author seems to have a decent amount of knowledge in this domain from education and professional work. They're from MIT, so maybe professors had a lot of influence here.
also, gcc is relatively old and comes with a lot of baggage. LLVM is sort of the de facto standard now, with improvements in performance
Distributions, and therefore virtually all the software used by a distribution user, still generally use gcc. LLVM is only the de facto standard when doing something new, and for JIT.
as someone who uses both Clang and GCC to cover each other's weaknesses, as far as I can tell both LLVM and GCC are hopelessly beastly codebases in terms of raw size and their complexity. I think that's just what happens when people desire to build an "everything compiler".
From what I gathered, LLVM has a lot of C++ specific design choices in its IR language anyway. I think I'd count that as baggage.
I personally don't think one is better than the other. Sometimes clang produces faster code, sometimes gcc. I haven't really dealt with compiler bugs from either. They compile my projects at the same speed. Clang is better at certain analyses, gcc better at certain others.
Clang used to compile much faster than GCC. I was excited. Now there is barely any difference, so I keep using GCC and occasionally some Clang-based tools such as iwyu, ClangBuildAnalyzer or sanitizer options (rare, Valgrind is easier and more powerful though sanitizers also have unique features).
> no one calls it x64 except for Microsoft. And even though it is fairly prevalent on Windows, I absolutely give my gamedev friends a hard time when they write x64.
So, it turns out, actually a lot of people call it x64 — including author's own friends! — it's just that the author dislikes it. Disliking something is fine, but why claim outright falsehood which you know first-hand is false?
Also, the actual proper name for this ISA is, of course, EM64T. /s
> The fourth entry of the triple (and I repeat myself, yes, it’s still a triple)
Any actual justification beyond bald assertions of personal preference? Just call it a "tuple", or something...
It's being used correctly: something that is conspicuously old-fashioned for its environment is an anachronism. A toolchain that only supports native builds fits.
I think the meaning is that the idea that compilers can only compile for their host machine is an anachronism, since that was historically the case but is no longer true.
Heck, it hasn't been true since the 1950s. Consider it as "has never been true".
Oh, sure, there have been plenty of native-host-only compilers. It was never a property of all compilers, though. Most system brings-ups, from the mainframes of the 1960s through the minis of the 1970s to the micros and embeddeds of the 1980s and onwards have required cross compilers.
I think what he means is that a single-target toolchain is an anachronism. That's also not true, since even clang doesn't target everything under the sun in one binary. A toolchain needs far more than a compiler, for a start; it needs the headers and libraries and it needs a linker. To go from source to executable (or herd of dynamic shared objects) requires a whole lot more than installing the clang (or whatever front-end) binary and choosing a nifty target triple. Most builds of clang don't even support all the interesting target triples and you need to build it yourself, which require a lot more computer than I can afford.
Target triples are not even something limited to toolchains. I maintain software that gets cross-built to all kinds of targets all the time and that requires target triples for the same reasons compilers do. Target triples are just a basic tool of the trade if you deal with anything other than scripting the browser and they're a solved problem rediscovered every now and then by people who haven't studied their history.
Telling people that "Clang can compile for any architecture you like!" tends to confuse them more than it helps. I suppose it sets up unrealistic assumptions because of course outputting assembly for some architecture is a very long way from making working userland binaries for a system based on that architecture, which is what people actually want.
And ironically in all of this, building a full toolchain based on GCC is still easier than with LLVM.
> There are also many fictitious names for 64-bit x86, which you should avoid unless you want the younger generation to make fun of you. amd64 refers to AMD’s original implementation of long mode in their K8 microarchitecture, first shipped in their Athlon 64 product. Calling it amd64 is silly and also looks a lot like arm64, and I am honestly kinda annoyed at how much Go code I’ve seen with files named fast_arm64.s and fast_amd64.s. Debian also uses amd64/arm64, which makes browsing packages kind of annoying.
I prefer amd64 as it's so much easier to type and scans so much easier. x86_64 is so awkward.
Bikeshed I guess and in the abstract I can see how x86_64 is better, but pragmatism > purity and you'll take my amd64 from my cold dead hands.
As for Go, you can get the GOARCH/GOOS combinations from "go tool dist list". Can be useful at times if you want to ensure your code cross-compiles in CI.
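For example, a crude CI smoke test over every pair (a sketch assuming a POSIX shell; some GOOS/GOARCH pairs need cgo or extra SDKs, so in practice you'd filter the list):

    # try a plain build for every supported os/arch pair
    go tool dist list | while IFS=/ read -r os arch; do
        GOOS=$os GOARCH=$arch go build ./... || exit 1
    done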
Some other sources of target triples (some mentioned in the article, some not):
rustc: `rustc --print target-list`
golang: `go tool dist list`
zig: `zig targets`
As the article points out, the complete lack of standardization and consistency in what constitutes a "triple" (sometimes actually a quad!) is kind of hellishly hilarious.
> what constitutes a "triple" (sometimes actually a quad!)
It is actually a quintuple at most because the first part, architecture, may contain a version for e.g. ARM. And yet it doesn't fully describe the actual target because it may require an additional OS version for e.g. macOS. Doubly silly.
at least we don't have to deal with --build, --host, --target nonsense anymore
You do on Nix. And it's as inconsistently implemented there as anywhere.
Great article but I was really put off by this bit, which aside from being very condescending, simply isn't true and reveals a lack of appreciation for the innovation that I would have thought someone posting about target triples and compilers would have appreciated:
> Why the Windows people invented a whole other ABI instead of making things clean and simple like Apple did with Rosetta on ARM MacBooks? I have no idea, but http://www.emulators.com/docs/abc_arm64ec_explained.htm contains various excuses, none of which I am impressed by. My read is that their compiler org was just worse at life than Apple’s, which is not surprising, since Apple does compilers better than anyone else in the business.
I was already familiar with ARM64EC from reading about its development from Microsoft over the past years but had not come across the emulators.com link before - it's a stupendous (long) read and well worth the time if you are interested in lower-level shenanigans. The truth is that Microsoft's ARM64EC solution is a hundred times more brilliant and a thousand times better for backwards (and forwards) compatibility than Rosetta on macOS. Rosetta gave the user a far inferior experience to native code: it executed (sometimes far) slower, prevented interop between legacy and modern code, left app devs having to do a full port to move to newer tech (or even just to have a UI that matched the rest of the system), and was always intended as a merely transitional bit of tech to last the few years it took for native x86 apps to be developed and take the place of (usurp) the old ppc ones.
Microsoft's solution has none of these drawbacks (except the noted lack of AVX support), doesn't require every app to be 2x or 3x as large as a sacrifice to the fat binaries hack, offers a much more elegant solution for developers to migrate their code (piecemeal or otherwise) to a new platform where they don't know if it will be worth their time/money to invest in a full rewrite, lets users use all the apps they love, and maintains Microsoft's very much well-earned legacy for backwards compatibility.
When you run an app for Windows 2000 on Windows 11 (x86 or ARM), you don't see the old Windows 2000 aesthetic (and if you do, there's an easy way for users to opt into newer theming rather than requiring the developer to do something about it) and you aren't stuck with bugs from 30 years ago that were long since patched by the vendor many OS releases ago.
The thing named Rosetta (actually Rosetta 2) for the x86_64 -> ARM transition is technologically completely unrelated to the PPC -> x86 Rosetta, and has none of the problems you mention. There's no user-observable difference between a program using Rosetta and a native program in modern macOS, and porting programs which didn't have any assembly or other CPU-arch-specific code was generally just a matter of wrangling your build system.
Do those criticisms of Rosetta hold for Rosetta 2?
I assumed the author was talking about the x86 emulator released for the arm migration a few years ago, not the powerpc one.
This author has a tendency to be condescending about things they find disagreeable. It's why I stopped reading them.
You have neglected to consider that Microsoft bad; consider how they once did something differently from a Linux distribution I use. (This sentiment is alive and well among otherwise intelligent people; it's embarrassing to read.)
I did start to try to take clang's TargetInfo code (https://github.com/llvm/llvm-project/blob/main/clang/lib/Bas...) and port it over to TableGen, primarily so somebody could actually extract useful auto-generated documentation out of it, like "What are all the targets available?"
I actually do have working code for the triple-to-TargetInfo instantiation portion (which is fun because there's one or two cases that juuuust aren't quite like all of the others, and I'm not sure if that's a bad copy-paste job or actually intentional). But I never got around to working out how to actually integrate the actual bodies of TargetInfo implementations--which provide things like the properties of C/C++ fundamental types or default macros--into the TableGen easily, so that patch is still merely languishing somewhere on my computer.
As a Go developer, I certainly find the complaints about the go conventions amusing. I guess if you have really invested so much into understanding all the details in the rest of this article you might be annoyed that it doesn't translate 1 to 1 to Go.
But for the rest of us, I'm so glad that I can just cross compile things in Go without thinking about it. The annoying thing with setting up cross compilation in GCC is not learning the naming conventions, it is getting the correct toolchains installed and wired up correctly in your build system. Go just ships that out of the box and it is so much more pleasant.
It's also one thing that is great about zig. Using Go+zig when I need to cross compile something that includes cgo is so much better than trying to get GCC toolchains set up properly.
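For anyone who hasn't seen the trick: you point cgo at zig's clang wrapper and hand it a zig-style triple. A rough sketch (the musl triple sidesteps the glibc-version question entirely; swap in -gnu if you need glibc):

    CGO_ENABLED=1 GOOS=linux GOARCH=arm64 \
        CC="zig cc -target aarch64-linux-musl" \
        go build ./...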
»32-bit x86 is extremely not called “x32”; this is what Linux used to call its x86 ILP32 variant before it was removed.«
x32 support has not been removed from the Linux kernel. In fact, we're still maintaining Debian for x32 in Debian Ports.
> Kalimba, VE
> No idea what this is, and Google won’t help me.
Seems that Kalimba is a DSP, originally by CSR and now by Qualcomm. CSR8640 is using it, for example https://www.qualcomm.com/products/internet-of-things/consume...
VE is harder to find with such a short name.
NEC Vector Engine. Basically not a thing outside supercomputers.
$800 for the 20B-P model on ebay. More memory bandwidth than a 4090. I wonder if llama.cpp could be made to run on it?
I see rumors they charge for the compiler though.
Funny thing I found when I gave up trying to find documentation and read the LLVM source code (seems to be what happened to the author too!): there are actually five components of the triple, not four.
I can't remember what the fifth one is, but yeah... insane system.
Thanks for writing this up! I wonder if anyone will ever come up with something more sensible.
There are up to 7 components in a triple, but not all are used at once, the general format is:
<machine>-<vendor>-<kernel>-<libc?><abi?><fabi?>
But there's also <obj>, see below.
Note that there are both canonical and non-canonical triples in use. Canonical triples are output by `config.guess` or `config.sub`; non-canonical triples are input to `config.sub` and used as prefixes for commands.
The <machine> field (1st) is what you're running on, and on some systems it includes a version number of sorts. Most 64-bit vs 32-bit differences go here, except if the runtime differs from what is natural (commonly "32-bit pointers even though the CPU is in 64-bit mode"), which goes in <abi> instead. Historically, "arm" and "mips" have been a mess here, but that has largely been fixed, in large part as a side-effect of Debian multiarch (whose triples only have to differ from GNU triples in that they canonicalize i[34567]86 to i386, but you should use dpkg-architecture to do the conversion for sanity).
The <vendor> field (2nd) is not very useful these days. It defaults to "unknown" but as of a few years ago "pc" is used instead on x86 (this means that the canonical triple can change, but this hasn't been catastrophic since you should almost always use the non-canonical triple except when pattern-matching, and when pattern-matching you should usually ignore this field anyway).
The <kernel> field (3rd) is pretty obvious when it's called that, but it's often called <os> instead since "linux" is an oddity for regularly having a <libc> component that differs. On many systems it includes version data (again, Linux is the oddity for having a stable syscall API/ABI). One notable exception: if a GNU userland is used on a BSD/Solaris system, a "k" is prepended. "none" is often used for freestanding/embedded compilation, but see <obj>.
The <libc> field (main part of the 4th) is usually absent on non-Linux systems, but mandatory for "linux". If it is absent, the dash after the kernel is usually removed, except if there are ABI components. Note that "gnu" can be both a kernel (Hurd) and a libc (glibc). Android uses "android" here, so maybe <libc> is a bit of a misnomer (it's not "bionic") - maybe <userland>?
<abi>, if present, means you aren't doing the historical default for the platform specified by the main fields. Other than "eabi" for ARM, most of this is for "use 32-bit pointers but 64-bit registers".
<fabi> can be "hf" for 32-bit ARM systems that actually support floats in hardware. I don't think I've seen anything else, though I admit the main reason I separately document this from <abi> is because of how Debian's architecture puts it elsewhere.
<obj> is the object file format, usually "aout", "coff", or "elf". It can be appended to the kernel field (but before the kernel version number), or replace it if "none", or it can go in the <abi> field.
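To see the canonical/non-canonical split concretely, you can run shorthand triples through config.sub from a gnu-config checkout (expected canonical forms shown as comments; exact output can differ between config.sub versions):

    ./config.sub x86_64-linux          # x86_64-pc-linux-gnu
    ./config.sub arm-linux-gnueabihf   # arm-unknown-linux-gnueabihf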
Nah I dunno where you're getting your information from but LLVM only supports 5 components.
See the code starting at line 1144 here: https://llvm.org/doxygen/Triple_8cpp_source.html
The components are arch-vendor-os-environment-objectformat.
It's absolutely full of special cases and hacks. Really at this point I think the only sane option is an explicit list of fixed strings. I think Rust does that.
You're not really contradicting o11c here; what LLVM calls "environment" is a mixture of what they called libc/abi/fabi. There's also what LLVM calls "subarch" to distinguish between different architectures that may be relevant (e.g., i386 is not the same as i686, although LLVM doesn't record this difference since it's generally less interested in targeting old hardware), and there's also OS version numbers that may or may not be relevant.
The underlying problem with target triples is that architecture-vendor-system isn't sufficient to uniquely describe the relevant details for specifying a toolchain, so the necessary extra information has been somewhat haphazardly added to the format. On top of that, since the relevance of some of the information is questionable for some tasks (especially the vendor field), different projects have chosen not to care about subtle differences, so the normalization of a triple is different between different projects.
LLVM's definition is not more or less correct than gcc's here, nor are these the only definitions floating around.
Hm, looking to see if the vendor field is actually meaningful ... I see some stuff for m68k and mips and sysv targets ... some of it working around pre-standard vendor C implementations
Ah, I found a modern one:
i[3456789]86-w64-mingw* does not use winsup
i[3456789]86-*-mingw* with other vendors does use winsup
There are probably more; this is embedded in all sorts of random configure scripts and it is very not-greppable.
LLVM didn't invent the scheme; why should we pay attention to their copy and not look at the original?
The GNU Config project is the original.
The article goes into this a bit. But basically because LLVM is extremely popular and used as a backend by lots of other languages, e.g. Rust.
Frankly being the originators of this deranged scheme is a good reason not to listen to GNU!
The author's blog is a FANTASTIC source of information. I recommend checking out some of their other posts:
- https://mcyoung.xyz/2021/06/01/linker-script/
- https://mcyoung.xyz/2023/08/09/yarns/
- https://mcyoung.xyz/2023/08/01/llvm-ir/
Given TFA's bias against GCC, I'm not so sure. e.g. looking at the linker script article… it's also missing the __start_XYZ and __stop_XYZ symbols automatically created by the linker.
I was really struck by the antipathy toward GCC. I'm not sure I quite understand where it's coming from.
It also focuses exclusively on sections. I wish it had at least mentioned segments, also known as program headers. Linux kernel's ELF loader does not care about sections, it only cares about segments.
Sections and segments are more or less the same concept: metadata that tells the loader how to map each part of the file into the correct memory regions with the correct memory protection attributes. Biggest difference is segments don't have names. Also they aren't neatly organized into logical blocks like sections are, they're just big file extents. The segments table is essentially a table of arguments for the mmap system call.
Learning this stuff from scratch was pretty tough. Linker script has commands to manipulate the program header table but I couldn't figure those out. In the end I asked developers to add command line options instead and the maintainer of mold actually obliged.
Looks like very few people know about stuff like this. One can use it to do some heavy wizardry though. I leveraged this machinery into a cool mechanism for embedding arbitrary data into ELF files. The kernel just memory maps the data in before the program has even begun execution. Typical solutions involve the program finding its own executable on the file system, reading it into memory and then finding some embedded data section. I made the kernel do almost all of that automatically.
https://www.matheusmoreira.com/articles/self-contained-lone-...
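If you want to poke at the difference yourself, readelf shows both views, including which sections end up inside which LOAD segment:

    readelf -lW ./program   # program headers (segments) + section-to-segment mapping
    readelf -SW ./program   # section headers: names, types, flags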
I wouldn't call them "same concept" at all. Segments (program headers) are all about the runtime (executables and shared libraries) and are low-cost. Sections are all about development (.o files) and are detailed.
Generally there are many sections combined into a single segment, other than special-purpose ones. Unless you are reimplementing ld.so, you almost certainly don't want to touch segments; sections are far easier to work with.
Also, normally you just call `getauxval`, but if needed the type is already named `ElfW(auxv_t)*`.
> I wouldn't call them "same concept" at all.
They are both metadata about file extents and their memory images.
> sections are far easier to work with
Yes. They are not, however, loaded into memory by default. Linkers do not generate LOAD segments for section metadata since they are not needed for execution. Thus it's impossible for a program to introspect its own sections without additional logic and I/O to read them into memory.
> Also, normally you just call `getauxval`, but if needed the type is already named `ElfW(auxv_t)*`.
True. I didn't use it because it was not available. I wrote my article in the context of a freestanding nolibc program.
Right, but you can just use the section start/end symbols for a section that already goes into a mapped segment.
Can you show me how that would work?
It's trivial to put arbitrary files into sections:
    objcopy --add-section program.files.1=file.1.dat \
            --add-section program.files.2=file.2.dat \
            program program+files

The problem is the program.files.* sections do not get mapped in by a LOAD segment. I ended up having to write my own tool to patch a LOAD segment into the segments table because objcopy does not have the ability to do it.
Even asked a Stack Overflow question about this two years ago:
https://stackoverflow.com/q/77468641
The only answer I got told me to simply read the sections into memory via /proc/self/exe or edit the segments table and make it so that the LOAD segments cover the whole file. I eventually figured out ways to add LOAD segments to the table. By that point I didn't need sections anymore, just a custom segment type.
The whole point of section names is that they mean something. If you give it a name that matches `.rodata.*` it will be part of the existing read-only LOADed segments, or `.data.*` for (private) read-write.
Use `ld --verbose` to see what sections are mapped by default (it is impossible for a linker to work without having such a linker script; we're just lucky that GNU ld exposes it in a sane form rather than hard-coding it as C code). In modern versions of the linker (there is still old documentation found by search engines), you can specify multiple SECTIONS commands (likely from multiple scripts, i.e. just files passed on the command line), but why would you when you can conform to the default one?
You should pick a section name that won't collide with the section names generated by `-fdata-sections` (or `-ffunction-sections` if that's ever relevant for you).
That requires relinking the executable. That is not always desirable or possible. Unless the dynamic linker ignores the segments table in favor of doing this on the fly... Even if that's the case, it won't work for statically linked executables. Only the dynamic linker can assign meaning to section names at runtime and the dynamic linker isn't involved at all in the case of statically linked programs.
Absolutely agree. Had my own fun dealings with ELF, and to be clear, on plain mainline shipping products (amd64 Linux), not toys/exercise/funky embedded. (Wouldn't have known about section start/stop symbols otherwise)
> However, due to the runaway popularity of LLVM, virtually all compilers now use target triples.
That's a wild take. I think it's pretty universally accepted that GCC and the GNU toolchain are what made this ubiquitous.
Also, the x32 ABI is still around and still supported; I don't know where the author got that notion
Noticed endians listed in the table. It seems like little-endian has basically taken over the world in 2025:
* https://en.wikipedia.org/wiki/Endianness#Hardware
Is there anything that is used a lot that is not little? IBM's stuff?
Network byte order is BE:
* https://en.wikipedia.org/wiki/Endianness#Networking
LEON, used by the European Space Agency, is big endian.
Should have been called BEON.
IBM's Power chips can run in either little or big modes, but "used a lot" is a stretch
Most PowerPC related stuff (e.g. Freescale MPC5xx found in a bunch of automotive applications) can run in either big or little endian mode, as can most ARM and MIPS (routers, IP cameras) stuff. Can't think of the last time I've seen any of them configured to run in big endian mode tho.
For the large Power ISA machines, it's most commonly when running AIX or IBM i these days, though the BSDs generally run big too.
Apart from IBM Power/AIX systems, SPARC/Solaris is another one. I wouldn't say either of these are used a lot, but there's a reasonable amount of legacy systems out there that are still being supported by IBM and Oracle.
BE isn’t technically dead but it’s practically dead for almost all projects. You can static_assert byte order and then never think about BE ever again.
All of my custom network serialization formats use LE because there’s literally no reason to use BE for network byte order. It’s pure legacy cruft.
...Until you find yourself having to workaround legacy code to support some weird target that does still use BE. Speaking from experience (tbf usually lower level than anything actually networked, more like RS485 and friends).
10 years ago the fastest BE machines that were practical were then-ten-year-old PowerMacs. This hasn’t really changed. I guess they’re more expensive now.
e6500/T4240 are faster than powermacs. Not sure how rare they are nowadays, we didn't have any trouble buying some (on eBay). 12×2 cores, 48GB RAM, for BE that's essentially heaven…
Some ARM stuff.
Java VM is BE.
This is misleading at best. The JVM only exposes multibyte values to ordinary applications in such a way that byte order doesn't matter. You can't break out a pointer and step through the bytes of a long field to see what order it's in, at least not without the unsafe memory APIs.
In practice, any real JVM implementation will simply use native byte order as much as possible. While bytecode and other data in class files is serialized in big endian order, it will be converted to native order whenever it's actually used. If you do pull out the unsafe APIs, you can see that e.g. values are little endian on x86(-64). The JVM would suffer from major performance issues if it tried to impose a byte order different from the underlying platform.
One relatively commonly used class which exposes this is ByteBuffer and its Int/Long variants, but there you can specify the endianness explicitly (or set it to match the native one).
"And no, a “target quadruple” is not a thing and if I catch you saying that I’m gonna bonk you with an Intel optimization manual. "
https://github.com/ziglang/zig/issues/20690
The argument is that they're called triples even when they've got more or less components than 3. They should have simply been called target tuples or target monikers.
"gnu tuple" and "gnu type" are also common names.
The comments in `config.guess` and `config.sub`, which are the origin of triples, use a large variety of terms, at least the following:
Sorry, going to keep typing x64. Unlike the article's recommendation of x86, literally everyone knows exactly what it means at all times.
If someone tells me x86, I am certainly thinking 32-bit protected mode not 64-bit long mode... Granted I'm in the weird space where I know enough to be dangerous but not enough to keep me up-to-date with idiomatic naming conventions.
You mean AMD64?
>There’s a few variants. wasm32-unknown-unknown (here using unknown instead of none as the system, oops)
Why isn't it called wasm32-none-none?
As far as I can tell, it's because libstd exists (but is full of do-nothing stubs). There is another `wasm32-none` target which is no_std.
I think GCC's more-or-less equivalent to Clang's --target is called -B: https://gcc.gnu.org/onlinedocs/gcc/Directory-Options.html#in...
I assume it works with an all-targets binutils build. I haven't seen anyone building their cross-compilers in this way (at least not in recent memory).
I haven't either, probably because it would require building once per target and installing all the individual binaries.
This is one of the biggest differences between clang and GCC: clang has one binary that supports multiple targets, while a GCC build is always target-specific.
Old versions of GCC used to provide `-b <machine>` (and also `-V <version>`), but they were removed a long time ago in favor of expecting people to just use and set `CC` correctly.
It looks like gcc 3.3 through 4.5 just forwards to an external driver; prior to that it seems like it used the same driver for different paths, and after that it is removed.
Great read. Love those articles where you go in thinking that you have a pretty solid understanding of the topic and then proceed to learn much more than you thought you would.
Sounds like what we use with vcpkg to define the system tooling; still trying to make sense of it all these years later, but we define things like x64-Linux-static to imply target architecture, platform, and linkage style of the runtime.
> Go originally wanted to not have to link any system libraries, something that does not actually work
It does work on Linux, the only kernel that promises a stable binary interface to user space.
https://www.matheusmoreira.com/articles/linux-system-calls
FreeBSD does as well, but old ABI versions aren't kept forever.
When developing a small program for my Synology NAS in Go, I'm sure I had to target a specific version of glibc.
Does it really tho? I've had address resolution break more than once in go programs.
That's because on Linux systems it's typical for domain name resolution to be provided by glibc. As a result, people ended up depending on glibc. They were writing GNU/Linux software, not Linux software.
https://wiki.archlinux.org/title/Domain_name_resolution
https://en.wikipedia.org/wiki/Name_Service_Switch
https://man.archlinux.org/man/getaddrinfo.3
This is user space stuff. You can trash all of this and roll your own mechanism to resolve the names however you want. Go probably did so. Linux will not complain in any way whatsoever.
Linux is the only kernel that lets you do this. Other kernels will break your software if you bypass their system libraries.
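For context, the glibc path being described here is just an ordinary getaddrinfo() call, which consults NSS under the hood; a rough sketch (example.com/443 are placeholder values):

```cpp
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <cstdio>

// Resolves a name through libc's getaddrinfo(), which on glibc consults
// NSS (/etc/nsswitch.conf) to pick between /etc/hosts, DNS, mDNS, etc.
// A resolver that bypasses libc has to reimplement all of this itself.
int main() {
    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    addrinfo* res = nullptr;
    if (int rc = getaddrinfo("example.com", "443", &hints, &res); rc != 0) {
        std::fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
        return 1;
    }
    for (addrinfo* p = res; p != nullptr; p = p->ai_next)
        std::printf("family=%d socktype=%d\n", p->ai_family, p->ai_socktype);
    freeaddrinfo(res);
}
```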
I mean, that is fine and all, but it doesn't really matter for making the software run correctly on systems that currently exist.
It works fine on current Linux systems. We can have freestanding executables that talk to Linux directly and link against zero system libraries.
It's just that those executables are going to have to resolve names all by themselves. Chances are they aren't going to do it exactly like glibc does. That may or may not be a problem.
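To make "talks to Linux directly and links against zero system libraries" concrete, here's a rough freestanding sketch. It assumes x86-64 Linux and a build along the lines of `g++ -nostdlib -static`; the syscall numbers (1 = write, 60 = exit) are specific to that architecture:

```cpp
// write(2) invoked via the syscall instruction; no libc involved at all.
static long sys_write(int fd, const void* buf, unsigned long len) {
    long ret;
    asm volatile("syscall"
                 : "=a"(ret)
                 : "a"(1), "D"(fd), "S"(buf), "d"(len)  // 1 = __NR_write on x86-64
                 : "rcx", "r11", "memory");
    return ret;
}

// With no libc there is no main(); _start is the raw entry point.
extern "C" void _start() {
    const char msg[] = "hello from a libc-free binary\n";
    sys_write(1, msg, sizeof(msg) - 1);
    asm volatile("syscall" : : "a"(60), "D"(0));  // 60 = __NR_exit on x86-64
    __builtin_unreachable();
}
```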
Historically, when DNS breaks in a not-glibc environment, it's very often found to be a violation of some standard by the not-glibc resolver, rather than a program failing to document a glibc dependency.
Just connect to the service running on localhost ...
I'm curious. Why isn't getaddrinfo implemented in a similar manner to the loaders that graphics APIs use? Shouldn't that functionality be the responsibility of whatever resolver has been installed?
I like the code editor style preview on the right. Enough to forgive the slightly clunky scroll.
It looks nice, but I find the choppy scrolling (on an M1 MBP, no less!) to be distracting.
It also doesn't really tell me anything about the content, except where I'm going to see tables or code blocks, so I'm not sure what the benefit is.
Given the really janky scrolling, I'd like to have a way to hide it.
FYI - to see this you need to have your browser at least 1435px wide.
Unfortunately the text in the preview shows up in ctrl+f.
What a great article.
Every time I deal with target triples I get confused and have to refresh my memory. This article makes me feel better in knowing that target triples are an unmitigated cluster fuck of cruft and bad design.
> Go does the correct thing and distributes a cross compiler.
Yes but also no. AFAIK Zig is the only toolchain to provide native cross compiling out of the box without bullshit.
Missing from this discussion is the ability to specify and target different versions of glibc. Something that I think only Zig even attempts to do because Linux’s philosophy of building against local system globals is an incomprehensibly bad choice. So all these target triples are woefully underspecified.
I like that at least Rust defines its own clear list of target triples that are more rational than LLVM’s. At this point I feel like the whole concept of a target triples needs to be thrown away. Everything about it is bad.
Great content. Also, this website is gorgeous!
This article should be ignored, since it disregards the canonical origin of target triples (and the fact that it's linked to `configure`):
https://git.savannah.gnu.org/cgit/config.git/tree/
The `testsuite/` directory contains some data files with a fairly extensive list of known targets. The vendor field should be considered fully extensible, and new combinations of known machine/kernel/libc shouldn't be considered invalid, but anything else should have a patch submitted.
This article is a very LLVM-centric view, and it does ignore the GNU idea of a target triple, which is essentially $(uname -m)-vendor-$(uname -s), with vendor determined (so far as I can tell) entirely from uname -s, the system name undergoing some amount of butchering, version numbers sometimes being included and sometimes not, and Linux getting a libc tacked on.
But that doesn't mean the article should be ignored in its entirety. LLVM's target triple parsing is more relevant for several projects (especially given that the GNU target triple scheme doesn't include native Windows, which is one of the most common targets in practice!). Part of the problem is that for many people "what is a target triple" is actually a lead-in to the question "what are the valid targets?", and trying to read config.guess is not a good vehicle to discover the answer. config.guess also isn't a good way to find out about target triples for systems that aren't designed for general-purpose computing, like if you're trying to compile for a GPU architecture, or even a weird x86 context like UEFI.
The GNU scheme does in fact have support for various windows targets. It's just that the GNU compilers don't support them all.
There's MinGW.
Offtopic, but I'm distracted by the opening example:
> After all, you don’t want to be building your iPhone app on literal iPhone hardware.
iPhones are impressively powerful, but you wouldn't know it from the software lockdown that Apple keeps them under.
Example: https://www.tomsguide.com/phones/iphones/iphone-16-is-actual...
There's a reason people were clamoring for Apple to make ARM laptops/desktops for years before Apple finally committed.
I do not think I like this author...
> A critical piece of history here is to understand the really stupid way in which GCC does cross compiling. Traditionally, each GCC binary would be built for one target triple. [...] Nobody with a brain does this ^2
You're doing GCC a great disservice by ignoring its storied and essential history. It's over 40 years old, and was created at a time when there were no free/libre compilers. Computers were small and slow. Of course you wouldn't bundle multiple targets in one distribution.
LLVM benefitted from a completely different architecture and starting from a blank slate when computers were already faster and much larger, and was heavily sponsored by a vendor that was innately interested in cross-compiling: Apple. (Guess where LLVM's creator worked for years and led the development tools team.)
The older I get the more this kind of commentary (the OP, not you!) is a total turn off. Systems evolve and there's usually, not always, a reason for why "things are the way they are". It's typically arrogance to have this kind of tone. That said I was a bit like that when I was younger, and it took a few knockings down to realise the world is complex.
> and was heavily sponsored by a vendor that was innately interested in cross-compiling
and innately disinterested in Free Software, too
"This was the right way to do it forty years ago, so that's why the experience is worse" isn't a compelling reason for a user to suffer today.
Also, in this specific case, this ignores the history around LLVM offering itself up to the FSF. gcc could have benefitted from this fresh start too. But purely by accident, it did not.
> "This was the right way to do it forty years ago, so that's why the experience is worse" isn't a compelling reason for a user to suffer today.
On my system, "dnf repoquery --whatrequires cross-gcc-common" lists 26 gcc-*-linux-gnu packages (that is, kernel / firmware cross compilers for 26 architectures). The command "dnf repoquery --whatrequires cross-binutils-common" lists 31 binutils-*-linux-gnu packages.
The author writes, "LLVM and all cross compilers that follow it instead put all of the backends in one binary". Do those compilers support 25+ back-ends? And if they do, is it good design to install back-ends for (say) 23 such target architectures that you're never going to cross-compile for, in practice? Does that benefit the user?
My impression is that the author does not understand the modularity of gcc cross compilers / packages because he's unaware of (or doesn't care for) the scale that gcc aims at.
> And if they do, is it good design to install back-ends for (say) 23 such target architectures that you're never going to cross-compile for, in practice? Does that benefit the user?
I'm kinda surprised at how large that is, actually. But yeah, I don't mind if I have the capability to cross-compile to x86_64-wrs-vxworks that I'm never going to use.

I am not an expert on all of these details in clang specifically, but with rustc, we take advantage of llvm's target specifications, so that you can even configure a backend that the compiler doesn't yet know about by simply giving it a json file with a description. https://doc.rust-lang.org/nightly/nightly-rustc/rustc_target...
While these built-in ones aren't defined as JSON, you can ask the compiler to print one for you:
It's lengthy so instead of pasting here, I've put this in a gist: https://gist.github.com/steveklabnik/a25cdefda1aef25d7b40df3...

Anyway, it is true that gcc supports more targets than llvm, at least in theory. https://blog.yossarian.net/2021/02/28/Weird-architectures-we...
I'd love to learn what accident you're referring to, Steve!
I vaguely recall the FSF (or maybe only Stallman) arguing against the modular nature of LLVM because a monolithic structure (like GCC's) makes it harder for anti-GPL actors (Apple!) to undermine it. Was this related?
That is true history, in my understanding, but it's not related.
Chris Lattner offered to donate the copyright of LLVM to the FSF at one point: https://gcc.gnu.org/legacy-ml/gcc/2005-11/msg00888.html
He even wrote some patches: https://gcc.gnu.org/legacy-ml/gcc/2005-11/msg01112.html
However, due to Stallman's... idiosyncratic email setup, he missed this: https://lists.gnu.org/archive/html/emacs-devel/2015-02/msg00...
> I am stunned to see that we had this offer.
> Now, based on hindsight, I wish we had accepted it.
Note this email is in 2015, ten years after the initial one.
Incredible. Thank you for sharing.
You're welcome! It's a wild story. Sometimes, history happens by accident.
Wow that is wild. Imagine how different things could have been...
A more pertinent (if dated) example would be "you don't want to be building your GBA game on literal Game Boy Advance hardware".
Or a microcontroller
iPhones have terrible heat dispersion compared to even a fanless computer like a macbook air. You get a few minutes at full load before thermal throttling kicks in, so you could do the occasional build of your iPhone app on an iPhone but it'd be pretty terrible as a development platform.
At work we had some benchmarking suites that ran on physical devices and even with significant effort put into cooling them they spent more time sleeping waiting to cool off than actually running the benchmarks.
Why does this person have such negative views of GCC and positive bias towards LLVM?
If OP is above 30 - it's probably due to the frustration of trying to modularize GCC that led to the creation of LLVM in the first place. If OP is below 30, it's probably because he grew up in a world where most compiler research and design is done on LLVM and GCC is for grandpa.
I have intense respect for the history of gcc, but everything about using it screams that it's stuck in the past.
LLVM has a lot of problems, but it feels significantly more modern.
I do wish we had a "new LLVM" doing to LLVM what it did to gcc. Just because it's better doesn't mean it's perfect.
Basically, you can respect history while also being honest about the current state of things. But also, doing so requires you to care primarily about things like ease of use, rather than things like licenses. For some people, they care about licenses first, usability second.
Their IR is a mess. So a "new LLVM" ought to start by nailing down the IR.
And as a bonus, seems to me a nailed down IR actually is that portable assembly language the C people keep telling us is what they wanted. Most of them don't actually want that and won't thank you - but if even 1% of the "I need a portable assembler" crowd actually did want a portable assembler they're a large volume of customers from day one.
Having tried writing plugins for both, I very much prefer GCC's codebase. You have to adapt to its quirks, but at least it won't pull the rug from under your feet gratuitously. There's a reason every major project ends up embedding a years-old copy of LLVM rather than just using the system version.
If you're ignoring the API and writing IR directly there are advantages to LLVM though.
Honestly, I love that both exist with their respective world views.
I for sure don't want to suggest that anyone who loves gcc shouldn't be working on what they love. More compilers are a good thing, generally. Just trying to say why I have a preference.
It is unfortunate. GCC has enabled the compilation of countless lines of source code for nearly 40 years and has served millions of users. Regardless of whether its design is considered good or bad today, GCC has played an essential role and has enabled the emergence of many projects and new compilers. GCC deserves deep respect.
Good question. Author is incredibly hostile to one of the most important pieces of software ever developed because of the way they approached the problem nearly 40 years ago. Then he criticizes Go for trying to redesign the system instead of just using target triples...
The author writes: "really stupid way in which GCC does cross compiling [...] Nobody with a brain does this [...]", and then admits in the footnote, "I’m not sure why GCC does this".
Immature to the point of alienating.
Seems to have a decent amount of knowledge in this domain in education and professional work. Author is from MIT so maybe professors had a lot of influence here.
Also, gcc is relatively old and comes with a lot of baggage. LLVM is sort of the de facto standard now, with improvements in performance.
> LLVM is sort of the defacto standard now...
Distributions, and therefore virtually all the software used by a distribution user, still generally use gcc. LLVM is only the de facto standard when doing something new, and for JIT.
As someone who uses both Clang and GCC to cover each other's weaknesses, as far as I can tell both LLVM and GCC are hopelessly beastly codebases in terms of raw size and their complexity. I think that's just what happens when people desire to build an "everything compiler".
From what I gathered, LLVM has a lot of C++ specific design choices in its IR language anyway. I think I'd count that as baggage.
I personally don't think one is better than the other. Sometimes clang produces faster code, sometimes gcc. I haven't really dealt with compiler bugs from either. They compile my projects at the same speed. Clang is better at certain analyses, gcc better at certain others.
Clang used to compile much faster than GCC. I was excited. Now there is barely any difference, so I keep using GCC and occasionally some Clang-based tools such as iwyu, ClangBuildAnalyzer or sanitizer options (rare, Valgrind is easier and more powerful though sanitizers also have unique features).
It's the new anti-woke mind virus going around attacking anything "communist" such as copyleft, Stallman, GCC, GNU, etc.
> no one calls it x64 except for Microsoft. And even though it is fairly prevalent on Windows, I absolutely give my gamedev friends a hard time when they write x64.
So, it turns out, actually a lot of people call it x64 — including the author's own friends! — it's just that the author dislikes it. Disliking something is fine, but why claim an outright falsehood which you know first-hand is false?
Also, the actual proper name for this ISA is, of course, EM64T. /s
> The fourth entry of the triple (and I repeat myself, yes, it’s still a triple)
Any actual justification beyond bald assertions of personal preference? Just call it a "tuple", or something...
I really appreciate the angular tilt of the heading type on that blog.
Note to author, I'm not sure the word "anachronism" is being used correctly in the intro.
It's being used correctly: something that is conspicuously old-fashioned for its environment is an anachronism. A toolchain that only supports native builds fits.
The article does not place any given toolchain within an incorrect environment, though.
If someone said "old compilers were usually cross-compilers", that would be an ahistoric statement (somewhat).
If someone used clang in a movie set in the 90s, that would be anachronistic.
It's technically correct, but feels a bit forced.
I think the meaning is that the idea that compilers can only compile for their host machine is an anachronism, since that was historically the case but is no longer true.
Heck, it hasn't been true since the 1950s. Consider it as "has never been true".
Oh, sure, there have been plenty of native-host-only compilers. It was never a property of all compilers, though. Most system brings-ups, from the mainframes of the 1960s through the minis of the 1970s to the micros and embeddeds of the 1980s and onwards have required cross compilers.
I think what he means is that a single-target toolchain is an anachronism. That's also not true, since even clang doesn't target everything under the sun in one binary. A toolchain needs far more than a compiler, for a start; it needs the headers and libraries, and it needs a linker. To go from source to executable (or herd of dynamic shared objects) requires a whole lot more than installing the clang (or whatever front-end) binary and choosing a nifty target triple. Most builds of clang don't even support all the interesting target triples, and you need to build it yourself, which requires a lot more computer than I can afford.
Target triples are not even something limited to toolchains. I maintain software that gets cross-built to all kinds of targets all the time and that requires target triples for the same reasons compilers do. Target triples are just a basic tool of the trade if you deal with anything other than scripting the browser, and they're a solved problem rediscovered every now and then by people who haven't studied their history.
Telling people that "Clang can compile for any architecture you like!" tends to confuse them more than it helps. I suppose it sets up unrealistic assumptions because of course outputting assembly for some architecture is a very long way from making working userland binaries for a system based on that architecture, which is what people actually want.
And ironically in all of this, building a full toolchain based on GCC is still easier than with LLVM.