Starting point for porting to a 32-bit architecture? #2256
Sounds like you're thinking about a non-IA16 CPU? What are you thinking of? Porting to a non-segmented or non-16-bit CPU architecture probably won't work out; there's just too much code written around both.
I was thinking of an x86 (32-bit) CPU to start with, since it's just further down the same processor line, and it puts me one step closer to porting ELKS to other architectures. I can rewrite as much as I need in my own repository, and I can make it architecture-independent so I can still have the original IA16 port too. Of course, each architecture will still need platform-specific code to keep resource usage as low as it already is, but (in my personal opinion) it would be worth it to have Linux anywhere I need it, especially on embedded devices.
For a small but very powerful 32-bit Linux-like OS, I recommend Fiwix; it's fantastic and already runs a large variety of programs. For ELKS to support a 32-bit port, it'd mean a major rewrite of all the core process handling, multitasking, and assembly language, plus using a new C compiler for the kernel; a huge job. If you really want to try, I recommend studying the entire kernel source code base before starting.
Fiwix looks pretty cool, but it's not specifically what I'm looking for. I want ELKS specifically because of the mere 512K of RAM it needs at minimum to be useful. I don't exactly need ELKS on my desktop PC, but on the embedded devices I bring with me regularly, having a Linux(-like) environment would be great. I am willing to go the length of rewriting as much kernel code as needed for this.
32-bit code will mean more memory is used, which conflicts with the 512 KB you want to target. On a 32-bit CPU you want to use the native word size (32 bits), which gives the fastest access, even when you do not need a 32-bit variable. I just checked: MS Windows does optimize memory to avoid wasting it on small objects. Ask ChatGPT: "How does Windows optimize memory on 64-bit architectures when variables do not need to be 64 bits?" Actually it depends on the software. Programs that use a lot of 32-bit ints will get somewhat faster, and those with many 16-bit ints a bit slower. But memory consumption will certainly increase unless you apply some tricks.
I know that 32-bit will take more memory, but it has its own optimizations just like 16-bit does. Since I have to rewrite most of the code anyway, I will be dropping the 16-bit optimizations and adding 32-bit ones. I don't need far memory reads and writes, so that part of the code can be dropped. Some 32-bit processors have XIP, so they'll use far less than 512KB anyway. I'm not specifically aiming for 512KB; I used it as an example of how little ELKS needs to run. I'll start porting to either x86 or an RP2040 for an initial 32-bit port.
Hi.
I think there are too many x86-specific assumptions in ELKS, though I am not an expert.
To make it architecture-independent, the assumptions have to be either changed to assume a different architecture or removed and replaced with abstractions that can be filled by an architecture-specific directory. |
Rewriting ELKS to run on the 68000, or any other processor for that matter, is "doable", but no ports other than to essentially 8086-based segmented architectures have been done. The real question is whether the ELKS source base would be a good starting point. IMO, the answer depends on how much spare time you have, how much you know about the new CPU (and the 8086), and whether you have the required tools.
In order to port ELKS to a new CPU, the exact opposite needs to occur: the core kernel code needs to be rewritten in an extremely architecture-dependent way, so that the resulting OS is efficient and actually works. The porting process needs to start at the core required functionality, or nothing else will run. Almost all of the source base that can be made portable already has been, given various restrictions. The core of ELKS's (and most OSes') functionality is a piece of code that uses a hardware interrupt to switch application stacks and generally implement the illusion of multitasking. In addition, OS services, aka system calls, use the same mechanism. This code is usually written in assembler and is innately tied to the way the CPU handles stack switching and interrupts in hardware. In ELKS, this code is the _irqit routine, and it is complicated: I have written a two-part article on system calls and interrupts on our Wiki to try to describe its subtle complexities. For those wanting to port ELKS to a new CPU, I recommend you start there and fully understand how that mechanism works on the 8086.
The ELKS C library is a very portable body of code; we now have it ported to three different compiler toolchains. Starting here won't do much toward porting the kernel, as the kernel uses no C library code. The ELKS C library will likely port easily to any new toolchain, and many of the ELKS user-mode programs will also port easily, unless they are directly tied to the system internals. After understanding how the core ELKS kernel time-slicing code works, one is in a much better position to get that running on a new CPU, then start adding other parts of the system to the codebase. The beginning of real coding work will require selecting a full C toolchain (compiler, assembler, and linker), along with a target board/system to port to and a way to transfer programs back and forth (or use an emulator).
A few starting points: The compiler could be: https://github.com/M680x0/M680x0-llvm You can try to reuse:
Thank you. I do not have much spare time to spend on this, but I will have a look at the links for sure.
P.S. ELKS does run on 32-bit Intel CPUs (in 16-bit real mode, of course), so you don't need an emulator, and with some effort you could use XMS for more memory, and even unreal mode, for example, to map more memory and use 32-bit instructions while keeping the 16-bit ELKS kernel running. My ELKS development machine is a ThinkPad T430, which is ages more recent than the last 8088 computer.
ELKS already does just that right now. Enable CONFIG_FS_XMS_BUFFER=y and set CONFIG_FS_NR_XMS_BUFFERS=2500, and the system will use 2.5MB of XMS memory for system buffers, resulting in very high speeds as the system runs almost entirely from memory. (Be aware, though, that disk syncing is not automatic and must be enabled using sync=N seconds in /bootopts, or you could lose everything in a system crash).
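The settings described above would look roughly like this (option names are taken from the comment itself; the 30-second sync interval is just an illustrative value):

```
# ELKS kernel configuration: 2500 1K XMS buffers (~2.5MB)
CONFIG_FS_XMS_BUFFER=y
CONFIG_FS_NR_XMS_BUFFERS=2500

# /bootopts: flush dirty buffers to disk every 30 seconds
sync=30
```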
However, this feature is entirely optional. In most cases, what people really want is 32-bit applications. A big potential problem with allowing 32-bit instructions in application programs is that the program then (of course) doesn't run on older systems, and complaints start. In general, if users need the advantages of 32-bit programs, they ought to be running a fully 32-bit system, including the kernel (thus my recommendation of Fiwix above). Of course, we could allow 32-bit user applications, but that would entail porting another compiler and porting the C library again, only to find out, IMO, that most 32-bit applications would immediately run out of memory, since memory allocations would still be extremely limited given the lack of RAM (the XMS memory above can't always be directly accessed with unreal mode, due to complications with certain later-model PC BIOSes [e.g. Compaq 386] that run in protected mode themselves, causing conflicts with unreal mode). In conclusion, the motivation of the ELKS Project remains getting big things to happen with little resources, without having to complicate the software with multiple versions for different CPUs. Otherwise, all modern systems can just boot a 32- or 64-bit OS and all the "problems" go away.
Such is my goal too. I'm not looking for 32-bit applications; I'm looking for big things with little resources. However, most embedded devices I own are ARM (32-bit) and not IA16. I don't intend for this to ever be merged with mainstream ELKS, since it would conflict with everything and defeat the whole purpose. I just need the capabilities of ELKS on devices with little to spare, without either writing everything from scratch or emulating, so I'm willing to jump through the hoops necessary to achieve this.
I was thinking that ELKS is reaching the limit of what can be done for some applications such as Nano-X. Having several applications running simultaneously already exhausts the entire memory. There is no chance for a graphical browser under Nano-X, for example. So is there some trick, some elegant way, to access more memory on the 286 and 386? One way could be to implement a working swap file. On 16-bit machines the swap file would be slow or even disabled, but on machines with more RAM the swap file could be placed in a RAM disk beyond 1MB. In this case the applications stay 16-bit, but the additional RAM is still used. I am sure it is not simple, because it could be a swap file per process or per smaller chunk of memory, and it will be complicated.
I'm playing a bit with Microweb and Arachne (both 16-bit software), and I can see they explicitly deal with swap and XMS/EMS. I agree something like a ramdisk will suffice for software developers, while a kernel API for XMS access/copy would also be useful. In order not to introduce a lot of complexity, I'd leave the swapping logic to userland.
I see two big problems with implementing swap on ELKS as a means of increasing memory for the reasons above:
Agreed. There are overlay linkers available for this sort of thing, and IIRC the OpenWatcom library has some support for it, although it would likely be lots of work to port that over. Writing programs to use this sort of thing is another issue entirely.
For the unreal mode implementation, yes. For protected-mode BIOSes, INT 15h is used by the kernel and could also likely be used by applications, possibly requiring just a small wrapper around the call.
Yes, such a thing wouldn't be too much effort to make work, since it's already being done by the kernel. The API would only allow block-copying directly between main memory and an XMS region; no direct access. As mentioned previously, direct access can't be made to work using unreal mode, since some PC BIOSes (think Compaq) already use protected mode within the BIOS, and in those cases the BIOS INT 15h protected-mode XMS block-copy function is used instead of unreal mode. (Thus, programs running on a modern BIOS could probably just use the BIOS INT 15h block memory copy function directly now, without explicit kernel support, should anyone really have programs ready to use this sort of thing.) However, the biggest issue with memory-copy solutions is that the userland program(s) in almost all cases have to be explicitly designed to work that way. No modern 32-/64-bit software is written like this, as it all uses mmap() and depends on OS demand paging to do the dirty work. This is equivalent to the problem of text editors specially written to handle files larger than memory, versus those that can only handle text files that fit in main memory. There are very few of the former text editors that I'm aware of.
The Arachne browser, for example, allocates memory using contexts, which relate to data dependencies of the runtime (e.g. HTML tables, images, fonts, core memory, cookies, etc.), so when swap is needed, the software knows which context is not in use and can "copy it out" to swap (be it XMS or disk). The code is mostly ifdef'ed to be used only on DOS, but could be adapted to ELKS. The same can be said about MicroWeb, I think; I have not yet looked at which strategy it uses, but it does do swap and/or uses XMS.
I think ELKS is very important not only for the 8088 and similar CPUs, but also for the 80286, as there is no modern Unix for it. I'm not sure of the level of hardware support for unreal mode versus INT 15h for accessing XMS, but the more APIs and mechanisms ELKS provides for the developer, the better, so software can check the machine it is running on and choose the appropriate facilities for extended memory access. I usually give the browser as the example, but our C86 compiler is a good candidate for anyone wanting to play with swapping, to allow it to compile larger C sources.
So maybe a C library for swapping: an API that can be used by all applications. Maybe this API could even later be added to libc. The advantage would be that swap could have different backends. If XMS is detected, it will use XMS and not the disk (this is just an example to illustrate the idea). So on an old system it will be slow, swapping everything to disk, but on newer systems it will use the extra memory. And it will only be for applications that know how to use it.
I like your idea, @toncho11. Some kind of API like "copy_to_extended()" and "copy_from_extended()", plus some init functions to declare the size of the memory block that can be swapped, maybe an identifier. This could live in userland, so if there are improvements / new features in the kernel, they could easily be introduced without breaking userland.
Can the first version be exactly that: store to XMS if available, otherwise to disk in /tmp (in the future, a block device)? Some example API: So store_to_extended copies data from the pointer, of a specified length. The kernel will need to handle this copy.
Looking at the kernel XMS code in elks/arch/i86/mm/xms.c, it seems I got a little ahead of myself saying this won't be too much effort to get running. Close inspection shows that even the BIOS INT 15h interface requires the setup of an 80386 GDT (global descriptor table) in order to function. So read up on how 386 protected-mode memory management works. The 386 unreal mode also uses a GDT to extend the "shadow" segment limit registers in the CPU beyond the normal 64K boundary. In order to describe these addresses, a "32-bit linear" address type is needed. There are definite complications to getting all this to work, and now that I look at it closer, it's probably better to start doing this all in userland, rather than building a kernel interface first, only to see that it won't work with the architecture of the user programs requiring it, or that the browser code base is too big to run at all. Here are some of my further thoughts on it:
In summary, I would suggest continuing to work on getting a browser initially operational, then determining how it should operate with various amounts of extra memory (or not), and publishing its internal interface(s) for memory management. After that, and after fully understanding how the internal ELKS XMS works, it can be decided what kind of (arena/linear/other) memory allocation approach could be mapped on top of an unreal/INT 15h XMS implementation. However, if the application were to use its internal interface to perform only disk <-> main memory copy in/out, this could be made to work without any of the above XMS complexities, and would not be limited to XMS memory nor face arena allocation issues. It would run slowly, but it would enable testing the application with much less work, it seems.
I won't pretend I understood all the technical details :) Is it possible to create a RAM drive block device based on XMS currently? This means that no allocator is needed, we do not care about fragmentation, etc. This could be a very good first step, because our "Swap API" would then have the possibility of switching between a real disk drive (in /tmp) and XMS memory later. I mean a RAM drive that can be mounted and used by applications in ELKS. This RAM drive is a well-defined task. If doable, it would be a bit like saying: "Here is some memory for you, applications; use it the way you want! (and can)"
I'll study the XSWAP implementation a bit, described in: The interface is here: Some code: Reading the assembly, it seems it uses HIMEM.SYS facilities.
Actually that's a pretty good idea - since XMS access is already present in the kernel, we should be able to extend the existing RAM drive to use linear XMS memory, so that the block device essentially becomes the memory allocator. Nice. This also could fit well into the idea of the simpler first approach of using disk swapping and then just replacing the /tmp swapfile with a RAM drive. I'll look more into it :)
Hmm, later versions of HIMEM.SYS were used as the XMS allocator, rather than the older BIOS INT 15h function. But that's all RAM-based copy in/out. Does Arachne support direct-to-disk memory swapping without XMS?
Yes, that code is DOS-only, as it relies on INT 2Fh calls into a DOS system to handle XMS memory, likely in accordance with the DOS XMS spec. It is possible that the
This code looks like it tries to implement both XMS and disk memory swapping. However, the code is so terribly ugly and hacked-on I can barely follow it, and the comments all being written in Czech or Polish doesn't help. Good luck!!! I would say you've got your work cut out for you if all Arachne code looks like this. If you can get disk swapping running using the existing source code, then using @toncho11's idea of just replacing this with an XMS ramdisk would be by far the easiest path going forward. So essentially this means the path forward is to continue porting Arachne to ELKS using its existing source base and its existing disk swapping, with its XMS/HIMEM.SYS support turned off, if possible.
So there should be a /dev/xms on which mkfs will put minix and then we will be able to mount it?
Right. I'll hold off on the XMS code for now.
I agree. The code is not easy to follow indeed, but the interfaces for swapping seem a good start: And if it works out of the box (or almost), it is already great stuff.
Yes, pretty much. Either a new /dev/xms or probably better using the existing /dev/rd0 or /dev/rd1 ramdisks (or /dev/ssd), which are then setup using existing utilities to create the size of the ram disk, specify XMS or not, and then the MINIX filesystem on it. Currently we can do the following, which would then be very similar, perhaps with another option to use XMS memory instead of main memory:
For XMS, we could probably just specify "makexms" instead of "make" in the first line above. |
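Sketched as a session, the setup being proposed might look like the following (command names and arguments are illustrative only, based on the "makexms" suggestion above; the exact syntax of the existing ramdisk and mkfs utilities should be checked in the ELKS documentation):

```
# create a 512K RAM disk backed by XMS, put a MINIX filesystem
# on it, and mount it where applications expect temp/swap files
ramdisk /dev/rd0 makexms 512
mkfs /dev/rd0 512
mount /dev/rd0 /tmp
```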
I see - ramdisk will bind /dev/rd0 to either standard RAM or XMS. Before that, it is "empty", of size 0.
The more I think about @toncho11's idea of using an XMS ram disk for handling programs that would not otherwise be able to handle larger amounts of data, the more I like it. The sole application requirement is that it contain a capability to write/swap data to disk and back (not a big technical feat, but definitely something that must be designed up front), and "disk access" could be greatly sped up with the use of XMS memory with very little development, provided we look for the right applications in the first place. This goes for editors too: there are a number of "older" editors that used disk swapping in order to handle large files, instead of the "new" way of just allocating huge amounts of memory on 32-/64-bit machines using malloc or mmap. These editors could probably be converted to work on ELKS very easily by just changing the name of the directory used for ram/disk swapping, and having the system set up as @toncho11 is suggesting, with a large RAM disk. For instance, our own (Actually, a large RAM disk could be mounted directly on /tmp now and programs would automatically use the RAM disk for their temp files, without any modifications at all). If we replaced our current set of programs (e.g. Thinking like this, it becomes very interesting to consider replacing those programs in ELKS that don't currently work well with limited memory with ones that are designed to use disk swap files, with or without XMS.
@rafael2k, this same idea can also apply to image processing. As you know, many images are far too large to be fully decoded and held in RAM before being displayed. Using image decoders that can swap scanlines to a swap disk, or better yet decode scanline-by-scanline for display, will allow even large images to be displayed on ELKS. I recently used this approach with Of course, BMP decoding is well suited for single-scanline decoding, while JPG probably needs at least 8 or 16 scanlines for the dither blocks, and I'm not sure how PNG works. I haven't had time to rewrite the Nano-X server to use this same approach, but this solves a big problem, since the current Microwindows design has the server hold any images in decoded RGBA internal memory for fast blitting to the output device. A single larger image would cause the NX server to run out of memory. This excess memory usage is the reason I've not been able to port the NX "launcher" application, which is similar to nxstart but uses graphical desktop icons to start applications. Even displaying each of those icons currently causes NX to run out of memory before the user has a chance to do anything! I had previously considered linking the images in instead of loading them, but this still uses tons of memory. With the new single-scanline BMP decoder ported from paint into Nano-X, we'll be able to proceed with the graphical desktop. Of course, there are lots of other ways to run out of memory, but I wanted to mention this as a design consideration for ELKS that should pay dividends. It would be nice to decide on a somewhat-standard "ELKS" image format, so we don't have to write tons of decoders, and possibly then use image translator tools to convert to that standard format. While BMP allows for easy scanline decoding, it's pretty old. Microwindows also supports the super-old PNM/PBM/PGM/PPM formats, though there aren't many tools for converting them (the NX launcher icons are all in PGM or PPM format).
What are your thoughts on this? Have you looked deeper into some of the elks-viewer/ decoders to see whether they might be able to handle single-scanline decoding, or are you having to decode images entirely into memory before displaying?
PNG needs deflate and so is pretty memory-intensive; this is why I'm holding off a bit on working on PNG.
Cool! It is certainly a design consideration. I used it for elks-viewer.
All the decoders are single-scanline decoders (BMP, PGM and PPM), while the JPEG one works per macroblock. They are pretty small and use very little memory. The ELKS image format, in my opinion, should be BMP (with RLE support, if possible! : ) ). It supports 1-bit, 4-bit and 8-bit palette modes (including RLE encoding), and also higher color modes.
And I also like @toncho11's idea of a ramdisk on XMS. Userland which already supports a swapfile would be ready to use the XMS.
TLVC had many commits recently for adding support for HMA: https://github.com/Mellvik/TLVC/commits/master/
Hello! I was wondering if anyone had any tips/pointers on where to start porting ELKS to a new architecture. I love this port of Linux and its very minimal resource usage, and I would love to use it regularly for development. However, since it is currently only for IA16, I would have to use an emulator to do so. Any information would be greatly appreciated. Thank you in advance!