Memory and Virtual Memory
This chapter describes allocating memory and the low-level routines for modifying memory maps in the kernel. It also describes a number of commonly used interfaces to the virtual memory system. It does not describe how to make changes in paging policy or add additional pagers. OS X does not support external pagers, although much of the functionality can be achieved in other ways, some of which are covered at a high level in this chapter. The implementation details of these interfaces are subject to change, however, and are thus left undocumented.
With the exception of the section Allocating Memory in the Kernel, this chapter is of interest only if you are writing file systems or are modifying the virtual memory system itself.
OS X VM Overview
The VM system used in OS X is a descendant of Mach VM, which was created at Carnegie Mellon University in the 1980s. To a large extent, the fundamental design is the same, although some of the details are different, particularly in the areas where the VM system has been enhanced. It does, however, support the ability to request certain paging behavior through the use of universal page lists (UPLs). See Universal Page Lists (UPLs) for more information.
The design of Mach VM centers around the concept of physical memory being a cache for virtual memory.
At its highest level, Mach VM consists of address spaces and ways to manipulate the contents of those address spaces from outside the space. These address spaces are sparse and have a notion of protections to limit what tasks can access their contents.
At a lower level, the object level, virtual memory is seen as a collection of VM objects and memory objects, each with a particular owner and protections. These objects can be modified with object calls that are available both to the task and (via the back end of the VM) to the pagers.
The VM object is internal to the virtual memory system, and includes basic information about accessing the memory. The memory object, by contrast, is provided by the pager. The contents of the memory associated with that memory object can be retrieved from disk or some other backing store by exchanging messages with the memory object. Implicitly, each VM object is associated with a given pager through its memory object.
VM objects are cached with system pages (RAM), which can be any power of two multiple of the hardware page size. In the OS X kernel, system pages are the same size as hardware pages. Each system page is represented in a given address space by a map entry. Each map entry has its own protection and inheritance. A given map entry can have an inheritance of shared, copy, or none. If a page is marked shared in a given map, child tasks share this page for reading and writing. If a page is marked copy, child tasks get a copy of this page (using copy-on-write). If a page is marked none, the child’s page is left unallocated.
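A user task can set the inheritance of a range in its own map with the vm_inherit call. The following is a minimal sketch, assuming addr and size describe an already-mapped range and using an illustrative helper name:
#include <mach/mach.h>

/* Mark a range so that child tasks created by fork receive a
 * copy-on-write copy rather than sharing it for reading and writing. */
kern_return_t set_copy_inheritance(vm_address_t addr, vm_size_t size)
{
    return vm_inherit(mach_task_self(), addr, size, VM_INHERIT_COPY);
}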
VM objects are managed by the machine-independent VM system, with the underlying virtual to physical mappings handled by the machine-dependent pmap system. The pmap system actually handles page tables, translation lookaside buffers, segments, and so on, depending on the design of the underlying hardware.
When a VM object is duplicated (for example, the data pages from a process that has just called fork), a shadow object is created. A shadow object is initially empty, and contains a reference to another object. When the contents of a page are modified, the page is copied from the parent object into the shadow object and then modified. When reading data from a page, if that page exists in the shadow object, the page listed in the shadow object is used. If the shadow object has no copy of that page, the original object is consulted. A series of shadow objects pointing to shadow objects or original objects is known as a shadow chain.
Shadow chains can become arbitrarily long if an object is heavily reused in a copy-on-write fashion. However, since fork is frequently followed by exec, which replaces all of the material being shadowed, long chains are rare. Further, Mach automatically garbage collects shadow objects, removing any intermediate shadow objects whose pages are no longer referenced by any (nondefunct) shadow object. It is even possible for the original object to be released if it no longer contains pages that are relevant to the chain.
The VM calls available to an application include vm_map and vm_allocate, which can be used to map file data or anonymous memory into the address space. This is possible only because the address space is initially sparse. In general, an application can either map a file into its address space (through file mapping primitives, abstracted by BSD) or it can map an object (after being passed a handle to that object). In addition, a task can change the protections of the objects in its address space and can share those objects with other tasks.
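As a simple illustration of the allocation side, the following user-space sketch (with an illustrative function name) allocates one page of anonymous memory with vm_allocate and releases it with vm_deallocate:
#include <mach/mach.h>

/* Allocate one page of anonymous, zero-filled memory anywhere in this
 * task's address space, then release it again. */
kern_return_t allocate_and_free_page(void)
{
    vm_address_t addr = 0;
    kern_return_t kr;

    kr = vm_allocate(mach_task_self(), &addr, vm_page_size, VM_FLAGS_ANYWHERE);
    if (kr != KERN_SUCCESS)
        return kr;

    /* ... use the memory ... */

    return vm_deallocate(mach_task_self(), addr, vm_page_size);
}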
In addition to the mapping and allocation aspects of virtual memory, the VM system contains a number of other subsystems. These include the back end (pagers) and the shared memory subsystem. There are also other subsystems closely tied to VM, including the VM shared memory server. These are described in Other VM and VM-Related Subsystems.
Memory Maps Explained
Each Mach task has its own memory map. In Mach, this memory map takes the form of an ordered doubly linked list. As described in OS X VM Overview, each of these objects contains a list of pages and shadow references to other objects.
In general, you should never need to access a memory map directly unless you are modifying something deep within the VM system. The vm_map_entry structure contains task-specific information about an individual mapping along with a reference to the backing object. In essence, it is the glue between a VM object and a VM map.
While the details of this data structure are beyond the scope of this document, a few fields are of particular importance.
The field is_submap is a Boolean value that tells whether this map entry is a normal VM object or a submap.
A submap is a collection of mappings that is part of a larger map. Submaps are often used to group mappings together for the purpose of sharing them among multiple Mach tasks, but they may be used for many purposes. What makes a submap particularly powerful is that when several tasks have mapped a submap into their address space, they can see each other’s changes, not only to the contents of the objects in the map, but to the objects themselves. This means that as additional objects are added to or deleted from the submap, they appear in or disappear from the address spaces of all tasks that share that submap.
The field behavior controls the paging reference behavior of a specified range in a given map. This value changes how pageins are clustered. Possible values are VM_BEHAVIOR_DEFAULT, VM_BEHAVIOR_RANDOM, VM_BEHAVIOR_SEQUENTIAL, and VM_BEHAVIOR_RSEQNTL, for default, random, sequential, or reverse-sequential pagein ordering.
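The user-level call that sets this behavior for a range of an address space is vm_behavior_set; you do not modify the field directly. A minimal sketch, assuming addr and size describe an existing mapping:
#include <mach/mach.h>
#include <mach/vm_behavior.h>

/* Hint that this range will be read sequentially, so pageins for it
 * are clustered accordingly. */
kern_return_t hint_sequential(vm_address_t addr, vm_size_t size)
{
    return vm_behavior_set(mach_task_self(), addr, size,
                           VM_BEHAVIOR_SEQUENTIAL);
}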
The protection and max_protection fields control the permissions on the object. The protection field indicates what rights the task currently has for the object, while the max_protection field contains the maximum access that the current task can obtain for the object.
You might use the protection field when debugging shared memory. By setting the protection to be read-only, any inadvertent writes to the shared memory would cause an exception. However, when the task actually needs to write to that shared region, it could increase its permissions in the protection field to allow writes.
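From user space, this technique maps onto the vm_protect call. The following sketch assumes shared_addr and shared_size describe an already-mapped shared region, and the helper name is illustrative:
#include <mach/mach.h>

/* Debugging aid: make a shared region read-only so that stray writes
 * fault, then restore write access just before an intentional update. */
void guard_shared_region(vm_address_t shared_addr, vm_size_t shared_size)
{
    /* set_maximum = FALSE: change only the current protection. */
    vm_protect(mach_task_self(), shared_addr, shared_size,
               FALSE, VM_PROT_READ);

    /* ... any inadvertent write here raises an exception ... */

    vm_protect(mach_task_self(), shared_addr, shared_size,
               FALSE, VM_PROT_READ | VM_PROT_WRITE);
}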
It would be a security hole if a task could increase its own permissions on a memory object arbitrarily, however. In order to preserve a reasonable security model, the task that owns a memory object must be able to limit the rights granted to a subordinate task. For this reason, a task is not allowed to increase its protection beyond the permissions granted in max_protection.
Possible values for protection and max_protection are described in detail in xnu/osfmk/mach/vm_prot.h.
Finally, the use_pmap field indicates whether a submap’s low-level mappings should be shared among all tasks into which the submap is mapped. If the mappings are not shared, then the structure of the map is shared among all tasks, but the actual contents of the pages are not.
For example, shared libraries are handled with two submaps. The read-only shared code section has use_pmap set to true. The read-write (nonshared) section has use_pmap set to false, forcing a clean copy of the library’s DATA segment to be mapped in from disk for each new task.
Named Entries
The OS X VM system provides an abstraction known as a named entry. A named entry is nothing more than a handle to a shared object or a submap.
Shared memory support in OS X is achieved by sharing objects between the memory maps of various tasks. Shared memory objects must be created from existing VM objects by calling vm_allocate to allocate memory in your address space and then calling mach_make_memory_entry_64 to get a handle to the underlying VM object.
The handle returned by mach_make_memory_entry_64 can be passed to vm_map to map that object into a given task’s address space. The handle can also be passed via IPC or other means to other tasks so that they can map it into their address spaces. This provides the ability to share objects with tasks that are not in your direct lineage, and also allows you to share additional memory with tasks in your direct lineage after those tasks are created.
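A minimal sketch of this sequence for a single page follows; the helper name is illustrative, and in a real program the returned handle would typically be sent to another task over IPC before that task calls vm_map on it:
#include <mach/mach.h>

/* Allocate a page, obtain a named-entry handle for its VM object, and
 * map the same object at a second address in this task. */
kern_return_t share_one_page(mach_port_t *handle_out)
{
    vm_address_t          addr   = 0;
    vm_address_t          mapped = 0;
    memory_object_size_t  size   = vm_page_size;
    kern_return_t         kr;

    kr = vm_allocate(mach_task_self(), &addr, vm_page_size, VM_FLAGS_ANYWHERE);
    if (kr != KERN_SUCCESS)
        return kr;

    /* Get a handle (named entry) for the underlying VM object. */
    kr = mach_make_memory_entry_64(mach_task_self(), &size, addr,
                                   VM_PROT_READ | VM_PROT_WRITE,
                                   handle_out, MACH_PORT_NULL);
    if (kr != KERN_SUCCESS)
        return kr;

    /* Map the same object at another address in this address space. */
    return vm_map(mach_task_self(), &mapped, vm_page_size, 0,
                  VM_FLAGS_ANYWHERE, *handle_out, 0, FALSE,
                  VM_PROT_READ | VM_PROT_WRITE,
                  VM_PROT_READ | VM_PROT_WRITE, VM_INHERIT_SHARE);
}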
The other form of named entry, the submap, is used to group a set of mappings. The most common use of a submap is to share mappings among multiple Mach tasks. A submap can be created with vm_region_object_create.
What makes a submap particularly powerful is that when several tasks have mapped a submap into their address space, they can see each other’s changes to both the data and the structure of the map. This means that one task can map or unmap a VM object in another task’s address space simply by mapping or unmapping that object in the submap.
Universal Page Lists (UPLs)
A universal page list, or UPL, is a data structure used when communicating with the virtual memory system. UPLs can be used to change the behavior of pages with respect to caching, permissions, mapping, and so on. UPLs can also be used to push data into and pull data from VM objects. The term is also often used to refer to the family of routines that operate on UPLs. The flags used when dealing with UPLs are described in osfmk/mach/memory_object_types.h.
The life cycle of a UPL looks like this:
A UPL is created based on the contents of a VM object. This UPL includes information about the pages within that object.
That UPL is modified in some way.
The changes to the UPL are either committed (pushed back to the VM system) or aborted, with ubc_upl_commit or ubc_upl_abort, respectively.
If you have a control handle for a given VM object (which generally means that you are inside a pager), you can use vm_object_upl_request to get a UPL for that object. Otherwise, you must use the vm_map_get_upl call. In either case, you are left with a handle to the UPL.
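Once you hold such a handle, the commit-or-abort step of the life cycle is straightforward. This sketch assumes the ubc_upl_commit and ubc_upl_abort wrappers mentioned above and a upl_t that was already obtained (for example, inside a pager); the helper name is hypothetical:
#include <sys/ubc.h>

/* Finish a UPL transaction by either pushing the changes back to the
 * VM system or throwing them away on error. */
static kern_return_t finish_upl(upl_t upl, boolean_t success)
{
    if (success)
        return ubc_upl_commit(upl);          /* commit every page in the UPL */

    /* UPL_ABORT_ERROR tells the VM system the data was never supplied. */
    return ubc_upl_abort(upl, UPL_ABORT_ERROR);
}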
When a pagein is requested, the pager receives a list of pages that are locked against the object, with certain pages set to not valid. The pager must either write data into those pages or must abort the transaction to prevent invalid data in the kernel. Similarly in pageout, the kernel must write the data to a backing store or abort the transaction to prevent data loss. The pager may also elect to bring additional pages into memory or throw additional pages out of memory at its discretion.
Because pagers can be used both for virtual memory and for memory mapping of file data, when a pageout is requested, the data may need to be freed from memory, or it may be desirable to keep it there and simply flush the changes to disk. For this reason, the flag UPL_CLEAN_IN_PLACE exists to allow a page to be flushed to disk but not removed from memory.
When a pager decides to page in or out additional pages, it must determine which pages to move. A pager can request all of the dirty pages by setting the RETURN_ONLY_DIRTY flag. It can also request all pages that are not in memory using the RETURN_ONLY_ABSENT flag.
There is a slight problem, however. If a given page is marked as BUSY in the UPL, a request for information on that page would normally block. If the pager is doing prefetching or preflushing, this is not desirable, since it might be blocking on itself or on some other pager that is blocked waiting for the current transaction to complete. To avoid such deadlock, the UPL mechanism provides the UPL_NOBLOCK flag. This is frequently used in the anonymous pager for requesting free memory.
The flag QUERY_OBJECT_TYPE can be used to determine if an object is physically contiguous and to get other properties of the underlying object.
The flag UPL_PRECIOUS means that there should be only one copy of the data. This prevents having a copy both in memory and in the backing store. However, this breaks the adjacency of adjacent pages in the backing store, and is thus generally not used, in order to avoid a performance hit.
The flag SET_INTERNAL is used by the BSD subsystem to cause all information about a UPL to be contained in a single memory object so that it can be passed around more easily. It can only be used if your code is running in the kernel’s address space.
Since this handle can be used for multiple small transactions (for example, when mapping a file into memory block-by-block), the UPL API includes functions for committing and aborting changes to only a portion of the UPL. These functions are upl_commit_range and upl_abort_range, respectively.
To aid in the use of UPLs for handling multi-part transactions, the upl_commit_range and upl_abort_range calls have a flag that causes the UPL to be freed when there are no unmodified pages in the UPL. If you use this flag, you must be very careful not to use the UPL after all ranges have been committed or aborted.
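For example, a file system working through a large UPL block by block might commit each block as it finishes and let the UPL free itself once every page has been handled. This sketch assumes the ubc_upl_commit_range wrapper and the UPL_COMMIT_FREE_ON_EMPTY flag:
#include <sys/ubc.h>

/* Commit one block's worth of pages within a larger UPL. With the
 * FREE_ON_EMPTY flag, the UPL is released automatically once the last
 * range has been committed or aborted, so it must not be used again. */
static kern_return_t commit_block(upl_t upl, vm_offset_t offset, vm_size_t size)
{
    return ubc_upl_commit_range(upl, offset, size, UPL_COMMIT_FREE_ON_EMPTY);
}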
Finally, the function vm_map_get_upl is frequently used in file systems. It gets the underlying VM object associated with a given range within an address space. Since this returns only the first object in that range, it is your responsibility to determine whether the entire range is covered by the resulting UPL and, if not, to make additional calls to get UPLs for other objects. Note that while the vm_map_get_upl call is against an address space range, most UPL calls are against a vm_object.
Using Mach Memory Maps
From the context of the kernel (not from a KEXT), there are two maps that you will probably need to deal with. The first is the kernel map. Since your code is executing in the kernel’s address space, no additional effort is needed to use memory referenced in the kernel map. However, you may need to add additional mappings into the kernel map and remove them when they are no longer needed.
The second map of interest is the memory map for a given task. This is of most interest for code that accepts input from user programs, for example a sysctl or a Mach RPC handler. In nearly all cases, convenient wrappers provide the needed functionality, however.
Most of these functions are based around the vm_offset_t type, which is a pointer-sized integer. In effect, you can think of them as pointers, with the caveat that they are not necessarily pointers to data in the kernel’s address space, depending on usage.
The low-level VM map API includes the following functions:
kern_return_t vm_map_copyin(vm_map_t src_map, vm_offset_t src_addr,
                vm_size_t len, boolean_t src_destroy,
                vm_map_copy_t *copy_result);
kern_return_t vm_map_copyout(vm_map_t map, vm_offset_t *addr, /* Out */
                register vm_map_copy_t copy);
kern_return_t vm_map_copy_overwrite(vm_map_t dst_map,
                vm_offset_t dst_address, vm_map_copy_t copy,
                boolean_t interruptible, pmap_t pmap);
void vm_map_copy_discard(vm_map_copy_t copy);
void vm_map_wire(vm_map_t map, vm_offset_t start, vm_offset_t end,
                vm_prot_t access_type, boolean_t user_wire);
void vm_map_unwire(vm_map_t map, vm_offset_t start, vm_offset_t end,
                boolean_t user_wire);
The function vm_map_copyin copies data from an arbitrary (potentially non–kernel) memory map into a copy list and returns the copy list pointer in copy_result. If something goes wrong and you need to throw away this intermediate object, it should be freed with vm_map_copy_discard.
In order to actually get the data from the copy list, you need to overwrite a memory object in the kernel’s address space with vm_map_copy_overwrite. This overwrites an object with the contents of a copy list. For most purposes, the value passed for interruptible should be FALSE, and pmap should be NULL.
Copying data from the kernel to user space is exactly the same as copying data from user space, except that you pass kernel_map to vm_map_copyin and pass the user map to vm_map_copy_overwrite. In general, however, you should avoid doing this, since you could end up with a task’s memory being fragmented into lots of tiny objects, which is undesirable.
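Putting the pieces together, a minimal sketch of copying a buffer out of a user task’s map into an existing kernel buffer, following the prototypes listed above, might look like this; the helper name is illustrative, and user_map, user_addr, kernel_addr, and len are assumed to be valid:
/* Copy len bytes from user_addr in user_map into a preallocated
 * kernel buffer at kernel_addr, using the copy-list interface. */
kern_return_t copy_from_user_map(vm_map_t user_map, vm_offset_t user_addr,
                                 vm_offset_t kernel_addr, vm_size_t len)
{
    vm_map_copy_t copy;
    kern_return_t kr;

    /* Capture the source pages in a copy list (src_destroy = FALSE). */
    kr = vm_map_copyin(user_map, user_addr, len, FALSE, &copy);
    if (kr != KERN_SUCCESS)
        return kr;

    /* Overwrite the existing kernel region with the copy list.
     * interruptible is FALSE and pmap is NULL, as recommended above. */
    kr = vm_map_copy_overwrite(kernel_map, kernel_addr, copy, FALSE, NULL);
    if (kr != KERN_SUCCESS)
        vm_map_copy_discard(copy);   /* throw away the intermediate object */

    return kr;
}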
Do not use vm_map_copyout when copying data into an existing user task’s address map. The function vm_map_copyout is used for filling an unused region in an address map. If the region is allocated, then vm_map_copyout does nothing. Because it requires knowledge of the current state of the map, it is primarily used when creating a new address map (for example, if you are manually creating a new process). For most purposes, you do not need to use vm_map_copyout.
The functions vm_map_wire and vm_map_unwire can be used to wire and unwire portions of an address map. If you set the argument user_wire to TRUE, then the page can be unwired from user space. This should be set to FALSE if you are about to use the memory for I/O or for some other operation that cannot tolerate paging. In vm_map_wire, the argument access_type indicates the types of accesses that should not be allowed to generate a page fault. In general, however, you should be using vm_wire to wire memory.
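As a sketch based on the prototypes above, wiring a kernel buffer around an I/O operation could look like the following; buf and len are assumed to be valid and page-aligned, and the helper name is illustrative:
/* Wire a kernel-map range for the duration of an I/O operation.
 * user_wire is FALSE because the wiring must not be removable from
 * user space while the I/O is in flight. */
void wire_for_io(vm_offset_t buf, vm_size_t len)
{
    vm_map_wire(kernel_map, buf, buf + len,
                VM_PROT_READ | VM_PROT_WRITE, FALSE);

    /* ... perform the I/O against the wired range ... */

    vm_map_unwire(kernel_map, buf, buf + len, FALSE);
}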
As mentioned earlier, this information is presented strictly for use in the heart of the kernel. You cannot use anything in this section from a kernel extension.
Other VM and VM-Related Subsystems
There are two additional VM subsystems: pagers and the working set detection subsystem. In addition, the VM shared memory server subsystem is closely tied to (but is not part of) the VM subsystem. This section describes these three VM and VM-related subsystems.
Pagers
OS X has three basic pagers: the vnode pager, the default pager (or anonymous pager), and the device pager. These are used by the VM system to actually get data into the VM objects that underlie named entries. Pagers are linked into the VM system through a combination of a subset of the old Mach pager interface and UPLs.
The default pager is what most people think of when they think of a VM system. It is responsible for moving normal data into and out of the backing store. In addition, there is a facility known as the dynamic pager that sits on top of the default pager and handles the creation and deletion of backing store files. These pager files are filled with data in clusters (groups of pages).
When the total fullness of the paging file pool reaches a high water mark, the default pager asks the dynamic pager to allocate a new store file. When the pool drops below its low water mark, the VM system selects a pager file, moves its contents into other pager files, and deletes it from disk.
The vnode pager has a 1:1 (onto) mapping between objects in VM space and open files (vnodes). It is used for memory mapped file I/O. The vnode pager is generally hidden behind calls to BSD file APIs.
The device pager allows you to map non–general-purpose memory with the cache characteristics required for that memory (WIMG). Non–general–purpose memory includes physical addresses that are mapped onto hardware other than main memory—for example, PCI memory, frame buffer memory, and so on. The device pager is generally hidden behind calls to various I/O Kit functions.
Working Set Detection Subsystem
To improve performance, OS X has a subsystem known as the working set detection subsystem. This subsystem is called on a VM fault; it keeps a profile of the fault behavior of each task from the time of its inception. In addition, just before a page request, the fault code asks this subsystem which adjacent pages should be brought in, and then makes a single large request to the pager.
Since files on disk tend to have fairly good locality, and since address space locality is largely preserved in the backing store, this provides a substantial performance boost. Also, since it is based upon the application’s previous behavior, it tends to pull in pages that would probably have otherwise been needed later. This occurs for all pagers.
The working set code works well once it is established. However, without help, its performance would be the baseline performance until a profile for a given application has been developed. To overcome this, the first time that an application is launched in a given user context, the initial working set required to start the application is captured and stored in a file. From then on, when the application is started, that file is used to seed the working set.
These working set files are established on a per-user basis. They are stored in /var/vm/app_profile and are only accessible by the super-user (and the kernel).
VM Shared Memory Server Subsystem
The VM shared memory server subsystem is a BSD service that is closely tied to VM, but is not part of VM. This server provides two submaps that are used for shared library support in OS X. Because shared libraries contain both read-only portions (text segment) and read-write portions (data segment), the two portions are treated separately to maximize efficiency. The read-only portions are completely shared between tasks, including the underlying pmap entries. The read-write portions share a common submap, but have different underlying data objects (achieved through copy-on-write).
The three functions exported by the VM shared memory server subsystem should only be called by dyld. Do not use them in your programs.
The function load_shared_file is used to load a new shared library into the system. Once such a file is loaded, other tasks can then depend on it, so a shared library cannot be unshared. However, a new set of shared regions can be created with new_system_shared_regions so that no new tasks will use old libraries.
The function reset_shared_file can be used to reset any changes that your task may have made to its private copy of the data section for a file.
Finally, the function new_system_shared_regions can be used to create a new set of shared regions for future tasks. New regions can be used when updating prebinding with new shared libraries to cause new tasks to see the latest libraries at their new locations in memory. (Users of old shared libraries will still work, but they will fall off the pre-bound path and will perform less efficiently.) It can also be used when dealing with private libraries that you want to share only with your task’s descendants.
Address Spaces
This section explains issues that some developers may see when using their drivers in Panther or later. These changes were necessitated by a combination of hardware and underlying OS changes; however, you may see problems resulting from the changes even on existing hardware.
There are three basic areas of change in OS X v10.3. These are:
IOMemoryDescriptor changes
VM system (pmap) changes
Kernel dependency changes
These are described in detail in the sections that follow.
Background Info on PCI Address Translation
To allow existing device drivers to work with upcoming 64-bit system architectures, a number of changes were required. To explain these, a brief introduction to PCI bus bridges is needed.
When a PCI device needs to perform a data transaction to or from main memory, the device driver calls a series of functions intended to prepare this memory for I/O. In an architecture where both the device drivers and the memory subsystem use 32-bit addressing, everything just works, so long as the memory doesn't get paged out during the I/O operation. As kernel memory is generally not pageable, the preparation is largely superfluous.
On a system whose memory subsystem uses 64-bit addressing, however, this becomes a bit of a problem. Because the hardware devices on the PCI bus can only handle 32-bit addresses, the device can only “see” a 4 gigabyte aperture into the (potentially much larger) main memory at any given time.
There are two possible solutions for this problem. The easy (but slow) solution would be to use “bounce buffers”. In such a design, device drivers would copy data into memory specifically allocated within the bottom 4 gigs of memory. However, this incurs a performance penalty and also puts additional constraints on the lower 4 gigs of memory, causing numerous problems for the VM system.
The other solution, the one chosen in Apple's 64-bit implementation, is to use address translation to “map” blocks of memory into the 32-bit address space of the PCI devices. While the PCI device can still only see a 4 gig aperture, that aperture can then be non-contiguous, and thus bounce buffers and other restrictions are unnecessary. This address translation is done using a part of the memory controller known as DART, which stands for Device Address Resolution Table.
This introduces a number of potential problems, however. First, physical addresses as seen by the processor no longer map 1:1 onto the addresses as seen by PCI devices. Thus, a new term, I/O addresses, is introduced to describe this new view. Because I/O addresses and physical addresses are no longer the same, the DART must keep a table of translations to use when mapping between them. Fortunately, if your driver is written according to Apple guidelines (using only documented APIs), this process is handled transparently.
IOMemoryDescriptor Changes
When your driver calls IOMemoryDescriptor::prepare, a mapping is automatically injected into the DART. When it calls IOMemoryDescriptor::release, the mapping is removed. If you fail to do this, your driver could experience random data corruption or panics.
Because the DART requires different caching for reads and writes, the DMA direction is important on hardware that includes a DART. While you may receive random failures if the direction is wrong in general (on any system), if you attempt to call WriteBytes on a memory region whose DMA direction is set up for reading, you will cause a kernel panic on 64-bit hardware.
If you attempt to perform a DMA transaction to unwired (user) memory, on previous systems, you would only get random crashes, panics, and data corruption. On machines with a DART, you will likely get no data whatsoever.
As a side-effect of changes in the memory subsystem, OS X is much more likely to return physically contiguous page ranges in memory regions. Historically, OS X returned multi-page memory regions in reverse order, starting with the last page and moving towards the first page. The result of this was that multi-page memory regions essentially never had a contiguous range of physical pages.
Because of the increased probability of seeing physically contiguous blocks of memory in a memory region, this change may expose latent bugs in some drivers that only show up when handling contiguous ranges of physical pages, which could result in incorrect behavior or panics.
Note that the problems mentioned above are caused by bugs in the drivers, and could result in problems on older hardware prior to Panther. These issues are more likely to occur in Panther and later versions of OS X, however, because of the new hardware designs and the OS changes that were made to support those designs.
VM System and pmap Changes
In Panther, as a result of the changes described in detail in the section on PCI address translation, physical addresses obtained directly from the pmap layer have no useful purpose outside the VM system itself. To prevent their inadvertent use in device drivers, the pmap calls are no longer available from kernel extensions.
A few drivers written prior to the addition of the IOMemoryDescriptor class still use pmap calls to get the physical pages associated with a virtual address. Also, a few developers have looked at the IOMemoryDescriptor implementation and chosen to obtain addresses directly from the pmap layer to remove what was perceived as an unnecessary abstraction layer.
Even without removing access to the pmap calls, these drivers would not function on systems with a DART (see the PCI section above for info on DARTs). To better emphasize this upcoming failure, Panther will cause these drivers to fail to load with an undefined symbol error (generally for pmap_extract) even on systems without a DART.
Kernel Dependency Changes
Beginning in Panther, device drivers that declare a dependency on version 7 (the Panther version) of the I/O Kit will no longer automatically get symbols from Mach and BSD. This change was made to discourage I/O Kit developers from relying on symbols that are not explicitly approved for use in the I/O Kit.
Existing drivers are unaffected by this change. This change only affects you if you explicitly modify your device driver to declare a dependency on version 7 of the I/O Kit to take advantage of new I/O Kit features.
Summary
As described above, some device drivers may require minor modifications to support Panther and higher. Apple has made every effort to ensure compatibility with existing device drivers to the greatest extent possible, but a few drivers may break. If your driver breaks, you should first check to see if your driver includes any of the bugs described in the previous sections. If it does not, contact Apple Developer Technical Support for additional debugging suggestions.
Allocating Memory in the Kernel
As with most things in the OS X kernel, there are a number of ways to allocate memory. The choice of routines depends both on the location of the calling routine and on the reason for allocating memory. In general, you should use Mach routines for allocating memory unless you are writing code for use in the I/O Kit, in which case you should use I/O Kit routines.
Allocating Memory From a Non-I/O-Kit Kernel Extension
The <libkern/OSMalloc.h> header defines the following routines for kernel memory allocation:
OSMalloc—allocates a block of memory.
OSMalloc_noblock—allocates a block of memory, but immediately returns NULL if the request would block.
OSMalloc_nowait—same as OSMalloc_noblock.
OSFree—releases memory allocated with any of the OSMalloc variants.
OSMalloc_Tagalloc—allows you to create a unique tag for your memory allocations. You must create at least one tag before you can use any of the OSMalloc functions.
OSMalloc_Tagfree—releases a tag allocated with OSMalloc_Tagalloc. (You must release all allocations associated with that tag before you call this function.)
For example, to allocate and free a page of wired memory, you might write code like this:
#include <libkern/OSMalloc.h>
#define MYTAGNAME "com.apple.mytag"
...
OSMallocTag mytag = OSMalloc_Tagalloc(MYTAGNAME, OSMT_DEFAULT);
void *datablock = OSMalloc(PAGE_SIZE_64, mytag);
...
OSFree(datablock, PAGE_SIZE_64, mytag);
To allocate a page of pageable memory, pass OSMT_PAGEABLE instead of OSMT_DEFAULT in your call to OSMalloc_Tagalloc.
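For example, a pageable allocation (with a hypothetical tag name) might look like the following; memory allocated against a pageable tag can be paged out, so it must not be touched at interrupt time or used for DMA:
#include <libkern/OSMalloc.h>
...
OSMallocTag pagetag = OSMalloc_Tagalloc("com.example.pageable", OSMT_PAGEABLE);
void *buffer = OSMalloc(PAGE_SIZE_64, pagetag);
...
OSFree(buffer, PAGE_SIZE_64, pagetag);
OSMalloc_Tagfree(pagetag);   /* only after every allocation against the tag is freed */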
Allocating Memory From the I/O Kit
Although the I/O Kit is generally beyond the scope of this document, the I/O Kit memory management routines are presented here for completeness. In general, I/O Kit routines should not be used outside the I/O Kit. Similarly, Mach allocation routines should not be directly used from the I/O Kit because the I/O Kit has abstractions for those routines that fit the I/O Kit development model more closely.
The I/O Kit includes the following routines for kernel memory allocation:
void *IOMalloc(vm_size_t size);
void *IOMallocAligned(vm_size_t size, vm_size_t alignment);
void *IOMallocContiguous(vm_size_t size, vm_size_t alignment,
                IOPhysicalAddress *physicalAddress);
void *IOMallocPageable(vm_size_t size, vm_size_t alignment);
void IOFree(void *address, vm_size_t size);
void IOFreeAligned(void *address, vm_size_t size);
void IOFreeContiguous(void *address, vm_size_t size);
void IOFreePageable(void *address, vm_size_t size);
Most of these routines are relatively transparent wrappers around the Mach allocation functions. There are two major differences, however. First, the caller does not need to know which memory map is being modified. Second, they have a separate free call for each allocation call for internal bookkeeping reasons.
The functions IOMallocContiguous and IOMallocAligned differ somewhat from their Mach underpinnings. IOMallocAligned calls Mach VM directly to support arbitrary (power of 2) data alignment, rather than aligning based on the size of the object. IOMallocContiguous adds an additional parameter, physicalAddress. If this pointer is not NULL, the physical address is returned through it. With the Mach functions, obtaining the physical address requires a separate function call.
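For example, the following sketch (with illustrative variable names) allocates a page-aligned, physically contiguous buffer and retrieves its physical address in the same call:
#include <IOKit/IOLib.h>

IOPhysicalAddress physAddr = 0;

/* Allocate one physically contiguous, page-aligned page and get its
 * physical address without a separate lookup call. */
void *dmaBuffer = IOMallocContiguous(PAGE_SIZE, PAGE_SIZE, &physAddr);
if (dmaBuffer != NULL) {
    /* ... program the device with physAddr; access the buffer through dmaBuffer ... */
    IOFreeContiguous(dmaBuffer, PAGE_SIZE);
}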
Allocating Memory In the Kernel Itself
In addition to the routines available to kernel extensions, there are a number of other functions you can call to allocate memory when you are modifying the Mach kernel itself. Mach routines provide a relatively straightforward interface for allocating and releasing memory. They are the preferred mechanism for allocating memory outside of the I/O Kit. BSD also offers _MALLOC and _FREE, which may be used in BSD parts of the kernel.
These routines do not provide for forced mapping of a given physical address to a virtual address. However, if you need such a mapping, you are probably writing a device driver, in which case you should be using I/O Kit routines instead of Mach routines.
Most of these functions are based around the vm_offset_t type, which is a pointer-sized integer. In effect, you can think of them as pointers, with the caveat that they are not necessarily pointers to data in the kernel’s address space, depending on usage.
These are some of the commonly used Mach routines for allocating memory:
kern_return_t kmem_alloc(vm_map_t map, vm_offset_t *addrp, vm_size_t size);
void kmem_free(vm_map_t map, vm_offset_t addr, vm_size_t size);
kern_return_t kmem_alloc_aligned(vm_map_t map, vm_offset_t *addrp,
                vm_size_t size);
kern_return_t kmem_alloc_wired(vm_map_t map, vm_offset_t *addrp,
                vm_size_t size);
kern_return_t kmem_alloc_pageable(vm_map_t map, vm_offset_t *addrp,
                vm_size_t size);
kern_return_t kmem_alloc_contig(vm_map_t map, vm_offset_t *addrp,
                vm_size_t size, vm_offset_t mask, int flags);
These functions all take a map as the first argument. Unless you need to allocate memory in a different map, you should pass kernel_map for this argument.
All of the kmem_alloc functions except kmem_alloc_pageable allocate wired memory. The function kmem_alloc_pageable creates the appropriate VM structures but does not back the region with physical memory. This function could be combined with vm_map_copyout when creating a new address map, for example. In practice, it is rarely used.
The function kmem_alloc_aligned allocates memory aligned according to the value of the size argument, which must be a power of 2.
The function kmem_alloc_wired is synonymous with kmem_alloc and is appropriate for data structures that cannot be paged out. It is not strictly necessary; however, if you explicitly need certain pieces of data to be wired, using kmem_alloc_wired makes it easier to find those portions of your code.
The function kmem_alloc_contig attempts to allocate a block of physically contiguous memory. This is not always possible, and requires a full sort of the system free list even for short allocations. After startup, this sort can cause long delays, particularly on systems with lots of RAM. You should generally not use this function.
The function kmem_free is used to free an object allocated with one of the kmem_alloc functions. Unlike the standard C free function, kmem_free requires the length of the object. If you are not allocating fixed-size objects (for example, sizeof(struct foo)), you may have to do some additional bookkeeping, since you must free an entire object, not just a portion of one.
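As a brief sketch based on the prototypes above, allocating and freeing one page of wired memory in the kernel map looks like this:
vm_offset_t addr = 0;
kern_return_t kr = kmem_alloc(kernel_map, &addr, PAGE_SIZE);
if (kr == KERN_SUCCESS) {
    /* ... use the page ... */
    kmem_free(kernel_map, addr, PAGE_SIZE);   /* kmem_free needs the original size */
}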
Copyright © 2002, 2013 Apple Inc. All Rights Reserved. Updated: 2013-08-08