1 Zulkiramar

Memcpy Vs Assignment Pointers In Java

References might be implemented by storing the address.Usually Java references will be implemented as pointers, but that's not required by the specification. They may be using an additional layer of indirection to enable easier garbage collection. But in the end it will (almost always) boil down to (C-style) pointers being involved in the implementation of (Java-style) references.

You can't do pointer arithmetic with references. The most important difference between a pointer in C and a reference in Java is that you can't actually get to (and manipulate) the underlying value of a reference in Java. In other words: you can't do pointer arithmetic.

In C you can add something to a pointer (i.e. the address) or substract something to point to things that are "nearby" or point to places that are at any place.

In Java, a reference points to one thing and that thing only. You can make a variable hold a different reference, but you can't just ask it to point to "the thing after the original thing".

References are strongly typed. Another difference is that the type of a reference is much more strictly controlled in Java than the type of a pointer is in C. In C you can have an and cast it to a and just re-interpret the memory at that location. That re-interpretation doesn't work in Java: you can only interpret the object at the other end of the reference as something that it already is (i.e. you can cast a reference to reference only if the object pointed to is actually a ).

Those differences make C pointers more powerful, but also more dangerous. Both of those possibilities (pointer arithmetic and re-interpreting the values being pointed to) add flexibility to C and are the source of some of the power of the language. But they are also big sources of problems, because if used incorrectly they can easily break assumptions that your code is built around. And it's pretty easy to use them incorrectly.

Introduction

Several people have recently asked us questions similar to this one: “Can I use memcpy to copy an object of type string?”

Our first impulse is to say that if you have to ask, you shouldn’t be doing it — because you will get in trouble if you try. Nevertheless, the concepts behind the question are interesting enough to merit a closer look.

Briefly, the answer is that you can use memcpy safely to copy an object only if the object’s type is what is called a POD type, which stands for “Plain Old Data.” Because string is not a POD type, there is no guarantee that it is safe to use memcpy on a string.

What memcpy Does

As its name suggests, the memcpy function, from the C Standard library, copies memory:

void* memcpy(void* dest, const void* source, size_t n);

The source and dest arguments each refer to the initial byte of an n-byte region of memory; the two regions must not overlap. The memcpy function copies the memory in the source region to the memory in the dest region, obliterating whatever contents the dest region might have had previously. The memcpy function returns a copy of the dest pointer.

For example, suppose we write:

int x = 42; int y; memcpy(&y, &x, sizeof(int));

As it happens, int is a POD type, so it is safe to use memcpy on int objects. Accordingly, after executing these statements, y will have a value of 42, just as if we had executed:

y = x;

instead of calling memcpy.

The question, then, is what will happen if we write:

string s = "Hello, world!"; string t; memcpy(&t, &s, sizeof(string));

Will t have Hello, world! as its value or will the value be different? Will such a program even work at all?

The answer is that the program is not guaranteed to work, because string is not a POD type. Indeed, it is likely that this program fragment will cause a crash, as we shall see. The rest of this article explains what a POD type is and gives an idea of why memory-manipulation functions such as memcpy are generally safe only when applied to objects of POD type.

Fundamental Types

We can think of the memory in any computer that supports C++ as being composed of a collection of bytes, each of which contains an implementation-defined number of bits. All bytes contain the same number of bits. In a C++ implementation, that number must be at least eight; if the computer hardware does not support 8-bit or larger bytes, the C++ implementation must fake it in software. Most computers that support C++ have bytes that are exactly 8 bits long, but we have seen computers with bytes as long as 64 bits.

There are three important facts to know about bytes:

  1. A byte is the smallest addressable unit of memory. That is, every region of memory that it is possible to use pointers to define comprises an integral number of bytes. Accordingly, it is possible to use a byte address (which C++ uses the void* type to express) and an integer (which represents an object’s size) to refer to the memory that any object occupies.
  2. Every bit in a region of memory is part of exactly one byte. In particular, there is no information that might somehow fall into the cracks between the bytes [1].
  3. The sizeof operator, when given an object or a type as its argument, returns the number of bytes in an object of that type. All objects of a given type are the same size, so only the type matters.

These three properties imply that if x is an object, we can use ((void*)&x) and sizeof(x) together to represent the memory that x occupies. The question, then, is whether there is any more to x than the contents of its memory. It is that question that the POD notion exists to address: if a type is a POD type, the implication is that there is nothing more to an object of that type than the contents of its memory.

The fundamental types — that is, the arithmetic, enumeration, and pointer types — are POD types. In other words, the value of an object of a fundamental type depends entirely on the contents of the region of memory that corresponds to that object. It follows that using memcpy to copy an object of a fundamental type will copy that object’s value.

To see what’s happening more clearly, let’s look again at our earlier example:

int x = 42; int y; memcpy(&y, &x, sizeof(int));

Here, x and y are of fundamental type (int). They are therefore of POD type, so the bytes that constitute them completely determine their values. Moreover, every object of type int comprises sizeof(int) bytes.

When we call memcpy, it copies a number of bytes given by its third argument— in this case, sizeof(int) — from the region of memory occupied by x to the region occupied by y. Accordingly, the call to memcpy has the same effect as executing:

y = x;

because there is nothing more to x or y than the contents of the corresponding memory.

Structures

Let’s expand our universe by using memcpy to copy a structure. For example:

struct Point { int x, y; }; Point p, q; p.x = 123; p.y = 456; memcpy(q, p);

Will the call to memcpy still have the same effect as executing:

q = p;

or does the fact that Point is a user-defined type make memcpy not work?

The answer is that this structure is a POD type, because it is still so simple that its memory entirely determines its value, and therefore memcpy is safe to use in this context.

Structure Assignment

When we defined our Point structure, we did not give it an assignment operator. When we try to assign objects of such a type, the compiler treats such an assignment as being equivalent to assigning the objects’ data members. In other words, executing:

q = p;

has the same effect as executing:

q.x = p.x; q.y = p.y;

Because the x and y members of Point are of fundamental type, we can use memcpy to copy those members. Accordingly, it is also safe to use memcpy to copy the entire object.

Suppose, now, that we were to redefine Point to include an assignment operator:

struct Point { int x, y; Point& operator=(const Point&); };

We have deliberately omitted the definition of this assignment operator so that you won’t be tempted to think that you know what it does. It should now be clear that defining the assignment operator for Point has removed the guarantee that memcpy is safe to use on Point objects, because without seeing the definition of the assignment operator, we have no way of knowing that it has the same effect as calling memcpy.

Even if we define our assignment operator to have the same effect as the compiler-generated one:

Point& Point::operator=(const Point& p) { x = p.x; y = p.y; return *this; }

we should no longer consider it safe to use memcpy to copy a Point object, because doing so would rely on knowledge of the inner workings of the Point type, and those workings might change in the future.

In other words, we should be able to trust memcpy only when we are confident that using memcpy to copy an object will have the same effect as the assignment operator for that object. If the object, or any of its (non-static) data members, has a user-defined assignment operator, the compiler would have to read the definition of that operator to figure out whether it has the same effect as the compiler-generated assignment operator; such figuring in general is provably beyond the reach of any program. The moment any member acquires a user-defined assignment operator, this confidence vanishes. Therefore, a type that has a user-defined assignment operator in any of its data members is not a POD.

A More Precise Definition

We have seen two aspects of POD types: the fundamental types are POD types, and structures with user-defined assignment operators are not POD types. Here are the rest of the details:

  • Arithmetic types, enumeration types, and pointers (including pointers to functions and pointers to members) are POD types.
  • An array is a POD type if its elements are.
  • A structure or union is a POD if all of the following are true:
    • Every one of its non-static data members is a POD.
    • It has no user-declared constructors, assignment operators, or destructor.
    • It has no private or protected non-static data members.
    • It has no base classes.
    • It has no virtual functions.

The idea is that a class is a POD type if it has nothing to hide about its representation. Therefore, we can be sure that the value of an object of such a type is nothing more or less than the values of its components, so that copying the object is equivalent to copying its memory.

Discussion

Let us return to our original question: Is it safe to use memcpy to copy a string? We know that the string class has constructors, so it is not a POD. Therefore, the answer must be no. But what happens if we try anyway? Saying that a class is not a POD is saying only that memcpy is not guaranteed to work. It is not necessarily guaranteed to fail either. The question is whether copying the object is equivalent to copying its memory. To answer this question for the string class, we need to think about how it might be implemented.

A plausible implementation uses the string object itself to store an integer that represents the string’s length and pointer to dynamically allocated memory that contains the string’s characters. This integer and pointer are fundamental types, so surely it must be possible to use memcpy to copy them, right?

Wrong. Here’s the problem:

string s = "hello", t = "world"; memcpy(&s, &t, sizeof(string));

Before we call the memcpy, both s and t contain pointers that refer to memory somewhere:

Calling memcpy will overwrite the pointer in s with a copy of the pointer in t. Now, both pointers will point to the same memory, and the memory holding hello, to which s formerly referred, will be inaccessible and will therefore never be freed:

This program fragment will therefore leak memory. Moreover, when it comes time to destroy s and t, the effect of doing so is likely to be to try to deallocate the same memory twice, resulting in a crash.

This example should make it clear why the rules for defining POD types exclude types such as string. The moment a class author defines a constructor, destructor, or assignment operator, we can no longer be confident that copying an object of that class is equivalent to copying the object’s memory.

Summary

Functions such as memcpy, which deal with a class object’s memory directly, undercut the class author’s intentions. Doing so is dangerous unless the intentions are at so low a level as to make them impossible to undercut. Such a class is called a POD (Plain Old Data) to indicate that there is nothing more to the class than its contents. The moment data abstraction enters the picture, be it through constructors, destructors, assignment operators, base classes, or virtual functions, it is time to use only the operations that the class provides, and eschew low-level operations such as memcpy.

Note

[1] This fact is not as obvious as it sounds. For example, we have seen computers with 36-bit words in which the usual way to represent characters is to stuff five seven-bit characters into a word, with one bit per word left over. This implementation strategy fails on two counts: bytes contain fewer than eight bits, and there are bits that are not part of any byte.

A C++ implementation could solve the first problem by using eight-bit bytes, but that strategy would still leave unused bits in each word. Therefore, a correct solution would have to involve bytes with a size that divides evenly into 36, namely 9, 18, or 36 bits.

Andrew Koenig is a member of the Large-Scale Programming Research Department at AT&T’s Shannon Laboratory, and the Project Editor of the C++ standards committee. A programmer for more than 30 years, 15 of them in C++, he has published more than 150 articles about C++ and speaks on the topic worldwide. He is the author of C Traps and Pitfalls and co-author of Ruminations on C++.

Barbara E. Moo is an independent consultant with 20 years’ experience in the software field. During her nearly 15 years at AT&T, she worked on one of the first commercial projects ever written in C++, managed the company’s first C++ compiler project, and directed the development of AT&T’s award-winning WorldNet Internet service business. She is co-author of Ruminations on C++ and lectures worldwide.

Leave a Comment

(0 Comments)

Your email address will not be published. Required fields are marked *