Types

Fundamental Types

Target environment: x86_64 + Linux + GCC. Mapping between common Java and C++ fundamental types:

Java Type | New C++ Type                                   | Old C++ Type        | Notes
void      | void                                           |                     |
boolean   | bool                                           |                     |
char      | char8_t (C++20), char16_t, char32_t            | char, wchar_t       |
byte      | int8_t, uint8_t, std::byte (C++17) / gsl::byte | char, unsigned char | A byte is best represented by an unsigned type, but Java has no unsigned types, so Java's byte corresponds to a signed type
short     | int16_t                                        | short               |
int       | int32_t                                        | int                 |
long      | int64_t                                        | long, long long     |
float     | float                                          |                     |
double    | double                                         |                     |

Note

Lengths of legacy types like int / long depend on CPU / OS / compiler; do not assume a fixed size. See https://en.cppreference.com/w/cpp/language/types for common correspondences and detailed descriptions.
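As a minimal sketch of that advice, the compile-time checks below state only what the standard guarantees: exact widths for the fixed-width aliases and minimum widths for the legacy types. The helper name LongBits is hypothetical and exists only for illustration.

```cpp
#include <climits>
#include <cstdint>

// Fixed-width aliases have exactly the stated width wherever they exist.
static_assert(sizeof(std::int32_t) * CHAR_BIT == 32, "int32_t is 32 bits");
static_assert(sizeof(std::int64_t) * CHAR_BIT == 64, "int64_t is 64 bits");

// Legacy types only guarantee a minimum width.
static_assert(sizeof(int) * CHAR_BIT >= 16, "int has at least 16 bits");
static_assert(sizeof(long) * CHAR_BIT >= 32, "long has at least 32 bits");
static_assert(sizeof(long long) * CHAR_BIT >= 64,
              "long long has at least 64 bits");

// Hypothetical helper: reports how wide `long` is on the current target.
// It is typically 64 on x86_64 Linux with GCC, but 32 on 64-bit Windows.
constexpr int LongBits() { return static_cast<int>(sizeof(long) * CHAR_BIT); }
```

If code really depends on a particular width, a static_assert like these turns a silent portability bug into a compile error.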

Other fundamental types common in C++ but not present in Java:

  • std::nullptr_t: the type of nullptr (nullptr was introduced to resolve the ambiguity of whether 0 is an integer or a null pointer).
  • size_t / ssize_t: size_t is an unsigned integer type large enough to represent the size of any C++ object, usually the machine word size (32 bits on 32-bit machines, 64 bits on 64-bit machines), though the standard allows it to be smaller or larger. ssize_t is its signed counterpart; it comes from POSIX rather than the C++ standard, so avoid it in your own portable programs, but know what it means when you see it in system APIs.
  • Unsigned versions of all integers: unsigned int, uint64_t, etc.

Note

If code does not conform to the C++ standard it may not be portable—e.g. it might fail to compile on ARM or behave unexpectedly. For variable-length fundamental types, never assume a specific size.

Unsigned types are called out separately to emphasize importance. Simply remember these principles:

  1. Prefer signed types to represent counts / lengths (STL / system APIs are historical exceptions)
  2. Use unsigned types when representing raw byte sequences / memory blocks / bit fields / explicitly non-negative semantics (e.g., use uint32_t to represent an IPv4 address)
  3. Be especially careful when comparing signed and unsigned types
    1. You can convert both to a signed type with a larger representable range and then compare, e.g. convert int32_t and uint32_t both to int64_t and compare
    2. If you can ensure the signed integer is not negative (use DCHECK_GE(value, 0) to check and serve as a hint), convert it to an unsigned integer for comparison, since in this case the unsigned version can represent a larger range of values than the signed version
    3. If you cannot ensure the signed integer is non-negative, first check whether it is negative; if it is negative it must be less than the unsigned integer, otherwise fall back to the previous case
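The comparison rules above can be sketched as small helpers. All function names here are hypothetical, and plain assert stands in for DCHECK_GE so the sketch does not depend on a logging library.

```cpp
#include <cassert>
#include <cstdint>

// Pitfall: this mirrors what a plain `lhs < rhs` does implicitly. The signed
// operand is converted to unsigned, so -1 turns into 4294967295.
inline bool NaiveLess(std::int32_t lhs, std::uint32_t rhs) {
  return static_cast<std::uint32_t>(lhs) < rhs;
}

// Rule 3.1: widen both operands to a signed type that can hold every value.
inline bool LessByWidening(std::int32_t lhs, std::uint32_t rhs) {
  return static_cast<std::int64_t>(lhs) < static_cast<std::int64_t>(rhs);
}

// Rule 3.2: the caller guarantees a non-negative value, so converting it to
// unsigned loses nothing.
inline bool LessKnownNonNegative(std::int64_t lhs, std::uint64_t rhs) {
  assert(lhs >= 0);  // Stand-in for DCHECK_GE(lhs, 0).
  return static_cast<std::uint64_t>(lhs) < rhs;
}

// Rule 3.3: a negative value is smaller than any unsigned value; otherwise
// fall back to rule 3.2.
inline bool LessAnySign(std::int64_t lhs, std::uint64_t rhs) {
  if (lhs < 0) return true;
  return LessKnownNonNegative(lhs, rhs);
}
```

With these, NaiveLess(-1, 1u) is false while LessByWidening(-1, 1u) and LessAnySign(-1, 1u) are true.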

Note

Avoid raw C++ arrays: when passed to a function they decay to raw pointers and length information is lost; raw pointers are unsafe.

Prefer std::span<int> (or absl::Span), or const std::array<int, kSize>& (less common since it encodes the size into the type and propagates into signatures). Or just use std::vector as a dynamic array (if using Abseil, absl::FixedArray may be better).

C++ Value Category Classification (Optional Reading)

These are the often-heard categories: lvalue, rvalue, xvalue, etc. For most Java-to-C++ application development you can get by without them; for lower-level library work you should learn them. See https://en.cppreference.com/w/cpp/language/value_category.

Extended Discussion on Using Signed Integer to Represent Length

A common error example of using unsigned integers:

// The loop never ends: |v.size()| returns an unsigned |size_t|, so |i >= 0| is
// always true and |i| wraps to |SIZE_MAX| when counting down past 0.
for (auto i = v.size(); i >= 0; --i) {
}

Google Style Guide mentions:

Because of historical accident, the C++ standard also uses unsigned integers to represent the size of containers - many members of the standards body believe this to be a mistake, but it is effectively impossible to fix at this point.

The C++20 standard added the ssize() function to return container size expressed as a signed integer https://en.cppreference.com/w/cpp/iterator/size.
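For code bases that cannot use C++20 yet, one possible fix for the loop above is to convert the size to a signed integer once and count down with a signed index. This is a sketch only; the function name Reversed is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the elements of `v` in reverse order.
std::vector<int> Reversed(const std::vector<int>& v) {
  std::vector<int> out;
  out.reserve(v.size());
  // The signed index can actually reach -1, so the loop terminates.
  for (auto i = static_cast<std::int64_t>(v.size()) - 1; i >= 0; --i) {
    out.push_back(v[static_cast<std::size_t>(i)]);
  }
  assert(out.size() == v.size());
  return out;
}
```

With C++20, static_cast<std::int64_t>(v.size()) can be replaced by std::ssize(v).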

User-Defined Types

In principle user-defined types in C++ have the same capabilities as built-in fundamental types. Java differs: built-in numeric types are always value + stack; user-defined types are reference + heap.

class vs struct

C++ supports using class and struct to define user-defined types. In principle the capability of user-defined types is as strong as that of built-in types. The only difference between defining a type with class and with struct is that default member visibility is private (class) vs public (struct).

Note

There is no package visibility level in C++.

struct comes from C; in C++ it is just an alternative spelling for class that differs only in default visibility. A “simple” C-style struct allows raw memory operations like memcpy (otherwise copy constructors etc. matter). This is dangerous but sometimes faster. Such simple types are called POD (a term deprecated since C++20); more precisely, memcpy requires a trivially copyable type, and a C-compatible memory layout requires a standard-layout type. Details omitted (see Microsoft docs link).

Constructors and Destructors (Updated 2021-07-18)

All variables must be initialized before use unless you guarantee an assignment precedes every read (which is still initialization). Otherwise the value is indeterminate; memory is not guaranteed to be zeroed. Constructors in both Java and C++ ensure members are initialized.

Unlike Java, in C++ there are three ways to initialize member variables (ordered by recommended usage, and can be approximated as their execution order):

  1. Default initialization at the member declaration site
  2. Constructor initialization list
  3. Constructor body

Prefer initializing members at the declaration site. This avoids omissions when adding constructors and keeps default values near declarations. Initialize with constants / literals this way, e.g.:

class Person {
  //...
 private:
  static constexpr int32_t kUnspecifiedAge = -1;

  std::string name_{};            // Initialized with an empty string.
  int32_t age_{kUnspecifiedAge};  // Initialized with a constant.
};

// Defined in person.cc, prior to C++17.
constexpr int32_t Person::kUnspecifiedAge;

If constructor parameters are required, use the initialization list as much as possible (better performance); fall back to the body only when necessary.

class Person {
 public:
  Person() = default;  // Use the default keyword to generate the default
                       // constructor. Providing a default constructor is
                       // useful for receiving values from output parameters.
                       //
                       // Example:
                       //   Person p;
                       //   Status s = LoadPerson(db, key, &p);
                       //   CHECK(s.ok()) << "Failed to load. reason=" << s;

  // Although default values for |name_| & |age_| are specified at declaration,
  // the member initializer list overrides them when this constructor is used.
  Person(std::string name, int32_t age) : name_(std::move(name)), age_(age) {
    // The constructor body runs after the member initializer list.
    CHECK(!name_.empty());  // Cannot use |name| here: it was already moved from.
    CHECK_GT(age_, 0);
  }

 private:
  static constexpr int32_t kUnspecifiedAge = -1;

  std::string name_{};            // Initialized with an empty string.
  int32_t age_{kUnspecifiedAge};  // Initialized with a constant.
};

// Defined in person.cc, prior to C++17.
constexpr int32_t Person::kUnspecifiedAge;

Note

Pay special attention: keep the initialization list order aligned with member declaration order.

Initialization actually follows the order of member declarations regardless of list order. A mismatch invites confusion and subtle bugs. Example:

class Person {
 public:
  Person() = default;

  // !!! THIS IS AN INCORRECT PRACTICE !!!
  Person(std::string name)
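      :  // Expecting |name| to still hold its value here, but it does not!
         // |name_| is declared first, so |name| has already been moved from
         // and is left in an unspecified (typically empty) state.
        id_(absl::StrCat(name, "_", GetNextUniqueId())),
        // The next line is executed before initializing |id_|.
        name_(std::move(name)) {}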

 private:
  std::string name_;
  std::string id_;
};

The destructor plays a role analogous to Java's AutoCloseable.close() but is implicit: leaving scope invokes it (RAII, similar to try-with-resources).

class File {
 public:
  static constexpr int kInvalidFileDescriptor = -1;

  File() = default;
  explicit File(int fd) : fd_(fd) {}

  bool valid() const { return fd_ != kInvalidFileDescriptor; }

  // Disable Copy
  // ...

  // Close OS managed resource during destruction.
  // Example:
  //   {
  //     File file(::open(filename, O_RDONLY));
  //     PCHECK(file.valid()) << "Failed to open file '" << filename << "'.";
  //     // Read file contents...
  //   }  // Execute ~File() automatically to close the OS managed resource.
  ~File() {
    if (fd_ != kInvalidFileDescriptor) {
      PCHECK(::close(fd_) == 0) << "Failed to close file.";
    }
  }

 private:
  int fd_{kInvalidFileDescriptor};
};

// Defined in file.cc, prior to C++17.
constexpr int File::kInvalidFileDescriptor;

Note

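In C++, a constructor can throw an exception that callers can catch. A destructor must not: destructors are implicitly noexcept since C++11, so an exception escaping one calls std::terminate and ends the program. Also note that when a constructor throws, the destructor of that object does not run; only the members and base classes that were already fully constructed are destroyed, so anything the constructor acquired outside an RAII member must be released before throwing.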

In practice many projects avoid exceptions (see Google C++ Style Guide). Without exceptions a Status return type simulates throw/catch. Pattern for constructor-like failure handling:

class HttpClient {
 public:
  static absl::StatusOr<HttpClient> Make(HttpClientOptions options) {
    HttpClient client(std::move(options));
    absl::Status s = client.Init();
    if (!s.ok()) {
      return s;
    }

    return client;
  }

 protected:
  // Construct a |HttpClient| instance, must call |Init()| immediately after
  // creation. Use |explicit| keyword to prevent implicit cast.
  explicit HttpClient(HttpClientOptions options)
      : options_(std::move(options)) {}

  absl::Status Init() { return RefreshOauth2Token(); }

  absl::Status RefreshOauth2Token();

 private:
  HttpClientOptions options_{};
};

The constructor uses explicit because it has a single parameter. Without it an unintended implicit conversion would exist. Avoid such conversions to preserve type safety.

class ImplicitCastAllowedInt {
 public:
  ImplicitCastAllowedInt(int value);
};

class ImplicitCastDisallowedInt {
 public:
  explicit ImplicitCastDisallowedInt(int value);
};

void PassImplicitCastAllowedInt(ImplicitCastAllowedInt);
void PassImplicitCastDisallowedInt(ImplicitCastDisallowedInt);

PassImplicitCastAllowedInt(1);  // Implicit cast 1 to ImplicitCastAllowedInt(1)
PassImplicitCastDisallowedInt(1);  // Won't compile!

Note

Make constructors minimal: only essential initialization. Defer side effects (e.g. background threads) via separate start methods.

Example (background thread): starting one inside a constructor is problematic. Outline:

  1. The parent class constructor starts a background thread.
  2. The background thread calls a virtual function; at this time the child class constructor has not yet finished executing. Thus the overridden virtual function in the child class may access member variables not yet initialized.

Note

Similarly do not join/wait in the destructor; derived parts may already be gone.

Note

Avoid calling virtual functions (directly or indirectly) in constructors/destructors.

Suppose there is a base class Base and a derived class Child. Constructing a Child object first calls Base's constructor to initialize Base's members, then calls Child's constructor to initialize Child's members; destructing a Child object first destructs Child's members, then Base's members. In this regard C++ and Java are the same.

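If a Base constructor calls a virtual function that Child overrides, the Child portion of the object is not initialized yet. Java would run Child's override against uninitialized fields; C++ instead dispatches to Base's own implementation, and the call is undefined behavior if that function is pure virtual. The same applies to destructors, where the Child portion is already destroyed. Hence: avoid.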

Sometimes the call chain is indirect: ctor -> helper A -> virtual B. Or Start() launches a thread whose callback uses virtual functions; Stop() joins it; destructor auto-calls Stop() — same pitfall.

Note

A virtual call made directly from a constructor or destructor dispatches only to the implementation of the class currently being constructed or destroyed (as if the function were non-virtual), which can be even more confusing.
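A small sketch of this dispatch rule follows. The classes mirror the Base and Child names used in the text; the member and function names are made up for illustration.

```cpp
#include <cassert>
#include <string>

class Base {
 public:
  // While Base's constructor runs, the object is still "only a Base", so this
  // call resolves to Base::Name() even when a Child is being constructed.
  Base() : name_seen_in_ctor_(Name()) {}
  virtual ~Base() = default;

  virtual std::string Name() const { return "Base"; }
  const std::string& name_seen_in_ctor() const { return name_seen_in_ctor_; }

 private:
  std::string name_seen_in_ctor_;
};

class Child : public Base {
 public:
  std::string Name() const override { return "Child"; }
};

// Hypothetical demonstration of both behaviors on the same object.
inline void DemoConstructorDispatch() {
  Child child;
  assert(child.name_seen_in_ctor() == "Base");  // Seen during construction.
  assert(child.Name() == "Child");              // Seen after construction.
}
```

Java would have recorded "Child" in the equivalent constructor, which is one reason the C++ behavior surprises people coming from Java.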

(Copy-by) Value-Semantics Types / Reference-Semantics Types

Java user-defined types have a major limitation: it is impossible to define a "value type". In Java, int is a value type while Integer is a reference type. A significant difference between them is as follows:

int a;
int b;
b = a; // Copy the value of `a` to `b`, both `a` and `b` have the same value.

Integer c;
Integer d;
d = c; // Copy the reference from `c` to `d`, both `c` and `d` reference the same value.

Note

Java lets you define only reference types (aside from primitives). C++ defaults to value semantics; extra work is required to force reference-only usage.

For value types, it is best to overload operator== and operator!= and provide a hash function. We will cover this later in the operator overloading section.

Below is an example of creating a "reference type" in C++:

// We can copy either the value or the reference of a ValueType.
//
// This is a simple aggregate with no user-declared special member functions.
// The compiler generates the default constructor, copy constructor & copy
// assignment operator, move constructor & move assignment operator, and
// destructor for it. As a result, it's copyable & movable.
//
// Example:
//   ValueType a;
//   ValueType b = a;   // Allowed to copy the value.
//   ValueType& c = a;  // Allowed to reference the value.
struct ValueType {
  int32_t id;
  std::string name;
};

// We cannot copy the value of a ReferenceType instance.
// We can only copy the reference to a ReferenceType instance.
//
// Example:
//   ReferenceType a(2, "mock_name");
//   ReferenceType b = a;   // !!!DISALLOWED!!! Won't compile!
//   ReferenceType& c = a;  // Allowed to reference a ReferenceType instance.
class ReferenceType {
 public:
  ReferenceType(int32_t id, std::string name)
      : id_(id), name_(std::move(name)) {}
  // Make the destructor virtual to allow children override it.
  virtual ~ReferenceType() = default;

  // Disallow copy
  ReferenceType(const ReferenceType&) = delete;
  ReferenceType& operator=(const ReferenceType&) = delete;

  // Allow move: moving steals the content of the `other` instance.
  // Move semantics are covered in later chapters.
  ReferenceType(ReferenceType&& other)
      : id_(other.id_), name_(std::move(other.name_)) {
    other.id_ = 0;
  }
  ReferenceType& operator=(ReferenceType&& other) {
    id_ = other.id_;                 // Cannot move int, it's a primitive type.
    other.id_ = 0;                   // Copy then clear the field.
    name_ = std::move(other.name_);  // Move std::string
    return *this;
  }

 private:
  int32_t id_;
  std::string name_;
};

Summary: a reference-semantics type should:

  1. Disable copy
  2. Virtual destructor
  3. (Preferably) allow move

For more, see https://isocpp.org/wiki/faq/value-vs-ref-semantics

Note

If move is disallowed—or copy exists but move not yet provided—explicitly delete the move ctor/assignment.

For a detailed explanation, see https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#c21-if-you-define-or-delete-any-copy-move-or-destructor-function-define-or-delete-them-all

Creating Objects on the Heap

Recall Java: almost all value types are created on the stack, all reference types are created on the heap:

int a = 3;                   // Construct value type on the stack.
Integer b = new Integer(5);  // Construct reference type on the heap.

In C++ we handle it like this:

int a = 3;  // Create on stack.

// Always use smart pointers; never use the `new` & `delete` keywords directly.
// Smart pointers are covered in the following sections.
// Check following documents for further details:
// * https://en.cppreference.com/w/cpp/memory
// * https://www.stroustrup.com/C++11FAQ.html#std-unique_ptr
auto b =
    std::make_unique<Integer>(5);  // Create on heap & forbid ownership sharing.
auto c =
    std::make_shared<Integer>(7);  // Create on heap & allow ownership sharing.

Type Aliases

You can create a type alias; using it is identical to using the original. C had typedef; C++ added using (template-friendly). Prefer using.

// Discouraged
typedef int int32_t;

// Preferred
using int32_t = int;

// Template
template <typename T>
using MyArray = std::vector<T>;

Aliases are true synonyms—sometimes convenient, sometimes troublesome (e.g. two semantic roles A and B for the same underlying type but assignment between them is still allowed).

using Orange = int;
using Apple = int;

Apple apple(2);
Orange orange = apple;      // Orange should not be able to become an Apple.
Orange x = orange + apple;  // Shouldn't add Oranges and Apples.
if (orange > apple)
  ;  // Shouldn't compare Apples to Oranges.

void foo(Orange);
void foo(Apple);  // Redeclares foo(int) instead of adding a new overload.

Workaround: StrongAlias and proposal N3741.
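As a rough sketch of the strong-alias idea (not the actual StrongAlias implementation from any particular library), a class template with an otherwise unused tag parameter turns each alias into a distinct type. Every name below is illustrative.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical minimal strong alias: `Tag` is never used at runtime, it only
// makes each instantiation a different type.
template <typename Tag, typename T>
class StrongAlias {
 public:
  constexpr explicit StrongAlias(T value) : value_(value) {}
  constexpr const T& value() const { return value_; }

  friend constexpr bool operator==(const StrongAlias& a, const StrongAlias& b) {
    return a.value_ == b.value_;
  }

 private:
  T value_;
};

using Orange = StrongAlias<struct OrangeTag, int>;
using Apple = StrongAlias<struct AppleTag, int>;

static_assert(!std::is_same<Orange, Apple>::value, "distinct types");
static_assert(!std::is_convertible<Apple, Orange>::value, "no conversion");

// Two genuine overloads, because Orange and Apple are different types.
inline int Describe(Orange) { return 1; }
inline int Describe(Apple) { return 2; }

inline int DemoOverloads() {
  Orange orange(2);
  Apple apple(2);
  // Orange mixed = apple;  // Would not compile: Apple is not an Orange.
  assert(orange == Orange(2));
  return Describe(orange) * 10 + Describe(apple);
}
```

The trade-off is that every operation you want (comparison, arithmetic, hashing) has to be added to the wrapper explicitly.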

Smart Pointers

std::unique_ptr and std::shared_ptr

std::shared_ptr and std::unique_ptr are “smart pointers” (contrast with raw pointers). Example raw pointer usage:

Integer* b = new Integer(5);
// Calculate with raw pointer b
// ...
delete b;  // !!!DO REMEMBER TO DELETE IT!!!

new allocates; later you must delete. Remembering every delete across complex control flow (early returns, exceptions) is error-prone. A smart pointer is a small value-type wrapper owning a raw pointer whose destructor releases the resource (RAII: Resource Acquisition Is Initialization).

std::unique_ptr models exclusive ownership. Copy is disabled. Default to it unless you specifically need shared lifetime (it can be promoted to std::shared_ptr via std::move, never the reverse). Its destructor invokes delete on the held pointer.

std::shared_ptr adds reference counting: copying increments a control block count; destruction decrements; reaching zero deletes the managed object.
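The ownership rules just described can be observed directly. The sketch below uses plain int payloads, and the function name is hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical walkthrough; returns the final owner count.
inline long DemoOwnership() {
  std::unique_ptr<int> unique = std::make_unique<int>(5);
  // std::unique_ptr<int> copy = unique;  // Would not compile: copy is disabled.
  std::unique_ptr<int> moved = std::move(unique);  // Ownership is transferred.
  assert(unique == nullptr);
  assert(*moved == 5);

  // Promote exclusive ownership to shared ownership (never the reverse).
  std::shared_ptr<int> shared = std::move(moved);
  assert(moved == nullptr);
  assert(shared.use_count() == 1);
  {
    std::shared_ptr<int> second_owner = shared;  // Copy increments the count.
    assert(shared.use_count() == 2);
  }  // `second_owner` is destroyed here, which decrements the count.
  return shared.use_count();
}
```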

Note

When using std::shared_ptr, be careful to avoid cyclic references (e.g. a->b->c->a).

Common solution: separate ownership (e.g. singly linked list):

std::vector<std::unique_ptr<LinkedNode>> nodes;
LinkedNode* head;

auto c = std::make_unique<LinkedNode>("c", /* next */ nullptr);
auto b = std::make_unique<LinkedNode>("b", /* next */ c.get());
auto a = std::make_unique<LinkedNode>("a", /* next */ b.get());
c->set_next(a.get());

head = a.get();

nodes.emplace_back(std::move(c));
nodes.emplace_back(std::move(b));
nodes.emplace_back(std::move(a));

Smart pointers can take custom deleters; see documentation.
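A minimal sketch of a custom deleter, assuming a C-style resource API. OpenHandle and CloseHandle are hypothetical stand-ins for a pair such as fopen and fclose, and the global counter exists only to make the effect visible.

```cpp
#include <cassert>
#include <memory>

// Hypothetical C-style resource API.
struct Handle {
  int id;
};
int g_open_handles = 0;
Handle* OpenHandle(int id) {
  ++g_open_handles;
  return new Handle{id};
}
void CloseHandle(Handle* handle) {
  --g_open_handles;
  delete handle;
}

// For std::unique_ptr the deleter is part of the type.
struct HandleCloser {
  void operator()(Handle* handle) const { CloseHandle(handle); }
};
using UniqueHandle = std::unique_ptr<Handle, HandleCloser>;

inline int OpenHandlesAfterScope() {
  {
    UniqueHandle unique(OpenHandle(1));
    // For std::shared_ptr the deleter is a constructor argument instead.
    std::shared_ptr<Handle> shared(OpenHandle(2), CloseHandle);
    assert(unique->id == 1 && shared->id == 2);
    assert(g_open_handles == 2);
  }  // Both deleters run here, in reverse order of construction.
  return g_open_handles;
}
```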

std::shared_ptr and std::weak_ptr

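A common scenario: you need to access an object, doing something with it if it is still alive and nothing if it has already been destructed. The tricky part is that once you have determined the object is alive, you must ensure it is not destructed in the middle of your computation; yet if you hold a std::shared_ptr to it all the time, it will never be destructed because you still hold a reference. std::weak_ptr addresses this: it observes the object without owning it, and lock() returns a std::shared_ptr that keeps the object alive only for as long as you need it (or an empty pointer if the object is already gone).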

Example: a service exposes healthy(); a watchdog monitors it without extending its lifetime unintentionally.

void NotifyUnhealthy();

class MyServiceImpl : public Service {
 public:
  Status Start() override;
  Status Stop() override;

  bool healthy() const override;
  absl::Time last_known_healthy_time() const override;
};

class MyServiceWatchdogServiceImpl : public Service {
 public:
  explicit MyServiceWatchdogServiceImpl(
      std::weak_ptr<MyServiceImpl> weak_my_service)
      : weak_my_service_(std::move(weak_my_service)) {}

  Status Start() override;
  Status Stop() override;

 private:
  void BackgroundTaskEntryPoint();

  std::unique_ptr<std::thread> background_thread_;
  absl::Notification stopping_notification_;

  std::weak_ptr<MyServiceImpl> weak_my_service_;
};

void MyServiceWatchdogServiceImpl::BackgroundTaskEntryPoint() {
  absl::optional<absl::Time> previous_known_healthy_time;
  while (
      !stopping_notification_.WaitForNotificationWithTimeout(kLoopInterval)) {
    std::shared_ptr<MyServiceImpl> shared_my_service = weak_my_service_.lock();
    if (!shared_my_service) {
      // The instance of my_service was destroyed.
      NotifyUnhealthy();
      break;
    }

    if (shared_my_service->healthy()) {
      if (previous_known_healthy_time.has_value() &&
          (shared_my_service->last_known_healthy_time() -
               previous_known_healthy_time.value() >
           kHealthyCheckFailureDuration)) {
        // Healthy state not updated for a while, regard it unhealthy.
        NotifyUnhealthy();
        break;
      }

      previous_known_healthy_time =
          shared_my_service->last_known_healthy_time();
    }
  }
}
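Stripped of the service machinery, the core lock() pattern from the watchdog can be sketched as follows. The int payload and the function names are placeholders, not part of the example above.

```cpp
#include <cassert>
#include <memory>

// Returns the observed value, or -1 when the object is already gone.
inline int ReadOrDefault(const std::weak_ptr<int>& weak) {
  if (std::shared_ptr<int> locked = weak.lock()) {
    return *locked;  // The object cannot be destroyed while `locked` exists.
  }
  return -1;
}

inline void DemoWeakPtr() {
  auto owner = std::make_shared<int>(42);
  std::weak_ptr<int> observer = owner;
  assert(owner.use_count() == 1);  // A weak_ptr does not count as an owner.
  assert(ReadOrDefault(observer) == 42);

  owner.reset();  // The last owner goes away and the int is destroyed.
  assert(observer.expired());
  assert(ReadOrDefault(observer) == -1);
}
```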

Read-Only Types

In Java the final keyword can mark a variable as read-only, but this semantics is problematic especially for reference-type objects.

final AtomicLong a = new AtomicLong(3);

Meaning: you cannot rebind a to a new AtomicLong, but you can still mutate the pointed-to object's internal state.

If you need immutability of the underlying value, you would require an AtomicLongReadOnlyView. This pattern is common; C++ offers const qualification to carve out a read-only interface.

The const keyword qualifies the type to its left, unless there is no type on its left, in which case it qualifies the type immediately to its right. For example const A* and A const* are equivalent.

AtomicLong* const a = new AtomicLong(3);  // Equivalent to the Java example:
                                          // the const applies to the pointer.

const AtomicLong* b =
    new AtomicLong(5);  // Modifying the value pointed to by b is disallowed,
                        // but b may be assigned a new pointer.
                        // The const applies to the AtomicLong type.

// Just for example
class AtomicLong {
 public:
  int64_t value() const;          // This is a read-only method, visible for an
                                  // AtomicLong const type.
  void set_value(int64_t value);  // This is not a read-only method, invisible
                                  // for an AtomicLong const type.

 private:
  int64_t value_;
};

Constants

Common ways to define constants in C++ (in recommended order):

For more, see https://abseil.io/tips/140

constexpr int32_t kTwo = 2;
const int32_t kTwo2 = 2;

enum : int32_t {  // or any other necessary integral type.
  kTwo3 = 2,
};

// Only take the following pattern as a last resort.
#define YOUR_PROJECT_PREFIX_TWO 2

Note

Do not define constants of type std::string.

Reasoning:

  1. The constructor of std::string does not support constexpr until C++20.
  2. The initialization order of global variables across translation units (.cc, .cpp files) is unspecified, so if another global variable depends on this one for its own initialization, the result is indeterminate. (See https://isocpp.org/wiki/faq/ctors#static-init-order)

// !!! DON'T DO THIS !!!
constexpr std::string kHelloWorldMessage1 = "Hello World!";

// Do it this way.
constexpr char kHelloWorldMessage2[] = "Hello World!";

// In case you really need std::string or some complex types.
// constants.h
const std::string& GetHelloWorldMessage3();
// constants.cc
const std::string& GetHelloWorldMessage3() {
  static const std::string kHelloWorldMessage3 = "Hello World!";
  return kHelloWorldMessage3;
}

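Prior to C++17 there were no inline variables, so a constant defined in a header could not be a single entity shared by all translation units: each translation unit got its own copy, and a static constexpr data member needed a separate out-of-line definition. As a hack, you could use the macro ABSL_INTERNAL_INLINE_CONSTEXPR.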

// Macro: ABSL_INTERNAL_INLINE_CONSTEXPR(type, name, init)
//
// Description:
//   Expands to the equivalent of an inline constexpr instance of the specified
//   `type` and `name`, initialized to the value `init`. If the compiler being
//   used is detected as supporting actual inline variables as a language
//   feature, then the macro expands to an actual inline variable definition.
//
// Requires:
//   `type` is a type that is usable in an extern variable declaration.
//
// Requires: `name` is a valid identifier
//
// Requires:
//   `init` is an expression that can be used in the following definition:
//     constexpr type name = init;
//
// Usage:
//
//   // Equivalent to: `inline constexpr size_t variant_npos = -1;`
//   ABSL_INTERNAL_INLINE_CONSTEXPR(size_t, variant_npos, -1);
//
// Differences in implementation:
//   For a direct, language-level inline variable, decltype(name) will be the
//   type that was specified along with const qualification, whereas for
//   emulated inline variables, decltype(name) may be different (in practice
//   it will likely be a reference type).

Type Conversions

https://www.modernescpp.com/index.php/c-core-guidelines-rules-for-conversions-and-casts

The C type system is messy; C++ inherits the baggage. Avoid conversions unless necessary.

The Four C++-Style Cast Operators

C-style casts are overly powerful. C++ splits them into:

  • const_cast: can only add or remove the const qualifier.
  • reinterpret_cast: used to convert pointer types (actually does nothing, just changes how we interpret the pointed-to content), or between integers and pointers (sometimes we want to use an integer to represent a pointer, e.g. taking the address directly as its hash value).
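  • dynamic_cast: converts pointers or references within a class hierarchy, checking at runtime via RTTI (Runtime Type Information) whether the conversion is valid; a failed pointer cast returns nullptr and a failed reference cast throws std::bad_cast.
  • static_cast: performs the remaining conversions that the compiler can check at compile time, such as numeric conversions and unchecked downcasts.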

Prefer the specialized cast operators; fall back to static_cast only when necessary.

Be Careful with const_cast

Note

If the object is truly const (or you cannot prove otherwise) removing const and mutating it is undefined behavior.

Converting Base Class Pointer/Reference to Derived Class Pointer/Reference

dynamic_cast downcasts with runtime checking. Often logic already guarantees correctness; then use it only in Debug, and static_cast in Release. Benefits:

  1. No unnecessary runtime type checks; better performance
  2. Avoid using RTTI information; resulting code can be smaller

Helper pattern:

// Use implicit_cast as a safe version of static_cast or const_cast
// for upcasting in the type hierarchy (i.e. casting a pointer to Foo
// to a pointer to SuperclassOfFoo or casting a pointer to Foo to
// a const pointer to Foo).
// When you use implicit_cast, the compiler checks that the cast is safe.
// Such explicit implicit_casts are necessary in surprisingly many
// situations where C++ demands an exact type match instead of an
// argument type convertible to a target type.
//
// The From type can be inferred, so the preferred syntax for using
// implicit_cast is the same as for static_cast etc.:
//
//   implicit_cast<ToType>(expr)
//
// implicit_cast would have been part of the C++ standard library,
// but the proposal was submitted too late.  It will probably make
// its way into the language in the future.
template<typename To, typename From>
inline To implicit_cast(From const &f) {
  return f;
}

// When you upcast (that is, cast a pointer from type Foo to type
// SuperclassOfFoo), it's fine to use implicit_cast<>, since upcasts
// always succeed.  When you downcast (that is, cast a pointer from
// type Foo to type SubclassOfFoo), static_cast<> isn't safe, because
// how do you know the pointer is really of type SubclassOfFoo?  It
// could be a bare Foo, or of type DifferentSubclassOfFoo.  Thus,
// when you downcast, you should use this macro.  In debug mode, we
// use dynamic_cast<> to double-check the downcast is legal (we die
// if it's not).  In normal mode, we do the efficient static_cast<>
// instead.  Thus, it's important to test in debug mode to make sure
// the cast is legal!
//    This is the only place in the code we should use dynamic_cast<>.
// In particular, you SHOULDN'T be using dynamic_cast<> in order to
// do RTTI (eg code like this:
//    if (dynamic_cast<Subclass1>(foo)) HandleASubclass1Object(foo);
//    if (dynamic_cast<Subclass2>(foo)) HandleASubclass2Object(foo);
// You should design the code some other way not to need this.

template<typename To, typename From>     // use like this: down_cast<T*>(foo);
inline To down_cast(From* f) {           // so we only accept pointers
  // Ensures that To is a sub-type of From *.  This test is here only
  // for compile-time type checking, and has no overhead in an
  // optimized build at run-time, as it will be optimized away
  // completely.
  if (false) {
    implicit_cast<From*, To>(0);
  }

#if !defined(NDEBUG) && YOUR_PROJECT_ENABLED_RTTI
  assert(f == nullptr || dynamic_cast<To>(f) != nullptr);  // RTTI: debug mode only!
#endif
  return static_cast<To>(f);
}

template<typename To, typename From>    // use like this: down_cast<T&>(foo);
inline To down_cast(From& f) {
  typedef typename std::remove_reference<To>::type* ToAsPointer;
  // Ensures that To is a sub-type of From *.  This test is here only
  // for compile-time type checking, and has no overhead in an
  // optimized build at run-time, as it will be optimized away
  // completely.
  if (false) {
    implicit_cast<From*, ToAsPointer>(0);
  }

#if !defined(NDEBUG) && YOUR_PROJECT_ENABLED_RTTI
  // RTTI: debug mode only!
  assert(dynamic_cast<ToAsPointer>(&f) != nullptr);
#endif
  return *static_cast<ToAsPointer>(&f);
}

bit_cast

Sometimes we need raw bit reinterpretation (e.g. treat 64 bits as double). bit_cast (C++20) provides this; before that use memcpy.

// bit_cast<Dest,Source> is a template function that implements the equivalent
// of "*reinterpret_cast<Dest*>(&source)".  We need this in very low-level
// functions like the protobuf library and fast math support.
//
//   float f = 3.14159265358979;
//   int i = bit_cast<int32_t>(f);
//   // i = 0x40490fdb
//
// The classical address-casting method is:
//
//   // WRONG
//   float f = 3.14159265358979;            // WRONG
//   int i = *reinterpret_cast<int*>(&f);   // WRONG
//
// The address-casting method actually produces undefined behavior according to
// the ISO C++98 specification, section 3.10 ("basic.lval"), paragraph 15.
// (This did not substantially change in C++11.)  Roughly, this section says: if
// an object in memory has one type, and a program accesses it with a different
// type, then the result is undefined behavior for most values of "different
// type".
//
// This is true for any cast syntax, either *(int*)&f or
// *reinterpret_cast<int*>(&f).  And it is particularly true for conversions
// between integral lvalues and floating-point lvalues.
//
// The purpose of this paragraph is to allow optimizing compilers to assume that
// expressions with different types refer to different memory.  Compilers are
// known to take advantage of this.  So a non-conforming program quietly
// produces wildly incorrect output.
//
// The problem is not the use of reinterpret_cast.  The problem is type punning:
// holding an object in memory of one type and reading its bits back using a
// different type.
//
// The C++ standard is more subtle and complex than this, but that is the basic
// idea.
//
// Anyways ...
//
// bit_cast<> calls memcpy() which is blessed by the standard, especially by the
// example in section 3.9 .  Also, of course, bit_cast<> wraps up the nasty
// logic in one place.
//
// Fortunately memcpy() is very fast.  In optimized mode, compilers replace
// calls to memcpy() with inline object code when the size argument is a
// compile-time constant.  On a 32-bit system, memcpy(d,s,4) compiles to one
// load and one store, and memcpy(d,s,8) compiles to two loads and two stores.
template <class Dest, class Source>
inline Dest bit_cast(const Source& source) {
  static_assert(sizeof(Dest) == sizeof(Source),
                "bit_cast requires source and destination to be the same size");
  static_assert(base::is_trivially_copyable<Dest>::value,
                "bit_cast requires the destination type to be copyable");
  static_assert(base::is_trivially_copyable<Source>::value,
                "bit_cast requires the source type to be copyable");

  Dest dest;
  memcpy(&dest, &source, sizeof(dest));
  return dest;
}
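
A minimal usage sketch of the memcpy-based technique above. BitCast is a local stand-in (using std::is_trivially_copyable instead of base::is_trivially_copyable) so the example compiles on its own, and FloatBits is a hypothetical helper. The expected bit patterns assume float is a 32-bit IEEE 754 type, which holds on the x86_64 + Linux + GCC target.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Local stand-in for the bit_cast shown above, so this sketch is
// self-contained.
template <class Dest, class Source>
Dest BitCast(const Source& source) {
  static_assert(sizeof(Dest) == sizeof(Source),
                "BitCast requires source and destination to be the same size");
  static_assert(std::is_trivially_copyable<Dest>::value,
                "BitCast requires a trivially copyable destination type");
  static_assert(std::is_trivially_copyable<Source>::value,
                "BitCast requires a trivially copyable source type");
  Dest dest;
  std::memcpy(&dest, &source, sizeof(dest));
  return dest;
}

// Hypothetical helper: returns the raw bits of |f|.
// Assumes float is a 32-bit IEEE 754 type.
inline uint32_t FloatBits(float f) { return BitCast<uint32_t>(f); }

inline void BitCastExample() {
  assert(FloatBits(1.0f) == 0x3F800000u);   // sign 0, exponent 127, mantissa 0
  assert(FloatBits(-0.0f) == 0x80000000u);  // only the sign bit is set
  assert(BitCast<float>(FloatBits(3.5f)) == 3.5f);  // round trip is lossless
}
```

Since C++20 the standard library offers std::bit_cast in <bit>, which can replace a hand-written helper like this one.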

Smart Pointer Type Conversions

std::unique_ptr can convert to std::shared_ptr, but not vice versa

Thus prefer returning std::unique_ptr.

std::unique_ptr<Animal> MakeAnimal();

std::shared_ptr<Animal> animal = MakeAnimal();

std::unique_ptr<Animal> uniq_animal = MakeAnimal();
std::shared_ptr<Animal> shared_animal = std::move(uniq_animal);
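
A self-contained sketch of the one-way conversion described above. The Animal type and MakeAnimal body here are hypothetical stand-ins for the declarations in the snippet.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the article's Animal type.
struct Animal {
  virtual ~Animal() = default;
};

inline std::unique_ptr<Animal> MakeAnimal() {
  return std::unique_ptr<Animal>(new Animal());
}

// The reverse direction does not exist: shared ownership cannot be turned
// back into unique ownership.
static_assert(!std::is_convertible<std::shared_ptr<Animal>,
                                   std::unique_ptr<Animal>>::value,
              "shared_ptr must not convert to unique_ptr");

inline void UniqueToSharedExample() {
  std::unique_ptr<Animal> uniq = MakeAnimal();
  std::shared_ptr<Animal> shared = std::move(uniq);  // Ownership transferred.
  assert(uniq == nullptr);          // The unique_ptr gave up the object.
  assert(shared.use_count() == 1);  // The shared_ptr is the sole owner.

  std::shared_ptr<Animal> second = shared;  // Now ownership can be shared.
  assert(shared.use_count() == 2);
}
```

Returning std::unique_ptr from a factory keeps both options open for the caller, which is why it is the more flexible return type.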

*_pointer_cast

https://en.cppreference.com/w/cpp/memory/shared_ptr/pointer_cast

std::static_pointer_cast, std::dynamic_pointer_cast, std::const_pointer_cast, std::reinterpret_pointer_cast mirror the raw casts for shared_ptr. A custom down_pointer_cast can also be defined.

std::shared_ptr<Animal> animal_dog = MakeDog();
std::shared_ptr<Dog> dog = std::static_pointer_cast<Dog>(animal_dog);
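
A short sketch of how the pointer casts interact with ownership, assuming hypothetical Animal, Dog, and Cat types. The cast result shares ownership with the source pointer, and a failed std::dynamic_pointer_cast yields an empty shared_ptr.

```cpp
#include <cassert>
#include <memory>

// Hypothetical class hierarchy for illustration.
struct Animal {
  virtual ~Animal() = default;
};
struct Dog : Animal {};
struct Cat : Animal {};

inline void PointerCastExample() {
  std::shared_ptr<Animal> animal = std::make_shared<Dog>();

  // Successful downcast: |dog| shares ownership with |animal|.
  std::shared_ptr<Dog> dog = std::dynamic_pointer_cast<Dog>(animal);
  assert(dog != nullptr);
  assert(animal.use_count() == 2);

  // Failed downcast: the result is empty and the reference count is unchanged.
  std::shared_ptr<Cat> cat = std::dynamic_pointer_cast<Cat>(animal);
  assert(cat == nullptr);
  assert(animal.use_count() == 2);
}
```

std::static_pointer_cast skips the runtime check, so it is only safe when the dynamic type is already known.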

Narrowing Conversions

Implicit narrowing (e.g. double → int) still occurs with only a warning:

double d = 7.9;
int i = d;    // bad: narrowing: i becomes 7
i = (int) d;  // bad: we're going to claim this is still not explicit enough

Hence treat warnings seriously and eliminate them.

To prevent implicit narrowing: prefer brace initialization over parentheses:

class IntType {
 public:
  explicit IntType(int v);
  // ...
};

double d = 7.9;
IntType i1(d);  // bad: narrowing
IntType i2{d};  // Won't compile!
int i3 = d;     // bad: narrowing
int i4{d};      // Won't compile!

When a narrowing conversion is intentional:

double d = 7.9;

// If you included GSL in your project
// https://github.com/microsoft/GSL
int i = gsl::narrow_cast<int>(d);

// Make your version of narrow_cast
// narrow_cast(): a searchable way to do narrowing casts of values
template <class T, class U>
constexpr T narrow_cast(U&& u) noexcept {
  return static_cast<T>(std::forward<U>(u));
}

// Fall back to static_cast if you neither include GSL nor write your own
// narrow_cast
int i2 = static_cast<int>(d);
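
When an intentional narrowing should still be verified at runtime, a checked variant can complement narrow_cast. GSL provides gsl::narrow for this purpose; the sketch below is a hypothetical CheckedNarrow written from scratch, restricted to integer types so that the check itself stays free of undefined behavior. FitsInInt8 is likewise a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <type_traits>

// Hypothetical helper: converts |u| to T and throws std::range_error if the
// value did not survive the conversion.
template <class T, class U>
T CheckedNarrow(U u) {
  static_assert(std::is_integral<T>::value && std::is_integral<U>::value,
                "CheckedNarrow only supports integer types in this sketch");
  const T t = static_cast<T>(u);
  const bool round_trips = static_cast<U>(t) == u;
  const bool same_sign = (t < T{}) == (u < U{});  // Catches -1 -> 4294967295.
  if (!round_trips || !same_sign) {
    throw std::range_error("narrowing conversion changed the value");
  }
  return t;
}

// Hypothetical helper: reports whether |v| is representable as int8_t.
inline bool FitsInInt8(int32_t v) {
  try {
    CheckedNarrow<int8_t>(v);
    return true;
  } catch (const std::range_error&) {
    return false;
  }
}

inline void CheckedNarrowExample() {
  assert(CheckedNarrow<int8_t>(int32_t{100}) == 100);
  assert(FitsInInt8(127));
  assert(!FitsInInt8(128));  // Out of range for int8_t.
}
```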

Generics (C++ Templates)

This is a beginner's guide: only basic generics, no template metaprogramming (many patterns are moving toward constexpr).

Tip: writing a simple recursive algorithm (e.g. Fibonacci) in Haskell helps internalize the "pattern matching + recursion" mechanics of template metaprogramming.

Generic Methods

public <T> java.util.List<T> fromArrayToList(T[] a) {
    return java.util.Arrays.stream(a).collect(java.util.stream.Collectors.toList());
}
template <typename T>
std::vector<T> fromArrayToList(const T* arr, int64_t size) {
  return std::vector<T>(arr, arr + size);
}
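
For built-in arrays, the size can be deduced by the template as well, which removes the separate size parameter. This is a sketch of that variant; the name FromArrayToList is only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// T and N are both deduced from the array argument, so the caller cannot pass
// a mismatched size.
template <typename T, std::size_t N>
std::vector<T> FromArrayToList(const T (&arr)[N]) {
  return std::vector<T>(arr, arr + N);
}

inline void FromArrayToListExample() {
  const int values[] = {1, 2, 3};
  std::vector<int> list = FromArrayToList(values);  // T = int, N = 3
  assert(list.size() == 3);
  assert(list[0] == 1 && list[2] == 3);
}
```

Unlike Java generics, each distinct T (and N) produces a separate instantiation at compile time.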

Generic Classes

public interface List<E> {
    void add(E x);
    Iterator<E> iterator();
}
template <typename T>
class List {
 public:
  virtual ~List() = default;

  virtual void Add(T element) = 0;
  template <typename Iterator>
  Iterator iterator() const;  // Placeholder; real Iterator type omitted.
};

Syntax Limitations in C++

All Must Be Written in Header Files

Headers are covered in detail later. For templates, the definitions must be visible at the point of instantiation, so they typically live in headers. Workaround: put the implementation in an .inc file and include it from the header.

// vector.h
template <typename T>
class FixedArray {
 public:
  // omitted: constructor, destructor, copy/move, etc.

  void resize(int32_t new_size);

 private:
  std::unique_ptr<T[]> data_;
  int32_t size_;
};

#include "vector.inc"  // NOLINT

// vector.inc
template <typename T>
void FixedArray<T>::resize(int32_t new_size) {
  // ...
}

Constraining Types Is Complicated

As of 2021-05-14, C++20 Concepts are not yet widely adopted, so constraining templates remains clumsy. Two common techniques:

  1. Use static_assert for compile-time checks.
  2. Use std::enable_if to disable non-conforming branches during template instantiation.

Example:

public <T extends Number> List<T> fromArrayToList(T[] a) {
    // ...
}
template <typename T>
std::vector<T> fromArrayToList(const T* arr, int64_t size) {
  static_assert(std::is_integral<T>::value, "|T| must be an integral type.");
  return std::vector<T>(arr, arr + size);
}
// If |cond| is satisfied, |std::enable_if<cond>::type| is |void|; otherwise
// the type does not exist. So |typename std::enable_if<cond>::type*| is either
// |void*| or a substitution failure that removes this overload.
//
// |template <void* ignored = nullptr>| is another usage of C++ template. We
// won't introduce it in this article. Just use it as an idiom here.
template <typename T,
          typename std::enable_if<std::is_integral<T>::value>::type* = nullptr>
std::vector<T> fromArrayToList(const T* arr, int64_t size) {
  return std::vector<T>(arr, arr + size);
}

// Another useful idiom is |absl::void_t<decltype(...your complex
// condition...)>|
//
// template <typename Container, typename Element, typename = void>
// struct HasFindWithNpos : std::false_type {};
//
// template <typename Container, typename Element>
// struct HasFindWithNpos<
//     Container, Element,
//     absl::void_t<decltype(std::declval<const Container&>().find(
//                               std::declval<const Element&>()) !=
//                           Container::npos)>> : std::true_type {};
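
The commented-out detection idiom can be tried on its own. The sketch below swaps absl::void_t for a locally defined VoidT (a hypothetical stand-in) so it needs no external dependency; the static_asserts show which containers satisfy the condition.

```cpp
#include <map>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Local stand-in for absl::void_t / std::void_t: maps any list of valid types
// to void.
template <typename... Ts>
struct MakeVoid {
  using type = void;
};
template <typename... Ts>
using VoidT = typename MakeVoid<Ts...>::type;

// Primary template: chosen when the expression below is ill-formed.
template <typename Container, typename Element, typename = void>
struct HasFindWithNpos : std::false_type {};

// Specialization: chosen only when |c.find(e) != Container::npos| compiles.
template <typename Container, typename Element>
struct HasFindWithNpos<
    Container, Element,
    VoidT<decltype(std::declval<const Container&>().find(
                       std::declval<const Element&>()) !=
                   Container::npos)>> : std::true_type {};

static_assert(HasFindWithNpos<std::string, char>::value,
              "std::string has find() and npos");
static_assert(!HasFindWithNpos<std::vector<int>, int>::value,
              "std::vector has no find() member");
static_assert(!HasFindWithNpos<std::map<int, int>, int>::value,
              "std::map has find() but no npos");
```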

Covariance and Contravariance

We construct a partial order between types via inheritance: if type A inherits from Base, then A ≤ Base.

| Term | Meaning |
| --- | --- |
| Covariance | Preserves the ≤ ordering |
| Contravariance | Reverses the ≤ ordering |
| Invariance | Neither of the above applies |

For methods this separates into parameter vs return variance.
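
On the return side, C++ supports covariance directly: an overriding virtual function may return a pointer or reference to a more derived class. The sketch below uses hypothetical Shape and Square classes to illustrate this.

```cpp
#include <cassert>
#include <memory>

// Hypothetical hierarchy: Square <= Shape in the ordering defined above.
class Shape {
 public:
  virtual ~Shape() = default;
  virtual Shape* Clone() const { return new Shape(*this); }
  virtual int Sides() const { return 0; }
};

class Square : public Shape {
 public:
  // Covariant return type: Square* instead of Shape* is a valid override.
  Square* Clone() const override { return new Square(*this); }
  int Sides() const override { return 4; }
};

inline void CovariantReturnExample() {
  Square square;
  // Calling through the derived type yields Square* without a cast.
  std::unique_ptr<Square> copy(square.Clone());
  assert(copy->Sides() == 4);

  // Calling through the base type still dispatches to Square::Clone.
  const Shape& as_shape = square;
  std::unique_ptr<Shape> base_copy(as_shape.Clone());
  assert(base_copy->Sides() == 4);
}
```

Parameters get no such treatment: a function in the derived class with a different parameter type does not override the base function.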

TODO

Probably most Java users haven't needed formal variance terms; skipping detailed treatment for now. References below.

Templates, covariance and contravariance

| Type | Covariant | Contravariant |
| --- | --- | --- |
| STL containers | No | No |
| std::initializer_list<T *> | No | No |
| std::future<T> | No | No |
| boost::optional<T> | No (see note below) | No |
| std::optional<T> | No (see note below) | No |
| std::shared_ptr<T> | Yes | No |
| std::unique_ptr<T> | Yes | No |
| std::pair<T *, U *> | Yes | No |
| std::tuple<T *, U *> | Yes | No |
| std::atomic<T *> | Yes | No |
| std::function<R *(T *)> | Yes (in return) | Yes (in arguments) |

Note: boost::optional<T> and std::optional<T> are not covariant in general because they preserve the value semantics of T; some contextual usages may appear covariant but should not be relied upon.
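
A few rows of the table can be checked at compile time with std::is_convertible. This is a sketch with hypothetical Animal and Dog types (Dog ≤ Animal); it assumes a standard library whose converting constructors are properly constrained, as current libstdc++ and libc++ are.

```cpp
#include <functional>
#include <memory>
#include <type_traits>
#include <vector>

// Hypothetical hierarchy: Dog <= Animal.
struct Animal {
  virtual ~Animal() = default;
};
struct Dog : Animal {};

// Smart pointers are covariant: the ordering of T carries over.
static_assert(std::is_convertible<std::shared_ptr<Dog>,
                                  std::shared_ptr<Animal>>::value, "");
static_assert(std::is_convertible<std::unique_ptr<Dog>,
                                  std::unique_ptr<Animal>>::value, "");
static_assert(!std::is_convertible<std::shared_ptr<Animal>,
                                   std::shared_ptr<Dog>>::value, "");

// STL containers are invariant: no conversion in either direction.
static_assert(!std::is_convertible<std::vector<Dog*>,
                                   std::vector<Animal*>>::value, "");

// std::function is covariant in its return type and contravariant in its
// arguments: a function that accepts any Animal* and returns a Dog* can stand
// in for one that accepts a Dog* and returns an Animal*.
static_assert(std::is_convertible<std::function<Dog*(Animal*)>,
                                  std::function<Animal*(Dog*)>>::value, "");
static_assert(!std::is_convertible<std::function<Animal*(Dog*)>,
                                   std::function<Dog*(Animal*)>>::value, "");
```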

Strings

Treat std::string primarily as a byte container: it has no notion of character encoding. Its API also differs from other containers, so be cautious. (Chromium's stl_utils offers helpers like Contains.)

Note

If you need to process Unicode characters, absolutely do not use std::string.

You can consider using the UnicodeString type from the ICU project.
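
The byte-container behavior is easy to observe with a UTF-8 string. In this sketch, CountCodePoints is a hypothetical helper that assumes valid UTF-8 input; it is not a substitute for a real Unicode library such as ICU.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in |utf8|, assuming the
// input is valid UTF-8. Every byte that is not a continuation byte (bit
// pattern 10xxxxxx) starts a new code point.
inline std::size_t CountCodePoints(const std::string& utf8) {
  std::size_t count = 0;
  for (unsigned char byte : utf8) {
    if ((byte & 0xC0) != 0x80) {
      ++count;
    }
  }
  return count;
}

inline void StringIsBytesExample() {
  const std::string text = "caf\xC3\xA9";  // "café" encoded as UTF-8.
  assert(text.size() == 5);                // size() counts bytes...
  assert(CountCodePoints(text) == 4);      // ...not characters.
}
```

Operations such as substr or indexing work on bytes too, so they can split a multi-byte character in the middle.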

Enums

Java enums are rich (methods, fields). C/C++ enums are simple integer sets. Prefer scoped enums; classic enums leak names and lack type safety.

// Basic comparison of old-style and scoped enums.
// Deprecated style.
enum Fruit {
  FRUIT_UNSPECIFIED = 0,
  FRUIT_APPLE = 1,
};

// |FRUIT_UNSPECIFIED| is visible.

// Suggested style. The only difference is |class| keyword.
enum class FruitNew {
  kUnspecified = 0,
  kApple = 1,
};

// |kUnspecified| is invisible. You need to reference it as
// |FruitNew::kUnspecified|.

You can also specify the underlying integer type.

enum FruitChar : char { /* omitted */ };

enum Animal64 : int64_t { /* omitted */ };
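
Since C++ enums cannot carry methods the way Java enums do, behavior is usually attached through free functions. The sketch below shows one possible shape for this; Fruit, ToString, and ToUnderlying are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Scoped enum with an explicit one-byte underlying type.
enum class Fruit : uint8_t {
  kUnspecified = 0,
  kApple = 1,
};

// Free function standing in for what would be an enum method in Java.
inline std::string ToString(Fruit fruit) {
  switch (fruit) {
    case Fruit::kUnspecified:
      return "unspecified";
    case Fruit::kApple:
      return "apple";
  }
  return "unknown";  // Reached only for values outside the declared set.
}

// Scoped enums do not convert to integers implicitly; this helper makes the
// conversion explicit and searchable.
template <typename Enum>
constexpr typename std::underlying_type<Enum>::type ToUnderlying(Enum e) {
  return static_cast<typename std::underlying_type<Enum>::type>(e);
}

static_assert(sizeof(Fruit) == 1, "Fruit is stored in a single byte");
static_assert(ToUnderlying(Fruit::kApple) == 1, "");

inline void EnumExample() {
  assert(ToString(Fruit::kApple) == "apple");
  assert(ToUnderlying(Fruit::kUnspecified) == 0);
}
```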

Legacy style sometimes appears for constants; avoid unless necessary.

class HeaderOnlyClass {
  enum { kDefaultAnswer = 42 };

 public:
  // ...

 private:
  int answer_{kDefaultAnswer};
};
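
Where C++11 or later is available, a static constexpr member is one way to express the same constant without the anonymous enum. A minimal sketch, with an answer() accessor added only so the value can be observed:

```cpp
#include <cassert>

class HeaderOnlyClass {
 public:
  // Accessor added for this sketch only.
  int answer() const { return answer_; }

 private:
  // Typed constant; replaces |enum { kDefaultAnswer = 42 };|.
  static constexpr int kDefaultAnswer = 42;

  int answer_{kDefaultAnswer};
};

inline void ConstexprConstantExample() {
  assert(HeaderOnlyClass().answer() == 42);
}
```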

std::variant and Tagged Union

Java lacks a native union; C/C++ unions let one memory region represent one of several types (mutually exclusive).

union UnionStorage {
  int32_t n;      // occupies 4 bytes
  uint16_t s[2];  // occupies 4 bytes
  uint8_t c;      // occupies 1 byte
};  // the whole union occupies 4 bytes

To track the active member, add an explicit tag enum.

enum class UnionStorageDataType {
  kUnspecified = 0,
  kInt32 = 1,
  kUint16Array = 2,
  kUint8 = 3,
};

struct UnionStorage {
  UnionStorageDataType data_type;
  union {
    int32_t n;
    uint16_t s[2];
    uint8_t c;
  } data;
};
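
The tag only helps if every read consults it first. The sketch below shows that discipline on a reduced, hypothetical TaggedStorage with two alternatives; the factory functions and ReadAsInt64 are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

enum class DataType {
  kUnspecified = 0,
  kInt32 = 1,
  kUint8 = 2,
};

// Reduced version of the tagged struct above.
struct TaggedStorage {
  DataType data_type;
  union {
    int32_t n;
    uint8_t c;
  } data;
};

// Factories set the tag and the matching member together.
inline TaggedStorage MakeInt32(int32_t value) {
  TaggedStorage s{};
  s.data_type = DataType::kInt32;
  s.data.n = value;
  return s;
}

inline TaggedStorage MakeUint8(uint8_t value) {
  TaggedStorage s{};
  s.data_type = DataType::kUint8;
  s.data.c = value;
  return s;
}

// Reads the active member into |out|. Returns false when nothing is stored.
inline bool ReadAsInt64(const TaggedStorage& s, int64_t* out) {
  switch (s.data_type) {
    case DataType::kInt32:
      *out = s.data.n;
      return true;
    case DataType::kUint8:
      *out = s.data.c;
      return true;
    case DataType::kUnspecified:
      break;
  }
  return false;
}

inline void TaggedUnionExample() {
  int64_t value = 0;
  assert(ReadAsInt64(MakeInt32(-5), &value) && value == -5);
  assert(ReadAsInt64(MakeUint8(200), &value) && value == 200);
}
```

Nothing stops code from reading data.n while the tag says kUint8; keeping tag and member in sync is entirely the programmer's responsibility.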

Plain unions struggle with non-trivial types (e.g. std::string). Prefer std::variant (absl::variant before C++17):

using UnionStorage =
    absl::variant<absl::monostate /* for empty case */, uint64_t, std::string,
                  double, std::vector<MyClass>>;

UnionStorage s{3.14};
double* d = absl::get_if<double>(&s);
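
Besides get_if, a variant can be consumed with a visitor that handles every alternative. The sketch below uses std::variant and std::visit and therefore assumes C++17; absl::variant and absl::visit follow the same pattern. Storage, DescribeVisitor, and Describe are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>

using Storage = std::variant<std::monostate /* for empty case */, uint64_t,
                             std::string, double>;

// One overload per alternative; forgetting one is a compile error.
struct DescribeVisitor {
  std::string operator()(std::monostate) const { return "empty"; }
  std::string operator()(uint64_t v) const {
    return "uint64:" + std::to_string(v);
  }
  std::string operator()(const std::string& v) const { return "string:" + v; }
  std::string operator()(double) const { return "double"; }
};

inline std::string Describe(const Storage& s) {
  return std::visit(DescribeVisitor{}, s);
}

inline void VariantExample() {
  Storage s;  // Default-constructed: holds std::monostate.
  assert(Describe(s) == "empty");

  s = 3.14;
  assert(std::holds_alternative<double>(s));
  assert(std::get_if<std::string>(&s) == nullptr);  // Wrong type: nullptr.

  s = std::string("pie");
  assert(Describe(s) == "string:pie");
}
```

Unlike the hand-written tagged union, the variant tracks the active alternative itself and runs the correct destructor when the value changes.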

std::optional

Null pointers have historically modeled optional values in C, C++, and Java, which is error-prone. See the classic talk:

Null References: The Billion Dollar Mistake

Prefer references over pointers when non-null is required. For maybe-absent values use std::optional (absl::optional) to make emptiness explicit.

absl::optional<Pie> MakePie() {
  bool failed = /* ... */;
  if (failed) {
    return absl::nullopt;
  }

  return Pie();
}
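
On the calling side, the empty case has to be handled before the value is used. A sketch with std::optional (assumes C++17; absl::optional has the same interface), where Pie, the oven_works parameter, and FlavorOrDefault are hypothetical:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical value type.
struct Pie {
  std::string flavor;
};

inline std::optional<Pie> MakePie(bool oven_works) {
  if (!oven_works) {
    return std::nullopt;
  }
  return Pie{"apple"};
}

// The signature tells the caller that the pie may be absent.
inline std::string FlavorOrDefault(const std::optional<Pie>& pie) {
  if (!pie.has_value()) {
    return "none";
  }
  return pie->flavor;
}

inline void OptionalExample() {
  assert(FlavorOrDefault(MakePie(true)) == "apple");
  assert(FlavorOrDefault(MakePie(false)) == "none");
}
```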

std::function

Analogous to Java's java.util.function, but allows arbitrary arity and treats all types uniformly.

Consumer<String> c = s -> System.out.println(s);
Predicate<String> p = s -> s.isEmpty();
// Function, BiFunction, ...
std::function<void(absl::string_view)> c = [](absl::string_view s) {
  LOG(INFO) << s;
};
std::function<bool(absl::string_view)> p = [](absl::string_view s) {
  return s.empty();
};

std::function<Status(int64_t /* id */, absl::string_view /* name */, Gender)>
    f = [](int64_t id, absl::string_view name, Gender gender) {
      // ...
    };

Lambdas have unique closure types; std::function type-erases them. Copying a std::function copies captured state.
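
The copy semantics can be observed with a stateful lambda. In this sketch (assumes C++14 for the init-capture), each std::function ends up with its own copy of the captured counter:

```cpp
#include <cassert>
#include <functional>

inline void FunctionCopyExample() {
  // |count| lives inside the closure object stored in the std::function.
  std::function<int()> counter = [count = 0]() mutable { return ++count; };
  assert(counter() == 1);

  // Copying duplicates the closure, including the current value of |count|.
  std::function<int()> copy = counter;

  // From here on the two counters advance independently.
  assert(counter() == 2);
  assert(copy() == 2);
  assert(copy() == 3);
  assert(counter() == 3);
}
```

If shared state is intended, capture a std::shared_ptr or a reference to an object that outlives every copy.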

Note

Avoid std::bind; see https://abseil.io/tips/108

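std::function requires a copyable callable, so it cannot hold a lambda that captures a move-only object, even though such lambdas are valid on their own. A common workaround is to move the lambda into a std::shared_ptr and capture that pointer instead: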

MoveOnlyInt v(1);
auto lambda = [v = std::move(v)]() { LOG(INFO) << v.value(); };
// The captured |lambda| is a std::shared_ptr, so dereference it before calling.
std::function<void()> f = [lambda = std::make_shared<decltype(lambda)>(
                               std::move(lambda))]() { (*lambda)(); };
