August 5, 2014

Return Value Optimization

Introduction

Return Value Optimization (RVO) is an important use of copy elision in C++. RVO is often used when a function returns a potentially big class object.

Although not part of the C++ standard, most calling conventions today for C++ functions returning a class object work more or less like this:

  • a. The caller provides enough uninitialized memory to hold the return object. Its address is passed as a hidden first parameter to the function, in addition to the function's declared parameters.
  • b. The callee is responsible for constructing the return object at the given address, and returns that address.
  • c. The caller accesses the object at that address. The object is an unnamed temporary.
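
To make the three steps concrete, here is a rough sketch of how a compiler might lower such a call (illustrative only; Big, make_big_lowered, and call_make_big are invented names, and real ABIs differ in detail):

```cpp
#include <new>

struct Big { int num; };

// Conceptual lowering of `Big make_big();`: the caller passes the address of
// uninitialized storage as a hidden first parameter (step a), and the callee
// constructs the return object at that address (step b).
void make_big_lowered(Big* ret)
{
    new (ret) Big{42};  // callee constructs in caller-provided memory
}

Big* call_make_big(void* storage)
{
    Big* ret = static_cast<Big*>(storage);
    make_big_lowered(ret);
    return ret;         // step c: the caller accesses the unnamed temporary here
}
```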

Return Value Optimization (RVO) is about how the callee constructs the return object at the given address in step b) with minimal overhead.

The basic idea of RVO is to avoid creating a separate class object in the function implementation by constructing the object directly in the caller-provided memory. This is exactly copy elision.

[Side note: if the returned value (an rvalue temporary) is used by the caller to initialize another object, a temporary object copy elision may be chained, i.e., the callee may construct directly into the final object.]

There are two types of RVOs: Unnamed Return Value Optimization (URVO) and Named Return Value Optimization (NRVO).

Unnamed Return Value Optimization

This is an example of Unnamed Return Value Optimization:

#include <iostream>

struct Foo
{
    Foo(int n) : num(n) {}
    Foo(const Foo& other) : num(other.num) { std::cout << "Foo copy constructor called" << std::endl; }
    int num;
};

Foo bar(bool ok)
{
    if (ok)
        return Foo(42); // URVO
    else
        return Foo(0);  // URVO
}

int main(int, char**)
{
    bar(true);
    return 0;
}

Notice that in URVO, all return paths must return an rvalue temporary object. With URVO, the temporary is constructed directly in the return value object, and no copy/move construction is necessary. Even though this occurs at a return statement, it is in fact temporary object copy elision.

In this example, both Debug and Release builds with my Visual Studio 2013 perform URVO and print nothing. If RVO were not performed, the copy constructor would be called and the program would print Foo copy constructor called.

Named Return Value Optimization

This is an example of Named Return Value Optimization:

#include <iostream>

struct Foo
{
    Foo(int n) : num(n) {}
    Foo(const Foo& other) : num(other.num) { std::cout << "Foo copy constructor called" << std::endl; }
    int num;
};

Foo bar(bool ok)
{
    Foo a(0);
    if (ok)
    {
        a.num = 42;
        return a;   // NRVO: a
    }
    return a;       // NRVO: a
}

int main(int, char**)
{
    bar(true);
    return 0;
}

Notice that in NRVO, all return paths must return the same named local object (a, in this example). This allows that one local object to collapse onto the return value object, i.e., to perform return value copy elision. For this example, Visual Studio 2013 performs NRVO in the Release build, so it prints nothing. In the Debug build it prints Foo copy constructor called once.

This is an example where NRVO cannot (practically) be performed:

#include <iostream>

struct Foo
{
    Foo(int n) : num(n) {}
    Foo(const Foo& other) : num(other.num) { std::cout << "Foo copy constructor called" << std::endl; }
    int num;
};

Foo bar(bool ok)
{
    Foo a(42);
    Foo b(0);
    if (ok)
        return a;   // No NRVO: a
    return b;       // No NRVO: b
}

int main(int, char**)
{
    bar(true);
    return 0;
}

The two paths return different local variables, a and b. Since a and b are distinct objects, the compiler cannot collapse both of them onto the return value object, so it is practically impossible to perform copy elision on all paths. For this example, Visual Studio 2013 cannot perform RVO, and both Release and Debug builds print Foo copy constructor called.
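
One way to restore NRVO eligibility (a restructured variant of the example above, not from the original) is to fold both paths into a single named object:

```cpp
#include <iostream>

struct Foo
{
    Foo(int n) : num(n) {}
    Foo(const Foo& other) : num(other.num)
    {
        std::cout << "Foo copy constructor called" << std::endl;
    }
    int num;
};

Foo bar(bool ok)
{
    Foo result(ok ? 42 : 0);  // the only local object, on every path
    return result;            // NRVO: result
}
```

Whether the compiler actually elides the copy is still optional, but now it is at least possible on all paths.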

Conclusion

With C++11’s move support, people are talking about the revival of returning objects by value. It is true that moving can be much cheaper than copying. However, if you are careful and write RVO-enabled code, it can beat a move, because elision is completely free.

As Andrei Alexandrescu said at GoingNative 2013, “No work is less work than some work.” To make your code RVO-enabled, use either:

  • URVO: all paths return rvalues, or
  • NRVO: all paths return the same local object.

August 5, 2014

C++ Copy Elision

Copying big class objects can be expensive; moving can be much cheaper. C++ has special rules that allow an implementation to omit the copy/move construction of a class object entirely, for better efficiency. This is called copy elision.

There are two class objects in question for copy elision: the source object and the destination object. The source object is used to initialize the destination object. When copy elision is carried out, the source and destination collapse into one object, so in the code they appear as aliases of the same single object. This object's lifetime starts when the source object is constructed and ends when the destination is destructed. With each copy elision, one construction and one matching destruction are omitted, saving run time, and one object is never created, saving space.
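
The collapse can be observed by counting constructions and destructions (an illustrative sketch; the exact count depends on whether the compiler chooses to elide):

```cpp
struct Tracker
{
    static int ctors;  // counts default and copy constructions
    static int dtors;
    Tracker() { ++ctors; }
    Tracker(const Tracker&) { ++ctors; }
    ~Tracker() { ++dtors; }
};
int Tracker::ctors = 0;
int Tracker::dtors = 0;

int observe()
{
    {
        Tracker t = Tracker();  // with elision: one object; without: two
    }
    return Tracker::ctors;       // 1 if elided, 2 if not
}
```

Either way, each construction is matched by a destruction, so the object counts stay balanced; elision just removes one construction/destruction pair.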

Copy elision is optional. The C++ compiler may choose to elide or perform the copy/move construction when copy elision is legally allowed.

The most often seen opportunities for copy elision are a function return statement, and copy/move construction of a class object from a temporary object of the same type.

Return Value Copy Elision

C++2011 12.8/31 allows return value copy elision:

– in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object (other than a function or catch-clause parameter) with the same cv-unqualified type as the function return type, the copy/move operation can be omitted by constructing the automatic object directly into the function’s return value

In general, a function’s return value is a distinct object. The rule above says that when the return expression is the name of a local object in the function, and that object's cv-unqualified class type is the same as the return type, copy elision is allowed. This is possible because the local object is not used in the function beyond the return statement, so it can be constructed directly as the return value object, i.e., the two objects can be one and the same. This is an example:

#include <iostream>

struct Foo
{
    Foo() {}
    Foo(const Foo&) { std::cout << "Foo copy constructor called" << std::endl; }
};

Foo bar()
{
    Foo a;
    return a;   // copy elision
}

int main(int, char**)
{
    bar();
    return 0;
}

In the bar function, the return statement is eligible for copy elision, with the local object a as the source and the return value object as the destination. At the call to bar in main, a return value object is constructed (an unnamed temporary that is destroyed at the end of the full expression containing the call). When copy elision is in action, bar's local object a is constructed directly in main's return value object for the bar call, and there is no copy constructor call. If the implementation chooses not to elide, then the local object a is a separate object, and at the return statement Foo's copy constructor is called to construct the unnamed return value object. With my Visual Studio 2013, the Debug build does not perform copy elision and prints Foo copy constructor called, while the Release build performs copy elision and prints nothing.

This is an example that is not eligible for copy elision:

#include <iostream>

struct Foo
{
    Foo() {}
    Foo(const Foo&) { std::cout << "Foo copy constructor called" << std::endl; }
};

Foo bar(Foo f)
{
    return f;   // no copy elision
}

int main(int, char**)
{
    bar(Foo());
    return 0;
}

The parameter f of function bar, although local to the function, cannot be the elision source for the return value. Conceptually, the parameter f is constructed by the caller before the function body runs, so that object already exists when the function starts. The return value object of bar, which actually lives in the stack frame of the caller (the main function), has an address distinct from that of the parameter f. Because the two cannot collapse to one address, copy elision is impossible, and both Debug and Release builds with my Visual Studio 2013 print Foo copy constructor called in this example.

Temporary Object Copy Elision

C++2011 12.8/31 also allows temporary object copy elision:

– when a temporary class object that has not been bound to a reference would be copied/moved to a class object with the same cv-unqualified type, the copy/move operation can be omitted by constructing the temporary object directly into the target of the omitted copy/move

This is easy to understand. If you are going to copy/move construct an object from a temporary class object of the same cv-unqualified type, the implementation is allowed to construct the temporary in place in the destination object and avoid the copy/move operation. See this example:

#include <iostream>

struct Foo
{
    Foo() {}
    Foo(const Foo&) { std::cout << "Foo copy constructor called" << std::endl; }
};

int main(int, char**)
{
    Foo a = Foo();  // copy elision
    return 0;
}

In the main function, if there is no copy elision, a temporary Foo object is first default constructed, and then the local Foo object a is copy constructed from it. If copy elision is performed, the temporary is constructed directly in a’s place, so there is no copy constructor call. Both Debug and Release builds with my Visual Studio 2013 perform copy elision and print nothing in this example.

Chained Copy Elisions

The copy elisions can be combined/chained. For example:

#include <iostream>

struct Foo
{
    Foo() {}
    Foo(const Foo&) { std::cout << "Foo copy constructor called" << std::endl; }
};

Foo bar()
{
    Foo a;
    return a;   // copy elision
}

int main(int, char**)
{
    Foo b = bar();  // copy elision
    return 0;
}

There are two possible copy elisions in this example. The bar function’s return statement can use return value copy elision, as in the earlier example. The construction of b in main can use temporary object copy elision from the return value of the bar call. So a valid C++ compiler can choose any of these:

  • Use no copy elision. There are two Foo copy constructor calls, and the program prints two lines of Foo copy constructor called.
  • Use one copy elision at either location. There is one Foo copy constructor call, and the program prints one line of Foo copy constructor called. [Visual Studio 2013, Debug build]
  • Use both copy elisions. There are no Foo copy constructor calls, and the program prints nothing. [Visual Studio 2013, Release build]

Because C++ is mostly a value-based language, unlike reference-based languages such as Java and C#, function calls and other expressions may involve temporary objects without you giving them much thought. Combining and chaining copy elisions can therefore yield significant savings in C++ programs.

Conclusion

  • Return value copy elision omits one object construction at return statement.
  • Temporary object copy elision reuses temporary object in constructing a new object.
  • Chaining copy elisions saves even more.

Also:

  • Copy elision is optional. The compiler may or may not perform it. Therefore,
  • If the copy/move constructor has side effects, those side effects are neither guaranteed to happen nor guaranteed not to happen (as is apparent in the examples above that print through std::cout). Do not count on them.

This blog post does not discuss copy elision with throw and catch statements. Essentially, they are similar to return value copy elision and temporary object copy elision; refer to C++2011 12.8/31 for details.

Today’s compilers are very sophisticated, and most of them would perform copy elision when generating optimized code. An important use of copy elision is Return Value Optimization (RVO), see another post.

August 2, 2014

Power Reset GE Dishwasher That Won’t Run

My GE dishwasher (Model PDWT480V00SS) would not run. I could choose the program, such as Auto/DeepClean/Normal etc, but Time Remaining always stays on 1h. After I closed the door and pressed Start, the Start/Reset LED came on, but the Cycle Status Indicator LED on front door remained off, and the washer did not run. It seemed stuck in 1h no matter what I tried.

I remember that when it worked, Time Remaining would change from 100 to 70 (minutes), for example, when I changed the program from Auto to Normal. And when the door was closed and Start pressed, the Cycle Status Indicator LED would turn amber and I could hear the dishwasher’s water noise, meaning it was running.

I checked the Owner’s Manual. The control panel was not locked. I was able to lock and unlock the panel by pressing/holding both Heated Dry and Delay Hours for 3 seconds, without problem. It’s just that Time Remaining was stuck at 1h and the washer would not start to run.

In the end, I tried a power reset: unplug the dishwasher from the power outlet for 1 minute (30 seconds according to the owner’s manual), then plug it back in. Upon receiving power, the panel no longer displayed the persistent 1h; instead it displayed 100. Now everything functions normally.

Conclusion: If the GE Dishwasher is stuck with Time Remaining 1h and does not run, try to reset it by unplugging the power for 1 minute and plugging it back in.

July 29, 2014

On “Bit-Oriented I/O with Templates”

Today’s Dr. Dobb’s C/C++ column published an article, “Bit-Oriented I/O with Templates,” by Mark Nelson. The article talks about using templates instead of traditional OOP virtual functions to achieve both abstraction and efficiency. For the specific example given in the article, the author uses std::enable_if and other techniques so that only std::istream/ostream objects are allowed as the backing I/O. I think that is a bit of over-engineering.

The core component in the article is the compressor class template:

// requirements:
//   int  INPUT::getByte();
//   void OUTPUT::putByte(char);
template<typename INPUT, typename OUTPUT>
class compressor
{
public :
  compressor(INPUT &input, OUTPUT &output );
  ...
protected:
  void putBit( bool val ) 
  {
    // compression engine.
    // will call getByte() / putByte() when necessary.
  }
};

By delegating the actual byte-oriented I/O (there is rarely real bit-oriented I/O) to the INPUT and OUTPUT abstractions, the compressor separates the compression algorithm from the actual I/O and achieves modularity. By taking INPUT and OUTPUT as template parameters, the compression code also achieves efficiency; using virtual interfaces instead would incur virtual function call overhead.
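
For contrast, here is a sketch of the virtual-interface alternative (my own illustration, not code from the article; ByteSource, ByteSink, and the other names below are invented). Every byte then travels through a virtual call:

```cpp
#include <cstddef>
#include <string>

// OOP-style abstraction: each getByte/putByte goes through vtable dispatch.
struct ByteSource { virtual int getByte() = 0; virtual ~ByteSource() {} };
struct ByteSink   { virtual void putByte(char c) = 0; virtual ~ByteSink() {} };

struct StringSource : ByteSource
{
    std::string data;
    std::size_t pos = 0;
    explicit StringSource(std::string s) : data(std::move(s)) {}
    int getByte() override
    {
        return pos < data.size() ? static_cast<unsigned char>(data[pos++]) : -1;
    }
};

struct StringSink : ByteSink
{
    std::string out;
    void putByte(char c) override { out += c; }
};

// Stand-in for the compression engine: pumps bytes from source to sink,
// paying one virtual call per byte where the template version would inline.
void pump(ByteSource& in, ByteSink& out)
{
    for (int b = in.getByte(); b != -1; b = in.getByte())
        out.putByte(static_cast<char>(b));
}
```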

Now to use compressor with std::istream and std::ostream objects, the author creates a helper function compress, and then uses techniques such as partial template specialization, std::enable_if and SFINAE to make sure only std::istream and std::ostream objects (or derived class objects) can be used. The code structure is like:

// with template tricks to restrict T : std::istream or its derived classes
template<typename T>
class input_bytes
{
public :
  int getByte();
};

// with template tricks to restrict T : std::ostream or its derived classes
template<typename T>
class output_bytes
{
public :
  void putByte(char c);
};

// the helper function that user directly invokes
// requirements:
//   INPUT:  std::istream or its derived classes
//   OUTPUT: std::ostream or its derived classes
template<typename INPUT, typename OUTPUT>
int compress(INPUT &source, OUTPUT &target)
{
  input_bytes<INPUT> in(source);
  output_bytes<OUTPUT> out(target);
  compressor<input_bytes<INPUT>,output_bytes<OUTPUT> > c(in,out);
  return c();
}

Since the compress function above can only take a std::istream object as the input source and a std::ostream object as the output target, I do not see why it needs to be a function template or why it needs template parameters. It could simply be:

// the helper function that user directly invokes
int compress(std::istream& source, std::ostream& target)
{
  input_bytes_istream in(source);
  output_bytes_ostream out(target);
  compressor<input_bytes_istream, output_bytes_ostream> c(in,out);
  return c();
}

By getting rid of the template parameters, the user can clearly see what may be passed to the function: std::istream/ostream (or their derived classes). The helper I/O classes do not need to be class templates; they are very simple adaptors for std::istream and std::ostream:

// models compressor::INPUT
struct input_bytes_istream
{
  inline input_bytes_istream(std::istream& s) : m_stream(s) {}
  inline int getByte() { return m_stream.get(); }
private:
  std::istream& m_stream;
};

// models compressor::OUTPUT
struct output_bytes_ostream
{
  inline output_bytes_ostream(std::ostream& s) : m_stream(s) {}
  inline void putByte(char c) { m_stream.put(c); }
private:
  std::ostream& m_stream;
};

These are small, simple classes with inlined constructors and getByte/putByte functions. I would expect the overhead to be minimal and on par with the templated version in the article.

I can only think of one possible benefit of the complex template approach, one not specifically mentioned in the article. When there are multiple byte-oriented I/O devices, users create different specializations of input_bytes/output_bytes for those devices, but then use the same compress helper function template. The input_bytes/output_bytes templates shape the contract between the compress helper function and the actual devices.

If users choose the simple approach illustrated in this post, the bar for adding a new I/O device is much lower. They still need to create adaptor classes, but these are simple non-template classes. The drawback is that they also need to create different versions of the compress helper function that use these adaptor classes. Depending on how many kinds of I/O devices need to be supported, one can choose the simple non-template approach or the complex template one. Remember that std::istream/ostream is already a pretty good abstraction over different I/O devices, so there may not be much need to support other kinds. And even if different implementations of the compress helper function are needed, they can be overloaded, so their users still enjoy a simple, uniform interface.

Basically, the use of templates in the compressor class template already provides the key benefits of both abstraction and efficiency. Over-engineering with complex template tricks does not show additional benefits for the peripheral constructs in the example above.

July 28, 2014

Static Dispatch, std::enable_if and static_assert

In a previous post I talked about static dispatch using partial class template specialization and function overloading.

It is also possible to use partial template specialization and std::enable_if to perform static dispatch. std::enable_if is used in conjunction with SFINAE to remove ineligible specializations from consideration when resolving a template. std::enable_if is especially useful when a whole group of classes should be removed, with the group’s eligibility condition given as the first argument of std::enable_if. For example:

#include <iostream>
#include <type_traits>

using namespace std;

struct Base
{};

// primary class template
template<typename T, typename Enable = void>
struct C
{
    void foo() { cout << "primary" << endl; }
};

// specialized class template
template<typename T>
struct C<T, typename std::enable_if<std::is_base_of<Base, T>::value>::type>
{
    void foo() { cout << "specialized" << endl; }
};

int main(int argc, char* argv[])
{
    C<int> a;
    a.foo();	// print "primary"
    C<Base> b;
    b.foo();	// print "specialized"
    return 0;
}

In the example above, an arbitrary T goes with the primary class template. Only when T is Base or a class derived from Base is the specialized class template instantiated. The reason is that an arbitrary T fails std::is_base_of, so std::enable_if produces an error. Per SFINAE, the specialized C is silently removed from consideration by the C++ compiler for such a T, which can then only resolve to the primary class template with the default argument for the second template parameter (void here). On the other hand, if T is Base or Base-derived, the specialization is valid, and it is more “specialized” than the primary, so the compiler picks the specialization over the primary. Notice that enable_if’s default embedded type is void, so the specialization’s second template argument is also void, matching the default second template parameter of the primary; yet the specialization and the primary can have completely different implementations.

In many cases, the purpose is not just to differentiate (static dispatch), but to prohibit certain template arguments from being passed to the class template. To disallow non-Base classes, one may try something like this:

#include <iostream>
#include <type_traits>

using namespace std;

struct Base
{};

// primary class template
template<typename T, typename Enable = void>
struct C
{
    static_assert(false, "arbitrary T not allowed"); // Does NOT work!
    void foo() { cout << "primary" << endl; }
};


// specialized class template
template<typename T>
struct C<T, typename std::enable_if<std::is_base_of<Base, T>::value>::type>
{
    void foo() { cout << "specialized" << endl; }
};


int main(int argc, char* argv[])
{
    C<Base> b;
    b.foo();
    return 0;
}

Unfortunately, the static_assert fires when the code is compiled, even though the primary template is not picked when resolving C<Base>. Even the code below, which does not instantiate any template, fails on the static_assert:

#include <iostream>
#include <type_traits>

using namespace std;

struct Base
{};

// primary class template
template<typename T, typename Enable = void>
struct C
{
    static_assert(false, "arbitrary T not allowed"); // Does NOT work!
    void foo() { cout << "primary" << endl; }
};


// specialized class template
template<typename T>
struct C<T, typename std::enable_if<std::is_base_of<Base, T>::value>::type>
{
    void foo() { cout << "specialized" << endl; }
};


int main(int argc, char* argv[])
{
    return 0;
}

Obviously, the static_assert in the primary C does not use any template parameters, so the compiler evaluates it unconditionally. If I move the static_assert into the body of foo(), it works as expected:

#include <iostream>
#include <type_traits>

using namespace std;

struct Base
{};

// primary class template
template<typename T, typename Enable = void>
struct C
{
    void foo() 
    {
        static_assert(false, "arbitrary T not allowed");
        cout << "primary" << endl; 
    }
};


// specialized class template
template<typename T>
struct C<T, typename std::enable_if<std::is_base_of<Base, T>::value>::type>
{
    void foo() { cout << "specialized" << endl; }
};


int main(int argc, char* argv[])
{
    C<int> a;
    //a.foo();  // this line would fire the static_assert!
    C<Base> b;
    b.foo();
    return 0;
}

This is because foo is instantiated lazily, only when the compiler first sees foo being called. The drawback, however, is that I can only disallow at member-function granularity, when I want to disallow the whole class!

It turns out that it’s much simpler to achieve that:

#include <iostream>
#include <type_traits>

using namespace std;

struct Base
{};

// class template
template<typename T>
struct C
{
    static_assert(std::is_base_of<Base, T>::value, "only Base or its derived allowed");
    void foo() { cout << "okay" << endl; }
};

int main(int argc, char* argv[])
{
    //C<int> a; // error: only Base or its derived allowed
    C<Base> b;
    return 0;
}

I can certainly construct more complex expressions in static_assert to define my allowed group of T.
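
For example (an illustrative sketch; this particular trait combination is my own, not from the original), the condition can combine several traits:

```cpp
#include <type_traits>

struct Base {};
struct Derived : Base {};

// Allow only non-const, default-constructible Base or Base-derived types.
template<typename T>
struct C
{
    static_assert(std::is_base_of<Base, T>::value
                      && !std::is_const<T>::value
                      && std::is_default_constructible<T>::value,
                  "T must be a non-const, default-constructible Base or derived class");
    void foo() {}
};
```

C<Derived> compiles fine, while C<int> or C<const Base> trips the static_assert with the custom message.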

July 14, 2014

Backflow Preventer Water Discharge and Vibration

I was working on some new irrigation sprinkler heads in the backyard. They still had the factory nozzle caps, which have a hole for flushing dirt. I turned on the zone valve from the sprinkler controller, flushed the heads, and turned off the valve. I then tried some other regular sprinklers. Then I was told that something was wrong in the front yard.

I went there and found the backflow preventer from the public water supply near my curb was discharging a lot of water from its vent, and vibrating a great deal with loud noise. It must have been related to the backyard sprinkler work. I quickly turned off the main valve feeding all irrigation zones. However, the backflow preventer did not stop. I then had to turn off the valve between the public water supply and the backflow preventer to stop the annoyance.

I went online and did some research. It is impossible that my water pressure was higher than the public supply’s. Given what I was doing in the backyard, some dirt may have gotten back into the water pipes. Because the backyard is at a higher elevation than the backflow preventer (which is why a backflow preventer is required per code), the dirt may have slipped down into the backflow preventer, causing some O-rings to stick. The simplest thing I could try was to flush again.

So I turned on the main valve to the irrigation zones, and turned on the valve for the zone with the new irrigation heads. Because these heads were wide open, the water pressure would be very low. Then I turned on the public water valve. The high-pressure public water flushed with the least resistance through the backflow preventer, then through the irrigation zone in question, and a lot of water came out in the backyard. The backflow preventer did not discharge water and did not vibrate! After a few minutes, allowing the dirt to be flushed from the system, I turned off the irrigation zone valve. The backflow preventer did not discharge water and did not vibrate! Then I turned various other irrigation zones and regular faucets on and off. The backflow preventer did not discharge water and did not vibrate! It worked.

In summary, if the backflow preventer discharges water from its vent and vibrates, it may be because dirt is stuck in it. Try flushing it before calling a plumber:

  • Turn off the public water valve to stop water discharge and vibration;
  • Open certain faucets or irrigation heads wide to allow dirt a route to be removed from the system;
  • Turn on the public water valve to flush the dirt for a few minutes.

June 28, 2014

Watch Amazon Instant Video on Android

If you want to watch Amazon Instant Video on your Android phone or tablet, you may be surprised to find that there is no such app available on Google Play Store. You can of course try to use the web Browser app in Android, get to Amazon’s website, and log in to Instant Video; but when you click any title, it does not play, and you see this message: “You can watch it on Kindle Fire, mobile devices, game consoles and other compatible devices”. Amazon’s list of Compatible Mobile Devices includes only Amazon Kindle, its new Fire Phone, and Apple iPad/iPhone/iPod Touch series. As you already know, Android phone/tablet is not on the list. It is unacceptable that Amazon Instant Video subscribers cannot use Android phones or tablets to watch the titles.

Android users basically have only one workaround – pretending to be using a Desktop web browser, like watching on your Windows PC.

Install Dolphin Browser

We need a browser app on Android that can pretend to be a desktop web browser. Luckily, Dolphin is such a browser: you can set its User agent to “Desktop”. That way, web servers believe you are using a desktop computer rather than a phone/tablet. You can install Dolphin from the Play Store.

When Dolphin is running, touch the little dolphin icon to the right of the web URL box, then choose Settings icon (to the left of Dolphin icon in the pop up), choose Customize, then touch User agent and change to Desktop.

Use Adobe Flash Instead of Microsoft Silverlight

If you think that’s it, you are wrong. Now you run Dolphin, log in to your Amazon account, and get to Instant Video. When you click to watch a video, it does not play, saying that it needs Microsoft Silverlight. Silverlight is a web browser plug-in for PCs. Amazon does not know your browser is in fact running on Android; it thinks you are running Windows. The problem is, there is no Silverlight on Android! Microsoft does not provide that support for Android.

Amazon used to use Adobe Flash for playing videos in the web browser, and some old posts around the internet assume that. But Amazon has since come to favor Microsoft Silverlight over Adobe Flash.

However, Amazon still allows you to use Flash. Click Settings on the web page to get to Amazon Instant Video Settings. Scroll down to near the bottom to find WEB PLAYER PREFERENCES, and choose Adobe Flash Player instead of Silverlight (Recommended). Now that annoying Download Silverlight message is gone.

Install Adobe Flash Player

But most likely your Dolphin still does not play the movie. On my HP TouchPad running CM10 (Android 4.0.4), Dolphin shows a little cube with a few question marks; basically no Adobe Flash Player is available on my Android tablet. Adobe stopped supporting Flash Player on mobile platforms a few years back, and most Android systems do not come with Adobe Flash Player installed today.

However, you can still download the final Flash Player 11.1 for Android from the Flash Player archives at Adobe, choosing the build that matches your Android version.

After the apk is downloaded, just install it and you are done. Now you can come back to Dolphin and reload the page, your Amazon Instant Video now plays!

Update 7/16/2014: Amazon has confirmed that it will launch an Android app for its video streaming service “soon” (PC Advisor).

June 17, 2014

Remove Hands Free Activation on Samsung Galaxy Rush from Boost Mobile

After rooting the Samsung Galaxy Rush from Boost Mobile, there are a lot of possibilities. For example, it is now easy to get rid of Hands Free Activation, which pops up when the phone is turned on. If the phone no longer uses Boost Mobile, there is no need to activate it with Boost Mobile.

There is a workaround to avoid seeing the Activation app even before rooting:

  • Turn on Airplane mode. The phone stops trying to connect to cellular networks. The Hands Free Activation detects that and will not pop up.
  • Then turn on Wi-Fi. Even though cellular network is not available, Wi-Fi can still be used by the phone to go online.

But the workaround has a drawback. Even though the Activation app does not pop up, its icon remains in the notification bar at the top of the screen. The green icon with a check symbol takes up precious space, and if you swipe down, you will find it is impossible to dismiss in the notification drawer. It is a system app that an unprivileged user cannot stop or remove.

Once the phone is rooted, it is possible to disable or remove this Activation nuisance. The corresponding system app is System Update, Version 1.2.40JB, com.samsung.sdm, SprintDM.apk. There are a few different ways:

  • Install Titanium Backup from the Play Store. Run Titanium Backup, select the app, and uninstall it.
  • Install Astro File Explorer or ES File Explorer from the Play Store. Find the apk file and rename it to, say, SprintDM.bak.
  • Install Root Uninstaller from ROOT UNINSTALLER on the Play Store. This tool supports backing up, disabling, and uninstalling system apps. It is a good choice if you are not sure about permanently removing a system app and want to make a backup copy and/or just disable it. It is a free trial version, so it has some usage limitations. I used it to back up System Update (SprintDM.apk) to the sdcard, and then disabled the app.
  • Install Root Uninstaller from dohkoos on the Play Store. This tool is much smaller than the other Root Uninstaller and has no usage limitations, but it only supports uninstalling apps. If you are sure you want to remove the system app permanently, this is a handy tool.

All the tools above request root access, so the Super User Deny/Allow prompt will pop up; you need to choose Allow for them to do their job. After System Update is disabled or uninstalled, reboot the phone. Hands Free Activation no longer appears in the notification bar, nor does it pop up to bother you.

June 16, 2014

Root Android Phone: Samsung Galaxy Rush from Boost Mobile

Introduction and Preparation

Credits to the AndroidForums thread and TheUnlockr article.

There are basically two parts:

  • Install ClockworkMod Recovery (CWM) on the phone. This needs Odin3 (running on your PC) to flash CWM and replace the Samsung stock recovery on the phone.
  • From within ClockworkMod, install a Superuser zip to root the phone.

To root the phone, download these:

  • Odin3. I used Odin3-v3.04.zip. This runs on a Windows PC. Extract the files in the zip anywhere on your PC, and find Odin3 v3.04.exe.
  • CWM for Galaxy Rush. Either CWM-Rush-M830-Final.tar or CWM-Touch-Rush-m830.tar works. The difference is that the touch version is more modern, while the non-touch version uses the Volume Up/Down and Power buttons to navigate within CWM. I prefer the non-touch version, CWM-Rush-M830-Final.tar. Odin3 will serve CWM to the phone.
  • Superuser (Superuser-1.0.1.1.zip). This is the root exploit that CWM is going to install on the phone.

Copy Superuser Zip File to Phone

Superuser-1.0.1.1.zip can be copied to the phone before or after CWM is flashed to the phone.

There is no need to insert a micro-SD card to the phone; the phone’s internal storage is sufficient.

Connect the phone to the PC with a USB cable; the phone appears as SPH-M830 (Samsung's model number for the Galaxy Rush) on Windows 8:

[screenshot]

Double click SPH-M830; it shows the phone storage as a USB drive named Phone:

[screenshot]

Double click the drive and copy the Superuser zip file to it:

[screenshot]

Flash ClockworkMod

Now install CWM to the phone as recovery tool using Odin on PC.

  • PC: Right click Odin3 v3.04.exe and run it as administrator. Leave it running for now.
  • Phone: Unplug the USB cable connecting to the PC.
  • Phone: Power off.
  • Phone: Press and hold the Volume Down and Power buttons until it boots. At the Warning prompt, press the Volume Up button to continue into Download mode.
  • Phone: Plug in the USB cable connecting to the PC. Windows may install a driver for the phone.
  • PC: Odin3 should find the phone device in the ID:COM box. The phone is now connected to Odin3, and Odin3 can load files, including CWM, to the phone.
  • PC: Odin3: Uncheck all options except Auto Reboot. Click the PDA button, then choose the CWM file CWM-Rush-M830-Final.tar. Click Start. This starts loading CWM to the phone.
  • Phone: It should show the downloading progress. When done, it reboots.
  • PC: Odin3 should say PASS. Close Odin3.
  • Phone: Unplug the USB cable; it is no longer needed.

If there is no problem, the phone has CWM as the recovery now.

Root It

With CWM, rooting the phone is quite simple.

  • Power off the phone.
  • Press and hold the Volume Up and Power buttons until it boots. The phone enters Recovery mode, i.e., ClockworkMod. In the non-touch version, press Volume Up/Down to navigate through the menu items, and press the Power button to select one.
  • Choose install zip from sdcard, then choose zip from sdcard. Then select and install Superuser-1.0.1.1.zip. It takes a few seconds.
  • Choose reboot system now.

After the phone reboots, it is rooted.

Unroot It

After rooting, you may change the phone software in many ways, for example removing apps that were previously impossible to remove. In case you need to revert to the unrooted Samsung stock ROM for the Galaxy Rush, follow these instructions:

  • Download Stockrom.zip and copy it to the phone.
  • Press and hold Volume Up and Power buttons to boot into ClockworkMod.
  • Wipe data/factory reset
  • Wipe cache
  • Wipe dalvik cache
  • Install StockRom.zip
  • Reboot

June 15, 2014

abs Functions in C and C++

Introduction

The C standard library provides absolute value functions for integral and floating point types (For free close-to-standard drafts, see C89, C99, and C11):

int           abs(int j);             // stdlib.h  C89 C99 C11
long int      labs(long int j);       // stdlib.h  C89 C99 C11
long long int llabs(long long int j); // stdlib.h      C99 C11
double        fabs(double x);         // math.h    C89 C99 C11
float         fabsf(float x);         // math.h        C99 C11
long double   fabsl(long double x);   // math.h        C99 C11

The C language does not support function overloading, so each function needs a unique name.

C++ inherits all of the above C standard library functions, i.e., you can still use them. Moreover, because C++ supports function overloading, we can call all these overloads through the unified name std::abs. This is way more convenient. See the table below:

int           abs(int j);             // cstdlib  C++98 C++11
long          abs(long j);            // cstdlib  C++98 C++11
long long     abs(long long j);       // cstdlib        C++11
float         abs(float x);           // cmath    C++98 C++11
double        abs(double x);          // cmath    C++98 C++11
long double   abs(long double x);     // cmath    C++98 C++11

You can still call absolute value functions using C-style names (some other variants are not shown), but generally you will not need to:

long          labs(long j);           // cstdlib  C++98 C++11
long long     llabs(long long j);     // cinttypes      C++11
float         fabs(float x);          // cmath    C++98 C++11
double        fabs(double x);         // cmath    C++98 C++11
long double   fabs(long double x);    // cmath    C++98 C++11

Inlining and Performance

As one might imagine, the C++ std::abs overloads are either aliases of, or inline functions calling, the corresponding C functions. The C++ benefit is unified naming; the implementation just reuses the C implementation – there is no need to re-implement. Due to inlining, std::abs should have no performance overhead, and the naming benefit should come free.

For example, in my Visual Studio 2012, std::abs(double) and std::abs(float) can be found in math.h as the following:

inline double __CRTDECL abs(_In_ double _X)
        {return (fabs(_X)); }
inline float __CRTDECL abs(_In_ float _X)
        {return (fabsf(_X)); }

This is for fabsf(float):

inline float fabsf(_In_ float _X)
        {return ((float)fabs((double)_X)); }

This is the root C function fabs(double):

_CRT_JIT_INTRINSIC double  __cdecl fabs(_In_ double _X);

Clearly the C++ abs functions contain an inlined call to the corresponding C function. The float C function fabsf(float) calls fabs(double) with additional float<->double conversions. Inlining itself should have virtually no overhead.

One needs to be careful about the performance difference between std::abs(float) and std::abs(double), or really between their underlying C functions fabsf(float) and fabs(double). Depending on the platform, these two may have quite different performance.

On x64 with Visual Studio 2012, the absolute value of a floating point number is computed using SSE. When a std::abs(double) or fabs call is inlined, the assembly code is as follows:

andpd    xmm0, QWORD PTR __xmm@7fffffffffffffff7fffffffffffffff ; clear sign bits

When a std::abs(float) or fabsf call is inlined, the assembly code is:

cvtss2sd xmm0, xmm0                                             ; float -> double
andpd    xmm0, QWORD PTR __xmm@7fffffffffffffff7fffffffffffffff ; clear sign bits
cvtsd2ss xmm0, xmm0                                             ; double -> float

As you can see, the SSE instruction ANDPD natively works on double values (in fact, the instruction clears the sign bits of two 64-bit double values in XMM0 at once). To get the absolute value of a float, the code converts it to double, clears the sign bit, and converts back to float. Therefore, std::abs(float) is in fact slower than std::abs(double) on x64.

Performance Comparison Example

Below is a small example program showing the performance of the absolute value functions on float and double types.

#include <windows.h>	// QueryPerformanceCounter, QueryPerformanceFrequency
#include <cmath>		// std::abs, fabs
#include <vector>		// std::vector
#include <algorithm>	// std::sort
#include <numeric>		// std::accumulate
#include <iostream>		// std::cout

// timing support
struct timing
{
	// stamp time
	timing() { QueryPerformanceCounter(&m_t); }

	// get time difference in milliseconds.
	double operator - (const timing& other) const { return (m_t.QuadPart-other.m_t.QuadPart)*1000.0/freq().QuadPart; }

private:
	static LARGE_INTEGER freq() { LARGE_INTEGER f; QueryPerformanceFrequency(&f); return f; }
	LARGE_INTEGER m_t;
};

// print statistical information
void stats(std::vector<double>& mss)
{
	std::sort(mss.begin(), mss.end());
	double mean = std::accumulate(mss.begin(), mss.end(), 0.0) / mss.size();
	double minv = mss.front();
	double maxv = mss.back();
	double median = mss[mss.size()/2];
	std::cout<<" ["<<mss.size()<<"]: mean="<<mean<<", median="<<median<<", min="<<minv<<", max="<<maxv<<"\n";
}

// measure performance of std::abs
template<typename Real>
void std_abs(std::vector<Real>& v)
{
	Real* p = &v.front();
	Real* e = p + v.size();
	while( p!=e )	// avoid vector overhead in debug
	{
		*p = std::abs(*p);
		++p;
	}
}

// measure performance of fabsf
void f_abs(std::vector<float>& v)
{
	float* p = &v.front();
	float* e = p + v.size();
	while( p!=e )	// avoid vector overhead in debug
	{
		*p = fabsf(*p);
		++p;
	}
}

// measure performance of fabs
void f_abs(std::vector<double>& v)
{
	double* p = &v.front();
	double* e = p + v.size();
	while( p!=e )	// avoid vector overhead in debug
	{
		*p = fabs(*p);
		++p;
	}
}


// test driver
template<typename Real>
void test_abs(size_t count, size_t repeat)
{
	std::vector<double> t_std_abs;
	std::vector<double> t_f_abs;
	t_std_abs.reserve(repeat);
	t_f_abs.reserve(repeat);

	std::vector<Real> v(count);
	
	// warm up: walk through memory
	std_abs(v);
	f_abs(v);

	// measure times
	for(size_t i=0; i<repeat; ++i)
	{
		timing t0;
		std_abs(v);
		timing t1;
		f_abs(v);
		timing t2;

		t_std_abs.push_back(t1-t0);
		t_f_abs.push_back(t2-t1);
	}

	// statistics
	std::cout<<"  std::abs() "; stats(t_std_abs);
	std::cout<<"  C abs()    "; stats(t_f_abs);
}

int main(int argc, char* argv[])
{
#ifdef _DEBUG
	size_t count = 1000000;		// debug is slower, use less
#else
	size_t count = 10000000;
#endif
	size_t repeat = 20;	// for statistical significance

	std::cout<<"float:  "<<count<<" abs, in ms\n";
	test_abs<float>(count, repeat);
	std::cout<<"double: "<<count<<" abs, in ms\n";
	test_abs<double>(count, repeat);
	
	return 0;
}

Built with Visual Studio 2012, below is the running result for Release build (Intel i5 3GHz):

float:  10000000 abs, in ms
  std::abs()  [20]: mean=24.9129, median=25.239, min=22.4168, max=26.9544
  C abs()     [20]: mean=24.6979, median=25.2444, min=22.3905, max=25.6948
double: 10000000 abs, in ms
  std::abs()  [20]: mean=13.5362, median=13.9866, min=12.6494, max=14.135
  C abs()     [20]: mean=13.6213, median=13.9969, min=12.7048, max=14.2889

Basically it shows no difference between the C++ and C functions (std::abs(float) vs. fabsf, std::abs(double) vs. fabs), but it shows that the double version runs much faster than the float version. In fact, the double version runs at nearly 750M operations (abs + read/write + loop) per second.

Just for reference, below is an example run of the Debug build:

float:  1000000 abs, in ms
  std::abs()  [20]: mean=88.6267, median=86.4029, min=72.9652, max=118.84
  C abs()     [20]: mean=57.6703, median=56.0603, min=47.2727, max=77.3424
double: 1000000 abs, in ms
  std::abs()  [20]: mean=65.3075, median=67.8592, min=60.3175, max=68.3168
  C abs()     [20]: mean=24.0356, median=25.1282, min=22.3303, max=25.2725

Obviously the Debug build is much slower than the Release build (20 to 50x slower). And here the C++ version is slower than the C version. But no useful conclusion should be drawn from this, since the instrumentation in a Debug build could be very different.

If you are using an up-to-date build environment and you observe performance difference between the C and C++ versions of the absolute value functions, this is what I suggest:

  • Make sure you are running a Release build. Don’t waste your time on a Debug build.
  • Make sure you are comparing apples to apples, for example comparing fabsf(float) with std::abs(float), not fabs(double) with std::abs(float).

I highly doubt any C++ implementation would have a performance bug here.

In the end, I think you will appreciate that using C++ is a better experience with no performance overhead for abs operations.
