As of 2016-02-26, there will be no more posts for this blog. s/blog/pba/
Showing posts with label performance.

This morning, I noticed the system temperature was oddly low, only just above 40°C. I didn't mind at first, but then the system seemed to run slower than before, which I thought might just be my imagination until I checked the frequencies in /proc/cpuinfo.

I had not looked at that file for years, and I saw the frequencies were fixed at 1833 MHz and 1000 MHz for the two cores, respectively. So, I tried the ultimate fix, turning it off and on again; it didn't work. I began to wonder if anything had been updated recently, but it was not the kernel nor any system/hardware stuff that I could remember.

At this point, I laughed, because I knew there must be a wrong setting that I had not noticed for years. So I went back to the power management section of the kernel configuration and found that I might have been using the wrong governor since 2012-08-27, as kernel 3.4 recommended the ondemand governor according to the ArchWiki.
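Both the reported frequencies and the active governor can be checked from the shell; the sysfs paths below are the standard cpufreq locations, assuming cpufreq support is compiled into the kernel:

```shell
# Per-core frequencies as the kernel reports them; in my case the two cores
# were stuck at 1833 MHz and 1000 MHz.
grep MHz /proc/cpuinfo || echo 'no MHz field in /proc/cpuinfo here'

# Active governor per core, and the governors available; these sysfs files
# only exist when the kernel was built with cpufreq support.
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor 2> /dev/null || true
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors 2> /dev/null || true
```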

That was 3.5 years ago.

First of all, this isn't meant to be accurate or very reliable; it may even be flawed. I just wanted to see some numbers, because pymux's README mentions performance but gives no actual numbers:

Tmux is written in C, which is obviously faster than Python. This is noticeable when applications generate a lot of output. Where tmux is able to give fast real-time output for, for instance find / or yes, pymux will process the output slightly slower, and in this case render the output only a few times per second to the terminal. Usually, this should not be an issue. If it is, Pypy should provide a significant speedup.


I used my own test script, which is written in Bash. I thought about using find or yes as mentioned in the README, but I was too lazy to write a script for those tests, so I used what I already had in hand.

Since pymux is written in Python, I tested with two implementations: the official CPython and PyPy (RPython translated to C). Both run within environments created by virtualenv, using pip to install pymux 0.5 and pyte 0.4.10 along with their dependencies.

The test script was run with reset && ./ in urxvtc with the font xft:Envy Code R:style=Regular:size=20:antialias, in dwm at a virtually full-screen 1680x1050; the dwm top bar was hidden, the window size was 1669x1027, and the geometry was 111x33.

Both tmux and pymux are run without configuration files.

Six months ago, I wrote Performance in shell and with C about converting a Bash project into C. During that, I got some numbers which showed me the relative performance of shell builtins and C. Almost two years ago, I wrote about the sleep command, also talking about the cost of invoking external commands.

Last night, I suddenly realized that I could have given a very simple example using true, timing the loops with Bash's time builtin:

test                                     time
for i in {1..10000}; do      true; done  00.049s
for i in {1..10000}; do /bin/true; done  11.556s
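These loops can be reproduced directly in Bash; setting TIMEFORMAT trims time's report to the real time only. The timings will of course vary by machine:

```shell
# Compare the builtin true against the external /bin/true; TIMEFORMAT
# limits the time builtin's report to wall-clock seconds.
TIMEFORMAT='%Rs'
time for i in {1..10000}; do true; done
time for i in {1..10000}; do /bin/true; done
```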

The builtin true is about 23,484% faster than the external /bin/true. That percentage was itself computed with the following commands, each looped for 1000 iterations, as another example:

test                               time
e      '(11.556-0.049)/0.049*100'  0.028s
bc <<< '(11.556-0.049)/0.049*100'  2.560s


bc actually returns 23400 without setting scale, because the division gets truncated before the multiplication by 100.

The e above is a shell builtin from the e.bash project, which I forked off e, a tiny expression evaluator. After I learned about the cost, I stopped using bc just for simple floating-point calculations; with e, I can do floating-point calculations in Bash scripts again.
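Incidentally, this particular percentage doesn't even need floating point: scale the seconds to milliseconds and Bash's builtin integer arithmetic gets the same figure, and multiplying before dividing avoids the truncation that makes bc answer 23400 without scale.

```shell
# The same calculation using only Bash's integer arithmetic, with the
# timings scaled from seconds to milliseconds.
builtin_ms=49      # 0.049s
external_ms=11556  # 11.556s
echo $(( (external_ms - builtin_ms) * 100 / builtin_ms ))  # prints 23483
```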

These numbers should make the cost of using external commands very clear. If you have a long loop with lots of external command invocations, then you should really consider rewriting it in another programming language.

This June, I started working on transitioning into a C project, as well as using GNU Autotools for the build. When I decided to do it, it didn't come to my mind that I would have the following statistics to look at:

name       rate    type
Bash implementation
print_td   1856    Bash function calls
            159    Bash script executions
C implementation
td           52    C executions
Bash with C loadable extension
td        21631    Bash loadable executions
Python bindings
             35    Python 2 script executions
             12    Python 3 script executions

As you can see, the C loadable extension for Bash undoubtedly beats everyone else, with 10+ times better performance. Because of this, I decided to bring back vimps1, which was quietly disabled last August when I couldn't get it to work with the latest vcprompt code at the time, which was linked into vimps1.

At the time, I didn't really have a way to measure the performance, but I knew it ran faster. How would I know? Well, that's simple: just hold down Enter and watch the rate at which the prompt shows up. You can feel the difference between the loadable extension and a pure Bash PS1.
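A crude way to put a number on that feeling, assuming the rates in these tables mean executions per second, is to count how many times a prompt command runs in one second of wall time. The my_ps1 below is a hypothetical stub, not the actual vimps1 or bash_ps1 code:

```shell
# Count how many times a stand-in prompt function can run in roughly one
# second of wall time; my_ps1 is a placeholder, not the real prompt code.
my_ps1() { printf '%s' "\u@\h \w \$ "; }

count=0
end=$(( SECONDS + 1 ))
while (( SECONDS < end )); do
  my_ps1 > /dev/null
  count=$(( count + 1 ))
done
echo "rate: roughly $count calls per second"
```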

Because of this, I have now brought back vimps1; although it still has a problem with Git repositories, you can see the benefits:

name       rate
vimps1     1355
bash_ps1    141

It's effectively more than 10 times, because bash_ps1 is a seriously stripped-down version of my normal Bash PS1.

Not many people are aware of how much their shell or prompt wastes, and probably nobody else cares. However, I am beginning to think that maybe I need a shell that has performance on its feature list. Bash is not a bad shell, but from time to time I do wish it could run faster. It might be wrong to think so, since I haven't really seen anyone talk about shell performance.

After I finished's C transition, I thought about making an extension that would do arithmetic with floating-point numbers, but that's just the wrong thing to do: if you want Bash to handle floats, you either need to patch Bash or write the whole thing in another programming language. I have seen a lot of people using bc for this; frankly, they truly are doing it all wrong.


In September 2014, I did make such a project, e.bash, by forking e, a tiny expression evaluator. It is a Bash builtin, and because I can, I did it.

They don't understand how costly an invocation is; just look at the first table, 52 vs 21631 runs. That's the price you have to pay. If you didn't know, here is a tip for you: don't use an external command in a Bash script unless it's something Bash can't do.
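One everyday instance of that tip: basename and dirname are external commands, but Bash's parameter expansion does the same job without a single fork:

```shell
path=/usr/local/bin/bash

# External commands: one process spawned per call.
basename "$path"    # bash
dirname "$path"     # /usr/local/bin

# Builtin parameter expansion: same results, no processes spawned.
echo "${path##*/}"  # bash
echo "${path%/*}"   # /usr/local/bin
```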


In February 2015, I realized there is a very simple example using true to see the performance and the costs.

And if most of the code relies on external commands, maybe you should consider writing it in a language that can do all the tasks itself. Nevertheless, if it's a "because I can" project, then be my guest.

I wanted to see the performance of simplejson with and without its C extension. I downloaded the latest version, 2.1.3, and made three builds using Python 2.6.6:

  1. rm -rf build ; python build && rm build/lib*/simplejson/
  2. rm -rf build ; python build
  3. rm -rf build ; CFLAGS="-march=core2 -O2 -pipe -fomit-frame-pointer" python build

The first one removes the compiled C extension, the second one is a normal build, and the third one uses the CFLAGS I currently use with emerge. The following lines show how the C extension gets compiled:

Without customized CFLAGS
x86_64-pc-linux-gnu-gcc -pthread -fPIC -I/usr/include/python2.6 -c simplejson/_speedups.c -o build/temp.linux-x86_64-2.6/simplejson/_speedups.o
x86_64-pc-linux-gnu-gcc -pthread -shared build/temp.linux-x86_64-2.6/simplejson/_speedups.o -L/usr/lib64 -lpython2.6 -o build/lib.linux-x86_64-2.6/simplejson/

With customized CFLAGS
x86_64-pc-linux-gnu-gcc -pthread -march=core2 -O2 -pipe -fomit-frame-pointer -fPIC -I/usr/include/python2.6 -c simplejson/_speedups.c -o build/temp.linux-x86_64-2.6/simplejson/_speedups.o
x86_64-pc-linux-gnu-gcc -pthread -shared -march=core2 -O2 -pipe -fomit-frame-pointer build/temp.linux-x86_64-2.6/simplejson/_speedups.o -L/usr/lib64 -lpython2.6 -o build/lib.linux-x86_64-2.6/

I used the following code to do the test:

import timeit

# Time 100 runs of json.loads() on the contents of test.json; the setup
# statement reads the file once so only the parsing itself is measured.
t = timeit.Timer('json.loads(json_str)',
                 'import simplejson as json; json_str = open("test.json", "r").read()')
print t.timeit(100)

The test.json file is over 3 MB, with 500 entries.

The results are:

test                        elapsed time for 100 loads()
Without C extension         126.230s
json 1.9 in Python 2.6.6     60.616s
With C extension              9.945s
With C extension (CFLAGS)     7.555s

With the C extension, it is at least 10 times faster. I also put simplejson 1.9, the version bundled with Python 2.6.6 as the json module, in the results. Without the C extension, going from 1.9 to 2.1.3 it became twice as slow. I didn't download simplejson 1.9 to double-check, but I don't think it was modified for being shipped with Python.

I was just curious how fast a terminal window (in X) can redraw, so I wrote a script to test it. It prints characters to fill up the whole window by default, then resets the cursor to home using an ANSI escape code and prints again, repeating 100 times by default. 100 divided by the elapsed time is the FPS.
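The script itself isn't reproduced in the post; a minimal sketch of the idea, with all the details (defaults, frame character) being my assumptions, could look like this:

```shell
# Minimal sketch of a terminal redraw benchmark: fill the screen, jump the
# cursor home with ESC[H, repeat, then derive frames per second. All the
# defaults here are assumptions; the original script is not shown.
frames=${1:-100}
cols=${2:-80}
rows=${3:-25}

# Pre-build one full frame so the loop only measures output speed.
line=$(printf '%*s' "$cols" '' | tr ' ' '#')
frame=
for (( r = 0; r < rows; r++ )); do
  frame+="$line"$'\n'
done

start=$SECONDS
for (( i = 0; i < frames; i++ )); do
  printf '\e[H%s' "$frame"
done
elapsed=$(( SECONDS - start ))

# Integer FPS only, since Bash has no floats; guard against sub-second runs.
echo
echo "Frames per second: $(( frames / (elapsed > 0 ? elapsed : 1) ))"
```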

I ran several tests using ./ 1000 80 25. I used 80 by 25 because that's my VT's terminal size, and I maximized the window before running each test. Here are the results for 80x25 and 1000 frames, sorted by fps:

terminal           elapsed time (s)  fps
urxvtc [1]          8.882            112.580
urxvtc              9.126            109.574
urxvt               9.140            109.401
urxvtc + tmux      10.546             94.813
urxvtc + tmux [2]  10.568             94.616
urxvtc + tmux [3]  11.214             89.173
xterm              16.487             60.653
lxterminal         39.211             25.502
vt1                54.984             18.187
[1] no .Xdefaults
[2] with -2, in the right pane of two
[3] with -2, in the left pane of two

The slowest one is vt1; I didn't test the framebuffer. urxvt is my terminal, but I also have xterm installed, and I installed lxterminal for a VTE-based terminal test. My normal urxvt uses Rxvt.font: xft:Envy Code R:style=Regular:size=9:antialias=false. I ran one more test on urxvtc invoked without .Xdefaults, so I could test without the changes I have made. Since I use tmux, I also tested tmux invoked with and without 256 colors, running in a pane.

This script can't measure the real FPS, since the Bash script itself takes some time to process, but the results shouldn't be much lower than reality, and they do show significant differences between terminals; otherwise, the FPS numbers would all be capped around the same value. As you can see, urxvt runs fastest, then xterm, then lxterminal. There are some configuration differences, such as the fonts, but the numbers look quite conclusive to me.

For the maximized urxvt+tmux terminal window I normally use on one screen, with video playing on another screen, here is the result:

For 239x65 100 frames, elapsed time: 18.567 seconds
Frames per second: 5.385