As you all know, Nuitka (see "what is Nuitka?" ) has recently completed a milestone. Always short on time, I am not doing a whole lot of benchmarking yet, and focus on development. But here is an interesting submission from Dave Kierans (CTO of iPowow! Ltd):
    ➜ ~ python pystone.py 1000000
    Pystone(1.1) time for 1000000 passes = 10.2972
    This machine benchmarks at 97113.5 pystones/second

    ➜ ~ cython --embed pystone.py;gcc pystone.c -I/usr/include/python2.6 -L /usr/lib/ -lpython2.6 -o ./pystone.cython;./pystone.cython 1000000
    Pystone(1.1) time for 1000000 passes = 8.20789
    This machine benchmarks at 121834 pystones/second

    ➜ ~ nuitka-python pystone.py 1000000
    Pystone(1.1) time for 1000000 passes = 4.06196
    This machine benchmarks at 246187 pystones/second

This is a nice result for Nuitka, even if we all know that pystone is not a really good benchmark at all, and that its results definitely do not translate to other software. It definitely makes me feel good. With the 0.4.x series kicked off, it will be exciting to see where these numbers can go once Nuitka actually applies standard compiler techniques.
This is to inform you about the new stable release of Nuitka. Please see the page "What is Nuitka?" for clarification of what it is now and what it wants to be.
This release brings massive progress on all fronts. The big highlight is of course: Full Python3.2 support. With this release, the test suite of CPython3.2 is considered passing when compiled with Nuitka.
Then lots of work on optimization and infrastructure. The major goal of this release was to get in shape for actual optimization. This is also why for the first time, it is tested that some things are indeed compile time optimized to spot regressions easier. And we are having performance diagrams, even if weak ones:
Checking compiled code with isinstance( some_function, types.FunctionType ), as "zope.interfaces" does, was causing compatibility problems. Now this kind of check passes for compiled functions too. Issue#53
The frame of modules had an empty locals dictionary, which is not compatible with CPython, which puts the globals dictionary there too. Also discussed in Issue#53
For nested exceptions and interactions with generator objects, the exceptions in "sys.exc_info()" were not always fully compatible. They now are.
The range builtin was not raising exceptions if given arguments appeared to not have side effects, but were still illegal, e.g. range( [], 1, -1 ) was optimized away if the value was not used.
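As a small illustration of the behavior that has to be preserved, here is a sketch written for Python 3 (an assumption, the fix itself concerned Nuitka's compile time optimization). The function name is made up for this example. The result of the call is never used, yet the exception must still be raised:

```python
def call_and_discard():
    """Call range() with an illegal first argument and ignore the result."""
    try:
        # The value is unused, but the argument check must still happen.
        range([], 1, -1)
    except TypeError as error:
        return type(error).__name__
    return "no exception"


outcome = call_and_discard()
print(outcome)
```

A compiler may only drop such a call once it has proven that the arguments are legal.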
Don't crash on imported modules with syntax errors. Instead, the attempted recursion is simply not done.
Doing a del on __defaults__ and __module__ of compiled functions was crashing. This was noticed by a Python3 test for __kwdefaults__ that exposed this weakness of compiled functions.
Duplicate arguments were not detected if one of them was not a plain argument. Star arguments could collide with normal ones.
The __doc__ of classes is now only set, where it was in fact specified. Otherwise it only polluted the name space of locals().
When a return from the tried statements of a try/finally block was overridden by the final block, a reference was leaked. Example code:

    try:
        return 1
    finally:
        return 2

Raising exception instances with a value was leaking references, and not raising the TypeError it is supposed to raise.
When raising with multiple arguments, the evaluation order of them was not enforced, it now is. This fixes a reference leak when raising exceptions, where building the exception was raising an exception.
This release marks a milestone. The support of Python3 is here. The re-formulation of complex calls, and the code generation improvements are quite huge. More re-formulation could be done for argument parsing, but generally this is now mostly complete.
The 0.3.x series had a lot of releases, many of which brought progress with re-formulations that aimed at making optimization easier or possible. Sometimes small things like making "return None" explicit. Sometimes bigger things, like making class creations normal functions, or getting rid of "or" and "and". All of this was important ground work, to make sure that optimization doesn't have to deal with complex stuff.
So, the 0.4.x series begins with this. The focus from now on can be almost purely optimization. This release contains already some of it, with frames being optimized away, with the assignment keepers from the or and and re-formulation being optimized away. This will be about achieving goals from the "ctypes" plan as discussed in the developer manual.
Also the performance page will be expanded with more benchmarks and diagrams as
This is kind of semi-interesting news for you Python3 lovers. My "Python compiler Nuitka" has been supporting the language somewhat, but in the next release it's going to be complete.
Support for annotations, unicode variable names, keyword only arguments, starred assignments, exception chaining plus causes, and full support for new meta classes all have been added. So Python3.2 is covered now too and passing the test suite on about the same level as with Python2.6 and Python2.7 already.
I readied this while working on optimization, which is also seeing some progress that I will report on another day. Plus it's "available now on PyPI" too. At the time of this writing, there are pre-releases there, but after the next release, only stable releases will be published.
My quick take on the Python3 that I saw:
Now seriously. I didn't miss that. And I am going to hate the extra effort it causes to implement argument parsing. And when I looked into default handling, I was just shocked:
def f( some_arg = function1(), *, some_key_arg = function2() )
What would you expect the evaluation order to be for defaults? I raised a CPython "bug report" about it. And I am kind of shocked this could be wrong.
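To see what a given interpreter actually does, the order can simply be recorded. This is a minimal sketch for Python 3. The bodies of function1 and function2 are made up for illustration, and no particular order is claimed here, since it depends on the interpreter version you run it with:

```python
calls = []


def function1():
    calls.append("function1")
    return 1


def function2():
    calls.append("function2")
    return 2


# Both defaults are evaluated once, when the "def" statement is executed.
def f(some_arg=function1(), *, some_key_arg=function2()):
    return some_arg, some_key_arg


# Shows the evaluation order of the interpreter running this.
print(calls)
```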
    def f( arg1 : c_types.c_int, arg2 : c_types.c_int ) -> c_types.c_long:
        return arg1+arg2

That looks pretty promising. I had known about that already, and I guess, whatever "hints" idea we come up with for Nuitka, it should at least allow this notation. Unfortunately, evaluation order of annotations is not that great either. I would have expected them to come first, but they come last. Coming last, they kind of come too late to do anything with defaults.
Now great. While it's one step closer to "you can't re-enter any identifier without copy&paste", it is also a whole lot more consistent, if you come e.g. from German or French and need only a few extra letters. And there has to be at least one advantage to making everything unicode, right?
Gotta love that. Exception handlers are often subject to bit rot. They will themselves contain errors, when they are actually used, hiding the important thing that happened. No more. That's great.
The exception causes on the other hand, might be useful, but I didn't see them a lot yet. That's probably because they are too new.
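For reference, a short Python 3 sketch of both mechanisms. The function names are made up for this example. An error inside a handler keeps the original exception as __context__, while "raise ... from" records it explicitly as __cause__:

```python
def failing_handler():
    try:
        raise KeyError("original problem")
    except KeyError:
        # A bug inside the handler: the KeyError is kept as __context__.
        raise ValueError("handler is broken")


def explicit_cause():
    try:
        raise KeyError("original problem")
    except KeyError as error:
        # "raise ... from" stores the KeyError as __cause__.
        raise RuntimeError("translated") from error


try:
    failing_handler()
except ValueError as error:
    implicit = error

try:
    explicit_cause()
except RuntimeError as error:
    explicit = error

print(repr(implicit.__context__), repr(explicit.__cause__))
```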
I was wondering already, why that didn't work before.
a, *b = something()
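A runnable version of that line, with a made up something() for illustration. The starred name collects whatever is left over as a list:

```python
def something():
    return [1, 2, 3]


# "a" takes the first element, "b" collects the rest as a list.
a, *b = something()
print(a, b)
```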
That's great stuff. Wish somebody had thought about meta classes that way from the outset.
Yeah, big stuff. I love it. Also being able to write closure variables is great for consistency.
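Writing to closure variables is what the nonlocal declaration of Python 3 allows. A minimal sketch, with names made up for this example:

```python
def make_counter():
    count = 0

    def increment():
        # Without "nonlocal", the assignment would create a new local.
        nonlocal count
        count += 1
        return count

    return increment


counter = make_counter()
first = counter()
second = counter()
print(first, second)
```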
Nothing ground breaking. Nothing that makes me give up on Python2.7 yet, but a bunch of surprises. And for Nuitka, this also found bugs and corner cases, where things were not yet properly tested. The Python3 test suite, inspired the finding of some obscure compatibility problems for sure.
Overall, it's possible to think of Python2 and Python3 as still the same language with only minor semantic differences. These can then be dealt with re-formulations into the simpler Python that Nuitka optimization deals with internally.
I for my part, am still missing the print statement. But well, now Python3 finds them when I accidentally check them in. So that's kind of a feature for as long as I develop in Python2.6/2.7 language.
Now if one of you would help out with Python3.3, that would be great. It would be about the new dictionary implementation mostly. And maybe finding a proper re-formulation for "yield from" too.
In a recent post, Stefan Behnel questioned the point of static compilation and suggests that in order to be useful, a static compiler needs to add something on top.
This is going to be a rebuttal.
First of all, let me start out, by saying that Nuitka is intended to be the fully optimizing compiler of Python. The optimizing part is not yet true. Right now, it's probably a correct compiler of Python. Correct in the sense that it's compatible to CPython as far as possible.
As examples of what I mean with compatibility:
While generally Nuitka will have a hard time to be faster and compatible to CPython, I don't have much concern about that. Using guards between optimistic, and less optimistic variants of code, there is no doubt in my head, that for programs, lots of code will only need very minimal type annotation and still receive respectable speedups.
Of course, at this point, this is only speculation. But I somehow gather that the sentiment is that incompatible and fast need to go along. I totally dispute that.
Now, the "in addition" stuff, that Stefan is talking about. I don't see the point at all. It is quite obvious that everything you can say with cdef syntax, could also be said with a more Pythonic syntax. And if it were missing, it could be added. And where it's a semantic change, it should be frowned upon.
For the Nuitka project, I never considered an own parser of Python. No matter how easy it would be, to roll your own, and I understand that Cython did that, it's always going to be wrong and it's work. Work that has no point. The CPython implementation exhibits and maintains the module ast that works just fine.
For Python, if this were so really useful, such language extensions should be added to Python itself. If there were missing meaningful things, I contend they would best be added there, not in a fork of it. Note how ctypes and cffi have been added. When I created bindings for Ada code (C exports) to Python, it was so easy to do that in pure Python with ctypes. I so much enjoyed that.
So, slow bindings are in my view really easy to create with plain Python now. Somebody ought to make a ".h" to ctypes/cffi declarations converter, once they are really faster to use (possibly due to Nuitka). For Nuitka it should be possible to accelerate these into direct calls and accesses. At which point, mixing generated C code and C include statements, will just be what it is, a source of bugs that Nuitka won't have.
Note
Further down, I will give examples of why I think that cdef is inferior to plain Python, even from a practical point of view.
Static compilation vs. interpretation as a discussion has little merits to me. I find it totally obvious that you don't need static compilation, but 2 other things:
To me static code analysis and compilation are means to achieve that speed, but not intended to remove interpretation, e.g. plugins need to still work, no matter how deep they go.
For Cython syntax there is no interpreter, is there? That makes it lose an important point. So it has to have another reason for using it, and that would be speed and probably convenience. Now suppose Nuitka takes over with these benefits, what would it be left with? Right. Nothing. At all. Well, of course legacy users.
The original sin of PyRex - that is now Cython - is nothing that Nuitka should repeat.
The Cython language is so underspecified, I doubt anybody could make a compatible implementation. Should you choose to use it, you will become locked in. That means, if Cython breaks or won't work to begin with, you are stuck.
That situation I totally despise. It seems an unnecessary risk to take. Not only, if your program does not work, you can't just try another compiler. You also will never really know, if it's either your fault or Cython's fault until you do know, whose fault it is. Find yourself playing with removing, adding, or re-phrasing cdef statements, until things work.
Come on. I would rather use PyPy or anything else that can be checked with CPython. Should I ever encounter a bug with it, I can try CPython, and narrow down with it. Or I can try Jython, IronPython, or lo and behold, Nuitka.
I think, this totally makes it obvious, that static compilation of a non-Python language has no point to people with Python code.
What I will always admit, is that Cython is (currently) the best way to create fast bindings, because Nuitka is not ready yet. But from my point of view, Cython has no point long term if a viable replacement that is Pythonic exists.
So, if you leave out static compilation vs. interpretation and JIT compilation, what would be the difference between PyPy and Nuitka? Well, obviously PyPy people are a lot cooler and cleverer. Their design is really inspiring and impressive. My design and whole approach to Nuitka is totally boring in comparison.
But from a practical standpoint, is there any difference? What is the difference between Jython and PyPy? The target VM it is. PyPy's or Java's. And performance it is, of course.
So, with Python implementations all being similar, and just differing in targets and performance, do they all have no point? I believe taken to the logical conclusion, that is what Stefan suggests. I of course think that PyPy, Nuitka, and Jython have as much of a point as CPython does.
And just for fun. This is a made up use case of type annotations:

    plong = long if python_version < 3 else int

    @hints.signature( plong, plong )
    def some_function( a ):
        return a ** 2
Notice how plong depends on an expression, that may become known during compile time or not. Should that turn out to be not possible, Nuitka can always generate code for both branches and branch when called.
Or more complex and useful like this:
    def guess_signature( func ):
        types = [ None ]
        emit = types.append

        for arg in inspect.getargnames( func ):
            if arg == "l":
                emit( list )
            elif arg == "f":
                emit( float )
            elif arg == "i":
                emit( int )
            else:
                hints.warning( "Unknown type %s" % arg )
                emit( None )

        return hints.signature( *types )

    def many_hints( func ):
        # Won't raise exception.
        hints.doesnot_raise( func )

        # Signature to be inferred by conventions
        guess_signature( func )( func )

        # No side effects
        hints.pure( func )

    @many_hints
    def some_func1( f ):
        return f + 2.0

    @many_hints
    def some_func2( i ):
        return i + 2

    @many_hints
    def some_func3( l ):
        return i + [ 2 ]
This is just a rough sketch, but hopefully you get the idea. Do this with Cython, can you?
The hints can be put into decorators, which may be discovered as inlinable, which then see more inlines. For this to work best, the loop over the compile time constant code object, needs to be unrolled, but that appears quite possible.
The signatures can therefore be done fully automatic. One could use prefix notation to indicate types.
Another way would put fixed types for certain variable names. In Nuitka code, "node", "code", "context", etc. have always the same types. I suspect many programs are the same, and it would be sweet, if you could plug something in and check such types throughout all of the package.
And then, what do you do then? Well, you can inspect these hints at run time as well, they work with CPython as well (though they won't make things faster, they will only find errors in your program), they will even work with PyPy, or at least not harm it. It will nicely JIT them away I suppose.
Your IDE will like the code. Syntax highlighting and auto indent will work. With every Python IDE. PyLint will find the bugs I made in that code up there. And Nuitka will compile it and benefit from the hints.
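To make the run time side concrete, here is a minimal sketch of what a pure Python fallback for such a hint could look like. The "hints" module does not exist, so the signature decorator and the hinted_types attribute below are purely hypothetical names for illustration. The sketch only checks types and keeps them inspectable, it makes nothing faster:

```python
import functools


def signature(*arg_types):
    """Hypothetical hint decorator: records types and checks them at run time."""

    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args):
            # Compare each positional argument with its hinted type.
            for value, expected in zip(args, arg_types):
                if expected is not None and not isinstance(value, expected):
                    raise TypeError(
                        "%s expected %s, got %s"
                        % (func.__name__, expected.__name__, type(value).__name__)
                    )
            return func(*args)

        # Keep the hint inspectable for tools, or for a compiler.
        wrapper.hinted_types = arg_types
        return wrapper

    return decorate


@signature(int)
def square(a):
    return a ** 2


print(square(3), square.hinted_types)
```

Calling square("x") raises a TypeError under plain CPython, which is the "find errors in your program" part, while a compiler would be free to use the same information for code generation.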
My point here really is, that cdef is not flexible, not standard, not portable. It should die. It totally is anti-Pythonic to me.
In Java land, people compile to machine code as well. They probably also - like stupid me - didn't understand that static compilation would have no point. Why do they do it? Why am I using compiled binaries done with their compiler then?
And why didn't they take the chance to introduce ubercool cdef a-likes while doing it? They probably just didn't know better, did they?
No seriously. A compiler is just a compiler. It takes a source code in a language and turns it into a package to execute. That may be a required or an optional step. I prefer optional for development turn around. It should try and make code execute as fast as it can. But it should not change the language. With Cython I have to compile. With Nuitka I could.
In fact, I would be hard pressed to find another example of a compiler that extends the interpreted language compiled, just so there is a point in having it.
Nuitka has a point. On top of that I enjoy doing it. It's great to have the time to do this thing in the correct way.
So far, things worked out pretty well. My earlier experimentations with type inference had shown some promise. The "value friends" thing, and the whole plan, appears relatively sound, but likely is in need of an update. I will work on it in December. Up to now, and even right now, I worked on re-formulations that should have made it possible to get more release ready effects from this.
When I say correct way, I mean this. When I noticed that type inference was harder than it should be, I could take the time and re-architecture things so that it will be simpler. To me that is fun. This being my spare time allows me to do things this efficiently. That's not an excuse, it's a fact that explains my approach. It doesn't mean it makes less sense, not at all.
As for language compatibility, there is more progress with Python3. I am currently changing the class re-formulations for Python2 and Python3 (they need totally different ones due to metaclass changes) and then "test_desc.py" should pass with it too, which will be a huge achievement in that domain. I will do a post on that later.
Then infrastructure, should complete the valgrind based benchmark automatism. Numbers will become more important from now on. It starts to make sense to observe them. This is not entirely as fun. But with improving numbers, it will be good to show off.
And of course, I am going to document some more. The testing strategy of Nuitka is worth a look, because it's totally different from everything else people normally do.
Anyway. I am not a big fan of controversy. I respect Cython for all it achieved. I do want to go where it fails to achieve. I should not have to justify that, it's actually quite obvious, isn't it?
Yours, Kay
This is to inform you about the new stable release of Nuitka. Please see the page "What is Nuitka?" for clarification of what it is now and what it wants to be.
This release brings about changes on all fronts, bug fixes, new features. Also very importantly Nuitka no longer uses C++11 for its code, but mere C++03. There is new re-formulation work, and re-factoring of functions.
But the most important part is this: Mercurial unit tests are working. Nearly. With the usual disclaimer of me being wrong, all remaining errors are errors of the test, or minor things. Hope is that these unit tests can be added as release tests to Nuitka. And once that is done, the next big Python application can come.
Local variables were released when an exception was raised that escaped the local function. They should only be released, after another exception was raised somewhere. Issue#39.
Identifiers of nested tuples and lists could collide.
    a = ( ( 1, 2 ), 3 )
    b = ( ( 1, ), 2, 3 )

Both tuples had the same name previously, now the end of the tuple is marked too. Fixed in 0.3.24.1 already.
The __name__ when used read-only in modules in packages was optimized to a string value that didn't contain the package name.
Exceptions set when entering compiled functions were unset at function exit.
Compiled frames support. Before, Nuitka was creating frames with the standard CPython C/API functions, and tried its best to cache them. This involved some difficulties, but as it turns out, it is actually possible to instead provide a compatible type of our own, that we have full control over.
This will become the base of enhanced compatibility. Keeping references to local variables attached to exception tracebacks is something we may be able to solve now.
Enhanced Python3 support, added support for nonlocal declarations and many small corrections for it.
Writable __defaults__ attribute for compiled functions, actually changes the default value used at call time. Not supported is changing the amount of default parameters.
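Under CPython, this is the behavior compiled functions now match. A small sketch with a made up function. The number of defaults is kept the same, as changing it is the part that is not supported:

```python
def greet(name="world"):
    return "hello " + name


before = greet()

# Replace the default value, keeping the number of defaults unchanged.
greet.__defaults__ = ("Nuitka",)
after = greet()

print(before, "/", after)
```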
    if a if b else c

See above, this code is now the typical pattern for each "or" and "and", so this was much needed now.
The remaining uses of C++11 have been removed. Code generated with Nuitka and complementary C++ code now compile with standard C++03 compilers. This lowers the Nuitka requirements and enables at least g++ 4.4 to work with Nuitka.
The usages of the GNU extension operation a ?: b have been replaced with standard C++ constructs. This is needed to support MSVC which doesn't have this.
Added examples for the typical use cases to the User Manual.
The "compare_with_cpython" script has gained an option to immediately remove the Nuitka outputs (build directory and binary) if successful. Also the temporary files are now put under "/var/tmp" if available.
Debian package improvements, registering with "doc-base" the User Manual so it is easier to discover. Also suggest "mingw32" package which provides the cross compiler to Windows.
Partial support for MSVC (Visual Studio 2008 to be exact, the version that works with CPython2.6 and CPython2.7).
All basic tests that do not use generators are working now, but those will currently cause crashes.
Renamed the --g++-only option to --c++-only.
The old name is no longer correct after clang and MSVC have gained support, and it could be misunderstood to influence compiler selection, rather than causing the C++ source code to not be updated, so manual changes will be used. This solves Issue#47.
Catch exceptions for continue, break, and return only where needed for try/finally and loop constructs.
This release marks an important point. The compiled frames are exciting new technology, that will allow even better integration with CPython, while improving speed. Lowering the requirements to C++03 means, we will become usable on Android and with MSVC, which will make adoption of Nuitka on Windows easier for many.
Structurally the outstanding part is the function as references cleanup. This was a blocker for value propagation, because now function references can be copied, whereas previously this was duplicating the whole function body, which didn't work, and wasn't acceptable. Now, work can resume in this domain.
Also very exciting when it comes to optimization is the removal of special code for the "or" and "and" operators, as these are now mere conditional expressions. Again, this will make value propagation easier with two special cases less.
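The idea behind that re-formulation can be sketched in plain Python. This is only an illustration of the principle, not how Nuitka's node tree looks. The helper names are made up, and the right operand is passed as a callable here so that it is only evaluated when needed, just like with the real operators:

```python
def or_reformulated(left, make_right):
    # "left or right": keep the left value once, reuse it if it is true.
    tmp = left
    return tmp if tmp else make_right()


def and_reformulated(left, make_right):
    # "left and right": the right side is only evaluated if left is true.
    tmp = left
    return make_right() if tmp else tmp


print(or_reformulated(0, lambda: "fallback"))
print(and_reformulated(0, lambda: "never evaluated"))
```

The temporary holding the left value is what the "assignment keepers" are about. Where the value is known or not needed, it can be optimized away.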
And then of course, with Mercurial unit tests running compiled with Nuitka, an important milestone has been hit.
For a while now, the focus will be on completing Python3 support, XML based optimization regression tests, benchmarks, and other open ends. Once that is done, and more certainty about Mercurial tests support, I may call it a 0.4 and start with local type inference for actual speed gains.
This post is about how Nuitka, the Python compiler, started out using C++0x, which is now C++11, and then chose to stop using it.
Very early on, when I considered how to generate code from the node tree, in a way, that mistakes should practically be impossible to make, I made the fundamental decision, that every Python expression, which produces temporary variables, should become an expression in the generated code too.
Note
That is my choice, I think it keeps code generation more simple, and easier to understand. There may come a separate post about how that played out.
That decision meant some trouble. Certain things were not easy, but generally, it was achievable for g++ relatively quickly, and then lots of helper functions would be needed. Think of MAKE_TUPLE and MAKE_DICT, but also other stuff needed that. Calling a Python built-in with variable number of parameters e.g. could be implemented that way easily.
Other nice things were enum classes, and generally good stuff. It was really quick to get Nuitka code generation off the ground this way.
But then, as time went on, I found that the order of evaluation was becoming an issue. It became apparent that for more and more things, I needed to reverse it, so it works. Porting to ARM, it then became clear, that it needs to be the other way around for that platform. And checking out clang, which is also a C++11 compiler, I noticed, this one yet uses a different one.
So, for normal functions, I found a solution that involves the pre-processor to reverse or not, both function definition and call sites, and then it is already correct.
This of course, doesn't work for C++11 variadic functions. So, there came a point, where I had to realize, that each of its uses was more or less causing evaluation order bugs. So that most of their uses were already removed. And so I basically knew they couldn't stay that way.
Also, things I initially assumed, e.g. that lambda functions of C++11 may prove useful, or even "auto", didn't turn out to be true. There seemingly is a wealth of new features besides variadic templates, but I didn't see how Nuitka would benefit from them at all.
Then, at Europython, I realized that Android is still stuck with g++-4.4, and as such, that an important target platform will be unavailable to me. This platform will become even more important, as I intend to buy a device now.
So what I did, was to remove all variadic functions and instead generate code for them as necessary. I just need to trace the used argument counts, and then provide those, simple enough.
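A rough sketch of that approach, in Python since that is what the code generator is written in. It is not Nuitka's actual code: the function names, the MAKE_TUPLE2 style naming, and the generated C++ body are assumptions for illustration, and error handling in the generated code is left out:

```python
# Argument counts seen while generating code for call sites.
used_arg_counts = set()


def note_make_tuple_use(arg_count):
    """Record that a helper with this arity is needed, return its name."""
    used_arg_counts.add(arg_count)
    return "MAKE_TUPLE%d" % arg_count


def generate_make_tuple(arg_count):
    """Return C++ source of a fixed arity tuple helper (illustrative only)."""
    params = ", ".join("PyObject *element%d" % i for i in range(arg_count))
    lines = [
        "static PyObject *MAKE_TUPLE%d( %s )" % (arg_count, params),
        "{",
        "    PyObject *result = PyTuple_New( %d );" % arg_count,
    ]
    for i in range(arg_count):
        lines.append("    Py_INCREF( element%d );" % i)
        lines.append("    PyTuple_SET_ITEM( result, %d, element%d );" % (i, i))
    lines.append("    return result;")
    lines.append("}")
    return "\n".join(lines)


# Call sites with two and three elements were seen, one of them twice.
for count in (2, 3, 2):
    note_make_tuple_use(count)

# Only the helpers that are actually used get emitted.
helpers = "\n\n".join(generate_make_tuple(n) for n in sorted(used_arg_counts))
print(helpers)
```

Since every helper has a fixed parameter list, it is a normal function again, and the pre-processor based solution for argument evaluation order applies to it.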
Also, other things like deleted copy constructors, and so on, I had to give up on these a bit.
This change was probably suited to remove subtle evaluation order problems, although I don't recall seeing them.
The current stable release still requires C++11, but the next release will work on g++-4.4 and compiles fine with MSVC from Visual Studio 2008, although at this time, there is still the issue of generators not working yet, but I believe that ought to be solvable.
The new requirement is only C++03, which means, there is a good chance that supporting Android will become feasible. I know there is interest from App developers, because there, even the relatively unimportant 2x speedup, that Nuitka might give for some code, may matter.
So that is a detour, I have taken, expanding the base of Nuitka even further. I felt, this was important enough to write down the history part of it.
This is to inform you about the new stable release of Nuitka. Please see the page "What is Nuitka?" for clarification of what it is now and what it wants to be.
This release contains progress on many fronts, except performance.
The extended coverage from running the CPython 2.7 and CPython 3.2 (partially) test suites shows in a couple of bug fixes and general improvements in compatibility.
Then there is a promised new feature that allows to compile whole packages.
Also there is more Python3 compatibility, the CPython 3.2 test suite now succeeds up to "test_builtin.py", where it finds that str doesn't support the new parameters it has gained, future releases will improve on this.
And then of course, more re-formulation work, in this case, class definitions are now mere simple functions. This and later function references, is the important and only progress towards type inference.
The compiled method type can now be used with copy module. That means, instances with methods can now be copied too. Issue#40. Fixed in 0.3.23.1 already.
The assert statement as of Python2.7 creates the AssertionError object from a given value immediately, instead of delayed as it was with Python2.6. This makes a difference for the form with 2 arguments, and if the value is a tuple. Issue#41. Fixed in 0.3.23.1 already.
Sets written like this didn't work unless they were predicted at compile time:
{ value }
This apparently rarely used Python2.7 syntax didn't have code generation yet and crashed the compiler. Issue#42. Fixed in 0.3.23.1 already.
For Python2, the default encoding for source files is ascii, and it is now enforced by Nuitka as well, with the same SyntaxError.
Corner cases of exec statements with nested functions now give proper SyntaxError exceptions under Python2.
The exec statement with a tuple of length 1 as argument, now also gives a TypeError exception under Python2.
For Python2, the del of a closure variable is a SyntaxError.
Migrated "bin/benchmark.sh" to Python as "misc/run-valgrind.py" and made it a bit more portable that way. Prefers "/var/tmp" if it exists and creates temporary files in a secure manner. Triggered by the Debian "insecure temp file" bug.
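The pattern is roughly the following. This is a sketch and not the code of "misc/run-valgrind.py". The function name and the suffix are made up, and the extra check for write access is an addition of this example:

```python
import os
import tempfile


def create_secure_temp_file(suffix=".log"):
    """Create a temporary file, preferring "/var/tmp" when it is usable."""
    preferred = "/var/tmp"
    if os.path.isdir(preferred) and os.access(preferred, os.W_OK):
        temp_dir = preferred
    else:
        # None makes the tempfile module pick its default directory.
        temp_dir = None

    # mkstemp creates the file itself, with an unpredictable name and
    # readable and writable only by the current user.
    handle, path = tempfile.mkstemp(suffix=suffix, dir=temp_dir)
    os.close(handle)
    return path


path = create_secure_temp_file()
existed = os.path.exists(path)
os.unlink(path)
print(path, existed)
```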
Migrated "bin/make-dependency-graph.sh" to Python as "misc/make-dependency-graph.py" and made it more portable and powerful that way.
The filtering is done in a more robust way. Also it creates temporary files in a secure manner, also triggered by the Debian "insecure temp file" bug.
And it creates SVG files and no longer PostScript as the first one is more easily rendered these days.
Removed the "misc/gist" git sub-module, which was previously used by "misc/make-doc.py" to generate HTML from User Manual and Developer Manual.
These are now done with Nikola, which is much better at it and it integrates with the web site.
Lots of formatting improvements to the change log, and manuals:
The creation of the class dictionaries is now done with normal function bodies, that only needed to learn how to throw an exception when directly called, instead of returning NULL.
Also the assignment of __module__ and __doc__ in these has become visible in the node tree, allowing their proper optimization.
These re-formulation changes allowed to remove all sorts of special treatment of class code in the code generation phase, making things a lot simpler.
There was still a declaration of PRINT_ITEMS and uses of it, but no definition of it.
Code generation for "main" module and "other" modules are now merged, and no longer special.
The use of raw strings was found unnecessary and potentially still buggy and has been removed. The dependence on C++11 is getting less and less.
This release brought forward the most important remaining re-formulation changes needed for Nuitka. Removing class bodies, makes optimization yet again simpler. Still, making function references, so they can be copied, is missing for value propagation to progress.
Generally, as usual, a focus has been laid on correctness. This is also the first time I am releasing with a known bug though: That is Issue#39, which I now believe may be the root cause of the Mercurial tests not yet passing.
The solution will be involved and take a bit of time. It will be about "compiled frames" and be a (invasive) solution. It likely will make Nuitka faster too. But this release includes lots of tiny improvements, for Python3 and also for Python2. So I wanted to get this out now.
As usual, please check it out, and let me know how you fare.
That just killed some hope inside of me, breaking code that uses str ought to be forbidden.
Update
Turns out, this is only a bug, and not intentional. And the bug is only in the doc string, so it's being fixed, and there is no inconsistency then any more.
Python 3:
    Python 3.2.3 (default, Jun 25 2012, 23:10:56)
    [GCC 4.7.1] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print( str.__doc__ )
    str(string[, encoding[, errors]]) -> str

    Create a new string object from the given encoded string.
    encoding defaults to the current default string encoding.
    errors can be 'strict', 'replace' or 'ignore' and defaults to 'strict'.
    >>> str( string = "a" )
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'string' is an invalid keyword argument for this function
Python 2:
    Python 2.7.3 (default, Jul 13 2012, 17:48:29)
    [GCC 4.7.1] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print str.__doc__
    str(object) -> string
    Return a nice string representation of the object.
    If the argument is a string, the return value is the same object.
    >>> str( object = "a" )
    'a'
I do understand that it's in fact just the old unicode built-in. In fact, I made it work like that for Nuitka just now. But there is a difference: for Python2, it was well behaved.
Python 2:
    >>> print unicode.__doc__
    unicode(string [, encoding[, errors]]) -> object
    Create a new Unicode object from the given encoded string.
    encoding defaults to the current default string encoding.
    errors can be 'strict', 'replace' or 'ignore' and defaults to 'strict'.
    >>> unicode( string = "a" )
    u'a'
Is Python3 supposed to be more clean or what? I think it is not happening.
Yours, Kay Hayen
While writing Nuitka I get to see an absurd amount of CPython code. For a while now, it's also CPython3.2 that I look at. Checking out __future__ handling, I was surprised the other day though, this really works:
    # Python 3.2.3 (default, Jun 25 2012, 23:10:56)
    # [GCC 4.7.1] on linux2
    >>> from __future__ import barry_as_FLUFL
    >>> 1 <> 2
    True
    >>> 1 != 2
      File "<stdin>", line 1
        1 != 2
           ^
    SyntaxError: invalid syntax
It's new in CPython3, and this is the code that makes it possible, from the Python parser:
    if (type == NOTEQUAL) {
        if (!(ps->p_flags & CO_FUTURE_BARRY_AS_BDFL) &&
                strcmp(str, "!=")) {
            PyObject_FREE(str);
            err_ret->error = E_SYNTAX;
            break;
        }
        else if ((ps->p_flags & CO_FUTURE_BARRY_AS_BDFL) &&
                strcmp(str, "<>")) {
            PyObject_FREE(str);
            err_ret->text = "with Barry as BDFL, use '<>' "
                            "instead of '!='";
            err_ret->error = E_SYNTAX;
            break;
        }
    }
Now who would think badly of that? The fun aspect is that Nuitka easily supports it. By re-using the Python parser, it works out of the box; I only needed to add the flag value.
For fun, I tried to add a test that confirms it, and then noticed:
- It doesn't really work for CPython3.2 already.
- The flag is only used for eval and exec and not on the same level, so it's only inherited.
- And "2to3" kindly removes that flag silently. It probably should raise an error.
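Since the flag only takes effect for code compiled with it, one way to poke at this is to hand the flag to compile() directly. This is a minimal sketch, assuming a CPython 3 that still ships the barry_as_FLUFL future feature; the helper name compiles is made up here, and this is not the test used for Nuitka.

```python
import __future__

# Compiler flag carried by the future feature. Its numeric value is not
# the same in every CPython version, so it is read from the module.
FLUFL_FLAG = __future__.barry_as_FLUFL.compiler_flag


def compiles(source, flags=0):
    """Return True if the expression compiles with the given flags."""
    try:
        compile(source, "<flufl>", "eval", flags)
    except SyntaxError:
        return False
    return True


diamond_with_flag = compiles("1 <> 2", FLUFL_FLAG)
diamond_without_flag = compiles("1 <> 2")
noteq_with_flag = compiles("1 != 2", FLUFL_FLAG)
noteq_without_flag = compiles("1 != 2")
print(diamond_with_flag, diamond_without_flag, noteq_with_flag, noteq_without_flag)
```

Passing the flag explicitly sidesteps the question of whether the __future__ import is honored on the same level, which is exactly the part that did not behave as expected above.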
So yeah, oddities in Python3!
At the EuroPython conference, in my presentation, I talked about re-formulations of Python into simpler Python. It is my intention to turn this into a series of Python quiz questions that you will hopefully enjoy.
Update
Due to feedback in the comments, I made it clearer that "-O" of course affects both cases. And while working on getting the recent CPython2.7 test suite to pass, I noticed that the re-formulation for Quiz Question 2 needed a version dependent solution.
And I thought this one was easy. :-)
Say you have the following code:
assert x == y
How can you achieve the same thing without using the assert statement at all? The behavior is required to be absolutely the same.
The answer is in the next paragraph, so stop reading if you want to find out yourself.
The correct answer is that an assertion is the same as raising an exception inside a conditional statement.
if not x == y: raise AssertionError
The one place where this makes a difference is "-O", which discards assertions, but I consider it rarely used. To be really compatible with that, it should be:
if __debug__ and not x == y: raise AssertionError
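The effect of "-O" on the three spellings can be tried without restarting the interpreter. The sketch below assumes Python 3.2 or later, where compile() takes an optimize argument: level 0 behaves like a normal run and level 1 like "-O". The names raises_assertion and FORMS are made up for this illustration.

```python
def raises_assertion(source, optimize):
    """Compile source at the given optimization level and report
    whether running it with x != y raises AssertionError."""
    code = compile(source, "<quiz>", "exec", optimize=optimize)
    try:
        exec(code, {"x": 1, "y": 2})
    except AssertionError:
        return True
    return False


FORMS = {
    "assert": "assert x == y",
    "plain": "if not x == y: raise AssertionError",
    "debug": "if __debug__ and not x == y: raise AssertionError",
}

# optimize=0 corresponds to a normal run, optimize=1 to "python -O".
results = {
    (name, level): raises_assertion(source, level)
    for name, source in FORMS.items()
    for level in (0, 1)
}
for key in sorted(results):
    print(key, results[key])
```

Only the __debug__ variant follows the assert statement at both levels; the plain conditional raise keeps raising under optimization.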
But wait, there is slightly more to it. Say you have the following code:
assert x == y, arg
How can you achieve the same thing without using the assert statement at all? The behavior is required to be absolutely the same.
The answer is in the next paragraph, so stop reading if you want to find out yourself.
This is actually version dependent, due to recent optimizations of CPython.
For version 2.6 it is as follows:
The extra value to assert simply becomes an extra value to raise, which indicates delayed creation of the AssertionError exception.
if not x == y: raise AssertionError, arg
For version 2.7 and higher it is as follows:
The extra value to assert simply becomes the argument for creating the AssertionError exception.
if not x == y: raise AssertionError( arg )
So, even in the more complex case, you end up with a conditional raise.
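On a current Python 3, where only the 2.7 style form is valid syntax, the equivalence can be checked with a small sketch like the following. The helper capture and the two function names are made up for this illustration, and the check assumes the interpreter is not running with "-O".

```python
def capture(func):
    """Call func and return the AssertionError it raises, or None."""
    try:
        func()
    except AssertionError as error:
        return error
    return None


def with_assert(x, y, arg):
    assert x == y, arg


def with_raise(x, y, arg):
    if not x == y:
        raise AssertionError(arg)


failed_assert = capture(lambda: with_assert(1, 2, "mismatch"))
failed_raise = capture(lambda: with_raise(1, 2, "mismatch"))

# Same exception type and same arguments in both spellings.
same = (
    type(failed_assert) is type(failed_raise)
    and failed_assert.args == failed_raise.args
)
print(same, failed_assert.args)
```

When the condition holds, neither function raises, so capture returns None for both.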
The only place where this makes a difference is "-O", which discards assertions, but I consider it rarely used. To be really compatible with that, it should be:
if __debug__ and not x == y: raise AssertionError ....
Surprised? Well, yes, there really is nothing to assert statements. I am using this for my Python compiler Nuitka, which benefits from not having to deal with assert as anything special at all. See also the respective section in the Developer Manual, which explains this and other things.