Comment by Connelly

Worst case scenario: The authors of this project assert that the kwalitee metric is almost always "right," that all packages should use unit testing, and follow standards such as including a The Python community comes to believe that a project with low kwalitee must have low quality. The major Python projects such as scipy, pycrypto, PIL, PyOpenGL, etc make only halfhearted attempts at achieving a high kwalitee score, and end up not having a high kwalitee score because their project architecture is not really built to accommodate the things Cheesecake is looking for (for example, scipy might have the unit tests in a directory unit_tests/ instead of tests/ which Cheesecake looks for, and it may use its own unit testing framework). Thus either Cheesecake incorporates specific fixes which allow the major Python projects to keep their eccentricities and still achieve a high kwalitee score, or people ignore the kwalitee metric for the "major" projects, assuming that if a project is widely used then it MUST have quality. Thus people are willing to make an exception to the rule that "kwalitee = quality" for major projects, but not for minor projects. This forces minor projects to adhere to the "Cheesecake way" or be ignored. In the end we have (a) more bigotry in the Python community and (b) many people wasting a large amount of time trying to tweak packages endlessly and get a "high score" on Cheesecake.

It should be pretty easy to prevent this worst case scenario from happening. One solution is to clearly state that the Cheesecake recommendations are only that -- the Cheesecake score is a subjective measure, and following the ideals put forth by Cheesecake may in the authors' opinion often be "good" for the average package, but may not be "good" for every package. It might also help to state that achieving a high Cheesecake score per se is not the goal; rather, if the ideas espoused by Cheesecake are followed and a low Cheesecake score results, then this should be considered a bug in the Cheesecake program and not something that the package author should fret about. My major point here is that the Python community is pretty uptight and rather prone to religious assertions of subjective "fact," and two things that I would very much not like to see are an entire community converted to the "Cheesecake is absolutely correct" opinion and an endless flamewar between the "Cheesecake is absolutely correct" and "Cheesecake is absolutely wrong" factions.

I have not been terribly clear, so let me illustrate with some examples. When I asked about some obscure thread issue in the Python chat channel, I got lectured that threads should never be used and also told that threads are just fine. When I asked about globals I was told that I should never use globals. When I asked about looking up an execution frame's f_back attribute, I was told not to use this. I won't pretend that I like such patronizing attitudes -- in fact, I find them to be the single worst aspect of the otherwise excellent Python world! I don't think my experiences are unique, indeed after reading a rather insightful rant by Stevie Yegge it seems that in the Python community there is something endemic which causes people to believe that certain things are absolutely "correct" while others are "incorrect." For some issues, there are factions which debate these ideas back and forth endlessly. I think that if you emphasize that Cheesecake need not be in either the "correct" or "incorrect" category, then people may calm down and instead take it to be a generally useful metric but not Gospel nor the Book of Satan.

Finally, if you aren't convinced that unit tests and a are not always appropriate, I'll try to convince you. In the popular package pygame, many modules cause graphical output, interface with the video card, interface with the sound card, or interface with input devices. While the pygame package currently has only 3 unit tests, and more unit tests might help the project, it is not possible to have a really comprehensive unit test suite for the project because outputs are not consistent on different computers/graphics cards/sound cards (OpenGL does not specify exact behavior for pixels, color depth may change, the sound card may only support a fixed number of channels, etc). Also while unit tests for input devices might be useful, it would be impossible to run these in an automated testing process! In many cases it is more appropriate to simply run an example program and determine whether the graphics animate correctly on the screen, whether the mouse works, and whether the sound works. For an example of where a is inappropriate, you might look at htmldata by myself -- this module includes unit tests, docs, etc but does not include a because it is only one Python source file. It seems easier to drop the file in the appropriate directory than to use the process. I might be wrong! However, the general principle stands: it's probably "better" for everyone if package maintainers generally try to use "good" coding practices, however if they intentionally bend a few rules then it's OK, and the entire Python world doesn't need to freak out or devolve into a flamewar.

Finally, I apologize if I have hurt the feelings of anyone on the Cheesecake team. I do not expect that the Cheesecake team will say that their metric is "always right" -- it was rather a hypothetical, dystopian exaggeration to emphasize the potential problems that may arise in a somewhat "religious" community. I hope most of my post was just silly worrying, and that none of these problems will arise. I do think Cheesecake is a good idea, and will probably use it myself to determine if projects are well-commented, use unit tests, and so forth (and perhaps even to determine for my own projects whether I remembered to add docstrings!). Thanks for your time. I'd be interested in hearing your thoughts. - Connelly Barnes

Comment by Michal Kwiatkowski

Thank you for your comments. I can assure you that we're not fanatics and that we are aware of the weaknesses of Cheesecake indexing. Part of the work that I'll do this summer will focus on making the Cheesecake score more reliable. Of course the score will always be somewhat subjective, but I think a certain level of trust can be achieved. For example, we currently use pylint to score code kwalitee. Pylint is a nice tool, but the more your coding style differs from what its authors consider ideal, the lower your score. I feel it is dishonest, so I'll work on adding a consistency check on coding style: no matter how different your code is from the pylint ideal, you get a high score for consistency. I think most programmers can agree that style consistency is a good thing. Pylint checks consistency at the high level - consistency across the whole Python community (e.g. how to name modules, how to use whitespace, etc.). Adding a lower-level consistency check (code style consistency within the project in question) will make the whole score more reliable. So even if you have a strange coding style, as long as it's consistent, it's OK.
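To make the idea of a project-level consistency check concrete, here is a rough sketch. It is not how Cheesecake or pylint actually works; the function names (naming_style, naming_consistency) and the choice to look only at function names are assumptions made for illustration. It scores a piece of source code by the share of function names that follow whichever naming convention is dominant in that code, so a project written entirely in camelCase scores as high as one written entirely in snake_case.

```python
import ast
import re
from collections import Counter


def naming_style(name):
    """Classify a function name into a coarse style bucket (hypothetical buckets)."""
    if re.fullmatch(r"[a-z_][a-z0-9_]*", name):
        return "snake_case"
    if re.fullmatch(r"[a-z][A-Za-z0-9]*", name):
        return "camelCase"
    return "other"


def naming_consistency(source):
    """Return the fraction of function names that follow the dominant style.

    1.0 means every function uses the same convention, whatever it is.
    Source without any function definitions counts as fully consistent.
    """
    names = [
        node.name
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.FunctionDef)
    ]
    if not names:
        return 1.0
    counts = Counter(naming_style(name) for name in names)
    dominant_count = counts.most_common(1)[0][1]
    return dominant_count / len(names)
```

A real check would presumably look at more than function names (indentation, whitespace, module naming), but the scoring principle would be the same: measure agreement within the project rather than distance from an external ideal.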

I gave only one example, but I want to emphasize that we'll try to improve the Cheesecake score as much as we can. Point us to weaknesses of the Cheesecake index and we'll try to fix them. But please remember we can't please all developers. If your project gets a low Cheesecake score, it doesn't mean it's bad. But it can mean you've written it in a way that other (let us say: most) Python developers would find hard to read and modify. You may place your tests in a unit_tests directory, but please create a test suite, and point to it in, so that every user can test your package with a simple "python test". If you don't use setuptools it's your decision, but please note that users will have a hard time installing your script/library. Your module may have been written for developers to use, but remember that the user will have to manage dependencies manually if you don't use a standard distribution method. Having that kind of standard is beneficial for both developers (easy to modify someone else's code) and users (easy to test and contribute richer bug reports). And that is what Cheesecake is about - encouraging good practices that will be beneficial for both users and developers.

Response by Grig Gheorghiu to Connelly's comment

First of all, thank you very much, Connelly, for taking the time to give us your feedback. One of the goals of the Cheesecake project is to make people aware of these issues and to generate a discussion within the Python community, which discussion will hopefully prove constructive and will help raise the 'kwalitee' of Python projects.

I agree with you that the Cheesecake scores are not absolute, and must be taken with a grain of salt. We'll test our algorithms on the best-known projects in the Python world, and if those projects have abysmal scores, we'll know we have to tweak the algorithms. However, a certain minimum standard should be met by all projects. I think all of us Python programmers expect to be able to install a package via "python install". Simply dropping the package files in a directory leads to all kinds of problems, as I have repeatedly seen. I personally think it would be much better if people started to use setuptools more heavily and made their packages available as eggs. Cheesecake will reward packages that do this.

As far as unit tests are concerned, let me disagree with you. I'm a *very* strong believer in unit tests. In the example you give, the pygame developers should be able to mock the external interfaces, and still have plenty of unit tests for the core logic in their code. I understand that code that deals with external interfaces is hard to test automatically, but those are integration tests, not proper unit tests. If all packages adhered to the rule that their unit tests can be run via "python test", then the naming of the directories where their unit tests reside would not even matter (and as a matter of fact, Cheesecake *will* look into any directory with 'test' in its name, so it will discover a directory called unit_tests for example).
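As a minimal sketch of the mocking approach described above: the Jukebox class and its mixer argument below are invented for illustration and are not pygame's API. The point is only that the channel-allocation logic can be tested without a sound card by handing the class a mock object in place of the real device wrapper.

```python
import unittest
from unittest import mock


class Jukebox:
    """Hypothetical core logic: pick a free channel and ask the mixer to play."""

    def __init__(self, mixer, channels=2):
        self.mixer = mixer        # external interface (e.g. a sound card wrapper)
        self.channels = channels
        self.busy = set()

    def play(self, sound):
        """Play on the first free channel and return its index."""
        for channel in range(self.channels):
            if channel not in self.busy:
                self.busy.add(channel)
                self.mixer.play(channel, sound)
                return channel
        raise RuntimeError("no free channel")


class JukeboxTest(unittest.TestCase):
    def test_uses_first_free_channel(self):
        mixer = mock.Mock()       # stands in for the real hardware wrapper
        box = Jukebox(mixer)
        self.assertEqual(box.play("beep"), 0)
        self.assertEqual(box.play("boop"), 1)
        mixer.play.assert_called_with(1, "boop")

    def test_raises_when_all_channels_busy(self):
        box = Jukebox(mock.Mock(), channels=1)
        box.play("beep")
        with self.assertRaises(RuntimeError):
            box.play("boop")
```

What this does not cover is whether the sound actually comes out of the speakers; that remains an integration question, which is the distinction drawn above.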

Again, I'm glad we're starting to have a discussion about these things, and I'm sure Michal's work over the summer will uncover many such issues that need to be brought out in the open and discussed in the community.

Response by Connelly

First, I am relieved that you guys seem reasonable. As I said before, my greatest fear is that you guys take a particularly polemical stance and end up polarizing the community (the "RMS" effect).

On the subject of unit tests, my point was that for certain multimedia packages it may be easier to run an example program (which in practice will have the effect of verifying much of the interface) than writing a bunch of unit tests. Again I don't know whether this is "right" or "wrong" from the standpoint of efficiency or correctness, but I am observing that significant subgroups in the Python community function as if it were "right," and will probably become annoyed at you guys if you try to tell them otherwise. I haven't looked very hard, but I noticed that Soya3D also doesn't seem to use many unit tests, and at least Beautiful Soup, Raymond Hettinger's matrix library, and some of the modules by Bram Cohen are available as single .py files only. Of course these are just specific examples -- the general idea is that people have widely varying ideas of good design, so I think it would generally be polite to accommodate ideas which significant parts of the Python community intentionally choose to follow and call "good design" and which can plausibly fit into your own concept of the absolute Good Design.

Thanks, Connelly Barnes.

Comment by Robert Kern

copy&paste from mail to Grig

In my opinion, the Cheesecake project is okay as it stands. It's a tool that makes a set of measurements available to developers. Some developers want those measurements; a tool that does all of them at once is certainly a convenience. Some of the measurements are more useful than others, but that's up to the developer how he wants to use that information. Some of the measurements are more reliable proxies than others, but the developer can ignore the ones he deems unreliable.

One thing about the Cheesecake project that simply offends my sensibilities, though, is the Cheesecake index itself. If my scientific training taught me nothing else, it is that a meaningless measurement is far, far worse than no measurement at all. The algorithm and weightings that you use to combine all of the individual measurements into a single number are entirely arbitrary. Arbitrary measurements have no meaning. However, people inevitably believe that they do. Notice how, after an initial disclaimer that Cheesecake and other such programs only measure an artificial notion of "kwalitee," the student continues to talk about measuring the actual quality of the package.

In my opinion, the single most important improvement you could make to Cheesecake would be to drop the Cheesecake index entirely. Just run the individual tests and report the results to the user.

However, I still have large reservations about integrating Cheesecake into PyPI. It's a very backwards and, in my opinion, counterproductive way of achieving the goals you and the student claim. If you want to promote a common file layout for packages, write a skeleton generator. If you want to promote a standard "python test" scheme, write the code necessary to hook that distutils command up to all of the various unit testing systems out there. If you want to promote eggs, write the "nest" tool to easily manage egg installations or simply help individual projects convert to setuptools. If you want to increase the use of docstrings, write plugins for code editors that let developers easily navigate to function/class definitions missing docstrings. If you want to encourage developers to place their packages on PyPI itself, fix PyPI such that it doesn't automatically display the highest version of a package (which is frequently an unstable beta release rather than the preferred stable release).

Tools that actually help developers achieve goals are much more useful than tools that simply tell them that they fall short. There is no shortage of work to be done here.

The usefulness of the good Cheesecake features that I mentioned above is predicated on the assumption that the developer is choosing to use Cheesecake. I find the "push feedback" mode of integrating it into PyPI to be simply rude. Fredrik Lundh called this "non-productive control-freakery" on python-dev, and I have to agree.

An automatic testing system for PyPI would be quite useful. It would certainly be a "tool that actually helps developers." If the project had been about that and only that from the beginning, I would have been all for it. Unfortunately, it's still mixed up with the parts that I think will be detrimental to the community.

I know that you and Michał have the best of intentions with Cheesecake and this SoC project, but I simply disagree with you on some fundamental points.

Comment by Michael Foord

Does this mean that projects with good doctests (instead of unittests) will be penalised by the ranking system?

Response by Grig Gheorghiu

Michael -- no way! If anything, projects using doctest will be rewarded, because they practice "executable documentation" :-)
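For readers unfamiliar with the term, "executable documentation" looks roughly like this. The function kwalitee_percent is made up for the example; the doctest calls are from the standard library. The examples in the docstring are both documentation and tests.

```python
import doctest


def kwalitee_percent(points, maximum):
    """Return a score as a whole-number percentage.

    >>> kwalitee_percent(30, 40)
    75
    >>> kwalitee_percent(0, 40)
    0
    """
    return round(100 * points / maximum)


def run_docstring_examples():
    """Run the examples in kwalitee_percent's docstring and return the results."""
    test = doctest.DocTestParser().get_doctest(
        kwalitee_percent.__doc__,
        {"kwalitee_percent": kwalitee_percent},  # names visible to the examples
        "kwalitee_percent",
        None,
        0,
    )
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)
```

In everyday use one would more likely run "python -m doctest" on the file than build a runner by hand; the explicit runner is used here only so the pass/fail counts can be inspected.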

Response by Michal Kwiatkowski

To sum up: we'll try to support all well-known Python testing frameworks, so using doctest is perfectly fine. But if you're going to use your own special testing mechanism and you won't hook it up to any existing interface (like the already mentioned test command), your package will get zero points for testing. It also won't be automatically tested during upload to PyPI.

Comment by Developer

Hi. Let me tell you a relevant story. gcc has a similar option for checking "good C++". It's the option -Weffc++. It prints lots of warnings about what should be "correct C++" according to a book. Now there are two things. 1) If you enable this option on some real programs out there, there will be tons of warnings. 2) Consequently, over time people have decided to ignore those metrics; many developers do not even care about trying this option on their project. The evaluation system has made itself obsolete.

The same thing may happen with cheesecake: What do you mean there should be a README file? If I add an empty README file I get a good cheesecake score, while somebody who has excellent information in a file DOC, gets a bad score? I'm afraid cheesecake's kwalitee is going to become a factor that won't be respected by developers. If cheesecake gives a bad score to a project it may mean that either the project is bad, *or* cheesecake is bad and the project is good. And if there are many good projects that get a bad score cheesecake will render itself wrong!

Response by Michal Kwiatkowski

And that's exactly the reason we're asking the community for feedback. We don't want to implement "book standards" but real, working and useful practices that good Python developers use. If there turn out to be a few good projects with a DOC instead of a README file, we'll add this check to Cheesecake. What I would call the most common misunderstanding about Cheesecake is the belief that we are trying to come up with our own standards and enforce them on Python developers. What we'll actually do is listen to what developers have to say and come up with a greatest common denominator for all this good programming advice. The Cheesecake score will represent a project's compliance with this common and established way of doing things. An important part of this is that Cheesecake will never be complete. It will have to incorporate changes in the same way the community and methodology change over time. Having a tool to point you to current trends and give good advice on trivial things like file naming conventions or distribution methods is invaluable. This way you can focus on coding, leaving the boring stuff to Cheesecake. One of our goals is to create an easy reference for all factors that affect the Cheesecake score, so that every developer can easily look up why he's losing points. And that doesn't mean these rules are set in stone - we encourage all developers to question them.

The point I'm trying to make is that Cheesecake is written to help programmers, not to put blame on them. It's one of the pieces of agile development advice: "Criticize Ideas, Not People". If most developers use README, it's probably a good thing to have a README in your project. A missing README says nothing about the quality of your documentation; it only suggests you don't have the file that most people will look for first, right after the package has been unpacked. A README also makes things easier for package managers in different open source distributions, like Debian or Gentoo, as they don't have to check manually which of your DOC/UserManual/anything files contains actual documentation that they can, for example, incorporate into a man page. Having one good way of doing things is very efficient for collaboration and project maintenance. Python was built upon this principle, so Cheesecake is merely a continuation of this thinking, but on a different level.

And BTW, empty README files are no longer accepted and counted; I just fixed that in the Brie iteration. ;-)
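A check along those lines could be sketched as follows. This is not Cheesecake's actual code: the function name, the list of accepted file names (including DOC, as raised in the comment above), and the rule that a whitespace-only file counts as empty are all assumptions for illustration.

```python
import os
import tempfile

# Assumed list of acceptable documentation file names; a real tool would
# presumably grow this list based on community feedback.
DOC_NAMES = ("README", "README.txt", "DOC")


def has_nonempty_readme(package_dir, names=DOC_NAMES):
    """Return True if package_dir holds a documentation file with real content.

    Files that are empty or contain only whitespace do not count.
    """
    for name in names:
        path = os.path.join(package_dir, name)
        if os.path.isfile(path):
            with open(path, "rb") as handle:
                if handle.read().strip():
                    return True
    return False


def demo():
    """Show that an empty README earns nothing but a filled DOC file does."""
    with tempfile.TemporaryDirectory() as package_dir:
        open(os.path.join(package_dir, "README"), "w").close()
        empty_only = has_nonempty_readme(package_dir)
        with open(os.path.join(package_dir, "DOC"), "w") as handle:
            handle.write("How to install and use this package.\n")
        with_doc = has_nonempty_readme(package_dir)
    return empty_only, with_doc
```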

Comment by Will Guaraldi

I just did another run on PyBlosxom with the SVN code and it's coming along very nicely. I think CheeseCake has the ability to pass "good practices" information on to the programmer, and as such it would be cool to offer an assessment after the run of things that the programmer should do to make the code better.

Maybe create a --recommend option so that, for each category where the score is less than half (or something along those lines), CheeseCake spits out a short blurb about what the category score means and a url for where to go for more information about how to fix it. Then at that url (it could point to a specific page on this wiki) are resources regarding that category.
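Something like the following is what I have in mind, as a rough sketch only. The category names, the blurbs, and the function names are hypothetical and not taken from the Cheesecake source; the url part is left out because the wiki pages don't exist yet. Scores are passed in as a mapping of category to (points, maximum).

```python
import argparse

# Hypothetical blurbs; in the proposal each would also carry a wiki url.
BLURBS = {
    "documentation": "Add docstrings and a README describing the package.",
    "testing": "Hook a test suite up to a standard test command.",
}


def recommend(scores, blurbs=BLURBS):
    """Return advice lines for every category scoring below half its maximum."""
    advice = []
    for category, (points, maximum) in sorted(scores.items()):
        if points < maximum / 2:
            blurb = blurbs.get(category, "no advice recorded yet")
            advice.append("%s: %s" % (category, blurb))
    return advice


def main(argv, scores):
    """Build the report lines, adding recommendations when --recommend is given."""
    parser = argparse.ArgumentParser(prog="cheesecake-sketch")
    parser.add_argument(
        "--recommend",
        action="store_true",
        help="print advice for categories scoring below half",
    )
    args = parser.parse_args(argv)
    lines = ["%s: %d/%d" % (c, p, m) for c, (p, m) in sorted(scores.items())]
    if args.recommend:
        lines.extend(recommend(scores))
    return lines
```

For example, main(["--recommend"], {"documentation": (10, 40), "testing": (30, 50)}) would list both scores and then add advice for documentation only, since 10 is below half of 40 while 30 is not below half of 50.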

For example, somewhere else on this wiki I asked about where I might find documentation on what kind of information should go into files like README, CHANGELOG, ... That information would be really useful to have for all the categories.

If you need help -- I'd be happy to help build such documentation. If we put the information in the wiki, then it can grow and evolve over time, which would also be useful because it might provide feedback into those specific categories and tests.

In part, I see CheeseCake as an interesting tool to impart knowledge and make people better programmers as well as making software better software.