Today I'd like to share an interesting article from the Google research team. It was published in 2015, but I think it's still very relevant. It describes common problems with Machine Learning systems from a more engineering-oriented perspective.
Although many examples are closely related to Machine Learning, I think they still have much in common with general problems arising in software development. What makes Machine Learning special is, in my opinion, that it's usually considered the central part of any system using it. Numerous start-ups are trying to build their success on the AI models themselves. So much attention is put into the Machine Learning part of the development that it's very easy to forget that developing it is in fact a tiny part of the process (as illustrated in Fig. 1 in the article). What's worse, even the actual programming aspect of developing Machine Learning models is often belittled, which leads to substantial errors.
The article is available here: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
1. Do you care about code quality in your projects?
2. Do you think it's worth the effort to follow some software development standards or best practices while working on research projects?
3. Have you ever experienced or heard about scientific experiments leading to incorrect conclusions due to some problems with the programming code (or software)?
1. Do you care about code quality in your projects?
Not only that. I try to take care of everything in the projects that I implement or supervise, not just the quality of the code: also the timeliness of implementation, the quality of the physical components used, and everything else that makes the project well perceived.
When it comes to code quality, I always try to keep it clean. I think another programmer should be able to open the source code and know what is where. Why it is there, they don't necessarily need to know :-D.
2. Do you think it's worth the effort to follow some software development standards or best practices while working on research projects?
For source code, definitely. We should try to comply with good practices, standards and requirements. I think it gives quite good results. In life it is different: legal norms regulate many things, and good practices are often not useful there. Good practices often even go against legal norms.
3. Have you ever experienced or heard about scientific experiments leading to incorrect conclusions due to some problems with the programming code (or software)?
Yes, about two semesters ago someone even presented an article in which an error was found after some time. And one more question that needs no answer:
Why does Microsoft produce patches for its operating system so often?
Such examples can be multiplied.
Thanks for the comment. Yes, critical patches to already released, commercial software are proof that some deadlines are really final :D
As for me, I've recently seen an updated scientific paper stating that the original results were affected by a bug spotted after the authors published the code on GitHub. The good thing is that science is self-correcting.
1. Do you care about code quality in your projects?
If I write for myself, I don't really care about the quality of the code. However, when I work on projects, the quality of the code and the use of appropriate design patterns are crucial. The fact that the written code works is not everything. The first thing is to improve its quality and legibility. And the most important thing: always write comments. Recently I had to re-analyze my own code because I had forgotten what it does. By taking care of the quality of the project we ensure less stress at work. For us as programmers, high-quality code means that we can focus on making changes.
2. Do you think it's worth the effort to follow some software development standards or best practices while working on research projects?
Working in a research team is completely different from working in a corporation. The most important difference is that researchers rarely have experience working as professional programmers. A university employee deals with completely different subjects than someone working in a corporation, and usually has no experience in commercial programming. I work on university projects and create a lot of scripts. The biggest mistake I made was not commenting on what exactly these scripts calculate.
3. Have you ever experienced or heard about scientific experiments leading to incorrect conclusions due to some problems with the programming code (or software)?
Yes, I have heard of such a case. The error came from using the wrong variable type: a variable storing floating-point values was used incorrectly. The paper with the wrong results was published. Luckily, I wasn't a co-author of that study. I remember one more well-known error. A well-known spreadsheet program automatically converts some entries into dates, and the affected analyses concerned genetic research, where gene names were turned into dates.
Microsoft Excel blamed for gene study errors: https://www.bbc.com/news/technology-37176926
Thanks for the reference to the Microsoft Excel related error. It seems ridiculous that there may be problems like that, but when you think about it, automatic corrections can be difficult to spot.
Variable types can also cause a lot of trouble. For example, in Python you need a dedicated module (such as decimal) for high-precision calculations rather than the built-in float type.
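To make the point about variable types concrete, here is a minimal sketch of the difference between Python's built-in float and the standard-library decimal module. The variable names are only illustrative, and this is not the code from the study mentioned above.

```python
from decimal import Decimal, localcontext

# Built-in floats are binary fractions, so 0.1 and 0.2 are only approximated.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)  # False

# Decimal values built from strings keep their exact decimal digits.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))  # True

# The precision (number of significant digits) can be raised when needed.
with localcontext() as ctx:
    ctx.prec = 50
    one_third = Decimal(1) / Decimal(3)
print(one_third)
```

Note that Decimal should be constructed from strings or integers here; Decimal(0.1) would inherit the rounding error of the float it was built from.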
You're right, probably one of the most important properties of good software is ease of modification. This holds for academic software as well, since reviewers often ask for additional experiments that require significant changes in the original code.
1. Do you care about code quality in your projects?
Yes, I care about code quality. I always try to write code that conforms to the standards of the programming language. Sometimes it's not necessary to care about code quality, because we need working code in a short time for a single task. I think we can call that temporary code.
2. Do you think it's worth the effort to follow some software development standards or best practices while working on research projects?
I think it all depends on the purpose of the code. If we want to use the code in many research projects, then I think it's worth following software development standards. Good-quality code will work better, and another programmer will easily understand how it works. If we need code for one project and want a working prototype as fast as possible, then we can skip the programming standards.
3. Have you ever experienced or heard about scientific experiments leading to incorrect conclusions due to some problems with the programming code (or software)?
Yes, I have heard about incorrect conclusions caused by programming code. I don't remember who told me about it or what exactly it was.
Thank you for the comment. You're right, there may be some tasks that can be solved by quick and dirty methods. But sometimes, by a mysterious turn of events, you have to reproduce results that depend on that throwaway code, and nothing works as it did the first time. This is the worst.
1. Do you care about code quality in your projects?
I try to keep my code clean, but sometimes, when time is pressing, I cannot comment the code and I try to implement my work faster. Sometimes you have to choose between quality code and a timely release.
2. Do you think it's worth the effort to follow some software development standards or best practices while working on research projects?
Yes, of course, I think the use of generally accepted norms in the project helps a lot. We also develop our local standards in the implementation of the project.
3. Have you ever experienced or heard about scientific experiments leading to incorrect conclusions due to some problems with the programming code (or software)?
Yes, of course, and more than once. We even use libraries that have their own bugs, and we have learned how to work around them.
Thanks for the comment. Indeed, there is usually a trade-off between code quality and delivery time. Especially in commercial products, release dates are probably more important than quality.
1. Do you care about code quality in your projects?
Of course. When it comes to more serious tasks in a project, I always try to keep my code clean. When I want to check something quickly, I don't care so much about it. I think the quality of the written code is very important, especially when working in a team. Nowadays I work mainly on my own, on various solutions that I introduce.
2. Do you think it's worth the effort to follow some software development standards or best practices while working on research projects?
The clarity and legibility of the code say a lot about its author. If I notice poorly written code in a research project, I immediately lose trust in the results.
3. Have you ever experienced or heard about scientific experiments leading to incorrect conclusions due to some problems with the programming code (or software)?
I sometimes come across articles, especially older ones, that I can't quite trust because of the code. However, I have never heard of a more serious scandal in which an article had to be retracted.
1. I mainly write code for my own use, and I have to admit that in this respect I am terribly sloppy. When I look into old code and see a comment, I thank my past self, because without it I wouldn't understand most things. Very often, instead of normal variable names, I write a, b, aa, etc., because it is faster when I test a solution. Unfortunately, this is often the case. Fortunately, only I work with my code, so it doesn't hurt anyone.
2. I keep telling my students that flexible and transparent code is very important. This helps in adapting to the standards required for various projects. Specifically, when it comes to research projects, the question is what we want to achieve. If we are going to be in the project for a long time and the code from a year ago will still be useful to us, it is good to have everything sorted and commented so as not to get lost. If it's a short-term solution for conducting quick tests, it depends on who is writing. I have colleagues who pay close attention to code style and formatting and do not allow any deviations.
3. I have not experienced it, but I have heard of several such experiments. It seems to me that sometimes we place too much confidence in a repository found somewhere on GitHub or another site. We also often look for a solution on the Internet when we are at a standstill. It is difficult to check whether a solution (someone else's code) is correct, since we consult the Internet precisely when our own code does not work. We assume that since someone has published it somewhere, it is probably a tested solution.
1. Do you care about code quality in your projects?
I try to comply with certain coding standards. Usually, I start creating a program with a sheet of paper and a pencil, to first draw a scheme of the application, then work out the dependencies, etc. I optimize what I can only at the end of the project and try to simplify it as much as possible. I do not spend too long improving the code so as not to spoil it even more.
2. Do you think it's worth the effort to follow some software development standards or best practices while working on research projects?
As I mentioned earlier, I try to follow certain standards. I also think that it is easier when creating applications, because at the design stage you can simplify or optimize a lot of things. Of course, usually during the project there are changes in functionality or how to solve emerging problems.
3. Have you ever experienced or heard about scientific experiments leading to incorrect conclusions due to some problems with the programming code (or software)?
Mistakes in scientific experiments due to software errors are common. At universities, test results are questioned even when "niche" software is used. A notable example is the NASA error from 1999. Scientists at NASA performed calculations in metric units, while a team of engineers supplying data to the navigation software used imperial units. The probe, which cost about $125 million, did not enter a stable orbit around Mars and was lost.
1. Do you care about code quality in your projects?
Yes, I think it is a very important matter. When I started programming my code was messy, and I learned from my mistakes that when you create something in a hurry just so that it "works right now", you pay a big price at the end of the project, when you need to make a small change and it turns out that you have to change everything else to do it. Besides that, when the code is sloppy and you have to go through it after a while with no idea what the "author" had in mind when writing it, frustration with yourself rises and your self-esteem drops. I recommend the books Clean Code and The Clean Coder to everyone who codes. They changed my perception of this subject.
2. Do you think it's worth the effort to follow some software development standards or best practices while working on research projects?
Definitely, especially when you work in a group, it makes everyone's life easier. Best practices are worth following: they make your code more efficient and prevent rework when, at the end, you come to the conclusion that some part of your project should be more efficient or that it does not work with other modules.
3. Have you ever experienced or heard about scientific experiments leading to incorrect conclusions due to some problems with the programming code (or software)?
I can't recall a specific case off the top of my head, but I'm sure it has happened. Many people use ML algorithms without understanding them and do not critically analyze the results. When you just take results from a "black box" without knowing what you should get from it, you can easily make a mistake.
1. Do you care about code quality in your projects?
Sure! Always! This is extremely important for me because I hate bad code quality... The time you lose writing good-quality code is much shorter than the time you will waste later trying to understand what the code does.
2. Do you think it's worth the effort to follow some software development standards or best practices while working on research projects?
It depends. I understand that people who are trying to present their idea don't have to be software developers, so it can be hard for them. Moreover, if the concept is clear and well described, then the source code isn't as important: it only has to work to show that the results are correct.
3. Have you ever experienced or heard about scientific experiments leading to incorrect conclusions due to some problems with the programming code (or software)?
Yes, in my office (but not in any research conclusion). The error was slightly stupid: my colleague ran the program on another platform (32-bit vs 64-bit) and lost value precision. Then, when he compared the values with each other, he got very high accuracy (which was fake). Live and learn.
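A minimal sketch of how lost precision can produce a fake match. It assumes the values ended up stored as 32-bit floats on one side, which is only one possible cause and may not be what happened in the case above. The helper to_float32 is a hypothetical name; it uses the standard struct module to round a value to single precision.

```python
import struct


def to_float32(value: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Two measurements that are clearly different in double precision.
a = 1.000000001
b = 1.000000002
print(a == b)  # False

# After a round trip through 32 bits both collapse to the same value,
# so a comparison reports a perfect match that is not real.
print(to_float32(a) == to_float32(b))  # True
```

Comparing results at the precision they were computed in, and with an explicit tolerance, makes this kind of false agreement easier to notice.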
1. Do you care about code quality in your projects?
I'm trying. I really am. But oftentimes I have to work in teams that stick to metrics without a second thought or wider consideration. Quality then becomes a parody, a theatre of sorts, in accordance with Goodhart's law. For example, we keep the test coverage metric above the threshold, but glaringly obvious errors are still present in the code: the tests are not really testing anything, they just check the "happy path" to keep the metric goals met. In such cases I argue a couple of times and then just give up, sit by the river and wait.
2. Do you think it's worth the effort to follow some software development standards or best practices while working on research projects?
Yes, but those are different best practices than in application development, and different than in library development. For research it is important to be able to track the flow of data and to be able to re-run parts of the process. Comments explaining why things are done are important. Research code is usually "denser", with more concepts per line of code, and doesn't have to bother with securing against malicious input.
3. Have you ever experienced or heard about scientific experiments leading to incorrect conclusions due to some problems with the programming code (or software)?
I've seen it happen: the input data was prepared incorrectly and weeks of model training went down the drain. I suspect there were some more cases that we didn't catch, but those must have been minor compared to this.
1. Do you care about code quality in your projects?
Yes, I cooperate with multiple programmers, and understandable and readable code is a must-have in our habitat.
2. Do you think it's worth the effort to follow some software development standards or best practices while working on research projects?
Yes, I think that this is a significant factor in the research reproducibility. Obtaining consistent results using the same data and code as the original study should be backed with open source code that we're using.
3. Have you ever experienced or heard about scientific experiments leading to incorrect conclusions due to some problems with the programming code (or software)?
Oh yes. Even a few months ago: https://arstechnica.com/information-technology/2019/10/chemists-discover-cross-platform-python-scripts-not-so-cross-platform/ - "[...] The reason for the variation was the scripts' use of Python's glob module, which searches for files matching a specific name pattern—the scripts generated a list of input files to read based on the glob results. But the module depends on the operating system for the order in which the files are returned. And the results of the scripts' calculations are affected by the order in which the files are processed." So yes, we should care about proper documentation and self-explanatory methods.
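A minimal sketch of the usual guard against this problem: sort the glob results before processing them, so the order no longer depends on the operating system. The function name list_inputs and the "*.out" pattern are assumptions made for illustration; they are not taken from the scripts described in the article.

```python
import glob
import os
import tempfile


def list_inputs(directory: str, pattern: str = "*.out") -> list:
    """Return matching files in a deterministic, sorted order."""
    # glob.glob() makes no promise about ordering, so sort explicitly.
    return sorted(glob.glob(os.path.join(directory, pattern)))


# Small demonstration with files created in a non-alphabetical order.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.out", "a.out", "c.out"):
        open(os.path.join(tmp, name), "w").close()
    names = [os.path.basename(path) for path in list_inputs(tmp)]

print(names)  # ['a.out', 'b.out', 'c.out']
```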
1. Do you care about code quality in your projects?
When I start writing code, I always tell myself that this time the code will be of good quality. Usually my code is of good quality at the beginning, and then… it depends. Sometimes there is no time to follow all the programming rules, or, for example, I work in a team on a module and a colleague gives me his/her code, which I need to finish. If such code does not follow the principles of good programming, I do not change it but rather continue in the same way.
2. Do you think it's worth the effort to follow some software development standards or best practices while working on research projects?
I think so, because using good programming practices or standards does not depend on whether it is a commercial project in an IT company or a research project. Unfortunately, I know people who work at the university and cannot program well. I work with a person of that kind. She did not know the basics of programming, so it was not about following good practices, but rather about struggling to get the program to compile.
3. Have you ever experienced or heard about scientific experiments leading to incorrect conclusions due to some problems with the programming code (or software)?
I searched the Internet for "the biggest programming errors in history" and found, among other things, information about a scientific mission: the Mars Climate Orbiter. It was launched but never completed its mission. The error was related to navigation. Teams that controlled the orbiter from Earth used imperial units, while the software calculations used the metric system. The calculations affected the flight path. As a result of the incorrect calculations, the orbiter was destroyed by friction in the Martian atmosphere.
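One common way to guard against this class of error is to attach the unit to the value at the point where it enters the program. The sketch below assumes the mismatched quantity was an impulse reported in pound-force seconds where newton-seconds were expected, as was widely reported for this mission. The Impulse class and its method names are hypothetical and have nothing to do with the actual mission software.

```python
from dataclasses import dataclass

# 1 pound-force is defined as exactly 4.4482216152605 newtons.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """An impulse that is always stored internally in newton-seconds."""
    newton_seconds: float

    @classmethod
    def from_newton_seconds(cls, value: float) -> "Impulse":
        return cls(value)

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        # The conversion happens once, at the boundary, and is explicit.
        return cls(value * LBF_S_TO_N_S)


def total_impulse(burns) -> Impulse:
    """Sum impulses; all inputs are already in the same unit."""
    return Impulse(sum(burn.newton_seconds for burn in burns))


burns = [
    Impulse.from_pound_force_seconds(10.0),
    Impulse.from_newton_seconds(5.0),
]
total = total_impulse(burns)
print(round(total.newton_seconds, 3))  # 49.482
```

Because a bare number can no longer be passed where an Impulse is expected, the caller has to state which unit the value is in.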