Influences

Software Packages

  • Traditional Spreadsheets like Visicalc and Microsoft Excel.
  • Alternative spreadsheets like Spreadsheet 2000 and Lotus Improv
  • Python, the language I'm most fluent in. My default is basically to imitate Python.
  • Haskell for being functional and having lazy evaluation.
  • Treesheets the freeware note-taking program
  • Ruby for some nice syntax:
    • Don't require parens for method calls with no arguments. E.g. obj.len instead of obj.len()
    • Nice unified object model. I'm sure Smalltalk or whatever had this too but I'm not familiar with them. Python has adopted a similar system, perhaps so they didn't look bad compared to Ruby.
    • Elipses, i.e. "5…10"
  • Lua - a simple language who's basic data structure is a "table", which is an associative array. Explicans basic data structure is also an associative array.
  • Scheme
    • A Simple and mostly functional language
    • Lexical scoping, I think it had it first!
    • Predicates end in question mark. I.e. "(empty? foo)". Ruby uses this convention also.
  • The R Language for the way functions act on each element of a vector.

Academic Papers

Why ignore decades of research on spreadsheets? Below is an annotated bibliography of papers I've read that have some applicability to Explicans.

Type Inference for Spreadsheets

[2006] Abraham, Robin; Erwig, Martin. "Type Inference for Spreadsheets" Proceedings of the 8th ACM SIGPLAN international conference on Principles and practice of declarative programming

This paper isn't directly relevant to Explicans, because it is concerned with using type systems on existing weakly typed spreadsheets instead of designing a type system for a new spreadsheet, but it does have some good quotes summarizing the role of spreadsheets and type systems:

  • "… Spreadsheets, which are (first-order) functional programs, are the most widely used programming systems in the world. It is estimated that each year tens of millions of professionals and managers create hundreds of millions of spreadsheets."
  • "Type systems have traditionally played a strong role in detecting errors in programming languages, and it is therefore reasonable to ask whether type systems could not be helpful in improving the current situation of spreadsheet programming."

SERP MBA Survey

[2006?] Baker, Kenneth R.; Foster-Johnson, Lynn; Lawson, Barry; and Powell, Stephen G. "A Survey of MBA Spreadsheet Users." http://mba.tuck.dartmouth.edu/spreadsheet/product_pubs_files/SurveyPaper.doc

This is a great collection of facts about spreadsheet use (mostly from other papers). Among them:

  • Most spreadsheets are not planned out, people just go at it. Almost no spreadsheets separate data from formulas. (p5-6)
  • The most frequent testing method was checking formulas with test data. (p7)
  • Spreadsheets contain very little documentation, and are frequently totally undocumented. (p8)
  • The majority of spreadsheets are reused, sometimes very often (e.g. multiple times per week) (p9)
  • Use of various spreadsheet functions (p9):
    • Macros are used in many spreadsheets (maybe around half)
    • Database functions are used by about 70% of spreadsheet users, and about a third use them very frequently.
    • Fewer use statistics functions. The majority never use summary statistical functions.
    • More advanced users tend to do what-if stuff, statistics, and macros more.
  • Over 85% of spreadsheets are modified after their initial implementation, usually by someone other than the author.
  • In a survey of MBA students, 79% said spreadsheets were "very important" or "critical" in their work.

Real Programmers Don't Use Spreadsheets

[1992] Casimir, Rommert J. "Real Programmers Don't Use Spreadsheets" ACM SIGPLAN Notices 27(6) p10-16.

TBD

What We Know About Spreadsheet Errors

[1998-2008] Panko, Raymond R. "What we Know About Spreadsheet Errors" Originally: Journal of End User Computing's Special issue on Scaling Up End User Development Volume 10, No 2. See http://panko.shidler.hawaii.edu/SSR/Mypapers/whatknow.htm

This is a great, up-to-date compilation of the research on spreadsheet errors. Here is a summary quote:

  • "Spreadsheets, even after careful development, contain errors in one percent or more of all formula cells. In large spreadsheets with thousands of formulas, there will be dozens of undetected errors."
  • "Every study that has attempted to measure errors, without exception, has found them at rates that would be unacceptable in any organization. These error rates, furthermore, are completely consistent with error rates found in other human activities. With such high cell error rates, most large spreadsheets will have multiple errors, and even relatively small "scratch pad" spreadsheets will have a significant probability of error."

According to surveys that Panko quotes, 85% of business leaders use spreadsheets for budget forecasting, and 95% for financial reporting. Also,

  • "[W]e would guess that the largest portion of large-scale end user applications today involve spreadsheet development."

Panko examines many surveys, and find that there is a kind of minimum human error rate for certain kinds of tasks, and spreadsheets are no exception:

  • "Broadly speaking, when humans do simple mechanical tasks, such as typing, they make undetected errors in about 0.5% of all actions. When they do more complex logical activities, such as writing programs, the error rate rises to about 5%. These are not hard and fast numbers, because how finely one defines reported "action" will affect the error rate. However, the logical tasks used in these studies generally had about the same scope as the creation of a formula in a spreadsheet."
  • "In spreadsheets, a similar metric [to faults/KLOC] is the cell error rate (CER). This is the number of errors divided by the combined number of numerical and formula cells. (Label cells are like comment cells, which are ignored in faults/KLOC). Equivalently, this is the percentage of non-label cells containing errors. Table 1 shows that just as faults/KLOC usually falls into a narrow range (Panko, 2005a), the CERs seen in spreadsheet studies have been very similar. The relatively few divergent numbers that appear within a life cycle stage, furthermore, come when only certain cells are considered—generally cells with high potentials for error."
  • Code inspection or some other form of intensive testing may be needed to detect errors that remain at the end of the development stage.

Panko gathers a lot of evidence on spreadsheet overconfidence (that we think that spreadsheets are a lot more accurate than they really are). Also see the Kruger-Dunning effect. Below are two good quotes:

  • The answer may be that both developers and corporations are overconfident about the accuracy of spreadsheets, despite the large amount of information to the contrary. As noted earlier, when Brown and Gould (Brown & Gould, 1987) gave three spreadsheet development tasks to nine highly experienced spreadsheet developers, all made at least one error, and 63% of the spreadsheets overall had errors. Yet when asked about their confidence in the correctness of their spreadsheets, their median score was "very confident."
  • While such massive levels of overconfidence seem unreasonable, overconfidence is perhaps the most well-established phenomenon in the behavioral sciences.

Here is his conclusion:

  • "In programming, we have seen from literally thousands of studies that programs will have errors in about 5% of their lines when the developer believes that he or she is finished (Panko, 2005a). A very rigorous testing stage after the development stage is needed to reduce error rates by about 80% (Panko, 2005a). Whether this is done by data testing, line-by-line code inspection, or both, testing is an onerous task and is difficult to do properly."

Spreadsheets on Trial

[1996] Panko, R.R. and Halverson, R.P. "Spreadsheets on Trial: A Survey of Research on Spreadsheet Risks" Proceedings of the 29th Annual Hawaii International Conference on System Sciences

Although an old paper, it gives an overview of the literature of the time and shows that many drawbacks of traditional spreadsheets have been appreciated for a long time. Two quotes are interesting:

  • "Broadly, a number of writers have expressed concerns over spreadsheeting as a development language. These writers have focused on such concerns as cryptic formulas and the difficulty in seeing more than one formula at a time." (p326)
  • "… So you can estimate the number of errors you are likely to find in a program by multiplying the number of non-comment lines of code by about 5%. Appendix A shows that CERs [cell entry errors] in spreadsheeting experiments and one field study have been comparable." (p328)

Explicans is designed to have much less cryptic formulas, have multiple formulas visible at the same time, and require many fewer lines of code for an equivalent task. It seems like it should help a lot with the above problems.

A critical review

[2008] Powell, Stephen, Baker, Kenneth, and Lawson, Barry. "A critical review of the literature on spreadsheet errors." Decision Support Systems 46, p128-138.

This is a recent paper that analyzes in great detail the different types of errors that are possible. They resist concluding anything definitively other than that spreadsheet errors are quite common and can be a big problem. A 5% cell error rate in an unaudited spreadsheet still seems to be a reasonable estimate.

One interesting section is on page 135, where they mention the use of "what-if" tests to detect errors. From the summary, it looked like that kind of testing did reveal a number of errors. However, the process of editing of the spreadsheet also introduced many errors, making the exercise sometimes self-defeating.

But with Explicans, it will be as easy as in a traditional programming language to turn a block of code into a function. Hopefully this will make what-if tests easier to do and less error prone.

Quality Control in Spreadsheets

[2000] Rajalingham, Kamalasen; Chadwick, David; Knight, Brian; and Edwards Dilwyn. "Quality Control in Spreadsheets: A Software Engineering-Based Approach to Spreadsheet Development" Proceedings of the 33rd Hawaii International Conference on System Sciences

TBD

This paper is interesting because it recommnds hierarchical (tree-form) representation of a spreadsheet as a way to reduce errors.

Yoder Papers

[1994] Yoder, Alan G. and Cohn, David L. "Real Spreadsheets for Real Programmers" Proceedings of the 1994 International Conference on Computer Languages

[1997] Yoder, Alan G. and Cohn, David L. "Domain-Specific and General-Purpose Aspects of Spreadsheet Languages" First ACM SIGPLAN Workshop on Domain-Specific Languages

These are interesting papers which shows how a redesigned spreadsheet could be used for "real" programming tasks. In particular, Yoder and Cohn allow for cells/blocks to be created dynamically, and for cells to contain other cells. They they implement some non-trivial algorithms like quicksort and a temperature gradient simulation. There are some interesting/crazy ideas:

  • Allow circular references to allow indefinite iteration as a kind of fixed-point locator
  • Relative references like this[+1, -2]
  • Message passing between cells, which have state
  • Manual allocation of memory with new/delete operators (!)

Most of these ideas are way too complicated for a spreadsheet. Automatic garbage collection seems like a safe design choice for a spreadsheet :) Still, their paper is great for expanding the space of spreadsheet possibilities, and there is an interesting section on macros, syntax, and inheritance in the "Domain-Specific" paper. There are also a few good quotes:

  • "The true measure of a programming metaphor is the problem domain it addresses. It's already well-known that modern spreadsheets are well-suited for ad-hoc business decision making and financial modeling."
  • "Spreadsheets are capable of solving problems from many domains. The poor quality of existing implementations have been the principle obstacle to a concrete realization of this. Poor choice of primitives and a lack of local variables and other HLL constructs have severely hampered users who attempt to stretch the envelope."

Other

  • Dungeons & Dragons for the idea that specific trumps general (probably not an original concept, but that's where I got it from)
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License