Explicans' Mission and Justification

Understanding calculations is critical

In physics, mathematics can be used to formulate general laws that apply with great precision and for all times. In the business world, mathematics is usually more rushed and much less successful. Business models are frequently thrown together by busy people, and, if they work at all, only work approximately in certain times and places.

As a result, the thought that underlies a business calculation or model will often be more important than the decision prompted by the calculation itself. For instance, regarding the use of macroeconomics for decision-making, the Economist wrote "But all models should ultimately be seen as pedagogical devices, their calculations a means to the end of helping policymakers think through their decisions." (2006) Business authors have been saying similar things for decades. Peter Drucker wrote that all tricky decisions were wrong (in the sense that they were at best kludges that worked for the moment) but not all decisions were fragile and brittle. (Effective Executive, 1966)

Even when a calculation's result is more important than its meaning, creating understandable calculations is very important for at least three reasons.

  1. As mentioned above many models have very limited applicability. Often the best way to know when a model can and can't be used is just to understand the model itself.
  2. Many calculations are constantly in flux. Because of their limited applicability they must be continually modified for new conditions. At any given time the human understanding which allow the calculation to be modified is often of greater long-term value than a single set of results.
  3. Finally, even if a model were eternally perfect and infinitely accurate, it would not be useful unless people believed in it. Quantitative data is surprisingly inaccurate, and business people are often (justifiably) suspicious of the results of black-box calculations. Conveying an understanding of a model is often the best way to convince someone of its usefulness.

According to a SERP survey of MBAs, over 85% of spreadsheets are modified after their initial implementation, usually by someone other than the author. Furthermore, real-world spreadsheets contain little documentation, and are frequently totally undocumented. (Baker et al., 2006) Given this, it is crucial that business models are built in a format that not only produces a numerical answer, but that also facilitates communication.

Explicans takes seriously the idea that a primary purpose of a calculation is to convey understanding. Furthermore, its target audience is not computer programmers but intelligent people whose specialty may not be abstract quantitative thought. These goals lead to design decisions that are quite different from most other other computation systems, like traditional spreadsheets or general-purpose programming languages.

Spreadsheet Errors

Studies agrees that spreadsheets are invariably full of errors (see Panko's "What we know About Spreadsheet Errors"). Some types of errors (for instance, data entry errors) may be unavoidable. However, Explicans is expected to have have far fewer errors than typical spreadsheets for three basic reasons:

  1. Better type system. A type system can catch many errors as soon as a formula is entered. ("compile-time" as it were)
  2. Fewer lines of code. Because higher-order concepts are expressible in Explicans, equivalent logic will be much more succinct.
  3. Easier inspection and testing. In the standard spreadsheet view, only one formula is viewed at a time. In Explicans, formulas are always visible, and blocks can be abstracted into formulas for testing.

These reasons are elaborated on below.

Type system

Two researchers on spreadsheet errors put it this way:

Type systems have traditionally played a strong role in detecting errors in programming languages, and it is therefore reasonable to ask whether type systems could not be helpful in improving the current situation of spreadsheet programming. (Type Inference for Spreadsheets, 2006)

Programmers are familiar with the utility of a type system in catching errors. Traditional spreadsheets like Excel have very weak type systems, for instance allowing boolean values to be concatenated with strings without errors.

Furthermore, traditional spreadsheets have a type systems that cannot contemplate ranged or nested data. Thus basic type functions like bounds checking or data encapsulation are impossible. Researchers have suggested using hierarchical representations with spreadsheets to control errors (e.g. Rajalingham et al., 2000) but these cannot be represented in the spreadsheet itself.

To take an example, it is common in Excel to do lookups using the VLOOKUP function, as in =VLOOKUP(A$3, Tab84!$K$84:Tab84!$AG128, 15, False). But with arrays this code could be reduced to Revenue[Sales Year]. Research shows that spreadsheet are particularly error-prone when a formula involves a cell from a different row and column. The row/column notation is just too confusing. Because Explicans will allow straightforward use of user-given names, this type of error will be reduced.

Fewer lines of code

Panko (1998-2008) puts this very well:

Broadly speaking, when humans do simple mechanical tasks, such as typing, they make undetected errors in about 0.5% of all actions. When they do more complex logical activities, such as writing programs, the error rate rises to about 5%. These are not hard and fast numbers, because how finely one defines reported "action" will affect the error rate. However, the logical tasks used in these studies generally had about the same scope as the creation of a formula in a spreadsheet.

In spreadsheets, a similar metric [to faults/KLOC] is the cell error rate (CER). This is the number of errors divided by the combined number of numerical and formula cells. (Label cells are like comment cells, which are ignored in faults/KLOC). Equivalently, this is the percentage of non-label cells containing errors. Table 1 shows that just as faults/KLOC usually falls into a narrow range (Panko, 2005a), the CERs seen in spreadsheet studies have been very similar.

Although it is a simplification, research has shown that in most cases number of errors is proportional to the number of lines of code for most languages. Explicans' language will allow for higher-level ideas to be expressed, greatly reducing the number of lines of code. By analogy, for a given calculation, it is likely code written initially in Python or C will contain fewer bugs than code written in assembly. Traditional spreadsheets are like assembly in that you refer directly to memory addresses, and only secondarily can use meaning data structures.

For example, if two arrays are multiplied together, traditional spreadsheets require that a formula be repeated for each cell in the array. Explicans will allow result=array1*array2 to be expressed directly, potentially saving hundreds of lines of code. If an Explicans file contains only 20% of the formulas of the equivalent traditional spreadsheet file, it is likely the numbers of errors in it will decrease about 80%.

Code inspection and testing

As Panko (1998-2008) writes:

A very rigorous testing stage after the development stage is needed to reduce error rates by about 80% (Panko, 2005a). Whether this is done by data testing, line-by-line code inspection, or both, testing is an onerous task and is difficult to do properly.

Programmers know that eliminating bugs requires more than just care when coding. However, traditional spreadsheets make it very difficult to either inspect code for errors. Most obviously, the formulas are un By default you can only see one line of code at a time. The traditional spreadsheet was not designed to be inspected, and this basic design decision has many ramifications.

Using test-cases or "what-if" data is another promising way to reduce errors that is common in software development. However, traditional spreadsheets are hard to test on new data. One article (Powell et al. p135, 2008) found what-if tests to be promising at finding spreadsheet errors, except that users often introduced new errors while trying to test their spreadsheet on different data!

Explicans will allow arbitrary code to be abstracted into function form, in the interface and without special programming skill.

Popularity of Spreadsheets

Spreadsheets are surprisingly neglected considering their popularity.

… Spreadsheets, which are (first-order) functional programs, are the most widely used programming systems in the world. It is estimated that each year tens of millions of professionals and managers create hundreds of millions of spreadsheets. (Abraham and Erwig, 2006)

It's not surprising that 95%+ of businesses use spreadsheets in some form (Panko, 1998-2008), but more surprisingly, many of the semi-advanced features of spreadsheets seem to get wide use also. For instance, a SERP survey (2006) found that 70% of MBAs use spreadsheets' database-style functionality.

Given all of the above, a next-generation open-source spreadsheet system seems well worth exploring.

References

See influences for all bibliographical information.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License