Several stories play out in ‘The Red Telephone Box that talks a bit like me’.
The first is about my interest in acting and performing, and whether a computer-generated voice can ever be a decent actor. The second is a series of personal stories, many of them related to family or childhood: some true, some embellishments of the truth and some simply made up. The third is a series of failed attempts to create some sort of continuous narrative that glues the various episodes together, for example this attempt: https://k6.gravityisahat.co.uk/2019/06/05/operator/
As of today, 21st May 2021, I haven’t alighted on a solution in which I am truly confident. Some of these attempts are documented in the blog, so readers may find the sequence of events rather confusing as I change my mind and try other solutions without necessarily flagging up that I have done so.
I have copied the following from the few posts that are focused on issues other than technology.
Given the persuasiveness of the box itself (often referred to as an iconic architectural object representing Britishness), a sensible objective is to provide a setting in which the telephone box is displayed to full advantage and thus provides a credibly realistic as well as atmospheric experience. Happily, it is set in a relatively remote place, in some respects a likely location for a phone box: outside my house, on the side of a quiet no-through road with five adjacent houses. Given its remoteness, and as cars are rare, most users will probably stumble upon the installation when out walking.
This is a walk-up installation in which the user is encouraged to participate by the ringing of the telephone bell as they pass the box. If this feature is not switched on (which it frequently isn’t), the user can choose to enter the box voluntarily and lift the receiver. Written instructions direct the user to dial zero for the operator, and on doing so they are given spoken instructions about the basic operation of the telephone. Many younger users are unfamiliar with dialing techniques and the need to proceed slowly and patiently. This has proved a persistent problem, particularly as children tend to be the predominant users.
For older users there are probably nostalgic associations with this type of telephone box. I have speculated that for some, despite it not being a blue police box, there could be an association with Dr Who and time travel which could also be useful. Some of the first scenarios I developed played on these qualities.
The box is for kissing, for emergencies, for good news, for bad news, for smoking, for pissing, for sheltering, for waiting.
Dialing a number connects the caller to a computer version of my voice. The voice was created by my good friends at Cereproc, who are world leaders in this field. The method required me to record many hours of specific sample sentences, from which all the sounds required to produce intelligible speech in my voice can be extracted. The text-to-speech (TTS) synthesis technology then has to search for the right sounds and assemble them into words and sentences. It produces a slightly unreal version of my natural speaking voice. It certainly sounds like me, but the emotion is largely gone and the inflection may sound rather false. Most clearly, it doesn’t understand what it is saying. This is a limitation of all TTS systems at present: while it is possible to create a decent human-like effect using a machine, the machine has no awareness of the meaning of the words it is delivering. This can result in small, silly mistakes or more serious chunks of unintelligibility, but it always results in unbelievability. It simply isn’t a credible person speaking; hence, so far at least, TTS has not caught on for audiobooks. In theory any text can be passed to it programmatically and it will attempt to render it in my voice. In addition to the computer voice, recordings of my real voice are used, some subjected to significant digital signal processing to produce weird and wonderful effects. Finally, and only for specific scheduled performances, my live voice can be added to the mix, unbeknown to the user.
The voice was created by CereProc (Cereproc, 2018).
These dynamic capabilities are not yet exploited in the installation.
It is a 1937 red telephone box known as kiosk no. 6 (K6), designed by Giles Gilbert Scott (Coltman, 2018), located in a quiet North Yorkshire hamlet near the river Fleet. I bought it on eBay and restored it myself. The telephone, model number 232 (Britishtelephones.com, 2018), looks and behaves like a normal telephone of the 1930s: it can ring, it has a dial, it has a dialing tone, etc. Both the telephone and the box are iconic examples of industrial design, much celebrated by the heritage community and, in the case of the red telephone box, by the British tourist industry. I cannot deny that these qualities acted as a draw for me. Tangible industrial objects, engineered to last and made from materials like brass and Bakelite, lend themselves to reimagining in a way that many similar modern artefacts do not. Items such as these have given rise to Steampunk and the current vogue for decorative objects upcycled from vintage industrial detritus. As with any stage, the theatre in which it is situated, the world it inhabits beyond the proscenium arch, is part of the experience. In this case the telephone box is both stage and theatre.
The telephone box, let’s call it ‘Chris’, is part vocal cenotaph, part gramophone, part ventriloquist and part sound installation. Just occasionally it is a telephone, and sometimes it pretends to be an oracle. It merges four ideas: the test devised by Alan Turing to validate whether a machine can successfully impersonate a human, known as the Turing Test (Turing, 1950); a method used in Human Computer Interaction (HCI) research in which a machine interface is controlled by unseen human operatives, known as the Wizard of Oz method (Kelley, 2018); the unrealized machine proposed by Thomas Edison for communication with the dead in his paper ‘The Realms Beyond’ (Edison, 2015); and a fascination with, and fear of, ventriloquism that I share with many others (Connor, 2000).
The motivation for the installation erupted when my cancer diagnosis in 2014 seemed drastic and I felt the need to leave some sort of memorial to myself and a record of my obsessions. Happily, that urgency has abated but the desire to bring the project to fruition has not.
The choice of a speech-based installation was inspired by my career in the vocal arts, mainly as an opera director, by my research in computer-generated speech and by Auslander’s observations on liveness (Auslander, 1999), in particular whether liveness (a quality of real-time live performance) can be faked in a non-live performance, e.g. a recording. Knowing how to fake those qualities that make a performance feel ‘live’ was an aptitude I had seen in many professional actors. Of course, their requirement was different from that of my telephone box: their performances were already live, and the need was to create a sense of real-time spontaneity, of making every performance seem as if it were the first. But I speculated that the methods I saw them exploit could be tried on non-live platforms such as computer voices. In particular, I had become greatly enamored by a method of speaking Shakespeare’s verse that John Barton (Barton, 1984) and Peter Hall (Hall, 2004) had taught. The method required the actor to set aside factors such as character, situation and motivation and to scrutinize the structure of the verse and prose, in particular observing line endings and full stops. This was a rigorous procedure that required significant discipline on the part of the actor, and needless to say some would rebel against its constraints. However, those who persisted were often surprised to discover that the underlying ‘musical score’ of Shakespeare’s verse, if followed, could produce some remarkably ‘lively’ results. I had the idea that this procedural method of creating liveliness in speech was something that a computer, with no capacity for motivation or real emotion, might be able to replicate. Of course, there turned out to be so much more to it than that. Still, the sense that the answer to convincing, emotional, characterful computer-generated speech could be something other than just aping human speech continues to interest me.
Auslander, P. (1999) Liveness: performance in a mediatized culture. London, New York: Routledge.
Barton, J. (1984) Playing Shakespeare. London: Methuen.
Britishtelephones.com (2018) Tele No. 232. Available online: http://www.britishtelephones.com/t232.htm
Cereproc (2018) Welcome to CereProc | CereProc Text-to-Speech. Available online: https://www.cereproc.com/
Coltman, R. (2018) The Telephone Box | Kiosk No 6. Richard Coltman. Available online: http://www.the-telephone-box.co.uk/kiosks/k6/
Connor, S. (2000) Dumbstruck: a cultural history of ventriloquism. Oxford: Oxford University Press.
Edison, T. (2015) Thomas Edison: The Lost Chapter • ITC Voices. Available online: http://itcvoices.org/thomas-edison-the-lost-chapter/
Hall, P. (2004) Shakespeare’s advice to the players. London: Oberon Books.
Kelley, J. F. (2018) Where did the term Wizard of Oz come from. Available online: http://www.musicman.net/oz.html
Turing, A. M. (1950) Computing machinery and intelligence. Mind, 59(236), 433-460.
Interaction with the artwork has proved annoyingly problematic. I did not expect users to find a telephone dial difficult to use. I should have predicted this, given that the technology is now 30 years out of date and many younger people have never ‘dialed’ in their lives. I should also have taken account of the fact that, when dialing was introduced, the General Post Office (GPO), the precursor of the British Post Office, produced short information films showing people where to put their finger in the dial mechanism. In my box, the knock-on effect of user error created a perfect storm of system failures. The technology of the 1930s and the users and software of 2018 struggled to reconcile differing expectations of system speed and responsiveness, causing much bad-tempered slamming down of handsets and walking away moaning about having heard nothing.
My predicament is that I want the telephone box to remain a bit of an enigma, or a mystery, to those who encounter it. If I were to ‘prop it up’ with copious operating instructions, some of the mystery would be undermined. It would become an arcade game or a gadget, something to be solved or competed against. I am hoping to provide that somewhat debased concept, ‘an experience’, as something users can discover for themselves. Obviously this throws up issues of usability, on the assumption that the less information the user is given, the more errors they are likely to make and the less satisfactory the experience will be. Conversely, too much information overwhelms the user, with the same result. The unwillingness to wait for things to happen (analogue electromechanical technology being so much slower than digital) was initially a problem, requiring the judicious use of audible messages such as ‘please hold the line’ or ‘wait 3 seconds’. That said, even when they are given instructions users tend to ignore them, so I have reconciled myself to the inevitable slamming down of handsets. Time spent in rehearsal has helped solve some of these problems, but by no means all.
I need to figure out how to apply the theatrical adage of keeping ‘bums on seats’. A surreal, alienating piece of audio experimentation stuck in a country lane in North Yorkshire does not keep people entertained long enough to justify the notion of a public theatre. While I have no unalterable desire to serve a public, I also have no desire to bore them rigid, show them how clever I am or simply frustrate them. Thus the content of the material spoken, the script if you like, has gone through many, many rethinks and is still being rethought as I write this.