How Steve Wozniak Wrote BASIC for the Original Apple From Scratch
May 3, 2014 11:41 PM
"Integer BASIC, written by Steve Wozniak, was the BASIC interpreter of the Apple I and original Apple II computers. Originally available on cassette, then included in ROM on the original Apple II computer at release in 1977, it was the first version of BASIC used by many early home computer owners. Thousands of programs were written in Integer BASIC." Metafilter's own Steve Wozniak discusses how he wrote BASIC for the original apple from scratch. (Previously.)
Bill Gates (along with Paul Allen and Monte Davidoff) also wrote a Basic interpreter from scratch, for the Altair computer.
He didn't design the computer it ran on, so it is not as impressive.
posted by eye of newt at 12:21 AM on May 4, 2014 [3 favorites]
10 HGR
20 HCOLOR=RND(1)*15 : HPLOT RND(1)*128,RND(1)*128 : GOTO 20
posted by loquacious at 12:26 AM on May 4, 2014 [7 favorites]
Actually, that code I just wrote probably won't work. Replace the 15 with an 8 for the multiplier after the first rnd and it'll run here:
http://www.calormen.com/jsbasic/
I don't remember what the actual bounding values are for hcolor and hplot in high res graphics mode, but I thought it was more than 9 colors.
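For what it's worth, here is a rough sketch of why the 15 fails and the 8 works. It assumes Applesoft's documented limits (HCOLOR= takes 0 through 7, HPLOT takes X from 0 to 279 and Y from 0 to 191) and assumes the fractional result is truncated to an integer. The function names are made up for illustration, and it is Python rather than BASIC:

```python
# Assumed Applesoft hi-res limits: HCOLOR= 0..7, HPLOT X 0..279, Y 0..191.
HCOLOR_MAX = 7
HPLOT_X_MAX = 279
HPLOT_Y_MAX = 191


def largest_value(multiplier):
    """Largest integer that RND(1)*multiplier can truncate to.

    RND(1) lies in [0, 1), so the product is always below the multiplier.
    Assumes a positive whole-number multiplier.
    """
    return multiplier - 1


def stays_in_range(multiplier, limit):
    """True if RND(1)*multiplier can never exceed the given limit."""
    return largest_value(multiplier) <= limit


print(stays_in_range(15, HCOLOR_MAX))    # the original line 20
print(stays_in_range(8, HCOLOR_MAX))     # after swapping the 15 for an 8
print(stays_in_range(128, HPLOT_X_MAX), stays_in_range(128, HPLOT_Y_MAX))
```

Under those assumptions the 128s in the HPLOT are fine as they stand; only the colour multiplier runs past its limit.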
posted by loquacious at 12:45 AM on May 4, 2014 [3 favorites]
WOZ > GATES by a huge margin. Gates could have never done some of the insanely clever things that Wozniak did with hardware and software. There are legends about the Apple II boards and floppy controllers using like half the chips of any previous designs by understanding whole systems, electronic, code, and physical.
One example is found in the original floppy drive. That horrible noise the drive always made was a direct result of Wozniak realizing he could save a lot of money on the circuits to align the read-write head simply by over-running the head into the physical limit stop and allowing the metal-on-nylon fittings to physically slip, thus neatly aligning the head ready for use at the start of each new inserted disc.
posted by loquacious at 12:54 AM on May 4, 2014 [18 favorites]
" This was even before Steve Jobs saw that my computer existed" BANG! Perfect.
posted by marienbad at 1:03 AM on May 4, 2014 [1 favorite]
This just happened to be on my nerdpocalypse of a desk.
Please excuse the mess, but these things happen when 30+ years of personal computing co-mingle. Yes, those are original Apple stickers, Sierra Online and Apple discs, Apple rainbow stickers and Apple II manuals and stuff. One of those discs is actually a non-bootleg copy of Leisure Suit Larry for some reason, which I don't remember obtaining.
posted by loquacious at 2:37 AM on May 4, 2014 [8 favorites]
We keep calling him "Metafilter's Own" Steve Wozniak, but he's only ever written one comment, an answer in Ask the same day he joined, October 4, 2006. As far as we know he's forgotten we exist.
Which is okay. Woz, if you do by chance read this, know that we still think you're aces. We, and every other computer user of a certain age.
posted by JHarris at 3:03 AM on May 4, 2014 [12 favorites]
The Apple II was an awful hack. It was brilliantly conceived, with beautifully compact code and physical design, but that structure effectively prevented the production of an evolutionarily-similar Apple III: the Apple IIe came after the Apple III, because (a) the Apple III was a pile of crap and (b) there were so many compatibility issues. It seemed as if every component did at least two jobs. I swear, if Woz designed a car it would be half the price and twice the speed of any comparable cars, but you would have to lock and unlock the doors by winding the windows up and down, and the indicators would flash in time with the stereo system. Because why have two timer chips when one will do?
posted by Joe in Australia at 3:24 AM on May 4, 2014 [8 favorites]
The Apple II was not responsible for the III. Even Wozniak called that thing a complete failure. They intentionally lobotomized the III to not be fully backward compatible, and it was basically entirely marketing designed, not engineer/hobbyist designed.
And the reason Wozniak designed the II so sparsely is that chips used to be insanely expensive, and they didn't have IBM's resources and supply chain to help manage costs. In retrospect it's easy to say "this is an awful hack" when you can swallow a 64 GB micro SD card for the price of a meal at a greasy diner, but at the beginning of this era 4K of RAM cost as much as the down payment on a house. Using one timer instead of 4 meant more RAM and computer for much less cost.
And in the end, that awkward kludginess led to some amazing hacks and solutions that found their way into games and other hardware, as though the weirdness and porosity of Wozniak's designs infected many, many people with interesting ways to solve problems. A lot of Sierra's better games and graphics are fueled by this elegant madness.
posted by loquacious at 3:46 AM on May 4, 2014 [12 favorites]
HGR was in Applesoft basic, not integer basic, FWIW. To use HIRES* graphics from integer basic, you needed a separate toolkit and peeks, pokes, and calls.
*Neither high-res nor Hires root beer.
posted by plinth at 4:03 AM on May 4, 2014 [3 favorites]
Plinth, are you sure about that? I think the commercial release version of Integer BASIC had HGR. I remember using it all the time as a kid, but AFAIR it was labeled "Applesoft Integer BASIC".
I'm 99% certain of this because I remember having issues with not having floating point values as my coding progressed.
posted by loquacious at 4:08 AM on May 4, 2014
Actually, scratch my last comment. Plinth is correct. I just searched some docs.
This is even more confusing to me than it should be because the family childhood computer I grew up on wasn't actually an Apple II or //e, but a Franklin Ace 1000, though we used original Apple system discs.
I'm pretty sure we had both INT and FP BASIC on discs.
posted by loquacious at 4:14 AM on May 4, 2014 [1 favorite]
I swear, if Woz designed a car it would be half the price and twice the speed of any comparable cars, but you would have to lock and unlock the doors by winding the windows up and down, and the indicators would flash in time with the stereo system. Because why have two timer chips when one will do?
So, you're saying Woz would have worked for Citroën?
posted by TheWhiteSkull at 4:23 AM on May 4, 2014
So, you're saying Woz would have worked for Citroën?
Hey, just because you start a DS21 with a flick of a hydraulic gearshift lever behind a single spoke wheel that makes every shift as elegantly arch as the gestures of an aging drag queen with a mother-of-pearl cigarette holder doesn't mean a hack is involved. Sometimes, you just desire a bit of left bank savoir faire.
That said, Citroën's clean sheet design for the 2CV and Woz's no-holds-barred design for the Apple ][ have much in common as victories for the brilliance of humble inspirations.
Can they do it all? After a fashion.
Fortunately, the spirit of Woz is alive and well in the Raspberry Pi. Wish the same could be said for the realm of automobiles, but unfortunately, we're still stuck in overblown UNIVAC territory there, for the most part.
posted by sonascope at 5:27 AM on May 4, 2014 [5 favorites]
I should add, as a joint Apple ][ and Commodore 64 loyalist, that the later Commodore 64 was a far superior machine, but Commodore had the advantage of owning MOS Technology, so where Woz was working within the constraints of finding ways to get existing chips to do complicated jobs, CBM was able to come up with their own ASICs, like VIC-II and the SID. Woz just took what he wanted to do and figured out how to make it work, which is absolutely the spirit of focused, monomaniacal genius. Whenever I have to open up one of my Apple ][s (now running on the brilliant CFFA card instead of my venerable Disk ][ drives), I still feel like I'm opening the door to a tiny chapel of the Congregational Unified Church of Divine DIY.
posted by sonascope at 5:43 AM on May 4, 2014 [4 favorites]
Yeah, most of the ]['s kludginess is found in the video and disk controllers, both of which are legendary for what they manage to do with a handful of inexpensive standard chips. In other respects the ][ is a very standard and easily expanded machine. The expansion slots made it more cumbersome and expensive than machines that came after it like the TRS-80 and C64, but they also made it more easily upgradable with features that didn't exist when it was designed.
The thing is the ]['s kludges worked. By contrast, Commodore's post-hoc solution for a disk drive interface is one of the industry's most legendary monuments to $FAIL.
posted by localroger at 5:53 AM on May 4, 2014 [2 favorites]
Woz's failure to include floating point in his Integer Basic meant that Apple eventually licensed Microsoft Basic for inclusion on the floppy disks. This led to one of Apple's worst deals when Microsoft cancelled Apple's Basic for the Macintosh.
More about Woz's brilliant design of the Apple II disk drive controller
The Apple2History site has a pretty good set of pages.
posted by blob at 6:14 AM on May 4, 2014 [2 favorites]
By contrast, Commodore's post-hoc solution for a disk drive interface is one of the industry's most legendary monuments to $FAIL.
What, you mean making the disk drive almost a complete computer on its own was a bad idea?
Well, yeah, it's pretty baroque engineering, though I think it's part of the equation of making the 64 cheaper if you can live with the Datasette. The mechanical alignment thing was a dick, to be sure.
The Disk ][ controller is definitely a work of genius.
posted by sonascope at 6:17 AM on May 4, 2014
...is one of the industry's most legendary monuments to $FAIL,8,1
FTFY
posted by jquinby at 7:10 AM on May 4, 2014 [10 favorites]
What, you mean making the disk drive almost a complete computer on its own was a bad idea?
No, that's a good idea, if somewhat gold-plated and expensive.
The bad, weird, awful and completely inexplicable idea was linking the two little computers together with a serial line with a lower bit rate than cassette tape.
posted by flabdablet at 7:36 AM on May 4, 2014 [2 favorites]
We keep calling him "Metafilter's Own" Steve Wozniak, but he's only ever written one comment, an answer in Ask the same day he joined, October 4, 2006. As far as we know he's forgotten we exist.
Yeah, it was a little bit tongue-in-cheek. I've always liked that comment so much, though, that I feel as if he has an honorary status. Not only is the comment good on its own merits, but it's such a gracious response to an AskMe question that was pretty embarrassing.
posted by SpacemanStix at 7:44 AM on May 4, 2014 [1 favorite]
I've written here before about the design of the Disk II controller.
I've often said that if Seymour Cray is the Bach of computer design, Woz is its Jimi Hendrix.
That horrible noise the drive always made was a direct result of Wozniak realizing he could save a lot of money on the circuits to align the read-write head simply by over-running the head into the physical limit stop and allowing the metal on nylon fittings physically slip.
Don't know that any mechanical slip was involved, just the fact that the head stepper motor can't generate enough force to push a drive head all the way through the back of the casing :-)
This abuse did occasionally cause things to slip a little, which would end up with a trip to the repair shop for disk head realignment. Done quite a few of those; there were special disks involved with an eccentric alignment track whose signal you'd monitor on an oscilloscope until all the odd peaks were the same height as the even ones.
posted by flabdablet at 7:45 AM on May 4, 2014 [4 favorites]
This just happened to be on my nerdpocalypse of a desk.
That is awesome. I used to have that exact game programming manual, which I think I checked out from the library a bunch of times. It was probably pretty popular, but I don't think I've seen it for about 30 years. I'm buying a used copy on Amazon right now.
posted by SpacemanStix at 7:53 AM on May 4, 2014
Oh hey, and here's an online .pdf of the whole book.
posted by SpacemanStix at 7:54 AM on May 4, 2014 [2 favorites]
*Neither high-res nor Hires root beer.
I remember being very confused as a young lad reading instructions that referred to the "hires" display (can't recall if it was for my C64 or Amiga 1000). "What did root beer have to do with the display?" I wondered. Then it was pointed out that in this context, it was pronounced high-res.
posted by juiceCake at 8:02 AM on May 4, 2014
Heh, I actually thought that there was a Hires graphics company that had supplied the chips.
posted by octothorpe at 8:10 AM on May 4, 2014
flabdablet, the 1540/1541 wasn't supposed to be the dog it was:
Initially, Commodore intended to use a hardware shift register (one component of the 6522 VIA) to maintain relatively brisk drive speeds with the new serial interface. However, a hardware bug with this chip prevented the initial design from working as anticipated, and the ROM code was hastily rewritten to handle the entire operation in software. According to Jim Butterfield, this caused a speed reduction by a factor of five.
(wikipedia, citing this archive of some usenet posts)
posted by jepler at 8:18 AM on May 4, 2014 [1 favorite]
Quite so.
However, there is absolutely no reason beyond a lack of creative programming ability for the thing to have ended up running at under 500 bytes per second. I know from personal experience that it's possible to transfer 256-byte packets over a processor-driven bit-banging serial link using a 1MHz 6502 (as used in Apple II and Commodore 64) at over ten thousand bytes per second.
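To put rough numbers on that, here is a back-of-envelope sketch. It assumes a round 1 MHz clock (real machines run slightly off that figure) and ignores any per-packet overhead; the function name is purely illustrative:

```python
CLOCK_HZ = 1_000_000  # assumed round figure for a "1MHz" 6502


def cycles_per_byte(clock_hz, bytes_per_second):
    """CPU cycles available for each byte moved at a given transfer rate."""
    return clock_hz / bytes_per_second


fast = cycles_per_byte(CLOCK_HZ, 10_000)  # the bit-banged link described above
slow = cycles_per_byte(CLOCK_HZ, 500)     # roughly the stock drive's rate

# Cycles per byte, then cycles per bit (8 data bits per byte).
print(fast, fast / 8)
print(slow, slow / 8)
```

On these assumptions the fast link has on the order of a dozen cycles to spend on each bit, while the slow one is burning a couple of hundred, which is the gap a tighter transfer loop gets to reclaim.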
Various third party fast loaders did eventually become available for the 1541, as I recall.
To be fair, Woz dropped the ball when designing AppleDOS too. His lovely disk controller, capable of transferring data at 256kbits/sec, ended up horribly bottlenecked by a handful of astoundingly inefficient buffer-copying and data encoding/decoding routines. It wasn't until the advent of FDOS, DiversiDOS and other such third-party patches that AppleDOS's performance rose above woeful.
But even those patched versions, and Apple's own later ProDOS, still required at least two disk revolutions to read or write a single track. So I was blown away by Roland Gustafsson's fast disk loader, embedded in various Broderbund games, which could read (and write - all the write code was still there in the game loaders) an entire Disk II track in one revolution, without needing to wait for sector 0 to pass under the head, while increasing the usable track capacity from 4096 bytes to 4608 bytes at the same time.
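The arithmetic behind those figures can be sketched under two assumptions that are not stated above: 256-byte sectors, and a nominal spindle speed of 300 rpm. The helper name is hypothetical, and the two-revolution case is a best case, since the claim is "at least two":

```python
SECTOR_BYTES = 256   # assumed sector size
NOMINAL_RPM = 300    # assumed nominal spindle speed


def track_rate(track_bytes, revolutions, rpm=NOMINAL_RPM):
    """Bytes per second when one track takes the given number of revolutions."""
    # rpm / 60 is revolutions per second; integer products keep the math exact.
    return track_bytes * rpm / (60 * revolutions)


standard_track = 4096
fast_track = 4608

print(standard_track // SECTOR_BYTES, fast_track // SECTOR_BYTES)
print(track_rate(standard_track, revolutions=2))  # patched DOS, best case
print(track_rate(fast_track, revolutions=1))      # whole track in one pass
```

So the extra 512 bytes are worth two more sectors per track, and the one-revolution read more than doubles the best-case rate of the two-revolution approach.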
posted by flabdablet at 8:45 AM on May 4, 2014 [4 favorites]
I miss BASIC. It is literally the only computer language I ever understood. Every other time I've tried to learn one it somehow boils down to "wgaergargaergaregaegigbberish, now YOU write a line of code!" instructions that I don't get. Feh.
posted by jenfullmoon at 8:57 AM on May 4, 2014 [1 favorite]
I miss BASIC. It is literally the only computer language I ever understood.
Not to derail, but have you tried Python?
posted by Mr. Bad Example at 9:07 AM on May 4, 2014 [4 favorites]
WOZ > GATES by a huge margin.
Yeah. Bill never used his mad phreaking skillz to crank call the Pope.
posted by radwolf76 at 9:10 AM on May 4, 2014 [3 favorites]
Meanwhile, back on topic: Both Microsoft's Applesoft BASIC and Woz's Integer BASIC took much the same approach to storing program code: both of them parsed the code line by line on entry and stored a single-byte token to represent each BASIC reserved word, unpacking those again for listings.
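As a minimal sketch of that entry-time tokenizing idea, in Python rather than 6502 code: the token byte values below are invented for illustration and are not the real Integer BASIC or Applesoft token numbers, and the sketch keeps line numbers as plain text, which the real interpreters may not have done.

```python
# Hypothetical token table: byte values are made up for this sketch.
TOKENS = {"PRINT": 0x80, "GOTO": 0x81, "FOR": 0x82,
          "NEXT": 0x83, "IF": 0x84, "THEN": 0x85}
KEYWORDS = {byte: word for word, byte in TOKENS.items()}
# Try longer keywords first so a short one never shadows a longer one.
BY_LENGTH = sorted(TOKENS, key=len, reverse=True)


def tokenize(line):
    """Pack one ASCII program line, replacing reserved words with one byte."""
    out = bytearray()
    i = 0
    in_string = False
    while i < len(line):
        ch = line[i]
        if ch == '"':
            in_string = not in_string
        if not in_string:
            for word in BY_LENGTH:
                if line.startswith(word, i):
                    out.append(TOKENS[word])
                    i += len(word)
                    break
            else:
                out.append(ord(ch))  # ordinary character, stored as-is
                i += 1
        else:
            out.append(ord(ch))      # text inside quotes is never tokenized
            i += 1
    return bytes(out)


def detokenize(data):
    """Unpack a stored line for listing, expanding tokens back into words."""
    return "".join(KEYWORDS.get(b, chr(b)) for b in data)


line = '10 IF A THEN PRINT "GOTO": GOTO 10'
packed = tokenize(line)
print(len(line), len(packed))
print(detokenize(packed))
```

The stored form is shorter than the typed line, and listing it gives back the original text, which is the trade both interpreters were making.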
Unlike Applesoft, Integer BASIC did pretty much all its syntax checking during that entry-time tokenization pass. Once an Integer BASIC program was loaded into RAM, you could POKE things into it to create programs that wouldn't pass the entry-time syntax checking, and surprisingly often the interpreter coped just fine with executing them.
I believe it was Bob Bishop (though memory is foggy at this distance; might have been Bruce Tognazzini) who first used this technique in a published program, allowing him to do clever things with LOMEM: that weren't allowed by the rules. Bishop subsequently built SiMPLE, a programming language for people who miss having one as straightforward and discoverable as Apple II BASIC on their Windows boxes.
Another interesting feature of Integer BASIC is that parts of it were written in Sweet16 virtual machine instructions to save code space.
posted by flabdablet at 9:13 AM on May 4, 2014 [3 favorites]
To be fair, part of the Bill Gates legend is that his BASIC interpreter was written without ever having gotten his hands on the actual computer. I'm certain Woz could do that too, but it doesn't necessarily count against Gates that he didn't design the hardware, if we judge him instead as a software genius with less hardware prowess but more "Jobsian" business acumen and assholery that enabled him to blow up pretty damn high without getting shafted along the way. Woz is definitely a god though, and Bill Gates is looking more and more like a saint these days. Crazy.
posted by aydeejones at 9:28 AM on May 4, 2014
Yeah. Bill never used his mad phreaking skillz to crank call the Pope.
Different men, different approaches. When Woz wants to talk to the Pope he uses One Weird Trick. BillG plays the Long Game: he spends years becoming a big enough deal that the Pope will simply take his calls.
posted by The Tensor at 10:55 AM on May 4, 2014 [11 favorites]
localroger: The thing is the ]['s kludges worked. By contrast, Commodore's post-hoc solution for a disk drive interface is one of the industry's most legendary monuments to $FAIL.
God, the cumulative hours I spent waiting for that disk drive to do things. The Commodore 64 has many strengths, but the ubiquity of software fastloaders for the system is proof that Commodore messed up there. Of course, this is what flabdablet said.
SpacemanStix: Oh hey, and here's an online .pdf of the whole book.
I still have a physical copy of that somewhere I think. I obtained it too late in my youth for it to make much difference to my coding, not to mention that it was made for timeshare systems and its dialect of BASIC was out of date by the era of the Commodore 64. But if you can read one BASIC then mostly you can read others, and it was full of interesting programs, including an early version of the TREK computer game that was ubiquitous in the early days.
jepler: According to Jim Butterfield, this caused a speed reduction by a factor of five.
I think I've mentioned this before, but long ago I did an interview with the (now sadly-departed) Butterfield online in the by-then dusty, empty halls of the Commodore forum on Compuserve. One of my great regrets is losing the file with the transcript of that. I've got to have it *somewhere* on my many C64 disks, but I don't have the means to readily find it now.
jenfullmoon: I miss BASIC. It is literally the only computer language I ever understood. Every other time I've tried to learn one it somehow boils down to "wgaergargaergaregaegigbberish, now YOU write a line of code!" instructions that I don't get. Feh.
I hear ya. Now, it was said by some of the programming big-wigs that BASIC had a way of poisoning minds against "proper" coding practices. That might have been what happened to me, because while I was extremely conversant in BASIC when I was a kid, and even picked up some 6502 assembly, I haven't been able to get anywhere nearly as deep into C or its variants despite knowing, intellectually, how they work. At a deep level I lose track of important things while I'm coding, like the details of pointers, and the accumulated weight of those errors reduces the efficiency of my coding process. As Mr. Bad Example says, you really should try Python, it's like a language designed for us old BASIC users while also being generally modern and awesome.
flabdablet: Unlike Applesoft, Integer BASIC did pretty much all its syntax checking during that entry-time tokenization pass.
Which is just sensible design; do as much of the work as you can during tokenization, which is design-time, instead of at run-time. Microsoft's dialects of BASIC are filled with inefficiencies. In the Microsoft-produced Commodore 64 BASIC, doing math with integer variables is actually slower than using floating point, because the interpreter will convert all values to floating point, do the math, then convert them back!
aydeejones: Woz is definitely a god though, and Bill Gates is looking more and more like a saint these days. Crazy.
He can do whatever he likes with that huge pile of money now that he's gigantically rich. Us people who used computers back in the old days have long memories, and we remember the things he did to accumulate that pile of cash. He didn't kill anyone, sure, but the history of computing to this date is in large part the story of working around Microsoft's dictates, the things Microsoft has dictated to everyone because it makes the most economic sense to the company.
posted by JHarris at 12:21 PM on May 4, 2014 [7 favorites]
flabdablet: Bishop subsequently built SiMPLE, a programming language for people who miss having one as straightforward and discoverable as Apple II BASIC on their Windows boxes.
Wow, that page is beautiful. Looking at it I feel like it's 1994 again, yet the page copyright is 2014. It's like the return of Geocities.
posted by JHarris at 12:25 PM on May 4, 2014 [1 favorite]
Just a quick call out for Jim Butterfield. Really nice guy - missed. He played a big role in my computer career.
posted by parki at 12:54 PM on May 4, 2014 [2 favorites]
BASIC sure had the power to fire the imagination, whatever its shortcomings compared to FORTRAN or ALGOL.
posted by thelonius at 3:46 PM on May 4, 2014
radwolf76: "Yeah. Bill never used his mad phreaking skillz to crank call the Pope."
I can't seem to find it now but didn't Steve have some sort of involvement in a toll free suicide or kids help line in the early days of Apple?
posted by Mitheral at 4:52 PM on May 4, 2014
The Apple II was an awful beautiful hack.
I think there's a level of technique where the beautiful and terrible begin to converge. I had the same double reaction when reading parts of Lions' Commentary. There's an element of genius in knowing exactly where to stab straight through the abstraction that makes everything manageable and into the heart of the problem.
posted by hattifattener at 5:09 PM on May 4, 2014 [3 favorites]
I think there's a level of technique where the beautiful and terrible begin to converge.
Probably the pinnacle of this in the entire history of the computer industry, showing what a person similar to Woz could do with access to custom chip fab, was the Sinclair ZX80. It even did floating point math :-)
posted by localroger at 5:20 PM on May 4, 2014 [1 favorite]
I'm only about 20 pages in, but am currently enjoying Steven Weyhrich’s 2013 book Sophistication & Simplicity: The Life and Times of the Apple II Computer.
posted by blueberry at 5:43 PM on May 4, 2014
I'm absolutely sure that Integer Basic did not ship with HGR et al. There was an empty ROM slot that could be filled with the "Programmer's AID ROM", which had hires routines.
I really hated the hires memory layout as the calculation for going from, say, a Y coordinate to the next screen address was horribly bad and given an already calculated address, the next address wasn't always curraddr + constant offset, which would have been nice because you would then be able to write code like this:
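        LDA #0          ; point the STA at .1 to $2000
        STA .1 + 1
        LDA #$20
        STA .1 + 2
        LDX $xval       ; byte column
        LDY $height     ; number of scanlines
.2      LDA $dataval
.1      STA $2000,X     ; operand is rewritten on every pass
        CLC
        LDA .1 + 1
        ADC #$28        ; next scanline: 40 bytes further on
        STA .1 + 1      ; write the new low byte back (STA leaves carry alone)
        BCC .3
        INC .1 + 2      ; carry into the high byte
.3      DEY
        BNE .2
        RTS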
Which would be code for drawing a data pattern down a vertical strip. Instead you either had to do the basecalc, which was a bunch of shifts and adds and conditions (and shifts were dog-slow on the 6502) or you did what I ended up doing which was burning 384 bytes on two tables of addresses of the first byte of each scanline, which turns out to be several cycles faster per iteration than the code above. And yes, the code above and all of my Apple II bit-blitters were self-modifying because the addressing modes in the 6502 were...inspired, but not so useful. You could use the indirect indexed for blitting (ie, ($aa), Y), but that was 6 cycles instead of 5 for absolute indexed AND you had the freedom of using either X or Y. Even though you had the extra cost of writing over your own code, you only did that twice per scanline so on average a typical bit blitter of a bitmap 5 bytes wide was a win.
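For anyone wondering what those two tables actually hold, here is a rough Python sketch rather than the original 6502. It assumes hi-res page 1 at $2000 and the usual interleaved row layout; the names `hires_row_address`, `ROW_LO` and `ROW_HI` are made up for illustration.

```python
HIRES_PAGE1 = 0x2000  # assumed: hi-res page 1 base address
ROWS = 192


def hires_row_address(y, base=HIRES_PAGE1):
    """Address of the first byte of scanline y (0-191)."""
    if not 0 <= y < ROWS:
        raise ValueError("y must be in 0..191")
    return (base
            + (y & 7) * 0x400          # line within a group of 8
            + ((y >> 3) & 7) * 0x80    # group of 8 within a third of the screen
            + (y >> 6) * 0x28)         # which third of the screen


# Two 192-byte tables (384 bytes in total): low and high address bytes.
ROW_LO = bytes(hires_row_address(y) & 0xFF for y in range(ROWS))
ROW_HI = bytes(hires_row_address(y) >> 8 for y in range(ROWS))
```

Row 1 lands $400 after row 0, but row 8 lands at $2080 and row 64 at $2028, which is why "current address plus a constant" never works for long.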
To put things in a really bizarre perspective, when I was in college, I remember a professor talking with reverence about Alan Turing's coding style which included a fair amount of (necessary) self-modifying code in machine language and my thought was, "what's the big deal? I've been doing that since I was 14 - that's how you write code when you have to count cycles." I also feel like if I ever had the chance to speak with Alan Turing it'd probably sound like the 4 Yorkshiremen sketch.
"Oh you had a barrel shifter in your ALU? Luxury. If we wanted to shift, we had to build our own hardware with our bare hands." "Well, it was *like* a barrel shifter, but it could only do one bit at a time. Me dad would make me get up every morning and write t' shift tables by hand so I could get one multiplication done b'for dinner."
Have you any idea what it's like to go from a machine with 56 instructions (and let's be frank, a good fifth of them were aliases for the same microcode to either jam something into the P register or branch based on what was in it) to VAX when I was in college? Holy shit! I didn't know what to do with all the registers - no actually, I did. In assembly, we were assigned the task of writing a program to create and manipulate a family tree. I wrote it in Pascal first, debugged it, then hand-compiled it and used register passing for the parameters to functions because (1) I had more registers than any function needed for params and return values and (2) I had no recursion and (3) my 256 byte stack roots made me treat stack space as precious and I refused to gum up the stack with data that was just going in a register anyway...
I do drone on, don't I?
posted by plinth at 6:31 PM on May 4, 2014 [11 favorites]
Steve Wozniak is a gentleman and a scholar.
posted by homunculus at 6:42 PM on May 4, 2014 [5 favorites]
and shifts were dog-slow on the 6502
That's a little harsh. Shifting the accumulator took two clocks and one byte. It's really only the read/modify/write in-memory shifts that are slow (five clocks for zero page, six for absolute) and only three out of the ten shifts in the vertical position calculation of HPOSN were in-memory (see page 81). The trouble was that calculating a hi-res base address just needed an awful lot of fartarsing about; lookup tables were definitely the Right Thing once 48K RAM became the normal amount to find in an Apple II.
posted by flabdablet at 9:52 PM on May 4, 2014 [3 favorites]
flabdablet - you're right that two clocks is not bad, except when you have to do more than 1. :) 24 clocks for just the shifting per scanline is too much. Way too much.
Also think about shape objects on the apple II - if you didn't pre-shift them, your worst case was 6 right shifts and an or PER BYTE. 6 right shifts should be the same cost as 1 right shift, but nope. It makes sense to me that the engineer that designed Space Invaders made a hardware barrel shifter in the IO space of the machine (side note, if you think you are bad off for not having shoes with the 6502, try not having legs with the 8080).
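Pre-shifting just means paying for those shifts once, up front. A minimal Python sketch of the idea, assuming a monochrome shape row, 7 pixels per byte with the leftmost pixel in bit 0, and the palette bit left clear; `pack_row` and `preshift_row` are hypothetical helper names, not anything from a real toolkit.

```python
PIXELS_PER_BYTE = 7  # hi-res bytes carry 7 pixels; bit 7 is the palette bit


def pack_row(pixels):
    """Pack a list of 0/1 pixels into hi-res bytes, leftmost pixel in bit 0."""
    padded = list(pixels)
    while len(padded) % PIXELS_PER_BYTE:
        padded.append(0)
    out = []
    for i in range(0, len(padded), PIXELS_PER_BYTE):
        group = padded[i:i + PIXELS_PER_BYTE]
        out.append(sum(bit << n for n, bit in enumerate(group)))
    return out


def preshift_row(pixels):
    """Return the 7 copies of a row, moved 0..6 pixels to the right."""
    return [pack_row([0] * shift + list(pixels))
            for shift in range(PIXELS_PER_BYTE)]
```

The cost is memory: every shape is stored seven times, and each shifted copy can be one byte wider than the original.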
posted by plinth at 3:48 AM on May 5, 2014 [2 favorites]
A Woz thread always reminds me of opening a VHS to VHS-C adapter and seeing how it works.
Most of the damn thing is a kluge but it serves its purpose (until it doesn't, one of the parts that's supposed to guide another part wears out [makes me want to buy a 3D printer]). I'm just tryin to preserve things.
Aaaaaaaaaaalllll theeeeeeeeeses meemorries. (Is that a song?)
posted by coolxcool=rad at 10:12 AM on May 5, 2014
if you didn't pre-shift them
you were doing it wrong! :)
The fact that each display byte resolved to 7 pixels instead of 8 was a massive pain in the arse as well, because it meant that to turn a horizontal coordinate into a byte offset and mask you couldn't just generate the mask from the bottom 3 bits and the offset from the top 6; you had to do an actual divide by 7 in some fashion. Slow, slow, slow in code, so more lookup tables there. This is presumably why quite a few games limited the active play area to a region 256 pixels wide, with some kind of decorative filler occupying the edge(s).
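The lookup tables in question are tiny. A quick Python sketch, assuming a 280-pixel row and the leftmost pixel of each byte in bit 0; `X_TO_BYTE`, `X_TO_MASK` and `locate` are names invented here.

```python
WIDTH = 280          # hi-res pixels per row
PIXELS_PER_BYTE = 7

# One entry per x coordinate: byte offset within the row, and bit mask.
X_TO_BYTE = bytes(x // PIXELS_PER_BYTE for x in range(WIDTH))
X_TO_MASK = bytes(1 << (x % PIXELS_PER_BYTE) for x in range(WIDTH))


def locate(x):
    """Return (byte offset, bit mask) for horizontal coordinate x."""
    return X_TO_BYTE[x], X_TO_MASK[x]
```

Since x runs past 255, a single 8-bit index register cannot reach the whole table, which fits with games keeping the play area 256 pixels wide.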
posted by flabdablet at 11:02 AM on May 5, 2014 [1 favorite]
There was a PAL colour knockoff of the Apple II that was even worse; I don't think Franklin made it, but they might have. It used nine pixels per byte because PAL's chroma clock is at 4.43MHz compared to NTSC's 3.58MHz, and the colour encoding was different on odd and even rows because PAL reverses the meaning of the chroma carrier's phase every second line. Total horizontal resolution was 360 pixels. It still used 40 bytes per row and it had some weird-arse page flipping scheme to get access to the RAM supplying the ninth and tenth (green/violet vs orange/blue) bits. Nice homage to the master, but so not compatible.
The Apple II itself came in a Euro edition that used a different master crystal (14.250MHz instead of 14.318) and some small alterations in the video divider chain to get the PAL frame and line timing right, but it didn't have native PAL color onboard. You had to use a plug-in PAL card, which actually decoded the Apple's 3.5MHz pseudo-NTSC video stream and re-encoded it in PAL. There was still a strong 3.5MHz component in the output and it would beat with the 4.43MHz PAL chroma carrier, so you ended up with this weird mixture of kinda-sorta visible vertical striping at 3.5MHz with drifty fringy PAL color overlaid on it. Fairly horrible.
I worked for a firm that made both PAL and RGB cards for Apple II, and the colour out of our RGB card was actually really nice. The card had a 4-bit shift register and a 2-bit counter both clocked at 14MHz, feeding a high speed PROM to do color decoding, and its output looked very clean indeed.
The initial version of the decoder PROM made the card emit the same pixels as a monochrome display would have done, only coloured. This looked really nice and sharp and you could make one-pixel-wide vertical lines with it, but it meant that colored areas got the same stripey look they would have done on a mono display - not ideal for gaming. We finally settled on one that enforced a minimum two-pixel width for any colored dot, and that looked about as good as it was possible for a color Apple II display to do.
posted by flabdablet at 11:33 AM on May 5, 2014 [4 favorites]
101 Basic Computer Games[pdf], edited by David H. Ahl, seems to have been influential to a bunch of us. I got a beat up copy of the original (not the later microcomputer edition) and used it to type every game into whatever BASIC-running computer (DEC, etc.) I could wangle access to. The lousy typography just adds to its charm. Thanks, David H. Ahl!
posted by Hello Dad, I'm in Jail at 11:38 PM on May 5, 2014 [1 favorite]
Decorative borders are for squids!
No, I treated the play area as 140 pixels wide since it was easier to colorize areas of a bitmap on the fly by making it white with the high bit set and masking it down to blue, orange, green, purple or that other white.
posted by plinth at 3:44 AM on May 6, 2014
I decided to waste an hour at work this morning, so you're going to get an explanation of how to get the coin drop sound from Defender out of an Apple II. The Apple II has a speaker that has 1 bit output. You touch an IO address and the speaker toggles from fully in to fully out (IIRC). The typical way to make sound was to write a loop to pop the speaker then wait for a while and then repeat. Changing the length of the wait would change the resulting frequency of the output. This would make square waves, more or less.
There's a problem with this kind of code, however. If you loop 100 times (say), high frequency notes are way shorter in length of time played than low frequency notes. This is because the wait on high frequency notes is much shorter than low frequency notes and multiply that by 100 and it magnifies the difference.
The real solution is to eliminate the delay as much as possible. Instead, we're going to run a loop and pop the speaker every nth time through the loop. When you do this, you realize that there is actually a whole lot of extra CPU time to burn. So much that you can, in fact, embed a pop every mth time into the overall loop and now you have two pitches being played (more or less) simultaneously via this interleaving.
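The interleaving is easier to see in a toy model than in assembly. This Python sketch only counts loop passes and assumes every pass takes the same time, which the real routine has to work at; `two_voice_toggles` is a made-up name.

```python
def two_voice_toggles(period1, period2, iterations):
    """Return the loop passes on which each voice pops the speaker."""
    count1, count2 = period1, period2
    voice1, voice2 = [], []
    for step in range(iterations):
        count1 -= 1
        if count1 == 0:          # every period1-th pass: voice 1 pops
            voice1.append(step)
            count1 = period1
        count2 -= 1
        if count2 == 0:          # every period2-th pass: voice 2 pops
            voice2.append(step)
            count2 = period2
    return voice1, voice2
```

Because the number of passes is fixed, the note lasts the same time whatever the two periods are; only the number of pops changes.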
But wait, there's MORE. Remember how I said that the speaker is a toggle from all out to all in? If you toggle it really fast--as fast as the CPU can--the speaker will make a very quiet high pitched whine. This is because when you go at that speed, you're yanking the speaker back to the original position before it has had a chance to fully reach it. So if you're careful about timing, you can now control the volume of your output. And you can control the volume of each of the two voices independently.
So here's a sound routine that I (re)wrote this morning. There is a routine, SOUND, which has a couple of memory addresses set aside for the pitches and volumes of each voice and a memory address for the duration of the sound being played.
Inside the routine, while it's running, the X register gets used for the inverse frequency of voice 1, the Y register gets used for the inverse frequency of voice 2 and for the volume controls of each. Register A is scratch.
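        .OR $300
SOUND
        LDA VOL1        ; voice 1 volume, masked to 0-7
        AND #$07
        TAY
        LDA VOLCOMP,Y   ; A = 8 - volume
        INY             ; Y = volume + 1
        STY MODVOL1+1   ; patch the operand of the first LDY #
        STA MODCOM1+1   ; patch the operand of the second LDY #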
In here I'm setting up the volume control for voice 1 (for voice 2, the code is nearly identical). I am self-modifying the sound production code by injecting the desired input volume + 1 and its complement into the voice 1 code. VOLCOMP is a lookup table for the complement of the volume. More on this later.
        LDA VOL2        ; same again for voice 2
        AND #$07
        TAY
        LDA VOLCOMP,Y
        INY
        STY MODVOL2+1
        STA MODCOM2+1
        LDX PIT1        ; X counts down voice 1's period
        LDY PIT2        ; Y counts down voice 2's period
VOICE1
        DEX
        BNE VOICE2      ; not time to click yet
        LDX PIT1        ; reload the period
Here is the code to manage clicking the speaker. It does this by saving Y in A, popping the speaker (address $C030), running a very short loop whose count is anywhere from 1 to 8 inclusive, then popping the speaker again and running an identical loop with the complementary count, from 8 down to 1, and finally restoring Y from A.
        TYA             ; save Y in A
        STA $C030       ; first speaker toggle
MODVOL1
        LDY #$00        ; operand patched to volume + 1
.1      DEY
        BNE .1
        STA $C030       ; second speaker toggle
MODCOM1
        LDY #$00        ; operand patched to 8 - volume
.1      DEY
        BNE .1
        TAY             ; restore Y
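To put numbers on the volume trick, here is a small Python model of the click code above. It uses the standard 6502 cycle counts and assumes none of the branches cross a page boundary; `delay_loop_cycles` and `click_timing` are names invented for this sketch.

```python
# Standard 6502 timings, assuming no page-crossing branches.
LDY_IMM, DEY, BNE_TAKEN, BNE_FALL, STA_ABS = 2, 2, 3, 2, 4


def delay_loop_cycles(n):
    """Cycles for LDY #n followed by n passes of DEY / BNE (n >= 1)."""
    return LDY_IMM + n * DEY + (n - 1) * BNE_TAKEN + BNE_FALL


def click_timing(volume):
    """Return (cycles between the two toggles, cycles spent in both delay loops)."""
    first = delay_loop_cycles(volume + 1)   # MODVOL1 operand
    second = delay_loop_cycles(8 - volume)  # MODCOM1 operand, from VOLCOMP
    return first + STA_ABS, first + second
```

The gap between the two toggles grows from 10 cycles at volume 0 to 45 at volume 7, while the two loops together always cost 47 cycles, so changing the volume does not change the pitch.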
Now comes the code for voice 2. It is identical in all respects to voice 1 except that it's Y register based for the outside loop.
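VOICE2
        DEY
        BNE DURUPDATE   ; not time to click yet
        LDY PIT2        ; reload the period, mirroring LDX PIT1 in voice 1
        TYA             ; save Y in A
        STA $C030       ; first speaker toggle
MODVOL2
        LDY #$00        ; operand patched to volume + 1
.1      DEY
        BNE .1
        STA $C030       ; second speaker toggle
MODCOM2
        LDY #$00        ; operand patched to 8 - volume
.1      DEY
        BNE .1
        TAY             ; restore Y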
This is where we update the duration - just decrement until both are 0. Except that we still have CPU time to burn here (more later), so I run a loop on X with a NOP to kill time.
DURUPDATE
        TXA             ; save X in A
        LDX #$04
.1      NOP             ; burn spare cycles
        DEX
        BNE .1
        TAX             ; restore X
        DEC DURLO       ; two-byte duration countdown
        BNE VOICE1
        DEC DURHI
        BNE VOICE1
        RTS
VOLCOMP
        .HS 0807060504030201
PIT1    .BS 01
VOL1    .BS 01
PIT2    .BS 01
VOL2    .BS 01
DURLO   .BS 01
DURHI   .BS 01
Without going into too much detail, I'm very happy about that final delay loop there. There's just enough time left over to replace it with code that does something else. I could scan the keyboard and make something happen. Or I could write similar loops to the main voice loops that dynamically change the volumes of the voices allowing me to have attack/sustain/decay on each voice. At this point, I've created in software 2/3 of a General Instrument AY-3-8910 chip, used in most of the video games of the 80's. In fact, if you look at the data sheet for the chip, it's pretty much 6 count down timers and a linear feedback shift register (for noise) running off the input clock.
Now, onto the Defender coin drop sound. That's two voices, at nearly the same frequency so they have a nice ominous phase-shifter sound.
To do that, we just need to plug in $FF and $FE into the pitch values and let her rip:
        LDA #$7         ; volume 7 on both voices
        STA VOL1
        STA VOL2
        LDX #$00
        STX DURLO       ; 0 wraps around, giving the longest duration
        STX DURHI
        DEX             ; X = $FF
        STX PIT1
        DEX             ; X = $FE
        STX PIT2
        JMP SOUND
And that's a total of 142 bytes of code and data. Here's the output.
posted by plinth at 9:16 AM on May 6, 2014 [11 favorites]
So if you're careful about timing, you can now control the volume of your output. And you can control the volume of each of the two voices independently.
If you're really careful, you can do interesting things with pulse width modulation of a supersonic carrier to get the equivalent of a multi-bit DAC.
I once wrote a working DTMF phone dialling tone generator for Apple II speaker, to help us manage long distance call costs in a share house.
We'd enter a user ID, a password and a phone number or speed dial code, press one cup from an old pair of around-the-ear headphones over the phone mouthpiece, and the Apple would log the user ID and the number, then dial it with a prefix to charge the call to a shared prepaid card.
As an actual appliance it was a total flop - people just ended up buying individual prepaid cards - but I was still pleased with the DTMF generator, which emitted two overlaid sine waves from that 1-bit speaker port with enough fidelity to dial completely reliably. I must dig that code out again and put it online somewhere.
I remember that it was based on pulse width modulating a 15625Hz square wave: I picked the same frequency as my PAL Apple II's horizontal video scan rate so I could use the video monitor to debug it by toggling the page 1 / page 2 switch instead of the speaker port.
15625Hz comes out to 65 processor clocks per cycle, and the PWM code was capable of generating any split of that from 4:61 to 58:7 in one-clock increments. That's 55 different output levels, almost six bits of simulated DAC resolution. Most of the inter-transition delay time was used to do sine wave lookup table access and 8 bit arithmetic for calculating the output level for the next sample, plus some time-invariant 16 bit arithmetic for duration control; if I recall correctly there was so little time left over from those tasks as to make explicit loops for delay padding unnecessary.
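The arithmetic in that paragraph is easy to check. A quick Python sketch using only the figures quoted above; the variable names are made up, and the implied CPU clock is simply the carrier frequency times 65.

```python
import math

CLOCKS_PER_CYCLE = 65          # processor clocks per 15625Hz carrier cycle
SHORTEST_HIGH, LONGEST_HIGH = 4, 58

# Every split of the 65 clocks the PWM code could produce, as (first, second).
splits = [(high, CLOCKS_PER_CYCLE - high)
          for high in range(SHORTEST_HIGH, LONGEST_HIGH + 1)]

levels = len(splits)                     # distinct output levels
bits = math.log2(levels)                 # equivalent DAC resolution
cpu_clock_hz = 15625 * CLOCKS_PER_CYCLE  # implied processor clock
```

That gives 55 levels, a little under 5.8 bits, and an implied clock of 1,015,625 Hz.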
About five years later I was very pleased to find out that Michael Mahon had independently used a similar design idea in order to make DAC522 and AppleSynth, got in touch with him, and contributed some low-level improvements to his NadaNet project, which is how I found out that an Apple II can transmit or receive serial data using software bit banging at a peak rate of 8 CPU clocks per bit (125 kbits/sec on a 1MHz Apple II). If anybody can improve on that, please let both me and Michael know!
My stuff starts on page 60 of this NadaNet assembly listing PDF.
posted by flabdablet at 12:35 PM on May 6, 2014 [5 favorites]
If you're really careful, you can do interesting things with pulse width modulation of a supersonic carrier to get the equivalent of a multi-bit DAC.
I once wrote a working DTMF phone dialling tone generator for the Apple II speaker, to help us manage long distance call costs in a share house.
We'd enter a user ID, a password and a phone number or speed dial code, press one cup from an old pair of around-the-ear headphones over the phone mouthpiece, and the Apple would log the user ID and the number, then dial it with a prefix to charge the call to a shared prepaid card.
As an actual appliance it was a total flop - people just ended up buying individual prepaid cards - but I was still pleased with the DTMF generator, which emitted two overlaid sine waves from that 1-bit speaker port with enough fidelity to dial completely reliably. I must dig that code out again and put it online somewhere.
I remember that it was based on pulse width modulating a 15625Hz square wave: I picked the same frequency as my PAL Apple II's horizontal video scan rate so I could use the video monitor to debug it by toggling the page 1 / page 2 switch instead of the speaker port.
15625Hz comes out to 65 processor clocks per cycle, and the PWM code was capable of generating any split of that from 4:61 to 58:7 in one-clock increments. That's 55 different output levels, almost six bits of simulated DAC resolution. Most of the inter-transition delay time was used to do sine wave lookup table access and 8 bit arithmetic for calculating the output level for the next sample, plus some time-invariant 16 bit arithmetic for duration control; if I recall correctly there was so little time left over from those tasks as to make explicit loops for delay padding unnecessary.
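To make the numbers above concrete, here is a rough Python sketch of turning a DTMF digit into per-sample pulse splits. It is not the lost 6502 code: the function names are made up, the frequency table is the standard DTMF grid, and the linear mapping from the summed sines onto the 4..58 clock range is an assumption about how the levels might have been scaled.

```python
import math

CLOCKS_PER_SAMPLE = 65      # CPU clocks per PWM period, as described in the post
SAMPLE_RATE = 15625         # Hz, the PAL horizontal scan rate
MIN_HIGH, MAX_HIGH = 4, 58  # high time in clocks, i.e. splits 4:61 .. 58:7

# Standard DTMF (row, column) frequencies in Hz.
DTMF = {
    "1": (697, 1209), "2": (697, 1336), "3": (697, 1477),
    "4": (770, 1209), "5": (770, 1336), "6": (770, 1477),
    "7": (852, 1209), "8": (852, 1336), "9": (852, 1477),
    "*": (941, 1209), "0": (941, 1336), "#": (941, 1477),
}


def pwm_high_times(digit, n_samples):
    """Return the speaker high time, in clocks, for each 65-clock sample slot."""
    row_hz, col_hz = DTMF[digit]
    steps = MAX_HIGH - MIN_HIGH  # 54 steps, so 55 distinct levels
    out = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        # The sum of two unit sines lies in [-2, 2]; rescale it to [0, 1].
        s = math.sin(2 * math.pi * row_hz * t) + math.sin(2 * math.pi * col_hz * t)
        out.append(MIN_HIGH + round((s + 2) / 4 * steps))
    return out


def pulse_splits(digit, n_samples):
    """Return (high, low) clock counts per sample; each pair sums to 65."""
    return [(h, CLOCKS_PER_SAMPLE - h) for h in pwm_high_times(digit, n_samples)]
```

Calling `pulse_splits("5", 8)` gives the first eight high:low splits for the digit 5. The level count is `MAX_HIGH - MIN_HIGH + 1 = 55`, and `math.log2(55)` is a little under 6, which is where "almost six bits" comes from.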
About five years later I was very pleased to find out that Michael Mahon had independently used a similar design idea in order to make DAC522 and AppleSynth, got in touch with him, and contributed some low-level improvements to his NadaNet project, which is how I found out that an Apple II can transmit or receive serial data using software bit banging at a peak rate of 8 CPU clocks per bit (125 kbits/sec on a 1MHz Apple II). If anybody can improve on that, please let both me and Michael know!
My stuff starts on page 60 of this NadaNet assembly listing PDF.
posted by flabdablet at 12:35 PM on May 6, 2014 [5 favorites]
I've just remembered the reasons I actually picked 65 system clocks as the period length for my PWM DAC, as opposed to the post-hoc rationalization about video debugging which I didn't think of until the project was well underway.
First, there's audible squeal masking. Even at 52 years old I can still hear 15625Hz squeal, so I've not been sad to see the end of the CRT television era. Twenty years ago it used to bother me a lot. It seemed to me at the time that if I locked the repetition rate of my PWM to the Apple II video horizontal scan rate, then the existing horizontal scanning squeal from the display would disguise any residual PWM squeal from the speaker and, more importantly, wouldn't beat with it to create spurious audible tones.
Second, the Apple II actually stretches every 65th system clock to make it equal to 8 pixel times rather than its usual 7, making a complete horizontal scan line take 65 × 7 + 1 = 456 pixel periods rather than 455, meaning that the fake NTSC chroma subcarrier generated by alternating display pixels will have the same phase on successive scan lines while keeping both the frequency of that subcarrier and the horizontal line rate workably close to the NTSC spec.
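The phase argument can be checked with a few lines of arithmetic. This sketch assumes the nominal NTSC Apple II master oscillator of 14.31818 MHz, a pixel clock of half that, and a chroma subcarrier of half the pixel clock (one subcarrier cycle per pair of alternating pixels); those figures are not stated in the comment and the names are illustrative.

```python
from fractions import Fraction

MASTER_HZ = Fraction(14_318_180)  # assumed nominal NTSC master oscillator
PIXEL_HZ = MASTER_HZ / 2          # pixel clock
SUBCARRIER_HZ = PIXEL_HZ / 2      # one subcarrier cycle per two pixels


def line_stats(pixels_per_line):
    """Return (line rate in Hz, subcarrier cycles per scan line)."""
    line_hz = PIXEL_HZ / pixels_per_line
    return line_hz, SUBCARRIER_HZ / line_hz


stretched = line_stats(65 * 7 + 1)  # 456 pixel periods, with the stretched clock
unstretched = line_stats(65 * 7)    # 455 pixel periods, without it
```

With 456 pixel periods the subcarrier completes a whole number of cycles per line, so its phase repeats on every line. With 455 it would complete a whole number plus a half, flipping phase on alternate lines.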
I knew about this because my former employer had made an 80 column text display add-on card that had had to find and lock onto the stretched clocks, putting them into the horizontal blank period in order to avoid a spurious vertical stripe appearing onscreen, and it seemed to me that running a PWM at some frequency unrelated to the horizontal scan rate would add a periodic component to the DAC's quantization noise that would probably cause an audible beat tone.
Unfortunately I can't seem to put my hands on the DTMF generator source code, which leaves me no option but to rewrite and improve it. I think I've just figured out a way to extend the PWM's timing range from 4:61 .. 58:7 all the way to the theoretical maximum 4:61 .. 61:4 (it takes a four-clock instruction to toggle an Apple II speaker).
Going out to code, now. I may be some time.
posted by flabdablet at 7:03 AM on May 7, 2014 [1 favorite]
This has suddenly become an extremely interesting thread! More, please?
posted by JHarris at 5:23 PM on May 7, 2014
OK, if you insist :)
Best way I know to drag speed out of little CPUs is to keep as much program state as possible in the program counter.
The basic structure of the PWM DAC I'm currently redesigning is fifty-eight individual speaker-drive routines, each one taking exactly 65 CPU cycles to run and devoted to emitting a single rectangular pulse whose timing creates the required average DAC output level for that 65-cycle period.
On a PAL-compatible Apple IIe, which is the one I have available for testing, 65 CPU cycles translates to 1/15625 of a second as previously described. On an original NTSC-compatible machine it would be about 1/15700 instead. That's probably close enough to the same speed to make no practical difference for a DTMF application, but if it isn't, the only things that should need to change are audio sample tables.
Each speaker-drive routine has to do all of the following:
* Turn the speaker output on a fixed number of CPU cycles from the beginning of the current 65-cycle slot; the exact number will depend on which of the 58 drive routines this is
* Turn it off again after another fixed number of CPU cycles
* Look up the next two audio sample levels
* Add those to calculate the final output level required for the next 65-cycle sample slot
* Arrange for control to transfer to the drive routine for that level when this one has completed its own 65 cycles
* Quit when enough samples have been processed to emit a tone of the required length
all of which turns out to be quite a lot to fit into 65 cycles. It helps that the Apple II speaker bit is very easy to toggle: a single write of don't-care data to a particular memory address is all that's required, and 6502 memory-write instructions don't change the processor status flags or register contents, so with a bit of juggling and shuffling the speaker toggling instructions can just get inserted into any of the other jobs without undue stress. The difficulty is in getting the timing exactly right.
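The control flow in that list can be modelled in Python, with one function per output level standing in for each cycle-counted routine. This is only a structural sketch: the table contents, the `state` dictionary and all names are hypothetical, and none of the real timing is modelled. What it shows is the "state in the program counter" idea: the current output level is never stored in a variable, it is implied by which routine is running.

```python
CLOCKS = 65
MIN_HIGH, MAX_HIGH = 4, 61  # the 58 levels of the redesigned DAC


def make_routine(high, table_a, table_b, routines):
    """Build the drive routine that emits one pulse with `high` clocks on."""
    def routine(state):
        # Emit this routine's fixed pulse for the current 65-clock slot.
        state["pulses"].append((high, CLOCKS - high))
        state["remaining"] -= 1
        if state["remaining"] == 0:
            return None  # tone finished
        # Look up the next two sample levels and add them.
        state["index"] += 1
        i = state["index"]
        level = table_a[i % len(table_a)] + table_b[i % len(table_b)]
        # Hand control to the routine for that level.
        return routines[level]
    return routine


def build(table_a, table_b):
    """Create all 58 routines, keyed by their high time in clocks."""
    routines = {}
    for high in range(MIN_HIGH, MAX_HIGH + 1):
        routines[high] = make_routine(high, table_a, table_b, routines)
    return routines


def play(table_a, table_b, n_samples):
    """Run the routine chain for n_samples slots (n_samples >= 1)."""
    routines = build(table_a, table_b)
    state = {"index": 0, "remaining": n_samples, "pulses": []}
    current = routines[table_a[0] + table_b[0]]
    while current is not None:
        current = current(state)
    return state["pulses"]
```

The two tables are assumed to hold per-tone levels whose sums always land in 4..61. Giving them different lengths models two tones with different periods.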
The STA $C030 instruction usually used to toggle the speaker requires exactly four cycles: one to fetch the STA instruction, another to fetch the $30 byte of the address while decoding the STA, a third to fetch the $C0 address byte, and a fourth to do the actual memory write to address $C030. This puts a hard limit on the speaker toggle timing: it cannot be done any faster than using a pair of inline STA $C030 instructions to flip it back and forth four cycles apart. The 6502 doesn't have the kind of multiple-byte block moves you might be able to abuse on an architecture like x86 or 68K to get the job done faster than that. This is why the pulse ratio range available from this PWM can't possibly be made wider than 4:61 to 61:4.
Minimum instruction execution time on the 6502 is two cycles, not one, which means you can't just stick a 1-cycle padding instruction anywhere you might happen to want one. That makes achieving the 5:60 and 60:5 PWM ratios require a trick that I'll come back and explain in a bit.
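A quick way to see why exactly those two ratios are the awkward ones: count which gaps between two speaker toggles can be built from a 4-cycle STA plus filler instructions that each take 2 or 3 cycles. The sketch below is illustrative arithmetic only. It assumes the filler is made of plain 2- and 3-cycle instructions, and it ignores the fact that in the real routines the filler is useful work.

```python
TOGGLE_CYCLES = 4  # STA $C030
PERIOD = 65


def reachable_pads(max_pad, instr_cycles=(2, 3)):
    """Return the total filler lengths buildable from 2- and 3-cycle instructions."""
    reachable = {0}
    for total in range(1, max_pad + 1):
        if any(total - c in reachable for c in instr_cycles):
            reachable.add(total)
    return reachable


def plain_gaps():
    """Gaps between two toggles using only ordinary instruction timing."""
    pads = reachable_pads(PERIOD - 2 * TOGGLE_CYCLES)
    return {TOGGLE_CYCLES + p for p in pads}


def awkward_high_times():
    """High times whose on gap or off gap cannot be built without a trick."""
    gaps = plain_gaps()
    return [h for h in range(4, 62) if h not in gaps or (PERIOD - h) not in gaps]
```

Every gap from 4 to 61 is reachable except 5, since there is no 1-cycle filler, so `awkward_high_times()` comes out as the high times 5 and 60, matching the 5:60 and 60:5 ratios mentioned above.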
posted by flabdablet at 10:26 PM on May 7, 2014 [1 favorite]
Yes, but you can get a 1 cycle pad by using something like
LDY #$31
...
STA $BFFF,Y
This is because while STA abs,Y takes 4 cycles, it will take an extra cycle if you have to cross a page boundary (i.e., when you go past a 256 byte boundary). If we can force that by not storing into $C030, but into the previous page with an index in Y (or X) to hit $C030, we get the pad cycle.
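The address arithmetic behind that trick is easy to check. A small sketch with a hypothetical helper name: it computes where an indexed store finally lands, and the intermediate address formed when only the low byte has had the index added, which is what goes on the bus before the high byte is fixed up.

```python
def indexed_store_addresses(base, index):
    """Return (effective address, low-byte-only address, page crossed?)."""
    effective = (base + index) & 0xFFFF
    # Only the low byte has been summed; the high byte is still the base's.
    partial = (base & 0xFF00) | (effective & 0x00FF)
    return effective, partial, partial != effective
```

For `indexed_store_addresses(0xBFFF, 0x31)` the effective address is $C030 and the intermediate one is $BF30, one page lower, so the speaker address is only touched once.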
The Aristocrats!
Honestly, when I was writing code for the Apple II, I was in my early-mid teens so a lot of the hackery was by chance experimentation rather than by method, although I did do a lot of reading of Byte and Kilobaud magazines to see what other people were doing. There was a great article on generating pink noise using a 9 bit linear feedback shift register. I played with that a lot.
I also disassembled a lot of other people's code to see how it worked as well. So yeah, I was looking at code by Nasir Gebelli, Bob Bishop, Bill Budge, Jun Wada, and Larry Miller (among others).
I wrote mostly games and utilities for writing games. For example, I wrote a tool that let me draw bitmaps pixel by pixel that also had full screen scrolling and when I had a set done, I could sweep out bounds and it would collect all the data and turn it into linear packed data that I could reference from code. That was written in a combination of BASIC and assembly.
At one point, Nasir had written a game called Horizon V which had a mode where if you paused the game, it put up one of the creatures from the game and if you ran music in through the cassette port, it would echo it through the speaker and make the creature dance. That, combined with the death in Dung Beetles, led me to embark on something I called "Project Van Halen", which was a simple way to get myself thrown out of the high school library. I mean, honestly, there are way easier ways to get thrown out, but this seemed like the one that came closest to my natural lawful good alignment.
I wrote some code to read the cassette port and record transitions with (more or less) 4 bit resolution and run-length compress them into memory. The code lived in low memory and, when run, would digitize into as much RAM as I had, then use RWTS to write it out to disk. Then I wrote the opposite code - boot, load RWTS and read a file, fill memory, then play it, read the next file, etc. I digitized "Eruption" and played that on the library Apple II. Time to ejection: 45 seconds.
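For anyone curious what "4 bit resolution, run-length compressed" might look like, here is a guess in Python. The format is an assumption, not the original: each interval between input transitions is clamped to 1..15 ticks and stored as one nibble, two nibbles per byte, with a zero nibble as padding and end marker.

```python
MAX_RUN = 15  # largest interval that fits in one nibble


def encode(intervals):
    """Pack transition intervals (in ticks) into bytes, two nibbles per byte."""
    nibbles = [max(1, min(MAX_RUN, n)) for n in intervals]  # lossy clamp
    if len(nibbles) % 2:
        nibbles.append(0)  # pad with the end marker
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def decode(data):
    """Unpack bytes back into a list of intervals, stopping at a zero nibble."""
    out = []
    for byte in data:
        for nib in (byte >> 4, byte & 0x0F):
            if nib == 0:
                return out
            out.append(nib)
    return out


def to_waveform(intervals, level=0):
    """Expand intervals into a 1-bit waveform, toggling after each interval."""
    wave = []
    for n in intervals:
        wave.extend([level] * n)
        level ^= 1
    return wave
```

Playback would then walk the decoded intervals and toggle the speaker after each one. Anything longer than 15 ticks gets shortened by the clamp, which is one plausible reading of "more or less".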
posted by plinth at 7:31 AM on May 8, 2014 [1 favorite]
Anybody who has tried to make an Apple ][+ or //e speaker do anything interesting using either Integer or Applesoft BASIC will know that
10 SP = -16336
20 X = PEEK(SP)
makes a healthy CLICK, while
10 SP = -16336
20 POKE SP, 0
just doesn't.
The resulting lore (you must only ever read from the speaker address to make it click, because writing doesn't work) actually has nothing to do with the design of the speaker interface hardware and everything to do with the design of the 6502.
The machine instruction that ends up carrying out the actual work of a PEEK, on both versions of BASIC, is an indirect indexed Load Accumulator:
LDA (PTR),Y
And for POKE, it's an indirect indexed Store Accumulator:
STA (PTR),Y
The logic of the (PTR),Y addressing mode on a 6502 goes like this: PTR is a Page 0 address, referring to one of the bottom 256 memory locations and encoded into the associated instruction as a single byte following the opcode. The processor fetches that, reads the contents of memory locations PTR and PTR+1, concatenates the results to form a 16 bit number, then adds the contents of the 8-bit Y register to that to form the address of the memory byte the instruction is to operate on.
A fairly typical idiom, and one used by both the Apple BASIC interpreters, is to make sure the Y register is set to zero before doing this, which lets the (PTR),Y mode be used for simple indirect addressing.
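As a sanity check on the addressing logic, here is a small Python model. The helper names and the flat 64K `bytearray` are illustrative only. It also shows why the BASIC examples use -16336: taken as an unsigned 16 bit number, that is the speaker address $C030.

```python
def basic_address(n):
    """Map a signed BASIC address such as -16336 onto 0..65535."""
    return n & 0xFFFF


def indirect_indexed(memory, ptr, y):
    """Effective address of (PTR),Y: the 16 bit pointer at PTR/PTR+1, plus Y."""
    lo = memory[ptr & 0xFF]
    hi = memory[(ptr + 1) & 0xFF]  # the pointer fetch stays inside page 0
    return (((hi << 8) | lo) + y) & 0xFFFF


memory = bytearray(65536)
memory[0x20] = 0x30  # pointer low byte
memory[0x21] = 0xC0  # pointer high byte
```

With Y set to zero, `indirect_indexed(memory, 0x20, 0)` is just the pointer itself, which is the simple indirect idiom described above.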
The 6502 is a very simple CPU (the original version has, if I remember right, well under 4000 transistors) and it does a lot with a little. In particular, it has only a single 8 bit ALU (arithmetic/logic unit) which it uses for everything arithmetical, including address calculations. It doesn't have dedicated increment units for the program counter or the index registers; the ALU does everything, and the ALU is only eight bits wide.
With all that in mind, here's what the 6502 does, in detail, to run a STA ($20),Y instruction:
Cycle 0: Latch the current contents of the program counter (16 bits) onto the address bus and initiate a memory read cycle to fetch the opcode into the instruction register. While that's going on, feed the low byte of the program counter to one side of the ALU and the value 1 to the other, set the ALU to perform an Add, and store the result back to the low byte of the program counter.
This is the standard instruction fetch step, and every instruction starts this way.
Cycle 1: Latch the current contents of the program counter onto the address bus and initiate a memory read cycle to fetch a byte into the operand register. While that's going on, feed the low byte of the program counter to one side of the ALU and the value 1 to the other, set the ALU to perform an Add, and store the result back to the low byte of the program counter. And in parallel with all of that, decode the instruction byte fetched during Cycle 0 and work out what to do about it.
This is the standard operand fetch step and is also common to every instruction, and that's why the minimum execution time for a 6502 instruction is two clock cycles. Even instructions that have no use for an operand do this, which might seem stupid and wasteful until you realize that it needs to start happening before the 6502 has had time to work out which instruction it's dealing with. Most instructions do in fact need an operand, so overall this step saves time.
In the case we're looking at here, the value of the opcode fetched during cycle 0 was $91, which encodes STA (something),Y; the "something" got fetched during cycle 1 and it's the arbitrary value $20, which is about to get used as a memory address.
The next three cycles are common to all instructions that use the (something),Y addressing mode:
Cycle 2: Latch zeroes to the high half of the address bus, the contents of the operand byte ($20 in this example) to the low half, and initiate a memory read cycle to get the contents of location $20 into the low byte of the address register. While that's going on, feed that same operand value to one side of the ALU, the value 1 to the other, and set it to perform an Add with the result saved back into the operand register.
Cycle 3: Latch zeroes to the high half of the address bus, the updated contents of the operand register (now $21 in this example) to the low half, and initiate a memory read cycle to get the contents of location $21 into the high byte of the address register. While that's going on, feed the low byte of the address register (the one fetched during cycle 2) to one side of the ALU, the contents of the Y register to the other, and set it to perform an Add with the result saved back into the low byte of the address register.
Cycle 4: Intermission - drinks and sweets. For the lack of anything better to do with the address bus during this cycle, latch the current contents of the address register to it and initiate a memory read. While that's going on, feed the ALU's carry output back to one of its inputs, feed the high byte of the address register to the other, and set it to perform an Add with the result saved back into the high byte of the address register.
Cycle 5: do the thing. Address register to address bus, Accumulator register to data bus, and initiate a memory write to complete the STA. While that's going on, use the ALU to update the low byte of the program counter to get ready for Cycle 0 of the next instruction.
And now come the subtleties and edge cases. This is computing, so according to the Treaty of Westphalia you have to have subtleties and edge cases.
Any time the ALU is being used to add something to the low half of a 16 bit address like those in the program counter or the address register, it might happen that doing so generates a carry out - meaning that the addition result doesn't actually fit in 8 bits, and another 8 bit add needs to happen to update the high half of the address.
But by the time the 6502 notices this, it's set up to run some memory access cycle regardless, which it will do - generally using a bogus half-calculated address that's $100 too low - in order to give the ALU a cycle to complete the upper-half calculation in. Then it re-runs the same memory access cycle again, this time using the fully calculated address and simply discarding the result of the first cycle.
This is the cause of all those delays that the manual refers to as "page boundary crossings". Any time the upper half of some address needs updating, it costs an extra cycle to get that result from the 8 bit ALU.
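To make the carry business concrete, here is a minimal Python sketch of the arithmetic, not a model of the real silicon. The function name `indexed_address` and its return shape are made up for illustration; the only thing it assumes is what is described above: the low byte gets added first, and the high byte only gets fixed up a cycle later if that add carried.

```python
def indexed_address(base, index):
    """Return (first_try, final, page_crossed) for the 16 bit sum base + index.

    first_try is the half-calculated address the 6502 puts on the bus
    before the high byte has been corrected; final is the real target.
    """
    low_sum = (base & 0xFF) + (index & 0xFF)      # 8 bit ALU add on the low byte
    page_crossed = low_sum > 0xFF                 # carry out: high byte still needs +1
    first_try = (base & 0xFF00) | (low_sum & 0xFF)  # old high byte, new low byte
    final = (base + index) & 0xFFFF               # fully calculated address
    return first_try, final, page_crossed


if __name__ == "__main__":
    for base, index in [(0x1280, 0x10), (0x12F0, 0x20)]:
        first_try, final, crossed = indexed_address(base, index)
        print(f"${base:04X} + ${index:02X}: first try ${first_try:04X}, "
              f"final ${final:04X}, page crossed: {crossed}")
```

When a page boundary is crossed, the first-try address comes out exactly $100 below the final one, which is the bogus address mentioned above.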
Now, the occasional bogus read between friends is one thing, but the design of many peripheral chips makes the occasional bogus write something else entirely. Bogus writes are unacceptable in polite society because they cause memory corruption, so the 6502 never issues them; any time it has to perform a memory access cycle with an address that could be wrong, it forces that access to be done as a read.
The decision about whether a given read is bogus or not actually gets made during the read. If it turns out that the address was actually OK, the 6502 simply keeps the first result rather than doing a re-run.
This scheme works really well except on indexed store instructions, which must do their write during the repeat cycle because the initial try ("drinks and sweets", Cycle 4 above) can't be a write. The timing of indexed stores works as if they always experience a page boundary crossing, even when they don't, and the write cycle generated by an indexed store instruction is always preceded one cycle earlier by a read: either to the same address about to be written, or to a bogus address $100 bytes earlier in memory if a high-byte ALU cycle was in fact required.
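The load versus store difference can be sketched the same way. This is a rough Python model of just the last one or two bus cycles of an indexed instruction, following the rules described above; `final_cycles` is a hypothetical helper name and the tuples are just an illustrative way to write down bus activity.

```python
def final_cycles(base, index, is_store):
    """List the (kind, address) bus accesses that finish an indexed instruction.

    The first access always uses the half-calculated address and is
    always a read. A store then always follows with its write; a load
    only repeats the read if the low-byte add carried.
    """
    low_sum = (base & 0xFF) + (index & 0xFF)
    page_crossed = low_sum > 0xFF
    first_try = (base & 0xFF00) | (low_sum & 0xFF)
    final = (base + index) & 0xFFFF

    accesses = [("read", first_try)]          # never a write: address may be wrong
    if is_store:
        accesses.append(("write", final))     # stores always pay for the extra cycle
    elif page_crossed:
        accesses.append(("read", final))      # loads only re-run when they have to
    return accesses


if __name__ == "__main__":
    for is_store in (False, True):
        for base, index in [(0x1280, 0x10), (0x12F0, 0x20)]:
            kind = "store" if is_store else "load"
            trace = [(k, f"${a:04X}") for k, a in final_cycles(base, index, is_store)]
            print(kind, f"${base:04X}+${index:02X}", trace)
```

Note that a store with no page crossing still produces two accesses to the same address, a read and then the write, which is the case that matters for the speaker.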
So. Back to speakers.
When Apple BASIC does STA (PTR),Y and PTR and PTR+1 hold the address of the speaker control latch and the Y register contains zero, there is no possibility that the ALU will generate a carry-out while adding the contents of Y to the low byte of the control latch address; adding zero to anything can't cause a carry. That means that the bogus read and the intended write both hit the speaker control latch address on successive cycles. The speaker control latch selector completely ignores the state of the read/write signal because it doesn't use the data bus at all, relying solely on decoding the address bus; so the bogus read toggles it one way and the intended write immediately toggles it back again. The Apple speaker is simply not physically capable of moving in that little time, and stays completely silent. POKE: no click.
A LDA (PTR),Y as ultimately used by PEEK, on the other hand, does not generate two successive reads - there's no carry-out from the ALU during the first one, so that's the only one issued and the speaker gets toggled just once.
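Here is a toy Python model of the POKE versus PEEK difference, assuming only what is stated above: the latch at $C030 flips on any access, read or write, and two flips on successive cycles leave the speaker where it started. The access lists are written out by hand from the description rather than derived from a real emulator, and `net_toggle` is a made-up name.

```python
SPEAKER_LATCH = 0xC030  # Apple II speaker soft switch address


def net_toggle(accesses):
    """Return True if the speaker ends up in a different state.

    The latch ignores read/write and flips once per access to its
    address, so an even number of back-to-back hits cancels out.
    """
    hits = sum(1 for _kind, address in accesses if address == SPEAKER_LATCH)
    return hits % 2 == 1


# STA (PTR),Y with Y = 0: forced read, then the real write, same address.
poke_accesses = [("read", SPEAKER_LATCH), ("write", SPEAKER_LATCH)]

# LDA (PTR),Y with Y = 0: no carry, so the first read is the only one.
peek_accesses = [("read", SPEAKER_LATCH)]

if __name__ == "__main__":
    print("POKE moves the speaker:", net_toggle(poke_accesses))
    print("PEEK moves the speaker:", net_toggle(peek_accesses))
```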
But there really is simply no decoding for Read/Write on that speaker control latch. If you're working in machine code, LDA $C030 and STA $C030 both make the speaker hardware respond exactly the same way.
There's no indexing with either of those instructions, so the address bytes never go through the ALU, so there's no possibility of bogus writes, so there's no need for an intermission slot in their instruction timing. Both run in four cycles (fetch opcode, fetch operand/decode opcode, fetch address high byte, do the specified memory load or store).
To hit the speaker control latch twice with five cycles in between, using a store instruction rather than a read in order to avoid disturbing registers and flags that will probably be in use for other things at the time, I arrange elsewhere for the X register to contain $FF and then use
STA $C030      ; first hit on the speaker control latch
STA $BF31,X    ; X = $FF, so this writes to $BF31 + $FF = $C030
The STA $C030 hits the control latch in the usual way. The STA $BF31,X hits it again five cycles later, not four, because of the "drinks and sweets" read that the 6502 inserts at cycle 3 to allow for ALU carry-out processing.
$31 + $FF > $FF so it does generate such a carry-out. The bogus read happens one cycle before the ALU gets to update the upper half of the address register to $C0, so the location that gets read is $BF30, which is RAM on an Apple II and not a sensitive I/O location. The speaker control latch never sees it, and there I have a basis for 5:60 and 60:5 pulse output routines.
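The five-cycle spacing can be checked with a small cycle-by-cycle sketch in Python. The cycle layouts follow the descriptions above (four cycles for STA absolute, five for STA absolute,X with its forced read); the code location $0300 is an arbitrary assumption for the example, and the helper names are invented here.

```python
SPEAKER_LATCH = 0xC030


def sta_absolute(pc, address):
    """Bus accesses for STA abs: opcode, address low, address high, write."""
    return [("read", pc), ("read", pc + 1), ("read", pc + 2), ("write", address)]


def sta_absolute_x(pc, base, x):
    """Bus accesses for STA abs,X: as above, plus the forced read before the write."""
    low_sum = (base & 0xFF) + (x & 0xFF)
    first_try = (base & 0xFF00) | (low_sum & 0xFF)   # high byte not yet corrected
    final = (base + x) & 0xFFFF
    return [("read", pc), ("read", pc + 1), ("read", pc + 2),
            ("read", first_try), ("write", final)]


# Assumed placement: the two instructions sit back to back starting at $0300.
trace = sta_absolute(0x0300, 0xC030) + sta_absolute_x(0x0303, 0xBF31, 0xFF)

# Cycle numbers (counted from 0) on which the speaker latch address is on the bus.
latch_cycles = [cycle for cycle, (_kind, address) in enumerate(trace)
                if address == SPEAKER_LATCH]

if __name__ == "__main__":
    for cycle, (kind, address) in enumerate(trace):
        print(f"cycle {cycle}: {kind:5s} ${address:04X}")
    print("latch hit on cycles", latch_cycles,
          "gap", latch_cycles[1] - latch_cycles[0])
```

In this model the forced read lands on $BF30, so the latch is hit exactly twice, by the two writes.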
On preview: jinx :)
posted by flabdablet at 9:26 AM on May 8, 2014 [2 favorites]
I actually understood all of that! Thank you, ol' Dr. Barbour and your Computer Architecture class!
posted by JHarris at 4:06 PM on May 8, 2014
Now I have an urge for a cigarette.
posted by localroger at 4:39 PM on May 8, 2014
I actually understood all of that!
One of the things I miss, now we've got to 2014, is dealing with microprocessors that can be understood to an extent that allows for actual debugging, as opposed to simply having to establish some arbitrary level of confidence that everything is OK via extensive testing.
And no, a proliferation of layered automated correctness proof generators is not an acceptable substitute for direct understanding.
Age and crustiness catch up with all of us, given time.
posted by flabdablet at 9:52 PM on May 8, 2014
The thing I really liked about the Apple II was that not only could the whole thing be understood (at least by Flabdablet) but it was all open. You could literally do anything with it. You could boot it and start programming in BASIC, or drop into the monitor; you could rewire it or whatever; you could hack an enormous number of devices to work with it. The closest thing we have to this today is the Raspberry Pi (which is enormously more powerful than an Apple II) but the learning curve to doing systems-level stuff on a Raspberry Pi is much higher than for an Apple II.
posted by Joe in Australia at 10:38 PM on May 8, 2014 [1 favorite]
You could literally do anything with it
as indeed you still can. My own Apple ][+ can still do all the things it did when Dad bought it in 1980. It's used up one high-voltage power transistor and a couple of high-voltage electrolytic capacitors in the power supply, and five years ago I cleaned every single pin on every single socketed IC with a pencil eraser along with all the gold edge connectors on the expansion cards, and it occasionally gets a head cleaning disc run through the floppy drives, but that's it.
Good solid machine. No keyboard faults even. But it's no use at all for watching Grumpy Cat videos, which apparently is the fundamental criterion for usability in 2014.
posted by flabdablet at 11:44 PM on May 8, 2014 [2 favorites]
Steve Wozniak to the FCC: Keep the Internet Free
posted by homunculus at 12:31 PM on May 20, 2014 [2 favorites]
That man really does embody everything I ever thought was good about Apple.
posted by flabdablet at 9:41 PM on May 20, 2014 [1 favorite]
This thread has been archived and is closed to new comments