A super complicated automatic solution that doesn’t work that well
April 5, 2021 3:39 PM Subscribe
But what if I’m yelling and I want to show the proper regard for your humanity by capitalizing the first letter of your name? Then I want the [first letter of your name] to be upper-er case, even more upper case than before. Similarly, I might want to be super casual and disregard your humanity with a lower-er case letter. So that’s what this video is about… Now I’m definitely not going to blow you away with the results here, but they’re kind of interesting - and I think some of the story of how we get there is fun and interesting as well. But this will be an example of derp (sic) learning.
tom7 previously
Random historical computer fact: for a while there we were flirting with lowercase numbers.
posted by mhoye at 4:28 PM on April 5, 2021 [2 favorites]
I stopped watching when this guy used the caps lock key as the shift button.
posted by jonathanhughes at 4:30 PM on April 5, 2021 [4 favorites]
Random historical computer fact: for a while there we were flirting with lowercase numbers.
aka non-lining figures. Which the Unicode consortium has a massive hate-on for, and they won't even consider them for inclusion because they say it's a style. So what are these, then: ① ⑴ ⒈ ⓵ ❶ ➀ ➊1 𝟏 𝟙 𝟣 𝟭 𝟷 🄂 ?
posted by scruss at 4:54 PM on April 5, 2021 [9 favorites]
Lowercase numbers are a style in the same way lowercase letters are a style.
posted by signal at 5:00 PM on April 5, 2021
I stopped watching when this guy used the caps lock key as the shift button.
Then you have missed the ending, wherein he gives a new and entirely appropriate purpose for the capslock key.
posted by pwnguin at 6:01 PM on April 5, 2021
tom7's extreme commitment to dopey ideas is a true inspiration.
posted by aubilenon at 6:23 PM on April 5, 2021 [12 favorites]
This is truly incredible
posted by RustyBrooks at 8:06 PM on April 5, 2021 [1 favorite]
That was just enough detail and an understandable enough example for someone who’s not regularly working with AI or computer “reading” of images to get just a little more educated and to stretch my brain just a little.
posted by Tandem Affinity at 9:01 PM on April 5, 2021 [2 favorites]
“It’s pretty relaxing playing chess against the letter E, because it does not care about chess at all.”
posted by lostburner at 10:56 PM on April 5, 2021 [1 favorite]
I was disappointed he didn't come up with a representation of the underlying letter shapes (allographs of the graphemes), which seemed to be the implicit approach in the fonts that he originally designed. Something like:
1. The shape of a letter Q in some font (NEVER CONVERTED INTO A RASTER OF PIXELS) ->
2. The underlying "logical" shape of a Q (a circle with a short line segment crossing it) ->
3. [A neural network trained on pairs of lower and upper letters] ->
4. An "augmented" shape (a circle with TWO short line segments crossing it) ->
5. FINAL UPPERMOST Q
posted by The Tensor at 11:02 PM on April 5, 2021
"Now, I only know of 26 letters" he says immediately after showing us a slide with a ʒ on it.
Also I don't believe he uses the word "majuscule" in the entire video, which, if so, shows an incredible and uncharacteristic amount of restraint.
posted by aubilenon at 11:28 PM on April 5, 2021 [2 favorites]
Well that was an enjoyably dorky excursion into the Forest of Font but we have a few readily available tools for the initial problem as stated. Super-majuscule for respeck? Dork or Dork. Microscule to diss? piffle. Then there is colour:
Kelly green "#4CBB17" will do for Irish references
Orange "#FFA500" for marmalade excursions or fried fish
Dull red "#990000" for fury
There's a war on about what best represents the Democrats: "#3333FF" or "#00A6EF"
Related [in my head anyway] a 100 billion tweet analysis: Hahahahaha, Duuuuude, Yeeessss!: A two-parameter characterization of stretchable words and the dynamics of mistypings and misspellings in PLOS One last year
posted by BobTheScientist at 12:48 AM on April 6, 2021 [1 favorite]
I liked tom7 before he was cool! I came across the Divide By Zero fonts when I was a student, and they've been a staple of my amateur desktop publishing projects (when you're doing a student magazine for a gaming society you can get adventurous with your title fonts). I was delighted to discover that he also does cool stuff besides fonts (and cool font-adjacent stuff).
posted by confluency at 1:26 AM on April 6, 2021 [1 favorite]
I'd like to find a Times New Roman font, with a slider for bold, spacing, underlines, and capitalization. I'm filing legal documents that have exhibits that are illustrations excerpted from facsimile exhibits in the record, and sometimes I like to slightly emphasize certain words. Presently I do that in photoshop. It's kind of a fussy thing to do, but seems like it can clarify the arguments. All writing must be Times New Roman, 12 point, double spaced, but there are no specific rules for bold and italics, just that you need to indicate "emphasis added" in those cases. Wish I could control how much emphasis is added.
posted by StickyCarpet at 7:04 AM on April 6, 2021 [4 favorites]
Why is their z a 3?
posted by GoblinHoney at 7:16 AM on April 6, 2021
a Times New Roman font, with a slider for bold, spacing, underlines, and capitalization
OpenType can do this, in theory, but Times New Roman is © The Monotype Corporation, so there's only so much they let you do. Additional weights (medium, semi-bold, extra bold) are available from $65 each, but this might not include an embedding licence. There's also a fair chance that whoever receives these files might not be able to edit or print them and keep them the way you intended.
posted by scruss at 7:28 AM on April 6, 2021 [2 favorites]
aka non-lining figures. Which the Unicode consortium has a massive hate-on for, and they won't even consider them for inclusion because they say it's a style.
I suspect it’s a practical issue rather than a philosophical one. A typeface’s numerals are either designed as lining or non-lining and using them differently from their design just wouldn’t work. You could counter that a typeface should have both lining and non-lining numerals but in practice they currently don’t and the transition would be painful. But most important is what type designers would think of such a thing and I would wager that most would be adamantly opposed.
posted by sjswitzer at 11:06 AM on April 6, 2021
I was disappointed he didn't come up with a representation of the underlying letter shapes
But he did, though. Twice. He first tried representing the fonts with splines, but the problem there is deciding how to compare two splines, which he never fully solved. Second, he tried signed distance fields, which he explained way too quickly. Those actually do produce a good representation of the shape and are much more conducive to the ML algorithms.
posted by sjswitzer at 11:14 AM on April 6, 2021
That isn't a weird 3-shaped z, it's an ezh (ʒ), used in the International Phonetic Alphabet for the sound that the first syllable of "casual" ends with. It's a good solution to the problem of how to spell that word (cazh? caj? cas?) - sometimes you feel caʒ enough to pull in letters from other alphabets.
posted by wanderingmind at 11:31 AM on April 6, 2021 [3 favorites]
But he did, though. Twice. He first tried representing the fonts with splines, but the problem there is deciding how to compare two splines, which he never fully solved.
He optimistically tried using the vector outlines of the fonts (with a little bit of normalization about point order) because if it worked he could have skipped all the in-between stages and gotten vector-outlines out the other end...but it didn't work.
Second, he tried signed distance fields, which he explained way too quickly. Those actually do produce a good representation of the shape and are much more conducive to the ML algorithms.
Those are still pixel rasters, which unsurprisingly led to really blurry-blobby results—think about a particular pixel near the upper left: it's part of upper-case W in some fonts, but not in others, as are a bunch of the nearby pixels, so the end result is a pixel-focused model that's all smeared out.
What's needed (IMHO) is a representation of what the "true" underlying letter looks like:
upper-case A: two segments joined at the top with a cross-bar halfway up
upper-case H: two segments that are parallel with a cross-bar halfway up
upper-case X: two segments that cross halfway up
etc.
(Not literally that text, but some kind of representation of the underlying strokes.)
So start with a component that takes fonts and extracts the "true" form (maybe render them at high resolution, then "erode" them until only single-pixel lines are left, then vectorize), train a model (probably not a CNN) on the underlying pairs of upper/lower "true" forms, then have another component that takes "true" forms and produces fonts. (This may consist of multiple models, trained on "true" forms and full forms, one for each font—so you have a "true"-to-Helvetica model, a "true"-to-Times Roman model, etc.)
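For the "erode until only single-pixel lines are left" step, one possible starting point is Zhang-Suen thinning. This is a swapped-in textbook technique, not anything shown in the video, and the pixel-set representation and the names `neighbours` and `thin` are assumptions made purely for illustration.

```python
def neighbours(pixel, pixels):
    """Return the 8 neighbours of pixel as 0/1 flags, clockwise from north."""
    r, c = pixel
    offsets = [(-1, 0), (-1, 1), (0, 1), (1, 1),
               (1, 0), (1, -1), (0, -1), (-1, -1)]
    return [1 if (r + dr, c + dc) in pixels else 0 for dr, dc in offsets]


def thin(pixels):
    """Zhang-Suen thinning of a set of (row, col) ink pixels."""
    pixels = set(pixels)
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            doomed = set()
            for p in pixels:
                n = neighbours(p, pixels)
                ink = sum(n)
                # Number of 0 -> 1 transitions walking once around the pixel.
                transitions = sum(
                    1 for i in range(8) if n[i] == 0 and n[(i + 1) % 8] == 1
                )
                north, east, south, west = n[0], n[2], n[4], n[6]
                if step == 0:
                    side_ok = (north * east * south == 0
                               and east * south * west == 0)
                else:
                    side_ok = (north * east * west == 0
                               and north * south * west == 0)
                if 2 <= ink <= 6 and transitions == 1 and side_ok:
                    doomed.add(p)
            # Delete in parallel, after every pixel has been examined.
            if doomed:
                pixels -= doomed
                changed = True
    return pixels


# A solid 3-by-7 bar, standing in for a thick rendered stroke.
bar = {(r, c) for r in range(1, 4) for c in range(1, 8)}
skeleton = thin(bar)
```

On this bar the skeleton is a short run of pixels along the middle row. The vectorizing step that would follow is not sketched here.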
I don't think this would work very well, either, because there's just not enough data. If you reduce letters to these "true" forms, you've effectively made the training data really homogeneous: all the upper-case A's are basically the same. That's not completely undesirable (I want all utmost-case Q's to have a similar "true" form, rather than different forms for each font), but only having a few dozen distinct items in the training set seems unlikely to work well. I'll bet the model might learn how to do letter pairs like C/c, F/f, K/k, M/m, N/n, O/o, P/p, S/s, U/u, V/v, W/w, X/x, Y/y, and Z/z, where there's a more-or-less straightforward geometric transformation, but not the others. Maybe you need a model to discriminate between "simple" letter pairs like those and "complex" letter pairs like A/a, G/g, Q/q, etc., and train a separate model on each. But now the training data is even thinner...
(I'd also be curious to see what kinds of results he'd get with his font-vector model and a GAN, where there's an "adversarial" model that rates and "punishes" the output based on how much it looks like a letter. That might cut way down on the weirdo tumor results... Hmm, and the adversarial model should probably also score a letterform badly if it looks too much like either the existing upper- or lower-case letterform (hmm, and all the other proposed outputs, too...).)
posted by The Tensor at 12:26 PM on April 6, 2021
Sorry to be pedantic, but to describe signed distance fields as bitmaps gives them short shrift. Each point in the field holds the distance to the edge of the figure, thus SDFs, just like Béziers, describe an exact shape. In all cases there are questions of representational capability and precision. Scaled vector fields do poorly with small features; Béziers simply cannot precisely represent a circular arc, though they can approximate it “close enough.” If a typeface designer wanted a precise circular arc, as many do, they are simply out of luck: they will have to accept a (very very good) approximation. In practice this is not a real problem.
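To put a rough number on "very very good": the sketch below measures how far a single cubic Bézier strays from a unit quarter circle. The control-point constant `K` is the commonly used value for this approximation and is an assumption of this example, as are the function names; nothing here comes from the video.

```python
import math

# Commonly used handle length for approximating a unit quarter circle
# with one cubic Bezier (assumed here, roughly 0.5523).
K = 4.0 / 3.0 * (math.sqrt(2.0) - 1.0)


def bezier_point(t):
    """Point at parameter t on the cubic with control points
    (1, 0), (1, K), (K, 1), (0, 1)."""
    mt = 1.0 - t
    b0, b1, b2, b3 = mt ** 3, 3 * mt * mt * t, 3 * mt * t * t, t ** 3
    x = b0 * 1.0 + b1 * 1.0 + b2 * K + b3 * 0.0
    y = b0 * 0.0 + b1 * K + b2 * 1.0 + b3 * 1.0
    return x, y


def max_radial_error(samples=1000):
    """Largest deviation of the curve from radius 1 over evenly spaced t."""
    return max(
        abs(math.hypot(*bezier_point(i / samples)) - 1.0)
        for i in range(samples + 1)
    )


error = max_radial_error()
```

The deviation comes out at a few hundredths of a percent of the radius, which is consistent with calling the approximation "close enough" for type.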
The precise shape described by a signed distance field is also an approximation of the desired shape, and it’s not as good an approximation as splines. In particular, small features distort badly when magnified. Maybe not a good choice for serif fonts. But there are advantages: you get antialiasing and offsets for free. That’s actually really useful in a lot of applications. More information here.
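A minimal sketch of why antialiasing and offsets come "for free", using an analytic circle as the shape. The one-pixel linear ramp in `coverage`, the grid size, and all names are assumptions chosen for illustration, not a description of any particular renderer.

```python
import math


def circle_sdf(x, y, cx, cy, radius):
    """Signed distance to a circle: negative inside, positive outside."""
    return math.hypot(x - cx, y - cy) - radius


def coverage(distance):
    """Antialiasing: map distance (in pixels) to ink coverage in [0, 1],
    using a linear ramp one pixel wide centred on the edge."""
    return min(1.0, max(0.0, 0.5 - distance))


def rasterize(size=32, offset=0.0):
    """Render the circle; subtracting offset from the field grows the shape."""
    centre, radius = size / 2.0, size / 4.0
    return [
        [coverage(circle_sdf(x + 0.5, y + 0.5, centre, centre, radius) - offset)
         for x in range(size)]
        for y in range(size)
    ]


regular = rasterize()
bolder = rasterize(offset=2.0)  # the same field, outline pushed out 2 pixels
```

The same stored field gives both the soft edge and the fattened outline; only the threshold changes.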
posted by sjswitzer at 1:16 PM on April 6, 2021
I'm not familiar with signed distance fields beyond the video and the link you posted, but based on that limited understanding I'm not sure they're a good representation for a neural network to generate. Suppose we have an upper-case W and an upper-case H, both of which have a grid of signed distance field (SDF) values (which, fine, not pixels) in the upper-left that indicate they're close to the letter boundary, but aren't identical. Then you train a lower-to-upper-case neural network that produces SDF values and get some output values in that upper-left region that are kind of a "smeared" average of the two. Is there a deterministic way to go from any grid of SDF values to an outline? (It seems like there should be degenerate cases, like if the neural network produces a uniform grid of SDF values that all indicate they're two units outside the boundary...) My intuition is that a little bit of neural network "smearing" can produce a grid of SDF values that either isn't consistent, or produces a dramatic bulge in the outline that doesn't represent either of the inputs well.
posted by The Tensor at 1:45 PM on April 6, 2021
All valid concerns, and in particular it is not likely that many generated SDFs would be fully consistent, such as your example of every point being distance 2. But nevertheless these inconsistent SDFs can be rendered as if they were consistent anyway. It’s not always clear what features neural networks are actually recognizing or transforming, but evidently this one was able to suss something from the SDF inasmuch as it was at least pretty good at predicting lower case letterforms from upper and vice versa. As for the rest, it was pretty whimsical. I took the whole thing in the spirit of throwing things at the wall to see what sticks. Exploratory research can be like that.
posted by sjswitzer at 2:38 PM on April 6, 2021
Oops, I didn’t answer your question.
Is there a deterministic way to go from any grid of SDF values to an outline?
Technically no, since there is no outline that will match an inconsistent SDF. It can still be rendered, though, by interpolating it to scale then rendering pixels by thresholding at zero. You could reverse-engineer a shape from that by tracing the figure and fitting splines but I don’t think that’s an answer to your exact question and it doesn’t seem very useful either.
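The "interpolate to scale, then threshold at zero" rendering can be sketched as below. Bilinear interpolation is one possible choice of interpolation (an assumption; the comment does not name one), and `bilinear`, `render`, and the tiny test grids are hypothetical.

```python
import math


def bilinear(grid, y, x):
    """Sample a coarse grid of SDF values at a fractional (row, col)."""
    height, width = len(grid), len(grid[0])
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y0 = min(int(y), height - 2)
    x0 = min(int(x), width - 2)
    fy, fx = y - y0, x - x0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x0 + 1] * fx
    bottom = grid[y0 + 1][x0] * (1 - fx) + grid[y0 + 1][x0 + 1] * fx
    return top * (1 - fy) + bottom * fy


def render(grid, scale):
    """Upscale by an integer factor and mark ink wherever the value is <= 0."""
    rows = (len(grid) - 1) * scale + 1
    cols = (len(grid[0]) - 1) * scale + 1
    return [
        "".join("#" if bilinear(grid, r / scale, c / scale) <= 0 else "."
                for c in range(cols))
        for r in range(rows)
    ]


# A consistent 5x5 field: distance to a circle of radius 1.5 at the centre.
disc = [[math.hypot(r - 2, c - 2) - 1.5 for c in range(5)] for r in range(5)]
# An inconsistent field: every sample claims to be 2 units outside.
nowhere = [[2.0] * 5 for _ in range(5)]

disc_image = render(disc, 4)
nowhere_image = render(nowhere, 4)
```

The inconsistent grid still renders without complaint; it just produces an image with no ink, since no interpolated value ever reaches zero.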
posted by sjswitzer at 2:46 PM on April 6, 2021
Think about that H and W case again. Actually, no, think about the case of two upper-case W's with slightly different shapes, where the upper-left "arms" don't overlap. If you use the font outline representation, you might still hope to get a set of points from the neural network that form an outline sort of in between the two. But if you use the SDF representation grid, you could get TWO upper-left arms, or a single thick merged one.
Basically, I think of a letter in a font as:
1. Take an underlying "true" letter (a small group of line segments and dots)
2. Optionally transform (e.g., slanted)
3. Thicken zero-width lines to taste
4. Optionally add serifs or other decoration
...so I wanted to see him try modeling the underlying "true" form, rather than the output of that fontification process.
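Steps 1 to 3 of that list can be sketched in a few lines; step 4 (serifs) is left out. The segment coordinates for H, the shear convention, and the names `TRUE_H`, `slant`, and `thicken` are all hypothetical, picked only to make the idea concrete.

```python
import math

# Step 1: a hypothetical "true" upper-case H on a unit square
# (x to the right, y downward): two stems and a cross-bar, zero width.
TRUE_H = [((0.2, 0.1), (0.2, 0.9)),
          ((0.8, 0.1), (0.8, 0.9)),
          ((0.2, 0.5), (0.8, 0.5))]


def slant(segments, shear):
    """Step 2: lean the letter right by shearing x in proportion to height."""
    def move(point):
        x, y = point
        return (x + shear * (1.0 - y), y)
    return [(move(a), move(b)) for a, b in segments]


def dist_to_segment(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    t = 0.0 if length_sq == 0 else ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = min(1.0, max(0.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def thicken(segments, weight, size=24):
    """Step 3: ink every pixel within weight / 2 of some stroke."""
    rows = []
    for r in range(size):
        row = ""
        for c in range(size):
            p = ((c + 0.5) / size, (r + 0.5) / size)
            nearest = min(dist_to_segment(p, a, b) for a, b in segments)
            row += "#" if nearest <= weight / 2 else "."
        rows.append(row)
    return rows


regular = thicken(TRUE_H, 0.08)
bold = thicken(TRUE_H, 0.20)
italic = thicken(slant(TRUE_H, 0.25), 0.08)
```

Printing `"\n".join(bold)` shows the result. Weight and slant are plain parameters applied on top of one shared "true" form.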
posted by The Tensor at 3:35 PM on April 6, 2021
Another example of "leveled-up" letters from Tumblr, complete with The Shitpost Calligrapher's contribution.
posted by ErisLordFreedom at 12:27 AM on April 7, 2021 [2 favorites]
posted by rikschell at 4:23 PM on April 5, 2021 [2 favorites]