Deep Chernoff Faces

One of my favorite¹ concepts for multi-dimensional data visualization is the Chernoff Face. The idea here is that for a dataset with many dependent variables, it is often difficult to immediately understand the influences one variable may have on another. However, humans are great at recognizing small differences in faces, so maybe we can leverage that!

Tired: traditional plotting

The example from Wikipedia plots the differences between a few judges on some rating dimensions:

which already improves on the “traditional” way to present such data, which is something similar to a line chart:

or a radar chart:

Clearly² the Chernoff faces are the best way to present this data, but they leave some of the gamut unexplored.

Wired: using GANs to synthesize faces

The GAN technique is pretty easily summarized. You set up two neural networks, a generator, which tries to generate realistic images, and a discriminator, which tries to distinguish between the output of the generator and a real corpus of images. By training the networks together, adding more layers, fooling around with hyperparameters and overall “the scientific process”, you manage to get results where the generator network is able to fool the discriminator (and humans) with high-quality images of faces that do not actually map to anybody in the real world. StyleGAN improves on some of the basic structure here by using a novel generator architecture that stacks a bunch of layers together and ends up with an intermediate latent space that has “nice” properties.

Basically how it works.

A trained StyleGAN (1 or 2; the architecture for the dimensions does not change between versions), at the end of the day, takes a 512 element vector in the latent space “Z”, then sends it through some

nonsense

fully-connected layers to form a “dlatent³ vector of size 18×512. The 18 here indicates how many layers there are in the generator proper—the trained networks that NVIDIA provides produce 1024×1024 images; there are 2 layers for each of the dimensions from 2^2=4 to 2^10=1024.

What part of STACK MORE LAYERS did you not get?

The upshot is that the 18×512 space has nice linearity properties that will be useful to us when we want to generate photorealistic faces that only differ on one axis of importance. The authors of the NVIDIA paper call each of the 18 layers a “style”, and the observation is that copying qualities from each style gives qualitatively different results.

The styles that correspond to coarse resolutions bring high-level aspects like pose, hair style, face shape; the styles that correspond to fine resolutions bring low-level aspects like color scheme. But they only roughly correspond to these, so it’s going to be somewhat annoying to play around with the latent space! What we need is some way to automatically classify these images for the properties we want to use in our Chernoff faces…


Putting it all together

What I did was take an unofficial re-implementation of StyleGAN2 and run it to generate 4096 random images (corresponding actually to random seeds 0 – 4095 in the original repository). The reason why we’re using an unofficial implementation is that the unofficial implementation ports everything to TensorFlow 2, which also enabled me to run it with (a) less GPU RAM (I only have a GTX 1080 at home) and (b) on the CPU if needed for a future project where I serve these images dynamically.

I was also too lazy to train a network to recognize any features, so I fed these images through Kairos‘s free trial, receiving API responses that roughly looked like this. (There’s no code for this part, you can just do it with cURL).

Then, somewhere in this gigantic unorganized notebook that I need to clean up, I train some linear SVMs to split our sample of the latent space; eventually I will clean this up and make it so it’s a little more automated than the crazy experimentation I was doing. After I train the SVMs and receive the normal vector from their separating hyperplanes, I manually explore the latent space by considering only one style’s worth of elements from each of the normal vectors, plotting the results, and seeing what changes about the photos (which could be different than the feature from the API response!). This could probably be automated; I should likely use the conditional entropy of the Kairos API responses given the SVM result to rank which styles are most important and then automatically return that instead of manually pruning styles.


I ended up with seven usable properties plus one I threw out before I got tired of doing this by hand.

  • yaw
  • eye squint
  • age
  • smile
  • skin tone
  • gender
  • hair length
  • quality of photo

I decided not to use quality of photo (you can see the results in the notebook results) because I didn’t want half the photos to just look terrible. The good news is that I made a new notebook for generating the actual faces, one that should actually work for you if you clone the repo.

After generating a few images and using the `montage` command from ImageMagick, we get our preliminary results!

Now that’s the future!

¹ “Favorite” might be code for “useless,” going with the theme of this blog.
² Clearly.
³ It’s called a dlatent in the code, but the paper calls it the space W and the vectors w. I don’t know.
 

for q in `seq 0 4095`; do i=$(printf "%04d" $q); curl -d '{"image": "http://cantina.patrickxia.com/faces/seed'$i'.png"}' -H "app_id: xxx" -H "app_key: xxx" -H 'store_image: "false"' -H "Content-Type: application/json" http://api.kairos.com/detect > $i.json; sleep 1; done

and something similar for the `media` API, which gives you landmarks. If you care enough, I’ve hosted the images here, which lets you just look through them if you want.

Read More

Leave a Reply

Your email address will not be published.