# VR performance setup 2.0

Last September I wrote about my VRChat performance setup, based on some new changes I was trying in order to better integrate backing tracks into my performances. I quickly ran into some limits with the approach I was taking, and have ended up completely changing how I do things since then, with a setup that is much more reliable, more capable, and higher-quality. It also allows me to use the same audio setup for both mic-boosted and streamed performances.
So here's how my performing setup works!
Note: I may earn a commission on affiliated product links in this article.
### The hardware side
Previously I was using an amalgam of the built-in mic on my headset, the line input on my onboard audio, and Voicemeeter to tie everything together. Unfortunately, this setup had some pretty severe limitations, especially around managing latency (particularly with backing tracks) and adding live effects to the signal chain. It also led to some embarrassing situations where my audio would go haywire due to a connector coming loose or the like.
A few years ago, I upgraded my recording studio from a Focusrite Scarlett 18i8 to an 18i20, so I had this spare 18i8 just sitting around. I was using it on my office computer (where I do my video editing and programming), but it was massive overkill for those needs, and I came to realize it works much better for my performance setup instead. So now I have a plain old headphone amp in the office, and the 18i8 is on my VR computer.
The 18i8 has a handy feature where you can set up multiple output/monitor submixes, so for example you can have different audio levels of different things going to the headphones, the line outputs, and so on. It also has a built-in "loopback" interface, where you can give it a monitor mix that then appears as a standard audio input to the computer. These features are _extremely_ useful for this use case (as well as any other live performance or studio recording situation).
I have the following connections:
* Front input 1: My microphone (I currently switch between an MXL condenser mic and an Electro-Voice dynamic depending on my mood and what's sounding better at the moment)
* Front input 2: My guitar signal chain
* Headphone output 1: a LEKATO Wireless IEM system (which in turn connects to some old 3.5mm Apple earbuds, which give me a nice balance of size and audio quality; there are better ones to buy new but I had these lying around from some old iPhone or something)[1]
* Headphone output 2: my tiny lunchbox amp
* My reverb unit, with its inputs on the 18i8's Line 1-2 outputs, and its outputs on the 18i8's Line 5-6 inputs
* Front input 3 and 4: Available for other instruments (sometimes I plug my digital piano in there, for example)
I have the reverb unit set to 100% wet, so that it is only being used as a bus send. I also have a bypass toggle pedal so that I can cut to a purely dry signal when needed (such as in A Long Plastic Hallway, which uses lack-of-reverb as an effect for emphasis).
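A 100%-wet unit patched this way behaves as a classic send/return bus: the dry path stays untouched, and only the wet return gets summed on top. A numeric sketch of the idea (the function names and the toy single-tap "reverb" are just stand-ins, not anything in my actual chain):

```python
import numpy as np

def bus_send(dry, effect, send_gain=1.0, return_gain=0.5):
    """Send/return routing: the dry signal passes through unchanged,
    and a 100%-wet effect return is summed on top."""
    wet = effect(send_gain * dry)  # effect outputs wet signal only
    return dry + return_gain * wet

def toy_reverb(x):
    """Stand-in 'reverb': a single echo tap 100 samples later.
    A real unit is far denser, but crucially adds no dry signal."""
    out = np.zeros(len(x) + 100)
    out[100:] = 0.3 * x
    return out[:len(x)]

signal = np.zeros(200)
signal[0] = 1.0                       # an impulse
mixed = bus_send(signal, toy_reverb)  # dry impulse at t=0, wet tap at t=100
bypassed = bus_send(signal, toy_reverb, return_gain=0.0)  # dry only
```

Setting the return gain to zero is exactly what the bypass pedal does: the dry path is unaffected, so engaging it mid-song never changes the instrument level.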
### Mix setup
First, I use the multiple input functionality to provide Windows audio devices for all of the necessary channels.
Windows 10 and 11 also have a feature where you can assign arbitrary labels to your audio inputs and outputs; I use this to give the following names to the playback channels:
* Playback 1+2: Playback
* Playback 3+4: Game audio
* Playback 5+6: Control room
and I also set labels on my VR headset's microphone and speakers, just to make them easier to keep track of.
In Focusrite Control I have separate submixes for all three of the outputs, as well as the loopback interface.
Headphone 1 gets the monitor mix that goes to my IEMs. It receives all of the input channels, as well as all three[2] "Playback" channels (which are used to route multiple separate software outputs into separate mixes).
Headphone 2 gets the same, minus the microphone (to avoid feedback), and is connected to the line input on my lunchbox amp.[3]
Line 1-2 (reverb send) gets just the instruments; I bake the reverb into my backing tracks.
Loopback gets the mix that goes out to the stream and/or world.
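Conceptually, each of these submixes is one row of a routing/gain matrix: every destination is a weighted sum of the same set of sources. A toy sketch of the idea (the gain values below are invented for illustration, not my actual levels):

```python
import numpy as np

sources = ["mic", "guitar", "playback", "game", "control"]

# One row per destination mix; columns follow `sources`.
routing = np.array([
    # mic  gtr  play game ctrl
    [1.0, 1.0, 1.0, 1.0, 1.0],  # Headphone 1: IEMs hear everything
    [0.0, 1.0, 1.0, 1.0, 1.0],  # Headphone 2: amp, minus the mic (feedback)
    [0.7, 0.9, 0.0, 0.0, 0.0],  # Line 1-2: reverb send, instruments only
    [1.0, 1.0, 1.0, 0.0, 0.0],  # Loopback: what the stream/world hears
])

# With per-source sample frames shaped (n_sources, n_samples),
# all four destination mixes fall out of a single matrix multiply.
frames = np.ones((5, 4))   # dummy audio: 5 sources, 4 samples
mixes = routing @ frames   # shape (4, n_samples)
```

This is all the interface's DSP is really doing; Focusrite Control is just a front-end for editing that matrix.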
### Software setup
Unlike before, I do not need Voicemeeter, and I do not even have it installed anymore, as all of the audio that goes to the stream is now handled by the 18i8, and I do not need to mix anything into or from my VR headset.
I have configured VLC (which I use for playing my backing tracks) to output to the Playback 1+2 device by default.
#### OBS setup
OBS allows you to set up multi-channel audio recording. I set OBS to send track 1 to the stream, and to record all six tracks separately in my local recording. Then I have the following audio input sources:
Input name | Source name | OBS output track | Video audio channel
---|---|---|---
Loopback | Live mix | 1 | 1+2
Input 1-2 | Mic + guitar | 2 | 3+4
Input 3-4 | Piano | 3 | 5+6
Input 5-6 | Reverb | 3 | 5+6
Headset mic | Headset mic | 5 (panned left) | 9
As well as the following audio output captures:
Output name | Source name | OBS output track | Video audio channel
---|---|---|---
Headset speakers | Headset speakers | 5 (panned right) | 10
Playback 1+2 (playback) | Backing track | 4 | 7+8
Playback 3+4 (game audio) | Game audio | 6 | 11+12
Playback 5+6 (control room) | Control room | 5 (panned right) | 10
Finally, I have a bunch of visuals set up; mostly this is Spout2 to capture my in-game streaming camera, and Waveform to let me do various audio visualizers in varying combinations.
#### Mic boosted performances
When doing a mic boosted performance, I set my system audio output to Playback 3+4 (Game Audio) and VRChat's audio input to Loopback. This way, my full final audio mix goes to my VRChat microphone, and I hear the game in my IEMs. My backing track works the same as anywhere else.
The one downside to this setup is that my lip sync will also follow my instruments and backing track, but there's not a lot I can do about that aside from adding face tracking to my VR setup.
If I want to record my performance, I launch OBS and set it to record. OBS is not involved in the signal chain going to the world at all.
#### Streamed performances
Most streamed performances involve a Discord voice call for coordinating between the show runners and the performers. This is where the "control room" channel comes in; I set Discord's voice to use my VR headset's microphone as input, and Playback 5+6 as speakers. This way the voice chat only hears my voice (rather than all of my instruments), and I can hear anything they say on my IEMs. The Discord call gets recorded to track 5, with the left channel being me and the right channel being everyone else.
Otherwise, my audio setup is as follows:
* System audio to Playback 3+4
* VRChat microphone is the VR headset mic (so it gets clean lipsync)
* And I set VRChat's "microphone output level" to 0%, so that people (and my camera) can see my lips move but the audio doesn't go out into the world
As far as running the stream itself goes, typically I stream either to my Owncloud instance or to VRCDN depending on the needs of the show. Larger shows provide their own streaming ingest.
* Owncloud lets me serve an absolute crapton of viewers (thanks in part to my overly-complicated CDN setup), but it's not allowed as a stream source by default in VRChat so people need to enable untrusted URLs. It also tends to be pretty high in latency, usually on the order of 6-10 seconds.
* VRCDN limits me to 40 concurrent viewers, but the latency is pretty low (usually 1-2 seconds). This is fine for smaller shows, and many showrunners will restream my VRCDN stream into the world with their own visuals overlaid on top anyway, which also adds some latency.
* Larger music festivals (such as VRelium) will provide their own ingest server and stream management.
#### Editing recordings
And now the really nice thing about this setup is that I can do some audio editing and remixing in retrospect. I do most of my video editing in DaVinci Resolve, which has pretty good multichannel audio support.
By default, the video will be pulled in with the following stereo audio tracks:
Track | Channels | Contents
---|---|---
1 | 1+2 | Live mix
2 | 3+4 | Mic (left) + guitar (right)
3 | 5+6 | Piano + reverb
4 | 7+8 | Backing track
5 | 9+10 | Headset mic (left) + Discord call (right)
6 | 11+12 | Game audio/audience
Typically what I'll do is separate out the audio tracks from the video, then shift track 6 back to compensate for the latency between me and the audience. For mic-boost performances this isn't much (usually under a second) and can usually be ignored, but for streamed performances it will be multiple seconds (often on the order of 20 or more!). Compensating for it is especially important during those magical times when people either respond to my banter or sing along with me! (The latter happens with the call-and-response bits in Safety In Numbers, and it fills me with warm fuzzies every time.)
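For what it's worth, the shift amount doesn't have to be found purely by ear: since the audience track contains a delayed copy of the live mix, cross-correlating the two yields the latency directly. A rough sketch, assuming both tracks are already loaded as mono numpy arrays at the same sample rate (a toy 1 kHz rate keeps the example fast):

```python
import numpy as np

def find_delay(reference, delayed, sample_rate):
    """Estimate how far `delayed` lags `reference`, in seconds,
    from the peak of their cross-correlation."""
    corr = np.correlate(delayed, reference, mode="full")
    lag = np.argmax(corr) - (len(reference) - 1)
    return lag / sample_rate

# Synthetic check: delay a noise burst by exactly 1.5 seconds.
rng = np.random.default_rng(0)
live = rng.standard_normal(2000)                       # 2 s of "live mix"
audience = np.concatenate([np.zeros(1500), live])      # same audio, 1.5 s late
offset = find_delay(live, audience, sample_rate=1000)  # → 1.5
```

(`np.correlate` is brute-force, so for real multi-minute 48 kHz tracks you'd want an FFT-based correlation instead, but the principle is the same.)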
If the track 1 (live mix) audio is fine, I'll use it and track 6 directly, and mute the other tracks. However, sometimes I need to get fancy and change the mix in retrospect. In that case, I'll change my audio tracks as follows:
* Audio 1: Stereo, using channels 1+2 (live mix)
* Audio 2: Mono, using channel 3 (voice)
* Audio 3: Mono, using channel 4 (guitar)
* Audio 4: Stereo, using channels 5+6 (reverb, and piano if I happened to use it)
* Audio 5: Stereo, using channels 11+12 (game audio)
* Audio 6: Stereo, using channels 7+8 (backing track)
In this situation, I'll mute tracks 2-4, and use track 1 to line up tracks 5 and 6, which will have differing amounts of latency. Then I'll mute track 1 and unmute 2-4, and then adjust my recorded mix as necessary.
Track 6 in particular needs to be lined up pretty carefully, as OBS captures outputs with no latency at all, but inputs get about 300ms of latency due to limitations in Windows audio. In theory I could have OBS add 300ms or so of latency to the output capture, but it's fiddly and I'd still need to adjust things anyway, so I might as well just do it once when I edit.[4]
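For reference, compensating a fixed capture offset is just arithmetic on sample counts; at 48 kHz, 300ms works out to 14,400 samples. A minimal sketch (hypothetical helper names, not part of my actual workflow):

```python
import numpy as np

def trim_input_latency(track, seconds=0.3, sample_rate=48000):
    """Input captures arrive ~300ms late, so dropping that much from
    the start lines them up with the zero-latency output captures."""
    return track[round(seconds * sample_rate):]

def delay_track(track, seconds, sample_rate=48000):
    """The complementary fix: pad an output capture with leading silence."""
    pad = np.zeros(round(seconds * sample_rate), dtype=track.dtype)
    return np.concatenate([pad, track])

# 300 ms at 48 kHz is 14,400 samples:
n_skipped = 48000 - len(trim_input_latency(np.zeros(48000)))  # → 14400
```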
### Backing track/playlist setup
When I'm performing there's a lot of stuff to keep track of. If I'm only doing 1-2 songs I can load the .wav files into VLC and it's no big deal, but many of my shows are much more complicated and my ADHD brain can only hold so much stuff in my working memory.
So, I have a Final Cut Pro library that contains all of my backing tracks; some songs have multiple versions available (e.g. with and without guitar mixed in, or album vs. live versions). In the library files I also have brief version notes and the dominant key signature, and with this I can quickly put together a set list with a reasonable progression that fits the time constraints.
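The kind of selection this enables can be sketched in a few lines. Here's a hypothetical, much cruder version of the process (all song data below is invented): given durations and keys, a greedy pass fills a time slot while loosely preferring key continuity:

```python
# Hypothetical set-list helper; the song table is made up for illustration.
songs = [
    {"title": "Song A", "minutes": 4.0, "key": "D"},
    {"title": "Song B", "minutes": 3.5, "key": "A"},
    {"title": "Song C", "minutes": 5.0, "key": "E"},
    {"title": "Song D", "minutes": 3.0, "key": "D"},
]

def build_set(songs, slot_minutes):
    """Greedily fill a time slot, preferring to keep adjacent songs in
    the same key (a crude stand-in for 'a reasonable progression')."""
    remaining = sorted(songs, key=lambda s: s["minutes"], reverse=True)
    setlist, total = [], 0.0
    while remaining:
        last_key = setlist[-1]["key"] if setlist else None
        # First try a song in the same key as the previous one, if it fits.
        pick = next((s for s in remaining
                     if total + s["minutes"] <= slot_minutes
                     and s["key"] == last_key), None)
        if pick is None:  # otherwise, any song that fits the remaining time
            pick = next((s for s in remaining
                         if total + s["minutes"] <= slot_minutes), None)
        if pick is None:
            break
        setlist.append(pick)
        total += pick["minutes"]
        remaining.remove(pick)
    return setlist, total

chosen, total = build_set(songs, slot_minutes=12)
```

In practice I do this by eye from the version notes, but it's the same constraint-juggling either way.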
A few of the songs also have a lyric display baked in, because try as I might, I just can't memorize every song[5]. For some songs I'll use Croonify to prepare a synchronized lyric display (replacing Croonify's stem-separated audio with my own clean backing track), but for others I'll just put up some basic text with the necessary cues to keep me from messing up too badly.
When I prepare my set, I'll also put in a bit of visual stuff for my own reference, such as having it display the title of the next song or little notes like "2 songs left" or specific banter points I need to hit.
When I encode the video I'll just use Final Cut's "Export File (default)" to do a quick lossless encode, and then I'll use FFmpeg to encode the final video at a more useful bitrate, with:

```shell
ffmpeg -i "2026-02-30 example.mov" -b:a 320k \
    "$HOME/Sync/backing tracks/shows/2026-02-30 example.mp4"
```
Finally, I use Syncthing to automatically synchronize my `~/Sync` directory between my various computers, which is super handy. (It's also a _lot_ easier than dealing with network shares!)
When it's time to perform, I'll use SteamVR's desktop overlay function to float my VLC window in the world with me, so I'll always have my visual reference where I need it. (So if you see me looking downward a lot, it isn't _just_ me being introverted.)
When it comes to actually performing, I make sure that VLC has keyboard focus and then I can just press the space bar on my keyboard to start and, if necessary, pause the backing track.
### Doing sound checks
I also recently figured out a better way of doing sound checks; previously I'd go through a laborious process of recording the loopback interface into Audacity while performing parts of my set and then trying to make adjustments, which was super annoying.
Nowadays I've found a much better way; basically, I open two instances of VLC.
The first instance gets my backing track (and continues to play to Playback 1+2).
The second instance is set to play to Playback 5+6; I use "open capture device" to have it capture the loopback audio with a 10-second caching delay.
Then, I can unpause my backing track player and perform for 10 seconds, then pause the backing track while I listen to how it sounded. This gives me a much faster means of iterating on my adjustments and getting things sounding really good.
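That second VLC instance is essentially acting as a fixed delay line: audio goes into a buffer and comes back out ten seconds later. The same idea in miniature (a simple ring-buffer delay; hypothetical code, not anything VLC actually runs):

```python
import numpy as np

class DelayLine:
    """Fixed delay: each sample pushed in comes back out
    `delay_samples` later; silence until the buffer fills."""
    def __init__(self, delay_samples):
        self.buf = np.zeros(delay_samples)
        self.pos = 0

    def process(self, block):
        out = np.empty_like(block)
        for i, x in enumerate(block):
            out[i] = self.buf[self.pos]  # what went in `delay_samples` ago
            self.buf[self.pos] = x
            self.pos = (self.pos + 1) % len(self.buf)
        return out

# A 10-second delay at 48 kHz would be DelayLine(480000); tiny numbers here:
d = DelayLine(4)
first = d.process(np.array([1.0, 2.0, 3.0]))   # still silence
second = d.process(np.array([4.0, 5.0, 6.0]))  # earlier input emerges, 4 samples late
```

VLC's caching setting gives me the same behavior without writing any code, which is the whole appeal.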
1. Props to Niko Fox for turning me on to this specific wireless IEM unit.
2. I actually have all four going to it but Playback 7+8 isn't used for anything. Someday I might figure out a way of using it for a click track or additional audio cues, though.
3. I don't usually have the amplifier's speakers on when I'm performing but it's nice to have when I'm practicing. Additionally, the amplifier's line output and instrument input are no longer connected to anything, as they are not needed in this setup. I do still use the amplifier's power bus for my pedal board, however.
4. In theory one could use ASIO to mitigate the latency via the respective OBS plugin, but there's still going to be _some_ latency, and I err on the side of pragmatism since I'm going to have to adjust things anyway. Also, even if the latency can be eliminated from OBS's point of view, it's still going to be present for VRChat, so it's going to need to be adjusted between the visuals and the audio, and that is, in my experience, much harder to do well.
There's also the issue of ASIO often requiring exclusive access to the device, which means it might not even work in this situation to begin with.
So basically, ASIO _might_ work but I haven't tried it, nor have I seen any compelling reason to.
5. Although I'm proud to say that most of the songs I perform I do have completely memorized, and the more I perform songs I'm not off-book on yet, the closer I get to getting there. Assistive technology FTW.
https://sockpuppet.band/blog/1302-VR-performance-setup-2.0