omz:forum



    include standalone ObjC function

    Pythonista
    • JonB
      JonB last edited by

      By the way, the answer at the bottom of that Stack Overflow post is what I've been playing around with... but the mixer is screwing up the inputNode, since the formats are incompatible.

      • daltonb
        daltonb last edited by daltonb

        No worries man, appreciate the help. Any chance you could just post that casting snippet for now? That sounds tricky for me.

        As to the second point... any reason not to just add the processing code to the tap block that updates the recognition request, instead of adding a mixer?

        • JonB
          JonB last edited by

          Sorry, on my phone, away from my iPad... but yes, you get access to the buffer in the handler, and can compute the meter directly there before passing it on to the recognizer.

          The one issue is that iOS doesn't seem to respect the requested buffer size -- instead giving us 16535 samples (about 0.375 sec) -- so you only get new data a few times per second.
          There is in theory a way to request fewer samples (and thus a faster call rate and lower latency) using the lower-level audio unit API, but I can't seem to get that working...
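As a sanity check on those numbers (assuming the usual 44.1 kHz hardware sample rate, which the post does not state explicitly), a 16535-sample buffer does span roughly 0.375 s:

```python
SAMPLE_RATE = 44100  # assumed hardware sample rate (Hz)

def buffer_duration(frames, sample_rate=SAMPLE_RATE):
    """Seconds of audio represented by `frames` samples."""
    return frames / sample_rate

# the ~16535-sample default buffer observed above
print(round(buffer_duration(16535), 3))  # 0.375
```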

          • daltonb
            daltonb last edited by

            OK, I'll try to figure out the right casting call in the meantime. Yeah, that is annoying; however, from that Stack Overflow post I've verified that calling buffer.setFrameLength(1024) succeeds in speeding up the sampling rate significantly after the first long (0.375 s) sample. I haven't checked yet whether I can update that before the first sample, but it shouldn't matter too much for my purposes.

            • JonB
              JonB last edited by JonB

        import numpy as np
        from objc_util import ObjCInstance

        def handler(_cmd, obj1_ptr, obj2_ptr):
            # obj1_ptr -> AVAudioPCMBuffer:
            #   a buffer of audio captured from the output of an AVAudioNode
            # obj2_ptr -> AVAudioTime:
            #   the time the buffer was captured
            if obj1_ptr:
                obj1 = ObjCInstance(obj1_ptr)
                # print('length:', obj1.frameLength(), 'sample', ObjCInstance(obj2_ptr).sampleTime())
                # print('format:', obj1.format())
                data = obj1.floatChannelData().contents
                # wraps the ObjC buffer without copying; if you want to use
                # the data outside the handler, use .copy()
                data_np = np.ctypeslib.as_array(obj=data, shape=(obj1.frameLength(),))
                power = np.sqrt(np.mean(np.square(data_np)))
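Off-device, the RMS ("power") computation at the end of the handler can be exercised with synthetic data; a numpy-only sketch with no ObjC dependencies:

```python
import numpy as np

def rms_power(samples):
    """Root-mean-square power of a float sample buffer."""
    return float(np.sqrt(np.mean(np.square(samples))))

# a full-scale square wave has RMS 1.0; silence has RMS 0.0
print(rms_power(np.array([1.0, -1.0, 1.0, -1.0])))  # 1.0
print(rms_power(np.zeros(1024)))                    # 0.0
```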
              • JonB
                JonB last edited by

                You would then, in the handler, set an attribute on your view with the power, which will get used next frame. (Or better yet, don't use update in the view; instead trigger the draw from the handler, thus ensuring you only draw when updated info is available.)

                If you want 60Hz frame rate, you'd want the frameLength to be 735 samples.
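The 735 figure is just the sample rate divided by the target frame rate (44100 / 60); a one-liner to compute it for other rates:

```python
def frames_for_fps(fps, sample_rate=44100):
    """Frame length so each tap callback spans one display frame."""
    return sample_rate // fps

print(frames_for_fps(60))  # 735
print(frames_for_fps(30))  # 1470
```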

                • daltonb
                  daltonb last edited by

                  MONEY. This works great for accessing the sound data!!

                  Sadly, upon further testing, setting the frame length to 1024 makes the speech recognition results very poor. Not sure why... any ideas? Do you think the speech recognizer is expecting the original frame length somehow? For instance, I say "Hello" and it sometimes outputs "LOL", so maybe the input is getting clipped.

                  The frame length on my phone is actually 4410 by default, which is OK, but I guess this is a platform-specific number.

                  • JonB
                    JonB last edited by

                    I have not tried the frameLength trick, but I wonder if the copy is having trouble keeping up, resulting in dropouts. You could write those samples to a .wav file, then listen to it using quicklook, to see if the quality is suffering. If you comment out the numpy stuff, does the lower frame length still cause poor results? If not, there are some techniques we can use to speed up that processing.
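Dumping tapped samples to a .wav for listening can be done with the standard-library wave module; a minimal sketch assuming mono float samples in [-1, 1] at 44.1 kHz (the filename is arbitrary):

```python
import struct
import wave

def write_wav(path, samples, sample_rate=44100):
    """Write mono float samples in [-1, 1] as 16-bit PCM."""
    with wave.open(path, 'wb') as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit
        w.setframerate(sample_rate)
        pcm = b''.join(struct.pack('<h', int(max(-1.0, min(1.0, s)) * 32767))
                       for s in samples)
        w.writeframes(pcm)

write_wav('tap_check.wav', [0.0] * 4410)  # 0.1 s of silence
```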

                    Another possibility would be to reduce the sample rate (8000, 11025, or 22050 Hz), which should ease the processor burden.
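Since 22050 and 11025 Hz are integer divisors of 44.1 kHz, the reduction can be illustrated with simple decimation (a real resampler would low-pass filter first to avoid aliasing; this sketch is numpy-only and not tied to AVAudioEngine):

```python
import numpy as np

def decimate(samples, factor):
    """Naive downsample: keep every `factor`-th sample.
    (No anti-aliasing filter -- illustration only.)"""
    return samples[::factor]

x = np.zeros(44100, dtype=np.float32)  # one second at 44.1 kHz
print(len(decimate(x, 2)))  # 22050 -> 22050 Hz
print(len(decimate(x, 4)))  # 11025 -> 11025 Hz
```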

                    • JonB
                      JonB last edited by JonB

                      This may be obvious, but be sure to set the frameLength prior to passing the buffer to the recognizer; otherwise it will be getting duplicate data.

                      What happens, I think, is that the buffer contains all of the samples, including the initial 0.375 sec or so. If you change the frame length to 1024, you are telling the engine how many samples you consumed -- it wants to keep the buffer the same size, and never skip, so it calls you sooner next time, with everything shifted left and new samples appended at the end. The least latency would be in those end samples. This takes the latency down from 0.375 s for me to maybe 20-30 ms.
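That shift-left bookkeeping can be modeled off-device in plain Python (this is not AVAudioEngine, just a toy illustration of the fixed-capacity, consume-from-front behavior described above):

```python
from collections import deque

class TapBufferModel:
    """Toy model of the tap buffer: fixed capacity, consume from the front."""
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)

    def fill(self, samples):
        # old frames shift left as new ones are appended at the end
        self.buf.extend(samples)

    def consume(self, n):
        return [self.buf.popleft() for _ in range(n)]

model = TapBufferModel(capacity=16535)
model.fill(range(16535))                # first callback: the full ~0.375 s buffer
first = model.consume(1024)             # report 1024 frames consumed
model.fill(range(16535, 16535 + 1024))  # engine tops the buffer back up
print(first[:3], len(model.buf))        # [0, 1, 2] 16535
```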

                      
                      import numpy as np
                      from objc_util import ObjCClass, ObjCInstance
                      AVAudioTime = ObjCClass('AVAudioTime')

                      def handler(_cmd, buffer_ptr, samptime_ptr):
                          if buffer_ptr:
                              buffer = ObjCInstance(buffer_ptr)
                              # a way to get the sample time in sec of the start of the
                              # buffer, comparable to time.perf_counter; you can difference
                              # these to see latency to the start of the buffer
                              hostTimeSec = AVAudioTime.secondsForHostTime_(ObjCInstance(samptime_ptr).hostTime())

                              # you can also check for skips by looking at sampleTime(),
                              # which should always increment by whatever you set the
                              # frameLength to... if it jumps by more than that, your
                              # other processing is taking too long

                              # this just sets up pointers that numpy can read... no actual read yet
                              data = buffer.floatChannelData().contents
                              data_np = np.ctypeslib.as_array(obj=data, shape=(buffer.frameLength(),))

                              # take the LAST N samples for use in visualization,
                              # i.e. the most recent, with the least latency
                              update_path(data_np[-1024:])

                              # this tells the engine how many samples we consumed...
                              # next time, we will get samples [1024:] along with 1024 new samples
                              buffer.setFrameLength_(1024)

                              # be sure to append the buffer AFTER setting the frameLength,
                              # otherwise you will keep feeding it repeated portions of the data
                              requestBuffer.append(buffer)
                      
                      • daltonb
                        daltonb last edited by

                        Hey @JonB, sorry for the slow response; this did help me get over a hump though. I think my frameCapacity is less than yours, which is apparently the upper limit for frameLength... setting the frame length to 2048 worked well. I'm planning to post a first crack at a live speech recognition module soon.
