If you’re developing with OpenAI in Ruby, you’ll eventually need to implement a content filter before going into production. I covered how to implement the content filter in JavaScript in a previous post, so I wanted to do the same for Ruby. The snippet below returns a label of "0", "1", or "2", representing safe, sensitive, or unsafe content, respectively.

Before using this, please make sure the recommended toxic_threshold value has not changed. You can find the current value in the official OpenAI docs.

Snippet

This snippet is also available on GitHub here.
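The code uses the ruby-openai gem and assumes the client can find your API key. A minimal setup sketch follows; note that the access_token keyword and the environment variable name are assumptions based on the gem's documentation at the time, so check the README for your version:

# Gemfile
gem 'ruby-openai'

# Pass your key explicitly if the gem's default lookup (assumed to be
# ENV['OPENAI_ACCESS_TOKEN']; verify for your version) doesn't fit your setup:
# CLIENT = OpenAI::Client.new(access_token: ENV['OPENAI_ACCESS_TOKEN'])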

require 'ruby/openai'

CLIENT = OpenAI::Client.new

def content_filter(prompt)
  toxic_threshold = -0.355 # make sure required threshold hasn't changed https://beta.openai.com/docs/engines/content-filter
  # Note the closing pipe in <|endoftext|>: the filter expects this exact wrapper format.
  wrapped_prompt = "<|endoftext|>#{prompt}\n--\nLabel:"
  response = CLIENT.completions(engine: 'content-filter-alpha',
                                parameters: { prompt: wrapped_prompt,
                                              temperature: 0, max_tokens: 1, top_p: 0, logprobs: 10 }).parsed_response
  output_label = response['choices'][0]['text']
  # If the model labels the content unsafe ("2"), verify it's confident:
  # a logprob for "2" below the threshold means we fall back to "0" or "1".
  if output_label == '2'
    logprobs = response['choices'][0]['logprobs']['top_logprobs'][0]
    if logprobs['2'] < toxic_threshold
      logprob_0 = logprobs['0']
      logprob_1 = logprobs['1']

      # Prefer whichever of "0" and "1" is more probable; if only one of
      # them appears in the top logprobs, use that one.
      if logprob_0 && logprob_1
        output_label = logprob_0 >= logprob_1 ? '0' : '1'
      elsif logprob_0
        output_label = '0'
      elsif logprob_1
        output_label = '1'
      end
    end
  end

  # Guard against unexpected output: default to unsafe ("2") if the label
  # isn't one of the expected values.
  output_label = '2' unless %w[0 1 2].include?(output_label)

  output_label
end
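
To use it, run the filter on user-supplied text before sending it to a completion endpoint and branch on the label. A quick sketch, where the example prompt and the puts calls are just placeholders for your own handling logic:

case content_filter('Tell me a story about a friendly robot.')
when '0'
  puts 'Safe: proceed with the completion request'
when '1'
  puts 'Sensitive: proceed with caution or flag for review'
else
  puts 'Unsafe: block the request'
end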