Add selectable tokenizer support for Ooba (#281)

# PR Checklist
- [ ] Did you check if it works normally in all models? *Ignore this
if it doesn't use models.*
- [ ] Did you check if it works normally in all of the web, local, and node
hosted versions? If it doesn't, did you block it in those versions?
- [ ] Did you add a type def?

# Description
I made a small code change that lets the user choose the tokenizer.

As I wrote in https://github.com/kwaroran/RisuAI/issues/280, differences
between tokenizers cause errors when using Mistral-based models.


![image](https://github.com/kwaroran/RisuAI/assets/62899533/3eb07735-874f-46d0-bc0c-c92a32ef927b)
As I'm not good at JavaScript, I implemented this simply: the user writes
the name of the tokenizer model, and the tokenizer.ts file selects the matching one.
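
As a rough illustration of the name-based selection described above, here is a minimal sketch. It is not the actual code in tokenizer.ts: `TokenCounter`, `counters`, `selectTokenizer`, and the stand-in counting functions are all hypothetical names, and the real change presumably calls the actual llama and mistral tokenizers instead.

```typescript
// Hypothetical sketch: none of these names are taken from RisuAI's tokenizer.ts.
type TokenCounter = (text: string) => number;

// Stand-in counters for illustration only; real code would call the
// actual llama / mistral tokenizer implementations here.
const llamaCounter: TokenCounter = (text) =>
    text.split(/\s+/).filter(Boolean).length;
const mistralCounter: TokenCounter = (text) => Math.ceil(text.length / 4);

// Registry keyed by the name the user types into the "tokenizer" field.
const counters = new Map<string, TokenCounter>([
    ["llama", llamaCounter],
    ["mistral", mistralCounter],
]);

// Pick a counter by name; fall back to llama when the field is empty
// or the name is unknown (the fallback choice is an assumption).
function selectTokenizer(name?: string): TokenCounter {
    const key = (name ?? "").trim().toLowerCase();
    return counters.get(key) ?? llamaCounter;
}
```

Falling back to a default keeps the optional input safe to leave blank, which matches how the other `reverseProxyOobaArgs` fields are optional.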

I tested it on my node-hosted RisuAI by sending a long context to my own server.

![image](https://github.com/kwaroran/RisuAI/assets/62899533/5b1f22a0-5b1b-4472-a994-bfe5472ba159)
As a result, ooba reported 15858 prompt tokens.


![image](https://github.com/kwaroran/RisuAI/assets/62899533/6d4c2185-07c9-4de1-8460-0983b6e45141)
And when I tested with the official tokenizer implementations, the llama
tokenizer and the mistral tokenizer differed by about 1k tokens.

So I think adding this option will help users use oobabooga with fewer
errors.
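
To show why a gap of about 1k tokens can matter, here is a small sketch of a context budget check. It assumes the error comes from exceeding the model's context window and that the local count is the lower one; neither is stated in this PR. The helper `fitsContext`, the 16384-token context size, the 1000-token reply budget, and the local estimate are all hypothetical; only 15858 comes from the test above.

```typescript
// Hypothetical helper: does the prompt plus the reserved reply fit the window?
function fitsContext(
    promptTokens: number,
    replyTokens: number,
    maxContext: number,
): boolean {
    return promptTokens + replyTokens <= maxContext;
}

const maxContext = 16384; // assumed context size, not from the PR
const replyTokens = 1000; // assumed reply budget, not from the PR
const serverCount = 15858; // prompt tokens reported by ooba in the test above
const localEstimate = serverCount - 1000; // hypothetical local count, about 1k lower

// With the lower local estimate the request looks safe...
console.log(fitsContext(localEstimate, replyTokens, maxContext)); // true
// ...but with the server's own count it no longer fits.
console.log(fitsContext(serverCount, replyTokens, maxContext)); // false
```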
This commit is contained in:
kwaroran
2024-01-06 19:16:58 +09:00
committed by GitHub
3 changed files with 15 additions and 2 deletions

@@ -61,6 +61,8 @@
<OptionalInput marginBottom={true} bind:value={$DataBase.reverseProxyOobaArgs.chat_instruct_command} />
{/if}
{/if}
<span class="text-textcolor">tokenizer</span>
<OptionalInput marginBottom={true} bind:value={$DataBase.reverseProxyOobaArgs.tokenizer} />
<span class="text-textcolor">min_p</span>
<OptionalInput marginBottom={true} bind:value={$DataBase.reverseProxyOobaArgs.min_p} numberMode />
<span class="text-textcolor">top_k</span>