kwaroran cba3ff802c Add selectable tokenizer supports on Ooba (#281)
# PR Checklist
- [ ] Did you check if it works normally in all models? *Ignore this
when it doesn't use models.*
- [ ] Did you check if it works normally in all of the web, local, and
node-hosted versions? If it doesn't, did you block it in those versions?
- [ ] Did you add a type def?

# Description
I made simple changes to the code that allow users to choose a tokenizer.

As I wrote in https://github.com/kwaroran/RisuAI/issues/280, differences
between tokenizers cause errors when using Mistral-based models.


![image](https://github.com/kwaroran/RisuAI/assets/62899533/3eb07735-874f-46d0-bc0c-c92a32ef927b)
As I'm not good at JavaScript, I implemented this simply by writing out
the tokenizer model names and selecting one in the tokenizer.ts file; a
rough sketch of the idea follows.
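This is not the actual diff, just a minimal TypeScript sketch of the approach: the tokenizer model name is kept as a string setting, and tokenizer.ts picks the matching tokenizer from it. The `TokenizerModel` type, the loader functions, and the `db.tokenizerModel` setting name are placeholders, not the real RisuAI code.

```ts
// Minimal sketch of selecting a tokenizer by model name (placeholder names,
// not the actual RisuAI tokenizer.ts implementation).

type TokenizerModel = 'llama' | 'mistral'

// Hypothetical shape of a loaded tokenizer.
interface Tokenizer {
    encode(text: string): number[]
}

// Hypothetical loaders; in practice these would load the matching
// tokenizer data (e.g. a tokenizer.json / SentencePiece model).
declare function loadLlamaTokenizer(): Promise<Tokenizer>
declare function loadMistralTokenizer(): Promise<Tokenizer>

// Pick a tokenizer based on the user-selected model name.
export async function getTokenizer(model: TokenizerModel): Promise<Tokenizer> {
    switch (model) {
        case 'mistral':
            return await loadMistralTokenizer()
        case 'llama':
        default:
            return await loadLlamaTokenizer()
    }
}

// Usage (assumed setting name): token counts sent to Ooba then follow
// whichever tokenizer the user selected.
// const tokenizer = await getTokenizer(db.tokenizerModel)
// const promptTokens = tokenizer.encode(prompt).length
```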

I tested it on my node-hosted RisuAI and sent a long context to my own server.

![image](https://github.com/kwaroran/RisuAI/assets/62899533/5b1f22a0-5b1b-4472-a994-bfe5472ba159)
As a result, Ooba returned 15858 prompt tokens.


![image](https://github.com/kwaroran/RisuAI/assets/62899533/6d4c2185-07c9-4de1-8460-0983b6e45141)
And when I tested with the official tokenizer implementations, there was
a difference of about 1k tokens between the LLaMA tokenizer and the
Mistral tokenizer.

So I think adding this option will help users use oobabooga with fewer
errors.