{"id":592844,"date":"2025-08-25T12:01:26","date_gmt":"2025-08-25T11:01:26","guid":{"rendered":"https:\/\/blog.jetbrains.com\/?post_type=pycharm&#038;p=592844"},"modified":"2026-02-13T14:40:56","modified_gmt":"2026-02-13T13:40:56","slug":"fine-tuning-and-deploying-gpt-models-using-hugging-face-transformers","status":"publish","type":"pycharm","link":"https:\/\/blog.jetbrains.com\/pycharm\/2025\/08\/fine-tuning-and-deploying-gpt-models-using-hugging-face-transformers\/","title":{"rendered":"A Practical Guide to Fine-Tuning and Deploying GPT Models Using Hugging Face Transformers"},"content":{"rendered":"\n<p>Hugging Face is currently a household name for machine learning researchers and enthusiasts. One of their biggest successes is <a href=\"https:\/\/huggingface.co\/docs\/transformers\/en\/index\" target=\"_blank\" rel=\"noopener\">Transformers<\/a>, a model-definition framework for machine learning models in text, computer vision, audio, and video. Because of the vast repository of state-of-the-art machine learning models available on the <a href=\"https:\/\/huggingface.co\/models\" target=\"_blank\" rel=\"noopener\">Hugging Face Hub<\/a> and the compatibility of Transformers with the majority of training frameworks, it is widely used for inference and model training.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Why fine-tuning GPT models matters<\/h2>\n\n\n\n<p>Fine-tuning AI models is crucial for tailoring their performance to specific tasks and datasets, enabling them to achieve higher accuracy and efficiency compared to using a general-purpose model. By adapting a pre-trained model, fine-tuning reduces the need for training from scratch, saving time and resources. 
It also allows for better handling of specific formats, nuances, and edge cases within a particular domain, leading to more reliable and tailored outputs.<br><br>In this blog post, we will use the Hugging Face Transformers library to fine-tune a GPT model on mathematical reasoning data so that it handles math questions better.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Using Hugging Face models in PyCharm<\/h2>\n\n\n\n<p>After <a href=\"https:\/\/www.jetbrains.com\/pycharm\/download\/\" target=\"_blank\" rel=\"noopener\">downloading PyCharm<\/a>, we can easily browse and add models from Hugging Face. In a new Python file, from the <em>Code<\/em> menu at the top, select <em>Insert HF Model<\/em>.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"946\" height=\"1070\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2025\/08\/image-40.png\" alt=\"Using models from Hugging Face\" class=\"wp-image-594074\"\/><\/figure>\n\n\n\n<p>In the menu that opens, you can browse models by category or start typing in the search bar at the top. When you select a model, you can see its description on the right.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"1600\" height=\"923\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2025\/08\/image-41.png\" alt=\"Explore models from Hugging Face\" class=\"wp-image-594085\"\/><\/figure>\n\n\n\n<p>When you click <em>Use Model<\/em>, you will see a code snippet added to your file. 
And that&#8217;s it \u2013 you&#8217;re ready to start using your Hugging Face model.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"1600\" height=\"312\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2025\/08\/image-42.png\" alt=\"Use Hugging Face models in PyCharm\" class=\"wp-image-594096\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Understanding GPT models<\/h2>\n\n\n\n<p>GPT models are very popular on the <a href=\"https:\/\/huggingface.co\/models\" target=\"_blank\" rel=\"noopener\">Hugging Face Hub<\/a>, but what are they? GPTs (generative pre-trained transformers) are pre-trained language models that process natural language and generate high-quality text. They are mainly used in tasks related to textual entailment, question answering, semantic similarity, and document classification. The most famous example is <a href=\"https:\/\/openai.com\/index\/chatgpt\/\" target=\"_blank\" rel=\"noopener\">ChatGPT, created by OpenAI<\/a>.<\/p>\n\n\n\n<p>Many OpenAI GPT models are available on the <a href=\"https:\/\/huggingface.co\/models\" target=\"_blank\" rel=\"noopener\">Hugging Face Hub<\/a>, and we will learn how to use these models with Transformers, fine-tune them with our own data, and deploy them in an application.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>What are transformers and GPT models?<\/strong><\/h2>\n\n\n\n<p>Transformers are a class of deep learning models designed to understand and generate sequences of data, most commonly text. At the heart of a transformer is the attention mechanism, which allows the model to weigh the importance of different words in a sequence relative to each other.<\/p>\n\n\n\n<p>While decoder-only models such as GPTs process tokens autoregressively from left to right during both training and inference, they still use self-attention over all previously generated tokens to build context. 
This attention-based approach enables the model to consider relevant information from across the whole available input, which helps it handle long-range dependencies and complex language patterns.<\/p>\n\n\n\n<p>GPT models are a specific type of transformer that uses a decoder-only architecture. This means they are trained to predict the next token in a sequence based solely on what has come before. By stacking multiple attention layers and feed-forward networks, GPT models learn rich representations of language during pre-training on large text corpora.<\/p>\n\n\n\n<p>The core principle behind a GPT is to pre-train on vast amounts of general text, then fine-tune on more specific data to adapt it to particular tasks. This enables GPT models to perform text generation, summarization, and domain-specific reasoning after fine-tuning.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Benefits of transformers for GPT fine-tuning<\/h2>\n\n\n\n<p>Transformers, together with the other libraries from Hugging Face, offers high-level tools for fine-tuning sophisticated deep learning models. Instead of requiring you to fully understand a given model\u2019s architecture and tokenization method, these tools help make models \u201cplug and play\u201d with any compatible training data, while still allowing a large amount of customization in tokenization and training.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Why use Hugging Face transformers?<\/strong><\/h2>\n\n\n\n<p>Hugging Face Transformers provides a unified ecosystem for working with modern machine learning models, removing much of the complexity involved in training and deploying large language models (LLMs). 
Instead of building training loops, evaluation logic, and model loading from scratch, developers can rely on high-level abstractions that still allow for deep customization when needed.<\/p>\n\n\n\n<p>One of the key components is the <a href=\"https:\/\/huggingface.co\/docs\/transformers\/en\/main_classes\/trainer\" target=\"_blank\" rel=\"noopener\">Trainer API<\/a>, which handles the full training lifecycle. Instead of writing custom training loops, you can define your TrainingArguments, pass your training and evaluation datasets into Trainer, and start training with just a few lines of code. This makes fine-tuning large models far more approachable, especially when experimenting with different hyperparameters or datasets. Hugging Face also offers a rich datasets library, which simplifies loading, preprocessing, and splitting large-scale datasets directly from the Hub.<\/p>\n\n\n\n<p>Tokenizers are tightly integrated with each model architecture, ensuring consistent preprocessing and efficient handling of text at scale. For performance and scalability, the <a href=\"https:\/\/huggingface.co\/docs\/accelerate\/en\/index\" target=\"_blank\" rel=\"noopener\">Accelerate library<\/a> enables easy training across CPUs, GPUs, and multiple devices without significant code changes.<\/p>\n\n\n\n<p>Beyond the tooling, the <a href=\"https:\/\/huggingface.co\/docs\/transformers\/en\/community\" target=\"_blank\" rel=\"noopener\">Hugging Face community<\/a> is a major advantage. 
Thousands of open-source models, datasets, and examples are shared on the Hub, allowing teams to build on proven work, learn from others, and iterate faster.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Running transformers with Hugging Face<\/h2>\n\n\n\n<p>To get a closer look at<em> <\/em>Transformers in action, let\u2019s see how we can use it to interact with a GPT model.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Inference using a GPT pretrained model with a pipeline<\/h3>\n\n\n\n<p>After selecting and adding the OpenAI GPT-2 model to the code, this is what we\u2019ve got:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">from transformers import pipeline\n\n\npipe = pipeline(\"text-generation\", model=\"openai-community\/gpt2\")<\/pre>\n\n\n\n<p>Before we can use it, we need to make a few preparations. First, we need to install a machine learning framework. In this example, we chose <a href=\"https:\/\/pytorch.org\/get-started\/locally\/\" target=\"_blank\" rel=\"noopener\">PyTorch<\/a>. You can install it easily via the <em>Python Packages<\/em> window in PyCharm.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"920\" height=\"654\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2025\/08\/image-43.png\" alt=\"Install PyTorch in PyCharm\" class=\"wp-image-594107\"\/><\/figure>\n\n\n\n<p>Then we need to install Transformers<em> <\/em>using the `torch` option. 
You can do that by using the terminal \u2013 open it using the button on the left or use the <em>\u2325 F12<\/em> (macOS) or <em>Alt + F12<\/em> (Windows) hotkey.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"838\" height=\"502\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2025\/08\/image-44.png\" alt=\"Install Transformers in PyCharm's terminal\" class=\"wp-image-594118\"\/><\/figure>\n\n\n\n<p>In the terminal, since we are using uv, we use the following commands to add it as a dependency and install it:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"bash\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">uv add \"transformers[torch]\"\nuv sync<\/pre>\n\n\n\n<p>If you are using pip:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"bash\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">pip install \"transformers[torch]\"<\/pre>\n\n\n\n<p>We will also install a few more libraries that we will need later: python-dotenv, datasets, notebook, and ipywidgets. You can use either of the methods above to install them.<br>After that, it is a good idea to run the model on a GPU, if you have one, to speed it up. Depending on what you have on your machine, you can do this by setting the device parameter in pipeline<em>. 
<\/em>Since I am using a Mac M2 machine, I can set<code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\"> device=\"mps\"<\/code> like this:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">pipe = pipeline(\"text-generation\", model=\"openai-community\/gpt2\", device=\"mps\")<\/pre>\n\n\n\n<p>If you have CUDA GPUs you can also set <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">device=\"cuda\"<\/code>.<\/p>\n\n\n\n<p>Now that we\u2019ve set up our pipeline, let\u2019s try it out with a simple prompt:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">from transformers import pipeline\n\n\npipe = pipeline(\"text-generation\", model=\"openai-community\/gpt2\", device=\"mps\")\n\n\nprint(pipe(\"A rectangle has a perimeter of 20 cm. 
If the length is 6 cm, what is the width?\", max_new_tokens=200))<\/pre>\n\n\n\n<p>Run the script with the <em>Run<\/em> button (<img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2025\/09\/AD_4nXf6ZDm7vSGyFlO0DzXegK6WP9JxsStUiJA-bkRZ0mwPsUsmn8M70emV5Sr8f17-fEK6z9V1EQKWEm3RPHdT8n8uqG18faVmQn5y09psVInQLU0CZQKXAEg2q7m7AOsh4hPU7G8gcQ.png\" width=\"30\" height=\"23\">) at the top:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"1050\" height=\"282\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2025\/08\/image-45.png\" alt=\"Run the script in PyCharm\" class=\"wp-image-594129\"\/><\/figure>\n\n\n\n<p>The result will look something like this:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">[{'generated_text': 'A rectangle has a perimeter of 20 cm. If the length is 6 cm, what is the width?nnA rectangle has a perimeter of 20 cm. If the length is 6 cm, what is the width? A rectangle has a perimeter of 20 cm. If the width is 6 cm, what is the width? A rectangle has a perimeter of 20 cm. If the width is 6 cm, what is the width? A rectangle has a perimeter of 20 cm. If the width is 6 cm, what is the width?nnA rectangle has a perimeter of 20 cm. If the width is 6 cm, what is the width? A rectangle has a perimeter of 20 cm. If the width is 6 cm, what is the width? A rectangle has a perimeter of 20 cm. If the width is 6 cm, what is the width? A rectangle has a perimeter of 20 cm. If the width is 6 cm, what is the width?nnA rectangle has a perimeter of 20 cm. If the width is 6 cm, what is the width? 
A rectangle has a perimeter'}]<\/pre>\n\n\n\n<p>There isn\u2019t much reasoning in this at all, only a bunch of nonsense.&nbsp;<\/p>\n\n\n\n<p>You may also see this warning:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"bash\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.<\/pre>\n\n\n\n<p>This is the default setting. You can also manually add it as below, so this warning disappears, but we don\u2019t have to worry about it too much at this stage.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">print(pipe(\"A rectangle has a perimeter of 20 cm. If the length is 6 cm, what is the width?\", max_new_tokens=200, pad_token_id=pipe.tokenizer.eos_token_id))<\/pre>\n\n\n\n<p>Now that we\u2019ve seen how GPT-2 behaves out of the box, let\u2019s see if we can make it better at math reasoning with some fine-tuning.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Loading and preparing a dataset for GPT fine-tuning<\/h3>\n\n\n\n<p>Before we work on the GPT model, we first need training data. Let\u2019s see how to get a dataset from the Hugging Face Hub.<\/p>\n\n\n\n<p>If you haven&#8217;t already, sign up for a Hugging Face account and <a href=\"https:\/\/huggingface.co\/docs\/hub\/security-tokens#user-access-tokens\" target=\"_blank\" rel=\"noopener\">create an access token<\/a>. We only need a `read` token for now. 
Store your token in a `.env` file, like so:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"bash\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">HF_TOKEN=your-hugging-face-access-token<\/pre>\n\n\n\n<p>We will use this <a href=\"https:\/\/huggingface.co\/datasets\/Cheukting\/math-meta-reasoning-cleaned\" target=\"_blank\" rel=\"noopener\">Math Reasoning Dataset<\/a>, which has text describing some math reasoning. We will fine-tune our GPT model with this dataset so it can solve math problems more effectively.<\/p>\n\n\n\n<p>Let\u2019s create a new Jupyter notebook, which we\u2019ll use for fine-tuning because it lets us run different code snippets one by one and monitor the progress.<\/p>\n\n\n\n<p>In the first cell, we use this script to load the dataset from the Hugging Face Hub:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">from datasets import load_dataset\nfrom dotenv import load_dotenv\nimport os\n\n\nload_dotenv()\ndataset = load_dataset(\"Cheukting\/math-meta-reasoning-cleaned\", token=os.getenv(\"HF_TOKEN\"))\ndataset<\/pre>\n\n\n\n<p>Run this cell (it may take a while, depending on your internet speed), which will download the dataset. 
When it\u2019s done, we can have a look at the result:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">DatasetDict({\n    train: Dataset({\n        features: ['id', 'text', 'token_count'],\n        num_rows: 987485\n    })\n})\n<\/pre>\n\n\n\n<p>If you are curious and want to have a peek at the data, you can do so in PyCharm. Open the <em>Jupyter Variables<\/em> window using the button on the right:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"1052\" height=\"740\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2025\/08\/image-46.png\" alt=\"Open Jupyter Variables in PyCharm\" class=\"wp-image-594140\"\/><\/figure>\n\n\n\n<p>Expand <em>dataset<\/em> and you will see the <em>View as DataFrame<\/em> option next to <em>dataset[\u2018train\u2019]<\/em>:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"980\" height=\"882\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2025\/08\/image-47.png\" alt=\"Jupyter Variables in PyCharm\" class=\"wp-image-594152\"\/><\/figure>\n\n\n\n<p>Click on it to take a look at the data in the <em>Data View<\/em> tool window:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"980\" height=\"1102\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2025\/08\/image-48.png\" alt=\"Data View tool in PyCharm\" class=\"wp-image-594163\"\/><\/figure>\n\n\n\n<p>Next, we will tokenize the text in the dataset:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">from 
transformers import GPT2Tokenizer\n\n\ntokenizer = GPT2Tokenizer.from_pretrained(\"openai-community\/gpt2\")\ntokenizer.pad_token = tokenizer.eos_token\n\n\ndef tokenize_function(examples):\n   return tokenizer(examples['text'], truncation=True, padding='max_length', max_length=512)\n\n\ntokenized_datasets = dataset.map(tokenize_function, batched=True)<\/pre>\n\n\n\n<p>Here we use the GPT-2 tokenizer and set the <code>pad_token<\/code> to be the <code>eos_token<\/code>, which is the token marking the end of a sequence. GPT-2 has no dedicated padding token, so we need to assign one before we can pad. After that, we tokenize the text with a function that truncates or pads every example to 512 tokens. It may take a while the first time you run it, but after that it will be cached and will be faster if you have to run the cell again.<\/p>\n\n\n\n<p>The dataset has almost 1 million rows for training. If you have enough computing power to process all of them, you can use them all. However, in this demonstration we\u2019re training locally on a laptop, so I&#8217;d better only use a small portion!<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">tokenized_datasets_split = tokenized_datasets[\"train\"].shard(num_shards=100, index=0).train_test_split(test_size=0.2, shuffle=True)\ntokenized_datasets_split<\/pre>\n\n\n\n<p>Here I take only 1% of the data (one shard out of 100), and then use <code>train_test_split<\/code> to split it into an 80% training set and a 20% test set:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">DatasetDict({\n    train: Dataset({\n        features: ['id', 'text', 'token_count', 'input_ids', 'attention_mask'],\n        num_rows: 7900\n    })\n    test: Dataset({\n        features: ['id', 'text', 'token_count', 
'input_ids', 'attention_mask'],\n        num_rows: 1975\n    })\n})\n<\/pre>\n\n\n\n<p>Now we are ready to fine-tune the GPT-2 model.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Step-by-step: fine-tuning a GPT model<\/h3>\n\n\n\n<p>In the next empty cell, we will set our training arguments:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">from transformers import TrainingArguments\ntraining_args = TrainingArguments(\n   output_dir='.\/results',\n   num_train_epochs=5,\n   per_device_train_batch_size=8,\n   per_device_eval_batch_size=8,\n   warmup_steps=100,\n   weight_decay=0.01,\n   save_steps = 500,\n   logging_steps=100,\n   dataloader_pin_memory=False\n)<\/pre>\n\n\n\n<p>Most of them are pretty standard for fine-tuning a model. However, depending on your computer setup, you may want to tweak a few things:<\/p>\n\n\n\n<ul>\n<li>Batch size \u2013 Finding the optimal batch size is important, since the larger the batch size is, the faster the training goes. However, there is a limit to how much memory is available for your CPU or GPU, so you may find there\u2019s an upper threshold.<\/li>\n\n\n\n<li>Epochs \u2013 Having more epochs causes the training to take longer. You can decide how many epochs you need.<\/li>\n\n\n\n<li>Save steps \u2013 Save steps determine how often a checkpoint will be saved to disk. 
If the training is slow and there is a chance that it will stop unexpectedly, then you may want to save more often (set this value lower).<\/li>\n<\/ul>\n\n\n\n<p>After we\u2019ve configured our settings, we will put the trainer together in the next cell:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">from transformers import GPT2LMHeadModel, Trainer, DataCollatorForLanguageModeling\n\n\n# Load the pretrained GPT-2 weights with a language modeling head\nmodel = GPT2LMHeadModel.from_pretrained(\"openai-community\/gpt2\")\n# mlm=False selects causal (next-token) language modeling instead of masked language modeling\ndata_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)\n\n\ntrainer = Trainer(\n   model=model,\n   args=training_args,\n   train_dataset=tokenized_datasets_split['train'],\n   eval_dataset=tokenized_datasets_split['test'],\n   data_collator=data_collator,\n)\n\n\ntrainer.train(resume_from_checkpoint=False)<\/pre>\n\n\n\n<p>We set `resume_from_checkpoint=False`, but you can set it to `True` to continue from the last checkpoint if the training is interrupted.<\/p>\n\n\n\n<p>After the training finishes, we will evaluate and save the model:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">trainer.evaluate(tokenized_datasets_split['test'])\ntrainer.save_model(\".\/trained_model\")<\/pre>\n\n\n\n<p>We can now use the trained model in the pipeline. 
Let\u2019s switch back to `model.py`, where we have used a pipeline with a pretrained model:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">from transformers import pipeline\n\n\npipe = pipeline(\"text-generation\", model=\"openai-community\/gpt2\", device=\"mps\")\n\n\nprint(pipe(\"A rectangle has a perimeter of 20 cm. If the length is 6 cm, what is the width?\", max_new_tokens=200, pad_token_id=pipe.tokenizer.eos_token_id))<\/pre>\n\n\n\n<p>Now let\u2019s change `model=\"openai-community\/gpt2\"` to `model=\".\/trained_model\"` and see what we get:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">[{'generated_text': \"A rectangle has a perimeter of 20 cm. If the length is 6 cm, what is the width?\\nAlright, let me try to solve this problem as a student, and I'll let my thinking naturally fall into the common pitfall as described.\\n\\n---\\n\\n**Step 1: Attempting the Problem (falling into the pitfall)**\\n\\nWe have a rectangle with perimeter 20 cm. The length is 6 cm. We want the width.\\n\\nFirst, I need to find the area under the rectangle.\\n\\nLet\u2019s set \\( A = 20 - 12 \\), where \\( A \\) is the perimeter.\\n\\n**Area under a rectangle:**  \\n\\[\\nA = (20-12)^2 + ((-12)^2)^2 = 20^2 + 12^2 = 24\\n\\]\\n\\nSo, \\( 24 = (20-12)^2 = 27 \\).\\n\\nNow, I\u2019ll just divide both sides by 6 to find the area under the rectangle.\\n\"}]<\/pre>\n\n\n\n<p>Unfortunately, it still does not solve the problem. However, it did come up with some mathematical formulas and reasoning that it didn\u2019t use before. 
If you want, you can try fine-tuning the model a bit more with the data we didn\u2019t use.<\/p>\n\n\n\n<p>After a look at evaluating and debugging the model, we will see how to deploy our fine-tuned model to API endpoints, using both the tools provided by Hugging Face and FastAPI.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Evaluating and debugging the model<\/strong><\/h2>\n\n\n\n<p>Fine-tuning a model is only part of the process. Evaluating and debugging its behavior is just as important, especially when working with generative models where errors are not always obvious.<\/p>\n\n\n\n<p>A simple first step is to run an evaluation using the Hugging Face Trainer. However, it\u2019s important to look beyond \u201cit ran successfully\u201d and measure whether the model is improving. Perplexity, the exponential of the average cross-entropy loss (the <code>eval_loss<\/code> value that <code>trainer.evaluate()<\/code> reports), is commonly used for language models. It measures how well the model predicts the next token, with lower values indicating better performance. For task-specific setups, you may also want to track accuracy or exact match scores, depending on how your evaluation data is structured.<\/p>\n\n\n\n<p>If your fine-tuned model still isn\u2019t solving problems reliably, here are a few practical debugging steps:<\/p>\n\n\n\n<ul>\n<li><strong>Inspect the dataset samples:<\/strong> confirm that the examples match the behavior you want (for instance, do they include full solutions, final answers, or reasoning \u201ctraps\u201d?).<\/li>\n\n\n\n<li><strong>Check tokenization settings:<\/strong> padding, truncation length, and special tokens can affect learning. 
If key reasoning steps are cut off due to truncation, the model may not learn complete patterns.<\/li>\n\n\n\n<li><strong>Check for underfitting or overfitting:<\/strong> underfitting may look like very little has changed from the base model, while overfitting may show a sharp gap between training and evaluation loss.<\/li>\n\n\n\n<li><strong>Run targeted prompts:<\/strong> test on examples similar to the training data and also on fresh questions to evaluate generalization.<\/li>\n<\/ul>\n\n\n\n<p>PyCharm makes this inspection process easier. You can use the built-in debugger to step through tokenization, dataset preprocessing, and training code to verify inputs and labels are correctly aligned. Breakpoints and variable inspection help identify issues such as truncation, padding errors, or unexpected batch shapes. <a href=\"https:\/\/www.jetbrains.com\/ai-assistant\/\" target=\"_blank\" rel=\"noopener\">AI Assistants<\/a> can further support debugging by explaining unfamiliar code, suggesting fixes, or helping you interpret evaluation results more quickly, all without leaving the IDE.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Deploying your fine-tuned GPT model with FastAPI<\/h2>\n\n\n\n<p>The easiest way to deploy a model in a server backend is to use FastAPI. Previously, I wrote a <a href=\"https:\/\/blog.jetbrains.com\/pycharm\/2024\/09\/how-to-use-fastapi-for-machine-learning\/\">blog post<\/a> about deploying a machine learning model with FastAPI. While we won\u2019t go into the same level of detail here, we will go over how to deploy our fine-tuned model.<\/p>\n\n\n\n<p>With the help of <a href=\"https:\/\/www.jetbrains.com\/junie\/\" target=\"_blank\" rel=\"noopener\">Junie<\/a>, we\u2019ve created some scripts which you can see <a href=\"https:\/\/github.com\/Cheukting\/fine-tune-gpt2\/tree\/main\/app\" target=\"_blank\" rel=\"noopener\">here<\/a>. 
These scripts let us deploy a server backend with FastAPI endpoints.&nbsp;<\/p>\n\n\n\n<p>There are some new dependencies that we need to add:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"bash\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">uv add fastapi pydantic uvicorn\nuv sync<\/pre>\n\n\n\n<p>Let\u2019s have a look at some interesting points in the scripts, in `main.py`:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># Initialize FastAPI app\napp = FastAPI(\n   title=\"Text Generation API\",\n   description=\"API for generating text using a fine-tuned model\",\n   version=\"1.0.0\"\n)\n\n\n# Initialize the model pipeline\ntry:\n   pipe = pipeline(\"text-generation\", model=\"..\/trained_model\", device=\"mps\")\nexcept Exception as e:\n   # Fallback to CPU if MPS is not available\n   try:\n       pipe = pipeline(\"text-generation\", model=\"..\/trained_model\", device=\"cpu\")\n   except Exception as e:\n       print(f\"Error loading model: {e}\")\n       pipe = None<\/pre>\n\n\n\n<p>After initializing the app, the script will try to load the model into a pipeline. If a Metal GPU is not available, it will fall back to using the CPU. 
If you have a CUDA GPU instead of a Metal GPU, you can change `mps` to `cuda`.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># Request model\nclass TextGenerationRequest(BaseModel):\n   prompt: str\n   max_new_tokens: int = 200\n  \n# Response model\nclass TextGenerationResponse(BaseModel):\n   generated_text: str<\/pre>\n\n\n\n<p>Two new classes are created, inheriting from Pydantic\u2019s<em> <\/em>`BaseModel`<em>.<\/em><\/p>\n\n\n\n<p>We can also inspect our endpoints with the <em>Endpoints <\/em>tool window<em>. <\/em>Click on the globe next to `app = FastAPI` on line 11 and select <em>Show All Endpoints<\/em>.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"1600\" height=\"833\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2025\/08\/image-49.png\" alt=\"Show all endpoints in PyCharm\" class=\"wp-image-594174\"\/><\/figure>\n\n\n\n<p>We have three endpoints. 
Since the root endpoint is just a welcome message, we will look at the other two.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">@app.post(\"\/generate\", response_model=TextGenerationResponse)\nasync def generate_text(request: TextGenerationRequest):\n   \"\"\"\n   Generate text based on the provided prompt.\n  \n   Args:\n       request: TextGenerationRequest containing the prompt and generation parameters\n      \n   Returns:\n       TextGenerationResponse with the generated text\n   \"\"\"\n   if pipe is None:\n       raise HTTPException(status_code=500, detail=\"Model not loaded properly\")\n  \n   try:\n       result = pipe(\n           request.prompt,\n           max_new_tokens=request.max_new_tokens,\n           pad_token_id=pipe.tokenizer.eos_token_id\n       )\n      \n       # Extract the generated text from the result\n       generated_text = result[0]['generated_text']\n      \n       return TextGenerationResponse(generated_text=generated_text)\n   except Exception as e:\n       raise HTTPException(status_code=500, detail=f\"Error generating text: {str(e)}\")\n<\/pre>\n\n\n\n<p>The `\/generate` endpoint collects the request prompt and generates the response text with the model.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">@app.get(\"\/health\")\nasync def health_check():\n   \"\"\"Check if the API and model are working properly.\"\"\"\n   if pipe is None:\n       raise HTTPException(status_code=500, detail=\"Model not loaded\")\n   return {\"status\": \"healthy\", \"model_loaded\": True}<\/pre>\n\n\n\n<p>The `\/health` endpoint checks whether the model is 
loaded correctly. This is useful if a client-side application needs to confirm that the model is ready before making the `\/generate` endpoint available in its UI.<\/p>\n\n\n\n<p>In `run.py`, we use <a href=\"https:\/\/www.uvicorn.org\/\" target=\"_blank\" rel=\"noopener\">uvicorn<\/a> to run the server:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import uvicorn\n\n\nif __name__ == \"__main__\":\n   uvicorn.run(\"main:app\", host=\"0.0.0.0\", port=8000, reload=True)<\/pre>\n\n\n\n<p>When we run this script, the server starts at <a href=\"http:\/\/0.0.0.0:8000\/\" target=\"_blank\">http:\/\/0.0.0.0:8000\/<\/a>. The host `0.0.0.0` makes the server listen on all network interfaces; if your browser cannot open that address, use `http:\/\/localhost:8000\/` instead.<\/p>\n\n\n\n<p>Once the server is running, we can open the interactive API docs at <a href=\"http:\/\/0.0.0.0:8000\/docs\" target=\"_blank\">http:\/\/0.0.0.0:8000\/docs<\/a> to test the endpoints.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2025\/09\/AD_4nXf6PXwz_Vz7VEQoyZs20NJ9TsO36oWJPf0w4iMjwHZ_EBP1Pk9c_8aWR2ybGE-wsmArM1zAQl1s8jHEr09I0g1A3boGD1Kt4i4CemufHZTHnATjIWrJ8x2ZUYg4Q7E4b3tc2XDmmg.png\" width=\"624\" height=\"269\"><\/h2>\n\n\n\n<p>We can try this with the `\/generate` endpoint:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">{\n  \"prompt\": \"5 people give each other a present. 
How many presents are given altogether?\",\n  \"max_new_tokens\": 300\n}<\/pre>\n\n\n\n<p>This is the response we get:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">{\n  \"generated_text\": \"5 people give each other a present. How many presents are given altogether?\\nAlright, let's try to solve the problem:\\n\\n**Problem**  \\n1. Each person gives each other a present. How many presents are given altogether?\\n2. How many \\\"gift\\\" are given altogether?\\n\\n**Common pitfall**  \\nAssuming that each present is a \\\"gift\\\" without considering the implications of the original condition.\\n\\n---\\n\\n### Step 1: Attempting the problem (falling into the pitfall)\\n\\nOkay, so I have two people giving each other a present, and I want to know how many are present. I remember that there are three types of gifts\u2014gifts, gins, and ginses.\\n\\nLet me try to count how many of these:\\n\\n- Gifts: Let\u2019s say there are three people giving each other a present.\\n- Gins: Let\u2019s say there are three people giving each other a present.\\n- Ginses: Let\u2019s say there are three people giving each other a present.\\n\\nSo, total gins and ginses would be:\\n\\n- Gins: \\\\( 2 \\\\times 3 = 1 \\\\), \\\\( 2 \\\\times 1 = 2 \\\\), \\\\( 1 \\\\times 1 = 1 \\\\), \\\\( 1 \\\\times 2 = 2 \\\\), so \\\\( 2 \\\\times 3 = 4 \\\\).\\n- Ginses: \\\\( 2 \\\\times 3 = 6 \\\\), \\\\(\"\n}\n<\/pre>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"1600\" height=\"873\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2025\/08\/image-50.png\" alt=\"\" class=\"wp-image-594185\"\/><\/figure>\n\n\n\n<p>Feel free to experiment with other requests.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Next steps: fine-tuning and deploying GPT models<\/h2>\n\n\n\n<p>Now that you have successfully fine-tuned an LLM like GPT-2 
using Hugging Face Transformers, you can explore other models and datasets from the Hugging Face Hub. You can experiment with fine-tuning other LLMs with either the open-source data there or your own datasets. If you want to (and the license of the original model allows it), you can also upload your fine-tuned model to the Hugging Face Hub. Check out the <a href=\"https:\/\/huggingface.co\/docs\/transformers\/v4.53.3\/en\/main_classes\/trainer#transformers.Trainer.push_to_hub\" target=\"_blank\" rel=\"noopener\">documentation<\/a> for `Trainer.push_to_hub` to see how to do that.<\/p>\n\n\n\n<p>One last remark about using or fine-tuning models with resources on the Hugging Face Hub: make sure to read the license of any model or dataset you use so that you understand the conditions for working with it. Can it be used commercially? Do you need to credit it?<\/p>\n\n\n\n<p>In future blog posts, we will keep exploring more code examples involving Python, AI, machine learning, and data visualization.<\/p>\n\n\n\n<p>In my opinion, <a href=\"https:\/\/www.jetbrains.com\/pycharm\/\" target=\"_blank\" rel=\"noopener\">PyCharm<\/a> provides best-in-class Python support that ensures both speed and accuracy. Benefit from the smartest code completion, PEP 8 compliance checks, intelligent refactorings, and a variety of inspections to meet all your coding needs. As demonstrated in this blog post, PyCharm provides integration with the Hugging Face Hub, allowing you to browse and use models without leaving the IDE. 
This makes it suitable for a wide range of AI and LLM fine-tuning projects.<\/p>\n\n\n    <div class=\"buttons\">\n        <div class=\"buttons__row\">\n                                                <a href=\"https:\/\/www.jetbrains.com\/pycharm\/\" class=\"btn\" target=\"\" rel=\"noopener\">Download PyCharm Now<\/a>\n                                                    <\/div>\n    <\/div>\n\n\n\n\n","protected":false},"author":1528,"featured_media":594051,"comment_status":"closed","ping_status":"closed","template":"","categories":[952,1401],"tags":[8900,8428,3252],"cross-post-tag":[8851],"acf":[],"_links":{"self":[{"href":"https:\/\/blog.jetbrains.com\/ru\/wp-json\/wp\/v2\/pycharm\/592844"}],"collection":[{"href":"https:\/\/blog.jetbrains.com\/ru\/wp-json\/wp\/v2\/pycharm"}],"about":[{"href":"https:\/\/blog.jetbrains.com\/ru\/wp-json\/wp\/v2\/types\/pycharm"}],"author":[{"embeddable":true,"href":"https:\/\/blog.jetbrains.com\/ru\/wp-json\/wp\/v2\/users\/1528"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.jetbrains.com\/ru\/wp-json\/wp\/v2\/comments?post=592844"}],"version-history":[{"count":10,"href":"https:\/\/blog.jetbrains.com\/ru\/wp-json\/wp\/v2\/pycharm\/592844\/revisions"}],"predecessor-version":[{"id":681101,"href":"https:\/\/blog.jetbrains.com\/ru\/wp-json\/wp\/v2\/pycharm\/592844\/revisions\/681101"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.jetbrains.com\/ru\/wp-json\/wp\/v2\/media\/594051"}],"wp:attachment":[{"href":"https:\/\/blog.jetbrains.com\/ru\/wp-json\/wp\/v2\/media?parent=592844"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.jetbrains.com\/ru\/wp-json\/wp\/v2\/categories?post=592844"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.jetbrains.com\/ru\/wp-json\/wp\/v2\/tags?post=592844"},{"taxonomy":"cross-post-tag","embeddable":true,"href":"https:\/\/blog.jetbrains.com\/ru\/wp-json\/wp\/v2\/cross-post-tag?post=592844"}],"curies":[{"name":"wp","href":"https:\/\/api.w.or
g\/{rel}","templated":true}]}}