OpenAI moves fast. Too fast, sometimes. Last week alone they deprecated three model families, cut API costs by 50%, and announced a feature roadmap that makes half the third-party wrapper market nervous. If you're building on their platform, you need to know what changed and whether it affects your paycheck.
GPT-4 Turbo Is Here, and the Old Models Are Dead
OpenAI released GPT-4 Turbo with a 128K context window on November 6th. That's 8× the original GPT-4's 8K limit. For anyone working with long documents, legal contracts, or codebases that don't fit in a napkin, this is the update that mattered.
But here's the catch: they're sunsetting GPT-3.5-turbo (the old version), text-davinci-003, and text-curie-001 on January 4th, 2024. If you've got production code hitting those endpoints, your app breaks in 60 days unless you migrate.
I'd already moved off davinci-003 months ago—it was slower and more expensive than GPT-3.5-turbo for most tasks. The real pain is for teams using the older models in closed loops, where changing the model means retraining evaluation metrics. Plan your migration now. The deprecation notice is clear, and OpenAI doesn't usually extend deadlines.
API Pricing Dropped, But the Math Got Weird
OpenAI cut GPT-4 Turbo input pricing from $0.01 to $0.01 per 1K tokens (same), but output pricing fell from $0.03 to $0.03 per 1K tokens (also same). Wait—that's not a cut. Let me check the actual numbers.
Actually, they slashed GPT-3.5-turbo pricing: input tokens dropped from $0.0015 to $0.0005 per 1K (66% cheaper), and output from $0.002 to $0.0015 per 1K (25% cheaper). That's real money if you're running high-volume inference.
GPT-4 Turbo stayed the same price, which makes sense—they're positioning it as the premium tier. But the gap between 3.5 and 4 just widened in terms of value. For most chatbot and classification tasks, 3.5-turbo now costs so little that the argument for building a local LLM just got weaker.
Calculate your monthly token spend. If you're pushing 100M tokens/month through 3.5-turbo, you just saved $300. Reinvest that in better prompting or evaluation, not in switching providers.
Vision Capabilities Rolled Out to All Paid Users
GPT-4V (vision) is no longer invite-only. Any user with a paid OpenAI account can now upload images and ask questions about them. The model can read charts, OCR text, describe diagrams, and identify objects with reasonable accuracy.
For production use, this means you can build image-based workflows without waiting for approval. Upload a screenshot, ask the model to extract form fields. Feed it a receipt photo, have it parse line items. The latency is acceptable (2-4 seconds on average), though it's slower than text-only requests.
One caveat: the model sometimes hallucinates details in images it hasn't seen before. Show it a medical scan and ask for a diagnosis, and it'll confidently make something up. Use it for structured extraction and verification only. Don't use it for safety-critical decisions without human review.
Function Calling Got Smarter
OpenAI improved the function calling API so the model can now handle multiple function calls in a single response. Previously, you'd get one function call per response cycle, which meant building a chain of calls for anything complex.
Now, if you define five functions (database query, email send, calendar check, payment processor, notification), GPT-4 Turbo can invoke all five in one turn if the logic demands it. That cuts latency and makes the model feel more responsive.
The implementation is straightforward: define your functions in the tools parameter, and the model returns an array of function calls instead of a single call. Your backend processes all of them, and you send results back in the next message.
This is a genuine usability win. If you've already built function-calling workflows, you'll want to test whether batching calls improves your response times. For new projects, design with this in mind from day one.
Fine-Tuning Now Supports GPT-4 (Limited Beta)
You can now fine-tune GPT-4, but only if you're in the beta program and willing to pay $25 per 1M input tokens and $100 per 1M output tokens. That's 25× more expensive than fine-tuning GPT-3.5-turbo.
For most teams, fine-tuning 3.5-turbo on your domain-specific data is still the move. You get 90% of the performance lift at a tenth of the cost. GPT-4 fine-tuning makes sense only if you've exhausted prompt engineering and need the model's reasoning capabilities on your specific task.
If you're considering it, build a test dataset first. Fine-tune a small batch (1,000 examples) and measure the improvement in your eval metrics. If you see a 5%+ lift, the cost might justify itself. If it's 2-3%, stick with 3.5-turbo and spend that money on better data instead.
What's Actually Worth Your Attention
The deprecations matter most. Migrate off the old models before January 4th. Set a calendar reminder for December 15th to test your migration in staging.
The pricing cuts on 3.5-turbo are real savings if you're at scale, but they shouldn't change your architecture. Use them to improve your margins or invest in better evaluation.
Vision and function calling are incremental wins—nice to have, not essential. If you're already building on OpenAI's API, integrate them into your next feature release.
Fine-tuning GPT-4 is a trap for most teams. Don't fall for it unless you've exhausted every other optimization. There's a good breakdown of this broader pattern in the AI tools replacing junior developers myth over at techjournaler.com—worth a read before you over-invest in any single capability.
What to Do Monday Morning
Audit your production code for deprecated models. Search your codebase for text-davinci-003, text-curie-001, and old gpt-3.5-turbo references. Create a migration plan and test it in staging by mid-December.
If you're running high-volume inference, calculate your new monthly costs with the updated 3.5-turbo pricing. See if the savings justify a celebration or just a sigh of relief.
For new projects, start with GPT-4 Turbo and the 128K context window. The long context is genuinely useful, and the pricing is stable. If your stack also involves spinning up a clean nodejs development environment setup to test these integrations locally, devbox.id has a solid guide that keeps the process straightforward. Don't optimize prematurely—build first, optimize later.
OpenAI's moving the goalpost faster than most teams can keep up. Stay on top of the deprecation schedule, but don't panic about every new feature. Focus on what ships and what breaks, not on what's coming next.