The Smart Data Solutions Intern AI Engineer role is a fantastic opportunity for students and early-career professionals looking to gain hands-on experience in artificial intelligence, machine learning, and natural language processing. This full-time internship is based in Chennai, India, at Smart Data Solutions’ office in Perungudi, and is designed for candidates eager to work with cutting-edge technologies, large datasets, and advanced AI models.
As an Intern AI Engineer, you will contribute to developing machine learning and deep learning models for pharmaceutical document analysis, building OCR and OMR pipelines, and experimenting with large language models (LLMs) such as Qwen and NuExtract. This internship suits candidates with a strong interest in AI for healthcare, biomedical informatics, and document processing.

About Smart Data Solutions
For over 20 years, Smart Data Solutions (SDS) has partnered with leading payer organizations to provide automation and technology solutions, focusing on data standardization and workflow automation. SDS handles claims and claims-related information in any format, digitizing and normalizing it for seamless use by payer clients. With over 420 healthcare organizations as clients, SDS processes more than 500 million transactions annually and maintains a 98%+ customer retention rate.
SDS has invested heavily in AI and machine learning to improve operational efficiency and client outcomes. Partnered with Parthenon Capital, SDS continues to accelerate product innovation and expansion. Interns joining SDS will gain exposure to real-world applications of AI, machine learning, and NLP in healthcare and pharmaceutical domains.
Role Overview
As a Smart Data Solutions Intern AI Engineer, your primary responsibilities include designing and implementing machine learning models, building OCR/OMR pipelines, and extracting structured data from unstructured documents such as clinical forms, prescriptions, and regulatory filings. You will also assist in integrating and experimenting with LLMs using frameworks like LangChain, supporting retrieval-augmented generation (RAG), and enhancing search capabilities.
This internship provides exposure to real-world AI projects, collaboration with domain experts, and the chance to apply machine learning techniques to complex healthcare datasets.
Key Responsibilities
- Develop and evaluate machine learning and deep learning models for document analysis in pharma and healthcare
- Build OCR/OMR pipelines and extract structured information from unstructured text
- Integrate LLMs such as Qwen, NuExtract, and other open-source models using LangChain
- Write scalable, testable Python and Java code for backend and integration tasks
- Assist in creating prompt templates and LLM-enhanced search capabilities
- Support data cleaning, annotation, and labeling tasks for medical/NLP datasets
- Collaborate with data scientists and domain experts to improve model performance and accuracy
- Handle large PDF/TIFF document corpora and use annotation tools effectively
Who Can Apply
| Criteria | Details |
|---|---|
| Education | Currently pursuing or have completed a Bachelor’s/Master’s degree in Computer Science, AI, Data Science, or a related field |
| Location | Chennai, India (Perungudi Office) |
| Employment Type | Full-Time Internship |
| Work Hours | 4:00 PM to 1:00 AM |
| Skills | Python, Java, OCR/OMR, NLP, Machine Learning, Deep Learning, LLM, Transformers, PyTorch, TensorFlow |
Required Skills
- Strong coding proficiency in Python and Java
- Solid understanding of machine learning, deep learning, and NLP fundamentals
- Hands-on experience or coursework in OCR/OMR, computer vision, and document data extraction
- Familiarity with libraries such as Transformers (Hugging Face), OpenCV, Tesseract, spaCy, PyTorch, and TensorFlow
- Knowledge of LLMs, LangChain, Qwen, NuExtract, or other instruction-following models
- Ability to work independently and collaboratively in cross-functional teams
Preferred Skills
- Background in Biomedical AI, Healthcare Informatics, or Pharmaceutical NLP projects
- Experience working with large document datasets and annotation tools
- Knowledge of information retrieval, prompt engineering, and LLM deployment
- Strong analytical and problem-solving skills
- Interest in applying AI to healthcare, pharma, and regulatory document processing
What You Will Gain
- Hands-on experience with machine learning, NLP, and LLMs in healthcare and pharma
- Exposure to real-world AI workflows, including OCR, OMR, and large-scale document processing
- Practical experience with Python, Java, Transformers, PyTorch, TensorFlow, and LangChain
- Opportunity to collaborate with experienced AI engineers, data scientists, and domain experts
- Development of technical, analytical, and problem-solving skills in a professional environment
- Experience with healthcare datasets, document annotation, and AI-driven workflow automation
How to Apply
To apply for the Smart Data Solutions Intern AI Engineer role, click the Apply Now button and submit your resume. Highlight your experience with Python, Java, NLP, OCR/OMR, and any AI/ML projects. Mention coursework, research, or prior internships relevant to LLMs, deep learning, or document data processing.
Conclusion
The Smart Data Solutions Intern AI Engineer role is an excellent opportunity for students and early-career professionals to gain practical experience in AI, machine learning, NLP, and healthcare informatics. With exposure to real-world projects, large datasets, and cutting-edge LLMs, interns will develop invaluable skills to advance their careers in AI and data science. If you are passionate about leveraging AI for healthcare and document automation, this internship is the perfect way to start your journey. 🤖📄💡