Articul8 AI

Machine Learning Engineer - Data Pipeline

Posted 2 Days Ago
In-Office
Dublin, CA, USA
Mid level
Design, build, and own end-to-end data acquisition and large-scale processing pipelines. Implement ML models to improve data quality, lead data ingestion/crawling projects, deploy scalable distributed systems, and maintain backend storage and indexing in Kubernetes environments.
The summary above was generated by AI

About us: 

At Articul8 AI, we relentlessly pursue excellence and create exceptional AI products that exceed customer expectations. We are a team of dedicated individuals who take pride in our work and strive for greatness in every aspect of our business. We believe in using our advantages to make a positive impact on the world and inspiring others to do the same. 

Job Description: 

We are seeking machine learning engineers to join our team full-time. In this role, you will help us build pipelines for data collection, data extraction, data filtering/synthetic data generation, and data analysis. You will own, end to end, all work related to acquiring the high-quality data that powers the training of our domain-specific models. You will work closely with other researchers and engineers to empower our next generation of domain-specific models. We value rapid prototyping, iterating, and shipping new systems quickly.

Required Qualifications: 

  • BS/MS/PhD in Computer Science or a related field. 

  • Proficiency in at least one deep learning framework, such as PyTorch. 

  • Experience with machine learning projects in text or vision, e.g., having trained machine learning models to tackle a specific problem.

  • Strong expertise in large stateful distributed systems and data processing. 

  • Strong proficiency in building large-scale data processing pipelines and familiarity with distributed workloads (e.g., multiprocessing, Ray, Docker, Kubernetes).

  • Proficiency in at least one programming language commonly used in machine learning, such as Python, and the ability to write clean, maintainable code.

  • Excellent problem-solving skills and attention to detail, especially when handling data anomalies and biases to further improve data quality. 

Key Competencies:

  • Active GitHub contributions are a big plus.

  • Experience in building large-scale datasets. 

  • Familiarity with at least one tool for data crawling (e.g., Scrapy), data collection (e.g., VPNs, Selenium), or data processing (e.g., Hadoop, Datasketch).

  • Building bespoke data processing libraries from scratch. 

  • Keeping up with state-of-the-art techniques for preparing AI training data. 

  • Organizing and meticulously bookkeeping data across multiple clouds, of multiple modalities, and from many sources. 

  • Multilingual ability, which contributes to the language diversity crucial for robust model training.

Responsibilities: 

  • Design and develop data processing pipelines, including data extraction, data filtering, data labeling, etc. 

  • Implement machine learning models to improve the quality and diversity of data (especially in the data extraction stage), e.g., quality classifier, document layout model, code verification model, etc. 

  • Own and lead engineering projects in the area of data acquisition, including web crawling, data ingestion, and processing. 

  • Collaborate with our Applied Research, Technology, and Architecture teams to ensure smooth data flow and system operability. 

  • Develop and deploy highly scalable distributed systems capable of handling terabytes of data.

  • Architect and implement algorithms for data indexing and search capabilities. 

  • Build and maintain backend services for data storage, including work with key-value databases and synchronization. 

  • Deploy solutions in a Kubernetes Infrastructure-as-Code environment and perform routine system checks. 

By joining our team, you become part of a community that embraces diversity, inclusiveness, and lifelong learning. We nurture curiosity and creativity, encouraging exploration beyond conventional wisdom. Through mentorship, knowledge exchange, and constructive feedback, we cultivate an environment that supports both personal and professional development. 

Your future experience at Articul8 will include continuous learning and growth opportunities as we embark on an exciting journey to disrupt the status quo. If you're excited about joining a team that's passionate about making a difference, we want to hear from you. 

If you're ready to join a team that's changing the game, apply now to become a part of the Articul8 team. Join us on this adventure and help shape the future of Generative AI in the enterprise. 

HQ

Articul8 AI Dublin, California, USA Office

4120 Dublin Blvd, Suite 250, Dublin, California, United States, 94568

Articul8 AI Santa Clara, California, USA Office

3979 Freedom Circle Mission Towers, Suite 340, Santa Clara, CA, United States, 95054
