Data Engineer (m/f/d) | Legal AI Tech Start-up | Full Remote

Permanent position, full-time or part-time · Berlin

Our offer for you
Mission & Vision 

As a Data Engineer (m/f/d), you will support the development of our legal data system by designing and maintaining a robust data infrastructure. You will be responsible for building and optimizing ETL pipelines that process legal data from multiple jurisdictions, while developing data models that ensure consistency, scalability, and accuracy across diverse datasets. A key part of your role will be implementing metadata enrichment strategies that enhance the searchability and usability of legal information. You will also conduct database performance benchmarking and tuning to guarantee efficient query execution and long-term scalability. Working closely with product teams, researchers, and legal domain experts, you will deliver high-quality, reliable data solutions that help us unlock the value of complex, multilingual legal content.

Your Team

You will join our AI Team led by Felix (Head of AI), working closely with a group of approximately five AI experts. This highly collaborative team focuses on pushing the boundaries of generative AI, natural language processing, and privacy-preserving machine learning for legal solutions.

Your Hiring Manager

Felix, our Head of AI, will guide you through your journey at Noxtua. With deep expertise in AI systems, Felix leads with a passion for innovation and a collaborative approach, ensuring every team member thrives. 

Benefits

  • Working hours: Flexible, full-time or part-time
  • Vacation: 26 days + December 24th & 31st off 
  • Remote: 100% remote work possible (given a European residence), other countries upon request
  • Discounts: e.g. Urban Sports Club Membership
  • Equipment: Laptop (Lenovo or Mac), second screen, keyboard etc.   
Your responsibilities
  • Build and optimize ETL pipelines to process legal data from multiple jurisdictions, including chunking, embedding and ingesting legal data.
  • Develop and maintain data models that ensure consistency, scalability, and accuracy across diverse datasets and large amounts of data.
  • Coordinate data handover from different sources.
  • Implement metadata enrichment strategies to enhance searchability and usability of legal information.
  • Experiment with embedding strategies and train embedding models, including their evaluation.
  • Conduct database performance benchmarking and tuning to ensure efficient query execution and scalability.
  • Collaborate with product, AI, and legal domain experts to deliver high-quality, reliable data solutions.
Our Tech Stack
  • Programming Languages: Python
  • Data formats: XML, Parquet
  • Frameworks: Blob storage systems such as S3 (especially OTC OBS), LangChain, LangGraph
  • Vector Search: Elasticsearch, Qdrant, Pinecone
  • Graph Databases: Neo4j, Amazon Neptune, TigerGraph
  • Libraries: HuggingFace, Transformers, NumPy, Pandas, Pydantic, FastAPI, OpenAI & PyTorch
  • Deployment Tools: Docker 
  • Cloud Infrastructure: OTC, AWS, GCP, Azure 
  • Pipeline Orchestration: Apache Airflow, Dagster, Prefect
  • Ticket System: Atlassian Jira
  • Repository: GitHub
  • CI/CD System: GitHub Actions 
  • Documentation: Confluence 
  • Communication: Slack  
  • Office Application: MS365  
About you

Requirements:

  • Residence & Work Permit: Eligible to work in Germany or within the EU. 
  • Language: English proficiency at C2 level. 
  • Experience: AI development or data engineering, with successfully deployed projects
  • Data: Expertise in data processing, filtering, and augmentation
  • Databases: Expertise in vector databases, data embedding, benchmarking and management
  • Programming: Strong Python skills and experience with AI pipelines 

Optional: 

  • Experience in deploying graph databases
  • RAG Systems: Experience building AI-specific RAG pipelines
  • NLP & Generative AI: Familiarity with developing and deploying NLP and generative AI models
  • Familiarity with Kubernetes deployments
  • Legal background knowledge

Sounds good?

Then we look forward to receiving your CV via our online application form.


About us

Noxtua is Europe's leading sovereign Legal AI. The legally compliant and competent AI helps legal professionals research legal issues and review and draft legal documents. The GDPR-compliant Legal AI meets the high professional and data protection requirements for lawyers (§ 203 German Criminal Code (StGB), § 43e Federal Lawyers' Act (BRAO)) and is certified according to ISO 27001, 9001, 27018, 27017, 42001, and BSI C5. The product version Beck-Noxtua is based on the exclusive data of Germany's leading legal publisher C.H.Beck and Germany's largest business law firm CMS.

Emerging from a 2017 research project by Dr. Leif-Nissen Lundbæk and Professor Dr. Michael Huth at Oxford University and Imperial College London, the Berlin-based legal tech company, formerly known as Xayn, has extensive experience in developing highly efficient, GDPR-compliant AI solutions. Strategic partners have invested a total of €80.7 million in the German startup as part of its Series B funding round. They include Germany's leading legal publisher C.H.Beck; the high-performance computing specialist Northern Data; CMS, Germany's largest business law firm and co-initiator of the Legal AI Noxtua; and Dentons, the world's largest law firm.

We explicitly encourage women to apply, as they are currently underrepresented. Our goal is to build a diverse and inclusive work environment that values different perspectives. Of course, we welcome applications from all qualified individuals – regardless of gender, ethnic origin, religion, disability, age, or sexual identity.
We look forward to hearing from you!
Thank you for your interest in Noxtua AG. Please fill out the short online application form. If you have difficulties uploading your files, feel free to contact us by email at jobs@xayn.com.