Master Thesis - Resolving Chemical Identifier Challenges: Enhancing Data Integration with Graph Databases
Helmholtz-Zentrum für Umweltforschung UFZ
Updated: 04-11-2024
Category: Pharmaceuticals/Chemistry/Biotechnology
Job description
Contract limitations
limited contract
Contact
Your contact for any questions you may have about the job:
ilhan.mutlu@ufz.de
Your application
Please submit your application via our online portal with your cover letter, CV (please omit your photo, age, or marital status) and relevant attachments.
Diversity and Inclusion
The UFZ has a strong commitment to diversity and actively supports equal opportunities for all employees regardless of their origin, religion, ideology, disability, age or sexual identity.
We look forward to applications from people who are open-minded and enjoy working in diverse teams.
The UFZ
The Helmholtz Centre for Environmental Research (UFZ) with its 1,100 employees has gained an excellent reputation as an international competence centre for environmental sciences. We are part of the largest scientific organisation in Germany, the Helmholtz Association. Our mission: Our research seeks to find a balance between social development and the long-term protection of our natural resources.
The job
Chemical databases are vital in various scientific fields, including toxicology, pharmacology, and environmental science. These databases use a variety of chemical identifiers, such as CAS numbers, InChI, SMILES, DTXSID, and Norman SusDat ID, to catalog chemical substances. However, the diversity of these identifiers introduces significant challenges, including inconsistencies and ambiguities that complicate data integration and analysis.
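As an illustration of one narrow kind of inconsistency, a mistyped CAS number can often be caught with the check digit that every CAS Registry Number carries. The following is a minimal Python sketch, not a method prescribed by this posting; the function name `is_valid_cas` is a hypothetical helper chosen for illustration.

```python
import re

# A CAS Registry Number has the form NNNNNNN-NN-C:
# 2 to 7 digits, then 2 digits, then a single check digit.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas(cas: str) -> bool:
    """Return True if `cas` is well-formed and its check digit matches.

    Hypothetical helper for illustration only.
    """
    match = CAS_PATTERN.fullmatch(cas.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one
    # (the digit directly before the check digit), then take the sum mod 10.
    total = sum(
        int(digit) * weight
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == check_digit


print(is_valid_cas("7732-18-5"))  # CAS number of water
print(is_valid_cas("7732-18-4"))  # same number with a wrong check digit
```

A check like this only detects malformed numbers. It cannot tell whether a valid CAS number has been attached to the wrong substance, which is the kind of ambiguity that requires cross-referencing several identifiers.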
In this Master’s thesis, the primary goal will be to provide a graph database solution for the identifier problem. An initial step will involve analyzing potential issues resulting from diverse chemical identifiers to get familiar with the topic. Following this, the focus will shift to implementing the proposed solution, ensuring more accurate and consistent chemical data management.
Once the issues are understood and addressed, the next step will be to develop a knowledge graph that holds identifiers and names, which will then be stored in a graph database. Graph databases offer a powerful solution for modeling complex relationships and hierarchies among chemical identifiers, making it easier to manage and resolve any remaining ambiguities. The student will develop a graph database schema, write example queries for resolving identifier-related issues, and document how to interact with the graph database through the standard Neo4j Browser interface (via Cypher).
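To give a rough idea of what such a graph can reveal, the sketch below models identifiers as nodes and "refers to the same substance" assertions as edges, then lists identifiers that map to more than one counterpart. It is a pure-Python stand-in under stated assumptions, not the thesis schema: the class `IdentifierGraph`, its methods, and the placeholder values (`cas-A`, `inchi-1`, ...) are all hypothetical. In the actual project, the same idea would be expressed as nodes and relationships in Neo4j and queried with Cypher.

```python
from collections import defaultdict

# A node is an (identifier type, value) pair, e.g. ("CAS", "cas-A").
Node = tuple[str, str]


class IdentifierGraph:
    """Tiny in-memory undirected graph of identifier cross-references."""

    def __init__(self) -> None:
        self._neighbours: dict[Node, set[Node]] = defaultdict(set)

    def link(self, a: Node, b: Node) -> None:
        """Record that some source maps identifier `a` to identifier `b`."""
        self._neighbours[a].add(b)
        self._neighbours[b].add(a)

    def linked(self, node: Node, id_type: str) -> set[str]:
        """Values of type `id_type` directly linked to `node`."""
        return {
            value
            for kind, value in self._neighbours.get(node, set())
            if kind == id_type
        }

    def ambiguous(self, from_type: str, to_type: str) -> dict[str, set[str]]:
        """Identifiers of `from_type` linked to more than one `to_type` value."""
        result: dict[str, set[str]] = {}
        for kind, value in list(self._neighbours):
            if kind != from_type:
                continue
            targets = self.linked((kind, value), to_type)
            if len(targets) > 1:
                result[value] = targets
        return result


# Placeholder data only; these are not real identifiers.
graph = IdentifierGraph()
graph.link(("CAS", "cas-A"), ("InChI", "inchi-1"))
graph.link(("CAS", "cas-A"), ("InChI", "inchi-2"))
graph.link(("CAS", "cas-B"), ("InChI", "inchi-3"))

# Reports cas-A, because two different InChI values are linked to it.
print(graph.ambiguous("CAS", "InChI"))
```

Keeping each identifier as its own node, rather than as a property of a single "substance" record, is what makes conflicting mappings visible as a structural pattern that a query can find.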
This research is essential for creating more robust and scalable solutions in chemical data management, with applications in toxicology, pharmacology, and environmental monitoring.
The student will be based in the Bio-Data Science Group of the Computational Biology and Chemistry Department at the UFZ, collaborating with the database management team. The start date for the work is flexible. The position is expected to last for six months and will be supervised at the site in Leipzig.
Your tasks
- Identifying and collecting inconsistencies and ambiguities related to the diversity of chemical identifiers (e.g., DTXSID, Norman SusDat ID, InChI, CAS numbers)
- Developing a knowledge graph (schema) of the selected identifiers and their relations
- Populating the graph DB according to the developed schema with data from various public sources
- Testing whether the developed graph database resolves the issues identified at the outset, and providing documentation and access to a browser-based interface to the graph DB
We offer
- Excellent supervision that supports your personal and professional development
- Exciting insights into the work of a leading research institute
- The chance to work in interdisciplinary, international teams and benefit from a wide range of perspectives
- The opportunity to contribute and actively shape your own ideas and impulses right from the start
- Modern technical equipment and IT service to optimally support your work
Your profile
- Enrolled Master’s student in Computer Science, Mathematics, Bioinformatics, or a related field
- Excellent programming skills in Python or R
- Strong interest in graph database technologies
- Experience with graph databases (e.g., Neo4j) is a plus
- Experience in database management and cheminformatics tools is beneficial
- Knowledge about chemical identifiers (e.g., CAS numbers, InChI) and their challenges is an advantage
- Independent and structured way of working
- Proficiency in English
Application deadline: 15.11.2024
www.ufz.de/career
Deadline: 04-12-2024