The availability of Big Data and computing capability has made Artificial Intelligence (AI) practical and compelling. Over the years, both have continued to improve, making it possible to build real-time or near-real-time applications with AI at their core. Organizations with the knowledge, skills, and right attitude for handling the massive data generated from every imaginable data point therefore have a big advantage over their peers. AI depends on data.
Even in the age of pre-trained models, transfer learning, and similar techniques, localized data remains essential if we plan to build inclusive and unbiased AI/ML models. In the simplest terms, AI is pattern matching at scale. This means that with the right data, high processing capability, skills, knowledge, and the right attitude, we can have inclusive AI.
These prerequisites for inclusive and unbiased AI are easier said than done. The world's resources are unevenly distributed and especially limited in developing countries. In our quest for quality AI teaching, NVIDIA has offered us the use of its existing computing infrastructure, training materials, and data.
In this blog, students at JENGA School explore what is possible with the GPU processing capabilities and data that NVIDIA has made available.
Using NVIDIA Morpheus in Detecting Fraudulent Claims in Insurance
~ Joy Grace Ngugi
With new technological innovations appearing every other day, the ability to collect, store, and process data has escalated dramatically. Compelling new tools can help automate processes, uncover insights that could not be seen before, recognize patterns, and predict what is likely to happen.
With all this information, the capacity to do new things has grown quickly, and while the focus has primarily been on what we can do, it is rapidly shifting to what is the right thing to do. That is the very definition of ethics: evaluating how data is used and what it is used for, considering who does and should have access, and anticipating how data could be misused. It means thinking through which data should and should not be connected with other data, and how to securely store, move, and use it.
It is amazing how NVIDIA’s tools can be leveraged to tackle the ethical issues that arise from data. One of these tools is NVIDIA Morpheus. NVIDIA Morpheus is an open application framework that enables cybersecurity developers to create optimized AI pipelines for filtering, processing and classifying large volumes of real-time data.
Morpheus allows its users to create AI pipelines that address specific use cases, such as detecting fraud, phishing, or leaks of sensitive information, by filtering and processing large volumes of data from logs and other network telemetry sources. It provides AI capabilities that can be used to detect and mitigate these cybersecurity attacks.
But what does ethics have to do with security?
I am of the opinion that most ethical issues arising from data are rooted in security issues. Take, for example, the leakage of sensitive information and personal user data: it compromises the privacy and security of users' data, and by extension the users themselves if that data is then used for malicious purposes.
Since privacy is one of the core principles of ethical AI, alongside transparency and fairness, this also becomes an ethical issue. Morpheus can be deployed in a network to run AI inference across the traffic and identify sensitive information and user credentials, such as names, email addresses, and personal identification numbers, that have not been anonymized.
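To make the idea concrete, here is a minimal, hypothetical sketch of the kind of scan such a pipeline stage performs, written in plain Python with regular expressions rather than Morpheus's actual GPU-accelerated API. The pattern names and formats are illustrative assumptions; a real Morpheus pipeline would use trained models and streaming stages instead.

```python
import re

# Hypothetical patterns for common credential types -- a real Morpheus
# pipeline would use trained NLP models and GPU-accelerated stages.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "national_id": re.compile(r"\b\d{8}\b"),  # assumed 8-digit ID format
}

def flag_sensitive(record: str) -> list[str]:
    """Return the PII categories found un-anonymized in a log record."""
    return [name for name, pat in PII_PATTERNS.items() if pat.search(record)]

log = "claim 4471 filed by jane.doe@example.com, id 12345678"
print(flag_sensitive(log))  # both patterns match this sample record
```

In a deployment, a hit like this would raise an alert so the un-anonymized field can be scrubbed before the data moves on.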
The bigger picture
This fits perfectly with my project, which works with insurance data to detect fraudulent claims by analyzing historical transactions for patterns that help flag fraud before a claim is paid out.
The insurance industry generates and consumes massive amounts of data, which presents a huge risk to data privacy. Personal data is often anonymized so that individuals cannot be identified from it. The leakage of this sensitive information is a major concern, as it could expose users to malicious attacks. Morpheus helps mitigate this by flagging places where sensitive credentials are not anonymized and raising alerts so leaks can be prevented.
Using NVIDIA’s Parallel Processing to Speed up my Solution to Africa’s Water Crisis
~ Felista Mogire
Many industry experts, including NVIDIA CEO Jensen Huang, have declared Moore's law dead. Moore's law is the observation made by Gordon Moore in 1965 that the number of transistors in a dense integrated circuit (IC) doubles about every two years.
Until recently, that held true. Today, however, transistor sizes are approaching physical limits, making it increasingly difficult to keep packing more of them onto a chip. As a result, CPU processing power is no longer increasing exponentially but at a much slower pace. This limits the computational capability of the CPU and makes general-purpose computing on GPUs (GPGPU) a welcome idea for data scientists running massive machine learning and deep learning jobs.
NVIDIA’s CUDA Toolkit
With the CUDA platform from NVIDIA, we can now dramatically speed up our machine learning projects at JENGA School. I am currently working on a tabular offenders dataset composed of many rows and columns.
Because I have been working on a CPU, utilizing one core at a time, my project often takes a while to run. With NVIDIA, I will be able to perform parallel computing on a GPU, which will let me complete my project much faster.
Parallel Processing on GPU
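As a rough sketch of what this speed-up looks like in code: the loop below touches one value per iteration, the way a single CPU core would, while the vectorized expression hands the whole array to the library at once. NumPy stands in here purely for illustration; with CuPy installed on an NVIDIA GPU, swapping the import runs the same expression as one parallel kernel. The standardization example is my own, not drawn from the actual dataset.

```python
import numpy as np  # with CuPy and an NVIDIA GPU: `import cupy as np`

x = np.arange(10_000, dtype=np.float64)
mu, sigma = x.mean(), x.std()

# Serial style: one element per loop iteration, as on a single CPU core.
scaled_loop = np.empty_like(x)
for i in range(x.size):
    scaled_loop[i] = (x[i] - mu) / sigma

# Vectorized style: the whole array in one call; CuPy would execute this
# as a single kernel spread across thousands of GPU threads.
scaled_vec = (x - mu) / sigma

assert np.allclose(scaled_loop, scaled_vec)
```

Both styles produce the same result; the difference is how many values are processed at once.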
Using NVIDIA’s Powerful Systems to Fast-track My Capstone Project on Combating Air Pollution
~ Sifa Kinoti
As mentioned above, NVIDIA provides users with graphics processing unit (GPU) capabilities that accelerate data processing through CUDA, a parallel computing platform and application programming interface (API).
My project deals with data containing hundreds of thousands of entries, so a fast and powerful processing system such as an NVIDIA GPU is needed. This being my first time using NVIDIA's tools, I enjoyed exploring their capabilities, as the developers provide proper documentation and beginner-friendly syntax.
Normally, I would handle data processing with Python packages such as NumPy, Pandas, and Scikit-learn for machine learning algorithms. However, these run on the CPU and can be slow on large datasets. Fortunately, there are GPU-accelerated data science packages, similar to the ones I have mentioned, that are much more powerful.
I explored the use of cuDF, the GPU DataFrame library. It works just like Pandas: you manipulate data frames by sorting, joining, dropping columns, and much more. I have also explored CuPy, the NumPy alternative, which is useful for mathematical operations, array manipulation, matrix manipulation, and more. These libraries are fundamental for exploratory data analysis.
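Because cuDF deliberately mirrors the Pandas API, the same few lines cover both. The sketch below uses Pandas with made-up air-quality columns of my own invention; on a GPU machine, only the import would change (e.g. `import cudf` in place of Pandas).

```python
import pandas as pd  # cuDF mirrors this API: `import cudf as pd` moves it to GPU

# Illustrative stand-ins for sensor readings and station metadata.
readings = pd.DataFrame({"station": ["A", "B", "A"], "pm25": [12.0, 40.5, 7.3]})
sites = pd.DataFrame({"station": ["A", "B"], "city": ["Nairobi", "Mombasa"]})

merged = (
    readings.merge(sites, on="station")   # join on the station key
            .sort_values("pm25")          # sort by pollutant level
            .drop(columns=["station"])    # drop the now-redundant key
)
print(merged.head())
```

The sorting, joining, and column-dropping steps mentioned above are exactly these three chained calls.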
As I progress, I look forward to exploring the cuSpatial and cuML libraries, which would enrich the project. I would use cuSpatial for grid projection, since I have coordinate data that I would like to map, and employ a machine learning algorithm from cuML to perform prediction.
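Since cuML mirrors the Scikit-learn interface, prediction code can be prototyped on the CPU first. Here is a hedged sketch with synthetic stand-in features (the real project's columns are not shown in this post); on a GPU, `cuml.ensemble.RandomForestRegressor` accepts the same `fit`/`predict` calls.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor  # cuML: cuml.ensemble.RandomForestRegressor

# Synthetic stand-in for air-quality features (e.g. temperature, humidity,
# traffic density) with a roughly linear target plus noise.
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0.0, 0.01, 200)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
pred = model.predict(X[:5])  # predicted pollutant levels for five rows
```

Swapping the import is the main change needed to move this workflow onto the GPU.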
Using NVIDIA’s Powerful GPU Capabilities in Image Segmentation Applications
~ Gacheri Nturibi
Often, when we talk about NVIDIA, we automatically think of the graphics cards that improve users' gaming experience. But there is more to NVIDIA in relation to data science, starting with the powerful GPU capabilities the company is renowned for.
So far, I have had an overall positive experience using NVIDIA for my capstone project. The project involves heavy image processing tasks and massive amounts of data, which consumed a lot of resources on my local computer until I began using NVIDIA's platform. The available technologies and free content have made my workload much lighter, as I can now run the processes much faster without consuming much space on my computer.
Self-paced NVIDIA Courses
NVIDIA provides a self-paced course on image segmentation where the learner is taken through the steps and processes involved in image processing. The course utilizes technologies such as TensorFlow, Keras, and GPU libraries such as cuDNN, which allow TensorFlow to extract the most performance from the available GPUs.
Personally, this has been especially beneficial, as the course has not only widened my knowledge base but also exposed me to new techniques for my image segmentation project. I look forward to applying these concepts to it.
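One standard way to score a predicted mask against the ground truth in segmentation work is intersection-over-union (IoU). Here is a small NumPy illustration, my own example rather than one taken from the course:

```python
import numpy as np

def iou(pred: np.ndarray, truth: np.ndarray) -> float:
    """Intersection-over-union between two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return inter / union if union else 1.0  # identical empty masks score 1

truth = np.zeros((4, 4), dtype=int)
truth[1:3, 1:3] = 1   # ground truth: a 2x2 square
pred = np.zeros((4, 4), dtype=int)
pred[1:3, 1:4] = 1    # prediction: same square, one column too wide

print(iou(pred, truth))  # 4 overlapping pixels / 6 in the union
```

A perfect prediction scores 1.0; the score falls as the masks diverge, which makes IoU a handy single number for comparing model runs.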
In addition to the self-paced course, NVIDIA provides further resources in its GitHub repository. The repository contains other image segmentation projects as well as pre-trained models from NVIDIA GPU Cloud (NGC) that can be adapted to users' needs. All these resources will be useful in meeting my project's objectives.
What Does This Partnership Mean For JENGA School Students?
~ Daisy Ondwari
Having NVIDIA's capabilities is a big win for JENGA School graduates and students. Given the exposure to high-end tools and resources from NVIDIA, they now have the potential to actualize their dreams of becoming competent and outstanding data scientists.
Gone is the thought of waiting about five days for a deep learning job to train. Our learners now have the privilege of accessing massive computing resources, such as RAPIDS and NVIDIA CUDA, to accelerate machine learning, analytics, and deep learning pipelines on NVIDIA GPUs at no cost. Processes such as loading data, preprocessing, and training will complete in a fraction of the time.
In addition to the infrastructure, learners will have access to diverse, self-paced learning materials through the NVIDIA Deep Learning Institute. These will equip them with the right skills to advance their knowledge and competence in accelerated computing, accelerated data science, and AI, and to set up end-to-end projects within a day. They will also have access to readily available large datasets, which will make it much easier for them to learn.
We’re all looking forward to reaping incredible benefits as these resources continue to be utilized to the max!