Bonfring International Journal of Data Mining

Journal Papers (35)
1 Prediction of RNA Secondary Structure from Random Sequences using ZEM, Cinita Mary Mathew and G.H. Meera Krishna
The biological role of many RNAs depends crucially on their structure, and an in-depth understanding of RNA secondary structure provides better insight into their function. Predicting the secondary structure of an RNA is the most important step towards determining its 3D structure and functions. This work proposes a model for exploring the features of a number of RNA sequences simultaneously, so that the sequences can be compared and the relevant ones identified. The proposed model accepts RNA sequences in any valid biological file format. For each given sequence, the required number of random sequences is generated; the generated sequences must have the same base composition as the original sequence. The ZEM (Zuker's Energy Minimization) algorithm finds the minimum free energy structure of each RNA sequence, taken as its predicted biological structure, together with the corresponding free energy value. The proposed prototype makes it possible to experiment with a number of RNA sequences and study their features so that biologically relevant inferences can be drawn. An important area of application is the design of pharmaceutical products.
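The randomization step can be illustrated with a short Python sketch. It assumes the simplest composition-preserving scheme, a uniform shuffle of the bases, which keeps the count of each nucleotide but not the neighbour (dinucleotide) frequencies; the abstract does not say which shuffling scheme the prototype uses, and the helper name `same_composition_shuffles` is hypothetical. The ZEM folding step itself is not shown.

```python
import random
from collections import Counter
from typing import List, Optional


def same_composition_shuffles(seq: str, count: int, seed: Optional[int] = None) -> List[str]:
    """Return `count` random permutations of `seq`.

    Each permutation has exactly the same base composition (numbers of A, C, G, U)
    as the original, because shuffling only reorders the existing bases.
    """
    rng = random.Random(seed)
    bases = list(seq.upper())
    shuffles = []
    for _ in range(count):
        rng.shuffle(bases)            # in-place reordering of the same multiset of bases
        shuffles.append("".join(bases))
    return shuffles


native = "GGGAAAUCCCUUAGCAAGG"  # hypothetical example sequence
for s in same_composition_shuffles(native, 3, seed=42):
    print(s, Counter(s) == Counter(native))  # the composition check is always True
```

In a typical workflow, each shuffled sequence would then be folded like the original so that its free energy can be compared with the native sequence's value.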
2 Efficient Classification of Data Using Decision Tree, Bhaskar N. Patel, Satish G. Prajapati and Dr. Kamaljit I. Lakhtaria
Data classification means assigning data to different categories according to rules. The aim of this research is to extract a kind of "structure" from a sample of objects; more precisely, to learn a concise representation of these data. The classification algorithm learns from the training set and builds a model, and that model is then used to classify new objects. This paper discusses one of the most widely used supervised classification techniques, the decision tree, implements its own decision tree, and evaluates the strength of the resulting classification through performance analysis and results analysis.
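The abstract does not describe the authors' own tree, so the following is a generic ID3-style sketch in Python, not their algorithm. It chooses the split attribute with the largest information gain (entropy reduction) and recurses until a node is pure or no attributes remain. All function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Reduction in entropy obtained by splitting on attribute index `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def build_tree(rows, labels, attrs):
    """A leaf is a class label (assumed not to be a tuple); an inner node is (attr, {value: subtree})."""
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # pure node, or majority class
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    children = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        children[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                     [a for a in attrs if a != best])
    return (best, children)


def classify(tree, row):
    while isinstance(tree, tuple):
        attr, children = tree
        tree = children[row[attr]]  # raises KeyError for a value unseen in training
    return tree
```

A production tree would also need pruning and a fallback for unseen attribute values; both are left out to keep the split criterion visible.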
3 Convergence of Optimization Problems, K. Jeyalakshmi
In this paper we consider a general optimization problem (OP) and study the convergence and approximation of optimal values and optimal solutions under changes in the cost function and the set of feasible solutions. We study the convergence of optimization problems under the familiar notion of uniform convergence. We do not assume convexity of the functions involved; instead, we consider a class of functions whose directional derivatives are convex. They are known as locally convex functions or, following Craven and Mond, nearly convex functions. We give the necessary preliminaries and prove that a sequence of locally convex optimization problems converges to a locally convex problem. We also prove that uniform convergence of locally convex optimization problems implies epigraph convergence of the problems. Although we have taken locally convex functions for simplicity, the results given here can also be proved for locally Lipschitz functions.
4 On ?-generalized ?-Continuous Mappings in Topological Spaces, N. Kalaivani and G. Sai Sundara Krishnan
In this paper we introduce the concept of ?-generalized ?-continuous mappings in topological spaces and study its relationship with other mappings. Further, we define the concepts of ?-? continuous mappings and ?-g? continuous mappings, which coincide when the space is ?-? $T_{1/2}$. In addition, we define the concept of ?-g?-irresolute mappings in topological spaces, establish the relationships between ?-g?-continuous and ?-g?-irresolute mappings, and obtain some of their basic properties.
5 Oscillation of Nonlinear Neutral Type Second Order Delay Difference Equations, E. Thandapani and M. Vijaya
In this paper we present some new oscillation criteria for a second-order nonlinear neutral type delay difference equation, where $\Delta$ is the forward difference operator defined by $\Delta x_n = x_{n+1} - x_n$, $n \in \mathbb{N}(n_0) = \{n_0, n_0+1, \ldots\}$, $n_0$ is a nonnegative integer, $k$ and a second shift parameter are positive integers, two further parameters are ratios of odd positive integers, and the coefficient sequences are real. Examples are provided to illustrate the results.
6 On Certain Classes of Analytic and Univalent Functions Based on Al-Oboudi Operator, T.V. Sudharsan and S.P. Vijayalakshmi
Following the works [2, 4, 7, 9] on analytic and univalent functions, in this paper we introduce two new classes based on the Al-Oboudi operator. We obtain coefficient estimates, growth and distortion theorems, and extremal properties for these two classes. Determining the extreme points of a family of univalent functions helps to solve many extremal problems.
7 Asymptotic Behavior Results for Nonlinear Impulsive Neutral Differential Equations with Positive and Negative Coefficients, S. Pandian and Y. Balachandran
This paper focuses on a nonlinear impulsive neutral differential equation (*) with positive and negative coefficients. Sufficient conditions are obtained for every solution of (*) to tend to a constant as $t \to \infty$.
8 A Class of Harmonic Meromorphic Functions of Complex Order, R. Ezhilarasi, K.G. Subramanian and T.V. Sudharsan
The seminal work of Clunie and Sheil-Small [3] on harmonic mappings gave rise to studies on subclasses of complex-valued harmonic univalent functions. In this paper a class of harmonic meromorphic functions of complex order, of the form $f(z) = h(z) + \overline{g(z)}$, $|z| > 1$, is introduced. It is shown that the functions in this class are sense preserving and univalent outside the unit disk. Sufficient coefficient conditions are obtained for functions in this class, and these are also shown to be necessary when the co-analytic part $g(z)$ has negative coefficients. We also obtain properties such as distortion bounds, extreme points, convolution and convex combination for this class.
9 On the Study of Risk Factors of Ca. Cervix and Ca. Breast: a Case Study in Assam, Lipi B. Mahanta, Dilip C. Nath and Nijara Rajbongshi
Ca. cervix and Ca. breast are the most common life-threatening cancers among women worldwide, and the same is true for the north-east region of India. These two cancers therefore remain a serious public health problem, and more research on their risk factors is needed to better understand their etiology and pathogenesis. Against this background, this study examines possible factors, such as socio-economic and marital factors, that may be associated with the occurrence of these cancers in the region. Data were collected by interviewing patients registered at the B. Barooah Cancer Research Institute, the main source of cancer data in Assam. Thirty diagnosed cases each of Ca. cervix and Ca. breast were taken, the data were arranged in cross-tabulations, the different factors were analyzed, and conclusions were drawn from these tables. The study reveals strong evidence of an association between cancer type (Ca. breast versus Ca. cervix) and the following risk factors: family income (p = .017), age at marriage (p = .031), age of the patient (p = .017), number of children (p = .001), age at first childbirth (p = .003) and oral contraceptive use (p = .028). It was further observed that most of the patients are housewives and non-vegetarian. Moreover, of the two cancer types, the Bengali population of the state is more afflicted by Ca. breast, whereas the Assamese population is more afflicted by Ca. cervix.
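The abstract reports p-values from cross-tabulations without naming the test. Assuming a Pearson chi-square test of independence (an assumption), the computation for an r x c table of counts can be sketched in dependency-free Python; `chi2_sf` evaluates the chi-square upper tail through the recursion for the regularized upper incomplete gamma function. No Yates continuity correction is applied, and with only 30 cases per group, expected counts below 5 would make Fisher's exact test the safer choice.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        s, q = 1.0, math.exp(-y)              # Q(1, y) = exp(-y)
    else:
        s, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y) = erfc(sqrt(y))
    while s < df / 2:
        # Q(s + 1, y) = Q(s, y) + y^s * exp(-y) / Gamma(s + 1)
        q += y ** s * math.exp(-y) / math.gamma(s + 1)
        s += 1
    return q


def chi_square_independence(table):
    """Pearson chi-square test of independence; returns (statistic, df, p-value)."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # assumes no empty row/column
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)
```

For example, `chi_square_independence([[18, 12], [8, 22]])` tests a hypothetical 2 x 2 table (cancer type by a binary factor); the counts are invented for illustration and are not the study's data.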
10 A Modification on Linear Systematic Sampling for Odd Sample Size, J. Subramani
The present paper deals with a modification of the selection of a linear systematic sample of odd size; the proposed method is called modified linear systematic sampling. The performance of modified linear systematic sampling is compared with that of simple random sampling and linear systematic sampling for certain hypothetical populations. It is observed that the modified systematic sample mean performs better than the simple random sample mean and the usual systematic sample mean for estimating the population mean in the presence of a linear trend among the population values.
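The modified selection rule itself is not given in the abstract, so this Python sketch shows only the usual linear systematic sampling that it modifies: with N = nk, draw a random start r in 1..k and take units r, r + k, ..., r + (n - 1)k. The function names are hypothetical.

```python
import random
from typing import List, Optional


def linear_systematic_sample(N: int, n: int, seed: Optional[int] = None) -> List[int]:
    """Unit labels (1..N) of a linear systematic sample of size n, assuming N = n * k."""
    if n <= 0 or N % n:
        raise ValueError("this sketch assumes N is a multiple of n")
    k = N // n                                  # sampling interval
    r = random.Random(seed).randint(1, k)       # random start in 1..k
    return [r + j * k for j in range(n)]


def sample_mean(population: List[float], labels: List[int]) -> float:
    """Mean of the population values at the given 1-based labels."""
    return sum(population[i - 1] for i in labels) / len(labels)
```

With a population that has a linear trend, for example y_i = 10 + 2i for i = 1..15 (population mean 26), the start r = 1 gives the sample {1, 4, 7, 10, 13} with mean 24, while r = 3 gives mean 28. The estimate depends on the random start, which is the kind of behaviour under linear trend that modified schemes aim to correct.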
11 Construction of Graeco Sudoku Square Designs of Odd Orders, J. Subramani
The Sudoku puzzle typically consists of a nine-by-nine grid in which some of the cells contain digits and most are blank. The goal is to fill in the blanks with digits from 1 to 9 so that each row, each column and each of the nine three-by-three blocks making up the grid contains each of the nine digits exactly once. Recently, Subramani and Ponnuswamy (2009) considered the Sudoku puzzle as an experimental design and introduced the concept of Sudoku designs. Sudoku designs are similar to Latin square designs but accommodate some additional factors. The method of constructing Sudoku square designs, together with their analysis and applications, is also given by Subramani and Ponnuswamy (2009). In this paper we extend Sudoku designs to orthogonal (Graeco) Sudoku square designs, in line with orthogonal (Graeco) Latin square designs. A simple method of constructing Graeco Sudoku square designs (GSSD) of odd orders is presented and explained with the help of numerical examples.
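The defining conditions of a Sudoku square design, and the extra orthogonality condition of a Graeco design, can be checked mechanically. This Python sketch assumes that, by analogy with Graeco-Latin squares, a Graeco Sudoku pair must contain every ordered pair of symbols exactly once; `cyclic_sudoku` is a simple standard construction used only to exercise the checker, not the paper's method for odd orders.

```python
import math
from typing import List, Sequence

Grid = Sequence[Sequence[int]]


def is_sudoku_square(grid: Grid) -> bool:
    """True if an m x m grid (m = n*n) has every symbol once per row, column and n x n block."""
    m = len(grid)
    n = math.isqrt(m)
    if n * n != m or any(len(row) != m for row in grid):
        return False
    symbols = set(grid[0])
    if len(symbols) != m:
        return False
    rows_ok = all(set(row) == symbols for row in grid)
    cols_ok = all({grid[i][j] for i in range(m)} == symbols for j in range(m))
    blocks_ok = all(
        {grid[bi + i][bj + j] for i in range(n) for j in range(n)} == symbols
        for bi in range(0, m, n) for bj in range(0, m, n)
    )
    return rows_ok and cols_ok and blocks_ok


def is_graeco_pair(a: Grid, b: Grid) -> bool:
    """Both squares are Sudoku squares and every ordered pair (a_ij, b_ij) occurs exactly once."""
    if len(a) != len(b) or not (is_sudoku_square(a) and is_sudoku_square(b)):
        return False
    m = len(a)
    return len({(a[i][j], b[i][j]) for i in range(m) for j in range(m)}) == m * m


def cyclic_sudoku(n: int) -> List[List[int]]:
    """A simple Sudoku square of order n*n with symbols 0..n*n-1."""
    m = n * n
    return [[(n * (i % n) + i // n + j) % m for j in range(m)] for i in range(m)]
```

A square paired with itself can never be orthogonal, since it yields only m distinct ordered pairs instead of m².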
12 A Study on the Bi-Rayleigh ROC Curve Model, Sudesh Pundir and R. Amala
Receiver Operating Characteristic (ROC) curves are used in the medical field to describe and compare the accuracy of diagnostic tests, or the ability of a continuous biomarker to discriminate between healthy and diseased subjects. The most familiar form of ROC curve is the bi-normal (Gaussian) ROC curve model, which assumes that the test scores, or a monotone transformation of them, come from two normal populations (healthy and diseased). This assumption does not always hold: the data may violate normality, and the model cannot be adopted as it is when the sample size is small. In this paper, we propose an ROC curve model for the Rayleigh distribution that can be used even when the sample size is small. The properties of the bi-Rayleigh ROC model are studied and the Area Under the ROC Curve (AUC) is derived. The proposed model is supported by a real-life example as well as simulation studies, and the confidence interval for the population parameter is studied through simulations with varying sample sizes. The bi-Rayleigh ROC model is found to provide better classification accuracy than the conventional bi-normal ROC model.
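Under the bi-Rayleigh model the ROC curve and AUC have closed forms. Assuming healthy scores $H \sim \text{Rayleigh}(\sigma_h)$, diseased scores $D \sim \text{Rayleigh}(\sigma_d)$, and that larger scores indicate disease, a threshold $c$ gives FPR $t = \exp(-c^2/(2\sigma_h^2))$ and TPR $= \exp(-c^2/(2\sigma_d^2)) = t^{a}$ with $a = \sigma_h^2/\sigma_d^2$, so AUC $= \int_0^1 t^a\,dt = 1/(1 + a) = \sigma_d^2/(\sigma_h^2 + \sigma_d^2)$. The Python sketch checks this closed form against a Mann-Whitney estimate from simulated data; the parameterization is this sketch's assumption and may differ from the paper's.

```python
import math
import random


def rayleigh_sample(sigma, size, rng):
    """Inverse-CDF sampling from F(x) = 1 - exp(-x^2 / (2 sigma^2))."""
    return [sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random())) for _ in range(size)]


def bi_rayleigh_roc(t, sigma_h, sigma_d):
    """TPR at FPR t when healthy ~ Rayleigh(sigma_h) and diseased ~ Rayleigh(sigma_d)."""
    return t ** (sigma_h ** 2 / sigma_d ** 2)


def bi_rayleigh_auc(sigma_h, sigma_d):
    """Closed-form area under the bi-Rayleigh ROC curve."""
    return sigma_d ** 2 / (sigma_h ** 2 + sigma_d ** 2)


def empirical_auc(healthy, diseased):
    """Mann-Whitney estimate of P(D > H), counting ties as 1/2."""
    wins = sum((d > h) + 0.5 * (d == h) for d in diseased for h in healthy)
    return wins / (len(healthy) * len(diseased))


rng = random.Random(0)
h = rayleigh_sample(1.0, 500, rng)
d = rayleigh_sample(2.0, 500, rng)
print(bi_rayleigh_auc(1.0, 2.0), empirical_auc(h, d))  # closed form vs simulation
```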
13 Statistical Evaluation of Diagnostic Tests, Vishnu Vardhan Rudravaram
The use of routine laboratory tests in diagnosing disease is becoming increasingly important. This makes it necessary to evaluate the efficiency of diagnostic tests, since relatively few diagnostic tests correctly classify every tested subject as diseased or well. The more usual situation is one in which some well subjects are classified as diseased and some diseased subjects as well. In such situations, diagnostic and prognostic models serve the purpose. Diagnostic models are usually used for classification and are quite common in the medical field. This paper highlights the importance of the statistical classification procedures that help in evaluating diagnostic tests.
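The standard accuracy measures for a diagnostic test follow from the 2 x 2 table of test result against true disease status. This Python sketch computes them; the class name `TwoByTwo` is illustrative, and the paper may emphasize other procedures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a diagnostic test compared against the true disease status."""
    tp: int  # diseased, test positive
    fn: int  # diseased, test negative
    fp: int  # well, test positive
    tn: int  # well, test negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        """Positive predictive value: P(diseased | test positive)."""
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        """Negative predictive value: P(well | test negative)."""
        return self.tn / (self.tn + self.fn)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

    @property
    def youden_j(self) -> float:
        return self.sensitivity + self.specificity - 1
```

PPV and NPV depend on the disease prevalence in the sample, whereas sensitivity and specificity do not.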
14 Estimation of Area under the ROC Curve Using Exponential and Weibull Distributions, R. Vishnu Vardhan, Sudesh Pundir and G. Sameera
In recent years, Receiver Operating Characteristic (ROC) curves have received much attention in medical diagnosis for classifying subjects into one of two groups. Many researchers have provided a mathematical formulation of the curve by assuming some specific distribution; conventionally, most work assumes a normal distribution. In this paper, we focus on estimating the ROC curve and the Area Under the Curve (AUC) using the exponential and Weibull distributions. Since these distributions are important in life-testing problems, the performance of their ROC forms is studied and the results are compared with the conventional binormal ROC form. The entire study was carried out using real and simulated data sets. The results suggest that the binormal ROC form performs far better than the other two, and that the biexponential form is better than the biweibull form.
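For the three models compared, the AUC equals $P(D > H)$ and has a closed form. Assuming $H$ is the healthy and $D$ the diseased score: for exponential scores with rates $\lambda_h, \lambda_d$, AUC $= \lambda_h/(\lambda_h + \lambda_d)$; for Weibull scores with a common shape $k$ (an assumption made here so that $X^k$ is exponential) and scales $s_h, s_d$, AUC $= s_d^k/(s_h^k + s_d^k)$; for normal scores, AUC $= \Phi\big((\mu_d - \mu_h)/\sqrt{\sigma_h^2 + \sigma_d^2}\big)$. A short Python sketch of these formulas:

```python
import math


def biexponential_auc(rate_h, rate_d):
    """P(D > H) for H ~ Exp(rate_h), D ~ Exp(rate_d)."""
    return rate_h / (rate_h + rate_d)


def biweibull_auc(shape, scale_h, scale_d):
    """P(D > H) for Weibull scores sharing one shape parameter (assumption)."""
    return scale_d ** shape / (scale_h ** shape + scale_d ** shape)


def binormal_auc(mu_h, sd_h, mu_d, sd_d):
    """Phi((mu_d - mu_h) / sqrt(sd_h^2 + sd_d^2)), with Phi written via math.erf."""
    z = (mu_d - mu_h) / math.sqrt(sd_h ** 2 + sd_d ** 2)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

With shape 1 the Weibull form reduces to the exponential one (scale s corresponds to rate 1/s), which gives a quick consistency check between the two.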
15 Second Hankel Determinant for Certain Classes of Analytic Functions, G. Shanmugam, B. Adolf Stephen and K. G. Subramanian
Let S denote the class of functions that are analytic, normalized and univalent in the open unit disc D = {z : |z| < 1}. Important subclasses of S are the classes of starlike and convex functions, denoted S* and C respectively. This paper focuses on obtaining sharp upper bounds for the second Hankel determinant functional.
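For reference, here is the standard definition of the q-th Hankel determinant of a function in S, with the second Hankel determinant as the case q = 2, n = 2. This assumes, as the title suggests, that the functional studied is $|a_2 a_4 - a_3^2|$; the abstract itself does not state it.

```latex
f(z) = z + \sum_{n=2}^{\infty} a_n z^n, \qquad
H_q(n) =
\begin{vmatrix}
a_n       & a_{n+1} & \cdots & a_{n+q-1} \\
a_{n+1}   & a_{n+2} & \cdots & a_{n+q}   \\
\vdots    & \vdots  & \ddots & \vdots    \\
a_{n+q-1} & a_{n+q} & \cdots & a_{n+2q-2}
\end{vmatrix},
\qquad
H_2(2) = \begin{vmatrix} a_2 & a_3 \\ a_3 & a_4 \end{vmatrix} = a_2 a_4 - a_3^2 .
```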
16 On Certain Subclasses of Analytic and Univalent Functions based on an Extension of Salagean Operator, T.V. Sudharsan and R. Vijaya
There are many subclasses of analytic and univalent functions. A class T of functions with negative coefficients introduced by Silverman [8] opened up a new and fruitful line of research in the theory of univalent functions. Following the works of Khairnar and Meena More [3], Aghalary and Kulkarni [1], Silverman and Silvia [8] and Owa and Nishiwaki [5] on analytic and univalent functions, in this paper we introduce two new classes for a family of analytic functions with negative coefficients. We obtain coefficient estimates, a distortion theorem and extreme points for the class.
17 Applications of Number Theory in Statistics, A.M.S. Ramasamy
Number theory has several fascinating applications in statistics, and the purpose of this survey paper is to highlight some of the important ones. Prime numbers constitute an interesting and challenging area of research in number theory. Diophantine equations form a central part of number theory; an equation requiring integral solutions is called a Diophantine equation. In the first part of this paper, some problems related to prime numbers and the role of Diophantine equations in design theory are discussed, and the contribution of Fibonacci and Lucas numbers to a quasi-residual Metis design is explained. A famous problem related to finite fields is the discrete logarithm problem; in the second part of this paper, the structure of the discrete logarithm is discussed.
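The discrete logarithm problem asks, given a prime $p$, a base $g$ and a target $h$, for an exponent $x$ with $g^x \equiv h \pmod p$. As an illustration only (the survey's own treatment is not reproduced), here is Shanks' baby-step giant-step algorithm in Python, which needs on the order of $\sqrt{p}$ time and memory instead of the roughly $p$ steps of exhaustive search.

```python
import math
from typing import Optional


def discrete_log(g: int, h: int, p: int) -> Optional[int]:
    """Smallest x >= 0 with pow(g, x, p) == h % p, or None; p prime, g not divisible by p."""
    m = math.isqrt(p - 1) + 1          # m * m > p - 1 >= order of g
    baby = {}
    e = 1
    for j in range(m):                 # baby steps: store g^j for j = 0..m-1
        baby.setdefault(e, j)          # keep the smallest j for each value
        e = e * g % p
    giant = pow(g, -m, p)              # g^(-m) mod p (modular inverse needs Python 3.8+)
    gamma = h % p
    for i in range(m):                 # giant steps: test h * g^(-i*m)
        if gamma in baby:
            return i * m + baby[gamma]
        gamma = gamma * giant % p
    return None
```

For example, `discrete_log(2, 3, 11)` returns 8, since $2^8 = 256 \equiv 3 \pmod{11}$. When $h$ is not a power of $g$ (for instance $g = 4$, $h = 2$ modulo 11), the function returns `None`.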
18 Conditional Variables Double Sampling Plan for Weibull Distributed Lifetimes under Sudden Death Testing, S. Balamurali and J. Subramani
In this paper, we propose a conditional sampling plan, called the conditional double sampling plan, for lot acceptance of parts whose lifetime follows a Weibull distribution with known shape parameter under sudden death testing. A table is also developed for selecting and applying the optimal parameters of the proposed plan for two specified points on the operating characteristic curve, namely the acceptable reliability level and the limiting reliability level, together with the producer's and consumer's risks. The optimization problem is formulated as a nonlinear program in which the objective function to be minimized is the average group number, and the constraints relate to the lot acceptance probabilities at the acceptable and limiting reliability levels on the operating characteristic curve.
19 Comparison of Estimators of Extreme Value Distributions for Wind Data Analysis, N. Vivekanandan
Estimation of the extreme wind speed potential in a region is important when designing tall structures such as cooling towers, stacks and transmission line towers. Wind speed can conveniently be assessed by probabilistic modelling of historical wind speed data using an extreme value distribution, or by using the standard procedures in the BIS codes of practice for buildings and structures. This paper illustrates the use of extreme value distributions, namely Gumbel, Fréchet and Weibull, for modelling wind speed data recorded at Kanyakumari. The Method of Least Squares (MLS), the maximum likelihood method and the order statistics approach are used to determine the parameters of the distributions. The Kolmogorov-Smirnov test is used to check the adequacy of fit of each method/distribution to the recorded data, and the D-index is used to select a suitable method/distribution for estimating the design wind speed. The study shows that the Gumbel distribution fitted by MLS is better suited for estimating the design wind speed at Kanyakumari. The wind speed estimates obtained from the Gumbel distribution fitted by MLS are compared with those from the BIS code of practice, and the results are presented.
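The least-squares fit mentioned above can be sketched as a straight-line regression on a Gumbel probability plot: for the Gumbel CDF $F(x) = \exp(-\exp(-(x - \mu)/\beta))$, the reduced variate $y = -\ln(-\ln F)$ satisfies $x = \mu + \beta y$. The plotting position $i/(n+1)$ is an assumption of this Python sketch (other plotting positions are also used), and the paper's D-index comparison is not reproduced.

```python
import math
from typing import List, Tuple


def gumbel_least_squares(sample: List[float]) -> Tuple[float, float]:
    """Fit x = mu + beta * y by least squares, y being the Gumbel reduced variate.

    Uses the Weibull plotting position i / (n + 1) for the i-th smallest value (assumption).
    """
    xs = sorted(sample)
    n = len(xs)
    ys = [-math.log(-math.log(i / (n + 1))) for i in range(1, n + 1)]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
    syy = sum((y - y_bar) ** 2 for y in ys)
    beta = sxy / syy                     # slope = scale parameter
    return x_bar - beta * y_bar, beta    # intercept = location parameter


def gumbel_return_level(mu: float, beta: float, T: float) -> float:
    """Value with non-exceedance probability 1 - 1/T (the T-year return level)."""
    return mu - beta * math.log(-math.log(1.0 - 1.0 / T))
```

The design wind speed for a return period T would then be `gumbel_return_level(mu, beta, T)` with the fitted parameters.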
20 Factorial Dimensions of Employee Engagement in Public and Private Sector Banks, Dr. P. Amirtha Gowri and Dr. M. Mariammal
Employee engagement is the level of commitment and involvement an employee has towards the organization and its values. An engaged employee is aware of the business context and works with colleagues to improve performance within the job for the benefit of the organization; engagement is a positive attitude held by employees towards the organization and its values. This paper assesses three factors, namely 'Commitment', 'Salary and benefits' and 'Job satisfaction', which ultimately determine employee engagement in public and private sector banks. Fifty-five respondents from public sector banks and another 55 from private sector banks were selected using the convenience sampling technique. The factors underlying employee engagement are identified and assessed with descriptive statistics such as the mean and standard deviation, and Pearson's product moment correlation is applied to examine their inter-relationships. A one-way ANOVA test is also used to examine the relationship between the employee engagement factors and the demographic and other variables of the bank employees. Finally, suitable suggestions are offered for better employee engagement in public and private sector banks.
21 Investigation of Managers' Perception about Employees' Learning Aptitude, Muhammad Faisal Aziz
The corporate sector is facing cut-throat competition these days. This competitive environment gives managers and supervisors very little time to train their employees and help them learn to meet their job requirements. In addition, the business world is changing every day: people encounter new technology, new products, fluctuating demand, globalization, a diverse workforce and competent colleagues in their organizations. Employees have to take the initiative to learn technology, systems, procedures and behaviors to meet the challenges of this competitive corporate environment, which shows a dire need for learning aptitude among employees in the corporate sector. Managers have different perceptions of employees' learning aptitude. The study was conducted in the corporate sector of three cities in the Sultanate of Oman. The findings reveal that corporate managers in this region of the country do not have a positive perception of employees' learning aptitude: many managers either disagree or are neutral about whether employees have learning aptitude. The lack of learning aptitude is a constraint on organizational performance. A large number of employees are not interested in learning on their own initiative, which hinders their career growth and leads to a declining trend in their job performance. Social awareness and the use of active learning methods in colleges and universities are desirable to create better learning aptitude in employees.
22 Comparison of Estimators of Gumbel Distribution for Modelling Wind Speed Data, N. Vivekanandan and Dr. S.K. Roy
Estimation of the extreme wind speed potential in a region is important when designing tall structures such as cooling towers, stacks and transmission line towers. Wind speed in a region can conveniently be assessed by probabilistic modelling of historical wind speed data using an appropriate extreme value distribution. This paper illustrates the use of five parameter estimation methods for the Gumbel distribution in modelling Hourly Maximum Wind Speed (HMWS) data recorded at Delhi and Visakhapatnam. Goodness-of-Fit (GoF) tests, namely Anderson-Darling and Kolmogorov-Smirnov, are used to check the adequacy of fit of each method to the recorded data, and the Root Mean Square Error (RMSE) is used to select a suitable method for estimating the Gumbel parameters for HMWS data. The results of the GoF tests and RMSE show that the order statistics approach is better suited for estimating the design wind speed in the regions under study.
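Two of the selection criteria can be sketched in Python for a fitted Gumbel distribution. The KS statistic is the largest gap between the empirical and fitted CDFs; the RMSE here compares sorted observations with fitted quantiles at plotting positions $i/(n+1)$, which is an assumed definition since the abstract does not give one. Anderson-Darling is omitted.

```python
import math
from typing import List


def gumbel_cdf(x: float, mu: float, beta: float) -> float:
    return math.exp(-math.exp(-(x - mu) / beta))


def ks_statistic(sample: List[float], mu: float, beta: float) -> float:
    """Kolmogorov-Smirnov distance between the empirical CDF and a fitted Gumbel CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = gumbel_cdf(x, mu, beta)
        d = max(d, i / n - f, f - (i - 1) / n)  # check both sides of each ECDF step
    return d


def quantile_rmse(sample: List[float], mu: float, beta: float) -> float:
    """RMSE between sorted observations and fitted quantiles at plotting positions i/(n+1)."""
    xs = sorted(sample)
    n = len(xs)
    fitted = [mu - beta * math.log(-math.log(i / (n + 1))) for i in range(1, n + 1)]
    return math.sqrt(sum((x - q) ** 2 for x, q in zip(xs, fitted)) / n)
```

To select an estimation method, one would compute both measures for the $(\mu, \beta)$ produced by each method and prefer the smallest RMSE. For large n the 5% KS critical value is roughly $1.36/\sqrt{n}$, although it is conservative when the parameters are estimated from the same data.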
23 Probabilistic Modelling of Hourly Rainfall Data for Development of Intensity-Duration-Frequency Relationships, N. Vivekanandan
The rainfall Intensity-Duration-Frequency (IDF) relationship is commonly required for planning and designing various water resources projects. The IDF relationship is a mathematical relationship between rainfall intensity, duration and return period, determined through statistical analysis of recorded rainfall data. In this paper, the annual n-hourly maximum rainfall for different durations n, namely 1-hour (hr), 2-hr, 3-hr, 6-hr, 12-hr, 18-hr, 24-hr, 48-hr and 72-hr, is extracted from hourly rainfall data recorded at Kalingapatnam and Hissar and then used to estimate rainfall for different return periods. The order statistics approach is applied to determine the estimators of the Gumbel distribution. Regional IDF relationships for estimating rainfall intensity for different return periods in the Hissar and Kalingapatnam regions are developed and presented in the paper.
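Extracting annual n-hour maxima from an hourly record is a moving-window sum. This Python sketch assumes a complete, chronologically ordered record with no missing hours and, for simplicity, ignores windows that straddle two years; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def annual_n_hour_maxima(hourly: Iterable[Tuple[int, float]], n: int) -> Dict[int, float]:
    """Largest n-hour rainfall total in each year.

    hourly: (year, depth_mm) pairs in chronological order, one per hour, no gaps.
    Windows crossing a year boundary are ignored in this sketch.
    """
    by_year = defaultdict(list)
    for year, depth in hourly:
        by_year[year].append(depth)
    maxima = {}
    for year, depths in by_year.items():
        if len(depths) < n:
            continue                              # too few hours for even one window
        window = sum(depths[:n])
        best = window
        for i in range(n, len(depths)):
            window += depths[i] - depths[i - n]   # slide the window one hour forward
            best = max(best, window)
        maxima[year] = best
    return maxima
```

Dividing an n-hour rainfall depth (for example, the Gumbel estimate for a given return period) by n gives the corresponding mean intensity in mm/hr, the quantity related to duration and return period in an IDF relationship.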
24 Assessment of Probable Maximum Precipitation Using Gumbel Distribution and Hershfield Method, N. Vivekanandan and Dr. S.K. Roy
Assessment of Probable Maximum Precipitation (PMP) is of utmost importance for the planning, design, management and risk analysis of hydraulic and other structures in a region. This paper details the procedures involved in estimating Extreme Rainfall (ER) for the Bhavnagar region using five parameter estimation methods for the Gumbel distribution. A Goodness-of-Fit test based on the Kolmogorov-Smirnov (KS) statistic is used to check the adequacy of fit of each parameter estimation method, and the Root Mean Square Error (RMSE) is used to select a suitable method for estimating ER. The paper shows that Probability Weighted Moments (PWM) is better suited for modelling daily maximum rainfall, and the Order Statistics Approach (OSA) for 24-hour maximum rainfall, in the region under study. The results obtained from the Gumbel distribution are compared with the PMP value given by the Hershfield method. The study shows that the Mean + SE value (where Mean denotes the estimated ER and SE its standard error) of the 1000-year return period one-day ER given by PWM may be considered for design purposes in the Bhavnagar region.
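Hershfield's statistical method estimates PMP as $X_{PMP} = \bar{X}_n + K_m S_n$, where $\bar{X}_n$ and $S_n$ are the mean and standard deviation of the annual maximum rainfall series and $K_m$ is a frequency factor; 15 is the value associated with Hershfield's original work, and later versions let $K_m$ vary with the mean rainfall and duration. This Python sketch leaves $K_m$ as a parameter and omits the adjustment factors for outliers, sample size and fixed observation intervals that the full procedure applies.

```python
import statistics
from typing import Sequence


def hershfield_pmp(annual_maxima: Sequence[float], k_m: float = 15.0) -> float:
    """Unadjusted Hershfield estimate: mean + k_m * sample standard deviation."""
    mean = statistics.mean(annual_maxima)
    sd = statistics.stdev(annual_maxima)   # sample (n - 1) standard deviation
    return mean + k_m * sd
```

For a hypothetical series with mean 120 mm and standard deviation 20 mm, $K_m = 15$ gives 420 mm.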
25 A New Algorithm for Model Order Reduction of Interval Systems, D. Kranthi Kumar, S.K. Nagar and J.P. Tiwari
A mixed method for interval systems combines classical reduction methods with stability-preserving methods for interval systems. This paper proposes a new method for model order reduction of systems with uncertain parameters whose bounds are known a priori. Two separate methods are used to find the numerator and denominator parameters. The numerator parameters are obtained by one of the following methods: the differentiation method, the factor division method, the Cauer second form, the moment matching method or the Padé approximation method. The denominator is obtained by the differentiation method in all cases. A numerical example illustrates the procedures. Among these mixed methods, the differentiation method and the Cauer second form give better approximations than the other methods. The errors between the original higher-order and reduced-order models are also highlighted to support the effectiveness of the proposed methods.
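The differentiation method used for the denominator has a compact closed form. For $D(s) = \sum_{i=0}^{n} a_i s^i$, one forms the reciprocal polynomial $s^n D(1/s)$, differentiates it $n - r$ times and takes the reciprocal again, which gives the reduced coefficients $b_i = a_i \,(n - i)!/(r - i)!$ for $i = 0, \dots, r$. Since every factor is positive, an interval coefficient $[a_i^-, a_i^+]$ maps to $[a_i^- f_i, a_i^+ f_i]$. This Python sketch assumes that coefficient-wise interval treatment; the paper's exact procedure and its numerator methods are not reproduced.

```python
from math import factorial
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def differentiation_reduce(coeffs: List[Interval], r: int) -> List[Interval]:
    """Reduce sum_i [lo_i, hi_i] s**i (ascending powers, degree n) to degree r.

    Uses b_i = a_i * (n - i)! / (r - i)!, the closed form of
    "reciprocate, differentiate n - r times, reciprocate back".
    """
    n = len(coeffs) - 1
    if not 0 < r < n:
        raise ValueError("need 0 < r < n")
    reduced = []
    for i in range(r + 1):
        factor = factorial(n - i) / factorial(r - i)  # positive, so bounds keep their order
        lo, hi = coeffs[i]
        reduced.append((lo * factor, hi * factor))
    return reduced
```

For example, the degenerate-interval denominator $s^3 + 6s^2 + 11s + 6$ reduced to second order gives $6s^2 + 22s + 18$.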
26 Prediction of USD/JPY Exchange Rate Time Series Directional Status by KNN with Dynamic Time Warping as Distance Function, Arash Negahdari Kia, Dr. Saman Haratizadeh and Dr. Hadi Zare
Exchange rate prediction has been a challenging topic over the last decade, and various studies have tried to improve prediction accuracy in terms of both level error and directional status error. The aim of this paper is to introduce a methodology that uses KNN (K-nearest neighbors) and DTW (dynamic time warping) to improve fluctuation prediction and to achieve better evaluation measures than other research in the financial market forecasting literature. The study uses the USD/JPY (United States Dollar/Japanese Yen) exchange rate time series, and the results show improved prediction of the direction of the series. USD/JPY exchange rates from 1971 to 2012 were collected and partitioned into 30-element segments, reflecting the monthly cyclic behavior of the time series. These segments are then divided into two sets in a 7:3 ratio, and KNN is used to find the 3 nearest neighbors with DTW as the similarity function. Using a function also introduced in this research, the directional status of the last element is predicted, and the prediction result is compared with other results in the exchange rate prediction literature.
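The nearest-neighbour step can be sketched in Python: classic dynamic time warping serves as the distance, and the k = 3 nearest training segments vote on the direction. The abstract's own direction-prediction function is not described, so the majority vote below is an assumption, as are the function names.

```python
from typing import List, Sequence, Tuple


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping with absolute-difference cost."""
    inf = float("inf")
    D = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    D[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[len(a)][len(b)]


def knn_direction(query: Sequence[float],
                  train: List[Tuple[Sequence[float], int]], k: int = 3) -> int:
    """Majority vote of the k DTW-nearest training items.

    train: (prefix, direction) pairs, where direction is +1 (up) or -1 (down)
    for the element that followed the prefix. The voting rule is an assumption.
    """
    neighbours = sorted(train, key=lambda item: dtw_distance(query, item[0]))[:k]
    vote = sum(direction for _, direction in neighbours)
    return 1 if vote > 0 else -1  # with odd k and +/-1 labels a tie cannot occur
```

One way to map the described setup onto this sketch: each 30-element segment supplies a 29-element prefix and the direction of its last element, with a 7:3 split between training and test segments.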
27 Improving Efficiency of Apriori Algorithms for Sequential Pattern Mining, Alpa Reshamwala and Dr. Sunita Mahajan
Computer systems are exposed to an increasing number of different types of security threats due to the expansion of the internet in recent years, and detecting network intrusions effectively has become an important security problem. Many intrusions are composed not of single events but of a series of attack steps taken in chronological order. Analyzing the order in which events occur can improve attack detection accuracy and reduce false alarms. An intrusion is a multi-step process in which a number of events must occur sequentially to launch a successful attack. Intrusion detection using sequential pattern mining is a research topic in the field of information security. Sequential pattern mining is used to discover frequent sequential patterns in the event dataset. Sequential pattern mining algorithms can be broadly classified into Apriori-based, pattern-growth-based and hybrid approaches; the former rely on the Apriori property, while the latter use a pattern growth approach. The major drawback of Apriori-based algorithms is that they scan the database multiple times while generating maximal patterns. In this paper, a simulation study of two algorithms, the original AprioriAll algorithm and a modified AprioriAll algorithm that optimizes processing using set-theory techniques, is carried out on a network intrusion dataset from KDD Cup 1999. Experimental results show that the modified algorithm shrinks the dataset and scans the database at most twice. Moreover, because shrinking the dataset increases the interestingness of the itemsets, the method yields efficient sequences with high associativity. As the database shrinks, the time taken to mine sequences also decreases, making the method faster than the original Apriori-based algorithm.
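The core operation in Apriori-style sequential pattern mining is counting the support of candidate sequences level by level and pruning candidates that contain an infrequent sub-sequence. The Python sketch below is a simplified, GSP-like miner over sequences of single events; it is neither AprioriAll (which first maps itemsets to litemsets) nor the paper's modified version, and the event names in the usage are invented.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Pattern = Tuple[Hashable, ...]


def is_subsequence(pattern: Pattern, sequence: Sequence[Hashable]) -> bool:
    """True if the events of `pattern` occur in `sequence` in the same order (gaps allowed)."""
    it = iter(sequence)
    return all(event in it for event in pattern)  # `in` consumes the iterator up to the match


def mine_sequential_patterns(db: List[Sequence[Hashable]],
                             min_support: float) -> Dict[Pattern, float]:
    """Level-wise (Apriori-style) mining of frequent event sequences with their supports."""
    def support(p: Pattern) -> float:
        return sum(is_subsequence(p, s) for s in db) / len(db)

    scored = {(e,): support((e,)) for e in {e for s in db for e in s}}
    level = [p for p, sup in scored.items() if sup >= min_support]
    frequent = {p: scored[p] for p in level}
    while level:
        # Join: extend p by the last event of q when p minus its head equals q minus its tail.
        candidates = {p + (q[-1],) for p in level for q in level if p[1:] == q[:-1]}
        # Prune: every sub-pattern obtained by deleting one event must already be frequent.
        candidates = {c for c in candidates
                      if all(c[:i] + c[i + 1:] in frequent for i in range(len(c)))}
        scored = {c: support(c) for c in candidates}   # one database pass per level
        level = [c for c, sup in scored.items() if sup >= min_support]
        frequent.update({c: scored[c] for c in level})
    return frequent
```

For instance, with the four sequences `[["scan", "login", "exfil"], ["scan", "exfil"], ["login", "scan", "exfil"], ["scan", "login"]]` and a minimum support of 0.5, ("scan", "exfil") is frequent with support 0.75, while ("scan", "login", "exfil") occurs in only one sequence and is dropped.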
28 An Analytical Study on Early Diagnosis and Classification of Diabetes Mellitus, S. Peter
Diabetes mellitus (DM) is a chronic, general, life-threatening syndrome occurring all around the world. It is characterized by hyperglycemia arising from abnormalities in insulin secretion, which in turn result in an irregular rise in glucose level. In recent years, the impact of diabetes mellitus has increased greatly, especially in developing countries like India; this is mainly due to irregular food habits, for example among many IT professionals. Early diagnosis and classification of this deadly disease has therefore become an active area of research in the last decade, and a number of techniques have been developed to deal with it. Numerous clustering and classification techniques are available in the literature for visualizing temporal data and identifying trends for controlling diabetes mellitus. This survey presents an analytical study of several algorithms that diagnose and classify diabetes mellitus data effectively. The existing algorithms are analyzed thoroughly to identify their advantages and limitations, and their performance is evaluated to determine the best approach among them. A solution is also suggested to improve the overall performance of the diagnosis process.
29 A Difference-Cum-Exponential Type Estimator for Estimating the Population Mean Under Stratification, H.S. Jhajj and L. Kusam Lata
In survey sampling, stratification helps improve the precision of estimators over simple random sampling when the population is heterogeneous. In the present paper, a difference-cum-exponential type estimator of the population mean under a two-phase stratified random sampling design is proposed for heterogeneous populations. Expressions for the bias and mean squared error of the proposed estimator are obtained up to the first order of approximation. The proposed estimator is shown to be more efficient than the linear regression estimator under the same sampling design for a certain range of values of the constants involved. The results are also illustrated numerically and graphically using data from a population considered in the literature.
30 Hankel Determinant for a Subclass of Alpha Convex Functions, Gagandeep Singh and Gurcharanjit Singh
In the present investigation, an upper bound on the second Hankel determinant for functions belonging to a subclass of alpha-convex analytic functions is studied. The results presented in this paper extend the corresponding results of various authors.
31 New Approach to Solve Fuzzy Linear Programming Problems by the Ranking Function, A. Karpagam and Dr. P. Sumathi
In this paper, a new computational method based on a new ranking function is proposed for finding the fuzzy optimal solution of fully fuzzy linear programming problems (FFLPP) with triangular fuzzy numbers. Compared with existing methods, the proposed method is easy to understand and to apply to fully fuzzy linear programming problems occurring in real-life situations. Numerical examples are solved to illustrate the proposed method.
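Ranking-function methods replace each triangular fuzzy number by a crisp value, and a linear ranking lets a fuzzy objective be replaced by a crisp one because $R(x + y) = R(x) + R(y)$. The paper's new ranking function is not given in the abstract; this Python sketch uses $R(a, b, c) = (a + 2b + c)/4$ purely as an illustrative linear ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TFN:
    """Triangular fuzzy number (a, b, c) with a <= b <= c."""
    a: float
    b: float
    c: float

    def __add__(self, other: "TFN") -> "TFN":
        return TFN(self.a + other.a, self.b + other.b, self.c + other.c)

    def scale(self, k: float) -> "TFN":
        """Multiply by a crisp k >= 0 (a negative k would also swap a and c)."""
        if k < 0:
            raise ValueError("this sketch only handles k >= 0")
        return TFN(k * self.a, k * self.b, k * self.c)


def rank(x: TFN) -> float:
    """Illustrative linear ranking function R(a, b, c) = (a + 2b + c) / 4."""
    return (x.a + 2 * x.b + x.c) / 4
```

Because this ranking is linear, ranking a fuzzy objective such as $2x + y$ gives the same value as $2R(x) + R(y)$, which is what allows the fuzzy problem to be solved as a crisp one.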
32 Consensus Clustering for Microarray Gene Expression Data, Selvamani Muthukalathi, Ravanan Ramanujam and Anbupalam Thalamuthu
Cluster analysis in microarray gene expression studies is used to find groups of correlated and co-regulated genes. Several clustering algorithms are available in the literature; however, no single algorithm is optimal for data generated under different technological platforms and experimental conditions. Several clustering methods and solutions can be combined using an ensemble approach. This approach, also known as consensus clustering, is used here to examine the robustness of cluster solutions from several different algorithms. The proposed method is also useful for estimating the number of clusters in a dataset. Here we examine the properties of consensus clustering using real and simulated datasets.
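The central object in consensus clustering is the consensus (co-association) matrix: for every pair of items, the fraction of clustering runs that put them in the same cluster. Because it compares co-membership, it needs no matching of cluster labels across algorithms. In this Python sketch, the final step, linking items whose consensus exceeds 0.5 and taking connected components, is one simple choice assumed here; hierarchical clustering of 1 - consensus is another common option.

```python
from typing import List, Sequence


def consensus_matrix(labelings: List[Sequence[int]]) -> List[List[float]]:
    """Entry (i, j): fraction of clusterings placing items i and j in the same cluster."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(labelings) for c in row] for row in counts]


def consensus_clusters(M: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Connected components of the graph linking items with consensus above `threshold`."""
    n = len(M)
    seen = [False] * n
    clusters = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, component = [start], []
        while stack:                      # depth-first search over strong links
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if not seen[j] and M[i][j] > threshold:
                    seen[j] = True
                    stack.append(j)
        clusters.append(sorted(component))
    return clusters
```

Note that the labelings `[0, 0, 1, 1]` and `[1, 1, 0, 0]` describe the same partition and contribute identically to the matrix.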
33 RST Approach for Efficient CARs Mining, Thabet Slimani
In data mining, an association rule is a pattern stating that two itemsets (a premise and a consequence) occur together with a certain probability. A class association rule set (CARs) is a subset of association rules whose consequences are classes. This paper focuses on mining class association rules based on Rough Set Theory (RST). In addition, it presents an algorithm for mining the finest class rule set, inspired by the Apriori algorithm, in which support and confidence are computed from the elementary sets of the lower approximation, following RST. The proposed approach is shown to be very effective, and the rough set approach to class association discovery is much simpler than the classic association method.
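The rough-set ingredients can be sketched in Python: elementary sets group records that are indiscernible on the chosen condition attributes, the lower approximation of a class is the union of the elementary sets lying wholly inside it, and a class rule's support and confidence follow from counts within each elementary set. Rules with confidence 1 are exactly those whose elementary set lies in the lower approximation of their class. The paper's exact support and confidence formulas are not given in the abstract, so these definitions are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple


def elementary_sets(rows: List[Sequence[Hashable]], attrs: List[int]) -> Dict[tuple, Set[int]]:
    """Group row indices that are indiscernible on the chosen condition attributes."""
    blocks: Dict[tuple, Set[int]] = defaultdict(set)
    for idx, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(idx)
    return blocks


def lower_approximation(rows: List[Sequence[Hashable]], attrs: List[int],
                        target: Set[int]) -> Set[int]:
    """Union of the elementary sets wholly contained in `target`."""
    return set().union(*(b for b in elementary_sets(rows, attrs).values() if b <= target))


def class_rules(rows: List[Sequence[Hashable]], classes: List[Hashable],
                attrs: List[int]) -> List[Tuple[tuple, Hashable, float, float]]:
    """(condition values, class, support, confidence) for every rule observed in the data."""
    n = len(rows)
    rules = []
    for key, block in elementary_sets(rows, attrs).items():
        for cls, count in Counter(classes[i] for i in block).items():
            rules.append((key, cls, count / n, count / len(block)))
    return rules
```

Adding condition attributes refines the elementary sets, so the lower approximation can only grow; in the test data, using both attributes turns an uncertain record into a certain one.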
34 Data Integration in Big Data Environment, B. Arputhamary and L. Arockiam
Data integration is the process of transferring data from a source format into a destination format. Many data warehousing and data management approaches have been supported by integration tools for data migration and transportation using the Extract-Transform-Load (ETL) approach. These tools are generally well suited to handling large volumes of structured data but are not flexible enough to handle semi-structured or unstructured data. To overcome these challenges in the big data world, programmatically driven parallel techniques such as map-reduce models were introduced. Data integration is a highly cumbersome and iterative process, especially when new data sources are added. Adding new data sources is time-consuming, which results in delays, loss of data, irrelevant data and improper utilization of useful information. Traditionally, the waterfall approach is used in the EDW (Enterprise Data Warehouse), where one cannot move to the next phase before completing the previous one. This approach has its merits: it ensures that the right data sources are picked and the right data integration processes are developed to sustain the usefulness of the EDW. In the big data environment, however, the situation is completely different, so the traditional integration approaches are inefficient and new approaches are needed. In this paper, the importance of data integration in the Big Data world is identified and the open problems of Big Data integration are outlined to guide future research in the Big Data environment.
35 Health and Safety Measures in Chettinad Cement Corporation Limited, Karur, Dr. G. Yoganandan and G. Sivasamy
The cement industry plays an important role in the construction and engineering industry. This study aimed to find out the views and awareness of workers regarding health and safety measures in Chettinad Cement Corporation Limited, Karur. The welfare measures provided by the employer have an immediate impact on the health and the physical and mental efficiency of the employees. The sample size was 319, and the tools used in this research are percentage analysis, the chi-square test, the t-test and factor analysis. The study found that the majority of the employees belong to the 31-40 years age group, that there is a significant relationship between experience and perception of health and safety measures in Chettinad Cement Corporation Limited, and that there is a significant relationship between designation and the workers' perception of overall facilities. The study suggests that the organization needs to increase employees' salaries and take appropriate measures to reduce the air pollution caused by its manufacturing operations, including planting trees and using air filters.